Commit Graph

1785 Commits

Author SHA1 Message Date
Robert Haas 1e53fe0e70 Use PostgresSigHupHandler in more places.
There seems to be no reason for every background process to have
its own flag indicating that a config-file reload is needed.
Instead, let's just use ConfigFilePending for that purpose
everywhere.

Patch by me, reviewed by Andres Freund and Daniel Gustafsson.

Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
2019-12-17 13:03:57 -05:00
Robert Haas 5910d6c7e3 Move interrupt-handling code into subroutines.
Some auxiliary processes, as well as the autovacuum launcher,
have interrupt handling code directly in their main loops.
Try to abstract things a little better by moving it into
separate functions.

This doesn't make any functional difference, and leaves
in place relatively large differences among processes in how
interrupts are handled, but hopefully it at least makes it
easier to see the commonalities and differences across
process types.

Patch by me, reviewed by Andres Freund and Daniel Gustafsson.

Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
2019-12-17 12:55:13 -05:00
Robert Haas 0d3c3aae33 Use procsignal_sigusr1_handler for auxiliary processes.
AuxiliaryProcessMain does ProcSignalInit, so one might expect that
auxiliary processes would need to respond to SendProcSignal, but none
of the auxiliary processes do that. Change them to use
procsignal_sigusr1_handler instead of their own private handlers so
that they do. Besides seeming more correct, this is also less code. It
shouldn't make any functional difference right now because, as far as
we know, there are no current cases where SendProcSignal targets an
auxiliary process, but there are plans to change that in the future.

Andres Freund

Discussion: http://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
2019-11-25 16:16:27 -05:00
Tom Lane 7bf40ea0d0 Avoid using SplitIdentifierString to parse ListenAddresses, too.
This gets rid of our former behavior of forcibly downcasing
the postmaster's hostname list and truncating the elements to
NAMEDATALEN.  In principle, DNS hostnames are case-insensitive
so the first behavior should be harmless, and server hostnames
are seldom long enough for the second behavior to be an issue.
But it's still dubious, and an easy fix is available: just use
SplitGUCList instead.

AFAICT, all other SplitIdentifierString calls in the backend are
OK: either the items actually are SQL identifiers, or they are
keywords that are short and case-insensitive.

Per thinking about bug #16106.  While this has been wrong for
a very long time, the lack of field complaints means there's
little reason to back-patch.

Discussion: https://postgr.es/m/16106-7d319e4295d08e70@postgresql.org
2019-11-13 13:51:58 -05:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Michael Paquier e3db3f829f Clean up properly error_context_stack in autovacuum worker on exception
Any callback set would have no meaning in the context of an exception.
As an autovacuum worker exits quickly in this context, this could be
only an issue within EmitErrorReport(), where the elog hook is for
example called.  That's unlikely to going to be a problem, but let's be
clean and consistent with other code paths handling exceptions.  This is
present since 2909419, which introduced autovacuum.

Author: Ashwin Agrawal
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALfoeisM+_+dgmAdAOHAu0k-ZpEHHqSSG=GRf3pKJGm8OqWX0w@mail.gmail.com
Backpatch-through: 9.4
2019-10-23 10:25:06 +09:00
Tom Lane 9abb2bfc04 In the postmaster, rely on the signal infrastructure to block signals.
POSIX sigaction(2) can be told to block a set of signals while a
signal handler executes.  Make use of that instead of manually
blocking and unblocking signals in the postmaster's signal handlers.
This should save a few cycles, and it also prevents recursive
invocation of signal handlers when many signals arrive in close
succession.  We have seen buildfarm failures that seem to be due to
postmaster stack overflow caused by such recursion (exacerbated by
a Linux PPC64 kernel bug).

This doesn't change anything about the way that it works on Windows.
Somebody might consider adjusting port/win32/signal.c to let it work
similarly, but I'm not in a position to do that.

For the moment, just apply to HEAD.  Possibly we should consider
back-patching this, but it'd be good to let it age awhile first.

Discussion: https://postgr.es/m/14878.1570820201@sss.pgh.pa.us
2019-10-13 15:48:26 -04:00
Tom Lane 3887e9455f Check for too many postmaster children before spawning a bgworker.
The postmaster's code path for spawning a bgworker neglected to check
whether we already have the max number of live child processes.  That's
a bit hard to hit, since it would necessarily be a transient condition;
but if we do, AssignPostmasterChildSlot() fails causing a postmaster
crash, as seen in a report from Bhargav Kamineni.

To fix, invoke canAcceptConnections() in the bgworker code path, as we
do in the other code paths that spawn children.  Since we don't want
the same pmState tests in this case, add a child-process-type parameter
to canAcceptConnections() so that it can know what to do.

Back-patch to 9.5.  In principle the same hazard exists in 9.4, but the
code is enough different that this patch wouldn't quite fix it there.
Given the tiny usage of bgworkers in that branch it doesn't seem worth
creating a variant patch for it.

Discussion: https://postgr.es/m/18733.1570382257@sss.pgh.pa.us
2019-10-07 12:39:09 -04:00
Tom Lane 9a86f03b4e Rearrange postmaster's startup sequence for better syslogger results.
This is a second try at what commit 57431a911 tried to do, namely,
launch the syslogger before we open postmaster sockets so that our
messages about the sockets end up in the syslogger files.  That
commit fell foul of a bunch of subtle issues caused by trying to
launch a postmaster child process before creating shared memory.
Rather than messing with that interaction, let's postpone opening
the sockets till after we launch the syslogger.

This would not have been terribly safe before commit 7de19fbc0,
because we relied on socket opening to detect whether any competing
postmasters were using the same port number.  But now that we choose
IPC keys without regard to the port number, there's no interaction
to worry about.

Also delay creation of the external PID file (if requested) till after
the sockets are open, since external code could plausibly be relying
on that ordering of events.  And postpone most of the work of
RemovePgTempFiles() so that that potentially-slow processing still
happens after we make the external PID file.  We have to be a bit
careful about that last though: as noted in the discussion subsequent to
bug #15804, EXEC_BACKEND builds still have to clear the parameter-file
temp dir before launching the syslogger.

Patch by me; thanks to Michael Paquier for review/testing.

Discussion: https://postgr.es/m/15804-3721117bf40fb654@postgresql.org
2019-09-11 11:43:01 -04:00
Tom Lane 7de19fbc0b Use data directory inode number, not port, to select SysV resource keys.
This approach provides a much tighter binding between a data directory
and the associated SysV shared memory block (and SysV or named-POSIX
semaphores, if we're using those).  Key collisions are still possible,
but only between data directories stored on different filesystems,
so the situation should be negligible in practice.  More importantly,
restarting the postmaster with a different port number no longer
risks failing to identify a relevant shared memory block, even when
postmaster.pid has been removed.  A standalone backend is likewise
much more certain to detect conflicting leftover backends.

(In the longer term, we might now think about deprecating the port as
a cluster-wide value, so that one postmaster could support sockets
with varying port numbers.  But that's for another day.)

The hazards fixed here apply only on Unix systems; our Windows code
paths already use identifiers derived from the data directory path
name rather than the port.

src/test/recovery/t/017_shm.pl, which intends to test key-collision
cases, has been substantially rewritten since it can no longer use
two postmasters with identical port numbers to trigger the case.
Instead, use Perl's IPC::SharedMem module to create a conflicting
shmem segment directly.  The test script will be skipped if that
module is not available.  (This means that some older buildfarm
members won't run it, but I don't think that that results in any
meaningful coverage loss.)

Patch by me; thanks to Noah Misch and Peter Eisentraut for discussion
and review.

Discussion: https://postgr.es/m/16908.1557521200@sss.pgh.pa.us
2019-09-05 13:31:46 -04:00
Michael Paquier ae060a52b2 Fix thinko when ending progress report for a backend
The logic ending progress reporting for a backend entry introduced by
b6fb647 causes callers of pgstat_progress_end_command() to do some extra
work when track_activities is enabled as the process fields are reset in
the backend entry even if no command were started for reporting.

This resets the fields only if a command is registered for progress
reporting, and only if track_activities is enabled.

Author: Masahiho Sawada
Discussion: https://postgr.es/m/CAD21AoCry_vJ0E-m5oxJXGL3pnos-xYGCzF95rK5Bbi3Uf-rpA@mail.gmail.com
Backpatch-through: 9.6
2019-09-04 15:46:37 +09:00
Tom Lane ee32782395 Fix postmaster state machine to handle dead_end child crashes better.
A report from Alvaro Herrera shows that if we're in PM_STARTUP
state, and we spawn a dead_end child to reject some incoming
connection request, and that child dies with an unexpected exit
code, the postmaster does not respond well.  We correctly send
SIGQUIT to the startup process, but then:

* if the startup process exits with nonzero exit code, as expected,
we thought that that indicated a crash and aborted startup.

* if the startup process exits with zero exit code, which is possible
due to the inherent race condition, we'd advance to PM_RUN state
which is fine --- but the code forgot that AbortStartTime would be
nonzero in this situation.  We'd either die on the Asserts saying
that it was zero, or perhaps misbehave later on.  (A quick look
suggests that the only misbehavior might be busy-waiting due to
DetermineSleepTime doing the wrong thing.)

To fix the first point, adjust the state-machine logic to recognize
that a nonzero exit code is expected after sending SIGQUIT, and have
it transition to a state where we can restart the startup process.
To fix the second point, change the Asserts to clear the variable
rather than just claiming it should be clear already.

Perhaps we could improve this further by not treating a crash of
a dead_end child as a reason for panic'ing the database.  However,
since those child processes are connected to shared memory, that
seems a bit risky.  There are few good reasons for a dead_end child
to report failure anyway (the cause of this in Alvaro's report is
quite unclear).  On balance, therefore, a minimal fix seems best.

This is an oversight in commit 45811be94.  While that was back-patched,
I'm hesitant to back-patch this change.  The lack of reasons for a
dead_end child to fail suggests that the case should be very rare in
the field, which squares with the lack of reports; so it seems like
this might not be worth the risk of introducing new issues.  In any
case we can let it bake awhile in HEAD before considering a back-patch.

Discussion: https://postgr.es/m/20190615160950.GA31378@alvherre.pgsql
2019-08-26 15:59:44 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Michael Paquier 8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Michael Paquier eb43f3d193 Fix inconsistencies and typos in the tree
This is numbered take 8, and addresses again a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-29 12:28:30 +09:00
Michael Paquier 6b8548964b Fix inconsistencies in the code
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
2019-07-08 13:15:09 +09:00
Peter Eisentraut 7e9a4c5c3d Use consistent style for checking return from system calls
Use

    if (something() != 0)
        error ...

instead of just

    if (something)
        error ...

The latter is not incorrect, but it's a bit confusing and not the
common style.

Discussion: https://www.postgresql.org/message-id/flat/5de61b6b-8be9-7771-0048-860328efe027%402ndquadrant.com
2019-07-07 15:28:49 +02:00
Michael Paquier c74d49d41c Fix many typos and inconsistencies
Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
2019-07-01 10:00:23 +09:00
Michael Paquier f43608bda2 Fix typos and inconsistencies in code comments
Author: Alexander Lakhin
Discussion: https://postgr.es/m/dec6aae8-2d63-639f-4d50-20e229fb83e3@gmail.com
2019-06-14 09:34:34 +09:00
Noah Misch 31d250e049 Update stale comments, and fix comment typos. 2019-06-08 10:12:26 -07:00
Amit Kapila 92c4abc736 Fix assorted inconsistencies.
There were a number of issues in the recent commits which include typos,
code and comments mismatch, leftover function declarations.  Fix them.

Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
2019-06-08 08:16:38 +05:30
Amit Kapila 9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane 8334515529 Revert "postmaster: Start syslogger earlier".
This commit reverts 57431a911d.

While that's still a good idea in the abstract, we found out
that there are multiple crasher bugs in it on Windows builds,
making the logging_collector option unusable on Windows.
There's no time left to fix these issues before 12beta1,
so revert the patch to allow Windows beta testing to proceed.
We'll try again at some future date.

Per bug #15804 from Yulian Khodorkovskiy and additional
investigation by Michael Paquier.

Discussion: https://postgr.es/m/15804-3721117bf40fb654@postgresql.org
2019-05-19 11:14:23 -04:00
Alvaro Herrera 75445c1515 More message style fixes
Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql
2019-05-16 19:14:31 -04:00
Tom Lane 85ccb6899c Rearrange pgstat_bestart() to avoid failures within its critical section.
We long ago decided to design the shared PgBackendStatus data structure to
minimize the cost of writing status updates, which means that writers just
have to increment the st_changecount field twice.  That isn't hooked into
any sort of resource management mechanism, which means that if something
were to throw error between the two increments, the st_changecount field
would be left odd indefinitely.  That would cause readers to lock up.
Now, since it's also a bad idea to leave the field odd for longer than
absolutely necessary (because readers will spin while we have it set),
the expectation was that we'd treat these segments like spinlock critical
sections, with only short, more or less straight-line, code in them.

That was fine as originally designed, but commit 9029f4b37 broke it
by inserting a significant amount of non-straight-line code into
pgstat_bestart(), code that is very capable of throwing errors, not to
mention taking a significant amount of time during which readers will spin.
We have a report from Neeraj Kumar of readers actually locking up, which
I suspect was due to an encoding conversion error in X509_NAME_to_cstring,
though conceivably it was just a garden-variety OOM failure.

Subsequent commits have loaded even more dubious code into pgstat_bestart's
critical section (and commit fc70a4b0d deserves some kind of booby prize
for managing to miss the critical section entirely, although the negative
consequences seem minimal given that the PgBackendStatus entry should be
seen by readers as inactive at that point).

The right way to fix this mess seems to be to compute all these values
into a local copy of the process' PgBackendStatus struct, and then just
copy the data back within the critical section proper.  This plan can't
be implemented completely cleanly because of the struct's heavy reliance
on out-of-line strings, which we must initialize separately within the
critical section.  But still, the critical section is far smaller and
safer than it was before.

In hopes of forestalling future errors of the same ilk, rename the
macros for st_changecount management to make it more apparent that
the writer-side macros create a critical section.  And to prevent
the worst consequences if we nonetheless manage to mess it up anyway,
adjust those macros so that they really are a critical section, ie
they now bump CritSectionCount.  That doesn't add much overhead, and
it guarantees that if we do somehow throw an error while the counter
is odd, it will lead to PANIC and a database restart to reset shared
memory.

Back-patch to 9.5 where the problem was introduced.

In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach
pgstat_read_current_status to copy st_gssstatus data from shared memory to
local memory.  Hence, subsequent use of that data within the transaction
would potentially see changing data that it shouldn't see.

Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
2019-05-11 21:27:29 -04:00
Fujii Masao b84dbc8eb8 Add TRUNCATE parameter to VACUUM.
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.

This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
2019-05-08 02:10:33 +09:00
Tom Lane 8d0ddccec6 Avoid "invalid memory alloc request size" while reading pg_stat_activity.
On a 64-bit machine, if you set track_activity_query_size and
max_connections such that their product exceeds 1GB, shared memory
setup will still succeed (given enough RAM), but attempts to read
pg_stat_activity fail with "invalid memory alloc request size".
Work around that by using MemoryContextAllocHuge to allocate the
local copy of the activity strings.  Using the "huge" API costs us
nothing extra in normal cases, and it seems better than throwing
an error and/or explaining to people why they can't do this.

This situation seems insanely profligate today, but who knows what
people will consider normal in ten or twenty years?  So let's fix it
in HEAD but not worry about a back-patch.

Per report from James Tomson.

Discussion: https://postgr.es/m/1CFDCCD6-B268-48D8-85C8-400D2790B2C3@pushd.com
2019-05-07 11:41:37 -04:00
Magnus Hagander 659e53498c Fix union for pgstat message types
The message type for temp files and for checksum failures were missing
from the union. Due to the coding style used there was no compiler error
when this happend. So change the code to actively use the union thereby
producing a compiler error if the same mistake happens again, suggested
by Tom Lane.

Author: Julien Rouhaud
Reported-By: Tomas Vondra
Discussion: https://postgr.es/m/20190430163328.zd4rrlnbvgaqlcdz@development
2019-05-01 12:30:44 +02:00
Noah Misch 90e7f31773 Use preprocessor conditions compatible with Emacs indent.
Emacs wrongly indented hundreds of subsequent lines.
2019-04-28 12:56:53 -07:00
Fujii Masao 978b032d1f Fix function names in comments.
Commit 3eb77eba5a renamed some functions, but forgot to
update some comments referencing to those functions.
This commit fixes those function names in the comments.

Kyotaro Horiguchi
2019-04-25 23:43:48 +09:00
Tom Lane 0fae846232 Fix some minor postmaster-state-machine issues.
In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based
on pmState.  The restriction is unnecessary (PostmasterStateMachine
should work in any state), not future-proof (since it makes too many
assumptions about why the signal might be sent), and broken even today
because a race condition can make it necessary to respond to the signal
in PM_WAIT_READONLY state.  The race condition seems unlikely, but
if it did happen, a hot-standby postmaster could fail to shut down
after receiving a smart-shutdown request.

In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag
if the fork attempt fails.  Leaving it set allows us to try
again in future iterations of the postmaster idle loop.  (The startup
process would eventually send a fresh request signal, but this change
may allow us to retry the fork sooner.)

Remove an obsolete comment and unnecessary test in
PostmasterStateMachine's handling of PM_SHUTDOWN_2 state.  It's not
possible to have a live walreceiver in that state, and AFAICT has not
been possible since commit 5e85315ea.  This isn't a live bug, but the
false comment is quite confusing to readers.

In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that
we don't uselessly perform stat() calls that we're going to ignore the
results of.

Add some comments clarifying the behavior of MaybeStartWalReceiver;
I very nearly rearranged it in a way that'd reintroduce the race
condition fixed in e5d494d78.  Mea culpa for not commenting that
properly at the time.

Back-patch to all supported branches.  The PMSIGNAL_ADVANCE_STATE_MACHINE
change is the only one of even minor significance, but we might as well
keep this code in sync across branches.

Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
2019-04-24 14:15:44 -04:00
Andres Freund fdc7efcc30 Allow pg_class xid & multixid horizons to not be set.
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.

Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry.  This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value.  Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23 21:42:12 -07:00
Noah Misch c098509927 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer stop if shmat() of an old segment fails with EACCES.  A
postmaster will no longer recycle segments pertaining to other data
directories.  That's good for production, but it's bad for integration
tests that crash a postmaster and immediately delete its data directory.
Such a test now leaks a segment indefinitely.  No "make check-world"
test does that.  win32_shmem.c already avoided all these problems.  In
9.6 and later, enhance PostgresNode to facilitate testing.  Back-patch
to 9.4 (all supported versions).

Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com
2019-04-12 22:36:38 -07:00
Magnus Hagander 77bd49adba Show shared object statistics in pg_stat_database
This adds a row to the pg_stat_database view with datoid 0 and datname
NULL for those objects that are not in a database. This was added
particularly for checksums, but we were already tracking more satistics
for these objects, just not returning it.

Also add a checksum_last_failure column that holds the timestamptz of
the last checksum failure that occurred in a database (or in a
non-dataabase file), if any.

Author: Julien Rouhaud <rjuju123@gmail.com>
2019-04-12 14:04:50 +02:00
Amit Kapila bdf35744bd Avoid counting transaction stats for parallel worker cooperating
transaction.

The transaction that is initiated by the parallel worker to cooperate
with the actual transaction started by the main backend to complete the
query execution should not be counted as a separate transaction.  The
other internal transactions started and committed by the parallel worker
are still counted as separate transactions as we that is what we do in
other places like autovacuum.

This will partially fix the bloat in transaction stats due to additional
transactions performed by parallel workers.  For a complete fix, we need to
decide how we want to show all the transactions that are started internally
for various operations and that is a matter of separate patch.

Reported-by: Haribabu Kommi
Author: Haribabu Kommi
Reviewed-by: Amit Kapila, Jamison Kirk and Rahila Syed
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAJrrPGc9=jKXuScvNyQ+VNhO0FZk7LLAShAJRyZjnedd2D61EQ@mail.gmail.com
2019-04-10 08:24:15 +05:30
Noah Misch 617dc6d299 Avoid "could not reattach" by providing space for concurrent allocation.
We've long had reports of intermittent "could not reattach to shared
memory" errors on Windows.  Buildfarm member dory fails that way when
PGSharedMemoryReAttach() execution overlaps with creation of a thread
for the process's "default thread pool".  Fix that by providing a second
region to receive asynchronous allocations that would otherwise intrude
into UsedShmemSegAddr.  In pgwin32_ReserveSharedMemoryRegion(), stop
trying to free reservations landing at incorrect addresses; the caller's
next step has been to terminate the affected process.  Back-patch to 9.4
(all supported versions).

Reviewed by Tom Lane.  He also did much of the prerequisite research;
see commit bcbf2346d6.

Discussion: https://postgr.es/m/20190402135442.GA1173872@rfd.leadboat.com
2019-04-08 21:39:00 -07:00
Thomas Munro de2b38419c Wake up interested backends when a checkpoint fails.
Commit c6c9474a switched to condition variables instead of sleep
loops to notify backends of checkpoint start and stop, but forgot
to broadcast in case of checkpoint failure.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGJKbCd%2B_K%2BSEBsbHxVT60SG0ivWHHAdvL0bLTUt2xpA2w%40mail.gmail.com
2019-04-06 09:31:48 +13:00
Noah Misch 82150a05be Revert "Consistently test for in-use shared memory."
This reverts commits 2f932f71d9,
16ee6eaf80 and
6f0e190056.  The buildfarm has revealed
several bugs.  Back-patch like the original commits.

Discussion: https://postgr.es/m/20190404145319.GA1720877@rfd.leadboat.com
2019-04-05 00:00:52 -07:00
Robert Haas a96c41feec Allow VACUUM to be run with index cleanup disabled.
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.

Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
2019-04-04 15:04:43 -04:00
Thomas Munro 3eb77eba5a Refactor the fsync queue for wider use.
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.

For now only md.c uses the new interface, but other users are being
proposed.  Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.

Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key.  This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).

Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
2019-04-04 23:38:38 +13:00
Noah Misch 2f932f71d9 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer recycle segments pertaining to other data directories.
That's good for production, but it's bad for integration tests that
crash a postmaster and immediately delete its data directory.  Such a
test now leaks a segment indefinitely.  No "make check-world" test does
that.  win32_shmem.c already avoided all these problems.  In 9.6 and
later, enhance PostgresNode to facilitate testing.  Back-patch to 9.4
(all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20130911033341.GD225735@tornado.leadboat.com
2019-04-03 17:03:46 -07:00
Stephen Frost b0b39f72b9 GSSAPI encryption support
On both the frontend and backend, prepare for GSSAPI encryption
support by moving common code for error handling into a separate file.
Fix a TODO for handling multiple status messages in the process.
Eliminate the OIDs, which have not been needed for some time.

Add frontend and backend encryption support functions.  Keep the
context initiation for authentication-only separate on both the
frontend and backend in order to avoid concerns about changing the
requested flags to include encryption support.

In postmaster, pull GSSAPI authorization checking into a shared
function.  Also share the initiator name between the encryption and
non-encryption codepaths.

For HBA, add "hostgssenc" and "hostnogssenc" entries that behave
similarly to their SSL counterparts.  "hostgssenc" requires either
"gss", "trust", or "reject" for its authentication.

Similarly, add a "gssencmode" parameter to libpq.  Supported values are
"disable", "require", and "prefer".  Notably, negotiation will only be
attempted if credentials can be acquired.  Move credential acquisition
into its own function to support this behavior.

Add a simple pg_stat_gssapi view similar to pg_stat_ssl, for monitoring
if GSSAPI authentication was used, what principal was used, and if
encryption is being used on the connection.

Finally, add documentation for everything new, and update existing
documentation on connection security.

Thanks to Michael Paquier for the Windows fixes.

Author: Robbie Harwood, with changes to the read/write functions by me.
Reviewed in various forms and at different times by: Michael Paquier,
   Andres Freund, David Steele.
Discussion: https://www.postgresql.org/message-id/flat/jlg1tgq1ktm.fsf@thriss.redhat.com
2019-04-03 15:02:33 -04:00
Peter Eisentraut 481018f280 Add macro to cast away volatile without allowing changes to underlying type
This adds unvolatize(), which works just like unconstify() but for volatile.

Discussion: https://www.postgresql.org/message-id/flat/7a5cbea7-b8df-e910-0f10-04014bcad701%402ndquadrant.com
2019-03-25 09:37:03 +01:00
Michael Paquier 276d2e6c2d Make current_logfiles use permissions assigned to files in data directory
Since its introduction in 19dc233c, current_logfiles has been assigned
the same permissions as a log file, which can be enforced with
log_file_mode.  This setup can lead to incompatibility problems with
group access permissions as current_logfiles is not located in the log
directory, but at the root of the data folder.  Hence, if group
permissions are used but log_file_mode is more restrictive, a backup
with a user in the group having read access could fail even if the log
directory is located outside of the data folder.

Per discussion with the folks mentioned below, we have concluded that
current_logfiles should not be treated as a log file as it only stores
metadata related to log files, and that it should use the same
permissions as all other files in the data directory.  This solution has
the merit to be simple and fixes all the interaction problems between
group access and log_file_mode.

Author: Haribabu Kommi
Reviewed-by: Stephen Frost, Robert Haas, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAJrrPGcEotF1P7AWoeQyD3Pqr-0xkQg_Herv98DjbaMj+naozw@mail.gmail.com
Backpatch-through: 11, where group access has been added.
2019-03-24 21:00:35 +09:00
Tom Lane 0dfe3d0ef5 Make checkpoint requests more robust.
Commit 6f6a6d8b1 introduced a delay of up to 2 seconds if we're trying
to request a checkpoint but the checkpointer hasn't started yet (or,
much less likely, our kill() call fails).  However buildfarm experience
shows that that's not quite enough for slow or heavily-loaded machines.
There's no good reason to assume that the checkpointer won't start
eventually, so we may as well make the timeout much longer, say 60 sec.

However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad
idea to be waiting at all, much less for as long as 60 sec.  We can
remove the need for that, and make this whole thing more robust, by
adjusting the code so that the existence of a pending checkpoint
request is clear from the contents of shared memory, and making sure
that the checkpointer process will notice it at startup even if it did
not get a signal.  In this way there's no need for a non-CHECKPOINT_WAIT
call to wait at all; if it can't send the signal, it can nonetheless
assume that the checkpointer will eventually service the request.

A potential downside of this change is that "kill -INT" on the checkpointer
process is no longer enough to trigger a checkpoint, should anyone be
relying on something so hacky.  But there's no obvious reason to do it
like that rather than issuing a plain old CHECKPOINT command, so we'll
assume that nobody is.  There doesn't seem to be a way to preserve this
undocumented quasi-feature without introducing race conditions.

Since a principal reason for messing with this is to prevent intermittent
buildfarm failures, back-patch to all supported branches.

Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us
2019-03-19 12:49:27 -04:00
Robert Haas f41551f61f Fold vacuum's 'int options' parameter into VacuumParams.
Many places need both, so this allows a few functions to take one
fewer parameter.  More importantly, as soon as we add a VACUUM
option that takes a non-Boolean parameter, we need to replace
'int options' with a struct, and it seems better to think
of adding more fields to VacuumParams rather than passing around
both VacuumParams and a separate struct as well.

Patch by me, reviewed by Masahiko Sawada

Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com
2019-03-18 13:57:33 -04:00
Thomas Munro c6c9474aaf Use condition variables to wait for checkpoints.
Previously we used a polling/sleeping loop to wait for checkpoints
to begin and end, which leads to up to a couple hundred milliseconds
of needless thumb-twiddling.  Use condition variables instead.

Author: Thomas Munro
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGLY7sDe%2Bbg1K%3DbnEzOofGoo4bJHYh9%2BcDCXJepb6DQmLw%40mail.gmail.com
2019-03-14 10:59:33 +13:00
Andres Freund c2fe139c20 tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al. have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for a other AMs. Instead there's
   table_scan_getnextslot().  But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends need
   to be able to do so without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   intiialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore that seems cleaner.

To be able to sensible adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AMs
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but a
   slot (which in heap's case internally has that information).

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will so for the
   foreseeable future, the index API (see 1) above) now only deals with
   slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 12:46:41 -07:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Magnus Hagander 6b9e875f72 Track block level checksum failures in pg_stat_database
This adds a column that counts how many checksum failures have occurred
on files belonging to a specific database. Both checksum failures
during normal backend processing and those created when a base backup
detects a checksum failure are counted.

Author: Magnus Hagander
Reviewed by: Julien Rouhaud
2019-03-09 10:47:30 -08:00
Andrew Dunstan 342cb650e0 Don't log incomplete startup packet if it's empty
This will stop logging cases where, for example, a monitor opens a
connection and immediately closes it. If the packet contains any data an
incomplete packet will still be logged.

Author: Tom Lane

Discussion: https://postgr.es/m/a1379a72-2958-1ed0-ef51-09a21219b155@2ndQuadrant.com
2019-03-06 15:36:41 -05:00
Alvaro Herrera 98098faaff Report correct name in autovacuum "work items" activity
We were reporting the database name instead of the relation name to
pg_stat_activity.  Repair.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20190220185552.GR28750@telsasoft.com
2019-02-22 13:00:16 -03:00
Michael Paquier ea92368cd1 Move max_wal_senders out of max_connections for connection slot handling
Since its introduction, max_wal_senders is counted as part of
max_connections when it comes to define how many connection slots can be
used for replication connections with a WAL sender context.  This can
lead to confusion for some users, as it could be possible to block a
base backup or replication from happening because other backend sessions
are already taken for other purposes by an application, and
superuser-only connection slots are not a correct solution to handle
that case.

This commit makes max_wal_senders independent of max_connections for its
handling of PGPROC entries in ProcGlobal, meaning that connection slots
for WAL senders are handled using their own free queue, like autovacuum
workers and bgworkers.

One compatibility issue that this change creates is that a standby now
requires to have a value of max_wal_senders at least equal to its
primary.  So, if a standby created enforces the value of
max_wal_senders to be lower than that, then this could break failovers.
Normally this should not be an issue though, as any settings of a
standby are inherited from its primary as postgresql.conf gets normally
copied as part of a base backup, so parameters would be consistent.

Author: Alexander Kukushkin
Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii
Kliukin
Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com
2019-02-12 10:07:56 +09:00
Peter Eisentraut f60a0e9677 Add more columns to pg_stat_ssl
Add columns client_serial and issuer_dn to pg_stat_ssl.  These allow
uniquely identifying the client certificate.

Rename the existing column clientdn to client_dn, to make the naming
more consistent and easier to read.

Discussion: https://www.postgresql.org/message-id/flat/398754d8-6bb5-c5cf-e7b8-22e5f0983caf@2ndquadrant.com/
2019-02-01 00:33:47 +01:00
Peter Eisentraut 689d15e95e Log PostgreSQL version number on startup
Logging the PostgreSQL version on startup is useful for two reasons:
There is a clear marker in the log file that a new postmaster is
beginning, and it's useful for tracking the server version across
startup while upgrading.

Author: Christoph Berg <christoph.berg@credativ.de>
Discussion: https://www.postgresql.org/message-id/flat/20181121144611.GJ15795@msg.credativ.de/
2019-01-30 23:26:10 +01:00
Peter Eisentraut 57431a911d postmaster: Start syslogger earlier
When the syslogger was originally
added (bdf8ef6925), nothing was normally
logged before the point where it was started.  But since
f9dfa5c977, the creation of sockets
causes messages of level LOG to be written routinely, so those don't
go to the syslogger now.

To improve that, arrange the sequence in PostmasterMain() slightly so
that the syslogger is started early enough to capture those messages.

Discussion: https://www.postgresql.org/message-id/d5d50936-20b9-85f1-06bc-94a01c5040c1%402ndquadrant.com
Reviewed-by: Christoph Berg <christoph.berg@credativ.de>
2019-01-30 21:10:56 +01:00
Andres Freund a9c35cf85c Change function call information to be variable length.
Before this change FunctionCallInfoData, the struct arguments etc for
V1 function calls are stored in, always had space for
FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two
arrays.  For nearly every function call 100 arguments is far more than
needed, therefore wasting memory. Arg and argnull being two separate
arrays also guarantees that to access a single argument, two
cachelines have to be touched.

Change the layout so there's a single variable-length array with pairs
of value / isnull. That drastically reduces memory consumption for
most function calls (on x86-64 a two argument function now uses
64bytes, previously 936 bytes), and makes it very likely that argument
value and its nullness are on the same cacheline.

Arguments are stored in a new NullableDatum struct, which, due to
padding, needs more memory per argument than before. But as usually
far fewer arguments are stored, and individual arguments are cheaper
to access, that's still a clear win.  It's likely that there's other
places where conversion to NullableDatum arrays would make sense,
e.g. TupleTableSlots, but that's for another commit.

Because the function call information is now variable-length
allocations have to take the number of arguments into account. For
heap allocations that can be done with SizeForFunctionCallInfoData(),
for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro
that helps to allocate an appropriately sized and aligned variable.

Some places with stack allocation function call information don't know
the number of arguments at compile time, and currently variably sized
stack allocations aren't allowed in postgres. Therefore allow for
FUNC_MAX_ARGS space in these cases. They're not that common, so for
now that seems acceptable.

Because of the need to allocate FunctionCallInfo of the appropriate
size, older extensions may need to update their code. To avoid subtle
breakages, the FunctionCallInfoData struct has been renamed to
FunctionCallInfoBaseData. Most code only references FunctionCallInfo,
so that shouldn't cause much collateral damage.

This change is also a prerequisite for more efficient expression JIT
compilation (by allocating the function call information on the stack,
allowing LLVM to optimize it away); previously the size of the call
information caused problems inside LLVM's optimizer.

Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
2019-01-26 14:17:52 -08:00
Andres Freund e7cc78ad43 Remove superfluous tqual.h includes.
Most of these had been obsoleted by 568d4138c / the SnapshotNow
removal.

This is is preparation for moving most of tqual.[ch] into either
snapmgr.h or heapam.h, which in turn is in preparation for pluggable
table AMs.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 12:15:02 -08:00
Andres Freund e0c4ec0728 Replace uses of heap_open et al with the corresponding table_* function.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Magnus Hagander 0301db623d Replace @postgresql.org with @lists.postgresql.org for mailinglists
Commit c0d0e54084 replaced the ones in the documentation, but missed out
on the ones in the code. Replace those as well, but unlike c0d0e54084,
don't backpatch the code changes to avoid breaking translations.
2019-01-19 19:06:35 +01:00
Michael Paquier 42e2a58071 Fix typos in documentation and for one wait event
These have been found while cross-checking for the use of unique words
in the documentation, and a wait event was not getting generated in a way
consistent to what the documentation provided.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9b5a3a85-899a-ae62-dbab-1e7943aa5ab1@gmail.com
2019-01-15 08:47:01 +09:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Michael Paquier 1707a0d2aa Remove configure switch --disable-strong-random
This removes a portion of infrastructure introduced by fe0a0b5 to allow
compilation of Postgres in environments where no strong random source is
available, meaning that there is no linking to OpenSSL and no
/dev/urandom (Windows having its own CryptoAPI).  No systems shipped
this century lack /dev/urandom, and the buildfarm is actually not
testing this switch at all, so just remove it.  This simplifies
particularly some backend code which included a fallback implementation
using shared memory, and removes a set of alternate regression output
files from pgcrypto.

Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20181230063219.GG608@paquier.xyz
2019-01-01 20:05:51 +09:00
Tom Lane 4203842a1c Use pg_strong_random() to select each server process's random seed.
Previously we just set the seed based on process ID and start timestamp.
Both those values are directly available within the session, and can
be found out or guessed by other users too, making the session's series
of random(3) values fairly predictable.  Up to now, our backend-internal
uses of random(3) haven't seemed security-critical, but commit 88bdbd3f7
added one that potentially is: when using log_statement_sample_rate, a
user might be able to predict which of his SQL statements will get logged.

To improve this situation, upgrade the per-process seed initialization
method to use pg_strong_random() if available, greatly reducing the
predictability of the initial seed value.  This adds a few tens of
microseconds to process start time, but since backend startup time is
at least a couple of milliseconds, that seems an acceptable price.

This means that pg_strong_random() needs to be able to run without
reliance on any backend infrastructure, since it will be invoked
before any of that is up.  It was safe for that already, but adjust
comments and #include commands to make it clearer.

Discussion: https://postgr.es/m/3859.1545849900@sss.pgh.pa.us
2018-12-29 17:56:06 -05:00
Michael Paquier b981df4cc0 Prioritize history files when archiving
At the end of recovery for the post-promotion process, a new history
file is created followed by the last partial segment of the previous
timeline.  Based on the timing, the archiver would first try to archive
the last partial segment and then the history file.  This can delay the
detection of a new timeline taken, particularly depending on the time it
takes to transfer the last partial segment as it delays the moment the
history file of the new timeline gets archived.  This can cause promoted
standbys to use the same timeline as one already taken depending on the
circumstances if multiple instances look at archives at the same
location.

This commit changes the order of archiving so as history files are
archived in priority over other file types, which reduces the likelihood
of the same timeline being taken (still not reducing the window to
zero), and it makes the archiver behave more consistently with the
startup process doing its post-promotion business.

Author: David Steele
Reviewed-by: Michael Paquier, Kyotaro Horiguchi
Discussion: https://postgr.es/m/929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net
Backpatch-through: 9.5
2018-12-24 20:24:16 +09:00
Tom Lane a73d083195 Modernize our code for looking up descriptive strings for Unix signals.
At least as far back as the 2008 spec, POSIX has defined strsignal(3)
for looking up descriptive strings for signal numbers.  We hadn't gotten
the word though, and were still using the crufty old sys_siglist array,
which is in no standard even though most Unixen provide it.

Aside from not being formally standards-compliant, this was just plain
ugly because it involved #ifdef's at every place using the code.

To eliminate the #ifdef's, create a portability function pg_strsignal,
which wraps strsignal(3) if available and otherwise falls back to
sys_siglist[] if available.  The set of Unixen with neither API is
probably empty these days, but on any platform with neither, you'll
just get "unrecognized signal".  All extant callers print the numeric
signal number too, so no need to work harder than that.

Along the way, upgrade pg_basebackup's child-error-exit reporting
to match the rest of the system.

Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
2018-12-16 19:38:57 -05:00
Tom Lane ade2d61ed0 Improve detection of child-process SIGPIPE failures.
Commit ffa4cbd62 added logic to detect SIGPIPE failure of a COPY child
process, but it only worked correctly if the SIGPIPE occurred in the
immediate child process.  Depending on the shell in use and the
complexity of the shell command string, we might instead get back
an exit code of 128 + SIGPIPE, representing a shell error exit
reporting SIGPIPE in the child process.

We could just hack up ClosePipeToProgram() to add the extra case,
but it seems like this is a fairly general issue deserving a more
general and better-documented solution.  I chose to add a couple
of functions in src/common/wait_error.c, which is a natural place
to know about wait-result encodings, that will test for either a
specific child-process signal type or any child-process signal failure.
Then, adjust other places that were doing ad-hoc tests of this type
to use the common functions.

In RestoreArchivedFile, this fixes a race condition affecting whether
the process will report an error or just silently proc_exit(1): before,
that depended on whether the intermediate shell got SIGTERM'd itself
or reported a child process failing on SIGTERM.

Like the previous patch, back-patch to v10; we could go further
but there seems no real need to.

Per report from Erik Rijkers.

Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl
2018-12-16 14:32:14 -05:00
Michael Paquier 6d8727f95e Ensure cleanup of orphan archive status files
When a WAL segment is recycled, its ".ready" and ".done" status files
get also automatically removed, however this is not done in a durable
manner.  Hence, in a subsequent crash, it could be possible that a
".ready" status file is still around with its corresponding segment
already gone.

If the backend reaches such a state, the archive command would most
likely complain about a segment non-existing and would keep retrying,
causing WAL segments to bloat pg_wal/, potentially making Postgres crash
hard when running out of space.

As status files are removed after each individual segment, using
durable_unlink() does not completely close the window either, as a crash
could happen between the moment the WAL segment is recycled and the
moment its status files are removed.  This has also some performance
impact with the additional fsync() calls needed to make the removal in a
durable manner.  Doing the cleanup at recovery is not cost-free either
as this makes crash recovery potentially take longer than necessary.

So, instead, as per an idea of Stephen Frost, make the archiver aware of
orphan status files and remove them on-the-fly if the corresponding
segment goes missing.  Removal failures follow a model close to what
happens for WAL segments, where multiple attempts are done before giving
up temporarily, and where a successful orphan removal makes the archiver
move immediately to the next WAL segment thought as ready to be
archived.

Author: Michael Paquier
Reviewed-by: Nathan Bossart, Andres Freund, Stephen Frost, Kyotaro
Horiguchi
Discussion: https://postgr.es/m/20180928032827.GF1500@paquier.xyz
2018-12-10 15:00:59 +09:00
Alvaro Herrera 3be5fe2b10 Silence compiler warnings
Commit cfdf4dc4fc left a few unnecessary assignments, one of which
caused compiler warnings, as reported by Erik Rijkers.  Remove them all.

Discussion: https://postgr.es/m/df0dcca2025b3d90d946ecc508ca9678@xs4all.nl
2018-11-23 13:01:05 -03:00
Thomas Munro cfdf4dc4fc Add WL_EXIT_ON_PM_DEATH pseudo-event.
Users of the WaitEventSet and WaitLatch() APIs can now choose between
asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking
for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death.
This reduces code duplication, since almost all callers want the latter.

Repair all code that was previously ignoring postmaster death completely,
or requesting the event but ignoring it, or requesting the event but then
doing an unconditional PostmasterIsAlive() call every time through its
event loop (which is an expensive syscall on platforms for which we don't
have USE_POSTMASTER_DEATH_SIGNAL support).

Assert that callers of WaitLatchXXX() under the postmaster remember to
ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent
future bugs.

The only process that doesn't handle postmaster death is syslogger.  It
waits until all backends holding the write end of the syslog pipe
(including the postmaster) have closed it by exiting, to be sure to
capture any parting messages.  By using the WaitEventSet API directly
it avoids the new assertion, and as a by-product it may be slightly
more efficient on platforms that have epoll().

Author: Thomas Munro
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com,
            https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
2018-11-23 20:46:34 +13:00
Andres Freund 578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Tom Lane 37afc079ab Avoid defining SIGTTIN/SIGTTOU on Windows.
Setting them to SIG_IGN seems unlikely to have any beneficial effect
on that platform, and given the signal numbering collision with SIGABRT,
it could easily have bad effects.

Given the lack of field complaints that can be traced to this, I don't
presently feel a need to back-patch.

Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us
2018-11-17 16:31:16 -05:00
Tom Lane 125f551c8b Leave SIGTTIN/SIGTTOU signal handling alone in postmaster child processes.
For reasons lost in the mists of time, most postmaster child processes
reset SIGTTIN/SIGTTOU signal handling to SIG_DFL, with the major exception
that backend sessions do not.  It seems like a pretty bad idea for any
postmaster children to do that: if stderr is connected to the terminal,
and the user has put the postmaster in background, any log output would
result in the child process freezing up.  Hence, switch them all to
doing what backends do, ie, nothing.  This allows them to inherit the
postmaster's SIG_IGN setting.  On the other hand, manually-launched
processes such as standalone backends will have default processing,
which seems fine.

In passing, also remove useless resets of SIGCONT and SIGWINCH signal
processing.  Perhaps the postmaster once changed those to something
besides SIG_DFL, but it doesn't now, so these are just wasted (and
confusing) syscalls.

Basically, this propagates the changes made in commit 8e2998d8a from
backends to other postmaster children.  Probably the only reason these
calls now exist elsewhere is that I missed changing pgstat.c along with
postgres.c at the time.

Given the lack of field complaints that can be traced to this, I don't
presently feel a need to back-patch.

Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us
2018-11-17 16:31:16 -05:00
Thomas Munro ab8984f52d Further adjustment to random() seed initialization.
Per complaint from Tom Lane, don't chomp the timestamp at 32 bits, so we
can shift in some of its higher bits.

Discussion: https://postgr.es/m/14712.1542253115%40sss.pgh.pa.us
2018-11-15 17:38:55 +13:00
Thomas Munro 5b0ce3ec33 Increase the number of possible random seeds per time period.
Commit 197e4af9 refactored the initialization of the libc random()
seed, but reduced the number of possible seeds values that could be
chosen in a given time period.  This negation of the effects of
commit 98c50656c was unintentional.  Replace with code that
shifts the fast moving timestamp bits left, similar to the original
algorithm (though not the previous float-tolerating coding, which
is no longer necessary).

Author: Thomas Munro
Reported-by: Noah Misch
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20181112083358.GA1073796%40rfd.leadboat.com
2018-11-15 16:25:30 +13:00
Michael Paquier 10074651e3 Add pg_promote function
This function is able to promote a standby with this new SQL-callable
function.  Execution access can be granted to non-superusers so that
failover tools can observe the principle of least privilege.

Catalog version is bumped.

Author: Laurenz Albe
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/6e7c79b3ec916cf49742fb8849ed17cd87aed620.camel@cybertec.at
2018-10-25 09:46:00 +09:00
Michael Paquier 5ef037cf0b List wait events in alphabetical order
This changes the documentation, and the related structures so as
everything is consistent.

Some wait events were not listed alphabetically since their
introduction, others have been added rather randomly.  Keeping all those
entries in order helps in maintenance, and helps the user looking at the
documentation.

Author: Michael Paquier, Kuntal Ghosh
Discussion: https://postgr.es/m/20181024002539.GI1658@paquier.xyz
Backpatch-through: 10, only for the documentation part to avoid an ABI
breakage.
2018-10-24 17:02:37 +09:00
Thomas Munro 197e4af9d5 Refactor pid, random seed and start time initialization.
Background workers, including parallel workers, were generating
the same sequence of numbers in random().  This showed up as DSM
handle collisions when Parallel Hash created multiple segments,
but any code that calls random() in background workers could be
affected if it cares about different backends generating different
numbers.

Repair by making sure that all new processes initialize the seed
at the same time as they set MyProcPid and MyStartTime in a new
function InitProcessGlobals(), called by the postmaster, its
children and also standalone processes.  Also add a new high
resolution MyStartTimestamp as a potentially useful by-product,
and remove SessionStartTime from struct Port as it is now
redundant.

No back-patch for now, as the known consequences so far are just
a bunch of harmless shm_open(O_EXCL) collisions.

Author: Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D2eJj_6%3DB%2B2tEpGu2nf1BjthCf9nXXUouYvJJ4C5WSwhg%40mail.gmail.com
2018-10-19 13:59:28 +13:00
Stephen Frost 8bddc86400 Add application_name to connection authorized msg
The connection authorized message has quite a bit of useful information
in it, but didn't include the application_name (when provided), so let's
add that as it can be very useful.

Note that at the point where we're emitting the connection authorized
message, we haven't processed GUCs, so it's not possible to get this by
using log_line_prefix (which pulls from the GUC).  There's also
something to be said for having this included in the connection
authorized message and then not needing to repeat it for every line, as
having it in log_line_prefix would do.

The GUC cleans the application name to pure-ascii, so do that here too,
but pull out the logic for cleaning up a string into its own function
in common and re-use it from those places, and check_cluster_name which
was doing the same thing.

Author: Don Seiler <don@seiler.us>
Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com
2018-09-28 19:04:50 -04:00
Peter Eisentraut 842cb9fa62 Refactor dlopen() support
Nowadays, all platforms except Windows and older HP-UX have standard
dlopen() support.  So having a separate implementation per platform
under src/backend/port/dynloader/ is a bit excessive.  Instead, treat
dlopen() like other library functions that happen to be missing
sometimes and put a replacement implementation under src/port/.

Discussion: https://www.postgresql.org/message-id/flat/e11a49cb-570a-60b7-707d-7084c8de0e61%402ndquadrant.com#54e735ae37476a121abb4e33c2549b03
2018-09-06 11:33:04 +02:00
Alexander Korotkov ec74369931 Implement "pg_ctl logrotate" command
Currently there are two ways to trigger log rotation in logging collector
process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly
to logging collector process.  However, it's nice to have more suitable way
for external tools to do that, which wouldn't require SQL connection or
knowledge of logging collector pid.  This commit implements triggering log
rotation by "pg_ctl logrotate" command.

Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp
Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov
2018-09-01 19:46:49 +03:00
Michael Paquier 55875b6d2a Stop bgworkers during fast shutdown with postmaster in startup phase
When a postmaster gets into its phase PM_STARTUP, it would start
background workers using BgWorkerStart_PostmasterStart mode immediately,
which would cause problems for a fast shutdown as the postmaster forgets
to send SIGTERM to already-started background workers.  With smart and
immediate shutdowns, this correctly happened, and fast shutdown is the
only mode missing the shot.

Author: Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mvnD8+DZUfzpi50DoaDfZRDfd7S=gwj5vU9GYn8UvHkA@mail.gmail.com
Backpatch-through: 9.5
2018-08-29 17:10:02 -07:00
Tom Lane bff84a547d Make syslogger more robust against failures in opening CSV log files.
The previous coding figured it'd be good enough to postpone opening
the first CSV log file until we got a message we needed to write there.
This is unsafe, though, because if the open fails we end up in infinite
recursion trying to report the failure.  Instead make the CSV log file
management code look as nearly as possible like the longstanding logic
for the stderr log file.  In particular, open it immediately at postmaster
startup (if enabled), or when we get a SIGHUP in which we find that
log_destination has been changed to enable CSV logging.

It seems OK to fail if a postmaster-start-time open attempt fails, as
we've long done for the stderr log file.  But we can't die if we fail
to open a CSV log file during SIGHUP, so we're still left with a problem.
In that case, write any output meant for the CSV log file to the stderr
log file.  (This will also cover race-condition cases in which backends
send CSV log data before or after we have the CSV log file open.)

This patch also fixes an ancient oversight that, if CSV logging was
turned off during a SIGHUP, we never actually closed the last CSV
log file.

In passing, remember to reset whereToSendOutput = DestNone during syslogger
start, since (unlike all other postmaster children) it's forked before the
postmaster has done that.  This made for a platform-dependent difference
in error reporting behavior between the syslogger and other children:
except on Windows, it'd report problems to the original postmaster stderr
as well as the normal error log file(s).  It's barely possible that that
was intentional at some point; but it doesn't seem likely to be desirable
in production, and the platform dependency definitely isn't desirable.

Per report from Alexander Kukushkin.  It's been like this for a long time,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/CAFh8B==iLUD_gqC-dAENS0V+kVrCeGiKujtKqSQ7++S-caaChw@mail.gmail.com
2018-08-26 14:21:55 -04:00
Tom Lane cc4f6b7786 Clean up assorted misuses of snprintf()'s result value.
Fix a small number of places that were testing the result of snprintf()
but doing so incorrectly.  The right test for buffer overrun, per C99,
is "result >= bufsize" not "result > bufsize".  Some places were also
checking for failure with "result == -1", but the standard only says
that a negative value is delivered on failure.

(Note that this only makes these places correct if snprintf() delivers
C99-compliant results.  But at least now these places are consistent
with all the other places where we assume that.)

Also, make psql_start_test() and isolation_start_test() check for
buffer overrun while constructing their shell commands.  There seems
like a higher risk of overrun, with more severe consequences, here
than there is for the individual file paths that are made elsewhere
in the same functions, so this seemed like a worthwhile change.

Also fix guc.c's do_serialize() to initialize errno = 0 before
calling vsnprintf.  In principle, this should be unnecessary because
vsnprintf should have set errno if it returns a failure indication ...
but the other two places this coding pattern is cribbed from don't
assume that, so let's be consistent.

These errors are all very old, so back-patch as appropriate.  I think
that only the shell command overrun cases are even theoretically
reachable in practice, but there's not much point in erroneous error
checks.

Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
2018-08-15 16:29:31 -04:00
Michael Paquier 246a6c8f7b Make autovacuum more aggressive to remove orphaned temp tables
Commit dafa084, added in 10, made the removal of temporary orphaned
tables more aggressive.  This commit makes an extra step into the
aggressiveness by adding a flag in each backend's MyProc which tracks
down any temporary namespace currently in use.  The flag is set when the
namespace gets created and can be reset if the temporary namespace has
been created in a transaction or sub-transaction which is aborted.  The
flag value assignment is assumed to be atomic, so this can be done in a
lock-less fashion like other flags already present in PGPROC like
databaseId or backendId, still the fact that the temporary namespace and
table created are still locked until the transaction creating those
commits acts as a barrier for other backends.

This new flag gets used by autovacuum to discard more aggressively
orphaned tables by additionally checking for the database a backend is
connected to as well as its temporary namespace in-use, removing
orphaned temporary relations even if a backend reuses the same slot as
one which created temporary relations in a past session.

The base idea of this patch comes from Robert Haas, has been written in
its first version by Tsunakawa Takayuki, then heavily reviewed by me.

Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05
Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI
breakages on already released versions.
2018-08-13 11:49:04 +02:00
Heikki Linnakangas 8e19a82640 Don't run atexit callbacks in quickdie signal handlers.
exit() is not async-signal safe. Even if the libc implementation is, 3rd
party libraries might have installed unsafe atexit() callbacks. After
receiving SIGQUIT, we really just want to exit as quickly as possible, so
we don't really want to run the atexit() callbacks anyway.

The original report by Jimmy Yih was a self-deadlock in startup_die().
However, this patch doesn't address that scenario; the signal handling
while waiting for the startup packet is more complicated. But at least this
alleviates similar problems in the SIGQUIT handlers, like that reported
by Asim R P later in the same thread.

Backpatch to 9.3 (all supported versions).

Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
2018-08-08 19:10:32 +03:00
Tom Lane 41db97399d Fix incorrect initialization of BackendActivityBuffer.
Since commit c8e8b5a6e, this has been zeroed out using the wrong length.
In practice the length would always be too small, leading to not zeroing
the whole buffer rather than clobbering additional memory; and that's
pretty harmless, both because shmem would likely start out as zeroes
and because we'd reinitialize any given entry before use.  Still,
it's bogus, so fix it.

Reported by Petru-Florin Mihancea (bug #15312)

Discussion: https://postgr.es/m/153363913073.1303.6518849192351268091@wrigleys.postgresql.org
2018-08-07 16:00:44 -04:00
Tom Lane 3cb646264e Use a ResourceOwner to track buffer pins in all cases.
Historically, we've allowed auxiliary processes to take buffer pins without
tracking them in a ResourceOwner.  However, that creates problems for error
recovery.  In particular, we've seen multiple reports of assertion crashes
in the startup process when it gets an error while holding a buffer pin,
as for example if it gets ENOSPC during a write.  In a non-assert build,
the process would simply exit without releasing the pin at all.  We've
gotten away with that so far just because a failure exit of the startup
process translates to a database crash anyhow; but any similar behavior
in other aux processes could result in stuck pins and subsequent problems
in vacuum.

To improve this, institute a policy that we must *always* have a resowner
backing any attempt to pin a buffer, which we can enforce just by removing
the previous special-case code in resowner.c.  Add infrastructure to make
it easy to create a process-lifespan AuxProcessResourceOwner and clear
out its contents at appropriate times.  Replace existing ad-hoc resowner
management in bgwriter.c and other aux processes with that.  (Thus, while
the startup process gains a resowner where it had none at all before, some
other aux process types are replacing an ad-hoc resowner with this code.)
Also use the AuxProcessResourceOwner to manage buffer pins taken during
StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap
process or a standalone backend rather than a true auxiliary process.

In passing, remove some other ad-hoc resource owner creations that had
gotten cargo-culted into various other places.  As far as I can tell
that was all unnecessary, and if it had been necessary it was incomplete,
due to lacking any provision for clearing those resowners later.
(Also worth noting in this connection is that a process that hasn't called
InitBufferPoolBackend has no business accessing buffers; so there's more
to do than just add the resowner if we want to touch buffers in processes
not covered by this patch.)

Although this fixes a very old bug, no back-patch, because there's no
evidence of any significant problem in non-assert builds.

Patch by me, pursuant to a report from Justin Pryzby.  Thanks to
Robert Haas and Kyotaro Horiguchi for reviews.

Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com
2018-07-18 12:15:16 -04:00
Michael Paquier edc6b41bd4 Rename VACOPT_NOWAIT to VACOPT_SKIP_LOCKED
When it comes to SELECT ... FOR or LOCK, NOWAIT means to not wait for
something to happen, and issue an error.  SKIP LOCKED means to not wait
for something to happen but to move on without issuing an error.  The
internal option of autovacuum and autoanalyze mentioned above, used only
when wraparound is not involved was named NOWAIT, but behaves like SKIP
LOCKED which is confusing.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20180307050345.GA3095@paquier.xyz
2018-07-12 14:28:28 +09:00
Michael Paquier c55de5e512 Add wait event for fsync of WAL segments
This has been visibly a forgotten spot in the first implementation of
wait events for I/O added by 249cf07, and what has been missing is a
fsync call for WAL segments which is a wrapper reacting on the value of
GUC wal_sync_method.

Reported-by: Konstantin Knizhnik
Author: Konstantin Knizhnik
Reviewed-by: Craig Ringer, Michael Paquier
Discussion: https://postgr.es/m/4a243897-0ad8-f471-aa40-242591f2476e@postgrespro.ru
2018-07-02 22:19:46 +09:00
Tom Lane a7a7387575 Further improve code for probing the availability of ARM CRC instructions.
Andrew Gierth pointed out that commit 1c72ec6f4 would yield the wrong
answer on big-endian ARM systems, because the data being CRC'd would be
different.  To fix that, and avoid the rather unsightly hard-wired
constant, simply compare the hardware and software implementations'
results.

While we're at it, also log the resulting decision at DEBUG1, and error
out if the hw and sw results unexpectedly differ.  Also, since this
file must compile for both frontend and backend, avoid incorrect
dependencies on backend-only headers.

In passing, add a comment to postmaster.c about when the CRC function
pointer will get initialized.

Thomas Munro, based on complaints from Andrew Gierth and Tom Lane

Discussion: https://postgr.es/m/HE1PR0801MB1323D171938EABC04FFE7FA9E3110@HE1PR0801MB1323.eurprd08.prod.outlook.com
2018-05-03 11:32:57 -04:00
Tom Lane 9cb7db3f0c In AtEOXact_Files, complain if any files remain unclosed at commit.
This change makes this module act more like most of our other low-level
resource management modules.  It's a caller error if something is not
explicitly closed by the end of a successful transaction, so issue
a WARNING about it.  This would not actually have caught the file leak
bug fixed in commit 231bcd080, because that was in a transaction-abort
path; but it still seems like a good, and pretty cheap, cross-check.

Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org
2018-04-28 17:45:02 -04:00
Heikki Linnakangas 811969b218 Allocate enough shared string memory for stats of auxiliary processes.
This fixes a bug whereby the st_appname, st_clienthostname, and
st_activity_raw fields for auxiliary processes point beyond the end of
their respective shared memory segments. As a result, the application_name
of a backend might show up as the client hostname of an auxiliary process.

Backpatch to v10, where this bug was introduced, when the auxiliary
processes were added to the array.

Author: Edmund Horner
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com
2018-04-11 23:39:49 +03:00
Heikki Linnakangas a820b4c329 Make local copy of client hostnames in backend status array.
The other strings, application_name and query string, were snapshotted to
local memory in pgstat_read_current_status(), but we forgot to do that for
client hostnames. As a result, the client hostname would appear to change in
the local copy, if the client disconnected.

Backpatch to all supported versions.

Author: Edmund Horner
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com
2018-04-11 23:39:48 +03:00
Magnus Hagander a228cc13ae Revert "Allow on-line enabling and disabling of data checksums"
This reverts the backend sides of commit 1fde38beaa.
I have, at least for now, left the pg_verify_checksums tool in place, as
this tool can be very valuable without the rest of the patch as well,
and since it's a read-only tool that only runs when the cluster is down
it should be a lot safer.
2018-04-09 19:03:42 +02:00
Stephen Frost 2b74022473 Fix EXEC BACKEND + Windows builds for group privs
Under EXEC BACKEND we also need to be going through the group privileges
setup since we do support that on Unixy systems, so add that to
SubPostmasterMain().

Under Windows, we need to simply return true from
GetDataDirectoryCreatePerm(), but that wasn't happening due to a missing
 #else clause.

Per buildfarm.
2018-04-07 19:01:43 -04:00
Stephen Frost c37b3d08ca Allow group access on PGDATA
Allow the cluster to be optionally init'd with read access for the
group.

This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.

The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates.  This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.

Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein.  If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required.  Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.

Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.

Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Stephen Frost da9b580d89 Refactor dir/file permissions
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).

Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).

Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.

Authors: David Steele <david@pgmasters.net>,
         Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Magnus Hagander 1fde38beaa Allow on-line enabling and disabling of data checksums
This makes it possible to turn checksums on in a live cluster, without
the previous need for dump/reload or logical replication (and to turn it
off).

Enabling checkusm starts a background process in the form of a
launcher/worker combination that goes through the entire database and
recalculates checksums on each and every page. Only when all pages have
been checksummed are they fully enabled in the cluster. Any failure of
the process will revert to checksums off and the process has to be
started.

This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.

Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
2018-04-05 22:04:48 +02:00
Magnus Hagander eed1ce72e1 Allow background workers to bypass datallowconn
THis adds a "flags" field to the BackgroundWorkerInitializeConnection()
and BackgroundWorkerInitializeConnectionByOid(). For now only one flag,
BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore
datallowconn.
2018-04-05 19:02:45 +02:00
Alvaro Herrera 484a4a08ab Log when a BRIN autosummarization request fails
Autovacuum's 'workitem' request queue is of limited size, so requests
can fail if they arrive more quickly than autovacuum can process them.
Emit a log message when this happens, to provide better visibility of
this.

Backpatch to 10.  While this represents an API change for
AutoVacuumRequestWork, that function is not yet prepared to deal with
external modules calling it, so there doesn't seem to be any risk (other
than log spam, that is.)

Author: Masahiko Sawada
Reviewed-by: Fabrízio Mello, Ildar Musin, Álvaro Herrera
Discussion: https://postgr.es/m/CAD21AoB1HrQhp6_4rTyHN5kWEJCEsG8YzsjZNt-ctoXSn5Uisw@mail.gmail.com
2018-03-14 11:59:40 -03:00
Tom Lane 38f7831d70 Avoid holding AutovacuumScheduleLock while rechecking table statistics.
In databases with many tables, re-fetching the statistics takes some time,
so that this behavior seriously decreases the available concurrency for
multiple autovac workers.  There's discussion afoot about more complete
fixes, but a simple and back-patchable amelioration is to claim the table
and release the lock before rechecking stats.  If we find out there's no
longer a reason to process the table, re-taking the lock to un-claim the
table is cheap enough.

(This patch is quite old, but got lost amongst a discussion of more
aggressive fixes.  It's not clear when or if such a fix will be
accepted, but in any case it'd be unlikely to get back-patched.
Let's do this now so we have some improvement for the back branches.)

In passing, make the normal un-claim step take AutovacuumScheduleLock
not AutovacuumLock, since that is what is documented to protect the
wi_tableoid field.  This wasn't an actual bug in view of the fact that
readers of that field hold both locks, but it creates some concurrency
penalty against operations that need only AutovacuumLock.

Back-patch to all supported versions.

Jeff Janes

Discussion: https://postgr.es/m/26118.1520865816@sss.pgh.pa.us
2018-03-13 12:28:35 -04:00
Tom Lane 4e0c743c18 Fix cross-checking of ReservedBackends/max_wal_senders/MaxConnections.
We were independently checking ReservedBackends < MaxConnections and
max_wal_senders < MaxConnections, but because walsenders aren't allowed
to use superuser-reserved connections, that's really the wrong thing.
Correct behavior is to insist on ReservedBackends + max_wal_senders being
less than MaxConnections.  Fix the code and associated documentation.

This has been wrong for a long time, but since the situation probably
hardly ever arises in the field (especially pre-v10, when the default
for max_wal_senders was zero), no back-patch.

Discussion: https://postgr.es/m/28271.1520195491@sss.pgh.pa.us
2018-03-08 11:25:26 -05:00
Noah Misch 582edc369c Empty search_path in Autovacuum and non-psql/pgbench clients.
This makes the client programs behave as documented regardless of the
connect-time search_path and regardless of user-created objects.  Today,
a malicious user with CREATE permission on a search_path schema can take
control of certain of these clients' queries and invoke arbitrary SQL
functions under the client identity, often a superuser.  This is
exploitable in the default configuration, where all users have CREATE
privilege on schema "public".

This changes behavior of user-defined code stored in the database, like
pg_index.indexprs and pg_extension_config_dump().  If they reach code
bearing unqualified names, "does not exist" or "no schema has been
selected to create in" errors might appear.  Users may fix such errors
by schema-qualifying affected names.  After upgrading, consider watching
server logs for these errors.

The --table arguments of src/bin/scripts clients have been lax; for
example, "vacuumdb -Zt pg_am\;CHECKPOINT" performed a checkpoint.  That
now fails, but for now, "vacuumdb -Zt 'pg_am(amname);CHECKPOINT'" still
performs a checkpoint.

Back-patch to 9.3 (all supported versions).

Reviewed by Tom Lane, though this fix strategy was not his first choice.
Reported by Arseniy Sharoglazov.

Security: CVE-2018-1058
2018-02-26 07:39:44 -08:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Peter Eisentraut c1869542b3 Use abstracted SSL API in server connection log messages
The existing "connection authorized" server log messages used OpenSSL
API calls directly, even though similar abstracted API calls exist.
Change to use the latter instead.

Change the function prototype for the functions that return the TLS
version and the cipher to return const char * directly instead of
copying into a buffer.  That makes them slightly easier to use.

Add bits= to the message.  psql shows that, so we might as well show the
same information on the client and server.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-26 09:50:46 -05:00
Alvaro Herrera 95be5ce1bc Remove unnecessary include
autovacuum.c no longer needs dsa.h, since commit 31ae1638ce.
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCWvYyXrvdANSHWWWEWJH5TeAWAkJ_2gqrHhukG+OBo1g@mail.gmail.com
2018-01-23 15:22:13 -03:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Andres Freund 1804284042 Add parallel-aware hash joins.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash.  While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory.  If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables.  This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes.  Another is that
outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-21 00:43:41 -08:00
Robert Haas 28724fd90d Report failure to start a background worker.
When a worker is flagged as BGW_NEVER_RESTART and we fail to start it,
or if it is not marked BGW_NEVER_RESTART but is terminated before
startup succeeds, what BgwHandleStatus should be reported?  The
previous code really hadn't considered this possibility (as indicated
by the comments which ignore it completely) and would typically return
BGWH_NOT_YET_STARTED, but that's not a good answer, because then
there's no way for code using GetBackgroundWorkerPid() to tell the
difference between a worker that has not started but will start
later and a worker that has not started and will never be started.
So, when this case happens, return BGWH_STOPPED instead.  Update the
comments to reflect this.

The preceding fix by itself is insufficient to fix the problem,
because the old code also didn't send a notification to the process
identified in bgw_notify_pid when startup failed.  That might've
been technically correct under the theory that the status of the
worker was BGWH_NOT_YET_STARTED, because the status would indeed not
change when the worker failed to start, but now that we're more
usefully reporting BGWH_STOPPED, a notification is needed.

Without these fixes, code which starts background workers and then
uses the recommended APIs to wait for those background workers to
start would hang indefinitely if the postmaster failed to fork a
worker.

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com
2017-12-06 08:58:27 -05:00
Tom Lane 2069e6faa0 Clean up assorted messiness around AllocateDir() usage.
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures.  It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode.  And it eliminates a lot of just plain redundant error-handling
code.

In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern

	dir = AllocateDir(path);
	while ((de = ReadDirExtended(dir, path, LOG)) != NULL)

if they'd like to treat directory-open failures as mere LOG conditions
rather than errors.  Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.

Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine.  Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above.  (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.)  In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.

Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported.  There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.

Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming.  (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire.  If so, this function was broken for its
own purposes as well as breaking callers.)

And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.

Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless.  Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes.  The rest of this is
essentially cosmetic and need not get back-patched.

Michael Paquier, with a bit of additional work by me

Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
2017-12-04 17:02:56 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Robert Haas ae65f6066d Provide for forward compatibility with future minor protocol versions.
Previously, any attempt to request a 3.x protocol version other than
3.0 would lead to a hard connection failure, which made the minor
protocol version really no different from the major protocol version
and precluded gentle protocol version breaks.  Instead, when the
client requests a 3.x protocol version where x is greater than 0, send
the new NegotiateProtocolVersion message to convey that we support
only 3.0.  This makes it possible to introduce new minor protocol
versions without requiring a connection retry when the server is
older.

In addition, if the startup packet includes name/value pairs where
the name starts with "_pq_.", assume that those are protocol options,
not GUCs.  Include those we don't support (i.e. all of them, at
present) in the NegotiateProtocolVersion message so that the client
knows they were not understood.  This makes it possible for the
client to request previously-unsupported features without bumping
the protocol version at all; the client can tell from the server's
response whether the option was understood.

It will take some time before servers that support these new
facilities become common in the wild; to speed things up and make
things easier for a future 3.1 protocol version, back-patch to all
supported releases.

Robert Haas and Badrul Chowdhury

Discussion: http://postgr.es/m/BN6PR21MB0772FFA0CBD298B76017744CD1730@BN6PR21MB0772.namprd21.prod.outlook.com
Discussion: http://postgr.es/m/30788.1498672033@sss.pgh.pa.us
2017-11-21 13:56:24 -05:00
Peter Eisentraut 0e1539ba0d Add some const decorations to prototypes
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-11-10 13:38:57 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Alvaro Herrera be72b9c378 Fix autovacuum work item error handling
In autovacuum's "work item" processing, a few strings were allocated in
the current transaction's memory context, which goes away during error
handling; if an error happened during execution of the work item, the
pfree() calls to clean up afterwards would try to release already-released
memory, possibly leading to a crash.  In branch master, this was already
fixed by commit 335f3d04e4, so backpatch that to REL_10_STABLE to fix
the problem there too.

As a secondary problem, verify that the autovacuum worker is connected
to the right database for each work item; otherwise some items would be
discarded by workers in other databases.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20171014035732.GB31726@telsasoft.com
2017-10-30 15:52:02 +01:00
Tom Lane 11d8d72c27 Allow multiple tables to be specified in one VACUUM or ANALYZE command.
Not much to say about this; does what it says on the tin.

However, formerly, if there was a column list then the ANALYZE action was
implied; now it must be specified, or you get an error.  This is because
it would otherwise be a bit unclear what the user meant if some tables
have column lists and some don't.

Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some
editorialization by me

Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com
2017-10-03 18:53:44 -04:00
Andres Freund 0ba99c84e8 Replace most usages of ntoh[ls] and hton[sl] with pg_bswap.h.
All postgres internal usages are replaced, it's just libpq example
usages that haven't been converted. External users of libpq can't
generally rely on including postgres internal headers.

Note that this includes replacing open-coded byte swapping of 64bit
integers (using two 32 bit swaps) with a single 64bit swap.

Where it looked applicable, I have removed netinet/in.h and
arpa/inet.h usage, which previously provided the relevant
functionality. It's perfectly possible that I missed other reasons for
including those, the buildfarm will tell.

Author: Andres Freund
Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de
2017-10-01 15:36:14 -07:00
Peter Eisentraut 5373bc2a08 Add background worker type
Add bgw_type field to background worker structure.  It is intended to be
set to the same value for all workers of the same type, so they can be
grouped in pg_stat_activity, for example.

The backend_type column in pg_stat_activity now shows bgw_type for a
background worker.  The ps listing also no longer calls out that a
process is a background worker but just show the bgw_type.  That way,
being a background worker is more of an implementation detail now that
is not shown to the user.  However, most log messages still refer to
'background worker "%s"'; otherwise constructing sensible and
translatable log messages would become tricky.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2017-09-29 11:08:24 -04:00
Tom Lane 335f3d04e4 Improve memory management in autovacuum.c.
Invoke vacuum(), as well as "work item" processing, in the PortalContext
that do_autovacuum() has manufactured, which will be reset before each
such invocation.  This ensures cleanup of any memory leaked by these
operations.  It also avoids the rather dangerous practice of calling
vacuum() in a context that vacuum() itself will destroy while it runs.
There's no known live bug there, but it's not hard to imagine introducing
one if we leave it like this.

Tom Lane, reviewed by Michael Paquier and Alvaro Herrera

Discussion: https://postgr.es/m/13849.1506114543@sss.pgh.pa.us
2017-09-23 13:28:16 -04:00
Peter Eisentraut be87b70b61 Sync process names between ps and pg_stat_activity
Remove gratuitous differences in the process names shown in
pg_stat_activity.backend_type and the ps output.

Reviewed-by: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>
2017-09-20 08:59:03 -04:00
Andres Freund fc49e24fa6 Make WAL segment size configurable at initdb time.
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.

But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.

This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured.  For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.

Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
    Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-19 22:03:48 -07:00
Andres Freund 896537f078 s/NULL byte/NUL byte/ in comment refering to C string terminator.
Reported-By: Robert Haas
Discussion: https://postgr.es/m/CA+Tgmoa+YBvWgFST2NVoeXjVSohEpK=vqnVCsoCkhTVVxfLcVQ@mail.gmail.com
2017-09-19 16:41:07 -07:00
Andres Freund 71edbb6f66 Avoid use of non-portable strnlen() in pgstat_clip_activity().
The use of strnlen rather than strlen was just paranoia. Instead of
giving up on the paranoia, just implement the safeguard
differently. And add a comment explaining why we're careful.

Author: Andres Freund
Discussion: https://postgr.es/m/E1duOkJ-0001Mc-U5@gemulon.postgresql.org
2017-09-19 14:25:47 -07:00
Andres Freund 54b6cd589a Speedup pgstat_report_activity by moving mb-aware truncation to read side.
Previously multi-byte aware truncation was done on every
pgstat_report_activity() call - proving to be a bottleneck for
workloads with long query strings that execute quickly.

Instead move the truncation to the read side, which commonly is
executed far less frequently. That's possible because all server
encodings allow to determine the length of a multi-byte string from
the first byte.

Rename PgBackendStatus.st_activity to st_activity_raw so existing
extension users of the field break - their code has to be adjusted to
use pgstat_clip_activity().

Author: Andres Freund
Tested-By: Khuntal Ghosh
Reviewed-By: Robert Haas, Tom Lane
Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkviecpz@alap3.anarazel.de
2017-09-19 12:51:14 -07:00
Andres Freund ec9e05b3c3 Fix crash restart bug introduced in 8356753c21.
The bug was caused by not re-reading the control file during crash
recovery restarts, which lead to an attempt to pfree() shared memory
contents. The fix is to re-read the control file, which seems good
anyway.

It's unclear as of this moment, whether we want to keep the
refactoring introduced in the commit referenced above, or come up with
an alternative approach. But fixing the bug in the mean time seems
like a good idea regardless.

A followup commit will introduce regression test coverage for crash
restarts.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/14134.1505572349@sss.pgh.pa.us
2017-09-18 17:25:49 -07:00
Andres Freund 8356753c21 Perform only one ReadControlFile() during startup.
Previously we read the control file in multiple places. But soon the
segment size will be configurable and stored in the control file, and
that needs to be available earlier than it currently is needed.

Instead of adding yet another place where it's read, refactor things
so there's a single processing of the control file during startup (in
EXEC_BACKEND that's every individual backend's startup).

Author: Andres Freund
Discussion: http://postgr.es/m/20170913092828.aozd3gvvmw67gmyc@alap3.anarazel.de
2017-09-14 14:14:34 -07:00
Robert Haas baaf272ac9 Use group updates when setting transaction status in clog.
Commit 0e141c0fbb introduced a mechanism
to reduce contention on ProcArrayLock by having a single process clear
XIDs in the procArray on behalf of multiple processes, reducing the
need to hand the lock around.  A previous attempt to introduce a similar
mechanism for CLogControlLock in ccce90b398
crashed and burned, but the design problem which resulted in those
failures is believed to have been corrected in this version.

Amit Kapila, with some cosmetic changes by me.  See the previous commit
message for additional credits.

Discussion: http://postgr.es/m/CAA4eK1KudxzgWhuywY_X=yeSAhJMT4DwCjroV5Ay60xaeB2Eew@mail.gmail.com
2017-09-01 11:45:40 -04:00
Alvaro Herrera 31ae1638ce Simplify autovacuum work-item implementation
The initial implementation of autovacuum work-items used a dynamic
shared memory area (DSA).  However, it's argued that dynamic shared
memory is not portable enough, so we cannot rely on it being supported
everywhere; at the same time, autovacuum work-items are now a critical
part of the server, so it's not acceptable that they don't work in the
cases where dynamic shared memory is disabled.  Therefore, let's fall
back to a simpler implementation of work-items that just uses
autovacuum's main shared memory segment for storage.

Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
2017-08-15 18:14:07 -03:00
Alvaro Herrera d9a622cee1 Fix error handling path in autovacuum launcher
The original code (since 00e6a16d01) was assuming aborting the
transaction in autovacuum launcher was sufficient to release all
resources, but in reality the launcher runs quite a lot of code out of
any transactions.  Re-introduce individual cleanup calls to make abort
more robust.

Reported-by: Robert Haas
Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
2017-08-15 13:35:12 -03:00
Alvaro Herrera b2c95a3798 Fix replication origin-related race conditions
Similar to what was fixed in commit 9915de6c1c for replication slots,
but this time it's related to replication origins: DROP SUBSCRIPTION
attempts to drop the replication origin, but that fails if the
replication worker process hasn't yet marked it unused.  This causes
failures in the buildfarm:
ERROR:  could not drop replication origin with OID 1, in use by PID 34069

Like the aforementioned commit, fix by having the process running DROP
SUBSCRIPTION sleep until the worker marks the the replication origin
struct as free.  This uses a condition variable on each replication
origin shmem state struct, so that the session trying to drop can sleep
and expect to be awakened by the process keeping the origin open.

Also fix a SGML markup in the previous commit.

Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql
2017-08-08 16:07:46 -04:00
Alvaro Herrera 030273b7ea Fix inadequacies in recently added wait events
In commit 9915de6c1c, we introduced a new wait point for replication
slots and incorrectly labelled it as wait event PG_WAIT_LOCK.  That's
wrong, so invent an appropriate new wait event instead, and document it
properly.

While at it, fix numerous other problems in the vicinity:
- two different walreceiver wait events were being mixed up in a single
  wait event (which wasn't documented either); split it out so that they
  can be distinguished, and document the new events properly.

- ParallelBitmapPopulate was documented but didn't exist.

- ParallelBitmapScan was not documented (I think this should be called
  "ParallelBitmapScanInit" instead.)

- Logical replication wait events weren't documented

- various symbols had been added in dartboard order in various places.
  Put them in alphabetical order instead, as was originally intended.

Discussion: https://postgr.es/m/20170808181131.mu4fjepuh5m75cyq@alvherre.pgsql
2017-08-08 15:37:44 -04:00
Tom Lane 45e004fb4e On Windows, retry process creation if we fail to reserve shared memory.
We've heard occasional reports of backend launch failing because
pgwin32_ReserveSharedMemoryRegion() fails, indicating that something
has already used that address space in the child process.  It's not
very clear what, given that we disable ASLR in Windows builds, but
suspicion falls on antivirus products.  It'd be better if we didn't
have to disable ASLR, anyway.  So let's try to ameliorate the problem
by retrying the process launch after such a failure, up to 100 times.

Patch by me, based on previous work by Amit Kapila and others.
This is a longstanding issue, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CAA4eK1+R6hSx6t_yvwtx+NRzneVp+MRqXAdGJZChcau8Uij-8g@mail.gmail.com
2017-07-10 11:00:09 -04:00
Tom Lane f13ea95f9e Change pg_ctl to detect server-ready by watching status in postmaster.pid.
Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log.  We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.

In commit c61559ec3, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse.  In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read.  This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683f).  pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.

In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order.  This is needed
on Windows where the SHMEM_KEY line will never be written at all.  We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.

Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway.  (Yes, I found that
out the hard way.)

While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious.  This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else.  For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec().  (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)

In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.

Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
2017-06-28 17:31:32 -04:00
Tom Lane e5d494d78c Don't lose walreceiver start requests due to race condition in postmaster.
When a walreceiver dies, the startup process will notice that and send
a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new
walreceiver to be launched.  There's a race condition, which at least
in HEAD is very easy to hit, whereby the postmaster might see that
signal before it processes the SIGCHLD from the walreceiver process.
In that situation, sigusr1_handler() just dropped the start request
on the floor, reasoning that it must be redundant.  Eventually, after
10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a
fresh request --- but that's a long time if the connection could have
been re-established almost immediately.

Fix it by setting a state flag inside the postmaster that we won't
clear until we do launch a walreceiver.  In cases where that results
in an extra walreceiver launch, it's up to the walreceiver to realize
it's unwanted and go away --- but we have, and need, that logic anyway
for the opposite race case.

I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there in test cases where
a master server is stopped and restarted while leaving streaming
slaves active.

This logic has been broken all along, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us
2017-06-26 17:31:56 -04:00
Tom Lane ad1b5c842b Ignore old stats file timestamps when starting the stats collector.
The stats collector disregards inquiry messages that bear a cutoff_time
before when it last wrote the relevant stats file.  That's fine, but at
startup when it reads the "permanent" stats files, it absorbed their
timestamps as if they were the times at which the corresponding temporary
stats files had been written.  In reality, of course, there's no data
out there at all.  This led to disregarding inquiry messages soon after
startup if the postmaster had been shut down and restarted within less
than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for
testing and in the field.  Requesting backends would hang for 10 seconds
and then report failure to read statistics, unless they got bailed out
by some other backend coming along and making a newer request within
that interval.

I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there because the autovacuum
launcher hangs for 10 seconds when it can't get statistics at startup,
thus preventing a second shutdown from occurring promptly.  We might
want to do some things in the autovac code to make it less prone to
getting stuck that way, but this change is a good bug fix regardless.

In passing, also fix pgstat_read_statsfiles() to ensure that it
re-zeroes its global stats variables if they are corrupted by a
short read from the stats file.  (Other reads in that function
go into temp variables, so that the issue doesn't arise.)

This has been broken since we created the separation between permanent
and temporary stats files in 8.4, so back-patch to all supported branches.

Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
2017-06-26 16:17:05 -04:00
Alvaro Herrera a4f06606a3 Fix autovacuum launcher attachment to its DSA
The autovacuum launcher doesn't actually do anything with its DSA other
than creating it and attaching to it, but it's been observed that after
longjmp'ing to the standard error handling block (for example after
getting SIGINT) the autovacuum enters an infinite loop reporting that it
cannot attach to its DSA anymore (which is correct, because it's already
attached to it.)  Fix by only attempting to attach if not already
attached.

I introduced this bug together with BRIN autosummarization in
7526e10224.

Reported-by: Yugo Nagata.
Author: Thomas Munro.  I added the comment to go with it.
Discussion: https://postgr.es/m/20170621211538.0c9eae73.nagata@sraoss.co.jp
2017-06-22 13:50:26 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Andres Freund 9206ced1dc Clean up latch related code.
The larger part of this patch replaces usages of MyProc->procLatch
with MyLatch.  The latter works even early during backend startup,
where MyProc->procLatch doesn't yet.  While the affected code
shouldn't run in cases where it's not initialized, it might get copied
into places where it might.  Using MyLatch is simpler and a bit faster
to boot, so there's little point to stick with the previous coding.

While doing so I noticed some weaknesses around newly introduced uses
of latches that could lead to missed events, and an omitted
CHECK_FOR_INTERRUPTS() call in worker_spi.

As all the actual bugs are in v10 code, there doesn't seem to be
sufficient reason to backpatch this.

Author: Andres Freund
Discussion:
    https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de
    https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de
Backpatch: -
2017-06-06 16:13:00 -07:00
Andres Freund 703f148e98 Revert "Prevent panic during shutdown checkpoint"
This reverts commit 086221cf6b, which
was made to master only.

The approach implemented in the above commit has some issues.  While
those could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.

Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
2017-06-05 19:18:15 -07:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane 8b0b6303e9 Try to ensure that stats collector's receive buffer size is at least 100KB.
Since commit 4e37b3e15, buildfarm member frogmouth has been failing
occasionally with symptoms indicating that some expected stats data is
getting dropped.  The reason that that commit changed the behavior seems
probably to be that more data is getting shoved at the collector in a short
span of time.  In current sources, the stats test's first session sends
about 9KB of data while exiting, which is probably the same as what was
sent just before wait_for_stats() in the previous test design.  But now,
the test's second session is starting up concurrently, and it sends another
2KB (presumably reflecting its initial catalog accesses).  Since frogmouth
is running on Windows XP, which reputedly has a default socket receive
buffer size of only 8KB, it is not very surprising if this has put us over
the threshold where the receive buffer can overflow and drop messages.

The same mechanism could very easily explain the intermittent stats test
failures we've been seeing for years, since background processes such
as the bgwriter will sometimes send data concurrently with all this, and
could thus cause occasional buffer overflows.

Hence, insert some code into pgstat_init() to increase the stats socket's
receive buffer size to 100KB if it's less than that.  (On failure, emit a
LOG message, but keep going.)  Modern systems seem to have default sizes
in the range of 100KB-250KB, but older platforms don't.  I couldn't find
any platforms that wouldn't accept 100KB, so in theory this won't cause
any portability problems.

If this is successful at reducing the buildfarm failure rate in HEAD,
we should back-patch it, because it's certain that similar buffer overflows
happen in the field on platforms with small buffer sizes.  Going forward,
there might be an argument for trying to increase the buffer size even
more, but let's take a baby step first.

Discussion: https://postgr.es/m/22173.1494788088@sss.pgh.pa.us
2017-05-16 15:24:52 -04:00
Tom Lane 5d00b764cd Make pgstat tabstat lookup hash table less fragile.
Code review for commit 090010f2e.

Fix cases where an elog(ERROR) partway through a function would leave the
persistent data structures in a corrupt state.  pgstat_report_stat got this
wrong by invalidating PgStat_TableEntry structs before removing hashtable
entries pointing to them, and get_tabstat_entry got it wrong by ignoring
the possibility of palloc failure after it had already created a hashtable
entry.

Also, avoid leaking a memory context per transaction, which the previous
code did through misunderstanding hash_create's API.  We do not need to
create a context to hold the hash table; hash_create will do that.
(The leak wasn't that large, amounting to only a memory context header
per iteration, but it's still surprising that nobody noticed it yet.)
2017-05-14 22:52:49 -04:00
Peter Eisentraut c1a7f64b4a Replace "transaction log" with "write-ahead log"
This makes documentation and error messages match the renaming of "xlog"
to "wal" in APIs and file naming.
2017-05-12 11:52:43 -04:00
Andres Freund e6c44eef55 Fix off-by-one possibly leading to skipped XLOG_RUNNING_XACTS records.
Since 6ef2eba3f5 ("Skip checkpoints, archiving on idle systems."),
GetLastImportantRecPtr() is used to avoid performing superfluous
checkpoints, xlog switches, running-xact records when the system is
idle.  Unfortunately the check concerning running-xact records had a
off-by-one error, leading to such records being potentially skipped
when only a single record has been inserted since the last
running-xact record.

An alternative approach would have been to change
GetLastImportantRecPtr()'s definition to point to the end of records,
but that would make the checkpoint code more complicated.

Author: Andres Freund
Discussion: https://postgr.es/m/20170505012447.wsrympaxnfis6ojt@alap3.anarazel.de
Backpatch: no, code only present in master
2017-05-06 16:55:07 -07:00
Peter Eisentraut 086221cf6b Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so.  At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT.  So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.

To fix this, divide the walsender shutdown into two phases.  First, the
postmaster sends a SIGUSR2 signal to all walsenders.  The walsenders
then put themselves into the "stopping" state.  In this state, they
reject any new commands.  (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.)  The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders.  This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-05-05 10:31:42 -04:00
Tom Lane aa1351f1ee Allow multiple bgworkers to be launched per postmaster iteration.
Previously, maybe_start_bgworker() would launch at most one bgworker
process per call, on the grounds that the postmaster might otherwise
neglect its other duties for too long.  However, that seems overly
conservative, especially since bad effects only become obvious when
many hundreds of bgworkers need to be launched at once.  On the other
side of the coin is that the existing logic could result in substantial
delay of bgworker launches, because ServerLoop isn't guaranteed to
iterate immediately after a signal arrives.  (My attempt to fix that
by using pselect(2) encountered too many portability question marks,
and in any case could not help on platforms without pselect().)
One could also question the wisdom of using an O(N^2) processing
method if the system is intended to support so many bgworkers.

As a compromise, allow that function to launch up to 100 bgworkers
per call (and in consequence, rename it to maybe_start_bgworkers).
This will allow any normal parallel-query request for workers
to be satisfied immediately during sigusr1_handler, avoiding the
question of whether ServerLoop will be able to launch more promptly.

There is talk of rewriting the postmaster to use a WaitEventSet to
avoid the signal-response-delay problem, but I'd argue that this change
should be kept even after that happens (if it ever does).

Backpatch to 9.6 where parallel query was added.  The issue exists
before that, but previous uses of bgworkers typically aren't as
sensitive to how quickly they get launched.

Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
2017-04-26 16:17:34 -04:00
Tom Lane 64925603c9 Revert "Use pselect(2) not select(2), if available, to wait in postmaster's loop."
This reverts commit 81069a9efc.

Buildfarm results suggest that some platforms have versions of pselect(2)
that are not merely non-atomic, but flat out non-functional.  Revert the
use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART
patch as the source of trouble).  If it's so, we should probably look into
blacklisting specific platforms that have broken pselect.

Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
2017-04-24 18:29:03 -04:00
Tom Lane 81069a9efc Use pselect(2) not select(2), if available, to wait in postmaster's loop.
Traditionally we've unblocked signals, called select(2), and then blocked
signals again.  The code expects that the select() will be cancelled with
EINTR if an interrupt occurs; but there's a race condition, which is that
an already-pending signal will be delivered as soon as we unblock, and then
when we reach select() there will be nothing preventing it from waiting.
This can result in a long delay before we perform any action that
ServerLoop was supposed to have taken in response to the signal.  As with
the somewhat-similar symptoms fixed by commit 893902085, the main practical
problem is slow launching of parallel workers.  The window for trouble is
usually pretty short, corresponding to one iteration of ServerLoop; but
it's not negligible.

To fix, use pselect(2) in place of select(2) where available, as that's
designed to solve exactly this problem.  Where not available, we continue
to use the old way, and are no worse off than before.

pselect(2) has been required by POSIX since about 2001, so most modern
platforms should have it.  A bigger portability issue is that some
implementations are said to be non-atomic, ie pselect() isn't really
any different from unblock/select/reblock.  Still, we're no worse off
than before on such a platform.

There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point this
could be reverted, since we'd be using a self-pipe to solve the race
condition.  But that's not happening before v11 at the earliest.

Back-patch to 9.6.  The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.

Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
2017-04-24 14:03:14 -04:00
Tom Lane 8939020853 Run the postmaster's signal handlers without SA_RESTART.
The postmaster keeps signals blocked everywhere except while waiting
for something to happen in ServerLoop().  The code expects that the
select(2) will be cancelled with EINTR if an interrupt occurs; without
that, followup actions that should be performed by ServerLoop() itself
will be delayed.  However, some platforms interpret the SA_RESTART
signal flag as meaning that they should restart rather than cancel
the select(2).  Worse yet, some of them restart it with the original
timeout delay, meaning that a steady stream of signal interrupts can
prevent ServerLoop() from iterating at all if there are no incoming
connection requests.

Observable symptoms of this, on an affected platform such as HPUX 10,
include extremely slow parallel query startup (possibly as much as
30 seconds) and failure to update timestamps on the postmaster's sockets
and lockfiles when no new connections arrive for a long time.

We can fix this by running the postmaster's signal handlers without
SA_RESTART.  That would be quite a scary change if the range of code
where signals are accepted weren't so tiny, but as it is, it seems
safe enough.  (Note that postmaster children do, and must, reset all
the handlers before unblocking signals; so this change should not
affect any child process.)

There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point it might
be appropriate to revert this patch.  But that's not happening before
v11 at the earliest.

Back-patch to 9.6.  The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.

Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
2017-04-24 13:00:30 -04:00
Tom Lane 4fe04244b5 Fix postmaster's handling of fork failure for a bgworker process.
This corner case didn't behave nicely at all: the postmaster would
(partially) update its state as though the process had started
successfully, and be quite confused thereafter.  Fix it to act
like the worker had crashed, instead.

In passing, refactor so that do_start_bgworker contains all the
state-change logic for bgworker launch, rather than just some of it.

Back-patch as far as 9.4.  9.3 contains similar logic, but it's just
enough different that I don't feel comfortable applying the patch
without more study; and the use of bgworkers in 9.3 was so small
that it doesn't seem worth the extra work.

transam/parallel.c is still entirely unprepared for the possibility
of bgworker startup failure, but that seems like material for a
separate patch.

Discussion: https://postgr.es/m/4905.1492813727@sss.pgh.pa.us
2017-04-24 12:16:58 -04:00
Tom Lane 3e51725b38 Avoid depending on non-POSIX behavior of fcntl(2).
The POSIX standard does not say that the success return value for
fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1.
We had several calls that were making the stronger assumption.  Adjust
them to test specifically for -1 for strict spec compliance.

The standard further leaves open the possibility that the O_NONBLOCK
flag bit is not the only active one in F_SETFL's argument.  Formally,
therefore, one ought to get the current flags with F_GETFL and store
them back with only the O_NONBLOCK bit changed when trying to change
the nonblock state.  In port/noblock.c, we were doing the full pushup
in pg_set_block but not in pg_set_noblock, which is just weird.  Make
both of them do it properly, since they have little business making
any assumptions about the socket they're handed.  The other places
where we're issuing F_SETFL are working with FDs we just got from
pipe(2), so it's reasonable to assume the FDs' properties are all
default, so I didn't bother adding F_GETFL steps there.

Also, while pg_set_block deserves some points for trying to do things
right, somebody had decided that it'd be even better to cast fcntl's
third argument to "long".  Which is completely loony, because POSIX
clearly says the third argument for an F_SETFL call is "int".

Given the lack of field complaints, these missteps apparently are not
of significance on any common platforms.  But they're still wrong,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/30882.1492800880@sss.pgh.pa.us
2017-04-21 15:56:16 -04:00
Peter Eisentraut 6275f5d28a Fix new warnings from GCC 7
This addresses the new warning types -Wformat-truncation
-Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
2017-04-17 13:59:46 -04:00
Tom Lane 32470825d3 Avoid passing function pointers across process boundaries.
We'd already recognized that we can't pass function pointers across process
boundaries for functions in loadable modules, since a shared library could
get loaded at different addresses in different processes.  But actually the
practice doesn't work for functions in the core backend either, if we're
using EXEC_BACKEND.  This is the cause of recent failures on buildfarm
member culicidae.  Switch to passing a string function name in all cases.

Something like this needs to be back-patched into 9.6, but let's see
if the buildfarm likes it first.

Petr Jelinek, with a bunch of basically-cosmetic adjustments by me

Discussion: https://postgr.es/m/548f9c1d-eafa-e3fa-9da8-f0cc2f654e60@2ndquadrant.com
2017-04-14 23:50:16 -04:00
Peter Eisentraut 139eb9673c Report statistics in logical replication workers
Author: Stas Kelvich <s.kelvich@postgrespro.ru>
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-04-14 14:37:06 -04:00
Robert Haas 6599c9ac33 Add an Assert() to max_parallel_workers enforcement.
To prevent future bugs along the lines of the one corrected by commit
8ff518699f, or find any that remain
in the current code, add an Assert() that the difference between
parallel_register_count and parallel_terminate_count is in a sane
range.

Kuntal Ghosh, with considerable tidying-up by me, per a suggestion
from Neha Khatri.  Reviewed by Tomas Vondra.

Discussion: http://postgr.es/m/CAFO0U+-E8yzchwVnvn5BeRDPgX2z9vZUxQ8dxx9c0XFGBC7N1Q@mail.gmail.com
2017-04-11 13:03:44 -04:00
Robert Haas 8ff518699f Fix confusion of max_parallel_workers mechanism following crash.
Commit b460f5d669 failed to contemplate
the possibilit that a parallel worker registered before a crash would
be unregistered only after the crash; if that happened, we'd end up
with parallel_terminate_count > parallel_register_count and the
system would refuse to launch any more parallel workers.

The easiest way to fix that seems to be to forget BGW_NEVER_RESTART
workers in ResetBackgroundWorkerCrashTimes() rather than leaving them
around to be cleaned up after the conclusion of the restart, so that
they go away before rather than after shared memory is reset.

To make sure that this fix is water-tight, don't allow parallel
workers to be anything other than BGW_NEVER_RESTART, so that after
recovering from a crash, 0 is guaranteed to be the correct starting
value for parallel_register_count.  The core code wouldn't do this
anyway, but somebody might try to do it in extension code.

Report by Thomas Vondra.  Patch by me, reviewed by Kuntal Ghosh.

Discussion: http://postgr.es/m/CAGz5QC+AVEVS+3rBKRq83AxkJLMZ1peMt4nnrQwczxOrmo3CNw@mail.gmail.com
2017-04-11 12:46:40 -04:00
Robert Haas d4116a7719 Add ProcArrayGroupUpdate wait event.
Discussion: http://postgr.es/m/CA+TgmobgWHcXDcChX2+BqJDk2dkPVF85ZrJFhUyHHQmw8diTpA@mail.gmail.com
2017-04-07 13:41:47 -04:00
Alvaro Herrera 7526e10224 BRIN auto-summarization
Previously, only VACUUM would cause a page range to get initially
summarized by BRIN indexes, which for some use cases takes too much time
since the inserts occur.  To avoid the delay, have brininsert request a
summarization run for the previous range as soon as the first tuple is
inserted into the first page of the next range.  Autovacuum is in charge
of processing these requests, after doing all the regular vacuuming/
analyzing work on tables.

This doesn't impose any new tasks on autovacuum, because autovacuum was
already in charge of doing summarizations.  The only actual effect is to
change the timing, i.e. that it occurs earlier.  For this reason, we
don't go any great lengths to record these requests very robustly; if
they are lost because of a server crash or restart, they will happen at
a later time anyway.

Most of the new code here is in autovacuum, which can now be told about
"work items" to process.  This can be used for other things such as GIN
pending list cleaning, perhaps visibility map bit setting, both of which
are currently invoked during vacuum, but do not really depend on vacuum
taking place.

The requests are at the page range level, a granularity for which we did
not have SQL-level access; we only had index-level summarization
requests via brin_summarize_new_values().  It seems reasonable to add
SQL-level access to range-level summarization too, so add a function
brin_summarize_range() to do that.

Authors: Álvaro Herrera, based on sketch from Simon Riggs.
Reviewed-by: Thomas Munro.
Discussion: https://postgr.es/m/20170301045823.vneqdqkmsd4as4ds@alvherre.pgsql
2017-04-01 14:00:53 -03:00
Robert Haas 2113ac4cbb Don't use bgw_main even to specify in-core bgworker entrypoints.
On EXEC_BACKEND builds, this can fail if ASLR is in use.

Backpatch to 9.5.  On master, completely remove the bgw_main field
completely, since there is no situation in which it is safe for an
EXEC_BACKEND build.  On 9.6 and 9.5, leave the field intact to avoid
breaking things for third-party code that doesn't care about working
under EXEC_BACKEND.  Prior to 9.5, there are no in-core bgworker
entrypoints.

Petr Jelinek, reviewed by me.

Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com
2017-03-31 20:43:32 -04:00
Teodor Sigaev 090010f2ec Improve performance of find_tabstat_entry()/get_tabstat_entry()
Patch introduces a hash map reloid -> PgStat_TableStatus which improves
performance in case of large number of tables/partitions.

Author: Aleksander Alekseev
Reviewed-by: Andres Freund, Anastasia Lubennikova, Tels, me

https://commitfest.postgresql.org/13/1058/
2017-03-27 18:34:42 +03:00
Robert Haas fc70a4b0df Show more processes in pg_stat_activity.
Previously, auxiliary processes and background workers not connected
to a database (such as the logical replication launcher) weren't
shown.  Include them, so that we can see the associated wait state
information.  Add a new column to identify the processes type, so that
people can filter them out easily using SQL if they wish.

Before this patch was written, there was discussion about whether we
should expose this information in a separate view, so as to avoid
contaminating pg_stat_activity with things people might not want to
see.  But putting everything in pg_stat_activity was a more popular
choice, so that's what the patch does.

Kuntal Ghosh, reviewed by Amit Langote and Michael Paquier.  Some
revisions and bug fixes by me.

Discussion: http://postgr.es/m/CA+TgmoYES5nhkEGw9nZXU8_FhA8XEm8NTm3-SO+3ML1B81Hkww@mail.gmail.com
2017-03-26 22:02:22 -04:00
Peter Eisentraut 7c4f52409a Logical replication support for initial data copy
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.

For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function.  The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.

A WAL receiver can now execute full SQL commands.  This is used here to
obtain information about tables and publications.

Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.

Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
2017-03-23 08:55:37 -04:00
Tom Lane 17f8ffa1e3 Fix REFRESH MATERIALIZED VIEW to report activity to the stats collector.
The non-concurrent code path for REFRESH MATERIALIZED VIEW failed to
report its updates to the stats collector.  This is bad since it means
auto-analyze doesn't know there's any work to be done.  Adjust it to
report the refresh as a table truncate followed by insertion of an
appropriate number of rows.

Since a matview could contain more than INT_MAX rows, change the
signature of pgstat_count_heap_insert() to accept an int64 rowcount.
(The accumulator it's adding into is already int64, but existing
callers could not insert more than a small number of rows at once,
so the argument had been declared just "int n".)

This is surely a bug fix, but changing pgstat_count_heap_insert()'s API
seems too risky for the back branches.  Given the lack of previous
complaints, I'm not sure it's a big enough problem to justify a kluge
solution that would avoid that.  So, no back-patch, at least for now.

Jim Mlodgenski, adjusted a bit by me

Discussion: https://postgr.es/m/CAB_5SRchSz7-WmdO5szdiknG8Oj_GGqJytrk1KRd11yhcMs1KQ@mail.gmail.com
2017-03-18 17:49:39 -04:00
Robert Haas 249cf070e3 Create and use wait events for read, write, and fsync operations.
Previous commits, notably 53be0b1add and
6f3bd98ebf, made it possible to see from
pg_stat_activity when a backend was stuck waiting for another backend,
but it's also fairly common for a backend to be stuck waiting for an
I/O.  Add wait events for those operations, too.

Rushabh Lathia, with further hacking by me.  Reviewed and tested by
Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed.

Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
2017-03-18 07:43:01 -04:00
Robert Haas 88e66d193f Rename "pg_clog" directory to "pg_xact".
Names containing the letters "log" sometimes confuse users into
believing that only non-critical data is present.  It is hoped
this renaming will discourage ill-considered removals of transaction
status data.

Michael Paquier

Discussion: http://postgr.es/m/CA+Tgmoa9xFQyjRZupbdEFuwUerFTvC6HjZq1ud6GYragGDFFgA@mail.gmail.com
2017-03-17 09:48:38 -04:00
Tom Lane 6ec4c8584c Reduce log verbosity of startup/shutdown for launcher subprocesses.
There's no really good reason why the autovacuum launcher and logical
replication launcher should announce themselves at startup and shutdown
by default.  Users don't care that those processes exist, and it's
inconsistent that those background processes announce themselves while
others don't.  So, reduce those messages from LOG to DEBUG1 level.

I was sorely tempted to reduce the "starting logical replication worker
for subscription ..." message to DEBUG1 as well, but forebore for now.
Those processes might possibly be of direct interest to users, at least
until logical replication is a lot better shaken out than it is today.

Discussion: https://postgr.es/m/19479.1489121003@sss.pgh.pa.us
2017-03-10 15:18:38 -05:00
Robert Haas f35742ccb7 Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.

Dilip Kumar, with some corrections and cosmetic changes by me.  The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.

Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 12:05:43 -05:00
Robert Haas 7f6fa29f18 Fix user-after-free bug.
Introduced by commit aea5d29836.

Patch from Amit Kapila.  Issue discovered independently by Amit Kapila
and Ashutosh Sharma.
2017-03-06 12:13:57 -05:00
Peter Eisentraut 1e8a850094 Use asynchronous connect API in libpqwalreceiver
This makes the connection attempt from CREATE SUBSCRIPTION and from
WalReceiver interruptable by the user in case the libpq connection is
hanging.  The previous coding required immediate shutdown (SIGQUIT) of
PostgreSQL in that situation.

From: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Thom Brown <thom@linux.com>
2017-03-03 09:13:58 -05:00
Robert Haas 19dc233c32 Add pg_current_logfile() function.
The syslogger will write out the current stderr and csvlog names, if
it's running and there are any, to a new file in the data directory
called "current_logfiles".  We take care to remove this file when it
might no longer be valid (but not at shutdown).  The function
pg_current_logfile() can be used to read the entries in the file.

Gilles Darold, reviewed and modified by Karl O.  Pinc, Michael
Paquier, and me.  Further review by Álvaro Herrera and Christoph Berg.
2017-03-03 11:43:11 +05:30
Robert Haas aea5d29836 Notify bgworker registrant after freeing worker slot.
Tom Lane observed buildfarm failures caused by the select_parallel
regression test trying to launch new parallel queries before the
worker slots used by the previous ones were freed.  Try to fix this by
having the postmaster free the worker slots before it sends the
SIGUSR1 notifications to the registering process.  This doesn't
completely eliminate the possibility that the user backend might
(correctly) observe the worker as dead before the slot is free, but I
believe it should make the window significantly narrower.

Patch by me, per complaint from Tom Lane.  Reviewed by Amit Kapila.

Discussion: http://postgr.es/m/30673.1487310734@sss.pgh.pa.us
2017-03-03 09:25:30 +05:30
Tom Lane 9e3755ecb2 Remove useless duplicate inclusions of system header files.
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.

While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files".  While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern.  (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)

I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
2017-02-25 16:12:55 -05:00
Robert Haas 569174f1be btree: Support parallel index scans.
This isn't exposed to the optimizer or the executor yet; we'll add
support for those things in a separate patch.  But this puts the
basic mechanism in place: several processes can attach to a parallel
btree index scan, and each one will get a subset of the tuples that
would have been produced by a non-parallel scan.  Each index page
becomes the responsibility of a single worker, which then returns
all of the TIDs on that page.

Rahila Syed, Amit Kapila, Robert Haas, reviewed and tested by
Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi.
2017-02-15 07:41:14 -05:00
Heikki Linnakangas 181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Peter Eisentraut 5a366b4ff4 Fix typo: pg_statistics -> pg_statistic 2017-01-25 14:38:33 -05:00
Peter Eisentraut f21a563d25 Move some things from builtins.h to new header files
This avoids that builtins.h has to include additional header files.
2017-01-20 20:29:53 -05:00
Robert Haas c6a389792e Avoid useless respawining the autovacuum launcher at high speed.
When (1) autovacuum = off and (2) there's at least one database with
an XID age greater than autovacuum_freeze_max_age and (3) all tables
in that database that need vacuuming are already being processed by a
worker and (4) the autovacuum launcher is started, a kind of infinite
loop occurs.  The launcher starts a worker and immediately exits.  The
worker, finding no worker to do, immediately starts the launcher,
supposedly so that the next database can be processed.  But because
datfrozenxid for that database hasn't been advanced yet, the new
worker gets put right back into the same database as the old one,
where it once again starts the launcher and exits.  High-speed ping
pong ensues.

There are several possible ways to break the cycle; this seems like
the safest one.

Amit Khandekar (code) and Robert Haas (comments), reviewed by
Álvaro Herrera.

Discussion: http://postgr.es/m/CAJ3gD9eWejf72HKquKSzax0r+epS=nAbQKNnykkMA0E8c+rMDg@mail.gmail.com
2017-01-20 15:55:45 -05:00
Peter Eisentraut 665d1fad99 Logical replication
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers

From: Petr Jelinek <petr@2ndquadrant.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-01-20 09:04:49 -05:00
Tom Lane 6667d9a6d7 Re-allow SSL passphrase prompt at server start, but not thereafter.
Leave OpenSSL's default passphrase collection callback in place during
the first call of secure_initialize() in server startup.  Although that
doesn't work terribly well in daemon contexts, some people feel we should
not break it for anyone who was successfully using it before.  We still
block passphrase demands during SIGHUP, meaning that you can't adjust SSL
configuration on-the-fly if you used a passphrase, but this is no worse
than what it was before commit de41869b6.  And we block passphrase demands
during EXEC_BACKEND reloads; that behavior wasn't useful either, but at
least now it's documented.

Tweak some related log messages for more readability, and avoid issuing
essentially duplicate messages about reload failure caused by a passphrase.

Discussion: https://postgr.es/m/29982.1483412575@sss.pgh.pa.us
2017-01-04 12:44:03 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Tom Lane de41869b64 Allow SSL configuration to be updated at SIGHUP.
It is no longer necessary to restart the server to enable, disable,
or reconfigure SSL.  Instead, we just create a new SSL_CTX struct
(by re-reading all relevant files) whenever we get SIGHUP.  Testing
shows that this is fast enough that it shouldn't be a problem.

In conjunction with that, downgrade the logic that complains about
pg_hba.conf "hostssl" lines when SSL isn't active: now that's just
a warning condition not an error.

An issue that still needs to be addressed is what shall we do with
passphrase-protected server keys?  As this stands, the server would
demand the passphrase again on every SIGHUP, which is certainly
impractical.  But the case was only barely supported before, so that
does not seem a sufficient reason to hold up committing this patch.

Andreas Karlsson, reviewed by Michael Banck and Michael Paquier

Discussion: https://postgr.es/m/556A6E8A.9030400@proxel.se
2017-01-02 21:37:12 -05:00
Andres Freund 6ef2eba3f5 Skip checkpoints, archiving on idle systems.
Some background activity (like checkpoints, archive timeout, standby
snapshots) is not supposed to happen on an idle system. Unfortunately
so far it was not easy to determine when a system is idle, which
defeated some of the attempts to avoid redundant activity on an idle
system.

To make that easier, allow to make individual WAL insertions as not
being "important". By checking whether any important activity happened
since the last time an activity was performed, it now is easy to check
whether some action needs to be repeated.

Use the new facility for checkpoints, archive timeout and standby
snapshots.

The lack of a facility causes some issues in older releases, but in my
opinion the consequences (superflous checkpoints / archived segments)
aren't grave enough to warrant backpatching.

Author: Michael Paquier, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI
Bug: #13685
Discussion:
    https://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.org
    https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com
Backpatch: -
2016-12-22 11:31:50 -08:00
Robert Haas 3761fe3c20 Simplify LWLock tranche machinery by removing array_base/array_stride.
array_base and array_stride were added so that we could identify the
offset of an LWLock within a tranche, but this facility is only very
marginally used apart from the main tranche.  So, give every lock in
the main tranche its own tranche ID and get rid of array_base,
array_stride, and all that's attached.  For debugging facilities
(Trace_lwlocks and LWLOCK_STATS) print the pointer address of the
LWLock using %p instead of the offset.  This is arguably more useful,
and certainly a lot cheaper.  Drop the offset-within-tranche from
the information reported to dtrace and from one can't-happen message
inside lwlock.c.

The main user-visible impact of this change is that pg_stat_activity
will now report all waits for LWLocks as "LWLock" rather than
reporting some as "LWLockTranche" and others as "LWLockNamed".

The main motivation for this change is that the need to specify an
array_base and an array_stride is awkward for parallel query.  There
is only a very limited supply of tranche IDs so we can't just keep
allocating new ones, and if we try to use the same tranche IDs every
time then we run into trouble when multiple parallel contexts are
use simultaneously.  So if we didn't get rid of this mechanism we'd
have to make it even more complicated.  By simplifying it in this
way, we instead reduce the size of the generated code for lwlock.c
by about 5%.

Discussion: http://postgr.es/m/CA+TgmoYsFn6NUW1x0AZtupJGUAs1UDY4dJtCN47_Q6D0sP80PA@mail.gmail.com
2016-12-16 11:29:23 -05:00
Tom Lane be7b2848c6 Make the different Unix-y semaphore implementations ABI-compatible.
Previously, the "sem" field of PGPROC varied in size depending on which
kernel semaphore API we were using.  That was okay as long as there was
only one likely choice per platform, but in the wake of commit ecb0d20a9,
that assumption seems rather shaky.  It doesn't seem out of the question
anymore that an extension compiled against one API choice might be loaded
into a postmaster built with another choice.  Moreover, this prevents any
possibility of selecting the semaphore API at postmaster startup, which
might be something we want to do in future.

Hence, change PGPROC.sem to be PGSemaphore (i.e. a pointer) for all Unix
semaphore APIs, and turn the pointed-to data into an opaque struct whose
contents are only known within the responsible modules.

For the SysV and unnamed-POSIX APIs, the pointed-to data has to be
allocated elsewhere in shared memory, which takes a little bit of
rejiggering of the InitShmemAllocation code sequence.  (I invented a
ShmemAllocUnlocked() function to make that a little cleaner than it used
to be.  That function is not meant for any uses other than the ones it
has now, but it beats having InitShmemAllocation() know explicitly about
allocation of space for semaphores and spinlocks.)  This change means an
extra indirection to access the semaphore data, but since we only touch
that when blocking or awakening a process, there shouldn't be any
meaningful performance penalty.  Moreover, at least for the unnamed-POSIX
case on Linux, the sem_t type is quite a bit wider than a pointer, so this
reduces sizeof(PGPROC) which seems like a good thing.

For the named-POSIX API, there's effectively no change: the PGPROC.sem
field was and still is a pointer to something returned by sem_open() in
the postmaster's memory space.  Document and check the pre-existing
limitation that this case can't work in EXEC_BACKEND mode.

It did not seem worth unifying the Windows semaphore ABI with the Unix
cases, since there's no likelihood of needing ABI compatibility much less
runtime switching across those cases.  However, we can simplify the Windows
code a bit if we define PGSemaphore as being directly a HANDLE, rather than
pointer to HANDLE, so let's do that while we're here.  (This also ends up
being no change in what's physically stored in PGPROC.sem.  We're just
moving the HANDLE fetch from callees to callers.)

It would take a bunch of additional code shuffling to get to the point of
actually choosing a semaphore API at postmaster start, but the effects
of that would now be localized in the port/XXX_sema.c files, so it seems
like fit material for a separate patch.  The need for it is unproven as
yet, anyhow, whereas the ABI risk to extensions seems real enough.

Discussion: https://postgr.es/m/4029.1481413370@sss.pgh.pa.us
2016-12-12 13:32:10 -05:00
Heikki Linnakangas 58445c5c8d Further cleanup from the strong-random patch.
Also use the new facility for generating RADIUS authenticator requests,
and salt in chkpass extension.

Reword the error messages to be nicer. Fix bogus error code used in the
message in BackendStartup.
2016-12-12 11:55:32 +02:00
Heikki Linnakangas 41493bac36 Fix two thinkos related to strong random keys.
pg_backend_random() is used for MD5 salt generation, but it can fail, and
no checks were done on its status code.

Fix memory leak, if generating a random number for a cancel key failed.

Both issues were spotted by Coverity. Fix by Michael Paquier.
2016-12-12 09:58:32 +02:00
Heikki Linnakangas 81f2e514a9 Fix query cancellation.
In commit fe0a0b59, the datatype used for MyCancelKey and other variables
that store cancel keys were changed from long to uint32, but I missed this
one. That broke query cancellation on platforms where long is wider than 32
bits.

Report by Andres Freund, fix by Michael Paquier.
2016-12-07 09:47:43 +02:00
Heikki Linnakangas fe0a0b5993 Replace PostmasterRandom() with a stronger source, second attempt.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.

pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:

- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom

Unlike the current pgcrypto function, the source is chosen by configure.
That makes it easier to test different implementations, and ensures that
we don't accidentally fall back to a less secure implementation, if the
primary source fails. All of those methods are quite reliable, it would be
pretty surprising for them to fail, so we'd rather find out by failing
hard.

If no strong random source is available, we fall back to using erand48(),
seeded from current timestamp, like PostmasterRandom() was. That isn't
cryptographically secure, but allows us to still work on platforms that
don't have any of the above stronger sources. Because it's not very secure,
the built-in implementation is only used if explicitly requested with
--disable-strong-random.

This replaces the more complicated Fortuna algorithm we used to have in
pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom,
so it doesn't seem worth the maintenance effort to keep that. pgcrypto
functions that require strong random numbers will be disabled with
--disable-strong-random.

Original patch by Magnus Hagander, tons of further work by Michael Paquier
and me.

Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com
2016-12-05 13:42:59 +02:00
Tom Lane b3427dade1 Delete deleteWhatDependsOn() in favor of more performDeletion() flag bits.
deleteWhatDependsOn() had grown an uncomfortably large number of
assumptions about what it's used for.  There are actually only two minor
differences between what it does and what a regular performDeletion() call
can do, so let's invent additional bits in performDeletion's existing flags
argument that specify those behaviors, and get rid of deleteWhatDependsOn()
as such.  (We'd probably have done it this way from the start, except that
performDeletion didn't originally have a flags argument, IIRC.)

Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an
extension, and use that when dropping temporary objects at session end.
This provides a more general solution to the problem addressed in a hacky
way in commit 08dd23cec: if an extension script creates temp objects and
forgets to remove them again, the whole extension went away when its
contained temp objects were deleted.  The previous solution only covered
temp relations, but this solves it for all object types.

These changes require minor additions in dependency.c to pass the flags
to subroutines that previously didn't get them, but it's still a net
savings of code, and it seems cleaner than before.

Having done this, revert the special-case code added in 08dd23cec that
prevented addition of pg_depend records for temp table extension
membership, because that caused its own oddities: dropping an extension
that had created such a table didn't automatically remove the table,
leading to a failure if the table had another dependency on the extension
(such as use of an extension data type), or to a duplicate-name failure if
you then tried to recreate the extension.  But we keep the part that
prevents the pg_temp_nnn schema from becoming an extension member; we never
want that to happen.  Add a regression test case covering these behaviors.

Although this fixes some arguable bugs, we've heard few field complaints,
and any such problems are easily worked around by explicitly dropping temp
objects at the end of extension scripts (which seems like good practice
anyway).  So I won't risk a back-patch.

Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com
2016-12-02 14:57:55 -05:00
Robert Haas b460f5d669 Add max_parallel_workers GUC.
Increase the default value of the existing max_worker_processes GUC
from 8 to 16, and add a new max_parallel_workers GUC with a maximum
of 8.  This way, even if the maximum amount of parallel query is
happening, there is still room for background workers that do other
things, as originally envisioned when max_worker_processes was added.

Julien Rouhaud, reviewed by Amit Kapila and by revised by me.
2016-12-02 07:42:58 -05:00
Peter Eisentraut 597a87ccc9 Use latch instead of select() in walreceiver
Replace use of poll()/select() by WaitLatchOrSocket(), which is more
portable and flexible.

Also change walreceiver to use its procLatch instead of a custom latch.

From: Petr Jelinek <petr@2ndquadrant.com>
2016-12-01 20:23:28 -05:00
Tom Lane dafa0848da Code review for early drop of orphaned temp relations in autovacuum.
Commit a734fd5d1 exposed some race conditions that existed previously
in the autovac code, but were basically harmless because autovac would
not try to delete orphaned relations immediately.  Specifically, the test
for orphaned-ness was made on a pg_class tuple that might be dead by now,
allowing autovac to try to remove a table that the owning backend had just
finished deleting.  This resulted in a hard crash due to inadequate caution
about accessing the table's catalog entries without any lock.  We must take
a relation lock and then recheck whether the table is still present and
still looks deletable before we do anything.

Also, it seemed to me that deleting multiple tables per transaction, and
trying to continue after errors, represented unjustifiable complexity.
We do not expect this code path to be taken often in the field, nor even
during testing, which means that prioritizing performance over correctness
is a bad tradeoff.  Rip all that out in favor of just starting a new
transaction after each successful temp table deletion.  If we're unlucky
enough to get an error, which shouldn't happen anyway now that we're being
more cautious, let the autovacuum worker fail as it normally would.

In passing, improve the order of operations in the initial scan loop.
Now that we don't care about whether a temp table is a wraparound hazard,
there's no need to perform extract_autovac_opts, get_pgstat_tabentry_relid,
or relation_needs_vacanalyze for temp tables.

Also, if GetTempNamespaceBackendId returns InvalidBackendId (indicating
it doesn't recognize the schema as temp), treat that as meaning it's NOT
an orphaned temp table, not that it IS one, which is what happened before
because BackendIdGetProc necessarily failed.  The case really shouldn't
come up for a table that has RELPERSISTENCE_TEMP, but the consequences
if it did seem undesirable.  (This might represent a back-patchable bug
fix; not sure if it's worth the trouble.)

Discussion: https://postgr.es/m/21299.1480272347@sss.pgh.pa.us
2016-11-27 21:23:39 -05:00
Robert Haas e343dfa42b Remove barrier.h
A new thing also called a "barrier" is proposed, but whether we decide
to take that patch or not, this file seems to have outlived its
usefulness.

Thomas Munro
2016-11-22 20:28:24 -05:00
Robert Haas e8ac886c24 Support condition variables.
Condition variables provide a flexible way to sleep until a
cooperating process causes an arbitrary condition to become true.  In
simple cases, this can be accomplished with a WaitLatch/ResetLatch
loop; the cooperating process can call SetLatch after performing work
that might cause the condition to be satisfied, and the waiting
process can recheck the condition each time.  However, if the process
performing the work doesn't have an easy way to identify which
processes might be waiting, this doesn't work, because it can't
identify which latches to set.  Condition variables solve that problem
by internally maintaining a list of waiters; a process that may have
caused some waiter's condition to be satisfied must "signal" or
"broadcast" on the condition variable.

Robert Haas and Thomas Munro
2016-11-22 14:27:11 -05:00
Tom Lane ae92a9a380 Fix uninitialized variable.
Oversight in a734fd5d1.

Michael Paquier
2016-11-21 19:59:56 -05:00
Robert Haas a734fd5d1c autovacuum: Drop orphan temp tables more quickly but with more caution.
Previously, we only dropped an orphan temp table when it became old
enough to threaten wraparound; instead, doing it immediately.  The
only value of waiting is that someone might be able to examine the
contents of the orphan temp table for forensic purposes, but it's
pretty difficult to actually do that and few users will wish to do so.
On the flip side, not performing the drop immediately generates log
spam and bloats pg_class.

In addition, per a report from Grigory Smolkin, if a temporary schema
contains a very large number of temporary tables, a backend attempting
to clear the temporary schema might fail due to lock table exhaustion.
It's helpful for autovacuum to clean up after such cases, and we don't
want it to wait for wraparound to threaten before doing so.  To
prevent autovacuum from failing in the same manner as a backend trying
to drop an entire temp schema, remove orphan temp tables in batches of
50, committing after each batch, so that we don't accumulate an
unbounded number of locks.  If a drop fails, retry other orphan tables
that need to be dropped up to 10 times before giving up.  With this
system, if a backend does fail to clean a temporary schema due to
lock table exhaustion, autovacuum should hopefully put things right
the next time it processes the database.

Discussion: CAB7nPqSbYT6dRwsXVgiKmBdL_ARemfDZMPA+RPeC_ge0GK70hA@mail.gmail.com

Michael Paquier, with a bunch of comment changes by me.
2016-11-21 13:01:50 -05:00
Robert Haas 4f714b2fd2 If the stats collector dies during Hot Standby, restart it.
This bug exists as far back as 9.0, when Hot Standby was introduced,
so back-patch to all supported branches.

Report and patch by Takayuki Tsunakawa, reviewed by Michael Paquier
and Kuntal Ghosh.
2016-10-27 14:27:40 -04:00
Heikki Linnakangas 56f39009c5 Fix typos in comments.
Vinayak Pokale
2016-10-26 11:12:31 +03:00
Heikki Linnakangas faae1c918e Revert "Replace PostmasterRandom() with a stronger way of generating randomness."
This reverts commit 9e083fd468. That was a
few bricks shy of a load:

* Query cancel stopped working
* Buildfarm member pademelon stopped working, because the box doesn't have
  /dev/urandom nor /dev/random.

This clearly needs some more discussion, and a quite different patch, so
revert for now.
2016-10-18 16:28:23 +03:00
Heikki Linnakangas 9e083fd468 Replace PostmasterRandom() with a stronger way of generating randomness.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.

pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
- /dev/random

Original patch by Magnus Hagander, with further work by Michael Paquier
and me.

Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>
2016-10-17 11:52:50 +03:00
Tom Lane 81e82a2bd4 Fix handling of pgstat counters for TRUNCATE in a prepared transaction.
pgstat_twophase_postcommit is supposed to duplicate the math in
AtEOXact_PgStat, but it had missed out the bit about clearing
t_delta_live_tuples/t_delta_dead_tuples for a TRUNCATE.

It's harder than you might think to replicate the issue here, because
those counters would only be nonzero when a previous transaction in
the same backend had added/deleted tuples in the truncated table,
and those counts hadn't been sent to the stats collector yet.

Evident oversight in commit d42358efb.  I've not added a regression
test for this; we tried to add one in d42358efb, and had to revert it
because it was too timing-sensitive for the buildfarm.

Back-patch to 9.5 where d42358efb came in.

Stas Kelvich

Discussion: <EB57BF68-C06D-4737-BDDC-4BA778F4E62B@postgrespro.ru>
2016-10-13 19:46:05 -04:00
Tom Lane 15fc5e1581 Clean up handling of anonymous mmap'd shared-memory segment.
Fix detaching of the mmap'd segment to have its own on_shmem_exit callback,
rather than piggybacking on the one for detaching from the SysV segment.
That was confusing, and given the distance between the two attach calls,
it was trouble waiting to happen.

Make the detaching calls idempotent by clearing AnonymousShmem to show
we've already unmapped.  I spent quite a bit of time yesterday trying
to find a path that would allow the munmap()'s to be done twice, and
while I did not succeed, it seems silly that there's even a question.

Make the #ifdef logic less confusing by separating "do we want to use
anonymous shmem" from EXEC_BACKEND.  Even though there's no current
scenario where those conditions are different, it is not helpful for
different places in the same file to be testing EXEC_BACKEND for what
are fundamentally different reasons.

Don't do on_exit_reset() in StartBackgroundWorker().  At best that's
useless (InitPostmasterChild would have done it already) and at worst
it could zap some callback that's unrelated to shared memory.

Improve comments, and simplify the huge_pages enablement logic slightly.

Back-patch to 9.4 where hugepage support was introduced.
Arguably this should go into 9.3 as well, but the code looks
significantly different there, and I doubt it's worth the
trouble of adapting the patch given I can't show a live bug.
2016-10-13 13:59:56 -04:00
Robert Haas eb3bc0bd1a Re-alphabetize #include directives.
Thomas Munro
2016-10-05 08:24:25 -04:00
Robert Haas d2ce38e204 Rename WAIT_* constants to PG_WAIT_*.
Windows apparently has a constant named WAIT_TIMEOUT, and some of these
other names are pretty generic, too.  Insert "PG_" at the front of each
name in order to disambiguate.

Michael Paquier
2016-10-05 08:04:52 -04:00
Robert Haas 6c9c95ed1b Fix another Windows compile break.
Commit 6f3bd98ebf is still making
the buildfarm unhappy.  This time it's mastodon that is complaining.
2016-10-04 13:14:19 -04:00
Robert Haas 9445d1121d Fix Windows compile break in 6f3bd98ebf. 2016-10-04 12:18:05 -04:00
Robert Haas 6f3bd98ebf Extend framework from commit 53be0b1ad to report latch waits.
WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an
additional wait_event_info parameter; legal values are defined in
pgstat.h.  This makes it possible to uniquely identify every point in
the core code where we are waiting for a latch; extensions can pass
WAIT_EXTENSION.

Because latches were the major wait primitive not previously covered
by this patch, it is now possible to see information in
pg_stat_activity on a large number of important wait events not
previously addressed, such as ClientRead, ClientWrite, and SyncRep.

Unfortunately, many of the wait events added by this patch will fail
to appear in pg_stat_activity because they're only used in background
processes which don't currently appear in pg_stat_activity.  We should
fix this either by creating a separate view for such information, or
else by deciding to include them in pg_stat_activity after all.

Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and
Thomas Munro.
2016-10-04 11:01:42 -04:00
Tom Lane 3b90e38c5d Do ClosePostmasterPorts() earlier in SubPostmasterMain().
In standard Unix builds, postmaster child processes do ClosePostmasterPorts
immediately after InitPostmasterChild, that is almost immediately after
being spawned.  This is important because we don't want children holding
open the postmaster's end of the postmaster death watch pipe.

However, in EXEC_BACKEND builds, SubPostmasterMain was postponing this
responsibility significantly, in order to make it slightly more convenient
to pass the right flag value to ClosePostmasterPorts.  This is bad,
particularly seeing that process_shared_preload_libraries() might invoke
nearly-arbitrary code.  Rearrange so that we do it as soon as we've
fetched the socket FDs via read_backend_variables().

Also move the comment explaining about randomize_va_space to before the
call of PGSharedMemoryReAttach, which is where it's relevant.  The old
placement was appropriate when the reattach happened inside
CreateSharedMemoryAndSemaphores, but that was a long time ago.

Back-patch to 9.3; the patch doesn't apply cleanly before that, and
it doesn't seem worth a lot of effort given that we've had no actual
field complaints traceable to this.

Discussion: <4157.1475178360@sss.pgh.pa.us>
2016-10-01 17:15:09 -04:00
Alvaro Herrera 51c3e9fade Include <sys/select.h> where needed
<sys/select.h> is required by POSIX.1-2001 to get the prototype of
select(2), but nearly no systems enforce that because older standards
let you get away with including some other headers.  Recent OpenBSD
hacking has removed that frail touch of friendliness, however, which
broke some compiles; fix all the way back to 9.1 by adding the required
standard.  Only vacuumdb.c was reported to fail, but it seems easier to
fix the whole lot in a fell swoop.

Per bug #14334 by Sean Farrell.
2016-09-27 01:05:21 -03:00
Tom Lane da6c4f6ca8 Refer to OS X as "macOS", except for the port name which is still "darwin".
We weren't terribly consistent about whether to call Apple's OS "OS X"
or "Mac OS X", and the former is probably confusing to people who aren't
Apple users.  Now that Apple has rebranded it "macOS", follow their lead
to establish a consistent naming pattern.  Also, avoid the use of the
ancient project name "Darwin", except as the port code name which does not
seem desirable to change.  (In short, this patch touches documentation and
comments, but no actual code.)

I didn't touch contrib/start-scripts/osx/, either.  I suspect those are
obsolete and due for a rewrite, anyway.

I dithered about whether to apply this edit to old release notes, but
those were responsible for quite a lot of the inconsistencies, so I ended
up changing them too.  Anyway, Apple's being ahistorical about this,
so why shouldn't we be?
2016-09-25 15:40:57 -04:00
Tom Lane 49a91b88e6 Avoid using PostmasterRandom() for DSM control segment ID.
Commits 470d886c3 et al intended to fix the problem that the postmaster
selected the same "random" DSM control segment ID on every start.  But
using PostmasterRandom() for that destroys the intended property that the
delay between random_start_time and random_stop_time will be unpredictable.
(Said delay is probably already more predictable than we could wish, but
that doesn't mean that reducing it by a couple orders of magnitude is OK.)
Revert the previous patch and add a comment warning against misuse of
PostmasterRandom.  Fix the original problem by calling srandom() early in
PostmasterMain, using a low-security seed that will later be overwritten
by PostmasterRandom.

Discussion: <20789.1474390434@sss.pgh.pa.us>
2016-09-23 09:54:11 -04:00
Robert Haas 470d886c32 Use PostmasterRandom(), not random(), for DSM control segment ID.
Otherwise, every startup gets the same "random" value, which is
definitely not what was intended.
2016-09-20 12:26:29 -04:00
Heikki Linnakangas ec136d19b2 Move code shared between libpq and backend from backend/libpq/ to common/.
When building libpq, ip.c and md5.c were symlinked or copied from
src/backend/libpq into src/interfaces/libpq, but now that we have a
directory specifically for routines that are shared between the server and
client binaries, src/common/, move them there.

Some routines in ip.c were only used in the backend. Keep those in
src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file
that's now in common.

Fix the comment in src/common/Makefile to reflect how libpq actually links
those files.

There are two more files that libpq symlinks directly from src/backend:
encnames.c and wchar.c. I don't feel compelled to move those right now,
though.

Patch by Michael Paquier, with some changes by me.

Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
2016-09-02 13:49:59 +03:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Heikki Linnakangas fa878703f4 Refactor RandomSalt to handle salts of different lengths.
All we need is 4 bytes at the moment, for MD5 authentication. But in
upcomint patches for SCRAM authentication, SCRAM will need a salt of
different length. It's less scary for the caller to pass the buffer
length anyway, than assume a certain-sized output buffer.

Author: Michael Paquier
Discussion: <CAB7nPqQvO4sxLFeS9D+NM3wpy08ieZdAj_6e117MQHZAfxBFsg@mail.gmail.com>
2016-08-18 13:41:17 +03:00
Tom Lane 8d498a5c8a Fix bogus coding in WaitForBackgroundWorkerShutdown().
Some conditions resulted in "return" directly out of a PG_TRY block,
which left the exception stack dangling, and to add insult to injury
failed to restore the state of set_latch_on_sigusr1.

This is a bug only in 9.5; in HEAD it was accidentally fixed by commit
db0f6cad4, which removed the surrounding PG_TRY block.  However, I (tgl)
chose to apply the patch to HEAD as well, because the old coding was
gratuitously different from WaitForBackgroundWorkerStartup(), and there
would indeed have been no bug if it were done like that to start with.

Dmitry Ivanov

Discussion: <1637882.WfYN5gPf1A@abook>
2016-08-04 16:06:14 -04:00
Tom Lane ef1b5af823 Do not let PostmasterContext survive into background workers.
We don't want postmaster child processes to contain a copy of the
postmaster's PostmasterContext.  That would be a waste of memory at least,
and at worst a security issue, since there are copies of the semi-sensitive
pg_hba and pg_ident data in there.  All other child process types delete
the PostmasterContext after forking, but the original coding of the
background worker patch (commit da07a1e85) did not do so.  It appears
that the only reason for that was to avoid copying the bgworker's
MyBgworkerEntry out of that context; but the couple of additional
statements needed to do so are hardly good justification for it.  Hence,
copy that data and then clear the context as other child processes do.

Because this patch changes the memory context in which a bgworker function
gains control, back-patching it would be a bit risky, so we won't fix this
in back branches.  The "security" complaint is pretty thin anyway for
generic bgworkers; only with the introduction of parallel query is there
any question of running untrusted code in a bgworker process.

Discussion: <14111.1470082717@sss.pgh.pa.us>
2016-08-03 14:48:13 -04:00
Tom Lane c6ea616ff7 Remove duplicate InitPostmasterChild() call while starting a bgworker.
This is apparently harmless on Windows, but on Unix it results in an
assertion failure.  We'd not noticed because this code doesn't get
used on Unix unless you build with -DEXEC_BACKEND.  Bug was evidently
introduced by sloppy refactoring in commit 31c453165.

Thomas Munro

Discussion: <CAEepm=1VOnbVx4wsgQFvj94hu9jVt2nVabCr7QiooUSvPJXkgQ@mail.gmail.com>
2016-08-02 18:39:14 -04:00
Tom Lane e45e990e4b Make "postgres -C guc" print "" not "(null)" for null-valued GUCs.
Commit 0b0baf262 et al made this case print "(null)" on the grounds that
that's what happened on platforms that didn't crash.  But neither behavior
was actually intentional.  What we should print is just an empty string,
for compatibility with the behavior of SHOW and other ways of examining
string GUCs.  Those code paths don't distinguish NULL from empty strings,
so we should not here either.  Per gripe from Alain Radix.

Like the previous patch, back-patch to 9.2 where -C option was introduced.

Discussion: <CA+YdpwxPUADrmxSD7+Td=uOshMB1KkDN7G7cf+FGmNjjxMhjbw@mail.gmail.com>
2016-06-22 11:55:18 -04:00
Tom Lane 0b0baf2621 Avoid crash in "postgres -C guc" for a GUC with a null string value.
Emit "(null)" instead, which was the behavior all along on platforms
that don't crash, eg OS X.  Per report from Jehan-Guillaume de Rorthais.
Back-patch to 9.2 where -C option was introduced.

Michael Paquier

Report: <20160615204036.2d35d86a@firost>
2016-06-16 12:17:38 -04:00
Tom Lane 8383486f10 Force idle_in_transaction_session_timeout off in pg_dump and autovacuum.
We disable statement_timeout and lock_timeout during dump and restore, to
prevent any global settings that might exist from breaking routine backups.
Commit c6dda1f48 should have added idle_in_transaction_session_timeout to
that list, but failed to.

Another place where these timeouts get turned off is autovacuum.  While
I doubt an idle timeout could fire there, it seems better to be safe than
sorry.

pg_dump issue noted by Bernd Helmle, the other one found by grepping.

Report: <352F9B77DB5D3082578D17BB@eje.land.credativ.lan>
2016-06-15 10:53:03 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Tom Lane f64340e743 Don't reset changes_since_analyze after a selective-columns ANALYZE.
If we ANALYZE only selected columns of a table, we should not postpone
auto-analyze because of that; other columns may well still need stats
updates.  As committed, the counter is left alone if a column list is
given, whether or not it includes all analyzable columns of the table.
Per complaint from Tomasz Ostrowski.

It's been like this a long time, so back-patch to all supported branches.

Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>
2016-06-06 17:44:17 -04:00
Tom Lane 22b27b4c9e Avoid useless closely-spaced writes of statistics files.
The original intent in the stats collector was that we should not write out
stats data oftener than every PGSTAT_STAT_INTERVAL msec.  Backends will not
make requests at all if they see the existing data is newer than that, and
the stats collector is supposed to disregard requests having a cutoff_time
older than its most recently written data, so that close-together requests
don't result in multiple writes.  But the latter part of that got broken
in commit 187492b6c2, so that if two backends concurrently decide
the existing stats are too old, the collector would write the data twice.
(In principle the collector's logic would still merge requests as long as
the second one arrives before we've actually written data ... but since
the message collection loop would write data immediately after processing
a single inquiry message, that never happened in practice, and in any case
the window in which it might work would be much shorter than
PGSTAT_STAT_INTERVAL.)

To fix, improve pgstat_recv_inquiry so that it checks whether the cutoff
time is too old, and doesn't add a request to the queue if so.  This means
that we do not need DBWriteRequest.request_time, because the decision is
taken before making a queue entry.  And that means that we don't really
need the DBWriteRequest data structure at all; an OID list of database
OIDs will serve and allow removal of some rather verbose and crufty code.

In passing, improve the comments in this area, which have been rather
neglected.  Also change backend_read_statsfile so that it's not silently
relying on MyDatabaseId to have some particular value in the autovacuum
launcher process.  It accidentally worked as desired because MyDatabaseId
is zero in that process; but that does not seem like a dependency we want,
especially with no documentation about it.

Although this patch is mine, it turns out I'd rediscovered a known bug,
for which Tomas Vondra had already submitted a patch that's functionally
equivalent to the non-cosmetic aspects of this patch.  Thanks to Tomas
for reviewing this version.

Back-patch to 9.3 where the bug was introduced.

Prior-Discussion: <1718942738eb65c8407fcd864883f4c8@fuzzy.cz>
Patch: <4625.1464202586@sss.pgh.pa.us>
2016-05-31 15:55:15 -04:00
Tom Lane 52e8fc3e2e Ensure that backends see up-to-date statistics for shared catalogs.
Ever since we split the statistics collector's reports into per-database
files (commit 187492b6c2), backends have been seeing stale statistics
for shared catalogs.  This is because the inquiry message only prompts the
collector to write the per-database file for the requesting backend's own
database.  Stats for shared catalogs are in a separate file for "DB 0",
which didn't get updated.

In normal operation this was partially masked by the fact that the
autovacuum launcher would send an inquiry message at least once per
autovacuum_naptime that asked for "DB 0"; so the shared-catalog stats would
never be more than a minute out of date.  However the problem becomes very
obvious with autovacuum disabled, as reported by Peter Eisentraut.

To fix, redefine the semantics of inquiry messages so that both the
specified DB and DB 0 will be dumped.  (This might seem a bit inefficient,
but we have no good way to know whether a backend's transaction will look
at shared-catalog stats, so we have to read both groups of stats whenever
we request stats.  Sending two inquiry messages would definitely not be
better.)

Back-patch to 9.3 where the bug was introduced.

Report: <56AD41AC.1030509@gmx.net>
2016-05-25 17:48:15 -04:00
Alvaro Herrera 15739393e4 Fix autovacuum for shared relations
The table-skipping logic in autovacuum would fail to consider that
multiple workers could be processing the same shared catalog in
different databases.  This normally wouldn't be a problem: firstly
because autovacuum workers not for wraparound would simply ignore tables
in which they cannot acquire lock, and secondly because most of the time
these tables are small enough that even if multiple for-wraparound
workers are stuck in the same catalog, they would be over pretty
quickly.  But in cases where the catalogs are severely bloated it could
become a problem.

Backpatch all the way back, because the problem has been there since the
beginning.

Reported by Ondřej Světlík

Discussion: https://www.postgresql.org/message-id/572B63B1.3030603%40flexibee.eu
	https://www.postgresql.org/message-id/572A1072.5080308%40flexibee.eu
2016-05-10 16:23:54 -03:00
Stephen Frost 1574783b4c Use GRANT system to manage access to sensitive functions
Now that pg_dump will properly dump out any ACL changes made to
functions which exist in pg_catalog, switch to using the GRANT system
to manage access to those functions.

This means removing 'if (!superuser()) ereport()' checks from the
functions themselves and then REVOKEing EXECUTE right from 'public' for
these functions in system_views.sql.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Simon Riggs cac0e36682 Revert bf08f2292f
Remove recent changes to logging XLOG_RUNNING_XACTS by request.
2016-04-06 14:03:46 +01:00
Simon Riggs bf08f2292f Avoid archiving XLOG_RUNNING_XACTS on idle server
If archive_timeout > 0 we should avoid logging XLOG_RUNNING_XACTS if idle.

Bug 13685 reported by Laurence Rowe, investigated in detail by Michael Paquier,
though this is not his proposed fix.
20151016203031.3019.72930@wrigleys.postgresql.org

Simple non-invasive patch to allow later backpatch to 9.4 and 9.5
2016-04-04 07:18:05 +01:00
Peter Eisentraut b555ed8102 Merge wal_level "archive" and "hot_standby" into new name "replica"
The distinction between "archive" and "hot_standby" existed only because
at the time "hot_standby" was added, there was some uncertainty about
stability.  This is now a long time ago.  We would like to move forward
with simplifying the replication configuration, but this distinction is
in the way, because a primary server cannot tell (without asking a
standby or predicting the future) which one of these would be the
appropriate level.

Pick a new name for the combined setting to make it clearer that it
covers all (non-logical) backup and replication uses.  The old values
are still accepted but are converted internally.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: David Steele <david@pgmasters.net>
2016-03-18 23:56:03 +01:00
Robert Haas f2b74b01d4 Another comment update.
I thought this was in my last commit, but I goofed.
2016-03-16 14:28:25 -04:00
Robert Haas bc55cc0b6a Fix problems in commit c16dc1aca5.
Vinayak Pokale provided a patch for a copy-and-paste error in a
comment.  I noticed that I'd use the word "automatically" nearby where
I meant to talk about things being "atomic".  Rahila Syed spotted a
misplaced counter update.  Fix all that stuff.
2016-03-16 13:54:04 -04:00
Robert Haas c16dc1aca5 Add simple VACUUM progress reporting.
There's a lot more that could be done here yet - in particular, this
reports only very coarse-grained information about the index vacuuming
phase - but even as it stands, the new pg_stat_progress_vacuum can
tell you quite a bit about what a long-running vacuum is actually
doing.

Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
and Rahila Syed.
2016-03-15 13:32:56 -04:00
Andres Freund 428b1d6b29 Allow to trigger kernel writeback after a configurable number of writes.
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches.  When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively.  This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always generally want this
behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache.  Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.

While desirable and likely possible this patch does not contain an
implementation for windows.

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.

A later patch will add checkpoint sorting - after that flushes from the
ckeckpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences.  This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:04:34 -08:00
Simon Riggs 37c54863cf Rework wait for AccessExclusiveLocks on Hot Standby
Earlier version committed in 9.0 caused spurious waits in some cases.
New infrastructure for lock waits in 9.3 used to correct and improve this.

Jeff Janes based upon a proposal by Simon Riggs, who also reviewed
Additional review comments from Amit Kapila
2016-03-10 19:26:24 +00:00
Robert Haas 53be0b1add Provide much better wait information in pg_stat_activity.
When a process is waiting for a heavyweight lock, we will now indicate
the type of heavyweight lock for which it is waiting.  Also, you can
now see when a process is waiting for a lightweight lock - in which
case we will indicate the individual lock name or the tranche, as
appropriate - or for a buffer pin.

Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
discussion and suggestions by many others, including Alexander
Korotkov, Vladimir Borodin, and many others.
2016-03-10 12:44:09 -05:00
Robert Haas 090b287fc5 Code review for b6fb6471f6.
Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev.
Patch by Amit Langote.
2016-03-10 06:07:57 -05:00
Andres Freund 1d4a0ab19a Avoid unlikely data-loss scenarios due to rename() without fsync.
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.

Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability.  The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash in an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.

Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
Robert Haas b6fb6471f6 Add a generic command progress reporting facility.
Using this facility, any utility command can report the target relation
upon which it is operating, if there is one, and up to 10 64-bit
counters; the intent of this is that users should be able to figure out
what a utility command is doing without having to resort to ugly hacks
like attaching strace to a backend.

As a demonstration, this adds very crude reporting to lazy vacuum; we
just report the target relation and nothing else.  A forthcoming patch
will make VACUUM report a bunch of additional data that will make this
much more interesting.  But this gets the basic framework in place.

Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
and Masanori Oyama.
2016-03-09 12:08:58 -05:00
Andres Freund 7975c5e0a9 Allow the WAL writer to flush WAL at a reduced rate.
Commit 4de82f7d7 increased the WAL flush rate, mainly to increase the
likelihood that hint bits can be set quickly. More quickly set hint bits
can reduce contention around the clog et al.  But unfortunately the
increased flush rate can have a significant negative performance impact,
I have measured up to a factor of ~4.  The reason for this slowdown is
that if there are independent writes to the underlying devices, for
example because shared buffers is a lot smaller than the hot data set,
or because a checkpoint is ongoing, the fdatasync() calls force cache
flushes to be emitted to the storage.

This is achieved by flushing WAL only if the last flush was longer than
wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC)
unflushed blocks are pending. Based on some tests the default for
wal_writer_delay is 1MB, which seems to work well both on SSD and
rotational media.

To avoid negative performance impact due to 4de82f7d7 an earlier
commit (db76b1e) made SetHintBits() more likely to succeed; preventing
performance regressions in the pgbench tests I performed.

Discussion: 20160118163908.GW10941@awork2.anarazel.de
2016-02-16 00:56:34 +01:00
Tom Lane c5e9b77127 Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier."
This reverts commit 3971f64843 and a
couple of followon debugging commits; I think we've learned what we can
from them.
2016-02-10 16:01:04 -05:00
Tom Lane 41d505a7ff Add still more chattiness in server shutdown.
Further investigation says that there may be some slow operations after
we've finished ShutdownXLOG(), so add some more log messages to try to
isolate that.  This is all temporary code too.
2016-02-09 19:36:30 -05:00
Tom Lane 3971f64843 Temporarily make pg_ctl and server shutdown a whole lot chattier.
This is a quick hack, due to be reverted when its purpose has been served,
to try to gather information about why some of the buildfarm critters
regularly fail with "postmaster does not shut down" complaints.  Maybe they
are just really overloaded, but maybe something else is going on.  Hence,
instrument pg_ctl to print the current time when it starts waiting for
postmaster shutdown and when it gives up, and add a lot of logging of the
current time in the server's checkpoint and shutdown code paths.

No attempt has been made to make this pretty.  I'm not even totally sure
if it will build on Windows, but we'll soon find out.
2016-02-08 18:43:11 -05:00
Robert Haas c1772ad922 Change the way that LWLocks for extensions are allocated.
The previous RequestAddinLWLocks() method had several disadvantages.
First, the locks would be in the main tranche; we've recently decided
that it's useful for LWLocks used for separate purposes to have
separate tranche IDs.  Second, there wasn't any correlation between
what code called RequestAddinLWLocks() and what code called
LWLockAssign(); when multiple modules are in use, it could become
quite difficult to troubleshoot problems where LWLockAssign() ran out
of locks.  To fix, create a concept of named LWLock tranches which
can be used either by extension or by core code.

Amit Kapila and Robert Haas
2016-02-04 16:43:04 -05:00
Peter Eisentraut 7d17e683fc Add support for systemd service notifications
Insert sd_notify() calls at server start and stop for integration with
systemd.  This allows the use of systemd service units of type "notify",
which greatly simplifies the systemd configuration.

Reviewed-by: Pavel Stěhule <pavel.stehule@gmail.com>
2016-02-02 21:04:29 -05:00
Magnus Hagander e51ab85cd9 Fix typos in comments
Author: Michael Paquier
2016-02-01 11:43:48 +01:00
Tom Lane b8682a7155 Fix startup so that log prefix %h works for the log_connections message.
We entirely randomly chose to initialize port->remote_host just after
printing the log_connections message, when we could perfectly well do it
just before, allowing %h and %r to work for that message.  Per gripe from
Artem Tomyuk.
2016-01-26 15:38:33 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Peter Eisentraut 5db837d3f2 Message improvements 2015-11-16 21:39:23 -05:00
Robert Haas 64b2e7ad91 Pass extra data to bgworkers, and use this to fix parallel contexts.
Up until now, the total amount of data that could be passed to a
background worker at startup was one datum, which can be a small as
4 bytes on some systems.  That's enough to pass a dsm_handle or an
array index, but not much else.  Add a bgw_extra flag to the
BackgroundWorker struct, allowing up to 128 bytes to be passed to
a new worker on any platform.

Use this to fix a problem I recently discovered with the parallel
context machinery added in 9.5: the master assigns each worker an
array index, and each worker subsequently assigns itself an array
index, and there's nothing to guarantee that the two sets of indexes
match, leading to chaos.

Normally, I would not back-patch the change to add bgw_extra, since it
is basically a feature addition.  However, since 9.5 is still in beta
and there seems to be no other sensible way to repair the broken
parallel context machinery, back-patch to 9.5.  Existing background
worker code can ignore the bgw_extra field without a problem, but
might need to be recompiled since the structure size has changed.

Report and patch by me.  Review by Amit Kapila.
2015-11-05 12:13:56 -05:00
Robert Haas c6baec92fc Fix typo in bgworker.c 2015-10-30 10:35:33 +01:00
Tom Lane 869f693a36 On Windows, ensure shared memory handle gets closed if not being used.
Postmaster child processes that aren't supposed to be attached to shared
memory were not bothering to close the shared memory mapping handle they
inherit from the postmaster process.  That's mostly harmless, since the
handle vanishes anyway when the child process exits -- but the syslogger
process, if used, doesn't get killed and restarted during recovery from a
backend crash.  That meant that Windows doesn't see the shared memory
mapping as becoming free, so it doesn't delete it and the postmaster is
unable to create a new one, resulting in failure to recover from crashes
whenever logging_collector is turned on.

Per report from Dmitry Vasilyev.  It's a bit astonishing that we'd not
figured this out long ago, since it's been broken from the very beginnings
of out native Windows support; probably some previously-unexplained trouble
reports trace to this.

A secondary problem is that on Cygwin (perhaps only in older versions?),
exec() may not detach from the shared memory segment after all, in which
case these child processes did remain attached to shared memory, posing
the risk of an unexpected shared memory clobber if they went off the rails
somehow.  That may be a long-gone bug, but we can deal with it now if it's
still live, by detaching within the infrastructure introduced here to deal
with closing the handle.

Back-patch to all supported branches.

Tom Lane and Amit Kapila
2015-10-13 11:21:33 -04:00
Robert Haas db0f6cad48 Remove set_latch_on_sigusr1 flag.
This flag has proven to be a recipe for bugs, and it doesn't seem like
it can really buy anything in terms of performance.  So let's just
*always* set the process latch when we receive SIGUSR1 instead of
trying to do it only when needed.

Per my recent proposal on pgsql-hackers.
2015-10-09 14:31:04 -04:00
Tom Lane 94f5246ce1 Fix uninitialized-variable bug.
For some reason, neither of the compilers I usually use noticed the
uninitialized-variable problem I introduced in commit 7e2a18a916.
That's hardly a good enough excuse though.  Committing with brown paper bag
on head.

In addition to putting the operations in the right order, move the
declaration of "now" inside the loop; there's no need for it to be
outside, and that does wake up older gcc enough to notice any similar
future problem.

Back-patch to 9.4; earlier versions lack the time-to-SIGKILL stanza
so there's no bug.
2015-10-09 09:12:03 -05:00
Tom Lane 7e2a18a916 Perform an immediate shutdown if the postmaster.pid file is removed.
The postmaster now checks every minute or so (worst case, at most two
minutes) that postmaster.pid is still there and still contains its own PID.
If not, it performs an immediate shutdown, as though it had received
SIGQUIT.

The original goal behind this change was to ensure that failed buildfarm
runs would get fully cleaned up, even if the test scripts had left a
postmaster running, which is not an infrequent occurrence.  When the
buildfarm script removes a test postmaster's $PGDATA directory, its next
check on postmaster.pid will fail and cause it to exit.  Previously, manual
intervention was often needed to get rid of such orphaned postmasters,
since they'd block new test postmasters from obtaining the expected socket
address.

However, by checking postmaster.pid and not something else, we can provide
additional robustness: manual removal of postmaster.pid is a frequent DBA
mistake, and now we can at least limit the damage that will ensue if a new
postmaster is started while the old one is still alive.

Back-patch to all supported branches, since we won't get the desired
improvement in buildfarm reliability otherwise.
2015-10-06 17:15:52 -04:00
Robert Haas 8f6bb851bd Remove more volatile qualifiers.
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.
This continues work begun in df4077cda2
and 6ba4ecbf47.

Thomas Munro and Michael Paquier
2015-10-06 15:45:02 -04:00
Fujii Masao 96f6a0cb41 Remove files signaling a standby promotion request at postmaster startup
This commit makes postmaster forcibly remove the files signaling
a standby promotion request. Otherwise, the existence of those files
can trigger a promotion too early, whether a user wants that or not.

This removal of files is usually unnecessary because they can exist
only during a few moments during a standby promotion. However
there is a race condition: if pg_ctl promote is executed and creates
the files during a promotion, the files can stay around even after
the server is brought up to new master. Then, if new standby starts
by using the backup taken from that master, the files can exist
at the server startup and should be removed in order to avoid
an unexpected promotion.

Back-patch to 9.1 where promote signal file was introduced.

Problem reported by Feike Steenbergen.
Original patch by Michael Paquier, modified by me.

Discussion: 20150528100705.4686.91426@wrigleys.postgresql.org
2015-09-09 22:51:44 +09:00
Robert Haas 8a02b3d732 Allow notifications to bgworkers without database connections.
Previously, if one background worker registered another background
worker and set bgw_notify_pid while for the second background worker,
it would not receive notifications from the postmaster unless, at the
time the "parent" was registered, BGWORKER_BACKEND_DATABASE_CONNECTION
was set.

To fix, instead instead of including only those background workers that
requested database connections in the postmater's BackendList, include
them all.  There doesn't seem to be any reason not do this, and indeed
it removes a significant amount of duplicated code.  The other option
is to make PostmasterMarkPIDForWorkerNotify look at BackgroundWorkerList
in addition to BackendList, but that adds more code duplication instead
of getting rid of it.

Patch by me.  Review and testing by Ashutosh Bapat.
2015-09-01 15:30:19 -04:00
Tom Lane d73d14c271 Fix incorrect order of lock file removal and failure to close() sockets.
Commit c9b0cbe98b accidentally broke the
order of operations during postmaster shutdown: it resulted in removing
the per-socket lockfiles after, not before, postmaster.pid.  This creates
a race-condition hazard for a new postmaster that's started immediately
after observing that postmaster.pid has disappeared; if it sees the
socket lockfile still present, it will quite properly refuse to start.
This error appears to be the explanation for at least some of the
intermittent buildfarm failures we've seen in the pg_upgrade test.

Another problem, which has been there all along, is that the postmaster
has never bothered to close() its listen sockets, but has just allowed them
to close at process death.  This creates a different race condition for an
incoming postmaster: it might be unable to bind to the desired listen
address because the old postmaster is still incumbent.  This might explain
some odd failures we've seen in the past, too.  (Note: this is not related
to the fact that individual backends don't close their client communication
sockets.  That behavior is intentional and is not changed by this patch.)

Fix by adding an on_proc_exit function that closes the postmaster's ports
explicitly, and (in 9.3 and up) reshuffling the responsibility for where
to unlink the Unix socket files.  Lock file unlinking can stay where it
is, but teach it to unlink the lock files in reverse order of creation.
2015-08-02 14:55:03 -04:00
Tom Lane 4c8f8ffaca Further code review for pg_stat_ssl patch.
Fix additional bogosity in commit 9029f4b374.  Include the
BackendSslStatusBuffer in the BackendStatusShmemSize calculation,
avoid ugly and error-prone casts to char* and back, put related
code stanzas into a consistent order (and fix a couple of previous
instances of that sin).  All cosmetic except for the size oversight.
2015-07-27 16:29:14 -04:00
Tom Lane 7d791ed49b Fix pointer-arithmetic thinko in pg_stat_ssl patch.
Nasty memory-stomp bug in commit 9029f4b374.  It's not apparent how
this survived even cursory testing :-(.  Per report from Peter Holzer.
2015-07-27 15:58:46 -04:00
Tom Lane 45811be94e Fix postmaster's handling of a startup-process crash.
Ordinarily, a failure (unexpected exit status) of the startup subprocess
should be considered fatal, so the postmaster should just close up shop
and quit.  However, if we sent the startup process a SIGQUIT or SIGKILL
signal, the failure is hardly "unexpected", and we should attempt restart;
this is necessary for recovery from ordinary backend crashes in hot-standby
scenarios.  I attempted to implement the latter rule with a two-line patch
in commit 442231d7f7, but it now emerges that
that patch was a few bricks shy of a load: it failed to distinguish the
case of a signaled startup process from the case where the new startup
process crashes before reaching database consistency.  That resulted in
infinitely respawning a new startup process only to have it crash again.

To handle this properly, we really must track whether we have sent the
*current* startup process a kill signal.  Rather than add yet another
ad-hoc boolean to the postmaster's state, I chose to unify this with the
existing RecoveryError flag into an enum tracking the startup process's
state.  That seems more consistent with the postmaster's general state
machine design.

Back-patch to 9.0, like the previous patch.
2015-07-09 13:22:22 -04:00
Heikki Linnakangas d661532e27 Also trigger restartpoints based on max_wal_size on standby.
When archive recovery and restartpoints were initially introduced,
checkpoint_segments was ignored on the grounds that the files restored from
archive don't consume any space in the recovery server. That was changed in
later releases, but even then it was arguably a feature rather than a bug,
as performing restartpoints as often as checkpoints during normal operation
might be excessive, but you might nevertheless not want to waste a lot of
space for pre-allocated WAL by setting checkpoint_segments to a high value.
But now that we have separate min_wal_size and max_wal_size settings, you
can bound WAL usage with max_wal_size, and still avoid consuming excessive
space usage by setting min_wal_size to a lower value, so that argument is
moot.

There are still some issues with actually limiting the space usage to
max_wal_size: restartpoints in recovery can only start after seeing the
checkpoint record, while a checkpoint starts flushing buffers as soon as
the redo-pointer is set. Restartpoint is paced to happen at the same
leisurily speed, determined by checkpoint_completion_target, as checkpoints,
but because they are started later, max_wal_size can be exceeded by upto
one checkpoint cycle's worth of WAL, depending on
checkpoint_completion_target. But that seems better than not trying at all,
and max_wal_size is a soft limit anyway.

The documentation already claimed that max_wal_size is obeyed in recovery,
so this just fixes the behaviour to match the docs. However, add some
weasel-words there to mention that max_wal_size may well be exceeded by
some amount in recovery.
2015-06-29 00:09:10 +03:00
Robert Haas 91118f1a59 Reduce log level for background worker events from LOG to DEBUG1.
Per discussion, LOG is just too chatty for something that will happen
as routinely as this.

Pavel Stehule
2015-06-26 11:23:32 -04:00
Alvaro Herrera 3c400a3f2b Fix thinko in comment (launcher -> worker) 2015-06-20 11:45:59 -03:00
Tom Lane 48913db887 In immediate shutdown, postmaster should not exit till children are gone.
This adjusts commit 82233ce7ea so that the
postmaster does not exit until all its child processes have exited, even
if the 5-second timeout elapses and we have to send SIGKILL.  There is no
great value in having the postmaster process quit sooner, and doing so can
mislead onlookers into thinking that the cluster is fully terminated when
actually some child processes still survive.

This effect might explain recent test failures on buildfarm member hamster,
wherein we failed to restart a cluster just after shutting it down with
"pg_ctl stop -m immediate".

I also did a bit of code review/beautification, including fixing a faulty
use of the Max() macro on a volatile expression.

Back-patch to 9.4.  In older branches, the postmaster never waited for
children to exit during immediate shutdowns, and changing that would be
too much of a behavioral change.
2015-06-19 14:23:39 -04:00
Alvaro Herrera da1a9d0f5b Clamp autovacuum launcher sleep time to 5 minutes
This avoids the problem that it might go to sleep for an unreasonable
amount of time in unusual conditions like the server clock moving
backwards an unreasonable amount of time.

(Simply moving the server clock forward again doesn't solve the problem
unless you wake up the autovacuum launcher manually, say by sending it
SIGHUP).

Per trouble report from Prakash Itnal in
https://www.postgresql.org/message-id/CAHC5u79-UqbapAABH2t4Rh2eYdyge0Zid-X=Xz-ZWZCBK42S0Q@mail.gmail.com

Analyzed independently by Haribabu Kommi and Tom Lane.
2015-06-19 12:44:36 -03:00
Fujii Masao b5fe62038f Make postmaster restart archiver soon after it dies, even during recovery.
After the archiver dies, postmaster tries to start a new one immediately.
But previously this could happen only while server was running normally
even though archiving was enabled always (i.e., archive_mode was set to
always). So the archiver running during recovery could not restart soon
after it died. This is an oversight in commit ffd3774.

This commit changes reaper(), postmaster's signal handler to cleanup
after a child process dies, so that it tries to a new archiver even during
recovery if necessary.

Patch by me. Review by Alvaro Herrera.
2015-06-12 23:11:51 +09:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Noah Misch b0ce385032 Prevent a double free by not reentering be_tls_close().
Reentering this function with the right timing caused a double free,
typically crashing the backend.  By synchronizing a disconnection with
the authentication timeout, an unauthenticated attacker could achieve
this somewhat consistently.  Call be_tls_close() solely from within
proc_exit_prepare().  Back-patch to 9.0 (all supported versions).

Benkocs Norbert Attila

Security: CVE-2015-3165
2015-05-18 10:02:31 -04:00
Heikki Linnakangas 4df1328950 Put back stats-collector restarting code, removed accidentally.
Removed that code snippet accidentally in the archive_mode='always' patch.

Also, use varname-tags for archive_command in the docs.

Fujii Masao
2015-05-18 10:20:30 +03:00
Heikki Linnakangas ffd37740ee Add archive_mode='always' option.
In 'always' mode, the standby independently archives all files it receives
from the primary.

Original patch by Fujii Masao, docs and review by me.
2015-05-15 18:55:24 +03:00
Robert Haas 53bb309d2d Teach autovacuum about multixact member wraparound.
The logic introduced in commit b69bf30b9b
and repaired in commits 669c7d20e6 and
7be47c56af helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.

To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases.  This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.

This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off.  We'll need to fix
that via a separate patch.

Thomas Munro and Robert Haas
2015-05-08 12:53:00 -04:00
Robert Haas 924bcf4f16 Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things.  First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers.  Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes.  Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
2015-04-30 15:02:14 -04:00
Andres Freund 6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Magnus Hagander 9029f4b374 Add system view pg_stat_ssl
This view shows information about all connections, such as if the
connection is using SSL, which cipher is used, and which client
certificate (if any) is used.

Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier
2015-04-12 19:07:46 +02:00
Alvaro Herrera 5df64f298d Fix autovacuum launcher shutdown sequence
It was previously possible to have the launcher re-execute its main loop
before shutting down if some other signal was received or an error
occurred after getting SIGTERM, as reported by Qingqing Zhou.

While investigating, Tom Lane further noticed that if autovacuum had
been disabled in the config file, it would misbehave by trying to start
a new worker instead of bailing out immediately -- it would consider
itself as invoked in emergency mode.

Fix both problems by checking the shutdown flag in a few more places.
These problems have existed since autovacuum was introduced, so
backpatch all the way back.
2015-04-08 13:19:49 -03:00
Alvaro Herrera 4ff695b17d Add log_min_autovacuum_duration per-table option
This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.
2015-04-03 11:55:50 -03:00
Alvaro Herrera a75fb9b335 Have autovacuum workers listen to SIGHUP, too
They have historically ignored it, but it's been said to be useful at
times to change their settings mid-flight.

Author: Michael Paquier
2015-04-03 11:52:55 -03:00
Robert Haas b3a5e76e12 After a crash, don't restart workers with BGW_NEVER_RESTART.
Amit Khandekar
2015-04-02 14:38:06 -04:00
Alvaro Herrera 00ee6c7672 autovacuum: Fix polarity of "wraparound" variable
Commit 0d83138974 inadvertently reversed the meaning of the
wraparound variable.  This causes vacuums which are not required for
wraparound to wait for locks to be acquired, and what is worse, it
allows wraparound vacuums to skip locked pages.

Bug reported by Jeff Janes in
http://www.postgresql.org/message-id/CAMkU=1xmTEiaY=5oMHsSQo5vd9V1Ze4kNLL0qN2eH0P_GXOaYw@mail.gmail.com
Analysis and patch by Kyotaro HORIGUCHI
2015-04-02 13:34:50 -03:00
Tom Lane 785941cdc3 Tweak __attribute__-wrapping macros for better pgindent results.
This improves on commit bbfd7edae5 by
making two simple changes:

* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed().  This reduces
pgindent's tendency to misformat declarations involving them.

* attributes are now always attached to function declarations, not
definitions.  Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway.  (It does little good to put a
noreturn annotation where callers can't see it.)  In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.

I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
2015-03-26 14:03:25 -04:00
Robert Haas bf740ce9e5 Fix status reporting for terminated bgworkers that were never started.
Previously, GetBackgroundWorkerPid() would return BGWH_NOT_YET_STARTED
if the slot used for the worker registration had not been reused by
unrelated activity, and BGWH_STOPPED if it had.  Either way, a process
that had requested notification when the state of one of its
background workers changed did not receive such notifications.  Fix
things so that GetBackgroundWorkerPid() always returns BGWH_STOPPED in
this situation, so that we do not erroneously give waiters the
impression that the worker will eventually be started; and send
notifications just as we would if the process terminated after having
been started, so that it's possible to wait for the postmaster to
process a worker termination request without polling.

Discovered by Amit Kapila during testing of parallel sequential scan.
Analysis and fix by me.  Back-patch to 9.4; there may not be anyone
relying on this interface yet, but if anyone is, the new behavior is a
clear improvement.
2015-03-19 11:04:09 -04:00
Alvaro Herrera 0d83138974 Rationalize vacuuming options and parameters
We were involving the parser too much in setting up initial vacuuming
parameters.  This patch moves that responsibility elsewhere to simplify
code, and also to make future additions easier.  To do this, create a
new struct VacuumParams which is filled just prior to vacuum execution,
instead of at parse time; for user-invoked vacuuming this is set up in a
new function ExecVacuum, while autovacuum sets it up by itself.

While at it, add a new member VACOPT_SKIPTOAST to enum VacuumOption,
only set by autovacuum, which is used to disable vacuuming of the toast
table instead of the old do_toast parameter; this relieves the argument
list of vacuum() and some callees a bit.  This partially makes up for
having added more arguments in an effort to avoid having autovacuum from
constructing a VacuumStmt parse node.

Author: Michael Paquier. Some tweaks by Álvaro
Reviewed by: Robert Haas, Stephen Frost, Álvaro Herrera
2015-03-18 11:52:33 -03:00
Andres Freund bbfd7edae5 Add macros wrapping all usage of gcc's __attribute__.
Until now __attribute__() was defined to be empty for all compilers but
gcc. That's problematic because it prevents using it in other compilers;
which is necessary e.g. for atomics portability.  It's also just
generally dubious to do so in a header as widely included as c.h.

Instead add pg_attribute_format_arg, pg_attribute_printf,
pg_attribute_noreturn macros which are implemented in the compilers that
understand them. Also add pg_attribute_noreturn and pg_attribute_packed,
but don't provide fallbacks, since they can affect functionality.

This means that external code that, possibly unwittingly, relied on
__attribute__ defined to be empty on !gcc compilers may now run into
warnings or errors on those compilers. But there shouldn't be many
occurances of that and it's hard to work around...

Discussion: 54B58BA3.8040302@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.
2015-03-11 14:30:01 +01:00
Heikki Linnakangas 88e9823026 Replace checkpoint_segments with min_wal_size and max_wal_size.
Instead of having a single knob (checkpoint_segments) that both triggers
checkpoints, and determines how many checkpoints to recycle, they are now
separate concerns. There is still an internal variable called
CheckpointSegments, which triggers checkpoints. But it no longer determines
how many segments to recycle at a checkpoint. That is now auto-tuned by
keeping a moving average of the distance between checkpoints (in bytes),
and trying to keep that many segments in reserve. The advantage of this is
that you can set max_wal_size very high, but the system won't actually
consume that much space if there isn't any need for it. The min_wal_size
sets a floor for that; you can effectively disable the auto-tuning behavior
by setting min_wal_size equal to max_wal_size.

The max_wal_size setting is now the actual target size of WAL at which a
new checkpoint is triggered, instead of the distance between checkpoints.
Previously, you could calculate the actual WAL usage with the formula
"(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this
patch, you set the desired WAL usage with max_wal_size, and the system
calculates the appropriate CheckpointSegments with the reverse of that
formula. That's a lot more intuitive for administrators to set.

Reviewed by Amit Kapila and Venkata Balaji N.
2015-02-23 18:53:02 +02:00
Tom Lane 33a3b03d63 Use FLEXIBLE_ARRAY_MEMBER in some more places.
Fix a batch of structs that are only visible within individual .c files.

Michael Paquier
2015-02-20 17:32:01 -05:00
Alvaro Herrera d42358efb1 Have TRUNCATE update pgstat tuple counters
This works by keeping a per-subtransaction record of the ins/upd/del
counters before the truncate, and then resetting them; this record is
useful to return to the previous state in case the truncate is rolled
back, either in a subtransaction or whole transaction.  The state is
propagated upwards as subtransactions commit.

When the per-table data is sent to the stats collector, a flag indicates
to reset the live/dead counters to zero as well.

Catalog version bumped due to the change in pgstat format.

Author: Alexander Shulgin
Discussion: 1007.1207238291@sss.pgh.pa.us
Discussion: 548F7D38.2000401@BlueTreble.com
Reviewed-by: Álvaro Herrera, Jim Nasby
2015-02-20 12:10:01 -03:00
Tom Lane 09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
Andres Freund 4f85fde8eb Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers...  The required code was
fragile and verbose, and is likely to contain bugs.

That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.

Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.

As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.

We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.

This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort.  In addition to allowing getting
rid of ImmediateInterruptOK, that is.

Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
Robert Haas 5d2f957f3f Add new function BackgroundWorkerInitializeConnectionByOid.
Sometimes it's useful for a background worker to be able to initialize
its database connection by OID rather than by name, so provide a way
to do that.
2015-02-02 16:23:59 -05:00
Heikki Linnakangas 2b3a8b20c2 Be more careful to not lose sync in the FE/BE protocol.
If any error occurred while we were in the middle of reading a protocol
message from the client, we could lose sync, and incorrectly try to
interpret a part of another message as a new protocol message. That will
usually lead to an "invalid frontend message" error that terminates the
connection. However, this is a security issue because an attacker might
be able to deliberately cause an error, inject a Query message in what's
supposed to be just user data, and have the server execute it.

We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
operations that could ereport(ERROR) in the middle of processing a message,
but a query cancel interrupt or statement timeout could nevertheless cause
it to happen. Also, the V2 fastpath and COPY handling were not so careful.
It's very difficult to recover in the V2 COPY protocol, so we will just
terminate the connection on error. In practice, that's what happened
previously anyway, as we lost protocol sync.

To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
whenever we're in the middle of reading a message. When it's set, we cannot
safely ERROR out and continue running, because we might've read only part
of a message. PqCommReadingMsg acts somewhat similarly to critical sections
in that if an error occurs while it's set, the error handler will force the
connection to be terminated, as if the error was FATAL. It's not
implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
to PANIC in critical sections, because we want to be able to use
PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
advantage of that to prevent an OOM error from terminating the connection.

To prevent unnecessary connection terminations, add a holdoff mechanism
similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel
interrupts, but still allow die interrupts. The rules on which interrupts
are processed when are now a bit more complicated, so refactor
ProcessInterrupts() and the calls to it in signal handlers so that the
signal handlers always call it if ImmediateInterruptOK is set, and
ProcessInterrupts() can decide to not do anything if the other conditions
are not met.

Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
Backpatch to all supported versions.

Security: CVE-2015-0244
2015-02-02 17:09:53 +02:00
Tom Lane 586dd5d6a5 Replace a bunch more uses of strncpy() with safer coding.
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about.  So just use memcpy() in
these cases.

In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).

I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution).  There are also a
few such calls in ecpg that could possibly do with more analysis.

AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior.  These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
2015-01-24 13:05:42 -05:00
Tom Lane 75b48e1fff Adjust "pgstat wait timeout" message to be a translatable LOG message.
Per discussion, change the log level of this message to be LOG not WARNING.
The main point of this change is to avoid causing buildfarm run failures
when the stats collector is exceptionally slow to respond, which it not
infrequently is on some of the smaller/slower buildfarm members.

This change does lose notice to an interactive user when his stats query
is looking at out-of-date stats, but the majority opinion (not necessarily
that of yours truly) is that WARNING messages would probably not get
noticed anyway on heavily loaded production systems.  A LOG message at
least ensures that the problem is recorded somewhere where bulk auditing
for the issue is possible.

Also, instead of an untranslated "pgstat wait timeout" message, provide
a translatable and hopefully more understandable message "using stale
statistics instead of current ones because stats collector is not
responding".  The original text was written hastily under the assumption
that it would never really happen in practice, which we now know to be
unduly optimistic.

Back-patch to all active branches, since we've seen the buildfarm issue
in all branches.
2015-01-19 23:01:33 -05:00
Andres Freund 59f71a0d0b Add a default local latch for use in signal handlers.
To do so, move InitializeLatchSupport() into the new common process
initialization functions, and add a new global variable MyLatch.

MyLatch is usable as soon InitPostmasterChild() has been called
(i.e. very early during startup). Initially it points to a process
local latch that exists in all processes. InitProcess/InitAuxiliaryProcess
then replaces that local latch with PGPROC->procLatch. During shutdown
the reverse happens.

This is primarily advantageous for two reasons: For one it simplifies
dealing with the shared process latch, especially in signal handlers,
because instead of having to check for MyProc, MyLatch can be used
unconditionally. For another, a later patch that makes FEs/BE
communication use latches, now can rely on the existence of a latch,
even before having gone through InitProcess.

Discussion: 20140927191243.GD5423@alap3.anarazel.de
2015-01-14 18:45:22 +01:00
Andres Freund 31c453165b Commonalize process startup code.
Move common code, that was duplicated in every postmaster child/every
standalone process, into two functions in miscinit.c.  Not only does
that already result in a fair amount of net code reduction but it also
makes it much easier to remove more duplication in the future. The
prime motivation wasn't code deduplication though, but easier addition
of new common code.
2015-01-14 00:33:14 +01:00
Andres Freund 2be82dcf17 Make logging_collector=on work with non-windows EXEC_BACKEND again.
Commit b94ce6e80 reordered postmaster's startup sequence so that the
tempfile directory is only cleaned up after all the necessary state
for pg_ctl is collected.  Unfortunately the chosen location is after
the syslogger has been started; which normally is fine, except for
!WIN32 EXEC_BACKEND builds, which pass information to children via
files in the temp directory.

Move the call to RemovePgTempFiles() to just before the syslogger has
started. That's the first child we fork.

Luckily EXEC_BACKEND is pretty much only used by endusers on windows,
which has a separate method to pass information to children. That
means the real world impact of this bug is very small.

Discussion: 20150113182344.GF12272@alap3.anarazel.de

Backpatch to 9.1, just as the previous commit was.
2015-01-14 00:14:53 +01:00
Noah Misch 2048e5b881 On Darwin, refuse postmaster startup when multithreaded.
The previous commit introduced its report at LOG level to avoid
surprises at minor release upgrade time.  Compel users deploying the
next major release to also deploy the reported workaround.
2015-01-07 22:46:59 -05:00
Noah Misch 894459e59f On Darwin, detect and report a multithreaded postmaster.
Darwin --enable-nls builds use a substitute setlocale() that may start a
thread.  Buildfarm member orangutan experienced BackendList corruption
on account of different postmaster threads executing signal handlers
simultaneously.  Furthermore, a multithreaded postmaster risks undefined
behavior from sigprocmask() and fork().  Emit LOG messages about the
problem and its workaround.  Back-patch to 9.0 (all supported versions).
2015-01-07 22:35:44 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Andres Freund d72731a704 Lockless StrategyGetBuffer clock sweep hot path.
StrategyGetBuffer() has proven to be a bottleneck in a number of
buffer acquisition heavy workloads. To some degree this has already
been alleviated by 5d7962c6, but it still can be quite a heavy
bottleneck.  The problem is that in unfortunate usage patterns a
single StrategyGetBuffer() call will have to look at a large number of
buffers - in turn making it likely that the process will be put to
sleep while still holding the spinlock.

Replace most of the usage of the buffer_strategy_lock spinlock for the
clock sweep by a atomic nextVictimBuffer variable. That variable,
modulo NBuffers, is the current hand of the clock sweep. The buffer
clock-sweep then only needs to acquire the spinlock after a
wraparound. And even then only in the process that did the wrapping
around. That alleviates nearly all the contention on the relevant
spinlock, although significant contention on the cacheline can still
exist.

Reviewed-By: Robert Haas and Amit Kapila

Discussion: 20141010160020.GG6670@alap3.anarazel.de,
    20141027133218.GA2639@awork2.anarazel.de
2014-12-25 18:26:25 +01:00
Tom Lane 4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Fujii Masao 38628db8d8 Add memory barriers for PgBackendStatus.st_changecount protocol.
st_changecount protocol needs the memory barriers to ensure that
the apparent order of execution is as it desires. Otherwise,
for example, the CPU might rearrange the code so that st_changecount
is incremented twice before the modification on a machine with
weak memory ordering. This surprising result can lead to bugs.

This commit introduces the macros to load and store st_changecount
with the memory barriers. These are called before and after
PgBackendStatus entries are modified or copied into private memory,
in order to prevent CPU from reordering PgBackendStatus access.

Per discussion on pgsql-hackers, we decided not to back-patch this
to 9.4 or before until we get an actual bug report about this.

Patch by me. Review by Robert Haas.
2014-12-18 23:07:51 +09:00
Tom Lane 06d5803ffa Fix assorted confusion between Oid and int32.
In passing, also make some debugging elog's in pgstat.c a bit more
consistently worded.

Back-patch as far as applicable (9.3 or 9.4; none of these mistakes are
really old).

Mark Dilger identified and patched the type violations; the message
rewordings are mine.
2014-12-11 15:41:15 -05:00
Simon Riggs aedccb1f6f action_at_recovery_target recovery config option
action_at_recovery_target = pause | promote | shutdown

Petr Jelinek

Reviewed by Muhammad Asif Naeem, Fujji Masao and
Simon Riggs
2014-11-25 20:13:30 +00:00
Robert Haas d0410d6603 Eliminate one background-worker-related flag variable.
Teach sigusr1_handler() to use the same test for whether a worker
might need to be started as ServerLoop().  Aside from being perhaps
a bit simpler, this prevents a potentially-unbounded delay when
starting a background worker.  On some platforms, select() doesn't
return when interrupted by a signal, but is instead restarted,
including a reset of the timeout to the originally-requested value.
If signals arrive often enough, but no connection requests arrive,
sigusr1_handler() will be executed repeatedly, but the body of
ServerLoop() won't be reached.  This change ensures that, even in
that case, background workers will eventually get launched.

This is far from a perfect fix; really, we need select() to return
control to ServerLoop() after an interrupt, either via the self-pipe
trick or some other mechanism.  But that's going to require more
work and discussion, so let's do this for now to at least mitigate
the damage.

Per investigation of test_shm_mq failures on buildfarm member anole.
2014-10-04 21:25:41 -04:00
Alvaro Herrera 1021bd6a89 Don't balance vacuum cost delay when per-table settings are in effect
When there are cost-delay-related storage options set for a table,
trying to make that table participate in the autovacuum cost-limit
balancing algorithm produces undesirable results: instead of using the
configured values, the global values are always used,
as illustrated by Mark Kirkwood in
http://www.postgresql.org/message-id/52FACF15.8020507@catalyst.net.nz

Since the mechanism is already complicated, just disable it for those
cases rather than trying to make it cope.  There are undesirable
side-effects from this too, namely that the total I/O impact on the
system will be higher whenever such tables are vacuumed.  However, this
is seen as less harmful than slowing down vacuum, because that would
cause bloat to accumulate.  Anyway, in the new system it is possible to
tweak options to get the precise behavior one wants, whereas with the
previous system one was simply hosed.

This has been broken forever, so backpatch to all supported branches.
This might affect systems where cost_limit and cost_delay have been set
for individual tables.
2014-10-03 13:01:27 -03:00
Andres Freund a39e78b710 Block signals while computing the sleep time in postmaster's main loop.
DetermineSleepTime() was previously called without blocked
signals. That's not good, because it allows signal handlers to
interrupt its workings.

DetermineSleepTime() was added in 9.3 with the addition of background
workers (da07a1e856), where it only read from
BackgroundWorkerList.

Since 9.4, where dynamic background workers were added (7f7485a0cd),
the list is also manipulated in DetermineSleepTime(). That's bad
because the list now can be persistently corrupted if modified by both
a signal handler and DetermineSleepTime().

This was discovered during the investigation of hangs on buildfarm
member anole. It's unclear whether this bug is the source of these
hangs or not, but it's worth fixing either way. I have confirmed that
it can cause crashes.

It luckily looks like this only can cause problems when bgworkers are
actively used.

Discussion: 20140929193733.GB14400@awork2.anarazel.de

Backpatch to 9.3 where background workers were introduced.
2014-10-01 15:19:40 +02:00
Andres Freund 11a020eb6e Allow escaping of option values for options passed at connection start.
This is useful to allow to set GUCs to values that include spaces;
something that wasn't previously possible. The primary case motivating
this is the desire to set default_transaction_isolation to 'repeatable
read' on a per connection basis, but other usecases like seach_path do
also exist.

This introduces a slight backward incompatibility: Previously a \ in
an option value would have been passed on literally, now it'll be
taken as an escape.

The relevant mailing list discussion starts with
20140204125823.GJ12016@awork2.anarazel.de.
2014-08-28 13:59:29 +02:00
Heikki Linnakangas 680513ab79 Break out OpenSSL-specific code to separate files.
This refactoring is in preparation for adding support for other SSL
implementations, with no user-visible effects. There are now two #defines,
USE_OPENSSL which is defined when building with OpenSSL, and USE_SSL which
is defined when building with any SSL implementation. Currently, OpenSSL is
the only implementation so the two #defines go together, but USE_SSL is
supposed to be used for implementation-independent code.

The libpq SSL code is changed to use a custom BIO, which does all the raw
I/O, like we've been doing in the backend for a long time. That makes it
possible to use MSG_NOSIGNAL to block SIGPIPE when using SSL, which avoids
a couple of syscall for each send(). Probably doesn't make much performance
difference in practice - the SSL encryption is expensive enough to mask the
effect - but it was a natural result of this refactoring.

Based on a patch by Martijn van Oosterhout from 2006. Briefly reviewed by
Alvaro Herrera, Andreas Karlsson, Jeff Janes.
2014-08-11 11:54:19 +03:00
Tom Lane f51ead09df Avoid wholesale autovacuuming when autovacuum is nominally off.
When autovacuum is nominally off, we will still launch autovac workers
to vacuum tables that are at risk of XID wraparound.  But after we'd done
that, an autovac worker would proceed to autovacuum every table in the
targeted database, if they meet the usual thresholds for autovacuuming.
This is at best pretty unexpected; at worst it delays response to the
wraparound threat.  Fix it so that if autovacuum is nominally off, we
*only* do forced vacuums and not any other work.

Per gripe from Andrey Zhidenkov.  This has been like this all along,
so back-patch to all supported branches.
2014-07-30 14:41:35 -04:00
Robert Haas e280c630a8 Fix mishandling of background worker PGPROCs in EXEC_BACKEND builds.
InitProcess() relies on IsBackgroundWorker to decide whether the PGPROC
for a new backend should be taken from ProcGlobal's freeProcs or from
bgworkerFreeProcs.  In EXEC_BACKEND builds, InitProcess() is called
sooner than in non-EXEC_BACKEND builds, and IsBackgroundWorker wasn't
getting initialized soon enough.

Report by Noah Misch.  Diagnosis and fix by me.
2014-07-30 11:34:06 -04:00
Peter Eisentraut d38228fe40 Add missing serial commas
Also update one place where the wal_level "logical" was not added to an
error message.
2014-07-15 08:31:50 -04:00
Kevin Grittner ac46de56ea Smooth reporting of commit/rollback statistics.
If a connection committed or rolled back any transactions within a
PGSTAT_STAT_INTERVAL pacing interval without accessing any tables,
the reporting of those statistics would be held up until the
connection closed or until it ended a PGSTAT_STAT_INTERVAL interval
in which it had accessed a table.  This could result in under-
reporting of transactions for an extended period, followed by a
spike in reported transactions.

While this is arguably a bug, the impact is minimal, primarily
affecting, and being affected by, monitoring software.  It might
cause more confusion than benefit to change the existing behavior
in released stable branches, so apply only to master and the 9.4
beta.

Gurjeet Singh, with review and editing by Kevin Grittner,
incorporating suggested changes from Abhijit Menon-Sen and Tom
Lane.
2014-07-02 15:20:30 -05:00
Heikki Linnakangas 1c6821be31 Fix and enhance the assertion of no palloc's in a critical section.
The assertion failed if WAL_DEBUG or LWLOCK_STATS was enabled; fix that by
using separate memory contexts for the allocations made within those code
blocks.

This patch introduces a mechanism for marking any memory context as allowed
in a critical section. Previously ErrorContext was exempt as a special case.

Instead of a blanket exception of the checkpointer process, only exempt the
memory context used for the pending ops hash table.
2014-06-30 10:26:00 +03:00
Andres Freund 3bdcf6a5a7 Don't allow to disable backend assertions via the debug_assertions GUC.
The existance of the assert_enabled variable (backing the
debug_assertions GUC) reduced the amount of knowledge some static code
checkers (like coverity and various compilers) could infer from the
existance of the assertion. That could have been solved by optionally
removing the assertion_enabled variable from the Assert() et al macros
at compile time when some special macro is defined, but the resulting
complication doesn't seem to be worth the gain from having
debug_assertions. Recompiling is fast enough.

The debug_assertions GUC is still available, but readonly, as it's
useful when diagnosing problems. The commandline/client startup option
-A, which previously also allowed to enable/disable assertions, has
been removed as it doesn't serve a purpose anymore.

While at it, reduce code duplication in bufmgr.c and localbuf.c
assertions checking for spurious buffer pins. That code had to be
reindented anyway to cope with the assert_enabled removal.
2014-06-20 11:09:17 +02:00
Tom Lane df8b7bc9ff Improve our mechanism for controlling the Linux out-of-memory killer.
Arrange for postmaster child processes to respond to two environment
variables, PG_OOM_ADJUST_FILE and PG_OOM_ADJUST_VALUE, to determine whether
they reset their OOM score adjustments and if so to what.  This is superior
to the previous design involving #ifdef's in several ways.  The behavior is
now available in a default build, and both ends of the adjustment --- the
original adjustment of the postmaster's level and the subsequent
readjustment by child processes --- can now be controlled in one place,
namely the postmaster launch script.  So it's no longer necessary for the
launch script to act on faith that the server was compiled with the
appropriate options.  In addition, if someone wants to use an OOM score
other than zero for the child processes, that doesn't take a recompile
anymore; and we no longer have to cater separately to the two different
historical kernel APIs for this adjustment.

Gurjeet Singh, somewhat revised by me
2014-06-18 20:12:51 -04:00
Fujii Masao 654e8e4447 Save pg_stat_statements statistics file into $PGDATA/pg_stat directory at shutdown.
187492b6c2 changed pgstat.c so that
the stats files were saved into $PGDATA/pg_stat directory when the server
was shutdowned. But it accidentally forgot to change the location of
pg_stat_statements permanent stats file. This commit fixes pg_stat_statements
so that its stats file is also saved into $PGDATA/pg_stat at shutdown.

Since this fix changes the file layout, we don't back-patch it to 9.3
where this oversight was introduced.
2014-06-04 12:09:45 +09:00
Tom Lane f62d417825 Fix unportable setvbuf() usage in initdb.
In yesterday's commit 2dc4f011fd, I tried
to force buffering of stdout/stderr in initdb to be what it is by
default when the program is run interactively on Unix (since that's how
most manual testing is done).  This tripped over the fact that Windows
doesn't support _IOLBF mode.  We dealt with that a long time ago in
syslogger.c by falling back to unbuffered mode on Windows.  Export that
solution in port.h and use it in initdb.

Back-patch to 8.4, like the previous commit.
2014-05-15 15:57:54 -04:00
Robert Haas be7558162a When a background worker exists with code 0, unregister it.
The previous behavior was to restart immediately, which was generally
viewed as less useful.

Petr Jelinek, with some adjustments by me.
2014-05-07 17:44:42 -04:00
Robert Haas eee6cf1f33 When a bgworker exits, always call ReleasePostmasterChildSlot.
Commit e2ce9aa27b was insufficiently
well thought out.  Repair.
2014-05-07 16:30:23 -04:00
Robert Haas 970d1f76d1 Restart bgworkers immediately after a crash-and-restart cycle.
Just as we would start bgworkers immediately after an initial startup
of the server, we should restart them immediately when reinitializing.

Petr Jelinek and Robert Haas
2014-05-07 16:19:35 -04:00
Robert Haas 4d155d8b08 Detach shared memory from bgworkers without shmem access.
Since the postmaster won't perform a crash-and-restart sequence
for background workers which don't request shared memory access,
we'd better make sure that they can't corrupt shared memory.

Patch by me, review by Tom Lane.
2014-05-07 14:56:49 -04:00
Robert Haas e2ce9aa27b Never crash-and-restart for bgworkers without shared memory access.
The motivation for a crash and restart cycle when a backend dies is
that it might have corrupted shared memory on the way down; and we
can't recover reliably except by reinitializing everything.  But that
doesn't apply to processes that don't touch shared memory.  Currently,
there's nothing to prevent a background worker that doesn't request
shared memory access from touching shared memory anyway, but that's a
separate bug.

Previous to this commit, the coding in postmaster.c was inconsistent:
an exit status other than 0 or 1 didn't provoke a crash-and-restart,
but failure to release the postmaster child slot did.  This change
makes those cases consistent.
2014-05-07 13:19:02 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane cad4fe6455 Use AF_UNSPEC not PF_UNSPEC in getaddrinfo calls.
According to the Single Unix Spec and assorted man pages, you're supposed
to use the constants named AF_xxx when setting ai_family for a getaddrinfo
call.  In a few places we were using PF_xxx instead.  Use of PF_xxx
appears to be an ancient BSD convention that was not adopted by later
standardization.  On BSD and most later Unixen, it doesn't matter much
because those constants have equivalent values anyway; but nonetheless
this code is not per spec.

In the same vein, replace PF_INET by AF_INET in one socket() call, which
wasn't even consistent with the other socket() call in the same function
let alone the remainder of our code.

Per investigation of a Cygwin trouble report from Marco Atzeri.  It's
probably a long shot that this will fix his issue, but it's wrong in
any case.
2014-04-16 13:21:20 -04:00
Bruce Momjian 4180934651 check socket creation errors against PGINVALID_SOCKET
Previously, in some places, socket creation errors were checked for
negative values, which is not true for Windows because sockets are
unsigned.  This masked socket creation errors on Windows.

Backpatch through 9.0.  8.4 doesn't have the infrastructure to fix this.
2014-04-16 10:45:48 -04:00
Tom Lane 5d8117e1f3 Block signals earlier during postmaster startup.
Formerly, we set up the postmaster's signal handling only when we were
about to start launching subprocesses.  This is a bad idea though, as
it means that for example a SIGINT arriving before that will kill the
postmaster instantly, perhaps leaving lockfiles, socket files, shared
memory, etc laying about.  We'd rather that such a signal caused orderly
postmaster termination including releasing of those resources.  A simple
fix is to move the PostmasterMain stanza that initializes signal handling
to an earlier point, before we've created any such resources.  Then, an
early-arriving signal will be blocked until we're ready to deal with it
in the usual way.  (The only part that really needs to be moved up is
blocking of signals, but it seems best to keep the signal handler
installation calls together with that; for one thing this ensures the
kernel won't drop any signals we wished to get.  The handlers won't get
invoked in any case until we unblock signals in ServerLoop.)

Per a report from MauMau.  He proposed changing the way "pg_ctl stop"
works to deal with this, but that'd just be masking one symptom not
fixing the core issue.

It's been like this since forever, so back-patch to all supported branches.
2014-04-05 18:16:08 -04:00
Tom Lane fc752505a9 Fix assorted issues in client host name lookup.
The code for matching clients to pg_hba.conf lines that specify host names
(instead of IP address ranges) failed to complain if reverse DNS lookup
failed; instead it silently didn't match, so that you might end up getting
a surprising "no pg_hba.conf entry for ..." error, as seen in bug #9518
from Mike Blackwell.  Since we don't want to make this a fatal error in
situations where pg_hba.conf contains a mixture of host names and IP
addresses (clients matching one of the numeric entries should not have to
have rDNS data), remember the lookup failure and mention it as DETAIL if
we get to "no pg_hba.conf entry".  Apply the same approach to forward-DNS
lookup failures, too, rather than treating them as immediate hard errors.

Along the way, fix a couple of bugs that prevented us from detecting an
rDNS lookup error reliably, and make sure that we make only one rDNS lookup
attempt; formerly, if the lookup attempt failed, the code would try again
for each host name entry in pg_hba.conf.  Since more or less the whole
point of this design is to ensure there's only one lookup attempt not one
per entry, the latter point represents a performance bug that seems
sufficient justification for back-patching.

Also, adjust src/port/getaddrinfo.c so that it plays as well as it can
with this code.  Which is not all that well, since it does not have actual
support for rDNS lookup, but at least it should return the expected (and
required by spec) error codes so that the main code correctly perceives the
lack of functionality as a lookup failure.  It's unlikely that PG is still
being used in production on any machines that require our getaddrinfo.c,
so I'm not excited about working harder than this.

To keep the code in the various branches similar, this includes
back-patching commits c424d0d105 and
1997f34db4 into 9.2 and earlier.

Back-patch to 9.1 where the facility for hostnames in pg_hba.conf was
introduced.
2014-04-02 17:11:24 -04:00
Tom Lane 682c5bbec5 Fix bugs in manipulation of PgBackendStatus.st_clienthostname.
Initialization of this field was not being done according to the
st_changecount protocol (it has to be done within the changecount increment
range, not outside).  And the test to see if the value should be reported
as null was wrong.  Noted while perusing uses of Port.remote_hostname.

This was wrong from the introduction of this code (commit 4a25bc145),
so back-patch to 9.1.
2014-04-01 21:30:34 -04:00
Robert Haas 79a4d24f31 Make it easy to detach completely from shared memory.
The new function dsm_detach_all() can be used either by postmaster
children that don't wish to take any risk of accidentally corrupting
shared memory; or by forked children of regular backends with
the same need.  This patch also updates the postmaster children that
already do PGSharedMemoryDetach() to do dsm_detach_all() as well.

Per discussion with Tom Lane.
2014-03-18 07:58:53 -04:00
Bruce Momjian 886c0be3f6 C comments: remove odd blank lines after #ifdef WIN32 lines 2014-03-13 01:34:42 -04:00
Robert Haas 5a991ef869 Allow logical decoding via the walsender interface.
In order for this to work, walsenders need the optional ability to
connect to a database, so the "replication" keyword now allows true
or false, for backward-compatibility, and the new value "database"
(which causes the "dbname" parameter to be respected).

walsender needs to loop not only when idle but also when sending
decoded data to the user and when waiting for more xlog data to decode.
This means that there are now three separate loops inside walsender.c;
although some refactoring has been done here, this is still a bit ugly.

Andres Freund, with contributions from Álvaro Herrera, and further
review by me.
2014-03-10 13:50:28 -04:00
Alvaro Herrera 2b4f2ab33d Remove the correct pgstat file on DROP DATABASE
We were unlinking the permanent file, not the non-permanent one.  But
since the stat collector already unlinks all permanent files on startup,
there was nothing for it to unlink.  The non-permanent file remained in
place, and was copied to the permanent directory on shutdown, so in
effect no file was ever dropped.

Backpatch to 9.3, where the issue was introduced by commit 187492b6c2.
Before that, there were no per-database files and thus no file to drop
on DROP DATABASE.

Per report from Thom Brown.

Author: Tomáš Vondra
2014-03-05 13:03:29 -03:00
Robert Haas dd1a3bccca Show xid and xmin in pg_stat_activity and pg_stat_replication.
Christian Kruse, reviewed by Andres Freund and myself, with further
minor adjustments by me.
2014-02-25 12:34:04 -05:00
Tom Lane 643f75ca9b Fix unportable coding in BackgroundWorkerStateChange().
PIDs aren't necessarily ints; our usual practice for printing them
is to explicitly cast to long.  Per buildfarm member rover_firefly.
2014-02-15 17:15:05 -05:00
Tom Lane f0ee42d59b Fix unportable coding in DetermineSleepTime().
We should not assume that struct timeval.tv_sec is a long, because
it ain't necessarily.  (POSIX says that it's a time_t, which might
well be 64 bits now or in the future; or for that matter might be
32 bits on machines with 64-bit longs.)  Per buildfarm member panther.

Back-patch to 9.3 where the dubious coding was introduced.
2014-02-15 17:09:50 -05:00
Tom Lane 60ff2fdd99 Centralize getopt-related declarations in a new header file pg_getopt.h.
We used to have externs for getopt() and its API variables scattered
all over the place.  Now that we find we're going to need to tweak the
variable declarations for Cygwin, it seems like a good idea to have
just one place to tweak.

In this commit, the variables are declared "#ifndef HAVE_GETOPT_H".
That may or may not work everywhere, but we'll soon find out.

Andres Freund
2014-02-15 14:31:30 -05:00
Alvaro Herrera 801c2dc72c Separate multixact freezing parameters from xid's
Previously we were piggybacking on transaction ID parameters to freeze
multixacts; but since there isn't necessarily any relationship between
rates of Xid and multixact consumption, this turns out not to be a good
idea.

Therefore, we now have multixact-specific freezing parameters:

vacuum_multixact_freeze_min_age: when to remove multis as we come across
them in vacuum (default to 5 million, i.e. early in comparison to Xid's
default of 50 million)

vacuum_multixact_freeze_table_age: when to force whole-table scans
instead of scanning only the pages marked as not all visible in
visibility map (default to 150 million, same as for Xids).  Whichever of
both which reaches the 150 million mark earlier will cause a whole-table
scan.

autovacuum_multixact_freeze_max_age: when for cause emergency,
uninterruptible whole-table scans (default to 400 million, double as
that for Xids).  This means there shouldn't be more frequent emergency
vacuuming than previously, unless multixacts are being used very
rapidly.

Backpatch to 9.3 where multixacts were made to persist enough to require
freezing.  To avoid an ABI break in 9.3, VacuumStmt has a couple of
fields in an unnatural place, and StdRdOptions is split in two so that
the newly added fields can go at the end.

Patch by me, reviewed by Robert Haas, with additional input from Andres
Freund and Tom Lane.
2014-02-13 19:36:31 -03:00
Peter Eisentraut 66c04c981d Mark some more variables as static or include the appropriate header
Detected by clang's -Wmissing-variable-declarations.

From: Andres Freund <andres@anarazel.de>
2014-02-08 21:21:46 -05:00
Robert Haas b7643b19f0 Fix compiler warning in EXEC_BACKEND builds.
Per a report by Rajeev Rastogi.
2014-01-28 23:35:50 -05:00
Fujii Masao 9132b189bf Add pg_stat_archiver statistics view.
This view shows the statistics about the WAL archiver process's activity.

Gabriele Bartolini, reviewed by Michael Paquier, refactored a bit by me.
2014-01-29 02:58:22 +09:00