Commit Graph

1480 Commits

Author SHA1 Message Date
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Michael Paquier e3db3f829f Clean up properly error_context_stack in autovacuum worker on exception
Any callback set would have no meaning in the context of an exception.
As an autovacuum worker exits quickly in this context, this could be
only an issue within EmitErrorReport(), where the elog hook is for
example called.  That's unlikely to going to be a problem, but let's be
clean and consistent with other code paths handling exceptions.  This is
present since 2909419, which introduced autovacuum.

Author: Ashwin Agrawal
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALfoeisM+_+dgmAdAOHAu0k-ZpEHHqSSG=GRf3pKJGm8OqWX0w@mail.gmail.com
Backpatch-through: 9.4
2019-10-23 10:25:06 +09:00
Tom Lane 9abb2bfc04 In the postmaster, rely on the signal infrastructure to block signals.
POSIX sigaction(2) can be told to block a set of signals while a
signal handler executes.  Make use of that instead of manually
blocking and unblocking signals in the postmaster's signal handlers.
This should save a few cycles, and it also prevents recursive
invocation of signal handlers when many signals arrive in close
succession.  We have seen buildfarm failures that seem to be due to
postmaster stack overflow caused by such recursion (exacerbated by
a Linux PPC64 kernel bug).

This doesn't change anything about the way that it works on Windows.
Somebody might consider adjusting port/win32/signal.c to let it work
similarly, but I'm not in a position to do that.

For the moment, just apply to HEAD.  Possibly we should consider
back-patching this, but it'd be good to let it age awhile first.

Discussion: https://postgr.es/m/14878.1570820201@sss.pgh.pa.us
2019-10-13 15:48:26 -04:00
Tom Lane 3887e9455f Check for too many postmaster children before spawning a bgworker.
The postmaster's code path for spawning a bgworker neglected to check
whether we already have the max number of live child processes.  That's
a bit hard to hit, since it would necessarily be a transient condition;
but if we do, AssignPostmasterChildSlot() fails causing a postmaster
crash, as seen in a report from Bhargav Kamineni.

To fix, invoke canAcceptConnections() in the bgworker code path, as we
do in the other code paths that spawn children.  Since we don't want
the same pmState tests in this case, add a child-process-type parameter
to canAcceptConnections() so that it can know what to do.

Back-patch to 9.5.  In principle the same hazard exists in 9.4, but the
code is enough different that this patch wouldn't quite fix it there.
Given the tiny usage of bgworkers in that branch it doesn't seem worth
creating a variant patch for it.

Discussion: https://postgr.es/m/18733.1570382257@sss.pgh.pa.us
2019-10-07 12:39:09 -04:00
Tom Lane 9a86f03b4e Rearrange postmaster's startup sequence for better syslogger results.
This is a second try at what commit 57431a911 tried to do, namely,
launch the syslogger before we open postmaster sockets so that our
messages about the sockets end up in the syslogger files.  That
commit fell foul of a bunch of subtle issues caused by trying to
launch a postmaster child process before creating shared memory.
Rather than messing with that interaction, let's postpone opening
the sockets till after we launch the syslogger.

This would not have been terribly safe before commit 7de19fbc0,
because we relied on socket opening to detect whether any competing
postmasters were using the same port number.  But now that we choose
IPC keys without regard to the port number, there's no interaction
to worry about.

Also delay creation of the external PID file (if requested) till after
the sockets are open, since external code could plausibly be relying
on that ordering of events.  And postpone most of the work of
RemovePgTempFiles() so that that potentially-slow processing still
happens after we make the external PID file.  We have to be a bit
careful about that last though: as noted in the discussion subsequent to
bug #15804, EXEC_BACKEND builds still have to clear the parameter-file
temp dir before launching the syslogger.

Patch by me; thanks to Michael Paquier for review/testing.

Discussion: https://postgr.es/m/15804-3721117bf40fb654@postgresql.org
2019-09-11 11:43:01 -04:00
Tom Lane 7de19fbc0b Use data directory inode number, not port, to select SysV resource keys.
This approach provides a much tighter binding between a data directory
and the associated SysV shared memory block (and SysV or named-POSIX
semaphores, if we're using those).  Key collisions are still possible,
but only between data directories stored on different filesystems,
so the situation should be negligible in practice.  More importantly,
restarting the postmaster with a different port number no longer
risks failing to identify a relevant shared memory block, even when
postmaster.pid has been removed.  A standalone backend is likewise
much more certain to detect conflicting leftover backends.

(In the longer term, we might now think about deprecating the port as
a cluster-wide value, so that one postmaster could support sockets
with varying port numbers.  But that's for another day.)

The hazards fixed here apply only on Unix systems; our Windows code
paths already use identifiers derived from the data directory path
name rather than the port.

src/test/recovery/t/017_shm.pl, which intends to test key-collision
cases, has been substantially rewritten since it can no longer use
two postmasters with identical port numbers to trigger the case.
Instead, use Perl's IPC::SharedMem module to create a conflicting
shmem segment directly.  The test script will be skipped if that
module is not available.  (This means that some older buildfarm
members won't run it, but I don't think that that results in any
meaningful coverage loss.)

Patch by me; thanks to Noah Misch and Peter Eisentraut for discussion
and review.

Discussion: https://postgr.es/m/16908.1557521200@sss.pgh.pa.us
2019-09-05 13:31:46 -04:00
Michael Paquier ae060a52b2 Fix thinko when ending progress report for a backend
The logic ending progress reporting for a backend entry introduced by
b6fb647 causes callers of pgstat_progress_end_command() to do some extra
work when track_activities is enabled as the process fields are reset in
the backend entry even if no command were started for reporting.

This resets the fields only if a command is registered for progress
reporting, and only if track_activities is enabled.

Author: Masahiho Sawada
Discussion: https://postgr.es/m/CAD21AoCry_vJ0E-m5oxJXGL3pnos-xYGCzF95rK5Bbi3Uf-rpA@mail.gmail.com
Backpatch-through: 9.6
2019-09-04 15:46:37 +09:00
Tom Lane ee32782395 Fix postmaster state machine to handle dead_end child crashes better.
A report from Alvaro Herrera shows that if we're in PM_STARTUP
state, and we spawn a dead_end child to reject some incoming
connection request, and that child dies with an unexpected exit
code, the postmaster does not respond well.  We correctly send
SIGQUIT to the startup process, but then:

* if the startup process exits with nonzero exit code, as expected,
we thought that that indicated a crash and aborted startup.

* if the startup process exits with zero exit code, which is possible
due to the inherent race condition, we'd advance to PM_RUN state
which is fine --- but the code forgot that AbortStartTime would be
nonzero in this situation.  We'd either die on the Asserts saying
that it was zero, or perhaps misbehave later on.  (A quick look
suggests that the only misbehavior might be busy-waiting due to
DetermineSleepTime doing the wrong thing.)

To fix the first point, adjust the state-machine logic to recognize
that a nonzero exit code is expected after sending SIGQUIT, and have
it transition to a state where we can restart the startup process.
To fix the second point, change the Asserts to clear the variable
rather than just claiming it should be clear already.

Perhaps we could improve this further by not treating a crash of
a dead_end child as a reason for panic'ing the database.  However,
since those child processes are connected to shared memory, that
seems a bit risky.  There are few good reasons for a dead_end child
to report failure anyway (the cause of this in Alvaro's report is
quite unclear).  On balance, therefore, a minimal fix seems best.

This is an oversight in commit 45811be94.  While that was back-patched,
I'm hesitant to back-patch this change.  The lack of reasons for a
dead_end child to fail suggests that the case should be very rare in
the field, which squares with the lack of reports; so it seems like
this might not be worth the risk of introducing new issues.  In any
case we can let it bake awhile in HEAD before considering a back-patch.

Discussion: https://postgr.es/m/20190615160950.GA31378@alvherre.pgsql
2019-08-26 15:59:44 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Michael Paquier 8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Michael Paquier eb43f3d193 Fix inconsistencies and typos in the tree
This is numbered take 8, and addresses again a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-29 12:28:30 +09:00
Michael Paquier 6b8548964b Fix inconsistencies in the code
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
2019-07-08 13:15:09 +09:00
Peter Eisentraut 7e9a4c5c3d Use consistent style for checking return from system calls
Use

    if (something() != 0)
        error ...

instead of just

    if (something)
        error ...

The latter is not incorrect, but it's a bit confusing and not the
common style.

Discussion: https://www.postgresql.org/message-id/flat/5de61b6b-8be9-7771-0048-860328efe027%402ndquadrant.com
2019-07-07 15:28:49 +02:00
Michael Paquier c74d49d41c Fix many typos and inconsistencies
Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
2019-07-01 10:00:23 +09:00
Michael Paquier f43608bda2 Fix typos and inconsistencies in code comments
Author: Alexander Lakhin
Discussion: https://postgr.es/m/dec6aae8-2d63-639f-4d50-20e229fb83e3@gmail.com
2019-06-14 09:34:34 +09:00
Noah Misch 31d250e049 Update stale comments, and fix comment typos. 2019-06-08 10:12:26 -07:00
Amit Kapila 92c4abc736 Fix assorted inconsistencies.
There were a number of issues in the recent commits which include typos,
code and comments mismatch, leftover function declarations.  Fix them.

Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
2019-06-08 08:16:38 +05:30
Amit Kapila 9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane 8334515529 Revert "postmaster: Start syslogger earlier".
This commit reverts 57431a911d.

While that's still a good idea in the abstract, we found out
that there are multiple crasher bugs in it on Windows builds,
making the logging_collector option unusable on Windows.
There's no time left to fix these issues before 12beta1,
so revert the patch to allow Windows beta testing to proceed.
We'll try again at some future date.

Per bug #15804 from Yulian Khodorkovskiy and additional
investigation by Michael Paquier.

Discussion: https://postgr.es/m/15804-3721117bf40fb654@postgresql.org
2019-05-19 11:14:23 -04:00
Alvaro Herrera 75445c1515 More message style fixes
Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql
2019-05-16 19:14:31 -04:00
Tom Lane 85ccb6899c Rearrange pgstat_bestart() to avoid failures within its critical section.
We long ago decided to design the shared PgBackendStatus data structure to
minimize the cost of writing status updates, which means that writers just
have to increment the st_changecount field twice.  That isn't hooked into
any sort of resource management mechanism, which means that if something
were to throw error between the two increments, the st_changecount field
would be left odd indefinitely.  That would cause readers to lock up.
Now, since it's also a bad idea to leave the field odd for longer than
absolutely necessary (because readers will spin while we have it set),
the expectation was that we'd treat these segments like spinlock critical
sections, with only short, more or less straight-line, code in them.

That was fine as originally designed, but commit 9029f4b37 broke it
by inserting a significant amount of non-straight-line code into
pgstat_bestart(), code that is very capable of throwing errors, not to
mention taking a significant amount of time during which readers will spin.
We have a report from Neeraj Kumar of readers actually locking up, which
I suspect was due to an encoding conversion error in X509_NAME_to_cstring,
though conceivably it was just a garden-variety OOM failure.

Subsequent commits have loaded even more dubious code into pgstat_bestart's
critical section (and commit fc70a4b0d deserves some kind of booby prize
for managing to miss the critical section entirely, although the negative
consequences seem minimal given that the PgBackendStatus entry should be
seen by readers as inactive at that point).

The right way to fix this mess seems to be to compute all these values
into a local copy of the process' PgBackendStatus struct, and then just
copy the data back within the critical section proper.  This plan can't
be implemented completely cleanly because of the struct's heavy reliance
on out-of-line strings, which we must initialize separately within the
critical section.  But still, the critical section is far smaller and
safer than it was before.

In hopes of forestalling future errors of the same ilk, rename the
macros for st_changecount management to make it more apparent that
the writer-side macros create a critical section.  And to prevent
the worst consequences if we nonetheless manage to mess it up anyway,
adjust those macros so that they really are a critical section, ie
they now bump CritSectionCount.  That doesn't add much overhead, and
it guarantees that if we do somehow throw an error while the counter
is odd, it will lead to PANIC and a database restart to reset shared
memory.

Back-patch to 9.5 where the problem was introduced.

In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach
pgstat_read_current_status to copy st_gssstatus data from shared memory to
local memory.  Hence, subsequent use of that data within the transaction
would potentially see changing data that it shouldn't see.

Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
2019-05-11 21:27:29 -04:00
Fujii Masao b84dbc8eb8 Add TRUNCATE parameter to VACUUM.
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.

This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
2019-05-08 02:10:33 +09:00
Tom Lane 8d0ddccec6 Avoid "invalid memory alloc request size" while reading pg_stat_activity.
On a 64-bit machine, if you set track_activity_query_size and
max_connections such that their product exceeds 1GB, shared memory
setup will still succeed (given enough RAM), but attempts to read
pg_stat_activity fail with "invalid memory alloc request size".
Work around that by using MemoryContextAllocHuge to allocate the
local copy of the activity strings.  Using the "huge" API costs us
nothing extra in normal cases, and it seems better than throwing
an error and/or explaining to people why they can't do this.

This situation seems insanely profligate today, but who knows what
people will consider normal in ten or twenty years?  So let's fix it
in HEAD but not worry about a back-patch.

Per report from James Tomson.

Discussion: https://postgr.es/m/1CFDCCD6-B268-48D8-85C8-400D2790B2C3@pushd.com
2019-05-07 11:41:37 -04:00
Magnus Hagander 659e53498c Fix union for pgstat message types
The message type for temp files and for checksum failures were missing
from the union. Due to the coding style used there was no compiler error
when this happend. So change the code to actively use the union thereby
producing a compiler error if the same mistake happens again, suggested
by Tom Lane.

Author: Julien Rouhaud
Reported-By: Tomas Vondra
Discussion: https://postgr.es/m/20190430163328.zd4rrlnbvgaqlcdz@development
2019-05-01 12:30:44 +02:00
Noah Misch 90e7f31773 Use preprocessor conditions compatible with Emacs indent.
Emacs wrongly indented hundreds of subsequent lines.
2019-04-28 12:56:53 -07:00
Fujii Masao 978b032d1f Fix function names in comments.
Commit 3eb77eba5a renamed some functions, but forgot to
update some comments referencing to those functions.
This commit fixes those function names in the comments.

Kyotaro Horiguchi
2019-04-25 23:43:48 +09:00
Tom Lane 0fae846232 Fix some minor postmaster-state-machine issues.
In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based
on pmState.  The restriction is unnecessary (PostmasterStateMachine
should work in any state), not future-proof (since it makes too many
assumptions about why the signal might be sent), and broken even today
because a race condition can make it necessary to respond to the signal
in PM_WAIT_READONLY state.  The race condition seems unlikely, but
if it did happen, a hot-standby postmaster could fail to shut down
after receiving a smart-shutdown request.

In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag
if the fork attempt fails.  Leaving it set allows us to try
again in future iterations of the postmaster idle loop.  (The startup
process would eventually send a fresh request signal, but this change
may allow us to retry the fork sooner.)

Remove an obsolete comment and unnecessary test in
PostmasterStateMachine's handling of PM_SHUTDOWN_2 state.  It's not
possible to have a live walreceiver in that state, and AFAICT has not
been possible since commit 5e85315ea.  This isn't a live bug, but the
false comment is quite confusing to readers.

In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that
we don't uselessly perform stat() calls that we're going to ignore the
results of.

Add some comments clarifying the behavior of MaybeStartWalReceiver;
I very nearly rearranged it in a way that'd reintroduce the race
condition fixed in e5d494d78.  Mea culpa for not commenting that
properly at the time.

Back-patch to all supported branches.  The PMSIGNAL_ADVANCE_STATE_MACHINE
change is the only one of even minor significance, but we might as well
keep this code in sync across branches.

Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
2019-04-24 14:15:44 -04:00
Andres Freund fdc7efcc30 Allow pg_class xid & multixid horizons to not be set.
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.

Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry.  This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value.  Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23 21:42:12 -07:00
Noah Misch c098509927 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer stop if shmat() of an old segment fails with EACCES.  A
postmaster will no longer recycle segments pertaining to other data
directories.  That's good for production, but it's bad for integration
tests that crash a postmaster and immediately delete its data directory.
Such a test now leaks a segment indefinitely.  No "make check-world"
test does that.  win32_shmem.c already avoided all these problems.  In
9.6 and later, enhance PostgresNode to facilitate testing.  Back-patch
to 9.4 (all supported versions).

Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com
2019-04-12 22:36:38 -07:00
Magnus Hagander 77bd49adba Show shared object statistics in pg_stat_database
This adds a row to the pg_stat_database view with datoid 0 and datname
NULL for those objects that are not in a database. This was added
particularly for checksums, but we were already tracking more satistics
for these objects, just not returning it.

Also add a checksum_last_failure column that holds the timestamptz of
the last checksum failure that occurred in a database (or in a
non-dataabase file), if any.

Author: Julien Rouhaud <rjuju123@gmail.com>
2019-04-12 14:04:50 +02:00
Amit Kapila bdf35744bd Avoid counting transaction stats for parallel worker cooperating
transaction.

The transaction that is initiated by the parallel worker to cooperate
with the actual transaction started by the main backend to complete the
query execution should not be counted as a separate transaction.  The
other internal transactions started and committed by the parallel worker
are still counted as separate transactions as we that is what we do in
other places like autovacuum.

This will partially fix the bloat in transaction stats due to additional
transactions performed by parallel workers.  For a complete fix, we need to
decide how we want to show all the transactions that are started internally
for various operations and that is a matter of separate patch.

Reported-by: Haribabu Kommi
Author: Haribabu Kommi
Reviewed-by: Amit Kapila, Jamison Kirk and Rahila Syed
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAJrrPGc9=jKXuScvNyQ+VNhO0FZk7LLAShAJRyZjnedd2D61EQ@mail.gmail.com
2019-04-10 08:24:15 +05:30
Noah Misch 617dc6d299 Avoid "could not reattach" by providing space for concurrent allocation.
We've long had reports of intermittent "could not reattach to shared
memory" errors on Windows.  Buildfarm member dory fails that way when
PGSharedMemoryReAttach() execution overlaps with creation of a thread
for the process's "default thread pool".  Fix that by providing a second
region to receive asynchronous allocations that would otherwise intrude
into UsedShmemSegAddr.  In pgwin32_ReserveSharedMemoryRegion(), stop
trying to free reservations landing at incorrect addresses; the caller's
next step has been to terminate the affected process.  Back-patch to 9.4
(all supported versions).

Reviewed by Tom Lane.  He also did much of the prerequisite research;
see commit bcbf2346d6.

Discussion: https://postgr.es/m/20190402135442.GA1173872@rfd.leadboat.com
2019-04-08 21:39:00 -07:00
Thomas Munro de2b38419c Wake up interested backends when a checkpoint fails.
Commit c6c9474a switched to condition variables instead of sleep
loops to notify backends of checkpoint start and stop, but forgot
to broadcast in case of checkpoint failure.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGJKbCd%2B_K%2BSEBsbHxVT60SG0ivWHHAdvL0bLTUt2xpA2w%40mail.gmail.com
2019-04-06 09:31:48 +13:00
Noah Misch 82150a05be Revert "Consistently test for in-use shared memory."
This reverts commits 2f932f71d9,
16ee6eaf80 and
6f0e190056.  The buildfarm has revealed
several bugs.  Back-patch like the original commits.

Discussion: https://postgr.es/m/20190404145319.GA1720877@rfd.leadboat.com
2019-04-05 00:00:52 -07:00
Robert Haas a96c41feec Allow VACUUM to be run with index cleanup disabled.
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.

Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
2019-04-04 15:04:43 -04:00
Thomas Munro 3eb77eba5a Refactor the fsync queue for wider use.
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.

For now only md.c uses the new interface, but other users are being
proposed.  Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.

Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key.  This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).

Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
2019-04-04 23:38:38 +13:00
Noah Misch 2f932f71d9 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer recycle segments pertaining to other data directories.
That's good for production, but it's bad for integration tests that
crash a postmaster and immediately delete its data directory.  Such a
test now leaks a segment indefinitely.  No "make check-world" test does
that.  win32_shmem.c already avoided all these problems.  In 9.6 and
later, enhance PostgresNode to facilitate testing.  Back-patch to 9.4
(all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20130911033341.GD225735@tornado.leadboat.com
2019-04-03 17:03:46 -07:00
Stephen Frost b0b39f72b9 GSSAPI encryption support
On both the frontend and backend, prepare for GSSAPI encryption
support by moving common code for error handling into a separate file.
Fix a TODO for handling multiple status messages in the process.
Eliminate the OIDs, which have not been needed for some time.

Add frontend and backend encryption support functions.  Keep the
context initiation for authentication-only separate on both the
frontend and backend in order to avoid concerns about changing the
requested flags to include encryption support.

In postmaster, pull GSSAPI authorization checking into a shared
function.  Also share the initiator name between the encryption and
non-encryption codepaths.

For HBA, add "hostgssenc" and "hostnogssenc" entries that behave
similarly to their SSL counterparts.  "hostgssenc" requires either
"gss", "trust", or "reject" for its authentication.

Similarly, add a "gssencmode" parameter to libpq.  Supported values are
"disable", "require", and "prefer".  Notably, negotiation will only be
attempted if credentials can be acquired.  Move credential acquisition
into its own function to support this behavior.

Add a simple pg_stat_gssapi view similar to pg_stat_ssl, for monitoring
if GSSAPI authentication was used, what principal was used, and if
encryption is being used on the connection.

Finally, add documentation for everything new, and update existing
documentation on connection security.

Thanks to Michael Paquier for the Windows fixes.

Author: Robbie Harwood, with changes to the read/write functions by me.
Reviewed in various forms and at different times by: Michael Paquier,
   Andres Freund, David Steele.
Discussion: https://www.postgresql.org/message-id/flat/jlg1tgq1ktm.fsf@thriss.redhat.com
2019-04-03 15:02:33 -04:00
Peter Eisentraut 481018f280 Add macro to cast away volatile without allowing changes to underlying type
This adds unvolatize(), which works just like unconstify() but for volatile.

Discussion: https://www.postgresql.org/message-id/flat/7a5cbea7-b8df-e910-0f10-04014bcad701%402ndquadrant.com
2019-03-25 09:37:03 +01:00
Michael Paquier 276d2e6c2d Make current_logfiles use permissions assigned to files in data directory
Since its introduction in 19dc233c, current_logfiles has been assigned
the same permissions as a log file, which can be enforced with
log_file_mode.  This setup can lead to incompatibility problems with
group access permissions as current_logfiles is not located in the log
directory, but at the root of the data folder.  Hence, if group
permissions are used but log_file_mode is more restrictive, a backup
with a user in the group having read access could fail even if the log
directory is located outside of the data folder.

Per discussion with the folks mentioned below, we have concluded that
current_logfiles should not be treated as a log file as it only stores
metadata related to log files, and that it should use the same
permissions as all other files in the data directory.  This solution has
the merit to be simple and fixes all the interaction problems between
group access and log_file_mode.

Author: Haribabu Kommi
Reviewed-by: Stephen Frost, Robert Haas, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAJrrPGcEotF1P7AWoeQyD3Pqr-0xkQg_Herv98DjbaMj+naozw@mail.gmail.com
Backpatch-through: 11, where group access has been added.
2019-03-24 21:00:35 +09:00
Tom Lane 0dfe3d0ef5 Make checkpoint requests more robust.
Commit 6f6a6d8b1 introduced a delay of up to 2 seconds if we're trying
to request a checkpoint but the checkpointer hasn't started yet (or,
much less likely, our kill() call fails).  However buildfarm experience
shows that that's not quite enough for slow or heavily-loaded machines.
There's no good reason to assume that the checkpointer won't start
eventually, so we may as well make the timeout much longer, say 60 sec.

However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad
idea to be waiting at all, much less for as long as 60 sec.  We can
remove the need for that, and make this whole thing more robust, by
adjusting the code so that the existence of a pending checkpoint
request is clear from the contents of shared memory, and making sure
that the checkpointer process will notice it at startup even if it did
not get a signal.  In this way there's no need for a non-CHECKPOINT_WAIT
call to wait at all; if it can't send the signal, it can nonetheless
assume that the checkpointer will eventually service the request.

A potential downside of this change is that "kill -INT" on the checkpointer
process is no longer enough to trigger a checkpoint, should anyone be
relying on something so hacky.  But there's no obvious reason to do it
like that rather than issuing a plain old CHECKPOINT command, so we'll
assume that nobody is.  There doesn't seem to be a way to preserve this
undocumented quasi-feature without introducing race conditions.

Since a principal reason for messing with this is to prevent intermittent
buildfarm failures, back-patch to all supported branches.

Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us
2019-03-19 12:49:27 -04:00
Robert Haas f41551f61f Fold vacuum's 'int options' parameter into VacuumParams.
Many places need both, so this allows a few functions to take one
fewer parameter.  More importantly, as soon as we add a VACUUM
option that takes a non-Boolean parameter, we need to replace
'int options' with a struct, and it seems better to think
of adding more fields to VacuumParams rather than passing around
both VacuumParams and a separate struct as well.

Patch by me, reviewed by Masahiko Sawada

Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com
2019-03-18 13:57:33 -04:00
Thomas Munro c6c9474aaf Use condition variables to wait for checkpoints.
Previously we used a polling/sleeping loop to wait for checkpoints
to begin and end, which leads to up to a couple hundred milliseconds
of needless thumb-twiddling.  Use condition variables instead.

Author: Thomas Munro
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGLY7sDe%2Bbg1K%3DbnEzOofGoo4bJHYh9%2BcDCXJepb6DQmLw%40mail.gmail.com
2019-03-14 10:59:33 +13:00
Andres Freund c2fe139c20 tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al. have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for a other AMs. Instead there's
   table_scan_getnextslot().  But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends need
   to be able to do so without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   intiialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore that seems cleaner.

To be able to sensible adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AMs
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but a
   slot (which in heap's case internally has that information).

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will so for the
   foreseeable future, the index API (see 1) above) now only deals with
   slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 12:46:41 -07:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Magnus Hagander 6b9e875f72 Track block level checksum failures in pg_stat_database
This adds a column that counts how many checksum failures have occurred
on files belonging to a specific database. Both checksum failures
during normal backend processing and those created when a base backup
detects a checksum failure are counted.

Author: Magnus Hagander
Reviewed by: Julien Rouhaud
2019-03-09 10:47:30 -08:00
Andrew Dunstan 342cb650e0 Don't log incomplete startup packet if it's empty
This will stop logging cases where, for example, a monitor opens a
connection and immediately closes it. If the packet contains any data an
incomplete packet will still be logged.

Author: Tom Lane

Discussion: https://postgr.es/m/a1379a72-2958-1ed0-ef51-09a21219b155@2ndQuadrant.com
2019-03-06 15:36:41 -05:00
Alvaro Herrera 98098faaff Report correct name in autovacuum "work items" activity
We were reporting the database name instead of the relation name to
pg_stat_activity.  Repair.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20190220185552.GR28750@telsasoft.com
2019-02-22 13:00:16 -03:00