postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	1e53fe0e70	Use PostgresSigHupHandler in more places. There seems to be no reason for every background process to have its own flag indicating that a config-file reload is needed. Instead, let's just use ConfigFilePending for that purpose everywhere. Patch by me, reviewed by Andres Freund and Daniel Gustafsson. Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com	2019-12-17 13:03:57 -05:00
Robert Haas	5910d6c7e3	Move interrupt-handling code into subroutines. Some auxiliary processes, as well as the autovacuum launcher, have interrupt handling code directly in their main loops. Try to abstract things a little better by moving it into separate functions. This doesn't make any functional difference, and leaves in place relatively large differences among processes in how interrupts are handled, but hopefully it at least makes it easier to see the commonalities and differences across process types. Patch by me, reviewed by Andres Freund and Daniel Gustafsson. Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com	2019-12-17 12:55:13 -05:00
Robert Haas	0d3c3aae33	Use procsignal_sigusr1_handler for auxiliary processes. AuxiliaryProcessMain does ProcSignalInit, so one might expect that auxiliary processes would need to respond to SendProcSignal, but none of the auxiliary processes do that. Change them to use procsignal_sigusr1_handler instead of their own private handlers so that they do. Besides seeming more correct, this is also less code. It shouldn't make any functional difference right now because, as far as we know, there are no current cases where SendProcSignal targets an auxiliary process, but there are plans to change that in the future. Andres Freund Discussion: http://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de	2019-11-25 16:16:27 -05:00
Tom Lane	7bf40ea0d0	Avoid using SplitIdentifierString to parse ListenAddresses, too. This gets rid of our former behavior of forcibly downcasing the postmaster's hostname list and truncating the elements to NAMEDATALEN. In principle, DNS hostnames are case-insensitive so the first behavior should be harmless, and server hostnames are seldom long enough for the second behavior to be an issue. But it's still dubious, and an easy fix is available: just use SplitGUCList instead. AFAICT, all other SplitIdentifierString calls in the backend are OK: either the items actually are SQL identifiers, or they are keywords that are short and case-insensitive. Per thinking about bug #16106. While this has been wrong for a very long time, the lack of field complaints means there's little reason to back-patch. Discussion: https://postgr.es/m/16106-7d319e4295d08e70@postgresql.org	2019-11-13 13:51:58 -05:00
Amit Kapila	14aec03502	Make the order of the header file includes consistent in backend modules. Similar to commits `7e735035f2` and `dddf4cdc33`, this commit makes the order of header file inclusion consistent for backend modules. In the passing, removed a couple of duplicate inclusions. Author: Vignesh C Reviewed-by: Kuntal Ghosh and Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com	2019-11-12 08:30:16 +05:30
Andres Freund	01368e5d9d	Split all OBJS style lines in makefiles into one-line-per-entry style. When maintaining or merging patches, one of the most common sources for conflicts are the list of objects in makefiles. Especially when the split across lines has been changed on both sides, which is somewhat common due to attempting to stay below 80 columns, those conflicts are unnecessarily laborious to resolve. By splitting, and alphabetically sorting, OBJS style lines into one object per line, conflicts should be less frequent, and easier to resolve when they still occur. Author: Andres Freund Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de	2019-11-05 14:41:07 -08:00
Michael Paquier	e3db3f829f	Clean up properly error_context_stack in autovacuum worker on exception Any callback set would have no meaning in the context of an exception. As an autovacuum worker exits quickly in this context, this could be only an issue within EmitErrorReport(), where the elog hook is for example called. That's unlikely to going to be a problem, but let's be clean and consistent with other code paths handling exceptions. This is present since `2909419`, which introduced autovacuum. Author: Ashwin Agrawal Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CALfoeisM+_+dgmAdAOHAu0k-ZpEHHqSSG=GRf3pKJGm8OqWX0w@mail.gmail.com Backpatch-through: 9.4	2019-10-23 10:25:06 +09:00
Tom Lane	9abb2bfc04	In the postmaster, rely on the signal infrastructure to block signals. POSIX sigaction(2) can be told to block a set of signals while a signal handler executes. Make use of that instead of manually blocking and unblocking signals in the postmaster's signal handlers. This should save a few cycles, and it also prevents recursive invocation of signal handlers when many signals arrive in close succession. We have seen buildfarm failures that seem to be due to postmaster stack overflow caused by such recursion (exacerbated by a Linux PPC64 kernel bug). This doesn't change anything about the way that it works on Windows. Somebody might consider adjusting port/win32/signal.c to let it work similarly, but I'm not in a position to do that. For the moment, just apply to HEAD. Possibly we should consider back-patching this, but it'd be good to let it age awhile first. Discussion: https://postgr.es/m/14878.1570820201@sss.pgh.pa.us	2019-10-13 15:48:26 -04:00
Tom Lane	3887e9455f	Check for too many postmaster children before spawning a bgworker. The postmaster's code path for spawning a bgworker neglected to check whether we already have the max number of live child processes. That's a bit hard to hit, since it would necessarily be a transient condition; but if we do, AssignPostmasterChildSlot() fails causing a postmaster crash, as seen in a report from Bhargav Kamineni. To fix, invoke canAcceptConnections() in the bgworker code path, as we do in the other code paths that spawn children. Since we don't want the same pmState tests in this case, add a child-process-type parameter to canAcceptConnections() so that it can know what to do. Back-patch to 9.5. In principle the same hazard exists in 9.4, but the code is enough different that this patch wouldn't quite fix it there. Given the tiny usage of bgworkers in that branch it doesn't seem worth creating a variant patch for it. Discussion: https://postgr.es/m/18733.1570382257@sss.pgh.pa.us	2019-10-07 12:39:09 -04:00
Tom Lane	9a86f03b4e	Rearrange postmaster's startup sequence for better syslogger results. This is a second try at what commit `57431a911` tried to do, namely, launch the syslogger before we open postmaster sockets so that our messages about the sockets end up in the syslogger files. That commit fell foul of a bunch of subtle issues caused by trying to launch a postmaster child process before creating shared memory. Rather than messing with that interaction, let's postpone opening the sockets till after we launch the syslogger. This would not have been terribly safe before commit `7de19fbc0`, because we relied on socket opening to detect whether any competing postmasters were using the same port number. But now that we choose IPC keys without regard to the port number, there's no interaction to worry about. Also delay creation of the external PID file (if requested) till after the sockets are open, since external code could plausibly be relying on that ordering of events. And postpone most of the work of RemovePgTempFiles() so that that potentially-slow processing still happens after we make the external PID file. We have to be a bit careful about that last though: as noted in the discussion subsequent to bug #15804, EXEC_BACKEND builds still have to clear the parameter-file temp dir before launching the syslogger. Patch by me; thanks to Michael Paquier for review/testing. Discussion: https://postgr.es/m/15804-3721117bf40fb654@postgresql.org	2019-09-11 11:43:01 -04:00
Tom Lane	7de19fbc0b	Use data directory inode number, not port, to select SysV resource keys. This approach provides a much tighter binding between a data directory and the associated SysV shared memory block (and SysV or named-POSIX semaphores, if we're using those). Key collisions are still possible, but only between data directories stored on different filesystems, so the situation should be negligible in practice. More importantly, restarting the postmaster with a different port number no longer risks failing to identify a relevant shared memory block, even when postmaster.pid has been removed. A standalone backend is likewise much more certain to detect conflicting leftover backends. (In the longer term, we might now think about deprecating the port as a cluster-wide value, so that one postmaster could support sockets with varying port numbers. But that's for another day.) The hazards fixed here apply only on Unix systems; our Windows code paths already use identifiers derived from the data directory path name rather than the port. src/test/recovery/t/017_shm.pl, which intends to test key-collision cases, has been substantially rewritten since it can no longer use two postmasters with identical port numbers to trigger the case. Instead, use Perl's IPC::SharedMem module to create a conflicting shmem segment directly. The test script will be skipped if that module is not available. (This means that some older buildfarm members won't run it, but I don't think that that results in any meaningful coverage loss.) Patch by me; thanks to Noah Misch and Peter Eisentraut for discussion and review. Discussion: https://postgr.es/m/16908.1557521200@sss.pgh.pa.us	2019-09-05 13:31:46 -04:00
Michael Paquier	ae060a52b2	Fix thinko when ending progress report for a backend The logic ending progress reporting for a backend entry introduced by `b6fb647` causes callers of pgstat_progress_end_command() to do some extra work when track_activities is enabled as the process fields are reset in the backend entry even if no command were started for reporting. This resets the fields only if a command is registered for progress reporting, and only if track_activities is enabled. Author: Masahiho Sawada Discussion: https://postgr.es/m/CAD21AoCry_vJ0E-m5oxJXGL3pnos-xYGCzF95rK5Bbi3Uf-rpA@mail.gmail.com Backpatch-through: 9.6	2019-09-04 15:46:37 +09:00
Tom Lane	ee32782395	Fix postmaster state machine to handle dead_end child crashes better. A report from Alvaro Herrera shows that if we're in PM_STARTUP state, and we spawn a dead_end child to reject some incoming connection request, and that child dies with an unexpected exit code, the postmaster does not respond well. We correctly send SIGQUIT to the startup process, but then: * if the startup process exits with nonzero exit code, as expected, we thought that that indicated a crash and aborted startup. * if the startup process exits with zero exit code, which is possible due to the inherent race condition, we'd advance to PM_RUN state which is fine --- but the code forgot that AbortStartTime would be nonzero in this situation. We'd either die on the Asserts saying that it was zero, or perhaps misbehave later on. (A quick look suggests that the only misbehavior might be busy-waiting due to DetermineSleepTime doing the wrong thing.) To fix the first point, adjust the state-machine logic to recognize that a nonzero exit code is expected after sending SIGQUIT, and have it transition to a state where we can restart the startup process. To fix the second point, change the Asserts to clear the variable rather than just claiming it should be clear already. Perhaps we could improve this further by not treating a crash of a dead_end child as a reason for panic'ing the database. However, since those child processes are connected to shared memory, that seems a bit risky. There are few good reasons for a dead_end child to report failure anyway (the cause of this in Alvaro's report is quite unclear). On balance, therefore, a minimal fix seems best. This is an oversight in commit `45811be94`. While that was back-patched, I'm hesitant to back-patch this change. The lack of reasons for a dead_end child to fail suggests that the case should be very rare in the field, which squares with the lack of reports; so it seems like this might not be worth the risk of introducing new issues. In any case we can let it bake awhile in HEAD before considering a back-patch. Discussion: https://postgr.es/m/20190615160950.GA31378@alvherre.pgsql	2019-08-26 15:59:44 -04:00
Michael Paquier	c96581abe4	Fix inconsistencies and typos in the tree, take 11 This fixes various typos in docs and comments, and removes some orphaned definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com	2019-08-19 16:21:39 +09:00
Michael Paquier	66bde49d96	Fix inconsistencies and typos in the tree, take 10 This addresses some issues with unnecessary code comments, fixes various typos in docs and comments, and removes some orphaned structures and definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com	2019-08-13 13:53:41 +09:00
Michael Paquier	8548ddc61b	Fix inconsistencies and typos in the tree, take 9 This addresses more issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com	2019-08-05 12:14:58 +09:00
Michael Paquier	eb43f3d193	Fix inconsistencies and typos in the tree This is numbered take 8, and addresses again a set of issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com	2019-07-29 12:28:30 +09:00
Michael Paquier	6b8548964b	Fix inconsistencies in the code This addresses a couple of issues in the code: - Typos and inconsistencies in comments and function declarations. - Removal of unreferenced function declarations. - Removal of unnecessary compile flags. - A cleanup error in regressplans.sh. Author: Alexander Lakhin Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com	2019-07-08 13:15:09 +09:00
Peter Eisentraut	7e9a4c5c3d	Use consistent style for checking return from system calls Use if (something() != 0) error ... instead of just if (something) error ... The latter is not incorrect, but it's a bit confusing and not the common style. Discussion: https://www.postgresql.org/message-id/flat/5de61b6b-8be9-7771-0048-860328efe027%402ndquadrant.com	2019-07-07 15:28:49 +02:00
Michael Paquier	c74d49d41c	Fix many typos and inconsistencies Author: Alexander Lakhin Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com	2019-07-01 10:00:23 +09:00
Michael Paquier	f43608bda2	Fix typos and inconsistencies in code comments Author: Alexander Lakhin Discussion: https://postgr.es/m/dec6aae8-2d63-639f-4d50-20e229fb83e3@gmail.com	2019-06-14 09:34:34 +09:00
Noah Misch	31d250e049	Update stale comments, and fix comment typos.	2019-06-08 10:12:26 -07:00
Amit Kapila	92c4abc736	Fix assorted inconsistencies. There were a number of issues in the recent commits which include typos, code and comments mismatch, leftover function declarations. Fix them. Reported-by: Alexander Lakhin Author: Alexander Lakhin, Amit Kapila and Amit Langote Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com	2019-06-08 08:16:38 +05:30
Amit Kapila	9679345f3c	Fix typos. Reported-by: Alexander Lakhin Author: Alexander Lakhin Reviewed-by: Amit Kapila and Tom Lane Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com	2019-05-26 18:28:18 +05:30
Tom Lane	8255c7a5ee	Phase 2 pgindent run for v12. Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com	2019-05-22 13:04:48 -04:00
Tom Lane	8334515529	Revert "postmaster: Start syslogger earlier". This commit reverts `57431a911d`. While that's still a good idea in the abstract, we found out that there are multiple crasher bugs in it on Windows builds, making the logging_collector option unusable on Windows. There's no time left to fix these issues before 12beta1, so revert the patch to allow Windows beta testing to proceed. We'll try again at some future date. Per bug #15804 from Yulian Khodorkovskiy and additional investigation by Michael Paquier. Discussion: https://postgr.es/m/15804-3721117bf40fb654@postgresql.org	2019-05-19 11:14:23 -04:00
Alvaro Herrera	75445c1515	More message style fixes Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql	2019-05-16 19:14:31 -04:00
Tom Lane	85ccb6899c	Rearrange pgstat_bestart() to avoid failures within its critical section. We long ago decided to design the shared PgBackendStatus data structure to minimize the cost of writing status updates, which means that writers just have to increment the st_changecount field twice. That isn't hooked into any sort of resource management mechanism, which means that if something were to throw error between the two increments, the st_changecount field would be left odd indefinitely. That would cause readers to lock up. Now, since it's also a bad idea to leave the field odd for longer than absolutely necessary (because readers will spin while we have it set), the expectation was that we'd treat these segments like spinlock critical sections, with only short, more or less straight-line, code in them. That was fine as originally designed, but commit `9029f4b37` broke it by inserting a significant amount of non-straight-line code into pgstat_bestart(), code that is very capable of throwing errors, not to mention taking a significant amount of time during which readers will spin. We have a report from Neeraj Kumar of readers actually locking up, which I suspect was due to an encoding conversion error in X509_NAME_to_cstring, though conceivably it was just a garden-variety OOM failure. Subsequent commits have loaded even more dubious code into pgstat_bestart's critical section (and commit `fc70a4b0d` deserves some kind of booby prize for managing to miss the critical section entirely, although the negative consequences seem minimal given that the PgBackendStatus entry should be seen by readers as inactive at that point). The right way to fix this mess seems to be to compute all these values into a local copy of the process' PgBackendStatus struct, and then just copy the data back within the critical section proper. This plan can't be implemented completely cleanly because of the struct's heavy reliance on out-of-line strings, which we must initialize separately within the critical section. But still, the critical section is far smaller and safer than it was before. In hopes of forestalling future errors of the same ilk, rename the macros for st_changecount management to make it more apparent that the writer-side macros create a critical section. And to prevent the worst consequences if we nonetheless manage to mess it up anyway, adjust those macros so that they really are a critical section, ie they now bump CritSectionCount. That doesn't add much overhead, and it guarantees that if we do somehow throw an error while the counter is odd, it will lead to PANIC and a database restart to reset shared memory. Back-patch to 9.5 where the problem was introduced. In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach pgstat_read_current_status to copy st_gssstatus data from shared memory to local memory. Hence, subsequent use of that data within the transaction would potentially see changing data that it shouldn't see. Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com	2019-05-11 21:27:29 -04:00
Fujii Masao	b84dbc8eb8	Add TRUNCATE parameter to VACUUM. This commit adds new parameter to VACUUM command, TRUNCATE, which specifies that VACUUM should attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system. This parameter, if specified, overrides the vacuum_truncate reloption. If neither the reloption nor the VACUUM option is used, the default is true, as before. Author: Fujii Masao Reviewed-by: Julien Rouhaud, Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com	2019-05-08 02:10:33 +09:00
Tom Lane	8d0ddccec6	Avoid "invalid memory alloc request size" while reading pg_stat_activity. On a 64-bit machine, if you set track_activity_query_size and max_connections such that their product exceeds 1GB, shared memory setup will still succeed (given enough RAM), but attempts to read pg_stat_activity fail with "invalid memory alloc request size". Work around that by using MemoryContextAllocHuge to allocate the local copy of the activity strings. Using the "huge" API costs us nothing extra in normal cases, and it seems better than throwing an error and/or explaining to people why they can't do this. This situation seems insanely profligate today, but who knows what people will consider normal in ten or twenty years? So let's fix it in HEAD but not worry about a back-patch. Per report from James Tomson. Discussion: https://postgr.es/m/1CFDCCD6-B268-48D8-85C8-400D2790B2C3@pushd.com	2019-05-07 11:41:37 -04:00
Magnus Hagander	659e53498c	Fix union for pgstat message types The message type for temp files and for checksum failures were missing from the union. Due to the coding style used there was no compiler error when this happend. So change the code to actively use the union thereby producing a compiler error if the same mistake happens again, suggested by Tom Lane. Author: Julien Rouhaud Reported-By: Tomas Vondra Discussion: https://postgr.es/m/20190430163328.zd4rrlnbvgaqlcdz@development	2019-05-01 12:30:44 +02:00
Noah Misch	90e7f31773	Use preprocessor conditions compatible with Emacs indent. Emacs wrongly indented hundreds of subsequent lines.	2019-04-28 12:56:53 -07:00
Fujii Masao	978b032d1f	Fix function names in comments. Commit `3eb77eba5a` renamed some functions, but forgot to update some comments referencing to those functions. This commit fixes those function names in the comments. Kyotaro Horiguchi	2019-04-25 23:43:48 +09:00
Tom Lane	0fae846232	Fix some minor postmaster-state-machine issues. In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based on pmState. The restriction is unnecessary (PostmasterStateMachine should work in any state), not future-proof (since it makes too many assumptions about why the signal might be sent), and broken even today because a race condition can make it necessary to respond to the signal in PM_WAIT_READONLY state. The race condition seems unlikely, but if it did happen, a hot-standby postmaster could fail to shut down after receiving a smart-shutdown request. In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag if the fork attempt fails. Leaving it set allows us to try again in future iterations of the postmaster idle loop. (The startup process would eventually send a fresh request signal, but this change may allow us to retry the fork sooner.) Remove an obsolete comment and unnecessary test in PostmasterStateMachine's handling of PM_SHUTDOWN_2 state. It's not possible to have a live walreceiver in that state, and AFAICT has not been possible since commit `5e85315ea`. This isn't a live bug, but the false comment is quite confusing to readers. In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that we don't uselessly perform stat() calls that we're going to ignore the results of. Add some comments clarifying the behavior of MaybeStartWalReceiver; I very nearly rearranged it in a way that'd reintroduce the race condition fixed in `e5d494d78`. Mea culpa for not commenting that properly at the time. Back-patch to all supported branches. The PMSIGNAL_ADVANCE_STATE_MACHINE change is the only one of even minor significance, but we might as well keep this code in sync across branches. Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us	2019-04-24 14:15:44 -04:00
Andres Freund	fdc7efcc30	Allow pg_class xid & multixid horizons to not be set. This allows table AMs that don't need these horizons. This was already documented in the tableam relation_set_new_filenode callback, but an assert prevented if from actually working (the test AM code contained the change itself). Defang the asserts in the general code, and move the stronger ones into heap AM. Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid / relminmxid. Change the table_relation_copy_for_cluster() interface to allow the AM to overwrite the horizons that get set on the pg_class entry. This'd also in the future allow AMs like heap to compute a relfrozenxid during rewrite that's the table's actual minimum rather than a pre-determined value. Arguably it'd have been better to move the whole computation / setting of those values into the callback, but it seems likely that for other reasons it'd be better to be able to use one value to vacuum/cluster multiple tables (e.g. a toast's horizon shouldn't be different than the table's). Reported-By: Heikki Linnakangas Author: Andres Freund Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi	2019-04-23 21:42:12 -07:00
Noah Misch	c098509927	Consistently test for in-use shared memory. postmaster startup scrutinizes any shared memory segment recorded in postmaster.pid, exiting if that segment matches the current data directory and has an attached process. When the postmaster.pid file was missing, a starting postmaster used weaker checks. Change to use the same checks in both scenarios. This increases the chance of a startup failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1 postmaster.pid` && rm postmaster.pid && pg_ctl -w start". A postmaster will no longer stop if shmat() of an old segment fails with EACCES. A postmaster will no longer recycle segments pertaining to other data directories. That's good for production, but it's bad for integration tests that crash a postmaster and immediately delete its data directory. Such a test now leaks a segment indefinitely. No "make check-world" test does that. win32_shmem.c already avoided all these problems. In 9.6 and later, enhance PostgresNode to facilitate testing. Back-patch to 9.4 (all supported versions). Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI. Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com	2019-04-12 22:36:38 -07:00
Magnus Hagander	77bd49adba	Show shared object statistics in pg_stat_database This adds a row to the pg_stat_database view with datoid 0 and datname NULL for those objects that are not in a database. This was added particularly for checksums, but we were already tracking more satistics for these objects, just not returning it. Also add a checksum_last_failure column that holds the timestamptz of the last checksum failure that occurred in a database (or in a non-dataabase file), if any. Author: Julien Rouhaud <rjuju123@gmail.com>	2019-04-12 14:04:50 +02:00
Amit Kapila	bdf35744bd	Avoid counting transaction stats for parallel worker cooperating transaction. The transaction that is initiated by the parallel worker to cooperate with the actual transaction started by the main backend to complete the query execution should not be counted as a separate transaction. The other internal transactions started and committed by the parallel worker are still counted as separate transactions as we that is what we do in other places like autovacuum. This will partially fix the bloat in transaction stats due to additional transactions performed by parallel workers. For a complete fix, we need to decide how we want to show all the transactions that are started internally for various operations and that is a matter of separate patch. Reported-by: Haribabu Kommi Author: Haribabu Kommi Reviewed-by: Amit Kapila, Jamison Kirk and Rahila Syed Backpatch-through: 9.6 Discussion: https://postgr.es/m/CAJrrPGc9=jKXuScvNyQ+VNhO0FZk7LLAShAJRyZjnedd2D61EQ@mail.gmail.com	2019-04-10 08:24:15 +05:30
Noah Misch	617dc6d299	Avoid "could not reattach" by providing space for concurrent allocation. We've long had reports of intermittent "could not reattach to shared memory" errors on Windows. Buildfarm member dory fails that way when PGSharedMemoryReAttach() execution overlaps with creation of a thread for the process's "default thread pool". Fix that by providing a second region to receive asynchronous allocations that would otherwise intrude into UsedShmemSegAddr. In pgwin32_ReserveSharedMemoryRegion(), stop trying to free reservations landing at incorrect addresses; the caller's next step has been to terminate the affected process. Back-patch to 9.4 (all supported versions). Reviewed by Tom Lane. He also did much of the prerequisite research; see commit `bcbf2346d6`. Discussion: https://postgr.es/m/20190402135442.GA1173872@rfd.leadboat.com	2019-04-08 21:39:00 -07:00
Thomas Munro	de2b38419c	Wake up interested backends when a checkpoint fails. Commit `c6c9474a` switched to condition variables instead of sleep loops to notify backends of checkpoint start and stop, but forgot to broadcast in case of checkpoint failure. Author: Thomas Munro Discussion: https://postgr.es/m/CA%2BhUKGJKbCd%2B_K%2BSEBsbHxVT60SG0ivWHHAdvL0bLTUt2xpA2w%40mail.gmail.com	2019-04-06 09:31:48 +13:00
Noah Misch	82150a05be	Revert "Consistently test for in-use shared memory." This reverts commits `2f932f71d9`, `16ee6eaf80` and `6f0e190056`. The buildfarm has revealed several bugs. Back-patch like the original commits. Discussion: https://postgr.es/m/20190404145319.GA1720877@rfd.leadboat.com	2019-04-05 00:00:52 -07:00
Robert Haas	a96c41feec	Allow VACUUM to be run with index cleanup disabled. This commit adds a new reloption, vacuum_index_cleanup, which controls whether index cleanup is performed for a particular relation by default. It also adds a new option to the VACUUM command, INDEX_CLEANUP, which can be used to override the reloption. If neither the reloption nor the VACUUM option is used, the default is true, as before. Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me. The wording of the documentation is mostly due to me. Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com	2019-04-04 15:04:43 -04:00
Thomas Munro	3eb77eba5a	Refactor the fsync queue for wider use. Previously, md.c and checkpointer.c were tightly integrated so that fsync calls could be handed off and processed in the background. Introduce a system of callbacks and file tags, so that other modules can hand off fsync work in the same way. For now only md.c uses the new interface, but other users are being proposed. Since there may be use cases that are not strictly SMGR implementations, use a new function table for sync handlers rather than extending the traditional SMGR one. Instead of using a bitmapset of segment numbers for each RelFileNode in the checkpointer's hash table, make the segment number part of the key. This requires sending explicit "forget" requests for every segment individually when relations are dropped, but suits the file layout schemes of proposed future users better (ie sparse or high segment numbers). Author: Shawn Debnath and Thomas Munro Reviewed-by: Thomas Munro, Andres Freund Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com	2019-04-04 23:38:38 +13:00
Noah Misch	2f932f71d9	Consistently test for in-use shared memory. postmaster startup scrutinizes any shared memory segment recorded in postmaster.pid, exiting if that segment matches the current data directory and has an attached process. When the postmaster.pid file was missing, a starting postmaster used weaker checks. Change to use the same checks in both scenarios. This increases the chance of a startup failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1 postmaster.pid` && rm postmaster.pid && pg_ctl -w start". A postmaster will no longer recycle segments pertaining to other data directories. That's good for production, but it's bad for integration tests that crash a postmaster and immediately delete its data directory. Such a test now leaks a segment indefinitely. No "make check-world" test does that. win32_shmem.c already avoided all these problems. In 9.6 and later, enhance PostgresNode to facilitate testing. Back-patch to 9.4 (all supported versions). Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI. Discussion: https://postgr.es/m/20130911033341.GD225735@tornado.leadboat.com	2019-04-03 17:03:46 -07:00
Stephen Frost	b0b39f72b9	GSSAPI encryption support On both the frontend and backend, prepare for GSSAPI encryption support by moving common code for error handling into a separate file. Fix a TODO for handling multiple status messages in the process. Eliminate the OIDs, which have not been needed for some time. Add frontend and backend encryption support functions. Keep the context initiation for authentication-only separate on both the frontend and backend in order to avoid concerns about changing the requested flags to include encryption support. In postmaster, pull GSSAPI authorization checking into a shared function. Also share the initiator name between the encryption and non-encryption codepaths. For HBA, add "hostgssenc" and "hostnogssenc" entries that behave similarly to their SSL counterparts. "hostgssenc" requires either "gss", "trust", or "reject" for its authentication. Similarly, add a "gssencmode" parameter to libpq. Supported values are "disable", "require", and "prefer". Notably, negotiation will only be attempted if credentials can be acquired. Move credential acquisition into its own function to support this behavior. Add a simple pg_stat_gssapi view similar to pg_stat_ssl, for monitoring if GSSAPI authentication was used, what principal was used, and if encryption is being used on the connection. Finally, add documentation for everything new, and update existing documentation on connection security. Thanks to Michael Paquier for the Windows fixes. Author: Robbie Harwood, with changes to the read/write functions by me. Reviewed in various forms and at different times by: Michael Paquier, Andres Freund, David Steele. Discussion: https://www.postgresql.org/message-id/flat/jlg1tgq1ktm.fsf@thriss.redhat.com	2019-04-03 15:02:33 -04:00
Peter Eisentraut	481018f280	Add macro to cast away volatile without allowing changes to underlying type This adds unvolatize(), which works just like unconstify() but for volatile. Discussion: https://www.postgresql.org/message-id/flat/7a5cbea7-b8df-e910-0f10-04014bcad701%402ndquadrant.com	2019-03-25 09:37:03 +01:00
Michael Paquier	276d2e6c2d	Make current_logfiles use permissions assigned to files in data directory Since its introduction in `19dc233c`, current_logfiles has been assigned the same permissions as a log file, which can be enforced with log_file_mode. This setup can lead to incompatibility problems with group access permissions as current_logfiles is not located in the log directory, but at the root of the data folder. Hence, if group permissions are used but log_file_mode is more restrictive, a backup with a user in the group having read access could fail even if the log directory is located outside of the data folder. Per discussion with the folks mentioned below, we have concluded that current_logfiles should not be treated as a log file as it only stores metadata related to log files, and that it should use the same permissions as all other files in the data directory. This solution has the merit to be simple and fixes all the interaction problems between group access and log_file_mode. Author: Haribabu Kommi Reviewed-by: Stephen Frost, Robert Haas, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAJrrPGcEotF1P7AWoeQyD3Pqr-0xkQg_Herv98DjbaMj+naozw@mail.gmail.com Backpatch-through: 11, where group access has been added.	2019-03-24 21:00:35 +09:00
Tom Lane	0dfe3d0ef5	Make checkpoint requests more robust. Commit `6f6a6d8b1` introduced a delay of up to 2 seconds if we're trying to request a checkpoint but the checkpointer hasn't started yet (or, much less likely, our kill() call fails). However buildfarm experience shows that that's not quite enough for slow or heavily-loaded machines. There's no good reason to assume that the checkpointer won't start eventually, so we may as well make the timeout much longer, say 60 sec. However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad idea to be waiting at all, much less for as long as 60 sec. We can remove the need for that, and make this whole thing more robust, by adjusting the code so that the existence of a pending checkpoint request is clear from the contents of shared memory, and making sure that the checkpointer process will notice it at startup even if it did not get a signal. In this way there's no need for a non-CHECKPOINT_WAIT call to wait at all; if it can't send the signal, it can nonetheless assume that the checkpointer will eventually service the request. A potential downside of this change is that "kill -INT" on the checkpointer process is no longer enough to trigger a checkpoint, should anyone be relying on something so hacky. But there's no obvious reason to do it like that rather than issuing a plain old CHECKPOINT command, so we'll assume that nobody is. There doesn't seem to be a way to preserve this undocumented quasi-feature without introducing race conditions. Since a principal reason for messing with this is to prevent intermittent buildfarm failures, back-patch to all supported branches. Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us	2019-03-19 12:49:27 -04:00
Robert Haas	f41551f61f	Fold vacuum's 'int options' parameter into VacuumParams. Many places need both, so this allows a few functions to take one fewer parameter. More importantly, as soon as we add a VACUUM option that takes a non-Boolean parameter, we need to replace 'int options' with a struct, and it seems better to think of adding more fields to VacuumParams rather than passing around both VacuumParams and a separate struct as well. Patch by me, reviewed by Masahiko Sawada Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com	2019-03-18 13:57:33 -04:00
Thomas Munro	c6c9474aaf	Use condition variables to wait for checkpoints. Previously we used a polling/sleeping loop to wait for checkpoints to begin and end, which leads to up to a couple hundred milliseconds of needless thumb-twiddling. Use condition variables instead. Author: Thomas Munro Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGLY7sDe%2Bbg1K%3DbnEzOofGoo4bJHYh9%2BcDCXJepb6DQmLw%40mail.gmail.com	2019-03-14 10:59:33 +13:00
Andres Freund	c2fe139c20	tableam: Add and use scan APIs. Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql	2019-03-11 12:46:41 -07:00
Tom Lane	caf626b2cd	Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs. This change makes it possible to specify sub-millisecond delays, which work well on most modern platforms, though that was not true when the cost-delay feature was designed. To support this without breaking existing configuration entries, improve guc.c to allow floating-point GUCs to have units. Also, allow "us" (microseconds) as an input/output unit for time-unit GUCs. (It's not allowed as a base unit, at least not yet.) Likewise change the autovacuum_vacuum_cost_delay reloption to be floating-point; this forces a catversion bump because the layout of StdRdOptions changes. This patch doesn't in itself change the default values or allowed ranges for these parameters, and it should not affect the behavior for any already-allowed setting for them. Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us	2019-03-10 15:01:39 -04:00
Magnus Hagander	6b9e875f72	Track block level checksum failures in pg_stat_database This adds a column that counts how many checksum failures have occurred on files belonging to a specific database. Both checksum failures during normal backend processing and those created when a base backup detects a checksum failure are counted. Author: Magnus Hagander Reviewed by: Julien Rouhaud	2019-03-09 10:47:30 -08:00
Andrew Dunstan	342cb650e0	Don't log incomplete startup packet if it's empty This will stop logging cases where, for example, a monitor opens a connection and immediately closes it. If the packet contains any data an incomplete packet will still be logged. Author: Tom Lane Discussion: https://postgr.es/m/a1379a72-2958-1ed0-ef51-09a21219b155@2ndQuadrant.com	2019-03-06 15:36:41 -05:00
Alvaro Herrera	98098faaff	Report correct name in autovacuum "work items" activity We were reporting the database name instead of the relation name to pg_stat_activity. Repair. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20190220185552.GR28750@telsasoft.com	2019-02-22 13:00:16 -03:00
Michael Paquier	ea92368cd1	Move max_wal_senders out of max_connections for connection slot handling Since its introduction, max_wal_senders is counted as part of max_connections when it comes to define how many connection slots can be used for replication connections with a WAL sender context. This can lead to confusion for some users, as it could be possible to block a base backup or replication from happening because other backend sessions are already taken for other purposes by an application, and superuser-only connection slots are not a correct solution to handle that case. This commit makes max_wal_senders independent of max_connections for its handling of PGPROC entries in ProcGlobal, meaning that connection slots for WAL senders are handled using their own free queue, like autovacuum workers and bgworkers. One compatibility issue that this change creates is that a standby now requires to have a value of max_wal_senders at least equal to its primary. So, if a standby created enforces the value of max_wal_senders to be lower than that, then this could break failovers. Normally this should not be an issue though, as any settings of a standby are inherited from its primary as postgresql.conf gets normally copied as part of a base backup, so parameters would be consistent. Author: Alexander Kukushkin Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii Kliukin Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com	2019-02-12 10:07:56 +09:00
Peter Eisentraut	f60a0e9677	Add more columns to pg_stat_ssl Add columns client_serial and issuer_dn to pg_stat_ssl. These allow uniquely identifying the client certificate. Rename the existing column clientdn to client_dn, to make the naming more consistent and easier to read. Discussion: https://www.postgresql.org/message-id/flat/398754d8-6bb5-c5cf-e7b8-22e5f0983caf@2ndquadrant.com/	2019-02-01 00:33:47 +01:00
Peter Eisentraut	689d15e95e	Log PostgreSQL version number on startup Logging the PostgreSQL version on startup is useful for two reasons: There is a clear marker in the log file that a new postmaster is beginning, and it's useful for tracking the server version across startup while upgrading. Author: Christoph Berg <christoph.berg@credativ.de> Discussion: https://www.postgresql.org/message-id/flat/20181121144611.GJ15795@msg.credativ.de/	2019-01-30 23:26:10 +01:00
Peter Eisentraut	57431a911d	postmaster: Start syslogger earlier When the syslogger was originally added (`bdf8ef6925`), nothing was normally logged before the point where it was started. But since `f9dfa5c977`, the creation of sockets causes messages of level LOG to be written routinely, so those don't go to the syslogger now. To improve that, arrange the sequence in PostmasterMain() slightly so that the syslogger is started early enough to capture those messages. Discussion: https://www.postgresql.org/message-id/d5d50936-20b9-85f1-06bc-94a01c5040c1%402ndquadrant.com Reviewed-by: Christoph Berg <christoph.berg@credativ.de>	2019-01-30 21:10:56 +01:00
Andres Freund	a9c35cf85c	Change function call information to be variable length. Before this change FunctionCallInfoData, the struct arguments etc for V1 function calls are stored in, always had space for FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two arrays. For nearly every function call 100 arguments is far more than needed, therefore wasting memory. Arg and argnull being two separate arrays also guarantees that to access a single argument, two cachelines have to be touched. Change the layout so there's a single variable-length array with pairs of value / isnull. That drastically reduces memory consumption for most function calls (on x86-64 a two argument function now uses 64bytes, previously 936 bytes), and makes it very likely that argument value and its nullness are on the same cacheline. Arguments are stored in a new NullableDatum struct, which, due to padding, needs more memory per argument than before. But as usually far fewer arguments are stored, and individual arguments are cheaper to access, that's still a clear win. It's likely that there's other places where conversion to NullableDatum arrays would make sense, e.g. TupleTableSlots, but that's for another commit. Because the function call information is now variable-length allocations have to take the number of arguments into account. For heap allocations that can be done with SizeForFunctionCallInfoData(), for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro that helps to allocate an appropriately sized and aligned variable. Some places with stack allocation function call information don't know the number of arguments at compile time, and currently variably sized stack allocations aren't allowed in postgres. Therefore allow for FUNC_MAX_ARGS space in these cases. They're not that common, so for now that seems acceptable. Because of the need to allocate FunctionCallInfo of the appropriate size, older extensions may need to update their code. To avoid subtle breakages, the FunctionCallInfoData struct has been renamed to FunctionCallInfoBaseData. Most code only references FunctionCallInfo, so that shouldn't cause much collateral damage. This change is also a prerequisite for more efficient expression JIT compilation (by allocating the function call information on the stack, allowing LLVM to optimize it away); previously the size of the call information caused problems inside LLVM's optimizer. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de	2019-01-26 14:17:52 -08:00
Andres Freund	e7cc78ad43	Remove superfluous tqual.h includes. Most of these had been obsoleted by `568d4138c` / the SnapshotNow removal. This is is preparation for moving most of tqual.[ch] into either snapmgr.h or heapam.h, which in turn is in preparation for pluggable table AMs. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-01-21 12:15:02 -08:00
Andres Freund	e0c4ec0728	Replace uses of heap_open et al with the corresponding table_* function. Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de	2019-01-21 10:51:37 -08:00
Magnus Hagander	0301db623d	Replace @postgresql.org with @lists.postgresql.org for mailinglists Commit `c0d0e54084` replaced the ones in the documentation, but missed out on the ones in the code. Replace those as well, but unlike `c0d0e54084`, don't backpatch the code changes to avoid breaking translations.	2019-01-19 19:06:35 +01:00
Michael Paquier	42e2a58071	Fix typos in documentation and for one wait event These have been found while cross-checking for the use of unique words in the documentation, and a wait event was not getting generated in a way consistent to what the documentation provided. Author: Alexander Lakhin Discussion: https://postgr.es/m/9b5a3a85-899a-ae62-dbab-1e7943aa5ab1@gmail.com	2019-01-15 08:47:01 +09:00
Bruce Momjian	97c39498e5	Update copyright for 2019 Backpatch-through: certain files through 9.4	2019-01-02 12:44:25 -05:00
Michael Paquier	1707a0d2aa	Remove configure switch --disable-strong-random This removes a portion of infrastructure introduced by `fe0a0b5` to allow compilation of Postgres in environments where no strong random source is available, meaning that there is no linking to OpenSSL and no /dev/urandom (Windows having its own CryptoAPI). No systems shipped this century lack /dev/urandom, and the buildfarm is actually not testing this switch at all, so just remove it. This simplifies particularly some backend code which included a fallback implementation using shared memory, and removes a set of alternate regression output files from pgcrypto. Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20181230063219.GG608@paquier.xyz	2019-01-01 20:05:51 +09:00
Tom Lane	4203842a1c	Use pg_strong_random() to select each server process's random seed. Previously we just set the seed based on process ID and start timestamp. Both those values are directly available within the session, and can be found out or guessed by other users too, making the session's series of random(3) values fairly predictable. Up to now, our backend-internal uses of random(3) haven't seemed security-critical, but commit `88bdbd3f7` added one that potentially is: when using log_statement_sample_rate, a user might be able to predict which of his SQL statements will get logged. To improve this situation, upgrade the per-process seed initialization method to use pg_strong_random() if available, greatly reducing the predictability of the initial seed value. This adds a few tens of microseconds to process start time, but since backend startup time is at least a couple of milliseconds, that seems an acceptable price. This means that pg_strong_random() needs to be able to run without reliance on any backend infrastructure, since it will be invoked before any of that is up. It was safe for that already, but adjust comments and #include commands to make it clearer. Discussion: https://postgr.es/m/3859.1545849900@sss.pgh.pa.us	2018-12-29 17:56:06 -05:00
Michael Paquier	b981df4cc0	Prioritize history files when archiving At the end of recovery for the post-promotion process, a new history file is created followed by the last partial segment of the previous timeline. Based on the timing, the archiver would first try to archive the last partial segment and then the history file. This can delay the detection of a new timeline taken, particularly depending on the time it takes to transfer the last partial segment as it delays the moment the history file of the new timeline gets archived. This can cause promoted standbys to use the same timeline as one already taken depending on the circumstances if multiple instances look at archives at the same location. This commit changes the order of archiving so as history files are archived in priority over other file types, which reduces the likelihood of the same timeline being taken (still not reducing the window to zero), and it makes the archiver behave more consistently with the startup process doing its post-promotion business. Author: David Steele Reviewed-by: Michael Paquier, Kyotaro Horiguchi Discussion: https://postgr.es/m/929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net Backpatch-through: 9.5	2018-12-24 20:24:16 +09:00
Tom Lane	a73d083195	Modernize our code for looking up descriptive strings for Unix signals. At least as far back as the 2008 spec, POSIX has defined strsignal(3) for looking up descriptive strings for signal numbers. We hadn't gotten the word though, and were still using the crufty old sys_siglist array, which is in no standard even though most Unixen provide it. Aside from not being formally standards-compliant, this was just plain ugly because it involved #ifdef's at every place using the code. To eliminate the #ifdef's, create a portability function pg_strsignal, which wraps strsignal(3) if available and otherwise falls back to sys_siglist[] if available. The set of Unixen with neither API is probably empty these days, but on any platform with neither, you'll just get "unrecognized signal". All extant callers print the numeric signal number too, so no need to work harder than that. Along the way, upgrade pg_basebackup's child-error-exit reporting to match the rest of the system. Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us	2018-12-16 19:38:57 -05:00
Tom Lane	ade2d61ed0	Improve detection of child-process SIGPIPE failures. Commit `ffa4cbd62` added logic to detect SIGPIPE failure of a COPY child process, but it only worked correctly if the SIGPIPE occurred in the immediate child process. Depending on the shell in use and the complexity of the shell command string, we might instead get back an exit code of 128 + SIGPIPE, representing a shell error exit reporting SIGPIPE in the child process. We could just hack up ClosePipeToProgram() to add the extra case, but it seems like this is a fairly general issue deserving a more general and better-documented solution. I chose to add a couple of functions in src/common/wait_error.c, which is a natural place to know about wait-result encodings, that will test for either a specific child-process signal type or any child-process signal failure. Then, adjust other places that were doing ad-hoc tests of this type to use the common functions. In RestoreArchivedFile, this fixes a race condition affecting whether the process will report an error or just silently proc_exit(1): before, that depended on whether the intermediate shell got SIGTERM'd itself or reported a child process failing on SIGTERM. Like the previous patch, back-patch to v10; we could go further but there seems no real need to. Per report from Erik Rijkers. Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl	2018-12-16 14:32:14 -05:00
Michael Paquier	6d8727f95e	Ensure cleanup of orphan archive status files When a WAL segment is recycled, its ".ready" and ".done" status files get also automatically removed, however this is not done in a durable manner. Hence, in a subsequent crash, it could be possible that a ".ready" status file is still around with its corresponding segment already gone. If the backend reaches such a state, the archive command would most likely complain about a segment non-existing and would keep retrying, causing WAL segments to bloat pg_wal/, potentially making Postgres crash hard when running out of space. As status files are removed after each individual segment, using durable_unlink() does not completely close the window either, as a crash could happen between the moment the WAL segment is recycled and the moment its status files are removed. This has also some performance impact with the additional fsync() calls needed to make the removal in a durable manner. Doing the cleanup at recovery is not cost-free either as this makes crash recovery potentially take longer than necessary. So, instead, as per an idea of Stephen Frost, make the archiver aware of orphan status files and remove them on-the-fly if the corresponding segment goes missing. Removal failures follow a model close to what happens for WAL segments, where multiple attempts are done before giving up temporarily, and where a successful orphan removal makes the archiver move immediately to the next WAL segment thought as ready to be archived. Author: Michael Paquier Reviewed-by: Nathan Bossart, Andres Freund, Stephen Frost, Kyotaro Horiguchi Discussion: https://postgr.es/m/20180928032827.GF1500@paquier.xyz	2018-12-10 15:00:59 +09:00
Alvaro Herrera	3be5fe2b10	Silence compiler warnings Commit `cfdf4dc4fc` left a few unnecessary assignments, one of which caused compiler warnings, as reported by Erik Rijkers. Remove them all. Discussion: https://postgr.es/m/df0dcca2025b3d90d946ecc508ca9678@xs4all.nl	2018-11-23 13:01:05 -03:00
Thomas Munro	cfdf4dc4fc	Add WL_EXIT_ON_PM_DEATH pseudo-event. Users of the WaitEventSet and WaitLatch() APIs can now choose between asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death. This reduces code duplication, since almost all callers want the latter. Repair all code that was previously ignoring postmaster death completely, or requesting the event but ignoring it, or requesting the event but then doing an unconditional PostmasterIsAlive() call every time through its event loop (which is an expensive syscall on platforms for which we don't have USE_POSTMASTER_DEATH_SIGNAL support). Assert that callers of WaitLatchXXX() under the postmaster remember to ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent future bugs. The only process that doesn't handle postmaster death is syslogger. It waits until all backends holding the write end of the syslog pipe (including the postmaster) have closed it by exiting, to be sure to capture any parting messages. By using the WaitEventSet API directly it avoids the new assertion, and as a by-product it may be slightly more efficient on platforms that have epoll(). Author: Thomas Munro Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com, https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com	2018-11-23 20:46:34 +13:00
Andres Freund	578b229718	Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de	2018-11-20 16:00:17 -08:00
Tom Lane	37afc079ab	Avoid defining SIGTTIN/SIGTTOU on Windows. Setting them to SIG_IGN seems unlikely to have any beneficial effect on that platform, and given the signal numbering collision with SIGABRT, it could easily have bad effects. Given the lack of field complaints that can be traced to this, I don't presently feel a need to back-patch. Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us	2018-11-17 16:31:16 -05:00
Tom Lane	125f551c8b	Leave SIGTTIN/SIGTTOU signal handling alone in postmaster child processes. For reasons lost in the mists of time, most postmaster child processes reset SIGTTIN/SIGTTOU signal handling to SIG_DFL, with the major exception that backend sessions do not. It seems like a pretty bad idea for any postmaster children to do that: if stderr is connected to the terminal, and the user has put the postmaster in background, any log output would result in the child process freezing up. Hence, switch them all to doing what backends do, ie, nothing. This allows them to inherit the postmaster's SIG_IGN setting. On the other hand, manually-launched processes such as standalone backends will have default processing, which seems fine. In passing, also remove useless resets of SIGCONT and SIGWINCH signal processing. Perhaps the postmaster once changed those to something besides SIG_DFL, but it doesn't now, so these are just wasted (and confusing) syscalls. Basically, this propagates the changes made in commit `8e2998d8a` from backends to other postmaster children. Probably the only reason these calls now exist elsewhere is that I missed changing pgstat.c along with postgres.c at the time. Given the lack of field complaints that can be traced to this, I don't presently feel a need to back-patch. Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us	2018-11-17 16:31:16 -05:00
Thomas Munro	ab8984f52d	Further adjustment to random() seed initialization. Per complaint from Tom Lane, don't chomp the timestamp at 32 bits, so we can shift in some of its higher bits. Discussion: https://postgr.es/m/14712.1542253115%40sss.pgh.pa.us	2018-11-15 17:38:55 +13:00
Thomas Munro	5b0ce3ec33	Increase the number of possible random seeds per time period. Commit `197e4af9` refactored the initialization of the libc random() seed, but reduced the number of possible seeds values that could be chosen in a given time period. This negation of the effects of commit `98c50656c` was unintentional. Replace with code that shifts the fast moving timestamp bits left, similar to the original algorithm (though not the previous float-tolerating coding, which is no longer necessary). Author: Thomas Munro Reported-by: Noah Misch Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20181112083358.GA1073796%40rfd.leadboat.com	2018-11-15 16:25:30 +13:00
Michael Paquier	10074651e3	Add pg_promote function This function is able to promote a standby with this new SQL-callable function. Execution access can be granted to non-superusers so that failover tools can observe the principle of least privilege. Catalog version is bumped. Author: Laurenz Albe Reviewed-by: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/6e7c79b3ec916cf49742fb8849ed17cd87aed620.camel@cybertec.at	2018-10-25 09:46:00 +09:00
Michael Paquier	5ef037cf0b	List wait events in alphabetical order This changes the documentation, and the related structures so as everything is consistent. Some wait events were not listed alphabetically since their introduction, others have been added rather randomly. Keeping all those entries in order helps in maintenance, and helps the user looking at the documentation. Author: Michael Paquier, Kuntal Ghosh Discussion: https://postgr.es/m/20181024002539.GI1658@paquier.xyz Backpatch-through: 10, only for the documentation part to avoid an ABI breakage.	2018-10-24 17:02:37 +09:00
Thomas Munro	197e4af9d5	Refactor pid, random seed and start time initialization. Background workers, including parallel workers, were generating the same sequence of numbers in random(). This showed up as DSM handle collisions when Parallel Hash created multiple segments, but any code that calls random() in background workers could be affected if it cares about different backends generating different numbers. Repair by making sure that all new processes initialize the seed at the same time as they set MyProcPid and MyStartTime in a new function InitProcessGlobals(), called by the postmaster, its children and also standalone processes. Also add a new high resolution MyStartTimestamp as a potentially useful by-product, and remove SessionStartTime from struct Port as it is now redundant. No back-patch for now, as the known consequences so far are just a bunch of harmless shm_open(O_EXCL) collisions. Author: Thomas Munro Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEepm%3D2eJj_6%3DB%2B2tEpGu2nf1BjthCf9nXXUouYvJJ4C5WSwhg%40mail.gmail.com	2018-10-19 13:59:28 +13:00
Stephen Frost	8bddc86400	Add application_name to connection authorized msg The connection authorized message has quite a bit of useful information in it, but didn't include the application_name (when provided), so let's add that as it can be very useful. Note that at the point where we're emitting the connection authorized message, we haven't processed GUCs, so it's not possible to get this by using log_line_prefix (which pulls from the GUC). There's also something to be said for having this included in the connection authorized message and then not needing to repeat it for every line, as having it in log_line_prefix would do. The GUC cleans the application name to pure-ascii, so do that here too, but pull out the logic for cleaning up a string into its own function in common and re-use it from those places, and check_cluster_name which was doing the same thing. Author: Don Seiler <don@seiler.us> Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com	2018-09-28 19:04:50 -04:00
Peter Eisentraut	842cb9fa62	Refactor dlopen() support Nowadays, all platforms except Windows and older HP-UX have standard dlopen() support. So having a separate implementation per platform under src/backend/port/dynloader/ is a bit excessive. Instead, treat dlopen() like other library functions that happen to be missing sometimes and put a replacement implementation under src/port/. Discussion: https://www.postgresql.org/message-id/flat/e11a49cb-570a-60b7-707d-7084c8de0e61%402ndquadrant.com#54e735ae37476a121abb4e33c2549b03	2018-09-06 11:33:04 +02:00
Alexander Korotkov	ec74369931	Implement "pg_ctl logrotate" command Currently there are two ways to trigger log rotation in logging collector process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly to logging collector process. However, it's nice to have more suitable way for external tools to do that, which wouldn't require SQL connection or knowledge of logging collector pid. This commit implements triggering log rotation by "pg_ctl logrotate" command. Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov	2018-09-01 19:46:49 +03:00
Michael Paquier	55875b6d2a	Stop bgworkers during fast shutdown with postmaster in startup phase When a postmaster gets into its phase PM_STARTUP, it would start background workers using BgWorkerStart_PostmasterStart mode immediately, which would cause problems for a fast shutdown as the postmaster forgets to send SIGTERM to already-started background workers. With smart and immediate shutdowns, this correctly happened, and fast shutdown is the only mode missing the shot. Author: Alexander Kukushkin Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAFh8B=mvnD8+DZUfzpi50DoaDfZRDfd7S=gwj5vU9GYn8UvHkA@mail.gmail.com Backpatch-through: 9.5	2018-08-29 17:10:02 -07:00
Tom Lane	bff84a547d	Make syslogger more robust against failures in opening CSV log files. The previous coding figured it'd be good enough to postpone opening the first CSV log file until we got a message we needed to write there. This is unsafe, though, because if the open fails we end up in infinite recursion trying to report the failure. Instead make the CSV log file management code look as nearly as possible like the longstanding logic for the stderr log file. In particular, open it immediately at postmaster startup (if enabled), or when we get a SIGHUP in which we find that log_destination has been changed to enable CSV logging. It seems OK to fail if a postmaster-start-time open attempt fails, as we've long done for the stderr log file. But we can't die if we fail to open a CSV log file during SIGHUP, so we're still left with a problem. In that case, write any output meant for the CSV log file to the stderr log file. (This will also cover race-condition cases in which backends send CSV log data before or after we have the CSV log file open.) This patch also fixes an ancient oversight that, if CSV logging was turned off during a SIGHUP, we never actually closed the last CSV log file. In passing, remember to reset whereToSendOutput = DestNone during syslogger start, since (unlike all other postmaster children) it's forked before the postmaster has done that. This made for a platform-dependent difference in error reporting behavior between the syslogger and other children: except on Windows, it'd report problems to the original postmaster stderr as well as the normal error log file(s). It's barely possible that that was intentional at some point; but it doesn't seem likely to be desirable in production, and the platform dependency definitely isn't desirable. Per report from Alexander Kukushkin. It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAFh8B==iLUD_gqC-dAENS0V+kVrCeGiKujtKqSQ7++S-caaChw@mail.gmail.com	2018-08-26 14:21:55 -04:00
Tom Lane	cc4f6b7786	Clean up assorted misuses of snprintf()'s result value. Fix a small number of places that were testing the result of snprintf() but doing so incorrectly. The right test for buffer overrun, per C99, is "result >= bufsize" not "result > bufsize". Some places were also checking for failure with "result == -1", but the standard only says that a negative value is delivered on failure. (Note that this only makes these places correct if snprintf() delivers C99-compliant results. But at least now these places are consistent with all the other places where we assume that.) Also, make psql_start_test() and isolation_start_test() check for buffer overrun while constructing their shell commands. There seems like a higher risk of overrun, with more severe consequences, here than there is for the individual file paths that are made elsewhere in the same functions, so this seemed like a worthwhile change. Also fix guc.c's do_serialize() to initialize errno = 0 before calling vsnprintf. In principle, this should be unnecessary because vsnprintf should have set errno if it returns a failure indication ... but the other two places this coding pattern is cribbed from don't assume that, so let's be consistent. These errors are all very old, so back-patch as appropriate. I think that only the shell command overrun cases are even theoretically reachable in practice, but there's not much point in erroneous error checks. Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us	2018-08-15 16:29:31 -04:00
Michael Paquier	246a6c8f7b	Make autovacuum more aggressive to remove orphaned temp tables Commit `dafa084`, added in 10, made the removal of temporary orphaned tables more aggressive. This commit makes an extra step into the aggressiveness by adding a flag in each backend's MyProc which tracks down any temporary namespace currently in use. The flag is set when the namespace gets created and can be reset if the temporary namespace has been created in a transaction or sub-transaction which is aborted. The flag value assignment is assumed to be atomic, so this can be done in a lock-less fashion like other flags already present in PGPROC like databaseId or backendId, still the fact that the temporary namespace and table created are still locked until the transaction creating those commits acts as a barrier for other backends. This new flag gets used by autovacuum to discard more aggressively orphaned tables by additionally checking for the database a backend is connected to as well as its temporary namespace in-use, removing orphaned temporary relations even if a backend reuses the same slot as one which created temporary relations in a past session. The base idea of this patch comes from Robert Haas, has been written in its first version by Tsunakawa Takayuki, then heavily reviewed by me. Author: Tsunakawa Takayuki Reviewed-by: Michael Paquier, Kyotaro Horiguchi, Andres Freund Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05 Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI breakages on already released versions.	2018-08-13 11:49:04 +02:00
Heikki Linnakangas	8e19a82640	Don't run atexit callbacks in quickdie signal handlers. exit() is not async-signal safe. Even if the libc implementation is, 3rd party libraries might have installed unsafe atexit() callbacks. After receiving SIGQUIT, we really just want to exit as quickly as possible, so we don't really want to run the atexit() callbacks anyway. The original report by Jimmy Yih was a self-deadlock in startup_die(). However, this patch doesn't address that scenario; the signal handling while waiting for the startup packet is more complicated. But at least this alleviates similar problems in the SIGQUIT handlers, like that reported by Asim R P later in the same thread. Backpatch to 9.3 (all supported versions). Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com	2018-08-08 19:10:32 +03:00
Tom Lane	41db97399d	Fix incorrect initialization of BackendActivityBuffer. Since commit `c8e8b5a6e`, this has been zeroed out using the wrong length. In practice the length would always be too small, leading to not zeroing the whole buffer rather than clobbering additional memory; and that's pretty harmless, both because shmem would likely start out as zeroes and because we'd reinitialize any given entry before use. Still, it's bogus, so fix it. Reported by Petru-Florin Mihancea (bug #15312) Discussion: https://postgr.es/m/153363913073.1303.6518849192351268091@wrigleys.postgresql.org	2018-08-07 16:00:44 -04:00
Tom Lane	3cb646264e	Use a ResourceOwner to track buffer pins in all cases. Historically, we've allowed auxiliary processes to take buffer pins without tracking them in a ResourceOwner. However, that creates problems for error recovery. In particular, we've seen multiple reports of assertion crashes in the startup process when it gets an error while holding a buffer pin, as for example if it gets ENOSPC during a write. In a non-assert build, the process would simply exit without releasing the pin at all. We've gotten away with that so far just because a failure exit of the startup process translates to a database crash anyhow; but any similar behavior in other aux processes could result in stuck pins and subsequent problems in vacuum. To improve this, institute a policy that we must always have a resowner backing any attempt to pin a buffer, which we can enforce just by removing the previous special-case code in resowner.c. Add infrastructure to make it easy to create a process-lifespan AuxProcessResourceOwner and clear out its contents at appropriate times. Replace existing ad-hoc resowner management in bgwriter.c and other aux processes with that. (Thus, while the startup process gains a resowner where it had none at all before, some other aux process types are replacing an ad-hoc resowner with this code.) Also use the AuxProcessResourceOwner to manage buffer pins taken during StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap process or a standalone backend rather than a true auxiliary process. In passing, remove some other ad-hoc resource owner creations that had gotten cargo-culted into various other places. As far as I can tell that was all unnecessary, and if it had been necessary it was incomplete, due to lacking any provision for clearing those resowners later. (Also worth noting in this connection is that a process that hasn't called InitBufferPoolBackend has no business accessing buffers; so there's more to do than just add the resowner if we want to touch buffers in processes not covered by this patch.) Although this fixes a very old bug, no back-patch, because there's no evidence of any significant problem in non-assert builds. Patch by me, pursuant to a report from Justin Pryzby. Thanks to Robert Haas and Kyotaro Horiguchi for reviews. Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com	2018-07-18 12:15:16 -04:00
Michael Paquier	edc6b41bd4	Rename VACOPT_NOWAIT to VACOPT_SKIP_LOCKED When it comes to SELECT ... FOR or LOCK, NOWAIT means to not wait for something to happen, and issue an error. SKIP LOCKED means to not wait for something to happen but to move on without issuing an error. The internal option of autovacuum and autoanalyze mentioned above, used only when wraparound is not involved was named NOWAIT, but behaves like SKIP LOCKED which is confusing. Author: Nathan Bossart Discussion: https://postgr.es/m/20180307050345.GA3095@paquier.xyz	2018-07-12 14:28:28 +09:00
Michael Paquier	c55de5e512	Add wait event for fsync of WAL segments This has been visibly a forgotten spot in the first implementation of wait events for I/O added by `249cf07`, and what has been missing is a fsync call for WAL segments which is a wrapper reacting on the value of GUC wal_sync_method. Reported-by: Konstantin Knizhnik Author: Konstantin Knizhnik Reviewed-by: Craig Ringer, Michael Paquier Discussion: https://postgr.es/m/4a243897-0ad8-f471-aa40-242591f2476e@postgrespro.ru	2018-07-02 22:19:46 +09:00
Tom Lane	a7a7387575	Further improve code for probing the availability of ARM CRC instructions. Andrew Gierth pointed out that commit `1c72ec6f4` would yield the wrong answer on big-endian ARM systems, because the data being CRC'd would be different. To fix that, and avoid the rather unsightly hard-wired constant, simply compare the hardware and software implementations' results. While we're at it, also log the resulting decision at DEBUG1, and error out if the hw and sw results unexpectedly differ. Also, since this file must compile for both frontend and backend, avoid incorrect dependencies on backend-only headers. In passing, add a comment to postmaster.c about when the CRC function pointer will get initialized. Thomas Munro, based on complaints from Andrew Gierth and Tom Lane Discussion: https://postgr.es/m/HE1PR0801MB1323D171938EABC04FFE7FA9E3110@HE1PR0801MB1323.eurprd08.prod.outlook.com	2018-05-03 11:32:57 -04:00
Tom Lane	9cb7db3f0c	In AtEOXact_Files, complain if any files remain unclosed at commit. This change makes this module act more like most of our other low-level resource management modules. It's a caller error if something is not explicitly closed by the end of a successful transaction, so issue a WARNING about it. This would not actually have caught the file leak bug fixed in commit `231bcd080`, because that was in a transaction-abort path; but it still seems like a good, and pretty cheap, cross-check. Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org	2018-04-28 17:45:02 -04:00
Heikki Linnakangas	811969b218	Allocate enough shared string memory for stats of auxiliary processes. This fixes a bug whereby the st_appname, st_clienthostname, and st_activity_raw fields for auxiliary processes point beyond the end of their respective shared memory segments. As a result, the application_name of a backend might show up as the client hostname of an auxiliary process. Backpatch to v10, where this bug was introduced, when the auxiliary processes were added to the array. Author: Edmund Horner Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com	2018-04-11 23:39:49 +03:00
Heikki Linnakangas	a820b4c329	Make local copy of client hostnames in backend status array. The other strings, application_name and query string, were snapshotted to local memory in pgstat_read_current_status(), but we forgot to do that for client hostnames. As a result, the client hostname would appear to change in the local copy, if the client disconnected. Backpatch to all supported versions. Author: Edmund Horner Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com	2018-04-11 23:39:48 +03:00
Magnus Hagander	a228cc13ae	Revert "Allow on-line enabling and disabling of data checksums" This reverts the backend sides of commit `1fde38beaa`. I have, at least for now, left the pg_verify_checksums tool in place, as this tool can be very valuable without the rest of the patch as well, and since it's a read-only tool that only runs when the cluster is down it should be a lot safer.	2018-04-09 19:03:42 +02:00
Stephen Frost	2b74022473	Fix EXEC BACKEND + Windows builds for group privs Under EXEC BACKEND we also need to be going through the group privileges setup since we do support that on Unixy systems, so add that to SubPostmasterMain(). Under Windows, we need to simply return true from GetDataDirectoryCreatePerm(), but that wasn't happening due to a missing #else clause. Per buildfarm.	2018-04-07 19:01:43 -04:00
Stephen Frost	c37b3d08ca	Allow group access on PGDATA Allow the cluster to be optionally init'd with read access for the group. This means a relatively non-privileged user can perform a backup of the cluster without requiring write privileges, which enhances security. The mode of PGDATA is used to determine whether group permissions are enabled for directory and file creates. This method was chosen as it's simple and works well for the various utilities that write into PGDATA. Changing the mode of PGDATA manually will not automatically change the mode of all the files contained therein. If the user would like to enable group access on an existing cluster then changing the mode of all the existing files will be required. Note that pg_upgrade will automatically change the mode of all migrated files if the new cluster is init'd with the -g option. Tests are included for the backend and all the utilities which operate on the PG data directory to ensure that the correct mode is set based on the data directory permissions. Author: David Steele <david@pgmasters.net> Reviewed-By: Michael Paquier, with discussion amongst many others. Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net	2018-04-07 17:45:39 -04:00
Stephen Frost	da9b580d89	Refactor dir/file permissions Consolidate directory and file create permissions for tools which work with the PG data directory by adding a new module (common/file_perm.c) that contains variables (pg_file_create_mode, pg_dir_create_mode) and constants to initialize them (0600 for files and 0700 for directories). Convert mkdir() calls in the backend to MakePGDirectory() if the original call used default permissions (always the case for regular PG directories). Add tests to make sure permissions in PGDATA are set correctly by the tools which modify the PG data directory. Authors: David Steele <david@pgmasters.net>, Adam Brightwell <adam.brightwell@crunchydata.com> Reviewed-By: Michael Paquier, with discussion amongst many others. Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net	2018-04-07 17:45:39 -04:00
Magnus Hagander	1fde38beaa	Allow on-line enabling and disabling of data checksums This makes it possible to turn checksums on in a live cluster, without the previous need for dump/reload or logical replication (and to turn it off). Enabling checkusm starts a background process in the form of a launcher/worker combination that goes through the entire database and recalculates checksums on each and every page. Only when all pages have been checksummed are they fully enabled in the cluster. Any failure of the process will revert to checksums off and the process has to be started. This adds a new WAL record that indicates the state of checksums, so the process works across replicated clusters. Authors: Magnus Hagander and Daniel Gustafsson Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin	2018-04-05 22:04:48 +02:00
Magnus Hagander	eed1ce72e1	Allow background workers to bypass datallowconn THis adds a "flags" field to the BackgroundWorkerInitializeConnection() and BackgroundWorkerInitializeConnectionByOid(). For now only one flag, BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore datallowconn.	2018-04-05 19:02:45 +02:00
Alvaro Herrera	484a4a08ab	Log when a BRIN autosummarization request fails Autovacuum's 'workitem' request queue is of limited size, so requests can fail if they arrive more quickly than autovacuum can process them. Emit a log message when this happens, to provide better visibility of this. Backpatch to 10. While this represents an API change for AutoVacuumRequestWork, that function is not yet prepared to deal with external modules calling it, so there doesn't seem to be any risk (other than log spam, that is.) Author: Masahiko Sawada Reviewed-by: Fabrízio Mello, Ildar Musin, Álvaro Herrera Discussion: https://postgr.es/m/CAD21AoB1HrQhp6_4rTyHN5kWEJCEsG8YzsjZNt-ctoXSn5Uisw@mail.gmail.com	2018-03-14 11:59:40 -03:00
Tom Lane	38f7831d70	Avoid holding AutovacuumScheduleLock while rechecking table statistics. In databases with many tables, re-fetching the statistics takes some time, so that this behavior seriously decreases the available concurrency for multiple autovac workers. There's discussion afoot about more complete fixes, but a simple and back-patchable amelioration is to claim the table and release the lock before rechecking stats. If we find out there's no longer a reason to process the table, re-taking the lock to un-claim the table is cheap enough. (This patch is quite old, but got lost amongst a discussion of more aggressive fixes. It's not clear when or if such a fix will be accepted, but in any case it'd be unlikely to get back-patched. Let's do this now so we have some improvement for the back branches.) In passing, make the normal un-claim step take AutovacuumScheduleLock not AutovacuumLock, since that is what is documented to protect the wi_tableoid field. This wasn't an actual bug in view of the fact that readers of that field hold both locks, but it creates some concurrency penalty against operations that need only AutovacuumLock. Back-patch to all supported versions. Jeff Janes Discussion: https://postgr.es/m/26118.1520865816@sss.pgh.pa.us	2018-03-13 12:28:35 -04:00
Tom Lane	4e0c743c18	Fix cross-checking of ReservedBackends/max_wal_senders/MaxConnections. We were independently checking ReservedBackends < MaxConnections and max_wal_senders < MaxConnections, but because walsenders aren't allowed to use superuser-reserved connections, that's really the wrong thing. Correct behavior is to insist on ReservedBackends + max_wal_senders being less than MaxConnections. Fix the code and associated documentation. This has been wrong for a long time, but since the situation probably hardly ever arises in the field (especially pre-v10, when the default for max_wal_senders was zero), no back-patch. Discussion: https://postgr.es/m/28271.1520195491@sss.pgh.pa.us	2018-03-08 11:25:26 -05:00
Noah Misch	582edc369c	Empty search_path in Autovacuum and non-psql/pgbench clients. This makes the client programs behave as documented regardless of the connect-time search_path and regardless of user-created objects. Today, a malicious user with CREATE permission on a search_path schema can take control of certain of these clients' queries and invoke arbitrary SQL functions under the client identity, often a superuser. This is exploitable in the default configuration, where all users have CREATE privilege on schema "public". This changes behavior of user-defined code stored in the database, like pg_index.indexprs and pg_extension_config_dump(). If they reach code bearing unqualified names, "does not exist" or "no schema has been selected to create in" errors might appear. Users may fix such errors by schema-qualifying affected names. After upgrading, consider watching server logs for these errors. The --table arguments of src/bin/scripts clients have been lax; for example, "vacuumdb -Zt pg_am\;CHECKPOINT" performed a checkpoint. That now fails, but for now, "vacuumdb -Zt 'pg_am(amname);CHECKPOINT'" still performs a checkpoint. Back-patch to 9.3 (all supported versions). Reviewed by Tom Lane, though this fix strategy was not his first choice. Reported by Arseniy Sharoglazov. Security: CVE-2018-1058	2018-02-26 07:39:44 -08:00
Robert Haas	9da0cc3528	Support parallel btree index builds. To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience. Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature. Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me. Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com	2018-02-02 13:32:44 -05:00
Peter Eisentraut	c1869542b3	Use abstracted SSL API in server connection log messages The existing "connection authorized" server log messages used OpenSSL API calls directly, even though similar abstracted API calls exist. Change to use the latter instead. Change the function prototype for the functions that return the TLS version and the cipher to return const char * directly instead of copying into a buffer. That makes them slightly easier to use. Add bits= to the message. psql shows that, so we might as well show the same information on the client and server. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-01-26 09:50:46 -05:00
Alvaro Herrera	95be5ce1bc	Remove unnecessary include autovacuum.c no longer needs dsa.h, since commit `31ae1638ce`. Author: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoCWvYyXrvdANSHWWWEWJH5TeAWAkJ_2gqrHhukG+OBo1g@mail.gmail.com	2018-01-23 15:22:13 -03:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Andres Freund	1804284042	Add parallel-aware hash joins. Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel Hash Join with Parallel Hash. While hash joins could already appear in parallel queries, they were previously always parallel-oblivious and had a partial subplan only on the outer side, meaning that the work of the inner subplan was duplicated in every worker. After this commit, the planner will consider using a partial subplan on the inner side too, using the Parallel Hash node to divide the work over the available CPU cores and combine its results in shared memory. If the join needs to be split into multiple batches in order to respect work_mem, then workers process different batches as much as possible and then work together on the remaining batches. The advantages of a parallel-aware hash join over a parallel-oblivious hash join used in a parallel query are that it: * avoids wasting memory on duplicated hash tables * avoids wasting disk space on duplicated batch files * divides the work of building the hash table over the CPUs One disadvantage is that there is some communication between the participating CPUs which might outweigh the benefits of parallelism in the case of small hash tables. This is avoided by the planner's existing reluctance to supply partial plans for small scans, but it may be necessary to estimate synchronization costs in future if that situation changes. Another is that outer batch 0 must be written to disk if multiple batches are required. A potential future advantage of parallel-aware hash joins is that right and full outer joins could be supported, since there is a single set of matched bits for each hashtable, but that is not yet implemented. A new GUC enable_parallel_hash is defined to control the feature, defaulting to on. Author: Thomas Munro Reviewed-By: Andres Freund, Robert Haas Tested-By: Rafia Sabih, Prabhat Sahu Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com	2017-12-21 00:43:41 -08:00
Robert Haas	28724fd90d	Report failure to start a background worker. When a worker is flagged as BGW_NEVER_RESTART and we fail to start it, or if it is not marked BGW_NEVER_RESTART but is terminated before startup succeeds, what BgwHandleStatus should be reported? The previous code really hadn't considered this possibility (as indicated by the comments which ignore it completely) and would typically return BGWH_NOT_YET_STARTED, but that's not a good answer, because then there's no way for code using GetBackgroundWorkerPid() to tell the difference between a worker that has not started but will start later and a worker that has not started and will never be started. So, when this case happens, return BGWH_STOPPED instead. Update the comments to reflect this. The preceding fix by itself is insufficient to fix the problem, because the old code also didn't send a notification to the process identified in bgw_notify_pid when startup failed. That might've been technically correct under the theory that the status of the worker was BGWH_NOT_YET_STARTED, because the status would indeed not change when the worker failed to start, but now that we're more usefully reporting BGWH_STOPPED, a notification is needed. Without these fixes, code which starts background workers and then uses the recommended APIs to wait for those background workers to start would hang indefinitely if the postmaster failed to fork a worker. Amit Kapila and Robert Haas Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com	2017-12-06 08:58:27 -05:00
Tom Lane	2069e6faa0	Clean up assorted messiness around AllocateDir() usage. This patch fixes a couple of low-probability bugs that could lead to reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE) concerning directory-open or file-open failures. It also fixes places where we took shortcuts in reporting such errors, either by using elog instead of ereport or by using ereport but forgetting to specify an errcode. And it eliminates a lot of just plain redundant error-handling code. In service of all this, export fd.c's formerly-static function ReadDirExtended, so that external callers can make use of the coding pattern dir = AllocateDir(path); while ((de = ReadDirExtended(dir, path, LOG)) != NULL) if they'd like to treat directory-open failures as mere LOG conditions rather than errors. Also fix FreeDir to be a no-op if we reach it with dir == NULL, as such a coding pattern would cause. Then, remove code at many call sites that was throwing an error or log message for AllocateDir failure, as ReadDir or ReadDirExtended can handle that job just fine. Aside from being a net code savings, this gets rid of a lot of not-quite-up-to-snuff reports, as mentioned above. (In some places these changes result in replacing a custom error message such as "could not open tablespace directory" with more generic wording "could not open directory", but it was agreed that the custom wording buys little as long as we report the directory name.) In some other call sites where we can't just remove code, change the error reports to be fully project-style-compliant. Also reorder code in restoreTwoPhaseData that was acquiring a lock between AllocateDir and ReadDir; in the unlikely but surely not impossible case that LWLockAcquire changes errno, AllocateDir failures would be misreported. There is no great value in opening the directory before acquiring TwoPhaseStateLock, so just do it in the other order. Also fix CheckXLogRemoved to guarantee that it preserves errno, as quite a number of call sites are implicitly assuming. (Again, it's unlikely but I think not impossible that errno could change during a SpinLockAcquire. If so, this function was broken for its own purposes as well as breaking callers.) And change a few places that were using not-per-project-style messages, such as "could not read directory" when "could not open directory" is more correct. Back-patch the exporting of ReadDirExtended, in case we have occasion to back-patch some fix that makes use of it; it's not needed right now but surely making it global is pretty harmless. Also back-patch the restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is essentially cosmetic and need not get back-patched. Michael Paquier, with a bit of additional work by me Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com	2017-12-04 17:02:56 -05:00
Robert Haas	eaedf0df71	Update typedefs.list and re-run pgindent Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com	2017-11-29 09:24:24 -05:00
Robert Haas	ae65f6066d	Provide for forward compatibility with future minor protocol versions. Previously, any attempt to request a 3.x protocol version other than 3.0 would lead to a hard connection failure, which made the minor protocol version really no different from the major protocol version and precluded gentle protocol version breaks. Instead, when the client requests a 3.x protocol version where x is greater than 0, send the new NegotiateProtocolVersion message to convey that we support only 3.0. This makes it possible to introduce new minor protocol versions without requiring a connection retry when the server is older. In addition, if the startup packet includes name/value pairs where the name starts with "_pq_.", assume that those are protocol options, not GUCs. Include those we don't support (i.e. all of them, at present) in the NegotiateProtocolVersion message so that the client knows they were not understood. This makes it possible for the client to request previously-unsupported features without bumping the protocol version at all; the client can tell from the server's response whether the option was understood. It will take some time before servers that support these new facilities become common in the wild; to speed things up and make things easier for a future 3.1 protocol version, back-patch to all supported releases. Robert Haas and Badrul Chowdhury Discussion: http://postgr.es/m/BN6PR21MB0772FFA0CBD298B76017744CD1730@BN6PR21MB0772.namprd21.prod.outlook.com Discussion: http://postgr.es/m/30788.1498672033@sss.pgh.pa.us	2017-11-21 13:56:24 -05:00
Peter Eisentraut	0e1539ba0d	Add some const decorations to prototypes Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2017-11-10 13:38:57 -05:00
Peter Eisentraut	2eb4a831e5	Change TRUE/FALSE to true/false The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-11-08 11:37:28 -05:00
Alvaro Herrera	be72b9c378	Fix autovacuum work item error handling In autovacuum's "work item" processing, a few strings were allocated in the current transaction's memory context, which goes away during error handling; if an error happened during execution of the work item, the pfree() calls to clean up afterwards would try to release already-released memory, possibly leading to a crash. In branch master, this was already fixed by commit `335f3d04e4`, so backpatch that to REL_10_STABLE to fix the problem there too. As a secondary problem, verify that the autovacuum worker is connected to the right database for each work item; otherwise some items would be discarded by workers in other databases. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20171014035732.GB31726@telsasoft.com	2017-10-30 15:52:02 +01:00
Tom Lane	11d8d72c27	Allow multiple tables to be specified in one VACUUM or ANALYZE command. Not much to say about this; does what it says on the tin. However, formerly, if there was a column list then the ANALYZE action was implied; now it must be specified, or you get an error. This is because it would otherwise be a bit unclear what the user meant if some tables have column lists and some don't. Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some editorialization by me Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com	2017-10-03 18:53:44 -04:00
Andres Freund	0ba99c84e8	Replace most usages of ntoh[ls] and hton[sl] with pg_bswap.h. All postgres internal usages are replaced, it's just libpq example usages that haven't been converted. External users of libpq can't generally rely on including postgres internal headers. Note that this includes replacing open-coded byte swapping of 64bit integers (using two 32 bit swaps) with a single 64bit swap. Where it looked applicable, I have removed netinet/in.h and arpa/inet.h usage, which previously provided the relevant functionality. It's perfectly possible that I missed other reasons for including those, the buildfarm will tell. Author: Andres Freund Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de	2017-10-01 15:36:14 -07:00
Peter Eisentraut	5373bc2a08	Add background worker type Add bgw_type field to background worker structure. It is intended to be set to the same value for all workers of the same type, so they can be grouped in pg_stat_activity, for example. The backend_type column in pg_stat_activity now shows bgw_type for a background worker. The ps listing also no longer calls out that a process is a background worker but just show the bgw_type. That way, being a background worker is more of an implementation detail now that is not shown to the user. However, most log messages still refer to 'background worker "%s"'; otherwise constructing sensible and translatable log messages would become tricky. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2017-09-29 11:08:24 -04:00
Tom Lane	335f3d04e4	Improve memory management in autovacuum.c. Invoke vacuum(), as well as "work item" processing, in the PortalContext that do_autovacuum() has manufactured, which will be reset before each such invocation. This ensures cleanup of any memory leaked by these operations. It also avoids the rather dangerous practice of calling vacuum() in a context that vacuum() itself will destroy while it runs. There's no known live bug there, but it's not hard to imagine introducing one if we leave it like this. Tom Lane, reviewed by Michael Paquier and Alvaro Herrera Discussion: https://postgr.es/m/13849.1506114543@sss.pgh.pa.us	2017-09-23 13:28:16 -04:00
Peter Eisentraut	be87b70b61	Sync process names between ps and pg_stat_activity Remove gratuitous differences in the process names shown in pg_stat_activity.backend_type and the ps output. Reviewed-by: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>	2017-09-20 08:59:03 -04:00
Andres Freund	fc49e24fa6	Make WAL segment size configurable at initdb time. For performance reasons a larger segment size than the default 16MB can be useful. A larger segment size has two main benefits: Firstly, in setups using archiving, it makes it easier to write scripts that can keep up with higher amounts of WAL, secondly, the WAL has to be written and synced to disk less frequently. But at the same time large segment size are disadvantageous for smaller databases. So far the segment size had to be configured at compile time, often making it unrealistic to choose one fitting to a particularly load. Therefore change it to a initdb time setting. This includes a breaking changes to the xlogreader.h API, which now requires the current segment size to be configured. For that and similar reasons a number of binaries had to be taught how to recognize the current segment size. Author: Beena Emerson, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com	2017-09-19 22:03:48 -07:00
Andres Freund	896537f078	s/NULL byte/NUL byte/ in comment refering to C string terminator. Reported-By: Robert Haas Discussion: https://postgr.es/m/CA+Tgmoa+YBvWgFST2NVoeXjVSohEpK=vqnVCsoCkhTVVxfLcVQ@mail.gmail.com	2017-09-19 16:41:07 -07:00
Andres Freund	71edbb6f66	Avoid use of non-portable strnlen() in pgstat_clip_activity(). The use of strnlen rather than strlen was just paranoia. Instead of giving up on the paranoia, just implement the safeguard differently. And add a comment explaining why we're careful. Author: Andres Freund Discussion: https://postgr.es/m/E1duOkJ-0001Mc-U5@gemulon.postgresql.org	2017-09-19 14:25:47 -07:00
Andres Freund	54b6cd589a	Speedup pgstat_report_activity by moving mb-aware truncation to read side. Previously multi-byte aware truncation was done on every pgstat_report_activity() call - proving to be a bottleneck for workloads with long query strings that execute quickly. Instead move the truncation to the read side, which commonly is executed far less frequently. That's possible because all server encodings allow to determine the length of a multi-byte string from the first byte. Rename PgBackendStatus.st_activity to st_activity_raw so existing extension users of the field break - their code has to be adjusted to use pgstat_clip_activity(). Author: Andres Freund Tested-By: Khuntal Ghosh Reviewed-By: Robert Haas, Tom Lane Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkviecpz@alap3.anarazel.de	2017-09-19 12:51:14 -07:00
Andres Freund	ec9e05b3c3	Fix crash restart bug introduced in `8356753c21`. The bug was caused by not re-reading the control file during crash recovery restarts, which lead to an attempt to pfree() shared memory contents. The fix is to re-read the control file, which seems good anyway. It's unclear as of this moment, whether we want to keep the refactoring introduced in the commit referenced above, or come up with an alternative approach. But fixing the bug in the mean time seems like a good idea regardless. A followup commit will introduce regression test coverage for crash restarts. Reported-By: Tom Lane Discussion: https://postgr.es/m/14134.1505572349@sss.pgh.pa.us	2017-09-18 17:25:49 -07:00
Andres Freund	8356753c21	Perform only one ReadControlFile() during startup. Previously we read the control file in multiple places. But soon the segment size will be configurable and stored in the control file, and that needs to be available earlier than it currently is needed. Instead of adding yet another place where it's read, refactor things so there's a single processing of the control file during startup (in EXEC_BACKEND that's every individual backend's startup). Author: Andres Freund Discussion: http://postgr.es/m/20170913092828.aozd3gvvmw67gmyc@alap3.anarazel.de	2017-09-14 14:14:34 -07:00
Robert Haas	baaf272ac9	Use group updates when setting transaction status in clog. Commit `0e141c0fbb` introduced a mechanism to reduce contention on ProcArrayLock by having a single process clear XIDs in the procArray on behalf of multiple processes, reducing the need to hand the lock around. A previous attempt to introduce a similar mechanism for CLogControlLock in `ccce90b398` crashed and burned, but the design problem which resulted in those failures is believed to have been corrected in this version. Amit Kapila, with some cosmetic changes by me. See the previous commit message for additional credits. Discussion: http://postgr.es/m/CAA4eK1KudxzgWhuywY_X=yeSAhJMT4DwCjroV5Ay60xaeB2Eew@mail.gmail.com	2017-09-01 11:45:40 -04:00
Alvaro Herrera	31ae1638ce	Simplify autovacuum work-item implementation The initial implementation of autovacuum work-items used a dynamic shared memory area (DSA). However, it's argued that dynamic shared memory is not portable enough, so we cannot rely on it being supported everywhere; at the same time, autovacuum work-items are now a critical part of the server, so it's not acceptable that they don't work in the cases where dynamic shared memory is disabled. Therefore, let's fall back to a simpler implementation of work-items that just uses autovacuum's main shared memory segment for storage. Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com	2017-08-15 18:14:07 -03:00
Alvaro Herrera	d9a622cee1	Fix error handling path in autovacuum launcher The original code (since `00e6a16d01`) was assuming aborting the transaction in autovacuum launcher was sufficient to release all resources, but in reality the launcher runs quite a lot of code out of any transactions. Re-introduce individual cleanup calls to make abort more robust. Reported-by: Robert Haas Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com	2017-08-15 13:35:12 -03:00
Alvaro Herrera	b2c95a3798	Fix replication origin-related race conditions Similar to what was fixed in commit `9915de6c1c` for replication slots, but this time it's related to replication origins: DROP SUBSCRIPTION attempts to drop the replication origin, but that fails if the replication worker process hasn't yet marked it unused. This causes failures in the buildfarm: ERROR: could not drop replication origin with OID 1, in use by PID 34069 Like the aforementioned commit, fix by having the process running DROP SUBSCRIPTION sleep until the worker marks the the replication origin struct as free. This uses a condition variable on each replication origin shmem state struct, so that the session trying to drop can sleep and expect to be awakened by the process keeping the origin open. Also fix a SGML markup in the previous commit. Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql	2017-08-08 16:07:46 -04:00
Alvaro Herrera	030273b7ea	Fix inadequacies in recently added wait events In commit `9915de6c1c`, we introduced a new wait point for replication slots and incorrectly labelled it as wait event PG_WAIT_LOCK. That's wrong, so invent an appropriate new wait event instead, and document it properly. While at it, fix numerous other problems in the vicinity: - two different walreceiver wait events were being mixed up in a single wait event (which wasn't documented either); split it out so that they can be distinguished, and document the new events properly. - ParallelBitmapPopulate was documented but didn't exist. - ParallelBitmapScan was not documented (I think this should be called "ParallelBitmapScanInit" instead.) - Logical replication wait events weren't documented - various symbols had been added in dartboard order in various places. Put them in alphabetical order instead, as was originally intended. Discussion: https://postgr.es/m/20170808181131.mu4fjepuh5m75cyq@alvherre.pgsql	2017-08-08 15:37:44 -04:00
Tom Lane	45e004fb4e	On Windows, retry process creation if we fail to reserve shared memory. We've heard occasional reports of backend launch failing because pgwin32_ReserveSharedMemoryRegion() fails, indicating that something has already used that address space in the child process. It's not very clear what, given that we disable ASLR in Windows builds, but suspicion falls on antivirus products. It'd be better if we didn't have to disable ASLR, anyway. So let's try to ameliorate the problem by retrying the process launch after such a failure, up to 100 times. Patch by me, based on previous work by Amit Kapila and others. This is a longstanding issue, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAA4eK1+R6hSx6t_yvwtx+NRzneVp+MRqXAdGJZChcau8Uij-8g@mail.gmail.com	2017-07-10 11:00:09 -04:00
Tom Lane	f13ea95f9e	Change pg_ctl to detect server-ready by watching status in postmaster.pid. Traditionally, "pg_ctl start -w" has waited for the server to become ready to accept connections by attempting a connection once per second. That has the major problem that connection issues (for instance, a kernel packet filter blocking traffic) can't be reliably told apart from server startup issues, and the minor problem that if server startup isn't quick, we accumulate "the database system is starting up" spam in the server log. We've hacked around many of the possible connection issues, but it resulted in ugly and complicated code in pg_ctl.c. In commit `c61559ec3`, I changed the probe rate to every tenth of a second. That prompted Jeff Janes to complain that the log-spam problem had become much worse. In the ensuing discussion, Andres Freund pointed out that we could dispense with connection attempts altogether if the postmaster were changed to report its status in postmaster.pid, which "pg_ctl start" already relies on being able to read. This patch implements that, teaching postmaster.c to report a status string into the pidfile at the same state-change points already identified as being of interest for systemd status reporting (cf commit `7d17e683f`). pg_ctl no longer needs to link with libpq at all; all its functions now depend on reading server files. In support of this, teach AddToDataDirLockFile() to allow addition of postmaster.pid lines in not-necessarily-sequential order. This is needed on Windows where the SHMEM_KEY line will never be written at all. We still have the restriction that we don't want to truncate the pidfile; document the reasons for that a bit better. Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode is broken --- before, they'd just wait out the sixty seconds until the loop gives up, and then report success anyway. (Yes, I found that out the hard way.) While at it, arrange for pg_ctl to not need to #include miscadmin.h; as a rather low-level backend header, requiring that to be compilable client-side is pretty dubious. This requires moving the #define's associated with the pidfile into a new header file, and moving PG_BACKEND_VERSIONSTR someplace else. For lack of a clearly better "someplace else", I put it into port.h, beside the declaration of find_other_exec(), since most users of that macro are passing the value to find_other_exec(). (initdb still depends on miscadmin.h, but at least pg_ctl and pg_upgrade no longer do.) In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the output of "postgres -V", which remarkably it had never done before. Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com	2017-06-28 17:31:32 -04:00
Tom Lane	e5d494d78c	Don't lose walreceiver start requests due to race condition in postmaster. When a walreceiver dies, the startup process will notice that and send a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new walreceiver to be launched. There's a race condition, which at least in HEAD is very easy to hit, whereby the postmaster might see that signal before it processes the SIGCHLD from the walreceiver process. In that situation, sigusr1_handler() just dropped the start request on the floor, reasoning that it must be redundant. Eventually, after 10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a fresh request --- but that's a long time if the connection could have been re-established almost immediately. Fix it by setting a state flag inside the postmaster that we won't clear until we do launch a walreceiver. In cases where that results in an extra walreceiver launch, it's up to the walreceiver to realize it's unwanted and go away --- but we have, and need, that logic anyway for the opposite race case. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there in test cases where a master server is stopped and restarted while leaving streaming slaves active. This logic has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us	2017-06-26 17:31:56 -04:00
Tom Lane	ad1b5c842b	Ignore old stats file timestamps when starting the stats collector. The stats collector disregards inquiry messages that bear a cutoff_time before when it last wrote the relevant stats file. That's fine, but at startup when it reads the "permanent" stats files, it absorbed their timestamps as if they were the times at which the corresponding temporary stats files had been written. In reality, of course, there's no data out there at all. This led to disregarding inquiry messages soon after startup if the postmaster had been shut down and restarted within less than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for testing and in the field. Requesting backends would hang for 10 seconds and then report failure to read statistics, unless they got bailed out by some other backend coming along and making a newer request within that interval. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there because the autovacuum launcher hangs for 10 seconds when it can't get statistics at startup, thus preventing a second shutdown from occurring promptly. We might want to do some things in the autovac code to make it less prone to getting stuck that way, but this change is a good bug fix regardless. In passing, also fix pgstat_read_statsfiles() to ensure that it re-zeroes its global stats variables if they are corrupted by a short read from the stats file. (Other reads in that function go into temp variables, so that the issue doesn't arise.) This has been broken since we created the separation between permanent and temporary stats files in 8.4, so back-patch to all supported branches. Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us	2017-06-26 16:17:05 -04:00
Alvaro Herrera	a4f06606a3	Fix autovacuum launcher attachment to its DSA The autovacuum launcher doesn't actually do anything with its DSA other than creating it and attaching to it, but it's been observed that after longjmp'ing to the standard error handling block (for example after getting SIGINT) the autovacuum enters an infinite loop reporting that it cannot attach to its DSA anymore (which is correct, because it's already attached to it.) Fix by only attempting to attach if not already attached. I introduced this bug together with BRIN autosummarization in `7526e10224`. Reported-by: Yugo Nagata. Author: Thomas Munro. I added the comment to go with it. Discussion: https://postgr.es/m/20170621211538.0c9eae73.nagata@sraoss.co.jp	2017-06-22 13:50:26 -04:00
Tom Lane	382ceffdf7	Phase 3 of pgindent updates. Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:35:54 -04:00
Tom Lane	c7b8998ebb	Phase 2 of pgindent updates. Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit `e3860ffa4d` wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:19:25 -04:00
Tom Lane	e3860ffa4d	Initial pgindent run with pg_bsd_indent version 2.0. The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname varname", as well as some similar cases for const- or volatile-qualified pointers. Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 14:39:04 -04:00
Andres Freund	9206ced1dc	Clean up latch related code. The larger part of this patch replaces usages of MyProc->procLatch with MyLatch. The latter works even early during backend startup, where MyProc->procLatch doesn't yet. While the affected code shouldn't run in cases where it's not initialized, it might get copied into places where it might. Using MyLatch is simpler and a bit faster to boot, so there's little point to stick with the previous coding. While doing so I noticed some weaknesses around newly introduced uses of latches that could lead to missed events, and an omitted CHECK_FOR_INTERRUPTS() call in worker_spi. As all the actual bugs are in v10 code, there doesn't seem to be sufficient reason to backpatch this. Author: Andres Freund Discussion: https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de Backpatch: -	2017-06-06 16:13:00 -07:00
Andres Freund	703f148e98	Revert "Prevent panic during shutdown checkpoint" This reverts commit `086221cf6b`, which was made to master only. The approach implemented in the above commit has some issues. While those could easily be fixed incrementally, doing so would make backpatching considerably harder, so instead first revert this patch. Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de	2017-06-05 19:18:15 -07:00
Bruce Momjian	a6fd7b7a5f	Post-PG 10 beta1 pgindent run perltidy run not included.	2017-05-17 16:31:56 -04:00
Tom Lane	8b0b6303e9	Try to ensure that stats collector's receive buffer size is at least 100KB. Since commit `4e37b3e15`, buildfarm member frogmouth has been failing occasionally with symptoms indicating that some expected stats data is getting dropped. The reason that that commit changed the behavior seems probably to be that more data is getting shoved at the collector in a short span of time. In current sources, the stats test's first session sends about 9KB of data while exiting, which is probably the same as what was sent just before wait_for_stats() in the previous test design. But now, the test's second session is starting up concurrently, and it sends another 2KB (presumably reflecting its initial catalog accesses). Since frogmouth is running on Windows XP, which reputedly has a default socket receive buffer size of only 8KB, it is not very surprising if this has put us over the threshold where the receive buffer can overflow and drop messages. The same mechanism could very easily explain the intermittent stats test failures we've been seeing for years, since background processes such as the bgwriter will sometimes send data concurrently with all this, and could thus cause occasional buffer overflows. Hence, insert some code into pgstat_init() to increase the stats socket's receive buffer size to 100KB if it's less than that. (On failure, emit a LOG message, but keep going.) Modern systems seem to have default sizes in the range of 100KB-250KB, but older platforms don't. I couldn't find any platforms that wouldn't accept 100KB, so in theory this won't cause any portability problems. If this is successful at reducing the buildfarm failure rate in HEAD, we should back-patch it, because it's certain that similar buffer overflows happen in the field on platforms with small buffer sizes. Going forward, there might be an argument for trying to increase the buffer size even more, but let's take a baby step first. Discussion: https://postgr.es/m/22173.1494788088@sss.pgh.pa.us	2017-05-16 15:24:52 -04:00
Tom Lane	5d00b764cd	Make pgstat tabstat lookup hash table less fragile. Code review for commit `090010f2e`. Fix cases where an elog(ERROR) partway through a function would leave the persistent data structures in a corrupt state. pgstat_report_stat got this wrong by invalidating PgStat_TableEntry structs before removing hashtable entries pointing to them, and get_tabstat_entry got it wrong by ignoring the possibility of palloc failure after it had already created a hashtable entry. Also, avoid leaking a memory context per transaction, which the previous code did through misunderstanding hash_create's API. We do not need to create a context to hold the hash table; hash_create will do that. (The leak wasn't that large, amounting to only a memory context header per iteration, but it's still surprising that nobody noticed it yet.)	2017-05-14 22:52:49 -04:00
Peter Eisentraut	c1a7f64b4a	Replace "transaction log" with "write-ahead log" This makes documentation and error messages match the renaming of "xlog" to "wal" in APIs and file naming.	2017-05-12 11:52:43 -04:00
Andres Freund	e6c44eef55	Fix off-by-one possibly leading to skipped XLOG_RUNNING_XACTS records. Since `6ef2eba3f5` ("Skip checkpoints, archiving on idle systems."), GetLastImportantRecPtr() is used to avoid performing superfluous checkpoints, xlog switches, running-xact records when the system is idle. Unfortunately the check concerning running-xact records had a off-by-one error, leading to such records being potentially skipped when only a single record has been inserted since the last running-xact record. An alternative approach would have been to change GetLastImportantRecPtr()'s definition to point to the end of records, but that would make the checkpoint code more complicated. Author: Andres Freund Discussion: https://postgr.es/m/20170505012447.wsrympaxnfis6ojt@alap3.anarazel.de Backpatch: no, code only present in master	2017-05-06 16:55:07 -07:00
Peter Eisentraut	086221cf6b	Prevent panic during shutdown checkpoint When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and certain variants of CREATE_REPLICATION_SLOT. So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, the postmaster sends a SIGUSR2 signal to all walsenders. The walsenders then put themselves into the "stopping" state. In this state, they reject any new commands. (For simplicity, we reject all new commands, so that in the future we do not have to track meticulously which commands might generate WAL.) The checkpointer waits for all walsenders to reach this state before proceeding with the shutdown checkpoint. After the shutdown checkpoint is done, the postmaster sends SIGINT (previously unused) to the walsenders. This triggers the existing shutdown behavior of sending out the shutdown checkpoint record and then terminating. Author: Michael Paquier <michael.paquier@gmail.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>	2017-05-05 10:31:42 -04:00
Tom Lane	aa1351f1ee	Allow multiple bgworkers to be launched per postmaster iteration. Previously, maybe_start_bgworker() would launch at most one bgworker process per call, on the grounds that the postmaster might otherwise neglect its other duties for too long. However, that seems overly conservative, especially since bad effects only become obvious when many hundreds of bgworkers need to be launched at once. On the other side of the coin is that the existing logic could result in substantial delay of bgworker launches, because ServerLoop isn't guaranteed to iterate immediately after a signal arrives. (My attempt to fix that by using pselect(2) encountered too many portability question marks, and in any case could not help on platforms without pselect().) One could also question the wisdom of using an O(N^2) processing method if the system is intended to support so many bgworkers. As a compromise, allow that function to launch up to 100 bgworkers per call (and in consequence, rename it to maybe_start_bgworkers). This will allow any normal parallel-query request for workers to be satisfied immediately during sigusr1_handler, avoiding the question of whether ServerLoop will be able to launch more promptly. There is talk of rewriting the postmaster to use a WaitEventSet to avoid the signal-response-delay problem, but I'd argue that this change should be kept even after that happens (if it ever does). Backpatch to 9.6 where parallel query was added. The issue exists before that, but previous uses of bgworkers typically aren't as sensitive to how quickly they get launched. Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us	2017-04-26 16:17:34 -04:00
Tom Lane	64925603c9	Revert "Use pselect(2) not select(2), if available, to wait in postmaster's loop." This reverts commit `81069a9efc`. Buildfarm results suggest that some platforms have versions of pselect(2) that are not merely non-atomic, but flat out non-functional. Revert the use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART patch as the source of trouble). If it's so, we should probably look into blacklisting specific platforms that have broken pselect. Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us	2017-04-24 18:29:03 -04:00
Tom Lane	81069a9efc	Use pselect(2) not select(2), if available, to wait in postmaster's loop. Traditionally we've unblocked signals, called select(2), and then blocked signals again. The code expects that the select() will be cancelled with EINTR if an interrupt occurs; but there's a race condition, which is that an already-pending signal will be delivered as soon as we unblock, and then when we reach select() there will be nothing preventing it from waiting. This can result in a long delay before we perform any action that ServerLoop was supposed to have taken in response to the signal. As with the somewhat-similar symptoms fixed by commit `893902085`, the main practical problem is slow launching of parallel workers. The window for trouble is usually pretty short, corresponding to one iteration of ServerLoop; but it's not negligible. To fix, use pselect(2) in place of select(2) where available, as that's designed to solve exactly this problem. Where not available, we continue to use the old way, and are no worse off than before. pselect(2) has been required by POSIX since about 2001, so most modern platforms should have it. A bigger portability issue is that some implementations are said to be non-atomic, ie pselect() isn't really any different from unblock/select/reblock. Still, we're no worse off than before on such a platform. There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point this could be reverted, since we'd be using a self-pipe to solve the race condition. But that's not happening before v11 at the earliest. Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches. Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us	2017-04-24 14:03:14 -04:00
Tom Lane	8939020853	Run the postmaster's signal handlers without SA_RESTART. The postmaster keeps signals blocked everywhere except while waiting for something to happen in ServerLoop(). The code expects that the select(2) will be cancelled with EINTR if an interrupt occurs; without that, followup actions that should be performed by ServerLoop() itself will be delayed. However, some platforms interpret the SA_RESTART signal flag as meaning that they should restart rather than cancel the select(2). Worse yet, some of them restart it with the original timeout delay, meaning that a steady stream of signal interrupts can prevent ServerLoop() from iterating at all if there are no incoming connection requests. Observable symptoms of this, on an affected platform such as HPUX 10, include extremely slow parallel query startup (possibly as much as 30 seconds) and failure to update timestamps on the postmaster's sockets and lockfiles when no new connections arrive for a long time. We can fix this by running the postmaster's signal handlers without SA_RESTART. That would be quite a scary change if the range of code where signals are accepted weren't so tiny, but as it is, it seems safe enough. (Note that postmaster children do, and must, reset all the handlers before unblocking signals; so this change should not affect any child process.) There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point it might be appropriate to revert this patch. But that's not happening before v11 at the earliest. Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches. Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us	2017-04-24 13:00:30 -04:00
Tom Lane	4fe04244b5	Fix postmaster's handling of fork failure for a bgworker process. This corner case didn't behave nicely at all: the postmaster would (partially) update its state as though the process had started successfully, and be quite confused thereafter. Fix it to act like the worker had crashed, instead. In passing, refactor so that do_start_bgworker contains all the state-change logic for bgworker launch, rather than just some of it. Back-patch as far as 9.4. 9.3 contains similar logic, but it's just enough different that I don't feel comfortable applying the patch without more study; and the use of bgworkers in 9.3 was so small that it doesn't seem worth the extra work. transam/parallel.c is still entirely unprepared for the possibility of bgworker startup failure, but that seems like material for a separate patch. Discussion: https://postgr.es/m/4905.1492813727@sss.pgh.pa.us	2017-04-24 12:16:58 -04:00
Tom Lane	3e51725b38	Avoid depending on non-POSIX behavior of fcntl(2). The POSIX standard does not say that the success return value for fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1. We had several calls that were making the stronger assumption. Adjust them to test specifically for -1 for strict spec compliance. The standard further leaves open the possibility that the O_NONBLOCK flag bit is not the only active one in F_SETFL's argument. Formally, therefore, one ought to get the current flags with F_GETFL and store them back with only the O_NONBLOCK bit changed when trying to change the nonblock state. In port/noblock.c, we were doing the full pushup in pg_set_block but not in pg_set_noblock, which is just weird. Make both of them do it properly, since they have little business making any assumptions about the socket they're handed. The other places where we're issuing F_SETFL are working with FDs we just got from pipe(2), so it's reasonable to assume the FDs' properties are all default, so I didn't bother adding F_GETFL steps there. Also, while pg_set_block deserves some points for trying to do things right, somebody had decided that it'd be even better to cast fcntl's third argument to "long". Which is completely loony, because POSIX clearly says the third argument for an F_SETFL call is "int". Given the lack of field complaints, these missteps apparently are not of significance on any common platforms. But they're still wrong, so back-patch to all supported branches. Discussion: https://postgr.es/m/30882.1492800880@sss.pgh.pa.us	2017-04-21 15:56:16 -04:00
Peter Eisentraut	6275f5d28a	Fix new warnings from GCC 7 This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.	2017-04-17 13:59:46 -04:00
Tom Lane	32470825d3	Avoid passing function pointers across process boundaries. We'd already recognized that we can't pass function pointers across process boundaries for functions in loadable modules, since a shared library could get loaded at different addresses in different processes. But actually the practice doesn't work for functions in the core backend either, if we're using EXEC_BACKEND. This is the cause of recent failures on buildfarm member culicidae. Switch to passing a string function name in all cases. Something like this needs to be back-patched into 9.6, but let's see if the buildfarm likes it first. Petr Jelinek, with a bunch of basically-cosmetic adjustments by me Discussion: https://postgr.es/m/548f9c1d-eafa-e3fa-9da8-f0cc2f654e60@2ndquadrant.com	2017-04-14 23:50:16 -04:00
Peter Eisentraut	139eb9673c	Report statistics in logical replication workers Author: Stas Kelvich <s.kelvich@postgrespro.ru> Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>	2017-04-14 14:37:06 -04:00
Robert Haas	6599c9ac33	Add an Assert() to max_parallel_workers enforcement. To prevent future bugs along the lines of the one corrected by commit `8ff518699f`, or find any that remain in the current code, add an Assert() that the difference between parallel_register_count and parallel_terminate_count is in a sane range. Kuntal Ghosh, with considerable tidying-up by me, per a suggestion from Neha Khatri. Reviewed by Tomas Vondra. Discussion: http://postgr.es/m/CAFO0U+-E8yzchwVnvn5BeRDPgX2z9vZUxQ8dxx9c0XFGBC7N1Q@mail.gmail.com	2017-04-11 13:03:44 -04:00
Robert Haas	8ff518699f	Fix confusion of max_parallel_workers mechanism following crash. Commit `b460f5d669` failed to contemplate the possibilit that a parallel worker registered before a crash would be unregistered only after the crash; if that happened, we'd end up with parallel_terminate_count > parallel_register_count and the system would refuse to launch any more parallel workers. The easiest way to fix that seems to be to forget BGW_NEVER_RESTART workers in ResetBackgroundWorkerCrashTimes() rather than leaving them around to be cleaned up after the conclusion of the restart, so that they go away before rather than after shared memory is reset. To make sure that this fix is water-tight, don't allow parallel workers to be anything other than BGW_NEVER_RESTART, so that after recovering from a crash, 0 is guaranteed to be the correct starting value for parallel_register_count. The core code wouldn't do this anyway, but somebody might try to do it in extension code. Report by Thomas Vondra. Patch by me, reviewed by Kuntal Ghosh. Discussion: http://postgr.es/m/CAGz5QC+AVEVS+3rBKRq83AxkJLMZ1peMt4nnrQwczxOrmo3CNw@mail.gmail.com	2017-04-11 12:46:40 -04:00
Robert Haas	d4116a7719	Add ProcArrayGroupUpdate wait event. Discussion: http://postgr.es/m/CA+TgmobgWHcXDcChX2+BqJDk2dkPVF85ZrJFhUyHHQmw8diTpA@mail.gmail.com	2017-04-07 13:41:47 -04:00
Alvaro Herrera	7526e10224	BRIN auto-summarization Previously, only VACUUM would cause a page range to get initially summarized by BRIN indexes, which for some use cases takes too much time since the inserts occur. To avoid the delay, have brininsert request a summarization run for the previous range as soon as the first tuple is inserted into the first page of the next range. Autovacuum is in charge of processing these requests, after doing all the regular vacuuming/ analyzing work on tables. This doesn't impose any new tasks on autovacuum, because autovacuum was already in charge of doing summarizations. The only actual effect is to change the timing, i.e. that it occurs earlier. For this reason, we don't go any great lengths to record these requests very robustly; if they are lost because of a server crash or restart, they will happen at a later time anyway. Most of the new code here is in autovacuum, which can now be told about "work items" to process. This can be used for other things such as GIN pending list cleaning, perhaps visibility map bit setting, both of which are currently invoked during vacuum, but do not really depend on vacuum taking place. The requests are at the page range level, a granularity for which we did not have SQL-level access; we only had index-level summarization requests via brin_summarize_new_values(). It seems reasonable to add SQL-level access to range-level summarization too, so add a function brin_summarize_range() to do that. Authors: Álvaro Herrera, based on sketch from Simon Riggs. Reviewed-by: Thomas Munro. Discussion: https://postgr.es/m/20170301045823.vneqdqkmsd4as4ds@alvherre.pgsql	2017-04-01 14:00:53 -03:00
Robert Haas	2113ac4cbb	Don't use bgw_main even to specify in-core bgworker entrypoints. On EXEC_BACKEND builds, this can fail if ASLR is in use. Backpatch to 9.5. On master, completely remove the bgw_main field completely, since there is no situation in which it is safe for an EXEC_BACKEND build. On 9.6 and 9.5, leave the field intact to avoid breaking things for third-party code that doesn't care about working under EXEC_BACKEND. Prior to 9.5, there are no in-core bgworker entrypoints. Petr Jelinek, reviewed by me. Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com	2017-03-31 20:43:32 -04:00
Teodor Sigaev	090010f2ec	Improve performance of find_tabstat_entry()/get_tabstat_entry() Patch introduces a hash map reloid -> PgStat_TableStatus which improves performance in case of large number of tables/partitions. Author: Aleksander Alekseev Reviewed-by: Andres Freund, Anastasia Lubennikova, Tels, me https://commitfest.postgresql.org/13/1058/	2017-03-27 18:34:42 +03:00
Robert Haas	fc70a4b0df	Show more processes in pg_stat_activity. Previously, auxiliary processes and background workers not connected to a database (such as the logical replication launcher) weren't shown. Include them, so that we can see the associated wait state information. Add a new column to identify the processes type, so that people can filter them out easily using SQL if they wish. Before this patch was written, there was discussion about whether we should expose this information in a separate view, so as to avoid contaminating pg_stat_activity with things people might not want to see. But putting everything in pg_stat_activity was a more popular choice, so that's what the patch does. Kuntal Ghosh, reviewed by Amit Langote and Michael Paquier. Some revisions and bug fixes by me. Discussion: http://postgr.es/m/CA+TgmoYES5nhkEGw9nZXU8_FhA8XEm8NTm3-SO+3ML1B81Hkww@mail.gmail.com	2017-03-26 22:02:22 -04:00
Peter Eisentraut	7c4f52409a	Logical replication support for initial data copy Add functionality for a new subscription to copy the initial data in the tables and then sync with the ongoing apply process. For the copying, add a new internal COPY option to have the COPY source data provided by a callback function. The initial data copy works on the subscriber by receiving COPY data from the publisher and then providing it locally into a COPY that writes to the destination table. A WAL receiver can now execute full SQL commands. This is used here to obtain information about tables and publications. Several new options were added to CREATE and ALTER SUBSCRIPTION to control whether and when initial table syncing happens. Change pg_dump option --no-create-subscription-slots to --no-subscription-connect and use the new CREATE SUBSCRIPTION ... NOCONNECT option for that. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Erik Rijkers <er@xs4all.nl>	2017-03-23 08:55:37 -04:00
Tom Lane	17f8ffa1e3	Fix REFRESH MATERIALIZED VIEW to report activity to the stats collector. The non-concurrent code path for REFRESH MATERIALIZED VIEW failed to report its updates to the stats collector. This is bad since it means auto-analyze doesn't know there's any work to be done. Adjust it to report the refresh as a table truncate followed by insertion of an appropriate number of rows. Since a matview could contain more than INT_MAX rows, change the signature of pgstat_count_heap_insert() to accept an int64 rowcount. (The accumulator it's adding into is already int64, but existing callers could not insert more than a small number of rows at once, so the argument had been declared just "int n".) This is surely a bug fix, but changing pgstat_count_heap_insert()'s API seems too risky for the back branches. Given the lack of previous complaints, I'm not sure it's a big enough problem to justify a kluge solution that would avoid that. So, no back-patch, at least for now. Jim Mlodgenski, adjusted a bit by me Discussion: https://postgr.es/m/CAB_5SRchSz7-WmdO5szdiknG8Oj_GGqJytrk1KRd11yhcMs1KQ@mail.gmail.com	2017-03-18 17:49:39 -04:00
Robert Haas	249cf070e3	Create and use wait events for read, write, and fsync operations. Previous commits, notably `53be0b1add` and `6f3bd98ebf`, made it possible to see from pg_stat_activity when a backend was stuck waiting for another backend, but it's also fairly common for a backend to be stuck waiting for an I/O. Add wait events for those operations, too. Rushabh Lathia, with further hacking by me. Reviewed and tested by Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed. Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com	2017-03-18 07:43:01 -04:00
Robert Haas	88e66d193f	Rename "pg_clog" directory to "pg_xact". Names containing the letters "log" sometimes confuse users into believing that only non-critical data is present. It is hoped this renaming will discourage ill-considered removals of transaction status data. Michael Paquier Discussion: http://postgr.es/m/CA+Tgmoa9xFQyjRZupbdEFuwUerFTvC6HjZq1ud6GYragGDFFgA@mail.gmail.com	2017-03-17 09:48:38 -04:00
Tom Lane	6ec4c8584c	Reduce log verbosity of startup/shutdown for launcher subprocesses. There's no really good reason why the autovacuum launcher and logical replication launcher should announce themselves at startup and shutdown by default. Users don't care that those processes exist, and it's inconsistent that those background processes announce themselves while others don't. So, reduce those messages from LOG to DEBUG1 level. I was sorely tempted to reduce the "starting logical replication worker for subscription ..." message to DEBUG1 as well, but forebore for now. Those processes might possibly be of direct interest to users, at least until logical replication is a lot better shaken out than it is today. Discussion: https://postgr.es/m/19479.1489121003@sss.pgh.pa.us	2017-03-10 15:18:38 -05:00
Robert Haas	f35742ccb7	Support parallel bitmap heap scans. The index is scanned by a single process, but then all cooperating processes can iterate jointly over the resulting set of heap blocks. In the future, we might also want to support using a parallel bitmap index scan to set up for a parallel bitmap heap scan, but that's a job for another day. Dilip Kumar, with some corrections and cosmetic changes by me. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, Thomas Munro, and me. Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com	2017-03-08 12:05:43 -05:00
Robert Haas	7f6fa29f18	Fix user-after-free bug. Introduced by commit `aea5d29836`. Patch from Amit Kapila. Issue discovered independently by Amit Kapila and Ashutosh Sharma.	2017-03-06 12:13:57 -05:00
Peter Eisentraut	1e8a850094	Use asynchronous connect API in libpqwalreceiver This makes the connection attempt from CREATE SUBSCRIPTION and from WalReceiver interruptable by the user in case the libpq connection is hanging. The previous coding required immediate shutdown (SIGQUIT) of PostgreSQL in that situation. From: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Thom Brown <thom@linux.com>	2017-03-03 09:13:58 -05:00
Robert Haas	19dc233c32	Add pg_current_logfile() function. The syslogger will write out the current stderr and csvlog names, if it's running and there are any, to a new file in the data directory called "current_logfiles". We take care to remove this file when it might no longer be valid (but not at shutdown). The function pg_current_logfile() can be used to read the entries in the file. Gilles Darold, reviewed and modified by Karl O. Pinc, Michael Paquier, and me. Further review by Álvaro Herrera and Christoph Berg.	2017-03-03 11:43:11 +05:30
Robert Haas	aea5d29836	Notify bgworker registrant after freeing worker slot. Tom Lane observed buildfarm failures caused by the select_parallel regression test trying to launch new parallel queries before the worker slots used by the previous ones were freed. Try to fix this by having the postmaster free the worker slots before it sends the SIGUSR1 notifications to the registering process. This doesn't completely eliminate the possibility that the user backend might (correctly) observe the worker as dead before the slot is free, but I believe it should make the window significantly narrower. Patch by me, per complaint from Tom Lane. Reviewed by Amit Kapila. Discussion: http://postgr.es/m/30673.1487310734@sss.pgh.pa.us	2017-03-03 09:25:30 +05:30
Tom Lane	9e3755ecb2	Remove useless duplicate inclusions of system header files. c.h #includes a number of core libc header files, such as <stdio.h>. There's no point in re-including these after having read postgres.h, postgres_fe.h, or c.h; so remove code that did so. While at it, also fix some places that were ignoring our standard pattern of "include postgres[_fe].h, then system header files, then other Postgres header files". While there's not any great magic in doing it that way rather than system headers last, it's silly to have just a few files deviating from the general pattern. (But I didn't attempt to enforce this globally, only in files I was touching anyway.) I'd be the first to say that this is mostly compulsive neatnik-ism, but over time it might save enough compile cycles to be useful.	2017-02-25 16:12:55 -05:00
Robert Haas	569174f1be	btree: Support parallel index scans. This isn't exposed to the optimizer or the executor yet; we'll add support for those things in a separate patch. But this puts the basic mechanism in place: several processes can attach to a parallel btree index scan, and each one will get a subset of the tuples that would have been produced by a non-parallel scan. Each index page becomes the responsibility of a single worker, which then returns all of the TIDs on that page. Rahila Syed, Amit Kapila, Robert Haas, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi.	2017-02-15 07:41:14 -05:00
Heikki Linnakangas	181bdb90ba	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:33:58 +02:00
Peter Eisentraut	5a366b4ff4	Fix typo: pg_statistics -> pg_statistic	2017-01-25 14:38:33 -05:00
Peter Eisentraut	f21a563d25	Move some things from builtins.h to new header files This avoids that builtins.h has to include additional header files.	2017-01-20 20:29:53 -05:00
Robert Haas	c6a389792e	Avoid useless respawining the autovacuum launcher at high speed. When (1) autovacuum = off and (2) there's at least one database with an XID age greater than autovacuum_freeze_max_age and (3) all tables in that database that need vacuuming are already being processed by a worker and (4) the autovacuum launcher is started, a kind of infinite loop occurs. The launcher starts a worker and immediately exits. The worker, finding no worker to do, immediately starts the launcher, supposedly so that the next database can be processed. But because datfrozenxid for that database hasn't been advanced yet, the new worker gets put right back into the same database as the old one, where it once again starts the launcher and exits. High-speed ping pong ensues. There are several possible ways to break the cycle; this seems like the safest one. Amit Khandekar (code) and Robert Haas (comments), reviewed by Álvaro Herrera. Discussion: http://postgr.es/m/CAJ3gD9eWejf72HKquKSzax0r+epS=nAbQKNnykkMA0E8c+rMDg@mail.gmail.com	2017-01-20 15:55:45 -05:00
Peter Eisentraut	665d1fad99	Logical replication - Add PUBLICATION catalogs and DDL - Add SUBSCRIPTION catalog and DDL - Define logical replication protocol and output plugin - Add logical replication workers From: Petr Jelinek <petr@2ndquadrant.com> Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Erik Rijkers <er@xs4all.nl> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>	2017-01-20 09:04:49 -05:00
Tom Lane	6667d9a6d7	Re-allow SSL passphrase prompt at server start, but not thereafter. Leave OpenSSL's default passphrase collection callback in place during the first call of secure_initialize() in server startup. Although that doesn't work terribly well in daemon contexts, some people feel we should not break it for anyone who was successfully using it before. We still block passphrase demands during SIGHUP, meaning that you can't adjust SSL configuration on-the-fly if you used a passphrase, but this is no worse than what it was before commit `de41869b6`. And we block passphrase demands during EXEC_BACKEND reloads; that behavior wasn't useful either, but at least now it's documented. Tweak some related log messages for more readability, and avoid issuing essentially duplicate messages about reload failure caused by a passphrase. Discussion: https://postgr.es/m/29982.1483412575@sss.pgh.pa.us	2017-01-04 12:44:03 -05:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Tom Lane	de41869b64	Allow SSL configuration to be updated at SIGHUP. It is no longer necessary to restart the server to enable, disable, or reconfigure SSL. Instead, we just create a new SSL_CTX struct (by re-reading all relevant files) whenever we get SIGHUP. Testing shows that this is fast enough that it shouldn't be a problem. In conjunction with that, downgrade the logic that complains about pg_hba.conf "hostssl" lines when SSL isn't active: now that's just a warning condition not an error. An issue that still needs to be addressed is what shall we do with passphrase-protected server keys? As this stands, the server would demand the passphrase again on every SIGHUP, which is certainly impractical. But the case was only barely supported before, so that does not seem a sufficient reason to hold up committing this patch. Andreas Karlsson, reviewed by Michael Banck and Michael Paquier Discussion: https://postgr.es/m/556A6E8A.9030400@proxel.se	2017-01-02 21:37:12 -05:00
Andres Freund	6ef2eba3f5	Skip checkpoints, archiving on idle systems. Some background activity (like checkpoints, archive timeout, standby snapshots) is not supposed to happen on an idle system. Unfortunately so far it was not easy to determine when a system is idle, which defeated some of the attempts to avoid redundant activity on an idle system. To make that easier, allow to make individual WAL insertions as not being "important". By checking whether any important activity happened since the last time an activity was performed, it now is easy to check whether some action needs to be repeated. Use the new facility for checkpoints, archive timeout and standby snapshots. The lack of a facility causes some issues in older releases, but in my opinion the consequences (superflous checkpoints / archived segments) aren't grave enough to warrant backpatching. Author: Michael Paquier, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI Bug: #13685 Discussion: https://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.org https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com Backpatch: -	2016-12-22 11:31:50 -08:00
Robert Haas	3761fe3c20	Simplify LWLock tranche machinery by removing array_base/array_stride. array_base and array_stride were added so that we could identify the offset of an LWLock within a tranche, but this facility is only very marginally used apart from the main tranche. So, give every lock in the main tranche its own tranche ID and get rid of array_base, array_stride, and all that's attached. For debugging facilities (Trace_lwlocks and LWLOCK_STATS) print the pointer address of the LWLock using %p instead of the offset. This is arguably more useful, and certainly a lot cheaper. Drop the offset-within-tranche from the information reported to dtrace and from one can't-happen message inside lwlock.c. The main user-visible impact of this change is that pg_stat_activity will now report all waits for LWLocks as "LWLock" rather than reporting some as "LWLockTranche" and others as "LWLockNamed". The main motivation for this change is that the need to specify an array_base and an array_stride is awkward for parallel query. There is only a very limited supply of tranche IDs so we can't just keep allocating new ones, and if we try to use the same tranche IDs every time then we run into trouble when multiple parallel contexts are use simultaneously. So if we didn't get rid of this mechanism we'd have to make it even more complicated. By simplifying it in this way, we instead reduce the size of the generated code for lwlock.c by about 5%. Discussion: http://postgr.es/m/CA+TgmoYsFn6NUW1x0AZtupJGUAs1UDY4dJtCN47_Q6D0sP80PA@mail.gmail.com	2016-12-16 11:29:23 -05:00
Tom Lane	be7b2848c6	Make the different Unix-y semaphore implementations ABI-compatible. Previously, the "sem" field of PGPROC varied in size depending on which kernel semaphore API we were using. That was okay as long as there was only one likely choice per platform, but in the wake of commit `ecb0d20a9`, that assumption seems rather shaky. It doesn't seem out of the question anymore that an extension compiled against one API choice might be loaded into a postmaster built with another choice. Moreover, this prevents any possibility of selecting the semaphore API at postmaster startup, which might be something we want to do in future. Hence, change PGPROC.sem to be PGSemaphore (i.e. a pointer) for all Unix semaphore APIs, and turn the pointed-to data into an opaque struct whose contents are only known within the responsible modules. For the SysV and unnamed-POSIX APIs, the pointed-to data has to be allocated elsewhere in shared memory, which takes a little bit of rejiggering of the InitShmemAllocation code sequence. (I invented a ShmemAllocUnlocked() function to make that a little cleaner than it used to be. That function is not meant for any uses other than the ones it has now, but it beats having InitShmemAllocation() know explicitly about allocation of space for semaphores and spinlocks.) This change means an extra indirection to access the semaphore data, but since we only touch that when blocking or awakening a process, there shouldn't be any meaningful performance penalty. Moreover, at least for the unnamed-POSIX case on Linux, the sem_t type is quite a bit wider than a pointer, so this reduces sizeof(PGPROC) which seems like a good thing. For the named-POSIX API, there's effectively no change: the PGPROC.sem field was and still is a pointer to something returned by sem_open() in the postmaster's memory space. Document and check the pre-existing limitation that this case can't work in EXEC_BACKEND mode. It did not seem worth unifying the Windows semaphore ABI with the Unix cases, since there's no likelihood of needing ABI compatibility much less runtime switching across those cases. However, we can simplify the Windows code a bit if we define PGSemaphore as being directly a HANDLE, rather than pointer to HANDLE, so let's do that while we're here. (This also ends up being no change in what's physically stored in PGPROC.sem. We're just moving the HANDLE fetch from callees to callers.) It would take a bunch of additional code shuffling to get to the point of actually choosing a semaphore API at postmaster start, but the effects of that would now be localized in the port/XXX_sema.c files, so it seems like fit material for a separate patch. The need for it is unproven as yet, anyhow, whereas the ABI risk to extensions seems real enough. Discussion: https://postgr.es/m/4029.1481413370@sss.pgh.pa.us	2016-12-12 13:32:10 -05:00
Heikki Linnakangas	58445c5c8d	Further cleanup from the strong-random patch. Also use the new facility for generating RADIUS authenticator requests, and salt in chkpass extension. Reword the error messages to be nicer. Fix bogus error code used in the message in BackendStartup.	2016-12-12 11:55:32 +02:00
Heikki Linnakangas	41493bac36	Fix two thinkos related to strong random keys. pg_backend_random() is used for MD5 salt generation, but it can fail, and no checks were done on its status code. Fix memory leak, if generating a random number for a cancel key failed. Both issues were spotted by Coverity. Fix by Michael Paquier.	2016-12-12 09:58:32 +02:00
Heikki Linnakangas	81f2e514a9	Fix query cancellation. In commit `fe0a0b59`, the datatype used for MyCancelKey and other variables that store cancel keys were changed from long to uint32, but I missed this one. That broke query cancellation on platforms where long is wider than 32 bits. Report by Andres Freund, fix by Michael Paquier.	2016-12-07 09:47:43 +02:00
Heikki Linnakangas	fe0a0b5993	Replace PostmasterRandom() with a stronger source, second attempt. This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom Unlike the current pgcrypto function, the source is chosen by configure. That makes it easier to test different implementations, and ensures that we don't accidentally fall back to a less secure implementation, if the primary source fails. All of those methods are quite reliable, it would be pretty surprising for them to fail, so we'd rather find out by failing hard. If no strong random source is available, we fall back to using erand48(), seeded from current timestamp, like PostmasterRandom() was. That isn't cryptographically secure, but allows us to still work on platforms that don't have any of the above stronger sources. Because it's not very secure, the built-in implementation is only used if explicitly requested with --disable-strong-random. This replaces the more complicated Fortuna algorithm we used to have in pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom, so it doesn't seem worth the maintenance effort to keep that. pgcrypto functions that require strong random numbers will be disabled with --disable-strong-random. Original patch by Magnus Hagander, tons of further work by Michael Paquier and me. Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com	2016-12-05 13:42:59 +02:00
Tom Lane	b3427dade1	Delete deleteWhatDependsOn() in favor of more performDeletion() flag bits. deleteWhatDependsOn() had grown an uncomfortably large number of assumptions about what it's used for. There are actually only two minor differences between what it does and what a regular performDeletion() call can do, so let's invent additional bits in performDeletion's existing flags argument that specify those behaviors, and get rid of deleteWhatDependsOn() as such. (We'd probably have done it this way from the start, except that performDeletion didn't originally have a flags argument, IIRC.) Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an extension, and use that when dropping temporary objects at session end. This provides a more general solution to the problem addressed in a hacky way in commit 08dd23cec: if an extension script creates temp objects and forgets to remove them again, the whole extension went away when its contained temp objects were deleted. The previous solution only covered temp relations, but this solves it for all object types. These changes require minor additions in dependency.c to pass the flags to subroutines that previously didn't get them, but it's still a net savings of code, and it seems cleaner than before. Having done this, revert the special-case code added in `08dd23cec` that prevented addition of pg_depend records for temp table extension membership, because that caused its own oddities: dropping an extension that had created such a table didn't automatically remove the table, leading to a failure if the table had another dependency on the extension (such as use of an extension data type), or to a duplicate-name failure if you then tried to recreate the extension. But we keep the part that prevents the pg_temp_nnn schema from becoming an extension member; we never want that to happen. Add a regression test case covering these behaviors. Although this fixes some arguable bugs, we've heard few field complaints, and any such problems are easily worked around by explicitly dropping temp objects at the end of extension scripts (which seems like good practice anyway). So I won't risk a back-patch. Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com	2016-12-02 14:57:55 -05:00
Robert Haas	b460f5d669	Add max_parallel_workers GUC. Increase the default value of the existing max_worker_processes GUC from 8 to 16, and add a new max_parallel_workers GUC with a maximum of 8. This way, even if the maximum amount of parallel query is happening, there is still room for background workers that do other things, as originally envisioned when max_worker_processes was added. Julien Rouhaud, reviewed by Amit Kapila and by revised by me.	2016-12-02 07:42:58 -05:00
Peter Eisentraut	597a87ccc9	Use latch instead of select() in walreceiver Replace use of poll()/select() by WaitLatchOrSocket(), which is more portable and flexible. Also change walreceiver to use its procLatch instead of a custom latch. From: Petr Jelinek <petr@2ndquadrant.com>	2016-12-01 20:23:28 -05:00
Tom Lane	dafa0848da	Code review for early drop of orphaned temp relations in autovacuum. Commit `a734fd5d1` exposed some race conditions that existed previously in the autovac code, but were basically harmless because autovac would not try to delete orphaned relations immediately. Specifically, the test for orphaned-ness was made on a pg_class tuple that might be dead by now, allowing autovac to try to remove a table that the owning backend had just finished deleting. This resulted in a hard crash due to inadequate caution about accessing the table's catalog entries without any lock. We must take a relation lock and then recheck whether the table is still present and still looks deletable before we do anything. Also, it seemed to me that deleting multiple tables per transaction, and trying to continue after errors, represented unjustifiable complexity. We do not expect this code path to be taken often in the field, nor even during testing, which means that prioritizing performance over correctness is a bad tradeoff. Rip all that out in favor of just starting a new transaction after each successful temp table deletion. If we're unlucky enough to get an error, which shouldn't happen anyway now that we're being more cautious, let the autovacuum worker fail as it normally would. In passing, improve the order of operations in the initial scan loop. Now that we don't care about whether a temp table is a wraparound hazard, there's no need to perform extract_autovac_opts, get_pgstat_tabentry_relid, or relation_needs_vacanalyze for temp tables. Also, if GetTempNamespaceBackendId returns InvalidBackendId (indicating it doesn't recognize the schema as temp), treat that as meaning it's NOT an orphaned temp table, not that it IS one, which is what happened before because BackendIdGetProc necessarily failed. The case really shouldn't come up for a table that has RELPERSISTENCE_TEMP, but the consequences if it did seem undesirable. (This might represent a back-patchable bug fix; not sure if it's worth the trouble.) Discussion: https://postgr.es/m/21299.1480272347@sss.pgh.pa.us	2016-11-27 21:23:39 -05:00
Robert Haas	e343dfa42b	Remove barrier.h A new thing also called a "barrier" is proposed, but whether we decide to take that patch or not, this file seems to have outlived its usefulness. Thomas Munro	2016-11-22 20:28:24 -05:00
Robert Haas	e8ac886c24	Support condition variables. Condition variables provide a flexible way to sleep until a cooperating process causes an arbitrary condition to become true. In simple cases, this can be accomplished with a WaitLatch/ResetLatch loop; the cooperating process can call SetLatch after performing work that might cause the condition to be satisfied, and the waiting process can recheck the condition each time. However, if the process performing the work doesn't have an easy way to identify which processes might be waiting, this doesn't work, because it can't identify which latches to set. Condition variables solve that problem by internally maintaining a list of waiters; a process that may have caused some waiter's condition to be satisfied must "signal" or "broadcast" on the condition variable. Robert Haas and Thomas Munro	2016-11-22 14:27:11 -05:00
Tom Lane	ae92a9a380	Fix uninitialized variable. Oversight in `a734fd5d1`. Michael Paquier	2016-11-21 19:59:56 -05:00
Robert Haas	a734fd5d1c	autovacuum: Drop orphan temp tables more quickly but with more caution. Previously, we only dropped an orphan temp table when it became old enough to threaten wraparound; instead, doing it immediately. The only value of waiting is that someone might be able to examine the contents of the orphan temp table for forensic purposes, but it's pretty difficult to actually do that and few users will wish to do so. On the flip side, not performing the drop immediately generates log spam and bloats pg_class. In addition, per a report from Grigory Smolkin, if a temporary schema contains a very large number of temporary tables, a backend attempting to clear the temporary schema might fail due to lock table exhaustion. It's helpful for autovacuum to clean up after such cases, and we don't want it to wait for wraparound to threaten before doing so. To prevent autovacuum from failing in the same manner as a backend trying to drop an entire temp schema, remove orphan temp tables in batches of 50, committing after each batch, so that we don't accumulate an unbounded number of locks. If a drop fails, retry other orphan tables that need to be dropped up to 10 times before giving up. With this system, if a backend does fail to clean a temporary schema due to lock table exhaustion, autovacuum should hopefully put things right the next time it processes the database. Discussion: CAB7nPqSbYT6dRwsXVgiKmBdL_ARemfDZMPA+RPeC_ge0GK70hA@mail.gmail.com Michael Paquier, with a bunch of comment changes by me.	2016-11-21 13:01:50 -05:00
Robert Haas	4f714b2fd2	If the stats collector dies during Hot Standby, restart it. This bug exists as far back as 9.0, when Hot Standby was introduced, so back-patch to all supported branches. Report and patch by Takayuki Tsunakawa, reviewed by Michael Paquier and Kuntal Ghosh.	2016-10-27 14:27:40 -04:00
Heikki Linnakangas	56f39009c5	Fix typos in comments. Vinayak Pokale	2016-10-26 11:12:31 +03:00
Heikki Linnakangas	faae1c918e	Revert "Replace PostmasterRandom() with a stronger way of generating randomness." This reverts commit `9e083fd468`. That was a few bricks shy of a load: * Query cancel stopped working * Buildfarm member pademelon stopped working, because the box doesn't have /dev/urandom nor /dev/random. This clearly needs some more discussion, and a quite different patch, so revert for now.	2016-10-18 16:28:23 +03:00
Heikki Linnakangas	9e083fd468	Replace PostmasterRandom() with a stronger way of generating randomness. This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom - /dev/random Original patch by Magnus Hagander, with further work by Michael Paquier and me. Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>	2016-10-17 11:52:50 +03:00
Tom Lane	81e82a2bd4	Fix handling of pgstat counters for TRUNCATE in a prepared transaction. pgstat_twophase_postcommit is supposed to duplicate the math in AtEOXact_PgStat, but it had missed out the bit about clearing t_delta_live_tuples/t_delta_dead_tuples for a TRUNCATE. It's harder than you might think to replicate the issue here, because those counters would only be nonzero when a previous transaction in the same backend had added/deleted tuples in the truncated table, and those counts hadn't been sent to the stats collector yet. Evident oversight in commit `d42358efb`. I've not added a regression test for this; we tried to add one in `d42358efb`, and had to revert it because it was too timing-sensitive for the buildfarm. Back-patch to 9.5 where `d42358efb` came in. Stas Kelvich Discussion: <EB57BF68-C06D-4737-BDDC-4BA778F4E62B@postgrespro.ru>	2016-10-13 19:46:05 -04:00
Tom Lane	15fc5e1581	Clean up handling of anonymous mmap'd shared-memory segment. Fix detaching of the mmap'd segment to have its own on_shmem_exit callback, rather than piggybacking on the one for detaching from the SysV segment. That was confusing, and given the distance between the two attach calls, it was trouble waiting to happen. Make the detaching calls idempotent by clearing AnonymousShmem to show we've already unmapped. I spent quite a bit of time yesterday trying to find a path that would allow the munmap()'s to be done twice, and while I did not succeed, it seems silly that there's even a question. Make the #ifdef logic less confusing by separating "do we want to use anonymous shmem" from EXEC_BACKEND. Even though there's no current scenario where those conditions are different, it is not helpful for different places in the same file to be testing EXEC_BACKEND for what are fundamentally different reasons. Don't do on_exit_reset() in StartBackgroundWorker(). At best that's useless (InitPostmasterChild would have done it already) and at worst it could zap some callback that's unrelated to shared memory. Improve comments, and simplify the huge_pages enablement logic slightly. Back-patch to 9.4 where hugepage support was introduced. Arguably this should go into 9.3 as well, but the code looks significantly different there, and I doubt it's worth the trouble of adapting the patch given I can't show a live bug.	2016-10-13 13:59:56 -04:00
Robert Haas	eb3bc0bd1a	Re-alphabetize #include directives. Thomas Munro	2016-10-05 08:24:25 -04:00
Robert Haas	d2ce38e204	Rename WAIT_* constants to PG_WAIT_*. Windows apparently has a constant named WAIT_TIMEOUT, and some of these other names are pretty generic, too. Insert "PG_" at the front of each name in order to disambiguate. Michael Paquier	2016-10-05 08:04:52 -04:00
Robert Haas	6c9c95ed1b	Fix another Windows compile break. Commit `6f3bd98ebf` is still making the buildfarm unhappy. This time it's mastodon that is complaining.	2016-10-04 13:14:19 -04:00
Robert Haas	9445d1121d	Fix Windows compile break in `6f3bd98ebf`.	2016-10-04 12:18:05 -04:00
Robert Haas	6f3bd98ebf	Extend framework from commit `53be0b1ad` to report latch waits. WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an additional wait_event_info parameter; legal values are defined in pgstat.h. This makes it possible to uniquely identify every point in the core code where we are waiting for a latch; extensions can pass WAIT_EXTENSION. Because latches were the major wait primitive not previously covered by this patch, it is now possible to see information in pg_stat_activity on a large number of important wait events not previously addressed, such as ClientRead, ClientWrite, and SyncRep. Unfortunately, many of the wait events added by this patch will fail to appear in pg_stat_activity because they're only used in background processes which don't currently appear in pg_stat_activity. We should fix this either by creating a separate view for such information, or else by deciding to include them in pg_stat_activity after all. Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and Thomas Munro.	2016-10-04 11:01:42 -04:00
Tom Lane	3b90e38c5d	Do ClosePostmasterPorts() earlier in SubPostmasterMain(). In standard Unix builds, postmaster child processes do ClosePostmasterPorts immediately after InitPostmasterChild, that is almost immediately after being spawned. This is important because we don't want children holding open the postmaster's end of the postmaster death watch pipe. However, in EXEC_BACKEND builds, SubPostmasterMain was postponing this responsibility significantly, in order to make it slightly more convenient to pass the right flag value to ClosePostmasterPorts. This is bad, particularly seeing that process_shared_preload_libraries() might invoke nearly-arbitrary code. Rearrange so that we do it as soon as we've fetched the socket FDs via read_backend_variables(). Also move the comment explaining about randomize_va_space to before the call of PGSharedMemoryReAttach, which is where it's relevant. The old placement was appropriate when the reattach happened inside CreateSharedMemoryAndSemaphores, but that was a long time ago. Back-patch to 9.3; the patch doesn't apply cleanly before that, and it doesn't seem worth a lot of effort given that we've had no actual field complaints traceable to this. Discussion: <4157.1475178360@sss.pgh.pa.us>	2016-10-01 17:15:09 -04:00
Alvaro Herrera	51c3e9fade	Include <sys/select.h> where needed <sys/select.h> is required by POSIX.1-2001 to get the prototype of select(2), but nearly no systems enforce that because older standards let you get away with including some other headers. Recent OpenBSD hacking has removed that frail touch of friendliness, however, which broke some compiles; fix all the way back to 9.1 by adding the required standard. Only vacuumdb.c was reported to fail, but it seems easier to fix the whole lot in a fell swoop. Per bug #14334 by Sean Farrell.	2016-09-27 01:05:21 -03:00
Tom Lane	da6c4f6ca8	Refer to OS X as "macOS", except for the port name which is still "darwin". We weren't terribly consistent about whether to call Apple's OS "OS X" or "Mac OS X", and the former is probably confusing to people who aren't Apple users. Now that Apple has rebranded it "macOS", follow their lead to establish a consistent naming pattern. Also, avoid the use of the ancient project name "Darwin", except as the port code name which does not seem desirable to change. (In short, this patch touches documentation and comments, but no actual code.) I didn't touch contrib/start-scripts/osx/, either. I suspect those are obsolete and due for a rewrite, anyway. I dithered about whether to apply this edit to old release notes, but those were responsible for quite a lot of the inconsistencies, so I ended up changing them too. Anyway, Apple's being ahistorical about this, so why shouldn't we be?	2016-09-25 15:40:57 -04:00
Tom Lane	49a91b88e6	Avoid using PostmasterRandom() for DSM control segment ID. Commits `470d886c3` et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom. Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom. Discussion: <20789.1474390434@sss.pgh.pa.us>	2016-09-23 09:54:11 -04:00
Robert Haas	470d886c32	Use PostmasterRandom(), not random(), for DSM control segment ID. Otherwise, every startup gets the same "random" value, which is definitely not what was intended.	2016-09-20 12:26:29 -04:00
Heikki Linnakangas	ec136d19b2	Move code shared between libpq and backend from backend/libpq/ to common/. When building libpq, ip.c and md5.c were symlinked or copied from src/backend/libpq into src/interfaces/libpq, but now that we have a directory specifically for routines that are shared between the server and client binaries, src/common/, move them there. Some routines in ip.c were only used in the backend. Keep those in src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file that's now in common. Fix the comment in src/common/Makefile to reflect how libpq actually links those files. There are two more files that libpq symlinks directly from src/backend: encnames.c and wchar.c. I don't feel compelled to move those right now, though. Patch by Michael Paquier, with some changes by me. Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>	2016-09-02 13:49:59 +03:00
Tom Lane	ea268cdc9a	Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>	2016-08-27 17:50:38 -04:00
Heikki Linnakangas	fa878703f4	Refactor RandomSalt to handle salts of different lengths. All we need is 4 bytes at the moment, for MD5 authentication. But in upcomint patches for SCRAM authentication, SCRAM will need a salt of different length. It's less scary for the caller to pass the buffer length anyway, than assume a certain-sized output buffer. Author: Michael Paquier Discussion: <CAB7nPqQvO4sxLFeS9D+NM3wpy08ieZdAj_6e117MQHZAfxBFsg@mail.gmail.com>	2016-08-18 13:41:17 +03:00
Tom Lane	8d498a5c8a	Fix bogus coding in WaitForBackgroundWorkerShutdown(). Some conditions resulted in "return" directly out of a PG_TRY block, which left the exception stack dangling, and to add insult to injury failed to restore the state of set_latch_on_sigusr1. This is a bug only in 9.5; in HEAD it was accidentally fixed by commit `db0f6cad4`, which removed the surrounding PG_TRY block. However, I (tgl) chose to apply the patch to HEAD as well, because the old coding was gratuitously different from WaitForBackgroundWorkerStartup(), and there would indeed have been no bug if it were done like that to start with. Dmitry Ivanov Discussion: <1637882.WfYN5gPf1A@abook>	2016-08-04 16:06:14 -04:00
Tom Lane	ef1b5af823	Do not let PostmasterContext survive into background workers. We don't want postmaster child processes to contain a copy of the postmaster's PostmasterContext. That would be a waste of memory at least, and at worst a security issue, since there are copies of the semi-sensitive pg_hba and pg_ident data in there. All other child process types delete the PostmasterContext after forking, but the original coding of the background worker patch (commit `da07a1e85`) did not do so. It appears that the only reason for that was to avoid copying the bgworker's MyBgworkerEntry out of that context; but the couple of additional statements needed to do so are hardly good justification for it. Hence, copy that data and then clear the context as other child processes do. Because this patch changes the memory context in which a bgworker function gains control, back-patching it would be a bit risky, so we won't fix this in back branches. The "security" complaint is pretty thin anyway for generic bgworkers; only with the introduction of parallel query is there any question of running untrusted code in a bgworker process. Discussion: <14111.1470082717@sss.pgh.pa.us>	2016-08-03 14:48:13 -04:00
Tom Lane	c6ea616ff7	Remove duplicate InitPostmasterChild() call while starting a bgworker. This is apparently harmless on Windows, but on Unix it results in an assertion failure. We'd not noticed because this code doesn't get used on Unix unless you build with -DEXEC_BACKEND. Bug was evidently introduced by sloppy refactoring in commit `31c453165`. Thomas Munro Discussion: <CAEepm=1VOnbVx4wsgQFvj94hu9jVt2nVabCr7QiooUSvPJXkgQ@mail.gmail.com>	2016-08-02 18:39:14 -04:00
Tom Lane	e45e990e4b	Make "postgres -C guc" print "" not "(null)" for null-valued GUCs. Commit `0b0baf262` et al made this case print "(null)" on the grounds that that's what happened on platforms that didn't crash. But neither behavior was actually intentional. What we should print is just an empty string, for compatibility with the behavior of SHOW and other ways of examining string GUCs. Those code paths don't distinguish NULL from empty strings, so we should not here either. Per gripe from Alain Radix. Like the previous patch, back-patch to 9.2 where -C option was introduced. Discussion: <CA+YdpwxPUADrmxSD7+Td=uOshMB1KkDN7G7cf+FGmNjjxMhjbw@mail.gmail.com>	2016-06-22 11:55:18 -04:00
Tom Lane	0b0baf2621	Avoid crash in "postgres -C guc" for a GUC with a null string value. Emit "(null)" instead, which was the behavior all along on platforms that don't crash, eg OS X. Per report from Jehan-Guillaume de Rorthais. Back-patch to 9.2 where -C option was introduced. Michael Paquier Report: <20160615204036.2d35d86a@firost>	2016-06-16 12:17:38 -04:00
Tom Lane	8383486f10	Force idle_in_transaction_session_timeout off in pg_dump and autovacuum. We disable statement_timeout and lock_timeout during dump and restore, to prevent any global settings that might exist from breaking routine backups. Commit `c6dda1f48` should have added idle_in_transaction_session_timeout to that list, but failed to. Another place where these timeouts get turned off is autovacuum. While I doubt an idle timeout could fire there, it seems better to be safe than sorry. pg_dump issue noted by Bernd Helmle, the other one found by grepping. Report: <352F9B77DB5D3082578D17BB@eje.land.credativ.lan>	2016-06-15 10:53:03 -04:00
Robert Haas	4bc424b968	pgindent run for 9.6	2016-06-09 18:02:36 -04:00
Tom Lane	f64340e743	Don't reset changes_since_analyze after a selective-columns ANALYZE. If we ANALYZE only selected columns of a table, we should not postpone auto-analyze because of that; other columns may well still need stats updates. As committed, the counter is left alone if a column list is given, whether or not it includes all analyzable columns of the table. Per complaint from Tomasz Ostrowski. It's been like this a long time, so back-patch to all supported branches. Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>	2016-06-06 17:44:17 -04:00
Tom Lane	22b27b4c9e	Avoid useless closely-spaced writes of statistics files. The original intent in the stats collector was that we should not write out stats data oftener than every PGSTAT_STAT_INTERVAL msec. Backends will not make requests at all if they see the existing data is newer than that, and the stats collector is supposed to disregard requests having a cutoff_time older than its most recently written data, so that close-together requests don't result in multiple writes. But the latter part of that got broken in commit `187492b6c2`, so that if two backends concurrently decide the existing stats are too old, the collector would write the data twice. (In principle the collector's logic would still merge requests as long as the second one arrives before we've actually written data ... but since the message collection loop would write data immediately after processing a single inquiry message, that never happened in practice, and in any case the window in which it might work would be much shorter than PGSTAT_STAT_INTERVAL.) To fix, improve pgstat_recv_inquiry so that it checks whether the cutoff time is too old, and doesn't add a request to the queue if so. This means that we do not need DBWriteRequest.request_time, because the decision is taken before making a queue entry. And that means that we don't really need the DBWriteRequest data structure at all; an OID list of database OIDs will serve and allow removal of some rather verbose and crufty code. In passing, improve the comments in this area, which have been rather neglected. Also change backend_read_statsfile so that it's not silently relying on MyDatabaseId to have some particular value in the autovacuum launcher process. It accidentally worked as desired because MyDatabaseId is zero in that process; but that does not seem like a dependency we want, especially with no documentation about it. Although this patch is mine, it turns out I'd rediscovered a known bug, for which Tomas Vondra had already submitted a patch that's functionally equivalent to the non-cosmetic aspects of this patch. Thanks to Tomas for reviewing this version. Back-patch to 9.3 where the bug was introduced. Prior-Discussion: <1718942738eb65c8407fcd864883f4c8@fuzzy.cz> Patch: <4625.1464202586@sss.pgh.pa.us>	2016-05-31 15:55:15 -04:00
Tom Lane	52e8fc3e2e	Ensure that backends see up-to-date statistics for shared catalogs. Ever since we split the statistics collector's reports into per-database files (commit `187492b6c2`), backends have been seeing stale statistics for shared catalogs. This is because the inquiry message only prompts the collector to write the per-database file for the requesting backend's own database. Stats for shared catalogs are in a separate file for "DB 0", which didn't get updated. In normal operation this was partially masked by the fact that the autovacuum launcher would send an inquiry message at least once per autovacuum_naptime that asked for "DB 0"; so the shared-catalog stats would never be more than a minute out of date. However the problem becomes very obvious with autovacuum disabled, as reported by Peter Eisentraut. To fix, redefine the semantics of inquiry messages so that both the specified DB and DB 0 will be dumped. (This might seem a bit inefficient, but we have no good way to know whether a backend's transaction will look at shared-catalog stats, so we have to read both groups of stats whenever we request stats. Sending two inquiry messages would definitely not be better.) Back-patch to 9.3 where the bug was introduced. Report: <56AD41AC.1030509@gmx.net>	2016-05-25 17:48:15 -04:00
Alvaro Herrera	15739393e4	Fix autovacuum for shared relations The table-skipping logic in autovacuum would fail to consider that multiple workers could be processing the same shared catalog in different databases. This normally wouldn't be a problem: firstly because autovacuum workers not for wraparound would simply ignore tables in which they cannot acquire lock, and secondly because most of the time these tables are small enough that even if multiple for-wraparound workers are stuck in the same catalog, they would be over pretty quickly. But in cases where the catalogs are severely bloated it could become a problem. Backpatch all the way back, because the problem has been there since the beginning. Reported by Ondřej Světlík Discussion: https://www.postgresql.org/message-id/572B63B1.3030603%40flexibee.eu https://www.postgresql.org/message-id/572A1072.5080308%40flexibee.eu	2016-05-10 16:23:54 -03:00
Stephen Frost	1574783b4c	Use GRANT system to manage access to sensitive functions Now that pg_dump will properly dump out any ACL changes made to functions which exist in pg_catalog, switch to using the GRANT system to manage access to those functions. This means removing 'if (!superuser()) ereport()' checks from the functions themselves and then REVOKEing EXECUTE right from 'public' for these functions in system_views.sql. Reviews by Alexander Korotkov, Jose Luis Tallon	2016-04-06 21:45:32 -04:00
Simon Riggs	cac0e36682	Revert `bf08f2292f` Remove recent changes to logging XLOG_RUNNING_XACTS by request.	2016-04-06 14:03:46 +01:00
Simon Riggs	bf08f2292f	Avoid archiving XLOG_RUNNING_XACTS on idle server If archive_timeout > 0 we should avoid logging XLOG_RUNNING_XACTS if idle. Bug 13685 reported by Laurence Rowe, investigated in detail by Michael Paquier, though this is not his proposed fix. 20151016203031.3019.72930@wrigleys.postgresql.org Simple non-invasive patch to allow later backpatch to 9.4 and 9.5	2016-04-04 07:18:05 +01:00
Peter Eisentraut	b555ed8102	Merge wal_level "archive" and "hot_standby" into new name "replica" The distinction between "archive" and "hot_standby" existed only because at the time "hot_standby" was added, there was some uncertainty about stability. This is now a long time ago. We would like to move forward with simplifying the replication configuration, but this distinction is in the way, because a primary server cannot tell (without asking a standby or predicting the future) which one of these would be the appropriate level. Pick a new name for the combined setting to make it clearer that it covers all (non-logical) backup and replication uses. The old values are still accepted but are converted internally. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: David Steele <david@pgmasters.net>	2016-03-18 23:56:03 +01:00
Robert Haas	f2b74b01d4	Another comment update. I thought this was in my last commit, but I goofed.	2016-03-16 14:28:25 -04:00
Robert Haas	bc55cc0b6a	Fix problems in commit `c16dc1aca5`. Vinayak Pokale provided a patch for a copy-and-paste error in a comment. I noticed that I'd use the word "automatically" nearby where I meant to talk about things being "atomic". Rahila Syed spotted a misplaced counter update. Fix all that stuff.	2016-03-16 13:54:04 -04:00
Robert Haas	c16dc1aca5	Add simple VACUUM progress reporting. There's a lot more that could be done here yet - in particular, this reports only very coarse-grained information about the index vacuuming phase - but even as it stands, the new pg_stat_progress_vacuum can tell you quite a bit about what a long-running vacuum is actually doing. Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale and Rahila Syed.	2016-03-15 13:32:56 -04:00
Andres Freund	428b1d6b29	Allow to trigger kernel writeback after a configurable number of writes. Currently writes to the main data files of postgres all go through the OS page cache. This means that some operating systems can end up collecting a large number of dirty buffers in their respective page caches. When these dirty buffers are flushed to storage rapidly, be it because of fsync(), timeouts, or dirty ratios, latency for other reads and writes can increase massively. This is the primary reason for regular massive stalls observed in real world scenarios and artificial benchmarks; on rotating disks stalls on the order of hundreds of seconds have been observed. On linux it is possible to control this by reducing the global dirty limits significantly, reducing the above problem. But global configuration is rather problematic because it'll affect other applications; also PostgreSQL itself doesn't always generally want this behavior, e.g. for temporary files it's undesirable. Several operating systems allow some control over the kernel page cache. Linux has sync_file_range(2), several posix systems have msync(2) and posix_fadvise(2). sync_file_range(2) is preferable because it requires no special setup, whereas msync() requires the to-be-flushed range to be mmap'ed. For the purpose of flushing dirty data posix_fadvise(2) is the worst alternative, as flushing dirty data is just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages from the page cache. Thus the feature is enabled by default only on linux, but can be enabled on all systems that have any of the above APIs. While desirable and likely possible this patch does not contain an implementation for windows. With the infrastructure added, writes made via checkpointer, bgwriter and normal user backends can be flushed after a configurable number of writes. Each of these sources of writes controlled by a separate GUC, checkpointer_flush_after, bgwriter_flush_after and backend_flush_after respectively; they're separate because the number of flushes that are good are separate, and because the performance considerations of controlled flushing for each of these are different. A later patch will add checkpoint sorting - after that flushes from the ckeckpoint will almost always be desirable. Bgwriter flushes are most of the time going to be random, which are slow on lots of storage hardware. Flushing in backends works well if the storage and bgwriter can keep up, but if not it can have negative consequences. This patch is likely to have negative performance consequences without checkpoint sorting, but unfortunately so has sorting without flush control. Discussion: alpine.DEB.2.10.1506011320000.28433@sto Author: Fabien Coelho and Andres Freund	2016-03-10 17:04:34 -08:00
Simon Riggs	37c54863cf	Rework wait for AccessExclusiveLocks on Hot Standby Earlier version committed in 9.0 caused spurious waits in some cases. New infrastructure for lock waits in 9.3 used to correct and improve this. Jeff Janes based upon a proposal by Simon Riggs, who also reviewed Additional review comments from Amit Kapila	2016-03-10 19:26:24 +00:00
Robert Haas	53be0b1add	Provide much better wait information in pg_stat_activity. When a process is waiting for a heavyweight lock, we will now indicate the type of heavyweight lock for which it is waiting. Also, you can now see when a process is waiting for a lightweight lock - in which case we will indicate the individual lock name or the tranche, as appropriate - or for a buffer pin. Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful discussion and suggestions by many others, including Alexander Korotkov, Vladimir Borodin, and many others.	2016-03-10 12:44:09 -05:00
Robert Haas	090b287fc5	Code review for `b6fb6471f6`. Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev. Patch by Amit Langote.	2016-03-10 06:07:57 -05:00
Andres Freund	1d4a0ab19a	Avoid unlikely data-loss scenarios due to rename() without fsync. Renaming a file using rename(2) is not guaranteed to be durable in face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files. Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it. Reported-By: Tomas Vondra Author: Michael Paquier, Tomas Vondra, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches	2016-03-09 18:53:53 -08:00
Robert Haas	b6fb6471f6	Add a generic command progress reporting facility. Using this facility, any utility command can report the target relation upon which it is operating, if there is one, and up to 10 64-bit counters; the intent of this is that users should be able to figure out what a utility command is doing without having to resort to ugly hacks like attaching strace to a backend. As a demonstration, this adds very crude reporting to lazy vacuum; we just report the target relation and nothing else. A forthcoming patch will make VACUUM report a bunch of additional data that will make this much more interesting. But this gets the basic framework in place. Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao, and Masanori Oyama.	2016-03-09 12:08:58 -05:00
Andres Freund	7975c5e0a9	Allow the WAL writer to flush WAL at a reduced rate. Commit `4de82f7d7` increased the WAL flush rate, mainly to increase the likelihood that hint bits can be set quickly. More quickly set hint bits can reduce contention around the clog et al. But unfortunately the increased flush rate can have a significant negative performance impact, I have measured up to a factor of ~4. The reason for this slowdown is that if there are independent writes to the underlying devices, for example because shared buffers is a lot smaller than the hot data set, or because a checkpoint is ongoing, the fdatasync() calls force cache flushes to be emitted to the storage. This is achieved by flushing WAL only if the last flush was longer than wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC) unflushed blocks are pending. Based on some tests the default for wal_writer_delay is 1MB, which seems to work well both on SSD and rotational media. To avoid negative performance impact due to `4de82f7d7` an earlier commit (`db76b1e`) made SetHintBits() more likely to succeed; preventing performance regressions in the pgbench tests I performed. Discussion: 20160118163908.GW10941@awork2.anarazel.de	2016-02-16 00:56:34 +01:00
Tom Lane	c5e9b77127	Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier." This reverts commit `3971f64843` and a couple of followon debugging commits; I think we've learned what we can from them.	2016-02-10 16:01:04 -05:00
Tom Lane	41d505a7ff	Add still more chattiness in server shutdown. Further investigation says that there may be some slow operations after we've finished ShutdownXLOG(), so add some more log messages to try to isolate that. This is all temporary code too.	2016-02-09 19:36:30 -05:00
Tom Lane	3971f64843	Temporarily make pg_ctl and server shutdown a whole lot chattier. This is a quick hack, due to be reverted when its purpose has been served, to try to gather information about why some of the buildfarm critters regularly fail with "postmaster does not shut down" complaints. Maybe they are just really overloaded, but maybe something else is going on. Hence, instrument pg_ctl to print the current time when it starts waiting for postmaster shutdown and when it gives up, and add a lot of logging of the current time in the server's checkpoint and shutdown code paths. No attempt has been made to make this pretty. I'm not even totally sure if it will build on Windows, but we'll soon find out.	2016-02-08 18:43:11 -05:00
Robert Haas	c1772ad922	Change the way that LWLocks for extensions are allocated. The previous RequestAddinLWLocks() method had several disadvantages. First, the locks would be in the main tranche; we've recently decided that it's useful for LWLocks used for separate purposes to have separate tranche IDs. Second, there wasn't any correlation between what code called RequestAddinLWLocks() and what code called LWLockAssign(); when multiple modules are in use, it could become quite difficult to troubleshoot problems where LWLockAssign() ran out of locks. To fix, create a concept of named LWLock tranches which can be used either by extension or by core code. Amit Kapila and Robert Haas	2016-02-04 16:43:04 -05:00
Peter Eisentraut	7d17e683fc	Add support for systemd service notifications Insert sd_notify() calls at server start and stop for integration with systemd. This allows the use of systemd service units of type "notify", which greatly simplifies the systemd configuration. Reviewed-by: Pavel Stěhule <pavel.stehule@gmail.com>	2016-02-02 21:04:29 -05:00
Magnus Hagander	e51ab85cd9	Fix typos in comments Author: Michael Paquier	2016-02-01 11:43:48 +01:00
Tom Lane	b8682a7155	Fix startup so that log prefix %h works for the log_connections message. We entirely randomly chose to initialize port->remote_host just after printing the log_connections message, when we could perfectly well do it just before, allowing %h and %r to work for that message. Per gripe from Artem Tomyuk.	2016-01-26 15:38:33 -05:00
Tom Lane	65c5fcd353	Restructure index access method API to hide most of it at the C level. This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.	2016-01-17 19:36:59 -05:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Peter Eisentraut	5db837d3f2	Message improvements	2015-11-16 21:39:23 -05:00
Robert Haas	64b2e7ad91	Pass extra data to bgworkers, and use this to fix parallel contexts. Up until now, the total amount of data that could be passed to a background worker at startup was one datum, which can be a small as 4 bytes on some systems. That's enough to pass a dsm_handle or an array index, but not much else. Add a bgw_extra flag to the BackgroundWorker struct, allowing up to 128 bytes to be passed to a new worker on any platform. Use this to fix a problem I recently discovered with the parallel context machinery added in 9.5: the master assigns each worker an array index, and each worker subsequently assigns itself an array index, and there's nothing to guarantee that the two sets of indexes match, leading to chaos. Normally, I would not back-patch the change to add bgw_extra, since it is basically a feature addition. However, since 9.5 is still in beta and there seems to be no other sensible way to repair the broken parallel context machinery, back-patch to 9.5. Existing background worker code can ignore the bgw_extra field without a problem, but might need to be recompiled since the structure size has changed. Report and patch by me. Review by Amit Kapila.	2015-11-05 12:13:56 -05:00
Robert Haas	c6baec92fc	Fix typo in bgworker.c	2015-10-30 10:35:33 +01:00
Tom Lane	869f693a36	On Windows, ensure shared memory handle gets closed if not being used. Postmaster child processes that aren't supposed to be attached to shared memory were not bothering to close the shared memory mapping handle they inherit from the postmaster process. That's mostly harmless, since the handle vanishes anyway when the child process exits -- but the syslogger process, if used, doesn't get killed and restarted during recovery from a backend crash. That meant that Windows doesn't see the shared memory mapping as becoming free, so it doesn't delete it and the postmaster is unable to create a new one, resulting in failure to recover from crashes whenever logging_collector is turned on. Per report from Dmitry Vasilyev. It's a bit astonishing that we'd not figured this out long ago, since it's been broken from the very beginnings of out native Windows support; probably some previously-unexplained trouble reports trace to this. A secondary problem is that on Cygwin (perhaps only in older versions?), exec() may not detach from the shared memory segment after all, in which case these child processes did remain attached to shared memory, posing the risk of an unexpected shared memory clobber if they went off the rails somehow. That may be a long-gone bug, but we can deal with it now if it's still live, by detaching within the infrastructure introduced here to deal with closing the handle. Back-patch to all supported branches. Tom Lane and Amit Kapila	2015-10-13 11:21:33 -04:00
Robert Haas	db0f6cad48	Remove set_latch_on_sigusr1 flag. This flag has proven to be a recipe for bugs, and it doesn't seem like it can really buy anything in terms of performance. So let's just always set the process latch when we receive SIGUSR1 instead of trying to do it only when needed. Per my recent proposal on pgsql-hackers.	2015-10-09 14:31:04 -04:00
Tom Lane	94f5246ce1	Fix uninitialized-variable bug. For some reason, neither of the compilers I usually use noticed the uninitialized-variable problem I introduced in commit `7e2a18a916`. That's hardly a good enough excuse though. Committing with brown paper bag on head. In addition to putting the operations in the right order, move the declaration of "now" inside the loop; there's no need for it to be outside, and that does wake up older gcc enough to notice any similar future problem. Back-patch to 9.4; earlier versions lack the time-to-SIGKILL stanza so there's no bug.	2015-10-09 09:12:03 -05:00
Tom Lane	7e2a18a916	Perform an immediate shutdown if the postmaster.pid file is removed. The postmaster now checks every minute or so (worst case, at most two minutes) that postmaster.pid is still there and still contains its own PID. If not, it performs an immediate shutdown, as though it had received SIGQUIT. The original goal behind this change was to ensure that failed buildfarm runs would get fully cleaned up, even if the test scripts had left a postmaster running, which is not an infrequent occurrence. When the buildfarm script removes a test postmaster's $PGDATA directory, its next check on postmaster.pid will fail and cause it to exit. Previously, manual intervention was often needed to get rid of such orphaned postmasters, since they'd block new test postmasters from obtaining the expected socket address. However, by checking postmaster.pid and not something else, we can provide additional robustness: manual removal of postmaster.pid is a frequent DBA mistake, and now we can at least limit the damage that will ensue if a new postmaster is started while the old one is still alive. Back-patch to all supported branches, since we won't get the desired improvement in buildfarm reliability otherwise.	2015-10-06 17:15:52 -04:00
Robert Haas	8f6bb851bd	Remove more volatile qualifiers. Prior to commit `0709b7ee72`, access to variables within a spinlock-protected critical section had to be done through a volatile pointer, but that should no longer be necessary. This continues work begun in `df4077cda2` and `6ba4ecbf47`. Thomas Munro and Michael Paquier	2015-10-06 15:45:02 -04:00
Fujii Masao	96f6a0cb41	Remove files signaling a standby promotion request at postmaster startup This commit makes postmaster forcibly remove the files signaling a standby promotion request. Otherwise, the existence of those files can trigger a promotion too early, whether a user wants that or not. This removal of files is usually unnecessary because they can exist only during a few moments during a standby promotion. However there is a race condition: if pg_ctl promote is executed and creates the files during a promotion, the files can stay around even after the server is brought up to new master. Then, if new standby starts by using the backup taken from that master, the files can exist at the server startup and should be removed in order to avoid an unexpected promotion. Back-patch to 9.1 where promote signal file was introduced. Problem reported by Feike Steenbergen. Original patch by Michael Paquier, modified by me. Discussion: 20150528100705.4686.91426@wrigleys.postgresql.org	2015-09-09 22:51:44 +09:00
Robert Haas	8a02b3d732	Allow notifications to bgworkers without database connections. Previously, if one background worker registered another background worker and set bgw_notify_pid while for the second background worker, it would not receive notifications from the postmaster unless, at the time the "parent" was registered, BGWORKER_BACKEND_DATABASE_CONNECTION was set. To fix, instead instead of including only those background workers that requested database connections in the postmater's BackendList, include them all. There doesn't seem to be any reason not do this, and indeed it removes a significant amount of duplicated code. The other option is to make PostmasterMarkPIDForWorkerNotify look at BackgroundWorkerList in addition to BackendList, but that adds more code duplication instead of getting rid of it. Patch by me. Review and testing by Ashutosh Bapat.	2015-09-01 15:30:19 -04:00
Tom Lane	d73d14c271	Fix incorrect order of lock file removal and failure to close() sockets. Commit `c9b0cbe98b` accidentally broke the order of operations during postmaster shutdown: it resulted in removing the per-socket lockfiles after, not before, postmaster.pid. This creates a race-condition hazard for a new postmaster that's started immediately after observing that postmaster.pid has disappeared; if it sees the socket lockfile still present, it will quite properly refuse to start. This error appears to be the explanation for at least some of the intermittent buildfarm failures we've seen in the pg_upgrade test. Another problem, which has been there all along, is that the postmaster has never bothered to close() its listen sockets, but has just allowed them to close at process death. This creates a different race condition for an incoming postmaster: it might be unable to bind to the desired listen address because the old postmaster is still incumbent. This might explain some odd failures we've seen in the past, too. (Note: this is not related to the fact that individual backends don't close their client communication sockets. That behavior is intentional and is not changed by this patch.) Fix by adding an on_proc_exit function that closes the postmaster's ports explicitly, and (in 9.3 and up) reshuffling the responsibility for where to unlink the Unix socket files. Lock file unlinking can stay where it is, but teach it to unlink the lock files in reverse order of creation.	2015-08-02 14:55:03 -04:00
Tom Lane	4c8f8ffaca	Further code review for pg_stat_ssl patch. Fix additional bogosity in commit `9029f4b374`. Include the BackendSslStatusBuffer in the BackendStatusShmemSize calculation, avoid ugly and error-prone casts to char* and back, put related code stanzas into a consistent order (and fix a couple of previous instances of that sin). All cosmetic except for the size oversight.	2015-07-27 16:29:14 -04:00
Tom Lane	7d791ed49b	Fix pointer-arithmetic thinko in pg_stat_ssl patch. Nasty memory-stomp bug in commit `9029f4b374`. It's not apparent how this survived even cursory testing :-(. Per report from Peter Holzer.	2015-07-27 15:58:46 -04:00
Tom Lane	45811be94e	Fix postmaster's handling of a startup-process crash. Ordinarily, a failure (unexpected exit status) of the startup subprocess should be considered fatal, so the postmaster should just close up shop and quit. However, if we sent the startup process a SIGQUIT or SIGKILL signal, the failure is hardly "unexpected", and we should attempt restart; this is necessary for recovery from ordinary backend crashes in hot-standby scenarios. I attempted to implement the latter rule with a two-line patch in commit `442231d7f7`, but it now emerges that that patch was a few bricks shy of a load: it failed to distinguish the case of a signaled startup process from the case where the new startup process crashes before reaching database consistency. That resulted in infinitely respawning a new startup process only to have it crash again. To handle this properly, we really must track whether we have sent the current startup process a kill signal. Rather than add yet another ad-hoc boolean to the postmaster's state, I chose to unify this with the existing RecoveryError flag into an enum tracking the startup process's state. That seems more consistent with the postmaster's general state machine design. Back-patch to 9.0, like the previous patch.	2015-07-09 13:22:22 -04:00
Heikki Linnakangas	d661532e27	Also trigger restartpoints based on max_wal_size on standby. When archive recovery and restartpoints were initially introduced, checkpoint_segments was ignored on the grounds that the files restored from archive don't consume any space in the recovery server. That was changed in later releases, but even then it was arguably a feature rather than a bug, as performing restartpoints as often as checkpoints during normal operation might be excessive, but you might nevertheless not want to waste a lot of space for pre-allocated WAL by setting checkpoint_segments to a high value. But now that we have separate min_wal_size and max_wal_size settings, you can bound WAL usage with max_wal_size, and still avoid consuming excessive space usage by setting min_wal_size to a lower value, so that argument is moot. There are still some issues with actually limiting the space usage to max_wal_size: restartpoints in recovery can only start after seeing the checkpoint record, while a checkpoint starts flushing buffers as soon as the redo-pointer is set. Restartpoint is paced to happen at the same leisurily speed, determined by checkpoint_completion_target, as checkpoints, but because they are started later, max_wal_size can be exceeded by upto one checkpoint cycle's worth of WAL, depending on checkpoint_completion_target. But that seems better than not trying at all, and max_wal_size is a soft limit anyway. The documentation already claimed that max_wal_size is obeyed in recovery, so this just fixes the behaviour to match the docs. However, add some weasel-words there to mention that max_wal_size may well be exceeded by some amount in recovery.	2015-06-29 00:09:10 +03:00
Robert Haas	91118f1a59	Reduce log level for background worker events from LOG to DEBUG1. Per discussion, LOG is just too chatty for something that will happen as routinely as this. Pavel Stehule	2015-06-26 11:23:32 -04:00
Alvaro Herrera	3c400a3f2b	Fix thinko in comment (launcher -> worker)	2015-06-20 11:45:59 -03:00
Tom Lane	48913db887	In immediate shutdown, postmaster should not exit till children are gone. This adjusts commit `82233ce7ea` so that the postmaster does not exit until all its child processes have exited, even if the 5-second timeout elapses and we have to send SIGKILL. There is no great value in having the postmaster process quit sooner, and doing so can mislead onlookers into thinking that the cluster is fully terminated when actually some child processes still survive. This effect might explain recent test failures on buildfarm member hamster, wherein we failed to restart a cluster just after shutting it down with "pg_ctl stop -m immediate". I also did a bit of code review/beautification, including fixing a faulty use of the Max() macro on a volatile expression. Back-patch to 9.4. In older branches, the postmaster never waited for children to exit during immediate shutdowns, and changing that would be too much of a behavioral change.	2015-06-19 14:23:39 -04:00
Alvaro Herrera	da1a9d0f5b	Clamp autovacuum launcher sleep time to 5 minutes This avoids the problem that it might go to sleep for an unreasonable amount of time in unusual conditions like the server clock moving backwards an unreasonable amount of time. (Simply moving the server clock forward again doesn't solve the problem unless you wake up the autovacuum launcher manually, say by sending it SIGHUP). Per trouble report from Prakash Itnal in https://www.postgresql.org/message-id/CAHC5u79-UqbapAABH2t4Rh2eYdyge0Zid-X=Xz-ZWZCBK42S0Q@mail.gmail.com Analyzed independently by Haribabu Kommi and Tom Lane.	2015-06-19 12:44:36 -03:00
Fujii Masao	b5fe62038f	Make postmaster restart archiver soon after it dies, even during recovery. After the archiver dies, postmaster tries to start a new one immediately. But previously this could happen only while server was running normally even though archiving was enabled always (i.e., archive_mode was set to always). So the archiver running during recovery could not restart soon after it died. This is an oversight in commit `ffd3774`. This commit changes reaper(), postmaster's signal handler to cleanup after a child process dies, so that it tries to a new archiver even during recovery if necessary. Patch by me. Review by Alvaro Herrera.	2015-06-12 23:11:51 +09:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00
Heikki Linnakangas	fa60fb63e5	Fix more typos in comments. Patch by CharSyam, plus a few more I spotted with grep.	2015-05-20 19:45:43 +03:00
Noah Misch	b0ce385032	Prevent a double free by not reentering be_tls_close(). Reentering this function with the right timing caused a double free, typically crashing the backend. By synchronizing a disconnection with the authentication timeout, an unauthenticated attacker could achieve this somewhat consistently. Call be_tls_close() solely from within proc_exit_prepare(). Back-patch to 9.0 (all supported versions). Benkocs Norbert Attila Security: CVE-2015-3165	2015-05-18 10:02:31 -04:00
Heikki Linnakangas	4df1328950	Put back stats-collector restarting code, removed accidentally. Removed that code snippet accidentally in the archive_mode='always' patch. Also, use varname-tags for archive_command in the docs. Fujii Masao	2015-05-18 10:20:30 +03:00
Heikki Linnakangas	ffd37740ee	Add archive_mode='always' option. In 'always' mode, the standby independently archives all files it receives from the primary. Original patch by Fujii Masao, docs and review by me.	2015-05-15 18:55:24 +03:00
Robert Haas	53bb309d2d	Teach autovacuum about multixact member wraparound. The logic introduced in commit `b69bf30b9b` and repaired in commits `669c7d20e6` and `7be47c56af` helps to ensure that we don't overwrite old multixact member information while it is still needed, but a user who creates many large multixacts can still exhaust the member space (and thus start getting errors) while autovacuum stands idly by. To fix this, progressively ramp down the effective value (but not the actual contents) of autovacuum_multixact_freeze_max_age as member space utilization increases. This makes autovacuum more aggressive and also reduces the threshold for a manual VACUUM to perform a full-table scan. This patch leaves unsolved the problem of ensuring that emergency autovacuums are triggered even when autovacuum=off. We'll need to fix that via a separate patch. Thomas Munro and Robert Haas	2015-05-08 12:53:00 -04:00
Robert Haas	924bcf4f16	Create an infrastructure for parallel computation in PostgreSQL. This does four basic things. First, it provides convenience routines to coordinate the startup and shutdown of parallel workers. Second, it synchronizes various pieces of state (e.g. GUCs, combo CID mappings, transaction snapshot) from the parallel group leader to the worker processes. Third, it prohibits various operations that would result in unsafe changes to that state while parallelism is active. Finally, it propagates events that would result in an ErrorResponse, NoticeResponse, or NotifyResponse message being sent to the client from the parallel workers back to the master, from which they can then be sent on to the client. Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke. Suggestions and review from Andres Freund, Heikki Linnakangas, Noah Misch, Simon Riggs, Euler Taveira, and Jim Nasby.	2015-04-30 15:02:14 -04:00
Andres Freund	6aab1f45ac	Fix various typos and grammar errors in comments. Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru	2015-04-26 18:42:31 +02:00
Magnus Hagander	9029f4b374	Add system view pg_stat_ssl This view shows information about all connections, such as if the connection is using SSL, which cipher is used, and which client certificate (if any) is used. Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier	2015-04-12 19:07:46 +02:00
Alvaro Herrera	5df64f298d	Fix autovacuum launcher shutdown sequence It was previously possible to have the launcher re-execute its main loop before shutting down if some other signal was received or an error occurred after getting SIGTERM, as reported by Qingqing Zhou. While investigating, Tom Lane further noticed that if autovacuum had been disabled in the config file, it would misbehave by trying to start a new worker instead of bailing out immediately -- it would consider itself as invoked in emergency mode. Fix both problems by checking the shutdown flag in a few more places. These problems have existed since autovacuum was introduced, so backpatch all the way back.	2015-04-08 13:19:49 -03:00
Alvaro Herrera	4ff695b17d	Add log_min_autovacuum_duration per-table option This is useful to control autovacuum log volume, for situations where monitoring only a set of tables is necessary. Author: Michael Paquier Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa, Taiki Kondo, Huong Dangminh), Fujii Masao.	2015-04-03 11:55:50 -03:00
Alvaro Herrera	a75fb9b335	Have autovacuum workers listen to SIGHUP, too They have historically ignored it, but it's been said to be useful at times to change their settings mid-flight. Author: Michael Paquier	2015-04-03 11:52:55 -03:00
Robert Haas	b3a5e76e12	After a crash, don't restart workers with BGW_NEVER_RESTART. Amit Khandekar	2015-04-02 14:38:06 -04:00
Alvaro Herrera	00ee6c7672	autovacuum: Fix polarity of "wraparound" variable Commit `0d83138974` inadvertently reversed the meaning of the wraparound variable. This causes vacuums which are not required for wraparound to wait for locks to be acquired, and what is worse, it allows wraparound vacuums to skip locked pages. Bug reported by Jeff Janes in http://www.postgresql.org/message-id/CAMkU=1xmTEiaY=5oMHsSQo5vd9V1Ze4kNLL0qN2eH0P_GXOaYw@mail.gmail.com Analysis and patch by Kyotaro HORIGUCHI	2015-04-02 13:34:50 -03:00
Tom Lane	785941cdc3	Tweak __attribute__-wrapping macros for better pgindent results. This improves on commit `bbfd7edae5` by making two simple changes: * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn(). Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces pgindent's tendency to misformat declarations involving them. * attributes are now always attached to function declarations, not definitions. Previously some places were taking creative shortcuts, which were not merely candidates for bad misformatting by pgindent but often were outright wrong anyway. (It does little good to put a noreturn annotation where callers can't see it.) In any case, if we would like to believe that these macros can be used with non-gcc compilers, we should avoid gratuitous variance in usage patterns. I also went through and manually improved the formatting of a lot of declarations, and got rid of excessively repetitive (and now obsolete anyway) comments informing the reader what pg_attribute_printf is for.	2015-03-26 14:03:25 -04:00
Robert Haas	bf740ce9e5	Fix status reporting for terminated bgworkers that were never started. Previously, GetBackgroundWorkerPid() would return BGWH_NOT_YET_STARTED if the slot used for the worker registration had not been reused by unrelated activity, and BGWH_STOPPED if it had. Either way, a process that had requested notification when the state of one of its background workers changed did not receive such notifications. Fix things so that GetBackgroundWorkerPid() always returns BGWH_STOPPED in this situation, so that we do not erroneously give waiters the impression that the worker will eventually be started; and send notifications just as we would if the process terminated after having been started, so that it's possible to wait for the postmaster to process a worker termination request without polling. Discovered by Amit Kapila during testing of parallel sequential scan. Analysis and fix by me. Back-patch to 9.4; there may not be anyone relying on this interface yet, but if anyone is, the new behavior is a clear improvement.	2015-03-19 11:04:09 -04:00
Alvaro Herrera	0d83138974	Rationalize vacuuming options and parameters We were involving the parser too much in setting up initial vacuuming parameters. This patch moves that responsibility elsewhere to simplify code, and also to make future additions easier. To do this, create a new struct VacuumParams which is filled just prior to vacuum execution, instead of at parse time; for user-invoked vacuuming this is set up in a new function ExecVacuum, while autovacuum sets it up by itself. While at it, add a new member VACOPT_SKIPTOAST to enum VacuumOption, only set by autovacuum, which is used to disable vacuuming of the toast table instead of the old do_toast parameter; this relieves the argument list of vacuum() and some callees a bit. This partially makes up for having added more arguments in an effort to avoid having autovacuum from constructing a VacuumStmt parse node. Author: Michael Paquier. Some tweaks by Álvaro Reviewed by: Robert Haas, Stephen Frost, Álvaro Herrera	2015-03-18 11:52:33 -03:00
Andres Freund	bbfd7edae5	Add macros wrapping all usage of gcc's __attribute__. Until now __attribute__() was defined to be empty for all compilers but gcc. That's problematic because it prevents using it in other compilers; which is necessary e.g. for atomics portability. It's also just generally dubious to do so in a header as widely included as c.h. Instead add pg_attribute_format_arg, pg_attribute_printf, pg_attribute_noreturn macros which are implemented in the compilers that understand them. Also add pg_attribute_noreturn and pg_attribute_packed, but don't provide fallbacks, since they can affect functionality. This means that external code that, possibly unwittingly, relied on __attribute__ defined to be empty on !gcc compilers may now run into warnings or errors on those compilers. But there shouldn't be many occurances of that and it's hard to work around... Discussion: 54B58BA3.8040302@ohmu.fi Author: Oskari Saarenmaa, with some minor changes by me.	2015-03-11 14:30:01 +01:00
Heikki Linnakangas	88e9823026	Replace checkpoint_segments with min_wal_size and max_wal_size. Instead of having a single knob (checkpoint_segments) that both triggers checkpoints, and determines how many checkpoints to recycle, they are now separate concerns. There is still an internal variable called CheckpointSegments, which triggers checkpoints. But it no longer determines how many segments to recycle at a checkpoint. That is now auto-tuned by keeping a moving average of the distance between checkpoints (in bytes), and trying to keep that many segments in reserve. The advantage of this is that you can set max_wal_size very high, but the system won't actually consume that much space if there isn't any need for it. The min_wal_size sets a floor for that; you can effectively disable the auto-tuning behavior by setting min_wal_size equal to max_wal_size. The max_wal_size setting is now the actual target size of WAL at which a new checkpoint is triggered, instead of the distance between checkpoints. Previously, you could calculate the actual WAL usage with the formula "(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this patch, you set the desired WAL usage with max_wal_size, and the system calculates the appropriate CheckpointSegments with the reverse of that formula. That's a lot more intuitive for administrators to set. Reviewed by Amit Kapila and Venkata Balaji N.	2015-02-23 18:53:02 +02:00
Tom Lane	33a3b03d63	Use FLEXIBLE_ARRAY_MEMBER in some more places. Fix a batch of structs that are only visible within individual .c files. Michael Paquier	2015-02-20 17:32:01 -05:00
Alvaro Herrera	d42358efb1	Have TRUNCATE update pgstat tuple counters This works by keeping a per-subtransaction record of the ins/upd/del counters before the truncate, and then resetting them; this record is useful to return to the previous state in case the truncate is rolled back, either in a subtransaction or whole transaction. The state is propagated upwards as subtransactions commit. When the per-table data is sent to the stats collector, a flag indicates to reset the live/dead counters to zero as well. Catalog version bumped due to the change in pgstat format. Author: Alexander Shulgin Discussion: 1007.1207238291@sss.pgh.pa.us Discussion: 548F7D38.2000401@BlueTreble.com Reviewed-by: Álvaro Herrera, Jim Nasby	2015-02-20 12:10:01 -03:00
Tom Lane	09d8d110a6	Use FLEXIBLE_ARRAY_MEMBER in a bunch more places. Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]". Aside from being more self-documenting, this should help prevent bogus warnings from static code analyzers and perhaps compiler misoptimizations. This patch is just a down payment on eliminating the whole problem, but it gets rid of a lot of easy-to-fix cases. Note that the main problem with doing this is that one must no longer rely on computing sizeof(the containing struct), since the result would be compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf also warns against spelling that offsetof(struct, lastfield[0]). Michael Paquier, review and additional fixes by me.	2015-02-20 00:11:42 -05:00
Andres Freund	4f85fde8eb	Introduce and use infrastructure for interrupt processing during client reads. Up to now large swathes of backend code ran inside signal handlers while reading commands from the client, to allow for speedy reaction to asynchronous events. Most prominently shared invalidation and NOTIFY handling. That means that complex code like the starting/stopping of transactions is run in signal handlers... The required code was fragile and verbose, and is likely to contain bugs. That approach also severely limited what could be done while communicating with the client. As the read might be from within openssl it wasn't safely possible to trigger an error, e.g. to cancel a backend in idle-in-transaction state. We did that in some cases, namely fatal errors, nonetheless. Now that FE/BE communication in the backend employs non-blocking sockets and latches to block, we can quite simply interrupt reads from signal handlers by setting the latch. That allows us to signal an interrupted read, which is supposed to be retried after returning from within the ssl library. As signal handlers now only need to set the latch to guarantee timely interrupt processing, remove a fair amount of complicated & fragile code from async.c and sinval.c. We could now actually start to process some kinds of interrupts, like sinval ones, more often that before, but that seems better done separately. This work will hopefully allow to handle cases like being blocked by sending data, interrupting idle transactions and similar to be implemented without too much effort. In addition to allowing getting rid of ImmediateInterruptOK, that is. Author: Andres Freund Reviewed-By: Heikki Linnakangas	2015-02-03 22:25:20 +01:00
Robert Haas	5d2f957f3f	Add new function BackgroundWorkerInitializeConnectionByOid. Sometimes it's useful for a background worker to be able to initialize its database connection by OID rather than by name, so provide a way to do that.	2015-02-02 16:23:59 -05:00
Heikki Linnakangas	2b3a8b20c2	Be more careful to not lose sync in the FE/BE protocol. If any error occurred while we were in the middle of reading a protocol message from the client, we could lose sync, and incorrectly try to interpret a part of another message as a new protocol message. That will usually lead to an "invalid frontend message" error that terminates the connection. However, this is a security issue because an attacker might be able to deliberately cause an error, inject a Query message in what's supposed to be just user data, and have the server execute it. We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other operations that could ereport(ERROR) in the middle of processing a message, but a query cancel interrupt or statement timeout could nevertheless cause it to happen. Also, the V2 fastpath and COPY handling were not so careful. It's very difficult to recover in the V2 COPY protocol, so we will just terminate the connection on error. In practice, that's what happened previously anyway, as we lost protocol sync. To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set whenever we're in the middle of reading a message. When it's set, we cannot safely ERROR out and continue running, because we might've read only part of a message. PqCommReadingMsg acts somewhat similarly to critical sections in that if an error occurs while it's set, the error handler will force the connection to be terminated, as if the error was FATAL. It's not implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted to PANIC in critical sections, because we want to be able to use PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes advantage of that to prevent an OOM error from terminating the connection. To prevent unnecessary connection terminations, add a holdoff mechanism similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel interrupts, but still allow die interrupts. The rules on which interrupts are processed when are now a bit more complicated, so refactor ProcessInterrupts() and the calls to it in signal handlers so that the signal handlers always call it if ImmediateInterruptOK is set, and ProcessInterrupts() can decide to not do anything if the other conditions are not met. Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund. Backpatch to all supported versions. Security: CVE-2015-0244	2015-02-02 17:09:53 +02:00
Tom Lane	586dd5d6a5	Replace a bunch more uses of strncpy() with safer coding. strncpy() has a well-deserved reputation for being unsafe, so make an effort to get rid of nearly all occurrences in HEAD. A large fraction of the remaining uses were passing length less than or equal to the known strlen() of the source, in which case no null-padding can occur and the behavior is equivalent to memcpy(), though doubtless slower and certainly harder to reason about. So just use memcpy() in these cases. In other cases, use either StrNCpy() or strlcpy() as appropriate (depending on whether padding to the full length of the destination buffer seems useful). I left a few strncpy() calls alone in the src/timezone/ code, to keep it in sync with upstream (the IANA tzcode distribution). There are also a few such calls in ecpg that could possibly do with more analysis. AFAICT, none of these changes are more than cosmetic, except for the four occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength source leads to a non-null-terminated destination buffer and ensuing misbehavior. These don't seem like security issues, first because no stack clobber is possible and second because if your values of sslcert etc are coming from untrusted sources then you've got problems way worse than this. Still, it's undesirable to have unpredictable behavior for overlength inputs, so back-patch those four changes to all active branches.	2015-01-24 13:05:42 -05:00
Tom Lane	75b48e1fff	Adjust "pgstat wait timeout" message to be a translatable LOG message. Per discussion, change the log level of this message to be LOG not WARNING. The main point of this change is to avoid causing buildfarm run failures when the stats collector is exceptionally slow to respond, which it not infrequently is on some of the smaller/slower buildfarm members. This change does lose notice to an interactive user when his stats query is looking at out-of-date stats, but the majority opinion (not necessarily that of yours truly) is that WARNING messages would probably not get noticed anyway on heavily loaded production systems. A LOG message at least ensures that the problem is recorded somewhere where bulk auditing for the issue is possible. Also, instead of an untranslated "pgstat wait timeout" message, provide a translatable and hopefully more understandable message "using stale statistics instead of current ones because stats collector is not responding". The original text was written hastily under the assumption that it would never really happen in practice, which we now know to be unduly optimistic. Back-patch to all active branches, since we've seen the buildfarm issue in all branches.	2015-01-19 23:01:33 -05:00
Andres Freund	59f71a0d0b	Add a default local latch for use in signal handlers. To do so, move InitializeLatchSupport() into the new common process initialization functions, and add a new global variable MyLatch. MyLatch is usable as soon InitPostmasterChild() has been called (i.e. very early during startup). Initially it points to a process local latch that exists in all processes. InitProcess/InitAuxiliaryProcess then replaces that local latch with PGPROC->procLatch. During shutdown the reverse happens. This is primarily advantageous for two reasons: For one it simplifies dealing with the shared process latch, especially in signal handlers, because instead of having to check for MyProc, MyLatch can be used unconditionally. For another, a later patch that makes FEs/BE communication use latches, now can rely on the existence of a latch, even before having gone through InitProcess. Discussion: 20140927191243.GD5423@alap3.anarazel.de	2015-01-14 18:45:22 +01:00
Andres Freund	31c453165b	Commonalize process startup code. Move common code, that was duplicated in every postmaster child/every standalone process, into two functions in miscinit.c. Not only does that already result in a fair amount of net code reduction but it also makes it much easier to remove more duplication in the future. The prime motivation wasn't code deduplication though, but easier addition of new common code.	2015-01-14 00:33:14 +01:00
Andres Freund	2be82dcf17	Make logging_collector=on work with non-windows EXEC_BACKEND again. Commit `b94ce6e80` reordered postmaster's startup sequence so that the tempfile directory is only cleaned up after all the necessary state for pg_ctl is collected. Unfortunately the chosen location is after the syslogger has been started; which normally is fine, except for !WIN32 EXEC_BACKEND builds, which pass information to children via files in the temp directory. Move the call to RemovePgTempFiles() to just before the syslogger has started. That's the first child we fork. Luckily EXEC_BACKEND is pretty much only used by endusers on windows, which has a separate method to pass information to children. That means the real world impact of this bug is very small. Discussion: 20150113182344.GF12272@alap3.anarazel.de Backpatch to 9.1, just as the previous commit was.	2015-01-14 00:14:53 +01:00
Noah Misch	2048e5b881	On Darwin, refuse postmaster startup when multithreaded. The previous commit introduced its report at LOG level to avoid surprises at minor release upgrade time. Compel users deploying the next major release to also deploy the reported workaround.	2015-01-07 22:46:59 -05:00
Noah Misch	894459e59f	On Darwin, detect and report a multithreaded postmaster. Darwin --enable-nls builds use a substitute setlocale() that may start a thread. Buildfarm member orangutan experienced BackendList corruption on account of different postmaster threads executing signal handlers simultaneously. Furthermore, a multithreaded postmaster risks undefined behavior from sigprocmask() and fork(). Emit LOG messages about the problem and its workaround. Back-patch to 9.0 (all supported versions).	2015-01-07 22:35:44 -05:00
Bruce Momjian	4baaf863ec	Update copyright for 2015 Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Andres Freund	d72731a704	Lockless StrategyGetBuffer clock sweep hot path. StrategyGetBuffer() has proven to be a bottleneck in a number of buffer acquisition heavy workloads. To some degree this has already been alleviated by `5d7962c6`, but it still can be quite a heavy bottleneck. The problem is that in unfortunate usage patterns a single StrategyGetBuffer() call will have to look at a large number of buffers - in turn making it likely that the process will be put to sleep while still holding the spinlock. Replace most of the usage of the buffer_strategy_lock spinlock for the clock sweep by a atomic nextVictimBuffer variable. That variable, modulo NBuffers, is the current hand of the clock sweep. The buffer clock-sweep then only needs to acquire the spinlock after a wraparound. And even then only in the process that did the wrapping around. That alleviates nearly all the contention on the relevant spinlock, although significant contention on the cacheline can still exist. Reviewed-By: Robert Haas and Amit Kapila Discussion: 20141010160020.GG6670@alap3.anarazel.de, 20141027133218.GA2639@awork2.anarazel.de	2014-12-25 18:26:25 +01:00
Tom Lane	4a14f13a0a	Improve hash_create's API for selecting simple-binary-key hash functions. Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate. Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely. In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before. For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c. Teodor Sigaev and Tom Lane	2014-12-18 13:36:36 -05:00
Fujii Masao	38628db8d8	Add memory barriers for PgBackendStatus.st_changecount protocol. st_changecount protocol needs the memory barriers to ensure that the apparent order of execution is as it desires. Otherwise, for example, the CPU might rearrange the code so that st_changecount is incremented twice before the modification on a machine with weak memory ordering. This surprising result can lead to bugs. This commit introduces the macros to load and store st_changecount with the memory barriers. These are called before and after PgBackendStatus entries are modified or copied into private memory, in order to prevent CPU from reordering PgBackendStatus access. Per discussion on pgsql-hackers, we decided not to back-patch this to 9.4 or before until we get an actual bug report about this. Patch by me. Review by Robert Haas.	2014-12-18 23:07:51 +09:00
Tom Lane	06d5803ffa	Fix assorted confusion between Oid and int32. In passing, also make some debugging elog's in pgstat.c a bit more consistently worded. Back-patch as far as applicable (9.3 or 9.4; none of these mistakes are really old). Mark Dilger identified and patched the type violations; the message rewordings are mine.	2014-12-11 15:41:15 -05:00
Simon Riggs	aedccb1f6f	action_at_recovery_target recovery config option action_at_recovery_target = pause \| promote \| shutdown Petr Jelinek Reviewed by Muhammad Asif Naeem, Fujji Masao and Simon Riggs	2014-11-25 20:13:30 +00:00
Robert Haas	d0410d6603	Eliminate one background-worker-related flag variable. Teach sigusr1_handler() to use the same test for whether a worker might need to be started as ServerLoop(). Aside from being perhaps a bit simpler, this prevents a potentially-unbounded delay when starting a background worker. On some platforms, select() doesn't return when interrupted by a signal, but is instead restarted, including a reset of the timeout to the originally-requested value. If signals arrive often enough, but no connection requests arrive, sigusr1_handler() will be executed repeatedly, but the body of ServerLoop() won't be reached. This change ensures that, even in that case, background workers will eventually get launched. This is far from a perfect fix; really, we need select() to return control to ServerLoop() after an interrupt, either via the self-pipe trick or some other mechanism. But that's going to require more work and discussion, so let's do this for now to at least mitigate the damage. Per investigation of test_shm_mq failures on buildfarm member anole.	2014-10-04 21:25:41 -04:00
Alvaro Herrera	1021bd6a89	Don't balance vacuum cost delay when per-table settings are in effect When there are cost-delay-related storage options set for a table, trying to make that table participate in the autovacuum cost-limit balancing algorithm produces undesirable results: instead of using the configured values, the global values are always used, as illustrated by Mark Kirkwood in http://www.postgresql.org/message-id/52FACF15.8020507@catalyst.net.nz Since the mechanism is already complicated, just disable it for those cases rather than trying to make it cope. There are undesirable side-effects from this too, namely that the total I/O impact on the system will be higher whenever such tables are vacuumed. However, this is seen as less harmful than slowing down vacuum, because that would cause bloat to accumulate. Anyway, in the new system it is possible to tweak options to get the precise behavior one wants, whereas with the previous system one was simply hosed. This has been broken forever, so backpatch to all supported branches. This might affect systems where cost_limit and cost_delay have been set for individual tables.	2014-10-03 13:01:27 -03:00
Andres Freund	a39e78b710	Block signals while computing the sleep time in postmaster's main loop. DetermineSleepTime() was previously called without blocked signals. That's not good, because it allows signal handlers to interrupt its workings. DetermineSleepTime() was added in 9.3 with the addition of background workers (`da07a1e856`), where it only read from BackgroundWorkerList. Since 9.4, where dynamic background workers were added (`7f7485a0cd`), the list is also manipulated in DetermineSleepTime(). That's bad because the list now can be persistently corrupted if modified by both a signal handler and DetermineSleepTime(). This was discovered during the investigation of hangs on buildfarm member anole. It's unclear whether this bug is the source of these hangs or not, but it's worth fixing either way. I have confirmed that it can cause crashes. It luckily looks like this only can cause problems when bgworkers are actively used. Discussion: 20140929193733.GB14400@awork2.anarazel.de Backpatch to 9.3 where background workers were introduced.	2014-10-01 15:19:40 +02:00
Andres Freund	11a020eb6e	Allow escaping of option values for options passed at connection start. This is useful to allow to set GUCs to values that include spaces; something that wasn't previously possible. The primary case motivating this is the desire to set default_transaction_isolation to 'repeatable read' on a per connection basis, but other usecases like seach_path do also exist. This introduces a slight backward incompatibility: Previously a \ in an option value would have been passed on literally, now it'll be taken as an escape. The relevant mailing list discussion starts with 20140204125823.GJ12016@awork2.anarazel.de.	2014-08-28 13:59:29 +02:00
Heikki Linnakangas	680513ab79	Break out OpenSSL-specific code to separate files. This refactoring is in preparation for adding support for other SSL implementations, with no user-visible effects. There are now two #defines, USE_OPENSSL which is defined when building with OpenSSL, and USE_SSL which is defined when building with any SSL implementation. Currently, OpenSSL is the only implementation so the two #defines go together, but USE_SSL is supposed to be used for implementation-independent code. The libpq SSL code is changed to use a custom BIO, which does all the raw I/O, like we've been doing in the backend for a long time. That makes it possible to use MSG_NOSIGNAL to block SIGPIPE when using SSL, which avoids a couple of syscall for each send(). Probably doesn't make much performance difference in practice - the SSL encryption is expensive enough to mask the effect - but it was a natural result of this refactoring. Based on a patch by Martijn van Oosterhout from 2006. Briefly reviewed by Alvaro Herrera, Andreas Karlsson, Jeff Janes.	2014-08-11 11:54:19 +03:00
Tom Lane	f51ead09df	Avoid wholesale autovacuuming when autovacuum is nominally off. When autovacuum is nominally off, we will still launch autovac workers to vacuum tables that are at risk of XID wraparound. But after we'd done that, an autovac worker would proceed to autovacuum every table in the targeted database, if they meet the usual thresholds for autovacuuming. This is at best pretty unexpected; at worst it delays response to the wraparound threat. Fix it so that if autovacuum is nominally off, we only do forced vacuums and not any other work. Per gripe from Andrey Zhidenkov. This has been like this all along, so back-patch to all supported branches.	2014-07-30 14:41:35 -04:00
Robert Haas	e280c630a8	Fix mishandling of background worker PGPROCs in EXEC_BACKEND builds. InitProcess() relies on IsBackgroundWorker to decide whether the PGPROC for a new backend should be taken from ProcGlobal's freeProcs or from bgworkerFreeProcs. In EXEC_BACKEND builds, InitProcess() is called sooner than in non-EXEC_BACKEND builds, and IsBackgroundWorker wasn't getting initialized soon enough. Report by Noah Misch. Diagnosis and fix by me.	2014-07-30 11:34:06 -04:00
Peter Eisentraut	d38228fe40	Add missing serial commas Also update one place where the wal_level "logical" was not added to an error message.	2014-07-15 08:31:50 -04:00
Kevin Grittner	ac46de56ea	Smooth reporting of commit/rollback statistics. If a connection committed or rolled back any transactions within a PGSTAT_STAT_INTERVAL pacing interval without accessing any tables, the reporting of those statistics would be held up until the connection closed or until it ended a PGSTAT_STAT_INTERVAL interval in which it had accessed a table. This could result in under- reporting of transactions for an extended period, followed by a spike in reported transactions. While this is arguably a bug, the impact is minimal, primarily affecting, and being affected by, monitoring software. It might cause more confusion than benefit to change the existing behavior in released stable branches, so apply only to master and the 9.4 beta. Gurjeet Singh, with review and editing by Kevin Grittner, incorporating suggested changes from Abhijit Menon-Sen and Tom Lane.	2014-07-02 15:20:30 -05:00
Heikki Linnakangas	1c6821be31	Fix and enhance the assertion of no palloc's in a critical section. The assertion failed if WAL_DEBUG or LWLOCK_STATS was enabled; fix that by using separate memory contexts for the allocations made within those code blocks. This patch introduces a mechanism for marking any memory context as allowed in a critical section. Previously ErrorContext was exempt as a special case. Instead of a blanket exception of the checkpointer process, only exempt the memory context used for the pending ops hash table.	2014-06-30 10:26:00 +03:00
Andres Freund	3bdcf6a5a7	Don't allow to disable backend assertions via the debug_assertions GUC. The existance of the assert_enabled variable (backing the debug_assertions GUC) reduced the amount of knowledge some static code checkers (like coverity and various compilers) could infer from the existance of the assertion. That could have been solved by optionally removing the assertion_enabled variable from the Assert() et al macros at compile time when some special macro is defined, but the resulting complication doesn't seem to be worth the gain from having debug_assertions. Recompiling is fast enough. The debug_assertions GUC is still available, but readonly, as it's useful when diagnosing problems. The commandline/client startup option -A, which previously also allowed to enable/disable assertions, has been removed as it doesn't serve a purpose anymore. While at it, reduce code duplication in bufmgr.c and localbuf.c assertions checking for spurious buffer pins. That code had to be reindented anyway to cope with the assert_enabled removal.	2014-06-20 11:09:17 +02:00
Tom Lane	df8b7bc9ff	Improve our mechanism for controlling the Linux out-of-memory killer. Arrange for postmaster child processes to respond to two environment variables, PG_OOM_ADJUST_FILE and PG_OOM_ADJUST_VALUE, to determine whether they reset their OOM score adjustments and if so to what. This is superior to the previous design involving #ifdef's in several ways. The behavior is now available in a default build, and both ends of the adjustment --- the original adjustment of the postmaster's level and the subsequent readjustment by child processes --- can now be controlled in one place, namely the postmaster launch script. So it's no longer necessary for the launch script to act on faith that the server was compiled with the appropriate options. In addition, if someone wants to use an OOM score other than zero for the child processes, that doesn't take a recompile anymore; and we no longer have to cater separately to the two different historical kernel APIs for this adjustment. Gurjeet Singh, somewhat revised by me	2014-06-18 20:12:51 -04:00
Fujii Masao	654e8e4447	Save pg_stat_statements statistics file into $PGDATA/pg_stat directory at shutdown. `187492b6c2` changed pgstat.c so that the stats files were saved into $PGDATA/pg_stat directory when the server was shutdowned. But it accidentally forgot to change the location of pg_stat_statements permanent stats file. This commit fixes pg_stat_statements so that its stats file is also saved into $PGDATA/pg_stat at shutdown. Since this fix changes the file layout, we don't back-patch it to 9.3 where this oversight was introduced.	2014-06-04 12:09:45 +09:00
Tom Lane	f62d417825	Fix unportable setvbuf() usage in initdb. In yesterday's commit `2dc4f011fd`, I tried to force buffering of stdout/stderr in initdb to be what it is by default when the program is run interactively on Unix (since that's how most manual testing is done). This tripped over the fact that Windows doesn't support _IOLBF mode. We dealt with that a long time ago in syslogger.c by falling back to unbuffered mode on Windows. Export that solution in port.h and use it in initdb. Back-patch to 8.4, like the previous commit.	2014-05-15 15:57:54 -04:00
Robert Haas	be7558162a	When a background worker exists with code 0, unregister it. The previous behavior was to restart immediately, which was generally viewed as less useful. Petr Jelinek, with some adjustments by me.	2014-05-07 17:44:42 -04:00
Robert Haas	eee6cf1f33	When a bgworker exits, always call ReleasePostmasterChildSlot. Commit `e2ce9aa27b` was insufficiently well thought out. Repair.	2014-05-07 16:30:23 -04:00
Robert Haas	970d1f76d1	Restart bgworkers immediately after a crash-and-restart cycle. Just as we would start bgworkers immediately after an initial startup of the server, we should restart them immediately when reinitializing. Petr Jelinek and Robert Haas	2014-05-07 16:19:35 -04:00
Robert Haas	4d155d8b08	Detach shared memory from bgworkers without shmem access. Since the postmaster won't perform a crash-and-restart sequence for background workers which don't request shared memory access, we'd better make sure that they can't corrupt shared memory. Patch by me, review by Tom Lane.	2014-05-07 14:56:49 -04:00
Robert Haas	e2ce9aa27b	Never crash-and-restart for bgworkers without shared memory access. The motivation for a crash and restart cycle when a backend dies is that it might have corrupted shared memory on the way down; and we can't recover reliably except by reinitializing everything. But that doesn't apply to processes that don't touch shared memory. Currently, there's nothing to prevent a background worker that doesn't request shared memory access from touching shared memory anyway, but that's a separate bug. Previous to this commit, the coding in postmaster.c was inconsistent: an exit status other than 0 or 1 didn't provoke a crash-and-restart, but failure to release the postmaster child slot did. This change makes those cases consistent.	2014-05-07 13:19:02 -04:00
Bruce Momjian	0a78320057	pgindent run for 9.4 This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.	2014-05-06 12:12:18 -04:00
Tom Lane	cad4fe6455	Use AF_UNSPEC not PF_UNSPEC in getaddrinfo calls. According to the Single Unix Spec and assorted man pages, you're supposed to use the constants named AF_xxx when setting ai_family for a getaddrinfo call. In a few places we were using PF_xxx instead. Use of PF_xxx appears to be an ancient BSD convention that was not adopted by later standardization. On BSD and most later Unixen, it doesn't matter much because those constants have equivalent values anyway; but nonetheless this code is not per spec. In the same vein, replace PF_INET by AF_INET in one socket() call, which wasn't even consistent with the other socket() call in the same function let alone the remainder of our code. Per investigation of a Cygwin trouble report from Marco Atzeri. It's probably a long shot that this will fix his issue, but it's wrong in any case.	2014-04-16 13:21:20 -04:00
Bruce Momjian	4180934651	check socket creation errors against PGINVALID_SOCKET Previously, in some places, socket creation errors were checked for negative values, which is not true for Windows because sockets are unsigned. This masked socket creation errors on Windows. Backpatch through 9.0. 8.4 doesn't have the infrastructure to fix this.	2014-04-16 10:45:48 -04:00
Tom Lane	5d8117e1f3	Block signals earlier during postmaster startup. Formerly, we set up the postmaster's signal handling only when we were about to start launching subprocesses. This is a bad idea though, as it means that for example a SIGINT arriving before that will kill the postmaster instantly, perhaps leaving lockfiles, socket files, shared memory, etc laying about. We'd rather that such a signal caused orderly postmaster termination including releasing of those resources. A simple fix is to move the PostmasterMain stanza that initializes signal handling to an earlier point, before we've created any such resources. Then, an early-arriving signal will be blocked until we're ready to deal with it in the usual way. (The only part that really needs to be moved up is blocking of signals, but it seems best to keep the signal handler installation calls together with that; for one thing this ensures the kernel won't drop any signals we wished to get. The handlers won't get invoked in any case until we unblock signals in ServerLoop.) Per a report from MauMau. He proposed changing the way "pg_ctl stop" works to deal with this, but that'd just be masking one symptom not fixing the core issue. It's been like this since forever, so back-patch to all supported branches.	2014-04-05 18:16:08 -04:00
Tom Lane	fc752505a9	Fix assorted issues in client host name lookup. The code for matching clients to pg_hba.conf lines that specify host names (instead of IP address ranges) failed to complain if reverse DNS lookup failed; instead it silently didn't match, so that you might end up getting a surprising "no pg_hba.conf entry for ..." error, as seen in bug #9518 from Mike Blackwell. Since we don't want to make this a fatal error in situations where pg_hba.conf contains a mixture of host names and IP addresses (clients matching one of the numeric entries should not have to have rDNS data), remember the lookup failure and mention it as DETAIL if we get to "no pg_hba.conf entry". Apply the same approach to forward-DNS lookup failures, too, rather than treating them as immediate hard errors. Along the way, fix a couple of bugs that prevented us from detecting an rDNS lookup error reliably, and make sure that we make only one rDNS lookup attempt; formerly, if the lookup attempt failed, the code would try again for each host name entry in pg_hba.conf. Since more or less the whole point of this design is to ensure there's only one lookup attempt not one per entry, the latter point represents a performance bug that seems sufficient justification for back-patching. Also, adjust src/port/getaddrinfo.c so that it plays as well as it can with this code. Which is not all that well, since it does not have actual support for rDNS lookup, but at least it should return the expected (and required by spec) error codes so that the main code correctly perceives the lack of functionality as a lookup failure. It's unlikely that PG is still being used in production on any machines that require our getaddrinfo.c, so I'm not excited about working harder than this. To keep the code in the various branches similar, this includes back-patching commits `c424d0d105` and `1997f34db4` into 9.2 and earlier. Back-patch to 9.1 where the facility for hostnames in pg_hba.conf was introduced.	2014-04-02 17:11:24 -04:00
Tom Lane	682c5bbec5	Fix bugs in manipulation of PgBackendStatus.st_clienthostname. Initialization of this field was not being done according to the st_changecount protocol (it has to be done within the changecount increment range, not outside). And the test to see if the value should be reported as null was wrong. Noted while perusing uses of Port.remote_hostname. This was wrong from the introduction of this code (commit `4a25bc145`), so back-patch to 9.1.	2014-04-01 21:30:34 -04:00
Robert Haas	79a4d24f31	Make it easy to detach completely from shared memory. The new function dsm_detach_all() can be used either by postmaster children that don't wish to take any risk of accidentally corrupting shared memory; or by forked children of regular backends with the same need. This patch also updates the postmaster children that already do PGSharedMemoryDetach() to do dsm_detach_all() as well. Per discussion with Tom Lane.	2014-03-18 07:58:53 -04:00
Bruce Momjian	886c0be3f6	C comments: remove odd blank lines after #ifdef WIN32 lines	2014-03-13 01:34:42 -04:00
Robert Haas	5a991ef869	Allow logical decoding via the walsender interface. In order for this to work, walsenders need the optional ability to connect to a database, so the "replication" keyword now allows true or false, for backward-compatibility, and the new value "database" (which causes the "dbname" parameter to be respected). walsender needs to loop not only when idle but also when sending decoded data to the user and when waiting for more xlog data to decode. This means that there are now three separate loops inside walsender.c; although some refactoring has been done here, this is still a bit ugly. Andres Freund, with contributions from Álvaro Herrera, and further review by me.	2014-03-10 13:50:28 -04:00
Alvaro Herrera	2b4f2ab33d	Remove the correct pgstat file on DROP DATABASE We were unlinking the permanent file, not the non-permanent one. But since the stat collector already unlinks all permanent files on startup, there was nothing for it to unlink. The non-permanent file remained in place, and was copied to the permanent directory on shutdown, so in effect no file was ever dropped. Backpatch to 9.3, where the issue was introduced by commit `187492b6c2`. Before that, there were no per-database files and thus no file to drop on DROP DATABASE. Per report from Thom Brown. Author: Tomáš Vondra	2014-03-05 13:03:29 -03:00
Robert Haas	dd1a3bccca	Show xid and xmin in pg_stat_activity and pg_stat_replication. Christian Kruse, reviewed by Andres Freund and myself, with further minor adjustments by me.	2014-02-25 12:34:04 -05:00
Tom Lane	643f75ca9b	Fix unportable coding in BackgroundWorkerStateChange(). PIDs aren't necessarily ints; our usual practice for printing them is to explicitly cast to long. Per buildfarm member rover_firefly.	2014-02-15 17:15:05 -05:00
Tom Lane	f0ee42d59b	Fix unportable coding in DetermineSleepTime(). We should not assume that struct timeval.tv_sec is a long, because it ain't necessarily. (POSIX says that it's a time_t, which might well be 64 bits now or in the future; or for that matter might be 32 bits on machines with 64-bit longs.) Per buildfarm member panther. Back-patch to 9.3 where the dubious coding was introduced.	2014-02-15 17:09:50 -05:00
Tom Lane	60ff2fdd99	Centralize getopt-related declarations in a new header file pg_getopt.h. We used to have externs for getopt() and its API variables scattered all over the place. Now that we find we're going to need to tweak the variable declarations for Cygwin, it seems like a good idea to have just one place to tweak. In this commit, the variables are declared "#ifndef HAVE_GETOPT_H". That may or may not work everywhere, but we'll soon find out. Andres Freund	2014-02-15 14:31:30 -05:00
Alvaro Herrera	801c2dc72c	Separate multixact freezing parameters from xid's Previously we were piggybacking on transaction ID parameters to freeze multixacts; but since there isn't necessarily any relationship between rates of Xid and multixact consumption, this turns out not to be a good idea. Therefore, we now have multixact-specific freezing parameters: vacuum_multixact_freeze_min_age: when to remove multis as we come across them in vacuum (default to 5 million, i.e. early in comparison to Xid's default of 50 million) vacuum_multixact_freeze_table_age: when to force whole-table scans instead of scanning only the pages marked as not all visible in visibility map (default to 150 million, same as for Xids). Whichever of both which reaches the 150 million mark earlier will cause a whole-table scan. autovacuum_multixact_freeze_max_age: when for cause emergency, uninterruptible whole-table scans (default to 400 million, double as that for Xids). This means there shouldn't be more frequent emergency vacuuming than previously, unless multixacts are being used very rapidly. Backpatch to 9.3 where multixacts were made to persist enough to require freezing. To avoid an ABI break in 9.3, VacuumStmt has a couple of fields in an unnatural place, and StdRdOptions is split in two so that the newly added fields can go at the end. Patch by me, reviewed by Robert Haas, with additional input from Andres Freund and Tom Lane.	2014-02-13 19:36:31 -03:00
Peter Eisentraut	66c04c981d	Mark some more variables as static or include the appropriate header Detected by clang's -Wmissing-variable-declarations. From: Andres Freund <andres@anarazel.de>	2014-02-08 21:21:46 -05:00
Robert Haas	b7643b19f0	Fix compiler warning in EXEC_BACKEND builds. Per a report by Rajeev Rastogi.	2014-01-28 23:35:50 -05:00
Fujii Masao	9132b189bf	Add pg_stat_archiver statistics view. This view shows the statistics about the WAL archiver process's activity. Gabriele Bartolini, reviewed by Michael Paquier, refactored a bit by me.	2014-01-29 02:58:22 +09:00

... 5 6 7 8 9 ...

1785 Commits