ctype are now more like encoding, stored in new datcollate and datctype
columns in pg_database.
This is a stripped-down version of Radek Strnad's patch, with further
changes by me.
each connection. This makes it possible to catch errors in the pg_hba
file when it's being reloaded, instead of silently reloading a broken
file and failing only when a user tries to connect.
This patch also makes the "sameuser" argument to ident authentication
optional.
unnecessary cache resets. The major changes are:
* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding.
* When we observe backend(s) falling well behind, we signal SIGUSR1
to only one backend, the one that is furthest behind and doesn't already
have a signal outstanding for it. When it finishes catching up, it will
in turn signal SIGUSR1 to the next-furthest-back guy, if there is one that
is far enough behind to justify a signal. The PMSIGNAL_WAKEN_CHILDREN
mechanism is removed.
* We don't attempt to clean out dead messages after every message-receipt
operation; rather, we do it on the insertion side, and only when the queue
fullness passes certain thresholds.
* Split SInvalLock into SInvalReadLock and SInvalWriteLock so that readers
don't block writers nor vice versa (except during the infrequent queue
cleanout operations).
* Transfer multiple sinval messages for each acquisition of a read or
write lock.
do CancelBackup at a sane place, fix some oversights in the state transitions,
allow only superusers to connect while we are waiting for backup mode to end.
have pg_ctl warn about this.
Cancel running online backups (by renaming the backup_label file,
thus rendering the backup useless) when shutting down in fast mode.
Laurenz Albe
key files that are similar to the one for the postmaster's data directory
permissions check. (I chose to standardize on that one since it's the most
heavily used and presumably best-wordsmithed by now.) Also eliminate explicit
tests on file ownership in these places, since the ensuing read attempt must
fail anyway if it's wrong, and there seems no value in issuing the same error
message for distinct problems. (But I left in the explicit ownership test in
postmaster.c, since it had its own error message anyway.) Also be more
specific in the documentation's descriptions of these checks. Per a gripe
from Kevin Hunter.
With the addition of multiple autovacuum workers, our choices were to delete
the check, document the interaction with autovacuum_max_workers, or complicate
the check to try to hide that interaction. Since this restriction has never
been adequate to ensure backends can't run out of pinnable buffers, it doesn't
really have enough excuse to live to justify the second or third choices.
Per discussion of a complaint from Andreas Kling (see also bug #3888).
This commit also removes several documentation references to this restriction,
but I'm not sure I got them all.
finish archiving everything (when there's no error), and to eliminate various
hazards as best we can. This fixes a previous 8.3 patch that caused the
postmaster to kill and then restart the archiver during shutdown (!?).
The new behavior is that the archiver is allowed to run unmolested until
the bgwriter has exited; then it is sent SIGUSR2 to tell it to do a final
archiving cycle and quit. We only SIGQUIT the archiver if we want a panic
stop; this is important since SIGQUIT will also be sent to any active
archive_command. The postmaster also now doesn't SIGQUIT the stats collector
until the bgwriter is done, since the bgwriter can send stats messages in 8.3.
The postmaster will not exit until both the archiver and stats collector are
gone; this provides some defense (not too bulletproof) against conflicting
archiver or stats collector processes being started by a new postmaster
instance. We continue the prior practice that the archiver will check
for postmaster death immediately before issuing any archive_command; that
gives some additional protection against conflicting archivers.
Also, modify the archiver process to notice SIGTERM and refuse to issue any
more archive commands if it gets it. The postmaster doesn't ever send it
SIGTERM; we assume that any such signal came from init and is a notice of
impending whole-system shutdown. In this situation it seems imprudent to try
to start new archive commands --- if they aren't extremely quick they're
likely to get SIGKILL'd by init.
All per discussion.
childprocess deaths instead of using one thread per child. This drastastically
reduces the address space usage and should allow for more backends running.
Also change the win32_waitpid functionality to use an IO Completion Port for
queueing child death notices instead of using a fixed-size array.
- create a separate archive_mode GUC, on which archive_command is dependent
- %r option in recovery.conf sends last restartpoint to recovery command
- %r used in pg_standby, updated README
- minor other code cleanup in pg_standby
- doc on Warm Standby now mentions pg_standby and %r
- log_restartpoints recovery option emits LOG message at each restartpoint
- end of recovery now displays last transaction end time, as requested
by Warren Little; also shown at each restartpoint
- restart archiver if needed to carry away WAL files at shutdown
Simon Riggs
constant flow of new connection requests could prevent the postmaster from
completing a shutdown or crash restart. This is done by labeling child
processes that are "dead ends", that is, we know that they were launched only
to tell a client that it can't connect. These processes are managed
separately so that they don't confuse us into thinking that we can't advance
to the next stage of a shutdown or restart sequence, until the very end
where we must wait for them to drain out so we can delete the shmem segment.
Per discussion of a misbehavior reported by Keaton Adams.
Since this code was baroque already, and my first attempt at fixing the
problem made it entirely impenetrable, I took the opportunity to rewrite it
in a state-machine style. That eliminates some duplicated code sections and
hopefully makes everything a bit clearer.
as well as regular backends: if no regular backend launches before the autovac
launcher tries to start an autovac worker, the postmaster would get an Assert
fault due to calling PostmasterRandom before random_seed was initialized.
Cleanest solution seems to be to take the initialization of random_seed out
of ServerLoop and let PostmasterRandom do it for itself.
not bothering to initialize is_autovacuum for regular backends, meaning there
was a significant chance of the postmaster prematurely sending them SIGTERM
during database shutdown. Also, leaving the cancel key unset for an autovac
worker meant that any client could send it SIGINT, which doesn't sound
especially good either.
so that we will be able to create a cookie for all processes for CSVlogs.
It is set wherever MyProcPid is set. Take the opportunity to remove the now
unnecessary session-only restriction on the %s and %c escapes in log_line_prefix.
and fsync WAL at convenient intervals. For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.
This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature. I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.
against a Unix server, and Windows-specific server-side authentication
using SSPI "negotiate" method (Kerberos or NTLM).
Only builds properly with MSVC for now.
for it to die before telling the bgwriter to initiate shutdown checkpoint.
Since it's connected to shared memory, this seems more prudent than the
alternative of letting it quit asynchronously. Resolves my complaint
of yesterday about repeated shutdown checkpoints in CVS HEAD.
continue with the schedule. Change current uses of SIGINT to abort a worker
into SIGTERM, which keeps the old behaviour of terminating the process.
Patch from ITAGAKI Takahiro, with some editorializing of my own.
by having the postmaster signal it when certain failures occur. This requires
the postmaster setting a flag in shared memory, but should be as safe as the
pmsignal.c code is.
Also make sure the launcher honor's a postgresql.conf change turning it off
on SIGHUP.
o read global SSL configuration file
o add GUC "ssl_ciphers" to control allowed ciphers
o add libpq environment variable PGSSLKEY to control SSL hardware keys
Victor B. Wagner
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work. This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.
For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
socket is still read-ready, the code was a tight loop, wasting lots of CPU.
We can't do anything to clear the failure, other than wait, but we should give
other processes more chance to finish and release FDs; so insert a small sleep.
Also, avoid bogus "close(-1)" in this case. Per report from Jim Nasby.
"database system is ready to accept connections", which is issued by the
postmaster when it really is ready to accept connections. Per proposal from
Markus Schiltknecht and subsequent discussion.
accessing it, like DROP DATABASE. This allows the regression tests to pass
with autovacuum enabled, which open the gates for finally enabling autovacuum
by default.
an optarg). Add some comments noting that code in three different files has
to be kept in sync. Fix erroneous description of -S switch (it sets work_mem
not silent_mode), and do some light copy-editing elsewhere in postgres-ref.
StartupXLOG and ShutdownXLOG no longer need to be critical sections, because
in all contexts where they are invoked, elog(ERROR) would be translated to
elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true:
set ExitOnAnyError before trying to exit. This is a good fix anyway since
the existing code would have gone into an infinite loop on elog(ERROR) during
shutdown.) That avoids a misleading report of PANIC during semi-orderly
failures. Modify the postmaster to include the startup process in the set of
processes that get SIGTERM when a fast shutdown is requested, and also fix it
to not try to restart the bgwriter if the bgwriter fails while trying to write
the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does
something reasonable for a system in warm standby mode, and so should Unix
system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and
some corner-case testing of my own.
Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process. This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command. Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system(). (There is no support in the
core backend for that, but it's widely done using untrusted PLs.) Per gripe
from Stephen Harris and subsequent discussion.
promoted to FATAL) end in exit(1) not exit(0). Then change the postmaster to
allow exit(1) without a system-wide panic, but not for the startup subprocess
or the bgwriter. There were a couple of places that were using exit(1) to
deliberately force a system-wide panic; adjust these to be exit(2) instead.
This fixes the problem noted back in July that if the startup process exits
with elog(ERROR), the postmaster would think everything is hunky-dory and
proceed to start up. Alternative solutions such as trying to run the entire
startup process as a critical section seem less clean, primarily because of
the fact that a fair amount of startup code is shared by all postmaster
children in the EXEC_BACKEND case. We'd need an ugly special case somewhere
near the head of main.c to make it work if it's the child process's
responsibility to determine what happens; and what's the point when the
postmaster already treats different children differently?
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
such as debugging and performance measurement. This consists of two features:
a table of "rendezvous variables" that allows separately-loaded shared
libraries to communicate, and a new GUC setting "local_preload_libraries"
that allows libraries to be loaded into specific sessions without explicit
cooperation from the client application. To make local_preload_libraries
as flexible as possible, we do not restrict its use to superusers; instead,
it is restricted to load only libraries stored in $libdir/plugins/. The
existing LOAD command has also been modified to allow non-superusers to
LOAD libraries stored in this directory.
This patch also renames the existing GUC variable preload_libraries to
shared_preload_libraries (after a suggestion by Simon Riggs) and does some
code refactoring in dfmgr.c to improve clarity.
Korry Douglas, with a little help from Tom Lane.
loaded libraries: call functions _PG_init() and _PG_fini() if the library
defines such symbols. Hence we no longer need to specify an initialization
function in preload_libraries: we can assume that the library used the
_PG_init() convention, instead. This removes one source of pilot error
in use of preloaded libraries. Original patch by Ralf Engelschall,
preload_libraries changes by me.
it's handled just about like timezone; in particular, don't try
to read anything during InitializeGUCOptions. Should solve current
startup failure on Windows, and avoid wasted cycles if a nondefault
setting is specified in postgresql.conf too. Possibly we need to
think about a more general solution for handling 'expensive to set'
GUC options.
EINTR; the stats code was failing to do this and so were a couple of places
in the postmaster. The stats code assumed that recv() could not return EINTR
if a preceding select() showed the socket to be read-ready, but this is
demonstrably false with our Windows implementation of recv(), and it may
not be the case on all Unix variants either. I think this explains the
intermittent stats regression test failures we've been seeing, as well
as reports of stats collector instability under high load on Windows.
Backpatch as far as 8.0.
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid,
based on which we calculate the pg_database columns after each vacuum.
We now force all databases to be vacuumed, even template ones. A backend
noticing too old a database (meaning pg_database.datminxid is in danger of
falling behind Xid wraparound) will signal the postmaster, which in turn will
start an autovacuum iteration to process the offending database. In principle
this is only there to cope with frozen (non-connectable) databases without
forcing users to set them to connectable, but it could force regular user
database to go through a database-wide vacuum at any time. Maybe we should
warn users about this somehow. Of course the real solution will be to use
autovacuum all the time ;-)
There are some additional improvements we could have in this area: for example
the vacuum code could be smarter about not updating pg_database for each table
when called by autovacuum, and do it only once the whole autovacuum iteration
is done.
I updated the system catalogs documentation, but I didn't modify the
maintenance section. Also having some regression tests for this would be nice
but it's not really a very straightforward thing to do.
Catalog version bumped due to system catalog changes.
be delivered directly to the collector process. The extra process context
swaps required to transfer data through the buffer process seem to outweigh
any value the buffering might have. Per recent discussion and tests.
I modified Bruce's draft patch to use poll() rather than select() where
available (this makes a noticeable difference on my system), and fixed
up the EXEC_BACKEND case.
changing semantics too much. statement_timestamp is now set immediately
upon receipt of a client command message, and the various places that used
to do their own gettimeofday() calls to mark command startup are referenced
to that instead. I have also made stats_command_string use that same
value for pg_stat_activity.query_start for both the command itself and
its eventual replacement by <IDLE> or <idle in transaction>. There was
some debate about that, but no argument that seemed convincing enough to
justify an extra gettimeofday() call.
current commands; instead, store current-status information in shared
memory. This substantially reduces the overhead of stats_command_string
and also ensures that pg_stat_activity is fully up to date at all times.
Per my recent proposal.
o remove many WIN32_CLIENT_ONLY defines
o add WIN32_ONLY_COMPILER define
o add 3rd argument to open() for portability
o add include/port/win32_msvc directory for
system includes
Magnus Hagander
byte-swapping on the port number which causes the call to fail on Intel
Macs.
This patch uses htons() instead of htonl() and fixes this bug.
Ashley Clark
it later. This fixes a problem where EXEC_BACKEND didn't have progname
set, causing a segfault if log_min_messages was set below debug2 and our
own snprintf.c was being used.
Also alway strdup() progname.
Backpatch to 8.1.X and 8.0.X.
rather than elog(FATAL), when there is no more room in ShmemBackendArray.
This is a security issue since too many connection requests arriving close
together could cause the postmaster to shut down, resulting in denial of
service. Reported by Yoshiyuki Asaba, fixed by Magnus Hagander.
an LWLock instead of a spinlock. This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND). Per previous discussion.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
to assume that the string pointer passed to set_ps_display is good forever.
There's no need to anyway since ps_status.c itself saves the string, and
we already had an API (get_ps_display) to return it.
I believe this explains Jim Nasby's report of intermittent crashes in
elog.c when %i format code is in use in log_line_prefix.
While at it, repair a previously unnoticed problem: on some platforms such as
Darwin, the string returned by get_ps_display was blank-padded to the maximum
length, meaning that lock.c's attempt to append " waiting" to it never worked.
since it can take a fair amount of time and this can confuse boot scripts
that expect postmaster.pid to appear quickly. Move initialization of SSL
library and preloaded libraries to after that point, too, just for luck.
Per reports from Tony Caduto and others.
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
should surely be timestamptz not timestamp; fix some but not all of the
holes in check_and_make_absolute(); other minor cleanup. Also put in
the missed catversion bump.
delay and limit, both as global GUCs and as table-specific entries in
pg_autovacuum. stats_reset_on_server_start is now OFF by default,
but a reset is forced if we did WAL replay. XID-wrap vacuums do not
ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly. Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries. Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA. This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it. Per recent discussion.
current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields. This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.
and pg_auth_members. There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance). But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies. The catalog changes should
be pretty much done.
* Touch the socket and lock file at least every hour, to
* ensure that they are not removed by overzealous /tmp-cleaning
* tasks. Set to 58 minutes so a cleaner never sees the
* file as an hour old.
is still alive. This improves our odds of not getting fooled by an
unrelated process when checking a stale lock file. Other checks already
in place, plus one newly added in checkDataDir(), ensure that we cannot
attempt to usurp the place of a postmaster belonging to a different userid,
so there is no need to error out. Add comments indicating the importance
of these other checks.
before we can invoke fork() -- flush stdio buffers, save and restore the
profiling timer on Linux with LINUX_PROFILE, and handle BeOS stuff. This
patch moves that code into a single function, fork_process(), instead of
duplicating it at the various callsites of fork().
This patch doesn't address the EXEC_BACKEND case; there is room for
further cleanup there.
in GetNewTransactionId(). Since the limit value has to be computed
before we run any real transactions, this requires adding code to database
startup to scan pg_database and determine the oldest datfrozenxid.
This can conveniently be combined with the first stage of an attack on
the problem that the 'flat file' copies of pg_shadow and pg_group are
not properly updated during WAL recovery. The code I've added to
startup resides in a new file src/backend/utils/init/flatfiles.c, and
it is responsible for rewriting the flat files as well as initializing
the XID wraparound limit value. This will eventually allow us to get
rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add
a trigger to pg_database.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
to shared memory as soon as possible, ie, right after read_backend_variables.
The effective difference from the original code is that this happens
before instead of after read_nondefault_variables(), which loads GUC
information and is apparently capable of expanding the backend's memory
allocation more than you'd think it should. This should fix the
failure-to-attach-to-shared-memory reports we've been seeing on Windows.
Also clean up a few bits of unnecessarily grotty EXEC_BACKEND code.
even uglier than it was already :-(. Also, on Windows only, use temporary
shared memory segments instead of ordinary files to pass over critical
variable values from postmaster to child processes. Magnus Hagander
plain SUSET instead. Also delay processing of options received in
client connection request until after we know if the user is a superuser,
so that SUSET values can be set that way by legitimate superusers.
Per recent discussion.
C:\msys\1.0\home\y-asaba>pg_ctl -D data restart
waiting for postmaster to shut down...LOG: received smart shutdown
request.
LOG: shutting down
LOG: database system is shut down
done
postmaster stopped
postmaster starting
C:\msys\1.0\home\y-asaba>postmaster.exe: invalid argument: "'-D'"
Try "postmaster.exe --help" for more information.
Yoshiyuki Asaba
The vars are renamed to data_directory, config_file, hba_file, and
ident_file, and are guaranteed to be set to accurate absolute paths
during postmaster startup.
This commit does not yet do anything about hiding path values from
non-superusers.
Refactor code into something reasonably understandable, cause
use of the feature to not fail in standalone backends or in
EXEC_BACKEND case, fix sloppy guc.c table entries, make the
documentation minimally usable.
* We are not sure how much precision is in tv_usec, so we
* swap the high and low 16 bits of 'later' and XOR them with
* 'earlier'. On the off chance that the result is 0, we
* loop until it isn't.
Greg Stark
different backends get a reasonably wide set of initial seeds even if
gettimeofday returns tv_usec values with only a few bits of precision.
Per recent discussion.
* Links with -leay32 and -lssleay32 instead of crypto and ssl. On win32,
"crypto and ssl" is only used for static linking.
* Initializes SSL in the backend and not just in the postmaster. We
cannot pass the SSL context from the postmaster through the parameter
file, because it contains function pointers.
* Split one error check in be-secure.c. Previously we could not tell
which of three calls actually failed. The previous code also returned
incorrect error messages if SSL_accept() failed - that function needs to
use SSL_get_error() on the return value, can't just use the error queue.
* Since the win32 implementation uses non-blocking sockets "behind the
scenes" in order to deliver signals correctly, implements a version of
SSL_accept() that can handle this. Also, add a wait function in case
SSL_read or SSL_write() needs more data.
Magnus Hagander
Win32 WaitForMultipleObjects:
ret = WaitForMultipleObjects(win32_numChildren, win32_childHNDArray,
FALSE, 0);
Problem is 'win32_numChildren' could be more then 64 ( function supports
), problem basically arise ( kills postgres ) when you create more then
64 connections and terminate some of them sill leaving more then 64.
Claudio Natoli
recommend that people go get Apache's rotatelogs program. Additional
benefits are that configuration is done through GUC, rather than
externally, and that the postmaster can monitor the log rotator and
restart it after failure (though we certainly hope that won't happen
often).
Andreas Pflug, some rework by Tom Lane.