rather than elog(FATAL), when there is no more room in ShmemBackendArray.
This is a security issue since too many connection requests arriving close
together could cause the postmaster to shut down, resulting in denial of
service. Reported by Yoshiyuki Asaba, fixed by Magnus Hagander.
an LWLock instead of a spinlock. This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND). Per previous discussion.
file. The original code probed the PGPROC array separately for each PID,
which was not good for large numbers of backends: not only is the runtime
O(N^2) but most of it is spent holding ProcArrayLock. Instead, take the
lock just once and copy the active PIDs into an array, then use qsort
and bsearch so that the lookup time is more like O(N log N).
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
to assume that the string pointer passed to set_ps_display is good forever.
There's no need to anyway since ps_status.c itself saves the string, and
we already had an API (get_ps_display) to return it.
I believe this explains Jim Nasby's report of intermittent crashes in
elog.c when %i format code is in use in log_line_prefix.
While at it, repair a previously unnoticed problem: on some platforms such as
Darwin, the string returned by get_ps_display was blank-padded to the maximum
length, meaning that lock.c's attempt to append " waiting" to it never worked.
since it can take a fair amount of time and this can confuse boot scripts
that expect postmaster.pid to appear quickly. Move initialization of SSL
library and preloaded libraries to after that point, too, just for luck.
Per reports from Tony Caduto and others.
generated by bitmap index scans. Along the way, simplify and speed up
the code for counting sequential and index scans; it was both confusing
and inefficient to be taking care of that in the per-tuple loops, IMHO.
initdb forced because of internal changes in pg_stat view definitions.
recovered. I did not see any actual leak while testing this in CVS tip,
but 8.0 definitely has a problem with leaking the space temporarily
palloc'd by BufferSync(). In any case this seems a good idea to forestall
similar problems in future. Per report from Arjen van der Meijden.
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
(the stats system has always collected this info, but the views were
filtering it out). Modify autovacuum so that over-threshold activity
in a toast table can trigger a VACUUM of the parent table, even if the
parent didn't appear to need vacuuming itself. Per discussion a month
or so back about "short, wide tables".
should surely be timestamptz not timestamp; fix some but not all of the
holes in check_and_make_absolute(); other minor cleanup. Also put in
the missed catversion bump.
delay and limit, both as global GUCs and as table-specific entries in
pg_autovacuum. stats_reset_on_server_start is now OFF by default,
but a reset is forced if we did WAL replay. XID-wrap vacuums do not
ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera
against the PGPROC array. Anything in the file that isn't in PGPROC
gets rejected as being a stale entry. This should solve complaints about
stale entries in pg_stat_activity after a BETERM message has been dropped
due to overload.
exit, instead of trying to take shortcuts. Introduce some additional
shutdown callback routines to eliminate kluges like having ProcKill
be responsible for shutting down the buffer manager. Ensure that the
order of operations during shutdown is predictable and what you would
expect given the module layering.
doesn't block the bgwriter from making progress writing out other buffers.
This was a hard problem in the context of the ARC/2Q design, but it's
trivial in the context of clock sweep ... just advance the sweep counter
before we try to write not after.
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly. Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries. Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA. This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it. Per recent discussion.
the difference between checkpoints forced due to WAL segment consumption
and checkpoints forced for other reasons (such as CREATE DATABASE). Avoid
generating 'checkpoints are occurring too frequently' messages when the
checkpoint wasn't caused by WAL segment consumption. Per gripe from
Chris K-L.
current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields. This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.
and pg_auth_members. There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance). But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies. The catalog changes should
be pretty much done.
includes error checking and an appropriate ereport(ERROR) message.
This gets rid of rather tedious and error-prone manipulation of errno,
as well as a Windows-specific bug workaround, at more than a dozen
call sites. After an idea in a recent patch by Heikki Linnakangas.
spotted by Qingqing Zhou. The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places. If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL. But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
hash table. This is a pretty unlikely scenario, since the table
should be tiny, but we can't guarantee continued correct operation
if it does occur. Spotted by Qingqing Zhou.
collector messages, per recent discussion on pgsql-patches. This
actually required quite a few changes -- for example,
"databaseid != InvalidOid" was used to check whether a slot in the
backend entry table was initialized, but that no longer works since
the slot might be initialized prior to receiving the BESTART message
which contains the database id. We now use procpid > 0 to indicate
that a slot is non-empty.
Other changes:
- various comment improvements and cleanups
- there's no need to zero-out the entire activity buffer in
pgstat_add_backend(), we can just set activity[0] to '\0'.
- remove the counting of the # of connections to a database; this
was not used anywhere
One change in behavior I wasn't sure about: previously, the code
would create a hash table entry for a database as soon as any message
was received whose header referenced that database. Now, we only
create hash table entries as needed (so for example BESTART won't
create a database hash table entry, since it doesn't need to
access anything in the per-db hash table). It would be easy enough
to retain the old behavior, but AFAICS it is not required.
* Add session start time to pg_stat_activity
* Add the client IP address and port to pg_stat_activity
Original patch from Magnus Hagander, code review by Neil Conway. Catalog
version bumped. This patch sends the client IP address and port number in
every statistics message; that's not ideal, but will be fixed up shortly.
* Changes the APIs to the timezone functions to take a pg_tz pointer as
an argument, representing the timezone to use for the selected
operation.
* Adds a global_timezone variable that represents the current timezone
in the backend as set by SET TIMEZONE (or guc, or env, etc).
* Implements a hash-table cache of loaded tables, so we don't have to
read and parse the TZ file everytime we change a timezone. While not
necesasry now (we don't change timezones very often), I beleive this
will be necessary (or at least good) when "multiple timezones in the
same query" is eventually implemented. And code-wise, this was the time
to do it.
There are no user-visible changes at this time. Implementing the
"multiple zones in one query" is a later step...
This also gets rid of some of the cruft needed to "back out a timezone
change", since we previously couldn't check a timezone unless it was
activated first.
Passes regression tests on win32, linux (slackware 10) and solaris x86.
Magnus Hagander
whose keys are OIDs. The only one that looks particularly performance
critical is the relcache hashtable, but as long as we've got the function
we may as well use it wherever it's applicable.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
exit. Without this, operations triggered during backend exit (such as
temp table deletions) won't be counted ... which given heavy usage of
temp tables can lead to pg_autovacuum falling way behind on the need
to vacuum pg_class and pg_attribute. Per reports from Steve Crawford
and others.
* Touch the socket and lock file at least every hour, to
* ensure that they are not removed by overzealous /tmp-cleaning
* tasks. Set to 58 minutes so a cleaner never sees the
* file as an hour old.
is still alive. This improves our odds of not getting fooled by an
unrelated process when checking a stale lock file. Other checks already
in place, plus one newly added in checkDataDir(), ensure that we cannot
attempt to usurp the place of a postmaster belonging to a different userid,
so there is no need to error out. Add comments indicating the importance
of these other checks.
elog if the former has trouble writing its file. Code review for
Magnus' patch to redirect stderr to syslog on Windows (Bruce's version
seems right, but did some minor prettification).
Backpatch both changes to 8.0 branch.
before we can invoke fork() -- flush stdio buffers, save and restore the
profiling timer on Linux with LINUX_PROFILE, and handle BeOS stuff. This
patch moves that code into a single function, fork_process(), instead of
duplicating it at the various callsites of fork().
This patch doesn't address the EXEC_BACKEND case; there is room for
further cleanup there.
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers. This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management. Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
in GetNewTransactionId(). Since the limit value has to be computed
before we run any real transactions, this requires adding code to database
startup to scan pg_database and determine the oldest datfrozenxid.
This can conveniently be combined with the first stage of an attack on
the problem that the 'flat file' copies of pg_shadow and pg_group are
not properly updated during WAL recovery. The code I've added to
startup resides in a new file src/backend/utils/init/flatfiles.c, and
it is responsible for rewriting the flat files as well as initializing
the XID wraparound limit value. This will eventually allow us to get
rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add
a trigger to pg_database.
is the minimum required fix. I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
to shared memory as soon as possible, ie, right after read_backend_variables.
The effective difference from the original code is that this happens
before instead of after read_nondefault_variables(), which loads GUC
information and is apparently capable of expanding the backend's memory
allocation more than you'd think it should. This should fix the
failure-to-attach-to-shared-memory reports we've been seeing on Windows.
Also clean up a few bits of unnecessarily grotty EXEC_BACKEND code.
> seconds to 10 seconds. The original number was plucked from thin air
> some months ago, and I'd like to review that now based upon further
> thought, observation and experience.
>
> This change has little or no effect on performance, since the interval
> is there mainly to avoid repeated respawn attempts if archiver fails at
> startup. Archiver start-up time is very quick, so there is little danger
> of exceeding 10 seconds.
>
> On a busy system, if the archiver does die, then many files can build up
> in the 60 seconds before respawning. That xlog file backlog could take
> some time to clear. This then leaves a larger than normal window of data
> loss for a possibly long period.
>
> It's a minor change only, with no other effect on function.
Simon Riggs
even uglier than it was already :-(. Also, on Windows only, use temporary
shared memory segments instead of ordinary files to pass over critical
variable values from postmaster to child processes. Magnus Hagander
plain SUSET instead. Also delay processing of options received in
client connection request until after we know if the user is a superuser,
so that SUSET values can be set that way by legitimate superusers.
Per recent discussion.
files and directories. This ensures that the bgwriter will close any open
file references it is holding for files therein, which is needed for the
rmdir() to succeed. Andrew Dunstan and Tom Lane.
returning a NULL pointer (some callers remembered to check the return
value, but some did not -- it is safer to just bail out).
Also, cleanup pgstat.c to use elog(ERROR) rather than elog(LOG) followed
by exit().
C:\msys\1.0\home\y-asaba>pg_ctl -D data restart
waiting for postmaster to shut down...LOG: received smart shutdown
request.
LOG: shutting down
LOG: database system is shut down
done
postmaster stopped
postmaster starting
C:\msys\1.0\home\y-asaba>postmaster.exe: invalid argument: "'-D'"
Try "postmaster.exe --help" for more information.
Yoshiyuki Asaba
The vars are renamed to data_directory, config_file, hba_file, and
ident_file, and are guaranteed to be set to accurate absolute paths
during postmaster startup.
This commit does not yet do anything about hiding path values from
non-superusers.
Refactor code into something reasonably understandable, cause
use of the feature to not fail in standalone backends or in
EXEC_BACKEND case, fix sloppy guc.c table entries, make the
documentation minimally usable.