module and teach PREPARE and protocol-level prepared statements to use it.
In service of this, rearrange utility-statement processing so that parse
analysis does not assume table schemas can't change before execution for
utility statements (necessary because we don't attempt to re-acquire locks
for utility statements when reusing a stored plan). This requires some
refactoring of the ProcessUtility API, but it ends up cleaner anyway,
for instance we can get rid of the QueryContext global.
Still to do: fix up SPI and related code to use the plan cache; I'm tempted to
try to make SQL functions use it too. Also, there are at least some aspects
of system state that we want to ensure remain the same during a replan as in
the original processing; search_path certainly ought to behave that way for
instance, and perhaps there are others.
o read global SSL configuration file
o add GUC "ssl_ciphers" to control allowed ciphers
o add libpq environment variable PGSSLKEY to control SSL hardware keys
Victor B. Wagner
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work. This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.
For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
socket is still read-ready, the code was a tight loop, wasting lots of CPU.
We can't do anything to clear the failure, other than wait, but we should give
other processes more chance to finish and release FDs; so insert a small sleep.
Also, avoid bogus "close(-1)" in this case. Per report from Jim Nasby.
entries for the victim database go away sooner rather than later. We already
did the equivalent thing at the per-relation level, not sure why it's not
been done for whole databases. With this change, pgstat_vacuum_tabstat
should usually not find anything to do; though we still need it as a backstop
in case DROPDB or TABPURGE messages get lost under load.
already collected in the current transaction; this allows plpgsql functions to
watch for stats updates even though they are confined to a single transaction.
Use this instead of the previous kluge involving pg_stat_file() to wait for
the stats collector to update in the stats regression test. Internally,
decouple storage of stats snapshots from transaction boundaries; they'll
now stick around until someone calls pgstat_clear_snapshot --- which xact.c
still does at transaction end, to maintain the previous behavior. This makes
the logic a lot cleaner, at the price of a couple dozen cycles per transaction
exit.
"database system is ready to accept connections", which is issued by the
postmaster when it really is ready to accept connections. Per proposal from
Markus Schiltknecht and subsequent discussion.
input in the stats collector. Our select() emulation is apparently buggy
for UDP sockets :-(. This should resolve problems with stats collection
(and hence autovacuum) failing under more than minimal load. Diagnosis
and patch by Magnus Hagander.
Patch probably needs to be back-ported to 8.1 and 8.0, but first let's
see if it makes the buildfarm happy...
is deleted. A backend about to unlink a file now sends a "revoke fsync"
request to the bgwriter to make it clean out pending fsync requests. There
is still a race condition where the bgwriter may try to fsync after the unlink
has happened, but we can resolve that by rechecking the fsync request queue
to see if a revoke request arrived meanwhile. This eliminates the former
kluge of "just assuming" that an ENOENT failure is okay, and lets us handle
the fact that on Windows it might be EACCES too without introducing any
questionable assumptions. After an idea of mine improved by Magnus.
The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port
later. In the meantime this could do with some testing on Windows; I've been
able to force it through the code path via ENOENT, but that doesn't prove that
it actually fixes the Windows problem ...
accessing it, like DROP DATABASE. This allows the regression tests to pass
with autovacuum enabled, which open the gates for finally enabling autovacuum
by default.
(or other types of pg_class entry): the function pgstat_vacuum_tabstat,
invoked during VACUUM startup, had runtime proportional to the number of
stats table entries times the number of pg_class rows; in other words
O(N^2) if the stats collector's information is reasonably complete.
Replace list searching with a hash table to bring it back to O(N)
behavior. Per report from kim at myemma.com.
Back-patch as far as 8.1; 8.0 and before use different coding here.
an optarg). Add some comments noting that code in three different files has
to be kept in sync. Fix erroneous description of -S switch (it sets work_mem
not silent_mode), and do some light copy-editing elsewhere in postgres-ref.
identify long-running transactions. Since we already need to record
the transaction-start time (e.g. for now()), we don't need any
additional system calls to report this information.
Catversion bumped, initdb required.
should allow delete-pending files to actually go away, and thereby work
around the various complaints we've seen about 'permission denied'
errors in such cases. Should be reasonably harmless in any case...
StartupXLOG and ShutdownXLOG no longer need to be critical sections, because
in all contexts where they are invoked, elog(ERROR) would be translated to
elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true:
set ExitOnAnyError before trying to exit. This is a good fix anyway since
the existing code would have gone into an infinite loop on elog(ERROR) during
shutdown.) That avoids a misleading report of PANIC during semi-orderly
failures. Modify the postmaster to include the startup process in the set of
processes that get SIGTERM when a fast shutdown is requested, and also fix it
to not try to restart the bgwriter if the bgwriter fails while trying to write
the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does
something reasonable for a system in warm standby mode, and so should Unix
system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and
some corner-case testing of my own.
Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process. This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command. Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system(). (There is no support in the
core backend for that, but it's widely done using untrusted PLs.) Per gripe
from Stephen Harris and subsequent discussion.
promoted to FATAL) end in exit(1) not exit(0). Then change the postmaster to
allow exit(1) without a system-wide panic, but not for the startup subprocess
or the bgwriter. There were a couple of places that were using exit(1) to
deliberately force a system-wide panic; adjust these to be exit(2) instead.
This fixes the problem noted back in July that if the startup process exits
with elog(ERROR), the postmaster would think everything is hunky-dory and
proceed to start up. Alternative solutions such as trying to run the entire
startup process as a critical section seem less clean, primarily because of
the fact that a fair amount of startup code is shared by all postmaster
children in the EXEC_BACKEND case. We'd need an ugly special case somewhere
near the head of main.c to make it work if it's the child process's
responsibility to determine what happens; and what's the point when the
postmaster already treats different children differently?
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
pgstat_bestart() has been called; else any lock-block occurring
during InitPostgres() is disastrous. I believe this explains
recent wasp regression failure; at least it explains the crash I
got while trying to duplicate the problem. I also made
pgstat_report_activity() safe against the same scenario, just
in case. The report_waiting hazard was created by my patch of
19-Aug to include waiting status in pg_stat_activity.
that ps_status provides by appending 'waiting' to the PS display. This
completes the project of making it feasible to turn off process title
updates and instead rely on pg_stat_activity. Per my suggestion a few
weeks ago.
than N seconds apart. This allows a simple, if not very high performance,
means of guaranteeing that a PITR archive is no more than N seconds behind
real time. Also make pg_current_xlog_location return the WAL Write pointer,
add pg_current_xlog_insert_location to return the Insert pointer, and fix
pg_xlogfile_name_offset to return its results as a two-element record instead
of a smashed-together string, as per recent discussion.
Simon Riggs
such as debugging and performance measurement. This consists of two features:
a table of "rendezvous variables" that allows separately-loaded shared
libraries to communicate, and a new GUC setting "local_preload_libraries"
that allows libraries to be loaded into specific sessions without explicit
cooperation from the client application. To make local_preload_libraries
as flexible as possible, we do not restrict its use to superusers; instead,
it is restricted to load only libraries stored in $libdir/plugins/. The
existing LOAD command has also been modified to allow non-superusers to
LOAD libraries stored in this directory.
This patch also renames the existing GUC variable preload_libraries to
shared_preload_libraries (after a suggestion by Simon Riggs) and does some
code refactoring in dfmgr.c to improve clarity.
Korry Douglas, with a little help from Tom Lane.
loaded libraries: call functions _PG_init() and _PG_fini() if the library
defines such symbols. Hence we no longer need to specify an initialization
function in preload_libraries: we can assume that the library used the
_PG_init() convention, instead. This removes one source of pilot error
in use of preloaded libraries. Original patch by Ralf Engelschall,
preload_libraries changes by me.
(table or index) before trying to open its relcache entry. This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load. Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.
Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
it's handled just about like timezone; in particular, don't try
to read anything during InitializeGUCOptions. Should solve current
startup failure on Windows, and avoid wasted cycles if a nondefault
setting is specified in postgresql.conf too. Possibly we need to
think about a more general solution for handling 'expensive to set'
GUC options.
pg_usleep at all. Instead call the replacement function in
port/win32/signal.c by that name. Avoids tricky macro-redefinition
logic and suppresses a compiler warning; furthermore it ensures that
no one can accidentally use the non-signal-aware version of pg_usleep
in a Windows backend.
EINTR; the stats code was failing to do this and so were a couple of places
in the postmaster. The stats code assumed that recv() could not return EINTR
if a preceding select() showed the socket to be read-ready, but this is
demonstrably false with our Windows implementation of recv(), and it may
not be the case on all Unix variants either. I think this explains the
intermittent stats regression test failures we've been seeing, as well
as reports of stats collector instability under high load on Windows.
Backpatch as far as 8.0.
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid,
based on which we calculate the pg_database columns after each vacuum.
We now force all databases to be vacuumed, even template ones. A backend
noticing too old a database (meaning pg_database.datminxid is in danger of
falling behind Xid wraparound) will signal the postmaster, which in turn will
start an autovacuum iteration to process the offending database. In principle
this is only there to cope with frozen (non-connectable) databases without
forcing users to set them to connectable, but it could force regular user
database to go through a database-wide vacuum at any time. Maybe we should
warn users about this somehow. Of course the real solution will be to use
autovacuum all the time ;-)
There are some additional improvements we could have in this area: for example
the vacuum code could be smarter about not updating pg_database for each table
when called by autovacuum, and do it only once the whole autovacuum iteration
is done.
I updated the system catalogs documentation, but I didn't modify the
maintenance section. Also having some regression tests for this would be nice
but it's not really a very straightforward thing to do.
Catalog version bumped due to system catalog changes.
be delivered directly to the collector process. The extra process context
swaps required to transfer data through the buffer process seem to outweigh
any value the buffering might have. Per recent discussion and tests.
I modified Bruce's draft patch to use poll() rather than select() where
available (this makes a noticeable difference on my system), and fixed
up the EXEC_BACKEND case.
analyzing, so that future analyze threshold calculations don't get confused.
Also, make sure we correctly track the decrease of live tuples cause by
deletes.
Per report from Dylan Hansen, patches by Tom Lane and me.
changing semantics too much. statement_timestamp is now set immediately
upon receipt of a client command message, and the various places that used
to do their own gettimeofday() calls to mark command startup are referenced
to that instead. I have also made stats_command_string use that same
value for pg_stat_activity.query_start for both the command itself and
its eventual replacement by <IDLE> or <idle in transaction>. There was
some debate about that, but no argument that seemed convincing enough to
justify an extra gettimeofday() call.
current commands; instead, store current-status information in shared
memory. This substantially reduces the overhead of stats_command_string
and also ensures that pg_stat_activity is fully up to date at all times.
Per my recent proposal.
o remove many WIN32_CLIENT_ONLY defines
o add WIN32_ONLY_COMPILER define
o add 3rd argument to open() for portability
o add include/port/win32_msvc directory for
system includes
Magnus Hagander
---------------------------------------------------------------------------
Delay write of pg_stats file to once every five minutes, during
shutdown, or when requested by a backend:
It changes so the file is only written once every 5 minutes (changeable
of course, I just picked something) instead of once every half second.
It's still written when the stats collector shuts down, just as before.
And it is now also written on backend request. A backend requests a
rewrite by simply sending a special stats message. It operates on the
assumption that the backends aren't actually going to read the
statistics file very often, compared to how frequent it's written today.
Magnus Hagander
issued by autovacuum. Add accessor functions to them, and use those in the
pg_stat_*_tables system views.
Catalog version bumped due to changes in the pgstat views and the pgstat file.
Patch from Larry Rosenman, minor improvements by me.
in various places that were previously doing ad hoc pg_database searches.
This may speed up database-related privilege checks a little bit, but
the main motivation is to eliminate the performance reason for having
ReverifyMyDatabase do such a lot of stuff (viz, avoiding repeat scans
of pg_database during backend startup). The locking reason for having
that routine is about to go away, and it'd be good to have the option
to break it up.
shutdown, or when requested by a backend:
It changes so the file is only written once every 5 minutes (changeable
of course, I just picked something) instead of once every half second.
It's still written when the stats collector shuts down, just as before.
And it is now also written on backend request. A backend requests a
rewrite by simply sending a special stats message. It operates on the
assumption that the backends aren't actually going to read the
statistics file very often, compared to how frequent it's written today.
Magnus Hagander
Per recent discussion, this seems to be making the stats less accurate
rather than more so, particularly on Windows where PID values may be
reused very quickly. Patch by Peter Brant.
byte-swapping on the port number which causes the call to fail on Intel
Macs.
This patch uses htons() instead of htonl() and fixes this bug.
Ashley Clark
it later. This fixes a problem where EXEC_BACKEND didn't have progname
set, causing a segfault if log_min_messages was set below debug2 and our
own snprintf.c was being used.
Also alway strdup() progname.
Backpatch to 8.1.X and 8.0.X.
temp table not only our own process' tables. It's not real important
since vacuum.c will skip temp tables anyway, but might as well make the
code do what it claims to do.
files: avoid creating stats hashtable entries for tables that aren't being
touched except by vacuum/analyze, ensure that entries for dropped tables are
removed promptly, and tweak the data layout to avoid storing useless struct
padding. Also improve the performance of pgstat_vacuum_tabstat(), and make
sure that autovacuum invokes it exactly once per autovac cycle rather than
multiple times or not at all. This should cure recent complaints about 8.1
showing much higher stats I/O volume than was seen in 8.0. It'd still be a
good idea to revisit the design with an eye to not re-writing the entire
stats dataset every half second ... but that would be too much to backpatch,
I fear.
rather than elog(FATAL), when there is no more room in ShmemBackendArray.
This is a security issue since too many connection requests arriving close
together could cause the postmaster to shut down, resulting in denial of
service. Reported by Yoshiyuki Asaba, fixed by Magnus Hagander.
an LWLock instead of a spinlock. This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND). Per previous discussion.
file. The original code probed the PGPROC array separately for each PID,
which was not good for large numbers of backends: not only is the runtime
O(N^2) but most of it is spent holding ProcArrayLock. Instead, take the
lock just once and copy the active PIDs into an array, then use qsort
and bsearch so that the lookup time is more like O(N log N).
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
to assume that the string pointer passed to set_ps_display is good forever.
There's no need to anyway since ps_status.c itself saves the string, and
we already had an API (get_ps_display) to return it.
I believe this explains Jim Nasby's report of intermittent crashes in
elog.c when %i format code is in use in log_line_prefix.
While at it, repair a previously unnoticed problem: on some platforms such as
Darwin, the string returned by get_ps_display was blank-padded to the maximum
length, meaning that lock.c's attempt to append " waiting" to it never worked.
since it can take a fair amount of time and this can confuse boot scripts
that expect postmaster.pid to appear quickly. Move initialization of SSL
library and preloaded libraries to after that point, too, just for luck.
Per reports from Tony Caduto and others.
generated by bitmap index scans. Along the way, simplify and speed up
the code for counting sequential and index scans; it was both confusing
and inefficient to be taking care of that in the per-tuple loops, IMHO.
initdb forced because of internal changes in pg_stat view definitions.
recovered. I did not see any actual leak while testing this in CVS tip,
but 8.0 definitely has a problem with leaking the space temporarily
palloc'd by BufferSync(). In any case this seems a good idea to forestall
similar problems in future. Per report from Arjen van der Meijden.
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
(the stats system has always collected this info, but the views were
filtering it out). Modify autovacuum so that over-threshold activity
in a toast table can trigger a VACUUM of the parent table, even if the
parent didn't appear to need vacuuming itself. Per discussion a month
or so back about "short, wide tables".
should surely be timestamptz not timestamp; fix some but not all of the
holes in check_and_make_absolute(); other minor cleanup. Also put in
the missed catversion bump.
delay and limit, both as global GUCs and as table-specific entries in
pg_autovacuum. stats_reset_on_server_start is now OFF by default,
but a reset is forced if we did WAL replay. XID-wrap vacuums do not
ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera
against the PGPROC array. Anything in the file that isn't in PGPROC
gets rejected as being a stale entry. This should solve complaints about
stale entries in pg_stat_activity after a BETERM message has been dropped
due to overload.
exit, instead of trying to take shortcuts. Introduce some additional
shutdown callback routines to eliminate kluges like having ProcKill
be responsible for shutting down the buffer manager. Ensure that the
order of operations during shutdown is predictable and what you would
expect given the module layering.
doesn't block the bgwriter from making progress writing out other buffers.
This was a hard problem in the context of the ARC/2Q design, but it's
trivial in the context of clock sweep ... just advance the sweep counter
before we try to write not after.
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly. Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries. Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA. This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it. Per recent discussion.
the difference between checkpoints forced due to WAL segment consumption
and checkpoints forced for other reasons (such as CREATE DATABASE). Avoid
generating 'checkpoints are occurring too frequently' messages when the
checkpoint wasn't caused by WAL segment consumption. Per gripe from
Chris K-L.
current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields. This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.
and pg_auth_members. There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance). But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies. The catalog changes should
be pretty much done.
includes error checking and an appropriate ereport(ERROR) message.
This gets rid of rather tedious and error-prone manipulation of errno,
as well as a Windows-specific bug workaround, at more than a dozen
call sites. After an idea in a recent patch by Heikki Linnakangas.
spotted by Qingqing Zhou. The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places. If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL. But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
hash table. This is a pretty unlikely scenario, since the table
should be tiny, but we can't guarantee continued correct operation
if it does occur. Spotted by Qingqing Zhou.
collector messages, per recent discussion on pgsql-patches. This
actually required quite a few changes -- for example,
"databaseid != InvalidOid" was used to check whether a slot in the
backend entry table was initialized, but that no longer works since
the slot might be initialized prior to receiving the BESTART message
which contains the database id. We now use procpid > 0 to indicate
that a slot is non-empty.
Other changes:
- various comment improvements and cleanups
- there's no need to zero-out the entire activity buffer in
pgstat_add_backend(), we can just set activity[0] to '\0'.
- remove the counting of the # of connections to a database; this
was not used anywhere
One change in behavior I wasn't sure about: previously, the code
would create a hash table entry for a database as soon as any message
was received whose header referenced that database. Now, we only
create hash table entries as needed (so for example BESTART won't
create a database hash table entry, since it doesn't need to
access anything in the per-db hash table). It would be easy enough
to retain the old behavior, but AFAICS it is not required.
* Add session start time to pg_stat_activity
* Add the client IP address and port to pg_stat_activity
Original patch from Magnus Hagander, code review by Neil Conway. Catalog
version bumped. This patch sends the client IP address and port number in
every statistics message; that's not ideal, but will be fixed up shortly.
* Changes the APIs to the timezone functions to take a pg_tz pointer as
an argument, representing the timezone to use for the selected
operation.
* Adds a global_timezone variable that represents the current timezone
in the backend as set by SET TIMEZONE (or guc, or env, etc).
* Implements a hash-table cache of loaded tables, so we don't have to
read and parse the TZ file everytime we change a timezone. While not
necesasry now (we don't change timezones very often), I beleive this
will be necessary (or at least good) when "multiple timezones in the
same query" is eventually implemented. And code-wise, this was the time
to do it.
There are no user-visible changes at this time. Implementing the
"multiple zones in one query" is a later step...
This also gets rid of some of the cruft needed to "back out a timezone
change", since we previously couldn't check a timezone unless it was
activated first.
Passes regression tests on win32, linux (slackware 10) and solaris x86.
Magnus Hagander
whose keys are OIDs. The only one that looks particularly performance
critical is the relcache hashtable, but as long as we've got the function
we may as well use it wherever it's applicable.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
exit. Without this, operations triggered during backend exit (such as
temp table deletions) won't be counted ... which given heavy usage of
temp tables can lead to pg_autovacuum falling way behind on the need
to vacuum pg_class and pg_attribute. Per reports from Steve Crawford
and others.
* Touch the socket and lock file at least every hour, to
* ensure that they are not removed by overzealous /tmp-cleaning
* tasks. Set to 58 minutes so a cleaner never sees the
* file as an hour old.
is still alive. This improves our odds of not getting fooled by an
unrelated process when checking a stale lock file. Other checks already
in place, plus one newly added in checkDataDir(), ensure that we cannot
attempt to usurp the place of a postmaster belonging to a different userid,
so there is no need to error out. Add comments indicating the importance
of these other checks.
elog if the former has trouble writing its file. Code review for
Magnus' patch to redirect stderr to syslog on Windows (Bruce's version
seems right, but did some minor prettification).
Backpatch both changes to 8.0 branch.
before we can invoke fork() -- flush stdio buffers, save and restore the
profiling timer on Linux with LINUX_PROFILE, and handle BeOS stuff. This
patch moves that code into a single function, fork_process(), instead of
duplicating it at the various callsites of fork().
This patch doesn't address the EXEC_BACKEND case; there is room for
further cleanup there.
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers. This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management. Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
in GetNewTransactionId(). Since the limit value has to be computed
before we run any real transactions, this requires adding code to database
startup to scan pg_database and determine the oldest datfrozenxid.
This can conveniently be combined with the first stage of an attack on
the problem that the 'flat file' copies of pg_shadow and pg_group are
not properly updated during WAL recovery. The code I've added to
startup resides in a new file src/backend/utils/init/flatfiles.c, and
it is responsible for rewriting the flat files as well as initializing
the XID wraparound limit value. This will eventually allow us to get
rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add
a trigger to pg_database.
is the minimum required fix. I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
to shared memory as soon as possible, ie, right after read_backend_variables.
The effective difference from the original code is that this happens
before instead of after read_nondefault_variables(), which loads GUC
information and is apparently capable of expanding the backend's memory
allocation more than you'd think it should. This should fix the
failure-to-attach-to-shared-memory reports we've been seeing on Windows.
Also clean up a few bits of unnecessarily grotty EXEC_BACKEND code.
> seconds to 10 seconds. The original number was plucked from thin air
> some months ago, and I'd like to review that now based upon further
> thought, observation and experience.
>
> This change has little or no effect on performance, since the interval
> is there mainly to avoid repeated respawn attempts if archiver fails at
> startup. Archiver start-up time is very quick, so there is little danger
> of exceeding 10 seconds.
>
> On a busy system, if the archiver does die, then many files can build up
> in the 60 seconds before respawning. That xlog file backlog could take
> some time to clear. This then leaves a larger than normal window of data
> loss for a possibly long period.
>
> It's a minor change only, with no other effect on function.
Simon Riggs
even uglier than it was already :-(. Also, on Windows only, use temporary
shared memory segments instead of ordinary files to pass over critical
variable values from postmaster to child processes. Magnus Hagander
plain SUSET instead. Also delay processing of options received in
client connection request until after we know if the user is a superuser,
so that SUSET values can be set that way by legitimate superusers.
Per recent discussion.
files and directories. This ensures that the bgwriter will close any open
file references it is holding for files therein, which is needed for the
rmdir() to succeed. Andrew Dunstan and Tom Lane.
returning a NULL pointer (some callers remembered to check the return
value, but some did not -- it is safer to just bail out).
Also, cleanup pgstat.c to use elog(ERROR) rather than elog(LOG) followed
by exit().
C:\msys\1.0\home\y-asaba>pg_ctl -D data restart
waiting for postmaster to shut down...LOG: received smart shutdown
request.
LOG: shutting down
LOG: database system is shut down
done
postmaster stopped
postmaster starting
C:\msys\1.0\home\y-asaba>postmaster.exe: invalid argument: "'-D'"
Try "postmaster.exe --help" for more information.
Yoshiyuki Asaba
The vars are renamed to data_directory, config_file, hba_file, and
ident_file, and are guaranteed to be set to accurate absolute paths
during postmaster startup.
This commit does not yet do anything about hiding path values from
non-superusers.
Refactor code into something reasonably understandable, cause
use of the feature to not fail in standalone backends or in
EXEC_BACKEND case, fix sloppy guc.c table entries, make the
documentation minimally usable.
* We are not sure how much precision is in tv_usec, so we
* swap the high and low 16 bits of 'later' and XOR them with
* 'earlier'. On the off chance that the result is 0, we
* loop until it isn't.
Greg Stark
different backends get a reasonably wide set of initial seeds even if
gettimeofday returns tv_usec values with only a few bits of precision.
Per recent discussion.
* Links with -leay32 and -lssleay32 instead of crypto and ssl. On win32,
"crypto and ssl" is only used for static linking.
* Initializes SSL in the backend and not just in the postmaster. We
cannot pass the SSL context from the postmaster through the parameter
file, because it contains function pointers.
* Split one error check in be-secure.c. Previously we could not tell
which of three calls actually failed. The previous code also returned
incorrect error messages if SSL_accept() failed - that function needs to
use SSL_get_error() on the return value, can't just use the error queue.
* Since the win32 implementation uses non-blocking sockets "behind the
scenes" in order to deliver signals correctly, implements a version of
SSL_accept() that can handle this. Also, add a wait function in case
SSL_read or SSL_write() needs more data.
Magnus Hagander
to allow DBA to choose the form in which log filenames reflect the
current time. Also allow for truncating instead of appending to
pre-existing files --- this is convenient when the log filename pattern
rewrites the same names cyclically. Per Ed L.
Win32 WaitForMultipleObjects:
ret = WaitForMultipleObjects(win32_numChildren, win32_childHNDArray,
FALSE, 0);
Problem is 'win32_numChildren' could be more then 64 ( function supports
), problem basically arise ( kills postgres ) when you create more then
64 connections and terminate some of them sill leaving more then 64.
Claudio Natoli
>>GetLastError will
>>> give much more details than errno.
>>
>>How much more, really? That mapping table gave me the impression that
>>the win32 error codes aren't all that much more detailed than errno...
>
>The mapping table is not complete. My winerror.h from the SDK
>lists 2209
>error codes, whereas errno.h lists 42...
>
>I still don't think we'll get that much more stuff. Right now,
>the Win32
>code paths that actually use the more advanced functions already write
>out the error number in case something happens. We can keep doing that
>for the other paths (ereport the error *number* when the mapping does
>not have a match). The map to errno will catch almost all cases, I
>think. And in the corner cases we can do with just the number, and use
>"net helpmsg" to get the actual message when checking...
Here's an attempt on this. new file goes in backend/port/win32.
Magnus Hagander
slashes to backslashes #ifdef WIN32. This is to cope with the fact
that Windows seems exceedingly unfriendly to slashes in shell commands,
as per recent discussion.
recommend that people go get Apache's rotatelogs program. Additional
benefits are that configuration is done through GUC, rather than
externally, and that the postmaster can monitor the log rotator and
restart it after failure (though we certainly hope that won't happen
often).
Andreas Pflug, some rework by Tom Lane.
and history files as per recent discussion. While at it, remove
pg_terminate_backend, since we have decided we do not have time during
this release cycle to address the reliability concerns it creates.
Split the 'Miscellaneous Functions' documentation section into
'System Information Functions' and 'System Administration Functions',
which hopefully will draw the eyes of those looking for such things.
possible to trap an error inside a function rather than letting it
propagate out to PostgresMain. You still have to use AbortCurrentTransaction
to clean up, but at least the error handling itself will cooperate.
live backends, the archiver and stats processes never got sent a
kill signal. They'd eventually exit on their own, but not for awhile,
which is a bit annoying when you are trying to replace the executable
file on a platform that doesn't allow removal of busy executables.
Also, tweak main loop logic so that we will perform the background
tasks after select() returns EINTR.
recovery more manageable. Also, undo recent change to add FILE_HEADER
and WASTED_SPACE records to XLOG; instead make the XLOG page header
variable-size with extra fields in the first page of an XLOG file.
This should fix the boundary-case bugs observed by Mark Kirkwood.
initdb forced due to change of XLOG representation.
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.
Alvaro Herrera, with some help from Tom Lane.
specified in just one place and adhered to exactly, rather than just more
or less. A side effect is to increase PGSTAT_ACTIVITY_SIZE (maximum
reported query length) from 256 to nearly 1000.
begin the shutdown checkpoint; there isn't anything left for them to do,
so we may as well ensure that they shut down sooner rather than later.
Per discussion.
>> though - the GUC variable was not set in the child
>processes. So "show
>> lc_collate" would *always* return "C", for example. attached
>patch fixes
>> this.
>
>Hm. Why were these vars not propagated by the regular
>mechanism for GUC
>variables (write_nondefault_variables or whatever it's called)? If the
>problem is that it's not accepting PGC_INTERNAL values, then we need to
>fix it there not here, because otherwise we'll have to pass all the
>PGC_INTERNAL variables through the backend_variables file, which seems
>like a recipe for more of the same sort of bug.
Good point :-(
I think the problem is not only that it specifically does not deal with
PGC_INTERNAL variables. The problem is in the fact that
write_nondefault_variables is called *before* the locale is read
(because the locale is read from pg_control and not from any of the
"usual" ways to read it).
Attached patch is another stab at fixing it. It makes postmaster dump a
new copy of the file once it has started the database (before it accepts
any connections), which is when it will know about these parameters.
Also updates the reading code to set the context to the one where the
variable was originally set (PGC_POSTMASTER won't work for PGC_INTERNAL,
and the other way around).
We still pass lc_collate through the special file, because
set_config_option on lc_collate will speficially *not* call setlocale(),
and we need that call. But we no longer call set_config_option from
there.
Magnus Hagander
place of time_t, as per prior discussion. The behavior does not change
on machines without a 64-bit-int type, but on machines with one, which
is most, we are rid of the bizarre boundary behavior at the edges of
the 32-bit-time_t range (1901 and 2038). The system will now treat
times over the full supported timestamp range as being in your local
time zone. It may seem a little bizarre to consider that times in
4000 BC are PST or EST, but this is surely at least as reasonable as
propagating Gregorian calendar rules back that far.
I did not modify the format of the zic timezone database files, which
means that for the moment the system will not know about daylight-savings
periods outside the range 1901-2038. Given the way the files are set up,
it's not a simple decision like 'widen to 64 bits'; we have to actually
think about the range of years that need to be supported. We should
probably inquire what the plans of the upstream zic people are before
making any decisions of our own.
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint. In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends. (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors. Anyone have a problem with that?) This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.
Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC: heap_delete_redo: no block'
later on in xlog replay.
than being random pieces of other files. Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses. While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
about a third, make it work on non-Windows platforms again. (But perhaps
I broke the WIN32 code, since I have no way to test that.) Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently. Take care of one or two
FIXMEs that remained in the code.
of ThisStartUpID and RedoRecPtr into new backends. It's a lot easier just
to make them all grab the values out of shared memory during startup.
This helps to decouple the postmaster from checkpoint execution, which I
need since I'm intending to let the bgwriter do it instead, and it also
fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at
all AFAICS. (Doesn't give me a lot of faith in the amount of testing that
port has gotten.)
the four functions.
> Also, please justify the temp-related changes. I was not aware that we
> had any breakage there.
patch-tmp-schema.txt contains the following bits:
*) Changes pg_namespace_aclmask() so that the superuser is always able
to create objects in the temp namespace.
*) Changes pg_namespace_aclmask() so that if this is a temp namespace,
objects are only allowed to be created in the temp namespace if the
user has TEMP privs on the database. This encompasses all object
creation, not just TEMP tables.
*) InitTempTableNamespace() checks to see if the current user, not the
session user, has access to create a temp namespace.
The first two changes are necessary to support the third change. Now
it's possible to revoke all temp table privs from non-super users and
limiting all creation of temp tables/schemas via a function that's
executed with elevated privs (security definer). Before this change,
it was not possible to have a setuid function to create a temp
table/schema if the session user had no TEMP privs.
patch-area-path.txt contains:
*) Can now determine the area of a closed path.
patch-dfmgr.txt contains:
*) Small tweak to add the library path that's being expanded.
I was using $lib/foo.so and couldn't easily figure out what the error
message, "invalid macro name in dynamic library path" meant without
looking through the source code. With the path in there, at least I
know where to start looking in my config file.
Sean Chittenden
(SIGUSR1, which we have not been using recently) instead of piggybacking
on SIGUSR2-driven NOTIFY processing. This has several good results:
the processing needed to drain the sinval queue is a lot less than the
processing needed to answer a NOTIFY; there's less contention since we
don't have a bunch of backends all trying to acquire exclusive lock on
pg_listener; backends that are sitting inside a transaction block can
still drain the queue, whereas NOTIFY processing can't run if there's
an open transaction block. (This last is a fairly serious issue that
I don't think we ever recognized before --- with clients like JDBC that
tend to sit with open transaction blocks, the sinval queue draining
mechanism never really worked as intended, probably resulting in a lot
of useless cache-reset overhead.) This is the last of several proposed
changes in response to Philip Warner's recent report of sinval-induced
performance problems.
and should do now that we control our own destiny for timezone handling,
but this commit gets the bulk of the picayune diffs in place.
Magnus Hagander and Tom Lane.
timezone code and other places.
Remove elog() calls from find_my_exec; do fprintf(stderr) instead. We
can then remove the exec.c handling in the makefile because it doesn't
have to be built to suppress elog calls.
find_my_exec/find_other_exec(). Remove passing of progname to these
functions as they can find that out from argv[0], which they already
have.
Make get_progname return const char *, and update all progname variables
to be const char *.
all the code that looks for other binaries. I move FindExec into
port/exec.c (and renamed it to find_my_binary()). I also added
find_other_binary that looks for another binary in the same directory as
the calling program, and checks the version string.
The only behavior change was that initdb and pg_dump would look in the
hard-coded bindir directory if it can't find the requested binary in the
same directory as the caller. The new code throws an error. The old
behavior seemed too error prone for version mismatches.
* removed a few redundant defines
* get_user_name safe under win32
* rationalized pipe read EOF for win32 (UPDATED PATCH USED)
* changed all backend instances of sleep() to pg_usleep
- except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a
32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is
acceptable, please replace with pg_usleep(2000000000L)]
I added a comment to that part of the code:
/*
* It would be nice to use pg_usleep() here, but only does 2000 sec
* or 33 minutes, which seems too short.
*/
sleep(1000000);
Claudio Natoli
It works on the principle of turning sockets into non-blocking, and then
emulate blocking behaviour on top of that, while allowing signals to
run. Signals are now implemented using an event instead of APCs, thus
getting rid of the issue of APCs not being compatible with "old style"
sockets functions.
It also moves the win32 specific code away from pqsignal.h/c into
port/win32, and also removes the "thread style workaround" of the APC
issue previously in place.
In order to make things work, a few things are also changed in pgstat.c:
1) There is now a separate pipe to the collector and the bufferer. This
is required because the pipe will otherwise only be signalled in one of
the processes when the postmaster goes down. The MS winsock code for
select() must have some kind of workaround for this behaviour, but I
have found no stable way of doing that. You really are not supposed to
use the same socket from more than one process (unless you use
WSADuplicateSocket(), in which case the docs specifically say that only
one will be flagged).
2) The check for "postmaster death" is moved into a separate select()
call after the main loop. The previous behaviour select():ed on the
postmaster pipe, while later explicitly saying "we do NOT check for
postmaster exit inside the loop".
The issue was that the code relies on the same select() call seeing both
the postmaster pipe *and* the pgstat pipe go away. This does not always
happen, and it appears that useing WSAEventSelect() makes it even more
common that it does not.
Since it's only called when the process exits, I don't think using a
separate select() call will have any significant impact on how the stats
collector works.
Magnus Hagander
listen_addresses parameter, as per recent discussion. The default behavior
is now to listen on localhost, which eliminates the need for the -i
postmaster switch in many scenarios.
Andrew Dunstan
initialization of stats process under EXEC_BACKEND.
[A cleaner, rationalized approach to stat/backend/SSDataBase child
processes under EXEC_BACKEND is on my TODO list. However this patch
takes care of immediate concerns (ie. stats test now passes under
win32)]
Claudio Natoli
#log_line_prefix = '' # e.g. '<%u%%%d> '
# %u=user name %d=database name
# %r=remote host and port
# %p=PID %t=timestamp %i=command tag
# %c=session id %l=session line number
# %s=session start timestamp
# %x=stop here in non-session processes
# %%='%'
Andrew Dunstan
* Mostly, casting etc to remove compilation warnings in win32 only code.
* main.c: set _IONBF to stdout/stderr under win32 (under win32, _IOLBF
defaults to full buffering)
* pg_resetxlog/Makefile: ensures dirmod.o gets cleaned (got bitten by
this when, after "make clean"ing, switching compilation between Ming +
Cygwin)
Claudio Natoli
* Changes incorrect CYGWIN defines to __CYGWIN__
* Some localtime returns NULL checks (when unchecked cause SEGVs under
Win32
regression tests)
* Rationalized CreateSharedMemoryAndSemaphores and
AttachSharedMemoryAndSemaphores (Bruce, I finally remembered to do it);
requires attention.
Claudio Natoli
number of openable files and the number already opened. This eliminates
depending on sysconf(_SC_OPEN_MAX), and allows much saner behavior on
platforms where open-file slots are used up by semaphores.
allow the bgwriter to start before the startup subprocess has finished
... it tends to crash otherwise. (The same problem may have existed for
the checkpointer, I'm not entirely sure.) Remove some code that was
redundant because the bgwriter is handled as a member of the backend list.
Natoli and Bruce Momjian (and some cosmetic fixes from Neil Conway).
Changes:
- remove duplicate signal definitions from pqsignal.h
- replace pqkill() with kill() and redefine kill() in Win32
- use ereport() in place of fprintf() in some error handling in
pqsignal.c
- export pg_queue_signal() and make use of it where necessary
- add a console control handler for Ctrl-C and similar handling
on Win32
- do WaitForSingleObjectEx() in CHECK_FOR_INTERRUPTS() on Win32;
query cancelling should now work on Win32
- various other fixes and cleanups
whereToSendOutput instead because they are really inquiring about
the correct client communication protocol. Update some comments.
This is pointing towards supporting regular FE/BE client protocol
in a standalone backend, per discussion a month or so back.
against the latest shapshot. It also includes the replacement of kill()
with pqkill() and sigsetmask() with pqsigsetmask().
Passes all tests fine on my linux machine once applied. Still doesn't
link completely on Win32 - there are a few things still required. But
much closer than before.
At Bruce's request, I'm goint to write up a README file about the method
of signals delivery chosen and why the others were rejected (basically a
summary of the mailinglist discussions). I'll finish that up once/if the
patch is accepted.
Magnus Hagander
PostmasterPid variable, which gets set (early) in PostmasterMain
getppid would not be the postmaster?
[fork/exec] Implements processCancelRequest by keeping an array of
pid/cancel_key structs in shared mem
[fork/exec] Moves AttachSharedMemoryAndSemaphores call for backends into
SubPostmasterMain
[win32] Implements reaper/waitpid by keeping an arrays of children
pids,handles in postmaster local mem
- this item is largely untested, for reasons which should be
obvious, but appears sound
[win32/all] Added extern for pgpipe in Win32 case, and changed the second
pipe call (which seems to have been missed earlier) to pgpipe
[win32] #define'd ftruncate to chsize in the Win32 case
[win32] PG_USLEEP for Win32 has a misplaced paren. Fixed.
[win32] DLLIMPORT handling for MingW case
Claudio Natoli
pointer type when it is not necessary to do so.
For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
BackendFork/SSDataBase/pgstat) startup, to allow fork/exec calls to
closely mimic (the soon to be provided) Win32 CreateProcess equivalent
calls.
Claudio Natoli
on 64-bit Solaris. Use a non-system-dependent datatype for UsedShmemSegID,
namely unsigned long (which we were already assuming could hold a shmem
key anyway, cf RecordSharedMemoryInLockFile).
This first part of the background writer does no syncing at all.
It's only purpose is to keep the LRU heads clean so that regular
backends seldom to never have to call write().
Jan
to create a TCP/IP socket from FATAL to LOG. This was unwise;
historically we have expected socket conflicts to abort postmaster
startup. Conflicts on port numbers with another postmaster can only
be detected reliably at the TCP socket level.
was modified for IPv6. Use a robust definition of struct sockaddr_storage,
do a proper configure test to see if ss_len exists, don't assume that
getnameinfo() will handle AF_UNIX sockets, don't trust getaddrinfo to
return the protocol we ask for, etc. This incorporates several outstanding
patches from Kurt Roeckx, but I'm to blame for anything that doesn't
work ...
of order; the 'server log' output is actually client output in these
scenarios and we ought to treat elevels the same way as in the client
case. This allows initdb to not send backend stderr to /dev/null anymore,
which makes it much more likely that people will notice problems during
initdb.
Win32 port is now called 'win32' rather than 'win'
add -lwsock32 on Win32
make gethostname() be only used when kerberos4 is enabled
use /port/getopt.c
new /port/opendir.c routines
disable GUC unix_socket_group on Win32
convert some keywords.c symbols to KEYWORD_P to prevent conflict
create new FCNTL_NONBLOCK macro to turn off socket blocking
create new /include/port.h file that has /port prototypes, move
out of c.h
new /include/port/win32_include dir to hold missing include files
work around ERROR being defined in Win32 includes
I had inadvertently omitted it while rearranging things to support
length-counted incoming messages. Also, change the parser's API back
to accepting a 'char *' query string instead of 'StringInfo', as the
latter wasn't buying us anything except overhead. (I think when I put
it in I had some notion of making the parser API 8-bit-clean, but
seeing that flex depends on null-terminated input, that's not really
ever gonna happen.)
have length words. COPY OUT reimplemented per new protocol: it doesn't
need \. anymore, thank goodness. COPY BINARY to/from frontend works,
at least as far as the backend is concerned --- libpq's PQgetline API
is not up to snuff, and will have to be replaced with something that is
null-safe. libpq uses message length words for performance improvement
(no cycles wasted rescanning long messages), but not yet for error
recovery.
with variable-width fields. No more truncation of long user names.
Also, libpq can now send its environment-variable-driven SET commands
as part of the startup packet, saving round trips to server.
> weird behavior across fork boundaries; (b) the additional memory space
> that has to be duplicated into child processes will cost something per
> child launch, even if the child never uses it. But these are only
> arguments that it might not *always* be a prudent thing to do, not that
> we shouldn't give the DBA the tool to do it if he wants. So fire away.
Here is a patch for the above, including a documentation update. It
creates a new GUC variable "preload_libraries", that accepts a list in
the form:
preload_libraries = '$libdir/mylib1:initfunc,$libdir/mylib2'
If ":initfunc" is omitted or not found, no initialization function is
executed, but the library is still preloaded. If "$libdir/mylib" isn't
found, the postmaster refuses to start.
In my testing with PL/R, it reduces the first call to a PL/R function
(after connecting) from almost 2 seconds, down to about 8 ms.
Joe Conway
service it until after we execute SetThisStartUpID(). Else shutdown
process will write the wrong SUI into the shutdown checkpoint, which
seems likely to be trouble --- although I've not quite figured out
how significant it really is.
of the socket file and socket lock file; this should prevent both of them
from being removed by even the stupidest varieties of /tmp-cleaning
script. Per suggestion from Giles Lean.
datetime token tables. Even more embarrassing, the regression tests
revealed some of the problems --- but evidently the bogus output wasn't
questioned. Add code to postmaster startup to directly check the tables
for correct ordering, in hopes of not being embarrassed like this again.
postgresql version 7.3, but yea... this patch adds full IPv6
support to postgres. I've tested it out on 7.2.3 and has
been running perfectly stable.
CREDITS:
The KAME Project (Initial patch)
Nigel Kukard <nkukard@lbsd.net>
Johan Jordaan <johanj@lando.co.za>
database access outside a transaction; revert bogus performance improvement
in SIBackendInit(); improve comments; add documentation (this part courtesy
Neil Conway).
ProcKill instead, where we still have a PGPROC with which to wait on
LWLocks. This fixes 'can't wait without a PROC structure' failures
occasionally seen during backend shutdown (I'm surprised they weren't
more frequent, actually). Add an Assert() to LWLockAcquire to help
catch any similar mistakes in future. Fix failure to update MyProcPid
for standalone backends and pgstat processes.
(overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask,
per earlier discussion. Simplify scheme for overlaying fields in tuple
header (no need for cmax to live in more than one place). Don't try to
clear infomask status bits in tqual.c --- not safe to do it there. Don't
try to force output table of a SELECT INTO to have OIDs, either. Get rid
of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which
has already caused one recent failure. Improve documentation.
connections by the superuser only.
This patch replaces the last patch I sent a couple of days ago.
It closes a connection that has not been authorised by a superuser if it would
leave less than the GUC variable ReservedBackends
(superuser_reserved_connections in postgres.conf) backend process slots free
in the SISeg. This differs to the first patch which only reserved the last
ReservedBackends slots in the procState array. This has made the free slot
test more expensive due to the use of a lock.
After thinking about a comment on the first patch I've also made it a fatal
error if the number of reserved slots is not less than the maximum number of
connections.
Nigel J. Andrews
error handling, and simplifies the code that remains. Apparently,
the code that left Berkeley had a whole "error handling subsystem",
which exceptions and whatnot. Since we don't use that anymore,
there's no reason to keep it around.
The regression tests pass with the patch applied. Unless anyone
sees a problem, please apply.
Neil Conway
bitmap, if present).
Per Tom Lane's suggestion the information whether a tuple has an oid
or not is carried in the tuple descriptor. For debugging reasons
tdhasoid is of type char, not bool. There are predefined values for
WITHOID, WITHOUTOID and UNDEFOID.
This patch has been generated against a cvs snapshot from last week
and I don't expect it to apply cleanly to current sources. While I
post it here for public review, I'm working on a new version against a
current snapshot. (There's been heavy activity recently; hope to
catch up some day ...)
This is a long patch; if it is too hard to swallow, I can provide it
in smaller pieces:
Part 1: Accessor macros
Part 2: tdhasoid in TupDesc
Part 3: Regression test
Part 4: Parameter withoid to heap_addheader
Part 5: Eliminate t_oid from HeapTupleHeader
Part 2 is the most hairy part because of changes in the executor and
even in the parser; the other parts are straightforward.
Up to part 4 the patched postmaster stays binary compatible to
databases created with an unpatched version. Part 5 is small (100
lines) and finally breaks compatibility.
Manfred Koizar
Attached are a revised set of SSL patches. Many of these patches
are motivated by security concerns, it's not just bug fixes. The key
differences (from stock 7.2.1) are:
*) almost all code that directly uses the OpenSSL library is in two
new files,
src/interfaces/libpq/fe-ssl.c
src/backend/postmaster/be-ssl.c
in the long run, it would be nice to merge these two files.
*) the legacy code to read and write network data have been
encapsulated into read_SSL() and write_SSL(). These functions
should probably be renamed - they handle both SSL and non-SSL
cases.
the remaining code should eliminate the problems identified
earlier, albeit not very cleanly.
*) both front- and back-ends will send a SSL shutdown via the
new close_SSL() function. This is necessary for sessions to
work properly.
(Sessions are not yet fully supported, but by cleanly closing
the SSL connection instead of just sending a TCP FIN packet
other SSL tools will be much happier.)
*) The client certificate and key are now expected in a subdirectory
of the user's home directory. Specifically,
- the directory .postgresql must be owned by the user, and
allow no access by 'group' or 'other.'
- the file .postgresql/postgresql.crt must be a regular file
owned by the user.
- the file .postgresql/postgresql.key must be a regular file
owned by the user, and allow no access by 'group' or 'other'.
At the current time encrypted private keys are not supported.
There should also be a way to support multiple client certs/keys.
*) the front-end performs minimal validation of the back-end cert.
Self-signed certs are permitted, but the common name *must*
match the hostname used by the front-end. (The cert itself
should always use a fully qualified domain name (FDQN) in its
common name field.)
This means that
psql -h eris db
will fail, but
psql -h eris.example.com db
will succeed. At the current time this must be an exact match;
future patches may support any FQDN that resolves to the address
returned by getpeername(2).
Another common "problem" is expiring certs. For now, it may be
a good idea to use a very-long-lived self-signed cert.
As a compile-time option, the front-end can specify a file
containing valid root certificates, but it is not yet required.
*) the back-end performs minimal validation of the client cert.
It allows self-signed certs. It checks for expiration. It
supports a compile-time option specifying a file containing
valid root certificates.
*) both front- and back-ends default to TLSv1, not SSLv3/SSLv2.
*) both front- and back-ends support DSA keys. DSA keys are
moderately more expensive on startup, but many people consider
them preferable than RSA keys. (E.g., SSH2 prefers DSA keys.)
*) if /dev/urandom exists, both client and server will read 16k
of randomization data from it.
*) the server can read empheral DH parameters from the files
$DataDir/dh512.pem
$DataDir/dh1024.pem
$DataDir/dh2048.pem
$DataDir/dh4096.pem
if none are provided, the server will default to hardcoded
parameter files provided by the OpenSSL project.
Remaining tasks:
*) the select() clauses need to be revisited - the SSL abstraction
layer may need to absorb more of the current code to avoid rare
deadlock conditions. This also touches on a true solution to
the pg_eof() problem.
*) the SIGPIPE signal handler may need to be revisited.
*) support encrypted private keys.
*) sessions are not yet fully supported. (SSL sessions can span
multiple "connections," and allow the client and server to avoid
costly renegotiations.)
*) makecert - a script that creates back-end certs.
*) pgkeygen - a tool that creates front-end certs.
*) the whole protocol issue, SASL, etc.
*) certs are fully validated - valid root certs must be available.
This is a hassle, but it means that you *can* trust the identity
of the server.
*) the client library can handle hardcoded root certificates, to
avoid the need to copy these files.
*) host name of server cert must resolve to IP address, or be a
recognized alias. This is more liberal than the previous
iteration.
*) the number of bytes transferred is tracked, and the session
key is periodically renegotiated.
*) basic cert generation scripts (mkcert.sh, pgkeygen.sh). The
configuration files have reasonable defaults for each type
of use.
Bear Giles
are motivated by security concerns, it's not just bug fixes. The key
differences (from stock 7.2.1) are:
*) almost all code that directly uses the OpenSSL library is in two
new files,
src/interfaces/libpq/fe-ssl.c
src/backend/postmaster/be-ssl.c
in the long run, it would be nice to merge these two files.
*) the legacy code to read and write network data have been
encapsulated into read_SSL() and write_SSL(). These functions
should probably be renamed - they handle both SSL and non-SSL
cases.
the remaining code should eliminate the problems identified
earlier, albeit not very cleanly.
*) both front- and back-ends will send a SSL shutdown via the
new close_SSL() function. This is necessary for sessions to
work properly.
(Sessions are not yet fully supported, but by cleanly closing
the SSL connection instead of just sending a TCP FIN packet
other SSL tools will be much happier.)
*) The client certificate and key are now expected in a subdirectory
of the user's home directory. Specifically,
- the directory .postgresql must be owned by the user, and
allow no access by 'group' or 'other.'
- the file .postgresql/postgresql.crt must be a regular file
owned by the user.
- the file .postgresql/postgresql.key must be a regular file
owned by the user, and allow no access by 'group' or 'other'.
At the current time encrypted private keys are not supported.
There should also be a way to support multiple client certs/keys.
*) the front-end performs minimal validation of the back-end cert.
Self-signed certs are permitted, but the common name *must*
match the hostname used by the front-end. (The cert itself
should always use a fully qualified domain name (FDQN) in its
common name field.)
This means that
psql -h eris db
will fail, but
psql -h eris.example.com db
will succeed. At the current time this must be an exact match;
future patches may support any FQDN that resolves to the address
returned by getpeername(2).
Another common "problem" is expiring certs. For now, it may be
a good idea to use a very-long-lived self-signed cert.
As a compile-time option, the front-end can specify a file
containing valid root certificates, but it is not yet required.
*) the back-end performs minimal validation of the client cert.
It allows self-signed certs. It checks for expiration. It
supports a compile-time option specifying a file containing
valid root certificates.
*) both front- and back-ends default to TLSv1, not SSLv3/SSLv2.
*) both front- and back-ends support DSA keys. DSA keys are
moderately more expensive on startup, but many people consider
them preferable than RSA keys. (E.g., SSH2 prefers DSA keys.)
*) if /dev/urandom exists, both client and server will read 16k
of randomization data from it.
*) the server can read empheral DH parameters from the files
$DataDir/dh512.pem
$DataDir/dh1024.pem
$DataDir/dh2048.pem
$DataDir/dh4096.pem
if none are provided, the server will default to hardcoded
parameter files provided by the OpenSSL project.
Remaining tasks:
*) the select() clauses need to be revisited - the SSL abstraction
layer may need to absorb more of the current code to avoid rare
deadlock conditions. This also touches on a true solution to
the pg_eof() problem.
*) the SIGPIPE signal handler may need to be revisited.
*) support encrypted private keys.
*) sessions are not yet fully supported. (SSL sessions can span
multiple "connections," and allow the client and server to avoid
costly renegotiations.)
*) makecert - a script that creates back-end certs.
*) pgkeygen - a tool that creates front-end certs.
*) the whole protocol issue, SASL, etc.
*) certs are fully validated - valid root certs must be available.
This is a hassle, but it means that you *can* trust the identity
of the server.
*) the client library can handle hardcoded root certificates, to
avoid the need to copy these files.
*) host name of server cert must resolve to IP address, or be a
recognized alias. This is more liberal than the previous
iteration.
*) the number of bytes transferred is tracked, and the session
key is periodically renegotiated.
*) basic cert generation scripts (mkcert.sh, pgkeygen.sh). The
configuration files have reasonable defaults for each type
of use.
Bear Giles
> Changes to avoid collisions with WIN32 & MFC names...
> 1. Renamed:
> a. PROC => PGPROC
> b. GetUserName() => GetUserNameFromId()
> c. GetCurrentTime() => GetCurrentDateTime()
> d. IGNORE => IGNORE_DTF in include/utils/datetime.h & utils/adt/datetim
>
> 2. Added _P to some lex/yacc tokens:
> CONST, CHAR, DELETE, FLOAT, GROUP, IN, OUT
Jan
yesterday's proposal to pghackers. Also remove unnecessary parameters
to heap_beginscan, heap_rescan. I modified pg_proc.h to reflect the
new numbers of parameters for the AM interface routines, but did not
force an initdb because nothing actually looks at those fields.
GUC support. It's now possible to set datestyle, timezone, and
client_encoding from postgresql.conf and per-database or per-user
settings. Also, implement rollback of SET commands that occur in a
transaction that later fails. Create a SET LOCAL var = value syntax
that sets the variable only for the duration of the current transaction.
All per previous discussions in pghackers.
As proof of concept, provide an alternate implementation based on POSIX
semaphores. Also push the SysV shared-memory implementation into a
separate file so that it can be replaced conveniently.
A new pg_hba.conf column, USER
Allow specifiction of lists of users separated by commas
Allow group names specified by +
Allow include files containing lists of users specified by @
Allow lists of databases, and database files
Allow samegroup in database column to match group name matching dbname
Removal of secondary password files
Remove pg_passwd utility
Lots of code cleanup in user.c and hba.c
New data/global/pg_pwd format
New data/global/pg_group file
per recent pghackers discussion: force a new WAL record at first nextval
after a checkpoint, and ensure that xlog is flushed to disk if a nextval
record is the only thing emitted by a transaction.
o Change all current CVS messages of NOTICE to WARNING. We were going
to do this just before 7.3 beta but it has to be done now, as you will
see below.
o Change current INFO messages that should be controlled by
client_min_messages to NOTICE.
o Force remaining INFO messages, like from EXPLAIN, VACUUM VERBOSE, etc.
to always go to the client.
o Remove INFO from the client_min_messages options and add NOTICE.
Seems we do need three non-ERROR elog levels to handle the various
behaviors we need for these messages.
Regression passed.
when to send what to which, prevent recursion by introducing new COMMERROR
elog level for client-communication problems, get rid of direct writes
to stderr in backend/libpq files, prevent non-error elogs from going to
client during the authentication cycle.
now just below FATAL in server_min_messages. Added more text to
highlight ordering difference between it and client_min_messages.
---------------------------------------------------------------------------
REALLYFATAL => PANIC
STOP => PANIC
New INFO level the prints to client by default
New LOG level the prints to server log by default
Cause VACUUM information to print only to the client
NOTICE => INFO where purely information messages are sent
DEBUG => LOG for purely server status messages
DEBUG removed, kept as backward compatible
DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1 added
DebugLvl removed in favor of new DEBUG[1-5] symbols
New server_min_messages GUC parameter with values:
DEBUG[5-1], INFO, NOTICE, ERROR, LOG, FATAL, PANIC
New client_min_messages GUC parameter with values:
DEBUG[5-1], LOG, INFO, NOTICE, ERROR, FATAL, PANIC
Server startup now logged with LOG instead of DEBUG
Remove debug_level GUC parameter
elog() numbers now start at 10
Add test to print error message if older elog() values are passed to elog()
Bootstrap mode now has a -d that requires an argument, like postmaster
the first call of localtime() in a process will read /usr/lib/tztab or
local equivalent. Better to do this once in the postmaster and inherit
the data by fork() than to have to do it during every backend start.
to the client before closing the connection. Before 7.2 this was done
correctly, but new code would simply close the connection with no report
to the client.
so that only one signal number is used not three. Flags in shared
memory tell the reason(s) for the current signal. This method is
extensible to handle more signal reasons without chewing up even more
signal numbers, but the immediate reason is to keep pg_pwd reloads
separate from SIGHUP processing in the postmaster.
Also clean up some problems in the postmaster with delayed response to
checkpoint status changes --- basically, it wouldn't schedule a checkpoint
if it wasn't getting connection requests on a regular basis.
postmaster children before client auth step. Postmaster now rereads
pg_pwd on receipt of SIGHUP, the same way that pg_hba.conf is handled.
No cycles need be expended to validate password cache validity during
connection startup.
environment strings need to be moved around, do so when called from
initial startup (main.c), not in init_ps_status. This eliminates the
former risk of invalidating saved environment-string pointers, since
no code has yet had a chance to grab any such pointers when main.c
is running.
per suggestion from Peter. Simplify several APIs by transmitting the
original argv location directly from main.c to ps_status.c, instead of
passing it down through several levels of subroutines.
subprocesses; perhaps this will fix portability problem just noted by
Lockhart. Also, move test for bad permissions of DataDir to a more
logical place.
bootstrap) check for a valid PG_VERSION file before looking at anything
else in the data directory. This fixes confusing error report when
trying to start current sources in a pre-7.1 data directory.
Per trouble report from Rich Shepard 10/18/01.
just after receipt of the startup packet. Now, postmaster children
that are waiting for client authentication response will show as
'postgres: user database host authentication'. Also, do an
init_ps_display for startup/shutdown/checkpoint subprocesses,
so that they are readily identifiable as well. Fix an obscure race
condition that could lead to Assert failure in the postmaster ---
attempting to start a checkpoint process before any connections have
been received led to calling PostmasterRandom before setting random_seed.
readability. Bizarre '(long *) TRUE' return convention is gone,
in favor of just raising an error internally in dynahash.c when
we detect hashtable corruption. HashTableWalk is gone, in favor
of using hash_seq_search directly, since it had no hope of working
with non-LONGALIGNable datatypes. Simplify some other code that was
made undesirably grotty by promixity to HashTableWalk.
Partial support for BEOS (not sure whether second fork of grandchild
process needs these extra calls or not; someone who has BEOS will need
to test it).
portability issues). Caller-visible data structures are now allocated
on MAXALIGN boundaries, allowing safe use of datatypes wider than 'long'.
Rejigger hash_create API so that caller specifies size of key and
total size of entry, not size of key and size of rest of entry.
This simplifies life considerably since each number is just a sizeof(),
and padding issues etc. are taken care of automatically.
a hung client or lost connection can't indefinitely block a postmaster
child (not to mention the possibility of deliberate DoS attacks).
Timeout is controlled by new authentication_timeout GUC variable,
which I set to 60 seconds by default ... does that seem reasonable?
We will no longer try to send elog messages to the client before we have
initialized backend libpq (oops); however, reporting bogus commandline
switches via elog does work now (not irrelevant, because of PGOPTIONS).
Fix problem with inappropriate sending of checkpoint-process messages
to stderr.
for them, and making them just wastes time during backend startup/shutdown.
Also, remove compile-time MAXBACKENDS limit per long-ago proposal.
You can now set MaxBackends as high as your kernel can stand without
any reconfiguration/recompilation.
Make sure it exits immediately when collector process dies --- in old code,
buffer process would hang around and compete with the new buffer process
for packets. Make sure it doesn't block on writing the pipe when the
collector falls more than a pipeload behind. Avoid leaking pgstats FDs
into every backend.
platforms system(2) gets confused unless the signal handler is set to
SIG_DFL, not SIG_IGN. pgstats.c now uses pqsignal() as it should,
not signal(). Also, arrange for the stats collector process to show
a reasonable ID in 'ps', rather than looking like a postmaster.
number in the data structure so that we can give at least a minimally
useful idea of where the mistake is when we issue syntax error messages.
Move the ClientAuthentication() call to where it should have been in
the first place, so that postmaster memory releasing can happen in a
reasonable place also. Update obsolete comments, correct one real bug
(auth_argument was not picked up correctly).
immediately, we will fork a child even if the database state does not
permit connections to be accepted (eg, we are in recovery mode).
The child process will correctly reject the connection and exit as
soon as it's finished collecting the connection request message.
However, this means that reaper() must be prepared to see child
process exit signals even while it's waiting for startup or shutdown
process to finish. As was, a connection request arriving during a
database recovery or shutdown would cause postmaster abort.
> > secure_ctx changes too. it will be PGC_BACKEND after '-p'.
>
> Oh, okay, I missed that part. Could we see the total state of the
> patch --- ie, a diff against current CVS, not a bunch of deltas?
> I've gotten confused about what's in and what's out.
Ok, here it is. Cleared the ctx comment too - after -p
it will be PGC_BACKEND in any case.
Marko Kreen
a new postmaster child process. This should eliminate problems with
authentication blocking (e.g., ident, SSL init) and also reduce problems
with the accept queue filling up under heavy load.
The option to send elog output to a different file per backend (postgres -o)
has been disabled for now because the initialization would have to happen
in a different order and it's not clear we want to keep this anyway.
Here is Tomified version of my 2 pending patches.
Dropped the set_.._real change as it is not needed.
Desc would be:
* use GUC for settings from cmdline
Marko Kreen
detected sooner in backend startup, and is treated as an expected error
(it gives 'Sorry, too many clients already' now). This allows us not
to have to enforce the MaxBackends limit exactly in the postmaster.
Also, remove ProcRemove() and fold its functionality into ProcKill().
There's no good reason for a backend not to be responsible for removing
its PROC entry, and there are lots of good reasons for the postmaster
not to be touching shared-memory data structures.
datatypes, not only strings. parse_hook is useless for bool, I suppose,
but it seems possibly useful for int and double to apply variable-specific
constraints that are more complex than simple range limits. assign_hook
is definitely useful for all datatypes --- we need it right now for bool
to support date cache reset when changing Australian timezone rule setting.
Also, clean up some residual problems with the reset all/show all patch,
including memory leaks and mistaken reset of PostPortNumber. It seems
best that RESET ALL not touch variables that don't have SUSET or
USERSET context.
directory (which can be made a symlink to put temp files on another disk).
Add code to delete leftover temp files during postmaster startup.
Bruce, with some kibitzing from Tom.
Python) to support shared extension modules, I have learned that Guido
prefers the style of the attached patch to solve the above problem.
I feel that this solution is particularly appropriate in this case
because the following:
PglargeType
PgType
PgQueryType
are already being handled in the way that I am proposing for PgSourceType.
Jason Tishler
* Store two past checkpoint locations, not just one, in pg_control.
On startup, we fall back to the older checkpoint if the newer one
is unreadable. Also, a physical copy of the newest checkpoint record
is kept in pg_control for possible use in disaster recovery (ie,
complete loss of pg_xlog). Also add a version number for pg_control
itself. Remove archdir from pg_control; it ought to be a GUC
parameter, not a special case (not that it's implemented yet anyway).
* Suppress successive checkpoint records when nothing has been entered
in the WAL log since the last one. This is not so much to avoid I/O
as to make it actually useful to keep track of the last two
checkpoints. If the things are right next to each other then there's
not a lot of redundancy gained...
* Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs
on alternate bytes. Polynomial borrowed from ECMA DLT1 standard.
* Fix XLOG record length handling so that it will work at BLCKSZ = 32k.
* Change XID allocation to work more like OID allocation. (This is of
dubious necessity, but I think it's a good idea anyway.)
* Fix a number of minor bugs, such as off-by-one logic for XLOG file
wraparound at the 4 gig mark.
* Add documentation and clean up some coding infelicities; move file
format declarations out to include files where planned contrib
utilities can get at them.
* Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or
every CHECKPOINT_TIMEOUT seconds, whichever comes first. It is also
possible to force a checkpoint by sending SIGUSR1 to the postmaster
(undocumented feature...)
* Defend against kill -9 postmaster by storing shmem block's key and ID
in postmaster.pid lockfile, and checking at startup to ensure that no
processes are still connected to old shmem block (if it still exists).
* Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency
stop, for symmetry with postmaster and xlog utilities. Clean up signal
handling in bootstrap.c so that xlog utilities launched by postmaster
will react to signals better.
* Standalone bootstrap now grabs lockfile in target directory, as added
insurance against running it in parallel with live postmaster.
are now separate files "postgres.h" and "postgres_fe.h", which are meant
to be the primary include files for backend .c files and frontend .c files
respectively. By default, only include files meant for frontend use are
installed into the installation include directory. There is a new make
target 'make install-all-headers' that adds the whole content of the
src/include tree to the installed fileset, for use by people who want to
develop server-side code without keeping the complete source tree on hand.
Cleaned up a whole lot of crufty and inconsistent header inclusions.
any other client connections that may exist (which would only happen if
another client is currently in the authentication cycle). This avoids
wastage of open descriptors in a child. It might also explain peculiar
behaviors like not closing connections when expected, since the kernel
will probably not signal EOF as long as some other backend is randomly
holding open a reference to the connection, even if the client went away
long since ...
actually) to ensure that its file access time doesn't get old enough to
tempt a /tmp directory cleaner to remove it. Still another reason we
should never have put the sockets in /tmp in the first place ...
observed by Inoue. Also, don't call ProcRemove() from postmaster if we
have detected a backend crash --- too risky if shared memory is corrupted.
It's not needed anyway, considering we are going to reinitialize shared
memory and semaphores as soon as the last child is dead.
>> xlog.c : special case for beos to avoid 'link' which does not work yet
>> beos/sem.c : implementation of new sem_ctl call (GETPID) and a new
>sem_op
>> flag (IPCNOWAIT)
>> dynloader/beos.c : add a verification of symbol validity (seem that
the
>> loader sometime return OK with an invalid symbol)
>> postmaster.c : add beos forking support for the new checkpoint
process
>> postgres.c : remove beos special case for getrusage
>> beos.h : Correction of a bas definition of AF_UNIX, misc defnitions
>>
>>
>> thanks
>>
>>
>> cyril
Cyril VELTER
might change it. Experimentation shows that the signal handler call
mechanism does not save/restore errno for you, at least not on Linux
or HPUX, so this is definitely a real risk.
postmaster, because it isn't updated after forking away from the terminal.
Apparently it's not used anyplace in the postmaster ... but seems best
to make it show the correct PID ...
socket file, in favor of having an ordinary lockfile beside the socket file.
Clean up a few robustness problems in the lockfile code. If postmaster is
going to reject a connection request based on database state, it will now
tell you so before authentication exchange not after. (Of course, a failure
after is still possible if conditions change meanwhile, but this makes life
easier for a yet-to-be-written pg_ping utility.)
IPC key assignment will now work correctly even when multiple postmasters
are using same logical port number (which is possible given -k switch).
There is only one shared-mem segment per postmaster now, not 3.
Rip out broken code for non-TAS case in bufmgr and xlog, substitute a
complete S_LOCK emulation using semaphores in spin.c. TAS and non-TAS
logic is now exactly the same.
When deadlock is detected, "Deadlock detected" is now the elog(ERROR)
message, rather than a NOTICE that comes out before an unhelpful ERROR.
re-adopt these settings at every postmaster or standalone-backend startup.
This should fix problems with indexes becoming corrupt due to failure to
provide consistent locale environment for postmaster at all times. Also,
refuse to start up a non-locale-enabled compilation in a database originally
initdb'd with a non-C locale. Suppress LIKE index optimization if locale
is not "C" or "POSIX" (are there any other locales where it's safe?).
Issue NOTICE during initdb if selected locale disables LIKE optimization.
cloned, rather than always cloning template1. Modify initdb to generate
two identical databases rather than one, template0 and template1.
Connections to template0 are disallowed, so that it will always remain
in its virgin as-initdb'd state. pg_dumpall now dumps databases with
restore commands that say CREATE DATABASE foo WITH TEMPLATE = template0.
This allows proper behavior when there is user-added data in template1.
initdb forced!
hosting product, on both shared and dedicated machines. We currently
offer Oracle and MySQL, and it would be a nice middle-ground.
However, as shipped, PostgreSQL lacks the following features we need
that MySQL has:
1. The ability to listen only on a particular IP address. Each
hosting customer has their own IP address, on which all of their
servers (http, ftp, real media, etc.) run.
2. The ability to place the Unix-domain socket in a mode 700 directory.
This allows us to automatically create an empty database, with an
empty DBA password, for new or upgrading customers without having
to interactively set a DBA password and communicate it to (or from)
the customer. This in turn cuts down our install and upgrade times.
3. The ability to connect to the Unix-domain socket from within a
change-rooted environment. We run CGI programs chrooted to the
user's home directory, which is another reason why we need to be
able to specify where the Unix-domain socket is, instead of /tmp.
4. The ability to, if run as root, open a pid file in /var/run as
root, and then setuid to the desired user. (mysqld -u can almost
do this; I had to patch it, too).
The patch below fixes problem 1-3. I plan to address #4, also, but
haven't done so yet. These diffs are big enough that they should give
the PG development team something to think about in the meantime :-)
Also, I'm about to leave for 2 weeks' vacation, so I thought I'd get
out what I have, which works (for the problems it tackles), now.
With these changes, we can set up and run PostgreSQL with scripts the
same way we can with apache or proftpd or mysql.
In summary, this patch makes the following enhancements:
1. Adds an environment variable PGUNIXSOCKET, analogous to MYSQL_UNIX_PORT,
and command line options -k --unix-socket to the relevant programs.
2. Adds a -h option to postmaster to set the hostname or IP address to
listen on instead of the default INADDR_ANY.
3. Extends some library interfaces to support the above.
4. Fixes a few memory leaks in PQconnectdb().
The default behavior is unchanged from stock 7.0.2; if you don't use
any of these new features, they don't change the operation.
David J. MacKenzie
that search loops only have to scan that far and not through all maxBackends
entries. This eliminates a performance penalty for setting maxBackends
much higher than the average number of active backends. Also, eliminate
no-longer-used 'backend tag' concept. Remove setting of environment
variables at backend start (except for CYR_RECODE), since none of them
are being examined by the backend any longer.
working on the VERY latest version of BeOS. I'm sure there will be
alot of comments, but then if there weren't I'd be disappointed!
Thanks for your continuing efforts to get this into your tree.
Haven't bothered with the new files as they haven't changed.
BTW Peter, the compiler is "broken" about the bool define and so on.
I'm filing a bug report to try and get it addressed. Hopefully then we
can tidy up the code a bit.
I await the replies with interest :)
David Reid
user is now defined in terms of the user id, the user name is only computed
upon request (for display purposes). This is kind of the opposite of the
previous state, which would maintain the user name and compute the user id
for permission checks.
Besides perhaps saving a few cycles (integer vs string), this now creates a
single point of attack for changing the user id during a connection, for
purposes of "setuid" functions, etc.
that RAND_MAX applies to them, since it doesn't. Instead add a
config.h parameter MAX_RANDOM_VALUE. This is currently set at 2^31-1
but could be auto-configured if that ever proves necessary. Also fix
some outright bugs like calling srand() where srandom() is appropriate.
There's now only one transition value and transition function.
NULL handling in aggregates is a lot cleaner. Also, use Numeric
accumulators instead of integer accumulators for sum/avg on integer
datatypes --- this avoids overflow at the cost of being a little slower.
Implement VARIANCE() and STDDEV() aggregates in the standard backend.
Also, enable new LIKE selectivity estimators by default. Unrelated
change, but as long as I had to force initdb anyway...
* the result is not recorded anywhere
* the result is not used anywhere
* the result is only used in some places, whereas others have been getting away with it
* the result is used improperly
Also make command line options handling a little better (e.g., --disable-locale,
while redundant, should really still *dis*able).
* Add option to build with OpenSSL out of the box. Fix thusly exposed
bit rot. Although it compiles now, getting this to do something
useful is left as an exercise.
* Fix Kerberos options to defer checking for required libraries until
all the other libraries are checked for.
* Change default odbcinst.ini and krb5.srvtab path to PREFIX/etc.
* Install work around for Autoconf's install-sh relative path anomaly.
Get rid of old INSTL_*_OPTS variables, now that we don't need them
anymore.
* Use `gunzip -c' instead of g?zcat. Reportedly broke on AIX.
* Look for only one of readline.h or readline/readline.h, not both.
* Make check for PS_STRINGS cacheable. Don't test for the header files
separately.
* Disable fcntl(F_SETLK) test on Linux.
* Substitute the standard GCC warnings set into CFLAGS in configure,
don't add it on in Makefile.global.
* Sweep through contrib tree to teach makefiles standard semantics.
... and in completely unrelated news:
* Make postmaster.opts arbitrary options-aware. I still think we need to
save the environment as well.
backend functions via backend PQexec(). The SPI interface has long
been our only documented way to do this, and the backend pqexec/portal
code is unused and suffering bit-rot. I'm putting it out of its misery.
and config.h. Adjusted all referring code.
Scrapped pg_version and changed initdb accordingly. Integrated
src/utils/version.c into src/backend/utils/init/miscinit.c. Changed all
callers.
Set version number to `7.1devel'. (Non-numeric version suffixes now allowed.)
for details). It doesn't really do that much yet, since there are no
short-term memory contexts in the executor, but the infrastructure is
in place and long-term contexts are handled reasonably. A few long-
standing bugs have been fixed, such as 'VACUUM; anything' in a single
query string crashing. Also, out-of-memory is now considered a
recoverable ERROR, not FATAL.
Eliminate a large amount of crufty, now-dead code in and around
memory management.
Fix problem with holding off SIGTRAP, SIGSEGV, etc in postmaster and
backend startup.
option settings. Sort out SIGHUP vs BACKEND -- there is no total ordering
here, so make explicit checks. Add comments explaining all of this.
Removed permissions check on SHOW command.
Add examine_subclass to the game, rename to SQL_inheritance to fit the
official data model better. Adjust documentation.
Standalone backend needs to reset all options before it starts. To
facilitate that, have IsUnderPostmaster be set by the postmaster itself,
don't wait for the magic -p switch.
Also make sure that all environment variables and argv's survive
init_ps_display(). Use strdup where necessary.
Have initdb make configuration files (postgresql.conf, pg_hba.conf) mode
0600 -- having configuration files is no fun if you can't edit them.
we'll get there one day.
Use `cat' to create aclocal.m4, not `aclocal'. Some people don't
have automake installed.
Only run the autoconf rule in the top-level GNUmakefile if the
invoker specified `make configure', don't run it automatically
because of CVS timestamp skew.
That means you can now set your options in either or all of $PGDATA/configuration,
some postmaster option (--enable-fsync=off), or set a SET command. The list of
options is in backend/utils/misc/guc.c, documentation will be written post haste.
pg_options is gone, so is that pq_geqo config file. Also removed were backend -K,
-Q, and -T options (no longer applicable, although -d0 does the same as -Q).
Added to configure an --enable-syslog option.
changed all callers from TPRINTF to elog(DEBUG)
running gcc and HP's cc with warnings cranked way up. Signed vs unsigned
comparisons, routines declared static and then defined not-static,
that kind of thing. Tedious, but perhaps useful...
eliminating some wildly inconsistent coding in various parts of the
system. I set MAXPGPATH = 1024 in config.h.in. If anyone is really
convinced that there ought to be a configure-time test to set the
value, go right ahead ... but I think it's a waste of time.
When drawing up a very simple "text-drawing" of how the negotiation is done,
I realised I had done this last part (fallback) in a very stupid way. Patch
#4 fixes this, and does it in a much better way.
Included is also the simple text-drawing of how the negotiation is done.
//Magnus
of MAXBACKENDS is now 1024, since all it's costing is about 32 bytes of memory
per array slot. configure's --with-maxbackends switch now controls DEF_MAXBACKENDS
which is simply the default value of the postmaster's -N switch. Thus,
the out-of-the-box configuration will still limit you to 64 backends,
but you can go up to 1024 backends simply by restarting the postmaster with
a different -N switch --- no rebuild required.
(--with-maxbackends). Add a postmaster switch (-N backends) that allows
the limit to be reduced at postmaster start time. (You can't increase it,
sorry to say, because there are still some fixed-size arrays.)
Grab the number of semaphores indicated by min(MAXBACKENDS, -N) at
postmaster startup, so that this particular form of bogus configuration
is exposed immediately rather than under heavy load.
> tprintf.patch
>
> tprintf.patch
>
> adds functions and macros which implement a conditional trace package
> with the ability to change flags and numeric options of running
> backends at runtime.
> Options/flags can be specified in the command line and/or read from
> the file pg_options in the data directory.
assert.patch
adds a switch to turn on/off the assert checking if enabled at compile
time. You can now compile postgres with assert checking and disable it
at runtime in a production environment.
Making PQrequestCancel safe to call in a signal handler turned out to be
much easier than I feared. So here are the diffs.
Some notes:
* I modified the postmaster's packet "iodone" callback interface to allow
the callback routine to return a continue-or-drop-connection return
code; this was necessary to allow the connection to be closed after
receiving a Cancel, rather than proceeding to launch a new backend...
Being a neatnik, I also made the iodone proc have a typechecked
parameter list.
* I deleted all code I could find that had to do with OOB.
* I made some edits to ensure that all signals mentioned in the code
are referred to symbolically not by numbers ("SIGUSR2" not "2").
I think Bruce may have already done at least some of the same edits;
I hope that merging these patches is not too painful.
things as well:
* Computes and saves a cancel key for each backend. * fflush
before forking, to eliminate double-buffering problems
between postmaster and backends.
Other cleanups.
Tom Lane
Attached you'll find a (big) patch that fixes make dep and make
depend in all Makefiles where I found it to be appropriate.
It also removes the dependency in Makefile.global for NAMEDATALEN
and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh
a little smarter.
This no longer requires initdb.sh that is turned into initdb with
a sed script when installing Postgres, hence initdb.sh should be
renamed to initdb (after the patch has been applied :-) )
This patch is against the 6.3 sources, as it took a while to
complete.
Please review and apply,
Cheers,
Jeroen van Vianen
yyerror ones from bison. It also includes a few 'enhancements' to
the C programming style (which are, of course, personal).
The other patch removes the compilation of backend/lib/qsort.c, as
qsort() is a standard function in stdlib.h and can be used any
where else (and it is). It was only used in
backend/optimizer/geqo/geqo_pool.c, backend/optimizer/path/predmig.c,
and backend/storage/page/bufpage.c
> > Some or all of these changes might not be appropriate for v6.3,
since we > > are in beta testing and since they do not affect the
current functionality. > > For those cases, how about submitting
patches based on the final v6.3 > > release?
There's more to come. Please review these patches. I ran the
regression tests and they only failed where this was expected
(random, geo, etc).
Cheers,
Jeroen
What it does:
It solves stupid problem with cyrillic charsets IP-based on-fly recoding.
take a look at /data/charset.conf for details.
You can use any tables for any charset.
Tables are from Russian Apache project.
Tables in this patch contains also Ukrainian characters.
Then run ./configure --enable-recode
I haven't had final confirmation from Peter yet, but the attached patch
needs to be applied for the Beta otherwise password and crypt
authentication just won't work.
It puts back the loop in libpq and also fixes a couple of problems with
maintaining compatability with pre-6.3 drivers.
I've completed the patch to fix the protocol and authentication issues I
was discussing a couple of weeks ago. The particular changes are:
- the protocol has a version number
- network byte order is used throughout
- the pg_hba.conf file is used to specify what method is used to
authenticate a frontend (either password, ident, trust, reject, krb4
or krb5)
- support for multiplexed backends is removed
- appropriate changes to man pages
- the -a switch to many programs to specify an authentication service
no longer has any effect
- the libpq.so version number has changed to 1.1
The new backend still supports the old protocol so old interfaces won't
break.
Makefile.global.
End result, if all goes well, should allow for much easier porting, since
there will no longer be a concept of a "port". Most, if not everything,
*should* be determined by configure, or by the compiler itself. Still
work to be done though :)
postgres backend processes end up as so called zombies. It seems that
only Linux a.out (libc.4.6.27) systems are affected.
By:
Wolfgang Roth <roth@statistik.uni-mannheim.de>
Subject: [HACKERS] locale patches !
Hi there,
here are little patches to get Postgres 6.1 works with locale stuff.
This is a patch against 970402.tar.gz, there are no problem to apply them
by hand to 6.0 release. Collate stuff tested about 1-2 months in real
working database but I'm sure there must be no problem. US hackers
could vote against locale implementation ( locale for sure will affect to
speed of postgres ), so I introduce variable USE_LOCALE which
controls locale stuff. Non-US users now could use ~* operator
for searching and <order by> for strings with nation alphabet.
Please, don't forget, as I did first time, to set environment variable
LC_CTYPE and LC_COLLATE because backend get locale information from them.
I start postmaster from a little script, assuming that shell is Bash shell
it looks like:
#!/bin/sh
export LC_CTYPE=koi8-r
export LC_COLLATE=koi8-r
postmaster -B 1024 -S -D/usr/local/pgsql/data/ -o '-Fe'
Subject: [HACKERS] password authentication
This patch adds support for plaintext password authentication. To use
it, you add a line like
host all 0.0.0.0 0.0.0.0 password pg_pwd.conf
to your pg_hba.conf, where 'pg_pwd.conf' is the name of a file containing
the usernames and password hashes in the format of the first two fields
of a Unix /etc/passwd file. (Of course, you can use a specific database
name or IP instead.)
Then, to connect with a password through libpq, you use the PQconnectdb()
function, specifying the "password=" tag in the connect string and also
adding the tag "authtype=password".
I also added a command-line switch '-u' to psql that tells it to prompt
for a username and password and use password authentication.
gmake of the code without interruption.
There's also some tidy-up of the MAXPATHLEN stuff based on the assumption that
all supported platforms have MAXPATHLEN defined in <sys/param.h>.
(The only unknowns for the above are AIX and IRIX5.)
In particular, no more compiled-in default for PGDATA or LIBDIR. Commands
that need them need either invocation options or environment variables.
PGPORT default is hardcoded as 5432, but overrideable with options or
environment variables.
my postmaster 1.07.
It's really simple, the loop dealing with all sockets
can't handle more than one ready socket :-)
A simple logic error dealing with lists.
OR IS THERE ANY REASON FOR SETTING curr TO 0?
Submitted by: Carsten Heyl <Heyl@nads.de>
directory. The code that looks for the pg_hba file doesn't use it, though,
so the postmaster uses the wrong pg_hba file. Also, when the postmaster
looks in one directory and the user thinks it is looking in another
directory, the error messages don't give enough information to solve the
problem. I extended the error message for this.
Submitted by: Bryan Henderson <bryanh@giraffe.netgate.net>
I've enclosed two patches. The first affects Solaris compilability. The
bug stems from netdb.h (where MAXHOSTNAMELEN is defined on a stock
system). If the user has installed the header files from BIND 4.9.x,
there will be no definition of MAXHOSTNAMELEN. The patch will, if all
else fails, try to include <arpa/nameser.h> and set MAXHOSTNAMELEN to
MAXDNAME, which is 256 (just like MAXHOSTNAMELEN on a stock system).
The second patch adds aliases for "ISNULL" to "IS NULL" and likewise for
"NOTNULL" to "IS NOT NULL". I have not removed the postgres specific
ISNULL and NOTNULL. I noticed this on the TODO list, and figured it would
be easy to remove.
The full semantics are:
[ expression IS NULL ]
[ expression IS NOT NULL ]
--Jason
Submitted by: Jason Wright <jason@oozoo.vnet.net>