Commit Graph

922 Commits

Author SHA1 Message Date
Tom Lane
04f4e10cfc Use symbolic names not octal constants for file permission flags.
Purely cosmetic patch to make our coding standards more consistent ---
we were doing symbolic some places and octal other places.  This patch
fixes all C-coded uses of mkdir, chmod, and umask.  There might be some
other calls I missed.  Inconsistency noted while researching tablespace
directory permissions issue.
2010-12-10 17:35:33 -05:00
Tom Lane
b58c25055e Fix leakage of cost_limit when multiple autovacuum workers are active.
When using default autovacuum_vac_cost_limit, autovac_balance_cost relied
on VacuumCostLimit to contain the correct global value ... but after the
first time through in a particular worker process, it didn't, because we'd
trashed it in previous iterations.  Depending on the state of other autovac
workers, this could result in a steady reduction of the effective
cost_limit setting as a particular worker processed more and more tables,
causing it to go slower and slower.  Spotted by Simon Poole (bug #5759).
Fix by saving and restoring the GUC variables in the loop in do_autovacuum.

In passing, improve a few comments.

Back-patch to 8.3 ... the cost rebalancing code has been buggy since it was
put in.
2010-11-19 22:29:44 -05:00
Magnus Hagander
4acf99b2f3 Send paramHandle to subprocesses as 64-bit on Win64
The handle to the shared memory segment containing startup
parameters was sent as 32-bit even on 64-bit systems. Since
HANDLEs appear to be allocated sequentially this shouldn't
be a problem until we reach 2^32 open handles in the postmaster,
but a 64-bit value should be sent across as 64-bit, and not
zero out the top 32 bits.

Noted by Tom Lane.
2010-11-16 12:40:56 +01:00
Robert Haas
3134d8863e Add new buffers_backend_fsync field to pg_stat_bgwriter.
This new field counts the number of times that a backend which writes a
buffer out to the OS must also fsync() it.  This happens when the
bgwriter fsync request queue is full, and is generally detrimental to
performance, so it's good to know when it's happening.  Along the way,
log a new message at level DEBUG1 whenever we fail to hand off an fsync,
so that the problem can also be seen in examination of log files
(if the logging level is cranked up high enough).

Greg Smith, with minor tweaks by me.
2010-11-15 12:42:59 -05:00
Tom Lane
3892a2d861 Fix canAcceptConnections() bugs introduced by replication-related patches.
We must not return any "okay to proceed" result code without having checked
for too many children, else we might fail later on when trying to add the
new child to one of the per-child state arrays.  It's not clear whether
this oversight explains Stefan Kaltenbrunner's recent report, but it could
certainly produce a similar symptom.

Back-patch to 8.4; the logic was not broken before that.
2010-11-14 15:57:37 -05:00
Alvaro Herrera
854ae8c3a6 Fix permanent memory leak in autovacuum launcher
get_database_list was uselessly allocating its output data, along some
created along the way, in a permanent memory context.  This didn't
matter when autovacuum was a single, short-lived process, but now that
the launcher is permanent, it shows up as a permanent leak.

To fix, make get_database list allocate its output data in the caller's
context, which is in charge of freeing it when appropriate; and the
memory leaked by heap_beginscan et al is allocated in a throwaway
transaction context.
2010-11-08 18:35:42 -03:00
Tom Lane
e6721c6e16 Previous patch had no detectable virtue other than being a one-liner.
Try to make the code look self-consistent again, so it doesn't confuse
future developers.
2010-10-27 15:26:24 -04:00
Heikki Linnakangas
869af50fcf Fix long-standing segfault when accept() or one of the calls made right
after accepting a connection fails, and the server is compiled with GSSAPI
support. Report and patch by Alexander V. Chernikov, bug #5731.
2010-10-27 20:07:13 +03:00
Peter Eisentraut
6ab42ae367 Support host names in pg_hba.conf
Peter Eisentraut, reviewed by KaiGai Kohei and Tom Lane
2010-10-15 22:56:18 +03:00
Bruce Momjian
23177114c6 Improve comment about ignoring 128 error code on Windows:
* Microsoft reports it is related to mutex failure:
     *   http://archives.postgresql.org/pgsql-hackers/2010-09/msg00790.php
2010-10-15 01:58:11 +00:00
Tom Lane
f4d242ef94 Remove some unnecessary tests of pgstat_track_counts.
We may as well make pgstat_count_heap_scan() and related macros just count
whenever rel->pgstat_info isn't null.  Testing pgstat_track_counts buys
nothing at all in the normal case where that flag is ON; and when it's OFF,
the pgstat_info link will be null, so it's still a useless test.

This change is unlikely to buy any noticeable performance improvement,
but a cycle shaved is a cycle earned; and my investigations earlier today
convinced me that we're down to the point where individual instructions in
the inner execution loops are starting to matter.
2010-10-12 14:44:25 -04:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Magnus Hagander
594419e74a Treat exit code 128 (ERROR_WAIT_NO_CHILDREN) as non-fatal on Win32,
since it can happen when a process fails to start when the system
is under high load.

Per several bug reports and many peoples investigation.

Back-patch to 8.4, which is as far back as the "deadman-switch"
for shared memory access exists.
2010-09-16 20:37:13 +00:00
Magnus Hagander
946045f04d Add vacuum and analyze counters to pg_stat_*_tables views. 2010-08-21 10:59:17 +00:00
Robert Haas
debcec7dc3 Include the backend ID in the relpath of temporary relations.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.

Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.

Review by Jaime Casanova and Tom Lane.
2010-08-13 20:10:54 +00:00
Tom Lane
46aa77c7bd Add stats functions and views to provide access to a transaction's own
statistics counts.  These numbers are being accumulated but haven't yet been
transmitted to the collector (and won't be, until the transaction ends).
For some purposes, though, it's handy to be able to look at them.

Joel Jacobson, reviewed by Itagaki Takahiro
2010-08-08 16:27:06 +00:00
Robert Haas
5ffaa9005c Add restart_after_crash GUC.
Normally, we automatically restart after a backend crash, but in some
cases when PostgreSQL is invoked by clusterware it may be desirable to
suppress this behavior, so we provide an option which does this.
Since no existing GUC group quite fits, create a new group called
"error handling options" for this and the previously undocumented GUC
exit_on_error, which is now documented.

Review by Fujii Masao.
2010-07-20 00:47:53 +00:00
Tom Lane
3ec694e17b Add a log_file_mode GUC that allows control of the file permissions set on
log files created by the syslogger process.

In passing, make unix_file_permissions display its value in octal, same
as log_file_mode now does.

Martin Pihlak
2010-07-16 22:25:51 +00:00
Bruce Momjian
239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Robert Haas
243bbe6ed8 Add stray "else" that seems to have gone missing. 2010-06-24 16:40:45 +00:00
Peter Eisentraut
418e1d82fd Refactor sprintf calls with computed format strings into multiple calls with
constant format strings, so that the compiler can more easily check the
formats for correctness.
2010-06-16 00:54:16 +00:00
Peter Eisentraut
cb6038c168 Fix some inconsistent quoting of wal_level values in messages
When referring to postgresql.conf syntax, then it's without quotes
(wal_level=archive); in narrative it's with double quotes.  But never
single quotes.
2010-06-03 21:02:12 +00:00
Robert Haas
5e85315ea7 Avoid starting walreceiver in states where it shouldn't be running.
In particular, it's bad to start walreceiver when in state
PM_WAIT_BACKENDS, because we have no provision to kill walreceiver
when in that state.

Fujii Masao
2010-05-27 02:01:37 +00:00
Robert Haas
615704af1e More fixes for shutdown during recovery.
1. If we receive a fast shutdown request while in the PM_STARTUP state,
process it just as we would in PM_RECOVERY, PM_HOT_STANDBY, or PM_RUN.
Without this change, an early fast shutdown followed by Hot Standby causes
the database to get stuck in a state where a shutdown is pending (so no new
connections are allowed) but the shutdown request is never processed unless
we end Hot Standby and enter normal running.

2. Avoid removing the backup label file when a smart or fast shutdown occurs
during recovery.  It makes sense to do this once we've reached normal running,
since we must be taking a backup which now won't be valid.  But during
recovery we must be recovering from a previously taken backup, and any backup
label file is needed to restart recovery from the right place.

Fujii Masao and Robert Haas
2010-05-26 12:32:41 +00:00
Robert Haas
ea9968c331 Rename PM_RECOVERY_CONSISTENT and PMSIGNAL_RECOVERY_CONSISTENT.
The new names PM_HOT_STANDBY and PMSIGNAL_BEGIN_HOT_STANDBY more accurately
reflect their actual function.
2010-05-15 20:01:32 +00:00
Robert Haas
a724584735 We now accept read-only connections in state PM_RECOVERY_CONSISTENT. 2010-05-14 18:08:33 +00:00
Tom Lane
4a69624f49 Cause the archiver process to adopt new postgresql.conf settings (particularly
archive_command) as soon as possible, namely just before issuing a new call
of archive_command, even when there is a backlog of files to be archived.
The original coding would only absorb new settings after clearing the backlog
and returning to the outer loop.  Per discussion.

Back-patch to 8.3.  The logic in prior versions is a bit different and it
doesn't seem worth taking any risks of breaking it.
2010-05-11 16:42:28 +00:00
Tom Lane
77acab75df Modify ShmemInitStruct and ShmemInitHash to throw errors internally,
rather than returning NULL for some-but-not-all failures as they used to.
Remove now-redundant tests for NULL from call sites.

We had to do something about this because many call sites were failing to
check for NULL; and changing it like this seems a lot more useful and
mistake-proof than adding checks to the call sites without them.
2010-04-28 16:54:16 +00:00
Heikki Linnakangas
9b8a73326e Introduce wal_level GUC to explicitly control if information needed for
archival or hot standby should be WAL-logged, instead of deducing that from
other options like archive_mode. This replaces recovery_connections GUC in
the primary, where it now has no effect, but it's still used in the standby
to enable/disable hot standby.

Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_mode setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment, they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.

Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.
2010-04-28 16:10:43 +00:00
Heikki Linnakangas
961ad3fdd9 On Windows, syslogger runs in two threads. The main thread processes config
reload and rotation signals, and a helper thread reads messages from the
pipe and writes them to the log file. However, server code isn't generally
thread-safe, so if both try to do e.g palloc()/pfree() at the same time,
bad things will happen. To fix that, use a critical section (which is like
a mutex) to enforce that only one the threads are active at a time.
2010-04-16 09:51:49 +00:00
Robert Haas
1c850fa807 Make smart shutdown work in combination with Hot Standby/Streaming Replication.
At present, killing the startup process does not release any locks it holds,
so we must wait to stop the startup and walreceiver processes until all
read-only backends have exited.  Without this patch, the startup and
walreceiver processes never exit, so the server gets permanently stuck in
a half-shutdown state.

Fujii Masao, with review, docs, and comment adjustments by me.
2010-04-08 01:39:37 +00:00
Heikki Linnakangas
93001dfd18 Don't pass an invalid file handle to dup2(). That causes a crash on
Windows, thanks to a feature in CRT called Parameter Validation.

Backpatch to 8.2, which is the oldest version supported on Windows. In
8.2 and 8.3 also backpatch the earlier change to use DEVNULL instead of
NULL_DEV #define for a /dev/null-like device. NULL_DEV was hard-coded to
"/dev/null" regardless of platform, which didn't work on Windows, while
DEVNULL works on all platforms. Restarting syslogger didn't work on
Windows on versions 8.3 and below because of that.
2010-04-01 20:12:22 +00:00
Simon Riggs
65cd829232 Modify some new and pre-existing messages for translatability. 2010-03-25 20:40:17 +00:00
Tom Lane
223f82d4da Now that we know last_statrequest > last_statwrite can be observed in the
buildfarm, expend a little more effort on the log message for it.
2010-03-24 16:07:10 +00:00
Tom Lane
52e2b33a55 Add some logging code for unexpected cases in pgstat.c, particularly being
unable to read a stats file for reasons other than ENOENT, and having to reset
last_statrequest because it's later than current time in the collector.
Not clear if this will shed any light on the "pgstat wait timeout" business,
but it seems like a good idea in general.

In passing, do some message-style-police work on recently-added
pgstat_reset_shared_counters code.
2010-03-12 22:19:19 +00:00
Bruce Momjian
65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Robert Haas
e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Bruce Momjian
4b113d9cdc Document that archive_timeout will force new WAL files even if a single
checkpoint has happened, and recommend adjusting checkpoint_timeout to
reduce the impact of this.
2010-02-05 23:37:43 +00:00
Magnus Hagander
f13944e9c9 Make checks for invalid pgStatSock use PGINVALID_SOCKET 2010-01-31 17:39:34 +00:00
Magnus Hagander
083e1b0f27 Add functions to reset the statistics counter for a single table/index or
a single function.
2010-01-28 14:25:41 +00:00
Heikki Linnakangas
1bb2558046 Make standby server continuously retry restoring the next WAL segment with
restore_command, if the connection to the primary server is lost. This
ensures that the standby can recover automatically, if the connection is
lost for a long time and standby falls behind so much that the required
WAL segments have been archived and deleted in the master.

This also makes standby_mode useful without streaming replication; the
server will keep retrying restore_command every few seconds until the
trigger file is found. That's the same basic functionality pg_standby
offers, but without the bells and whistles.

To implement that, refactor the ReadRecord/FetchRecord functions. The
FetchRecord() function introduced in the original streaming replication
patch is removed, and all the retry logic is now in a new function called
XLogReadPage(). XLogReadPage() is now responsible for executing
restore_command, launching walreceiver, and waiting for new WAL to arrive
from primary, as required.

This also changes the life cycle of walreceiver. When launched, it now only
tries to connect to the master once, and exits if the connection fails, or
is lost during streaming for any reason. The startup process detects the
death, and re-launches walreceiver if necessary.
2010-01-27 15:27:51 +00:00
Magnus Hagander
7e40cdc075 Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide shared
statistics of the bgwriter.

Greg Smith
2010-01-19 14:11:32 +00:00
Heikki Linnakangas
40f908bdcd Introduce Streaming Replication.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.

Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.

Fujii Masao, with additional hacking by me
2010-01-15 09:19:10 +00:00
Tom Lane
d5e0029862 Add some simple support and documentation for using process-specific oom_adj
settings to prevent the postmaster from being OOM-killed on Linux systems.

Alex Hunsaker and Tom Lane
2010-01-11 18:39:32 +00:00
Magnus Hagander
87091cb1f1 Create typedef pgsocket for storing socket descriptors.
This silences some warnings on Win64. Not using the proper SOCKET datatype
was actually wrong on Win32 as well, but didn't cause any warnings there.

Also create define PGINVALID_SOCKET to indicate an invalid/non-existing
socket, instead of using a hardcoded -1 value.
2010-01-10 14:16:08 +00:00
Bruce Momjian
0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Magnus Hagander
13c5fdb5c8 Fix one more cast for _open_osfhandle().
Tsutomu Yamada
2010-01-02 12:01:29 +00:00
Tom Lane
48c192c15e Revise pgstat's tracking of tuple changes to improve the reliability of
decisions about when to auto-analyze.

The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples,
where all three of these numbers could be bad estimates from ANALYZE itself.
Even worse, in the presence of a steady flow of HOT updates and matching
HOT-tuple reclamations, auto-analyze might never trigger at all, even if all
three numbers are exactly right, because n_dead_tuples could hold steady.

To fix, replace last_anl_tuples with an accurately tracked count of the total
number of committed tuple inserts + updates + deletes since the last ANALYZE
on the table.  This can still be compared to the same threshold as before, but
it's much more trustworthy than the old computation.  Tracking this requires
one more intra-transaction counter per modified table within backends, but no
additional memory space in the stats collector.  There probably isn't any
measurable speed difference; if anything it might be a bit faster than before,
since I was able to eliminate some per-tuple arithmetic operations in favor of
adding sums once per (sub)transaction.

Also, simplify the logic around pgstat vacuum and analyze reporting messages
by not trying to fold VACUUM ANALYZE into a single pgstat message.

The original thought behind this patch was to allow scheduling of analyzes
on parent tables by artificially inflating their changes_since_analyze count.
I've left that for a separate patch since this change seems to stand on its
own merit.
2009-12-30 20:32:14 +00:00
Tom Lane
0b39231431 Avoid memory leak if pgstat_vacuum_stat is interrupted partway through.
The temporary hash tables made by pgstat_collect_oids should be allocated
in a short-term memory context, which is not the default behavior of
hash_create.  Noted while looking through hash_create calls in connection
with Robert Haas' recent complaint.

This is a pre-existing bug, but it doesn't seem important enough to
back-patch.  The hash table is not so large that it would matter unless this
happened many times within a session, which seems quite unlikely.
2009-12-27 19:40:07 +00:00
Simon Riggs
efc16ea520 Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 01:32:45 +00:00