Commit Graph

1374 Commits

Author SHA1 Message Date
Tom Lane d18643c4a6 Shift the responsibility for emitting "database system is shut down".
Historically this message has been emitted at the end of ShutdownXLOG().
That's not an insane place for it in a standalone backend, but in the
postmaster environment we've grown a fair amount of stuff that happens
later, including archiver/walsender shutdown, stats collector shutdown,
etc.  Recent buildfarm experimentation showed that on slower machines
there could be many seconds' delay between finishing ShutdownXLOG() and
actual postmaster exit.  That's fairly confusing, both for testing
purposes and for DBAs.  Hence, move the code that prints this message
into UnlinkLockFiles(), so that it comes out just after we remove the
postmaster's pidfile.  That is a more appropriate definition of "is shut
down" from the point of view of "pg_ctl stop", for example.  In general,
removing the pidfile should be the last externally-visible action of
either a postmaster or a standalone backend; compare commit
d73d14c271 for instance.  So this seems
like a reasonably future-proof approach.
2016-02-11 14:14:22 -05:00
Tom Lane c5e9b77127 Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier."
This reverts commit 3971f64843 and a
couple of followon debugging commits; I think we've learned what we can
from them.
2016-02-10 16:01:04 -05:00
Tom Lane 7351e18286 Add more chattiness in server shutdown.
Early returns from the buildfarm show that there's a bit of a gap in the
logging I added in 3971f64843b02e4a: the portion of CreateCheckPoint()
after CheckPointGuts() can take a fair amount of time.  Add a few more
log messages in that section of code.  This too shall be reverted later.
2016-02-09 11:21:46 -05:00
Tom Lane 3971f64843 Temporarily make pg_ctl and server shutdown a whole lot chattier.
This is a quick hack, due to be reverted when its purpose has been served,
to try to gather information about why some of the buildfarm critters
regularly fail with "postmaster does not shut down" complaints.  Maybe they
are just really overloaded, but maybe something else is going on.  Hence,
instrument pg_ctl to print the current time when it starts waiting for
postmaster shutdown and when it gives up, and add a lot of logging of the
current time in the server's checkpoint and shutdown code paths.

No attempt has been made to make this pretty.  I'm not even totally sure
if it will build on Windows, but we'll soon find out.
2016-02-08 18:43:11 -05:00
Robert Haas 7c944bd903 Introduce a new GUC force_parallel_mode for testing purposes.
When force_parallel_mode = true, we enable the parallel mode restrictions
for all queries for which this is believed to be safe.  For the subset of
those queries believed to be safe to run entirely within a worker, we spin
up a worker and run the query there instead of running it in the
original process.  When force_parallel_mode = regress, make additional
changes to allow the regression tests to run cleanly even though parallel
workers have been injected under the hood.

Taken together, this facilitates both better user testing and better
regression testing of the parallelism code.

Robert Haas, with help from Amit Kapila and Rushabh Lathia.
2016-02-07 11:41:33 -05:00
Robert Haas a1c1af2a1f Introduce group locking to prevent parallel processes from deadlocking.
For locking purposes, we now regard heavyweight locks as mutually
non-conflicting between cooperating parallel processes.  There are some
possible pitfalls to this approach that are not to be taken lightly,
but it works OK for now and can be changed later if we find a better
approach.  Without this, it's very easy for parallel queries to
silently self-deadlock if the user backend holds strong relation locks.

Robert Haas, with help from Amit Kapila.  Thanks to Noah Misch and
Andres Freund for extensive discussion of possible issues with this
approach.
2016-02-07 10:16:13 -05:00
Robert Haas 7191ce8bea Make all built-in lwlock tranche IDs fixed.
This makes the values more stable, which seems like a good thing for
anybody who needs to look at at them.

Alexander Korotkov and Amit Kapila
2016-02-02 06:45:55 -05:00
Heikki Linnakangas 61ce1e8f15 Fix misspelled function name in comment. 2016-02-01 10:10:24 +02:00
Simon Riggs 978b2f65aa Speedup 2PC by skipping two phase state files in normal path
2PC state info is written only to WAL at PREPARE, then read back from WAL at
COMMIT PREPARED/ABORT PREPARED. Prepared transactions that live past one bufmgr
checkpoint cycle will be written to disk in the same form as previously. Crash
recovery path is not altered. Measured performance gains of 50-100% for short
2PC transactions by completely avoiding writing files and fsyncing. Other
optimizations still available, further patches in related areas expected.

Stas Kelvich and heavily edited by Simon Riggs

Based upon earlier ideas and patches by Michael Paquier and Heikki Linnakangas,
a concrete example of how Postgres-XC has fed back ideas into PostgreSQL.

Reviewed by Michael Paquier, Jeff Janes and Andres Freund
Performance testing by Jesper Pedersen
2016-01-20 18:40:44 -08:00
Simon Riggs 422a55a687 Refactor to create generic WAL page read callback
Previously we didn’t have a generic WAL page read callback function,
surprisingly. Logical decoding has logical_read_local_xlog_page(), which was
actually generic, so move that to xlogfunc.c and rename to
read_local_xlog_page().
Maintain logical_read_local_xlog_page() so existing callers still work.

As requested by Michael Paquier, Alvaro Herrera and Andres Freund
2016-01-20 17:18:58 -08:00
Simon Riggs e63bb4549a Add new user fn pg_current_xlog_flush_location()
Tomas Vondra, reviewed by Michael Paquier and Amit Kapila
Minor edits by me
2016-01-12 07:54:52 +00:00
Simon Riggs 1e29e6324c Maintain local LogwrtResult consistently
Teach GetFlushRecPtr() to update LogwrtResult cache as performed by all other
functions in xlog.c
2016-01-12 07:33:20 +00:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Noah Misch 3cd1ba147e Fix comments about WAL rule "write xlog before data" versus pg_multixact.
Recovery does not achieve its goal of zeroing all pg_multixact entries
whose accompanying WAL records never reached disk.  Remove that claim
and justify its expendability.  Detail the need for TrimMultiXact(),
which has little in common with the TrimCLOG() rationale.  Merge two
tightly-related comments.  Stop presenting pg_multixact as specific to
heap_lock_tuple(); PostgreSQL 9.3 extended its use to heap_update().

Noticed while investigating a report from Andres Freund.
2016-01-01 01:46:46 -05:00
Joe Conway 241448b23a Rename (new|old)estCommitTs to (new|old)estCommitTsXid
The variables newestCommitTs and oldestCommitTs sound as if they are
timestamps, but in fact they are the transaction Ids that correspond
to the newest and oldest timestamps rather than the actual timestamps.
Rename these variables to reflect that they are actually xids: to wit
newestCommitTsXid and oldestCommitTsXid respectively. Also modify
related code in a similar fashion, particularly the user facing output
emitted by pg_controldata and pg_resetxlog.

Complaint and patch by me, review by Tom Lane and Alvaro Herrera.
Backpatch to 9.5 where these variables were first introduced.
2015-12-28 12:34:11 -08:00
Robert Haas 3fed417452 Provide a way to predefine LWLock tranche IDs.
It's a bit cumbersome to use LWLockNewTrancheId(), because the returned
value needs to be shared between backends so that each backend can call
LWLockRegisterTranche() with the correct ID.  So, for built-in tranches,
use a hard-coded value instead.

This is motivated by an upcoming patch adding further built-in tranches.

Andres Freund and Robert Haas
2015-12-15 11:48:19 -05:00
Andres Freund cca705a5d9 Fix bug in SetOffsetVacuumLimit() triggered by find_multixact_start() failure.
Previously, if find_multixact_start() failed, SetOffsetVacuumLimit() would
install 0 into MultiXactState->offsetStopLimit if it previously succeeded.
Luckily, there are no known cases where find_multixact_start() will return
an error in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId()
could continue allocating mxids even if close to a wraparound, or it could
erroneously stop allocating mxids, even if no wraparound is looming.  The
wrong value would be corrected the next time SetOffsetVacuumLimit() is
called, or by a restart.

Reported-By: Noah Misch, although this is not his preferred fix
Discussion: 20151210140450.GA22278@alap3.anarazel.de
Backpatch: 9.5, where the bug was introduced as part of 4f627f
2015-12-14 11:34:16 +01:00
Alvaro Herrera 69e7235c93 Fix commit timestamp initialization
This module needs explicit initialization in order to replay WAL records
in recovery, but we had broken this recently following changes to make
other (stranger) scenarios work correctly.  To fix, rework the
initialization sequence so that it always takes place before WAL replay
commences for both master and standby.

I could have gone for a more localized fix that just added a "startup"
call for the master server, but it seemed better to restructure the
existing callers as well so that the whole thing made more sense.  As a
drawback, there is more control logic in xlog.c now than previously, but
doing otherwise meant passing down the ControlFile flag, which seemed
uglier as a whole.

This also meant adding a check to not re-execute ActivateCommitTs if it
had already been called.

Reported by Fujii Masao.

Backpatch to 9.5.
2015-12-11 14:30:43 -03:00
Peter Eisentraut a351705d8a Improve some messages 2015-12-10 22:05:27 -05:00
Andres Freund e3f4cfc7aa Fix bug leading to restoring unlogged relations from empty files.
At the end of crash recovery, unlogged relations are reset to the empty
state, using their init fork as the template. The init fork is copied to
the main fork without going through shared buffers. Unfortunately WAL
replay so far has not necessarily flushed writes from shared buffers to
disk at that point. In normal crash recovery, and before the
introduction of 'fast promotions' in fd4ced523 / 9.3, the
END_OF_RECOVERY checkpoint flushes the buffers out in time. But with
fast promotions that's not the case anymore.

To fix, force WAL writes targeting the init fork to be flushed
immediately (using the new FlushOneBuffer() function). In 9.5+ that
flush can centrally be triggered from the code dealing with restoring
full page writes (XLogReadBufferForRedoExtended), in earlier releases
that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay
function.

Backpatch to 9.1, even if this currently is only known to trigger in
9.3+. Flushing earlier is more robust, and it is advantageous to keep
the branches similar.

Typical symptoms of this bug are errors like
'ERROR:  index "..." contains unexpected zero page at block 0'
shortly after promoting a node.

Reported-By: Thom Brown
Author: Andres Freund and Michael Paquier
Discussion: 20150326175024.GJ451@alap3.anarazel.de
Backpatch: 9.1-
2015-12-10 16:29:26 +01:00
Alvaro Herrera 820ddb2c2f Further tweak commit_timestamp behavior
As pointed out by Fujii Masao, we weren't quite there on a standby
behaving sanely: first because we were failing to acquire the correct
state in the case where no XLOG_PARAMETER_CHANGE message was sent
(because a checkpoint had already happened after the setting was changed
in the master, and then the standby was restarted); and second because
promoting the standby with the feature enabled failed to activate it if
the master had the feature disabled.

This patch fixes both those misbehaviors hopefully without
re-introducing any old problems.

Also change the hint emitted in a standby together with the error
message about the feature being disabled, to make it point out that the
place to chance the setting is the master.  Otherwise, if the setting is
already enabled in the standby, it is very confusing to have it say that
the setting must be enabled ...

Authors: Álvaro Herrera, Petr Jelínek.
Backpatch to 9.5.
2015-12-03 19:22:31 -03:00
Peter Eisentraut 5db837d3f2 Message improvements 2015-11-16 21:39:23 -05:00
Robert Haas fe702a7b3f Move each SLRU's lwlocks to a separate tranche.
This makes it significantly easier to identify these lwlocks in
LWLOCK_STATS or Trace_lwlocks output.  It's also arguably better
from a modularity standpoint, since lwlock.c no longer needs to
know anything about the LWLock needs of the higher-level SLRU
facility.

Ildus Kurbangaliev, reviewd by Álvaro Herrera and by me.
2015-11-12 14:59:09 -05:00
Robert Haas 64b2e7ad91 Pass extra data to bgworkers, and use this to fix parallel contexts.
Up until now, the total amount of data that could be passed to a
background worker at startup was one datum, which can be a small as
4 bytes on some systems.  That's enough to pass a dsm_handle or an
array index, but not much else.  Add a bgw_extra flag to the
BackgroundWorker struct, allowing up to 128 bytes to be passed to
a new worker on any platform.

Use this to fix a problem I recently discovered with the parallel
context machinery added in 9.5: the master assigns each worker an
array index, and each worker subsequently assigns itself an array
index, and there's nothing to guarantee that the two sets of indexes
match, leading to chaos.

Normally, I would not back-patch the change to add bgw_extra, since it
is basically a feature addition.  However, since 9.5 is still in beta
and there seems to be no other sensible way to repair the broken
parallel context machinery, back-patch to 9.5.  Existing background
worker code can ignore the bgw_extra field without a problem, but
might need to be recompiled since the structure size has changed.

Report and patch by me.  Review by Amit Kapila.
2015-11-05 12:13:56 -05:00
Robert Haas 3a1f8611f2 Update parallel executor support to reuse the same DSM.
Commit b0b0d84b3d purported to make it
possible to relaunch workers using the same parallel context, but it had
an unpleasant race condition: we might reinitialize after the workers
have sent their last control message but before they have dettached the
DSM, leaving to crashes.  Repair by introducing a new ParallelContext
operation, ReinitializeParallelDSM.

Adjust execParallel.c to use this new support, so that we can rescan a
Gather node by relaunching workers but without needing to recreate the
DSM.

Amit Kapila, with some adjustments by me.  Extracted from latest parallel
sequential scan patch.
2015-10-30 10:44:54 +01:00
Peter Eisentraut a8d585c091 Message style improvements
Message style, plurals, quoting, spelling, consistency with similar
messages
2015-10-28 20:38:36 -04:00
Alvaro Herrera 531d21b75f Cleanup commit timestamp module activaction, again
Further tweak commit_ts.c so that on a standby the state is completely
consistent with what that in the master, rather than behaving
differently in the cases that the settings differ.  Now in standby and
master the module should always be active or inactive in lockstep.

Author: Petr Jelínek, with some further tweaks by Álvaro Herrera.

Backpatch to 9.5, where commit timestamps were introduced.

Discussion: http://www.postgresql.org/message-id/5622BF9D.2010409@2ndquadrant.com
2015-10-27 15:06:50 -03:00
Robert Haas 31ba62ce32 Fix typos in comments.
CharSyam
2015-10-22 14:52:23 -04:00
Robert Haas b0b0d84b3d Allow a parallel context to relaunch workers.
This may allow some callers to avoid the overhead involved in tearing
down a parallel context and then setting up a new one, which means
releasing the DSM and then allocating and populating a new one.  I
suspect we'll want to revise the Gather node to make use of this new
capability, but even if not it may be useful elsewhere and requires
very little additional code.
2015-10-16 17:18:05 -04:00
Robert Haas a53c06a13e Prohibit parallel query when the isolation level is serializable.
In order for this to be safe, the code which hands true serializability
will need to taught that the SIRead locks taken by a parallel worker
pertain to the same transaction as those taken by the parallel leader.
Some further changes may be needed as well.  Until the necessary
adaptations are made, don't generate parallel plans in serializable
mode, and if a previously-generated parallel plan is used after
serializable mode has been activated, run it serially.

This fixes a bug in commit 7aea8e4f2d.
2015-10-16 11:58:27 -04:00
Robert Haas 82b37765c7 Fix a problem with parallel workers being unable to restore role.
check_role() tries to verify that the user has permission to become the
requested role, but this is inappropriate in a parallel worker, which
needs to exactly recreate the master's authorization settings.  So skip
the check in that case.

This fixes a bug in commit 924bcf4f16.
2015-10-16 11:37:19 -04:00
Robert Haas 6de6d96d97 Invalidate caches after cranking up a parallel worker transaction.
Starting a parallel worker transaction changes our notion of which XIDs
are in-progress or committed, and our notion of the current command
counter ID.  Therefore, our view of these caches prior to starting
this transaction may no longer valid.  Defend against that by clearing
them.

This fixes a bug in commit 924bcf4f16.
2015-10-16 11:31:23 -04:00
Robert Haas 94b4f7e2a6 Tighten up application of parallel mode checks.
Commit 924bcf4f16 failed to enforce
parallel mode checks during the commit of a parallel worker, because
we exited parallel mode prior to ending the transaction so that we
could pop the active snapshot.  Re-establish parallel mode during
parallel worker commit.  Without this, it's far too easy for unsafe
actions during the pre-commit sequence to crash the server instead of
hitting the error checks as intended.

Just to be extra paranoid, adjust a couple of the sanity checks in
xact.c to check not only IsInParallelMode() but also
IsParallelWorker().
2015-10-16 09:59:57 -04:00
Robert Haas 423ec0877f Transfer current command counter ID to parallel workers.
Commit 924bcf4f16 correctly forbade
parallel workers to modify the command counter while in parallel mode,
but it inexplicably neglected to actually transfer the current command
counter from leader to workers.  This can result in the workers seeing
a different set of tuples from the leader, which is bad.  Repair.
2015-10-16 09:58:42 -04:00
Robert Haas 2ad5c27bb5 Don't send protocol messages to a shm_mq that no longer exists.
Commit 2bd9e412f9 introduced a mechanism
for relaying protocol messages from a background worker to another
backend via a shm_mq.  However, there was no provision for shutting
down the communication channel.  Therefore, a protocol message sent
late in the shutdown sequence, such as a DEBUG message resulting from
cranking up log_min_messages, could crash the server.  To fix, install
an on_dsm_detach callback that disables sending messages to the shm_mq
when the associated DSM is detached.
2015-10-16 09:42:33 -04:00
Alvaro Herrera e06b2e1d2e Don't disable commit_ts in standby if enabled locally
Bug noticed by Fujii Masao
2015-10-02 12:49:01 -03:00
Peter Eisentraut 87c2b517ac Fix message punctuation according to style guide 2015-10-01 21:39:56 -04:00
Alvaro Herrera f12e814b88 Fix commit_ts for standby
Module initialization was still not completely correct after commit
6b61955135, per crash report from Takashi Ohnishi.  To fix, instead of
trying to monkey around with the value of the GUC setting directly, add
a separate boolean flag that enables the feature on a standby, but only
for the startup (recovery) process, when it sees that its master server
has the feature enabled.
Discussion: http://www.postgresql.org/message-id/ca44c6c7f9314868bdc521aea4f77cbf@MP-MSGSS-MBX004.msg.nttdata.co.jp

Also change the deactivation routine to delete all segment files rather
than leaving the last one around.  (This doesn't need separate
WAL-logging, because on recovery we execute the same deactivation
routine anyway.)

In passing, clean up the code structure somewhat, particularly so that
xlog.c doesn't know so much about when to activate/deactivate the
feature.

Thanks to Fujii Masao for testing and Petr Jelínek for off-list discussion.

Back-patch to 9.5, where commit_ts was introduced.
2015-10-01 15:06:55 -03:00
Robert Haas 227d57f358 Don't dump core when destroying an unused ParallelContext.
If a transaction or subtransaction creates a ParallelContext but ends
without calling InitializeParallelDSM, the previous code would
seg fault.  Fix that.
2015-09-30 18:36:31 -04:00
Alvaro Herrera 6b61955135 Code review for transaction commit timestamps
There are three main changes here:

1. No longer cause a start failure in a standby if the feature is
disabled in postgresql.conf but enabled in the master.  This reverts one
part of commit 4f3924d9cd43; what we keep is the ability of the standby
to activate/deactivate the module (which includes creating and removing
segments as appropriate) during replay of such actions in the master.

2. Replay WAL records affecting commitTS even if the feature is
disabled.  This means the standby will always have the same state as the
master after replay.

3. Have COMMIT PREPARE record the transaction commit time as well.  We
were previously only applying it in the normal transaction commit path.

Author: Petr Jelínek
Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com
Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com

Additionally, I cleaned up nearby code related to replication origins,
which I found a bit hard to follow, and fixed a couple of typos.

Backpatch to 9.5, where this code was introduced.

Per bug reports from Fujii Masao and subsequent discussion.
2015-09-29 14:40:56 -03:00
Alvaro Herrera 17f5831c81 Fix "sesssion" typo
It was introduced alongside replication origins, by commit
5aa2350426, so backpatch to 9.5.

Pointed out by Fujii Masao
2015-09-28 19:13:42 -03:00
Andres Freund aa29c1ccd9 Remove legacy multixact truncation support.
In 9.5 and master there is no need to support legacy truncation. This is
just committed separately to make it easier to backpatch the WAL logged
multixact truncation to 9.3 and 9.4 if we later decide to do so.

I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to
0xD087 to avoid 9.5 reusing a value that has been in use on master while
keeping the numbers increasing between major versions.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Backpatch: 9.5
2015-09-26 19:04:25 +02:00
Andres Freund 4f627f8973 Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.

We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
   to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
   we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
   this has gotten much easier

During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
   segments. This happened to be hard or impossible to hit due to the
   interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
   that doesn't work for offsets that haven't yet been flushed to
   disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
   slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
   and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
   of emergency vacuuming. The problem remains in
   pg_get_multixact_members(), but that's quite harmless.

For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.

For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound.  To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.

A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
2015-09-26 19:04:25 +02:00
Peter Eisentraut 4a1e15e4a9 Add missing serial comma 2015-09-18 22:41:42 -04:00
Fujii Masao 10fbb79f1a Improve log messages related to tablespace_map file
This patch changes the log message which is logged when the server
successfully renames backup_label file to *.old but fails to rename
tablespace_map file during the shutdown. Previously the WARNING
message "online backup mode was not canceled" was logged in that case.
However this message is confusing because the backup mode is treated
as canceled whenever backup_label is successfully renamed. So this
commit makes the server log the message "online backup mode canceled"
in that case.

Also this commit changes errdetail messages so that they follow the
error message style guide.

Back-patch to 9.5 where tablespace_map file is introduced.

Original patch by Amit Kapila, heavily modified by me.
2015-09-15 23:21:51 +09:00
Fujii Masao 96f6a0cb41 Remove files signaling a standby promotion request at postmaster startup
This commit makes postmaster forcibly remove the files signaling
a standby promotion request. Otherwise, the existence of those files
can trigger a promotion too early, whether a user wants that or not.

This removal of files is usually unnecessary because they can exist
only during a few moments during a standby promotion. However
there is a race condition: if pg_ctl promote is executed and creates
the files during a promotion, the files can stay around even after
the server is brought up to new master. Then, if new standby starts
by using the backup taken from that master, the files can exist
at the server startup and should be removed in order to avoid
an unexpected promotion.

Back-patch to 9.1 where promote signal file was introduced.

Problem reported by Feike Steenbergen.
Original patch by Michael Paquier, modified by me.

Discussion: 20150528100705.4686.91426@wrigleys.postgresql.org
2015-09-09 22:51:44 +09:00
Heikki Linnakangas c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Tom Lane c5454f99c4 Fix subtransaction cleanup after an outer-subtransaction portal fails.
Formerly, we treated only portals created in the current subtransaction as
having failed during subtransaction abort.  However, if the error occurred
while running a portal created in an outer subtransaction (ie, a cursor
declared before the last savepoint), that has to be considered broken too.

To allow reliable detection of which ones those are, add a bookkeeping
field to struct Portal that tracks the innermost subtransaction in which
each portal has actually been executed.  (Without this, we'd end up
failing portals containing functions that had called the subtransaction,
thereby breaking plpgsql exception blocks completely.)

In addition, when we fail an outer-subtransaction Portal, transfer its
resources into the subtransaction's resource owner, so that they're
released early in cleanup of the subxact.  This fixes a problem reported by
Jim Nasby in which a function executed in an outer-subtransaction cursor
could cause an Assert failure or crash by referencing a relation created
within the inner subtransaction.

The proximate cause of the Assert failure is that AtEOSubXact_RelationCache
assumed it could blow away a relcache entry without first checking that the
entry had zero refcount.  That was a bad idea on its own terms, so add such
a check there, and to the similar coding in AtEOXact_RelationCache.  This
provides an independent safety measure in case there are still ways to
provoke the situation despite the Portal-level changes.

This has been broken since subtransactions were invented, so back-patch
to all supported branches.

Tom Lane and Michael Paquier
2015-09-04 13:37:14 -04:00
Fujii Masao 1ea5ce5c5f Document that max_worker_processes must be high enough in standby.
The setting values of some parameters including max_worker_processes
must be equal to or higher than the values on the master. However,
previously max_worker_processes was not listed as such parameter
in the document. So this commit adds it to that list.

Back-patch to 9.4 where max_worker_processes was added.
2015-09-03 22:30:16 +09:00
Alvaro Herrera e68be16b0d Do not allow *timestamp to be passed as NULL
The code had bugs that would cause crashes if NULL was passed as that
argument (originally intended to mean not to bother returning its
value), and after inspection it turns out that nothing seems interested
in the case that *ts is NULL anyway.  Therefore, remove the partial
checks intended to support that case.

Author: Michael Paquier
though I didn't include a proposed Assert.

Backpatch to 9.5.
2015-08-21 14:36:54 -03:00