2001-08-25 20:52:43 +02:00
|
|
|
/*
|
|
|
|
* rmgr.c
|
|
|
|
*
|
|
|
|
* Resource managers definition
|
|
|
|
*
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/backend/access/transam/rmgr.c
|
2001-08-25 20:52:43 +02:00
|
|
|
*/
|
1999-09-27 17:48:12 +02:00
|
|
|
#include "postgres.h"
|
2001-08-25 20:52:43 +02:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
#include "access/clog.h"
|
2006-07-11 19:26:59 +02:00
|
|
|
#include "access/gin.h"
|
2005-05-17 05:34:18 +02:00
|
|
|
#include "access/gist_private.h"
|
2000-11-21 22:16:06 +01:00
|
|
|
#include "access/hash.h"
|
|
|
|
#include "access/heapam.h"
|
2005-06-08 17:50:28 +02:00
|
|
|
#include "access/multixact.h"
|
2000-11-21 22:16:06 +01:00
|
|
|
#include "access/nbtree.h"
|
|
|
|
#include "access/xact.h"
|
2004-07-22 00:31:26 +02:00
|
|
|
#include "access/xlog_internal.h"
|
2008-11-19 11:34:52 +01:00
|
|
|
#include "catalog/storage.h"
|
2004-08-29 23:08:48 +02:00
|
|
|
#include "commands/dbcommands.h"
|
2000-11-30 02:47:33 +01:00
|
|
|
#include "commands/sequence.h"
|
2004-08-29 23:08:48 +02:00
|
|
|
#include "commands/tablespace.h"
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
#include "storage/standby.h"
|
2010-02-07 21:48:13 +01:00
|
|
|
#include "utils/relmapper.h"
|
2000-10-21 17:43:36 +02:00
|
|
|
|
2001-08-25 20:52:43 +02:00
|
|
|
|
2004-07-22 00:31:26 +02:00
|
|
|
/* Entry layout: name, redo, desc, startup, cleanup, safe_restartpoint */
const RmgrData RmgrTable[RM_MAX_ID + 1] = {
|
2006-08-07 18:57:57 +02:00
|
|
|
{"XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL},
|
|
|
|
{"Transaction", xact_redo, xact_desc, NULL, NULL, NULL},
|
|
|
|
{"Storage", smgr_redo, smgr_desc, NULL, NULL, NULL},
|
|
|
|
{"CLOG", clog_redo, clog_desc, NULL, NULL, NULL},
|
|
|
|
{"Database", dbase_redo, dbase_desc, NULL, NULL, NULL},
|
|
|
|
{"Tablespace", tblspc_redo, tblspc_desc, NULL, NULL, NULL},
|
|
|
|
{"MultiXact", multixact_redo, multixact_desc, NULL, NULL, NULL},
|
2010-02-07 21:48:13 +01:00
|
|
|
{"RelMap", relmap_redo, relmap_desc, NULL, NULL, NULL},
|
2009-12-19 02:32:45 +01:00
|
|
|
{"Standby", standby_redo, standby_desc, NULL, NULL, NULL},
|
Fix recently-understood problems with handling of XID freezing, particularly
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-11-05 23:42:10 +01:00
|
|
|
{"Heap2", heap2_redo, heap2_desc, NULL, NULL, NULL},
|
2006-08-07 18:57:57 +02:00
|
|
|
{"Heap", heap_redo, heap_desc, NULL, NULL, NULL},
|
|
|
|
{"Btree", btree_redo, btree_desc, btree_xlog_startup, btree_xlog_cleanup, btree_safe_restartpoint},
|
|
|
|
{"Hash", hash_redo, hash_desc, NULL, NULL, NULL},
|
|
|
|
{"Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, gin_safe_restartpoint},
|
Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There were two
reasons for the cleanup step:
1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (i.e. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.
2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.
These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with it in previous versions, but this refactoring
fixes it going forward.
With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
2010-12-23 15:03:08 +01:00
|
|
|
{"Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, NULL},
|
2006-08-07 18:57:57 +02:00
|
|
|
{"Sequence", seq_redo, seq_desc, NULL, NULL, NULL}
|
2000-10-21 17:43:36 +02:00
|
|
|
};
|
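The table above pairs each resource manager name with a set of callbacks. Judging by the entry names, these are redo, desc, startup, cleanup, and safe-restartpoint hooks, with NULL where a manager has none. The following is a minimal, self-contained sketch of how such a table could be used to dispatch log records by manager ID. Every identifier in it (DemoRecord, DemoRmgr, DemoTable, demo_replay, and the two toy managers) is hypothetical and invented for illustration; it does not reproduce the PostgreSQL definitions of RmgrData or the real recovery loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a WAL record. */
typedef struct DemoRecord
{
	unsigned char rmid;		/* index of the owning manager in DemoTable */
	int			payload;	/* toy record body */
} DemoRecord;

/* Hypothetical, simplified stand-in for one table entry. */
typedef struct DemoRmgr
{
	const char *name;
	void		(*redo) (const DemoRecord *rec);
	void		(*startup) (void);	/* optional, may be NULL */
	void		(*cleanup) (void);	/* optional, may be NULL */
} DemoRmgr;

enum
{
	DEMO_RM_COUNTER = 0,
	DEMO_RM_LOG = 1,
	DEMO_RM_MAX_ID = DEMO_RM_LOG
};

/* Toy state that the callbacks modify, so the effect is observable. */
static int	demo_counter = 0;
static int	demo_log_calls = 0;
static int	demo_started = 0;

static void counter_redo(const DemoRecord *rec) { demo_counter += rec->payload; }
static void counter_startup(void) { demo_started = 1; demo_counter = 0; }
static void counter_cleanup(void) { demo_started = 0; }
static void log_redo(const DemoRecord *rec) { (void) rec; demo_log_calls++; }

/* Entries are positional: slot i must describe the manager with ID i. */
static const DemoRmgr DemoTable[DEMO_RM_MAX_ID + 1] = {
	{"Counter", counter_redo, counter_startup, counter_cleanup},
	{"Log", log_redo, NULL, NULL}
};

/*
 * Replay n records: run optional startup hooks, dispatch each record to
 * its manager's redo callback, then run optional cleanup hooks.
 * Returns the number of records applied; unknown IDs are skipped.
 */
static int
demo_replay(const DemoRecord *recs, size_t n)
{
	size_t		i;
	int			id;
	int			applied = 0;

	for (id = 0; id <= DEMO_RM_MAX_ID; id++)
		if (DemoTable[id].startup != NULL)
			DemoTable[id].startup();

	for (i = 0; i < n; i++)
	{
		if (recs[i].rmid > DEMO_RM_MAX_ID)
			continue;			/* no manager registered for this ID */
		DemoTable[recs[i].rmid].redo(&recs[i]);
		applied++;
	}

	for (id = 0; id <= DEMO_RM_MAX_ID; id++)
		if (DemoTable[id].cleanup != NULL)
			DemoTable[id].cleanup();

	return applied;
}
```

Because the array is indexed directly by manager ID, the order of the initializers matters as much as their contents: inserting an entry in the middle shifts every later ID, which is one reason a table like this is declared with an explicit size such as RM_MAX_ID + 1.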