From 410b1dfb885f5b6d60f89003baba32a4efe93225 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 1 Aug 2004 20:57:59 +0000
Subject: [PATCH] Update the in-code documentation about the transaction
 system.  Move it into a README file instead of being in xact.c's header
 comment. Alvaro Herrera.

---
 src/backend/access/transam/README | 233 ++++++++++++++++++++++++++++++
 src/backend/access/transam/xact.c | 130 +----------------
 2 files changed, 236 insertions(+), 127 deletions(-)
 create mode 100644 src/backend/access/transam/README
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
new file mode 100644
index 0000000000..deb6a12f8e
--- /dev/null
+++ b/src/backend/access/transam/README
@@ -0,0 +1,233 @@
+$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.1 2004/08/01 20:57:59 tgl Exp $
+
+The Transaction System
+----------------------
+
+PostgreSQL's transaction system is a three-layer system.  The bottom layer
+implements low-level transactions and subtransactions, on top of which rests
+the main loop's control code, which in turn implements user-visible
+transactions and savepoints.
+
+The middle layer of code is called by postgres.c before and after the
+processing of each query:
+
+		StartTransactionCommand
+		CommitTransactionCommand
+		AbortCurrentTransaction
+
+Meanwhile, the user can alter the system's state by issuing the SQL commands
+BEGIN, COMMIT, ROLLBACK, SAVEPOINT, ROLLBACK TO or RELEASE.  The traffic cop
+redirects these calls to the toplevel routines
+
+		BeginTransactionBlock
+		EndTransactionBlock
+		UserAbortTransactionBlock
+		DefineSavepoint
+		RollbackToSavepoint
+		ReleaseSavepoint
+
+respectively.  Depending on the current state of the system, these functions
+call low-level functions to activate the real transaction system:
+
+		StartTransaction
+		CommitTransaction
+		AbortTransaction
+		CleanupTransaction
+		StartSubTransaction
+		CommitSubTransaction
+		AbortSubTransaction
+		CleanupSubTransaction
+
+Additionally, within a transaction, CommandCounterIncrement is called to
+increment the command counter, which allows future commands to "see" the
+effects of previous commands within the same transaction.  Note that this is
+done automatically by CommitTransactionCommand after each query inside a
+transaction block, but some utility functions also do it internally so that
+their earlier operations (usually on the system catalogs) can be seen by later
+operations in the same utility command.  For example, DefineRelation does it
+after creating the heap, so that the new pg_class row is visible and can be
+locked.
+
+
+For example, consider the following sequence of user commands:
+
+1)		BEGIN
+2)		SELECT * FROM foo
+3)		INSERT INTO foo VALUES (...)
+4)		COMMIT
+
+In the main processing loop, this results in the following function call
+sequence:
+
+	 /	StartTransactionCommand;
+	/	ProcessUtility;				<< BEGIN
+1) <			BeginTransactionBlock;
+	\	CommitTransactionCommand;
+	 \		StartTransaction;
+
+	/	StartTransactionCommand;
+2) /		ProcessQuery;				<< SELECT * FROM foo
+   \		CommitTransactionCommand;
+	\		CommandCounterIncrement;
+
+	/	StartTransactionCommand;
+3) /		ProcessQuery;				<< INSERT INTO foo VALUES (...)
+   \		CommitTransactionCommand;
+	\		CommandCounterIncrement;
+
+	 /	StartTransactionCommand;
+	/	ProcessUtility;				<< COMMIT
+4) <			EndTransactionBlock;
+	\	CommitTransactionCommand;
+	 \		CommitTransaction;
+
+The point of this example is to demonstrate the need for
+StartTransactionCommand and CommitTransactionCommand to be state smart -- they
+should call CommandCounterIncrement between the calls to BeginTransactionBlock
+and EndTransactionBlock, and outside these calls they need to do normal start,
+commit or abort processing.
+
+Furthermore, suppose the "SELECT * FROM foo" caused an abort condition.  In
+this case AbortCurrentTransaction is called, and the transaction is put in
+aborted state.  In this state, any user input is ignored except for
+transaction-termination statements, or ROLLBACK TO <savepoint> commands.
+
+Transaction aborts can occur in two ways:
+
+1)	the system aborts it because of an internal error (syntax error, etc)
+2)	the user types ROLLBACK
+
+The reason we have to distinguish them is illustrated by the following two
+situations:
+
+	case 1					case 2
+	------					------
+1) user types BEGIN			1) user types BEGIN
+2) user does something			2) user does something
+3) user does not like what		3) system aborts for some reason
+   she sees and types ROLLBACK		   (syntax error, etc)
+
+In case 1, we want to abort the transaction and return to the default state.
+In case 2, there may be more commands coming our way which are part of the
+same transaction block; we have to ignore these commands until we see a COMMIT
+or ROLLBACK.
+
+Internal aborts are handled by AbortCurrentTransaction, while user aborts are
+handled by UserAbortTransactionBlock.  Both of them rely on AbortTransaction
+to do all the real work.  The only difference is what state we enter after
+AbortTransaction does its work:
+
+* AbortCurrentTransaction leaves us in TBLOCK_ABORT,
+* UserAbortTransactionBlock leaves us in TBLOCK_ENDABORT
+
+Low-level transaction abort handling is divided into two phases:
+* AbortTransaction executes as soon as we realize the transaction has
+  failed.  It should release all shared resources (locks etc) so that we do
+  not delay other backends unnecessarily.
+* CleanupTransaction executes when we finally see a user COMMIT
+  or ROLLBACK command; it cleans things up and gets us out of the transaction
+  internally.  In particular, we mustn't destroy TopTransactionContext until
+  this point.
+
+Also, note that when a transaction is committed, we don't close it right away.
+Rather it's put in TBLOCK_END state, which means that when
+CommitTransactionCommand is called after the query has finished processing,
+the transaction has to be closed.  The distinction is subtle but important,
+because it means that control will leave the xact.c code with the transaction
+open, and the main loop will be able to keep processing inside the same
+transaction.  So, in a sense, transaction commit is also handled in two
+phases, the first at EndTransactionBlock and the second at
+CommitTransactionCommand (which is where CommitTransaction is actually
+called).
+
+The rest of the code in xact.c consists of routines that support starting and
+finishing transactions and subtransactions.  For example, AtStart_Memory
+takes care of initializing the memory subsystem at main transaction start.
+
+
+Subtransaction handling
+-----------------------
+
+Subtransactions are implemented using a stack of TransactionState structures,
+each of which has a pointer to its parent transaction's struct.  When a new
+subtransaction is to be opened, PushTransaction is called, which creates a new
+TransactionState, with its parent link pointing to the current transaction.
+StartSubTransaction is in charge of initializing the new TransactionState to
+sane values, and properly initializing other subsystems (AtSubStart routines).
+
+When closing a subtransaction, either CommitSubTransaction has to be called
+(if the subtransaction is committing), or AbortSubTransaction and
+CleanupSubTransaction (if it's aborting).  In either case, PopTransaction is
+called so the system returns to the parent transaction.
+
+One important point regarding subtransaction handling is that several may need
+to be closed in response to a single user command.  That's because savepoints
+have names, and we allow a savepoint to be released or rolled back by name,
+and the named savepoint is not necessarily the one that was last opened.  In
+the case of subtransaction commit this is not a problem, and we close all the
+involved subtransactions right away by calling CommitTransactionToLevel, which
+in turn calls CommitSubTransaction and PopTransaction as many times as needed.
+
+In the case of subtransaction abort (when the user issues ROLLBACK TO
+<savepoint>), things are not so easy.  We have to keep the subtransactions
+open and return control to the main loop.  So what RollbackToSavepoint does is
+abort the innermost subtransaction and put it in TBLOCK_SUBENDABORT state, and
+put the rest in TBLOCK_SUBABORT_PENDING state.  Then we return control to the
+main loop, which will in turn return control to us by calling
+CommitTransactionCommand.  At this point we can close all subtransactions that
+are marked with the "abort pending" state.  When that's done, the outermost
+subtransaction is created again, to conform to SQL's definition of ROLLBACK TO.
+
+Other subsystems are allowed to start "internal" subtransactions, which are
+handled by BeginInternalSubtransaction.  This is to allow implementing
+exception handling, e.g. in PL/pgSQL.  ReleaseCurrentSubTransaction and
+RollbackAndReleaseCurrentSubTransaction allow the subsystem to close said
+subtransactions.  The main difference between this and the savepoint/release
+path is that BeginInternalSubtransaction is allowed when no explicit
+transaction block has been established, while DefineSavepoint is not.
+
+
+pg_clog and pg_subtrans
+-----------------------
+
+pg_clog and pg_subtrans are permanent (on-disk) storage of transaction-related
+information.  There is a limited number of pages of each kept in memory, so
+in many cases there is no need to actually read from disk.  However, if
+there's a long-running transaction or a backend sitting idle with an open
+transaction, it may be necessary to read this information from disk and write
+it back.  On-disk storage also lets this information survive server restarts.
+
+pg_clog records the commit status for each transaction.  A transaction can be
+in progress, committed, aborted, or "sub-committed".  This last state means
+that it's a subtransaction that's no longer running, but its parent has not
+updated its state yet (either it is still running, or the backend crashed
+without updating its status).  A sub-committed transaction's status will be
+updated again to the final value as soon as the parent commits or aborts, or
+when the parent is detected to be aborted.
+
+Savepoints are implemented using subtransactions.  A subtransaction is a
+transaction inside a transaction; it gets its own TransactionId, but its
+commit or abort status depends not only on whether it committed itself, but
+also on whether its parent transaction committed.  To implement multiple
+savepoints in a transaction we allow unlimited transaction nesting depth, so
+any particular subtransaction's commit state is dependent on the commit status
+of each and every ancestor transaction.
+
+The "subtransaction parent" (pg_subtrans) mechanism records, for each
+transaction, the TransactionId of its parent transaction.  This information is
+stored as soon as the subtransaction is created.  Top-level transactions do
+not have a parent, so they leave their pg_subtrans entries set to the default
+value of zero (InvalidTransactionId).
+
+pg_subtrans is used to check whether the transaction in question is still
+running --- the main Xid of a transaction is recorded in the PGPROC struct,
+but since we allow arbitrary nesting of subtransactions, we can't fit all Xids
+in shared memory, so we have to store them on disk.  Note, however, that for
+each transaction we keep a "cache" of Xids that are known to be part of the
+transaction tree, so we can skip looking at pg_subtrans unless we know the
+cache has overflowed.  See storage/ipc/sinval.c for the gory details.
+
+slru.c is the supporting mechanism for both pg_clog and pg_subtrans.  It
+implements the LRU policy for in-memory buffer pages.  The high-level routines
+for pg_clog are implemented in transam.c, while the low-level functions are in
+clog.c.  pg_subtrans is contained completely in subtrans.c.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 486f85be5d..601519e4e9 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -3,138 +3,14 @@
  * xact.c
  *	  top level transaction system support routines
  *
+ * See src/backend/access/transam/README for more information.
+ *
  * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.175 2004/08/01 17:32:13 tgl Exp $
- *
- * NOTES
- *		Transaction aborts can now occur two ways:
- *
- *		1)	system dies from some internal cause  (syntax error, etc..)
- *		2)	user types ABORT
- *
- *		These two cases used to be treated identically, but now
- *		we need to distinguish them.  Why?	consider the following
- *		two situations:
- *
- *				case 1							case 2
- *				------							------
- *		1) user types BEGIN				1) user types BEGIN
- *		2) user does something			2) user does something
- *		3) user does not like what		3) system aborts for some reason
- *		   she sees and types ABORT
- *
- *		In case 1, we want to abort the transaction and return to the
- *		default state.	In case 2, there may be more commands coming
- *		our way which are part of the same transaction block and we have
- *		to ignore these commands until we see a COMMIT transaction or
- *		ROLLBACK.
- *
- *		Internal aborts are now handled by AbortTransactionBlock(), just as
- *		they always have been, and user aborts are now handled by
- *		UserAbortTransactionBlock().  Both of them rely on AbortTransaction()
- *		to do all the real work.  The only difference is what state we
- *		enter after AbortTransaction() does its work:
- *
- *		* AbortTransactionBlock() leaves us in TBLOCK_ABORT and
- *		* UserAbortTransactionBlock() leaves us in TBLOCK_ENDABORT
- *
- *		Low-level transaction abort handling is divided into two phases:
- *		* AbortTransaction() executes as soon as we realize the transaction
- *		  has failed.  It should release all shared resources (locks etc)
- *		  so that we do not delay other backends unnecessarily.
- *		* CleanupTransaction() executes when we finally see a user COMMIT
- *		  or ROLLBACK command; it cleans things up and gets us out of
- *		  the transaction internally.  In particular, we mustn't destroy
- *		  TopTransactionContext until this point.
- *
- *	 NOTES
- *		The essential aspects of the transaction system are:
- *
- *				o  transaction id generation
- *				o  transaction log updating
- *				o  memory cleanup
- *				o  cache invalidation
- *				o  lock cleanup
- *
- *		Hence, the functional division of the transaction code is
- *		based on which of the above things need to be done during
- *		a start/commit/abort transaction.  For instance, the
- *		routine AtCommit_Memory() takes care of all the memory
- *		cleanup stuff done at commit time.
- *
- *		The code is layered as follows:
- *
- *				StartTransaction
- *				CommitTransaction
- *				AbortTransaction
- *				CleanupTransaction
- *
- *		are provided to do the lower level work like recording
- *		the transaction status in the log and doing memory cleanup.
- *		above these routines are another set of functions:
- *
- *				StartTransactionCommand
- *				CommitTransactionCommand
- *				AbortCurrentTransaction
- *
- *		These are the routines used in the postgres main processing
- *		loop.  They are sensitive to the current transaction block state
- *		and make calls to the lower level routines appropriately.
- *
- *		Support for transaction blocks is provided via the functions:
- *
- *				BeginTransactionBlock
- *				CommitTransactionBlock
- *				AbortTransactionBlock
- *
- *		These are invoked only in response to a user "BEGIN WORK", "COMMIT",
- *		or "ROLLBACK" command.	The tricky part about these functions
- *		is that they are called within the postgres main loop, in between
- *		the StartTransactionCommand() and CommitTransactionCommand().
- *
- *		For example, consider the following sequence of user commands:
- *
- *		1)		begin
- *		2)		select * from foo
- *		3)		insert into foo (bar = baz)
- *		4)		commit
- *
- *		in the main processing loop, this results in the following
- *		transaction sequence:
- *
- *			/	StartTransactionCommand();
- *		1) /	ProcessUtility();				<< begin
- *		   \		BeginTransactionBlock();
- *			\	CommitTransactionCommand();
- *
- *			/	StartTransactionCommand();
- *		2) <	ProcessQuery();					<< select * from foo
- *			\	CommitTransactionCommand();
- *
- *			/	StartTransactionCommand();
- *		3) <	ProcessQuery();					<< insert into foo (bar = baz)
- *			\	CommitTransactionCommand();
- *
- *			/	StartTransactionCommand();
- *		4) /	ProcessUtility();				<< commit
- *		   \		CommitTransactionBlock();
- *			\	CommitTransactionCommand();
- *
- *		The point of this example is to demonstrate the need for
- *		StartTransactionCommand() and CommitTransactionCommand() to
- *		be state smart -- they should do nothing in between the calls
- *		to BeginTransactionBlock() and EndTransactionBlock() and
- *		outside these calls they need to do normal start/commit
- *		processing.
- *
- *		Furthermore, suppose the "select * from foo" caused an abort
- *		condition.	We would then want to abort the transaction and
- *		ignore all subsequent commands up to the "commit".
- *		-cim 3/23/90
+ *	  $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.176 2004/08/01 20:57:59 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */