postgresql/src/backend/catalog
Alvaro Herrera 0ac5ad5134 Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
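
The lock-mode relationships described above can be sketched as a small conflict table. This is an illustrative sketch only: the commit message itself states just that the two new modes do not block each other. The remaining entries follow the row-level lock conflict table in the PostgreSQL documentation, and the names `CONFLICTS` and `conflicts` are hypothetical, not identifiers from the patch.

```python
# Tuple lock modes named in the commit message.
KEY_SHARE = "FOR KEY SHARE"
SHARE = "FOR SHARE"
NO_KEY_UPDATE = "FOR NO KEY UPDATE"
UPDATE = "FOR UPDATE"

# For each held mode, the set of requested modes that must wait.
# Hypothetical table for illustration; not code from the patch.
CONFLICTS = {
    KEY_SHARE: {UPDATE},
    SHARE: {NO_KEY_UPDATE, UPDATE},
    NO_KEY_UPDATE: {SHARE, NO_KEY_UPDATE, UPDATE},
    UPDATE: {KEY_SHARE, SHARE, NO_KEY_UPDATE, UPDATE},
}


def conflicts(held, requested):
    """Return True if a request for `requested` blocks behind `held`."""
    return requested in CONFLICTS[held]


# A foreign key check (FOR KEY SHARE) and an UPDATE that leaves the key
# columns alone (FOR NO KEY UPDATE) can proceed concurrently.
fk_check_blocks_non_key_update = conflicts(KEY_SHARE, NO_KEY_UPDATE)

# The older pairing (FOR SHARE against FOR UPDATE) does block.
old_style_blocks = conflicts(SHARE, UPDATE)
```

Under this table, the only mode that blocks a FOR KEY SHARE lock is FOR UPDATE, which matches the stated goal of letting foreign key checks run alongside updates that do not touch key columns.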

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of the lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This also causes additional WAL-logging (previously there was a single
WAL record for a locked tuple; now there are as many as there are
updated copies of the tuple).

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, this fixes the old behavior in which a subtransaction that
grabbed a stronger tuple lock than the parent (sub)transaction held on a
given tuple, and later aborted, caused the weaker lock to be lost.

Many new spec files were added for the isolation tester framework, to
ensure overall behavior is sane.  There's probably room for several more
tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time on it.  The original idea for
the patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 12:04:59 -03:00
.gitignore Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Catalog.pm Update copyrights for 2013 2013-01-01 17:15:01 -05:00
Makefile Syntax support and documentation for event triggers. 2012-07-18 10:16:16 -04:00
README Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
aclchk.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
catalog.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
dependency.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
genbki.pl Update copyrights for 2013 2013-01-01 17:15:01 -05:00
heap.c Improve concurrency of foreign key locking 2013-01-23 12:04:59 -03:00
index.c Improve concurrency of foreign key locking 2013-01-23 12:04:59 -03:00
indexing.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
information_schema.sql Update copyrights for 2013 2013-01-01 17:15:01 -05:00
namespace.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
objectaddress.c Refactor ALTER some-obj RENAME implementation 2013-01-21 12:06:41 -03:00
pg_aggregate.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_collation.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_constraint.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_conversion.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_db_role_setting.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_depend.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_enum.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_inherits.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_largeobject.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_namespace.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_operator.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_proc.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_range.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_shdepend.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
pg_type.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00
sql_feature_packages.txt > I have installed your patch and adjusted the names of the standards 2004-12-02 22:51:28 +00:00
sql_features.txt Rename SQL feature S403 to ARRAY_MAX_CARDINALITY 2012-12-19 07:14:27 -05:00
storage.c Accelerate end-of-transaction dropping of relations 2013-01-17 16:13:17 -03:00
system_views.sql Update copyrights for 2013 2013-01-01 17:15:01 -05:00
toasting.c Update copyrights for 2013 2013-01-01 17:15:01 -05:00

src/backend/catalog/README

System Catalog
==============

This directory contains .c files that manipulate the system catalogs;
src/include/catalog contains the .h files that define the structure
of the system catalogs.

When the compile-time scripts (Gen_fmgrtab.pl and genbki.pl)
execute, they grep the DATA statements out of the .h files and munge
these in order to generate the postgres.bki file.  The .bki file is then
used as input to initdb (which is just a wrapper around postgres
running single-user in bootstrapping mode) in order to generate the
initial (template) system catalog relation files.
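
The extraction step described above can be sketched as follows. This is a minimal illustration of a line-oriented scan, not the logic of genbki.pl or Gen_fmgrtab.pl: the regular expression, the function name `extract_data_rows`, and the sample header fragment are all assumptions made for the example.

```python
import re

# Simplified pattern for a one-line DATA statement. An assumption for
# illustration only; it is not the pattern the real scripts use.
DATA_RE = re.compile(
    r"^DATA\(insert(?:\s+OID\s*=\s*(\d+))?\s*\(\s*(.*?)\s*\)\s*\);?\s*$"
)


def extract_data_rows(header_text):
    """Return (oid_or_None, [fields]) for each well-formed DATA line."""
    rows = []
    for line in header_text.splitlines():
        m = DATA_RE.match(line)
        if m is None:
            # Not a DATA line, or a DATA statement broken across lines.
            # A line-oriented scan cannot recover the latter.
            continue
        oid = int(m.group(1)) if m.group(1) else None
        rows.append((oid, m.group(2).split()))
    return rows


# Hypothetical header fragment; the column values are made up.
SAMPLE_HEADER = """\
/* hypothetical catalog header fragment */
DATA(insert OID = 16 ( bool 1 t _null_ ));
DATA(insert ( nooid 4 f _null_ ));
DATA(insert OID = 17 ( broken 8
    f _null_ ));
"""

rows = extract_data_rows(SAMPLE_HEADER)
```

In this sketch the third statement is split across two lines, so neither half matches and the row is silently dropped. That is the kind of failure the formatting warning in the list below is about.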

-----------------------------------------------------------------

People who are going to hose around with the .h files should be aware
of the following facts:

- It is very important that the DATA statements be properly formatted
(e.g., no broken lines, proper use of white-space and _null_).  The
scripts are line-oriented and break easily.  In addition, the only
documentation on the proper format for them is the code in the
bootstrap/ directory.  Just be careful when adding new DATA
statements.

- Some catalogs require that OIDs be preallocated to tuples because
of cross-references from other pre-loaded tuples.  For example, pg_type
contains pointers into pg_proc (e.g., pg_type.typinput), and pg_proc
contains back-pointers into pg_type (pg_proc.proargtypes).  For such
cases, the OID assigned to a tuple may be explicitly set by use of the
"OID = n" clause of the .bki insert statement.  If no such pointers to a
given tuple are required, then the OID = n clause may be omitted; the
system then generates an OID in the usual way, or leaves it 0 in a
catalog that has no OIDs.  In practice we usually preassign OIDs
for all or none of the pre-loaded tuples in a given catalog, even if only
some of them are actually cross-referenced.

- We also sometimes preallocate OIDs for catalog tuples whose OIDs must
be known directly in the C code.  In such cases, put a #define in the
catalog's .h file, and use the #define symbol in the C code.  Writing
the actual numeric value of any OID in C code is considered very bad form.
Direct references to pg_proc OIDs are common enough that there's a special
mechanism to create the necessary #define's automatically: see
backend/utils/Gen_fmgrtab.pl.  We also have standard conventions for setting
up #define's for the pg_class OIDs of system catalogs and indexes.  For all
the other system catalogs, you have to manually create any #define's you
need.

- If you need to find a valid OID for a new predefined tuple,
use the unused_oids script.  It generates inclusive ranges of
*unused* OIDs (e.g., the line "45-900" means OIDs 45 through 900 have
not been allocated yet).  Currently, OIDs 1-9999 are reserved for manual
assignment; the unused_oids script simply looks through the include/catalog
headers to see which ones do not appear in "OID =" clauses in DATA lines.
(As of Postgres 8.1, it also looks at CATALOG and DECLARE_INDEX lines.)
You can also use the duplicate_oids script to check for mistakes.

- The OID counter starts at 10000 at bootstrap.  If a catalog row is in a
table that requires OIDs, but no OID was preassigned by an "OID =" clause,
then it will receive an OID of 10000 or above.

- To create a "BOOTSTRAP" table you have to do a lot of extra work: these
tables are not created through a normal CREATE TABLE operation, but spring
into existence when first written to during initdb.  Therefore, you must
manually create appropriate entries for them in the pre-loaded contents of
pg_class, pg_attribute, and pg_type.  Avoid making new catalogs be bootstrap
catalogs if at all possible; generally, only tables that must be written to
in order to create a table should be bootstrapped.

- Certain BOOTSTRAP tables must be at the start of the Makefile
POSTGRES_BKI_SRCS variable.  They cannot be created through the standard
heap_create_with_catalog process, because that process needs these tables
to exist already.  The files currently in this group are:
	pg_proc.h pg_type.h pg_attribute.h pg_class.h
Within this list, pg_type.h must come before pg_attribute.h.
Also, indexing.h must be last, since the indexes can't be created until all
the tables are in place, and toasting.h should probably be next-to-last
(or at least after all the tables that need toast tables).  There are
reputedly some other order dependencies in the .bki list, too.
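
The OID bookkeeping described in the list above (finding unused OIDs and checking for duplicates) can be sketched as follows. This is a minimal sketch, not the unused_oids or duplicate_oids scripts themselves: it assumes the assigned OIDs have already been collected into a list, and the function names are hypothetical. The 1-9999 bounds come from the text above.

```python
FIRST_MANUAL_OID = 1
LAST_MANUAL_OID = 9999  # OIDs 1-9999 are reserved for manual assignment


def unused_oid_ranges(used_oids, first=FIRST_MANUAL_OID, last=LAST_MANUAL_OID):
    """Return inclusive (start, end) ranges in [first, last] not in used_oids."""
    ranges = []
    start = first
    for oid in sorted(set(used_oids)):
        if oid < first or oid > last:
            continue  # outside the manually assigned range
        if oid > start:
            ranges.append((start, oid - 1))
        start = oid + 1
    if start <= last:
        ranges.append((start, last))
    return ranges


def format_range(oid_range):
    """Format a range the way the text above shows it, e.g. '45-900'."""
    start, end = oid_range
    return str(start) if start == end else f"{start}-{end}"


def duplicate_oids(assigned_oids):
    """Return the sorted OIDs that were assigned more than once."""
    seen, dups = set(), set()
    for oid in assigned_oids:
        if oid in seen:
            dups.add(oid)
        seen.add(oid)
    return sorted(dups)


# Made-up assignments for illustration only.
assigned = [1, 2, 3, 44, 901, 902, 9999]
report = [format_range(r) for r in unused_oid_ranges(assigned)]
```

With these made-up assignments, `report` includes the line "45-900", meaning OIDs 45 through 900 are still free. Anything without a preassigned OID would instead draw from the counter that starts at 10000, which is outside the range this sketch scans.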

-----------------------------------------------------------------

When munging the .c files, you should be aware of certain conventions:

- The system catalog cache code (and most catalog-munging code in
general) assumes that the fixed-length portions of all system catalog
tuples are in fact present, because it maps C struct declarations onto
them.  Thus, the variable-length fields must all be at the end, and
only the variable-length fields of a catalog tuple are permitted to be
NULL.  For example, if you set pg_type.typrelid to be NULL, a
piece of code will likely perform "typetup->typrelid" (or, worse,
"typetup->typelem", which follows typrelid).  This will result in
random errors or even segmentation violations.  Hence, do NOT insert
catalog tuples that contain NULL attributes except in their
variable-length portions!  (The bootstrapping code is fairly good about
marking NOT NULL each of the columns that can legally be referenced via
C struct declarations ... but those markings won't be enforced against
DATA commands, so you must get it right in a DATA line.)

- Modification of the catalogs must be performed with the proper
updating of catalog indexes!  That is, most catalogs have indexes
on them; when you munge them using the executor, the executor will
take care of doing the index updates, but if you make direct access
method calls to insert new or modified tuples into a heap, you must
also make the calls to insert the tuple into ALL of its indexes!  If
not, the new tuple will generally be "invisible" to the system because
most of the accesses to the catalogs in question will be through the
associated indexes.
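
The index-maintenance rule above can be illustrated with a toy model. This is a sketch under heavy simplification, not PostgreSQL code: the `ToyCatalog` class, its methods, and the dict-based indexes are all hypothetical stand-ins for a heap and its catalog indexes.

```python
class ToyCatalog:
    """Toy model: a heap (list of rows) plus one dict index per column."""

    def __init__(self, indexed_columns):
        self.heap = []
        self.indexes = {col: {} for col in indexed_columns}

    def heap_insert(self, row):
        """Low-level insert: appends to the heap and touches no index."""
        self.heap.append(row)
        return len(self.heap) - 1  # heap position, standing in for a tuple id

    def index_insert_all(self, tid):
        """Insert the tuple at `tid` into ALL indexes of this catalog."""
        row = self.heap[tid]
        for col, index in self.indexes.items():
            index[row[col]] = tid

    def lookup(self, col, key):
        """Index lookup, the way most catalog accesses find their rows."""
        tid = self.indexes[col].get(key)
        return None if tid is None else self.heap[tid]


catalog = ToyCatalog(["oid", "name"])
tid = catalog.heap_insert({"oid": 16, "name": "bool"})

# The row is in the heap, but index lookups cannot see it yet.
before = catalog.lookup("name", "bool")

# After updating every index, the row becomes visible.
catalog.index_insert_all(tid)
after = catalog.lookup("name", "bool")
```

Here `before` is `None` even though the row exists in the heap, which is the "invisible" tuple the text warns about. Updating only some of the indexes would leave the row reachable through one lookup path and missing through another.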