1999-05-26 19:27:39 +02:00
|
|
|
<chapter id="mvcc">
|
|
|
|
<title>Multi-Version Concurrency Control</title>
|
|
|
|
|
|
|
|
<abstract>
|
|
|
|
<para>
|
|
|
|
Multi-Version Concurrency Control
|
|
|
|
(MVCC)
|
|
|
|
is an advanced technique for improving database performance in a
|
|
|
|
multi-user environment.
|
|
|
|
<ulink url="mailto:vadim@krs.ru">Vadim Mikheev</ulink> provided
|
|
|
|
the implementation for <productname>Postgres</productname>.
|
|
|
|
</para>
|
|
|
|
</abstract>
|
|
|
|
|
|
|
|
<sect1>
|
|
|
|
<title>Introduction</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Unlike most other database systems which use locks for concurrency control,
|
|
|
|
<productname>Postgres</productname>
|
|
|
|
maintains data consistency by using a multiversion model.
|
1999-05-27 17:49:08 +02:00
|
|
|
This means that while querying a database each transaction sees
|
1999-05-26 19:27:39 +02:00
|
|
|
a snapshot of data (a <firstterm>database version</firstterm>)
|
|
|
|
as it was some
|
1999-05-27 17:49:08 +02:00
|
|
|
time ago, regardless of the current state of the underlying data.
|
1999-05-26 19:27:39 +02:00
|
|
|
This protects the transaction from viewing inconsistent data that
|
|
|
|
could be caused by (other) concurrent transaction updates on the same
|
|
|
|
data rows, providing <firstterm>transaction isolation</firstterm>
|
|
|
|
for each database session.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The main difference between multiversion and lock models is that
|
|
|
|
in MVCC locks acquired for querying (reading) data don't conflict
|
|
|
|
with locks acquired for writing data and so reading never blocks
|
|
|
|
writing and writing never blocks reading.
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1>
|
|
|
|
<title>Transaction Isolation</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The <acronym>ANSI</acronym>/<acronym>ISO</acronym> <acronym>SQL</acronym>
|
|
|
|
standard defines four levels of transaction
|
|
|
|
isolation in terms of three phenomena that must be prevented
|
|
|
|
between concurrent transactions.
|
|
|
|
These undesirable phenomena are:
|
|
|
|
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
dirty reads
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
A transaction reads data written by concurrent uncommitted transaction.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
non-repeatable reads
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
A transaction re-reads data it has previously read and finds that data
|
|
|
|
has been modified by another committed transaction.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
phantom read
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
A transaction re-executes a query returning a set of rows that satisfy a
|
|
|
|
search condition and finds that additional rows satisfying the condition
|
|
|
|
has been inserted by another committed transaction.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
The four isolation levels and the corresponding behaviors are described below.
|
|
|
|
|
|
|
|
<table tocentry="1">
|
|
|
|
<title><productname>Postgres</productname> Isolation Levels</title>
|
|
|
|
<titleabbrev>Isolation Levels</titleabbrev>
|
|
|
|
<tgroup cols="4">
|
|
|
|
<thead>
|
|
|
|
<row>
|
1999-07-06 19:19:42 +02:00
|
|
|
<entry>
|
|
|
|
</entry>
|
1999-05-27 17:49:08 +02:00
|
|
|
<entry>
|
|
|
|
Dirty Read
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Non-Repeatable Read
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Phantom Read
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
</thead>
|
|
|
|
<tbody>
|
|
|
|
<row>
|
|
|
|
<entry>
|
|
|
|
Read uncommitted
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Possible
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>
|
|
|
|
Read committed
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Not possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Possible
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>
|
|
|
|
Repeatable read
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Not possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Not possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Possible
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>
|
|
|
|
Serializable
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Not possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Not possible
|
|
|
|
</entry>
|
|
|
|
<entry>
|
|
|
|
Not possible
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
</tbody>
|
|
|
|
</tgroup>
|
|
|
|
</table>
|
1999-05-26 19:27:39 +02:00
|
|
|
|
|
|
|
<productname>Postgres</productname>
|
|
|
|
offers the read committed and serializable isolation levels.
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1>
|
|
|
|
<title>Read Committed Isolation Level</title>
|
|
|
|
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
<firstterm>Read Committed</firstterm>
|
|
|
|
is the default isolation level in <productname>Postgres</productname>.
|
1999-05-26 19:27:39 +02:00
|
|
|
When a transaction runs on this isolation level, a query sees only
|
|
|
|
data committed before the query began and never sees either dirty data or
|
|
|
|
concurrent transaction changes committed during query execution.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If a row returned by a query while executing an
|
|
|
|
<command>UPDATE</command> statement
|
|
|
|
(or <command>DELETE</command>
|
|
|
|
or <command>SELECT FOR UPDATE</command>)
|
|
|
|
is being updated by a
|
|
|
|
concurrent uncommitted transaction then the second transaction
|
|
|
|
that tries to update this row will wait for the other transaction to
|
|
|
|
commit or rollback. In the case of rollback, the waiting transaction
|
|
|
|
can proceed to change the row. In the case of commit (and if the
|
|
|
|
row still exists; i.e. was not deleted by the other transaction), the
|
|
|
|
query will be re-executed for this row to check that new row
|
|
|
|
version satisfies query search condition. If the new row version
|
|
|
|
satisfies the query search condition then row will be
|
|
|
|
updated (or deleted or marked for update).
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
Note that the results of execution of <command>SELECT</command>
|
|
|
|
or <command>INSERT</command> (with a query)
|
1999-05-26 19:27:39 +02:00
|
|
|
statements will not be affected by concurrent transactions.
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1>
|
|
|
|
<title>Serializable Isolation Level</title>
|
|
|
|
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
<firstterm>Serializable</firstterm> provides the highest transaction isolation.
|
1999-06-05 04:47:13 +02:00
|
|
|
When a transaction is on the serializable level,
|
1999-05-26 19:27:39 +02:00
|
|
|
a query sees only data
|
|
|
|
committed before the transaction began and never see either dirty data
|
|
|
|
or concurrent transaction changes committed during transaction
|
|
|
|
execution. So, this level emulates serial transaction execution,
|
|
|
|
as if transactions would be executed one after another, serially,
|
|
|
|
rather than concurrently.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
If a row returned by query while executing a
|
|
|
|
<command>UPDATE</command>
|
|
|
|
(or <command>DELETE</command> or <command>SELECT FOR UPDATE</command>)
|
1999-05-26 19:27:39 +02:00
|
|
|
statement is being updated by
|
|
|
|
a concurrent uncommitted transaction then the second transaction
|
|
|
|
that tries to update this row will wait for the other transaction to
|
|
|
|
commit or rollback. In the case of rollback, the waiting transaction
|
|
|
|
can proceed to change the row. In the case of a concurrent
|
|
|
|
transaction commit, a serializable transaction will be rolled back
|
|
|
|
with the message
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
ERROR: Can't serialize access due to concurrent update
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
because a serializable transaction cannot modify rows changed by
|
|
|
|
other transactions after the serializable transaction began.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<note>
|
|
|
|
<para>
|
|
|
|
Note that results of execution of <command>SELECT</command>
|
|
|
|
or <command>INSERT</command> (with a query)
|
|
|
|
will not be affected by concurrent transactions.
|
|
|
|
</para>
|
|
|
|
</note>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1>
|
|
|
|
<title>Locking and Tables</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<productname>Postgres</productname>
|
|
|
|
provides various lock modes to control concurrent
|
|
|
|
access to data in tables. Some of these lock modes are acquired by
|
|
|
|
<productname>Postgres</productname>
|
|
|
|
automatically before statement execution, while others are
|
|
|
|
provided to be used by applications. All lock modes (except for
|
|
|
|
AccessShareLock) acquired in a transaction are held for the duration
|
|
|
|
of the transaction.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In addition to locks, short-term share/exclusive latches are used
|
|
|
|
to control read/write access to table pages in shared buffer pool.
|
|
|
|
Latches are released immediately after a tuple is fetched or updated.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Table-level locks</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
AccessShareLock
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
An internal lock mode acquiring automatically over tables
|
|
|
|
being queried. <productname>Postgres</productname>
|
|
|
|
releases these locks after statement is
|
|
|
|
done.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Conflicts with AccessExclusiveLock only.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
RowShareLock
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Acquired by <command>SELECT FOR UPDATE</command>
|
|
|
|
and <command>LOCK TABLE</command>
|
|
|
|
for <option>IN ROW SHARE MODE</option> statements.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Conflicts with ExclusiveLock and AccessExclusiveLock modes.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
RowExclusiveLock
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Acquired by <command>UPDATE</command>, <command>DELETE</command>,
|
|
|
|
<command>INSERT</command> and <command>LOCK TABLE</command>
|
|
|
|
for <option>IN ROW EXCLUSIVE MODE</option> statements.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Conflicts with ShareLock, ShareRowExclusiveLock, ExclusiveLock and
|
|
|
|
AccessExclusiveLock modes.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
ShareLock
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Acquired by <command>CREATE INDEX</command>
|
|
|
|
and <command>LOCK TABLE</command> table
|
|
|
|
for <option>IN SHARE MODE</option>
|
|
|
|
statements.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Conflicts with RowExclusiveLock, ShareRowExclusiveLock,
|
|
|
|
ExclusiveLock and AccessExclusiveLock modes.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
ShareRowExclusiveLock
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Acquired by <command>LOCK TABLE</command> for
|
|
|
|
<option>IN SHARE ROW EXCLUSIVE MODE</option> statements.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Conflicts with RowExclusiveLock, ShareLock, ShareRowExclusiveLock,
|
|
|
|
ExclusiveLock and AccessExclusiveLock modes.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
ExclusiveLock
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Acquired by <command>LOCK TABLE</command> table
|
|
|
|
for <option>IN EXCLUSIVE MODE</option> statements.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Conflicts with RowShareLock, RowExclusiveLock, ShareLock,
|
|
|
|
ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock
|
|
|
|
modes.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
AccessExclusiveLock
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Acquired by <command>ALTER TABLE</command>,
|
|
|
|
<command>DROP TABLE</command>,
|
|
|
|
<command>VACUUM</command> and <command>LOCK TABLE</command>
|
|
|
|
statements.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Conflicts with RowShareLock, RowExclusiveLock, ShareLock,
|
|
|
|
ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock
|
|
|
|
modes.
|
|
|
|
|
|
|
|
<note>
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
Only AccessExclusiveLock blocks <command>SELECT</command> (without
|
|
|
|
<option>FOR UPDATE</option>) statement.
|
1999-05-26 19:27:39 +02:00
|
|
|
</para>
|
|
|
|
</note>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Row-level locks</title>
|
|
|
|
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
These locks are acquired when internal
|
|
|
|
fields of a row are being updated (or deleted or marked for update).
|
1999-05-26 19:27:39 +02:00
|
|
|
<productname>Postgres</productname>
|
|
|
|
doesn't remember any information about modified rows in memory and
|
1999-05-27 17:49:08 +02:00
|
|
|
so has no limit to the number of rows locked without lock escalation.
|
1999-05-26 19:27:39 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
However, take into account that <command>SELECT FOR UPDATE</command> will modify
|
|
|
|
selected rows to mark them and so will results in disk writes.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Row-level locks don't affect data querying. They are used to block
|
|
|
|
writers to <emphasis>the same row</emphasis> only.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1>
|
|
|
|
<title>Locking and Indices</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Though <productname>Postgres</productname>
|
|
|
|
provides unblocking read/write access to table
|
1999-05-27 17:49:08 +02:00
|
|
|
data, unblocked read/write access is not provided for every
|
|
|
|
index access methods implemented
|
1999-05-26 19:27:39 +02:00
|
|
|
in <productname>Postgres</productname>.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The various index types are handled as follows:
|
|
|
|
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
GiST and R-Tree indices
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
Share/exclusive index-level locks are used for read/write access.
|
1999-05-26 19:27:39 +02:00
|
|
|
Locks are released after statement is done.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
Hash indices
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
Share/exclusive page-level locks are used for read/write access.
|
1999-05-26 19:27:39 +02:00
|
|
|
Locks are released after page is processed.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Page-level locks produces better concurrency than index-level ones
|
|
|
|
but are subject to deadlocks.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>
|
|
|
|
Btree
|
|
|
|
</term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
Short-term share/exclusive page-level latches are used for
|
1999-05-26 19:27:39 +02:00
|
|
|
read/write access. Latches are released immediately after the index
|
|
|
|
tuple is inserted/fetched.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
1999-05-27 17:49:08 +02:00
|
|
|
Btree indices provide the highest concurrency without deadlock
|
1999-05-26 19:27:39 +02:00
|
|
|
conditions.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1>
|
|
|
|
<title>Data consistency checks at the application level</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Because readers in <productname>Postgres</productname>
|
|
|
|
don't lock data, regardless of
|
|
|
|
transaction isolation level, data read by one transaction can be
|
|
|
|
overwritten by another. In the other words, if a row is returned
|
|
|
|
by <command>SELECT</command> it doesn't mean that this row really
|
|
|
|
exists at the time it is returned (i.e. sometime after the
|
|
|
|
statement or transaction began) nor
|
1999-06-04 17:51:42 +02:00
|
|
|
that the row is protected from deletion or update by concurrent
|
1999-05-27 17:49:08 +02:00
|
|
|
transactions before the current transaction does a commit or rollback.
|
1999-05-26 19:27:39 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
To ensure the actual existance of a row and protect it against
|
|
|
|
concurrent updates one must use <command>SELECT FOR UPDATE</command> or
|
|
|
|
an appropriate <command>LOCK TABLE</command> statement.
|
|
|
|
This should be taken into account when porting applications using
|
|
|
|
serializable mode to <productname>Postgres</productname> from other environments.
|
|
|
|
|
|
|
|
<note>
|
|
|
|
<para>
|
|
|
|
Before version 6.5 <productname>Postgres</productname>
|
|
|
|
used read-locks and so the
|
|
|
|
above consideration is also the case
|
|
|
|
when upgrading to 6.5 (or higher) from previous
|
|
|
|
<productname>Postgres</productname> versions.
|
|
|
|
</para>
|
|
|
|
</note>
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
|
|
Local variables:
|
|
|
|
mode: sgml
|
|
|
|
sgml-omittag:nil
|
|
|
|
sgml-shorttag:t
|
|
|
|
sgml-minimize-attributes:nil
|
|
|
|
sgml-always-quote-attributes:t
|
|
|
|
sgml-indent-step:1
|
|
|
|
sgml-indent-data:t
|
|
|
|
sgml-parent-document:nil
|
|
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
|
|
sgml-exposed-tags:nil
|
|
|
|
sgml-local-catalogs:"/usr/lib/sgml/catalog"
|
|
|
|
sgml-local-ecat-files:nil
|
|
|
|
End:
|
|
|
|
-->
|