Multi-Version Concurrency Control concurrency Multi-Version Concurrency Control (MVCC) is an advanced technique for improving database performance in a multi-user environment. Vadim Mikheev (vadim@krs.ru) provided the implementation for Postgres. Introduction Unlike most other database systems which use locks for concurrency control, Postgres maintains data consistency by using a multiversion model. This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session. The main difference between multiversion and lock models is that in MVCC locks acquired for querying (reading) data don't conflict with locks acquired for writing data and so reading never blocks writing and writing never blocks reading. Transaction Isolation The ANSI/ISO SQL standard defines four levels of transaction isolation in terms of three phenomena that must be prevented between concurrent transactions. These undesirable phenomena are: dirty reads dirty reads A transaction reads data written by concurrent uncommitted transaction. non-repeatable reads non-repeatable reads A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read). phantom read phantom reads A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. isolation levels The four isolation levels and the corresponding behaviors are described below. <acronym>ANSI</acronym>/<acronym>ISO</acronym> <acronym>SQL</acronym> Isolation LevelsIsolation Levels Isolation Level Dirty Read Non-Repeatable Read Phantom Read Read uncommitted Possible Possible Possible Read committed Not possible Possible Possible Repeatable read Not possible Not possible Possible Serializable Not possible Not possible Not possible
Postgres offers the read committed and serializable isolation levels.
Read Committed Isolation Level isolation levels read committed Read Committed is the default isolation level in Postgres. When a transaction runs on this isolation level, a SELECT query sees only data committed before the query began and never sees either uncommitted data or changes committed during query execution by concurrent transactions. (However, the SELECT does see the effects of previous updates executed within this same transaction, even though they are not yet committed.) Notice that two successive SELECTs can see different data, even though they are within a single transaction, when other transactions commit changes during execution of the first SELECT. If a target row found by a query while executing an UPDATE statement (or DELETE or SELECT FOR UPDATE) has already been updated by a concurrent uncommitted transaction then the second transaction that tries to update this row will wait for the other transaction to commit or rollback. In the case of rollback, the waiting transaction can proceed to change the row. In the case of commit (and if the row still exists; i.e. was not deleted by the other transaction), the query will be re-executed for this row to check that the new row version still satisfies the query search condition. If the new row version satisfies the query search condition then the row will be updated (or deleted or marked for update). Note that the starting point for the update will be the new row version; moreover, after the update the doubly-updated row is visible to subsequent SELECTs in the current transaction. Thus, the current transaction is able to see the effects of the other transaction for this specific row. The partial transaction isolation provided by Read Committed level is adequate for many applications, and this level is fast and simple to use. However, for applications that do complex queries and updates, it may be necessary to guarantee a more rigorously consistent view of the database than Read Committed level provides. Serializable Isolation Level isolation levels read serializable Serializable provides the highest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, applications using this level must be prepared to retry transactions due to serialization failures. When a transaction is on the serializable level, a SELECT query sees only data committed before the transaction began and never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the SELECT does see the effects of previous updates executed within this same transaction, even though they are not yet committed.) This is different from Read Committed in that the SELECT sees a snapshot as of the start of the transaction, not as of the start of the current query within the transaction. If a target row found by a query while executing an UPDATE statement (or DELETE or SELECT FOR UPDATE) has already been updated by a concurrent uncommitted transaction then the second transaction that tries to update this row will wait for the other transaction to commit or rollback. In the case of rollback, the waiting transaction can proceed to change the row. In the case of a concurrent transaction commit, a serializable transaction will be rolled back with the message ERROR: Can't serialize access due to concurrent update because a serializable transaction cannot modify rows changed by other transactions after the serializable transaction began. When the application receives this error message, it should abort the current transaction and then retry the whole transaction from the beginning. The second time through, the transaction sees the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update. Note that only updating transactions may need to be retried --- read-only transactions never have serialization conflicts. Serializable transaction level provides a rigorous guarantee that each transaction sees a wholly consistent view of the database. However, the application has to be prepared to retry transactions when concurrent updates make it impossible to sustain the illusion of serial execution, and the cost of redoing complex transactions may be significant. So this level is recommended only when update queries contain logic sufficiently complex that they may give wrong answers in Read Committed level. Data consistency checks at the application level Because readers in Postgres don't lock data, regardless of transaction isolation level, data read by one transaction can be overwritten by another concurrent transaction. In other words, if a row is returned by SELECT it doesn't mean that the row still exists at the time it is returned (i.e. sometime after the current transaction began); the row might have been modified or deleted by an already-committed transaction that committed after this one started. Even if the row is still valid now, it could be changed or deleted before the current transaction does a commit or rollback. Another way to think about it is that each transaction sees a snapshot of the database contents, and concurrently executing transactions may very well see different snapshots. So the whole concept of now is somewhat suspect anyway. This is not normally a big problem if the client applications are isolated from each other, but if the clients can communicate via channels outside the database then serious confusion may ensue. To ensure the current existence of a row and protect it against concurrent updates one must use SELECT FOR UPDATE or an appropriate LOCK TABLE statement. (SELECT FOR UPDATE locks just the returned rows against concurrent updates, while LOCK TABLE protects the whole table.) This should be taken into account when porting applications to Postgres from other environments. Before version 6.5 Postgres used read-locks and so the above consideration is also the case when upgrading to 6.5 (or higher) from previous Postgres versions. Locking and Tables locking Postgres provides various lock modes to control concurrent access to data in tables. Some of these lock modes are acquired by Postgres automatically before statement execution, while others are provided to be used by applications. All lock modes acquired in a transaction are held for the duration of the transaction. Table-level locks AccessShareLock A read-lock mode acquired automatically on tables being queried. Conflicts with AccessExclusiveLock only. RowShareLock Acquired by SELECT FOR UPDATE and LOCK TABLE for statements. Conflicts with ExclusiveLock and AccessExclusiveLock modes. RowExclusiveLock Acquired by UPDATE, DELETE, INSERT and LOCK TABLE for statements. Conflicts with ShareLock, ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock modes. ShareUpdateExclusiveLock Acquired by VACUUM (without ) and LOCK TABLE table for statements. Conflicts with ShareUpdateExclusiveLock, ShareLock, ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock modes. ShareLock Acquired by CREATE INDEX and LOCK TABLE table for statements. Conflicts with RowExclusiveLock, ShareUpdateExclusiveLock, ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock modes. ShareRowExclusiveLock Acquired by LOCK TABLE for statements. Conflicts with RowExclusiveLock, ShareUpdateExclusiveLock, ShareLock, ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock modes. ExclusiveLock Acquired by LOCK TABLE table for statements. Conflicts with RowShareLock, RowExclusiveLock, ShareUpdateExclusiveLock, ShareLock, ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock modes. AccessExclusiveLock Acquired by ALTER TABLE, DROP TABLE, VACUUM FULL and LOCK TABLE statements. Conflicts with all modes (AccessShareLock, RowShareLock, RowExclusiveLock, ShareUpdateExclusiveLock, ShareLock, ShareRowExclusiveLock, ExclusiveLock and AccessExclusiveLock). Only AccessExclusiveLock blocks SELECT (without ) statement. Row-level locks These locks are acquired when rows are being updated (or deleted or marked for update). Row-level locks don't affect data querying. They block writers to the same row only. Postgres doesn't remember any information about modified rows in memory and so has no limit to the number of rows locked at one time. However, locking a row may cause a disk write; thus, for example, SELECT FOR UPDATE will modify selected rows to mark them and so will result in disk writes. In addition to table and row locks, short-term share/exclusive locks are used to control read/write access to table pages in the shared buffer pool. These locks are released immediately after a tuple is fetched or updated. Application writers normally need not be concerned with page-level locks, but we mention them for completeness. Locking and Indexes Though Postgres provides nonblocking read/write access to table data, nonblocking read/write access is not currently offered for every index access method implemented in Postgres. The various index types are handled as follows: GiST and R-Tree indexes Share/exclusive index-level locks are used for read/write access. Locks are released after statement is done. Hash indexes Share/exclusive page-level locks are used for read/write access. Locks are released after page is processed. Page-level locks provide better concurrency than index-level ones but are subject to deadlocks. B-tree indexes Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each index tuple is fetched/inserted. B-tree indexes provide the highest concurrency without deadlock conditions. In short, B-tree indexes are the recommended index type for concurrent applications.