Fix incorrect order of database-locking operations in InitPostgres().

We should set MyProc->databaseId after acquiring the per-database lock,
not beforehand.  The old way risked deadlock against processes trying to
copy or delete the target database, since they would first acquire the lock
and then wait for processes with matching databaseId to exit; that left a
window wherein an incoming process could set its databaseId and then block
on the lock, while the other process had the lock and waited in vain for
the incoming process to exit.

CountOtherDBBackends() would time out and fail after 5 seconds, so this
just resulted in an unexpected failure not a permanent lockup, but it's
still annoying when it happens.  A real-world example of a use-case is that
short-duration connections to a template database should not cause CREATE
DATABASE to fail.

Doing it in the other order should be fine since the contract has always
been that processes searching the ProcArray for a database ID must hold the
relevant per-database lock while searching.  Thus, this actually removes
the former race condition that required an assumption that storing to
MyProc->databaseId is atomic.

It's been like this for a long time, so back-patch to all active branches.
This commit is contained in:
Tom Lane 2015-06-05 13:22:27 -04:00
parent 068cfadf9e
commit ac23b711dd
1 changed files with 29 additions and 15 deletions

View File

@ -848,18 +848,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
strcpy(out_dbname, dbname);
}
/* Now we can mark our PGPROC entry with the database ID */
/* (We assume this is an atomic store so no lock is needed) */
MyProc->databaseId = MyDatabaseId;
/*
* We established a catalog snapshot while reading pg_authid and/or
* pg_database; but until we have set up MyDatabaseId, we won't react to
* incoming sinval messages for unshared catalogs, so we won't realize it
* if the snapshot has been invalidated. Assume it's no good anymore.
*/
InvalidateCatalogSnapshot();
/*
* Now, take a writer's lock on the database we are trying to connect to.
* If there is a concurrently running DROP DATABASE on that database, this
@ -867,9 +855,13 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
* pg_database).
*
* Note that the lock is not held long, only until the end of this startup
* transaction. This is OK since we are already advertising our use of
* the database in the PGPROC array; anyone trying a DROP DATABASE after
* this point will see us there.
* transaction. This is OK since we will advertise our use of the
* database in the ProcArray before dropping the lock (in fact, that's the
* next thing to do). Anyone trying a DROP DATABASE after this point will
* see us in the array once they have the lock. Ordering is important for
* this because we don't want to advertise ourselves as being in this
* database until we have the lock; otherwise we create what amounts to a
* deadlock with CountOtherDBBackends().
*
* Note: use of RowExclusiveLock here is reasonable because we envision
* our session as being a concurrent writer of the database. If we had a
@ -881,6 +873,28 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
LockSharedObject(DatabaseRelationId, MyDatabaseId, 0,
RowExclusiveLock);
/*
* Now we can mark our PGPROC entry with the database ID.
*
* We assume this is an atomic store so no lock is needed; though actually
* things would work fine even if it weren't atomic. Anyone searching the
* ProcArray for this database's ID should hold the database lock, so they
* would not be executing concurrently with this store. A process looking
* for another database's ID could in theory see a chance match if it read
* a partially-updated databaseId value; but as long as all such searches
* wait and retry, as in CountOtherDBBackends(), they will certainly see
* the correct value on their next try.
*/
MyProc->databaseId = MyDatabaseId;
/*
* We established a catalog snapshot while reading pg_authid and/or
* pg_database; but until we have set up MyDatabaseId, we won't react to
* incoming sinval messages for unshared catalogs, so we won't realize it
* if the snapshot has been invalidated. Assume it's no good anymore.
*/
InvalidateCatalogSnapshot();
/*
* Recheck pg_database to make sure the target database hasn't gone away.
* If there was a concurrent DROP DATABASE, this ensures we will die