From ac23b711dd6ccb82fb70ca0f153fe755fd809a46 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Fri, 5 Jun 2015 13:22:27 -0400 Subject: [PATCH] Fix incorrect order of database-locking operations in InitPostgres(). We should set MyProc->databaseId after acquiring the per-database lock, not beforehand. The old way risked deadlock against processes trying to copy or delete the target database, since they would first acquire the lock and then wait for processes with matching databaseId to exit; that left a window wherein an incoming process could set its databaseId and then block on the lock, while the other process had the lock and waited in vain for the incoming process to exit. CountOtherDBBackends() would time out and fail after 5 seconds, so this just resulted in an unexpected failure not a permanent lockup, but it's still annoying when it happens. A real-world example of a use-case is that short-duration connections to a template database should not cause CREATE DATABASE to fail. Doing it in the other order should be fine since the contract has always been that processes searching the ProcArray for a database ID must hold the relevant per-database lock while searching. Thus, this actually removes the former race condition that required an assumption that storing to MyProc->databaseId is atomic. It's been like this for a long time, so back-patch to all active branches. --- src/backend/utils/init/postinit.c | 44 ++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 15 deletions(-) diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index aa67f75c0c..2fba6b772d 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -848,18 +848,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, strcpy(out_dbname, dbname); } - /* Now we can mark our PGPROC entry with the database ID */ - /* (We assume this is an atomic store so no lock is needed) */ - MyProc->databaseId = MyDatabaseId; - - /* - * We established a catalog snapshot while reading pg_authid and/or - * pg_database; but until we have set up MyDatabaseId, we won't react to - * incoming sinval messages for unshared catalogs, so we won't realize it - * if the snapshot has been invalidated. Assume it's no good anymore. - */ - InvalidateCatalogSnapshot(); - /* * Now, take a writer's lock on the database we are trying to connect to. * If there is a concurrently running DROP DATABASE on that database, this @@ -867,9 +855,13 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, * pg_database). * * Note that the lock is not held long, only until the end of this startup - * transaction. This is OK since we are already advertising our use of - * the database in the PGPROC array; anyone trying a DROP DATABASE after - * this point will see us there. + * transaction. This is OK since we will advertise our use of the + * database in the ProcArray before dropping the lock (in fact, that's the + * next thing to do). Anyone trying a DROP DATABASE after this point will + * see us in the array once they have the lock. Ordering is important for + * this because we don't want to advertise ourselves as being in this + * database until we have the lock; otherwise we create what amounts to a + * deadlock with CountOtherDBBackends(). * * Note: use of RowExclusiveLock here is reasonable because we envision * our session as being a concurrent writer of the database. If we had a @@ -881,6 +873,28 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, RowExclusiveLock); + /* + * Now we can mark our PGPROC entry with the database ID. + * + * We assume this is an atomic store so no lock is needed; though actually + * things would work fine even if it weren't atomic. Anyone searching the + * ProcArray for this database's ID should hold the database lock, so they + * would not be executing concurrently with this store. A process looking + * for another database's ID could in theory see a chance match if it read + * a partially-updated databaseId value; but as long as all such searches + * wait and retry, as in CountOtherDBBackends(), they will certainly see + * the correct value on their next try. + */ + MyProc->databaseId = MyDatabaseId; + + /* + * We established a catalog snapshot while reading pg_authid and/or + * pg_database; but until we have set up MyDatabaseId, we won't react to + * incoming sinval messages for unshared catalogs, so we won't realize it + * if the snapshot has been invalidated. Assume it's no good anymore. + */ + InvalidateCatalogSnapshot(); + /* * Recheck pg_database to make sure the target database hasn't gone away. * If there was a concurrent DROP DATABASE, this ensures we will die