From e5fac1cb1941e4adbcb88206f914e2035e5cccf2 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Wed, 16 Sep 2020 13:38:26 -0400 Subject: [PATCH] Avoid unnecessary recursion to child tables in ALTER TABLE SET NOT NULL. If a partitioned table's column is already marked NOT NULL, there is no need to examine its partitions, because we can rely on previous DDL to have enforced that the child columns are NOT NULL as well. (Unfortunately, the same cannot be said for traditional inheritance, so for now we have to restrict the optimization to partitioned tables.) Hence, we may skip recursing to child tables in this situation. The reason this case is worth worrying about is that when pg_dump dumps a partitioned table having a primary key, it will include the requisite NOT NULL markings in the CREATE TABLE commands, and then add the primary key as a separate step. The primary key addition generates a SET NOT NULL as a subcommand, just to be sure. So the situation where a SET NOT NULL is redundant does arise in the real world. Skipping the recursion does more than just save a few cycles: it means that a command such as "ALTER TABLE ONLY partition_parent ADD PRIMARY KEY" will take locks only on the partition parent table, not on the partitions. It turns out that parallel pg_restore is effectively assuming that that's true, and has little choice but to do so because the dependencies listed for such a TOC entry don't include the partitions. pg_restore could thus issue this ALTER while data restores on the partitions are still in progress. Taking unnecessary locks on the partitions not only hurts concurrency, but can lead to actual deadlock failures, as reported by Domagoj Smoljanovic. (A contributing factor in the deadlock is that TRUNCATE on a child partition wants a non-exclusive lock on the parent. This seems likewise unnecessary, but the fix for it is more invasive so we won't consider back-patching it. Fortunately, getting rid of one of these two poor behaviors is enough to remove the deadlock.) Although support for partitioned primary keys came in with v11, this patch is dependent on the SET NOT NULL refactoring done by commit f4a3fdfbd, so we can only patch back to v12. Patch by me; thanks to Alvaro Herrera and Amit Langote for review. Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com --- src/backend/commands/tablecmds.c | 45 +++++++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 7 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index c21a309f04..eab570a867 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -5681,14 +5681,10 @@ ATSimpleRecursion(List **wqueue, Relation rel, AlterTableUtilityContext *context) { /* - * Propagate to children if desired. Only plain tables, foreign tables - * and partitioned tables have children, so no need to search for other - * relkinds. + * Propagate to children, if desired and if there are (or might be) any + * children. */ - if (recurse && - (rel->rd_rel->relkind == RELKIND_RELATION || - rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE || - rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) + if (recurse && rel->rd_rel->relhassubclass) { Oid relid = RelationGetRelid(rel); ListCell *child; @@ -6698,6 +6694,41 @@ ATPrepSetNotNull(List **wqueue, Relation rel, if (recursing) return; + /* + * If the target column is already marked NOT NULL, we can skip recursing + * to children, because their columns should already be marked NOT NULL as + * well. But there's no point in checking here unless the relation has + * some children; else we can just wait till execution to check. (If it + * does have children, however, this can save taking per-child locks + * unnecessarily. This greatly improves concurrency in some parallel + * restore scenarios.) + * + * Unfortunately, we can only apply this optimization to partitioned + * tables, because traditional inheritance doesn't enforce that child + * columns be NOT NULL when their parent is. (That's a bug that should + * get fixed someday.) + */ + if (rel->rd_rel->relhassubclass && + rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + HeapTuple tuple; + bool attnotnull; + + tuple = SearchSysCacheAttName(RelationGetRelid(rel), cmd->name); + + /* Might as well throw the error now, if name is bad */ + if (!HeapTupleIsValid(tuple)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_COLUMN), + errmsg("column \"%s\" of relation \"%s\" does not exist", + cmd->name, RelationGetRelationName(rel)))); + + attnotnull = ((Form_pg_attribute) GETSTRUCT(tuple))->attnotnull; + ReleaseSysCache(tuple); + if (attnotnull) + return; + } + /* * If we have ALTER TABLE ONLY ... SET NOT NULL on a partitioned table, * apply ALTER TABLE ... CHECK NOT NULL to every child. Otherwise, use