/*-------------------------------------------------------------------------
*
* pg_dump_sort.c
* Sort the items of a dump into a safe order for dumping
*
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
* src/bin/pg_dump/pg_dump_sort.c
*
*-------------------------------------------------------------------------
*/
#include "postgres_fe.h"
#include "catalog/pg_class_d.h"
#include "common/int.h"
#include "lib/binaryheap.h"
#include "pg_backup_archiver.h"
#include "pg_backup_utils.h"
#include "pg_dump.h"
/*
* Sort priority for database object types.
* Objects are sorted by type, and within a type by name.
*
* Triggers, event triggers, and materialized views are intentionally sorted
* late. Triggers must be restored after all data modifications, so that
* they don't interfere with loading data. Event triggers are restored
* next-to-last so that they don't interfere with object creations of any
* kind. Matview refreshes are last because they should execute in the
* database's normal state (e.g., they must come after all ACLs are restored;
* also, if they choose to look at system catalogs, they should see the final
* restore state). If you think to change this, see also the RestorePass
* mechanism in pg_backup_archiver.c.
*
* On the other hand, casts are intentionally sorted earlier than you might
* expect; logically they should come after functions, since they usually
* depend on those. This works around the backend's habit of recording
* views that use casts as dependent on the cast's underlying function.
* We initially sort casts first, and then any functions used by casts
* will be hoisted above the casts, and in turn views that those functions
* depend on will be hoisted above the functions. But views not used that
* way won't be hoisted.
*
* NOTE: object-type priorities must match the section assignments made in
* pg_dump.c; that is, PRE_DATA objects must sort before DO_PRE_DATA_BOUNDARY,
* POST_DATA objects must sort after DO_POST_DATA_BOUNDARY, and DATA objects
* must sort between them.
*/
/* This enum lists the priority levels in order */
enum dbObjectTypePriorities
{
PRIO_NAMESPACE = 1,
PRIO_PROCLANG,
PRIO_COLLATION,
PRIO_TRANSFORM,
PRIO_EXTENSION,
PRIO_TYPE, /* used for DO_TYPE and DO_SHELL_TYPE */
PRIO_CAST,
PRIO_FUNC,
PRIO_AGG,
PRIO_ACCESS_METHOD,
PRIO_OPERATOR,
PRIO_OPFAMILY, /* used for DO_OPFAMILY and DO_OPCLASS */
PRIO_CONVERSION,
PRIO_TSPARSER,
PRIO_TSTEMPLATE,
PRIO_TSDICT,
PRIO_TSCONFIG,
PRIO_FDW,
PRIO_FOREIGN_SERVER,
PRIO_TABLE,
PRIO_TABLE_ATTACH,
PRIO_DUMMY_TYPE,
PRIO_ATTRDEF,
PRIO_LARGE_OBJECT,
PRIO_PRE_DATA_BOUNDARY, /* boundary! */
PRIO_TABLE_DATA,
PRIO_SEQUENCE_SET,
PRIO_LARGE_OBJECT_DATA,
PRIO_POST_DATA_BOUNDARY, /* boundary! */
PRIO_CONSTRAINT,
PRIO_INDEX,
PRIO_INDEX_ATTACH,
PRIO_STATSEXT,
PRIO_RULE,
PRIO_TRIGGER,
PRIO_FK_CONSTRAINT,
PRIO_POLICY,
PRIO_PUBLICATION,
PRIO_PUBLICATION_REL,
PRIO_PUBLICATION_TABLE_IN_SCHEMA,
PRIO_SUBSCRIPTION,
PRIO_SUBSCRIPTION_REL,
PRIO_DEFAULT_ACL, /* done in ACL pass */
PRIO_EVENT_TRIGGER, /* must be next to last! */
PRIO_REFRESH_MATVIEW /* must be last! */
};
/* This table is indexed by enum DumpableObjectType */
static const int dbObjectTypePriority[] =
{
PRIO_NAMESPACE, /* DO_NAMESPACE */
PRIO_EXTENSION, /* DO_EXTENSION */
PRIO_TYPE, /* DO_TYPE */
PRIO_TYPE, /* DO_SHELL_TYPE */
PRIO_FUNC, /* DO_FUNC */
PRIO_AGG, /* DO_AGG */
PRIO_OPERATOR, /* DO_OPERATOR */
PRIO_ACCESS_METHOD, /* DO_ACCESS_METHOD */
PRIO_OPFAMILY, /* DO_OPCLASS */
PRIO_OPFAMILY, /* DO_OPFAMILY */
PRIO_COLLATION, /* DO_COLLATION */
PRIO_CONVERSION, /* DO_CONVERSION */
PRIO_TABLE, /* DO_TABLE */
PRIO_TABLE_ATTACH, /* DO_TABLE_ATTACH */
PRIO_ATTRDEF, /* DO_ATTRDEF */
PRIO_INDEX, /* DO_INDEX */
PRIO_INDEX_ATTACH, /* DO_INDEX_ATTACH */
PRIO_STATSEXT, /* DO_STATSEXT */
PRIO_RULE, /* DO_RULE */
PRIO_TRIGGER, /* DO_TRIGGER */
PRIO_CONSTRAINT, /* DO_CONSTRAINT */
PRIO_FK_CONSTRAINT, /* DO_FK_CONSTRAINT */
PRIO_PROCLANG, /* DO_PROCLANG */
PRIO_CAST, /* DO_CAST */
PRIO_TABLE_DATA, /* DO_TABLE_DATA */
PRIO_SEQUENCE_SET, /* DO_SEQUENCE_SET */
PRIO_DUMMY_TYPE, /* DO_DUMMY_TYPE */
PRIO_TSPARSER, /* DO_TSPARSER */
PRIO_TSDICT, /* DO_TSDICT */
PRIO_TSTEMPLATE, /* DO_TSTEMPLATE */
PRIO_TSCONFIG, /* DO_TSCONFIG */
PRIO_FDW, /* DO_FDW */
PRIO_FOREIGN_SERVER, /* DO_FOREIGN_SERVER */
PRIO_DEFAULT_ACL, /* DO_DEFAULT_ACL */
PRIO_TRANSFORM, /* DO_TRANSFORM */
PRIO_LARGE_OBJECT, /* DO_LARGE_OBJECT */
PRIO_LARGE_OBJECT_DATA, /* DO_LARGE_OBJECT_DATA */
PRIO_PRE_DATA_BOUNDARY, /* DO_PRE_DATA_BOUNDARY */
PRIO_POST_DATA_BOUNDARY, /* DO_POST_DATA_BOUNDARY */
PRIO_EVENT_TRIGGER, /* DO_EVENT_TRIGGER */
PRIO_REFRESH_MATVIEW, /* DO_REFRESH_MATVIEW */
PRIO_POLICY, /* DO_POLICY */
PRIO_PUBLICATION, /* DO_PUBLICATION */
PRIO_PUBLICATION_REL, /* DO_PUBLICATION_REL */
PRIO_PUBLICATION_TABLE_IN_SCHEMA, /* DO_PUBLICATION_TABLE_IN_SCHEMA */
PRIO_SUBSCRIPTION, /* DO_SUBSCRIPTION */
PRIO_SUBSCRIPTION_REL /* DO_SUBSCRIPTION_REL */
};
StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION_REL + 1),
"array length mismatch");
static DumpId preDataBoundId;
static DumpId postDataBoundId;
static int DOTypeNameCompare(const void *p1, const void *p2);
static bool TopoSort(DumpableObject **objs,
int numObjs,
DumpableObject **ordering,
int *nOrdering);
static void findDependencyLoops(DumpableObject **objs, int nObjs, int totObjs);
static int findLoop(DumpableObject *obj,
DumpId startPoint,
bool *processed,
DumpId *searchFailed,
DumpableObject **workspace,
int depth);
static void repairDependencyLoop(DumpableObject **loop,
int nLoop);
static void describeDumpableObject(DumpableObject *obj,
char *buf, int bufsize);
static int int_cmp(void *a, void *b, void *arg);
/*
* Sort the given objects into a type/name-based ordering
*
* Normally this is just the starting point for the dependency-based
* ordering.
*/
void
sortDumpableObjectsByTypeName(DumpableObject **objs, int numObjs)
{
if (numObjs > 1)
qsort(objs, numObjs, sizeof(DumpableObject *),
DOTypeNameCompare);
}
static int
DOTypeNameCompare(const void *p1, const void *p2)
{
DumpableObject *obj1 = *(DumpableObject *const *) p1;
DumpableObject *obj2 = *(DumpableObject *const *) p2;
int cmpval;
/* Sort by type's priority */
cmpval = dbObjectTypePriority[obj1->objType] -
dbObjectTypePriority[obj2->objType];
if (cmpval != 0)
return cmpval;
/*
* Sort by namespace. Typically, all objects of the same priority would
* either have or not have a namespace link, but there are exceptions.
* Sort NULL namespace after non-NULL in such cases.
*/
if (obj1->namespace)
{
if (obj2->namespace)
{
cmpval = strcmp(obj1->namespace->dobj.name,
obj2->namespace->dobj.name);
if (cmpval != 0)
return cmpval;
}
else
return -1;
}
else if (obj2->namespace)
return 1;
/* Sort by name */
cmpval = strcmp(obj1->name, obj2->name);
if (cmpval != 0)
return cmpval;
/* To have a stable sort order, break ties for some object types */
if (obj1->objType == DO_FUNC || obj1->objType == DO_AGG)
{
FuncInfo *fobj1 = *(FuncInfo *const *) p1;
FuncInfo *fobj2 = *(FuncInfo *const *) p2;
int i;
/* Sort by number of arguments, then argument type names */
cmpval = fobj1->nargs - fobj2->nargs;
if (cmpval != 0)
return cmpval;
for (i = 0; i < fobj1->nargs; i++)
{
TypeInfo *argtype1 = findTypeByOid(fobj1->argtypes[i]);
TypeInfo *argtype2 = findTypeByOid(fobj2->argtypes[i]);
if (argtype1 && argtype2)
{
if (argtype1->dobj.namespace && argtype2->dobj.namespace)
{
cmpval = strcmp(argtype1->dobj.namespace->dobj.name,
argtype2->dobj.namespace->dobj.name);
if (cmpval != 0)
return cmpval;
}
cmpval = strcmp(argtype1->dobj.name, argtype2->dobj.name);
if (cmpval != 0)
return cmpval;
}
}
}
else if (obj1->objType == DO_OPERATOR)
{
OprInfo *oobj1 = *(OprInfo *const *) p1;
OprInfo *oobj2 = *(OprInfo *const *) p2;
/* oprkind is 'l', 'r', or 'b'; descending char order sorts postfix, prefix, infix */
cmpval = (oobj2->oprkind - oobj1->oprkind);
if (cmpval != 0)
return cmpval;
}
else if (obj1->objType == DO_ATTRDEF)
{
AttrDefInfo *adobj1 = *(AttrDefInfo *const *) p1;
AttrDefInfo *adobj2 = *(AttrDefInfo *const *) p2;
/* Sort by attribute number */
cmpval = (adobj1->adnum - adobj2->adnum);
if (cmpval != 0)
return cmpval;
}
else if (obj1->objType == DO_POLICY)
{
PolicyInfo *pobj1 = *(PolicyInfo *const *) p1;
PolicyInfo *pobj2 = *(PolicyInfo *const *) p2;
/* Sort by table name (table namespace was considered already) */
cmpval = strcmp(pobj1->poltable->dobj.name,
pobj2->poltable->dobj.name);
if (cmpval != 0)
return cmpval;
}
else if (obj1->objType == DO_TRIGGER)
{
TriggerInfo *tobj1 = *(TriggerInfo *const *) p1;
TriggerInfo *tobj2 = *(TriggerInfo *const *) p2;
/* Sort by table name (table namespace was considered already) */
cmpval = strcmp(tobj1->tgtable->dobj.name,
tobj2->tgtable->dobj.name);
if (cmpval != 0)
return cmpval;
}
/* Usually shouldn't get here, but if we do, sort by OID */
return oidcmp(obj1->catId.oid, obj2->catId.oid);
}
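/*
 * Editorial example (a sketch; the objects named are hypothetical): two
 * overloads foo(int4) and foo(text) in the same schema tie on type
 * priority, namespace, and name, so the argument tie-breaker in
 * DOTypeNameCompare() decides: the argument counts are equal, and then
 * strcmp() of the argument type names puts foo(int4) first because
 * "int4" sorts before "text".
 */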
/*
* Sort the given objects into a safe dump order using dependency
* information (to the extent we have it available).
*
* The DumpIds of the PRE_DATA_BOUNDARY and POST_DATA_BOUNDARY objects are
* passed in separately, in case we need them during dependency loop repair.
*/
void
sortDumpableObjects(DumpableObject **objs, int numObjs,
DumpId preBoundaryId, DumpId postBoundaryId)
{
DumpableObject **ordering;
int nOrdering;
if (numObjs <= 0) /* can't happen anymore ... */
return;
/*
* Saving the boundary IDs in static variables is a bit grotty, but seems
* better than adding them to parameter lists of subsidiary functions.
*/
preDataBoundId = preBoundaryId;
postDataBoundId = postBoundaryId;
ordering = (DumpableObject **) pg_malloc(numObjs * sizeof(DumpableObject *));
while (!TopoSort(objs, numObjs, ordering, &nOrdering))
findDependencyLoops(ordering, nOrdering, numObjs);
memcpy(objs, ordering, numObjs * sizeof(DumpableObject *));
free(ordering);
}
/*
* TopoSort -- topological sort of a dump list
*
* Generate a re-ordering of the dump list that satisfies all the dependency
* constraints shown in the dump list. (Each such constraint is a fact of a
* partial ordering.) Minimize rearrangement of the list not needed to
* achieve the partial ordering.
*
* The input is the list of numObjs objects in objs[]. This list is not
* modified.
*
* Returns true if able to build an ordering that satisfies all the
* constraints, false if not (there are contradictory constraints).
*
* On success (true result), ordering[] is filled with a sorted array of
* DumpableObject pointers, of length equal to the input list length.
*
* On failure (false result), ordering[] is filled with an unsorted array of
* DumpableObject pointers of length *nOrdering, listing the objects that
* prevented the sort from being completed. In general, these objects either
* participate directly in a dependency cycle, or are depended on by objects
* that are in a cycle. (The latter objects are not actually problematic,
* but it takes further analysis to identify which are which.)
*
* The caller is responsible for allocating sufficient space at *ordering.
*/
static bool
TopoSort(DumpableObject **objs,
int numObjs,
DumpableObject **ordering, /* output argument */
int *nOrdering) /* output argument */
{
DumpId maxDumpId = getMaxDumpId();
binaryheap *pendingHeap;
int *beforeConstraints;
int *idMap;
DumpableObject *obj;
int i,
j,
k;
/*
* This is basically the same algorithm shown for topological sorting in
* Knuth's Volume 1. However, we would like to minimize unnecessary
* rearrangement of the input ordering; that is, when we have a choice of
* which item to output next, we always want to take the one highest in
* the original list. Therefore, instead of maintaining an unordered
* linked list of items-ready-to-output as Knuth does, we maintain a heap
* of their item numbers, which we can use as a priority queue. This
* turns the algorithm from O(N) to O(N log N) because each insertion or
* removal of a heap item takes O(log N) time. However, that's still
* plenty fast enough for this application.
*/
*nOrdering = numObjs; /* for success return */
/* Eliminate the null case */
if (numObjs <= 0)
return true;
/* Create workspace for the above-described heap */
pendingHeap = binaryheap_allocate(numObjs, int_cmp, NULL);
/*
* Scan the constraints, and for each item in the input, generate a count
* of the number of constraints that say it must be before something else.
* The count for the item with dumpId j is stored in beforeConstraints[j].
* We also make a map showing the input-order index of the item with
* dumpId j.
*/
beforeConstraints = (int *) pg_malloc0((maxDumpId + 1) * sizeof(int));
idMap = (int *) pg_malloc((maxDumpId + 1) * sizeof(int));
for (i = 0; i < numObjs; i++)
{
obj = objs[i];
j = obj->dumpId;
if (j <= 0 || j > maxDumpId)
pg_fatal("invalid dumpId %d", j);
idMap[j] = i;
for (j = 0; j < obj->nDeps; j++)
{
k = obj->dependencies[j];
if (k <= 0 || k > maxDumpId)
pg_fatal("invalid dependency %d", k);
beforeConstraints[k]++;
}
}
/*
* Now initialize the heap of items-ready-to-output by filling it with the
* indexes of items that already have beforeConstraints[id] == 0.
*
* We enter the indexes into pendingHeap in decreasing order so that the
* heap invariant is satisfied at the completion of this loop. This
* reduces the amount of work that binaryheap_build() must do.
*/
for (i = numObjs; --i >= 0;)
{
if (beforeConstraints[objs[i]->dumpId] == 0)
binaryheap_add_unordered(pendingHeap, (void *) (intptr_t) i);
}
binaryheap_build(pendingHeap);
/*--------------------
* Now emit objects, working backwards in the output list. At each step,
* we use the priority heap to select the last item that has no remaining
* before-constraints. We remove that item from the heap, output it to
* ordering[], and decrease the beforeConstraints count of each of the
* items it was constrained against. Whenever an item's beforeConstraints
* count is thereby decreased to zero, we insert it into the priority heap
* to show that it is a candidate to output. We are done when the heap
* becomes empty; if we have output every element then we succeeded,
* otherwise we failed.
* i = number of ordering[] entries left to output
* j = objs[] index of item we are outputting
* k = temp for scanning constraint list for item j
*--------------------
*/
i = numObjs;
while (!binaryheap_empty(pendingHeap))
{
/* Select object to output by removing largest heap member */
j = (int) (intptr_t) binaryheap_remove_first(pendingHeap);
obj = objs[j];
/* Output candidate to ordering[] */
ordering[--i] = obj;
/* Update beforeConstraints counts of its predecessors */
for (k = 0; k < obj->nDeps; k++)
{
int id = obj->dependencies[k];
if ((--beforeConstraints[id]) == 0)
binaryheap_add(pendingHeap, (void *) (intptr_t) idMap[id]);
}
}
/*
* If we failed, report the objects that couldn't be output; these are the
* ones with beforeConstraints[] still nonzero.
*/
if (i != 0)
{
k = 0;
for (j = 1; j <= maxDumpId; j++)
{
if (beforeConstraints[j] != 0)
ordering[k++] = objs[idMap[j]];
}
*nOrdering = k;
}
/* Done */
binaryheap_free(pendingHeap);
free(beforeConstraints);
free(idMap);
return (i == 0);
}
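/*
 * Worked example (a sketch; objects A, B, C are hypothetical): with
 * objs[] = {A, B, C}, dumpIds 1, 2, 3, and A dependent on C, the scan in
 * TopoSort() sets beforeConstraints[3] = 1, so only A and B start in the
 * pending heap.  Working backwards and always taking the largest ready
 * index, we place B last, then A (whose output releases C), then C,
 * yielding the order C, A, B: the least rearrangement of the input that
 * restores C before A.
 */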
/*
* findDependencyLoops - identify loops in TopoSort's failure output,
* and pass each such loop to repairDependencyLoop() for action
*
* In general there may be many loops in the set of objects returned by
* TopoSort; for speed we should try to repair as many loops as we can
* before trying TopoSort again. We can safely repair loops that are
* disjoint (have no members in common); if we find overlapping loops
* then we repair only the first one found, because the action taken to
* repair the first might have repaired the other as well. (If not,
* we'll fix it on the next go-round.)
*
* objs[] lists the objects TopoSort couldn't sort
* nObjs is the number of such objects
* totObjs is the total number of objects in the universe
*/
static void
findDependencyLoops(DumpableObject **objs, int nObjs, int totObjs)
{
/*
* We use three data structures here:
*
* processed[] is a bool array indexed by dump ID, marking the objects
* already processed during this invocation of findDependencyLoops().
*
* searchFailed[] is another array indexed by dump ID. searchFailed[j] is
* set to dump ID k if we have proven that there is no dependency path
* leading from object j back to start point k. This allows us to skip
* useless searching when there are multiple dependency paths from k to j,
* which is a common situation. We could use a simple bool array for
* this, but then we'd need to re-zero it for each start point, resulting
* in O(N^2) zeroing work. Using the start point's dump ID as the "true"
* value lets us skip clearing the array before we consider the next start
* point.
*
* workspace[] is an array of DumpableObject pointers, in which we try to
* build lists of objects constituting loops. We make workspace[] large
* enough to hold all the objects in TopoSort's output, which is huge
* overkill in most cases but could theoretically be necessary if there is
* a single dependency chain linking all the objects.
*/
bool *processed;
DumpId *searchFailed;
DumpableObject **workspace;
bool fixedloop;
int i;
processed = (bool *) pg_malloc0((getMaxDumpId() + 1) * sizeof(bool));
searchFailed = (DumpId *) pg_malloc0((getMaxDumpId() + 1) * sizeof(DumpId));
workspace = (DumpableObject **) pg_malloc(totObjs * sizeof(DumpableObject *));
fixedloop = false;
for (i = 0; i < nObjs; i++)
{
DumpableObject *obj = objs[i];
int looplen;
int j;
looplen = findLoop(obj,
obj->dumpId,
processed,
searchFailed,
workspace,
0);
if (looplen > 0)
{
/* Found a loop, repair it */
repairDependencyLoop(workspace, looplen);
fixedloop = true;
/* Mark loop members as processed */
for (j = 0; j < looplen; j++)
processed[workspace[j]->dumpId] = true;
}
else
{
/*
* There's no loop starting at this object, but mark it processed
* anyway. This is not necessary for correctness, but saves later
* invocations of findLoop() from uselessly chasing references to
* such an object.
*/
processed[obj->dumpId] = true;
}
}
/* We'd better have fixed at least one loop */
if (!fixedloop)
pg_fatal("could not identify dependency loop");
free(workspace);
free(searchFailed);
free(processed);
}
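/*
 * Example (a sketch): if TopoSort() reports objects A, B, C, D where
 * A<->B and C<->D form two disjoint cycles, the loop above repairs both
 * in one invocation.  If instead the cycles were A->B->A and A->C->A,
 * repairing the first marks A (and B) processed, so findLoop() rejects
 * the second this time around; if it still exists, the next
 * TopoSort()/findDependencyLoops() round catches it.
 */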
/*
* Recursively search for a circular dependency loop that doesn't include
* any already-processed objects.
*
* obj: object we are examining now
* startPoint: dumpId of starting object for the hoped-for circular loop
* processed[]: flag array marking already-processed objects
* searchFailed[]: flag array marking already-unsuccessfully-visited objects
* workspace[]: work array in which we are building list of loop members
* depth: number of valid entries in workspace[] at call
*
* On success, the length of the loop is returned, and workspace[] is filled
* with pointers to the members of the loop. On failure, we return 0.
*
* Note: it is possible that the given starting object is a member of more
* than one cycle; if so, we will find an arbitrary one of the cycles.
*/
static int
findLoop(DumpableObject *obj,
DumpId startPoint,
bool *processed,
DumpId *searchFailed,
DumpableObject **workspace,
int depth)
{
int i;
/*
* Reject if obj is already processed. This test prevents us from finding
* loops that overlap previously-processed loops.
*/
if (processed[obj->dumpId])
return 0;
/*
* If we've already proven there is no path from this object back to the
* startPoint, forget it.
*/
if (searchFailed[obj->dumpId] == startPoint)
return 0;
/*
* Reject if obj is already present in workspace. This test prevents us
* from going into infinite recursion if we are given a startPoint object
* that links to a cycle it's not a member of, and it guarantees that we
* can't overflow the allocated size of workspace[].
*/
for (i = 0; i < depth; i++)
{
if (workspace[i] == obj)
return 0;
}
/*
* Okay, tentatively add obj to workspace
*/
workspace[depth++] = obj;
/*
* See if we've found a loop back to the desired startPoint; if so, done
*/
for (i = 0; i < obj->nDeps; i++)
{
if (obj->dependencies[i] == startPoint)
return depth;
}
/*
* Recurse down each outgoing branch
*/
for (i = 0; i < obj->nDeps; i++)
{
DumpableObject *nextobj = findObjectByDumpId(obj->dependencies[i]);
int newDepth;
if (!nextobj)
continue; /* ignore dependencies on undumped objects */
newDepth = findLoop(nextobj,
startPoint,
processed,
searchFailed,
workspace,
depth);
if (newDepth > 0)
return newDepth;
}
/*
* Remember there is no path from here back to startPoint
*/
searchFailed[obj->dumpId] = startPoint;
return 0;
}
/*
* A user-defined datatype will have a dependency loop with each of its
* I/O functions (since those have the datatype as input or output).
* Similarly, a range type will have a loop with its canonicalize function,
* if any. Break the loop by making the function depend on the associated
* shell type, instead.
*/
static void
repairTypeFuncLoop(DumpableObject *typeobj, DumpableObject *funcobj)
{
TypeInfo *typeInfo = (TypeInfo *) typeobj;
/* remove function's dependency on type */
removeObjectDependency(funcobj, typeobj->dumpId);
/* add function's dependency on shell type, instead */
if (typeInfo->shellType)
{
addObjectDependency(funcobj, typeInfo->shellType->dobj.dumpId);
/*
* Mark shell type (always including the definition, as we need the
* shell type defined to identify the function fully) as to be dumped
* if any such function is
*/
if (funcobj->dump)
typeInfo->shellType->dobj.dump = funcobj->dump |
DUMP_COMPONENT_DEFINITION;
}
}
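/*
 * Concrete case (a sketch; names are hypothetical): a base type t has an
 * input function t_in(cstring) returning t, so t depends on t_in while
 * t_in's signature depends on t.  Redirecting t_in's dependency to the
 * shell type breaks the cycle, and the dump can emit the shell
 * CREATE TYPE t, then t_in, then the full type definition.
 */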
/*
* Because we force a view to depend on its ON SELECT rule, while there
* will be an implicit dependency in the other direction, we need to break
* the loop. If there are no other objects in the loop then we can remove
* the implicit dependency and leave the ON SELECT rule non-separate.
* This applies to matviews, as well.
*/
static void
repairViewRuleLoop(DumpableObject *viewobj,
DumpableObject *ruleobj)
{
/* remove rule's dependency on view */
removeObjectDependency(ruleobj, viewobj->dumpId);
/* flags on the two objects are already set correctly for this case */
}
/*
* However, if there are other objects in the loop, we must break the loop
* by making the ON SELECT rule a separately-dumped object.
*
* Because findLoop() finds shorter cycles before longer ones, it's likely
* that we will have previously fired repairViewRuleLoop() and removed the
* rule's dependency on the view. Put it back to ensure the rule won't be
* emitted before the view.
*
* Note: this approach does *not* work for matviews, at the moment.
*/
static void
repairViewRuleMultiLoop(DumpableObject *viewobj,
DumpableObject *ruleobj)
{
TableInfo *viewinfo = (TableInfo *) viewobj;
RuleInfo *ruleinfo = (RuleInfo *) ruleobj;
/* remove view's dependency on rule */
removeObjectDependency(viewobj, ruleobj->dumpId);
/* mark view to be printed with a dummy definition */
viewinfo->dummy_view = true;
/* mark rule as needing its own dump */
ruleinfo->separate = true;
/* put back rule's dependency on view */
addObjectDependency(ruleobj, viewobj->dumpId);
/* now that rule is separate, it must be post-data */
addObjectDependency(ruleobj, postDataBoundId);
}
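/*
 * Illustrative sketch (assumed, not taken from the sources): one way such
 * a longer loop arises is a view whose GROUP BY leans on a primary key
 * for functional dependency:
 *
 *    CREATE TABLE t (a int PRIMARY KEY, b int);
 *    CREATE VIEW v AS SELECT a, b FROM t GROUP BY a;
 *
 * v's ON SELECT rule then depends on t's primary key, which is a
 * post-data object, and the section boundary objects lead back to v.
 * Splitting the rule out lets v be created with a dummy definition in
 * pre-data while its real query is installed in post-data.
 */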
/*
* If a matview is involved in a multi-object loop, we can't currently fix
* that by splitting off the rule. As a stopgap, we try to fix it by
* dropping the constraint that the matview be dumped in the pre-data section.
* This is sufficient to handle cases where a matview depends on some unique
* index, as can happen if it has a GROUP BY, for example.
*
* Note that the "next object" is not necessarily the matview itself;
* it could be the matview's rowtype, for example. We may come through here
* several times while removing all the pre-data linkages. In particular,
* if there are other matviews that depend on the one with the circularity
* problem, we'll come through here for each such matview and mark them all
* as postponed. (This works because all MVs have pre-data dependencies
* to begin with, so each of them will get visited.)
*/
static void
repairMatViewBoundaryMultiLoop(DumpableObject *boundaryobj,
DumpableObject *nextobj)
{
/* remove boundary's dependency on object after it in loop */
removeObjectDependency(boundaryobj, nextobj->dumpId);
/* if that object is a matview, mark it as postponed into post-data */
if (nextobj->objType == DO_TABLE)
{
TableInfo *nextinfo = (TableInfo *) nextobj;
if (nextinfo->relkind == RELKIND_MATVIEW)
nextinfo->postponed_def = true;
}
}
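/*
 * A sketch of the scenario described above (assumed setup):
 *
 *    CREATE TABLE t (a int PRIMARY KEY, b int);
 *    CREATE MATERIALIZED VIEW mv AS SELECT a, b FROM t GROUP BY a;
 *
 * mv's query depends on t's primary key, a post-data object, while mv
 * itself would normally be pre-data; setting postponed_def moves mv's
 * CREATE past the post-data boundary, where the index already exists.
 */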
/*
* If a function is involved in a multi-object loop, we can't currently fix
* that by splitting it into two DumpableObjects. As a stopgap, we try to fix
* it by dropping the constraint that the function be dumped in the pre-data
* section. This is sufficient to handle cases where a function depends on
* some unique index, as can happen if its body has a GROUP BY, for example.
*/
static void
repairFunctionBoundaryMultiLoop(DumpableObject *boundaryobj,
DumpableObject *nextobj)
{
/* remove boundary's dependency on object after it in loop */
removeObjectDependency(boundaryobj, nextobj->dumpId);
/* if that object is a function, mark it as postponed into post-data */
if (nextobj->objType == DO_FUNC)
{
FuncInfo *nextinfo = (FuncInfo *) nextobj;
nextinfo->postponed_def = true;
}
}
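/*
 * A sketch of how a function lands in such a loop (assumed; this needs a
 * new-style SQL function, since only those record body dependencies):
 *
 *    CREATE TABLE t (a int PRIMARY KEY, b int);
 *    CREATE FUNCTION f() RETURNS SETOF int LANGUAGE sql
 *        BEGIN ATOMIC
 *            SELECT b FROM t GROUP BY a;
 *        END;
 *
 * f depends on t's primary key (post-data), so postponing f's definition
 * past the post-data boundary resolves the cycle.
 */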
/*
* Because we make tables depend on their CHECK constraints, while there
* will be an automatic dependency in the other direction, we need to break
* the loop. If there are no other objects in the loop then we can remove
* the automatic dependency and leave the CHECK constraint non-separate.
*/
static void
repairTableConstraintLoop(DumpableObject *tableobj,
DumpableObject *constraintobj)
{
/* remove constraint's dependency on table */
removeObjectDependency(constraintobj, tableobj->dumpId);
}
/*
* However, if there are other objects in the loop, we must break the loop
* by making the CHECK constraint a separately-dumped object.
*
* Because findLoop() finds shorter cycles before longer ones, it's likely
* that we will have previously fired repairTableConstraintLoop() and
* removed the constraint's dependency on the table. Put it back to ensure
* the constraint won't be emitted before the table...
*/
static void
repairTableConstraintMultiLoop(DumpableObject *tableobj,
DumpableObject *constraintobj)
{
/* remove table's dependency on constraint */
removeObjectDependency(tableobj, constraintobj->dumpId);
/* mark constraint as needing its own dump */
((ConstraintInfo *) constraintobj)->separate = true;
/* put back constraint's dependency on table */
addObjectDependency(constraintobj, tableobj->dumpId);
/* now that constraint is separate, it must be post-data */
addObjectDependency(constraintobj, postDataBoundId);
}
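/*
 * Hypothetical example of a CHECK constraint that must be split out (the
 * names are illustrative only):
 *
 *    CREATE TABLE t (a int);
 *    CREATE FUNCTION t_ok(int) RETURNS bool LANGUAGE sql
 *        BEGIN ATOMIC
 *            SELECT $1 <= (SELECT count(*) FROM t);
 *        END;
 *    ALTER TABLE t ADD CONSTRAINT c CHECK (t_ok(a));
 *
 * The loop is t -> c (tables depend on their CHECK constraints),
 * c -> t_ok, t_ok -> t; dumping c separately, after both t and t_ok,
 * breaks it.
 */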
/*
* Attribute defaults behave exactly the same as CHECK constraints...
*/
static void
repairTableAttrDefLoop(DumpableObject *tableobj,
DumpableObject *attrdefobj)
{
/* remove attrdef's dependency on table */
removeObjectDependency(attrdefobj, tableobj->dumpId);
}
static void
repairTableAttrDefMultiLoop(DumpableObject *tableobj,
DumpableObject *attrdefobj)
{
/* remove table's dependency on attrdef */
removeObjectDependency(tableobj, attrdefobj->dumpId);
/* mark attrdef as needing its own dump */
((AttrDefInfo *) attrdefobj)->separate = true;
/* put back attrdef's dependency on table */
addObjectDependency(attrdefobj, tableobj->dumpId);
}
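/*
 * Hypothetical attrdef analogue (illustrative names): a column DEFAULT
 * calling a function that reads the table itself:
 *
 *    CREATE TABLE t (a int, b int);
 *    CREATE FUNCTION next_a() RETURNS int LANGUAGE sql
 *        BEGIN ATOMIC
 *            SELECT coalesce(max(a), 0) + 1 FROM t;
 *        END;
 *    ALTER TABLE t ALTER COLUMN a SET DEFAULT next_a();
 *
 * Marking the attrdef separate makes pg_dump emit the DEFAULT as a later
 * ALTER TABLE, once both t and next_a() exist.
 */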
/*
* CHECK constraints on domains work just like those on tables ...
*/
static void
repairDomainConstraintLoop(DumpableObject *domainobj,
DumpableObject *constraintobj)
{
/* remove constraint's dependency on domain */
removeObjectDependency(constraintobj, domainobj->dumpId);
}
static void
repairDomainConstraintMultiLoop(DumpableObject *domainobj,
DumpableObject *constraintobj)
{
/* remove domain's dependency on constraint */
removeObjectDependency(domainobj, constraintobj->dumpId);
/* mark constraint as needing its own dump */
((ConstraintInfo *) constraintobj)->separate = true;
/* put back constraint's dependency on domain */
addObjectDependency(constraintobj, domainobj->dumpId);
/* now that constraint is separate, it must be post-data */
addObjectDependency(constraintobj, postDataBoundId);
}
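/*
 * Hypothetical domain analogue (illustrative names): a domain constraint
 * that validates values against a table whose column is of the domain
 * type:
 *
 *    CREATE DOMAIN color AS text;
 *    CREATE TABLE allowed_colors (c color);
 *    CREATE FUNCTION color_ok(text) RETURNS bool LANGUAGE sql
 *        BEGIN ATOMIC
 *            SELECT EXISTS (SELECT 1 FROM allowed_colors WHERE c = $1);
 *        END;
 *    ALTER DOMAIN color ADD CONSTRAINT c_ok CHECK (color_ok(VALUE));
 *
 * The loop runs color -> c_ok -> color_ok -> allowed_colors -> color;
 * emitting c_ok as a separate post-data ALTER DOMAIN breaks it.
 */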
/*
 * An index on a partitioned table and the matching index on one of its
 * partitions are linked to each other both through the catalogs and
 * through an ordering dependency added by pg_dump itself, which can
 * surface as a two-object loop.  Break it by dropping one of the two
 * links; the separate INDEX ATTACH object still ensures the ATTACH is
 * restored only after both indexes exist.
 */
static void
repairIndexLoop(DumpableObject *partedindex,
DumpableObject *partindex)
{
removeObjectDependency(partedindex, partindex->dumpId);
}
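/*
 * Sketch of the situation (assumed setup):
 *
 *    CREATE TABLE p (a int) PARTITION BY RANGE (a);
 *    CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);
 *    CREATE INDEX ON p (a);
 *
 * The CREATE INDEX on p also creates an index on p1 that the catalogs
 * link to the parent index; together with the ordering dependency
 * pg_dump adds between the two indexes, that closes a two-object loop.
 */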
/*
* Fix a dependency loop, or die trying ...
*
* This routine is mainly concerned with reducing the multiple ways that
* a loop might appear to common cases, which it passes off to the
* "fixer" routines above.
*/
static void
repairDependencyLoop(DumpableObject **loop,
int nLoop)
{
int i,
j;
/* Datatype and one of its I/O or canonicalize functions */
if (nLoop == 2 &&
loop[0]->objType == DO_TYPE &&
loop[1]->objType == DO_FUNC)
{
repairTypeFuncLoop(loop[0], loop[1]);
return;
}
if (nLoop == 2 &&
loop[1]->objType == DO_TYPE &&
loop[0]->objType == DO_FUNC)
{
repairTypeFuncLoop(loop[1], loop[0]);
return;
}
/* View (including matview) and its ON SELECT rule */
if (nLoop == 2 &&
loop[0]->objType == DO_TABLE &&
loop[1]->objType == DO_RULE &&
(((TableInfo *) loop[0])->relkind == RELKIND_VIEW ||
((TableInfo *) loop[0])->relkind == RELKIND_MATVIEW) &&
((RuleInfo *) loop[1])->ev_type == '1' &&
((RuleInfo *) loop[1])->is_instead &&
((RuleInfo *) loop[1])->ruletable == (TableInfo *) loop[0])
{
repairViewRuleLoop(loop[0], loop[1]);
return;
}
if (nLoop == 2 &&
loop[1]->objType == DO_TABLE &&
loop[0]->objType == DO_RULE &&
(((TableInfo *) loop[1])->relkind == RELKIND_VIEW ||
((TableInfo *) loop[1])->relkind == RELKIND_MATVIEW) &&
((RuleInfo *) loop[0])->ev_type == '1' &&
((RuleInfo *) loop[0])->is_instead &&
((RuleInfo *) loop[0])->ruletable == (TableInfo *) loop[1])
{
repairViewRuleLoop(loop[1], loop[0]);
return;
}
/* Indirect loop involving view (but not matview) and ON SELECT rule */
if (nLoop > 2)
{
for (i = 0; i < nLoop; i++)
{
if (loop[i]->objType == DO_TABLE &&
((TableInfo *) loop[i])->relkind == RELKIND_VIEW)
{
for (j = 0; j < nLoop; j++)
{
if (loop[j]->objType == DO_RULE &&
((RuleInfo *) loop[j])->ev_type == '1' &&
((RuleInfo *) loop[j])->is_instead &&
((RuleInfo *) loop[j])->ruletable == (TableInfo *) loop[i])
{
repairViewRuleMultiLoop(loop[i], loop[j]);
return;
}
}
}
}
}
/* Indirect loop involving matview and data boundary */
if (nLoop > 2)
{
for (i = 0; i < nLoop; i++)
{
if (loop[i]->objType == DO_TABLE &&
((TableInfo *) loop[i])->relkind == RELKIND_MATVIEW)
{
for (j = 0; j < nLoop; j++)
{
if (loop[j]->objType == DO_PRE_DATA_BOUNDARY)
{
DumpableObject *nextobj;
nextobj = (j < nLoop - 1) ? loop[j + 1] : loop[0];
repairMatViewBoundaryMultiLoop(loop[j], nextobj);
return;
}
}
}
}
}
/* Indirect loop involving function and data boundary */
if (nLoop > 2)
{
for (i = 0; i < nLoop; i++)
{
if (loop[i]->objType == DO_FUNC)
{
for (j = 0; j < nLoop; j++)
{
if (loop[j]->objType == DO_PRE_DATA_BOUNDARY)
{
DumpableObject *nextobj;
nextobj = (j < nLoop - 1) ? loop[j + 1] : loop[0];
repairFunctionBoundaryMultiLoop(loop[j], nextobj);
return;
}
}
}
}
}
/* Table and CHECK constraint */
if (nLoop == 2 &&
loop[0]->objType == DO_TABLE &&
loop[1]->objType == DO_CONSTRAINT &&
((ConstraintInfo *) loop[1])->contype == 'c' &&
((ConstraintInfo *) loop[1])->contable == (TableInfo *) loop[0])
{
repairTableConstraintLoop(loop[0], loop[1]);
return;
}
if (nLoop == 2 &&
loop[1]->objType == DO_TABLE &&
loop[0]->objType == DO_CONSTRAINT &&
((ConstraintInfo *) loop[0])->contype == 'c' &&
((ConstraintInfo *) loop[0])->contable == (TableInfo *) loop[1])
{
repairTableConstraintLoop(loop[1], loop[0]);
return;
}
/* Indirect loop involving table and CHECK constraint */
if (nLoop > 2)
{
for (i = 0; i < nLoop; i++)
{
if (loop[i]->objType == DO_TABLE)
{
for (j = 0; j < nLoop; j++)
{
if (loop[j]->objType == DO_CONSTRAINT &&
((ConstraintInfo *) loop[j])->contype == 'c' &&
((ConstraintInfo *) loop[j])->contable == (TableInfo *) loop[i])
{
repairTableConstraintMultiLoop(loop[i], loop[j]);
return;
}
}
}
}
}
/* Table and attribute default */
if (nLoop == 2 &&
loop[0]->objType == DO_TABLE &&
loop[1]->objType == DO_ATTRDEF &&
((AttrDefInfo *) loop[1])->adtable == (TableInfo *) loop[0])
{
repairTableAttrDefLoop(loop[0], loop[1]);
return;
}
if (nLoop == 2 &&
loop[1]->objType == DO_TABLE &&
loop[0]->objType == DO_ATTRDEF &&
((AttrDefInfo *) loop[0])->adtable == (TableInfo *) loop[1])
{
repairTableAttrDefLoop(loop[1], loop[0]);
return;
}
/* Index on partitioned table and corresponding index on partition */
if (nLoop == 2 &&
loop[0]->objType == DO_INDEX &&
loop[1]->objType == DO_INDEX)
{
if (((IndxInfo *) loop[0])->parentidx == loop[1]->catId.oid)
{
repairIndexLoop(loop[0], loop[1]);
return;
}
else if (((IndxInfo *) loop[1])->parentidx == loop[0]->catId.oid)
{
repairIndexLoop(loop[1], loop[0]);
return;
}
}
/* Indirect loop involving table and attribute default */
if (nLoop > 2)
{
for (i = 0; i < nLoop; i++)
{
if (loop[i]->objType == DO_TABLE)
{
for (j = 0; j < nLoop; j++)
{
if (loop[j]->objType == DO_ATTRDEF &&
((AttrDefInfo *) loop[j])->adtable == (TableInfo *) loop[i])
{
repairTableAttrDefMultiLoop(loop[i], loop[j]);
return;
}
}
}
}
}
/* Domain and CHECK constraint */
if (nLoop == 2 &&
loop[0]->objType == DO_TYPE &&
loop[1]->objType == DO_CONSTRAINT &&
((ConstraintInfo *) loop[1])->contype == 'c' &&
((ConstraintInfo *) loop[1])->condomain == (TypeInfo *) loop[0])
{
repairDomainConstraintLoop(loop[0], loop[1]);
return;
}
if (nLoop == 2 &&
loop[1]->objType == DO_TYPE &&
loop[0]->objType == DO_CONSTRAINT &&
((ConstraintInfo *) loop[0])->contype == 'c' &&
((ConstraintInfo *) loop[0])->condomain == (TypeInfo *) loop[1])
{
repairDomainConstraintLoop(loop[1], loop[0]);
return;
}
/* Indirect loop involving domain and CHECK constraint */
if (nLoop > 2)
{
for (i = 0; i < nLoop; i++)
{
if (loop[i]->objType == DO_TYPE)
{
for (j = 0; j < nLoop; j++)
{
if (loop[j]->objType == DO_CONSTRAINT &&
((ConstraintInfo *) loop[j])->contype == 'c' &&
((ConstraintInfo *) loop[j])->condomain == (TypeInfo *) loop[i])
{
repairDomainConstraintMultiLoop(loop[i], loop[j]);
return;
}
}
}
}
}
/*
* Loop of table with itself --- just ignore it.
*
* (Actually, what this arises from is a dependency of a table column on
* another column, which happened with generated columns before v15; or a
* dependency of a table column on the whole table, which happens with
* partitioning. But we didn't pay attention to sub-object IDs while
* collecting the dependency data, so we can't see that here.)
*/
if (nLoop == 1)
{
if (loop[0]->objType == DO_TABLE)
{
removeObjectDependency(loop[0], loop[0]->dumpId);
return;
}
}
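/*
 * For instance (a sketch), before v15
 *
 *    CREATE TABLE g (a int, b int GENERATED ALWAYS AS (a + 1) STORED);
 *
 * recorded b's dependency on column a, which, with sub-object IDs
 * ignored, looks to this code like table g depending on itself.
 */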
/*
* If all the objects are TABLE_DATA items, what we must have is a
* circular set of foreign key constraints (or a single self-referential
* table). Print an appropriate complaint and break the loop arbitrarily.
*/
for (i = 0; i < nLoop; i++)
{
if (loop[i]->objType != DO_TABLE_DATA)
break;
}
if (i >= nLoop)
{
pg_log_warning(ngettext("there are circular foreign-key constraints on this table:",
"there are circular foreign-key constraints among these tables:",
nLoop));
for (i = 0; i < nLoop; i++)
pg_log_warning_detail("%s", loop[i]->name);
pg_log_warning_hint("You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.");
pg_log_warning_hint("Consider using a full dump instead of a --data-only dump to avoid this problem.");
if (nLoop > 1)
removeObjectDependency(loop[0], loop[1]->dumpId);
else /* must be a self-dependency */
removeObjectDependency(loop[0], loop[0]->dumpId);
return;
}
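/*
 * For example (a sketch), a --data-only dump of
 *
 *    CREATE TABLE a (id int PRIMARY KEY, b_id int);
 *    CREATE TABLE b (id int PRIMARY KEY, a_id int REFERENCES a);
 *    ALTER TABLE a ADD FOREIGN KEY (b_id) REFERENCES b;
 *
 * leaves the TABLE DATA items for a and b mutually dependent, so the
 * warning above fires and one of the two dependencies is discarded.
 */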
/*
* If we can't find a principled way to break the loop, complain and break
* it in an arbitrary fashion.
*/
pg_log_warning("could not resolve dependency loop among these items:");
for (i = 0; i < nLoop; i++)
{
char buf[1024];
describeDumpableObject(loop[i], buf, sizeof(buf));
pg_log_warning_detail("%s", buf);
}
if (nLoop > 1)
removeObjectDependency(loop[0], loop[1]->dumpId);
else /* must be a self-dependency */
removeObjectDependency(loop[0], loop[0]->dumpId);
}
/*
* Describe a dumpable object usefully for errors
*
* This should probably go somewhere else...
*/
static void
describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
{
switch (obj->objType)
{
case DO_NAMESPACE:
snprintf(buf, bufsize,
"SCHEMA %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_EXTENSION:
snprintf(buf, bufsize,
"EXTENSION %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TYPE:
snprintf(buf, bufsize,
"TYPE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_SHELL_TYPE:
snprintf(buf, bufsize,
"SHELL TYPE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_FUNC:
snprintf(buf, bufsize,
"FUNCTION %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_AGG:
snprintf(buf, bufsize,
"AGGREGATE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_OPERATOR:
snprintf(buf, bufsize,
"OPERATOR %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_ACCESS_METHOD:
snprintf(buf, bufsize,
"ACCESS METHOD %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_OPCLASS:
snprintf(buf, bufsize,
"OPERATOR CLASS %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_OPFAMILY:
snprintf(buf, bufsize,
"OPERATOR FAMILY %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_COLLATION:
snprintf(buf, bufsize,
"COLLATION %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_CONVERSION:
snprintf(buf, bufsize,
"CONVERSION %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TABLE:
snprintf(buf, bufsize,
"TABLE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TABLE_ATTACH:
snprintf(buf, bufsize,
"TABLE ATTACH %s (ID %d)",
obj->name, obj->dumpId);
return;
case DO_ATTRDEF:
snprintf(buf, bufsize,
"ATTRDEF %s.%s (ID %d OID %u)",
((AttrDefInfo *) obj)->adtable->dobj.name,
((AttrDefInfo *) obj)->adtable->attnames[((AttrDefInfo *) obj)->adnum - 1],
obj->dumpId, obj->catId.oid);
return;
case DO_INDEX:
snprintf(buf, bufsize,
"INDEX %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_INDEX_ATTACH:
snprintf(buf, bufsize,
"INDEX ATTACH %s (ID %d)",
obj->name, obj->dumpId);
return;
case DO_STATSEXT:
snprintf(buf, bufsize,
"STATISTICS %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_REFRESH_MATVIEW:
snprintf(buf, bufsize,
"REFRESH MATERIALIZED VIEW %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_RULE:
snprintf(buf, bufsize,
"RULE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TRIGGER:
snprintf(buf, bufsize,
"TRIGGER %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_EVENT_TRIGGER:
snprintf(buf, bufsize,
"EVENT TRIGGER %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_CONSTRAINT:
snprintf(buf, bufsize,
"CONSTRAINT %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_FK_CONSTRAINT:
snprintf(buf, bufsize,
"FK CONSTRAINT %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_PROCLANG:
snprintf(buf, bufsize,
"PROCEDURAL LANGUAGE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_CAST:
snprintf(buf, bufsize,
"CAST %u to %u (ID %d OID %u)",
((CastInfo *) obj)->castsource,
((CastInfo *) obj)->casttarget,
obj->dumpId, obj->catId.oid);
return;
case DO_TRANSFORM:
snprintf(buf, bufsize,
"TRANSFORM %u lang %u (ID %d OID %u)",
((TransformInfo *) obj)->trftype,
((TransformInfo *) obj)->trflang,
obj->dumpId, obj->catId.oid);
return;
case DO_TABLE_DATA:
snprintf(buf, bufsize,
"TABLE DATA %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_SEQUENCE_SET:
snprintf(buf, bufsize,
"SEQUENCE SET %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_DUMMY_TYPE:
snprintf(buf, bufsize,
"DUMMY TYPE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TSPARSER:
snprintf(buf, bufsize,
"TEXT SEARCH PARSER %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TSDICT:
snprintf(buf, bufsize,
"TEXT SEARCH DICTIONARY %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TSTEMPLATE:
snprintf(buf, bufsize,
"TEXT SEARCH TEMPLATE %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_TSCONFIG:
snprintf(buf, bufsize,
"TEXT SEARCH CONFIGURATION %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_FDW:
snprintf(buf, bufsize,
"FOREIGN DATA WRAPPER %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_FOREIGN_SERVER:
snprintf(buf, bufsize,
"FOREIGN SERVER %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_DEFAULT_ACL:
snprintf(buf, bufsize,
"DEFAULT ACL %s (ID %d OID %u)",
obj->name, obj->dumpId, obj->catId.oid);
return;
case DO_LARGE_OBJECT:
snprintf(buf, bufsize,
"LARGE OBJECT (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
case DO_LARGE_OBJECT_DATA:
snprintf(buf, bufsize,
"LARGE OBJECT DATA (ID %d)",
obj->dumpId);
return;
case DO_POLICY:
snprintf(buf, bufsize,
"POLICY (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
case DO_PUBLICATION:
snprintf(buf, bufsize,
"PUBLICATION (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
case DO_PUBLICATION_REL:
snprintf(buf, bufsize,
"PUBLICATION TABLE (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
case DO_PUBLICATION_TABLE_IN_SCHEMA:
snprintf(buf, bufsize,
"PUBLICATION TABLES IN SCHEMA (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
case DO_SUBSCRIPTION:
snprintf(buf, bufsize,
"SUBSCRIPTION (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
case DO_SUBSCRIPTION_REL:
snprintf(buf, bufsize,
"SUBSCRIPTION TABLE (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
case DO_PRE_DATA_BOUNDARY:
snprintf(buf, bufsize,
"PRE-DATA BOUNDARY (ID %d)",
obj->dumpId);
return;
case DO_POST_DATA_BOUNDARY:
snprintf(buf, bufsize,
"POST-DATA BOUNDARY (ID %d)",
obj->dumpId);
return;
}
/* shouldn't get here */
snprintf(buf, bufsize,
"object type %d (ID %d OID %u)",
(int) obj->objType,
obj->dumpId, obj->catId.oid);
}
/* binaryheap comparator that compares "a" and "b" as integers */
static int
int_cmp(void *a, void *b, void *arg)
{
int ai = (int) (intptr_t) a;
int bi = (int) (intptr_t) b;
return pg_cmp_s32(ai, bi);
}
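/*
 * Usage sketch (hedged; assumes the frontend lib/binaryheap.h API, as
 * used elsewhere in this file).  Integers travel through the heap as
 * pointers, hence the intptr_t casts on the way in and out:
 *
 *    binaryheap *heap = binaryheap_allocate(nObjs, int_cmp, NULL);
 *
 *    binaryheap_add_unordered(heap, (void *) (intptr_t) idx);
 *    binaryheap_build(heap);
 *
 *    while (!binaryheap_empty(heap))
 *        idx = (int) (intptr_t) binaryheap_remove_first(heap);
 *
 * Since binaryheap is a max-heap under this comparator, remove_first()
 * yields the largest remaining integer first.
 */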