postgresql/doc/src/sgml/bki.sgml

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

1080 lines
42 KiB
Plaintext
Raw Normal View History

2010-09-20 22:08:53 +02:00
<!-- doc/src/sgml/bki.sgml -->
<chapter id="bki">
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<title>System Catalog Declarations and Initial Contents</title>
<para>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<productname>PostgreSQL</productname> uses many different system catalogs
to keep track of the existence and properties of database objects, such as
tables and functions. Physically there is no difference between a system
catalog and a plain user table, but the backend C code knows the structure
and properties of each catalog, and can manipulate it directly at a low
level. Thus, for example, it is inadvisable to attempt to alter the
structure of a catalog on-the-fly; that would break assumptions built into
the C code about how rows of the catalog are laid out. But the structure
of the catalogs can change between major versions.
</para>
<para>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
The structures of the catalogs are declared in specially formatted C
header files in the <filename>src/include/catalog/</filename> directory of
the source tree. For each catalog there is a header file
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
named after the catalog (e.g., <filename>pg_class.h</filename>
for <structname>pg_class</structname>), which defines the set of columns
the catalog has, as well as some other basic properties such as its OID.
</para>
<para>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
Many of the catalogs have initial data that must be loaded into them
during the <quote>bootstrap</quote> phase
of <application>initdb</application>, to bring the system up to a point
where it is capable of executing SQL commands. (For
example, <filename>pg_class.h</filename> must contain an entry for itself,
as well as one for each other system catalog and index.) This
initial data is kept in editable form in data files that are also stored
in the <filename>src/include/catalog/</filename> directory. For example,
<filename>pg_proc.dat</filename> describes all the initial rows that must
be inserted into the <structname>pg_proc</structname> catalog.
</para>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<para>
To create the catalog files and load this initial data into them, a
backend running in bootstrap mode reads a <acronym>BKI</acronym>
(Backend Interface) file containing commands and initial data.
The <filename>postgres.bki</filename> file used in this mode is prepared
from the aforementioned header and data files, while building
a <productname>PostgreSQL</productname> distribution, by a Perl script
named <filename>genbki.pl</filename>.
Although it's specific to a particular <productname>PostgreSQL</productname>
release, <filename>postgres.bki</filename> is platform-independent and is
installed in the <filename>share</filename> subdirectory of the
installation tree.
</para>
<para>
<filename>genbki.pl</filename> also produces a derived header file for
each catalog, for example <filename>pg_class_d.h</filename> for
the <structname>pg_class</structname> catalog. This file contains
automatically-generated macro definitions, and may contain other macros,
enum declarations, and so on that can be useful for client C code that
reads a particular catalog.
</para>
<para>
2019-09-06 22:16:58 +02:00
Most PostgreSQL developers don't need to be directly concerned with
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
the <acronym>BKI</acronym> file, but almost any nontrivial feature
addition in the backend will require modifying the catalog header files
and/or initial data files. The rest of this chapter gives some
information about that, and for completeness describes
the <acronym>BKI</acronym> file format.
</para>
<sect1 id="system-catalog-declarations">
<title>System Catalog Declaration Rules</title>
<para>
The key part of a catalog header file is a C structure definition
describing the layout of each row of the catalog. This begins with
a <literal>CATALOG</literal> macro, which so far as the C compiler is
concerned is just shorthand for <literal>typedef struct
FormData_<replaceable>catalogname</replaceable></literal>.
Each field in the struct gives rise to a catalog column.
Fields can be annotated using the BKI property macros described
in <filename>genbki.h</filename>, for example to define a default value
for a field or mark it as nullable or not nullable.
The <literal>CATALOG</literal> line can also be annotated, with some
other BKI property macros described in <filename>genbki.h</filename>, to
define other properties of the catalog as a whole, such as whether
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
it is a shared relation.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
<para>
The system catalog cache code (and most catalog-munging code in general)
assumes that the fixed-length portions of all system catalog tuples are
in fact present, because it maps this C struct declaration onto them.
Thus, all variable-length fields and nullable fields must be placed at
the end, and they cannot be accessed as struct fields.
For example, if you tried to
set <structname>pg_type</structname>.<structfield>typrelid</structfield>
to be NULL, it would fail when some piece of code tried to reference
<literal>typetup-&gt;typrelid</literal> (or worse,
<literal>typetup-&gt;typelem</literal>, because that follows
<structfield>typrelid</structfield>). This would result in
random errors or even segmentation violations.
</para>
<para>
As a partial guard against this type of error, variable-length or
nullable fields should not be made directly visible to the C compiler.
This is accomplished by wrapping them in <literal>#ifdef
CATALOG_VARLEN</literal> ... <literal>#endif</literal> (where
<literal>CATALOG_VARLEN</literal> is a symbol that is never defined).
This prevents C code from carelessly trying to access fields that might
not be there or might be at some other offset.
As an independent guard against creating incorrect rows, we
require all columns that should be non-nullable to be marked so
in <structname>pg_attribute</structname>. The bootstrap code will
automatically mark catalog columns as <literal>NOT NULL</literal>
if they are fixed-width and are not preceded by any nullable or
variable-width column.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
Where this rule is inadequate, you can force correct marking by using
<literal>BKI_FORCE_NOT_NULL</literal>
and <literal>BKI_FORCE_NULL</literal> annotations as needed.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
<para>
Frontend code should not include any <filename>pg_xxx.h</filename>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
catalog header file, as these files may contain C code that won't compile
outside the backend. (Typically, that happens because these files also
contain declarations for functions
in <filename>src/backend/catalog/</filename> files.)
Instead, frontend code may include the corresponding
generated <filename>pg_xxx_d.h</filename> header, which will contain
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
OID <literal>#define</literal>s and any other data that might be of use
on the client side. If you want macros or other code in a catalog header
to be visible to frontend code, write <literal>#ifdef
EXPOSE_TO_CLIENT_CODE</literal> ... <literal>#endif</literal> around that
section to instruct <filename>genbki.pl</filename> to copy that section
to the <filename>pg_xxx_d.h</filename> header.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
<para>
A few of the catalogs are so fundamental that they can't even be created
by the <acronym>BKI</acronym> <literal>create</literal> command that's
used for most catalogs, because that command needs to write information
into these catalogs to describe the new catalog. These are
called <firstterm>bootstrap</firstterm> catalogs, and defining one takes
a lot of extra work: you have to manually prepare appropriate entries for
them in the pre-loaded contents of <structname>pg_class</structname>
and <structname>pg_type</structname>, and those entries will need to be
updated for subsequent changes to the catalog's structure.
(Bootstrap catalogs also need pre-loaded entries
in <structname>pg_attribute</structname>, but
fortunately <filename>genbki.pl</filename> handles that chore nowadays.)
Avoid making new catalogs be bootstrap catalogs if at all possible.
</para>
</sect1>
<sect1 id="system-catalog-initial-data">
<title>System Catalog Initial Data</title>
<para>
Each catalog that has any manually-created initial data (some do not)
has a corresponding <literal>.dat</literal> file that contains its
initial data in an editable format.
</para>
<sect2 id="system-catalog-initial-data-format">
<title>Data File Format</title>
<para>
Each <literal>.dat</literal> file contains Perl data structure literals
that are simply eval'd to produce an in-memory data structure consisting
of an array of hash references, one per catalog row.
A slightly modified excerpt from <filename>pg_database.dat</filename>
will demonstrate the key features:
</para>
<!-- The "slight modification" is the apostrophe in the description. -->
<programlisting><![CDATA[
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
[
# A comment could appear here.
Rethink method for assigning OIDs to the template0 and postgres DBs. Commit aa0105141 assigned fixed OIDs to template0 and postgres in a very ad-hoc way. Notably, instead of teaching Catalog.pm about these OIDs, the unused_oids script was just hacked to not show them as unused. That's problematic since, for example, duplicate_oids wouldn't report any future conflict. Hence, invent a macro DECLARE_OID_DEFINING_MACRO() that can be used to define an OID that is known to Catalog.pm and will participate in duplicate-detection as well as renumbering by renumber_oids.pl. (We don't anticipate renumbering these particular OIDs, but we might as well build out all the Catalog.pm infrastructure while we're here.) Another issue is that aa0105141 neglected to touch IsPinnedObject, with the result that it now claimed template0 and postgres are pinned. The right thing to do there seems to be to teach it that no database is pinned, since in fact DROP DATABASE doesn't check for pinned-ness (and at least for these cases, that is an intentional choice). It's not clear whether this wrong answer had any visible effect, but perhaps it could have resulted in erroneous management of dependency entries. In passing, rename the TemplateDbOid macro to Template1DbOid to reduce confusion (likely we should have done that way back when we invented template0, but we didn't), and rename the OID macros for template0 and postgres to have a similar style. There are no changes to postgres.bki here, so no need for a catversion bump. Discussion: https://postgr.es/m/2935358.1650479692@sss.pgh.pa.us
2022-04-21 22:23:12 +02:00
{ oid => '1', oid_symbol => 'Template1DbOid',
descr => 'database\'s default template',
Rethink method for assigning OIDs to the template0 and postgres DBs. Commit aa0105141 assigned fixed OIDs to template0 and postgres in a very ad-hoc way. Notably, instead of teaching Catalog.pm about these OIDs, the unused_oids script was just hacked to not show them as unused. That's problematic since, for example, duplicate_oids wouldn't report any future conflict. Hence, invent a macro DECLARE_OID_DEFINING_MACRO() that can be used to define an OID that is known to Catalog.pm and will participate in duplicate-detection as well as renumbering by renumber_oids.pl. (We don't anticipate renumbering these particular OIDs, but we might as well build out all the Catalog.pm infrastructure while we're here.) Another issue is that aa0105141 neglected to touch IsPinnedObject, with the result that it now claimed template0 and postgres are pinned. The right thing to do there seems to be to teach it that no database is pinned, since in fact DROP DATABASE doesn't check for pinned-ness (and at least for these cases, that is an intentional choice). It's not clear whether this wrong answer had any visible effect, but perhaps it could have resulted in erroneous management of dependency entries. In passing, rename the TemplateDbOid macro to Template1DbOid to reduce confusion (likely we should have done that way back when we invented template0, but we didn't), and rename the OID macros for template0 and postgres to have a similar style. There are no changes to postgres.bki here, so no need for a catversion bump. Discussion: https://postgr.es/m/2935358.1650479692@sss.pgh.pa.us
2022-04-21 22:23:12 +02:00
datname => 'template1', encoding => 'ENCODING',
datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
datallowconn => 't', datconnlimit => '-1', datfrozenxid => '0',
datminmxid => '1', dattablespace => 'pg_default', datcollate => 'LC_COLLATE',
Rethink method for assigning OIDs to the template0 and postgres DBs. Commit aa0105141 assigned fixed OIDs to template0 and postgres in a very ad-hoc way. Notably, instead of teaching Catalog.pm about these OIDs, the unused_oids script was just hacked to not show them as unused. That's problematic since, for example, duplicate_oids wouldn't report any future conflict. Hence, invent a macro DECLARE_OID_DEFINING_MACRO() that can be used to define an OID that is known to Catalog.pm and will participate in duplicate-detection as well as renumbering by renumber_oids.pl. (We don't anticipate renumbering these particular OIDs, but we might as well build out all the Catalog.pm infrastructure while we're here.) Another issue is that aa0105141 neglected to touch IsPinnedObject, with the result that it now claimed template0 and postgres are pinned. The right thing to do there seems to be to teach it that no database is pinned, since in fact DROP DATABASE doesn't check for pinned-ness (and at least for these cases, that is an intentional choice). It's not clear whether this wrong answer had any visible effect, but perhaps it could have resulted in erroneous management of dependency entries. In passing, rename the TemplateDbOid macro to Template1DbOid to reduce confusion (likely we should have done that way back when we invented template0, but we didn't), and rename the OID macros for template0 and postgres to have a similar style. There are no changes to postgres.bki here, so no need for a catversion bump. Discussion: https://postgr.es/m/2935358.1650479692@sss.pgh.pa.us
2022-04-21 22:23:12 +02:00
datctype => 'LC_CTYPE', daticulocale => 'ICU_LOCALE', datacl => '_null_' },
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
]
]]></programlisting>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<para>
Points to note:
</para>
<itemizedlist>
<listitem>
<para>
The overall file layout is: open square bracket, one or more sets of
curly braces each of which represents a catalog row, close square
bracket. Write a comma after each closing curly brace.
</para>
</listitem>
<listitem>
<para>
Within each catalog row, write comma-separated
<replaceable>key</replaceable> <literal>=&gt;</literal>
<replaceable>value</replaceable> pairs. The
allowed <replaceable>key</replaceable>s are the names of the catalog's
columns, plus the metadata keys <literal>oid</literal>,
<literal>oid_symbol</literal>,
<literal>array_type_oid</literal>, and <literal>descr</literal>.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
(The use of <literal>oid</literal> and <literal>oid_symbol</literal>
is described in <xref linkend="system-catalog-oid-assignment"/> below,
while <literal>array_type_oid</literal> is described in
<xref linkend="system-catalog-auto-array-types"/>.
<literal>descr</literal> supplies a description string for the object,
which will be inserted into <structname>pg_description</structname>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
or <structname>pg_shdescription</structname> as appropriate.)
While the metadata keys are optional, the catalog's defined columns
must all be provided, except when the catalog's <literal>.h</literal>
file specifies a default value for the column.
(In the example above, the <structfield>datdba</structfield> field has
been omitted because <filename>pg_database.h</filename> supplies a
suitable default value for it.)
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
</listitem>
<listitem>
<para>
All values must be single-quoted. Escape single quotes used within a
value with a backslash. Backslashes meant as data can, but need not,
be doubled; this follows Perl's rules for simple quoted literals.
Note that backslashes appearing as data will be treated as escapes by
the bootstrap scanner, according to the same rules as for escape string
constants (see <xref linkend="sql-syntax-strings-escape"/>); for
example <literal>\t</literal> converts to a tab character. If you
actually want a backslash in the final value, you will need to write
four of them: Perl strips two, leaving <literal>\\</literal> for the
bootstrap scanner to see.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
</listitem>
<listitem>
<para>
Null values are represented by <literal>_null_</literal>.
Fix bootstrap parser so that its keywords are unreserved words. Mark Dilger pointed out that the bootstrap parser does not allow any of its keywords to appear as column values unless they're quoted, and proposed dealing with that by quoting such values in genbki.pl. Looking closer, though, we also have that problem with respect to table, column, and type names appearing in the .bki file: the parser would fail if any of those matched any of its keywords. While so far there have been no conflicts (that I've heard of), this seems like a booby trap waiting to catch somebody. Rather than clutter genbki.pl with enough quoting logic to handle all that, let's make the bootstrap parser grow up a little bit and treat its keywords as unreserved. Experimentation shows that it's fairly easy to do so with the exception of _null_, which I don't have a big problem with keeping as a reserved word. The only change needed is that we can't have the "close" command take an optional table name: it has to either require or forbid the table name to avoid shift/reduce conflicts. genbki.pl has historically always included the table name, so I took that option. The implementation has bootscanner.l passing forward the string value of each keyword, in case bootparse.y needs that. This avoids needing to know the precise spelling of each keyword in bootparse.y, which is good because that's not always obvious from the token name. Discussion: https://postgr.es/m/3024FC91-DB6D-4732-B31C-DF772DF039A0@gmail.com
2018-05-05 22:23:07 +02:00
(Note that there is no way to create a value that is just that
string.)
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
</listitem>
<listitem>
<para>
Comments are preceded by <literal>#</literal>, and must be on their
own lines.
</para>
</listitem>
<listitem>
<para>
Field values that are OIDs of other catalog entries should be
represented by symbolic names rather than actual numeric OIDs.
(In the example above, <structfield>dattablespace</structfield>
contains such a reference.)
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
This is described in <xref linkend="system-catalog-oid-references"/>
below.
</para>
</listitem>
<listitem>
<para>
Since hashes are unordered data structures, field order and line
layout aren't semantically significant. However, to maintain a
consistent appearance, we set a few rules that are applied by the
formatting script <filename>reformat_dat_file.pl</filename>:
<itemizedlist>
<listitem>
<para>
Within each pair of curly braces, the metadata
fields <literal>oid</literal>, <literal>oid_symbol</literal>,
<literal>array_type_oid</literal>, and <literal>descr</literal>
(if present) come first, in that order, then the catalog's own
fields appear in their defined order.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
</listitem>
<listitem>
<para>
Newlines are inserted between fields as needed to limit line length
to 80 characters, if possible. A newline is also inserted between
the metadata fields and the regular fields.
</para>
</listitem>
<listitem>
<para>
If the catalog's <literal>.h</literal> file specifies a default
value for a column, and a data entry has that same
value, <filename>reformat_dat_file.pl</filename> will omit it from
the data file. This keeps the data representation compact.
</para>
</listitem>
<listitem>
<para>
<filename>reformat_dat_file.pl</filename> preserves blank lines
and comment lines as-is.
</para>
</listitem>
</itemizedlist>
It's recommended to run <filename>reformat_dat_file.pl</filename>
before submitting catalog data patches. For convenience, you can
simply change to <filename>src/include/catalog/</filename> and
run <literal>make reformat-dat-files</literal>.
</para>
</listitem>
<listitem>
<para>
If you want to add a new method of making the data representation
smaller, you must implement it
in <filename>reformat_dat_file.pl</filename> and also
teach <function>Catalog::ParseData()</function> how to expand the
data back into the full representation.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2 id="system-catalog-oid-assignment">
<title>OID Assignment</title>
<para>
A catalog row appearing in the initial data can be given a
manually-assigned OID by writing an <literal>oid
=&gt; <replaceable>nnnn</replaceable></literal> metadata field.
Furthermore, if an OID is assigned, a C macro for that OID can be
created by writing an <literal>oid_symbol
=&gt; <replaceable>name</replaceable></literal> metadata field.
</para>
<para>
Pre-loaded catalog rows must have preassigned OIDs if there are OID
references to them in other pre-loaded rows. A preassigned OID is
also needed if the row's OID must be referenced from C code.
If neither case applies, the <literal>oid</literal> metadata field can
be omitted, in which case the bootstrap code assigns an OID
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
automatically.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
In practice we usually preassign OIDs for all or none of the pre-loaded
rows in a given catalog, even if only some of them are actually
cross-referenced.
</para>
<para>
Writing the actual numeric value of any OID in C code is considered
very bad form; always use a macro, instead. Direct references
to <structname>pg_proc</structname> OIDs are common enough that there's
a special mechanism to create the necessary macros automatically;
see <filename>src/backend/utils/Gen_fmgrtab.pl</filename>. Similarly
&mdash; but, for historical reasons, not done the same way &mdash;
there's an automatic method for creating macros
for <structname>pg_type</structname>
OIDs. <literal>oid_symbol</literal> entries are therefore not
necessary in those two catalogs. Likewise, macros for
the <structname>pg_class</structname> OIDs of system catalogs and
indexes are set up automatically. For all other system catalogs, you
have to manually specify any macros you need
via <literal>oid_symbol</literal> entries.
</para>
<para>
To find an available OID for a new pre-loaded row, run the
script <filename>src/include/catalog/unused_oids</filename>.
It prints inclusive ranges of unused OIDs (e.g., the output
2019-10-25 20:23:44 +02:00
line <literal>45-900</literal> means OIDs 45 through 900 have not been
allocated yet). Currently, OIDs 1&ndash;9999 are reserved for manual
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
assignment; the <filename>unused_oids</filename> script simply looks
through the catalog headers and <filename>.dat</filename> files
to see which ones do not appear. You can also use
the <filename>duplicate_oids</filename> script to check for mistakes.
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
(<filename>genbki.pl</filename> will assign OIDs for any rows that
Create a script that can renumber manually-assigned OIDs. This commit adds a Perl script renumber_oids.pl, which can reassign a range of manually-assigned OIDs to someplace else by modifying OID fields of the catalog *.dat files and OID-assigning macros in the catalog *.h files. Up to now, we've encouraged new patches that need manually-assigned OIDs to use OIDs just above the range of existing OIDs. Predictably, this leads to patches stepping on each others' toes, as whichever one gets committed first creates an OID conflict that other patch author(s) have to resolve manually. With the availability of renumber_oids.pl, we can eliminate a lot of this hassle. The new project policy, therefore, is: * Encourage new patches to use high OIDs (the documentation suggests choosing a block of OIDs at random in 8000..9999). * After feature freeze in each development cycle, run renumber_oids.pl to move all such OIDs down to lower numbers, thus freeing the high OID range for the next development cycle. This plan should greatly reduce the risk of OID collisions between concurrently-developed patches. Also, if such a collision happens anyway, we have the option to resolve it without much effort by doing an off-schedule OID renumbering to get the first-committed patch out of the way. Or a patch author could use renumber_oids.pl to change their patch's assignments without much pain. This approach does put a premium on not hard-wiring any OID values in places where renumber_oids.pl and genbki.pl can't fix them. Project practice in that respect seems to be pretty good already, but a follow-on patch will sand down some rough edges. John Naylor and Tom Lane, per an idea of Peter Geoghegan's Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com
2019-03-12 15:50:48 +01:00
didn't get one hand-assigned to them, and it will also detect duplicate
OIDs at compile time.)
</para>
<para>
When choosing OIDs for a patch that is not expected to be committed
immediately, best practice is to use a group of more-or-less
consecutive OIDs starting with some random choice in the range
8000&mdash;9999. This minimizes the risk of OID collisions with other
patches being developed concurrently. To keep the 8000&mdash;9999
range free for development purposes, after a patch has been committed
to the master git repository its OIDs should be renumbered into
available space below that range. Typically, this will be done
near the end of each development cycle, moving all OIDs consumed by
patches committed in that cycle at the same time. The script
<filename>renumber_oids.pl</filename> can be used for this purpose.
If an uncommitted patch is found to have OID conflicts with some
recently-committed patch, <filename>renumber_oids.pl</filename> may
also be useful for recovering from that situation.
</para>
<para>
Because of this convention of possibly renumbering OIDs assigned by
patches, the OIDs assigned by a patch should not be considered stable
until the patch has been included in an official release. We do not
change manually-assigned object OIDs once released, however, as that
would create assorted compatibility problems.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
<para>
If <filename>genbki.pl</filename> needs to assign an OID to a catalog
entry that does not have a manually-assigned OID, it will use a value in
the range 10000&mdash;11999. The server's OID counter is set to 10000
at the start of a bootstrap run, so that any objects created on-the-fly
during bootstrap processing also receive OIDs in this range. (The
usual OID assignment mechanism takes care of preventing any conflicts.)
</para>
<para>
Objects with OIDs below <symbol>FirstUnpinnedObjectId</symbol> (12000)
are considered <quote>pinned</quote>, preventing them from being
deleted. (There are a small number of exceptions, which are
hard-wired into <function>IsPinnedObject()</function>.)
<application>initdb</application> forces the OID counter up
to <symbol>FirstUnpinnedObjectId</symbol> as soon as it's ready to
create unpinned objects. Thus objects created during the later phases
of <application>initdb</application>, such as objects created while
running the <filename>information_schema.sql</filename> script, will
not be pinned, while all objects known
to <filename>genbki.pl</filename> will be.
Create a script that can renumber manually-assigned OIDs. This commit adds a Perl script renumber_oids.pl, which can reassign a range of manually-assigned OIDs to someplace else by modifying OID fields of the catalog *.dat files and OID-assigning macros in the catalog *.h files. Up to now, we've encouraged new patches that need manually-assigned OIDs to use OIDs just above the range of existing OIDs. Predictably, this leads to patches stepping on each others' toes, as whichever one gets committed first creates an OID conflict that other patch author(s) have to resolve manually. With the availability of renumber_oids.pl, we can eliminate a lot of this hassle. The new project policy, therefore, is: * Encourage new patches to use high OIDs (the documentation suggests choosing a block of OIDs at random in 8000..9999). * After feature freeze in each development cycle, run renumber_oids.pl to move all such OIDs down to lower numbers, thus freeing the high OID range for the next development cycle. This plan should greatly reduce the risk of OID collisions between concurrently-developed patches. Also, if such a collision happens anyway, we have the option to resolve it without much effort by doing an off-schedule OID renumbering to get the first-committed patch out of the way. Or a patch author could use renumber_oids.pl to change their patch's assignments without much pain. This approach does put a premium on not hard-wiring any OID values in places where renumber_oids.pl and genbki.pl can't fix them. Project practice in that respect seems to be pretty good already, but a follow-on patch will sand down some rough edges. John Naylor and Tom Lane, per an idea of Peter Geoghegan's Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com
2019-03-12 15:50:48 +01:00
</para>
<para>
OIDs assigned during normal database operation are constrained to be
16384 or higher. This ensures that the range 10000&mdash;16383 is free
for OIDs assigned automatically by <filename>genbki.pl</filename> or
during <application>initdb</application>. These
automatically-assigned OIDs are not considered stable, and may change
from one installation to another.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
</sect2>
<sect2 id="system-catalog-oid-references">
<title>OID Reference Lookup</title>
<para>
In principle, cross-references from one initial catalog row to another
could be written just by writing the preassigned OID of the referenced
row in the referencing field. However, that is against project
policy, because it is error-prone, hard to read, and subject to
breakage if a newly-assigned OID is renumbered. Therefore
<filename>genbki.pl</filename> provides mechanisms to write
symbolic references instead.
The rules are as follows:
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
<itemizedlist>
<listitem>
<para>
Use of symbolic references is enabled in a particular catalog column
by attaching <literal>BKI_LOOKUP(<replaceable>lookuprule</replaceable>)</literal>
to the column's definition, where <replaceable>lookuprule</replaceable>
is the name of the referenced catalog, e.g., <literal>pg_proc</literal>.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<literal>BKI_LOOKUP</literal> can be attached to columns of
type <type>Oid</type>, <type>regproc</type>, <type>oidvector</type>,
or <type>Oid[]</type>; in the latter two cases it implies performing a
lookup on each element of the array.
</para>
</listitem>
<listitem>
<para>
It's also permissible to attach <literal>BKI_LOOKUP(encoding)</literal>
to integer columns to reference character set encodings, which are
not currently represented as catalog OIDs, but have a set of values
known to <filename>genbki.pl</filename>.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</para>
</listitem>
<listitem>
<para>
Build in some knowledge about foreign-key relationships in the catalogs. This follows in the spirit of commit dfb75e478, which created primary key and uniqueness constraints to improve the visibility of constraints imposed on the system catalogs. While our catalogs contain many foreign-key-like relationships, they don't quite follow SQL semantics, in that the convention for an omitted reference is to write zero not NULL. Plus, we have some cases in which there are arrays each of whose elements is supposed to be an FK reference; SQL has no way to model that. So we can't create actual foreign key constraints to describe the situation. Nonetheless, we can collect and use knowledge about these relationships. This patch therefore adds annotations to the catalog header files to declare foreign-key relationships. (The BKI_LOOKUP annotations cover simple cases, but we weren't previously distinguishing which such columns are allowed to contain zeroes; we also need new markings for multi-column FK references.) Then, Catalog.pm and genbki.pl are taught to collect this information into a table in a new generated header "system_fk_info.h". The only user of that at the moment is a new SQL function pg_get_catalog_foreign_keys(), which exposes the table to SQL. The oidjoins regression test is rewritten to use pg_get_catalog_foreign_keys() to find out which columns to check. Aside from removing the need for manual maintenance of that test script, this allows it to cover numerous relationships that were not checked by the old implementation based on findoidjoins. (As of this commit, 217 relationships are checked by the test, versus 181 before.) Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 23:11:55 +01:00
In some catalog columns, it's allowed for entries to be zero instead
of a valid reference. If this is allowed, write
<literal>BKI_LOOKUP_OPT</literal> instead
of <literal>BKI_LOOKUP</literal>. Then you can
write <literal>0</literal> for an entry. (If the column is
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
declared <type>regproc</type>, you can optionally
write <literal>-</literal> instead of <literal>0</literal>.)
Build in some knowledge about foreign-key relationships in the catalogs. This follows in the spirit of commit dfb75e478, which created primary key and uniqueness constraints to improve the visibility of constraints imposed on the system catalogs. While our catalogs contain many foreign-key-like relationships, they don't quite follow SQL semantics, in that the convention for an omitted reference is to write zero not NULL. Plus, we have some cases in which there are arrays each of whose elements is supposed to be an FK reference; SQL has no way to model that. So we can't create actual foreign key constraints to describe the situation. Nonetheless, we can collect and use knowledge about these relationships. This patch therefore adds annotations to the catalog header files to declare foreign-key relationships. (The BKI_LOOKUP annotations cover simple cases, but we weren't previously distinguishing which such columns are allowed to contain zeroes; we also need new markings for multi-column FK references.) Then, Catalog.pm and genbki.pl are taught to collect this information into a table in a new generated header "system_fk_info.h". The only user of that at the moment is a new SQL function pg_get_catalog_foreign_keys(), which exposes the table to SQL. The oidjoins regression test is rewritten to use pg_get_catalog_foreign_keys() to find out which columns to check. Aside from removing the need for manual maintenance of that test script, this allows it to cover numerous relationships that were not checked by the old implementation based on findoidjoins. (As of this commit, 217 relationships are checked by the test, versus 181 before.) Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 23:11:55 +01:00
Except for this special case, all entries in
a <literal>BKI_LOOKUP</literal> column must be symbolic references.
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<filename>genbki.pl</filename> will warn about unrecognized names.
</para>
</listitem>
<listitem>
<para>
Most kinds of catalog objects are simply referenced by their names.
Note that type names must exactly match the
referenced <structname>pg_type</structname>
entry's <structfield>typname</structfield>; you do not get to use
any aliases such as <literal>integer</literal>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
for <literal>int4</literal>.
</para>
</listitem>
<listitem>
<para>
A function can be represented by
its <structfield>proname</structfield>, if that is unique among
the <filename>pg_proc.dat</filename> entries (this works like regproc
input). Otherwise, write it
as <replaceable>proname(argtypename,argtypename,...)</replaceable>,
like regprocedure. The argument type names must be spelled exactly as
they are in the <filename>pg_proc.dat</filename> entry's
<structfield>proargtypes</structfield> field. Do not insert any
spaces.
</para>
</listitem>
<listitem>
<para>
Operators are represented
by <replaceable>oprname(lefttype,righttype)</replaceable>,
writing the type names exactly as they appear in
the <filename>pg_operator.dat</filename>
entry's <structfield>oprleft</structfield>
and <structfield>oprright</structfield> fields.
(Write <literal>0</literal> for the omitted operand of a unary
operator.)
</para>
</listitem>
<listitem>
<para>
The names of opclasses and opfamilies are only unique within an
access method, so they are represented
by <replaceable>access_method_name</replaceable><literal>/</literal><replaceable>object_name</replaceable>.
</para>
</listitem>
<listitem>
<para>
In none of these cases is there any provision for
schema-qualification; all objects created during bootstrap are
expected to be in the <literal>pg_catalog</literal> schema.
</para>
</listitem>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</itemizedlist>
<para>
<filename>genbki.pl</filename> resolves all symbolic references while it
runs, and puts simple numeric OIDs into the emitted BKI file. There is
therefore no need for the bootstrap backend to deal with symbolic
references.
</para>
Build in some knowledge about foreign-key relationships in the catalogs. This follows in the spirit of commit dfb75e478, which created primary key and uniqueness constraints to improve the visibility of constraints imposed on the system catalogs. While our catalogs contain many foreign-key-like relationships, they don't quite follow SQL semantics, in that the convention for an omitted reference is to write zero not NULL. Plus, we have some cases in which there are arrays each of whose elements is supposed to be an FK reference; SQL has no way to model that. So we can't create actual foreign key constraints to describe the situation. Nonetheless, we can collect and use knowledge about these relationships. This patch therefore adds annotations to the catalog header files to declare foreign-key relationships. (The BKI_LOOKUP annotations cover simple cases, but we weren't previously distinguishing which such columns are allowed to contain zeroes; we also need new markings for multi-column FK references.) Then, Catalog.pm and genbki.pl are taught to collect this information into a table in a new generated header "system_fk_info.h". The only user of that at the moment is a new SQL function pg_get_catalog_foreign_keys(), which exposes the table to SQL. The oidjoins regression test is rewritten to use pg_get_catalog_foreign_keys() to find out which columns to check. Aside from removing the need for manual maintenance of that test script, this allows it to cover numerous relationships that were not checked by the old implementation based on findoidjoins. (As of this commit, 217 relationships are checked by the test, versus 181 before.) Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 23:11:55 +01:00
<para>
It's desirable to mark OID reference columns
with <literal>BKI_LOOKUP</literal> or <literal>BKI_LOOKUP_OPT</literal>
even if the catalog has no initial data that requires lookup. This
allows <filename>genbki.pl</filename> to record the foreign key
relationships that exist in the system catalogs. That information is
used in the regression tests to check for incorrect entries. See also
the macros <literal>DECLARE_FOREIGN_KEY</literal>,
<literal>DECLARE_FOREIGN_KEY_OPT</literal>,
<literal>DECLARE_ARRAY_FOREIGN_KEY</literal>,
and <literal>DECLARE_ARRAY_FOREIGN_KEY_OPT</literal>, which are
used to declare foreign key relationships that are too complex
for <literal>BKI_LOOKUP</literal> (typically, multi-column foreign
keys).
</para>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</sect2>
<sect2 id="system-catalog-auto-array-types">
<title>Automatic Creation of Array Types</title>
<para>
Most scalar data types should have a corresponding array type (that is,
a standard varlena array type whose element type is the scalar type, and
which is referenced by the <structfield>typarray</structfield> field of
the scalar type's <structname>pg_type</structname>
entry). <filename>genbki.pl</filename> is able to generate
the <structname>pg_type</structname> entry for the array type
automatically in most cases.
</para>
<para>
To use this facility, just write an <literal>array_type_oid
=&gt; <replaceable>nnnn</replaceable></literal> metadata field in the
scalar type's <structname>pg_type</structname> entry, specifying the OID
to use for the array type. You may then omit
the <structfield>typarray</structfield> field, since it will be filled
automatically with that OID.
</para>
<para>
The generated array type's name is the scalar type's name with an
underscore prepended. The array entry's other fields are filled from
<literal>BKI_ARRAY_DEFAULT(<replaceable>value</replaceable>)</literal>
annotations in <filename>pg_type.h</filename>, or if there isn't one,
copied from the scalar type. (There's also a special case
for <structfield>typalign</structfield>.) Then
the <structfield>typelem</structfield>
and <structfield>typarray</structfield> fields of the two entries are
set to cross-reference each other.
</para>
</sect2>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<sect2 id="system-catalog-recipes">
<title>Recipes for Editing Data Files</title>
<para>
Here are some suggestions about the easiest ways to perform common tasks
when updating catalog data files.
</para>
<formalpara>
<title>Add a new column with a default to a catalog:</title>
<para>
Add the column to the header file with
a <literal>BKI_DEFAULT(<replaceable>value</replaceable>)</literal>
annotation. The data file need only be adjusted by adding the field
in existing rows where a non-default value is needed.
</para>
</formalpara>
<formalpara>
<title>Add a default value to an existing column that doesn't have
one:</title>
<para>
Add a <literal>BKI_DEFAULT</literal> annotation to the header file,
then run <literal>make reformat-dat-files</literal> to remove
now-redundant field entries.
</para>
</formalpara>
<formalpara>
<title>Remove a column, whether it has a default or not:</title>
<para>
Remove the column from the header, then run <literal>make
reformat-dat-files</literal> to remove now-useless field entries.
</para>
</formalpara>
<formalpara>
<title>Change or remove an existing default value:</title>
<para>
You cannot simply change the header file, since that will cause the
current data to be interpreted incorrectly. First run <literal>make
expand-dat-files</literal> to rewrite the data files with all
default values inserted explicitly, then change or remove
the <literal>BKI_DEFAULT</literal> annotation, then run <literal>make
reformat-dat-files</literal> to remove superfluous fields again.
</para>
</formalpara>
<formalpara>
<title>Ad-hoc bulk editing:</title>
<para>
<filename>reformat_dat_file.pl</filename> can be adapted to perform
many kinds of bulk changes. Look for its block comments showing where
one-off code can be inserted. In the following example, we are going
2021-07-16 10:35:38 +02:00
to consolidate two Boolean fields in <structname>pg_proc</structname>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
into a char field:
<orderedlist>
<listitem>
<para>
Add the new column, with a default,
to <filename>pg_proc.h</filename>:
<programlisting>
+ /* see PROKIND_ categories below */
+ char prokind BKI_DEFAULT(f);
</programlisting>
</para>
</listitem>
<listitem>
<para>
Create a new script based on <filename>reformat_dat_file.pl</filename>
to insert appropriate values on-the-fly:
<programlisting>
- # At this point we have the full row in memory as a hash
- # and can do any operations we want. As written, it only
- # removes default values, but this script can be adapted to
- # do one-off bulk-editing.
+ # One-off change to migrate to prokind
+ # Default has already been filled in by now, so change to other
+ # values as appropriate
+ if ($values{proisagg} eq 't')
+ {
+ $values{prokind} = 'a';
+ }
+ elsif ($values{proiswindow} eq 't')
+ {
+ $values{prokind} = 'w';
+ }
</programlisting>
</para>
</listitem>
<listitem>
<para>
Run the new script:
<programlisting>
$ cd src/include/catalog
$ perl rewrite_dat_with_prokind.pl pg_proc.dat
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
</programlisting>
At this point <filename>pg_proc.dat</filename> has all three
columns, <structfield>prokind</structfield>,
<structfield>proisagg</structfield>,
and <structfield>proiswindow</structfield>, though they will appear
only in rows where they have non-default values.
</para>
</listitem>
<listitem>
<para>
Remove the old columns from <filename>pg_proc.h</filename>:
<programlisting>
- /* is it an aggregate? */
- bool proisagg BKI_DEFAULT(f);
-
- /* is it a window function? */
- bool proiswindow BKI_DEFAULT(f);
</programlisting>
</para>
</listitem>
<listitem>
<para>
Finally, run <literal>make reformat-dat-files</literal> to remove
the useless old entries from <filename>pg_proc.dat</filename>.
</para>
</listitem>
</orderedlist>
For further examples of scripts used for bulk editing, see
<filename>convert_oid2name.pl</filename>
and <filename>remove_pg_type_oid_symbols.pl</filename> attached to this
message:
<ulink url="https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com"></ulink>
</para>
</formalpara>
</sect2>
</sect1>
<sect1 id="bki-format">
<title><acronym>BKI</acronym> File Format</title>
<para>
This section describes how the <productname>PostgreSQL</productname>
backend interprets <acronym>BKI</acronym> files. This description
Clean up various to-do items associated with system indexes: pg_database now has unique indexes on oid and on datname. pg_shadow now has unique indexes on usename and on usesysid. pg_am now has unique index on oid. pg_opclass now has unique index on oid. pg_amproc now has unique index on amid+amopclaid+amprocnum. Remove pg_rewrite's unnecessary index on oid, delete unused RULEOID syscache. Remove index on pg_listener and associated syscache for performance reasons (caching rows that are certain to change before you need 'em again is rather pointless). Change pg_attrdef's nonunique index on adrelid into a unique index on adrelid+adnum. Fix various incorrect settings of pg_class.relisshared, make that the primary reference point for whether a relation is shared or not. IsSharedSystemRelationName() is now only consulted to initialize relisshared during initial creation of tables and indexes. In theory we might now support shared user relations, though it's not clear how one would get entries for them into pg_class &etc of multiple databases. Fix recently reported bug that pg_attribute rows created for an index all have the same OID. (Proof that non-unique OID doesn't matter unless it's actually used to do lookups ;-)) There's no need to treat pg_trigger, pg_attrdef, pg_relcheck as bootstrap relations. Convert them into plain system catalogs without hardwired entries in pg_class and friends. Unify global.bki and template1.bki into a single init script postgres.bki, since the alleged distinction between them was misleading and pointless. Not to mention that it didn't work for setting up indexes on shared system relations. Rationalize locking of pg_shadow, pg_group, pg_attrdef (no need to use AccessExclusiveLock where ExclusiveLock or even RowExclusiveLock will do). Also, hold locks until transaction commit where necessary.
2001-06-12 07:55:50 +02:00
will be easier to understand if the <filename>postgres.bki</filename>
2005-01-06 00:42:03 +01:00
file is at hand as an example.
</para>
<para>
2002-03-22 20:20:45 +01:00
<acronym>BKI</acronym> input consists of a sequence of commands. Commands are made up
of a number of tokens, depending on the syntax of the command.
Tokens are usually separated by whitespace, but need not be if
Clean up various to-do items associated with system indexes: pg_database now has unique indexes on oid and on datname. pg_shadow now has unique indexes on usename and on usesysid. pg_am now has unique index on oid. pg_opclass now has unique index on oid. pg_amproc now has unique index on amid+amopclaid+amprocnum. Remove pg_rewrite's unnecessary index on oid, delete unused RULEOID syscache. Remove index on pg_listener and associated syscache for performance reasons (caching rows that are certain to change before you need 'em again is rather pointless). Change pg_attrdef's nonunique index on adrelid into a unique index on adrelid+adnum. Fix various incorrect settings of pg_class.relisshared, make that the primary reference point for whether a relation is shared or not. IsSharedSystemRelationName() is now only consulted to initialize relisshared during initial creation of tables and indexes. In theory we might now support shared user relations, though it's not clear how one would get entries for them into pg_class &etc of multiple databases. Fix recently reported bug that pg_attribute rows created for an index all have the same OID. (Proof that non-unique OID doesn't matter unless it's actually used to do lookups ;-)) There's no need to treat pg_trigger, pg_attrdef, pg_relcheck as bootstrap relations. Convert them into plain system catalogs without hardwired entries in pg_class and friends. Unify global.bki and template1.bki into a single init script postgres.bki, since the alleged distinction between them was misleading and pointless. Not to mention that it didn't work for setting up indexes on shared system relations. Rationalize locking of pg_shadow, pg_group, pg_attrdef (no need to use AccessExclusiveLock where ExclusiveLock or even RowExclusiveLock will do). Also, hold locks until transaction commit where necessary.
2001-06-12 07:55:50 +02:00
there is no ambiguity. There is no special command separator; the
next token that syntactically cannot belong to the preceding
command starts a new one. (Usually you would put a new command on
a new line, for clarity.) Tokens can be certain key words, special
characters (parentheses, commas, etc.), identifiers, numbers, or
single-quoted strings. Everything is case sensitive.
</para>
<para>
Lines starting with <literal>#</literal> are ignored.
</para>
</sect1>
<sect1 id="bki-commands">
2002-03-22 20:20:45 +01:00
<title><acronym>BKI</acronym> Commands</title>
<variablelist>
2005-01-06 00:42:03 +01:00
<varlistentry>
<term>
<literal>create</literal>
<replaceable class="parameter">tablename</replaceable>
<replaceable class="parameter">tableoid</replaceable>
<optional><literal>bootstrap</literal></optional>
<optional><literal>shared_relation</literal></optional>
<optional><literal>rowtype_oid</literal> <replaceable>oid</replaceable></optional>
2005-01-06 00:42:03 +01:00
(<replaceable class="parameter">name1</replaceable> =
<replaceable class="parameter">type1</replaceable>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional> <optional>,
<replaceable class="parameter">name2</replaceable> =
<replaceable class="parameter">type2</replaceable>
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional>,
...</optional>)
2005-01-06 00:42:03 +01:00
</term>
<listitem>
<para>
Create a table named <replaceable
class="parameter">tablename</replaceable>, and having the OID
<replaceable class="parameter">tableoid</replaceable>,
with the columns given in parentheses.
2005-01-06 00:42:03 +01:00
</para>
<para>
The following column types are supported directly by
<filename>bootstrap.c</filename>: <type>bool</type>,
2005-01-06 00:42:03 +01:00
<type>bytea</type>, <type>char</type> (1 byte),
<type>name</type>, <type>int2</type>,
<type>int4</type>, <type>regproc</type>, <type>regclass</type>,
<type>regtype</type>, <type>text</type>,
<type>oid</type>, <type>tid</type>, <type>xid</type>,
<type>cid</type>, <type>int2vector</type>, <type>oidvector</type>,
<type>_int4</type> (array), <type>_text</type> (array),
<type>_oid</type> (array), <type>_char</type> (array),
2005-01-06 00:42:03 +01:00
<type>_aclitem</type> (array). Although it is possible to create
tables containing columns of other types, this cannot be done until
after <structname>pg_type</structname> has been created and filled with
appropriate entries. (That effectively means that only these
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
column types can be used in bootstrap catalogs, but non-bootstrap
catalogs can contain any built-in type.)
2005-01-06 00:42:03 +01:00
</para>
<para>
When <literal>bootstrap</literal> is specified,
2005-01-06 00:42:03 +01:00
the table will only be created on disk; nothing is entered into
<structname>pg_class</structname>,
2022-05-19 09:27:34 +02:00
<structname>pg_attribute</structname>, etc., for it. Thus the
2005-01-06 00:42:03 +01:00
table will not be accessible by ordinary SQL operations until
such entries are made the hard way (with <literal>insert</literal>
2005-01-06 00:42:03 +01:00
commands). This option is used for creating
2022-05-19 09:27:34 +02:00
<structname>pg_class</structname> etc. themselves.
2005-01-06 00:42:03 +01:00
</para>
<para>
The table is created as shared if <literal>shared_relation</literal> is
2005-01-06 00:42:03 +01:00
specified.
The table's row type OID (<structname>pg_type</structname> OID) can optionally
be specified via the <literal>rowtype_oid</literal> clause; if not specified,
an OID is automatically generated for it. (The <literal>rowtype_oid</literal>
clause is useless if <literal>bootstrap</literal> is specified, but it can be
provided anyway for documentation.)
2005-01-06 00:42:03 +01:00
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<literal>open</literal> <replaceable class="parameter">tablename</replaceable>
</term>
<listitem>
<para>
Open the table named
<replaceable class="parameter">tablename</replaceable>
for insertion of data. Any currently open table is closed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Fix bootstrap parser so that its keywords are unreserved words. Mark Dilger pointed out that the bootstrap parser does not allow any of its keywords to appear as column values unless they're quoted, and proposed dealing with that by quoting such values in genbki.pl. Looking closer, though, we also have that problem with respect to table, column, and type names appearing in the .bki file: the parser would fail if any of those matched any of its keywords. While so far there have been no conflicts (that I've heard of), this seems like a booby trap waiting to catch somebody. Rather than clutter genbki.pl with enough quoting logic to handle all that, let's make the bootstrap parser grow up a little bit and treat its keywords as unreserved. Experimentation shows that it's fairly easy to do so with the exception of _null_, which I don't have a big problem with keeping as a reserved word. The only change needed is that we can't have the "close" command take an optional table name: it has to either require or forbid the table name to avoid shift/reduce conflicts. genbki.pl has historically always included the table name, so I took that option. The implementation has bootscanner.l passing forward the string value of each keyword, in case bootparse.y needs that. This avoids needing to know the precise spelling of each keyword in bootparse.y, which is good because that's not always obvious from the token name. Discussion: https://postgr.es/m/3024FC91-DB6D-4732-B31C-DF772DF039A0@gmail.com
2018-05-05 22:23:07 +02:00
<literal>close</literal> <replaceable class="parameter">tablename</replaceable>
</term>
<listitem>
<para>
Fix bootstrap parser so that its keywords are unreserved words. Mark Dilger pointed out that the bootstrap parser does not allow any of its keywords to appear as column values unless they're quoted, and proposed dealing with that by quoting such values in genbki.pl. Looking closer, though, we also have that problem with respect to table, column, and type names appearing in the .bki file: the parser would fail if any of those matched any of its keywords. While so far there have been no conflicts (that I've heard of), this seems like a booby trap waiting to catch somebody. Rather than clutter genbki.pl with enough quoting logic to handle all that, let's make the bootstrap parser grow up a little bit and treat its keywords as unreserved. Experimentation shows that it's fairly easy to do so with the exception of _null_, which I don't have a big problem with keeping as a reserved word. The only change needed is that we can't have the "close" command take an optional table name: it has to either require or forbid the table name to avoid shift/reduce conflicts. genbki.pl has historically always included the table name, so I took that option. The implementation has bootscanner.l passing forward the string value of each keyword, in case bootparse.y needs that. This avoids needing to know the precise spelling of each keyword in bootparse.y, which is good because that's not always obvious from the token name. Discussion: https://postgr.es/m/3024FC91-DB6D-4732-B31C-DF772DF039A0@gmail.com
2018-05-05 22:23:07 +02:00
Close the open table. The name of the table must be given as a
cross-check.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
<literal>insert</literal> <literal>(</literal> <optional><replaceable class="parameter">oid_value</replaceable></optional> <replaceable class="parameter">value1</replaceable> <replaceable class="parameter">value2</replaceable> ... <literal>)</literal>
</term>
<listitem>
<para>
Insert a new row into the open table using <replaceable
class="parameter">value1</replaceable>, <replaceable
class="parameter">value2</replaceable>, etc., for its column
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
values.
</para>
<para>
NULL values can be specified using the special key word
Fix bootstrap parser so that its keywords are unreserved words. Mark Dilger pointed out that the bootstrap parser does not allow any of its keywords to appear as column values unless they're quoted, and proposed dealing with that by quoting such values in genbki.pl. Looking closer, though, we also have that problem with respect to table, column, and type names appearing in the .bki file: the parser would fail if any of those matched any of its keywords. While so far there have been no conflicts (that I've heard of), this seems like a booby trap waiting to catch somebody. Rather than clutter genbki.pl with enough quoting logic to handle all that, let's make the bootstrap parser grow up a little bit and treat its keywords as unreserved. Experimentation shows that it's fairly easy to do so with the exception of _null_, which I don't have a big problem with keeping as a reserved word. The only change needed is that we can't have the "close" command take an optional table name: it has to either require or forbid the table name to avoid shift/reduce conflicts. genbki.pl has historically always included the table name, so I took that option. The implementation has bootscanner.l passing forward the string value of each keyword, in case bootparse.y needs that. This avoids needing to know the precise spelling of each keyword in bootparse.y, which is good because that's not always obvious from the token name. Discussion: https://postgr.es/m/3024FC91-DB6D-4732-B31C-DF772DF039A0@gmail.com
2018-05-05 22:23:07 +02:00
<literal>_null_</literal>. Values that do not look like
identifiers or digit strings must be single-quoted.
(To include a single quote in a value, write it twice.
Escape-string-style backslash escapes are allowed in the string, too.)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<literal>declare</literal> <optional><literal>unique</literal></optional>
<literal>index</literal> <replaceable class="parameter">indexname</replaceable>
<replaceable class="parameter">indexoid</replaceable>
<literal>on</literal> <replaceable class="parameter">tablename</replaceable>
<literal>using</literal> <replaceable class="parameter">amname</replaceable>
<literal>(</literal> <replaceable class="parameter">opclass1</replaceable>
<replaceable class="parameter">name1</replaceable>
<optional>, ...</optional> <literal>)</literal>
</term>
<listitem>
<para>
Create an index named <replaceable
class="parameter">indexname</replaceable>, having OID
<replaceable class="parameter">indexoid</replaceable>,
on the table named
<replaceable class="parameter">tablename</replaceable>, using the
<replaceable class="parameter">amname</replaceable> access
method. The fields to index are called <replaceable
class="parameter">name1</replaceable>, <replaceable
class="parameter">name2</replaceable> etc., and the operator
classes to use are <replaceable
class="parameter">opclass1</replaceable>, <replaceable
class="parameter">opclass2</replaceable> etc., respectively.
2005-01-06 00:42:03 +01:00
The index file is created and appropriate catalog entries are
made for it, but the index contents are not initialized by this command.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<literal>declare toast</literal>
<replaceable class="parameter">toasttableoid</replaceable>
<replaceable class="parameter">toastindexoid</replaceable>
<literal>on</literal> <replaceable class="parameter">tablename</replaceable>
</term>
<listitem>
<para>
Create a TOAST table for the table named
<replaceable class="parameter">tablename</replaceable>.
The TOAST table is assigned OID
<replaceable class="parameter">toasttableoid</replaceable>
and its index is assigned OID
<replaceable class="parameter">toastindexoid</replaceable>.
As with <literal>declare index</literal>, filling of the index
is postponed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>build indices</literal></term>
<listitem>
<para>
2005-01-06 00:42:03 +01:00
Fill in the indices that have previously been declared.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1 id="bki-structure">
<title>Structure of the Bootstrap <acronym>BKI</acronym> File</title>
<para>
The <literal>open</literal> command cannot be used until the tables it uses
exist and have entries for the table that is to be opened.
(These minimum tables are <structname>pg_class</structname>,
<structname>pg_attribute</structname>, <structname>pg_proc</structname>, and
<structname>pg_type</structname>.) To allow those tables themselves to be filled,
<literal>create</literal> with the <literal>bootstrap</literal> option implicitly opens
the created table for data insertion.
</para>
<para>
Also, the <literal>declare index</literal> and <literal>declare toast</literal>
commands cannot be used until the system catalogs they need have been
created and filled in.
</para>
<para>
Thus, the structure of the <filename>postgres.bki</filename> file has to
be:
<orderedlist>
<listitem>
<para>
<literal>create bootstrap</literal> one of the critical tables
</para>
</listitem>
<listitem>
<para>
<literal>insert</literal> data describing at least the critical tables
</para>
</listitem>
<listitem>
<para>
<literal>close</literal>
</para>
</listitem>
<listitem>
<para>
Repeat for the other critical tables.
</para>
</listitem>
<listitem>
<para>
<literal>create</literal> (without <literal>bootstrap</literal>) a noncritical table
</para>
</listitem>
<listitem>
<para>
<literal>open</literal>
</para>
</listitem>
<listitem>
<para>
<literal>insert</literal> desired data
</para>
</listitem>
<listitem>
<para>
<literal>close</literal>
</para>
</listitem>
<listitem>
<para>
Repeat for the other noncritical tables.
</para>
</listitem>
<listitem>
<para>
Define indexes and toast tables.
</para>
</listitem>
<listitem>
<para>
<literal>build indices</literal>
</para>
</listitem>
</orderedlist>
</para>
<para>
There are doubtless other, undocumented ordering dependencies.
</para>
</sect1>
<sect1 id="bki-example">
Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
2018-04-08 19:16:50 +02:00
<title>BKI Example</title>
<para>
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
The following sequence of commands will create the table
<literal>test_table</literal> with OID 420, having three columns
<literal>oid</literal>, <literal>cola</literal> and <literal>colb</literal>
of type <type>oid</type>, <type>int4</type> and <type>text</type>,
respectively, and insert two rows into the table:
<programlisting>
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
create test_table 420 (oid = oid, cola = int4, colb = text)
open test_table
insert ( 421 1 'value 1' )
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
insert ( 422 2 _null_ )
close test_table
</programlisting>
</para>
</sect1>
</chapter>