/*-------------------------------------------------------------------------
 *
 * dbcommands.c
 *		Database management commands (create/drop database).
 *
 * Note: database creation/destruction commands use exclusive locks on
 * the database objects (as expressed by LockSharedObject()) to avoid
 * stepping on each others' toes.  Formerly we used table-level locks
 * on pg_database, but that's too coarse-grained.
 *
 * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/commands/dbcommands.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

#include "access/genam.h"
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/tableam.h"
#include "access/xact.h"
#include "access/xloginsert.h"
#include "access/xlogrecovery.h"
#include "access/xlogutils.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
#include "catalog/indexing.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_subscription.h"
#include "catalog/pg_tablespace.h"
#include "commands/comment.h"
#include "commands/dbcommands.h"
#include "commands/dbcommands_xlog.h"
#include "commands/defrem.h"
#include "commands/seclabel.h"
#include "commands/tablespace.h"
#include "common/file_perm.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgwriter.h"
#include "replication/slot.h"
#include "storage/copydir.h"
#include "storage/fd.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
#include "storage/md.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/pg_locale.h"
#include "utils/relmapper.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"

/*
 * Create database strategy.
 *
 * CREATEDB_WAL_LOG will copy the database at the block level and WAL log each
 * copied block.
 *
 * CREATEDB_FILE_COPY will simply perform a file system level copy of the
 * database and log a single record for each tablespace copied. To make this
 * safe, it also triggers checkpoints before and after the operation.
 */
typedef enum CreateDBStrategy
{
	CREATEDB_WAL_LOG,
	CREATEDB_FILE_COPY
} CreateDBStrategy;

typedef struct
{
	Oid			src_dboid;		/* source (template) DB */
	Oid			dest_dboid;		/* DB we are trying to create */
	CreateDBStrategy strategy;	/* create db strategy */
} createdb_failure_params;

typedef struct
{
	Oid			dest_dboid;		/* DB we are trying to move */
	Oid			dest_tsoid;		/* tablespace we are trying to move to */
} movedb_failure_params;

/*
 * Information about a relation to be copied when creating a database.
 */
typedef struct CreateDBRelInfo
{
	RelFileLocator rlocator;	/* physical relation identifier */
	Oid			reloid;			/* relation oid */
	bool		permanent;		/* relation is permanent or unlogged */
} CreateDBRelInfo;


/* non-export function prototypes */
static void createdb_failure_callback(int code, Datum arg);
static void movedb(const char *dbname, const char *tblspcname);
static void movedb_failure_callback(int code, Datum arg);
static bool get_db_info(const char *name, LOCKMODE lockmode,
						Oid *dbIdP, Oid *ownerIdP,
						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
						char **dbIcurules,
						char *dbLocProvider,
						char **dbCollversion);
static void remove_dbtablespaces(Oid db_id);
static bool check_db_file_conflict(Oid db_id);
static int	errdetail_busy_db(int notherbackends, int npreparedxacts);
static void CreateDatabaseUsingWalLog(Oid src_dboid, Oid dst_dboid, Oid src_tsid,
									  Oid dst_tsid);
static List *ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath);
static List *ScanSourceDatabasePgClassPage(Page page, Buffer buf, Oid tbid,
										   Oid dbid, char *srcpath,
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
List *rlocatorlist, Snapshot snapshot);
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
static CreateDBRelInfo *ScanSourceDatabasePgClassTuple(HeapTupleData *tuple,
|
|
|
|
Oid tbid, Oid dbid,
|
|
|
|
char *srcpath);
|
|
|
|
static void CreateDirAndVersionFile(char *dbpath, Oid dbid, Oid tsid,
|
|
|
|
bool isRedo);
|
2022-09-20 04:18:36 +02:00
|
|
|
static void CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid,
|
|
|
|
Oid src_tsid, Oid dst_tsid);
|
Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records. Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:
CREATE DATABASE
DROP DATABASE
DROP TABLESPACE
If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.
A fix for this problem was already attempted in 49d9cfc68bf4, but it
was reverted because of design issues. This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency. Tablespaces
are created as real directories, and should be deleted
by later replay. CheckRecoveryConsistency ensures
they have disappeared.
The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING. Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
2022-07-28 08:40:06 +02:00
|
|
|
static void recovery_create_dbdir(char *path, bool only_tblspc);
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Create a new database using the WAL_LOG strategy.
|
|
|
|
*
|
|
|
|
* Each copied block is separately written to the write-ahead log.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
CreateDatabaseUsingWalLog(Oid src_dboid, Oid dst_dboid,
|
|
|
|
Oid src_tsid, Oid dst_tsid)
|
|
|
|
{
|
|
|
|
char *srcpath;
|
|
|
|
char *dstpath;
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
List *rlocatorlist = NULL;
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
ListCell *cell;
|
|
|
|
LockRelId srcrelid;
|
|
|
|
LockRelId dstrelid;
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
RelFileLocator srcrlocator;
|
|
|
|
RelFileLocator dstrlocator;
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
CreateDBRelInfo *relinfo;
|
|
|
|
|
|
|
|
/* Get source and destination database paths. */
|
|
|
|
srcpath = GetDatabasePath(src_dboid, src_tsid);
|
|
|
|
dstpath = GetDatabasePath(dst_dboid, dst_tsid);
|
|
|
|
|
|
|
|
/* Create database directory and write PG_VERSION file. */
|
|
|
|
CreateDirAndVersionFile(dstpath, dst_dboid, dst_tsid, false);
|
|
|
|
|
|
|
|
/* Copy relmap file from source database to the destination database. */
|
|
|
|
RelationMapCopy(dst_dboid, dst_tsid, srcpath, dstpath);
|
|
|
|
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
/* Get list of relfilelocators to copy from the source database. */
|
|
|
|
rlocatorlist = ScanSourceDatabasePgClass(src_tsid, src_dboid, srcpath);
|
|
|
|
Assert(rlocatorlist != NIL);
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Database IDs will be the same for all relations so set them before
|
|
|
|
* entering the loop.
|
|
|
|
*/
|
|
|
|
srcrelid.dbId = src_dboid;
|
|
|
|
dstrelid.dbId = dst_dboid;
|
|
|
|
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
/* Loop over our list of relfilelocators and copy each one. */
|
|
|
|
foreach(cell, rlocatorlist)
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
{
|
|
|
|
relinfo = lfirst(cell);
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
srcrlocator = relinfo->rlocator;
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the relation is from the source db's default tablespace then we
|
2022-08-04 11:41:29 +02:00
|
|
|
* need to create it in the destination db's default tablespace.
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
		 * Otherwise, we need to create in the same tablespace as it is in the
		 * source database.
		 */
		if (srcrlocator.spcOid == src_tsid)
			dstrlocator.spcOid = dst_tsid;
		else
			dstrlocator.spcOid = srcrlocator.spcOid;

		dstrlocator.dbOid = dst_dboid;
		dstrlocator.relNumber = srcrlocator.relNumber;

		/*
		 * Acquire locks on source and target relations before copying.
		 *
		 * We typically do not read relation data into shared_buffers without
		 * holding a relation lock. It's unclear what could go wrong if we
		 * skipped it in this case, because nobody can be modifying either the
		 * source or destination database at this point, and we have locks on
		 * both databases, too, but let's take the conservative route.
		 */
		dstrelid.relId = srcrelid.relId = relinfo->reloid;
		LockRelationId(&srcrelid, AccessShareLock);
		LockRelationId(&dstrelid, AccessShareLock);

		/* Copy relation storage from source to the destination. */
		CreateAndCopyRelationData(srcrlocator, dstrlocator, relinfo->permanent);

		/* Release the relation locks. */
		UnlockRelationId(&srcrelid, AccessShareLock);
		UnlockRelationId(&dstrelid, AccessShareLock);
	}

	pfree(srcpath);
	pfree(dstpath);
	list_free_deep(rlocatorlist);
}

/*
 * Scan the pg_class table in the source database to identify the relations
 * that need to be copied to the destination database.
 *
 * This is an exception to the usual rule that cross-database access is
 * not possible. We can make it work here because we know that there are no
 * connections to the source database and (since there can't be prepared
 * transactions touching that database) no in-doubt tuples either. This
 * means that we don't need to worry about pruning removing anything from
 * under us, and we don't need to be too picky about our snapshot either.
 * As long as it sees all previously-committed XIDs as committed and all
 * aborted XIDs as aborted, we should be fine: nothing else is possible
 * here.
 *
 * We can't rely on the relcache for anything here, because that only knows
 * about the database to which we are connected, and can't handle access to
 * other databases. That also means we can't rely on the heap scan
 * infrastructure, which would be a bad idea anyway since it might try
 * to do things like HOT pruning which we definitely can't do safely in
 * a database to which we're not even connected.
 */
static List *
ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
{
	RelFileLocator rlocator;
	BlockNumber nblocks;
	BlockNumber blkno;
	Buffer		buf;
	RelFileNumber relfilenumber;
	Page		page;
	List	   *rlocatorlist = NIL;
	LockRelId	relid;
	Snapshot	snapshot;
	SMgrRelation smgr;
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
BufferAccessStrategy bstrategy;
|
|
|
|
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
/* Get pg_class relfilenumber. */
|
|
|
|
relfilenumber = RelationMapOidToFilenumberForDatabase(srcpath,
|
|
|
|
RelationRelationId);
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
|
|
|
|
/* Don't read data into shared_buffers without holding a relation lock. */
|
|
|
|
relid.dbId = dbid;
|
|
|
|
relid.relId = RelationRelationId;
|
|
|
|
LockRelationId(&relid, AccessShareLock);
|
|
|
|
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
/* Prepare a RelFileLocator for the pg_class relation. */
|
|
|
|
rlocator.spcOid = tbid;
|
|
|
|
rlocator.dbOid = dbid;
|
|
|
|
rlocator.relNumber = relfilenumber;
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
|
2022-08-12 14:25:41 +02:00
|
|
|
smgr = smgropen(rlocator, InvalidBackendId);
|
|
|
|
nblocks = smgrnblocks(smgr, MAIN_FORKNUM);
|
|
|
|
smgrclose(smgr);
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
|
|
|
|
/* Use a buffer access strategy since this is a bulk read operation. */
|
|
|
|
bstrategy = GetAccessStrategy(BAS_BULKREAD);
|
|
|
|
|
|
|
|
	/*
	 * As explained in the function header comments, we need a snapshot that
	 * will see all committed transactions as committed, and our transaction
	 * snapshot - or the active snapshot - might not be new enough for that,
	 * but the return value of GetLatestSnapshot() should work fine.
	 */
	snapshot = GetLatestSnapshot();

	/* Process the relation block by block. */
	for (blkno = 0; blkno < nblocks; blkno++)
	{
		CHECK_FOR_INTERRUPTS();

		buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno,
										RBM_NORMAL, bstrategy, true);

		LockBuffer(buf, BUFFER_LOCK_SHARE);
		page = BufferGetPage(buf);
		if (PageIsNew(page) || PageIsEmpty(page))
		{
			UnlockReleaseBuffer(buf);
			continue;
		}

		/* Append relevant pg_class tuples for current page to rlocatorlist. */
		rlocatorlist = ScanSourceDatabasePgClassPage(page, buf, tbid, dbid,
													 srcpath, rlocatorlist,
													 snapshot);

		UnlockReleaseBuffer(buf);
	}

	/* Release relation lock. */
	UnlockRelationId(&relid, AccessShareLock);

	return rlocatorlist;
}

/*
|
|
|
|
* Scan one page of the source database's pg_class relation and add relevant
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
* entries to rlocatorlist. The return value is the updated list.
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
*/
|
|
|
|
static List *
|
|
|
|
ScanSourceDatabasePgClassPage(Page page, Buffer buf, Oid tbid, Oid dbid,
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
							  char *srcpath, List *rlocatorlist,
							  Snapshot snapshot)
{
	BlockNumber blkno = BufferGetBlockNumber(buf);
	OffsetNumber offnum;
	OffsetNumber maxoff;
	HeapTupleData tuple;

	maxoff = PageGetMaxOffsetNumber(page);

	/* Loop over offsets. */
	for (offnum = FirstOffsetNumber;
		 offnum <= maxoff;
		 offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid;

		itemid = PageGetItemId(page, offnum);

		/* Nothing to do if slot is empty or already dead. */
		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid) ||
			ItemIdIsRedirected(itemid))
			continue;

		Assert(ItemIdIsNormal(itemid));
		ItemPointerSet(&(tuple.t_self), blkno, offnum);

		/* Initialize a HeapTupleData structure. */
		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
		tuple.t_len = ItemIdGetLength(itemid);
		tuple.t_tableOid = RelationRelationId;

		/* Skip tuples that are not visible to this snapshot. */
		if (HeapTupleSatisfiesVisibility(&tuple, snapshot, buf))
		{
			CreateDBRelInfo *relinfo;

			/*
			 * ScanSourceDatabasePgClassTuple is in charge of constructing a
			 * CreateDBRelInfo object for this tuple, but can also decide that
			 * this tuple isn't something we need to copy. If we do need to
			 * copy the relation, add it to the list.
			 */
			relinfo = ScanSourceDatabasePgClassTuple(&tuple, tbid, dbid,
													 srcpath);
			if (relinfo != NULL)
				rlocatorlist = lappend(rlocatorlist, relinfo);
		}
	}

	return rlocatorlist;
}

/*
 * Decide whether a certain pg_class tuple represents something that
 * needs to be copied from the source database to the destination database,
 * and if so, construct a CreateDBRelInfo for it.
 *
 * Visibility checks are handled by the caller, so our job here is just
 * to assess the data stored in the tuple.
 */
CreateDBRelInfo *
ScanSourceDatabasePgClassTuple(HeapTupleData *tuple, Oid tbid, Oid dbid,
							   char *srcpath)
{
	CreateDBRelInfo *relinfo;
	Form_pg_class classForm;
	RelFileNumber relfilenumber = InvalidRelFileNumber;

	classForm = (Form_pg_class) GETSTRUCT(tuple);

	/*
	 * Return NULL if this object does not need to be copied.
	 *
	 * Shared objects don't need to be copied, because they are shared.
	 * Objects without storage can't be copied, because there's nothing to
	 * copy. Temporary relations don't need to be copied either, because they
	 * are inaccessible outside of the session that created them, which must
	 * be gone already, and couldn't connect to a different database if it
	 * still existed. autovacuum will eventually remove the pg_class entries
	 * as well.
	 */
	if (classForm->reltablespace == GLOBALTABLESPACE_OID ||
		!RELKIND_HAS_STORAGE(classForm->relkind) ||
		classForm->relpersistence == RELPERSISTENCE_TEMP)
		return NULL;
|
|
|
|
|
|
|
|
/*
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
* If relfilenumber is valid then directly use it. Otherwise, consult the
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
* relmap.
|
|
|
|
*/
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
if (RelFileNumberIsValid(classForm->relfilenode))
|
|
|
|
relfilenumber = classForm->relfilenode;
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
else
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
relfilenumber = RelationMapOidToFilenumberForDatabase(srcpath,
|
|
|
|
classForm->oid);
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00

Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
	/* We must have a valid relfilenumber. */
	if (!RelFileNumberIsValid(relfilenumber))
		elog(ERROR, "relation with OID %u does not have a valid relfilenumber",
			 classForm->oid);

	/* Prepare a rel info element and add it to the list. */
	relinfo = (CreateDBRelInfo *) palloc(sizeof(CreateDBRelInfo));
	if (OidIsValid(classForm->reltablespace))
		relinfo->rlocator.spcOid = classForm->reltablespace;
	else
		relinfo->rlocator.spcOid = tbid;
	relinfo->rlocator.dbOid = dbid;
	relinfo->rlocator.relNumber = relfilenumber;
	relinfo->reloid = classForm->oid;

	/* Temporary relations were rejected above. */
	Assert(classForm->relpersistence != RELPERSISTENCE_TEMP);
	relinfo->permanent =
		(classForm->relpersistence == RELPERSISTENCE_PERMANENT) ? true : false;

	return relinfo;
}

/*
 * Create database directory and write out the PG_VERSION file in the database
 * path.  If isRedo is true, it's okay for the database directory to exist
 * already.
 */
static void
CreateDirAndVersionFile(char *dbpath, Oid dbid, Oid tsid, bool isRedo)
{
	int			fd;
	int			nbytes;
	char		versionfile[MAXPGPATH];
	char		buf[16];

	/*
	 * Prepare version data before starting a critical section.
	 *
	 * Note that we don't have to copy this from the source database; there's
	 * only one legal value.
	 */
	sprintf(buf, "%s\n", PG_MAJORVERSION);
	nbytes = strlen(PG_MAJORVERSION) + 1;

	/* If we are not in WAL replay then write the WAL. */
	if (!isRedo)
	{
		xl_dbase_create_wal_log_rec xlrec;
		XLogRecPtr	lsn;

		START_CRIT_SECTION();

		xlrec.db_id = dbid;
		xlrec.tablespace_id = tsid;

		XLogBeginInsert();
		XLogRegisterData((char *) (&xlrec),
						 sizeof(xl_dbase_create_wal_log_rec));

		lsn = XLogInsert(RM_DBASE_ID, XLOG_DBASE_CREATE_WAL_LOG);

		/* As always, WAL must hit the disk before the data update does. */
		XLogFlush(lsn);
	}

	/* Create database directory. */
	if (MakePGDirectory(dbpath) < 0)
	{
		/* Failure other than already exists or not in WAL replay? */
		if (errno != EEXIST || !isRedo)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not create directory \"%s\": %m", dbpath)));
	}

	/*
	 * Create PG_VERSION file in the database path.  If the file already
	 * exists and we are in WAL replay then try again to open it in write
	 * mode.
	 */
	snprintf(versionfile, sizeof(versionfile), "%s/%s", dbpath, "PG_VERSION");

	fd = OpenTransientFile(versionfile, O_WRONLY | O_CREAT | O_EXCL | PG_BINARY);
	if (fd < 0 && errno == EEXIST && isRedo)
		fd = OpenTransientFile(versionfile, O_WRONLY | O_TRUNC | PG_BINARY);

	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create file \"%s\": %m", versionfile)));

	/* Write PG_MAJORVERSION in the PG_VERSION file. */
	pgstat_report_wait_start(WAIT_EVENT_VERSION_FILE_WRITE);
	errno = 0;
	if ((int) write(fd, buf, nbytes) != nbytes)
	{
		/* If write didn't set errno, assume problem is no disk space. */
		if (errno == 0)
			errno = ENOSPC;
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not write to file \"%s\": %m", versionfile)));
	}
	pgstat_report_wait_end();

	/* Close the version file. */
	CloseTransientFile(fd);

	/* Critical section done. */
	if (!isRedo)
		END_CRIT_SECTION();
}

/*
 * Create a new database using the FILE_COPY strategy.
 *
 * Copy each tablespace at the filesystem level, and log a single WAL record
 * for each tablespace copied.  This requires a checkpoint before and after the
 * copy, which may be expensive, but it does greatly reduce WAL generation
 * if the copied database is large.
 */
static void
CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid, Oid src_tsid,
							Oid dst_tsid)
{
	TableScanDesc scan;
	Relation	rel;
	HeapTuple	tuple;

	/*
	 * Force a checkpoint before starting the copy.  This will force all dirty
	 * buffers, including those of unlogged tables, out to disk, to ensure
	 * source database is up-to-date on disk for the copy.
	 * FlushDatabaseBuffers() would suffice for that, but we also want to
	 * process any pending unlink requests.  Otherwise, if a checkpoint
	 * happened while we're copying files, a file might be deleted just when
	 * we're about to copy it, causing the lstat() call in copydir() to fail
	 * with ENOENT.
	 */
	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
					  CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL);

	/*
	 * Iterate through all tablespaces of the template database, and copy each
	 * one to the new database.
	 */
	rel = table_open(TableSpaceRelationId, AccessShareLock);
	scan = table_beginscan_catalog(rel, 0, NULL);
	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
	{
		Form_pg_tablespace spaceform = (Form_pg_tablespace) GETSTRUCT(tuple);
		Oid			srctablespace = spaceform->oid;
		Oid			dsttablespace;
		char	   *srcpath;
		char	   *dstpath;
		struct stat st;

		/* No need to copy global tablespace */
		if (srctablespace == GLOBALTABLESPACE_OID)
			continue;

		srcpath = GetDatabasePath(src_dboid, srctablespace);

		if (stat(srcpath, &st) < 0 || !S_ISDIR(st.st_mode) ||
			directory_is_empty(srcpath))
		{
			/* Assume we can ignore it */
			pfree(srcpath);
			continue;
		}

		if (srctablespace == src_tsid)
			dsttablespace = dst_tsid;
		else
			dsttablespace = srctablespace;

		dstpath = GetDatabasePath(dst_dboid, dsttablespace);

		/*
		 * Copy this subdirectory to the new location
		 *
		 * We don't need to copy subdirectories
		 */
		copydir(srcpath, dstpath, false);

		/* Record the filesystem change in XLOG */
		{
			xl_dbase_create_file_copy_rec xlrec;

			xlrec.db_id = dst_dboid;
			xlrec.tablespace_id = dsttablespace;
			xlrec.src_db_id = src_dboid;
			xlrec.src_tablespace_id = srctablespace;

			XLogBeginInsert();
			XLogRegisterData((char *) &xlrec,
							 sizeof(xl_dbase_create_file_copy_rec));

			(void) XLogInsert(RM_DBASE_ID,
							  XLOG_DBASE_CREATE_FILE_COPY | XLR_SPECIAL_REL_UPDATE);
		}

		pfree(srcpath);
		pfree(dstpath);
	}
	table_endscan(scan);
	table_close(rel, AccessShareLock);

	/*
	 * We force a checkpoint before committing.  This effectively means that
	 * committed XLOG_DBASE_CREATE_FILE_COPY operations will never need to be
	 * replayed (at least not in ordinary crash recovery; we still have to
	 * make the XLOG entry for the benefit of PITR operations).  This avoids
	 * two nasty scenarios:
	 *
	 * #1: When PITR is off, we don't XLOG the contents of newly created
	 * indexes; therefore the drop-and-recreate-whole-directory behavior of
	 * DBASE_CREATE replay would lose such indexes.
	 *
	 * #2: Since we have to recopy the source database during DBASE_CREATE
	 * replay, we run the risk of copying changes in it that were committed
	 * after the original CREATE DATABASE command but before the system crash
	 * that led to the replay.  This is at least unexpected and at worst could
	 * lead to inconsistencies, eg duplicate table names.
	 *
	 * (Both of these were real bugs in releases 8.0 through 8.0.3.)
	 *
	 * In PITR replay, the first of these isn't an issue, and the second is
	 * only a risk if the CREATE DATABASE and subsequent template database
	 * change both occur while a base backup is being taken.  There doesn't
	 * seem to be much we can do about that except document it as a
	 * limitation.
	 *
	 * See CreateDatabaseUsingWalLog() for a less cheesy CREATE DATABASE
	 * strategy that avoids these problems.
	 */
	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
}

/*
 * CREATE DATABASE
 */
Oid
createdb(ParseState *pstate, const CreatedbStmt *stmt)
{
	Oid			src_dboid;
	Oid			src_owner;
	int			src_encoding = -1;
	char	   *src_collate = NULL;
	char	   *src_ctype = NULL;
	char	   *src_iculocale = NULL;
	char	   *src_icurules = NULL;
	char		src_locprovider = '\0';
	char	   *src_collversion = NULL;
	bool		src_istemplate;
	bool		src_allowconn;
	TransactionId src_frozenxid = InvalidTransactionId;
	MultiXactId src_minmxid = InvalidMultiXactId;
	Oid			src_deftablespace;
	volatile Oid dst_deftablespace;
	Relation	pg_database_rel;
	HeapTuple	tuple;
	Datum		new_record[Natts_pg_database] = {0};
	bool		new_record_nulls[Natts_pg_database] = {0};
pg_upgrade: Preserve database OIDs.
Commit 9a974cbcba005256a19991203583a94b4f9a21a9 arranged to preserve
relfilenodes and tablespace OIDs. For similar reasons, also arrange
to preserve database OIDs.
One problem is that, up until now, the OIDs assigned to the template0
and postgres databases have not been fixed. This could be a problem
when upgrading, because pg_upgrade might try to migrate a database
from the old cluster to the new cluster while keeping the OID and find
a different database with that OID, resulting in a failure. If it finds
a database with the same name and the same OID that's OK: it will be
dropped and recreated. But the same OID and a different name is a
problem.
To prevent that, fix the OIDs for postgres and template0 to specific
values less than 16384. To avoid running afoul of this rule, these
values should not be changed in future releases. It's not a problem
that these OIDs aren't fixed in existing releases, because the OIDs
that we're assigning here weren't used for either of these databases
in any previous release. Thus, there's no chance that an upgrade of
a cluster from any previous release will collide with the OIDs we're
assigning here. And going forward, the OIDs will always be fixed, so
the only potential collision is with a system database having the
same name and the same OID, which is OK.
This patch lets users assign a specific OID to a database as well,
provided however that it can't be less than 16384. I (rhaas) thought
it might be better not to expose this capability to users, but the
consensus was otherwise, so the syntax is documented. Letting users
assign OIDs below 16384 would not be OK, though, because a
user-created database with a low-numbered OID might collide with a
system-created database in a future release. We therefore prohibit
that.
Shruthi KC, based on an earlier patch from Antonin Houska, reviewed
and with some adjustments by me.
Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
Discussion: http://postgr.es/m/CAASxf_Mnwm1Dh2vd5FAhVX6S1nwNSZUB1z12VddYtM++H2+p7w@mail.gmail.com
2022-01-24 20:23:15 +01:00
	Oid			dboid = InvalidOid;
	Oid			datdba;
	ListCell   *option;
	DefElem    *dtablespacename = NULL;
	DefElem    *downer = NULL;
	DefElem    *dtemplate = NULL;
	DefElem    *dencoding = NULL;
	DefElem    *dlocale = NULL;
	DefElem    *dcollate = NULL;
	DefElem    *dctype = NULL;
	DefElem    *diculocale = NULL;
	DefElem    *dicurules = NULL;
	DefElem    *dlocprovider = NULL;
	DefElem    *distemplate = NULL;
	DefElem    *dallowconnections = NULL;
	DefElem    *dconnlimit = NULL;
	DefElem    *dcollversion = NULL;
	DefElem    *dstrategy = NULL;
	char	   *dbname = stmt->dbname;
	char	   *dbowner = NULL;
	const char *dbtemplate = NULL;
	char	   *dbcollate = NULL;
	char	   *dbctype = NULL;
	char	   *dbiculocale = NULL;
	char	   *dbicurules = NULL;
	char		dblocprovider = '\0';
Replace empty locale name with implied value in CREATE DATABASE and initdb.
setlocale() accepts locale name "" as meaning "the locale specified by the
process's environment variables". Historically we've accepted that for
Postgres' locale settings, too. However, it's fairly unsafe to store an
empty string in a new database's pg_database.datcollate or datctype fields,
because then the interpretation could vary across postmaster restarts,
possibly resulting in index corruption and other unpleasantness.
Instead, we should expand "" to whatever it means at the moment of calling
CREATE DATABASE, which we can do by saving the value returned by
setlocale().
For consistency, make initdb set up the initial lc_xxx parameter values the
same way. initdb was already doing the right thing for empty locale names,
but it did not replace non-empty names with setlocale results. On a
platform where setlocale chooses to canonicalize the spellings of locale
names, this would result in annoying inconsistency. (It seems that popular
implementations of setlocale don't do such canonicalization, which is a
pity, but the POSIX spec certainly allows it to be done.) The same risk
of inconsistency leads me to not venture back-patching this, although it
could certainly be seen as a longstanding bug.
Per report from Jeff Davis, though this is not his proposed patch.
2012-03-26 03:47:22 +02:00
|
|
|
char *canonname;
|
2006-05-04 18:07:29 +02:00
|
|
|
int encoding = -1;
|
2014-07-02 02:10:38 +02:00
|
|
|
bool dbistemplate = false;
|
|
|
|
bool dballowconnections = true;
|
2006-05-04 18:07:29 +02:00
|
|
|
int dbconnlimit = -1;
|
2022-02-14 08:09:04 +01:00
|
|
|
char *dbcollversion = NULL;
|
2008-08-04 20:03:46 +02:00
|
|
|
int notherbackends;
|
|
|
|
int npreparedxacts;
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
CreateDBStrategy dbstrategy = CREATEDB_WAL_LOG;
|
2008-04-17 01:59:40 +02:00
|
|
|
createdb_failure_params fparms;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2002-06-18 19:27:58 +02:00
|
|
|
	/* Extract options from the statement node tree */
	foreach(option, stmt->options)
	{
		DefElem    *defel = (DefElem *) lfirst(option);

		if (strcmp(defel->defname, "tablespace") == 0)
		{
			if (dtablespacename)
				errorConflictingDefElem(defel, pstate);
			dtablespacename = defel;
		}
		else if (strcmp(defel->defname, "owner") == 0)
		{
			if (downer)
				errorConflictingDefElem(defel, pstate);
			downer = defel;
		}
		else if (strcmp(defel->defname, "template") == 0)
		{
			if (dtemplate)
				errorConflictingDefElem(defel, pstate);
			dtemplate = defel;
		}
		else if (strcmp(defel->defname, "encoding") == 0)
		{
			if (dencoding)
				errorConflictingDefElem(defel, pstate);
			dencoding = defel;
		}
		else if (strcmp(defel->defname, "locale") == 0)
		{
			if (dlocale)
				errorConflictingDefElem(defel, pstate);
			dlocale = defel;
		}
		else if (strcmp(defel->defname, "lc_collate") == 0)
		{
			if (dcollate)
				errorConflictingDefElem(defel, pstate);
			dcollate = defel;
		}
		else if (strcmp(defel->defname, "lc_ctype") == 0)
		{
			if (dctype)
				errorConflictingDefElem(defel, pstate);
			dctype = defel;
		}
		else if (strcmp(defel->defname, "icu_locale") == 0)
		{
			if (diculocale)
				errorConflictingDefElem(defel, pstate);
			diculocale = defel;
		}
		else if (strcmp(defel->defname, "icu_rules") == 0)
		{
			if (dicurules)
				errorConflictingDefElem(defel, pstate);
			dicurules = defel;
		}
		else if (strcmp(defel->defname, "locale_provider") == 0)
		{
			if (dlocprovider)
				errorConflictingDefElem(defel, pstate);
			dlocprovider = defel;
		}
		else if (strcmp(defel->defname, "is_template") == 0)
		{
			if (distemplate)
				errorConflictingDefElem(defel, pstate);
			distemplate = defel;
		}
		else if (strcmp(defel->defname, "allow_connections") == 0)
		{
			if (dallowconnections)
				errorConflictingDefElem(defel, pstate);
			dallowconnections = defel;
		}
		else if (strcmp(defel->defname, "connection_limit") == 0)
		{
			if (dconnlimit)
				errorConflictingDefElem(defel, pstate);
			dconnlimit = defel;
		}
		else if (strcmp(defel->defname, "collation_version") == 0)
		{
			if (dcollversion)
				errorConflictingDefElem(defel, pstate);
			dcollversion = defel;
		}
		else if (strcmp(defel->defname, "location") == 0)
		{
			ereport(WARNING,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("LOCATION is not supported anymore"),
					 errhint("Consider using tablespaces instead."),
					 parser_errposition(pstate, defel->location)));
		}
		else if (strcmp(defel->defname, "oid") == 0)
		{
			dboid = defGetObjectId(defel);

			/*
			 * We don't normally permit new databases to be created with
			 * system-assigned OIDs.  pg_upgrade tries to preserve database
			 * OIDs, so we can't allow any database to be created with an OID
			 * that might be in use in a freshly-initialized cluster created
			 * by some future version.  We assume all such OIDs will be from
			 * the system-managed OID range.
			 *
			 * As an exception, however, we permit any OID to be assigned when
			 * allow_system_table_mods=on (so that initdb can assign system
			 * OIDs to template0 and postgres) or when performing a binary
			 * upgrade (so that pg_upgrade can preserve whatever OIDs it finds
			 * in the source cluster).
			 */
			if (dboid < FirstNormalObjectId &&
				!allowSystemTableMods && !IsBinaryUpgrade)
				ereport(ERROR,
						(errcode(ERRCODE_INVALID_PARAMETER_VALUE)),
						errmsg("OIDs less than %u are reserved for system objects", FirstNormalObjectId));
		}
		else if (strcmp(defel->defname, "strategy") == 0)
		{
			if (dstrategy)
				errorConflictingDefElem(defel, pstate);
			dstrategy = defel;
		}
		else
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("option \"%s\" not recognized", defel->defname),
					 parser_errposition(pstate, defel->location)));
	}

	if (downer && downer->arg)
		dbowner = defGetString(downer);
	if (dtemplate && dtemplate->arg)
		dbtemplate = defGetString(dtemplate);
	if (dencoding && dencoding->arg)
	{
		const char *encoding_name;

		if (IsA(dencoding->arg, Integer))
		{
			encoding = defGetInt32(dencoding);
			encoding_name = pg_encoding_to_char(encoding);
			if (strcmp(encoding_name, "") == 0 ||
				pg_valid_server_encoding(encoding_name) < 0)
				ereport(ERROR,
						(errcode(ERRCODE_UNDEFINED_OBJECT),
						 errmsg("%d is not a valid encoding code",
								encoding),
						 parser_errposition(pstate, dencoding->location)));
		}
		else
		{
			encoding_name = defGetString(dencoding);
			encoding = pg_valid_server_encoding(encoding_name);
			if (encoding < 0)
				ereport(ERROR,
						(errcode(ERRCODE_UNDEFINED_OBJECT),
						 errmsg("%s is not a valid encoding name",
								encoding_name),
						 parser_errposition(pstate, dencoding->location)));
		}
	}
	if (dlocale && dlocale->arg)
	{
		dbcollate = defGetString(dlocale);
		dbctype = defGetString(dlocale);
	}
	if (dcollate && dcollate->arg)
		dbcollate = defGetString(dcollate);
	if (dctype && dctype->arg)
		dbctype = defGetString(dctype);
	if (diculocale && diculocale->arg)
		dbiculocale = defGetString(diculocale);
	if (dicurules && dicurules->arg)
		dbicurules = defGetString(dicurules);
	if (dlocprovider && dlocprovider->arg)
	{
		char	   *locproviderstr = defGetString(dlocprovider);

		if (pg_strcasecmp(locproviderstr, "icu") == 0)
			dblocprovider = COLLPROVIDER_ICU;
		else if (pg_strcasecmp(locproviderstr, "libc") == 0)
			dblocprovider = COLLPROVIDER_LIBC;
		else
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
					 errmsg("unrecognized locale provider: %s",
							locproviderstr)));
	}
	if (distemplate && distemplate->arg)
		dbistemplate = defGetBoolean(distemplate);
	if (dallowconnections && dallowconnections->arg)
		dballowconnections = defGetBoolean(dallowconnections);
	if (dconnlimit && dconnlimit->arg)
	{
		dbconnlimit = defGetInt32(dconnlimit);
		if (dbconnlimit < -1)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("invalid connection limit: %d", dbconnlimit)));
	}
	if (dcollversion)
		dbcollversion = defGetString(dcollversion);

	/* obtain OID of proposed owner */
	if (dbowner)
		datdba = get_role_oid(dbowner, false);
	else
		datdba = GetUserId();

	/*
	 * To create a database, must have createdb privilege and must be able to
	 * become the target role (this does not imply that the target role itself
	 * must have createdb privilege).  The latter provision guards against
	 * "giveaway" attacks.  Note that a superuser will always have both of
	 * these privileges a fortiori.
	 */
	if (!have_createdb_privilege())
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("permission denied to create database")));

	check_can_set_role(GetUserId(), datdba);

	/*
	 * Lookup database (template) to be cloned, and obtain share lock on it.
	 * ShareLock allows two CREATE DATABASEs to work from the same template
	 * concurrently, while ensuring no one is busy dropping it in parallel
	 * (which would be Very Bad since we'd likely get an incomplete copy
	 * without knowing it).  This also prevents any new connections from being
	 * made to the source until we finish copying it, so we can be sure it
	 * won't change underneath us.
	 */
	if (!dbtemplate)
		dbtemplate = "template1";	/* Default template database name */

	if (!get_db_info(dbtemplate, ShareLock,
					 &src_dboid, &src_owner, &src_encoding,
					 &src_istemplate, &src_allowconn,
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
|
|
|
&src_frozenxid, &src_minmxid, &src_deftablespace,
|
2023-03-08 16:35:42 +01:00
|
|
|
&src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
|
2022-03-17 11:11:21 +01:00
|
|
|
&src_collversion))
|
2003-07-19 01:20:33 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_DATABASE),
|
2006-05-04 18:07:29 +02:00
|
|
|
errmsg("template database \"%s\" does not exist",
|
|
|
|
dbtemplate)));

	/*
	 * Permission check: to copy a DB that's not marked datistemplate, you
	 * must be superuser or the owner thereof.
	 */
	if (!src_istemplate)
	{
		if (!object_ownercheck(DatabaseRelationId, src_dboid, GetUserId()))
			ereport(ERROR,
					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
					 errmsg("permission denied to copy database \"%s\"",
							dbtemplate)));
	}

	/* Validate the database creation strategy. */
	if (dstrategy && dstrategy->arg)
	{
		char	   *strategy;

		strategy = defGetString(dstrategy);
		if (strcmp(strategy, "wal_log") == 0)
			dbstrategy = CREATEDB_WAL_LOG;
		else if (strcmp(strategy, "file_copy") == 0)
			dbstrategy = CREATEDB_FILE_COPY;
		else
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("invalid create database strategy \"%s\"", strategy),
					 errhint("Valid strategies are \"wal_log\" and \"file_copy\".")));
	}

	/* If encoding or locales are defaulted, use source's setting */
	if (encoding < 0)
		encoding = src_encoding;
	if (dbcollate == NULL)
		dbcollate = src_collate;
	if (dbctype == NULL)
		dbctype = src_ctype;
	if (dblocprovider == '\0')
		dblocprovider = src_locprovider;
	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
		dbiculocale = src_iculocale;
	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
		dbicurules = src_icurules;

	/* Some encodings are client only */
	if (!PG_VALID_BE_ENCODING(encoding))
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("invalid server encoding %d", encoding)));

	/* Check that the chosen locales are valid, and get canonical spellings */
	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("invalid locale name: \"%s\"", dbcollate)));
	dbcollate = canonname;
	if (!check_locale(LC_CTYPE, dbctype, &canonname))
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("invalid locale name: \"%s\"", dbctype)));
	dbctype = canonname;

	check_encoding_locale_matches(encoding, dbcollate, dbctype);

	if (dblocprovider == COLLPROVIDER_ICU)
	{
		if (!(is_encoding_supported_by_icu(encoding)))
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("encoding \"%s\" is not supported with ICU provider",
							pg_encoding_to_char(encoding))));

		/*
		 * This would happen if template0 uses the libc provider but the new
		 * database uses icu.
		 */
		if (!dbiculocale)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("ICU locale must be specified")));

		/*
		 * During binary upgrade, or when the locale came from the template
		 * database, preserve the locale string.  Otherwise, canonicalize it
		 * to a language tag.
		 */
		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
		{
			char	   *langtag = icu_language_tag(dbiculocale,
												   icu_validation_level);

			if (langtag && strcmp(dbiculocale, langtag) != 0)
			{
				ereport(NOTICE,
						(errmsg("using standard form \"%s\" for locale \"%s\"",
								langtag, dbiculocale)));

				dbiculocale = langtag;
			}
		}

		icu_validate_locale(dbiculocale);
	}
	else
	{
		if (dbiculocale)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
					 errmsg("ICU locale cannot be specified unless locale provider is ICU")));

		if (dbicurules)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
					 errmsg("ICU rules cannot be specified unless locale provider is ICU")));
	}

	/*
	 * Check that the new encoding and locale settings match the source
	 * database.  We insist on this because we simply copy the source data ---
	 * any non-ASCII data would be wrongly encoded, and any indexes sorted
	 * according to the source locale would be wrong.
	 *
	 * However, we assume that template0 doesn't contain any non-ASCII data
	 * nor any indexes that depend on collation or ctype, so template0 can be
	 * used as template for creating a database with any encoding or locale.
	 */
	if (strcmp(dbtemplate, "template0") != 0)
	{
		if (encoding != src_encoding)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("new encoding (%s) is incompatible with the encoding of the template database (%s)",
							pg_encoding_to_char(encoding),
							pg_encoding_to_char(src_encoding)),
					 errhint("Use the same encoding as in the template database, or use template0 as template.")));

		if (strcmp(dbcollate, src_collate) != 0)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("new collation (%s) is incompatible with the collation of the template database (%s)",
							dbcollate, src_collate),
					 errhint("Use the same collation as in the template database, or use template0 as template.")));

		if (strcmp(dbctype, src_ctype) != 0)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("new LC_CTYPE (%s) is incompatible with the LC_CTYPE of the template database (%s)",
							dbctype, src_ctype),
					 errhint("Use the same LC_CTYPE as in the template database, or use template0 as template.")));

		if (dblocprovider != src_locprovider)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("new locale provider (%s) does not match locale provider of the template database (%s)",
							collprovider_name(dblocprovider), collprovider_name(src_locprovider)),
					 errhint("Use the same locale provider as in the template database, or use template0 as template.")));

		if (dblocprovider == COLLPROVIDER_ICU)
		{
			char	   *val1;
			char	   *val2;

			Assert(dbiculocale);
			Assert(src_iculocale);
			if (strcmp(dbiculocale, src_iculocale) != 0)
				ereport(ERROR,
						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
								dbiculocale, src_iculocale),
						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));

			val1 = dbicurules;
			if (!val1)
				val1 = "";
			val2 = src_icurules;
			if (!val2)
				val2 = "";
			if (strcmp(val1, val2) != 0)
				ereport(ERROR,
						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
								val1, val2),
						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
		}
	}

	/*
	 * If we got a collation version for the template database, check that it
	 * matches the actual OS collation version.  Otherwise error; the user
	 * needs to fix the template database first.  Don't complain if a
	 * collation version was specified explicitly as a statement option; that
	 * is used by pg_upgrade to reproduce the old state exactly.
	 *
	 * (If the template database has no collation version, then either the
	 * platform/provider does not support collation versioning, or it's
	 * template0, for which we stipulate that it does not contain
	 * collation-using objects.)
	 */
	if (src_collversion && !dcollversion)
	{
		char	   *actual_versionstr;

		actual_versionstr = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
		if (!actual_versionstr)
			ereport(ERROR,
					(errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined",
							dbtemplate)));

		if (strcmp(actual_versionstr, src_collversion) != 0)
			ereport(ERROR,
					(errmsg("template database \"%s\" has a collation version mismatch",
							dbtemplate),
					 errdetail("The template database was created using collation version %s, "
							   "but the operating system provides version %s.",
							   src_collversion, actual_versionstr),
					 errhint("Rebuild all objects in the template database that use the default collation and run "
							 "ALTER DATABASE %s REFRESH COLLATION VERSION, "
							 "or build PostgreSQL with the right library version.",
							 quote_identifier(dbtemplate))));
	}

	if (dbcollversion == NULL)
		dbcollversion = src_collversion;

	/*
	 * Normally, we copy the collation version from the template database.
	 * This last resort only applies if the template database does not have a
	 * collation version, which is normally only the case for template0.
	 */
	if (dbcollversion == NULL)
		dbcollversion = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);

	/* Resolve default tablespace for new database */
	if (dtablespacename && dtablespacename->arg)
	{
		char	   *tablespacename;
		AclResult	aclresult;

		tablespacename = defGetString(dtablespacename);
		dst_deftablespace = get_tablespace_oid(tablespacename, false);
		/* check permissions */
		aclresult = object_aclcheck(TableSpaceRelationId, dst_deftablespace, GetUserId(),
									ACL_CREATE);
		if (aclresult != ACLCHECK_OK)
			aclcheck_error(aclresult, OBJECT_TABLESPACE,
						   tablespacename);

		/* pg_global must never be the default tablespace */
		if (dst_deftablespace == GLOBALTABLESPACE_OID)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("pg_global cannot be used as default tablespace")));

		/*
		 * If we are trying to change the default tablespace of the template,
		 * we require that the template not have any files in the new default
		 * tablespace.  This is necessary because otherwise the copied
		 * database would contain pg_class rows that refer to its default
		 * tablespace both explicitly (by OID) and implicitly (as zero), which
		 * would cause problems.  For example another CREATE DATABASE using
		 * the copied database as template, and trying to change its default
		 * tablespace again, would yield outright incorrect results (it would
		 * improperly move tables to the new default tablespace that should
		 * stay in the same tablespace).
		 */
		if (dst_deftablespace != src_deftablespace)
		{
			char	   *srcpath;
			struct stat st;

			srcpath = GetDatabasePath(src_dboid, dst_deftablespace);

			if (stat(srcpath, &st) == 0 &&
				S_ISDIR(st.st_mode) &&
				!directory_is_empty(srcpath))
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("cannot assign new default tablespace \"%s\"",
								tablespacename),
						 errdetail("There is a conflict because database \"%s\" already has some tables in this tablespace.",
								   dbtemplate)));
			pfree(srcpath);
		}
	}
	else
	{
		/* Use template database's default tablespace */
		dst_deftablespace = src_deftablespace;
		/* Note there is no additional permission check in this path */
	}

	/*
	 * If built with appropriate switch, whine when regression-testing
	 * conventions for database names are violated.  But don't complain during
	 * initdb.
	 */
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
	if (IsUnderPostmaster && strstr(dbname, "regression") == NULL)
		elog(WARNING, "databases created by regression test cases should have names including \"regression\"");
#endif
|
|
|
|
|
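
	/*
	 * The symbol above is expected to be supplied at build time, e.g. by
	 * compiling one or more buildfarm members with
	 * -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS (an illustration of the
	 * intended use, not the only way to define it).
	 */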

	/*
	 * Check for db name conflict.  This is just to give a more friendly error
	 * message than "unique index violation".  There's a race condition but
	 * we're willing to accept the less friendly message in that case.
	 */
	if (OidIsValid(get_database_oid(dbname, true)))
		ereport(ERROR,
				(errcode(ERRCODE_DUPLICATE_DATABASE),
				 errmsg("database \"%s\" already exists", dbname)));

	/*
	 * The source DB can't have any active backends, except this one
	 * (exception is to allow CREATE DB while connected to template1).
	 * Otherwise we might copy inconsistent data.
	 *
	 * This should be last among the basic error checks, because it involves
	 * potential waiting; we may as well throw an error first if we're gonna
	 * throw one.
	 */
	if (CountOtherDBBackends(src_dboid, &notherbackends, &npreparedxacts))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("source database \"%s\" is being accessed by other users",
						dbtemplate),
				 errdetail_busy_db(notherbackends, npreparedxacts)));

	/*
	 * Select an OID for the new database, checking that it doesn't have a
	 * filename conflict with anything already existing in the tablespace
	 * directories.
	 */
	pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);

	/*
	 * If the database OID is explicitly configured, check whether that OID is
	 * already in use or whether a data directory with it already exists.
	 */
	if (OidIsValid(dboid))
	{
		char	   *existing_dbname = get_database_name(dboid);

		if (existing_dbname != NULL)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("database OID %u is already in use by database \"%s\"",
							dboid, existing_dbname)));

		if (check_db_file_conflict(dboid))
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("data directory with the specified OID %u already exists", dboid)));
	}
	else
	{
		/* Select an OID for the new database if it is not explicitly configured. */
		do
		{
			dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
									   Anum_pg_database_oid);
		} while (check_db_file_conflict(dboid));
	}
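
	/*
	 * Illustrative usage (hypothetical database name): an explicitly
	 * configured OID reaches this code via CREATE DATABASE's OID option, e.g.
	 *
	 *     CREATE DATABASE mydb OID = 20000;
	 *
	 * (values below 16384 are rejected, to keep clear of system-assigned
	 * OIDs), while a plain CREATE DATABASE leaves dboid invalid and takes the
	 * GetNewOidWithIndex() retry loop above.
	 */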

	/*
	 * Insert a new tuple into pg_database.  This establishes our ownership of
	 * the new database name (anyone else trying to insert the same name will
	 * block on the unique index, and fail after we commit).
	 */

	Assert((dblocprovider == COLLPROVIDER_ICU && dbiculocale) ||
		   (dblocprovider != COLLPROVIDER_ICU && !dbiculocale));

	/* Form tuple */
	new_record[Anum_pg_database_oid - 1] = ObjectIdGetDatum(dboid);
	new_record[Anum_pg_database_datname - 1] =
		DirectFunctionCall1(namein, CStringGetDatum(dbname));
	new_record[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(datdba);
	new_record[Anum_pg_database_encoding - 1] = Int32GetDatum(encoding);
	new_record[Anum_pg_database_datlocprovider - 1] = CharGetDatum(dblocprovider);
	new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(dbistemplate);
	new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(dballowconnections);
	new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
	new_record[Anum_pg_database_datfrozenxid - 1] = TransactionIdGetDatum(src_frozenxid);
	new_record[Anum_pg_database_datminmxid - 1] = TransactionIdGetDatum(src_minmxid);
	new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_deftablespace);
	new_record[Anum_pg_database_datcollate - 1] = CStringGetTextDatum(dbcollate);
	new_record[Anum_pg_database_datctype - 1] = CStringGetTextDatum(dbctype);
	if (dbiculocale)
		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
	else
		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
	if (dbicurules)
		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
	else
		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
	if (dbcollversion)
		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
	else
		new_record_nulls[Anum_pg_database_datcollversion - 1] = true;

	/*
	 * We deliberately set datacl to default (NULL), rather than copying it
	 * from the template database.  Copying it would be a bad idea when the
	 * owner is not the same as the template's owner.
	 */
	new_record_nulls[Anum_pg_database_datacl - 1] = true;

	tuple = heap_form_tuple(RelationGetDescr(pg_database_rel),
							new_record, new_record_nulls);

	CatalogTupleInsert(pg_database_rel, tuple);

	/*
	 * Now generate additional catalog entries associated with the new DB
	 */

	/* Register owner dependency */
	recordDependencyOnOwner(DatabaseRelationId, dboid, datdba);

	/* Create pg_shdepend entries for objects within database */
	copyTemplateDependencies(src_dboid, dboid);

	/* Post creation hook for new database */
	InvokeObjectPostCreateHook(DatabaseRelationId, dboid, 0);

	/*
	 * If we're going to be reading data for the to-be-created database into
	 * shared_buffers, take a lock on it. Nobody should know that this
	 * database exists yet, but it's good to maintain the invariant that an
	 * AccessExclusiveLock on the database is sufficient to drop all of its
	 * buffers without worrying about more being read later.
	 *
	 * Note that we need to do this before entering the
	 * PG_ENSURE_ERROR_CLEANUP block below, because createdb_failure_callback
	 * expects this lock to be held already.
	 */
	if (dbstrategy == CREATEDB_WAL_LOG)
		LockSharedObject(DatabaseRelationId, dboid, 0, AccessShareLock);

	/*
	 * Once we start copying subdirectories, we need to be able to clean 'em
	 * up if we fail.  Use an ENSURE block to make sure this happens.  (This
	 * is not a 100% solution, because of the possibility of failure during
	 * transaction commit after we leave this routine, but it should handle
	 * most scenarios.)
	 */
	fparms.src_dboid = src_dboid;
	fparms.dest_dboid = dboid;
	fparms.strategy = dbstrategy;

	PG_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
							PointerGetDatum(&fparms));
	{
		/*
		 * If the user has asked to create a database with the WAL_LOG
		 * strategy, call CreateDatabaseUsingWalLog, which copies the database
		 * at the block level and WAL-logs each copied block.  Otherwise, call
		 * CreateDatabaseUsingFileCopy, which copies the database file by
		 * file.
		 */
		if (dbstrategy == CREATEDB_WAL_LOG)
			CreateDatabaseUsingWalLog(src_dboid, dboid, src_deftablespace,
									  dst_deftablespace);
		else
			CreateDatabaseUsingFileCopy(src_dboid, dboid, src_deftablespace,
										dst_deftablespace);
2000-11-14 19:37:49 +01:00
|
|
|
|
2006-05-04 18:07:29 +02:00
|
|
|
/*
|
2009-09-01 04:54:52 +02:00
|
|
|
* Close pg_database, but keep lock till commit.
|
2006-05-04 18:07:29 +02:00
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(pg_database_rel, NoLock);
|
2006-05-04 18:07:29 +02:00
|
|
|
|
2005-08-02 21:02:32 +02:00
|
|
|
/*
|
2009-09-01 04:54:52 +02:00
|
|
|
* Force synchronous commit, thus minimizing the window between
|
2017-02-06 10:33:58 +01:00
|
|
|
* creation of the database files and committal of the transaction. If
|
2007-08-02 00:45:09 +02:00
|
|
|
* we crash before committing, we'll have a DB that's taking up disk
|
|
|
|
* space but is not in pg_database, which is not good.
|
2005-08-02 21:02:32 +02:00
|
|
|
*/
|
2009-09-01 04:54:52 +02:00
|
|
|
ForceSyncCommit();
|
2005-08-02 21:02:32 +02:00
|
|
|
}
|
2008-04-17 01:59:40 +02:00
|
|
|
PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
|
|
|
|
PointerGetDatum(&fparms));
|
2012-12-29 13:55:37 +01:00
|
|
|
|
|
|
|
return dboid;
|
2008-04-17 01:59:40 +02:00
|
|
|
}

/*
 * Check whether chosen encoding matches chosen locale settings.  This
 * restriction is necessary because libc's locale-specific code usually
 * fails when presented with data in an encoding it's not expecting.  We
 * allow mismatch in four cases:
 *
 * 1. locale encoding = SQL_ASCII, which means that the locale is C/POSIX,
 * which works with any encoding.
 *
 * 2. locale encoding = -1, which means that we couldn't determine the
 * locale's encoding and have to trust the user to get it right.
 *
 * 3. selected encoding is UTF8 and platform is win32.  This is because
 * UTF8 is a pseudo codepage that is supported in all locales since it's
 * converted to UTF16 before being used.
 *
 * 4. selected encoding is SQL_ASCII, but only if you're a superuser.  This
 * is risky but we have historically allowed it --- notably, the
 * regression tests require it.
 *
 * Note: if you change this policy, fix initdb to match.
 */
void
check_encoding_locale_matches(int encoding, const char *collate, const char *ctype)
{
	int			ctype_encoding = pg_get_encoding_from_locale(ctype, true);
	int			collate_encoding = pg_get_encoding_from_locale(collate, true);

	if (!(ctype_encoding == encoding ||
		  ctype_encoding == PG_SQL_ASCII ||
		  ctype_encoding == -1 ||
#ifdef WIN32
		  encoding == PG_UTF8 ||
#endif
		  (encoding == PG_SQL_ASCII && superuser())))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("encoding \"%s\" does not match locale \"%s\"",
						pg_encoding_to_char(encoding),
						ctype),
				 errdetail("The chosen LC_CTYPE setting requires encoding \"%s\".",
						   pg_encoding_to_char(ctype_encoding))));

	if (!(collate_encoding == encoding ||
		  collate_encoding == PG_SQL_ASCII ||
		  collate_encoding == -1 ||
#ifdef WIN32
		  encoding == PG_UTF8 ||
#endif
		  (encoding == PG_SQL_ASCII && superuser())))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("encoding \"%s\" does not match locale \"%s\"",
						pg_encoding_to_char(encoding),
						collate),
				 errdetail("The chosen LC_COLLATE setting requires encoding \"%s\".",
						   pg_encoding_to_char(collate_encoding))));
}

/* Error cleanup callback for createdb */
static void
createdb_failure_callback(int code, Datum arg)
{
	createdb_failure_params *fparms = (createdb_failure_params *) DatumGetPointer(arg);

	/*
	 * If we were copying the database at the block level, drop pages for the
	 * destination database that are in the shared buffer cache, and tell the
	 * checkpointer to forget any pending fsync and unlink requests for files
	 * in the database.  The reasoning behind doing this is the same as
	 * explained in the dropdb function.  But unlike dropdb we don't need to
	 * call pgstat_drop_database, because this database is not yet created,
	 * so there should not be any stats for it.
	 */
	if (fparms->strategy == CREATEDB_WAL_LOG)
	{
		DropDatabaseBuffers(fparms->dest_dboid);
		ForgetDatabaseSyncRequests(fparms->dest_dboid);

		/* Release lock on the target database. */
		UnlockSharedObject(DatabaseRelationId, fparms->dest_dboid, 0,
						   AccessShareLock);
	}

	/*
	 * Release lock on source database before doing recursive remove.  This
	 * is not essential but it seems desirable to release the lock as soon as
	 * possible.
	 */
	UnlockSharedObject(DatabaseRelationId, fparms->src_dboid, 0, ShareLock);

	/* Throw away any successfully copied subdirectories */
	remove_dbtablespaces(fparms->dest_dboid);
}

/*
 * DROP DATABASE
 */
void
dropdb(const char *dbname, bool missing_ok, bool force)
{
	Oid			db_id;
	bool		db_istemplate;
	Relation	pgdbrel;
	HeapTuple	tup;
	int			notherbackends;
	int			npreparedxacts;
	int			nslots,
				nslots_active;
	int			nsubscriptions;

	/*
	 * Look up the target database's OID, and get exclusive lock on it. We
	 * need this to ensure that no new backend starts up in the target
	 * database while we are deleting it (see postinit.c), and that no one is
	 * using it as a CREATE DATABASE template or trying to delete it for
	 * themselves.
	 */
	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);

	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
	{
		if (!missing_ok)
		{
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_DATABASE),
					 errmsg("database \"%s\" does not exist", dbname)));
		}
		else
		{
			/* Close pg_database, release the lock, since we changed nothing */
			table_close(pgdbrel, RowExclusiveLock);
			ereport(NOTICE,
					(errmsg("database \"%s\" does not exist, skipping",
							dbname)));
			return;
		}
	}

	/*
	 * Permission checks
	 */
	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
					   dbname);

	/* DROP hook for the database being removed */
	InvokeObjectDropHook(DatabaseRelationId, db_id, 0);

	/*
	 * Disallow dropping a DB that is marked istemplate.  This is just to
	 * prevent people from accidentally dropping template0 or template1; they
	 * can do so if they're really determined ...
	 */
	if (db_istemplate)
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("cannot drop a template database")));

	/* Obviously can't drop my own database */
	if (db_id == MyDatabaseId)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("cannot drop the currently open database")));

	/*
	 * Check whether there are active logical slots that refer to the
	 * to-be-dropped database.  The database lock we are holding prevents the
	 * creation of new slots using the database or existing slots becoming
	 * active.
	 */
	(void) ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active);
	if (nslots_active)
	{
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("database \"%s\" is used by an active logical replication slot",
						dbname),
				 errdetail_plural("There is %d active slot.",
								  "There are %d active slots.",
								  nslots_active, nslots_active)));
	}

	/*
	 * Check if there are subscriptions defined in the target database.
	 *
	 * We can't drop them automatically because they might be holding
	 * resources in other databases/instances.
	 */
	if ((nsubscriptions = CountDBSubscriptions(db_id)) > 0)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("database \"%s\" is being used by logical replication subscription",
						dbname),
				 errdetail_plural("There is %d subscription.",
								  "There are %d subscriptions.",
								  nsubscriptions, nsubscriptions)));

	/*
	 * Attempt to terminate all existing connections to the target database
	 * if the user has requested to do so.
	 */
	if (force)
		TerminateOtherDBBackends(db_id);

	/*
	 * Check for other backends in the target database.  (Because we hold the
	 * database lock, no new ones can start after this.)
	 *
	 * As in CREATE DATABASE, check this after other error conditions.
	 */
	if (CountOtherDBBackends(db_id, &notherbackends, &npreparedxacts))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("database \"%s\" is being accessed by other users",
						dbname),
				 errdetail_busy_db(notherbackends, npreparedxacts)));

	/*
	 * Remove the database's tuple from pg_database.
	 */
	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(db_id));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for database %u", db_id);

	CatalogTupleDelete(pgdbrel, &tup->t_self);

	ReleaseSysCache(tup);

	/*
	 * Delete any comments or security labels associated with the database.
	 */
	DeleteSharedComments(db_id, DatabaseRelationId);
	DeleteSharedSecurityLabel(db_id, DatabaseRelationId);

	/*
	 * Remove settings associated with this database
	 */
	DropSetting(db_id, InvalidOid);

	/*
	 * Remove shared dependency references for the database.
	 */
	dropDatabaseDependencies(db_id);

	/*
	 * Drop db-specific replication slots.
	 */
	ReplicationSlotsDropDBSlots(db_id);

	/*
	 * Drop pages for this database that are in the shared buffer cache. This
	 * is important to ensure that no remaining backend tries to write out a
	 * dirty buffer to the dead database later...
	 */
	DropDatabaseBuffers(db_id);

	/*
	 * Tell the cumulative stats system to forget it immediately, too.
	 */
	pgstat_drop_database(db_id);

	/*
	 * Tell checkpointer to forget any pending fsync and unlink requests for
	 * files in the database; else the fsyncs will fail at next checkpoint,
	 * or worse, it will delete files that belong to a newly created database
	 * with the same OID.
	 */
	ForgetDatabaseSyncRequests(db_id);

	/*
	 * Force a checkpoint to make sure the checkpointer has received the
	 * message sent by ForgetDatabaseSyncRequests.
	 */
	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

	/* Close all smgr fds in all backends. */
	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

	/*
	 * Remove all tablespace subdirs belonging to the database.
	 */
	remove_dbtablespaces(db_id);

	/*
	 * Close pg_database, but keep lock till commit.
	 */
	table_close(pgdbrel, NoLock);

	/*
	 * Force synchronous commit, thus minimizing the window between removal
	 * of the database files and committal of the transaction.  If we crash
	 * before committing, we'll have a DB that's gone on disk but still there
	 * according to pg_database, which is not good.
	 */
	ForceSyncCommit();
}

/*
 * Rename database
 */
ObjectAddress
RenameDatabase(const char *oldname, const char *newname)
{
	Oid			db_id;
	HeapTuple	newtup;
	Relation	rel;
	int			notherbackends;
	int			npreparedxacts;
	ObjectAddress address;

	/*
	 * Look up the target database's OID, and get exclusive lock on it.  We
	 * need this for the same reasons as DROP DATABASE.
	 */
	rel = table_open(DatabaseRelationId, RowExclusiveLock);

	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_DATABASE),
				 errmsg("database \"%s\" does not exist", oldname)));

	/* must be owner */
	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
					   oldname);

	/* must have createdb rights */
	if (!have_createdb_privilege())
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("permission denied to rename database")));

	/*
	 * If built with appropriate switch, whine when regression-testing
	 * conventions for database names are violated.
	 */
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
	if (strstr(newname, "regression") == NULL)
		elog(WARNING, "databases created by regression test cases should have names including \"regression\"");
#endif

	/*
	 * Make sure the new name doesn't exist.  See notes for same error in
	 * CREATE DATABASE.
	 */
	if (OidIsValid(get_database_oid(newname, true)))
		ereport(ERROR,
				(errcode(ERRCODE_DUPLICATE_DATABASE),
				 errmsg("database \"%s\" already exists", newname)));

	/*
	 * XXX Client applications probably store the current database somewhere,
	 * so renaming it could cause confusion.  On the other hand, there may
	 * not be an actual problem besides a little confusion, so think about
	 * this and decide.
	 */
	if (db_id == MyDatabaseId)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("current database cannot be renamed")));

	/*
	 * Make sure the database does not have active sessions.  This is the
	 * same concern as above, but applied to other sessions.
	 *
	 * As in CREATE DATABASE, check this after other error conditions.
	 */
	if (CountOtherDBBackends(db_id, &notherbackends, &npreparedxacts))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("database \"%s\" is being accessed by other users",
						oldname),
				 errdetail_busy_db(notherbackends, npreparedxacts)));

	/* rename */
	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
	if (!HeapTupleIsValid(newtup))
		elog(ERROR, "cache lookup failed for database %u", db_id);
	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
	CatalogTupleUpdate(rel, &newtup->t_self, newtup);

	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddressSet(address, DatabaseRelationId, db_id);
|
|
|
|
|
2006-05-04 18:07:29 +02:00
|
|
|
/*
|
2009-09-01 04:54:52 +02:00
|
|
|
* Close pg_database, but keep lock till commit.
|
2006-05-04 18:07:29 +02:00
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, NoLock);
|
2012-12-24 00:25:03 +01:00
|
|
|
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
return address;
|
2003-06-27 16:45:32 +02:00
|
|
|
}
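
/*
 * Usage sketch (illustrative; the database and tablespace names below are
 * hypothetical, not part of this source): movedb() implements the
 * filesystem side of the SQL command
 *
 *		ALTER DATABASE sales SET TABLESPACE fastdisk;
 *
 * which succeeds only when "fastdisk" exists, the caller owns the database,
 * and no other sessions are connected to "sales".
 */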

/*
 * ALTER DATABASE SET TABLESPACE
 */
static void
movedb(const char *dbname, const char *tblspcname)
{
	Oid			db_id;
	Relation	pgdbrel;
	int			notherbackends;
	int			npreparedxacts;
	HeapTuple	oldtuple,
				newtuple;
	Oid			src_tblspcoid,
				dst_tblspcoid;
	ScanKeyData scankey;
	SysScanDesc sysscan;
	AclResult	aclresult;
	char	   *src_dbpath;
	char	   *dst_dbpath;
	DIR		   *dstdir;
	struct dirent *xlde;
	movedb_failure_params fparms;

	/*
	 * Look up the target database's OID, and get exclusive lock on it. We
	 * need this to ensure that no new backend starts up in the database while
	 * we are moving it, and that no one is using it as a CREATE DATABASE
	 * template or trying to delete it.
	 */
	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);

	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_DATABASE),
				 errmsg("database \"%s\" does not exist", dbname)));

	/*
	 * We actually need a session lock, so that the lock will persist across
	 * the commit/restart below. (We could almost get away with letting the
	 * lock be released at commit, except that someone could try to move
	 * relations of the DB back into the old directory while we rmtree() it.)
	 */
	LockSharedObjectForSession(DatabaseRelationId, db_id, 0,
							   AccessExclusiveLock);

	/*
	 * Permission checks
	 */
	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
					   dbname);

	/*
	 * Obviously can't move the tables of my own database
	 */
	if (db_id == MyDatabaseId)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("cannot change the tablespace of the currently open database")));

	/*
	 * Get tablespace's oid
	 */
	dst_tblspcoid = get_tablespace_oid(tblspcname, false);

	/*
	 * Permission checks
	 */
	aclresult = object_aclcheck(TableSpaceRelationId, dst_tblspcoid, GetUserId(),
								ACL_CREATE);
	if (aclresult != ACLCHECK_OK)
		aclcheck_error(aclresult, OBJECT_TABLESPACE,
					   tblspcname);

	/*
	 * pg_global must never be the default tablespace
	 */
	if (dst_tblspcoid == GLOBALTABLESPACE_OID)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("pg_global cannot be used as default tablespace")));

	/*
	 * No-op if same tablespace
	 */
	if (src_tblspcoid == dst_tblspcoid)
	{
		table_close(pgdbrel, NoLock);
		UnlockSharedObjectForSession(DatabaseRelationId, db_id, 0,
									 AccessExclusiveLock);
		return;
	}

	/*
	 * Check for other backends in the target database. (Because we hold the
	 * database lock, no new ones can start after this.)
	 *
	 * As in CREATE DATABASE, check this after other error conditions.
	 */
	if (CountOtherDBBackends(db_id, &notherbackends, &npreparedxacts))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("database \"%s\" is being accessed by other users",
						dbname),
				 errdetail_busy_db(notherbackends, npreparedxacts)));

	/*
	 * Get old and new database paths
	 */
	src_dbpath = GetDatabasePath(db_id, src_tblspcoid);
	dst_dbpath = GetDatabasePath(db_id, dst_tblspcoid);

	/*
	 * Force a checkpoint before proceeding. This will force all dirty
	 * buffers, including those of unlogged tables, out to disk, to ensure
	 * source database is up-to-date on disk for the copy.
	 * FlushDatabaseBuffers() would suffice for that, but we also want to
	 * process any pending unlink requests. Otherwise, the check for existing
	 * files in the target directory might fail unnecessarily, not to mention
	 * that the copy might fail due to source files getting deleted under it.
	 * On Windows, this also ensures that background procs don't hold any open
	 * files, which would cause rmdir() to fail.
	 */
	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT
					  | CHECKPOINT_FLUSH_ALL);

	/* Close all smgr fds in all backends. */
	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

	/*
	 * Now drop all buffers holding data of the target database; they should
	 * no longer be dirty so DropDatabaseBuffers is safe.
	 *
	 * It might seem that we could just let these buffers age out of shared
	 * buffers naturally, since they should not get referenced anymore. The
	 * problem with that is that if the user later moves the database back to
	 * its original tablespace, any still-surviving buffers would appear to
	 * contain valid data again --- but they'd be missing any changes made in
	 * the database while it was in the new tablespace. In any case, freeing
	 * buffers that should never be used again seems worth the cycles.
	 *
	 * Note: it'd be sufficient to get rid of buffers matching db_id and
	 * src_tblspcoid, but bufmgr.c presently provides no API for that.
	 */
	DropDatabaseBuffers(db_id);

	/*
	 * Check for existence of files in the target directory, i.e., objects of
	 * this database that are already in the target tablespace. We can't
	 * allow the move in such a case, because we would need to change those
	 * relations' pg_class.reltablespace entries to zero, and we don't have
	 * access to the DB's pg_class to do so.
	 */
	dstdir = AllocateDir(dst_dbpath);
	if (dstdir != NULL)
	{
		while ((xlde = ReadDir(dstdir, dst_dbpath)) != NULL)
		{
			if (strcmp(xlde->d_name, ".") == 0 ||
				strcmp(xlde->d_name, "..") == 0)
				continue;

			ereport(ERROR,
					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
					 errmsg("some relations of database \"%s\" are already in tablespace \"%s\"",
							dbname, tblspcname),
					 errhint("You must move them back to the database's default tablespace before using this command.")));
		}

		FreeDir(dstdir);

		/*
		 * The directory exists but is empty. We must remove it before using
		 * the copydir function.
		 */
		if (rmdir(dst_dbpath) != 0)
			elog(ERROR, "could not remove directory \"%s\": %m",
				 dst_dbpath);
	}

	/*
	 * Use an ENSURE block to make sure we remove the debris if the copy fails
	 * (eg, due to out-of-disk-space). This is not a 100% solution, because
	 * of the possibility of failure during transaction commit, but it should
	 * handle most scenarios.
	 */
	fparms.dest_dboid = db_id;
	fparms.dest_tsoid = dst_tblspcoid;
	PG_ENSURE_ERROR_CLEANUP(movedb_failure_callback,
							PointerGetDatum(&fparms));
	{
		Datum		new_record[Natts_pg_database] = {0};
		bool		new_record_nulls[Natts_pg_database] = {0};
		bool		new_record_repl[Natts_pg_database] = {0};

		/*
		 * Copy files from the old tablespace to the new one
		 */
		copydir(src_dbpath, dst_dbpath, false);

		/*
		 * Record the filesystem change in XLOG
		 */
		{
			xl_dbase_create_file_copy_rec xlrec;

			xlrec.db_id = db_id;
			xlrec.tablespace_id = dst_tblspcoid;
			xlrec.src_db_id = db_id;
			xlrec.src_tablespace_id = src_tblspcoid;

			XLogBeginInsert();
			XLogRegisterData((char *) &xlrec,
							 sizeof(xl_dbase_create_file_copy_rec));

			(void) XLogInsert(RM_DBASE_ID,
							  XLOG_DBASE_CREATE_FILE_COPY | XLR_SPECIAL_REL_UPDATE);
		}

		/*
		 * Update the database's pg_database tuple
		 */
		ScanKeyInit(&scankey,
					Anum_pg_database_datname,
					BTEqualStrategyNumber, F_NAMEEQ,
					CStringGetDatum(dbname));
		sysscan = systable_beginscan(pgdbrel, DatabaseNameIndexId, true,
									 NULL, 1, &scankey);
		oldtuple = systable_getnext(sysscan);
		if (!HeapTupleIsValid(oldtuple))	/* shouldn't happen... */
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_DATABASE),
					 errmsg("database \"%s\" does not exist", dbname)));

		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
		new_record_repl[Anum_pg_database_dattablespace - 1] = true;

		newtuple = heap_modify_tuple(oldtuple, RelationGetDescr(pgdbrel),
									 new_record,
									 new_record_nulls, new_record_repl);
		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);

		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);

		systable_endscan(sysscan);

		/*
		 * Force another checkpoint here. As in CREATE DATABASE, this is to
		 * ensure that we don't have to replay a committed
		 * XLOG_DBASE_CREATE_FILE_COPY operation, which would cause us to lose
		 * any unlogged operations done in the new DB tablespace before the
		 * next checkpoint.
		 */
		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

		/*
		 * Force synchronous commit, thus minimizing the window between
		 * copying the database files and committal of the transaction. If we
		 * crash before committing, we'll leave an orphaned set of files on
		 * disk, which is not fatal but not good either.
		 */
		ForceSyncCommit();

		/*
		 * Close pg_database, but keep lock till commit.
		 */
		table_close(pgdbrel, NoLock);
	}
	PG_END_ENSURE_ERROR_CLEANUP(movedb_failure_callback,
								PointerGetDatum(&fparms));

	/*
	 * Commit the transaction so that the pg_database update is committed. If
	 * we crash while removing files, the database won't be corrupt, we'll
	 * just leave some orphaned files in the old directory.
	 *
	 * (This is OK because we know we aren't inside a transaction block.)
	 *
	 * XXX would it be safe/better to do this inside the ensure block? Not
	 * convinced it's a good idea; consider elog just after the transaction
	 * really commits.
	 */
	PopActiveSnapshot();
	CommitTransactionCommand();

	/* Start new transaction for the remaining work; don't need a snapshot */
	StartTransactionCommand();

	/*
	 * Remove files from the old tablespace
	 */
	if (!rmtree(src_dbpath, true))
		ereport(WARNING,
				(errmsg("some useless files may be left behind in old database directory \"%s\"",
						src_dbpath)));

	/*
	 * Record the filesystem change in XLOG
	 */
	{
		xl_dbase_drop_rec xlrec;

		xlrec.db_id = db_id;
		xlrec.ntablespaces = 1;
2008-11-07 19:25:07 +01:00
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
XLogBeginInsert();
|
|
|
|
XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));
|
2019-11-21 13:10:37 +01:00
|
|
|
XLogRegisterData((char *) &src_tblspcoid, sizeof(Oid));
|
2008-11-07 19:25:07 +01:00
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
(void) XLogInsert(RM_DBASE_ID,
|
|
|
|
XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
|
2008-11-07 19:25:07 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Now it's safe to release the database lock */
|
|
|
|
UnlockSharedObjectForSession(DatabaseRelationId, db_id, 0,
|
|
|
|
AccessExclusiveLock);
|
2022-04-25 10:32:13 +02:00
|
|
|
|
|
|
|
pfree(src_dbpath);
|
|
|
|
pfree(dst_dbpath);
|
2008-11-07 19:25:07 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Error cleanup callback for movedb */
|
|
|
|
static void
|
|
|
|
movedb_failure_callback(int code, Datum arg)
|
|
|
|
{
|
|
|
|
movedb_failure_params *fparms = (movedb_failure_params *) DatumGetPointer(arg);
|
|
|
|
char *dstpath;
|
|
|
|
|
|
|
|
/* Get rid of anything we managed to copy to the target directory */
|
|
|
|
dstpath = GetDatabasePath(fparms->dest_dboid, fparms->dest_tsoid);
|
|
|
|
|
|
|
|
(void) rmtree(dstpath, true);
|
2022-04-25 10:32:13 +02:00
|
|
|
|
|
|
|
pfree(dstpath);
|
2008-11-07 19:25:07 +01:00
|
|
|
}
|
|
|
|
|
2019-11-12 06:36:13 +01:00
|
|
|
/*
|
|
|
|
* Process options and call dropdb function.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
DropDatabase(ParseState *pstate, DropdbStmt *stmt)
|
|
|
|
{
|
|
|
|
bool force = false;
|
|
|
|
ListCell *lc;
|
|
|
|
|
|
|
|
foreach(lc, stmt->options)
|
|
|
|
{
|
|
|
|
DefElem *opt = (DefElem *) lfirst(lc);
|
|
|
|
|
|
|
|
if (strcmp(opt->defname, "force") == 0)
|
|
|
|
force = true;
|
|
|
|
else
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
|
|
|
|
parser_errposition(pstate, opt->location)));
|
|
|
|
}
|
|
|
|
|
|
|
|
dropdb(stmt->dbname, stmt->missing_ok, force);
|
|
|
|
}
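
/*
 * Illustrative SQL forms handled by DropDatabase() above (database name is
 * an example, not taken from this file):
 *
 *		DROP DATABASE mydb;
 *		DROP DATABASE IF EXISTS mydb WITH (FORCE);
 *
 * "IF EXISTS" maps to stmt->missing_ok, and the FORCE option to the
 * "force" DefElem parsed above.
 */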

/*
 * ALTER DATABASE name ...
 */
Oid
AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
{
	Relation	rel;
	Oid			dboid;
	HeapTuple	tuple,
				newtuple;
	Form_pg_database datform;
	ScanKeyData scankey;
	SysScanDesc scan;
	ListCell   *option;
	bool		dbistemplate = false;
	bool		dballowconnections = true;
	int			dbconnlimit = -1;
	DefElem    *distemplate = NULL;
	DefElem    *dallowconnections = NULL;
	DefElem    *dconnlimit = NULL;
	DefElem    *dtablespace = NULL;
	Datum		new_record[Natts_pg_database] = {0};
	bool		new_record_nulls[Natts_pg_database] = {0};
	bool		new_record_repl[Natts_pg_database] = {0};

	/* Extract options from the statement node tree */
	foreach(option, stmt->options)
	{
		DefElem    *defel = (DefElem *) lfirst(option);

		if (strcmp(defel->defname, "is_template") == 0)
		{
			if (distemplate)
				errorConflictingDefElem(defel, pstate);
			distemplate = defel;
		}
		else if (strcmp(defel->defname, "allow_connections") == 0)
		{
			if (dallowconnections)
				errorConflictingDefElem(defel, pstate);
			dallowconnections = defel;
		}
		else if (strcmp(defel->defname, "connection_limit") == 0)
		{
			if (dconnlimit)
				errorConflictingDefElem(defel, pstate);
			dconnlimit = defel;
		}
		else if (strcmp(defel->defname, "tablespace") == 0)
		{
			if (dtablespace)
				errorConflictingDefElem(defel, pstate);
			dtablespace = defel;
		}
		else
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("option \"%s\" not recognized", defel->defname),
					 parser_errposition(pstate, defel->location)));
	}

	if (dtablespace)
	{
		/*
		 * While the SET TABLESPACE syntax doesn't allow any other options,
		 * somebody could write "WITH TABLESPACE ...". Forbid any other
		 * options from being specified in that case.
		 */
		if (list_length(stmt->options) != 1)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("option \"%s\" cannot be specified with other options",
							dtablespace->defname),
					 parser_errposition(pstate, dtablespace->location)));
		/* this case isn't allowed within a transaction block */
		PreventInTransactionBlock(isTopLevel, "ALTER DATABASE SET TABLESPACE");
		movedb(stmt->dbname, defGetString(dtablespace));
		return InvalidOid;
	}

	if (distemplate && distemplate->arg)
		dbistemplate = defGetBoolean(distemplate);
	if (dallowconnections && dallowconnections->arg)
		dballowconnections = defGetBoolean(dallowconnections);
	if (dconnlimit && dconnlimit->arg)
	{
		dbconnlimit = defGetInt32(dconnlimit);
		if (dbconnlimit < -1)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("invalid connection limit: %d", dbconnlimit)));
	}

	/*
	 * Get the old tuple.  We don't need a lock on the database per se,
	 * because we're not going to do anything that would mess up incoming
	 * connections.
	 */
	rel = table_open(DatabaseRelationId, RowExclusiveLock);
	ScanKeyInit(&scankey,
				Anum_pg_database_datname,
				BTEqualStrategyNumber, F_NAMEEQ,
				CStringGetDatum(stmt->dbname));
	scan = systable_beginscan(rel, DatabaseNameIndexId, true,
							  NULL, 1, &scankey);
	tuple = systable_getnext(scan);
	if (!HeapTupleIsValid(tuple))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_DATABASE),
				 errmsg("database \"%s\" does not exist", stmt->dbname)));

	datform = (Form_pg_database) GETSTRUCT(tuple);
	dboid = datform->oid;

	if (!object_ownercheck(DatabaseRelationId, dboid, GetUserId()))
		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
					   stmt->dbname);

	/*
	 * In order to avoid getting locked out and having to go through
	 * standalone mode, we refuse to disallow connections to the database
	 * we're currently connected to.  Lockout can still happen with concurrent
	 * sessions but the likeliness of that is not high enough to worry about.
	 */
	if (!dballowconnections && dboid == MyDatabaseId)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("cannot disallow connections for current database")));

	/*
	 * Build an updated tuple, perusing the information just obtained
	 */
	if (distemplate)
	{
		new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(dbistemplate);
		new_record_repl[Anum_pg_database_datistemplate - 1] = true;
	}
	if (dallowconnections)
	{
		new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(dballowconnections);
		new_record_repl[Anum_pg_database_datallowconn - 1] = true;
	}
	if (dconnlimit)
	{
		new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
		new_record_repl[Anum_pg_database_datconnlimit - 1] = true;
	}

	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
								 new_record_nulls, new_record_repl);
	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);

	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);

	systable_endscan(scan);

	/* Close pg_database, but keep lock till commit */
	table_close(rel, NoLock);

	return dboid;
}
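
/*
 * Illustrative SQL forms handled by AlterDatabase() above (database and
 * tablespace names are examples, not taken from this file):
 *
 *		ALTER DATABASE mydb IS_TEMPLATE true;
 *		ALTER DATABASE mydb ALLOW_CONNECTIONS false;
 *		ALTER DATABASE mydb CONNECTION LIMIT 10;
 *		ALTER DATABASE mydb SET TABLESPACE my_tablespace;
 *
 * The SET TABLESPACE form is routed to movedb() and cannot appear inside a
 * transaction block or be combined with the other options.
 */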
|
|
|
|
|
|
|
|
|

/*
 * ALTER DATABASE name REFRESH COLLATION VERSION
 */
ObjectAddress
AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
{
	Relation	rel;
	ScanKeyData scankey;
	SysScanDesc scan;
	Oid			db_id;
	HeapTuple	tuple;
	Form_pg_database datForm;
	ObjectAddress address;
	Datum		datum;
	bool		isnull;
	char	   *oldversion;
	char	   *newversion;

	rel = table_open(DatabaseRelationId, RowExclusiveLock);
	ScanKeyInit(&scankey,
				Anum_pg_database_datname,
				BTEqualStrategyNumber, F_NAMEEQ,
				CStringGetDatum(stmt->dbname));
	scan = systable_beginscan(rel, DatabaseNameIndexId, true,
							  NULL, 1, &scankey);
	tuple = systable_getnext(scan);
	if (!HeapTupleIsValid(tuple))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_DATABASE),
				 errmsg("database \"%s\" does not exist", stmt->dbname)));

	datForm = (Form_pg_database) GETSTRUCT(tuple);
	db_id = datForm->oid;

	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
					   stmt->dbname);

	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
	oldversion = isnull ? NULL : TextDatumGetCString(datum);

	datum = heap_getattr(tuple, datForm->datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
	if (isnull)
		elog(ERROR, "unexpected null in pg_database");
	newversion = get_collation_actual_version(datForm->datlocprovider, TextDatumGetCString(datum));

	/* cannot change from NULL to non-NULL or vice versa */
	if ((!oldversion && newversion) || (oldversion && !newversion))
		elog(ERROR, "invalid collation version change");
	else if (oldversion && newversion && strcmp(newversion, oldversion) != 0)
	{
		bool		nulls[Natts_pg_database] = {0};
		bool		replaces[Natts_pg_database] = {0};
		Datum		values[Natts_pg_database] = {0};

		ereport(NOTICE,
				(errmsg("changing version from %s to %s",
						oldversion, newversion)));

		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
		replaces[Anum_pg_database_datcollversion - 1] = true;

		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
								  values, nulls, replaces);
		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
		heap_freetuple(tuple);
	}
	else
		ereport(NOTICE,
				(errmsg("version has not changed")));

	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);

	ObjectAddressSet(address, DatabaseRelationId, db_id);

	systable_endscan(scan);

	table_close(rel, NoLock);

	return address;
}
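
/*
 * Illustrative usage (not part of the original file; "mydb" is a
 * placeholder name).  The statement below, issued by the database's
 * owner, is what reaches AlterDatabaseRefreshColl():
 *
 *     ALTER DATABASE mydb REFRESH COLLATION VERSION;
 *
 * If the provider's reported version differs from the stored
 * datcollversion, a NOTICE describes the change; otherwise the
 * "version has not changed" NOTICE is emitted.
 */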

/*
 * ALTER DATABASE name SET ...
 */
Oid
AlterDatabaseSet(AlterDatabaseSetStmt *stmt)
{
	Oid			datid = get_database_oid(stmt->dbname, false);

	/*
	 * Obtain a lock on the database and make sure it didn't go away in the
	 * meantime.
	 */
	shdepLockAndCheckObject(DatabaseRelationId, datid);

	if (!object_ownercheck(DatabaseRelationId, datid, GetUserId()))
		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
					   stmt->dbname);

	AlterSetting(datid, InvalidOid, stmt->setstmt);

	UnlockSharedObject(DatabaseRelationId, datid, 0, AccessShareLock);

	return datid;
}
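
/*
 * Illustrative usage (not part of the original file; database and
 * parameter names are placeholders):
 *
 *     ALTER DATABASE mydb SET work_mem = '64MB';
 *     ALTER DATABASE mydb RESET work_mem;
 *
 * The per-database setting is recorded by AlterSetting() in
 * pg_db_role_setting and takes effect at the start of subsequent
 * sessions in that database.
 */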

/*
 * ALTER DATABASE name OWNER TO newowner
 */
ObjectAddress
AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
{
	Oid			db_id;
	HeapTuple	tuple;
	Relation	rel;
	ScanKeyData scankey;
	SysScanDesc scan;
	Form_pg_database datForm;
	ObjectAddress address;

	/*
	 * Get the old tuple.  We don't need a lock on the database per se,
	 * because we're not going to do anything that would mess up incoming
	 * connections.
	 */
	rel = table_open(DatabaseRelationId, RowExclusiveLock);
	ScanKeyInit(&scankey,
				Anum_pg_database_datname,
				BTEqualStrategyNumber, F_NAMEEQ,
				CStringGetDatum(dbname));
	scan = systable_beginscan(rel, DatabaseNameIndexId, true,
							  NULL, 1, &scankey);
	tuple = systable_getnext(scan);
	if (!HeapTupleIsValid(tuple))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_DATABASE),
				 errmsg("database \"%s\" does not exist", dbname)));

	datForm = (Form_pg_database) GETSTRUCT(tuple);
	db_id = datForm->oid;

	/*
	 * If the new owner is the same as the existing owner, consider the
	 * command to have succeeded.  This is to be consistent with other
	 * objects.
	 */
	if (datForm->datdba != newOwnerId)
	{
		Datum		repl_val[Natts_pg_database];
		bool		repl_null[Natts_pg_database] = {0};
		bool		repl_repl[Natts_pg_database] = {0};
		Acl		   *newAcl;
		Datum		aclDatum;
		bool		isNull;
		HeapTuple	newtuple;

		/* Otherwise, must be owner of the existing object */
		if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
			aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
						   dbname);

		/* Must be able to become new owner */
		check_can_set_role(GetUserId(), newOwnerId);

		/*
		 * must have createdb rights
		 *
		 * NOTE: This is different from other alter-owner checks in that the
		 * current user is checked for createdb privileges instead of the
		 * destination owner.  This is consistent with the CREATE case for
		 * databases.  Because superusers will always have this right, we need
		 * no special case for them.
		 */
		if (!have_createdb_privilege())
			ereport(ERROR,
					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
					 errmsg("permission denied to change owner of database")));

		repl_repl[Anum_pg_database_datdba - 1] = true;
		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);

		/*
		 * Determine the modified ACL for the new owner.  This is only
		 * necessary when the ACL is non-null.
		 */
		aclDatum = heap_getattr(tuple,
								Anum_pg_database_datacl,
								RelationGetDescr(rel),
								&isNull);
		if (!isNull)
		{
			newAcl = aclnewowner(DatumGetAclP(aclDatum),
								 datForm->datdba, newOwnerId);
			repl_repl[Anum_pg_database_datacl - 1] = true;
			repl_val[Anum_pg_database_datacl - 1] = PointerGetDatum(newAcl);
		}

		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);

		heap_freetuple(newtuple);

		/* Update owner dependency reference */
		changeDependencyOnOwner(DatabaseRelationId, db_id, newOwnerId);
	}

	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);

	ObjectAddressSet(address, DatabaseRelationId, db_id);

	systable_endscan(scan);

	/* Close pg_database, but keep lock till commit */
	table_close(rel, NoLock);

	return address;
}
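
/*
 * Illustrative usage (not part of the original file; names are
 * placeholders):
 *
 *     ALTER DATABASE mydb OWNER TO new_owner;
 *
 * Per the checks above, the caller must own the database, be permitted
 * to SET ROLE to new_owner, and hold CREATEDB privilege.
 */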

Datum
pg_database_collation_actual_version(PG_FUNCTION_ARGS)
{
	Oid			dbid = PG_GETARG_OID(0);
	HeapTuple	tp;
	char		datlocprovider;
	Datum		datum;
	char	   *version;

	tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
	if (!HeapTupleIsValid(tp))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("database with OID %u does not exist", dbid)));

	datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider;

	datum = SysCacheGetAttrNotNull(DATABASEOID, tp, datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
	version = get_collation_actual_version(datlocprovider, TextDatumGetCString(datum));

	ReleaseSysCache(tp);

	if (version)
		PG_RETURN_TEXT_P(cstring_to_text(version));
	else
		PG_RETURN_NULL();
}
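
/*
 * Illustrative usage (not part of the original file):
 *
 *     SELECT pg_database_collation_actual_version(oid)
 *     FROM pg_database WHERE datname = current_database();
 *
 * Returns NULL when the locale provider reports no version for the
 * database's locale.
 */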

/*
 * Helper functions
 */

/*
 * Look up info about the database named "name".  If the database exists,
 * obtain the specified lock type on it, fill in any of the remaining
 * parameters that aren't NULL, and return true.  If no such database,
 * return false.
 */
static bool
get_db_info(const char *name, LOCKMODE lockmode,
			Oid *dbIdP, Oid *ownerIdP,
			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
			char **dbIcurules,
			char *dbLocProvider,
			char **dbCollversion)
{
	bool		result = false;
	Relation	relation;

	Assert(name);

	/* Caller may wish to grab a better lock on pg_database beforehand... */
	relation = table_open(DatabaseRelationId, AccessShareLock);

	/*
	 * Loop covers the rare case where the database is renamed before we can
	 * lock it.  We try again just in case we can find a new one of the same
	 * name.
	 */
	for (;;)
	{
		ScanKeyData scanKey;
		SysScanDesc scan;
		HeapTuple	tuple;
		Oid			dbOid;

		/*
		 * there's no syscache for database-indexed-by-name, so must do it the
		 * hard way
		 */
		ScanKeyInit(&scanKey,
					Anum_pg_database_datname,
					BTEqualStrategyNumber, F_NAMEEQ,
					CStringGetDatum(name));

		scan = systable_beginscan(relation, DatabaseNameIndexId, true,
								  NULL, 1, &scanKey);

		tuple = systable_getnext(scan);

		if (!HeapTupleIsValid(tuple))
		{
			/* definitely no database of that name */
			systable_endscan(scan);
			break;
		}
|
|
|
|
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
dbOid = ((Form_pg_database) GETSTRUCT(tuple))->oid;
|
2006-05-04 18:07:29 +02:00
|
|
|
|
|
|
|
systable_endscan(scan);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Now that we have a database OID, we can try to lock the DB.
|
|
|
|
*/
|
|
|
|
if (lockmode != NoLock)
|
|
|
|
LockSharedObject(DatabaseRelationId, dbOid, 0, lockmode);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* And now, re-fetch the tuple by OID. If it's still there and still
|
|
|
|
* the same name, we win; else, drop the lock and loop back to try
|
|
|
|
* again.
|
|
|
|
*/
|
2010-02-14 19:42:19 +01:00
|
|
|
tuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbOid));
|
2006-05-04 18:07:29 +02:00
|
|
|
if (HeapTupleIsValid(tuple))
|
|
|
|
{
|
|
|
|
Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);
|
|
|
|
|
|
|
|
if (strcmp(name, NameStr(dbform->datname)) == 0)
|
|
|
|
{
|
2022-01-27 08:44:31 +01:00
|
|
|
Datum datum;
|
|
|
|
bool isnull;
|
|
|
|
|
2006-05-04 18:07:29 +02:00
|
|
|
/* oid of the database */
|
|
|
|
if (dbIdP)
|
|
|
|
*dbIdP = dbOid;
|
|
|
|
/* oid of the owner */
|
|
|
|
if (ownerIdP)
|
|
|
|
*ownerIdP = dbform->datdba;
|
|
|
|
/* character encoding */
|
|
|
|
if (encodingP)
|
|
|
|
*encodingP = dbform->encoding;
|
|
|
|
/* allowed as template? */
|
|
|
|
if (dbIsTemplateP)
|
|
|
|
*dbIsTemplateP = dbform->datistemplate;
|
|
|
|
/* allowing connections? */
|
|
|
|
if (dbAllowConnP)
|
|
|
|
*dbAllowConnP = dbform->datallowconn;
|
Fix recently-understood problems with handling of XID freezing, particularly
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-11-05 23:42:10 +01:00
|
|
|
/* limit of frozen XIDs */
|
|
|
|
if (dbFrozenXidP)
|
|
|
|
*dbFrozenXidP = dbform->datfrozenxid;
|
2019-07-29 05:28:30 +02:00
|
|
|
/* minimum MultiXactId */
|
Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
|
|
|
if (dbMinMultiP)
|
|
|
|
*dbMinMultiP = dbform->datminmxid;
|
2006-05-04 18:07:29 +02:00
|
|
|
/* default tablespace for this database */
|
|
|
|
if (dbTablespace)
|
|
|
|
*dbTablespace = dbform->dattablespace;
|
2008-09-23 11:20:39 +02:00
|
|
|
/* default locale settings for this database */
|
2022-03-17 11:11:21 +01:00
|
|
|
if (dbLocProvider)
|
|
|
|
*dbLocProvider = dbform->datlocprovider;
|
2008-09-23 11:20:39 +02:00
|
|
|
if (dbCollate)
|
2022-01-27 08:44:31 +01:00
|
|
|
{
|
2023-03-25 22:49:33 +01:00
|
|
|
datum = SysCacheGetAttrNotNull(DATABASEOID, tuple, Anum_pg_database_datcollate);
|
2022-01-27 08:44:31 +01:00
|
|
|
*dbCollate = TextDatumGetCString(datum);
|
|
|
|
}
|
2008-09-23 11:20:39 +02:00
|
|
|
if (dbCtype)
|
2022-01-27 08:44:31 +01:00
|
|
|
{
|
2023-03-25 22:49:33 +01:00
|
|
|
datum = SysCacheGetAttrNotNull(DATABASEOID, tuple, Anum_pg_database_datctype);
|
2022-01-27 08:44:31 +01:00
|
|
|
*dbCtype = TextDatumGetCString(datum);
|
|
|
|
}
|
2022-03-17 11:11:21 +01:00
|
|
|
if (dbIculocale)
|
|
|
|
{
|
|
|
|
datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticulocale, &isnull);
|
|
|
|
if (isnull)
|
|
|
|
*dbIculocale = NULL;
|
|
|
|
else
|
|
|
|
*dbIculocale = TextDatumGetCString(datum);
|
|
|
|
}
|
2023-03-08 16:35:42 +01:00
|
|
|
if (dbIcurules)
|
|
|
|
{
|
|
|
|
datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
|
|
|
|
if (isnull)
|
|
|
|
*dbIcurules = NULL;
|
|
|
|
else
|
|
|
|
*dbIcurules = TextDatumGetCString(datum);
|
|
|
|
}
|
2022-02-14 08:09:04 +01:00
|
|
|
if (dbCollversion)
|
|
|
|
{
|
|
|
|
datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
|
|
|
|
if (isnull)
|
|
|
|
*dbCollversion = NULL;
|
|
|
|
else
|
|
|
|
*dbCollversion = TextDatumGetCString(datum);
|
|
|
|
}
|
2006-05-04 18:07:29 +02:00
|
|
|
ReleaseSysCache(tuple);
|
|
|
|
result = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
/* can only get here if it was just renamed */
|
|
|
|
ReleaseSysCache(tuple);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (lockmode != NoLock)
|
|
|
|
UnlockSharedObject(DatabaseRelationId, dbOid, 0, lockmode);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(relation, AccessShareLock);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2006-05-04 18:07:29 +02:00
|
|
|
return result;
|
2000-01-13 19:26:18 +01:00
|
|
|
}

/* Check if current user has createdb privileges */
bool
have_createdb_privilege(void)
{
	bool		result = false;
	HeapTuple	utup;

	/* Superusers can always do everything */
	if (superuser())
		return true;

	utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(GetUserId()));
	if (HeapTupleIsValid(utup))
	{
		result = ((Form_pg_authid) GETSTRUCT(utup))->rolcreatedb;
		ReleaseSysCache(utup);
	}
	return result;
}

/*
 * Remove tablespace directories
 *
 * We don't know what tablespaces db_id is using, so iterate through all
 * tablespaces removing <tablespace>/db_id
 */
static void
remove_dbtablespaces(Oid db_id)
{
	Relation	rel;
	TableScanDesc scan;
	HeapTuple	tuple;
	List	   *ltblspc = NIL;
	ListCell   *cell;
	int			ntblspc;
	int			i;
	Oid		   *tablespace_ids;

	rel = table_open(TableSpaceRelationId, AccessShareLock);
	scan = table_beginscan_catalog(rel, 0, NULL);
	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
	{
		Form_pg_tablespace spcform = (Form_pg_tablespace) GETSTRUCT(tuple);
		Oid			dsttablespace = spcform->oid;
		char	   *dstpath;
		struct stat st;

		/* Don't mess with the global tablespace */
		if (dsttablespace == GLOBALTABLESPACE_OID)
			continue;

		dstpath = GetDatabasePath(db_id, dsttablespace);

		if (lstat(dstpath, &st) < 0 || !S_ISDIR(st.st_mode))
		{
			/* Assume we can ignore it */
			pfree(dstpath);
			continue;
		}

		if (!rmtree(dstpath, true))
			ereport(WARNING,
					(errmsg("some useless files may be left behind in old database directory \"%s\"",
							dstpath)));

		ltblspc = lappend_oid(ltblspc, dsttablespace);
		pfree(dstpath);
	}

	ntblspc = list_length(ltblspc);
	if (ntblspc == 0)
	{
		table_endscan(scan);
		table_close(rel, AccessShareLock);
		return;
	}

	tablespace_ids = (Oid *) palloc(ntblspc * sizeof(Oid));
	i = 0;
	foreach(cell, ltblspc)
		tablespace_ids[i++] = lfirst_oid(cell);

	/* Record the filesystem change in XLOG */
	{
		xl_dbase_drop_rec xlrec;

		xlrec.db_id = db_id;
		xlrec.ntablespaces = ntblspc;

		XLogBeginInsert();
		XLogRegisterData((char *) &xlrec, MinSizeOfDbaseDropRec);
		XLogRegisterData((char *) tablespace_ids, ntblspc * sizeof(Oid));

		(void) XLogInsert(RM_DBASE_ID,
						  XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
	}

	list_free(ltblspc);
	pfree(tablespace_ids);
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scan);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, AccessShareLock);
|
2000-11-08 17:59:50 +01:00
|
|
|
}
|
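The callback-based scan API described in the tableam commit message above (scan_begin, scan_getnextslot, scan_end, with per-AM descriptors "subclassing" a shared base struct) can be sketched in miniature. Everything here (ToyAm, ToyScanDesc, ArrayScanDesc, and so on) is a hypothetical illustration of the pattern, not PostgreSQL's actual definitions:

```c
#include <stdbool.h>
#include <stdlib.h>

/* A toy "slot" holding one tuple value; stands in for TupleTableSlot. */
typedef struct ToySlot { int value; bool valid; } ToySlot;

/* Base scan descriptor; an AM embeds this as its first member. */
typedef struct ToyScanDesc { const struct ToyAm *am; } ToyScanDesc;

/* AM callbacks, loosely mirroring scan_begin/scan_getnextslot/scan_end. */
typedef struct ToyAm {
    ToyScanDesc *(*scan_begin)(const int *data, int n);
    bool (*scan_getnextslot)(ToyScanDesc *scan, ToySlot *slot);
    void (*scan_end)(ToyScanDesc *scan);
} ToyAm;

/* An array-backed AM: subclass the base descriptor with its own state. */
typedef struct ArrayScanDesc {
    ToyScanDesc base;
    const int *data;
    int n, pos;
} ArrayScanDesc;

static ToyScanDesc *array_scan_begin(const int *data, int n);
static bool array_scan_getnextslot(ToyScanDesc *scan, ToySlot *slot);
static void array_scan_end(ToyScanDesc *scan);

static const ToyAm array_am = {
    array_scan_begin, array_scan_getnextslot, array_scan_end
};

static ToyScanDesc *array_scan_begin(const int *data, int n)
{
    ArrayScanDesc *scan = malloc(sizeof(ArrayScanDesc));
    scan->base.am = &array_am;
    scan->data = data;
    scan->n = n;
    scan->pos = 0;
    return &scan->base;
}

static bool array_scan_getnextslot(ToyScanDesc *scan, ToySlot *slot)
{
    /* Downcast is safe because base is the first member. */
    ArrayScanDesc *a = (ArrayScanDesc *) scan;

    if (a->pos >= a->n)
    {
        slot->valid = false;
        return false;
    }
    slot->value = a->data[a->pos++];
    slot->valid = true;
    return true;
}

static void array_scan_end(ToyScanDesc *scan)
{
    free(scan);
}

/* Caller code is AM-independent: it only touches the callbacks. */
static int sum_all(const ToyAm *am, const int *data, int n)
{
    ToyScanDesc *scan = am->scan_begin(data, n);
    ToySlot slot;
    int sum = 0;

    while (am->scan_getnextslot(scan, &slot))
        sum += slot.value;
    am->scan_end(scan);
    return sum;
}
```

The point is the shape of the indirection: callers like the loop in check_db_file_conflict() below never know which AM they are scanning, only that the callbacks honor the begin/getnextslot/end contract.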
2002-08-09 18:45:16 +02:00
|
|
|
|
2006-10-19 00:44:12 +02:00
|
|
|
/*
|
|
|
|
* Check for existing files that conflict with a proposed new DB OID;
|
2017-08-16 06:22:32 +02:00
|
|
|
* return true if there are any
|
2006-10-19 00:44:12 +02:00
|
|
|
*
|
|
|
|
* If there were a subdirectory in any tablespace matching the proposed new
|
|
|
|
* OID, we'd get a create failure due to the duplicate name ... and then we'd
|
|
|
|
* try to remove that already-existing subdirectory during the cleanup in
|
|
|
|
* remove_dbtablespaces. Nuking existing files seems like a bad idea, so
|
|
|
|
* instead we make this extra check before settling on the OID of the new
|
Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber field from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
|
|
|
* database. This exactly parallels what GetNewRelFileNumber() does for table
|
|
|
|
* relfilenumber values.
|
2006-10-19 00:44:12 +02:00
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
check_db_file_conflict(Oid db_id)
|
|
|
|
{
|
|
|
|
bool result = false;
|
|
|
|
Relation rel;
|
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scan;
|
2006-10-19 00:44:12 +02:00
|
|
|
HeapTuple tuple;
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(TableSpaceRelationId, AccessShareLock);
|
2019-03-11 20:46:41 +01:00
|
|
|
scan = table_beginscan_catalog(rel, 0, NULL);
|
2006-10-19 00:44:12 +02:00
|
|
|
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
|
|
|
|
{
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
was already painful for the existing code, and the upcoming work aiming
to make table storage pluggable would have required expanding and
duplicating that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring a pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot of code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used); only oids assigned later will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide the oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to merge this
now. It's painful to maintain externally, too complicated to commit
after the code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
Form_pg_tablespace spcform = (Form_pg_tablespace) GETSTRUCT(tuple);
|
|
|
|
Oid dsttablespace = spcform->oid;
|
2006-10-19 00:44:12 +02:00
|
|
|
char *dstpath;
|
|
|
|
struct stat st;
|
|
|
|
|
|
|
|
/* Don't mess with the global tablespace */
|
|
|
|
if (dsttablespace == GLOBALTABLESPACE_OID)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
dstpath = GetDatabasePath(db_id, dsttablespace);
|
|
|
|
|
|
|
|
if (lstat(dstpath, &st) == 0)
|
|
|
|
{
|
|
|
|
/* Found a conflicting file (or directory, whatever) */
|
|
|
|
pfree(dstpath);
|
|
|
|
result = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
pfree(dstpath);
|
|
|
|
}
|
|
|
|
|
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scan);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, AccessShareLock);
|
2013-01-19 00:06:20 +01:00
|
|
|
|
2006-10-19 00:44:12 +02:00
|
|
|
return result;
|
|
|
|
}
|
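The per-tablespace probe inside check_db_file_conflict() boils down to one system call: lstat() the would-be database path and treat any existing entry, whether file, directory, or symlink, as a conflict. A standalone sketch of just that probe (the helper name path_conflicts is hypothetical, not PostgreSQL code):

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Return true if anything already exists at the given path.
 * lstat() (rather than stat()) is used so a dangling symlink
 * still counts as a conflict, mirroring the check above.
 */
static bool
path_conflicts(const char *path)
{
    struct stat st;

    return lstat(path, &st) == 0;
}
```

In the real function this check runs once per row of pg_tablespace, skipping the global tablespace, and a single hit is enough to make the proposed OID unusable.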
2002-08-09 18:45:16 +02:00
|
|
|
|
2008-08-04 20:03:46 +02:00
|
|
|
/*
|
|
|
|
* Issue a suitable errdetail message for a busy database
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
errdetail_busy_db(int notherbackends, int npreparedxacts)
|
|
|
|
{
|
|
|
|
if (notherbackends > 0 && npreparedxacts > 0)
|
2013-05-29 22:58:43 +02:00
|
|
|
|
2012-06-15 01:01:00 +02:00
|
|
|
/*
|
|
|
|
* We don't deal with singular versus plural here, since gettext
|
|
|
|
* doesn't support multiple plurals in one string.
|
|
|
|
*/
|
2008-08-04 20:03:46 +02:00
|
|
|
errdetail("There are %d other session(s) and %d prepared transaction(s) using the database.",
|
|
|
|
notherbackends, npreparedxacts);
|
|
|
|
else if (notherbackends > 0)
|
2012-06-15 01:01:00 +02:00
|
|
|
errdetail_plural("There is %d other session using the database.",
|
|
|
|
"There are %d other sessions using the database.",
|
|
|
|
notherbackends,
|
|
|
|
notherbackends);
|
2008-08-04 20:03:46 +02:00
|
|
|
else
|
2012-06-15 01:01:00 +02:00
|
|
|
errdetail_plural("There is %d prepared transaction using the database.",
|
|
|
|
"There are %d prepared transactions using the database.",
|
|
|
|
npreparedxacts,
|
|
|
|
npreparedxacts);
|
2008-08-04 20:03:46 +02:00
|
|
|
return 0; /* just to keep ereport macro happy */
|
|
|
|
}
|
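The branching in errdetail_busy_db() exists because gettext can pluralize on only one count per message: the mixed sessions-plus-prepared-transactions case uses "%d ... (s)" wording, while the single-count cases go through errdetail_plural(), which picks the singular or plural form by count. A minimal stand-in for that selection (names like plural_fmt and busy_db_detail are hypothetical; real code routes through ngettext() for translation):

```c
#include <stdio.h>

/* Pick singular or plural format by count; translation omitted. */
static const char *
plural_fmt(const char *singular, const char *plural, int n)
{
    return (n == 1) ? singular : plural;
}

/* Format a detail line for the other-sessions-only case. */
static int
busy_db_detail(char *buf, size_t buflen, int nbackends)
{
    return snprintf(buf, buflen,
                    plural_fmt("There is %d other session using the database.",
                               "There are %d other sessions using the database.",
                               nbackends),
                    nbackends);
}
```

This is why the combined case cannot use the same mechanism: one format string cannot pluralize independently on both notherbackends and npreparedxacts.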
|
|
|
|
2002-08-09 18:45:16 +02:00
|
|
|
/*
|
|
|
|
* get_database_oid - given a database name, look up the OID
|
|
|
|
*
|
2010-08-05 16:45:09 +02:00
|
|
|
* If missing_ok is false, throw an error if database name not found. If
|
|
|
|
* true, just return InvalidOid.
|
2002-08-09 18:45:16 +02:00
|
|
|
*/
|
|
|
|
Oid
|
2010-08-05 16:45:09 +02:00
|
|
|
get_database_oid(const char *dbname, bool missing_ok)
|
2002-08-09 18:45:16 +02:00
|
|
|
{
|
|
|
|
Relation pg_database;
|
|
|
|
ScanKeyData entry[1];
|
2003-06-27 16:45:32 +02:00
|
|
|
SysScanDesc scan;
|
2002-08-09 18:45:16 +02:00
|
|
|
HeapTuple dbtuple;
|
|
|
|
Oid oid;
|
|
|
|
|
2006-05-04 00:45:26 +02:00
|
|
|
/*
|
|
|
|
* There's no syscache for pg_database indexed by name, so we must look
|
|
|
|
* the hard way.
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
pg_database = table_open(DatabaseRelationId, AccessShareLock);
|
2003-11-12 22:15:59 +01:00
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_database_datname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(dbname));
|
2005-04-14 22:03:27 +02:00
|
|
|
scan = systable_beginscan(pg_database, DatabaseNameIndexId, true,
|
Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue that has held us back from making this change in the
past is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
2013-07-02 15:47:01 +02:00
|
|
|
NULL, 1, entry);
|
2002-08-09 18:45:16 +02:00
|
|
|
|
2003-06-27 16:45:32 +02:00
|
|
|
dbtuple = systable_getnext(scan);
|
2002-08-09 18:45:16 +02:00
|
|
|
|
|
|
|
/* We assume that there can be at most one matching tuple */
|
|
|
|
if (HeapTupleIsValid(dbtuple))
|
2018-11-21 00:36:57 +01:00
|
|
|
oid = ((Form_pg_database) GETSTRUCT(dbtuple))->oid;
|
2002-08-09 18:45:16 +02:00
|
|
|
else
|
|
|
|
oid = InvalidOid;
|
|
|
|
|
2003-06-27 16:45:32 +02:00
|
|
|
systable_endscan(scan);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(pg_database, AccessShareLock);
|
2002-08-09 18:45:16 +02:00
|
|
|
|
2010-08-05 16:45:09 +02:00
|
|
|
if (!OidIsValid(oid) && !missing_ok)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_DATABASE),
|
|
|
|
errmsg("database \"%s\" does not exist",
|
|
|
|
dbname)));
|
|
|
|
|
2002-08-09 18:45:16 +02:00
|
|
|
return oid;
|
|
|
|
}
|
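The missing_ok contract that get_database_oid() implements, return the OID on a match, and on a miss either raise an error or quietly hand back InvalidOid, can be isolated in a small standalone sketch. The toy catalog, last_error flag, and toy_get_database_oid name are all hypothetical illustrations, not PostgreSQL's lookup path:

```c
#include <string.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy catalog: name -> oid pairs standing in for pg_database rows. */
struct db_row { const char *datname; Oid oid; };
static const struct db_row toy_catalog[] = {
    {"postgres", 5}, {"template1", 1},
};

static int last_error;          /* stands in for ereport(ERROR, ...) */

/*
 * Mirror get_database_oid()'s contract: return the OID on a match;
 * on a miss, either flag an error (missing_ok = false) or quietly
 * return InvalidOid (missing_ok = true).
 */
static Oid
toy_get_database_oid(const char *dbname, int missing_ok)
{
    Oid oid = InvalidOid;
    size_t i;

    for (i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++)
        if (strcmp(toy_catalog[i].datname, dbname) == 0)
            oid = toy_catalog[i].oid;

    if (oid == InvalidOid && !missing_ok)
        last_error = 1;         /* real code: ereport(ERROR, ...) */
    return oid;
}
```

The real function must do the linear index scan shown above because, as its comment notes, there is no syscache for pg_database indexed by name; get_database_name() below goes the other direction and can use the DATABASEOID syscache.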
|
|
|
|
2003-06-27 16:45:32 +02:00
|
|
|
|
2002-08-09 18:45:16 +02:00
|
|
|
/*
 * get_database_name - given a database OID, look up the name
 *
 * Returns a palloc'd string, or NULL if no such database.
 */
char *
get_database_name(Oid dbid)
{
	HeapTuple	dbtuple;
	char	   *result;

	dbtuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
	if (HeapTupleIsValid(dbtuple))
	{
		result = pstrdup(NameStr(((Form_pg_database) GETSTRUCT(dbtuple))->datname));
		ReleaseSysCache(dbtuple);
	}
	else
		result = NULL;

	return result;
}

Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records. Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:
CREATE DATABASE
DROP DATABASE
DROP TABLESPACE
If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.
A fix for this problem was already attempted in 49d9cfc68bf4, but it
was reverted because of design issues. This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency. Tablespaces
are created as real directories, and should be deleted
by later replay. CheckRecoveryConsistency ensures
they have disappeared.
The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING. Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
2022-07-28 08:40:06 +02:00
/*
 * recovery_create_dbdir()
 *
 * During recovery, there's a case where we validly need to recover a missing
 * tablespace directory so that recovery can continue.  This happens when
 * recovery wants to create a database but the holding tablespace has been
 * removed before the server stopped.  Since we expect that the directory will
 * be gone before reaching recovery consistency, and we have no knowledge about
 * the tablespace other than its OID here, we create a real directory under
 * pg_tblspc here instead of restoring the symlink.
 *
 * If only_tblspc is true, then the requested directory must be in pg_tblspc/
 */
static void
recovery_create_dbdir(char *path, bool only_tblspc)
{
	struct stat st;

	Assert(RecoveryInProgress());

	if (stat(path, &st) == 0)
		return;

	if (only_tblspc && strstr(path, "pg_tblspc/") == NULL)
		elog(PANIC, "requested to create invalid directory: %s", path);

	if (reachedConsistency && !allow_in_place_tablespaces)
		ereport(PANIC,
				errmsg("missing directory \"%s\"", path));

	elog(reachedConsistency ? WARNING : DEBUG1,
		 "creating missing directory: %s", path);

	if (pg_mkdir_p(path, pg_dir_create_mode) != 0)
		ereport(PANIC,
				errmsg("could not create missing directory \"%s\": %m", path));
}

/*
 * DATABASE resource manager's routines
 */
void
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now dissects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-dissected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compensates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
dbase_redo(XLogReaderState *record)
{
	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;

	/* Backup blocks are not used in dbase records */
	Assert(!XLogRecHasAnyBlockRefs(record));

Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
	if (info == XLOG_DBASE_CREATE_FILE_COPY)
	{
		xl_dbase_create_file_copy_rec *xlrec =
			(xl_dbase_create_file_copy_rec *) XLogRecGetData(record);
		char	   *src_path;
		char	   *dst_path;
		char	   *parent_path;
		struct stat st;

		src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
		dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);

		/*
		 * Our theory for replaying a CREATE is to forcibly drop the target
		 * subdirectory if present, then re-copy the source data.  This may be
		 * more work than needed, but it is simple to implement.
		 */
		if (stat(dst_path, &st) == 0 && S_ISDIR(st.st_mode))
		{
			if (!rmtree(dst_path, true))
				/* If this failed, copydir() below is going to error. */
				ereport(WARNING,
						(errmsg("some useless files may be left behind in old database directory \"%s\"",
								dst_path)));
		}

		/*
		 * If the parent of the target path doesn't exist, create it now. This
		 * enables us to create the target underneath later.
		 */
		parent_path = pstrdup(dst_path);
		get_parent_directory(parent_path);
		if (stat(parent_path, &st) < 0)
		{
			if (errno != ENOENT)
				ereport(FATAL,
						errmsg("could not stat directory \"%s\": %m",
							   parent_path));

			/* create the parent directory if needed and valid */
			recovery_create_dbdir(parent_path, true);
		}
		pfree(parent_path);

		/*
		 * There's a case where the copy source directory is missing for the
		 * same reason above.  Create the empty source directory so that
		 * copydir below doesn't fail.  The directory will be dropped soon by
		 * recovery.
		 */
		if (stat(src_path, &st) < 0 && errno == ENOENT)
			recovery_create_dbdir(src_path, false);

		/*
		 * Force dirty buffers out to disk, to ensure source database is
		 * up-to-date for the copy.
		 */
		FlushDatabaseBuffers(xlrec->src_db_id);

		/* Close all smgr fds in all backends. */
		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

		/*
		 * Copy this subdirectory to the new location
		 *
		 * We don't need to copy subdirectories
		 */
		copydir(src_path, dst_path, false);

		pfree(src_path);
		pfree(dst_path);
	}
	else if (info == XLOG_DBASE_CREATE_WAL_LOG)
	{
|
|
|
|
xl_dbase_create_wal_log_rec *xlrec =
|
2023-05-19 23:24:48 +02:00
|
|
|
(xl_dbase_create_wal_log_rec *) XLogRecGetData(record);
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
char *dbpath;
|
Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records. Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:
CREATE DATABASE
DROP DATABASE
DROP TABLESPACE
If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.
A fix for this problem was already attempted in 49d9cfc68bf4, but it
was reverted because of design issues. This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency. Tablespaces
are created as real directories, and should be deleted
by later replay. CheckRecoveryConsistency ensures
they have disappeared.
The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING. Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
2022-07-28 08:40:06 +02:00
|
|
|
char *parent_path;
|
Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.
Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.
Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.
Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 17:31:43 +02:00
|
|
|
|
|
|
|
dbpath = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
|
|
|
|
|
Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records. Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:
CREATE DATABASE
DROP DATABASE
DROP TABLESPACE
If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.
A fix for this problem was already attempted in 49d9cfc68bf4, but it
was reverted because of design issues. This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency. Tablespaces
are created as real directories, and should be deleted
by later replay. CheckRecoveryConsistency ensures
they have disappeared.
The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING. Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
2022-07-28 08:40:06 +02:00
|
|
|
/* create the parent directory if needed and valid */
|
|
|
|
parent_path = pstrdup(dbpath);
|
|
|
|
get_parent_directory(parent_path);
|
|
|
|
recovery_create_dbdir(parent_path, true);
|
|
|
|
		/* Create the database directory with the version file. */
		CreateDirAndVersionFile(dbpath, xlrec->db_id, xlrec->tablespace_id,
								true);

		pfree(dbpath);
	}
	else if (info == XLOG_DBASE_DROP)
	{
		xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
		char	   *dst_path;
		int			i;

		if (InHotStandby)
		{
			/*
			 * Lock database while we resolve conflicts to ensure that
			 * InitPostgres() cannot fully re-execute concurrently. This
			 * avoids backends re-connecting automatically to same database,
			 * which can happen in some cases.
			 *
			 * This will lock out walsenders trying to connect to db-specific
			 * slots for logical decoding too, so it's safe for us to drop
			 * slots.
			 */
			LockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
			ResolveRecoveryConflictWithDatabase(xlrec->db_id);
		}
		/* Drop any database-specific replication slots */
		ReplicationSlotsDropDBSlots(xlrec->db_id);

		/* Drop pages for this database that are in the shared buffer cache */
		DropDatabaseBuffers(xlrec->db_id);

		/* Also, clean out any fsync requests that might be pending in md.c */
		ForgetDatabaseSyncRequests(xlrec->db_id);

		/* Clean out the xlog relcache too */
		XLogDropDatabase(xlrec->db_id);

		/* Close all smgr fds in all backends. */
		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

		for (i = 0; i < xlrec->ntablespaces; i++)
		{
			dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]);

			/* And remove the physical files */
			if (!rmtree(dst_path, true))
				ereport(WARNING,
						(errmsg("some useless files may be left behind in old database directory \"%s\"",
								dst_path)));
			pfree(dst_path);
		}

		if (InHotStandby)
		{
			/*
			 * Release locks prior to commit. XXX There is a race condition
			 * here that may allow backends to reconnect, but the window for
			 * this is small because the gap between here and commit is
			 * fairly small and it is unlikely that people will be dropping
			 * databases that we are trying to connect to anyway.
			 */
			UnlockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
		}
	}
	else
		elog(PANIC, "dbase_redo: unknown op code %u", info);
}