postgresql/doc/src/sgml/manage-ag.sgml

491 lines
20 KiB
Plaintext
Raw Normal View History

<!--
2004-08-07 21:14:45 +02:00
$PostgreSQL: pgsql/doc/src/sgml/manage-ag.sgml,v 2.33 2004/08/07 19:14:45 tgl Exp $
-->
<chapter id="managing-databases">
<title>Managing Databases</title>
2001-11-12 20:19:39 +01:00
<indexterm zone="managing-databases"><primary>database</></>
<para>
2002-11-15 04:11:18 +01:00
Every instance of a running <productname>PostgreSQL</productname>
server manages one or more databases. Databases are therefore the
topmost hierarchical level for organizing <acronym>SQL</acronym>
objects (<quote>database objects</quote>). This chapter describes
the properties of databases, and how to create, manage, and destroy
them.
</para>
<sect1 id="manage-ag-overview">
<title>Overview</title>
2003-08-31 19:32:24 +02:00
<indexterm zone="manage-ag-overview">
<primary>schema</primary>
</indexterm>
<para>
2002-11-15 04:11:18 +01:00
A database is a named collection of <acronym>SQL</acronym> objects
(<quote>database objects</quote>). Generally, every database
object (tables, functions, etc.) belongs to one and only one
database. (But there are a few system catalogs, for example
<literal>pg_database</>, that belong to a whole cluster and
are accessible from each database within the cluster.) More
2002-11-15 04:11:18 +01:00
accurately, a database is a collection of schemas and the schemas
contain the tables, functions, etc. So the full hierarchy is:
server, database, schema, table (or some other kind of object,
such as a function).
</para>
<para>
An application that connects to the database server specifies in
its connection request the name of the database it wants to connect
to. It is not possible to access more than one database per
connection. (But an application is not restricted in the number of
connections it opens to the same or other databases.) It is
possible, however, to access more than one schema from the same
connection. Schemas are a purely logical structure and who can
access what is managed by the privilege system. Databases are
physically separated and access control is managed at the
2002-11-15 04:11:18 +01:00
connection level. If one <productname>PostgreSQL</> server
instance is to house projects or users that should be separate and
for the most part unaware of each other, it is therefore
recommendable to put them into separate databases. If the projects
or users are interrelated and should be able to use each other's
resources they should be put in the same databases but possibly
into separate schemas. More information about managing schemas is
in <xref linkend="ddl-schemas">.
</para>
<note>
<para>
The <acronym>SQL</> standard calls databases <quote>catalogs</>, but there
is no difference in practice.
</para>
</note>
</sect1>
<sect1 id="manage-ag-createdb">
<title>Creating a Database</title>
<para>
In order to create a database, the <productname>PostgreSQL</>
server must be up and running (see <xref
linkend="postmaster-start">).
</para>
<para>
Databases are created with the SQL command
2004-08-07 21:14:45 +02:00
<xref linkend="sql-createdatabase">:<indexterm><primary>CREATE
DATABASE</></>
<synopsis>
CREATE DATABASE <replaceable>name</>;
</synopsis>
2002-11-15 04:11:18 +01:00
where <replaceable>name</> follows the usual rules for
<acronym>SQL</acronym> identifiers. The current user automatically
becomes the owner of the new database. It is the privilege of the
owner of a database to remove it later on (which also removes all
the objects in it, even if they have a different owner).
</para>
<para>
The creation of databases is a restricted operation. See <xref
linkend="user-attributes"> for how to grant permission.
</para>
<para>
Since you need to be connected to the database server in order to
execute the <command>CREATE DATABASE</command> command, the
question remains how the <emphasis>first</> database at any given
site can be created. The first database is always created by the
<command>initdb</> command when the data storage area is
2003-08-31 19:32:24 +02:00
initialized. (See <xref linkend="creating-cluster">.) This
database is called
<literal>template1</>.<indexterm><primary>template1</></> So to
create the first <quote>real</> database you can connect to
<literal>template1</>.
</para>
<para>
The name <literal>template1</literal> is no accident: when a new
database is created, the template database is essentially cloned.
This means that any changes you make in <literal>template1</> are
propagated to all subsequently created databases. This implies that
you should not use the template database for real work, but when
used judiciously this feature can be convenient. More details
appear in <xref linkend="manage-ag-templatedbs">.
</para>
<para>
As an extra convenience, there is also a program that you can
execute from the shell to create new databases,
2003-08-31 19:32:24 +02:00
<command>createdb</>.<indexterm><primary>createdb</></>
<synopsis>
createdb <replaceable class="parameter">dbname</replaceable>
</synopsis>
<command>createdb</> does no magic. It connects to the <literal>template1</>
database and issues the <command>CREATE DATABASE</> command,
exactly as described above.
The reference page on <command>createdb</> contains the invocation
details. Note that <command>createdb</> without any arguments will create
a database with the current user name, which may or may not be what
you want.
</para>
<note>
<para>
<xref linkend="client-authentication"> contains information about
how to restrict who can connect to a given database.
</para>
</note>
<para>
Sometimes you want to create a database for someone else. That
user should become the owner of the new database, so he can
configure and manage it himself. To achieve that, use one of the
following commands:
<programlisting>
CREATE DATABASE <replaceable>dbname</> OWNER <replaceable>username</>;
</programlisting>
from the SQL environment, or
<programlisting>
createdb -O <replaceable>username</> <replaceable>dbname</>
</programlisting>
You must be a superuser to be allowed to create a database for
someone else.
</para>
</sect1>
<sect1 id="manage-ag-templatedbs">
<title>Template Databases</title>
<para>
<command>CREATE DATABASE</> actually works by copying an existing
database. By default, it copies the standard system database named
2003-08-31 19:32:24 +02:00
<literal>template1</>.<indexterm><primary>template1</></> Thus that
database is the <quote>template</> from which new databases are
made. If you add objects to <literal>template1</>, these objects
will be copied into subsequently created user databases. This
behavior allows site-local modifications to the standard set of
objects in databases. For example, if you install the procedural
language <application>PL/pgSQL</> in <literal>template1</>, it will
2003-08-31 19:32:24 +02:00
automatically be available in user databases without any extra
action being taken when those databases are made.
</para>
<para>
2003-08-31 19:32:24 +02:00
There is a second standard system database named
<literal>template0</>.<indexterm><primary>template0</></> This
database contains the same data as the initial contents of
<literal>template1</>, that is, only the standard objects
predefined by your version of
<productname>PostgreSQL</productname>. <literal>template0</>
should never be changed after <command>initdb</>. By instructing
<command>CREATE DATABASE</> to copy <literal>template0</> instead
of <literal>template1</>, you can create a <quote>virgin</> user
database that contains none of the site-local additions in
<literal>template1</>. This is particularly handy when restoring a
<literal>pg_dump</> dump: the dump script should be restored in a
virgin database to ensure that one recreates the correct contents
of the dumped database, without any conflicts with additions that
may now be present in <literal>template1</>.
</para>
<para>
To create a database by copying <literal>template0</literal>, use
<programlisting>
CREATE DATABASE <replaceable>dbname</> TEMPLATE template0;
</programlisting>
from the SQL environment, or
<programlisting>
createdb -T template0 <replaceable>dbname</>
</programlisting>
from the shell.
</para>
<para>
It is possible to create additional template databases, and indeed
one might copy any database in a cluster by specifying its name
as the template for <command>CREATE DATABASE</>. It is important to
understand, however, that this is not (yet) intended as
a general-purpose <quote><command>COPY DATABASE</command></quote> facility. In particular, it is
essential that the source database be idle (no data-altering transactions
in progress)
for the duration of the copying operation. <command>CREATE DATABASE</>
will check
that no session (other than itself) is connected to
the source database at the start of the operation, but this does not
guarantee that changes cannot be made while the copy proceeds, which
would result in an inconsistent copied database. Therefore,
we recommend that databases used as templates be treated as read-only.
</para>
<para>
2003-08-31 19:32:24 +02:00
Two useful flags exist in <literal>pg_database</literal><indexterm><primary>pg_database</></> for each
database: the columns <literal>datistemplate</literal> and
<literal>datallowconn</literal>. <literal>datistemplate</literal>
may be set to indicate that a database is intended as a template for
<command>CREATE DATABASE</>. If this flag is set, the database may be
cloned by
2001-11-28 21:49:10 +01:00
any user with <literal>CREATEDB</> privileges; if it is not set, only superusers
and the owner of the database may clone it.
If <literal>datallowconn</literal> is false, then no new connections
to that database will be allowed (but existing sessions are not killed
simply by setting the flag false). The <literal>template0</literal>
database is normally marked <literal>datallowconn = false</> to prevent modification of it.
Both <literal>template0</literal> and <literal>template1</literal>
should always be marked with <literal>datistemplate = true</>.
</para>
<para>
After preparing a template database, or making any changes to one,
it is a good idea to perform
<command>VACUUM FREEZE</> or <command>VACUUM FULL FREEZE</> in that
database. If this is done when there are no other open transactions
in the same database, then it is guaranteed that all rows in the
database are <quote>frozen</> and will not be subject to transaction
ID wraparound problems. This is particularly important for a database
that will have <literal>datallowconn</literal> set to false, since it
will be impossible to do routine maintenance <command>VACUUM</> in
such a database.
See <xref linkend="vacuum-for-wraparound"> for more information.
</para>
<note>
<para>
<literal>template1</> and <literal>template0</> do not have any special
status beyond the fact that the name <literal>template1</> is the default
source database name for <command>CREATE DATABASE</> and the default
database-to-connect-to for various programs such as <command>createdb</>.
For example, one could drop <literal>template1</> and recreate it from
<literal>template0</> without any ill effects. This course of action
might be advisable if one has carelessly added a bunch of junk in
<literal>template1</>.
</para>
</note>
</sect1>
<sect1 id="manage-ag-config">
<title>Database Configuration</title>
<para>
2002-11-15 04:11:18 +01:00
Recall from <xref linkend="runtime-config"> that the
<productname>PostgreSQL</> server provides a large number of
run-time configuration variables. You can set database-specific
default values for many of these settings.
</para>
<para>
For example, if for some reason you want to disable the
<acronym>GEQO</acronym> optimizer for a given database, you'd
ordinarily have to either disable it for all databases or make sure
that every connecting client is careful to issue <literal>SET geqo
TO off;</literal>. To make this setting the default within a particular
database, you can execute the command
<programlisting>
ALTER DATABASE mydb SET geqo TO off;
</programlisting>
This will save the setting (but not set it immediately) and in
subsequent connections it will appear as though <literal>SET geqo
TO off;</literal> had been called right before the session started.
Note that users can still alter this setting during the session; it
will only be the default. To undo any such setting, use
<literal>ALTER DATABASE <replaceable>dbname</> RESET
<replaceable>varname</>;</literal>.
</para>
</sect1>
<sect1 id="manage-ag-dropdb">
<title>Destroying a Database</title>
<para>
Databases are destroyed with the command
2004-08-07 21:14:45 +02:00
<xref linkend="sql-dropdatabase">:<indexterm><primary>DROP DATABASE</></>
<synopsis>
DROP DATABASE <replaceable>name</>;
</synopsis>
Only the owner of the database (i.e., the user that created it), or
a superuser, can drop a database. Dropping a database removes all objects
that were
contained within the database. The destruction of a database cannot
be undone.
</para>
<para>
You cannot execute the <command>DROP DATABASE</command> command
while connected to the victim database. You can, however, be
connected to any other database, including the <literal>template1</>
database.
<literal>template1</> would be the only option for dropping the last user database of a
given cluster.
</para>
<para>
2003-08-31 19:32:24 +02:00
For convenience, there is also a shell program to drop
databases:<indexterm><primary>dropdb</></>
<synopsis>
dropdb <replaceable class="parameter">dbname</replaceable>
</synopsis>
(Unlike <command>createdb</>, it is not the default action to drop
the database with the current user name.)
</para>
</sect1>
<sect1 id="manage-ag-tablespaces">
<title>Tablespaces</title>
<para>
Tablespaces in <productname>PostgreSQL</> allow database administrators to
define locations in the file system where the files representing
database objects can be stored. Once created, a tablespace can be referred
to by name when creating database objects.
</para>
<para>
By using tablespaces, a database administrator can control the disk
layout of a <productname>PostgreSQL</> installation. This is useful in
at least two ways. Firstly, if the partition or volume on which the cluster
was initialized runs out of space and cannot be extended logically
or otherwise, a tablespace can be created on a different partition
and used until the system can be reconfigured.
</para>
<para>
Secondly, tablespaces allow a database administrator to arrange data
locations based on the usage patterns of database objects. For
example, an index which is very heavily used can be placed on very fast,
highly available disk, such as an expensive solid state device. At the same
time a table storing archived data which is rarely used or not performance
critical could be stored on a less expensive, slower disk system.
</para>
<para>
2004-08-07 21:14:45 +02:00
To define a tablespace, use the <xref linkend="sql-createtablespace">
command, for example:<indexterm><primary>CREATE TABLESPACE</></>
<programlisting>
CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';
</programlisting>
The location must be an existing, empty directory that is owned by
the <productname>PostgreSQL</> system user. All objects subsequently
created within the tablespace will be stored in files underneath this
directory.
</para>
<note>
<para>
There is usually not much point in making more than one
tablespace per logical filesystem, since you can't control the location
of individual files within a logical filesystem. However,
<productname>PostgreSQL</> does not enforce any such limitation, and
indeed it's not directly aware of the filesystem boundaries on your
system. It just stores files in the directories you tell it to use.
</para>
</note>
<para>
Creation of the tablespace itself must be done as a database superuser,
but after that you can allow ordinary database users to make use of it.
To do that, grant them the <literal>CREATE</> privilege on it.
</para>
<para>
Databases, schemas, tables, and indexes can all be assigned to
particular tablespaces. To do so, a user with the <literal>CREATE</>
privilege on a given tablespace must pass the tablespace name as a
parameter to the relevant command. For example, the following creates
a table in the tablespace <literal>space1</>:
<programlisting>
CREATE TABLE foo(i int) TABLESPACE space1;
</programlisting>
</para>
<para>
The tablespace associated with a database is used to store the system
catalogs of that database, as well as any temporary files created by
server processes using that database. Furthermore, it is the default
tablespace selected for any objects created within the database, if
no specific <literal>TABLESPACE</> clause is given when those objects
are created. If a database is created without specifying a tablespace
for it, it uses the same tablespace as the template database it is copied
from.
</para>
<para>
A schema does not in itself occupy any storage (other than a system
catalog entry), so assigning a tablespace to a schema does not in itself
do anything. What this actually does is to set a default tablespace
for tables later created within the schema. If
no tablespace is mentioned when creating a schema, it inherits its
default tablespace from the current database.
</para>
<para>
The default choice of tablespace for an index is the same tablespace
already assigned to the table the index is for.
</para>
<para>
Another way to state the above rules is that when a schema, table, or index
is created without specifying a tablespace, the object
inherits its logical parent's tablespace. A schema will be created in the
current database's tablespace; a table will be created in the
tablespace of the schema it is being created in; an index will be created
in the tablespace of the table underlying the index.
</para>
<para>
Two tablespaces are automatically created by <literal>initdb</>. The
<literal>pg_global</> tablespace is used for shared system catalogs. The
<literal>pg_default</> tablespace is the default tablespace of the
<literal>template1</> and <literal>template0</> databases (and, therefore,
will be the default tablespace for everything else as well, unless
explicit <literal>TABLESPACE</> clauses are used somewhere along the
line).
</para>
<para>
Once created, a tablespace can be used from any database, provided
the requesting user has sufficient privilege. This means that a tablespace
cannot be dropped until all objects in all databases using the tablespace
have been removed.
</para>
<para>
To simplify the implementation of tablespaces,
<productname>PostgreSQL</> makes extensive use of symbolic links. This
means that tablespaces can be used <emphasis>only</> on systems
that support symbolic links.
</para>
<para>
The directory <filename>$PGDATA/pg_tblspc</> contains symbolic links that
point to each of the non-built-in tablespaces defined in the cluster.
Although not recommended, it is possible to adjust the tablespace
layout by hand by redefining these links. Two warnings: do not do so
while the postmaster is running; and after you restart the postmaster,
update the <structname>pg_tablespace</> catalog to show the new
locations. (If you do not, <literal>pg_dump</> will continue to show
the old tablespace locations.)
</para>
</sect1>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->