Update our documentation concerning where to create data directories.

Although initdb has long discouraged use of a filesystem mount-point
directory as a PG data directory, this point was covered nowhere in the
user-facing documentation.  Also, with the popularity of pg_upgrade,
we really need to recommend that the PG user own not only the data
directory but its parent directory too.  (Without a writable parent
directory, operations such as "mv data data.old" fail immediately.
pg_upgrade itself doesn't do that, but wrapper scripts for it often do.)

Hence, adjust the "Creating a Database Cluster" section to address
these points.  I also took the liberty of wordsmithing the discussion
of NFS a bit.

These considerations aren't by any means new, so back-patch to all
supported branches.
Tom Lane 2015-07-28 18:42:59 -04:00
parent 40a50a17b9
commit 28b11bd106


@@ -49,7 +49,7 @@
<para>
Before you can do anything, you must initialize a database storage
area on disk. We call this a <firstterm>database cluster</firstterm>.
(<acronym>SQL</acronym> uses the term catalog cluster.) A
(The <acronym>SQL</acronym> standard uses the term catalog cluster.) A
database cluster is a collection of databases that is managed by a
single instance of a running database server. After initialization, a
database cluster will contain a database named <literal>postgres</literal>,
@@ -65,7 +65,7 @@
</para>
<para>
In file system terms, a database cluster will be a single directory
In file system terms, a database cluster is a single directory
under which all data will be stored. We call this the <firstterm>data
directory</firstterm> or <firstterm>data area</firstterm>. It is
completely up to you where you choose to store your data. There is no
@@ -109,15 +109,18 @@
<para>
<command>initdb</command> will attempt to create the directory you
specify if it does not already exist. It is likely that it will not
have the permission to do so (if you followed our advice and created
an unprivileged account). In that case you should create the
directory yourself (as root) and change the owner to be the
<productname>PostgreSQL</productname> user. Here is how this might
be done:
specify if it does not already exist. Of course, this will fail if
<command>initdb</command> does not have permissions to write in the
parent directory. It's generally recommendable that the
<productname>PostgreSQL</productname> user own not just the data
directory but its parent directory as well, so that this should not
be a problem. If the desired parent directory doesn't exist either,
you will need to create it first, using root privileges if the
grandparent directory isn't writable. So the process might look
like this:
<screen>
root# <userinput>mkdir /usr/local/pgsql/data</userinput>
root# <userinput>chown postgres /usr/local/pgsql/data</userinput>
root# <userinput>mkdir /usr/local/pgsql</userinput>
root# <userinput>chown postgres /usr/local/pgsql</userinput>
root# <userinput>su postgres</userinput>
postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
</screen>
@@ -125,7 +128,9 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
<para>
<command>initdb</command> will refuse to run if the data directory
looks like it has already been initialized.</para>
exists and already contains files; this is to prevent accidentally
overwriting an existing installation.
</para>
<para>
Because the data directory contains all the data stored in the
@@ -178,8 +183,30 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
locale setting. For details see <xref linkend="multibyte">.
</para>
<sect2 id="creating-cluster-mount-points">
<title>Use of Secondary File Systems</title>
<indexterm zone="creating-cluster-mount-points">
<primary>file system mount points</primary>
</indexterm>
<para>
Many installations create their database clusters on file systems
(volumes) other than the machine's <quote>root</> volume. If you
choose to do this, it is not advisable to try to use the secondary
volume's topmost directory (mount point) as the data directory.
Best practice is to create a directory within the mount-point
directory that is owned by the <productname>PostgreSQL</productname>
user, and then create the data directory within that. This avoids
permissions problems, particularly for operations such
as <application>pg_upgrade</>, and it also ensures clean failures if
the secondary volume is taken offline.
</para>
</sect2>
<sect2 id="creating-cluster-nfs">
<title>Network File Systems</title>
<title>Use of Network File Systems</title>
<indexterm zone="creating-cluster-nfs">
<primary>Network File Systems</primary>
@@ -188,22 +215,30 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
<indexterm><primary>Network Attached Storage (<acronym>NAS</>)</><see>Network File Systems</></>
<para>
Many installations create database clusters on network file systems.
Sometimes this is done directly via <acronym>NFS</>, or by using a
Many installations create their database clusters on network file
systems. Sometimes this is done via <acronym>NFS</>, or by using a
Network Attached Storage (<acronym>NAS</>) device that uses
<acronym>NFS</> internally. <productname>PostgreSQL</> does nothing
special for <acronym>NFS</> file systems, meaning it assumes
<acronym>NFS</> behaves exactly like locally-connected drives
(<acronym>DAS</>, Direct Attached Storage). If client and server
<acronym>NFS</> implementations have non-standard semantics, this can
<acronym>NFS</> behaves exactly like locally-connected drives.
If the client or server <acronym>NFS</> implementation does not
provide standard file system semantics, this can
cause reliability problems (see <ulink
url="http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html"></ulink>).
Specifically, delayed (asynchronous) writes to the <acronym>NFS</>
server can cause reliability problems; if possible, mount
<acronym>NFS</> file systems synchronously (without caching) to avoid
this. Also, soft-mounting <acronym>NFS</> is not recommended.
(Storage Area Networks (<acronym>SAN</>) use a low-level
communication protocol rather than <acronym>NFS</>.)
server can cause data corruption problems. If possible, mount the
<acronym>NFS</> file system synchronously (without caching) to avoid
this hazard. Also, soft-mounting the <acronym>NFS</> file system is
not recommended.
</para>
<para>
Storage Area Networks (<acronym>SAN</>) typically use communication
protocols other than <acronym>NFS</>, and may or may not be subject
to hazards of this sort. It's advisable to consult the vendor's
documentation concerning data consistency guarantees.
<productname>PostgreSQL</productname> cannot be more reliable than
the file system it's using.
</para>
</sect2>
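
The recommendations in this patch (do not use a mount point as the data directory, keep the parent directory writable by the PostgreSQL user, and expect initdb to refuse a directory that already contains files) could be checked by a wrapper script before it calls initdb. The following is only a minimal sketch under stated assumptions: the function name check_data_directory and its warning strings are hypothetical and are not part of PostgreSQL or pg_upgrade, and the checks use nothing beyond Python's standard os module. It must be run as the PostgreSQL user for the writability check to be meaningful.

```python
import os


def check_data_directory(path):
    """Return a list of warnings for a proposed data directory location.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    warnings = []
    path = os.path.abspath(path)
    parent = os.path.dirname(path)

    # Using a volume's mount point directly as the data directory is discouraged.
    if os.path.ismount(path):
        warnings.append(f"{path} is a mount point; create a subdirectory instead")

    # The parent should exist and be writable by the current user, so that
    # initdb can create the directory and "mv data data.old" can succeed.
    if not os.path.isdir(parent):
        warnings.append(f"parent directory {parent} does not exist")
    elif not os.access(parent, os.W_OK | os.X_OK):
        warnings.append(f"parent directory {parent} is not writable by this user")

    # initdb refuses to run if the directory exists and already contains files.
    if os.path.isdir(path) and os.listdir(path):
        warnings.append(f"{path} exists and is not empty")

    return warnings


if __name__ == "__main__":
    # Example path taken from the documentation text above.
    for message in check_data_directory("/usr/local/pgsql/data"):
        print("warning:", message)
```

An empty list means none of the three conditions was detected. It does not guarantee that initdb will succeed, and it says nothing about the NFS or SAN hazards described above.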