src/backend/storage/smgr/README

Storage Managers
================

In the original Berkeley Postgres system, there were several storage managers,
of which only the "magnetic disk" manager remains. (At Berkeley there were
also managers for the Sony WORM optical disk jukebox and persistent main
memory, but these were never supported in any externally released Postgres,
nor in any version of PostgreSQL.) The "magnetic disk" manager is itself
seriously misnamed, because it actually supports any kind of device for
which the operating system provides standard filesystem operations, which
these days is pretty much everything of interest. However, we retain the
notion of a storage manager switch in case anyone ever wants to reintroduce
other kinds of storage managers. Removing the switch layer would save
nothing noticeable anyway, since storage-access operations are surely far
more expensive than one extra layer of C function calls.

In Berkeley Postgres each relation was tagged with the ID of the storage
manager to use for it. This is gone. It would probably be more reasonable
to associate storage managers with tablespaces, should we ever reintroduce
multiple storage managers into the system catalogs.

The files in this directory, and their contents, are:

smgr.c      The storage manager switch dispatch code. The routines in
            this file call the appropriate storage manager to do storage
            accesses requested by higher-level code. smgr.c also manages
            the file handle cache (SMgrRelation table).

md.c        The "magnetic disk" storage manager, which is really just
            an interface to the kernel's filesystem operations.

Note that md.c in turn relies on src/backend/storage/file/fd.c.

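As a rough illustration of what a "storage manager switch" means, the sketch
below dispatches through a table of function pointers with one entry per
storage manager. It is a toy model only: every identifier in it (toy_smgr,
toy_smgrsw, toy_md_nblocks, toy_smgrnblocks, and so on) is hypothetical, and
the in-memory block counts merely stand in for real file operations. It is
not the code in smgr.c or md.c.

```c
/*
 * Toy sketch of a storage manager switch.  All names here are hypothetical
 * and chosen for illustration; they are not the definitions in smgr.c.
 */
#include <assert.h>
#include <stddef.h>

typedef unsigned int BlockNum;

/* One entry per storage manager: a table of function pointers. */
typedef struct toy_smgr
{
	const char *name;
	BlockNum	(*nblocks) (int relid);
	void		(*extend) (int relid);
} toy_smgr;

/* "Magnetic disk" stand-in: block counts kept in memory instead of files. */
#define TOY_MAX_RELS 8
static BlockNum toy_md_sizes[TOY_MAX_RELS];

static BlockNum
toy_md_nblocks(int relid)
{
	assert(relid >= 0 && relid < TOY_MAX_RELS);
	return toy_md_sizes[relid];
}

static void
toy_md_extend(int relid)
{
	assert(relid >= 0 && relid < TOY_MAX_RELS);
	toy_md_sizes[relid]++;
}

/* The switch itself.  Only one manager is registered, at index 0. */
static const toy_smgr toy_smgrsw[] = {
	{"magnetic disk", toy_md_nblocks, toy_md_extend},
};

#define TOY_NSMGR (sizeof(toy_smgrsw) / sizeof(toy_smgrsw[0]))

/* Higher-level entry points: pick a manager, then dispatch through it. */
BlockNum
toy_smgrnblocks(size_t which, int relid)
{
	assert(which < TOY_NSMGR);
	return toy_smgrsw[which].nblocks(relid);
}

void
toy_smgrextend(size_t which, int relid)
{
	assert(which < TOY_NSMGR);
	toy_smgrsw[which].extend(relid);
}
```

Callers only ever use the toy_smgr* entry points, so the cost of keeping the
switch is one indirect function call per operation. Adding a second manager
in this model would mean adding one more row to toy_smgrsw.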

Relation Forks
==============

Since PostgreSQL 8.4, a single smgr relation can consist of multiple physical
files, called relation forks. This allows storing additional metadata, such as
free space information, in additional forks, which can be grown and truncated
independently of the main data file, while still treating it all as a single
physical relation in the system catalogs.

It is assumed that the main fork, fork number 0 or MAIN_FORKNUM, always
exists. Fork numbers are assigned in src/include/common/relpath.h.
Functions in smgr.c and md.c take an extra fork number argument, in addition
to relfilelocator and block number, to identify which relation fork you want to
access. Since most code wants to access the main fork, a shortcut version of
ReadBuffer that accesses MAIN_FORKNUM is provided in the buffer manager for
convenience.
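
The fork number argument and the main-fork shortcut can be pictured with the
toy model below. It is a sketch under stated assumptions, not the buffer
manager's code: the only detail taken from the text above is that the main
fork is fork number 0. The second fork (TOY_AUX_FORKNUM) and all function
names are hypothetical; the actual fork number assignments are in
src/include/common/relpath.h.

```c
/*
 * Toy sketch of a fork number argument plus a main-fork shortcut.
 * Everything except "main fork is number 0" is hypothetical.
 */
#include <assert.h>

typedef unsigned int BlockNum;

typedef enum ToyForkNumber
{
	TOY_MAIN_FORKNUM = 0,		/* main data file, assumed always present */
	TOY_AUX_FORKNUM,			/* stand-in for an additional metadata fork */
	TOY_NFORKS					/* number of forks in this toy model */
} ToyForkNumber;

#define TOY_MAX_RELS 4

/* Per-relation, per-fork length in blocks: each fork grows on its own. */
static BlockNum toy_fork_len[TOY_MAX_RELS][TOY_NFORKS];

/* Extended form: the caller names the fork explicitly. */
BlockNum
toy_extend_fork(int relid, ToyForkNumber forknum)
{
	assert(relid >= 0 && relid < TOY_MAX_RELS);
	assert((unsigned int) forknum < TOY_NFORKS);
	/* Return the number of the newly added block, then bump the length. */
	return toy_fork_len[relid][forknum]++;
}

/* Current length, in blocks, of one fork of one relation. */
BlockNum
toy_nblocks(int relid, ToyForkNumber forknum)
{
	assert(relid >= 0 && relid < TOY_MAX_RELS);
	assert((unsigned int) forknum < TOY_NFORKS);
	return toy_fork_len[relid][forknum];
}

/* Shortcut: most callers only care about the main fork. */
BlockNum
toy_extend(int relid)
{
	return toy_extend_fork(relid, TOY_MAIN_FORKNUM);
}
```

In this model, extending the auxiliary fork leaves the main fork's length
untouched, which is the sense in which forks are grown and truncated
independently while still belonging to one relation.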