postgresql/src/backend/storage/smgr/README

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

53 lines
2.5 KiB
Plaintext
Raw Normal View History

2010-09-20 22:08:53 +02:00
src/backend/storage/smgr/README
Storage Managers
================
In the original Berkeley Postgres system, there were several storage managers,
of which only the "magnetic disk" manager remains. (At Berkeley there were
also managers for the Sony WORM optical disk jukebox and persistent main
memory, but these were never supported in any externally released Postgres,
nor in any version of PostgreSQL.) The "magnetic disk" manager is itself
seriously misnamed, because actually it supports any kind of device for
which the operating system provides standard filesystem operations; which
these days is pretty much everything of interest. However, we retain the
notion of a storage manager switch in case anyone ever wants to reintroduce
other kinds of storage managers. Removing the switch layer would save
nothing noticeable anyway, since storage-access operations are surely far
more expensive than one extra layer of C function calls.
In Berkeley Postgres each relation was tagged with the ID of the storage
manager to use for it. This is gone. It would be probably more reasonable
to associate storage managers with tablespaces, should we ever re-introduce
multiple storage managers into the system catalogs.
The files in this directory, and their contents, are
smgr.c The storage manager switch dispatch code. The routines in
this file call the appropriate storage manager to do storage
accesses requested by higher-level code. smgr.c also manages
the file handle cache (SMgrRelation table).
md.c The "magnetic disk" storage manager, which is really just
an interface to the kernel's filesystem operations.
Note that md.c in turn relies on src/backend/storage/file/fd.c.
Relation Forks
==============
Since 8.4, a single smgr relation can be comprised of multiple physical
files, called relation forks. This allows storing additional metadata like
Free Space information in additional forks, which can be grown and truncated
independently of the main data file, while still treating it all as a single
physical relation in system catalogs.
It is assumed that the main fork, fork number 0 or MAIN_FORKNUM, always
exists. Fork numbers are assigned in src/include/common/relpath.h.
Functions in smgr.c and md.c take an extra fork number argument, in addition
Change internal RelFileNode references to RelFileNumber or RelFileLocator. We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 17:39:09 +02:00
to relfilelocator and block number, to identify which relation fork you want to
access. Since most code wants to access the main fork, a shortcut version of
ReadBuffer that accesses MAIN_FORKNUM is provided in the buffer manager for
convenience.