/*-------------------------------------------------------------------------
 *
 * shmem.h
 *	  shared memory management structures
 *
 * Historical note:
 * A long time ago, Postgres' shared memory region was allowed to be mapped
 * at a different address in each process, and shared memory "pointers" were
 * passed around as offsets relative to the start of the shared memory region.
 * That is no longer the case: each process must map the shared memory region
 * at the same address.  This means shared memory pointers can be passed
 * around directly between different processes.
 *
 * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/storage/shmem.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef SHMEM_H
#define SHMEM_H

#include "utils/hsearch.h"


/* shmqueue.c */
typedef struct SHM_QUEUE
{
	struct SHM_QUEUE *prev;
	struct SHM_QUEUE *next;
} SHM_QUEUE;

/* shmem.c */
extern void InitShmemAccess(void *seghdr);
extern void InitShmemAllocation(void);
extern void *ShmemAlloc(Size size);
extern bool ShmemAddrIsValid(const void *addr);
extern void InitShmemIndex(void);
extern HTAB *ShmemInitHash(const char *name, long init_size, long max_size,
			  HASHCTL *infoP, int hash_flags);
extern void *ShmemInitStruct(const char *name, Size size, bool *foundPtr);
extern Size add_size(Size s1, Size s2);
extern Size mul_size(Size s1, Size s2);
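
/*
 * Usage sketch (illustrative only; MySharedState, MySharedEntry and the
 * name strings are hypothetical).  Both ShmemInitStruct and ShmemInitHash
 * look the name up in the shmem index and either attach to an existing
 * object or create a new one.  ShmemInitStruct reports which of these
 * happened through *foundPtr, so only the first caller initializes the
 * contents.  The names become shmem index keys, so keep them shorter than
 * SHMEM_INDEX_KEYSIZE bytes.  Add-on modules typically do this while
 * holding AddinShmemInitLock.
 *
 *		bool		found;
 *		MySharedState *state;
 *		HASHCTL		info;
 *		HTAB	   *htab;
 *
 *		state = (MySharedState *)
 *			ShmemInitStruct("my module state", sizeof(MySharedState), &found);
 *		if (!found)
 *			state->counter = 0;
 *
 *		MemSet(&info, 0, sizeof(info));
 *		info.keysize = sizeof(Oid);
 *		info.entrysize = sizeof(MySharedEntry);
 *		info.hash = tag_hash;
 *		htab = ShmemInitHash("my module hash", 64, 1024, &info,
 *							 HASH_ELEM | HASH_FUNCTION);
 *
 * As with any dynahash table, MySharedEntry must begin with its key.
 * max_size matters more than usual here: a shared hash table cannot enlarge
 * its directory after creation, so the directory is sized from max_size.
 */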

/* ipci.c */
extern void RequestAddinShmemSpace(Size size);
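
/*
 * Usage sketch (illustrative only; MySharedState, MySlot, MyShmemSize,
 * prev_shmem_startup_hook and my_shmem_startup are hypothetical).  A module
 * loaded through shared_preload_libraries computes its space requirement
 * with add_size and mul_size, which raise an ERROR on overflow instead of
 * silently wrapping around, and requests that space from _PG_init.
 * Requests made after the postmaster has created shared memory are ignored.
 *
 *		static Size
 *		MyShmemSize(void)
 *		{
 *			Size		size = MAXALIGN(sizeof(MySharedState));
 *
 *			size = add_size(size, mul_size(MaxBackends, sizeof(MySlot)));
 *			return size;
 *		}
 *
 *		void
 *		_PG_init(void)
 *		{
 *			if (!process_shared_preload_libraries_in_progress)
 *				return;
 *			RequestAddinShmemSpace(MyShmemSize());
 *			prev_shmem_startup_hook = shmem_startup_hook;
 *			shmem_startup_hook = my_shmem_startup;
 *		}
 *
 * The requested space is carved out later, when my_shmem_startup calls
 * ShmemInitStruct.
 */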

/* size constants for the shmem index table */
 /* max size of data structure string name */
#define SHMEM_INDEX_KEYSIZE		 (48)
 /* estimated size of the shmem index table (not a hard limit) */
#define SHMEM_INDEX_SIZE		 (32)

/* this is a hash bucket in the shmem index table */
typedef struct
{
	char		key[SHMEM_INDEX_KEYSIZE];		/* string name */
	void	   *location;		/* location in shared mem */
	Size		size;			/* # bytes allocated for the structure */
} ShmemIndexEnt;

/*
 * prototypes for functions in shmqueue.c
 */
extern void SHMQueueInit(SHM_QUEUE *queue);
extern void SHMQueueElemInit(SHM_QUEUE *queue);
extern void SHMQueueDelete(SHM_QUEUE *queue);
extern void SHMQueueInsertBefore(SHM_QUEUE *queue, SHM_QUEUE *elem);
extern Pointer SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem,
			 Size linkOffset);
extern bool SHMQueueEmpty(const SHM_QUEUE *queue);
extern bool SHMQueueIsDetached(const SHM_QUEUE *queue);
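
/*
 * Usage sketch (illustrative only; MyItem, head and item are hypothetical,
 * with head pointing to an SHM_QUEUE header in shared memory).  An
 * SHM_QUEUE is a circular doubly-linked list.  The header links to itself
 * when the list is empty, and each member embeds its own SHM_QUEUE link.
 * SHMQueueNext steps to the next link and subtracts linkOffset to return
 * the enclosing member, or returns NULL once it wraps back to the header.
 * Inserting before the header appends at the tail.
 *
 *		typedef struct MyItem
 *		{
 *			int			value;
 *			SHM_QUEUE	links;
 *		} MyItem;
 *
 *		SHMQueueInit(head);
 *		SHMQueueElemInit(&item->links);
 *		SHMQueueInsertBefore(head, &item->links);
 *
 *		item = (MyItem *) SHMQueueNext(head, head, offsetof(MyItem, links));
 *		while (item != NULL)
 *		{
 *			... use item->value ...
 *			item = (MyItem *) SHMQueueNext(head, &item->links,
 *										   offsetof(MyItem, links));
 *		}
 *
 * To delete members during the walk, fetch the next member before calling
 * SHMQueueDelete on the current one.
 */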

#endif   /* SHMEM_H */