2002-05-05 02:03:29 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* pg_shmem.h
|
|
|
|
* Platform-independent API for shared memory support.
|
|
|
|
*
|
|
|
|
* Every port is expected to support shared memory with approximately
|
|
|
|
* SysV-ish semantics; in particular, a memory block is not anonymous
|
|
|
|
* but has an ID, and we must be able to tell whether there are any
|
|
|
|
* remaining processes attached to a block of a specified ID.
|
|
|
|
*
|
|
|
|
* To simplify life for the SysV implementation, the ID is assumed to
|
|
|
|
* consist of two unsigned long values (these are key and ID in SysV
|
|
|
|
* terms). Other platforms may ignore the second value if they need
|
|
|
|
* only one ID number.
|
|
|
|
*
|
|
|
|
*
|
2023-01-02 21:00:37 +01:00
|
|
|
* Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
|
2002-05-05 02:03:29 +02:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
*
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/include/storage/pg_shmem.h
|
2002-05-05 02:03:29 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
#ifndef PG_SHMEM_H
|
|
|
|
#define PG_SHMEM_H
|
|
|
|
|
2014-04-08 17:39:55 +02:00
|
|
|
#include "storage/dsm_impl.h"
|
|
|
|
|
2002-05-05 02:03:29 +02:00
|
|
|
typedef struct PGShmemHeader /* standard header for all Postgres shmem */
|
|
|
|
{
|
|
|
|
int32 magic; /* magic # to identify Postgres segments */
|
2006-01-04 22:06:32 +01:00
|
|
|
#define PGShmemMagic 679834894
|
2019-04-13 07:36:38 +02:00
|
|
|
pid_t creatorPID; /* PID of creating process (set but unread) */
|
2005-08-21 01:26:37 +02:00
|
|
|
Size totalsize; /* total size of segment */
|
|
|
|
Size freeoffset; /* offset to first free space */
|
2014-04-08 17:39:55 +02:00
|
|
|
dsm_handle dsm_control; /* ID of dynamic shared memory control seg */
|
2008-11-02 22:24:52 +01:00
|
|
|
void *index; /* pointer to ShmemIndex table */
|
2004-11-09 22:30:18 +01:00
|
|
|
#ifndef WIN32 /* Windows doesn't have useful inode#s */
|
|
|
|
dev_t device; /* device data directory is on */
|
|
|
|
ino_t inode; /* inode number of data directory */
|
|
|
|
#endif
|
2002-05-05 02:03:29 +02:00
|
|
|
} PGShmemHeader;
|
|
|
|
|
2019-02-03 09:55:39 +01:00
|
|
|
/* GUC variables */
|
2022-04-08 14:16:38 +02:00
|
|
|
extern PGDLLIMPORT int shared_memory_type;
|
|
|
|
extern PGDLLIMPORT int huge_pages;
|
|
|
|
extern PGDLLIMPORT int huge_page_size;
|
Allow using huge TLB pages on Linux (MAP_HUGETLB)
This patch adds an option, huge_tlb_pages, which allows requesting the
shared memory segment to be allocated using huge pages, by using the
MAP_HUGETLB flag in mmap(). This can improve performance.
The default is 'try', which means that we will attempt using huge pages,
and fall back to non-huge pages if it doesn't work. Currently, only Linux
has MAP_HUGETLB. On other platforms, the default 'try' behaves the same as
'off'.
In passing, don't try to round the mmap() size to a multiple of
pagesize. mmap() doesn't require that, and there's no particular reason for
PostgreSQL to do that either. When using MAP_HUGETLB, however, round the
request size up to the nearest 2MB boundary. This is to work around a bug in
some Linux kernel versions, but also to avoid wasting memory, because the
kernel will round the size up anyway.
Many people were involved in writing this patch, including Christian Kruse,
Richard Poole, Abhijit Menon-Sen, reviewed by Peter Geoghegan, Andres Freund
and me.
2014-01-29 12:44:45 +01:00
|
|
|
|
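As a rough illustration of the rounding described in the MAP_HUGETLB commit message, the sketch below rounds a request size up to the next multiple of a huge page size. The helper name `round_up_to_huge_page` is hypothetical, and the 2MB figure comes from the commit message; this is a sketch under those assumptions, not the code PostgreSQL itself uses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of pg_shmem.h): round "size" up to the
 * next multiple of "hugepagesize". A size that is already a multiple is
 * returned unchanged. Overflow near SIZE_MAX is not handled in this sketch.
 */
static size_t
round_up_to_huge_page(size_t size, size_t hugepagesize)
{
	size_t		remainder;

	assert(hugepagesize > 0);

	remainder = size % hugepagesize;
	if (remainder == 0)
		return size;
	return size + (hugepagesize - remainder);
}
```

With a 2MB huge page size (2097152 bytes), a 1-byte request rounds up to 2097152 bytes, and a request of exactly 2097152 bytes is left as is.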
Add GUC parameter "huge_pages_status"
This is useful to show the allocation state of huge pages when setting
up a server with "huge_pages = try", where allocating huge pages would
be attempted but the server would continue its startup sequence even if
the allocation fails. The effective status of huge pages is not easily
visible without OS-level tools (or for instance, a look at
/proc/N/smaps), and the environments where Postgres runs may not
authorize that. Like the other GUCs related to huge pages, this works
for Linux and Windows.
This GUC can report as values:
- "on", if huge pages were allocated.
- "off", if huge pages were not allocated.
- "unknown", a special state that could only be seen when using for
example postgres -C because it is only possible to know if the shared
memory allocation worked after we can check for the GUC values, even if
checking a runtime-computed GUC. This value should never be seen when
querying for the GUC on a running server. An assertion is added to
check that.
The discussion has also turned around having a new function to grab this
status, but this would have required more tricks for -DEXEC_BACKEND,
something that GUCs already handle.
Noriyoshi Shinoda has initiated the thread that has led to the result of
this commit.
Author: Justin Pryzby
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/TU4PR8401MB1152EBB0D271F827E2E37A01EECC9@TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM
2023-07-06 07:42:36 +02:00
|
|
|
/* Possible values for huge_pages and huge_pages_status */
|
2014-01-29 12:44:45 +01:00
|
|
|
typedef enum
|
|
|
|
{
|
2014-03-03 19:52:48 +01:00
|
|
|
HUGE_PAGES_OFF,
|
|
|
|
HUGE_PAGES_ON,
|
2023-07-06 07:42:36 +02:00
|
|
|
HUGE_PAGES_TRY, /* only for huge_pages */
|
|
|
|
HUGE_PAGES_UNKNOWN, /* only for huge_pages_status */
|
2014-03-03 19:52:48 +01:00
|
|
|
} HugePagesType;
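The huge_pages_status commit message lists the values the GUC can report ("on", "off", "unknown"), and the enum comments note which members apply to which GUC. The sketch below mirrors the enum locally and maps each member to a display string. The function name `huge_pages_value_name` is hypothetical, and the real server resolves these names through its GUC option tables rather than a helper like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the enum from the header, so the sketch is self-contained. */
typedef enum
{
	HUGE_PAGES_OFF,
	HUGE_PAGES_ON,
	HUGE_PAGES_TRY,				/* only for huge_pages */
	HUGE_PAGES_UNKNOWN,			/* only for huge_pages_status */
} HugePagesType;

/*
 * Hypothetical helper: return the textual form of a HugePagesType value.
 * Returns NULL for a value outside the enum.
 */
static const char *
huge_pages_value_name(HugePagesType value)
{
	switch (value)
	{
		case HUGE_PAGES_OFF:
			return "off";
		case HUGE_PAGES_ON:
			return "on";
		case HUGE_PAGES_TRY:
			return "try";
		case HUGE_PAGES_UNKNOWN:
			return "unknown";
	}
	return NULL;
}
```

A caller could compare the result with strcmp(), for instance to check that a status of HUGE_PAGES_ON is reported as "on".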
|
2002-05-05 02:03:29 +02:00
|
|
|
|
2019-02-03 09:55:39 +01:00
|
|
|
/* Possible values for shared_memory_type */
|
|
|
|
typedef enum
|
|
|
|
{
|
|
|
|
SHMEM_TYPE_WINDOWS,
|
|
|
|
SHMEM_TYPE_SYSV,
|
|
|
|
SHMEM_TYPE_MMAP,
|
|
|
|
} PGShmemType;
|
|
|
|
|
2010-01-02 13:18:45 +01:00
|
|
|
#ifndef WIN32
|
2022-04-08 14:16:38 +02:00
|
|
|
extern PGDLLIMPORT unsigned long UsedShmemSegID;
|
2010-01-02 13:18:45 +01:00
|
|
|
#else
|
2022-04-08 14:16:38 +02:00
|
|
|
extern PGDLLIMPORT HANDLE UsedShmemSegID;
|
|
|
|
extern PGDLLIMPORT void *ShmemProtectiveRegion;
|
2010-01-02 13:18:45 +01:00
|
|
|
#endif
|
2022-04-08 14:16:38 +02:00
|
|
|
extern PGDLLIMPORT void *UsedShmemSegAddr;
|
2004-12-29 22:36:09 +01:00
|
|
|
|
2019-02-03 09:55:39 +01:00
|
|
|
#if !defined(WIN32) && !defined(EXEC_BACKEND)
|
|
|
|
#define DEFAULT_SHARED_MEMORY_TYPE SHMEM_TYPE_MMAP
|
|
|
|
#elif !defined(WIN32)
|
|
|
|
#define DEFAULT_SHARED_MEMORY_TYPE SHMEM_TYPE_SYSV
|
|
|
|
#else
|
|
|
|
#define DEFAULT_SHARED_MEMORY_TYPE SHMEM_TYPE_WINDOWS
|
|
|
|
#endif
|
|
|
|
|
2014-02-09 03:21:46 +01:00
|
|
|
#ifdef EXEC_BACKEND
|
2004-12-29 22:36:09 +01:00
|
|
|
extern void PGSharedMemoryReAttach(void);
|
On Windows, ensure shared memory handle gets closed if not being used.
Postmaster child processes that aren't supposed to be attached to shared
memory were not bothering to close the shared memory mapping handle they
inherit from the postmaster process. That's mostly harmless, since the
handle vanishes anyway when the child process exits -- but the syslogger
process, if used, doesn't get killed and restarted during recovery from a
backend crash. That meant that Windows doesn't see the shared memory
mapping as becoming free, so it doesn't delete it and the postmaster is
unable to create a new one, resulting in failure to recover from crashes
whenever logging_collector is turned on.
Per report from Dmitry Vasilyev. It's a bit astonishing that we'd not
figured this out long ago, since it's been broken from the very beginnings
of our native Windows support; probably some previously-unexplained trouble
reports trace to this.
A secondary problem is that on Cygwin (perhaps only in older versions?),
exec() may not detach from the shared memory segment after all, in which
case these child processes did remain attached to shared memory, posing
the risk of an unexpected shared memory clobber if they went off the rails
somehow. That may be a long-gone bug, but we can deal with it now if it's
still live, by detaching within the infrastructure introduced here to deal
with closing the handle.
Back-patch to all supported branches.
Tom Lane and Amit Kapila
2015-10-13 17:21:33 +02:00
|
|
|
extern void PGSharedMemoryNoReAttach(void);
|
2003-05-07 01:34:56 +02:00
|
|
|
#endif
|
|
|
|
|
Use data directory inode number, not port, to select SysV resource keys.
This approach provides a much tighter binding between a data directory
and the associated SysV shared memory block (and SysV or named-POSIX
semaphores, if we're using those). Key collisions are still possible,
but only between data directories stored on different filesystems,
so the situation should be negligible in practice. More importantly,
restarting the postmaster with a different port number no longer
risks failing to identify a relevant shared memory block, even when
postmaster.pid has been removed. A standalone backend is likewise
much more certain to detect conflicting leftover backends.
(In the longer term, we might now think about deprecating the port as
a cluster-wide value, so that one postmaster could support sockets
with varying port numbers. But that's for another day.)
The hazards fixed here apply only on Unix systems; our Windows code
paths already use identifiers derived from the data directory path
name rather than the port.
src/test/recovery/t/017_shm.pl, which intends to test key-collision
cases, has been substantially rewritten since it can no longer use
two postmasters with identical port numbers to trigger the case.
Instead, use Perl's IPC::SharedMem module to create a conflicting
shmem segment directly. The test script will be skipped if that
module is not available. (This means that some older buildfarm
members won't run it, but I don't think that that results in any
meaningful coverage loss.)
Patch by me; thanks to Noah Misch and Peter Eisentraut for discussion
and review.
Discussion: https://postgr.es/m/16908.1557521200@sss.pgh.pa.us
2019-09-05 19:31:41 +02:00
|
|
|
extern PGShmemHeader *PGSharedMemoryCreate(Size size,
|
2019-04-13 07:36:38 +02:00
|
|
|
PGShmemHeader **shim);
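The commit message for this declaration describes selecting SysV resource keys from the data directory's inode number instead of the port. A minimal sketch of that idea, assuming a POSIX system, is shown below. The helper name `initial_shmem_key_for_dir` is hypothetical and this is not the server's actual key-selection code; it only shows that stat() on the same directory yields a stable inode number that can seed a key.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: derive a starting SysV key from the inode number
 * of "datadir". Returns 0 on success and stores the key in *key, or
 * returns -1 if the directory cannot be stat'ed.
 */
static int
initial_shmem_key_for_dir(const char *datadir, unsigned long *key)
{
	struct stat statbuf;

	assert(datadir != NULL && key != NULL);

	if (stat(datadir, &statbuf) != 0)
		return -1;

	/* Inode numbers are unique only within one filesystem. */
	*key = (unsigned long) statbuf.st_ino;
	return 0;
}
```

Because inode numbers are unique only within a single filesystem, two data directories can produce the same key only when they live on different filesystems, which matches the collision caveat in the commit message.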
|
2002-05-05 02:03:29 +02:00
|
|
|
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
|
2003-11-07 22:55:50 +01:00
|
|
|
extern void PGSharedMemoryDetach(void);
|
2021-09-21 03:31:58 +02:00
|
|
|
extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
|
2002-05-05 02:03:29 +02:00
|
|
|
|
|
|
|
#endif /* PG_SHMEM_H */
|