postgresql/src/backend/utils/resowner/README

src/backend/utils/resowner/README

Notes About Resource Owners
===========================

ResourceOwner objects are a concept invented to simplify management of
query-related resources, such as buffer pins and table locks.  These
resources need to be tracked in a reliable way to ensure that they will
be released at query end, even if the query fails due to an error.
Rather than expecting the entire executor to have bulletproof data
structures, we localize the tracking of such resources into a single
module.

The design of the ResourceOwner API is modeled on our MemoryContext API,
which has proven very flexible and successful in preventing memory leaks.
In particular we allow ResourceOwners to have child ResourceOwner objects
so that there can be forests of the things; releasing a parent
ResourceOwner acts on all its direct and indirect children as well.

(It is tempting to consider unifying ResourceOwners and MemoryContexts
into a single object type, but their usage patterns are sufficiently
different that this is probably not really a helpful thing to do.)

We create a ResourceOwner for each transaction or subtransaction as
well as one for each Portal.  During execution of a Portal, the global
variable CurrentResourceOwner points to the Portal's ResourceOwner.
This causes operations such as ReadBuffer and LockAcquire to record
ownership of the acquired resources in that ResourceOwner object.

When a Portal is closed, any remaining resources (typically only locks)
become the responsibility of the current transaction.  This is represented
by making the Portal's ResourceOwner a child of the current transaction's
ResourceOwner.  resowner.c automatically transfers the resources to the
parent object when releasing the child.  Similarly, subtransaction
ResourceOwners are children of their immediate parent.

We need transaction-related ResourceOwners as well as Portal-related ones
because transactions may initiate operations that require resources (such
as query parsing) when no associated Portal exists yet.


Usage
-----

The basic operations on a ResourceOwner are:

* create a ResourceOwner

* associate or deassociate some resource with a ResourceOwner

* release a ResourceOwner's assets (free all owned resources, but not the
  owner object itself)

* delete a ResourceOwner (including child owner objects); all resources
  must have been released beforehand

Locks are handled specially because in non-error situations a lock should
be held until end of transaction, even if it was originally taken by a
subtransaction or portal.  Therefore, the "release" operation on a child
ResourceOwner transfers lock ownership to the parent instead of actually
releasing the lock, if isCommit is true.

Whenever we are inside a transaction, the global variable
CurrentResourceOwner shows which resource owner should be assigned
ownership of acquired resources.  Note however that CurrentResourceOwner
is NULL when not inside any transaction (or when inside a failed
transaction).  In this case it is not valid to acquire query-lifespan
resources.

When unpinning a buffer or releasing a lock or cache reference,
CurrentResourceOwner must point to the same resource owner that was current
when the buffer, lock, or cache reference was acquired.  It would be possible
to relax this restriction given additional bookkeeping effort, but at present
there seems no need.

Adding a new resource type
--------------------------

ResourceOwner can track ownership of many different kinds of resources.  In
core PostgreSQL it is used for buffer pins, lmgr locks, and catalog cache
references, to name a few examples.

To add a new kind of resource, define a ResourceOwnerDesc to describe it.
For example:

static const ResourceOwnerDesc myresource_desc = {
	.name = "My fancy resource",
	.release_phase = RESOURCE_RELEASE_AFTER_LOCKS,
	.release_priority = RELEASE_PRIO_FIRST,
	.ReleaseResource = ReleaseMyResource,
	.DebugPrint = PrintMyResource
};

ResourceOwnerRemember() and ResourceOwnerForget() functions take a pointer
to that struct, along with a Datum to represent the resource.  The meaning
of the Datum depends on the resource type.  Most resource types use it to
store a pointer to some struct, but it can also be a file descriptor or
library handle, for example.

The ReleaseResource callback is called when a resource owner is released or
deleted.  It should release any resources (e.g. close files, free memory)
associated with the resource.  Because the callback is called during
transaction abort, it must perform only low-level cleanup with no user
visible effects.  The callback should not perform operations that could
fail, like allocate memory.

The optional DebugPrint callback is used in the warning at transaction
commit, if any resources are leaked.  If not specified, a generic
implementation that prints the resource name and the resource as a pointer
is used.

There is another API for other modules to get control during ResourceOwner
release, so that they can scan their own data structures to find the objects
that need to be deleted.  See RegisterResourceReleaseCallback function.
This used to be the only way for extensions to use the resource owner
mechanism with new kinds of objects; nowadays it easier to define a custom
ResourceOwnerDesc struct.


Releasing
---------

Releasing the resources of a ResourceOwner happens in three phases:

1. "Before-locks" resources

2. Locks

3. "After-locks" resources

Each resource type specifies whether it needs to be released before or after
locks.  Each resource type also has a priority, which determines the order
that the resources are released in.  Note that the phases are performed fully
for the whole tree of resource owners, before moving to the next phase, but
the priority within each phase only determines the order within that
ResourceOwner.  Child resource owners are always handled before the parent,
within each phase.

For example, imagine that you have two ResourceOwners, parent and child,
as follows:

Parent
  parent resource BEFORE_LOCKS priority 1
  parent resource BEFORE_LOCKS priority 2
  parent resource AFTER_LOCKS priority 10001
  parent resource AFTER_LOCKS priority 10002
  Child
    child resource BEFORE_LOCKS priority 1
    child resource BEFORE_LOCKS priority 2
    child resource AFTER_LOCKS priority 10001
    child resource AFTER_LOCKS priority 10002

These resources would be released in the following order:

child resource BEFORE_LOCKS priority 1
child resource BEFORE_LOCKS priority 2
parent resource BEFORE_LOCKS priority 1
parent resource BEFORE_LOCKS priority 2
(locks)
child resource AFTER_LOCKS priority 10001
child resource AFTER_LOCKS priority 10002
parent resource AFTER_LOCKS priority 10001
parent resource AFTER_LOCKS priority 10002

To release all the resources, you need to call ResourceOwnerRelease() three
times, once for each phase. You may perform additional tasks between the
phases, but after the first call to ResourceOwnerRelease(), you cannot use
the ResourceOwner to remember any more resources. You also cannot call
ResourceOwnerForget on the resource owner to release any previously
remembered resources "in retail", after you have started the release process.

Normally, you are expected to call ResourceOwnerForget on every resource so
that at commit, the ResourceOwner is empty (locks are an exception). If there
are any resources still held at commit, ResourceOwnerRelease will print a
WARNING on each such resource. At abort, however, we truly rely on the
ResourceOwner mechanism and it is normal that there are resources to be
released.
Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00			`src/backend/utils/resowner/README`
Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later. 2004-07-17 05:32:14 +02:00
Make source code READMEs more consistent. Add CVS tags to all README files. 2008-03-20 18:55:15 +01:00			`Notes About Resource Owners`
More README src cleanups. 2008-03-21 14:23:29 +01:00			`===========================`
Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later. 2004-07-17 05:32:14 +02:00
			`ResourceOwner objects are a concept invented to simplify management of`
			`query-related resources, such as buffer pins and table locks. These`
			`resources need to be tracked in a reliable way to ensure that they will`
			`be released at query end, even if the query fails due to an error.`
			`Rather than expecting the entire executor to have bulletproof data`
			`structures, we localize the tracking of such resources into a single`
			`module.`

			`The design of the ResourceOwner API is modeled on our MemoryContext API,`
			`which has proven very flexible and successful in preventing memory leaks.`
			`In particular we allow ResourceOwners to have child ResourceOwner objects`
			`so that there can be forests of the things; releasing a parent`
			`ResourceOwner acts on all its direct and indirect children as well.`

			`(It is tempting to consider unifying ResourceOwners and MemoryContexts`
			`into a single object type, but their usage patterns are sufficiently`
			`different that this is probably not really a helpful thing to do.)`

			`We create a ResourceOwner for each transaction or subtransaction as`
			`well as one for each Portal. During execution of a Portal, the global`
			`variable CurrentResourceOwner points to the Portal's ResourceOwner.`
			`This causes operations such as ReadBuffer and LockAcquire to record`
			`ownership of the acquired resources in that ResourceOwner object.`

			`When a Portal is closed, any remaining resources (typically only locks)`
			`become the responsibility of the current transaction. This is represented`
			`by making the Portal's ResourceOwner a child of the current transaction's`
Revise ResourceOwner code to avoid accumulating ResourceOwner objects for every command executed within a transaction. For long transactions this was a significant memory leak. Instead, we can delete a portal's or subtransaction's ResourceOwner immediately, if we physically transfer the information about its locks up to the parent owner. This does not fully solve the leak problem; we need to do something about counting multiple acquisitions of the same lock in order to fix it. But it's a necessary step along the way. 2004-08-25 20:43:43 +02:00			`ResourceOwner. resowner.c automatically transfers the resources to the`
			`parent object when releasing the child. Similarly, subtransaction`
			`ResourceOwners are children of their immediate parent.`
Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later. 2004-07-17 05:32:14 +02:00
			`We need transaction-related ResourceOwners as well as Portal-related ones`
			`because transactions may initiate operations that require resources (such`
			`as query parsing) when no associated Portal exists yet.`


Make ResourceOwners more easily extensible. Instead of having a separate array/hash for each resource kind, use a single array and hash to hold all kinds of resources. This makes it possible to introduce new resource "kinds" without having to modify the ResourceOwnerData struct. In particular, this makes it possible for extensions to register custom resource kinds. The old approach was to have a small array of resources of each kind, and if it fills up, switch to a hash table. The new approach also uses an array and a hash, but now the array and the hash are used at the same time. The array is used to hold the recently added resources, and when it fills up, they are moved to the hash. This keeps the access to recent entries fast, even when there are a lot of long-held resources. All the resource-specific ResourceOwnerEnlarge(), ResourceOwnerRemember(), and ResourceOwnerForget*() functions have been replaced with three generic functions that take resource kind as argument. For convenience, we still define resource-specific wrapper macros around the generic functions with the old names, but they are now defined in the source files that use those resource kinds. The release callback no longer needs to call ResourceOwnerForget on the resource being released. ResourceOwnerRelease unregisters the resource from the owner before calling the callback. That needed some changes in bufmgr.c and some other files, where releasing the resources previously always called ResourceOwnerForget. Each resource kind specifies a release priority, and ResourceOwnerReleaseAll releases the resources in priority order. To make that possible, we have to restrict what you can do between phases. After calling ResourceOwnerRelease(), you are no longer allowed to remember any more resources in it or to forget any previously remembered resources by calling ResourceOwnerForget. There was one case where that was done previously. At subtransaction commit, AtEOSubXact_Inval() would handle the invalidation messages and call RelationFlushRelation(), which temporarily increased the reference count on the relation being flushed. We now switch to the parent subtransaction's resource owner before calling AtEOSubXact_Inval(), so that there is a valid ResourceOwner to temporarily hold that relcache reference. Other end-of-xact routines make similar calls to AtEOXact_Inval() between release phases, but I didn't see any regression test failures from those, so I'm not sure if they could reach a codepath that needs remembering extra resources. There were two exceptions to how the resource leak WARNINGs on commit were printed previously: llvmjit silently released the context without printing the warning, and a leaked buffer io triggered a PANIC. Now everything prints a WARNING, including those cases. Add tests in src/test/modules/test_resowner. Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu Reviewed-by: Peter Eisentraut, Andres Freund Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi 2023-11-08 12:30:50 +01:00			`Usage`
			`-----`
Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later. 2004-07-17 05:32:14 +02:00
			`The basic operations on a ResourceOwner are:`

			`* create a ResourceOwner`

			`* associate or deassociate some resource with a ResourceOwner`

			`* release a ResourceOwner's assets (free all owned resources, but not the`
			`owner object itself)`

			`* delete a ResourceOwner (including child owner objects); all resources`
			`must have been released beforehand`

Revise ResourceOwner code to avoid accumulating ResourceOwner objects for every command executed within a transaction. For long transactions this was a significant memory leak. Instead, we can delete a portal's or subtransaction's ResourceOwner immediately, if we physically transfer the information about its locks up to the parent owner. This does not fully solve the leak problem; we need to do something about counting multiple acquisitions of the same lock in order to fix it. But it's a necessary step along the way. 2004-08-25 20:43:43 +02:00			`Locks are handled specially because in non-error situations a lock should`
			`be held until end of transaction, even if it was originally taken by a`
			`subtransaction or portal. Therefore, the "release" operation on a child`
			`ResourceOwner transfers lock ownership to the parent instead of actually`
			`releasing the lock, if isCommit is true.`

Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later. 2004-07-17 05:32:14 +02:00			`Whenever we are inside a transaction, the global variable`
			`CurrentResourceOwner shows which resource owner should be assigned`
			`ownership of acquired resources. Note however that CurrentResourceOwner`
			`is NULL when not inside any transaction (or when inside a failed`
			`transaction). In this case it is not valid to acquire query-lifespan`
			`resources.`

			`When unpinning a buffer or releasing a lock or cache reference,`
			`CurrentResourceOwner must point to the same resource owner that was current`
			`when the buffer, lock, or cache reference was acquired. It would be possible`
			`to relax this restriction given additional bookkeeping effort, but at present`
			`there seems no need.`
Make ResourceOwners more easily extensible. Instead of having a separate array/hash for each resource kind, use a single array and hash to hold all kinds of resources. This makes it possible to introduce new resource "kinds" without having to modify the ResourceOwnerData struct. In particular, this makes it possible for extensions to register custom resource kinds. The old approach was to have a small array of resources of each kind, and if it fills up, switch to a hash table. The new approach also uses an array and a hash, but now the array and the hash are used at the same time. The array is used to hold the recently added resources, and when it fills up, they are moved to the hash. This keeps the access to recent entries fast, even when there are a lot of long-held resources. All the resource-specific ResourceOwnerEnlarge(), ResourceOwnerRemember(), and ResourceOwnerForget*() functions have been replaced with three generic functions that take resource kind as argument. For convenience, we still define resource-specific wrapper macros around the generic functions with the old names, but they are now defined in the source files that use those resource kinds. The release callback no longer needs to call ResourceOwnerForget on the resource being released. ResourceOwnerRelease unregisters the resource from the owner before calling the callback. That needed some changes in bufmgr.c and some other files, where releasing the resources previously always called ResourceOwnerForget. Each resource kind specifies a release priority, and ResourceOwnerReleaseAll releases the resources in priority order. To make that possible, we have to restrict what you can do between phases. After calling ResourceOwnerRelease(), you are no longer allowed to remember any more resources in it or to forget any previously remembered resources by calling ResourceOwnerForget. There was one case where that was done previously. At subtransaction commit, AtEOSubXact_Inval() would handle the invalidation messages and call RelationFlushRelation(), which temporarily increased the reference count on the relation being flushed. We now switch to the parent subtransaction's resource owner before calling AtEOSubXact_Inval(), so that there is a valid ResourceOwner to temporarily hold that relcache reference. Other end-of-xact routines make similar calls to AtEOXact_Inval() between release phases, but I didn't see any regression test failures from those, so I'm not sure if they could reach a codepath that needs remembering extra resources. There were two exceptions to how the resource leak WARNINGs on commit were printed previously: llvmjit silently released the context without printing the warning, and a leaked buffer io triggered a PANIC. Now everything prints a WARNING, including those cases. Add tests in src/test/modules/test_resowner. Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu Reviewed-by: Peter Eisentraut, Andres Freund Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi 2023-11-08 12:30:50 +01:00
			`Adding a new resource type`
			`--------------------------`

			`ResourceOwner can track ownership of many different kinds of resources. In`
			`core PostgreSQL it is used for buffer pins, lmgr locks, and catalog cache`
			`references, to name a few examples.`

			`To add a new kind of resource, define a ResourceOwnerDesc to describe it.`
			`For example:`

			`static const ResourceOwnerDesc myresource_desc = {`
			`.name = "My fancy resource",`
			`.release_phase = RESOURCE_RELEASE_AFTER_LOCKS,`
			`.release_priority = RELEASE_PRIO_FIRST,`
			`.ReleaseResource = ReleaseMyResource,`
			`.DebugPrint = PrintMyResource`
			`};`

			`ResourceOwnerRemember() and ResourceOwnerForget() functions take a pointer`
			`to that struct, along with a Datum to represent the resource. The meaning`
			`of the Datum depends on the resource type. Most resource types use it to`
			`store a pointer to some struct, but it can also be a file descriptor or`
			`library handle, for example.`

			`The ReleaseResource callback is called when a resource owner is released or`
			`deleted. It should release any resources (e.g. close files, free memory)`
			`associated with the resource. Because the callback is called during`
			`transaction abort, it must perform only low-level cleanup with no user`
			`visible effects. The callback should not perform operations that could`
			`fail, like allocate memory.`

			`The optional DebugPrint callback is used in the warning at transaction`
			`commit, if any resources are leaked. If not specified, a generic`
			`implementation that prints the resource name and the resource as a pointer`
			`is used.`

			`There is another API for other modules to get control during ResourceOwner`
			`release, so that they can scan their own data structures to find the objects`
			`that need to be deleted. See RegisterResourceReleaseCallback function.`
			`This used to be the only way for extensions to use the resource owner`
			`mechanism with new kinds of objects; nowadays it easier to define a custom`
			`ResourceOwnerDesc struct.`


			`Releasing`
			`---------`

			`Releasing the resources of a ResourceOwner happens in three phases:`

			`1. "Before-locks" resources`

			`2. Locks`

			`3. "After-locks" resources`

			`Each resource type specifies whether it needs to be released before or after`
			`locks. Each resource type also has a priority, which determines the order`
			`that the resources are released in. Note that the phases are performed fully`
			`for the whole tree of resource owners, before moving to the next phase, but`
			`the priority within each phase only determines the order within that`
			`ResourceOwner. Child resource owners are always handled before the parent,`
			`within each phase.`

			`For example, imagine that you have two ResourceOwners, parent and child,`
			`as follows:`

			`Parent`
			`parent resource BEFORE_LOCKS priority 1`
			`parent resource BEFORE_LOCKS priority 2`
			`parent resource AFTER_LOCKS priority 10001`
			`parent resource AFTER_LOCKS priority 10002`
			`Child`
			`child resource BEFORE_LOCKS priority 1`
			`child resource BEFORE_LOCKS priority 2`
			`child resource AFTER_LOCKS priority 10001`
			`child resource AFTER_LOCKS priority 10002`

			`These resources would be released in the following order:`

			`child resource BEFORE_LOCKS priority 1`
			`child resource BEFORE_LOCKS priority 2`
			`parent resource BEFORE_LOCKS priority 1`
			`parent resource BEFORE_LOCKS priority 2`
			`(locks)`
			`child resource AFTER_LOCKS priority 10001`
			`child resource AFTER_LOCKS priority 10002`
			`parent resource AFTER_LOCKS priority 10001`
			`parent resource AFTER_LOCKS priority 10002`

			`To release all the resources, you need to call ResourceOwnerRelease() three`
			`times, once for each phase. You may perform additional tasks between the`
			`phases, but after the first call to ResourceOwnerRelease(), you cannot use`
			`the ResourceOwner to remember any more resources. You also cannot call`
			`ResourceOwnerForget on the resource owner to release any previously`
			`remembered resources "in retail", after you have started the release process.`

			`Normally, you are expected to call ResourceOwnerForget on every resource so`
			`that at commit, the ResourceOwner is empty (locks are an exception). If there`
			`are any resources still held at commit, ResourceOwnerRelease will print a`
			`WARNING on each such resource. At abort, however, we truly rely on the`
			`ResourceOwner mechanism and it is normal that there are resources to be`
			`released.`