postgresql/src/tools/pgindent/typedefs.list

ACCESS_ALLOWED_ACE
ACL
ACL_SIZE_INFORMATION
AFFIX
ASN1_INTEGER
ASN1_OBJECT
ASN1_OCTET_STRING
ASN1_STRING
AV
A_ArrayExpr
A_Const
A_Expr
A_Expr_Kind
A_Indices
A_Indirection
A_Star
AbsoluteTime
AccessMethodInfo
AccessPriv
Acl
AclItem
AclMaskHow
AclMode
AclResult
AcquireSampleRowsFunc
ActionList
ActiveSnapshotElt
AddForeignUpdateTargets_function
AffixNode
AffixNodeData
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
AfterTriggerEventList
AfterTriggerShared
AfterTriggerSharedData
AfterTriggersData
AfterTriggersQueryData
AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
AggInfo
AggPath
AggSplit
AggState
AggStatePerAgg
AggStatePerGroup
AggStatePerHash
AggStatePerPhase
AggStatePerTrans
AggStrategy
AggTransInfo
Aggref
AggregateInstrumentation
AlenState
Alias
AllocBlock
AllocFreeListLink
AllocPointer
AllocSet
AllocSetContext
AllocSetFreeList
AllocateDesc
AllocateDescKind
AlterCollationStmt
AlterDatabaseRefreshCollStmt
AlterDatabaseSetStmt
AlterDatabaseStmt
AlterDefaultPrivilegesStmt
AlterDomainStmt
AlterEnumStmt
AlterEventTrigStmt
AlterExtensionContentsStmt
AlterExtensionStmt
AlterFdwStmt
AlterForeignServerStmt
AlterFunctionStmt
AlterObjectDependsStmt
AlterObjectSchemaStmt
AlterOpFamilyStmt
AlterOperatorStmt
AlterOwnerStmt
AlterPolicyStmt
AlterPublicationAction
AlterPublicationStmt
AlterRoleSetStmt
AlterRoleStmt
AlterSeqStmt
AlterStatsStmt
AlterSubscriptionStmt
AlterSubscriptionType
AlterSystemStmt
AlterTSConfigType
AlterTSConfigurationStmt
AlterTSDictionaryStmt
AlterTableCmd
AlterTableMoveAllStmt
AlterTableSpaceOptionsStmt
AlterTableStmt
AlterTableType
AlterTableUtilityContext
AlterTypeRecurseParams
AlterTypeStmt
AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AmcheckOptions
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
AnlExprData
AnlIndexData
AnyArrayType
Append
AppendPath
AppendRelInfo
AppendState
ApplyErrorCallbackArg
ApplyExecutionData
ApplySubXactData
Archive
ArchiveCheckConfiguredCB
ArchiveEntryPtrType
ArchiveFileCB
ArchiveFormat
ArchiveHandle
ArchiveMode
ArchiveModuleCallbacks
ArchiveModuleInit
ArchiveModuleState
ArchiveOpts
ArchiveShutdownCB
ArchiveStreamState
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
ArrayBuildState
ArrayBuildStateAny
ArrayBuildStateArr
ArrayCoerceExpr
ArrayConstIterState
ArrayExpr
ArrayExprIterState
ArrayIOData
ArrayIterator
ArrayMapState
ArrayMetaState
ArrayParseState
ArraySubWorkspace
ArrayType
AsyncQueueControl
AsyncQueueEntry
AsyncRequest
AttInMetadata
AttStatsSlot
AttoptCacheEntry
AttoptCacheKey
AttrDefInfo
AttrDefault
AttrMap
AttrMissing
AttrNumber
AttributeOpts
AuthRequest
AuthToken
AutoPrewarmSharedState
AutoVacOpts
AutoVacuumShmemStruct
AutoVacuumWorkItem
AutoVacuumWorkItemType
AuxProcType
BF_ctx
BF_key
BF_word
BF_word_signed
BIGNUM
BIO
BIO_METHOD
BITVECP
BMS_Comparison
BMS_Membership
BN_CTX
BOOL
BOOLEAN
BOX
BTArrayKeyInfo
BTBuildState
BTCycleId
BTDedupInterval
BTDedupState
BTDedupStateData
BTDeletedPageData
BTIndexStat
BTInsertState
BTInsertStateData
BTLeader
BTMetaPageData
BTOneVacInfo
BTOptions
BTPS_State
BTPageOpaque
BTPageOpaqueData
BTPageStat
BTPageState
BTParallelScanDesc
BTPendingFSM
BTScanInsert
BTScanInsertData
BTScanOpaque
BTScanOpaqueData
BTScanPos
BTScanPosData
BTScanPosItem
BTShared
BTSortArrayContext
BTSpool
BTStack
BTStackData
BTVacInfo
BTVacState
BTVacuumPosting
BTVacuumPostingData
BTWriteState
BUF_MEM
BYTE
BY_HANDLE_FILE_INFORMATION
Backend
BackendId
BackendParameters
BackendState
BackendType
BackgroundWorker
BackgroundWorkerArray
BackgroundWorkerHandle
BackgroundWorkerSlot
BackupState
Barrier
BaseBackupCmd
BaseBackupTargetHandle
BaseBackupTargetType
BeginDirectModify_function
BeginForeignInsert_function
BeginForeignModify_function
BeginForeignScan_function
BeginSampleScan_function
BernoulliSamplerData
BgWorkerStartTime
BgwHandleStatus
BinaryArithmFunc
BindParamCbData
BipartiteMatchState
BitString
BitmapAnd
BitmapAndPath
BitmapAndState
BitmapHeapPath
BitmapHeapScan
BitmapHeapScanState
BitmapIndexScan
BitmapIndexScanState
BitmapOr
BitmapOrPath
BitmapOrState
Bitmapset
BlobInfo
Block
BlockId
BlockIdData
BlockInfoRecord
BlockNumber
BlockSampler
BlockSamplerData
BlockedProcData
BlockedProcsData
BloomBuildState
BloomFilter
BloomMetaPageData
BloomOpaque
BloomOptions
BloomPageOpaque
BloomPageOpaqueData
BloomScanOpaque
BloomScanOpaqueData
BloomSignatureWord
BloomState
BloomTuple
BoolAggState
BoolExpr
BoolExprType
BoolTestType
Boolean
BooleanTest
BpChar
BrinBuildState
BrinDesc
BrinMemTuple
BrinMetaPageData
BrinOpaque
BrinOpcInfo
BrinOptions
BrinRevmap
BrinSpecialSpace
BrinStatsData
BrinTuple
BrinValues
BtreeCheckState
BtreeLevel
Bucket
BufFile
Buffer
BufferAccessStrategy
BufferAccessStrategyType
BufferCachePagesContext
BufferCachePagesRec
BufferDesc
BufferDescPadded
BufferHeapTupleTableSlot
BufferLookupEnt
BufferStrategyControl
BufferTag
BufferUsage
BuildAccumulator
BuiltinScript
BulkInsertState
BulkInsertStateData
CACHESIGN
CAC_state
CCFastEqualFN
CCHashFN
CEOUC_WAIT_MODE
CFuncHashTabEntry
CHAR
CHECKPOINT
CHKVAL
CIRCLE
CMPDAffix
CONTEXT
COP
CRITICAL_SECTION
CRSSnapshotAction
CState
CTECycleClause
CTEMaterialize
CTESearchClause
CV
CachedExpression
CachedPlan
CachedPlanSource
CallContext
CallStmt
CancelRequestPacket
Cardinality
CaseExpr
CaseTestExpr
CaseWhen
Cash
CastInfo
CatCList
CatCTup
CatCache
CatCacheHeader
CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
CheckPoint
CheckPointStmt
CheckpointStatsData
CheckpointerRequest
CheckpointerShmemStruct
Chromosome
CkptSortItem
CkptTsStatus
ClientAuthentication_hook_type
ClientCertMode
ClientCertName
ClientConnectionInfo
ClientData
ClonePtrType
ClosePortalStmt
ClosePtrType
Clump
ClusterInfo
ClusterParams
ClusterStmt
CmdType
CoalesceExpr
CoerceParamHook
CoerceToDomain
CoerceToDomainValue
CoerceViaIO
CoercionContext
CoercionForm
CoercionPathType
CollAliasData
CollInfo
CollateClause
CollateExpr
CollateStrength
CollectedATSubcmd
CollectedCommand
CollectedCommandType
ColorTrgm
ColorTrgmInfo
ColumnCompareData
ColumnDef
ColumnIOData
ColumnRef
ColumnsHashData
CombinationGenerator
ComboCidEntry
ComboCidEntryData
ComboCidKey
ComboCidKeyData
Command
CommandDest
CommandId
CommandTag
CommandTagBehavior
CommentItem
CommentStmt
CommitTimestampEntry
CommitTimestampShared
CommonEntry
CommonTableExpr
CompareScalarsContext
CompiledExprState
CompositeIOData
CompositeTypeStmt
CompoundAffixFlag
CompressFileHandle
CompressionLocation
CompressorState
ComputeXidHorizonsResult
ConditionVariable
ConditionVariableMinimallyPadded
ConditionalStack
ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
ConsiderSplitContext
Const
ConstrCheck
ConstrType
Constraint
ConstraintCategory
ConstraintInfo
ConstraintsSetStmt
ControlData
ControlFileData
ConvInfo
ConvProcInfo
ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
CopyFormatOptions
CopyFromState
CopyFromStateData
CopyHeaderChoice
CopyInsertMethod
CopyMultiInsertBuffer
CopyMultiInsertInfo
CopySource
CopyStmt
CopyToState
CopyToStateData
Cost
CostSelector
Counters
CoverExt
CoverPos
CreateAmStmt
CreateCastStmt
CreateConversionStmt
CreateDBRelInfo
CreateDBStrategy
CreateDomainStmt
CreateEnumStmt
CreateEventTrigStmt
CreateExtensionStmt
CreateFdwStmt
CreateForeignServerStmt
CreateForeignTableStmt
CreateFunctionStmt
CreateOpClassItem
CreateOpClassStmt
CreateOpFamilyStmt
CreatePLangStmt
CreatePolicyStmt
CreatePublicationStmt
CreateRangeStmt
CreateReplicationSlotCmd
CreateRoleStmt
CreateSchemaStmt
CreateSchemaStmtContext
CreateSeqStmt
CreateStatsStmt
CreateStmt
CreateStmtContext
CreateSubscriptionStmt
CreateTableAsStmt
CreateTableSpaceStmt
CreateTransformStmt
CreateTrigStmt
CreateUserMappingStmt
CreatedbStmt
CredHandle
CteItem
CteScan
CteScanState
CteState
CtlCommand
CtxtHandle
CurrentOfExpr
CustomExecMethods
CustomOutPtrType
CustomPath
CustomScan
CustomScanMethods
CustomScanState
CycleCtr
DBState
DCHCacheEntry
DEADLOCK_INFO
DECountItem
DH
DIR
DNSServiceErrorType
DNSServiceRef
DR_copy
DR_intorel
DR_printtup
DR_sqlfunction
DR_transientrel
DSA
DWORD
DataDumperPtr
DataPageDeleteStack
DatabaseInfo
DateADT
Datum
DatumTupleFields
DbInfo
DbInfoArr
DeClonePtrType
DeadLockState
DeallocateStmt
DeclareCursorStmt
DecodedBkpBlock
DecodedXLogRecord
DecodingOutputState
DefElem
DefElemAction
DefaultACLInfo
DefineStmt
DeleteStmt
DependencyGenerator
DependencyGeneratorData
DependencyType
DestReceiver
DictISpell
DictInt
DictSimple
DictSnowball
DictSubState
DictSyn
DictThesaurus
DimensionInfo
DirectoryMethodData
DirectoryMethodFile
DisableTimeoutParams
DiscardMode
DiscardStmt
DistanceValue
DistinctExpr
DoStmt
DocRepresentation
DomainConstraintCache
DomainConstraintRef
DomainConstraintState
DomainConstraintType
DomainIOData
DropBehavior
DropOwnedStmt
DropReplicationSlotCmd
DropRoleStmt
DropStmt
DropSubscriptionStmt
DropTableSpaceStmt
DropUserMappingStmt
DropdbStmt
DumpComponents
DumpId
DumpOptions
DumpSignalInformation
DumpableAcl
DumpableObject
DumpableObjectType
DumpableObjectWithAcl
DynamicFileList
DynamicZoneAbbrev
EC_KEY
EDGE
ENGINE
EOM_flatten_into_method
EOM_get_flat_size_method
EPQState
EPlan
EState
EStatus
EVP_CIPHER
EVP_CIPHER_CTX
EVP_MD
EVP_MD_CTX
EVP_PKEY
EachState
Edge
EditableObjectType
ElementsState
EnableTimeoutParams
EndBlobPtrType
EndBlobsPtrType
EndDataPtrType
EndDirectModify_function
EndForeignInsert_function
EndForeignModify_function
EndForeignScan_function
EndOfWalRecoveryInfo
EndSampleScan_function
EnumItem
EolType
EphemeralNameRelationType
EphemeralNamedRelation
EphemeralNamedRelationData
EphemeralNamedRelationMetadata
EphemeralNamedRelationMetadataData
EquivalenceClass
EquivalenceMember
ErrorContextCallback
ErrorData
ErrorSaveContext
EstimateDSMForeignScan_function
EstimationInfo
EventTriggerCacheEntry
EventTriggerCacheItem
EventTriggerCacheStateType
EventTriggerData
EventTriggerEvent
EventTriggerInfo
EventTriggerQueryState
ExceptionLabelMap
ExceptionMap
ExecAuxRowMark
ExecEvalBoolSubroutine
ExecEvalJsonExprContext
ExecEvalSubroutine
ExecForeignBatchInsert_function
ExecForeignDelete_function
ExecForeignInsert_function
ExecForeignTruncate_function
ExecForeignUpdate_function
ExecParallelEstimateContext
ExecParallelInitializeDSMContext
ExecPhraseData
ExecProcNodeMtd
ExecRowMark
ExecScanAccessMtd
ExecScanRecheckMtd
ExecStatus
ExecStatusType
ExecuteStmt
ExecutorCheckPerms_hook_type
ExecutorEnd_hook_type
ExecutorFinish_hook_type
ExecutorRun_hook_type
ExecutorStart_hook_type
ExpandedArrayHeader
ExpandedObjectHeader
ExpandedObjectMethods
ExpandedRange
ExpandedRecordFieldInfo
ExpandedRecordHeader
ExplainDirectModify_function
ExplainForeignModify_function
ExplainForeignScan_function
ExplainFormat
ExplainOneQuery_hook_type
ExplainState
ExplainStmt
ExplainWorkersState
ExportedSnapshot
Expr
ExprContext
ExprContextCallbackFunction
ExprContext_CB
ExprDoneCond
ExprEvalOp
ExprEvalOpLookup
ExprEvalRowtypeCache
ExprEvalStep
ExprState
ExprStateEvalFunc
ExtensibleNode
ExtensibleNodeEntry
ExtensibleNodeMethods
ExtensionControlFile
ExtensionInfo
ExtensionVersionInfo
FDWCollateState
FD_SET
FILE
FILETIME
FPI
FSMAddress
FSMPage
FSMPageData
FakeRelCacheEntry
FakeRelCacheEntryData
FastPathStrongRelationLockData
FdwInfo
FdwRoutine
FetchDirection
FetchStmt
FieldSelect
FieldStore
File
FileFdwExecutionState
FileFdwPlanState
FileNameMap
FileSet
FileTag
FinalPathExtraData
FindColsContext
FindSplitData
FindSplitStrat
FixedParallelExecutorState
FixedParallelState
FixedParamState
FlagMode
Float
FlushPosition
FmgrBuiltin
FmgrHookEventType
FmgrInfo
ForBothCellState
ForBothState
ForEachState
ForFiveState
ForFourState
ForThreeState
ForeignAsyncConfigureWait_function
ForeignAsyncNotify_function
ForeignAsyncRequest_function
ForeignDataWrapper
ForeignKeyCacheInfo
ForeignKeyOptInfo
ForeignPath
ForeignScan
ForeignScanState
ForeignServer
ForeignServerInfo
ForeignTable
ForeignTruncateInfo
ForkNumber
FormData_pg_aggregate
FormData_pg_am
FormData_pg_amop
FormData_pg_amproc
FormData_pg_attrdef
FormData_pg_attribute
FormData_pg_auth_members
FormData_pg_authid
FormData_pg_cast
FormData_pg_class
FormData_pg_collation
FormData_pg_constraint
FormData_pg_conversion
FormData_pg_database
FormData_pg_default_acl
FormData_pg_depend
FormData_pg_enum
FormData_pg_event_trigger
FormData_pg_extension
FormData_pg_foreign_data_wrapper
FormData_pg_foreign_server
FormData_pg_foreign_table
FormData_pg_index
FormData_pg_inherits
FormData_pg_language
FormData_pg_largeobject
FormData_pg_largeobject_metadata
FormData_pg_namespace
FormData_pg_opclass
FormData_pg_operator
FormData_pg_opfamily
FormData_pg_partitioned_table
FormData_pg_policy
FormData_pg_proc
FormData_pg_publication
FormData_pg_publication_namespace
FormData_pg_publication_rel
FormData_pg_range
FormData_pg_replication_origin
FormData_pg_rewrite
FormData_pg_sequence
FormData_pg_sequence_data
FormData_pg_shdepend
FormData_pg_statistic
FormData_pg_statistic_ext
FormData_pg_statistic_ext_data
FormData_pg_subscription
FormData_pg_subscription_rel
FormData_pg_tablespace
FormData_pg_transform
FormData_pg_trigger
FormData_pg_ts_config
FormData_pg_ts_config_map
FormData_pg_ts_dict
FormData_pg_ts_parser
FormData_pg_ts_template
FormData_pg_type
FormData_pg_user_mapping
Form_pg_aggregate
Form_pg_am
Form_pg_amop
Form_pg_amproc
Form_pg_attrdef
Form_pg_attribute
Form_pg_auth_members
Form_pg_authid
Form_pg_cast
Form_pg_class
Form_pg_collation
Form_pg_constraint
Form_pg_conversion
Form_pg_database
Form_pg_default_acl
Form_pg_depend
Form_pg_enum
Form_pg_event_trigger
Form_pg_extension
Form_pg_foreign_data_wrapper
Form_pg_foreign_server
Form_pg_foreign_table
Form_pg_index
Form_pg_inherits
Form_pg_language
Form_pg_largeobject
Form_pg_largeobject_metadata
Form_pg_namespace
Form_pg_opclass
Form_pg_operator
Form_pg_opfamily
Form_pg_partitioned_table
Form_pg_policy
Form_pg_proc
Form_pg_publication
Form_pg_publication_namespace
Form_pg_publication_rel
Form_pg_range
Form_pg_replication_origin
Form_pg_rewrite
Form_pg_sequence
Form_pg_sequence_data
Form_pg_shdepend
Form_pg_statistic
Form_pg_statistic_ext
Form_pg_statistic_ext_data
Form_pg_subscription
Form_pg_subscription_rel
Form_pg_tablespace
Form_pg_transform
Form_pg_trigger
Form_pg_ts_config
Form_pg_ts_config_map
Form_pg_ts_dict
Form_pg_ts_parser
Form_pg_ts_template
Form_pg_type
Form_pg_user_mapping
FormatNode
FreeBlockNumberArray
FreeListData
FreePageBtree
FreePageBtreeHeader
FreePageBtreeInternalKey
FreePageBtreeLeafKey
FreePageBtreeSearchResult
FreePageManager
FreePageSpanLeader
FromCharDateMode
FromExpr
FullTransactionId
FuncCall
FuncCallContext
FuncCandidateList
FuncDetailCode
FuncExpr
FuncInfo
FuncLookupError
FunctionCallInfo
FunctionCallInfoBaseData
FunctionParameter
FunctionParameterMode
FunctionScan
FunctionScanPerFuncState
FunctionScanState
FuzzyAttrMatchState
GBT_NUMKEY
GBT_NUMKEY_R
GBT_VARKEY
GBT_VARKEY_R
GENERAL_NAME
GISTBuildBuffers
GISTBuildState
GISTDeletedPageContents
GISTENTRY
GISTInsertStack
GISTInsertState
GISTIntArrayBigOptions
GISTIntArrayOptions
GISTNodeBuffer
GISTNodeBufferPage
GISTPageOpaque
GISTPageOpaqueData
GISTPageSplitInfo
GISTSTATE
GISTScanOpaque
GISTScanOpaqueData
GISTSearchHeapItem
GISTSearchItem
GISTTYPE
GIST_SPLITVEC
GMReaderTupleBuffer
GROUP
GV
Gather
GatherMerge
GatherMergePath
GatherMergeState
GatherPath
GatherState
Gene
GeneratePruningStepsContext
GenerationBlock
GenerationContext
GenerationPointer
GenericCosts
GenericXLogState
GeqoPrivateData
GetEPQSlotArg
GetForeignJoinPaths_function
GetForeignModifyBatchSize_function
GetForeignPaths_function
GetForeignPlan_function
GetForeignRelSize_function
GetForeignRowMarkType_function
GetForeignUpperPaths_function
GetState
GiSTOptions
GinBtree
GinBtreeData
GinBtreeDataLeafInsertData
GinBtreeEntryInsertData
GinBtreeStack
GinBuildState
GinChkVal
GinEntries
GinEntryAccumulator
GinIndexStat
GinMetaPageData
GinNullCategory
GinOptions
GinPageOpaque
GinPageOpaqueData
GinPlaceToPageRC
GinPostingList
GinQualCounts
GinScanEntry
GinScanKey
GinScanOpaque
GinScanOpaqueData
GinState
GinStatsData
GinTernaryValue
GinTupleCollector
GinVacuumState
GistBuildMode
GistEntryVector
GistHstoreOptions
GistInetKey
GistNSN
GistOptBufferingMode
GistSortedBuildLevelState
GistSplitUnion
GistSplitVector
GistTsVectorOptions
GistVacState
GlobalTransaction
GlobalVisHorizonKind
GlobalVisState
GrantRoleStmt
GrantStmt
GrantTargetType
Group
GroupClause
GroupPath
GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
GroupingFunc
GroupingSet
GroupingSetData
GroupingSetKind
GroupingSetsPath
GucAction
GucBoolAssignHook
GucBoolCheckHook
GucContext
GucEnumAssignHook
GucEnumCheckHook
GucIntAssignHook
GucIntCheckHook
GucRealAssignHook
GucRealCheckHook
GucShowHook
GucSource
GucStack
GucStackState
GucStringAssignHook
GucStringCheckHook
GzipCompressorState
HANDLE
HASHACTION
HASHBUCKET
HASHCTL
HASHELEMENT
HASHHDR
HASHSEGMENT
HASH_SEQ_STATUS
HE
HEntry
HIST_ENTRY
HKEY
HLOCAL
HMAC_CTX
HMODULE
HOldEntry
HRESULT
HSParser
HSpool
HStore
HTAB
HTSV_Result
HV
Hash
HashAggBatch
HashAggSpill
HashAllocFunc
HashBuildState
HashCompareFunc
HashCopyFunc
HashIndexStat
HashInstrumentation
HashJoin
HashJoinState
HashJoinTable
HashJoinTuple
HashMemoryChunk
HashMetaPage
HashMetaPageData
HashOptions
HashPageOpaque
HashPageOpaqueData
HashPageStat
HashPath
HashScanOpaque
HashScanOpaqueData
HashScanPosData
HashScanPosItem
HashSkewBucket
HashState
HashValueFunc
HbaLine
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
HeapCheckContext
HeapPageFreeze
HeapScanDesc
HeapTuple
HeapTupleData
HeapTupleFields
HeapTupleForceOption
HeapTupleFreeze
HeapTupleHeader
HeapTupleHeaderData
HeapTupleTableSlot
HistControl
HotStandbyState
I32
ICU_Convert_Func
ID
INFIX
INT128
INTERFACE_INFO
IOContext
IOFuncSelector
IOObject
IOOp
IPCompareMethod
ITEM
IV
IdentLine
IdentifierLookup
IdentifySystemCmd
IfStackElem
ImportForeignSchemaStmt
ImportForeignSchemaType
ImportForeignSchema_function
ImportQual
InProgressEnt
IncludeWal
InclusionOpaque
IncrementVarSublevelsUp_context
IncrementalSort
IncrementalSortExecutionStatus
IncrementalSortGroupInfo
IncrementalSortInfo
IncrementalSortPath
IncrementalSortState
Index
IndexAMProperty
IndexAmRoutine
IndexArrayKeyInfo
IndexAttachInfo
IndexAttrBitmapKind
IndexBuildCallback
IndexBuildResult
IndexBulkDeleteCallback
IndexBulkDeleteResult
IndexClause
IndexClauseSet
IndexDeleteCounts
IndexDeletePrefetchState
IndexElem
IndexFetchHeapData
IndexFetchTableData
IndexInfo
IndexList
IndexOnlyScan
IndexOnlyScanState
IndexOptInfo
IndexOrderByDistance
IndexPath
IndexRuntimeKeyInfo
IndexScan
IndexScanDesc
IndexScanState
IndexStateFlagsAction
IndexStmt
IndexTuple
IndexTupleData
IndexUniqueCheck
IndexVacuumInfo
IndxInfo
InferClause
InferenceElem
InfoItem
InhInfo
InheritableSocket
InitSampleScan_function
InitializeDSMForeignScan_function
InitializeWorkerForeignScan_function
InlineCodeBlock
InsertStmt
Instrumentation
Int128AggState
Int8TransTypeData
IntRBTreeNode
Integer
IntegerSet
InternalDefaultACL
InternalGrant
Interval
IntoClause
InvalMessageArray
InvalidationMsgsGroup
IpcMemoryId
IpcMemoryKey
IpcMemoryState
IpcSemaphoreId
IpcSemaphoreKey
IsForeignPathAsyncCapable_function
IsForeignRelUpdatable_function
IsForeignScanParallelSafe_function
IsoConnInfo
IspellDict
Item
ItemId
ItemIdData
ItemPointer
ItemPointerData
IterateDirectModify_function
IterateForeignScan_function
IterateJsonStringValuesState
JEntry
JHashState
JOBOBJECTINFOCLASS
JOBOBJECT_BASIC_LIMIT_INFORMATION
JOBOBJECT_BASIC_UI_RESTRICTIONS
JOBOBJECT_SECURITY_LIMIT_INFORMATION
JitContext
JitInstrumentation
JitProviderCallbacks
JitProviderCompileExprCB
JitProviderInit
JitProviderReleaseContextCB
JitProviderResetAfterErrorCB
Join
JoinCostWorkspace
JoinExpr
JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
JoinType
JsObject
JsValue
JsonAggConstructor
JsonAggState
JsonArgument
JsonArrayAgg
JsonArrayConstructor
JsonArrayQueryConstructor
JsonBaseObjectInfo
JsonBehavior
JsonBehaviorType
JsonCoercion
JsonCommon
JsonConstructorExpr
JsonConstructorType
JsonEncoding
JsonExpr
JsonExprOp
JsonFormat
JsonFormatType
JsonFunc
JsonFuncExpr
JsonHashEntry
JsonIsPredicate
JsonItemCoercions
JsonIterateStringValuesAction
JsonKeyValue
JsonLexContext
JsonLikeRegexContext
JsonManifestFileField
JsonManifestParseContext
JsonManifestParseState
JsonManifestSemanticState
JsonManifestWALRangeField
JsonObjectAgg
JsonObjectConstructor
JsonOutput
JsonParseContext
JsonParseErrorType
JsonParseExpr
JsonPath
JsonPathBool
JsonPathDatatypeStatus
JsonPathExecContext
JsonPathExecResult
JsonPathGinAddPathItemFunc
JsonPathGinContext
JsonPathGinExtractNodesFunc
JsonPathGinNode
JsonPathGinNodeType
JsonPathGinPath
JsonPathGinPathItem
JsonPathItem
JsonPathItemType
JsonPathKeyword
JsonPathMutableContext
JsonPathParseItem
JsonPathParseResult
JsonPathPredicateCallback
JsonPathString
JsonPathVarCallback
JsonPathVariableEvalContext
JsonQuotes
JsonReturning
JsonScalarExpr
JsonSemAction
JsonTokenType
JsonTransformStringValuesAction
JsonTypeCategory
JsonUniqueBuilderState
JsonUniqueCheckState
JsonUniqueHashEntry
JsonUniqueParsingState
JsonUniqueStackEntry
JsonValueExpr
JsonValueList
JsonValueListIterator
JsonValueType
JsonWrapper
Jsonb
JsonbAggState
JsonbContainer
JsonbInState
JsonbIterState
JsonbIterator
JsonbIteratorToken
JsonbPair
JsonbParseState
JsonbSubWorkspace
JsonbTypeCategory
JsonbValue
JumbleState
JunkFilter
KeyAction
KeyActions
KeyArray
KeySuffix
KeyWord
LARGE_INTEGER
LDAP
LDAPMessage
LDAPURLDesc
LDAP_TIMEVAL
LINE
LLVMAttributeRef
LLVMBasicBlockRef
LLVMBuilderRef
LLVMIntPredicate
LLVMJitContext
LLVMJitHandle
LLVMMemoryBufferRef
LLVMModuleRef
LLVMOrcJITStackRef
LLVMOrcModuleHandle
LLVMOrcTargetAddress
LLVMPassManagerBuilderRef
LLVMPassManagerRef
LLVMSharedModuleRef
LLVMTargetMachineRef
LLVMTargetRef
LLVMTypeRef
LLVMValueRef
LOCALLOCK
LOCALLOCKOWNER
LOCALLOCKTAG
LOCALPREDICATELOCK
LOCK
LOCKMASK
LOCKMETHODID
LOCKMODE
LOCKTAG
LONG
LONG_PTR
LOOP
LPBYTE
LPCTSTR
LPCWSTR
LPDWORD
LPFILETIME
LPSECURITY_ATTRIBUTES
LPSERVICE_STATUS
LPSTR
LPTHREAD_START_ROUTINE
LPTSTR
LPVOID
LPWSTR
LSEG
LUID
LVPagePruneState
LVRelState
LVSavedErrInfo
LWLock
LWLockHandle
LWLockMode
LWLockPadded
LZ4CompressorState
LZ4F_compressionContext_t
LZ4F_decompressOptions_t
LZ4F_decompressionContext_t
LZ4F_errorCode_t
LZ4F_preferences_t
LZ4File
LabelProvider
LagTracker
LargeObjectDesc
LastAttnumInfo
Latch
LazyTupleTableSlot
LerpFunc
LexDescr
LexemeEntry
LexemeHashKey
LexemeInfo
LexemeKey
LexizeData
LibraryInfo
Limit
LimitOption
LimitPath
LimitState
LimitStateCond
List
ListCell
ListDictionary
ListParsedLex
ListenAction
ListenActionKind
ListenStmt
LoadStmt
LocalBufferLookupEnt
LocalPgBackendStatus
LocalTransactionId
LocationIndex
LocationLen
LockAcquireResult
LockClauseStrength
LockData
LockInfoData
LockInstanceData
LockMethod
LockMethodData
LockRelId
LockRows
LockRowsPath
LockRowsState
LockStmt
LockTagType
LockTupleMode
LockViewRecurse_context
LockWaitPolicy
LockingClause
LogOpts
LogStmtLevel
LogicalDecodeBeginCB
LogicalDecodeBeginPrepareCB
LogicalDecodeChangeCB
LogicalDecodeCommitCB
LogicalDecodeCommitPreparedCB
LogicalDecodeFilterByOriginCB
LogicalDecodeFilterPrepareCB
LogicalDecodeMessageCB
LogicalDecodePrepareCB
LogicalDecodeRollbackPreparedCB
LogicalDecodeShutdownCB
LogicalDecodeStartupCB
LogicalDecodeStreamAbortCB
LogicalDecodeStreamChangeCB
LogicalDecodeStreamCommitCB
LogicalDecodeStreamMessageCB
LogicalDecodeStreamPrepareCB
LogicalDecodeStreamStartCB
LogicalDecodeStreamStopCB
LogicalDecodeStreamTruncateCB
LogicalDecodeTruncateCB
LogicalDecodingContext
LogicalErrorCallbackState
LogicalOutputPluginInit
LogicalOutputPluginWriterPrepareWrite
LogicalOutputPluginWriterUpdateProgress
LogicalOutputPluginWriterWrite
LogicalRepBeginData
LogicalRepCommitData
LogicalRepCommitPreparedTxnData
LogicalRepCtxStruct
LogicalRepMode
LogicalRepMsgType
LogicalRepPartMapEntry
LogicalRepPreparedTxnData
LogicalRepRelId
LogicalRepRelMapEntry
LogicalRepRelation
LogicalRepRollbackPreparedTxnData
LogicalRepStreamAbortData
LogicalRepTupleData
LogicalRepTyp
LogicalRepWorker
LogicalRewriteMappingData
LogicalTape
LogicalTapeSet
LsnReadQueue
LsnReadQueueNextFun
LsnReadQueueNextStatus
LtreeGistOptions
LtreeSignature
MAGIC
MBuf
MCVItem
MCVList
MEMORY_BASIC_INFORMATION
MGVTBL
MINIDUMPWRITEDUMP
MINIDUMP_TYPE
MJEvalResult
MTTargetRelLookup
MVDependencies
MVDependency
MVNDistinct
MVNDistinctItem
Material
MaterialPath
MaterialState
MdfdVec
Memoize
MemoizeEntry
MemoizeInstrumentation
MemoizeKey
MemoizePath
MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
MemoryContextMethodID
MemoryContextMethods
MemoryStatsPrintFunc
MergeAction
MergeActionState
MergeAppend
MergeAppendPath
MergeAppendState
MergeJoin
MergeJoinClause
MergeJoinState
MergePath
MergeScanSelCache
MergeStmt
MergeWhenClause
MetaCommand
MinMaxAggInfo
MinMaxAggPath
MinMaxExpr
MinMaxMultiOptions
MinMaxOp
MinimalTuple
MinimalTupleData
MinimalTupleTableSlot
MinmaxMultiOpaque
MinmaxOpaque
ModifyTable
ModifyTableContext
ModifyTablePath
ModifyTableState
MonotonicFunction
MorphOpaque
MsgType
MultiAssignRef
MultiSortSupport
MultiSortSupportData
MultiXactId
MultiXactMember
MultiXactOffset
MultiXactStateData
MultiXactStatus
MultirangeIOData
MultirangeParseState
MultirangeType
NDBOX
NODE
NTSTATUS
NUMCacheEntry
NUMDesc
NUMProc
NV
Name
NameData
NameHashEntry
NamedArgExpr
NamedLWLockTranche
NamedLWLockTrancheRequest
NamedTuplestoreScan
NamedTuplestoreScanState
NamespaceInfo
NestLoop
NestLoopParam
NestLoopState
NestPath
NewColumnValue
NewConstraint
NextSampleBlock_function
NextSampleTuple_function
NextValueExpr
Node
NodeTag
NonEmptyRange
Notification
NotificationHash
NotificationList
NotifyStmt
Nsrt
NtDllRoutine
NullIfExpr
NullTest
NullTestType
NullableDatum
Numeric
NumericAggState
NumericDigit
NumericSortSupport
NumericSumAccum
NumericVar
OM_uint32
OP
OSAPerGroupState
OSAPerQueryState
OSInfo
OSSLCipher
OSSLDigest
OVERLAPPED
ObjectAccessDrop
ObjectAccessNamespaceSearch
ObjectAccessPostAlter
ObjectAccessPostCreate
ObjectAccessType
ObjectAddress
ObjectAddressAndFlags
ObjectAddressExtra
ObjectAddressStack
ObjectAddresses
ObjectClass
ObjectPropertyType
ObjectType
ObjectWithArgs
Offset
OffsetNumber
OffsetVarNodes_context
Oid
OidOptions
OkeysState
OldSnapshotControlData
OldSnapshotTimeMapping
OldToNewMapping
OldToNewMappingData
OnCommitAction
OnCommitItem
OnConflictAction
OnConflictClause
OnConflictExpr
OnConflictSetState
OpBtreeInterpretation
OpClassCacheEnt
OpExpr
OpFamilyMember
OpFamilyOpFuncGroup
OpclassInfo
Operator
OperatorElement
OpfamilyInfo
OprCacheEntry
OprCacheKey
OprInfo
OprProofCacheEntry
OprProofCacheKey
OutputContext
OutputPluginCallbacks
OutputPluginOptions
OutputPluginOutputType
OverrideSearchPath
OverrideStackEntry
OverridingKind
PACE_HEADER
PACL
PATH
PBOOL
PCtxtHandle
PERL_CONTEXT
PERL_SI
PFN
PGAlignedBlock
PGAlignedXLogBlock
PGAsyncStatusType
PGCALL2
PGChecksummablePage
PGContextVisibility
PGEvent
PGEventConnDestroy
PGEventConnReset
PGEventId
PGEventProc
PGEventRegister
PGEventResultCopy
PGEventResultCreate
PGEventResultDestroy
PGFInfoFunction
PGFileType
PGFunction
PGLZ_HistEntry
PGLZ_Strategy
PGMessageField
PGModuleMagicFunction
PGNoticeHooks
PGOutputData
PGOutputTxnData
PGPROC
PGP_CFB
PGP_Context
PGP_MPI
PGP_PubKey
PGP_S2K
PGPing
PGQueryClass
PGRUsage
PGSemaphore
PGSemaphoreData
PGShmemHeader
PGTargetServerType
PGTernaryBool
PGTransactionStatusType
PGVerbosity
PG_Locale_Strategy
PG_Lock_Status
PG_init_t
PGcancel
PGcmdQueueEntry
PGconn
PGdataValue
PGlobjfuncs
PGnotify
PGpipelineStatus
PGresAttDesc
PGresAttValue
PGresParamDesc
PGresult
PGresult_data
PHANDLE
PLAINTREE
PLAssignStmt
PLUID_AND_ATTRIBUTES
PLcword
PLpgSQL_case_when
PLpgSQL_condition
PLpgSQL_datum
PLpgSQL_datum_type
PLpgSQL_diag_item
PLpgSQL_exception
PLpgSQL_exception_block
PLpgSQL_execstate
PLpgSQL_expr
PLpgSQL_func_hashkey
PLpgSQL_function
PLpgSQL_getdiag_kind
PLpgSQL_if_elsif
PLpgSQL_label_type
PLpgSQL_nsitem
PLpgSQL_nsitem_type
PLpgSQL_plugin
PLpgSQL_promise_type
PLpgSQL_raise_option
PLpgSQL_raise_option_type
PLpgSQL_rec
PLpgSQL_recfield
PLpgSQL_resolve_option
PLpgSQL_row
PLpgSQL_stmt
PLpgSQL_stmt_assert
PLpgSQL_stmt_assign
PLpgSQL_stmt_block
PLpgSQL_stmt_call
PLpgSQL_stmt_case
PLpgSQL_stmt_close
PLpgSQL_stmt_commit
PLpgSQL_stmt_dynexecute
PLpgSQL_stmt_dynfors
PLpgSQL_stmt_execsql
PLpgSQL_stmt_exit
PLpgSQL_stmt_fetch
PLpgSQL_stmt_forc
PLpgSQL_stmt_foreach_a
PLpgSQL_stmt_fori
PLpgSQL_stmt_forq
PLpgSQL_stmt_fors
PLpgSQL_stmt_getdiag
PLpgSQL_stmt_if
PLpgSQL_stmt_loop
PLpgSQL_stmt_open
PLpgSQL_stmt_perform
PLpgSQL_stmt_raise
PLpgSQL_stmt_return
PLpgSQL_stmt_return_next
PLpgSQL_stmt_return_query
PLpgSQL_stmt_rollback
PLpgSQL_stmt_type
PLpgSQL_stmt_while
PLpgSQL_trigtype
PLpgSQL_type
PLpgSQL_type_type
PLpgSQL_var
PLpgSQL_variable
PLwdatum
PLword
PLyArrayToOb
PLyCursorObject
PLyDatumToOb
PLyDatumToObFunc
PLyExceptionEntry
PLyExecutionContext
PLyObToArray
PLyObToDatum
PLyObToDatumFunc
PLyObToDomain
PLyObToScalar
PLyObToTransform
PLyObToTuple
PLyObject_AsString_t
PLyPlanObject
PLyProcedure
PLyProcedureEntry
PLyProcedureKey
PLyResultObject
PLySRFState
PLySavedArgs
PLyScalarToOb
PLySubtransactionData
PLySubtransactionObject
PLyTransformToOb
PLyTupleToOb
PLyUnicode_FromStringAndSize_t
PLy_elog_impl_t
PMINIDUMP_CALLBACK_INFORMATION
PMINIDUMP_EXCEPTION_INFORMATION
PMINIDUMP_USER_STREAM_INFORMATION
PMSignalData
PMSignalReason
PMState
POLYGON
PQArgBlock
PQEnvironmentOption
PQExpBuffer
PQExpBufferData
PQcommMethods
PQconninfoOption
PQnoticeProcessor
PQnoticeReceiver
PQprintOpt
PQsslKeyPassHook_OpenSSL_type
PREDICATELOCK
PREDICATELOCKTAG
PREDICATELOCKTARGET
PREDICATELOCKTARGETTAG
PROCESS_INFORMATION
PROCLOCK
PROCLOCKTAG
PROC_HDR
PSID
PSID_AND_ATTRIBUTES
PSQL_COMP_CASE
PSQL_ECHO
PSQL_ECHO_HIDDEN
PSQL_ERROR_ROLLBACK
PTEntryArray
PTIterationArray
PTOKEN_PRIVILEGES
PTOKEN_USER
PULONG
PUTENVPROC
PVIndStats
PVIndVacStatus
PVOID
PVShared
PX_Alias
PX_Cipher
PX_Combo
PX_HMAC
PX_MD
Page
PageData
PageGistNSN
PageHeader
PageHeaderData
PageXLogRecPtr
PagetableEntry
Pairs
ParallelAppendState
ParallelApplyWorkerEntry
ParallelApplyWorkerInfo
ParallelApplyWorkerShared
ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelBlockTableScanWorker
ParallelBlockTableScanWorkerData
ParallelCompletionPtr
ParallelContext
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
ParallelHashJoinBatchAccessor
ParallelHashJoinState
ParallelIndexScanDesc
ParallelReadyList
ParallelSlot
ParallelSlotArray
ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
ParallelTransState
ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
ParamCompileHook
ParamExecData
ParamExternData
ParamFetchHook
ParamKind
ParamListInfo
ParamPathInfo
ParamRef
ParamsErrorCbData
ParentMapEntry
ParseCallbackState
ParseExprKind
ParseNamespaceColumn
ParseNamespaceItem
ParseParamRefHook
ParseState
ParsedLex
ParsedScript
ParsedText
ParsedWord
ParserSetupHook
ParserState
PartClauseInfo
PartClauseMatchStatus
PartClauseTarget
PartialFileSetState
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
PartitionCmd
PartitionDesc
PartitionDescData
PartitionDirectory
PartitionDirectoryEntry
PartitionDispatch
PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
PartitionMap
PartitionPruneCombineOp
PartitionPruneContext
PartitionPruneInfo
PartitionPruneState
PartitionPruneStep
PartitionPruneStepCombine
PartitionPruneStepOp
PartitionPruningData
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
PartitionedRelPruneInfo
PartitionedRelPruningData
PartitionwiseAggregateType
PasswordType
Path
PathClauseUsage
PathCostComparison
PathHashStack
PathKey
PathKeyInfo
PathKeysComparison
PathTarget
PathkeyMutatorState
PathkeySortCost
PatternInfo
PatternInfoArray
Pattern_Prefix_Status
Pattern_Type
PendingFsyncEntry
PendingRelDelete
PendingRelSync
PendingUnlinkEntry
PendingWriteback
PerLockTagEntry
PerlInterpreter
Perl_ppaddr_t
Permutation
PermutationStep
PermutationStepBlocker
PermutationStepBlockerType
PgArchData
PgBackendGSSStatus
PgBackendSSLStatus
PgBackendStatus
PgBenchExpr
PgBenchExprLink
PgBenchExprList
PgBenchExprType
PgBenchFunction
PgBenchValue
PgBenchValueType
PgChecksumMode
PgFdwAnalyzeState
PgFdwConnState
PgFdwDirectModifyState
PgFdwModifyState
PgFdwOption
PgFdwPathExtraData
PgFdwRelationInfo
PgFdwScanState
PgIfAddrCallback
PgStatShared_Archiver
PgStatShared_BgWriter
PgStatShared_Checkpointer
PgStatShared_Common
PgStatShared_Database
PgStatShared_Function
PgStatShared_HashEntry
PgStatShared_IO
PgStatShared_Relation
PgStatShared_ReplSlot
PgStatShared_SLRU
PgStatShared_Subscription
PgStatShared_Wal
PgStat_ArchiverStats
PgStat_BackendSubEntry
PgStat_BgWriterStats
PgStat_BktypeIO
PgStat_CheckpointerStats
PgStat_Counter
PgStat_EntryRef
PgStat_EntryRefHashEntry
PgStat_FetchConsistency
PgStat_FunctionCallUsage
PgStat_FunctionCounts
PgStat_HashKey
PgStat_IO
PgStat_Kind
PgStat_KindInfo
PgStat_LocalState
PgStat_PendingDroppedStatsItem
PgStat_SLRUStats
PgStat_ShmemControl
PgStat_Snapshot
PgStat_SnapshotEntry
PgStat_StatDBEntry
PgStat_StatFuncEntry
PgStat_StatReplSlotEntry
PgStat_StatSubEntry
PgStat_StatTabEntry
PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
Pg_finfo_record
Pg_magic_struct
PipeProtoChunk
PipeProtoHeader
PlaceHolderInfo
PlaceHolderVar
Plan
PlanDirectModify_function
PlanForeignModify_function
2010-02-26 02:55:35 +01:00
PlanInvalItem
PlanRowMark
PlanState
PlannedStmt
PlannerGlobal
PlannerInfo
PlannerParamItem
Point
Pointer
PolicyInfo
PolyNumAggState
Pool
PopulateArrayContext
PopulateArrayState
PopulateRecordCache
PopulateRecordsetState
Port
Portal
PortalHashEnt
PortalStatus
PortalStrategy
PostParseColumnRefHook
PostgresPollingStatusType
PostingItem
PostponedQual
PreParseColumnRefHook
PredClass
PredIterInfo
PredIterInfoData
PredXactList
PredXactListElement
PredicateLockData
PredicateLockTargetType
PrefetchBufferResult
PrepParallelRestorePtrType
PrepareStmt
PreparedStatement
PresortedKeyData
PrewarmType
PrintExtraTocPtrType
PrintTocDataPtrType
PrintfArgType
PrintfArgValue
PrintfTarget
PrinttupAttrInfo
PrivTarget
PrivateRefCountEntry
ProcArrayStruct
ProcLangInfo
ProcSignalBarrierType
ProcSignalHeader
ProcSignalReason
ProcSignalSlot
ProcState
ProcWaitStatus
ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
ProjectSet
ProjectSetPath
ProjectSetState
ProjectionInfo
ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
PruneState
PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
PsqlScanState
PsqlScanStateData
PsqlSettings
Publication
PublicationActions
PublicationDesc
PublicationInfo
PublicationObjSpec
PublicationObjSpecType
PublicationPartOpt
PublicationRelInfo
PublicationSchemaInfo
PublicationTable
PullFilter
PullFilterOps
PushFilter
PushFilterOps
PushFunction
PyCFunction
PyMappingMethods
PyMethodDef
PyModuleDef
PyObject
PySequenceMethods
PyTypeObject
Py_ssize_t
QPRS_STATE
QTN2QTState
QTNode
QUERYTYPE
QUERY_SECURITY_CONTEXT_TOKEN_FN
QualCost
QualItem
Query
QueryCompletion
QueryDesc
QueryEnvironment
QueryInfo
QueryItem
QueryItemType
QueryMode
QueryOperand
QueryOperator
QueryRepresentation
QueryRepresentationOperand
QuerySource
QueueBackendStatus
QueuePosition
QuitSignalReason
RBTNode
RBTOrderControl
RBTree
RBTreeIterator
REPARSE_JUNCTION_DATA_BUFFER
RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
RI_QueryHashEntry
RI_QueryKey
RTEKind
RTEPermissionInfo
RWConflict
RWConflictPoolHeader
Range
RangeBound
RangeBox
RangeFunction
RangeIOData
RangeQueryClause
RangeSubselect
RangeTableFunc
RangeTableFuncCol
RangeTableSample
RangeTblEntry
RangeTblFunction
RangeTblRef
RangeType
RangeVar
RangeVarGetRelidCallback
Ranges
RawColumnDefault
RawParseMode
RawStmt
ReInitializeDSMForeignScan_function
ReScanForeignScan_function
ReadBufPtrType
ReadBufferMode
ReadBytePtrType
ReadExtraTocPtrType
ReadFunc
ReadLocalXLogPageNoWaitPrivate
ReadReplicationSlotCmd
ReassignOwnedStmt
RecheckForeignScan_function
RecordCacheEntry
RecordCompareData
RecordIOData
RecoveryLockListsEntry
RecoveryPauseState
RecoveryState
RecoveryTargetTimeLineGoal
RecoveryTargetType
RectBox
RecursionContext
RecursiveUnion
RecursiveUnionPath
RecursiveUnionState
RefetchForeignRow_function
RefreshMatViewStmt
RegProcedure
Regis
RegisNode
RegisteredBgWorker
ReindexErrorInfo
ReindexIndexInfo
ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
RelFileLocator
RelFileLocatorBackend
RelIdCacheEnt
RelInfo
RelInfoArr
RelMapFile
RelMapping
RelOptInfo
RelOptKind
RelToCheck
RelToCluster
RelabelType
Relation
RelationData
RelationInfo
RelationPtr
RelationSyncEntry
RelcacheCallbackFunction
ReleaseMatchCB
RelfilenumberMapEntry
RelfilenumberMapKey
Relids
RelocationBufferInfo
RelptrFreePageBtree
RelptrFreePageManager
RelptrFreePageSpanLeader
RenameStmt
ReopenPtrType
ReorderBuffer
ReorderBufferApplyChangeCB
ReorderBufferApplyTruncateCB
ReorderBufferBeginCB
ReorderBufferChange
ReorderBufferChangeType
ReorderBufferCommitCB
ReorderBufferCommitPreparedCB
ReorderBufferDiskChange
ReorderBufferIterTXNEntry
ReorderBufferIterTXNState
ReorderBufferMessageCB
ReorderBufferPrepareCB
ReorderBufferRollbackPreparedCB
ReorderBufferStreamAbortCB
ReorderBufferStreamChangeCB
ReorderBufferStreamCommitCB
ReorderBufferStreamMessageCB
ReorderBufferStreamPrepareCB
ReorderBufferStreamStartCB
ReorderBufferStreamStopCB
ReorderBufferStreamTruncateCB
ReorderBufferTXN
ReorderBufferTXNByIdEnt
ReorderBufferToastEnt
ReorderBufferTupleBuf
ReorderBufferTupleCidEnt
ReorderBufferTupleCidKey
ReorderBufferUpdateProgressTxnCB
ReorderTuple
RepOriginId
ReparameterizeForeignPathByChild_function
ReplaceVarsFromTargetList_context
ReplaceVarsNoMatchOption
ReplicaIdentityStmt
ReplicationKind
ReplicationSlot
ReplicationSlotCtlData
ReplicationSlotOnDisk
ReplicationSlotPersistency
ReplicationSlotPersistentData
ReplicationState
ReplicationStateCtl
ReplicationStateOnDisk
ResTarget
ReservoirState
ReservoirStateData
ResourceArray
ResourceOwner
ResourceReleaseCallback
ResourceReleaseCallbackItem
ResourceReleasePhase
RestoreOptions
RestorePass
RestrictInfo
Result
ResultRelInfo
ResultState
ReturnSetInfo
ReturnStmt
RevmapContents
RewriteMappingDataEntry
RewriteMappingFile
RewriteRule
RewriteState
RmgrData
RmgrDescData
RmgrId
RoleNameItem
RoleSpec
RoleSpecType
RoleStmtType
RollupData
RowCompareExpr
RowCompareType
RowExpr
RowIdentityVarInfo
RowMarkClause
RowMarkType
RowSecurityDesc
RowSecurityPolicy
RtlGetLastNtStatus_t
RuleInfo
RuleLock
RuleStmt
RunningTransactions
RunningTransactionsData
SC_HANDLE
SECURITY_ATTRIBUTES
SECURITY_STATUS
SEG
SERIALIZABLEXACT
SERIALIZABLEXID
SERIALIZABLEXIDTAG
SERVICE_STATUS
SERVICE_STATUS_HANDLE
SERVICE_TABLE_ENTRY
SID_AND_ATTRIBUTES
SID_IDENTIFIER_AUTHORITY
SID_NAME_USE
SISeg
SIZE_T
SMgrRelation
SMgrRelationData
SMgrSortArray
SOCKADDR
SOCKET
SPELL
SPICallbackArg
SPIExecuteOptions
SPIParseOpenOptions
SPIPlanPtr
SPIPrepareOptions
SPITupleTable
SPLITCOST
SPNode
SPNodeData
SPPageDesc
SQLDropObject
SQLFunctionCache
SQLFunctionCachePtr
SQLFunctionParseInfo
SQLFunctionParseInfoPtr
SSL
SSLExtensionInfoContext
SSL_CTX
STARTUPINFO
STRLEN
SV
SYNCHRONIZATION_BARRIER
SampleScan
SampleScanGetSampleSize_function
SampleScanState
SavedTransactionCharacteristics
ScalarArrayOpExpr
ScalarArrayOpExprHashEntry
ScalarArrayOpExprHashTable
ScalarIOData
ScalarItem
ScalarMCVItem
Scan
ScanDirection
ScanKey
ScanKeyData
ScanKeywordHashFunc
ScanKeywordList
ScanState
ScanTypeControl
ScannerCallbackState
SchemaQuery
SecBuffer
SecBufferDesc
SecLabelItem
SecLabelStmt
SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
SemTPadded
SemiAntiJoinFactors
SeqScan
SeqScanState
SeqTable
SeqTableData
SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
SerializedClientConnectionInfo
SerializedRanges
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
Session
SessionBackupState
SessionEndType
SetConstraintState
SetConstraintStateData
SetConstraintTriggerData
SetExprState
SetFunctionReturnMode
SetOp
SetOpCmd
SetOpPath
SetOpState
SetOpStatePerGroup
SetOpStrategy
SetOperation
SetOperationStmt
SetQuantifier
SetToDefault
SetupWorkerPtrType
ShDependObjectInfo
SharedAggInfo
SharedBitmapState
SharedDependencyObjectType
SharedDependencyType
SharedExecutorInstrumentation
SharedFileSet
SharedHashInfo
SharedIncrementalSortInfo
SharedInvalCatalogMsg
SharedInvalCatcacheMsg
SharedInvalRelcacheMsg
SharedInvalRelmapMsg
SharedInvalSmgrMsg
SharedInvalSnapshotMsg
SharedInvalidationMessage
SharedJitInstrumentation
SharedMemoizeInfo
SharedRecordTableEntry
SharedRecordTableKey
SharedRecordTypmodRegistry
SharedSortInfo
SharedTuplestore
SharedTuplestoreAccessor
SharedTuplestoreChunk
SharedTuplestoreParticipant
SharedTypmodTableEntry
Sharedsort
ShellTypeInfo
ShippableCacheEntry
ShippableCacheKey
ShmemIndexEnt
ShutdownForeignScan_function
ShutdownInformation
ShutdownMode
SignTSVector
SimpleActionList
SimpleActionListCell
SimpleEcontextStackEntry
SimpleOidList
SimpleOidListCell
SimplePtrList
SimplePtrListCell
SimpleStats
SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
SkipPages
SlabBlock
SlabContext
SlabSlot
SlotNumber
SlruCtl
SlruCtlData
SlruErrorCause
SlruPageStatus
SlruScanCallback
SlruShared
SlruSharedData
SlruWriteAll
SlruWriteAllData
SnapBuild
SnapBuildOnDisk
SnapBuildState
Snapshot
SnapshotData
SnapshotType
SockAddr
Sort
SortBy
SortByDir
SortByNulls
SortCoordinate
SortGroupClause
SortItem
SortPath
SortShimExtra
SortState
SortSupport
SortSupportData
SortTuple
SortTupleComparator
SortedPoint
SpGistBuildState
SpGistCache
SpGistDeadTuple
SpGistDeadTupleData
SpGistInnerTuple
SpGistInnerTupleData
SpGistLUPCache
SpGistLastUsedPage
SpGistLeafTuple
SpGistLeafTupleData
SpGistMetaPageData
SpGistNodeTuple
SpGistNodeTupleData
SpGistOptions
SpGistPageOpaque
SpGistPageOpaqueData
SpGistScanOpaque
SpGistScanOpaqueData
SpGistSearchItem
SpGistState
SpGistTypeDesc
SpecialJoinInfo
SpinDelayStatus
SplitInterval
SplitLR
SplitPoint
SplitTextOutputData
SplitVar
SplitedPageLayout
StackElem
StartBlobPtrType
StartBlobsPtrType
StartDataPtrType
StartReplicationCmd
StartupStatusEnum
StatEntry
StatExtEntry
StateFileChunk
StatisticExtInfo
StatsBuildData
StatsData
StatsElem
StatsExtInfo
StdAnalyzeData
StdRdOptIndexCleanup
StdRdOptions
Step
StopList
StrategyNumber
StreamCtl
String
StringInfo
StringInfoData
StripnullState
SubLink
SubLinkType
SubOpts
SubPlan
SubPlanState
SubRemoveRels
SubTransactionId
SubXactCallback
SubXactCallbackItem
SubXactEvent
SubXactInfo
SubqueryScan
SubqueryScanPath
SubqueryScanState
SubqueryScanStatus
SubscriptExecSetup
SubscriptExecSteps
SubscriptRoutines
SubscriptTransform
SubscriptingRef
SubscriptingRefState
Subscription
SubscriptionInfo
SubscriptionRelState
SupportRequestCost
SupportRequestIndexCondition
SupportRequestRows
SupportRequestSelectivity
SupportRequestSimplify
SupportRequestWFuncMonotonic
Syn
SyncOps
SyncRepConfigData
SyncRepStandbyData
SyncRequestHandler
SyncRequestType
SysFKRelationship
SysScanDesc
SyscacheCallbackFunction
SystemRowsSamplerData
SystemSamplerData
SystemTimeSamplerData
TAR_MEMBER
TBMIterateResult
TBMIteratingState
TBMIterator
TBMSharedIterator
TBMSharedIteratorState
TBMStatus
TBlockState
TIDBitmap
TM_FailureData
TM_IndexDelete
TM_IndexDeleteOp
TM_IndexStatus
TM_Result
TOKEN_DEFAULT_DACL
TOKEN_INFORMATION_CLASS
TOKEN_PRIVILEGES
TOKEN_USER
TParser
TParserCharTest
TParserPosition
TParserSpecial
TParserState
TParserStateAction
TParserStateActionItem
TQueueDestReceiver
TRGM
TSAnyCacheEntry
TSConfigCacheEntry
TSConfigInfo
TSDictInfo
TSDictionaryCacheEntry
TSExecuteCallback
TSLexeme
TSParserCacheEntry
TSParserInfo
TSQuery
TSQueryData
TSQueryParserState
TSQuerySign
TSReadPointer
TSTemplateInfo
TSTernaryValue
TSTokenTypeStorage
TSVector
TSVectorBuildState
TSVectorData
TSVectorParseState
TSVectorStat
TState
TStatus
TStoreState
TXNEntryFile
TYPCATEGORY
T_Action
T_WorkerStatus
TableAmRoutine
TableAttachInfo
TableDataInfo
TableFunc
TableFuncRoutine
TableFuncScan
TableFuncScanState
TableInfo
TableLikeClause
TableSampleClause
TableScanDesc
TableScanDescData
TableSpaceCacheEntry
TableSpaceOpts
TablespaceList
TablespaceListCell
TapeBlockTrailer
TapeShare
TarMethodData
TarMethodFile
TargetEntry
TclExceptionNameMap
Tcl_DString
Tcl_FileProc
Tcl_HashEntry
Tcl_HashTable
Tcl_Interp
Tcl_NotifierProcs
Tcl_Obj
Tcl_Time
TempNamespaceStatus
TestDecodingData
TestDecodingTxnData
TestSpec
TextFreq
TextPositionState
TheLexeme
TheSubstitute
TidExpr
TidExprType
TidHashKey
TidOpExpr
TidPath
TidRangePath
TidRangeScan
TidRangeScanState
TidScan
TidScanState
TimeADT
TimeLineHistoryCmd
TimeLineHistoryEntry
TimeLineID
TimeOffset
TimeStamp
TimeTzADT
TimeZoneAbbrevTable
TimeoutId
TimeoutType
Timestamp
TimestampTz
TmFromChar
TmToChar
ToastAttrInfo
ToastCompressionId
ToastTupleContext
ToastedAttribute
TocEntry
TokenAuxData
TokenizedAuthLine
TrackItem
TransApplyAction
TransInvalidationInfo
TransState
TransactionId
TransactionState
TransactionStateData
TransactionStmt
TransactionStmtKind
TransformInfo
TransformJsonStringValuesState
TransitionCaptureState
TrgmArc
TrgmArcInfo
TrgmBound
TrgmColor
TrgmColorInfo
TrgmGistOptions
TrgmNFA
TrgmPackArcInfo
TrgmPackedArc
TrgmPackedGraph
TrgmPackedState
TrgmPrefix
TrgmState
TrgmStateKey
TrieChar
Trigger
TriggerData
TriggerDesc
TriggerEvent
TriggerFlags
TriggerInfo
TriggerTransition
TruncateStmt
TsmRoutine
TupOutputState
TupSortStatus
TupStoreStatus
TupleConstr
TupleConversionMap
TupleDesc
TupleHashEntry
TupleHashEntryData
TupleHashIterator
TupleHashTable
TupleQueueReader
TupleTableSlot
TupleTableSlotOps
TuplesortClusterArg
TuplesortDatumArg
TuplesortIndexArg
TuplesortIndexBTreeArg
TuplesortIndexHashArg
TuplesortInstrumentation
TuplesortMethod
TuplesortPublic
TuplesortSpaceType
Tuplesortstate
Tuplestorestate
TwoPhaseCallback
TwoPhaseFileHeader
TwoPhaseLockRecord
TwoPhasePgStatRecord
TwoPhasePredicateLockRecord
TwoPhasePredicateRecord
TwoPhasePredicateRecordType
TwoPhasePredicateXactRecord
TwoPhaseRecordOnDisk
TwoPhaseRmgrId
TwoPhaseStateData
Type
TypeCacheEntry
TypeCacheEnumData
TypeCast
TypeCat
TypeFuncClass
TypeInfo
TypeName
U
U32
U8
UChar
UCharIterator
UColAttribute
UColAttributeValue
UCollator
UConverter
UErrorCode
UINT
ULARGE_INTEGER
ULONG
ULONG_PTR
UV
UVersionInfo
UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
UniquePathMethod
UniqueState
UnlistenStmt
UnresolvedTup
UnresolvedTupData
UpdateContext
UpdateStmt
UpperRelationKind
UpperUniquePath
UserAuth
UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
VacuumRelation
VacuumStmt
ValidateIndexState
ValuesScan
ValuesScanState
Var
VarBit
VarChar
VarParamState
VarString
VarStringSortSupport
Variable
VariableAssignHook
VariableCache
VariableCacheData
VariableSetKind
VariableSetStmt
VariableShowStmt
VariableSpace
VariableStatData
VariableSubstituteHook
Variables
VersionedQuery
Vfd
ViewCheckOption
ViewOptCheckOption
ViewOptions
ViewStmt
VirtualTransactionId
VirtualTupleTableSlot
VolatileFunctionStatus
Vsrt
WAIT_ORDER
WALAvailability
WALInsertLock
WALInsertLockPadded
WALOpenSegment
WALReadError
WalRcvWakeupReason
WALSegmentCloseCB
WALSegmentContext
WALSegmentOpenCB
WCHAR
WCOKind
WFW_WaitOption
WIDGET
WORD
WORKSTATE
WSABUF
WSADATA
WSANETWORKEVENTS
WSAPROTOCOL_INFO
WaitEvent
WaitEventActivity
WaitEventClient
WaitEventIO
WaitEventIPC
WaitEventSet
WaitEventTimeout
WaitPMResult
WalCloseMethod
WalCompression
WalLevel
WalRcvData
WalRcvExecResult
WalRcvExecStatus
WalRcvState
WalRcvStreamOptions
WalReceiverConn
WalReceiverFunctionsType
WalSnd
WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
WalUsage
WalWriteMethod
Walfile
WindowAgg
WindowAggPath
WindowAggState
WindowAggStatus
WindowClause
WindowClauseSortData
WindowDef
WindowFunc
WindowFuncExprState
WindowFuncLists
WindowObject
WindowObjectData
WindowStatePerAgg
WindowStatePerAggData
WindowStatePerFunc
WithCheckOption
WithClause
WordEntry
WordEntryIN
WordEntryPos
WordEntryPosVector
WordEntryPosVector1
WorkTableScan
WorkTableScanState
WorkerInfo
WorkerInfoData
WorkerInstrumentation
WorkerJobDumpPtrType
WorkerJobRestorePtrType
Working_State
WriteBufPtrType
WriteBytePtrType
WriteDataCallback
WriteDataPtrType
WriteExtraTocPtrType
WriteFunc
WriteManifestState
WriteTarState
WritebackContext
X509
X509_EXTENSION
X509_NAME
X509_NAME_ENTRY
X509_STORE
X509_STORE_CTX
XLTW_Oper
XLogCtlData
XLogCtlInsert
XLogDumpConfig
XLogDumpPrivate
XLogLongPageHeader
XLogLongPageHeaderData
XLogPageHeader
XLogPageHeaderData
XLogPageReadCB
XLogPageReadPrivate
XLogPageReadResult
XLogPrefetchStats
XLogPrefetcher
XLogPrefetcherFilter
XLogReaderRoutine
XLogReaderState
XLogRecData
XLogRecPtr
XLogRecStats
XLogRecord
XLogRecordBlockCompressHeader
XLogRecordBlockHeader
XLogRecordBlockImageHeader
XLogRecordBuffer
XLogRecoveryCtlData
XLogRedoAction
XLogSegNo
XLogSource
XLogStats
XLogwrtResult
XLogwrtRqst
XPV
XPVIV
XPVMG
XactCallback
XactCallbackItem
XactEvent
XactLockTableWaitInfo
XidBoundsViolation
XidCacheStatus
XidCommitStatus
XidStatus
XmlExpr
XmlExprOp
XmlOptionType
XmlSerialize
XmlTableBuilderData
YYLTYPE
YYSTYPE
YY_BUFFER_STATE
ZSTD_CCtx
ZSTD_DCtx
ZSTD_inBuffer
ZSTD_outBuffer
_SPI_connection
_SPI_plan
__AssignProcessToJobObject
__CreateJobObject
__CreateRestrictedToken
__IsProcessInJob
__QueryInformationJobObject
__SetInformationJobObject
__time64_t
_dev_t
_ino_t
_locale_t
_resultmap
_stringlist
acquireLocksOnSubLinks_context
adjust_appendrel_attrs_context
aff_regex_struct
allocfunc
amadjustmembers_function
ambeginscan_function
ambuild_function
ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
amcostestimate_function
amendscan_function
amestimateparallelscan_function
amgetbitmap_function
amgettuple_function
aminitparallelscan_function
aminsert_function
ammarkpos_function
amoptions_function
amparallelrescan_function
amproperty_function
amrescan_function
amrestrpos_function
amvacuumcleanup_function
amvalidate_function
array_iter
array_unnest_fctx
assign_collations_context
autovac_table
av_relation
avl_dbase
avl_node
avl_tree
avw_dbase
backslashResult
backup_manifest_info
backup_manifest_option
base_yy_extra_type
basebackup_options
bbsink
bbsink_copystream
bbsink_gzip
bbsink_lz4
bbsink_ops
bbsink_server
bbsink_shell
bbsink_state
bbsink_throttle
bbsink_zstd
bbstreamer
bbstreamer_archive_context
bbstreamer_extractor
bbstreamer_gzip_decompressor
bbstreamer_gzip_writer
bbstreamer_lz4_frame
bbstreamer_member
bbstreamer_ops
bbstreamer_plain_writer
bbstreamer_recovery_injector
bbstreamer_tar_archiver
bbstreamer_tar_parser
bbstreamer_zstd_frame
bgworker_main_type
binaryheap
binaryheap_comparator
bitmapword
bits16
bits32
bits8
bloom_filter
boolKEY
brin_column_state
brin_serialize_callback_type
bytea
cached_re_str
canonicalize_state
cashKEY
catalogid_hash
cfp
check_agg_arguments_context
check_function_callback
check_network_data
check_object_relabel_type
check_password_hook_type
check_ungrouped_columns_context
chr
clock_t
cmpEntriesArg
codes_t
collation_cache_entry
color
colormaprange
compare_context
config_var_value
contain_aggs_of_level_context
convert_testexpr_context
copy_data_dest_cb
copy_data_source_cb
core_YYSTYPE
core_yy_extra_type
core_yyscan_t
corrupt_items
cost_qual_eval_context
cp_hash_func
create_upper_paths_hook_type
createdb_failure_params
crosstab_HashEnt
crosstab_cat_desc
datapagemap_iterator_t
datapagemap_t
dateKEY
datetkn
dce_uuid_t
dclist_head
decimal
deparse_columns
deparse_context
deparse_expr_cxt
deparse_namespace
destructor
dev_t
digit
disassembledLeaf
dlist_head
dlist_iter
dlist_mutable_iter
dlist_node
ds_state
dsa_area
dsa_area_control
dsa_area_pool
dsa_area_span
dsa_handle
dsa_pointer
dsa_pointer_atomic
dsa_segment_header
dsa_segment_index
dsa_segment_map
dshash_compare_function
dshash_hash
dshash_hash_function
dshash_parameters
dshash_partition
dshash_seq_status
dshash_table
dshash_table_control
dshash_table_handle
dshash_table_item
dsm_control_header
dsm_control_item
dsm_handle
dsm_op
dsm_segment
dsm_segment_detach_callback
eLogType
ean13
eary
ec_matches_callback_type
ec_member_foreign_arg
ec_member_matches_arg
emit_log_hook_type
eval_const_expressions_context
exec_thread_arg
execution_state
explain_get_index_name_hook_type
f_smgr
fd_set
fe_scram_state
fe_scram_state_enum
fetch_range_request
file_action_t
file_entry_t
file_type_t
filehash_hash
filehash_iterator
filemap_t
fill_string_relopt
finalize_primnode_context
find_dependent_phvs_context
find_expr_references_context
fix_join_expr_context
fix_scan_expr_context
fix_upper_expr_context
fix_windowagg_cond_context
flatten_join_alias_vars_context
flatten_rtes_walker_context
float4
float4KEY
float8
float8KEY
floating_decimal_32
floating_decimal_64
fmAggrefPtr
fmExprContextCallbackFunction
fmNodePtr
fmStringInfo
fmgr_hook_type
foreign_glob_cxt
foreign_loc_cxt
freeaddrinfo_ptr_t
freefunc
fsec_t
gbt_vsrt_arg
gbtree_ninfo
gbtree_vinfo
generate_series_fctx
generate_series_numeric_fctx
generate_series_timestamp_fctx
generate_series_timestamptz_fctx
generate_subscripts_fctx
get_attavgwidth_hook_type
get_index_stats_hook_type
get_relation_info_hook_type
get_relation_stats_hook_type
getaddrinfo_ptr_t
getnameinfo_ptr_t
gid_t
gin_leafpage_items_state
ginxlogCreatePostingTree
ginxlogDeleteListPages
ginxlogDeletePage
ginxlogInsert
ginxlogInsertDataInternal
ginxlogInsertEntry
ginxlogInsertListPage
ginxlogRecompressDataLeaf
ginxlogSplit
ginxlogUpdateMeta
ginxlogVacuumDataLeafPage
gistxlogDelete
gistxlogPage
gistxlogPageDelete
gistxlogPageReuse
gistxlogPageSplit
gistxlogPageUpdate
grouping_sets_data
gseg_picksplit_item
gss_buffer_desc
gss_cred_id_t
gss_ctx_id_t
gss_name_t
gtrgm_consistent_cache
gzFile
hashfunc
hbaPort
heap_page_items_state
help_handler
hlCheck
hstoreCheckKeyLen_t
hstoreCheckValLen_t
hstorePairs_t
hstoreUniquePairs_t
hstoreUpgrade_t
hyperLogLogState
ifState
ilist
import_error_callback_arg
indexed_tlist
inet
inetKEY
inet_struct
init_function
inline_cte_walker_context
inline_error_callback_arg
ino_t
instr_time
int128
int16
int16KEY
int2vector
int32
int32KEY
int32_t
int64
int64KEY
int8
internalPQconninfoOption
intptr_t
intset_internal_node
intset_leaf_node
intset_node
intvKEY
io_stat_col
itemIdCompact
itemIdCompactData
iterator
jmp_buf
join_search_hook_type
json_aelem_action
json_manifest_error_callback
json_manifest_perfile_callback
json_manifest_perwalrange_callback
json_ofield_action
json_scalar_action
json_struct_action
keyEntryData
key_t
lclContext
lclTocEntry
leafSegmentInfo
leaf_item
libpq_source
line_t
lineno_t
list_sort_comparator
local_relopt
local_relopts
local_source
locale_t
locate_agg_of_level_context
locate_var_of_level_context
locate_windowfunc_context
logstreamer_param
lquery
lquery_level
lquery_variant
ltree
ltree_gist
ltree_level
ltxtquery
mXactCacheEnt
mac8KEY
macKEY
macaddr
macaddr8
macaddr_sortsupport_state
manifest_file
manifest_files_hash
manifest_files_iterator
manifest_wal_range
map_variable_attnos_context
max_parallel_hazard_context
mb2wchar_with_len_converter
mbchar_verifier
mbcharacter_incrementer
mbdisplaylen_converter
mblen_converter
mbstr_verifier
memoize_hash
memoize_iterator
metastring
mix_data_t
mixedStruct
mode_t
movedb_failure_params
multirange_bsearch_comparison
multirange_unnest_fctx
mxact
mxtruncinfo
needs_fmgr_hook_type
network_sortsupport_state
nodeitem
normal_rand_fctx
ntile_context
numeric
object_access_hook_type
object_access_hook_type_str
off_t
oidKEY
oidvector
on_dsm_detach_callback
on_exit_nicely_callback
openssl_tls_init_hook_typ
ossl_EVP_cipher_func
other
output_type
pagetable_hash
pagetable_iterator
pairingheap
pairingheap_comparator
pairingheap_node
parallel_worker_main_type
parse_error_callback_arg
parser_context
partition_method_t
pendingPosition
pgParameterStatus
pg_atomic_flag
pg_atomic_uint32
pg_atomic_uint64
pg_be_sasl_mech
pg_checksum_context
pg_checksum_raw_context
pg_checksum_type
pg_compress_algorithm
pg_compress_specification
pg_conn_host
pg_conn_host_type
pg_conv_map
pg_crc32
pg_crc32c
pg_cryptohash_ctx
pg_cryptohash_errno
pg_cryptohash_type
pg_ctype_cache
pg_enc
pg_enc2gettext
pg_enc2name
pg_encname
pg_fe_sasl_mech
pg_funcptr_t
pg_gssinfo
pg_hmac_ctx
pg_hmac_errno
pg_int64
pg_local_to_utf_combined
pg_locale_t
pg_mb_radix_tree
pg_md5_ctx
pg_on_exit_callback
pg_prng_state
pg_re_flags
pg_saslprep_rc
pg_sha1_ctx
pg_sha224_ctx
pg_sha256_ctx
pg_sha384_ctx
pg_sha512_ctx
pg_snapshot
pg_stack_base_t
pg_time_t
pg_time_usec_t
pg_tz
pg_tz_cache
pg_tzenum
pg_unicode_decompinfo
pg_unicode_decomposition
pg_unicode_norminfo
pg_unicode_normprops
pg_unicode_recompinfo
pg_utf_to_local_combined
pg_uuid_t
pg_wc_probefunc
pg_wchar
pg_wchar_tbl
pgp_armor_headers_state
pgsocket
pgsql_thing_t
pgssEntry
pgssGlobalStats
pgssHashKey
pgssSharedState
pgssStoreKind
pgssVersion
pgstat_entry_ref_hash_hash
pgstat_entry_ref_hash_iterator
pgstat_page
pgstat_snapshot_hash
pgstattuple_type
pgthreadlock_t
pid_t
pivot_field
planner_hook_type
plperl_array_info
plperl_call_data
plperl_interp_desc
plperl_proc_desc
plperl_proc_key
plperl_proc_ptr
plperl_query_desc
plperl_query_entry
plpgsql_CastHashEntry
plpgsql_CastHashKey
plpgsql_HashEnt
pltcl_call_state
pltcl_interp_desc
pltcl_proc_desc
pltcl_proc_key
pltcl_proc_ptr
pltcl_query_desc
pointer
polymorphic_actuals
pos_trgm
post_parse_analyze_hook_type
postprocess_result_function
pqbool
pqsigfunc
printQueryOpt
printTableContent
printTableFooter
printTableOpt
printTextFormat
printTextLineFormat
printTextLineWrap
printTextRule
printXheaderWidthType
printfunc
priv_map
process_file_callback_t
process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
promptStatus_t
pthread_barrier_t
pthread_cond_t
pthread_key_t
pthread_mutex_t
pthread_once_t
pthread_t
ptrdiff_t
pull_var_clause_context
pull_varattnos_context
pull_varnos_context
pull_vars_context
pullup_replace_vars_context
pushdown_safety_info
qc_hash_func
qsort_arg_comparator
qsort_comparator
query_pathkeys_callback
radius_attribute
radius_packet
rangeTableEntry_used_context
rank_context
rbt_allocfunc
rbt_combiner
rbt_comparator
rbt_freefunc
reduce_outer_joins_state
reference
regex_arc_t
regex_t
regexp
regexp_matches_ctx
registered_buffer
regmatch_t
regoff_t
regproc
relopt_bool
relopt_enum
relopt_enum_elt_def
relopt_gen
relopt_int
relopt_kind
relopt_parse_elt
relopt_real
relopt_string
relopt_type
relopt_value
relopts_validator
remoteConn
remoteConnHashEnt
remoteDep
rendezvousHashEntry
replace_rte_variables_callback
replace_rte_variables_context
ret_type
rewind_source
rewrite_event
rf_context
rm_detail_t
role_auth_extra
row_security_policy_hook_type
rsv_callback
saophash_hash
save_buffer
scram_state
scram_state_enum
sem_t
sequence_magic
set_join_pathlist_hook_type
set_rel_pathlist_hook_type
shm_mq
shm_mq_handle
shm_mq_iovec
shm_mq_result
shm_toc
shm_toc_entry
shm_toc_estimator
shmem_request_hook_type
shmem_startup_hook_type
sig_atomic_t
sigjmp_buf
signedbitmapword
sigset_t
size_t
slist_head
slist_iter
slist_mutable_iter
slist_node
slock_t
socket_set
socklen_t
spgBulkDeleteState
spgChooseIn
spgChooseOut
spgChooseResultType
spgConfigIn
spgConfigOut
spgInnerConsistentIn
spgInnerConsistentOut
spgLeafConsistentIn
spgLeafConsistentOut
spgNodePtr
spgPickSplitIn
spgPickSplitOut
spgVacPendingItem
spgxlogAddLeaf
spgxlogAddNode
spgxlogMoveLeafs
spgxlogPickSplit
spgxlogSplitTuple
spgxlogState
spgxlogVacuumLeaf
spgxlogVacuumRedirect
spgxlogVacuumRoot
split_pathtarget_context
split_pathtarget_item
sql_error_callback_arg
sqlparseInfo
sqlparseState
ss_lru_item_t
ss_scan_location_t
ss_scan_locations_t
ssize_t
standard_qp_extra
stemmer_module
stmtCacheEntry
storeInfo
storeRes_func
stream_stop_callback
string
substitute_actual_parameters_context
substitute_actual_srf_parameters_context
substitute_phv_relids_context
symbol
tablespaceinfo
teSection
temp_tablespaces_extra
test_re_flags
test_regex_ctx
test_shm_mq_header
test_spec
test_start_function
text
timeKEY
time_t
timeout_handler_proc
timeout_params
timerCA
tlist_vinfo
toast_compress_header
tokenize_error_callback_arg
transferMode
transfer_thread_arg
trgm
trgm_mb_char
trivalue
tsKEY
ts_parserstate
ts_tokenizer
ts_tokentype
tsearch_readline_state
tuplehash_hash
tuplehash_iterator
type
tzEntry
u_char
u_int
uchr
uid_t
uint128
uint16
uint16_t
uint32
uint32_t
uint64
uint64_t
uint8
uint8_t
uintptr_t
unicodeStyleBorderFormat
unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
uuidKEY
uuid_rc_t
uuid_sortsupport_state
uuid_t
va_list
vacuumingOptions
validate_string_relopt
varatt_expanded
varattrib_1b
varattrib_1b_e
varattrib_4b
vbits
verifier_context
walrcv_check_conninfo_fn
walrcv_connect_fn
walrcv_create_slot_fn
walrcv_disconnect_fn
walrcv_endstreaming_fn
walrcv_exec_fn
walrcv_get_backend_pid_fn
walrcv_get_conninfo_fn
walrcv_get_senderinfo_fn
walrcv_identify_system_fn
walrcv_readtimelinehistoryfile_fn
walrcv_receive_fn
walrcv_send_fn
walrcv_server_version_fn
walrcv_startstreaming_fn
wchar2mb_with_len_converter
wchar_t
win32_deadchild_waitinfo
wint_t
worker_state
worktable
wrap
xl_brin_createidx
xl_brin_desummarize
xl_brin_insert
xl_brin_revmap_extend
xl_brin_samepage_update
xl_brin_update
xl_btree_dedup
xl_btree_delete
xl_btree_insert
xl_btree_mark_page_halfdead
xl_btree_metadata
xl_btree_newroot
xl_btree_reuse_page
xl_btree_split
xl_btree_unlink_page
xl_btree_update
xl_btree_vacuum
xl_clog_truncate
xl_commit_ts_truncate
xl_dbase_create_file_copy_rec
xl_dbase_create_wal_log_rec
xl_dbase_drop_rec
xl_end_of_recovery
xl_hash_add_ovfl_page
xl_hash_delete
xl_hash_init_bitmap_page
xl_hash_init_meta_page
xl_hash_insert
xl_hash_move_page_contents
xl_hash_split_allocate_page
xl_hash_split_complete
xl_hash_squeeze_page
xl_hash_update_meta_page
xl_hash_vacuum_one_page
xl_heap_confirm
xl_heap_delete
xl_heap_freeze_page
xl_heap_freeze_plan
xl_heap_freeze_tuple
xl_heap_header
xl_heap_inplace
xl_heap_insert
xl_heap_lock
xl_heap_lock_updated
xl_heap_multi_insert
xl_heap_new_cid
xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
xl_heap_vacuum
xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
xl_logical_message
xl_multi_insert_tuple
xl_multixact_create
xl_multixact_truncate
xl_overwrite_contrecord
xl_parameter_change
xl_relmap_update
xl_replorigin_drop
xl_replorigin_set
xl_restore_point
xl_running_xacts
xl_seq_rec
xl_smgr_create
xl_smgr_truncate
xl_standby_lock
xl_standby_locks
xl_tblspc_create_rec
xl_tblspc_drop_rec
xl_xact_abort
xl_xact_assignment
xl_xact_commit
xl_xact_dbinfo
xl_xact_invals
xl_xact_origin
xl_xact_parsed_abort
xl_xact_parsed_commit
xl_xact_parsed_prepare
xl_xact_prepare
xl_xact_relfilelocators
xl_xact_stats_item
xl_xact_stats_items
xl_xact_subxacts
xl_xact_twophase
xl_xact_xinfo
xmlBuffer
xmlBufferPtr
xmlChar
xmlDocPtr
xmlErrorPtr
xmlExternalEntityLoader
xmlGenericErrorFunc
xmlNodePtr
xmlNodeSetPtr
xmlParserCtxtPtr
xmlParserInputPtr
xmlStructuredErrorFunc
xmlTextWriter
xmlTextWriterPtr
xmlXPathCompExprPtr
xmlXPathContextPtr
xmlXPathObjectPtr
xmltype
xpath_workspace
xsltSecurityPrefsPtr
xsltStylesheetPtr
xsltTransformContextPtr
yy_parser
yy_size_t
yyscan_t
z_stream
z_streamp
zic_t