/*-------------------------------------------------------------------------
*
* execTuples.c
* Routines dealing with TupleTableSlots. These are used for resource
* management associated with tuples (e.g., releasing buffer pins for
* tuples in disk buffers, or freeing the memory occupied by transient
* tuples). Slots also provide access abstraction that lets us implement
* "virtual" tuples to reduce data-copying overhead.
*
* Routines dealing with the type information for tuples. Currently,
* the type information for a tuple is an array of FormData_pg_attribute.
* This information is needed by routines manipulating tuples
* (getattribute, formtuple, etc.).
*
* Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
* src/backend/executor/execTuples.c
*
*-------------------------------------------------------------------------
*/
/*
* INTERFACE ROUTINES
*
* SLOT CREATION/DESTRUCTION
* MakeTupleTableSlot - create an empty slot
* ExecAllocTableSlot - create a slot within a tuple table
* ExecResetTupleTable - clear and optionally delete a tuple table
* MakeSingleTupleTableSlot - make a standalone slot, set its descriptor
* ExecDropSingleTupleTableSlot - destroy a standalone slot
*
* SLOT ACCESSORS
* ExecSetSlotDescriptor - set a slot's tuple descriptor
* ExecStoreTuple - store a physical tuple in the slot
* ExecStoreMinimalTuple - store a minimal physical tuple in the slot
* ExecClearTuple - clear contents of a slot
* ExecStoreVirtualTuple - mark slot as containing a virtual tuple
* ExecCopySlotTuple - build a physical tuple from a slot
* ExecCopySlotMinimalTuple - build a minimal physical tuple from a slot
* ExecMaterializeSlot - convert virtual to physical storage
* ExecCopySlot - copy one slot's contents to another
*
* CONVENIENCE INITIALIZATION ROUTINES
* ExecInitResultTupleSlot \ convenience routines to initialize
* ExecInitScanTupleSlot \ the various tuple slots for nodes
* ExecInitExtraTupleSlot / which store copies of tuples.
* ExecInitNullTupleSlot /
*
* Routines that probably belong somewhere else:
* ExecTypeFromTL - form a TupleDesc from a target list
*
* EXAMPLE OF HOW TABLE ROUTINES WORK
* Suppose we have a query such as SELECT emp.name FROM emp and we have
* a single SeqScan node in the query plan.
*
* At ExecutorStart()
* ----------------
* - ExecInitSeqScan() calls ExecInitScanTupleSlot() and
* ExecInitResultTupleSlot() to construct TupleTableSlots
* for the tuples returned by the access methods and the
* tuples resulting from performing target list projections.
*
* During ExecutorRun()
* ----------------
* - SeqNext() calls ExecStoreTuple() to place the tuple returned
* by the access methods into the scan tuple slot.
*
* - ExecSeqScan() calls ExecStoreTuple() to take the result
* tuple from ExecProject() and place it into the result tuple slot.
*
* - ExecutePlan() calls the output function.
*
* The important thing to watch in the executor code is how pointers
* to the slots containing tuples are passed instead of the tuples
* themselves. This facilitates the communication of related information
* (such as whether or not a tuple should be pfreed, what buffer contains
* this tuple, the tuple's tuple descriptor, etc). It also allows us
* to avoid physically constructing projection tuples in many cases.
*/
#include "postgres.h"
#include "access/htup_details.h"
#include "access/tuptoaster.h"
#include "funcapi.h"
#include "catalog/pg_type.h"
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
static TupleDesc ExecTypeFromTLInternal(List *targetList,
bool hasoid, bool skipjunk);
/* ----------------------------------------------------------------
* tuple table create/delete functions
* ----------------------------------------------------------------
*/
/* --------------------------------
* MakeTupleTableSlot
*
* Basic routine to make an empty TupleTableSlot.
* --------------------------------
*/
TupleTableSlot *
MakeTupleTableSlot(void)
{
TupleTableSlot *slot = makeNode(TupleTableSlot);
slot->tts_isempty = true;
slot->tts_shouldFree = false;
slot->tts_shouldFreeMin = false;
slot->tts_tuple = NULL;
slot->tts_tupleDescriptor = NULL;
slot->tts_mcxt = CurrentMemoryContext;
slot->tts_buffer = InvalidBuffer;
slot->tts_nvalid = 0;
slot->tts_values = NULL;
slot->tts_isnull = NULL;
slot->tts_mintuple = NULL;
return slot;
}
/* --------------------------------
* ExecAllocTableSlot
*
* Create a tuple table slot within a tuple table (which is just a List).
* --------------------------------
*/
TupleTableSlot *
ExecAllocTableSlot(List **tupleTable)
{
TupleTableSlot *slot = MakeTupleTableSlot();
*tupleTable = lappend(*tupleTable, slot);
return slot;
}
/* --------------------------------
* ExecResetTupleTable
*
* This releases any resources (buffer pins, tupdesc refcounts)
* held by the tuple table, and optionally releases the memory
* occupied by the tuple table data structure.
* It is expected that this routine be called by EndPlan().
* --------------------------------
*/
void
ExecResetTupleTable(List *tupleTable, /* tuple table */
bool shouldFree) /* true if we should free memory */
{
ListCell *lc;
foreach(lc, tupleTable)
{
TupleTableSlot *slot = (TupleTableSlot *) lfirst(lc);
/* Sanity checks */
Assert(IsA(slot, TupleTableSlot));
/* Always release resources and reset the slot to empty */
ExecClearTuple(slot);
if (slot->tts_tupleDescriptor)
{
ReleaseTupleDesc(slot->tts_tupleDescriptor);
slot->tts_tupleDescriptor = NULL;
}
/* If shouldFree, release memory occupied by the slot itself */
if (shouldFree)
{
if (slot->tts_values)
pfree(slot->tts_values);
if (slot->tts_isnull)
pfree(slot->tts_isnull);
pfree(slot);
}
}
/* If shouldFree, release the list structure */
if (shouldFree)
list_free(tupleTable);
}
/* --------------------------------
* MakeSingleTupleTableSlot
*
* This is a convenience routine for operations that need a
* standalone TupleTableSlot not gotten from the main executor
* tuple table. It makes a single slot and initializes it
* to use the given tuple descriptor.
* --------------------------------
*/
TupleTableSlot *
MakeSingleTupleTableSlot(TupleDesc tupdesc)
{
TupleTableSlot *slot = MakeTupleTableSlot();
ExecSetSlotDescriptor(slot, tupdesc);
return slot;
}
/* --------------------------------
* ExecDropSingleTupleTableSlot
*
* Release a TupleTableSlot made with MakeSingleTupleTableSlot.
* DON'T use this on a slot that's part of a tuple table list!
* --------------------------------
*/
void
ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
{
/* This should match ExecResetTupleTable's processing of one slot */
Assert(IsA(slot, TupleTableSlot));
ExecClearTuple(slot);
if (slot->tts_tupleDescriptor)
ReleaseTupleDesc(slot->tts_tupleDescriptor);
if (slot->tts_values)
pfree(slot->tts_values);
if (slot->tts_isnull)
pfree(slot->tts_isnull);
pfree(slot);
}
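/*
 * Illustrative sketch only, not part of the executor: one way a caller
 * might use a standalone slot, following the two routines above.  The
 * function name example_standalone_slot and its arguments are hypothetical.
 * It assumes "tuple" is a palloc'd HeapTuple matching "tupdesc", and that
 * the caller keeps ownership of both.
 */
#ifdef NOT_USED
static void
example_standalone_slot(TupleDesc tupdesc, HeapTuple tuple)
{
    TupleTableSlot *slot = MakeSingleTupleTableSlot(tupdesc);

    /* Not on a disk page and still owned by the caller: no pin, no pfree */
    ExecStoreTuple(tuple, slot, InvalidBuffer, false);

    /* ... examine the slot's contents here ... */

    /* Clears the slot, releases the descriptor reference, frees the slot */
    ExecDropSingleTupleTableSlot(slot);
}
#endif   /* NOT_USED */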
/* ----------------------------------------------------------------
* tuple table slot accessor functions
* ----------------------------------------------------------------
*/
/* --------------------------------
* ExecSetSlotDescriptor
*
* This function is used to set the tuple descriptor associated
* with the slot's tuple. The passed descriptor must have lifespan
* at least equal to the slot's. If it is a reference-counted descriptor
* then the reference count is incremented for as long as the slot holds
* a reference.
* --------------------------------
*/
void
ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
TupleDesc tupdesc) /* new tuple descriptor */
{
/* For safety, make sure slot is empty before changing it */
ExecClearTuple(slot);
/*
* Release any old descriptor. Also release old Datum/isnull arrays if
* present (we don't bother to check if they could be re-used).
*/
if (slot->tts_tupleDescriptor)
ReleaseTupleDesc(slot->tts_tupleDescriptor);
if (slot->tts_values)
pfree(slot->tts_values);
if (slot->tts_isnull)
pfree(slot->tts_isnull);
/*
* Install the new descriptor; if it's refcounted, bump its refcount.
*/
slot->tts_tupleDescriptor = tupdesc;
PinTupleDesc(tupdesc);
/*
* Allocate Datum/isnull arrays of the appropriate size. These must have
* the same lifetime as the slot, so allocate in the slot's own context.
*/
slot->tts_values = (Datum *)
MemoryContextAlloc(slot->tts_mcxt, tupdesc->natts * sizeof(Datum));
slot->tts_isnull = (bool *)
MemoryContextAlloc(slot->tts_mcxt, tupdesc->natts * sizeof(bool));
}
/* --------------------------------
* ExecStoreTuple
*
* This function is used to store a physical tuple into a specified
* slot in the tuple table.
*
* tuple: tuple to store
* slot: slot to store it in
* buffer: disk buffer if tuple is in a disk page, else InvalidBuffer
* shouldFree: true if ExecClearTuple should pfree() the tuple
* when done with it
*
* If 'buffer' is not InvalidBuffer, the tuple table code acquires a pin
* on the buffer which is held until the slot is cleared, so that the tuple
* won't go away on us.
*
* shouldFree is normally set 'true' for tuples constructed on-the-fly.
* It must always be 'false' for tuples that are stored in disk pages,
* since we don't want to try to pfree those.
*
* Another case where it is 'false' is when the referenced tuple is held
* in a tuple table slot belonging to a lower-level executor Proc node.
* In this case the lower-level slot retains ownership and responsibility
* for eventually releasing the tuple. When this method is used, we must
* be certain that the upper-level Proc node will lose interest in the tuple
* sooner than the lower-level one does! If you're not certain, copy the
* lower-level tuple with heap_copytuple and let the upper-level table
* slot assume ownership of the copy!
*
* Return value is just the passed-in slot pointer.
*
* NOTE: before PostgreSQL 8.1, this function would accept a NULL tuple
* pointer and effectively behave like ExecClearTuple (though you could
* still specify a buffer to pin, which would be an odd combination).
* This saved a couple lines of code in a few places, but seemed more likely
* to mask logic errors than to be really useful, so it's now disallowed.
* --------------------------------
*/
TupleTableSlot *
ExecStoreTuple(HeapTuple tuple,
TupleTableSlot *slot,
Buffer buffer,
bool shouldFree)
{
/*
* sanity checks
*/
Assert(tuple != NULL);
Assert(slot != NULL);
Assert(slot->tts_tupleDescriptor != NULL);
/* passing shouldFree=true for a tuple on a disk page is not sane */
Assert(BufferIsValid(buffer) ? (!shouldFree) : true);
/*
* Free any old physical tuple belonging to the slot.
*/
if (slot->tts_shouldFree)
heap_freetuple(slot->tts_tuple);
if (slot->tts_shouldFreeMin)
heap_free_minimal_tuple(slot->tts_mintuple);
/*
* Store the new tuple into the specified slot.
*/
slot->tts_isempty = false;
slot->tts_shouldFree = shouldFree;
slot->tts_shouldFreeMin = false;
slot->tts_tuple = tuple;
slot->tts_mintuple = NULL;
/* Mark extracted state invalid */
slot->tts_nvalid = 0;
/*
* If tuple is on a disk page, keep the page pinned as long as we hold a
* pointer into it. We assume the caller already has such a pin.
*
* This is coded to optimize the case where the slot previously held a
* tuple on the same disk page: in that case releasing and re-acquiring
* the pin is a waste of cycles. This is a common situation during
* seqscans, so it's worth troubling over.
*/
if (slot->tts_buffer != buffer)
{
if (BufferIsValid(slot->tts_buffer))
ReleaseBuffer(slot->tts_buffer);
slot->tts_buffer = buffer;
if (BufferIsValid(buffer))
IncrBufferRefCount(buffer);
}
return slot;
}
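/*
 * Illustrative sketch only, not part of the executor: the "if you're not
 * certain, copy" advice from the header comment of ExecStoreTuple.  The
 * function name example_store_copy is hypothetical.  The copy is palloc'd
 * in the current memory context, which is assumed here to live at least as
 * long as the slot holds the tuple.  Because shouldFree is true, the slot
 * owns the copy and frees it on the next ExecClearTuple or store.
 */
#ifdef NOT_USED
static TupleTableSlot *
example_store_copy(TupleTableSlot *upperSlot, HeapTuple lowerTuple)
{
    HeapTuple   copy = heap_copytuple(lowerTuple);

    /* The copy is not in a disk buffer, so pass InvalidBuffer */
    return ExecStoreTuple(copy, upperSlot, InvalidBuffer, true);
}
#endif   /* NOT_USED */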
/* --------------------------------
* ExecStoreMinimalTuple
*
* Like ExecStoreTuple, but insert a "minimal" tuple into the slot.
*
* No 'buffer' parameter since minimal tuples are never stored in relations.
* --------------------------------
*/
TupleTableSlot *
ExecStoreMinimalTuple(MinimalTuple mtup,
TupleTableSlot *slot,
bool shouldFree)
{
/*
* sanity checks
*/
Assert(mtup != NULL);
Assert(slot != NULL);
Assert(slot->tts_tupleDescriptor != NULL);
/*
* Free any old physical tuple belonging to the slot.
*/
if (slot->tts_shouldFree)
heap_freetuple(slot->tts_tuple);
if (slot->tts_shouldFreeMin)
heap_free_minimal_tuple(slot->tts_mintuple);
/*
* Drop the pin on the referenced buffer, if there is one.
*/
if (BufferIsValid(slot->tts_buffer))
ReleaseBuffer(slot->tts_buffer);
slot->tts_buffer = InvalidBuffer;
/*
* Store the new tuple into the specified slot.
*/
slot->tts_isempty = false;
slot->tts_shouldFree = false;
slot->tts_shouldFreeMin = shouldFree;
slot->tts_tuple = &slot->tts_minhdr;
slot->tts_mintuple = mtup;
slot->tts_minhdr.t_len = mtup->t_len + MINIMAL_TUPLE_OFFSET;
slot->tts_minhdr.t_data = (HeapTupleHeader) ((char *) mtup - MINIMAL_TUPLE_OFFSET);
/* no need to set t_self or t_tableOid since we won't allow access */
/* Mark extracted state invalid */
slot->tts_nvalid = 0;
return slot;
}
/* --------------------------------
* ExecClearTuple
*
* This function is used to clear out a slot in the tuple table.
*
* NB: only the tuple is cleared, not the tuple descriptor (if any).
* --------------------------------
*/
TupleTableSlot * /* return: slot passed */
ExecClearTuple(TupleTableSlot *slot) /* slot to clear */
{
/*
* sanity checks
*/
Assert(slot != NULL);
/*
* Free the old physical tuple if necessary.
*/
if (slot->tts_shouldFree)
heap_freetuple(slot->tts_tuple);
if (slot->tts_shouldFreeMin)
heap_free_minimal_tuple(slot->tts_mintuple);
slot->tts_tuple = NULL;
slot->tts_mintuple = NULL;
slot->tts_shouldFree = false;
slot->tts_shouldFreeMin = false;
/*
* Drop the pin on the referenced buffer, if there is one.
*/
if (BufferIsValid(slot->tts_buffer))
ReleaseBuffer(slot->tts_buffer);
slot->tts_buffer = InvalidBuffer;
/*
* Mark it empty.
*/
slot->tts_isempty = true;
slot->tts_nvalid = 0;
return slot;
}
/* --------------------------------
* ExecStoreVirtualTuple
* Mark a slot as containing a virtual tuple.
*
* The protocol for loading a slot with virtual tuple data is:
* * Call ExecClearTuple to mark the slot empty.
* * Store data into the Datum/isnull arrays.
* * Call ExecStoreVirtualTuple to mark the slot valid.
* This is a bit unclean but it avoids one round of data copying.
* --------------------------------
*/
TupleTableSlot *
ExecStoreVirtualTuple(TupleTableSlot *slot)
{
/*
* sanity checks
*/
Assert(slot != NULL);
Assert(slot->tts_tupleDescriptor != NULL);
Assert(slot->tts_isempty);
slot->tts_isempty = false;
slot->tts_nvalid = slot->tts_tupleDescriptor->natts;
return slot;
}
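/*
 * Illustrative sketch only, not part of the executor: the three-step
 * protocol described above for loading a virtual tuple.  The function name
 * example_store_virtual is hypothetical.  It assumes the slot's descriptor
 * has exactly two columns, the first of type int4; the second column is
 * stored as NULL.
 */
#ifdef NOT_USED
static TupleTableSlot *
example_store_virtual(TupleTableSlot *slot, int32 a)
{
    /* Step 1: mark the slot empty, releasing any old tuple or buffer pin */
    ExecClearTuple(slot);

    /* Step 2: fill the Datum/isnull arrays directly */
    slot->tts_values[0] = Int32GetDatum(a);
    slot->tts_isnull[0] = false;
    slot->tts_values[1] = (Datum) 0;
    slot->tts_isnull[1] = true;

    /* Step 3: mark the slot valid; tts_nvalid becomes natts */
    return ExecStoreVirtualTuple(slot);
}
#endif   /* NOT_USED */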
/* --------------------------------
* ExecStoreAllNullTuple
* Set up the slot to contain a null in every column.
*
* At first glance this might sound just like ExecClearTuple, but it's
* entirely different: the slot ends up full, not empty.
* --------------------------------
*/
TupleTableSlot *
ExecStoreAllNullTuple(TupleTableSlot *slot)
{
/*
* sanity checks
*/
Assert(slot != NULL);
Assert(slot->tts_tupleDescriptor != NULL);
/* Clear any old contents */
ExecClearTuple(slot);
/*
* Fill all the columns of the virtual tuple with nulls
*/
MemSet(slot->tts_values, 0,
slot->tts_tupleDescriptor->natts * sizeof(Datum));
memset(slot->tts_isnull, true,
slot->tts_tupleDescriptor->natts * sizeof(bool));
return ExecStoreVirtualTuple(slot);
}
/* --------------------------------
* ExecCopySlotTuple
* Obtain a copy of a slot's regular physical tuple. The copy is
* palloc'd in the current memory context.
* The slot itself is undisturbed.
*
* This works even if the slot contains a virtual or minimal tuple;
* however the "system columns" of the result will not be meaningful.
* --------------------------------
*/
HeapTuple
ExecCopySlotTuple(TupleTableSlot *slot)
{
/*
* sanity checks
*/
Assert(slot != NULL);
Assert(!slot->tts_isempty);
/*
* If we have a physical tuple (either format) then just copy it.
*/
if (TTS_HAS_PHYSICAL_TUPLE(slot))
return heap_copytuple(slot->tts_tuple);
if (slot->tts_mintuple)
return heap_tuple_from_minimal_tuple(slot->tts_mintuple);
/*
* Otherwise we need to build a tuple from the Datum array.
*/
return heap_form_tuple(slot->tts_tupleDescriptor,
slot->tts_values,
slot->tts_isnull);
}
/* --------------------------------
* ExecCopySlotMinimalTuple
* Obtain a copy of a slot's minimal physical tuple. The copy is
* palloc'd in the current memory context.
* The slot itself is undisturbed.
* --------------------------------
*/
MinimalTuple
ExecCopySlotMinimalTuple(TupleTableSlot *slot)
{
/*
* sanity checks
*/
Assert(slot != NULL);
Assert(!slot->tts_isempty);
/*
* If we have a physical tuple then just copy it. Prefer to copy
* tts_mintuple since that's a tad cheaper.
*/
if (slot->tts_mintuple)
return heap_copy_minimal_tuple(slot->tts_mintuple);
if (slot->tts_tuple)
return minimal_tuple_from_heap_tuple(slot->tts_tuple);
/*
* Otherwise we need to build a tuple from the Datum array.
*/
return heap_form_minimal_tuple(slot->tts_tupleDescriptor,
slot->tts_values,
slot->tts_isnull);
}
/* --------------------------------
* ExecFetchSlotTuple
* Fetch the slot's regular physical tuple.
*
* If the slot contains a virtual tuple, we convert it to physical
* form. The slot retains ownership of the physical tuple.
* If it contains a minimal tuple we convert to regular form and store
* that in addition to the minimal tuple (not instead of, because
* callers may hold pointers to Datums within the minimal tuple).
*
* The main difference between this and ExecMaterializeSlot() is that this
* does not guarantee that the contained tuple is local storage.
* Hence, the result must be treated as read-only.
* --------------------------------
*/
HeapTuple
ExecFetchSlotTuple(TupleTableSlot *slot)
{
/*
* sanity checks
*/
Assert(slot != NULL);
Assert(!slot->tts_isempty);
/*
* If we have a regular physical tuple then just return it.
*/
if (TTS_HAS_PHYSICAL_TUPLE(slot))
return slot->tts_tuple;
/*
* Otherwise materialize the slot...
*/
return ExecMaterializeSlot(slot);
}
/* --------------------------------
* ExecFetchSlotMinimalTuple
* Fetch the slot's minimal physical tuple.
*
* If the slot contains a virtual tuple, we convert it to minimal
* physical form. The slot retains ownership of the minimal tuple.
* If it contains a regular tuple we convert to minimal form and store
* that in addition to the regular tuple (not instead of, because
* callers may hold pointers to Datums within the regular tuple).
*
* As above, the result must be treated as read-only.
* --------------------------------
*/
MinimalTuple
ExecFetchSlotMinimalTuple(TupleTableSlot *slot)
{
MemoryContext oldContext;
/*
* sanity checks
*/
Assert(slot != NULL);
Assert(!slot->tts_isempty);
/*
* If we have a minimal physical tuple (local or not) then just return it.
*/
if (slot->tts_mintuple)
return slot->tts_mintuple;
/*
* Otherwise, copy or build a minimal tuple, and store it into the slot.
*
* We may be called in a context that is shorter-lived than the tuple
* slot, but we have to ensure that the materialized tuple will survive
* anyway.
*/
oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
slot->tts_mintuple = ExecCopySlotMinimalTuple(slot);
slot->tts_shouldFreeMin = true;
MemoryContextSwitchTo(oldContext);
/*
* Note: we may now have a situation where we have a local minimal tuple
* attached to a virtual or non-local physical tuple. There seems no harm
* in that at the moment, but if any materializes, we should change this
* function to force the slot into minimal-tuple-only state.
*/
return slot->tts_mintuple;
}
/* --------------------------------
* ExecFetchSlotTupleDatum
* Fetch the slot's tuple as a composite-type Datum.
*
* The result is always freshly palloc'd in the caller's memory context.
* --------------------------------
*/
Datum
ExecFetchSlotTupleDatum(TupleTableSlot *slot)
{
HeapTuple tup;
TupleDesc tupdesc;
	/* Fetch slot's contents in regular-physical-tuple form */
	tup = ExecFetchSlotTuple(slot);
	tupdesc = slot->tts_tupleDescriptor;

	/* Convert to Datum form */
	return heap_copy_tuple_as_datum(tup, tupdesc);
}

/* --------------------------------
 *		ExecMaterializeSlot
 *			Force a slot into the "materialized" state.
 *
 *		This causes the slot's tuple to be a local copy not dependent on
 *		any external storage. A pointer to the contained tuple is returned.
 *
 *		A typical use for this operation is to prepare a computed tuple
 *		for being stored on disk. The original data may or may not be
 *		virtual, but in any case we need a private copy for heap_insert
 *		to scribble on.
 * --------------------------------
 */
HeapTuple
ExecMaterializeSlot(TupleTableSlot *slot)
{
	MemoryContext oldContext;

	/*
	 * sanity checks
	 */
	Assert(slot != NULL);
	Assert(!slot->tts_isempty);

	/*
	 * If we have a regular physical tuple, and it's locally palloc'd, we have
	 * nothing to do.
	 */
	if (slot->tts_tuple && slot->tts_shouldFree)
		return slot->tts_tuple;

	/*
	 * Otherwise, copy or build a physical tuple, and store it into the slot.
	 *
	 * We may be called in a context that is shorter-lived than the tuple
	 * slot, but we have to ensure that the materialized tuple will survive
	 * anyway.
	 */
	oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
	slot->tts_tuple = ExecCopySlotTuple(slot);
	slot->tts_shouldFree = true;
	MemoryContextSwitchTo(oldContext);

	/*
	 * Drop the pin on the referenced buffer, if there is one.
	 */
	if (BufferIsValid(slot->tts_buffer))
		ReleaseBuffer(slot->tts_buffer);

	slot->tts_buffer = InvalidBuffer;

	/*
	 * Mark extracted state invalid. This is important because the slot is
	 * not supposed to depend any more on the previous external data; we
	 * mustn't leave any dangling pass-by-reference datums in tts_values.
	 * However, we have not actually invalidated any such datums, if there
	 * happen to be any previously fetched from the slot. (Note in particular
	 * that we have not pfree'd tts_mintuple, if there is one.)
	 */
	slot->tts_nvalid = 0;

	/*
	 * On the same principle of not depending on previous remote storage,
	 * forget the mintuple if it's not local storage. (If it is local
	 * storage, we must not pfree it now, since callers might have already
	 * fetched datum pointers referencing it.)
	 */
	if (!slot->tts_shouldFreeMin)
		slot->tts_mintuple = NULL;

	return slot->tts_tuple;
}
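
/*
 * Illustration only (not part of execTuples.c): a minimal standalone sketch
 * of the "materialize" idea used by ExecMaterializeSlot. It assumes a
 * hypothetical DemoSlot struct and plain malloc() in place of the real
 * TupleTableSlot and the slot's memory context, and it ignores buffer pins
 * and minimal tuples entirely. The point it tries to show is the ownership
 * rule: if the slot already owns its tuple nothing happens, otherwise the
 * slot takes a private copy and forgets any cached column values.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for TupleTableSlot. */
typedef struct DemoSlot
{
	char	   *tuple;			/* current tuple payload (a C string here) */
	bool		should_free;	/* true if the slot owns 'tuple' */
	int			nvalid;			/* number of cached extracted columns */
} DemoSlot;

/* Force the slot to hold a private copy of its tuple; return that copy. */
static char *
demo_materialize(DemoSlot *slot)
{
	size_t		len;
	char	   *copy;

	assert(slot != NULL);
	assert(slot->tuple != NULL);

	/* Already a private copy: nothing to do. */
	if (slot->should_free)
		return slot->tuple;

	/* Copy the externally owned data into storage the slot owns. */
	len = strlen(slot->tuple) + 1;
	copy = (char *) malloc(len);
	assert(copy != NULL);
	memcpy(copy, slot->tuple, len);

	slot->tuple = copy;
	slot->should_free = true;

	/* Cached column values referred to the old storage; invalidate them. */
	slot->nvalid = 0;

	return slot->tuple;
}
```

/*
 * In this sketch the caller releases the copy with free(); in the real code
 * the copy lives in slot->tts_mcxt and is released when the slot is cleared.
 */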

/* --------------------------------
 *		ExecCopySlot
 *			Copy the source slot's contents into the destination slot.
 *
 *		The destination acquires a private copy that will not go away
 *		if the source is cleared.
 *
 *		The caller must ensure the slots have compatible tupdescs.
 * --------------------------------
 */
TupleTableSlot *
ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
{
	HeapTuple	newTuple;
	MemoryContext oldContext;

	/*
	 * There might be ways to optimize this when the source is virtual, but
	 * for now just always build a physical copy. Make sure it is in the
	 * right context.
	 */
	oldContext = MemoryContextSwitchTo(dstslot->tts_mcxt);
	newTuple = ExecCopySlotTuple(srcslot);
	MemoryContextSwitchTo(oldContext);

	return ExecStoreTuple(newTuple, dstslot, InvalidBuffer, true);
}

/* ----------------------------------------------------------------
 *				convenience initialization routines
 * ----------------------------------------------------------------
 */

/* --------------------------------
 *		ExecInit{Result,Scan,Extra}TupleSlot
 *
 *		These are convenience routines to initialize the specified slot
 *		in nodes inheriting the appropriate state. ExecInitExtraTupleSlot
 *		is used for initializing special-purpose slots.
 * --------------------------------
 */

/* ----------------
 *		ExecInitResultTupleSlot
 * ----------------
 */
void
ExecInitResultTupleSlot(EState *estate, PlanState *planstate)
{
	planstate->ps_ResultTupleSlot = ExecAllocTableSlot(&estate->es_tupleTable);
}

/* ----------------
 *		ExecInitScanTupleSlot
 * ----------------
 */
void
ExecInitScanTupleSlot(EState *estate, ScanState *scanstate)
{
	scanstate->ss_ScanTupleSlot = ExecAllocTableSlot(&estate->es_tupleTable);
}

/* ----------------
 *		ExecInitExtraTupleSlot
 * ----------------
 */
TupleTableSlot *
ExecInitExtraTupleSlot(EState *estate)
{
	return ExecAllocTableSlot(&estate->es_tupleTable);
}

/* ----------------
 *		ExecInitNullTupleSlot
 *
 * Build a slot containing an all-nulls tuple of the given type.
 * This is used as a substitute for an input tuple when performing an
 * outer join.
 * ----------------
 */
TupleTableSlot *
ExecInitNullTupleSlot(EState *estate, TupleDesc tupType)
{
	TupleTableSlot *slot = ExecInitExtraTupleSlot(estate);

	ExecSetSlotDescriptor(slot, tupType);

	return ExecStoreAllNullTuple(slot);
}

/* ----------------------------------------------------------------
 *		ExecTypeFromTL
 *
 *		Generate a tuple descriptor for the result tuple of a targetlist.
 *		(A parse/plan tlist must be passed, not an ExprState tlist.)
 *		Note that resjunk columns, if any, are included in the result.
 *
 *		Currently there are about 4 different places where we create
 *		TupleDescriptors. They should all be merged, or perhaps
 *		be rewritten to call BuildDesc().
 * ----------------------------------------------------------------
 */
TupleDesc
ExecTypeFromTL(List *targetList, bool hasoid)
{
	return ExecTypeFromTLInternal(targetList, hasoid, false);
}

/* ----------------------------------------------------------------
 *		ExecCleanTypeFromTL
 *
 *		Same as above, but resjunk columns are omitted from the result.
 * ----------------------------------------------------------------
 */
TupleDesc
ExecCleanTypeFromTL(List *targetList, bool hasoid)
{
	return ExecTypeFromTLInternal(targetList, hasoid, true);
}

static TupleDesc
ExecTypeFromTLInternal(List *targetList, bool hasoid, bool skipjunk)
{
	TupleDesc	typeInfo;
	ListCell   *l;
	int			len;
	int			cur_resno = 1;

	if (skipjunk)
		len = ExecCleanTargetListLength(targetList);
	else
		len = ExecTargetListLength(targetList);
	typeInfo = CreateTemplateTupleDesc(len, hasoid);

	foreach(l, targetList)
	{
		TargetEntry *tle = lfirst(l);

		if (skipjunk && tle->resjunk)
			continue;
		TupleDescInitEntry(typeInfo,
						   cur_resno,
						   tle->resname,
						   exprType((Node *) tle->expr),
						   exprTypmod((Node *) tle->expr),
						   0);
		TupleDescInitEntryCollation(typeInfo,
									cur_resno,
									exprCollation((Node *) tle->expr));
		cur_resno++;
	}

	return typeInfo;
}

/*
 * ExecTypeFromExprList - build a tuple descriptor from a list of Exprs
 *
 * This is roughly like ExecTypeFromTL, but we work from bare expressions
 * not TargetEntrys. No names are attached to the tupledesc's columns.
 */
TupleDesc
ExecTypeFromExprList(List *exprList)
{
	TupleDesc	typeInfo;
	ListCell   *lc;
	int			cur_resno = 1;

	typeInfo = CreateTemplateTupleDesc(list_length(exprList), false);

	foreach(lc, exprList)
	{
		Node	   *e = lfirst(lc);

		TupleDescInitEntry(typeInfo,
						   cur_resno,
						   NULL,
						   exprType(e),
						   exprTypmod(e),
						   0);
		TupleDescInitEntryCollation(typeInfo,
									cur_resno,
									exprCollation(e));
		cur_resno++;
	}

	return typeInfo;
}

/*
 * ExecTypeSetColNames - set column names in a TupleDesc
 *
 * Column names must be provided as an alias list (list of String nodes).
 *
 * For some callers, the supplied tupdesc has a named rowtype (not RECORD)
 * and it is moderately likely that the alias list matches the column names
 * already present in the tupdesc. If we do change any column names then
 * we must reset the tupdesc's type to anonymous RECORD; but we avoid doing
 * so if no names change.
 */
void
ExecTypeSetColNames(TupleDesc typeInfo, List *namesList)
{
	bool		modified = false;
	int			colno = 0;
	ListCell   *lc;

	foreach(lc, namesList)
	{
		char	   *cname = strVal(lfirst(lc));
		Form_pg_attribute attr;

		/* Guard against too-long names list */
		if (colno >= typeInfo->natts)
			break;
		attr = typeInfo->attrs[colno++];

		/* Ignore empty aliases (these must be for dropped columns) */
		if (cname[0] == '\0')
			continue;

		/* Change tupdesc only if alias is actually different */
		if (strcmp(cname, NameStr(attr->attname)) != 0)
		{
			namestrcpy(&(attr->attname), cname);
			modified = true;
		}
	}

	/* If we modified the tupdesc, it's now a new record type */
	if (modified)
	{
		typeInfo->tdtypeid = RECORDOID;
		typeInfo->tdtypmod = -1;
	}
}
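
/*
 * Illustration only (not part of execTuples.c): a standalone sketch of the
 * rule described for ExecTypeSetColNames, namely that a descriptor keeps its
 * named rowtype unless some alias actually differs, in which case it is
 * demoted to anonymous RECORD. DemoDesc, demo_set_col_names, DEMO_MAXATTS
 * and DEMO_NAMELEN are hypothetical names made up for this sketch; the
 * aliases are passed as a plain C array rather than a List of String nodes,
 * and DEMO_RECORDOID merely stands in for RECORDOID.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAXATTS	4
#define DEMO_NAMELEN	64
#define DEMO_RECORDOID	2249	/* stand-in for RECORDOID in this sketch */

/* Hypothetical, simplified stand-in for a tuple descriptor. */
typedef struct DemoDesc
{
	int			natts;			/* number of columns in use */
	unsigned int type_oid;		/* rowtype identifier */
	int			typmod;			/* rowtype modifier */
	char		names[DEMO_MAXATTS][DEMO_NAMELEN];
} DemoDesc;

/* Apply aliases to the descriptor; return true if any name changed. */
static bool
demo_set_col_names(DemoDesc *desc, const char *const *aliases, int naliases)
{
	bool		modified = false;
	int			i;

	assert(desc != NULL);
	assert(desc->natts <= DEMO_MAXATTS);

	/* Stop at whichever runs out first: aliases or columns. */
	for (i = 0; i < naliases && i < desc->natts; i++)
	{
		const char *alias = aliases[i];

		/* An empty alias marks a dropped column; leave it alone. */
		if (alias[0] == '\0')
			continue;

		/* Touch the descriptor only if the alias really differs. */
		if (strcmp(alias, desc->names[i]) != 0)
		{
			strncpy(desc->names[i], alias, DEMO_NAMELEN - 1);
			desc->names[i][DEMO_NAMELEN - 1] = '\0';
			modified = true;
		}
	}

	/* Any rename turns the descriptor into an anonymous record type. */
	if (modified)
	{
		desc->type_oid = DEMO_RECORDOID;
		desc->typmod = -1;
	}

	return modified;
}
```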

/*
 * BlessTupleDesc - make a completed tuple descriptor useful for SRFs
 *
 * Rowtype Datums returned by a function must contain valid type information.
 * This happens "for free" if the tupdesc came from a relcache entry, but
 * not if we have manufactured a tupdesc for a transient RECORD datatype.
 * In that case we have to notify typcache.c of the existence of the type.
 */
TupleDesc
BlessTupleDesc(TupleDesc tupdesc)
{
	if (tupdesc->tdtypeid == RECORDOID &&
		tupdesc->tdtypmod < 0)
		assign_record_type_typmod(tupdesc);

	return tupdesc;				/* just for notational convenience */
}

/*
 * TupleDescGetSlot - Initialize a slot based on the supplied tupledesc
 *
 * Note: this is obsolete; it is sufficient to call BlessTupleDesc on
 * the tupdesc. We keep it around just for backwards compatibility with
 * existing user-written SRFs.
 */
TupleTableSlot *
TupleDescGetSlot(TupleDesc tupdesc)
{
	TupleTableSlot *slot;

	/* The useful work is here */
	BlessTupleDesc(tupdesc);

	/* Make a standalone slot */
	slot = MakeSingleTupleTableSlot(tupdesc);

	/* Return the slot */
	return slot;
}

/*
 * TupleDescGetAttInMetadata - Build an AttInMetadata structure based on the
 * supplied TupleDesc. AttInMetadata can be used in conjunction with C strings
 * to produce a properly formed tuple.
 */
AttInMetadata *
TupleDescGetAttInMetadata(TupleDesc tupdesc)
{
	int			natts = tupdesc->natts;
	int			i;
	Oid			atttypeid;
	Oid			attinfuncid;
	FmgrInfo   *attinfuncinfo;
	Oid		   *attioparams;
	int32	   *atttypmods;
	AttInMetadata *attinmeta;

	attinmeta = (AttInMetadata *) palloc(sizeof(AttInMetadata));

	/* "Bless" the tupledesc so that we can make rowtype datums with it */
	attinmeta->tupdesc = BlessTupleDesc(tupdesc);

	/*
	 * Gather info needed later to call the "in" function for each attribute
	 */
	attinfuncinfo = (FmgrInfo *) palloc0(natts * sizeof(FmgrInfo));
	attioparams = (Oid *) palloc0(natts * sizeof(Oid));
	atttypmods = (int32 *) palloc0(natts * sizeof(int32));

	for (i = 0; i < natts; i++)
	{
		/* Ignore dropped attributes */
		if (!tupdesc->attrs[i]->attisdropped)
		{
			atttypeid = tupdesc->attrs[i]->atttypid;
			getTypeInputInfo(atttypeid, &attinfuncid, &attioparams[i]);
			fmgr_info(attinfuncid, &attinfuncinfo[i]);
			atttypmods[i] = tupdesc->attrs[i]->atttypmod;
		}
	}
	attinmeta->attinfuncs = attinfuncinfo;
	attinmeta->attioparams = attioparams;
	attinmeta->atttypmods = atttypmods;

	return attinmeta;
}

/*
 * BuildTupleFromCStrings - build a HeapTuple given user data in C string form.
 * values is an array of C strings, one for each attribute of the return tuple.
 * A NULL string pointer indicates we want to create a NULL field.
 */
HeapTuple
BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
{
	TupleDesc	tupdesc = attinmeta->tupdesc;
	int			natts = tupdesc->natts;
	Datum	   *dvalues;
	bool	   *nulls;
	int			i;
	HeapTuple	tuple;

	dvalues = (Datum *) palloc(natts * sizeof(Datum));
	nulls = (bool *) palloc(natts * sizeof(bool));

	/* Call the "in" function for each non-dropped attribute */
	for (i = 0; i < natts; i++)
	{
		if (!tupdesc->attrs[i]->attisdropped)
		{
			/* Non-dropped attributes */
			dvalues[i] = InputFunctionCall(&attinmeta->attinfuncs[i],
										   values[i],
										   attinmeta->attioparams[i],
										   attinmeta->atttypmods[i]);
			if (values[i] != NULL)
				nulls[i] = false;
			else
				nulls[i] = true;
		}
		else
		{
			/* Handle dropped attributes by setting to NULL */
			dvalues[i] = (Datum) 0;
			nulls[i] = true;
		}
	}

	/*
	 * Form a tuple
	 */
	tuple = heap_form_tuple(tupdesc, dvalues, nulls);

	/*
	 * Release locally palloc'd space. XXX would probably be good to pfree
	 * values of pass-by-reference datums, as well.
	 */
	pfree(dvalues);
	pfree(nulls);

	return tuple;
}
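
/*
 * Illustration only (not part of execTuples.c): a standalone sketch of the
 * values/nulls convention used by BuildTupleFromCStrings, where a NULL
 * string pointer yields a NULL field. demo_parse_int_columns is a
 * hypothetical helper that assumes every column is an integer and uses
 * strtol() where the real code calls each type's input function. One
 * deliberate simplification: the sketch skips parsing for NULL strings,
 * whereas the real function still passes them to InputFunctionCall.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Convert an array of C strings into parallel value and null-flag arrays.
 * The caller supplies dvalues[] and nulls[], each with room for natts items.
 */
static void
demo_parse_int_columns(char **values, int natts, long *dvalues, bool *nulls)
{
	int			i;

	assert(values != NULL && dvalues != NULL && nulls != NULL);

	for (i = 0; i < natts; i++)
	{
		if (values[i] != NULL)
		{
			/* Non-NULL string: parse it and mark the column as present. */
			dvalues[i] = strtol(values[i], NULL, 10);
			nulls[i] = false;
		}
		else
		{
			/* NULL string pointer: store a dummy value and flag it NULL. */
			dvalues[i] = 0;
			nulls[i] = true;
		}
	}
}
```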
/*
* HeapTupleHeaderGetDatum - convert a HeapTupleHeader pointer to a Datum.
*
* This must *not* get applied to an on-disk tuple; the tuple should be
* freshly made by heap_form_tuple or some wrapper routine for it (such as
* BuildTupleFromCStrings). Be sure also that the tupledesc used to build
* the tuple has a properly "blessed" rowtype.
*
* Formerly this was a macro equivalent to PointerGetDatum, relying on the
* fact that heap_form_tuple fills in the appropriate tuple header fields
* for a composite Datum. However, we now require that composite Datums not
* contain any external TOAST pointers. We do not want heap_form_tuple itself
* to enforce that; more specifically, the rule applies only to actual Datums
* and not to HeapTuple structures. Therefore, HeapTupleHeaderGetDatum is
* now a function that detects whether there are externally-toasted fields
* and constructs a new tuple with inlined fields if so. We still need
* heap_form_tuple to insert the Datum header fields, because otherwise this
* code would have no way to obtain a tupledesc for the tuple.
*
* Note that if we do build a new tuple, it's palloc'd in the current
* memory context. Beware of code that changes context between the initial
* heap_form_tuple/etc call and calling HeapTuple(Header)GetDatum.
*
* For performance-critical callers, it could be worthwhile to take extra
* steps to ensure that there aren't TOAST pointers in the output of
* heap_form_tuple to begin with. It's likely however that the costs of the
* typcache lookup and tuple disassembly/reassembly are swamped by TOAST
* dereference costs, so that the benefits of such extra effort would be
* minimal.
*
* XXX it would likely be better to create wrapper functions that produce
* a composite Datum from the field values in one step. However, there's
* enough code using the existing APIs that we couldn't get rid of this
* hack anytime soon.
*/
Datum
HeapTupleHeaderGetDatum(HeapTupleHeader tuple)
{
Datum result;
TupleDesc tupDesc;
/* No work if there are no external TOAST pointers in the tuple */
if (!HeapTupleHeaderHasExternal(tuple))
return PointerGetDatum(tuple);
/* Use the type data saved by heap_form_tuple to look up the rowtype */
tupDesc = lookup_rowtype_tupdesc(HeapTupleHeaderGetTypeId(tuple),
HeapTupleHeaderGetTypMod(tuple));
/* And do the flattening */
result = toast_flatten_tuple_to_datum(tuple,
HeapTupleHeaderGetDatumLength(tuple),
tupDesc);
ReleaseTupleDesc(tupDesc);
return result;
}
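/*
* Illustrative sketch only, not part of the backend: a standalone toy model
* of the "return the input unchanged unless it has external fields" pattern
* described above. ToyField, ToyTuple, toy_has_external and toy_get_datum
* are hypothetical names invented for this example; they are not PostgreSQL
* types or APIs, and malloc stands in for palloc.
*/

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FIELDS 4

/* Hypothetical toy types; NOT HeapTupleHeader or real TOAST pointers. */
typedef struct ToyField
{
    bool        external;   /* true if 'data' is owned by someone else */
    const char *data;       /* NUL-terminated payload */
} ToyField;

typedef struct ToyTuple
{
    int         nfields;
    ToyField    fields[TOY_MAX_FIELDS];
} ToyTuple;

/* Does any field still point at storage the tuple does not own? */
bool
toy_has_external(const ToyTuple *tup)
{
    int         i;

    for (i = 0; i < tup->nfields; i++)
    {
        if (tup->fields[i].external)
            return true;
    }
    return false;
}

/*
 * Return 'tup' itself when it is already self-contained.  Otherwise return
 * a malloc'd copy in which every external field has been copied inline.
 * The caller owns the copy (and the inlined strings).
 */
ToyTuple *
toy_get_datum(ToyTuple *tup)
{
    ToyTuple   *copy;
    int         i;

    assert(tup != NULL && tup->nfields <= TOY_MAX_FIELDS);

    /* Fast path: nothing to inline, so no allocation at all */
    if (!toy_has_external(tup))
        return tup;

    copy = malloc(sizeof(ToyTuple));
    if (copy == NULL)
        abort();
    *copy = *tup;

    for (i = 0; i < copy->nfields; i++)
    {
        if (copy->fields[i].external)
        {
            size_t      n = strlen(tup->fields[i].data) + 1;
            char       *own = malloc(n);

            if (own == NULL)
                abort();
            memcpy(own, tup->fields[i].data, n);
            copy->fields[i].data = own;
            copy->fields[i].external = false;
        }
    }
    return copy;
}
```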
/*
* Functions for sending tuples to the frontend (or other specified destination)
* as though they were a SELECT result. These are used by utility commands that
* need to project directly to the destination and don't need or want full
* table function capability. Currently used by EXPLAIN and SHOW ALL.
*/
TupOutputState *
begin_tup_output_tupdesc(DestReceiver *dest, TupleDesc tupdesc)
{
TupOutputState *tstate;
tstate = (TupOutputState *) palloc(sizeof(TupOutputState));
tstate->slot = MakeSingleTupleTableSlot(tupdesc);
tstate->dest = dest;
(*tstate->dest->rStartup) (tstate->dest, (int) CMD_SELECT, tupdesc);
return tstate;
}
/*
* write a single tuple
*/
void
do_tup_output(TupOutputState *tstate, Datum *values, bool *isnull)
{
TupleTableSlot *slot = tstate->slot;
int natts = slot->tts_tupleDescriptor->natts;
/* make sure the slot is clear */
ExecClearTuple(slot);
/* insert data */
memcpy(slot->tts_values, values, natts * sizeof(Datum));
memcpy(slot->tts_isnull, isnull, natts * sizeof(bool));
/* mark slot as containing a virtual tuple */
ExecStoreVirtualTuple(slot);
/* send the tuple to the receiver */
(*tstate->dest->receiveSlot) (slot, tstate->dest);
/* clean up */
ExecClearTuple(slot);
}
/*
* write a chunk of text, breaking at newline characters
*
* Should only be used with a single-TEXT-attribute tupdesc.
*/
void
do_text_output_multiline(TupOutputState *tstate, char *text)
{
Datum values[1];
bool isnull[1] = {false};
while (*text)
{
char *eol;
int len;
eol = strchr(text, '\n');
if (eol)
{
len = eol - text;
eol++;
}
else
{
len = strlen(text);
/* no newline was found, so eol is NULL here; point it at the terminating NUL */
eol = text + len;
}
values[0] = PointerGetDatum(cstring_to_text_with_len(text, len));
do_tup_output(tstate, values, isnull);
pfree(DatumGetPointer(values[0]));
text = eol;
}
}
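/*
* Illustrative sketch only, not part of the backend: the newline-splitting
* loop in isolation, so it can be exercised without a TupOutputState.
* emit_lines and line_callback are hypothetical names for this example.
* The case to watch is a final line with no trailing newline: strchr
* returns NULL there, so the cursor must be moved to the terminating NUL
* explicitly for the loop to end.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: receives one line (not NUL-terminated) and its length. */
typedef void (*line_callback) (const char *line, size_t len, void *arg);

/*
 * Call 'emit' (if not NULL) once per line of 'text', splitting at '\n'.
 * A final line without a trailing newline is emitted too.
 * Returns the number of lines seen.
 */
int
emit_lines(const char *text, line_callback emit, void *arg)
{
    int         nlines = 0;

    assert(text != NULL);

    while (*text)
    {
        const char *eol = strchr(text, '\n');
        size_t      len;

        if (eol)
        {
            len = (size_t) (eol - text);
            eol++;              /* step past the newline itself */
        }
        else
        {
            len = strlen(text);
            eol = text + len;   /* terminating NUL, so the loop exits */
        }

        if (emit)
            emit(text, len, arg);
        nlines++;
        text = eol;
    }
    return nlines;
}
```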
void
end_tup_output(TupOutputState *tstate)
{
(*tstate->dest->rShutdown) (tstate->dest);
/* note that destroying the dest is not ours to do */
ExecDropSingleTupleTableSlot(tstate->slot);
pfree(tstate);
}
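/*
* Illustrative sketch only, not part of the backend: a standalone toy model
* of the begin / output / end lifecycle used by the functions above, with a
* receiver made of three callbacks. ToyReceiver, ToyOutputState and the
* toy_* functions are hypothetical names; they only mimic the shape of
* DestReceiver and TupOutputState, and malloc/free stand in for
* palloc/pfree. As in end_tup_output, the receiver is owned by the caller
* and is not destroyed here.
*/

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a destination receiver: callbacks plus counters. */
typedef struct ToyReceiver
{
    void        (*startup) (struct ToyReceiver *self);
    void        (*receive) (struct ToyReceiver *self, int value);
    void        (*shutdown) (struct ToyReceiver *self);
    int         nstarted;
    int         sum;
    int         nshutdown;
} ToyReceiver;

typedef struct ToyOutputState
{
    ToyReceiver *dest;
} ToyOutputState;

/* Allocate the output state and run the receiver's startup hook. */
ToyOutputState *
toy_begin_output(ToyReceiver *dest)
{
    ToyOutputState *state;

    assert(dest != NULL);
    state = malloc(sizeof(ToyOutputState));
    if (state == NULL)
        abort();
    state->dest = dest;
    dest->startup(dest);
    return state;
}

/* Hand one value to the receiver. */
void
toy_do_output(ToyOutputState *state, int value)
{
    state->dest->receive(state->dest, value);
}

/* Run the shutdown hook and free only what toy_begin_output allocated. */
void
toy_end_output(ToyOutputState *state)
{
    state->dest->shutdown(state->dest);
    /* the receiver belongs to the caller, so it is left alone */
    free(state);
}

/* Sample callbacks that just record what happened. */
void
toy_count_startup(ToyReceiver *self)
{
    self->nstarted++;
}

void
toy_sum_receive(ToyReceiver *self, int value)
{
    self->sum += value;
}

void
toy_count_shutdown(ToyReceiver *self)
{
    self->nshutdown++;
}
```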