/*-------------------------------------------------------------------------
 *
 * genam.c
 *	  general index access method routines
 *
 * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/access/index/genam.c
 *
 * NOTES
 *	  many of the old access method routines have been turned into
 *	  macros and moved to genam.h -cim 4/30/91
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include "access/relscan.h"
#include "access/transam.h"
#include "catalog/index.h"
#include "lib/stringinfo.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/tqual.h"


/* ----------------------------------------------------------------
 *		general access method routines
 *
 *		All indexed access methods use an identical scan structure.
 *		We don't know how the various AMs do locking, however, so we don't
 *		do anything about that here.
 *
 *		The intent is that an AM implementor will define a beginscan routine
 *		that calls RelationGetIndexScan, to fill in the scan, and then does
 *		whatever kind of locking he wants.
 *
 *		At the end of a scan, the AM's endscan routine undoes the locking,
 *		but does *not* call IndexScanEnd --- the higher-level index_endscan
 *		routine does that. (We can't do it in the AM because index_endscan
 *		still needs to touch the IndexScanDesc after calling the AM.)
 *
 *		Because of this, the AM does not have a choice whether to call
 *		RelationGetIndexScan or not; its beginscan routine must return an
 *		object made by RelationGetIndexScan. This is kinda ugly but not
 *		worth cleaning up now.
 * ----------------------------------------------------------------
 */

/* ----------------
 *	RelationGetIndexScan -- Create and fill an IndexScanDesc.
 *
 *		This routine creates an index scan structure and sets up initial
 *		contents for it.
 *
 *		Parameters:
 *				indexRelation -- index relation for scan.
 *				nkeys -- count of scan keys (index qual conditions).
 *				norderbys -- count of index order-by operators.
 *
 *		Returns:
 *				An initialized IndexScanDesc.
 * ----------------
 */
IndexScanDesc
RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys)
{
	IndexScanDesc scan;

	scan = (IndexScanDesc) palloc(sizeof(IndexScanDescData));

	scan->heapRelation = NULL;	/* may be set later */
	scan->indexRelation = indexRelation;
	scan->xs_snapshot = InvalidSnapshot;	/* caller must initialize this */
	scan->numberOfKeys = nkeys;
	scan->numberOfOrderBys = norderbys;

	/*
	 * We allocate key workspace here, but it won't get filled until amrescan.
	 */
	if (nkeys > 0)
		scan->keyData = (ScanKey) palloc(sizeof(ScanKeyData) * nkeys);
	else
		scan->keyData = NULL;
	if (norderbys > 0)
		scan->orderByData = (ScanKey) palloc(sizeof(ScanKeyData) * norderbys);
	else
		scan->orderByData = NULL;

	scan->xs_want_itup = false; /* may be set later */

	/*
	 * During recovery we ignore killed tuples and don't bother to kill them
	 * either. We do this because the xmin on the primary node could easily be
	 * later than the xmin on the standby node, so that what the primary
	 * thinks is killed is supposed to be visible on standby. So for correct
	 * MVCC for queries during recovery we must ignore these hints and check
	 * all tuples. Do *not* set ignore_killed_tuples to true when running in a
	 * transaction that was started during recovery. xactStartedInRecovery
	 * should not be altered by index AMs.
	 */
	scan->kill_prior_tuple = false;
	scan->xactStartedInRecovery = TransactionStartedDuringRecovery();
	scan->ignore_killed_tuples = !scan->xactStartedInRecovery;

	scan->opaque = NULL;

	scan->xs_itup = NULL;
	scan->xs_itupdesc = NULL;
	scan->xs_hitup = NULL;
	scan->xs_hitupdesc = NULL;

	ItemPointerSetInvalid(&scan->xs_ctup.t_self);
	scan->xs_ctup.t_data = NULL;
	scan->xs_cbuf = InvalidBuffer;
	scan->xs_continue_hot = false;

	return scan;
}

/* ----------------
 *	IndexScanEnd -- End an index scan.
 *
 *		This routine just releases the storage acquired by
 *		RelationGetIndexScan(). Any AM-level resources are
 *		assumed to already have been released by the AM's
 *		endscan routine.
 *
 *	Returns:
 *		None.
 * ----------------
 */
void
IndexScanEnd(IndexScanDesc scan)
{
	if (scan->keyData != NULL)
		pfree(scan->keyData);
	if (scan->orderByData != NULL)
		pfree(scan->orderByData);

	pfree(scan);
}
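
/*
 * Illustration only, not part of genam.c.  The allocate/release pairing of
 * RelationGetIndexScan() and IndexScanEnd() can be sketched as a standalone
 * model.  Everything named Demo* or demo_* below is hypothetical, the structs
 * are heavily reduced stand-ins for the real ones, and malloc/free stand in
 * for palloc/pfree so that the sketch compiles without PostgreSQL headers.
 */
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ScanKeyData; the real struct has more fields. */
typedef struct DemoScanKey
{
	int			attno;
	int			strategy;
} DemoScanKey;

/* Hypothetical, heavily reduced stand-in for IndexScanDescData. */
typedef struct DemoScanDesc
{
	int			numberOfKeys;
	int			numberOfOrderBys;
	DemoScanKey *keyData;		/* NULL when numberOfKeys == 0 */
	DemoScanKey *orderByData;	/* NULL when numberOfOrderBys == 0 */
	void	   *opaque;			/* AM-private state, owned by the AM */
} DemoScanDesc;

/* Releases only what demo_begin_scan allocated, mirroring IndexScanEnd. */
void
demo_end_scan(DemoScanDesc *scan)
{
	if (scan->keyData != NULL)
		free(scan->keyData);
	if (scan->orderByData != NULL)
		free(scan->orderByData);

	free(scan);
}

/* Allocates the descriptor and key workspace, mirroring RelationGetIndexScan. */
DemoScanDesc *
demo_begin_scan(int nkeys, int norderbys)
{
	DemoScanDesc *scan;

	assert(nkeys >= 0 && norderbys >= 0);

	scan = malloc(sizeof(DemoScanDesc));
	if (scan == NULL)
		return NULL;

	scan->numberOfKeys = nkeys;
	scan->numberOfOrderBys = norderbys;
	scan->keyData = NULL;
	scan->orderByData = NULL;
	scan->opaque = NULL;

	/* Workspace only; a later rescan step would fill in the contents. */
	if (nkeys > 0)
		scan->keyData = malloc(sizeof(DemoScanKey) * nkeys);
	if (norderbys > 0)
		scan->orderByData = malloc(sizeof(DemoScanKey) * norderbys);

	/* Unlike palloc, malloc reports failure by returning NULL. */
	if ((nkeys > 0 && scan->keyData == NULL) ||
		(norderbys > 0 && scan->orderByData == NULL))
	{
		demo_end_scan(scan);
		return NULL;
	}

	return scan;
}
```
/*
 * In this model a caller would pair demo_begin_scan(2, 0) with exactly one
 * demo_end_scan() call, and anything hung off "opaque" would have to be
 * released by its owner beforehand, which is the same division of labor the
 * comments above describe for an AM's endscan routine and IndexScanEnd().
 */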

/*
 * BuildIndexValueDescription
 *
 * Construct a string describing the contents of an index entry, in the
 * form "(key_name, ...)=(key_value, ...)". This is currently used
 * for building unique-constraint and exclusion-constraint error messages,
 * so only key columns of the index are checked and printed.
 *
 * Note that if the user does not have permissions to view all of the
 * columns involved then a NULL is returned. Returning a partial key seems
 * unlikely to be useful and we have no way to know which of the columns the
 * user provided (unlike in ExecBuildSlotValueDescription).
 *
 * The passed-in values/nulls arrays are the "raw" input to the index AM,
 * e.g. results of FormIndexDatum --- this is not necessarily what is stored
 * in the index, but it's what the user perceives to be stored.
 *
 * Note: if you change anything here, check whether
 * ExecBuildSlotPartitionKeyDescription() in execMain.c needs a similar
 * change.
 */
|
2009-08-01 22:59:17 +02:00
|
|
|
char *
|
|
|
|
BuildIndexValueDescription(Relation indexRelation,
|
|
|
|
Datum *values, bool *isnull)
|
2009-08-01 21:59:41 +02:00
|
|
|
{
|
2009-08-01 22:59:17 +02:00
|
|
|
StringInfoData buf;
|
Fix column-privilege leak in error-message paths
While building error messages to return to the user,
BuildIndexValueDescription, ExecBuildSlotValueDescription and
ri_ReportViolation would happily include the entire key or entire row in
the result returned to the user, even if the user didn't have access to
view all of the columns being included.
Instead, include only those columns which the user is providing or which
the user has select rights on. If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription
and ExecBuildSlotValueDescription. Note that, for key cases, the user
must have access to all of the columns for the key to be shown; a
partial key will not be returned.
Further, in master only, do not return any data for cases where row
security is enabled on the relation and row security should be applied
for the user. This required a bit of refactoring and moving of things
around related to RLS- note the addition of utils/misc/rls.c.
Back-patch all the way, as column-level privileges are now in all
supported versions.
This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
2015-01-12 23:04:11 +01:00
|
|
|
Form_pg_index idxrec;
|
|
|
|
HeapTuple ht_idx;
|
2018-04-07 22:00:39 +02:00
|
|
|
int indnkeyatts;
|
2009-08-01 21:59:41 +02:00
|
|
|
int i;
|
Fix column-privilege leak in error-message paths
While building error messages to return to the user,
BuildIndexValueDescription, ExecBuildSlotValueDescription and
ri_ReportViolation would happily include the entire key or entire row in
the result returned to the user, even if the user didn't have access to
view all of the columns being included.
Instead, include only those columns which the user is providing or which
the user has select rights on. If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription
and ExecBuildSlotValueDescription. Note that, for key cases, the user
must have access to all of the columns for the key to be shown; a
partial key will not be returned.
Further, in master only, do not return any data for cases where row
security is enabled on the relation and row security should be applied
for the user. This required a bit of refactoring and moving of things
around related to RLS- note the addition of utils/misc/rls.c.
Back-patch all the way, as column-level privileges are now in all
supported versions.
This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
2015-01-12 23:04:11 +01:00
|
|
|
int keyno;
|
|
|
|
Oid indexrelid = RelationGetRelid(indexRelation);
|
|
|
|
Oid indrelid;
|
|
|
|
AclResult aclresult;
|
|
|
|
|
2018-04-07 22:00:39 +02:00
|
|
|
indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRelation);
|
|
|
|
|
Fix column-privilege leak in error-message paths
While building error messages to return to the user,
BuildIndexValueDescription, ExecBuildSlotValueDescription and
ri_ReportViolation would happily include the entire key or entire row in
the result returned to the user, even if the user didn't have access to
view all of the columns being included.
Instead, include only those columns which the user is providing or which
the user has select rights on. If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription
and ExecBuildSlotValueDescription. Note that, for key cases, the user
must have access to all of the columns for the key to be shown; a
partial key will not be returned.
Further, in master only, do not return any data for cases where row
security is enabled on the relation and row security should be applied
for the user. This required a bit of refactoring and moving of things
around related to RLS- note the addition of utils/misc/rls.c.
Back-patch all the way, as column-level privileges are now in all
supported versions.
This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
2015-01-12 23:04:11 +01:00
|
|
|
/*
|
|
|
|
* Check permissions- if the user does not have access to view all of the
|
|
|
|
* key columns then return NULL to avoid leaking data.
|
|
|
|
*
|
2015-05-24 03:35:49 +02:00
|
|
|
* First check if RLS is enabled for the relation. If so, return NULL to
|
|
|
|
* avoid leaking data.
|
Fix column-privilege leak in error-message paths
While building error messages to return to the user,
BuildIndexValueDescription, ExecBuildSlotValueDescription and
ri_ReportViolation would happily include the entire key or entire row in
the result returned to the user, even if the user didn't have access to
view all of the columns being included.
Instead, include only those columns which the user is providing or which
the user has select rights on. If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription
and ExecBuildSlotValueDescription. Note that, for key cases, the user
must have access to all of the columns for the key to be shown; a
partial key will not be returned.
Further, in master only, do not return any data for cases where row
security is enabled on the relation and row security should be applied
for the user. This required a bit of refactoring and moving of things
around related to RLS- note the addition of utils/misc/rls.c.
Back-patch all the way, as column-level privileges are now in all
supported versions.
This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
2015-01-12 23:04:11 +01:00
|
|
|
*
|
2015-05-24 03:35:49 +02:00
|
|
|
* Next we need to check table-level SELECT access and then, if there is
|
|
|
|
* no access there, check column-level permissions.
|
2015-01-12 23:04:11 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Fetch the pg_index tuple by the Oid of the index
|
|
|
|
*/
|
|
|
|
ht_idx = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexrelid));
|
|
|
|
if (!HeapTupleIsValid(ht_idx))
|
|
|
|
elog(ERROR, "cache lookup failed for index %u", indexrelid);
|
|
|
|
idxrec = (Form_pg_index) GETSTRUCT(ht_idx);
|
|
|
|
|
|
|
|
indrelid = idxrec->indrelid;
|
|
|
|
Assert(indexrelid == idxrec->indexrelid);
|
|
|
|
|
|
|
|
/* RLS check: if RLS is enabled, then we don't return anything. */
|
2015-07-28 22:21:22 +02:00
|
|
|
if (check_enable_rls(indrelid, InvalidOid, true) == RLS_ENABLED)
|
2015-01-12 23:04:11 +01:00
|
|
|
{
|
|
|
|
ReleaseSysCache(ht_idx);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Table-level SELECT is enough, if the user has it */
|
|
|
|
aclresult = pg_class_aclcheck(indrelid, GetUserId(), ACL_SELECT);
|
|
|
|
if (aclresult != ACLCHECK_OK)
|
|
|
|
{
|
|
|
|
/*
|
2015-05-24 03:35:49 +02:00
|
|
|
* No table-level access, so step through the columns in the index and
|
|
|
|
* make sure the user has SELECT rights on all of them.
|
2015-01-12 23:04:11 +01:00
|
|
|
*/
|
2018-04-07 22:00:39 +02:00
|
|
|
for (keyno = 0; keyno < idxrec->indnkeyatts; keyno++)
|
2015-01-12 23:04:11 +01:00
|
|
|
{
|
|
|
|
AttrNumber attnum = idxrec->indkey.values[keyno];
|
|
|
|
|
2015-01-30 03:59:34 +01:00
|
|
|
/*
|
2015-05-24 03:35:49 +02:00
|
|
|
* Note that if attnum == InvalidAttrNumber, then this is an index
|
|
|
|
* based on an expression and we return no detail rather than try
|
|
|
|
* to figure out what column(s) the expression includes and if the
|
|
|
|
* user has SELECT rights on them.
|
2015-01-30 03:59:34 +01:00
|
|
|
*/
|
|
|
|
if (attnum == InvalidAttrNumber ||
|
|
|
|
pg_attribute_aclcheck(indrelid, attnum, GetUserId(),
|
|
|
|
ACL_SELECT) != ACLCHECK_OK)
|
2015-01-12 23:04:11 +01:00
|
|
|
{
|
|
|
|
/* No access, so clean up and return */
|
|
|
|
ReleaseSysCache(ht_idx);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
ReleaseSysCache(ht_idx);
|
2009-08-01 21:59:41 +02:00
|
|
|
|
2009-08-01 22:59:17 +02:00
|
|
|
initStringInfo(&buf);
|
|
|
|
appendStringInfo(&buf, "(%s)=(",
|
2015-01-12 23:04:11 +01:00
|
|
|
pg_get_indexdef_columns(indexrelid, true));
|
2009-08-01 21:59:41 +02:00
|
|
|
|
2018-04-07 22:00:39 +02:00
|
|
|
for (i = 0; i < indnkeyatts; i++)
|
2009-08-01 21:59:41 +02:00
|
|
|
{
|
2010-02-26 03:01:40 +01:00
|
|
|
char *val;
|
2009-08-01 21:59:41 +02:00
|
|
|
|
|
|
|
if (isnull[i])
|
|
|
|
val = "null";
|
|
|
|
else
|
|
|
|
{
|
2010-02-26 03:01:40 +01:00
|
|
|
Oid foutoid;
|
|
|
|
bool typisvarlena;
|
2009-08-01 21:59:41 +02:00
|
|
|
|
2009-12-07 06:22:23 +01:00
|
|
|
/*
|
2010-02-26 03:01:40 +01:00
|
|
|
* The provided data is not necessarily of the type stored in the
|
|
|
|
* index; rather it is of the index opclass's input type. So look
|
|
|
|
* at rd_opcintype not the index tupdesc.
|
2009-12-07 06:22:23 +01:00
|
|
|
*
|
|
|
|
* Note: this is a bit shaky for opclasses that have pseudotype
|
2014-05-06 18:12:18 +02:00
|
|
|
* input types such as ANYARRAY or RECORD. Currently, the
|
2010-02-26 03:01:40 +01:00
|
|
|
* typoutput functions associated with the pseudotypes will work
|
|
|
|
* okay, but we might have to try harder in future.
|
2009-12-07 06:22:23 +01:00
|
|
|
*/
|
|
|
|
getTypeOutputInfo(indexRelation->rd_opcintype[i],
|
2009-08-01 21:59:41 +02:00
|
|
|
&foutoid, &typisvarlena);
|
|
|
|
val = OidOutputFunctionCall(foutoid, values[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (i > 0)
|
2009-08-01 22:59:17 +02:00
|
|
|
appendStringInfoString(&buf, ", ");
|
|
|
|
appendStringInfoString(&buf, val);
|
2009-08-01 21:59:41 +02:00
|
|
|
}
|
|
|
|
|
2009-08-01 22:59:17 +02:00
|
|
|
appendStringInfoChar(&buf, ')');
|
|
|
|
|
|
|
|
return buf.data;
|
2009-08-01 21:59:41 +02:00
|
|
|
}
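
The all-or-nothing rule for key columns and the "(columns)=(values)" layout built above can be tried outside the backend. The following is a minimal standalone sketch, not the PostgreSQL implementation: `describe_key` and its `can_select` array are hypothetical stand-ins for the per-column `pg_attribute_aclcheck` results, values are assumed to be already converted to text, and the RLS and table-level checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the column-level path of
 * BuildIndexValueDescription.  Returns a malloc'd "(cols)=(v1, v2)" string,
 * or NULL if any key column is unreadable (or if allocation fails).
 */
char *
describe_key(const char *colnames, const char *const *values,
			 const bool *isnull, const bool *can_select, int nkeys)
{
	size_t		len;
	size_t		pos;
	char	   *buf;
	int			i;

	assert(colnames != NULL && nkeys >= 0);

	/* All-or-nothing: one unreadable column suppresses the whole key. */
	for (i = 0; i < nkeys; i++)
	{
		if (!can_select[i])
			return NULL;
	}

	/* "(" + ")=(" + ")" + terminating NUL account for 6 bytes. */
	len = strlen(colnames) + 6;
	for (i = 0; i < nkeys; i++)
		len += strlen(isnull[i] ? "null" : values[i]) + 2;	/* room for ", " */

	buf = malloc(len);
	if (buf == NULL)
		return NULL;

	pos = (size_t) snprintf(buf, len, "(%s)=(", colnames);
	for (i = 0; i < nkeys; i++)
		pos += (size_t) snprintf(buf + pos, len - pos, "%s%s",
								 i > 0 ? ", " : "",
								 isnull[i] ? "null" : values[i]);
	snprintf(buf + pos, len - pos, ")");

	return buf;
}
```

In this sketch the caller owns the returned string and releases it with `free()`. A NULL result is meant to be read as "show no detail", which is how the function above treats missing privileges.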
|
|
|
|
|
2002-02-19 21:11:20 +01:00
|
|
|
|
|
|
|
/* ----------------------------------------------------------------
|
|
|
|
* heap-or-index-scan access to system catalogs
|
|
|
|
*
|
|
|
|
* These functions support system catalog accesses that normally use
|
|
|
|
* an index but need to be capable of being switched to heap scans
|
2002-05-21 01:51:44 +02:00
|
|
|
* if the system indexes are unavailable.
|
2002-02-19 21:11:20 +01:00
|
|
|
*
|
|
|
|
* The specified scan keys must be compatible with the named index.
|
|
|
|
* Generally this means that they must constrain either all columns
|
|
|
|
* of the index, or the first K columns of an N-column index.
|
|
|
|
*
|
2002-05-21 01:51:44 +02:00
|
|
|
* These routines could work with non-system tables, actually,
|
2002-02-19 21:11:20 +01:00
|
|
|
* but they're only useful when there is a known index to use with
|
2002-05-21 01:51:44 +02:00
|
|
|
* the given scan keys; so in practice they're only good for
|
2002-02-19 21:11:20 +01:00
|
|
|
* predetermined types of scans of system catalogs.
|
|
|
|
* ----------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* systable_beginscan --- set up for heap-or-index scan
|
|
|
|
*
|
|
|
|
* rel: catalog to scan, already opened and suitably locked
|
2005-04-14 22:03:27 +02:00
|
|
|
* indexId: OID of index to conditionally use
|
2002-02-19 21:11:20 +01:00
|
|
|
* indexOK: if false, forces a heap scan (see notes below)
|
Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue that has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
2013-07-02 15:47:01 +02:00
|
|
|
* snapshot: time qual to use (NULL for a recent catalog snapshot)
|
2002-02-19 21:11:20 +01:00
|
|
|
* nkeys, key: scan keys
|
|
|
|
*
|
|
|
|
* The attribute numbers in the scan key should be set for the heap case.
|
|
|
|
* If we choose to index, we reset them to 1..n to reference the index
|
|
|
|
* columns. Note this means there must be one scankey qualification per
|
|
|
|
* index column! This is checked by the Asserts in the normal, index-using
|
|
|
|
* case, but won't be checked if the heapscan path is taken.
|
|
|
|
*
|
|
|
|
* The routine checks the normal cases for whether an indexscan is safe,
|
|
|
|
* but caller can make additional checks and pass indexOK=false if needed.
|
|
|
|
* In the standard case, indexOK can simply be constant TRUE.
|
|
|
|
*/
|
|
|
|
SysScanDesc
|
2002-05-21 01:51:44 +02:00
|
|
|
systable_beginscan(Relation heapRelation,
|
2005-04-14 22:03:27 +02:00
|
|
|
Oid indexId,
|
2002-02-19 21:11:20 +01:00
|
|
|
bool indexOK,
|
|
|
|
Snapshot snapshot,
|
2002-05-21 01:51:44 +02:00
|
|
|
int nkeys, ScanKey key)
|
2002-02-19 21:11:20 +01:00
|
|
|
{
|
|
|
|
SysScanDesc sysscan;
|
2003-09-24 20:54:02 +02:00
|
|
|
Relation irel;
|
|
|
|
|
2005-04-14 22:03:27 +02:00
|
|
|
if (indexOK &&
|
2006-01-05 11:07:46 +01:00
|
|
|
!IgnoreSystemIndexes &&
|
2005-04-14 22:03:27 +02:00
|
|
|
!ReindexIsProcessingIndex(indexId))
|
2006-07-31 22:09:10 +02:00
|
|
|
irel = index_open(indexId, AccessShareLock);
|
2003-09-24 20:54:02 +02:00
|
|
|
else
|
|
|
|
irel = NULL;
|
2002-02-19 21:11:20 +01:00
|
|
|
|
|
|
|
sysscan = (SysScanDesc) palloc(sizeof(SysScanDescData));
|
2002-05-21 01:51:44 +02:00
|
|
|
|
|
|
|
sysscan->heap_rel = heapRelation;
|
2003-09-24 20:54:02 +02:00
|
|
|
sysscan->irel = irel;
|
2002-02-19 21:11:20 +01:00
|
|
|
|
2013-07-02 15:47:01 +02:00
|
|
|
if (snapshot == NULL)
|
|
|
|
{
|
2014-05-06 18:12:18 +02:00
|
|
|
Oid relid = RelationGetRelid(heapRelation);
|
2013-07-02 15:47:01 +02:00
|
|
|
|
|
|
|
snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
|
|
|
|
sysscan->snapshot = snapshot;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Caller is responsible for any snapshot. */
|
|
|
|
sysscan->snapshot = NULL;
|
|
|
|
}
|
|
|
|
|
2003-09-24 20:54:02 +02:00
|
|
|
if (irel)
|
2002-02-19 21:11:20 +01:00
|
|
|
{
|
2002-05-21 01:51:44 +02:00
|
|
|
int i;
|
2002-02-19 21:11:20 +01:00
|
|
|
|
2008-11-06 14:07:08 +01:00
|
|
|
/* Change attribute numbers to be index column numbers. */
|
2002-02-19 21:11:20 +01:00
|
|
|
for (i = 0; i < nkeys; i++)
|
|
|
|
{
|
2009-06-11 16:49:15 +02:00
|
|
|
int j;
|
2008-11-06 14:07:08 +01:00
|
|
|
|
2018-04-07 22:00:39 +02:00
|
|
|
for (j = 0; j < IndexRelationGetNumberOfAttributes(irel); j++)
|
2008-11-06 14:07:08 +01:00
|
|
|
{
|
|
|
|
if (key[i].sk_attno == irel->rd_index->indkey.values[j])
|
|
|
|
{
|
|
|
|
key[i].sk_attno = j + 1;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2018-04-07 22:00:39 +02:00
|
|
|
if (j == IndexRelationGetNumberOfAttributes(irel))
|
2008-11-06 14:07:08 +01:00
|
|
|
elog(ERROR, "column is not in index");
|
2002-02-19 21:11:20 +01:00
|
|
|
}
|
2003-09-24 20:54:02 +02:00
|
|
|
|
2006-07-31 22:09:10 +02:00
|
|
|
sysscan->iscan = index_beginscan(heapRelation, irel,
|
2010-12-03 02:50:48 +01:00
|
|
|
snapshot, nkeys, 0);
|
|
|
|
index_rescan(sysscan->iscan, key, nkeys, NULL, 0);
|
2002-02-19 21:11:20 +01:00
|
|
|
sysscan->scan = NULL;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2012-05-27 01:09:52 +02:00
|
|
|
/*
|
|
|
|
* We disallow synchronized scans when forced to use a heapscan on a
|
|
|
|
* catalog. In most cases the desired rows are near the front, so
|
|
|
|
* that the unpredictable start point of a syncscan is a serious
|
|
|
|
* disadvantage; and there are no compensating advantages, because
|
|
|
|
* it's unlikely that such scans will occur in parallel.
|
|
|
|
*/
|
|
|
|
sysscan->scan = heap_beginscan_strat(heapRelation, snapshot,
|
|
|
|
nkeys, key,
|
|
|
|
true, false);
|
2002-02-19 21:11:20 +01:00
|
|
|
sysscan->iscan = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return sysscan;
|
|
|
|
}
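
The attribute-number rewrite in the index-using branch above can be isolated as a small standalone routine. This is a sketch under assumptions: `remap_scankey_attnos` is a hypothetical name, plain `int` arrays stand in for `ScanKeyData.sk_attno` and `indkey.values`, and an error return replaces `elog(ERROR, ...)`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rewrite heap attribute numbers into 1-based index column positions.
 * Returns 0 on success, or -1 if some key column is not in the index.
 * The sk_attno array is modified in place.
 */
int
remap_scankey_attnos(int *sk_attno, int nkeys, const int *indkey, int natts)
{
	int			i;

	assert(sk_attno != NULL && indkey != NULL);

	for (i = 0; i < nkeys; i++)
	{
		int			j;

		for (j = 0; j < natts; j++)
		{
			if (sk_attno[i] == indkey[j])
			{
				sk_attno[i] = j + 1;	/* index columns are numbered from 1 */
				break;
			}
		}
		if (j == natts)
			return -1;			/* column is not in index */
	}
	return 0;
}
```

Because the keys are rewritten in place, a key array that has been through this step holds index column numbers, not heap attribute numbers. A caller that wants to reuse it for a heap scan would have to set the attribute numbers again.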
|
|
|
|
|
|
|
|
/*
|
|
|
|
* systable_getnext --- get next tuple in a heap-or-index scan
|
|
|
|
*
|
|
|
|
* Returns NULL if no more tuples available.
|
|
|
|
*
|
|
|
|
* Note that returned tuple is a reference to data in a disk buffer;
|
|
|
|
* it must not be modified, and should be presumed inaccessible after
|
|
|
|
* next getnext() or endscan() call.
|
|
|
|
*/
|
|
|
|
HeapTuple
|
|
|
|
systable_getnext(SysScanDesc sysscan)
|
|
|
|
{
|
2002-05-21 01:51:44 +02:00
|
|
|
HeapTuple htup;
|
2002-02-19 21:11:20 +01:00
|
|
|
|
|
|
|
if (sysscan->irel)
|
2008-04-13 21:18:14 +02:00
|
|
|
{
|
2002-05-21 01:51:44 +02:00
|
|
|
htup = index_getnext(sysscan->iscan, ForwardScanDirection);
|
2009-06-11 16:49:15 +02:00
|
|
|
|
2008-04-13 21:18:14 +02:00
|
|
|
/*
|
2009-06-11 16:49:15 +02:00
|
|
|
* We currently don't need to support lossy index operators for any
|
|
|
|
* system catalog scan. It could be done here, using the scan keys to
|
|
|
|
* drive the operator calls, if we arranged to save the heap attnums
|
|
|
|
* during systable_beginscan(); this is practical because we still
|
|
|
|
* wouldn't need to support indexes on expressions.
|
2008-04-13 21:18:14 +02:00
|
|
|
*/
|
|
|
|
if (htup && sysscan->iscan->xs_recheck)
|
|
|
|
elog(ERROR, "system catalog scans with lossy index conditions are not implemented");
|
|
|
|
}
|
2002-02-19 21:11:20 +01:00
|
|
|
else
|
2002-05-21 01:51:44 +02:00
|
|
|
htup = heap_getnext(sysscan->scan, ForwardScanDirection);
|
2002-02-19 21:11:20 +01:00
|
|
|
|
|
|
|
return htup;
|
|
|
|
}
|
|
|
|
|
2008-06-09 00:41:04 +02:00
|
|
|
/*
|
|
|
|
* systable_recheck_tuple --- recheck visibility of most-recently-fetched tuple
|
|
|
|
*
|
2013-07-17 02:16:32 +02:00
|
|
|
* In particular, determine if this tuple would be visible to a catalog scan
|
|
|
|
* that started now. We don't handle the case of a non-MVCC scan snapshot,
|
|
|
|
* because no caller needs that yet.
|
|
|
|
*
|
2008-06-09 00:41:04 +02:00
|
|
|
* This is useful to test whether an object was deleted while we waited to
|
|
|
|
* acquire lock on it.
|
|
|
|
*
|
|
|
|
* Note: we don't actually *need* the tuple to be passed in, but it's a
|
|
|
|
* good crosscheck that the caller is interested in the right tuple.
|
|
|
|
*/
|
|
|
|
bool
|
|
|
|
systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
|
|
|
|
{
|
2013-07-17 02:16:32 +02:00
|
|
|
Snapshot freshsnap;
|
2008-06-09 00:41:04 +02:00
|
|
|
bool result;
|
|
|
|
|
2013-07-17 02:16:32 +02:00
|
|
|
/*
|
|
|
|
* Trust that LockBuffer() and HeapTupleSatisfiesMVCC() do not themselves
|
|
|
|
* acquire snapshots, so we need not register the snapshot. Those
|
|
|
|
* facilities are too low-level to have any business scanning tables.
|
|
|
|
*/
|
|
|
|
freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));
|
|
|
|
|
2008-06-09 00:41:04 +02:00
|
|
|
if (sysscan->irel)
|
|
|
|
{
|
|
|
|
IndexScanDesc scan = sysscan->iscan;
|
|
|
|
|
2013-07-17 02:16:32 +02:00
|
|
|
Assert(IsMVCCSnapshot(scan->xs_snapshot));
|
2008-06-09 00:41:04 +02:00
|
|
|
Assert(tup == &scan->xs_ctup);
|
|
|
|
Assert(BufferIsValid(scan->xs_cbuf));
|
|
|
|
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
|
|
|
|
LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
|
2013-07-17 02:16:32 +02:00
|
|
|
result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->xs_cbuf);
|
2008-06-09 00:41:04 +02:00
|
|
|
LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
HeapScanDesc scan = sysscan->scan;
|
|
|
|
|
2013-07-17 02:16:32 +02:00
|
|
|
Assert(IsMVCCSnapshot(scan->rs_snapshot));
|
2008-06-09 00:41:04 +02:00
|
|
|
Assert(tup == &scan->rs_ctup);
|
|
|
|
Assert(BufferIsValid(scan->rs_cbuf));
|
|
|
|
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
|
|
|
|
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
|
2013-07-17 02:16:32 +02:00
|
|
|
result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->rs_cbuf);
|
2008-06-09 00:41:04 +02:00
|
|
|
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
|
|
|
|
}
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2002-02-19 21:11:20 +01:00
|
|
|
/*
|
|
|
|
* systable_endscan --- close scan, release resources
|
|
|
|
*
|
|
|
|
* Note that it's still up to the caller to close the heap relation.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
systable_endscan(SysScanDesc sysscan)
|
|
|
|
{
|
|
|
|
if (sysscan->irel)
|
|
|
|
{
|
|
|
|
index_endscan(sysscan->iscan);
|
2006-07-31 22:09:10 +02:00
|
|
|
index_close(sysscan->irel, AccessShareLock);
|
2002-02-19 21:11:20 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
heap_endscan(sysscan->scan);
|
|
|
|
|
2013-07-02 15:47:01 +02:00
|
|
|
if (sysscan->snapshot)
|
|
|
|
UnregisterSnapshot(sysscan->snapshot);
|
|
|
|
|
2002-02-19 21:11:20 +01:00
|
|
|
pfree(sysscan);
|
|
|
|
}
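
The snapshot ownership rule shared by systable_beginscan and systable_endscan (register a snapshot only when the caller passed NULL, and unregister only what the scan itself registered) can be modelled with a reference count. This is an illustrative mock, not backend code: `MockSnapshot`, `MockScan`, `mock_beginscan`, `mock_endscan` and `mock_catalog_refcount` are all hypothetical names, and the counter merely stands in for RegisterSnapshot/UnregisterSnapshot.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MockSnapshot
{
	int			refcount;
} MockSnapshot;

typedef struct MockScan
{
	MockSnapshot *owned;		/* snapshot registered by the scan, or NULL */
	MockSnapshot *in_use;		/* snapshot the scan actually reads with */
} MockScan;

/* Stand-in for the snapshot handed out by GetCatalogSnapshot(). */
static MockSnapshot catalog_snapshot = {0};

int
mock_catalog_refcount(void)
{
	return catalog_snapshot.refcount;
}

MockScan *
mock_beginscan(MockSnapshot *snapshot)
{
	MockScan   *scan = malloc(sizeof(MockScan));

	assert(scan != NULL);

	if (snapshot == NULL)
	{
		/* No snapshot supplied: take the catalog one and remember it. */
		snapshot = &catalog_snapshot;
		snapshot->refcount++;
		scan->owned = snapshot;
	}
	else
	{
		/* Caller is responsible for any snapshot it passed in. */
		scan->owned = NULL;
	}
	scan->in_use = snapshot;
	return scan;
}

void
mock_endscan(MockScan *scan)
{
	/* Release only what beginscan registered on the scan's behalf. */
	if (scan->owned)
		scan->owned->refcount--;
	free(scan);
}
```

Keeping a separate `owned` pointer is what lets the end-of-scan routine tell the two cases apart without an extra flag, which matches how `sysscan->snapshot` is left NULL above when the caller supplied the snapshot.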
|
2008-04-13 01:14:21 +02:00
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* systable_beginscan_ordered --- set up for ordered catalog scan
|
|
|
|
*
|
|
|
|
* These routines have essentially the same API as systable_beginscan etc,
|
|
|
|
* except that they guarantee to return multiple matching tuples in
|
|
|
|
* index order. Also, for largely historical reasons, the index to use
|
|
|
|
* is opened and locked by the caller, not here.
|
|
|
|
*
|
2014-05-06 18:12:18 +02:00
|
|
|
* Currently we do not support non-index-based scans here. (In principle
|
2008-04-13 01:14:21 +02:00
|
|
|
* we could do a heapscan and sort, but the uses are in places that
|
|
|
|
* probably don't need to still work with corrupted catalog indexes.)
|
|
|
|
* For the moment, therefore, these functions are merely the thinnest of
|
|
|
|
* wrappers around index_beginscan/index_getnext. The main reason for their
|
|
|
|
* existence is to centralize possible future support of lossy operators
|
|
|
|
* in catalog scans.
|
|
|
|
*/
|
|
|
|
SysScanDesc
|
|
|
|
systable_beginscan_ordered(Relation heapRelation,
|
|
|
|
Relation indexRelation,
|
|
|
|
Snapshot snapshot,
|
|
|
|
int nkeys, ScanKey key)
|
|
|
|
{
|
|
|
|
SysScanDesc sysscan;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/* REINDEX can probably be a hard error here ... */
|
|
|
|
if (ReindexIsProcessingIndex(RelationGetRelid(indexRelation)))
|
2010-02-07 21:48:13 +01:00
|
|
|
elog(ERROR, "cannot do ordered scan on index \"%s\", because it is being reindexed",
|
2008-04-13 01:14:21 +02:00
|
|
|
RelationGetRelationName(indexRelation));
|
|
|
|
/* ... but we only throw a warning about violating IgnoreSystemIndexes */
|
|
|
|
if (IgnoreSystemIndexes)
|
|
|
|
elog(WARNING, "using index \"%s\" despite IgnoreSystemIndexes",
|
|
|
|
RelationGetRelationName(indexRelation));
|
|
|
|
|
|
|
|
sysscan = (SysScanDesc) palloc(sizeof(SysScanDescData));
|
|
|
|
|
|
|
|
sysscan->heap_rel = heapRelation;
|
|
|
|
sysscan->irel = indexRelation;
|
|
|
|
|
2013-07-02 15:47:01 +02:00
|
|
|
if (snapshot == NULL)
|
|
|
|
{
|
2014-05-06 18:12:18 +02:00
|
|
|
Oid relid = RelationGetRelid(heapRelation);
|
2013-07-02 15:47:01 +02:00
|
|
|
|
|
|
|
snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
|
|
|
|
sysscan->snapshot = snapshot;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Caller is responsible for any snapshot. */
|
|
|
|
sysscan->snapshot = NULL;
|
|
|
|
}
|
|
|
|
|
2008-11-06 14:07:08 +01:00
|
|
|
/* Change attribute numbers to be index column numbers. */
|
2008-04-13 01:14:21 +02:00
|
|
|
for (i = 0; i < nkeys; i++)
|
|
|
|
{
|
2009-06-11 16:49:15 +02:00
|
|
|
int j;
|
2008-11-06 14:07:08 +01:00
|
|
|
|
2018-04-07 22:00:39 +02:00
|
|
|
for (j = 0; j < IndexRelationGetNumberOfAttributes(indexRelation); j++)
|
2008-11-06 14:07:08 +01:00
|
|
|
{
|
|
|
|
if (key[i].sk_attno == indexRelation->rd_index->indkey.values[j])
|
|
|
|
{
|
|
|
|
key[i].sk_attno = j + 1;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2018-04-07 22:00:39 +02:00
|
|
|
if (j == IndexRelationGetNumberOfAttributes(indexRelation))
|
2008-11-06 14:07:08 +01:00
|
|
|
elog(ERROR, "column is not in index");
|
2008-04-13 01:14:21 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
|
2010-12-03 02:50:48 +01:00
|
|
|
snapshot, nkeys, 0);
|
|
|
|
index_rescan(sysscan->iscan, key, nkeys, NULL, 0);
|
2008-04-13 01:14:21 +02:00
|
|
|
sysscan->scan = NULL;
|
|
|
|
|
|
|
|
return sysscan;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* systable_getnext_ordered --- get next tuple in an ordered catalog scan
|
|
|
|
*/
|
|
|
|
HeapTuple
|
|
|
|
systable_getnext_ordered(SysScanDesc sysscan, ScanDirection direction)
|
|
|
|
{
|
|
|
|
HeapTuple htup;
|
|
|
|
|
|
|
|
Assert(sysscan->irel);
|
|
|
|
htup = index_getnext(sysscan->iscan, direction);
|
2008-04-13 21:18:14 +02:00
|
|
|
/* See notes in systable_getnext */
|
|
|
|
if (htup && sysscan->iscan->xs_recheck)
|
|
|
|
elog(ERROR, "system catalog scans with lossy index conditions are not implemented");
|
2008-04-13 01:14:21 +02:00
|
|
|
|
|
|
|
return htup;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* systable_endscan_ordered --- close scan, release resources
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
systable_endscan_ordered(SysScanDesc sysscan)
|
|
|
|
{
|
|
|
|
Assert(sysscan->irel);
|
|
|
|
index_endscan(sysscan->iscan);
|
2013-07-02 15:47:01 +02:00
|
|
|
if (sysscan->snapshot)
|
|
|
|
UnregisterSnapshot(sysscan->snapshot);
|
2008-04-13 01:14:21 +02:00
|
|
|
pfree(sysscan);
|
|
|
|
}
|