2006-07-31 03:16:38 +02:00
/*-------------------------------------------------------------------------
 *
 * toasting.c
 *	  This file contains routines to support creation of toast tables
 *
 *
2021-01-02 19:06:25 +01:00
 * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
2006-07-31 03:16:38 +02:00
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
2010-09-20 22:08:53 +02:00
 *	  src/backend/catalog/toasting.c
2006-07-31 03:16:38 +02:00
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

Don't include heapam.h from other headers.
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.
heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.
Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.
As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probable that a few external projects will need to do the same.
Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
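As a rough illustration of the forward-declaration approach described in the commit message above, the sketch below shows a widely included header naming the underlying struct directly, so it does not need to include the header that defines the struct. Every name in it (`ScanDescData`, `scan_begin`, `scan_relation`, `scan_end`) is hypothetical and is not a PostgreSQL API.

```c
#include <stddef.h>
#include <stdlib.h>

/* --- What a widely included header would contain (hypothetical) --- */

/* Forward declaration: users see the name, not the layout. */
struct ScanDescData;

/* Prototypes use the struct tag directly, so no typedef from the
 * defining header is required. */
struct ScanDescData *scan_begin(int relation_id);
int scan_relation(const struct ScanDescData *scan);
void scan_end(struct ScanDescData *scan);

/* --- What the single implementation file would contain --- */

/* Full definition, visible only where the fields are actually used. */
struct ScanDescData
{
    int relation_id;
};

struct ScanDescData *
scan_begin(int relation_id)
{
    struct ScanDescData *scan = malloc(sizeof(*scan));

    if (scan != NULL)
        scan->relation_id = relation_id;
    return scan;
}

int
scan_relation(const struct ScanDescData *scan)
{
    return scan->relation_id;
}

void
scan_end(struct ScanDescData *scan)
{
    free(scan);
}
```

Code that only passes the pointer around compiles against the forward declaration alone. Only files that touch the fields need the full definition, which is why such files would grow an explicit include.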
2019-01-15 00:54:18 +01:00
#include "access/heapam.h"
Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.
Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compressed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compression types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.
It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.
Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allowing a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Pryzby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.
Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
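The spare-bit idea in the commit message above can be sketched as follows: a size limited to 1GB fits in 30 bits, which leaves the top two bits of a 32-bit word free for a compression method identifier. This is only a minimal sketch. The macro and function names and the method numbering are assumptions made for illustration, not PostgreSQL's actual definitions.

```c
#include <stdint.h>

/* Assumed layout: low 30 bits hold the size, high 2 bits hold the method. */
#define SKETCH_SIZE_BITS 30u
#define SKETCH_SIZE_MASK ((UINT32_C(1) << SKETCH_SIZE_BITS) - 1)

/* Hypothetical numbering; two of the four possible values remain free. */
enum sketch_method
{
    SKETCH_METHOD_PGLZ = 0,
    SKETCH_METHOD_LZ4 = 1
};

/* Combine a size (below 1GB) and a 2-bit method id into one word. */
static uint32_t
sketch_pack(uint32_t size, uint32_t method)
{
    return (size & SKETCH_SIZE_MASK) | ((method & 3u) << SKETCH_SIZE_BITS);
}

/* Recover the size by masking off the two method bits. */
static uint32_t
sketch_size(uint32_t word)
{
    return word & SKETCH_SIZE_MASK;
}

/* Recover the method id from the two high bits. */
static uint32_t
sketch_method_of(uint32_t word)
{
    return word >> SKETCH_SIZE_BITS;
}
```

Two bits give four distinct values, which matches the message's remark that only two more built-in methods would fit once pglz and lz4 are assigned.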
2021-03-19 20:10:38 +01:00
#include "access/toast_compression.h"
2006-07-31 03:16:38 +02:00
#include "access/xact.h"
2013-12-19 22:10:01 +01:00
#include "catalog/binary_upgrade.h"
Ignore attempts to add TOAST table to shared or catalog tables
Running ALTER TABLE on any table will check if a TOAST table needs to be
added. On shared tables, this would previously fail, thus effectively
disabling ALTER TABLE for those tables. On (non-shared) system
catalogs, on the other hand, it would add a TOAST table, even though we
don't really want TOAST tables on some system catalogs. In some cases,
it would also fail with an error "AccessExclusiveLock required to add
toast table.", depending on what locks the ALTER TABLE actions had
already taken.
So instead, just ignore attempts to add TOAST tables to such tables,
outside of bootstrap mode, pretending they don't need one.
This allows running ALTER TABLE on such tables without messing up the
TOAST situation. Legitimate uses for ALTER TABLE on system catalogs
include setting reloptions (say, fillfactor or autovacuum settings).
(All this still requires allow_system_table_mods, which is independent
of this.)
Discussion: https://www.postgresql.org/message-id/flat/e49f825b-fb25-0bc8-8afc-d5ad895c7975@2ndquadrant.com
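The rule described in the commit message above can be read as a small predicate: outside bootstrap mode, shared and system catalog tables are treated as not needing a TOAST table, and every other table falls through to the usual heuristic. The sketch below only restates that reading. The function name and its boolean parameters are hypothetical, and the real check in PostgreSQL may be organized differently.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the rule:
 *   - in bootstrap mode, nothing is filtered out;
 *   - otherwise shared or catalog tables "pretend" they need no TOAST table;
 *   - all remaining tables follow the ordinary size heuristic.
 */
static bool
sketch_wants_toast_table(bool bootstrap_mode,
                         bool is_shared,
                         bool is_system_catalog,
                         bool heuristic_says_needed)
{
    if (!bootstrap_mode && (is_shared || is_system_catalog))
        return false;

    return heuristic_says_needed;
}
```

Returning `false` instead of raising an error is what lets ALTER TABLE proceed on such tables, for example to set reloptions.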
2019-03-19 10:48:03 +01:00
#include "catalog/catalog.h"
2006-07-31 03:16:38 +02:00
#include "catalog/dependency.h"
#include "catalog/heap.h"
#include "catalog/index.h"
2007-07-26 00:16:18 +02:00
#include "catalog/namespace.h"
Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
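The handler pattern described in the commit message above, where a single function returns a C struct carrying an access method's properties and callbacks, might look roughly like the sketch below. The struct, its fields, and the function names are all hypothetical and much smaller than any real access method interface.

```c
#include <stdbool.h>

/* Hypothetical routine table; field names are illustrative only. */
typedef struct SketchAmRoutine
{
    bool can_order;                          /* a property formerly read from a catalog column */
    bool (*validate)(unsigned int opclass);  /* a check the method performs itself */
} SketchAmRoutine;

static bool
sketch_validate(unsigned int opclass)
{
    /* Placeholder rule: treat 0 as an invalid operator class. */
    return opclass != 0;
}

/* The handler is the single entry point; callers get everything else from
 * the struct it returns. */
static const SketchAmRoutine *
sketch_handler(void)
{
    static const SketchAmRoutine routine = {
        .can_order = true,
        .validate = sketch_validate,
    };

    return &routine;
}
```

Because the callbacks are ordinary function pointers with declared signatures, the compiler can check every call site, which is the type-safety benefit the message mentions.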
2016-01-18 01:36:59 +01:00
#include "catalog/pg_am.h"
2006-07-31 03:16:38 +02:00
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_type.h"
#include "catalog/toasting.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
2014-04-06 17:13:43 +02:00
#include "storage/lock.h"
2006-07-31 03:16:38 +02:00
#include "utils/builtins.h"
2011-02-23 18:18:09 +01:00
#include "utils/rel.h"
2006-07-31 03:16:38 +02:00
#include "utils/syscache.h"

2014-04-06 17:13:43 +02:00
static void CheckAndCreateToastTable(Oid relOid, Datum reloptions,
2021-08-25 06:40:50 +02:00
									 LOCKMODE lockmode, bool check,
									 Oid OIDOldToast);
2009-02-02 20:31:40 +01:00
static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
2021-08-25 06:40:50 +02:00
							   Datum reloptions, LOCKMODE lockmode, bool check,
							   Oid OIDOldToast);
2006-07-31 03:16:38 +02:00
static bool needs_toast_table(Relation rel);


/*
2014-04-06 17:13:43 +02:00
 * CreateToastTable variants
2006-07-31 03:16:38 +02:00
 *		If the table needs a toast table, and doesn't already have one,
2010-01-06 04:04:03 +01:00
 *		then create a toast table for it.
2009-06-11 22:46:11 +02:00
 *
2009-05-08 00:58:28 +02:00
 * reloptions for the toast table can be passed, too.  Pass (Datum) 0
 * for default reloptions.
2006-07-31 03:16:38 +02:00
 *
 * We expect the caller to have verified that the relation is a table and have
 * already done any necessary permission checks.  Callers expect this function
 * to end with CommandCounterIncrement if it makes any changes.
 */
void
2014-04-06 17:13:43 +02:00
AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
{
2021-08-25 06:40:50 +02:00
	CheckAndCreateToastTable(relOid, reloptions, lockmode, true, InvalidOid);
2014-04-06 17:13:43 +02:00
}

void
2021-08-25 06:40:50 +02:00
NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
						Oid OIDOldToast)
2014-04-06 17:13:43 +02:00
{
2021-08-25 06:40:50 +02:00
	CheckAndCreateToastTable(relOid, reloptions, lockmode, false, OIDOldToast);
2014-04-06 17:13:43 +02:00
}

void
NewRelationCreateToastTable(Oid relOid, Datum reloptions)
{
2021-08-25 06:40:50 +02:00
	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false,
							 InvalidOid);
2014-04-06 17:13:43 +02:00
}

static void
2021-08-25 06:40:50 +02:00
CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
						 bool check, Oid OIDOldToast)
2006-07-31 03:16:38 +02:00
{
	Relation	rel;

2019-01-21 19:32:19 +01:00
	rel = table_open(relOid, lockmode);
2006-07-31 03:16:38 +02:00

	/* create_toast_table does all the work */
2021-08-25 06:40:50 +02:00
	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode,
							  check, OIDOldToast);
2006-07-31 03:16:38 +02:00

2019-01-21 19:32:19 +01:00
	table_close(rel, NoLock);
2006-07-31 03:16:38 +02:00
}

/*
 * Create a toast table during bootstrap
 *
 * Here we need to prespecify the OIDs of the toast table and its index
 */
void
BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
{
	Relation	rel;

2019-01-21 19:32:19 +01:00
	rel = table_openrv(makeRangeVar(NULL, relName, -1), AccessExclusiveLock);
2006-07-31 03:16:38 +02:00

2013-03-04 01:23:31 +01:00
	if (rel->rd_rel->relkind != RELKIND_RELATION &&
		rel->rd_rel->relkind != RELKIND_MATVIEW)
2006-07-31 03:16:38 +02:00
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
2013-03-04 01:23:31 +01:00
				 errmsg("\"%s\" is not a table or materialized view",
2006-07-31 03:16:38 +02:00
						relName)));

	/* create_toast_table does all the work */
2014-04-06 17:13:43 +02:00
	if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0,
2021-08-25 06:40:50 +02:00
							AccessExclusiveLock, false, InvalidOid))
2006-07-31 03:16:38 +02:00
		elog(ERROR, "\"%s\" does not require a toast table",
			 relName);

2019-01-21 19:32:19 +01:00
	table_close(rel, NoLock);
2006-07-31 03:16:38 +02:00
}


/*
 * create_toast_table --- internal workhorse
 *
2011-04-14 03:07:14 +02:00
 * rel is already opened and locked
2010-01-06 04:04:03 +01:00
 * toastOid and toastIndexOid are normally InvalidOid, but during
 * bootstrap they can be nonzero to specify hand-assigned OIDs
2006-07-31 03:16:38 +02:00
 */
static bool
2014-04-06 17:13:43 +02:00
create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
2021-08-25 06:40:50 +02:00
				   Datum reloptions, LOCKMODE lockmode, bool check,
				   Oid OIDOldToast)
2006-07-31 03:16:38 +02:00
{
	Oid			relOid = RelationGetRelid(rel);
	HeapTuple	reltup;
	TupleDesc	tupdesc;
	bool		shared_relation;
2010-02-07 21:48:13 +01:00
	bool		mapped_relation;
2011-01-25 21:42:03 +01:00
	Relation	toast_rel;
2006-07-31 03:16:38 +02:00
	Relation	class_rel;
	Oid			toast_relid;
2007-07-26 00:16:18 +02:00
	Oid			namespaceid;
2006-07-31 03:16:38 +02:00
	char		toast_relname[NAMEDATALEN];
	char		toast_idxname[NAMEDATALEN];
	IndexInfo  *indexInfo;
2011-02-08 22:04:18 +01:00
	Oid			collationObjectId[2];
2006-07-31 03:16:38 +02:00
	Oid			classObjectId[2];
2007-01-09 03:14:16 +01:00
	int16		coloptions[2];
2006-07-31 03:16:38 +02:00
	ObjectAddress baseobject,
				toastobject;

	/*
	 * Is it already toasted?
	 */
	if (rel->rd_rel->reltoastrelid != InvalidOid)
		return false;

Fix pg_upgrade to not fail when new-cluster TOAST rules differ from old.
This patch essentially reverts commit 4c6780fd17aa43ed, in favor of a much
simpler solution for the case where the new cluster would choose to create
a TOAST table but the old cluster doesn't have one: just don't create a
TOAST table.
The existing code failed in at least two different ways if the situation
arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the
lock sanity check in create_toast_table failed; (2) pg_upgrade did not
provide a pg_type OID for the new toast table, so that the crosscheck in
TypeCreate failed. While both these problems were introduced by later
patches, they show that the hack being used to cause TOAST table creation
is overwhelmingly fragile (and untested). I also note that before the
TypeCreate crosscheck was added, the code would have resulted in assigning
an indeterminate pg_type OID to the toast table, possibly causing a later
OID conflict in that catalog; so that it didn't really work even when
committed.
If we simply don't create a TOAST table, there will only be a problem if
the code tries to store a tuple that's wider than a page, and field
compression isn't sufficient to get it under a page. Given that the TOAST
creation threshold is intended to be about a quarter of a page, it's very
hard to believe that cross-version differences in the do-we-need-a-toast-
table heuristic could result in an observable problem. So let's just
follow the old version's conclusion about whether a TOAST table is needed.
(If we ever do change needs_toast_table() so much that this conclusion
doesn't apply, we can devise a solution at that time, and hopefully do
it in a less klugy way than 4c6780fd17aa43ed did.)
Back-patch to 9.3, like the previous patch.
Discussion: <8110.1462291671@sss.pgh.pa.us>
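The decision rule that the commit message above argues for can be sketched as a standalone predicate: in normal mode the usual heuristic decides, while in binary-upgrade mode a TOAST table is created only if an OID for it was supplied. This is a simplified sketch under that reading. The function name, the parameters, and the `SKETCH_INVALID_OID` constant are hypothetical stand-ins for the real globals and macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an "invalid OID" marker. */
#define SKETCH_INVALID_OID UINT32_C(0)

/*
 * Decide whether to create a TOAST table.
 *
 * binary_upgrade:        true when running under an upgrade tool
 * heuristic_needs_toast: result of the ordinary "does it need one" check
 * preassigned_toast_oid: OID handed over from the old cluster, if any
 */
static bool
sketch_should_create_toast(bool binary_upgrade,
                           bool heuristic_needs_toast,
                           uint32_t preassigned_toast_oid)
{
    if (!binary_upgrade)
        return heuristic_needs_toast;   /* normal mode, normal check */

    /* Upgrade mode: follow the old cluster's conclusion, not the heuristic. */
    return preassigned_toast_oid != SKETCH_INVALID_OID;
}
```

Note that in upgrade mode the heuristic's answer is ignored in both directions, which is why cross-version differences in the heuristic no longer cause a failure.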
2016-05-07 04:05:51 +02:00
	/*
	 * Check to see whether the table actually needs a TOAST table.
	 */
2014-08-07 20:56:13 +02:00
	if (!IsBinaryUpgrade)
	{
2016-05-07 04:05:51 +02:00
		/* Normal mode, normal check */
2014-08-07 20:56:13 +02:00
		if (!needs_toast_table(rel))
			return false;
	}
	else
	{
		/*
2016-05-07 04:05:51 +02:00
		 * In binary-upgrade mode, create a TOAST table if and only if
		 * pg_upgrade told us to (ie, a TOAST table OID has been provided).
2014-08-07 20:56:13 +02:00
		 *
2016-05-07 04:05:51 +02:00
		 * This indicates that the old cluster had a TOAST table for the
		 * current table.  We must create a TOAST table to receive the old
		 * TOAST file, even if the table seems not to need one.
		 *
		 * Contrariwise, if the old cluster did not have a TOAST table, we
		 * should be able to get along without one even if the new version's
		 * needs_toast_table rules suggest we should have one.  There is a lot
		 * of daylight between where we will create a TOAST table and where
		 * one is really necessary to avoid failures, so small cross-version
		 * differences in the when-to-create heuristic shouldn't be a problem.
		 * If we tried to create a TOAST table anyway, we would have the
		 * problem that it might take up an OID that will conflict with some
		 * old-cluster table we haven't seen yet.
2014-08-07 20:56:13 +02:00
		 */
Don't create pg_type entries for sequences or toast tables.
Commit f7f70d5e2 left one inconsistency behind: we're still creating
pg_type entries for the composite types of sequences and toast tables,
but not arrays over those composites. But there seems precious little
reason to have named composite types for toast tables, and not much more
to have them for sequences (especially given the thought that sequences
may someday not be standalone relations at all).
So, let's close that inconsistency by removing these composite types,
rather than adding arrays for them. This buys back a little bit of
the initial pg_type bloat added by the previous patch, and could be
a significant savings in a large database with many toast tables.
Aside from a small logic rearrangement in heap_create_with_catalog,
this patch mostly needs to clean up some places that were assuming that
pg_class.reltype always has a valid value. Those are really pre-existing
bugs, given that it's documented otherwise; notably, the plpgsql changes
fix code that gives "cache lookup failed for type 0" on indexes today.
But none of these seem interesting enough to back-patch.
Also, remove the pg_dump/pg_upgrade infrastructure for propagating
a toast table's pg_type OID into the new database, since we no longer
need that.
Discussion: https://postgr.es/m/761F1389-C6A8-4C15-80CE-950C961F5341@gmail.com
2020-07-07 21:43:22 +02:00
		if (!OidIsValid(binary_upgrade_next_toast_pg_class_oid))
2014-08-07 20:56:13 +02:00
			return false;
	}
2006-07-31 03:16:38 +02:00

2014-04-06 17:13:43 +02:00
	/*
	 * If requested check lockmode is sufficient. This is a cross check in
	 * case of errors or conflicting decisions in earlier code.
	 */
	if (check && lockmode != AccessExclusiveLock)
		elog(ERROR, "AccessExclusiveLock required to add toast table.");

2006-07-31 03:16:38 +02:00
	/*
	 * Create the toast table and its index
	 */
	snprintf(toast_relname, sizeof(toast_relname),
			 "pg_toast_%u", relOid);
	snprintf(toast_idxname, sizeof(toast_idxname),
			 "pg_toast_%u_index", relOid);

	/* this is pretty painful... need a tuple descriptor */
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
was already painful for the existing code, but upcoming work aiming to
make table storage pluggable would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring a pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load a binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot of code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to merge this
now. It's painful to maintain externally, too complicated to commit
after the code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
	tupdesc = CreateTemplateTupleDesc(3);
2006-07-31 03:16:38 +02:00
	TupleDescInitEntry(tupdesc, (AttrNumber) 1,
					   "chunk_id",
					   OIDOID,
					   -1, 0);
	TupleDescInitEntry(tupdesc, (AttrNumber) 2,
					   "chunk_seq",
					   INT4OID,
					   -1, 0);
	TupleDescInitEntry(tupdesc, (AttrNumber) 3,
					   "chunk_data",
					   BYTEAOID,
					   -1, 0);

	/*
	 * Ensure that the toast table doesn't itself get toasted, or we'll be
	 * toast :-(.  This is essential for chunk_data because type bytea is
	 * toastable; hit the other two just to be sure.
	 */
2020-03-04 16:34:25 +01:00
	TupleDescAttr(tupdesc, 0)->attstorage = TYPSTORAGE_PLAIN;
	TupleDescAttr(tupdesc, 1)->attstorage = TYPSTORAGE_PLAIN;
	TupleDescAttr(tupdesc, 2)->attstorage = TYPSTORAGE_PLAIN;
2006-07-31 03:16:38 +02:00

Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.
Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.
It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.
Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allowing a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Pryzby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.
Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
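The commit message above describes reusing two spare bits of a 32-bit size field to record the compression method, which works because a varlena is limited to 1GB and so needs only 30 bits. A minimal sketch of that bit packing follows. The macro and function names are hypothetical and chosen for illustration; they are not PostgreSQL's actual definitions.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not PostgreSQL's actual macros. */
#define SIZE_BITS 30u
#define SIZE_MASK ((UINT32_C(1) << SIZE_BITS) - 1)

/* Combine a size (below 1GB, so it fits in 30 bits) with a 2-bit method id. */
static inline uint32_t
pack_size_and_method(uint32_t size, uint32_t method)
{
    return (size & SIZE_MASK) | ((method & 0x3u) << SIZE_BITS);
}

/* Recover the 30-bit size from the low bits. */
static inline uint32_t
unpack_size(uint32_t packed)
{
    return packed & SIZE_MASK;
}

/* Recover the 2-bit method id from the two high bits. */
static inline uint32_t
unpack_method(uint32_t packed)
{
    return packed >> SIZE_BITS;
}
```

Two bits give four possible method ids. With pglz and lz4 taking two of them, two remain, which matches the "2 more built-in compression types" remark in the message.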
2021-03-19 20:10:38 +01:00
|
|
|
/* Toast field should not be compressed */
|
|
|
|
TupleDescAttr(tupdesc, 0)->attcompression = InvalidCompressionMethod;
|
|
|
|
TupleDescAttr(tupdesc, 1)->attcompression = InvalidCompressionMethod;
|
|
|
|
TupleDescAttr(tupdesc, 2)->attcompression = InvalidCompressionMethod;
|
|
|
|
|
2006-07-31 03:16:38 +02:00
|
|
|
/*
|
2007-07-26 00:16:18 +02:00
|
|
|
* Toast tables for regular relations go in pg_toast; those for temp
|
|
|
|
* relations go into the per-backend temp-toast-table namespace.
|
|
|
|
*/
|
2014-08-26 03:28:19 +02:00
|
|
|
if (isTempOrTempToastNamespace(rel->rd_rel->relnamespace))
|
2007-07-26 00:16:18 +02:00
|
|
|
namespaceid = GetTempToastNamespace();
|
|
|
|
else
|
|
|
|
namespaceid = PG_TOAST_NAMESPACE;
|
|
|
|
|
Ignore attempts to add TOAST table to shared or catalog tables
Running ALTER TABLE on any table will check if a TOAST table needs to be
added. On shared tables, this would previously fail, thus effectively
disabling ALTER TABLE for those tables. On (non-shared) system
catalogs, on the other hand, it would add a TOAST table, even though we
don't really want TOAST tables on some system catalogs. In some cases,
it would also fail with an error "AccessExclusiveLock required to add
toast table.", depending on what locks the ALTER TABLE actions had
already taken.
So instead, just ignore attempts to add TOAST tables to such tables,
outside of bootstrap mode, pretending they don't need one.
This allows running ALTER TABLE on such tables without messing up the
TOAST situation. Legitimate uses for ALTER TABLE on system catalogs
include setting reloptions (say, fillfactor or autovacuum settings).
(All this still requires allow_system_table_mods, which is independent
of this.)
Discussion: https://www.postgresql.org/message-id/flat/e49f825b-fb25-0bc8-8afc-d5ad895c7975@2ndquadrant.com
2019-03-19 10:48:03 +01:00
|
|
|
/* Toast table is shared if and only if its parent is. */
|
|
|
|
shared_relation = rel->rd_rel->relisshared;
|
|
|
|
|
|
|
|
/* It's mapped if and only if its parent is, too */
|
|
|
|
mapped_relation = RelationIsMapped(rel);
|
|
|
|
|
2006-07-31 03:16:38 +02:00
|
|
|
toast_relid = heap_create_with_catalog(toast_relname,
|
2007-07-26 00:16:18 +02:00
|
|
|
namespaceid,
|
2006-07-31 03:16:38 +02:00
|
|
|
rel->rd_rel->reltablespace,
|
|
|
|
toastOid,
|
Don't create pg_type entries for sequences or toast tables.
Commit f7f70d5e2 left one inconsistency behind: we're still creating
pg_type entries for the composite types of sequences and toast tables,
but not arrays over those composites. But there seems precious little
reason to have named composite types for toast tables, and not much more
to have them for sequences (especially given the thought that sequences
may someday not be standalone relations at all).
So, let's close that inconsistency by removing these composite types,
rather than adding arrays for them. This buys back a little bit of
the initial pg_type bloat added by the previous patch, and could be
a significant savings in a large database with many toast tables.
Aside from a small logic rearrangement in heap_create_with_catalog,
this patch mostly needs to clean up some places that were assuming that
pg_class.reltype always has a valid value. Those are really pre-existing
bugs, given that it's documented otherwise; notably, the plpgsql changes
fix code that gives "cache lookup failed for type 0" on indexes today.
But none of these seem interesting enough to back-patch.
Also, remove the pg_dump/pg_upgrade infrastructure for propagating
a toast table's pg_type OID into the new database, since we no longer
need that.
Discussion: https://postgr.es/m/761F1389-C6A8-4C15-80CE-950C961F5341@gmail.com
2020-07-07 21:43:22 +02:00
|
|
|
InvalidOid,
|
2010-01-29 00:21:13 +01:00
|
|
|
InvalidOid,
|
2006-07-31 03:16:38 +02:00
|
|
|
rel->rd_rel->relowner,
|
2020-01-07 20:23:25 +01:00
|
|
|
table_relation_toast_am(rel),
|
2006-07-31 03:16:38 +02:00
|
|
|
tupdesc,
|
2008-05-10 01:32:05 +02:00
|
|
|
NIL,
|
2006-07-31 03:16:38 +02:00
|
|
|
RELKIND_TOASTVALUE,
|
2010-12-13 18:34:26 +01:00
|
|
|
rel->rd_rel->relpersistence,
|
2006-07-31 03:16:38 +02:00
|
|
|
shared_relation,
|
2010-02-07 21:48:13 +01:00
|
|
|
mapped_relation,
|
2006-07-31 03:16:38 +02:00
|
|
|
ONCOMMIT_NOOP,
|
2009-02-02 20:31:40 +01:00
|
|
|
reloptions,
|
2009-10-05 21:24:49 +02:00
|
|
|
false,
|
2012-10-23 23:07:26 +02:00
|
|
|
true,
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
true,
|
2021-08-25 06:40:50 +02:00
|
|
|
OIDOldToast,
|
2015-03-03 18:10:50 +01:00
|
|
|
NULL);
|
2010-07-26 01:21:22 +02:00
|
|
|
Assert(toast_relid != InvalidOid);
|
2006-07-31 03:16:38 +02:00
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
/* make the toast relation visible, else table_open will fail */
|
2006-07-31 03:16:38 +02:00
|
|
|
CommandCounterIncrement();
|
|
|
|
|
2011-01-25 21:42:03 +01:00
|
|
|
/* ShareLock is not really needed here, but take it anyway */
|
2019-01-21 19:32:19 +01:00
|
|
|
toast_rel = table_open(toast_relid, ShareLock);
|
2011-01-25 21:42:03 +01:00
|
|
|
|
2006-07-31 03:16:38 +02:00
|
|
|
/*
|
|
|
|
* Create unique index on chunk_id, chunk_seq.
|
|
|
|
*
|
|
|
|
* NOTE: the normal TOAST access routines could actually function with a
|
|
|
|
* single-column index on chunk_id only. However, the slice access
|
|
|
|
* routines use both columns for faster access to an individual chunk. In
|
|
|
|
* addition, we want it to be unique as a check against the possibility of
|
|
|
|
* duplicate TOAST chunk OIDs. The index might also be a little more
|
|
|
|
* efficient this way, since btree isn't all that happy with large numbers
|
|
|
|
* of equal keys.
|
|
|
|
*/
|
|
|
|
|
|
|
|
indexInfo = makeNode(IndexInfo);
|
|
|
|
indexInfo->ii_NumIndexAttrs = 2;
|
2018-04-07 22:00:39 +02:00
|
|
|
indexInfo->ii_NumIndexKeyAttrs = 2;
|
2018-04-12 12:02:45 +02:00
|
|
|
indexInfo->ii_IndexAttrNumbers[0] = 1;
|
|
|
|
indexInfo->ii_IndexAttrNumbers[1] = 2;
|
2006-07-31 03:16:38 +02:00
|
|
|
indexInfo->ii_Expressions = NIL;
|
|
|
|
indexInfo->ii_ExpressionsState = NIL;
|
|
|
|
indexInfo->ii_Predicate = NIL;
|
Faster expression evaluation and targetlist projection.
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.
This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.
The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
out operation metadata sequentially; including the avoidance of
nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
constant re-checks at evaluation time
Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
The new framework allows for significant future optimizations. E.g.:
- basic infrastructure to later reduce the per-executor-startup
overhead of expression evaluation, by caching state in prepared
statements. That'd be helpful in OLTPish scenarios where
initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
work have already been made.
- optimizing the interpreter. Similarly a number of proposals have
been made here too.
The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
initialization, whereas previously they were done during
execution. In edge cases this can lead to errors being raised that
previously wouldn't have been, e.g. a NULL array being coerced to a
different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
during expression initialization, previously it was re-built
every time a domain check was evaluated. For normal queries this
doesn't change much, but e.g. for plpgsql functions, which cache
ExprStates, the old set could stick around longer. The behavior
around this might still change.
Author: Andres Freund, with significant changes by Tom Lane,
changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
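The commit message above contrasts recursive tree-walk evaluation with non-recursive, opcode-dispatch evaluation over sequentially laid-out steps. The sketch below shows the general technique as a toy stack machine. The instruction set, the `Step` struct and `eval_steps` are all hypothetical and far simpler than PostgreSQL's real expression steps, which do not use an operand stack like this one.

```c
#include <stddef.h>

/* Hypothetical miniature instruction set; PostgreSQL's real steps differ. */
typedef enum { OP_CONST, OP_ADD, OP_MUL, OP_DONE } OpCode;

typedef struct {
    OpCode op;
    long   value;   /* used by OP_CONST only */
} Step;

/*
 * Evaluate a flat, postfix-ordered program with a loop and a switch
 * instead of recursion. Assumes a well-formed program whose operand
 * stack never grows beyond 32 entries.
 */
static long
eval_steps(const Step *steps)
{
    long   stack[32];
    size_t sp = 0;

    for (const Step *s = steps;; s++)
    {
        switch (s->op)
        {
            case OP_CONST:
                stack[sp++] = s->value;
                break;
            case OP_ADD:
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case OP_MUL:
                sp--;
                stack[sp - 1] *= stack[sp];
                break;
            case OP_DONE:
                return stack[sp - 1];
        }
    }
}
```

Because the steps sit in one contiguous array, the evaluator walks memory sequentially and needs no call frame per sub-expression, which is the property the message credits for reduced stack usage and fewer hard-to-predict memory accesses.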
2017-03-14 23:45:36 +01:00
|
|
|
indexInfo->ii_PredicateState = NULL;
|
2009-12-07 06:22:23 +01:00
|
|
|
indexInfo->ii_ExclusionOps = NULL;
|
|
|
|
indexInfo->ii_ExclusionProcs = NULL;
|
|
|
|
indexInfo->ii_ExclusionStrats = NULL;
|
Implement operator class parameters
PostgreSQL provides a set of template index access methods, where opclasses have
much freedom in the semantics of indexing. These index AMs are GiST, GIN,
SP-GiST and BRIN. There, opclasses define the representation of keys, operations on
them and supported search strategies. So, it's natural that opclasses may
face some tradeoffs, which require a user-side decision. This commit implements
opclass parameters allowing users to set some values, which tell the opclass how to
index the particular dataset.
This commit doesn't introduce new storage in system catalog. Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.
In order to avoid changing the signature of each opclass support function, we
implement a unified way to pass options to opclass support functions. Options
are set to fn_expr as the constant bytea expression. It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.
This commit comes with some examples of opclass options usage. We parametrize
signature length in GiST. That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops. Also we parametrize maximum number of integer ranges for
gist__int_ops. However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.
Catversion is bumped.
Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviewed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 18:17:11 +02:00
|
|
|
indexInfo->ii_OpclassOptions = NULL;
|
2006-07-31 03:16:38 +02:00
|
|
|
indexInfo->ii_Unique = true;
|
2007-09-20 19:56:33 +02:00
|
|
|
indexInfo->ii_ReadyForInserts = true;
|
2006-08-25 06:06:58 +02:00
|
|
|
indexInfo->ii_Concurrent = false;
|
2007-09-20 19:56:33 +02:00
|
|
|
indexInfo->ii_BrokenHotChain = false;
|
Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 19:25:55 +01:00
|
|
|
indexInfo->ii_ParallelWorkers = 0;
|
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped in isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 15:49:22 +01:00
|
|
|
indexInfo->ii_Am = BTREE_AM_OID;
|
Allow index AMs to cache data across aminsert calls within a SQL command.
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that. However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)
For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.
The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data. What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.
Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
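The commit message above explains that an index AM can keep per-statement data in the IndexInfo passed to aminsert, so setup work is paid once rather than on every call. A minimal sketch of that lazy-initialization pattern follows. Only the field name `ii_AmCache` comes from the source; the struct types, the counter and `get_am_cache` are hypothetical, and plain `malloc` stands in for allocating in the `ii_Context` memory context.

```c
#include <stdlib.h>

/* Hypothetical stand-in; only the field name ii_AmCache is from the source. */
typedef struct {
    void *ii_AmCache;   /* AM-private cache, NULL until the first insert */
} IndexInfoSketch;

/* Hypothetical cache contents. */
typedef struct {
    int opclass_data;   /* placeholder for the result of a costly lookup */
} AmCacheSketch;

/* Counts how many times the expensive setup actually ran. */
static int expensive_lookups = 0;

/* Build the cache on the first call and reuse it on every later call. */
static AmCacheSketch *
get_am_cache(IndexInfoSketch *info)
{
    if (info->ii_AmCache == NULL)
    {
        AmCacheSketch *cache = malloc(sizeof(AmCacheSketch));

        if (cache == NULL)
            return NULL;
        expensive_lookups++;        /* stands in for catalog lookups */
        cache->opclass_data = 42;
        info->ii_AmCache = cache;
    }
    return info->ii_AmCache;
}
```

The caller initializes the field to NULL, as `create_toast_table` does with `indexInfo->ii_AmCache = NULL`, and the AM fills it in on demand.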
2017-02-09 17:52:12 +01:00
|
|
|
indexInfo->ii_AmCache = NULL;
|
|
|
|
indexInfo->ii_Context = CurrentMemoryContext;
|
2006-07-31 03:16:38 +02:00
|
|
|
|
2011-02-08 22:04:18 +01:00
|
|
|
collationObjectId[0] = InvalidOid;
|
|
|
|
collationObjectId[1] = InvalidOid;
|
|
|
|
|
2006-07-31 03:16:38 +02:00
|
|
|
classObjectId[0] = OID_BTREE_OPS_OID;
|
|
|
|
classObjectId[1] = INT4_BTREE_OPS_OID;
|
|
|
|
|
2007-01-09 03:14:16 +01:00
|
|
|
coloptions[0] = 0;
|
|
|
|
coloptions[1] = 0;
|
|
|
|
|
2011-07-18 17:02:48 +02:00
|
|
|
index_create(toast_rel, toast_idxname, toastIndexOid, InvalidOid,
|
2018-02-19 20:59:37 +01:00
|
|
|
InvalidOid, InvalidOid,
|
2006-07-31 03:16:38 +02:00
|
|
|
indexInfo,
|
Adjust naming of indexes and their columns per recent discussion.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
2009-12-23 03:35:25 +01:00
|
|
|
list_make2("chunk_id", "chunk_seq"),
|
2006-07-31 03:16:38 +02:00
|
|
|
BTREE_AM_OID,
|
|
|
|
rel->rd_rel->reltablespace,
|
2011-02-08 22:04:18 +01:00
|
|
|
collationObjectId, classObjectId, coloptions, (Datum) 0,
|
2018-02-19 20:59:37 +01:00
|
|
|
INDEX_CREATE_IS_PRIMARY, 0, true, true, NULL);
|
2006-07-31 03:16:38 +02:00
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(toast_rel, NoLock);
|
2011-01-25 21:42:03 +01:00
|
|
|
|
2006-07-31 03:16:38 +02:00
|
|
|
/*
|
|
|
|
* Store the toast table's OID in the parent relation's pg_class row
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
class_rel = table_open(RelationRelationId, RowExclusiveLock);
|
2006-07-31 03:16:38 +02:00
|
|
|
|
2010-02-14 19:42:19 +01:00
|
|
|
reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
|
2006-07-31 03:16:38 +02:00
|
|
|
if (!HeapTupleIsValid(reltup))
|
|
|
|
elog(ERROR, "cache lookup failed for relation %u", relOid);
|
|
|
|
|
|
|
|
((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
|
|
|
|
|
|
|
|
if (!IsBootstrapProcessingMode())
|
|
|
|
{
|
|
|
|
/* normal case, use a transactional update */
|
2017-01-31 22:42:24 +01:00
|
|
|
CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
|
2006-07-31 03:16:38 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
|
|
|
|
heap_inplace_update(class_rel, reltup);
|
|
|
|
}
|
|
|
|
|
|
|
|
heap_freetuple(reltup);
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(class_rel, RowExclusiveLock);
|
2006-07-31 03:16:38 +02:00
|
|
|
|
|
|
|
/*
|
2020-06-15 19:14:40 +02:00
|
|
|
* Register dependency from the toast table to the main, so that the toast
|
|
|
|
* table will be deleted if the main is. Skip this in bootstrap mode.
|
2006-07-31 03:16:38 +02:00
|
|
|
*/
|
|
|
|
if (!IsBootstrapProcessingMode())
|
|
|
|
{
|
|
|
|
baseobject.classId = RelationRelationId;
|
|
|
|
baseobject.objectId = relOid;
|
|
|
|
baseobject.objectSubId = 0;
|
|
|
|
toastobject.classId = RelationRelationId;
|
|
|
|
toastobject.objectId = toast_relid;
|
|
|
|
toastobject.objectSubId = 0;
|
|
|
|
|
|
|
|
recordDependencyOn(&toastobject, &baseobject, DEPENDENCY_INTERNAL);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make changes visible
|
|
|
|
*/
|
|
|
|
CommandCounterIncrement();
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
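The index comment in the function above notes that the slice access routines use both chunk_id and chunk_seq to reach an individual chunk quickly. The sketch below shows how a byte range could be mapped to the chunk_seq range that needs fetching. `CHUNK_SIZE`, `ChunkRange` and `slice_to_chunk_range` are hypothetical names, and the chunk size is an assumed value, since the real one depends on the build.

```c
#include <stdint.h>

/* Hypothetical chunk size for illustration; the real value is build-dependent. */
#define CHUNK_SIZE 2000u

typedef struct {
    uint32_t first_seq;  /* first chunk_seq to fetch */
    uint32_t last_seq;   /* last chunk_seq to fetch, inclusive */
} ChunkRange;

/*
 * Map the byte slice [offset, offset + length) of a stored value to the
 * range of chunk sequence numbers covering it. length must be > 0.
 */
static ChunkRange
slice_to_chunk_range(uint32_t offset, uint32_t length)
{
    ChunkRange r;

    r.first_seq = offset / CHUNK_SIZE;
    r.last_seq = (offset + length - 1) / CHUNK_SIZE;
    return r;
}
```

With a two-column index, a scan can then be bounded by one chunk_id plus this chunk_seq range, rather than visiting every chunk of the value.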
|
|
|
|
|
|
|
|
/*
|
2019-05-21 17:57:13 +02:00
|
|
|
* Check to see whether the table needs a TOAST table.
|
2006-07-31 03:16:38 +02:00
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
needs_toast_table(Relation rel)
|
|
|
|
{
|
2019-03-19 10:48:03 +01:00
|
|
|
/*
|
|
|
|
* No need to create a TOAST table for partitioned tables.
|
|
|
|
*/
|
2018-03-22 18:49:38 +01:00
|
|
|
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
|
|
|
|
return false;
|
|
|
|
|
2019-03-19 10:48:03 +01:00
|
|
|
/*
|
|
|
|
* We cannot allow toasting a shared relation after initdb (because
|
|
|
|
* there's no way to mark it toasted in other databases' pg_class).
|
|
|
|
*/
|
|
|
|
if (rel->rd_rel->relisshared && !IsBootstrapProcessingMode())
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Ignore attempts to create toast tables on catalog tables after initdb.
|
|
|
|
* Which catalogs get toast tables is explicitly chosen in catalog/pg_*.h.
|
2020-11-07 12:11:40 +01:00
|
|
|
* (We could get here via some ALTER TABLE command if the catalog doesn't
|
2019-03-19 10:48:03 +01:00
|
|
|
* have a toast table.)
|
|
|
|
*/
|
|
|
|
if (IsCatalogRelation(rel) && !IsBootstrapProcessingMode())
|
|
|
|
return false;
|
|
|
|
|
2019-05-21 17:57:13 +02:00
|
|
|
/* Otherwise, let the AM decide. */
|
|
|
|
return table_relation_needs_toast_table(rel);
|
2006-07-31 03:16:38 +02:00
|
|
|
}
|