1997-04-02 05:51:23 +02:00
/*-------------------------------------------------------------------------
 *
1999-02-14 00:22:53 +01:00
 * sequence.c
1997-09-07 07:04:48 +02:00
 *    PostgreSQL sequences support code.
1997-04-02 05:51:23 +02:00
 *
2017-01-03 19:48:53 +01:00
 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
2000-12-08 21:10:19 +01:00
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
2010-09-20 22:08:53 +02:00
 *    src/backend/commands/sequence.c
2000-12-08 21:10:19 +01:00
 *
1997-04-02 05:51:23 +02:00
 *-------------------------------------------------------------------------
 */
2000-12-08 21:10:19 +01:00
#include "postgres.h"
1997-04-02 05:51:23 +02:00

2017-02-08 21:45:30 +01:00
#include "access/bufmask.h"
2012-08-30 22:15:44 +02:00
#include "access/htup_details.h"
Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This also causes additional WAL-logging (there was previously a single
WAL record for a locked tuple; now there are as many as there are updated
copies of the tuple.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time on it. The original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
#include "access/multixact.h"
#include "access/transam.h"
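The commit message above says that FOR KEY SHARE and FOR NO KEY UPDATE do not block each other, while FOR SHARE and FOR UPDATE do. A minimal sketch of the resulting row-level conflict matrix follows, assuming the four modes conflict as in PostgreSQL's documented table of conflicting row-level locks; the enum and function names are hypothetical and are not the backend's own identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the backend has its own enum for tuple lock modes. */
typedef enum RowLockMode
{
    ROW_LOCK_KEY_SHARE,         /* SELECT ... FOR KEY SHARE */
    ROW_LOCK_SHARE,             /* SELECT ... FOR SHARE */
    ROW_LOCK_NO_KEY_UPDATE,     /* SELECT ... FOR NO KEY UPDATE */
    ROW_LOCK_UPDATE,            /* SELECT ... FOR UPDATE */
    ROW_LOCK_NUM_MODES
} RowLockMode;

/*
 * row_lock_conflicts[held][requested] is true when a lock already held in
 * mode "held" blocks another transaction requesting mode "requested".
 * The matrix is symmetric.
 */
static const bool row_lock_conflicts[ROW_LOCK_NUM_MODES][ROW_LOCK_NUM_MODES] = {
    /*                    KEY_SHARE SHARE  NO_KEY_UPD UPDATE */
    /* KEY_SHARE     */ { false,    false, false,     true },
    /* SHARE         */ { false,    false, true,      true },
    /* NO_KEY_UPDATE */ { false,    true,  true,      true },
    /* UPDATE        */ { true,     true,  true,      true },
};

static bool
row_locks_conflict(RowLockMode held, RowLockMode requested)
{
    assert(held < ROW_LOCK_NUM_MODES && requested < ROW_LOCK_NUM_MODES);
    return row_lock_conflicts[held][requested];
}
```

With this table, a foreign-key check holding FOR KEY SHARE waits only for a locker that takes FOR UPDATE (for example an UPDATE that changes key columns); an UPDATE that leaves the key alone takes FOR NO KEY UPDATE and proceeds concurrently.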
Reconsider when to wait for WAL flushes/syncrep during commit.
Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have a xid assigned. The primary
reason for that is that a sequence's nextval() did not assign a xid, but
its effects are worthwhile to wait for on commit.
This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.
This led to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...
This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. Fixing the issue here
doesn't properly solve that issue; it merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.
To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records. If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.
The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one, it only happens once every SEQ_LOG_VALS (32)
values anyway; for another, only usage of nextval() without using the
result in an insert or similar is affected.
Discussion: 20150223165359.GF30784@awork2.anarazel.de,
369698E947874884A77849D8FE3680C2@maumau,
5CF4ABBA67674088B3941894E22A0D25@maumau
Per complaint from maumau and Thom Brown
Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to have consistent behavior across all maintained branches.
2015-02-26 12:50:07 +01:00
#include "access/xact.h"
2014-11-06 12:52:08 +01:00
#include "access/xlog.h"
#include "access/xloginsert.h"
2008-05-12 02:00:54 +02:00
#include "access/xlogutils.h"
2006-08-21 02:57:26 +02:00
#include "catalog/dependency.h"
2016-12-20 18:00:00 +01:00
#include "catalog/indexing.h"
2002-03-30 02:02:42 +01:00
#include "catalog/namespace.h"
2013-03-18 03:55:14 +01:00
#include "catalog/objectaccess.h"
2016-12-20 18:00:00 +01:00
#include "catalog/pg_sequence.h"
2002-03-29 20:06:29 +01:00
#include "catalog/pg_type.h"
2002-05-22 23:40:55 +02:00
#include "commands/defrem.h"
1999-07-16 01:04:24 +02:00
#include "commands/sequence.h"
2006-07-11 19:26:59 +02:00
#include "commands/tablecmds.h"
2011-01-02 14:08:08 +01:00
#include "funcapi.h"
1999-07-16 07:00:38 +02:00
#include "miscadmin.h"
2006-03-14 23:48:25 +01:00
#include "nodes/makefuncs.h"
2017-02-10 21:12:32 +01:00
#include "parser/parse_type.h"
2008-05-12 02:00:54 +02:00
#include "storage/lmgr.h"
2011-09-04 07:13:16 +02:00
#include "storage/proc.h"
2010-02-09 22:43:30 +01:00
#include "storage/smgr.h"
1999-07-16 01:04:24 +02:00
#include "utils/acl.h"
1999-07-16 07:00:38 +02:00
#include "utils/builtins.h"
2006-08-21 02:57:26 +02:00
#include "utils/lsyscache.h"
2004-09-16 18:58:44 +02:00
#include "utils/resowner.h"
2005-06-07 09:08:35 +02:00
#include "utils/syscache.h"
2017-01-21 02:29:53 +01:00
#include "utils/varlena.h"
2004-09-16 18:58:44 +02:00

2001-06-07 00:03:48 +02:00

2000-11-30 02:47:33 +01:00
/*
2002-01-11 19:16:04 +01:00
 * We don't want to log each fetching of a value from a sequence,
2000-11-30 02:47:33 +01:00
 * so we pre-log a few fetches in advance. In the event of
Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.
More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence. (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.) The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record. However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false). Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record. A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.
To fix, get rid of the idea of logging an is_called status different from
reality. This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible. In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results. This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.
In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details. It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.
Back-patch to all supported branches.
2012-07-25 23:40:36 +02:00
 * crash we can lose (skip over) as many values as we pre-logged.
2000-11-30 02:47:33 +01:00
 */
2001-03-22 05:01:46 +01:00
#define SEQ_LOG_VALS    32
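The comment above says a crash can skip as many values as were pre-logged, but never reissue one. A minimal sketch of that trade-off follows, assuming an increment of 1, no session cache, and a WAL record that is durable as soon as it is written; SeqModel, model_nextval and model_recover are hypothetical names, and the real bookkeeping (is_called, caching, checkpoint interaction) is more involved.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_LOG_VALS 32

/* Hypothetical, heavily simplified sequence state (increment of 1 assumed). */
typedef struct SeqModel
{
    int64_t last_value;      /* last value handed out (in-memory page state) */
    int64_t log_cnt;         /* fetches still covered by the last WAL record */
    int64_t logged_value;    /* value recorded in the most recent WAL record */
} SeqModel;

static int64_t
model_nextval(SeqModel *seq)
{
    int64_t result = seq->last_value + 1;

    assert(seq->log_cnt >= 0);
    if (seq->log_cnt == 0)
    {
        /* Pre-log: the WAL record claims SEQ_LOG_VALS values beyond result. */
        seq->logged_value = result + SEQ_LOG_VALS;
        seq->log_cnt = SEQ_LOG_VALS;
    }
    else
        seq->log_cnt--;          /* this fetch is already covered by WAL */

    seq->last_value = result;
    return result;
}

/* Crash recovery restores whatever the most recent WAL record said. */
static void
model_recover(SeqModel *seq)
{
    seq->last_value = seq->logged_value;
    seq->log_cnt = 0;
}
```

In this model a crash right after the first call moves the sequence from 1 to 34, skipping SEQ_LOG_VALS values; since no handed-out value ever exceeds logged_value, replay never returns a value twice, while only one WAL record is written per SEQ_LOG_VALS + 1 fetches.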
1997-09-07 07:04:48 +02:00

2002-05-22 23:40:55 +02:00
/*
 * The "special area" of a sequence's buffer page looks like this.
 */
#define SEQ_MAGIC     0x1717

1997-09-07 07:04:48 +02:00
typedef struct sequence_magic
{
1997-09-08 04:41:22 +02:00
    uint32      magic;
1997-09-08 23:56:23 +02:00
} sequence_magic;
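The special area of a sequence page holds only a magic number that marks the page as a sequence page. A minimal sketch of writing and checking such a marker follows, assuming an 8 kB page whose special area is the final sizeof(sequence_magic) bytes; the model_* names and MODEL_PAGE_SIZE are hypothetical, and a real backend locates the special area through the page header rather than a hard-coded offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8192      /* assumed page size for the sketch */
#define MODEL_SEQ_MAGIC 0x1717    /* same value as SEQ_MAGIC above */

typedef struct model_sequence_magic
{
    uint32_t magic;
} model_sequence_magic;

/* Assume the special area occupies the last bytes of the page. */
static size_t
model_special_offset(void)
{
    return MODEL_PAGE_SIZE - sizeof(model_sequence_magic);
}

/* Write the magic number into the page's special area. */
static void
model_stamp_page(unsigned char *page)
{
    model_sequence_magic sm = {MODEL_SEQ_MAGIC};

    assert(page != NULL);
    memcpy(page + model_special_offset(), &sm, sizeof(sm));
}

/* Returns true if the page carries the sequence magic number. */
static bool
model_page_is_sequence(const unsigned char *page)
{
    model_sequence_magic sm;

    assert(page != NULL);
    memcpy(&sm, page + model_special_offset(), sizeof(sm));
    return sm.magic == MODEL_SEQ_MAGIC;
}
```

Using memcpy rather than a pointer cast keeps the read safe regardless of the buffer's alignment.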
1997-04-02 05:51:23 +02:00

2002-05-22 23:40:55 +02:00
/*
 * We store a SeqTable item for every sequence we have touched in the current
 * session.  This is needed to hold onto nextval/currval state.  (We can't
 * rely on the relcache, since it's only, well, a cache, and may decide to
 * discard entries.)
 */
1997-09-07 07:04:48 +02:00
typedef struct SeqTableData
{
2013-11-15 11:29:38 +01:00
    Oid         relid;          /* pg_class OID of this sequence (hash key) */
Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally.
In the previous coding, we simply issued ALTER SEQUENCE RESTART commands,
which do not roll back on error. This meant that an error between
truncating and committing left the sequences out of sync with the table
contents, with potentially bad consequences as were noted in a Warning on
the TRUNCATE man page.
To fix, create a new storage file (relfilenode) for a sequence that is to
be reset due to RESTART IDENTITY. If the transaction aborts, we'll
automatically revert to the old storage file. This acts just like a
rewriting ALTER TABLE operation. A penalty is that we have to take
exclusive lock on the sequence, but since we've already got exclusive lock
on its owning table, that seems unlikely to be much of a problem.
The interaction of this with usual nontransactional behaviors of sequence
operations is a bit weird, but it's hard to see what would be completely
consistent. Our choice is to discard cached-but-unissued sequence values
both when the RESTART is executed, and at rollback if any; but to not touch
the currval() state either time.
In passing, move the sequence reset operations to happen before not after
any AFTER TRUNCATE triggers are fired. The previous ordering was not
logically sensible, but was forced by the need to minimize inconsistency
if the triggers caused an error. Transactional rollback is a much better
solution to that.
Patch by Steve Singer, rather heavily adjusted by me.
2010-11-17 22:42:18 +01:00
    Oid         filenode;       /* last seen relfilenode of this sequence */
2007-09-05 20:10:48 +02:00
    LocalTransactionId lxid;    /* xact in which we last did a seq op */
2007-10-25 20:54:03 +02:00
    bool        last_valid;     /* do we have a valid "last" value? */
2002-05-22 23:40:55 +02:00
    int64       last;           /* value last returned by nextval */
    int64       cached;         /* last value already cached for nextval */
    /* if last != cached, we have not used up all the cached values */
    int64       increment;      /* copy of sequence's increment field */
2016-12-20 18:00:00 +01:00
    /* note that increment is zero until we first do nextval_internal() */
1997-09-08 23:56:23 +02:00
} SeqTableData;
1997-04-02 05:51:23 +02:00

typedef SeqTableData *SeqTable;

2013-11-15 18:17:12 +01:00
static HTAB *seqhashtab = NULL; /* hash table for SeqTable items */
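The last, cached and increment fields let a session hand out values it has already reserved without touching the sequence page, and the filenode field lets it drop that reservation when TRUNCATE ... RESTART IDENTITY gives the sequence a new relfilenode. A minimal sketch, assuming a trimmed-down struct (ModelSeqCache, ModelOid and the model_* functions are hypothetical); it follows the rule "if last != cached, we have not used up all the cached values" and the TRUNCATE commit message's choice to discard cached values without touching currval() state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned int ModelOid;  /* stand-in for the backend's Oid type */

/* Hypothetical trimmed copy of the SeqTableData cache fields. */
typedef struct ModelSeqCache
{
    ModelOid filenode;      /* last seen relfilenode */
    bool     last_valid;    /* is "last" meaningful? */
    int64_t  last;          /* value last returned by nextval */
    int64_t  cached;        /* last value already reserved for this session */
    int64_t  increment;     /* zero until the first real fetch */
} ModelSeqCache;

/*
 * Try to satisfy nextval from the session cache.  Returns false when
 * last == cached, i.e. the caller must go to the sequence relation.
 */
static bool
model_take_cached(ModelSeqCache *elm, int64_t *result)
{
    if (elm->last == elm->cached)
        return false;
    assert(elm->last_valid && elm->increment != 0);
    elm->last += elm->increment;
    *result = elm->last;
    return true;
}

/*
 * If the sequence now has a different relfilenode, discard
 * cached-but-unissued values; currval state ("last") is left alone.
 */
static void
model_check_filenode(ModelSeqCache *elm, ModelOid current_filenode)
{
    if (elm->filenode != current_filenode)
    {
        elm->filenode = current_filenode;
        elm->cached = elm->last;
    }
}
```

Because the comparison is against whatever relfilenode is current, the same check also discards the cache when a rollback reverts the sequence to its old storage file.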
1997-04-02 05:51:23 +02:00

2005-06-07 09:08:35 +02:00
/*
 * last_used_seq is updated by nextval() to point to the last used
 * sequence.
 */
static SeqTableData *last_used_seq = NULL;
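last_used_seq is what lets lastval() answer without being told which sequence to read. A minimal sketch, assuming lastval() simply reports the last value of whichever sequence nextval() touched most recently; the model_* names are hypothetical, and the real lastval() additionally rechecks that the sequence still exists and that the caller has permission to read it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-sequence session state, reduced to what lastval needs. */
typedef struct ModelSeqEntry
{
    bool    last_valid;
    int64_t last;
} ModelSeqEntry;

/* Plays the role of last_used_seq: NULL until the first nextval. */
static ModelSeqEntry *model_last_used_seq = NULL;

/* Record a value returned by nextval on this sequence. */
static void
model_note_nextval(ModelSeqEntry *elm, int64_t value)
{
    assert(elm != NULL);
    elm->last = value;
    elm->last_valid = true;
    model_last_used_seq = elm;
}

/*
 * lastval(): the value most recently returned by nextval in this session,
 * for whichever sequence that was.  Returns false where the server would
 * report that lastval is not yet defined in this session.
 */
static bool
model_lastval(int64_t *result)
{
    if (model_last_used_seq == NULL || !model_last_used_seq->last_valid)
        return false;
    *result = model_last_used_seq->last;
    return true;
}
```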
2002-05-22 23:40:55 +02:00

2010-11-17 22:42:18 +01:00
static void fill_seq_with_data(Relation rel, HeapTuple tuple);
Fix ALTER SEQUENCE locking
In 1753b1b027035029c2a2a1649065762fafbf63f3, the pg_sequence system
catalog was introduced. This made sequence metadata changes
transactional, while the actual sequence values are still behaving
nontransactionally. This requires some refinement in how ALTER
SEQUENCE, which operates on both, locks the sequence and the catalog.
The main problems were:
- Concurrent ALTER SEQUENCE causes "tuple concurrently updated" error,
caused by updates to pg_sequence catalog.
- Sequence WAL writes and catalog updates are not protected by same
lock, which could lead to inconsistent recovery order.
- nextval() disregarding uncommitted ALTER SEQUENCE changes.
To fix, nextval() and friends now lock the sequence using
RowExclusiveLock instead of AccessShareLock. ALTER SEQUENCE locks the
sequence using ShareRowExclusiveLock. This means that nextval() and
ALTER SEQUENCE block each other, and ALTER SEQUENCE on the same sequence
blocks itself. (This was already the case previously for the OWNER TO,
RENAME, and SET SCHEMA variants.) Also, rearrange some code so that the
entire AlterSequence is protected by the lock on the sequence.
As an exception, use reduced locking for ALTER SEQUENCE ... RESTART.
Since that is basically a setval(), it does not require the full locking
of other ALTER SEQUENCE actions. So check whether we are only running a
RESTART and run with less locking if so.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Jason Petersen <jason@citusdata.com>
Reported-by: Andres Freund <andres@anarazel.de>
2017-05-10 05:35:31 +02:00
static Relation lock_and_open_sequence(SeqTable seq);
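The commit message above depends on how three table-level lock modes interact. A minimal sketch of that corner of the conflict table follows, assuming PostgreSQL's standard table-level lock conflicts restricted to four modes; the enum, array and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of table-level lock modes named in the text. */
typedef enum ModelLockMode
{
    M_ACCESS_SHARE,           /* AccessShareLock: nextval() before the fix */
    M_ROW_EXCLUSIVE,          /* RowExclusiveLock: nextval() after the fix */
    M_SHARE_ROW_EXCLUSIVE,    /* ShareRowExclusiveLock: ALTER SEQUENCE */
    M_ACCESS_EXCLUSIVE,       /* AccessExclusiveLock, for comparison */
    M_NUM_MODES
} ModelLockMode;

/* Symmetric conflict matrix restricted to the four modes above. */
static const bool model_lock_conflicts[M_NUM_MODES][M_NUM_MODES] = {
    /*                     ACC_SH ROW_EX SH_ROW_EX ACC_EX */
    /* ACCESS_SHARE   */ { false, false, false,    true },
    /* ROW_EXCLUSIVE  */ { false, false, true,     true },
    /* SHARE_ROW_EXCL */ { false, true,  true,     true },
    /* ACCESS_EXCL    */ { true,  true,  true,     true },
};

static bool
model_locks_conflict(ModelLockMode held, ModelLockMode requested)
{
    assert(held < M_NUM_MODES && requested < M_NUM_MODES);
    return model_lock_conflicts[held][requested];
}
```

Read against the commit message: concurrent nextval() calls (RowExclusiveLock against RowExclusiveLock) still run in parallel, nextval() and ALTER SEQUENCE now wait for each other, and two ALTER SEQUENCE commands on the same sequence serialize; under the old AccessShareLock, nextval() did not block ALTER SEQUENCE at all.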
2013-11-15 11:29:38 +01:00
static void create_seq_hashtable(void);
2005-10-03 01:50:16 +02:00
static void init_sequence(Oid relid, SeqTable *p_elm, Relation *p_rel);
2016-12-20 18:00:00 +01:00
static Form_pg_sequence_data read_seq_tuple(Relation rel,
                                            Buffer *buf, HeapTuple seqdatatuple);
2017-04-06 14:33:16 +02:00
static void init_params(ParseState *pstate, List *options, bool for_identity,
2017-05-17 22:31:56 +02:00
                        bool isInit,
                        Form_pg_sequence seqform,
2017-06-12 22:57:31 +02:00
                        Form_pg_sequence_data seqdataform,
                        bool *need_seq_rewrite,
                        List **owned_by);
2005-10-03 01:50:16 +02:00
static void do_setval(Oid relid, int64 next, bool iscalled);
2017-04-06 14:33:16 +02:00
static void process_owned_by(Relation seqrel, List *owned_by, bool for_identity);
2006-08-21 02:57:26 +02:00

1997-04-02 05:51:23 +02:00

/*
1999-05-25 18:15:34 +02:00
 * DefineSequence
1997-09-07 07:04:48 +02:00
 *        Creates a new sequence relation
1997-04-02 05:51:23 +02:00
 */
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
ObjectAddress
2016-09-06 18:00:00 +02:00
DefineSequence(ParseState *pstate, CreateSeqStmt *seq)
1997-04-02 05:51:23 +02:00
{
2016-12-20 18:00:00 +01:00
    FormData_pg_sequence seqform;
    FormData_pg_sequence_data seqdataform;
2017-06-12 22:57:31 +02:00
    bool        need_seq_rewrite;
2006-08-21 02:57:26 +02:00
    List       *owned_by;
1997-09-08 04:41:22 +02:00
    CreateStmt *stmt = makeNode(CreateStmt);
2002-03-22 03:56:37 +01:00
    Oid         seqoid;
2015-03-03 18:10:50 +01:00
    ObjectAddress address;
1997-09-08 04:41:22 +02:00
    Relation    rel;
    HeapTuple   tuple;
    TupleDesc   tupDesc;
    Datum       value[SEQ_COL_LASTCOL];
2008-11-02 02:45:28 +01:00
    bool        null[SEQ_COL_LASTCOL];
2016-12-20 18:00:00 +01:00
    Datum       pgs_values[Natts_pg_sequence];
    bool        pgs_nulls[Natts_pg_sequence];
1997-09-08 04:41:22 +02:00
    int         i;
1997-09-07 07:04:48 +02:00

2011-02-22 20:42:45 +01:00
    /* Unlogged sequences are not implemented -- not clear if useful. */
    if (seq->sequence->relpersistence == RELPERSISTENCE_UNLOGGED)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("unlogged sequences are not supported")));

2014-08-26 15:05:18 +02:00
    /*
     * If if_not_exists was given and a relation with the same name already
     * exists, bail out. (Note: we needn't check this when not if_not_exists,
     * because DefineRelation will complain anyway.)
     */
    if (seq->if_not_exists)
    {
        RangeVarGetAndCheckCreationNamespace(seq->sequence, NoLock, &seqoid);
        if (OidIsValid(seqoid))
        {
            ereport(NOTICE,
                    (errcode(ERRCODE_DUPLICATE_TABLE),
                     errmsg("relation \"%s\" already exists, skipping",
                            seq->sequence->relname)));
2015-03-03 18:10:50 +01:00
            return InvalidObjectAddress;
2014-08-26 15:05:18 +02:00
        }
    }

2003-11-24 17:54:07 +01:00
    /* Check and set all option values */
2017-06-12 22:57:31 +02:00
    init_params(pstate, seq->options, seq->for_identity, true,
                &seqform, &seqdataform,
                &need_seq_rewrite, &owned_by);
1997-09-07 07:04:48 +02:00

    /*
2008-05-17 01:36:05 +02:00
     * Create relation (and fill value[] and null[] for the tuple)
1997-09-07 07:04:48 +02:00
     */
    stmt->tableElts = NIL;
    for (i = SEQ_COL_FIRSTCOL; i <= SEQ_COL_LASTCOL; i++)
1997-04-02 05:51:23 +02:00
    {
2006-03-14 23:48:25 +01:00
        ColumnDef  *coldef = makeNode(ColumnDef);
2002-03-29 20:06:29 +01:00

2002-09-22 21:42:52 +02:00
        coldef->inhcount = 0;
        coldef->is_local = true;
2002-07-17 00:12:20 +02:00
        coldef->is_not_null = true;
Remove collation information from TypeName, where it does not belong.
The initial collations patch treated a COLLATE spec as part of a TypeName,
following what can only be described as brain fade on the part of the SQL
committee. It's a lot more reasonable to treat COLLATE as a syntactically
separate object, so that it can be added in only the productions where it
actually belongs, rather than needing to reject it in a boatload of places
where it doesn't belong (something the original patch mostly failed to do).
In addition this change lets us meet the spec's requirement to allow
COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly
behavior for constructs such as "foo::type COLLATE collation".
To do this, pull collation information out of TypeName and put it in
ColumnDef instead, thus reverting most of the collation-related changes in
parse_type.c's API. I made one additional structural change, which was to
use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd
nodes. This provides enough room to get rid of the "transform" wart in
AlterTableCmd too, since the ColumnDef can carry the USING expression
easily enough.
Also fix some other minor bugs that have crept in in the same areas,
like failure to copy recently-added fields of ColumnDef in copyfuncs.c.
While at it, document the formerly secret ability to specify a collation
in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and
ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about
what the default collation selection will be when COLLATE is omitted.
BTW, the three-parameter form of format_type() should go away too,
since it just contributes to the confusion in this area; but I'll do
that in a separate patch.
2011-03-10 04:38:52 +01:00
        coldef->is_from_type = false;
2017-04-28 19:52:17 +02:00
        coldef->is_from_parent = false;
2009-10-13 02:53:08 +02:00
        coldef->storage = 0;
1999-10-04 01:55:40 +02:00
        coldef->raw_default = NULL;
        coldef->cooked_default = NULL;
2011-03-10 04:38:52 +01:00
        coldef->collClause = NULL;
        coldef->collOid = InvalidOid;
2002-07-17 00:12:20 +02:00
        coldef->constraints = NIL;
Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-22 01:37:02 +01:00
        coldef->location = -1;
2002-07-17 00:12:20 +02:00

2008-11-02 02:45:28 +01:00
        null[i - 1] = false;
1997-09-07 07:04:48 +02:00

        switch (i)
        {
1997-09-08 04:41:22 +02:00
            case SEQ_COL_LASTVAL:
2011-03-10 04:38:52 +01:00
coldef->typeName = makeTypeNameFromOid(INT8OID, -1);
1997-09-08 04:41:22 +02:00
coldef->colname = "last_value";
2016-12-20 18:00:00 +01:00
value[i - 1] = Int64GetDatumFast(seqdataform.last_value);
1997-09-08 04:41:22 +02:00
break;
2000-11-30 02:47:33 +01:00
case SEQ_COL_LOG:
2011-03-10 04:38:52 +01:00
coldef->typeName = makeTypeNameFromOid(INT8OID, -1);
2000-11-30 02:47:33 +01:00
coldef->colname = "log_cnt";
Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.
More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence. (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.) The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record. However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false). Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record. A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.
To fix, get rid of the idea of logging an is_called status different from
reality. This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible. In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results. This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.
In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details. It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.
Back-patch to all supported branches.
2012-07-25 23:40:36 +02:00
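The fixed rule described in the crash-safety message above can be modeled in a few lines. Creation or reset logs the sequence's true state (is_called = false, log_cnt = 0), and nextval() writes a WAL record that claims a batch of values ahead whenever log_cnt reaches zero. Replay after a crash can therefore skip values but never hand one out twice. This is a simplified sketch, not sequence.c's code: `ToySeq`, `ToyWal`, `toy_nextval` and the batch size of 32 are assumptions for illustration, and the real nextval() also forces a new WAL record when the page has not been logged since the last checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LOG_VALS 32         /* assumed batch size for log-ahead */

/* In-memory sequence state (a toy stand-in for the sequence tuple). */
typedef struct
{
    int64_t last_value;
    int64_t log_cnt;            /* values still covered by the last WAL record */
    bool    is_called;
} ToySeq;

/* The "WAL": only the most recently logged state matters for replay. */
typedef struct
{
    ToySeq logged;
} ToyWal;

/* Create or reset: log the state exactly as it is, never a guessed one. */
static void
toy_reset(ToySeq *seq, ToyWal *wal, int64_t start)
{
    seq->last_value = start;
    seq->log_cnt = 0;
    seq->is_called = false;
    wal->logged = *seq;
}

static int64_t
toy_nextval(ToySeq *seq, ToyWal *wal)
{
    int64_t next = seq->is_called ? seq->last_value + 1 : seq->last_value;

    assert(seq->log_cnt >= 0);

    if (seq->log_cnt == 0)
    {
        /* Log ahead: replay will resume past the whole batch. */
        ToySeq ahead = {next + TOY_LOG_VALS, 0, true};

        wal->logged = ahead;
        seq->log_cnt = TOY_LOG_VALS;
    }
    else
        seq->log_cnt--;         /* this value is already covered by WAL */

    seq->last_value = next;
    seq->is_called = true;
    return next;
}

/* Crash and replay: in-memory state is replaced by the last logged state. */
static void
toy_crash(ToySeq *seq, const ToyWal *wal)
{
    *seq = wal->logged;
}
```

After toy_reset(&s, &w, 1), the first toy_nextval() returns 1 and logs last_value = 33 with is_called = true. A toy_crash() at that point makes the next call return 34 rather than 1 again; the skipped values are the usual, harmless gap after a crash.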
value[i - 1] = Int64GetDatum((int64) 0);
2000-11-30 02:47:33 +01:00
break;
1997-09-08 04:41:22 +02:00
case SEQ_COL_CALLED:
2011-03-10 04:38:52 +01:00
coldef->typeName = makeTypeNameFromOid(BOOLOID, -1);
1997-09-08 04:41:22 +02:00
coldef->colname = "is_called";
2001-08-16 22:38:56 +02:00
value[i - 1] = BoolGetDatum(false);
1997-09-08 04:41:22 +02:00
break;
1997-09-07 07:04:48 +02:00
}
stmt->tableElts = lappend(stmt->tableElts, coldef);
}
2002-03-21 17:02:16 +01:00
stmt->relation = seq->sequence;
stmt->inhRelations = NIL;
1997-09-07 07:04:48 +02:00
stmt->constraints = NIL;
Clean up the mess around EXPLAIN and materialized views.
Revert the matview-related changes in explain.c's API, as per recent
complaint from Robert Haas. The reason for these appears to have been
principally some ill-considered choices around having intorel_startup do
what ought to be parse-time checking, plus a poor arrangement for passing
it the view parsetree it needs to store into pg_rewrite when creating a
materialized view. Do the latter by having parse analysis stick a copy
into the IntoClause, instead of doing it at runtime. (On the whole,
I seriously question the choice to represent CREATE MATERIALIZED VIEW as a
variant of SELECT INTO/CREATE TABLE AS, because that means injecting even
more complexity into what was already a horrid legacy kluge. However,
I didn't go so far as to rethink that choice ... yet.)
I also moved several error checks into matview parse analysis, and
made the check for external Params in a matview more accurate.
In passing, clean things up a bit more around interpretOidsOption(),
and fix things so that we can use that to force no-oids for views,
sequences, etc, thereby eliminating the need to cons up "oids = false"
options when creating them.
catversion bump due to change in IntoClause. (I wonder though if we
really need readfuncs/outfuncs support for IntoClause anymore.)
2013-04-13 01:25:20 +02:00
stmt->options = NIL;
2002-11-11 23:19:25 +01:00
stmt->oncommit = ONCOMMIT_NOOP;
2004-07-12 07:38:11 +02:00
stmt->tablespacename = NULL;
2014-08-26 15:05:18 +02:00
stmt->if_not_exists = seq->if_not_exists;
1997-09-07 07:04:48 +02:00
Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
2016-12-07 19:17:43 +01:00
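Routing a row to a range partition amounts to finding the partition whose bounds contain its key. PostgreSQL range bounds are inclusive at the lower end and exclusive at the upper end. The sketch below assumes a single int partition key and a bound list sorted by lower bound with no overlaps, and uses a binary search as one natural way to locate the candidate. `RangeBound` and `route_range` are hypothetical names; the real implementation also handles multi-column keys and expressions.

```c
#include <assert.h>

/* Bounds of one range partition: lower is inclusive, upper exclusive. */
typedef struct
{
    int lower;
    int upper;
} RangeBound;

/*
 * Return the index of the partition whose range contains 'key', or -1 if
 * the key falls in a gap.  'parts' must be sorted by lower bound and the
 * ranges must not overlap.
 */
static int
route_range(const RangeBound *parts, int nparts, int key)
{
    int lo = 0;
    int hi = nparts - 1;
    int found = -1;

    assert(nparts >= 0);

    /* Find the last partition whose lower bound is <= key. */
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (parts[mid].lower <= key)
        {
            found = mid;
            lo = mid + 1;
        }
        else
            hi = mid - 1;
    }

    /* That partition holds the key only if key is below its upper bound. */
    if (found >= 0 && key < parts[found].upper)
        return found;
    return -1;
}
```

With partitions [0, 10), [10, 20) and [30, 40), key 10 routes to index 1, while keys 25 and 40 return -1, the case where PostgreSQL reports that no partition was found for the row.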
address = DefineRelation(stmt, RELKIND_SEQUENCE, seq->ownerId, NULL, NULL);
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
seqoid = address.objectId;
2010-07-26 01:21:22 +02:00
Assert(seqoid != InvalidOid);
1997-09-07 07:04:48 +02:00
2002-03-22 03:56:37 +01:00
rel = heap_open(seqoid, AccessExclusiveLock);
1998-09-01 05:29:17 +02:00
tupDesc = RelationGetDescr(rel);
1997-09-07 07:04:48 +02:00
Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally.
In the previous coding, we simply issued ALTER SEQUENCE RESTART commands,
which do not roll back on error. This meant that an error between
truncating and committing left the sequences out of sync with the table
contents, with potentially bad consequences as were noted in a Warning on
the TRUNCATE man page.
To fix, create a new storage file (relfilenode) for a sequence that is to
be reset due to RESTART IDENTITY. If the transaction aborts, we'll
automatically revert to the old storage file. This acts just like a
rewriting ALTER TABLE operation. A penalty is that we have to take
exclusive lock on the sequence, but since we've already got exclusive lock
on its owning table, that seems unlikely to be much of a problem.
The interaction of this with usual nontransactional behaviors of sequence
operations is a bit weird, but it's hard to see what would be completely
consistent. Our choice is to discard cached-but-unissued sequence values
both when the RESTART is executed, and at rollback if any; but to not touch
the currval() state either time.
In passing, move the sequence reset operations to happen before not after
any AFTER TRUNCATE triggers are fired. The previous ordering was not
logically sensible, but was forced by the need to minimize inconsistency
if the triggers caused an error. Transactional rollback is a much better
solution to that.
Patch by Steve Singer, rather heavily adjusted by me.
2010-11-17 22:42:18 +01:00
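The transactional RESTART described above works by pointing the sequence at fresh storage and keeping the old storage until the transaction ends, so an abort simply switches back. The toy model below illustrates the idea with two in-memory slots standing in for relfilenodes. `ToySeqRel`, `toy_restart` and `toy_end_xact` are hypothetical; real PostgreSQL creates a new relfilenode and schedules the obsolete one for deletion at commit or abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Contents of one storage "file" for the toy sequence. */
typedef struct
{
    int64_t last_value;
    bool    is_called;
} ToyStorage;

typedef struct
{
    ToyStorage files[2];    /* two slots standing in for relfilenodes */
    int        current;     /* slot the sequence currently uses */
    int        old;         /* slot to revert to on abort, or -1 */
} ToySeqRel;

/* RESTART inside a transaction: write the reset state to new storage. */
static void
toy_restart(ToySeqRel *rel, int64_t startv)
{
    assert(rel->current == 0 || rel->current == 1);

    if (rel->old < 0)
    {
        /* First restart in this transaction: switch to the other slot. */
        rel->old = rel->current;
        rel->current = 1 - rel->current;
    }
    /* else: the current slot is already uncommitted, so overwrite it */

    rel->files[rel->current].last_value = startv;
    rel->files[rel->current].is_called = false;
}

/* End of transaction: keep the new storage on commit, revert on abort. */
static void
toy_end_xact(ToySeqRel *rel, bool commit)
{
    if (rel->old < 0)
        return;                 /* nothing pending */
    if (!commit)
        rel->current = rel->old;
    rel->old = -1;              /* the discarded slot is simply forgotten */
}
```

In this toy, a second RESTART in the same transaction overwrites the uncommitted slot, so an abort still falls back to the state from before the transaction began.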
/* now initialize the sequence's data */
tuple = heap_form_tuple(tupDesc, value, null);
fill_seq_with_data(rel, tuple);

/* process OWNED BY if given */
if (owned_by)
2017-04-06 14:33:16 +02:00
process_owned_by(rel, owned_by, seq->for_identity);
2010-11-17 22:42:18 +01:00
heap_close(rel, NoLock);
2012-12-24 00:25:03 +01:00
2016-12-20 18:00:00 +01:00
/* fill in pg_sequence */
rel = heap_open(SequenceRelationId, RowExclusiveLock);
tupDesc = RelationGetDescr(rel);

memset(pgs_nulls, 0, sizeof(pgs_nulls));

pgs_values[Anum_pg_sequence_seqrelid - 1] = ObjectIdGetDatum(seqoid);
2017-02-10 21:12:32 +01:00
pgs_values[Anum_pg_sequence_seqtypid - 1] = ObjectIdGetDatum(seqform.seqtypid);
2016-12-20 18:00:00 +01:00
pgs_values[Anum_pg_sequence_seqstart - 1] = Int64GetDatumFast(seqform.seqstart);
pgs_values[Anum_pg_sequence_seqincrement - 1] = Int64GetDatumFast(seqform.seqincrement);
pgs_values[Anum_pg_sequence_seqmax - 1] = Int64GetDatumFast(seqform.seqmax);
pgs_values[Anum_pg_sequence_seqmin - 1] = Int64GetDatumFast(seqform.seqmin);
pgs_values[Anum_pg_sequence_seqcache - 1] = Int64GetDatumFast(seqform.seqcache);
2017-02-10 21:12:32 +01:00
pgs_values[Anum_pg_sequence_seqcycle - 1] = BoolGetDatum(seqform.seqcycle);
2016-12-20 18:00:00 +01:00
tuple = heap_form_tuple(tupDesc, pgs_values, pgs_nulls);
2017-01-31 22:42:24 +01:00
CatalogTupleInsert(rel, tuple);
2016-12-20 18:00:00 +01:00
heap_freetuple(tuple);
heap_close(rel, RowExclusiveLock);
2015-03-03 18:10:50 +01:00
return address;
2010-11-17 22:42:18 +01:00
}

/*
* Reset a sequence to its initial value.
*
* The change is made transactionally, so that on failure of the current
* transaction, the sequence will be restored to its previous state.
* We do that by creating a whole new relfilenode for the sequence; so this
* works much like the rewriting forms of ALTER TABLE.
*
* Caller is assumed to have acquired AccessExclusiveLock on the sequence,
* which must not be released until end of transaction. Caller is also
* responsible for permissions checking.
*/
void
ResetSequence(Oid seq_relid)
{
Relation seq_rel;
SeqTable elm;
2016-12-20 18:00:00 +01:00
Form_pg_sequence_data seq;
2010-11-17 22:42:18 +01:00
Buffer buf;
2016-12-20 18:00:00 +01:00
HeapTupleData seqdatatuple;
2010-11-17 22:42:18 +01:00
HeapTuple tuple;
2016-12-20 18:00:00 +01:00
HeapTuple pgstuple;
Form_pg_sequence pgsform;
int64 startv;
2010-11-17 22:42:18 +01:00
/*
* Read the old sequence. This does a bit more work than really
* necessary, but it's simple, and we do want to double-check that it's
* indeed a sequence.
*/
init_sequence(seq_relid, &elm, &seq_rel);
2016-12-20 18:00:00 +01:00
(void) read_seq_tuple(seq_rel, &buf, &seqdatatuple);

pgstuple = SearchSysCache1(SEQRELID, ObjectIdGetDatum(seq_relid));
if (!HeapTupleIsValid(pgstuple))
elog(ERROR, "cache lookup failed for sequence %u", seq_relid);
pgsform = (Form_pg_sequence) GETSTRUCT(pgstuple);
startv = pgsform->seqstart;
ReleaseSysCache(pgstuple);
2010-11-17 22:42:18 +01:00
/*
* Copy the existing sequence tuple.
*/
2016-12-20 18:00:00 +01:00
tuple = heap_copytuple(&seqdatatuple);
2010-11-17 22:42:18 +01:00
/* Now we're done with the old page */
UnlockReleaseBuffer(buf);

/*
* Modify the copied tuple to execute the restart (compare the RESTART
* action in AlterSequence)
*/
2016-12-20 18:00:00 +01:00
seq = (Form_pg_sequence_data) GETSTRUCT(tuple);
seq->last_value = startv;
2010-11-17 22:42:18 +01:00
seq->is_called = false;
	seq->log_cnt = 0;

	/*
	 * Create a new storage file for the sequence.  We want to keep the
	 * sequence's relfrozenxid at 0, since it won't contain any unfrozen XIDs.
	 * Same with relminmxid, since a sequence will never contain multixacts.
	 */
	RelationSetNewRelfilenode(seq_rel, seq_rel->rd_rel->relpersistence,
							  InvalidTransactionId, InvalidMultiXactId);

	/*
	 * Insert the modified tuple into the new storage file.
	 */
	fill_seq_with_data(seq_rel, tuple);

	/* Clear local cache so that we don't think we have cached numbers */
	/* Note that we do not change the currval() state */
	elm->cached = elm->last;

	relation_close(seq_rel, NoLock);
}

/*
 * Initialize a sequence's relation with the specified tuple as content
 */
static void
fill_seq_with_data(Relation rel, HeapTuple tuple)
{
	Buffer		buf;
	Page		page;
	sequence_magic *sm;
	OffsetNumber offnum;

	/* Initialize first page of relation with special magic number */

	buf = ReadBuffer(rel, P_NEW);
	Assert(BufferGetBlockNumber(buf) == 0);

	page = BufferGetPage(buf);

	PageInit(page, BufferGetPageSize(buf), sizeof(sequence_magic));
	sm = (sequence_magic *) PageGetSpecialPointer(page);
	sm->magic = SEQ_MAGIC;

	/* Now insert sequence tuple */

	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

	/*
	 * Since VACUUM does not process sequences, we have to force the tuple to
	 * have xmin = FrozenTransactionId now.  Otherwise it would become
	 * invisible to SELECTs after 2G transactions.  It is okay to do this
	 * because if the current transaction aborts, no other xact will ever
	 * examine the sequence tuple anyway.
	 */
	HeapTupleHeaderSetXmin(tuple->t_data, FrozenTransactionId);
	HeapTupleHeaderSetXminFrozen(tuple->t_data);
	HeapTupleHeaderSetCmin(tuple->t_data, FirstCommandId);
	HeapTupleHeaderSetXmax(tuple->t_data, InvalidTransactionId);
	tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
	ItemPointerSet(&tuple->t_data->t_ctid, 0, FirstOffsetNumber);

	/* check the comment above nextval_internal()'s equivalent call. */
	if (RelationNeedsWAL(rel))
		GetTopTransactionId();

	START_CRIT_SECTION();

	MarkBufferDirty(buf);

	offnum = PageAddItem(page, (Item) tuple->t_data, tuple->t_len,
						 InvalidOffsetNumber, false, false);
	if (offnum != FirstOffsetNumber)
		elog(ERROR, "failed to add sequence tuple to page");

	/* XLOG stuff */
	if (RelationNeedsWAL(rel))
	{
		xl_seq_rec	xlrec;
		XLogRecPtr	recptr;

		XLogBeginInsert();
		XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);

		xlrec.node = rel->rd_node;

		XLogRegisterData((char *) &xlrec, sizeof(xl_seq_rec));
		XLogRegisterData((char *) tuple->t_data, tuple->t_len);

		recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);

		PageSetLSN(page, recptr);
	}

	END_CRIT_SECTION();

	UnlockReleaseBuffer(buf);
}

/*
 * AlterSequence
 *
 * Modify the definition of a sequence relation
 */
ObjectAddress
AlterSequence(ParseState *pstate, AlterSeqStmt *stmt)
{
	Oid			relid;
	SeqTable	elm;
	Relation	seqrel;
	Buffer		buf;
	HeapTupleData datatuple;
	Form_pg_sequence seqform;
	Form_pg_sequence_data newdataform;
	bool		need_seq_rewrite;
	List	   *owned_by;
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
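The `ObjectAddress` return convention can be sketched as a simplified standalone mirror, written for illustration only. The field names and the shape of `ObjectAddressSet()` follow PostgreSQL's `catalog/objectaddress.h` as best understood. `1259` is assumed to be pg_class's OID (`RelationRelationId`). `toy_alter_sequence()` is a hypothetical stand-in for a routine such as `AlterSequence()`, which returns `InvalidObjectAddress` when the target does not exist and `missing_ok` is set.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;                      /* simplified stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)
#define RelationRelationId ((Oid) 1259)    /* assumed: OID of pg_class */

typedef struct ObjectAddress
{
    Oid     classId;      /* catalog the object lives in, e.g. pg_class */
    Oid     objectId;     /* OID of the object within that catalog */
    int32_t objectSubId;  /* column number, or 0 for the whole object */
} ObjectAddress;

#define ObjectAddressSubSet(addr, class_id, object_id, object_sub_id) \
    do { \
        (addr).classId = (class_id); \
        (addr).objectId = (object_id); \
        (addr).objectSubId = (object_sub_id); \
    } while (0)

#define ObjectAddressSet(addr, class_id, object_id) \
    ObjectAddressSubSet(addr, class_id, object_id, 0)

static const ObjectAddress InvalidObjectAddress = { InvalidOid, InvalidOid, 0 };

/* Hypothetical command routine: an unresolved name arrives as InvalidOid */
static ObjectAddress
toy_alter_sequence(Oid relid)
{
    ObjectAddress address;

    if (relid == InvalidOid)
        return InvalidObjectAddress;   /* the "does not exist, skipping" path */

    /* ... the actual ALTER work would happen here ... */
    ObjectAddressSet(address, RelationRelationId, relid);
    return address;
}
```

Returning the full address instead of a bare OID lets callers such as event-trigger code tell which catalog the object belongs to without a second lookup.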
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddress address;
|
2016-12-20 18:00:00 +01:00
|
|
|
Relation rel;
|
Make ALTER SEQUENCE, including RESTART, fully transactional.
2017-06-01 01:39:27 +02:00
|
|
|
HeapTuple seqtuple;
|
|
|
|
HeapTuple newdatatuple;
|
2003-03-20 08:02:11 +01:00
|
|
|
|
2017-06-13 20:58:17 +02:00
|
|
|
/* Open and lock sequence, and check for ownership along the way. */
|
|
|
|
relid = RangeVarGetRelidExtended(stmt->sequence,
|
|
|
|
ShareRowExclusiveLock,
|
|
|
|
stmt->missing_ok,
|
|
|
|
false,
|
|
|
|
RangeVarCallbackOwnsRelation,
|
|
|
|
NULL);
|
2012-01-24 00:25:04 +01:00
|
|
|
if (relid == InvalidOid)
|
|
|
|
{
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errmsg("relation \"%s\" does not exist, skipping",
|
2012-06-10 21:20:04 +02:00
|
|
|
stmt->sequence->relname)));
|
Change many routines to return ObjectAddress rather than OID
2015-03-03 18:10:50 +01:00
|
|
|
return InvalidObjectAddress;
|
2012-01-24 00:25:04 +01:00
|
|
|
}
|
|
|
|
|
2005-10-03 01:50:16 +02:00
|
|
|
init_sequence(relid, &elm, &seqrel);
|
2003-03-20 08:02:11 +01:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
rel = heap_open(SequenceRelationId, RowExclusiveLock);
|
Make ALTER SEQUENCE, including RESTART, fully transactional.
2017-06-01 01:39:27 +02:00
|
|
|
seqtuple = SearchSysCacheCopy1(SEQRELID,
|
|
|
|
ObjectIdGetDatum(relid));
|
|
|
|
if (!HeapTupleIsValid(seqtuple))
|
2016-12-20 18:00:00 +01:00
|
|
|
elog(ERROR, "cache lookup failed for sequence %u",
|
|
|
|
relid);
|
|
|
|
|
Make ALTER SEQUENCE, including RESTART, fully transactional.
2017-06-01 01:39:27 +02:00
|
|
|
seqform = (Form_pg_sequence) GETSTRUCT(seqtuple);
|
2008-05-17 03:20:39 +02:00
|
|
|
|
Fix ALTER SEQUENCE locking
In 1753b1b027035029c2a2a1649065762fafbf63f3, the pg_sequence system
catalog was introduced. This made sequence metadata changes
transactional, while the actual sequence values are still behaving
nontransactionally. This requires some refinement in how ALTER
SEQUENCE, which operates on both, locks the sequence and the catalog.
The main problems were:
- Concurrent ALTER SEQUENCE causes "tuple concurrently updated" error,
caused by updates to pg_sequence catalog.
- Sequence WAL writes and catalog updates are not protected by same
lock, which could lead to inconsistent recovery order.
- nextval() disregarding uncommitted ALTER SEQUENCE changes.
To fix, nextval() and friends now lock the sequence using
RowExclusiveLock instead of AccessShareLock. ALTER SEQUENCE locks the
sequence using ShareRowExclusiveLock. This means that nextval() and
ALTER SEQUENCE block each other, and ALTER SEQUENCE on the same sequence
blocks itself. (This was already the case previously for the OWNER TO,
RENAME, and SET SCHEMA variants.) Also, rearrange some code so that the
entire AlterSequence is protected by the lock on the sequence.
As an exception, use reduced locking for ALTER SEQUENCE ... RESTART.
Since that is basically a setval(), it does not require the full locking
of other ALTER SEQUENCE actions. So check whether we are only running a
RESTART and run with less locking if so.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Jason Petersen <jason@citusdata.com>
Reported-by: Andres Freund <andres@anarazel.de>
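The conflict rules this fix relies on can be checked against a small table. The sketch below is a standalone mirror of PostgreSQL's documented table-level lock conflict matrix, using the same mode names and a bitmask style similar to the lock manager's `LOCKBIT()`. The enum and `lock_modes_conflict()` are illustrative, not the server's API. It confirms the claims in the message: `nextval()`'s RowExclusiveLock conflicts with ALTER SEQUENCE's ShareRowExclusiveLock, ShareRowExclusiveLock conflicts with itself, concurrent `nextval()` calls do not block each other, and the old AccessShareLock did not conflict with ShareRowExclusiveLock.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbering mirrors PostgreSQL's ordering, NoLock = 0 .. AccessExclusiveLock = 8 */
typedef enum ToyLockMode
{
    NoLock = 0,
    AccessShareLock,
    RowShareLock,
    RowExclusiveLock,
    ShareUpdateExclusiveLock,
    ShareLock,
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock,
    NUM_TOY_LOCK_MODES
} ToyLockMode;

#define LOCKBIT(mode) (1u << (mode))

/* conflict_table[m] = set of modes that conflict with mode m */
static const unsigned int conflict_table[NUM_TOY_LOCK_MODES] = {
    [NoLock] = 0,
    [AccessShareLock] = LOCKBIT(AccessExclusiveLock),
    [RowShareLock] = LOCKBIT(ExclusiveLock) | LOCKBIT(AccessExclusiveLock),
    [RowExclusiveLock] = LOCKBIT(ShareLock) | LOCKBIT(ShareRowExclusiveLock) |
        LOCKBIT(ExclusiveLock) | LOCKBIT(AccessExclusiveLock),
    [ShareUpdateExclusiveLock] = LOCKBIT(ShareUpdateExclusiveLock) | LOCKBIT(ShareLock) |
        LOCKBIT(ShareRowExclusiveLock) | LOCKBIT(ExclusiveLock) | LOCKBIT(AccessExclusiveLock),
    [ShareLock] = LOCKBIT(RowExclusiveLock) | LOCKBIT(ShareUpdateExclusiveLock) |
        LOCKBIT(ShareRowExclusiveLock) | LOCKBIT(ExclusiveLock) | LOCKBIT(AccessExclusiveLock),
    [ShareRowExclusiveLock] = LOCKBIT(RowExclusiveLock) | LOCKBIT(ShareUpdateExclusiveLock) |
        LOCKBIT(ShareLock) | LOCKBIT(ShareRowExclusiveLock) | LOCKBIT(ExclusiveLock) |
        LOCKBIT(AccessExclusiveLock),
    [ExclusiveLock] = LOCKBIT(RowShareLock) | LOCKBIT(RowExclusiveLock) |
        LOCKBIT(ShareUpdateExclusiveLock) | LOCKBIT(ShareLock) | LOCKBIT(ShareRowExclusiveLock) |
        LOCKBIT(ExclusiveLock) | LOCKBIT(AccessExclusiveLock),
    [AccessExclusiveLock] = LOCKBIT(AccessShareLock) | LOCKBIT(RowShareLock) |
        LOCKBIT(RowExclusiveLock) | LOCKBIT(ShareUpdateExclusiveLock) | LOCKBIT(ShareLock) |
        LOCKBIT(ShareRowExclusiveLock) | LOCKBIT(ExclusiveLock) | LOCKBIT(AccessExclusiveLock),
};

/* True if a lock in mode `requested` must wait for one already held in `held` */
static bool
lock_modes_conflict(ToyLockMode held, ToyLockMode requested)
{
    assert(held < NUM_TOY_LOCK_MODES && requested < NUM_TOY_LOCK_MODES);
    return (conflict_table[requested] & LOCKBIT(held)) != 0;
}
```

The matrix is symmetric, so the argument order does not matter; the test below checks that property as well.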
2017-05-10 05:35:31 +02:00
|
|
|
/* lock page's buffer and read tuple into new sequence structure */
|
Make ALTER SEQUENCE, including RESTART, fully transactional.
2017-06-01 01:39:27 +02:00
|
|
|
(void) read_seq_tuple(seqrel, &buf, &datatuple);
|
Fix ALTER SEQUENCE locking
2017-05-10 05:35:31 +02:00
|
|
|
|
Make ALTER SEQUENCE, including RESTART, fully transactional.
2017-06-01 01:39:27 +02:00
|
|
|
/* copy the existing sequence data tuple, so it can be modified locally */
|
|
|
|
newdatatuple = heap_copytuple(&datatuple);
|
|
|
|
newdataform = (Form_pg_sequence_data) GETSTRUCT(newdatatuple);
|
|
|
|
|
|
|
|
UnlockReleaseBuffer(buf);
|
Fix ALTER SEQUENCE locking
2017-05-10 05:35:31 +02:00
|
|
|
|
2008-05-17 03:20:39 +02:00
|
|
|
/* Check and set new values */
|
2017-06-12 22:57:31 +02:00
|
|
|
init_params(pstate, stmt->options, stmt->for_identity, false,
|
|
|
|
seqform, newdataform,
|
|
|
|
&need_seq_rewrite, &owned_by);
|
2003-03-20 08:02:11 +01:00
|
|
|
|
2007-10-25 20:54:03 +02:00
|
|
|
/* Clear local cache so that we don't think we have cached numbers */
|
|
|
|
/* Note that we do not change the currval() state */
|
|
|
|
elm->cached = elm->last;
|
|
|
|
|
2017-06-12 22:57:31 +02:00
|
|
|
/* If needed, rewrite the sequence relation itself */
|
|
|
|
if (need_seq_rewrite)
|
|
|
|
{
|
|
|
|
/* check the comment above nextval_internal()'s equivalent call. */
|
|
|
|
if (RelationNeedsWAL(seqrel))
|
|
|
|
GetTopTransactionId();
|
Reconsider when to wait for WAL flushes/syncrep during commit.
Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have a xid assigned. The primary
reason for that is that a sequence's nextval() did not assign a xid, but
its effects are worthwhile to wait for on commit.
This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.
This led to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...
This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. The fix here
doesn't properly solve that issue; it merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.
To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records. If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.
The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.
Discussion: 20150223165359.GF30784@awork2.anarazel.de,
369698E947874884A77849D8FE3680C2@maumau,
5CF4ABBA67674088B3941894E22A0D25@maumau
Per complaint from maumau and Thom Brown
Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to be consistent behavior across all maintained branches.
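The commit-time rule described above reduces to a small predicate. This is a deliberately simplified sketch: `ToyXactState` and `must_wait_for_commit_durability()` are hypothetical, and the real `RecordTransactionCommit()` weighs further conditions (for example forced synchronous commits) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a transaction did before it commits */
typedef struct ToyXactState
{
    bool has_xid;     /* was a top-level transaction ID assigned? */
    bool wrote_wal;   /* did the transaction emit any WAL records? */
} ToyXactState;

/*
 * Wait for WAL flush / synchronous replication only when the transaction
 * both has an xid and wrote WAL.  A read-only transaction that only wrote
 * HOT prune records (WAL, no xid) and one that only touched temp or
 * unlogged tables (xid, no WAL) both commit without waiting.
 */
static bool
must_wait_for_commit_durability(ToyXactState xact, bool synchronous_commit_on)
{
    if (!synchronous_commit_on)
        return false;
    return xact.has_xid && xact.wrote_wal;
}
```

Because `nextval()` now acquires a top-level xid, a transaction whose `nextval()` call emitted a sequence WAL record falls into the "has xid and wrote WAL" case and still waits, which is the behavior the message wants to keep.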
2015-02-26 12:50:07 +01:00
|
|
|
|
2017-06-12 22:57:31 +02:00
|
|
|
/*
|
|
|
|
* Create a new storage file for the sequence, making the state
|
|
|
|
* changes transactional. We want to keep the sequence's relfrozenxid
|
|
|
|
* at 0, since it won't contain any unfrozen XIDs. Same with
|
|
|
|
* relminmxid, since a sequence will never contain multixacts.
|
|
|
|
*/
|
|
|
|
RelationSetNewRelfilenode(seqrel, seqrel->rd_rel->relpersistence,
|
|
|
|
InvalidTransactionId, InvalidMultiXactId);
|
2003-03-20 08:02:11 +01:00
|
|
|
|
2017-06-12 22:57:31 +02:00
|
|
|
/*
|
|
|
|
* Insert the modified tuple into the new storage file.
|
|
|
|
*/
|
|
|
|
fill_seq_with_data(seqrel, newdatatuple);
|
|
|
|
}
|
2003-03-20 08:02:11 +01:00
|
|
|
|
2006-08-21 02:57:26 +02:00
|
|
|
/* process OWNED BY if given */
|
|
|
|
if (owned_by)
|
2017-04-06 14:33:16 +02:00
|
|
|
process_owned_by(seqrel, owned_by, stmt->for_identity);
|
2006-08-21 02:57:26 +02:00
|
|
|
|
2017-06-12 22:57:31 +02:00
|
|
|
/* update the pg_sequence tuple (we could skip this in some cases...) */
|
2017-06-01 02:03:10 +02:00
|
|
|
CatalogTupleUpdate(rel, &seqtuple->t_self, seqtuple);
|
|
|
|
|
2013-03-18 03:55:14 +01:00
|
|
|
InvokeObjectPostAlterHook(RelationRelationId, relid, 0);
|
|
|
|
|
Change many routines to return ObjectAddress rather than OID
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddressSet(address, RelationRelationId, relid);
|
|
|
|
|
Make ALTER SEQUENCE, including RESTART, fully transactional.
2017-06-01 01:39:27 +02:00
|
|
|
heap_close(rel, RowExclusiveLock);
|
Fix ALTER SEQUENCE locking
2017-05-10 05:35:31 +02:00
|
|
|
relation_close(seqrel, NoLock);
|
|
|
|
|
Change many routines to return ObjectAddress rather than OID
2015-03-03 18:10:50 +01:00
|
|
|
return address;
|
2003-03-20 08:02:11 +01:00
|
|
|
}
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
void
|
|
|
|
DeleteSequenceTuple(Oid relid)
|
|
|
|
{
|
|
|
|
Relation rel;
|
|
|
|
HeapTuple tuple;
|
|
|
|
|
|
|
|
rel = heap_open(SequenceRelationId, RowExclusiveLock);
|
|
|
|
|
|
|
|
tuple = SearchSysCache1(SEQRELID, ObjectIdGetDatum(relid));
|
|
|
|
if (!HeapTupleIsValid(tuple))
|
|
|
|
elog(ERROR, "cache lookup failed for sequence %u", relid);
|
|
|
|
|
2017-02-01 22:13:30 +01:00
|
|
|
CatalogTupleDelete(rel, &tuple->t_self);
|
2016-12-20 18:00:00 +01:00
|
|
|
|
|
|
|
ReleaseSysCache(tuple);
|
|
|
|
heap_close(rel, RowExclusiveLock);
|
|
|
|
}
|
1997-04-02 05:51:23 +02:00
|
|
|
|
2005-10-03 01:50:16 +02:00
|
|
|
/*
|
|
|
|
* Note: nextval with a text argument is no longer exported as a pg_proc
|
|
|
|
* entry, but we keep it around to ease porting of C code that may have
|
|
|
|
* called the function directly.
|
|
|
|
*/
|
2000-06-11 22:08:01 +02:00
|
|
|
Datum
|
|
|
|
nextval(PG_FUNCTION_ARGS)
|
1997-04-02 05:51:23 +02:00
|
|
|
{
|
2017-03-13 00:35:34 +01:00
|
|
|
text *seqin = PG_GETARG_TEXT_PP(0);
|
2002-03-30 02:02:42 +01:00
|
|
|
RangeVar *sequence;
|
2005-10-03 01:50:16 +02:00
|
|
|
Oid relid;
|
|
|
|
|
|
|
|
sequence = makeRangeVarFromNameList(textToQualifiedNameList(seqin));
|
2011-07-09 04:19:30 +02:00
|
|
|
|
|
|
|
/*
|
2012-06-10 21:20:04 +02:00
|
|
|
* XXX: This is not safe in the presence of concurrent DDL, but acquiring
|
|
|
|
* a lock here is more expensive than letting nextval_internal do it,
|
|
|
|
* since the latter maintains a cache that keeps us from hitting the lock
|
2014-05-06 18:12:18 +02:00
|
|
|
* manager more than once per transaction. It's not clear whether the
|
2012-06-10 21:20:04 +02:00
|
|
|
* performance penalty is material in practice, but for now, we do it this
|
|
|
|
* way.
|
2011-07-09 04:19:30 +02:00
|
|
|
*/
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
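The lookup, callback, lock, recheck loop can be modeled in a few lines. Everything here is hypothetical: `ToyCatalog`, `toy_get_relid_extended()`, and a plain counter standing in for the count of processed invalidation messages. It mirrors the idea behind `RangeVarGetRelidExtended()` rather than its actual code. The callback (for example an ownership check in the spirit of `RangeVarCallbackOwnsRelation`) runs before the lock is taken, and runs again whenever concurrent DDL forces a retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;
#define InvalidOid 0u

/* Hypothetical toy catalog with a single name -> OID binding */
typedef struct ToyCatalog
{
    const char   *name;
    Oid           oid;
    unsigned long inval_counter;   /* stands in for processed invalidations */
    int           callbacks_run;   /* how often the check callback ran */
    int           renames_pending; /* concurrent DDL that lands while we wait */
} ToyCatalog;

typedef void (*ToyCallback) (ToyCatalog *cat, Oid relid);

static Oid
toy_lookup(const ToyCatalog *cat, const char *name)
{
    assert(name != NULL);
    return strcmp(cat->name, name) == 0 ? cat->oid : InvalidOid;
}

static void
toy_check_owner(ToyCatalog *cat, Oid relid)
{
    (void) relid;
    cat->callbacks_run++;          /* a real callback would check privileges */
}

/* Waiting for a lock is where concurrent DDL can commit: model that as the
 * name being rebound to another relation plus an invalidation. */
static void
toy_lock(ToyCatalog *cat, Oid relid)
{
    (void) relid;
    if (cat->renames_pending > 0)
    {
        cat->renames_pending--;
        cat->oid += 1;
        cat->inval_counter++;
    }
}

static void
toy_unlock(ToyCatalog *cat, Oid relid)
{
    (void) cat;
    (void) relid;
}

static Oid
toy_get_relid_extended(ToyCatalog *cat, const char *name, ToyCallback cb)
{
    Oid  relid;
    Oid  old_relid = InvalidOid;
    bool retry = false;

    for (;;)
    {
        unsigned long seen = cat->inval_counter;

        relid = toy_lookup(cat, name);
        if (cb)
            cb(cat, relid);        /* checks happen before locking */
        if (retry && relid == old_relid)
            break;                 /* same relation, already locked */
        if (old_relid != InvalidOid)
            toy_unlock(cat, old_relid);
        if (relid == InvalidOid)
            break;
        toy_lock(cat, relid);
        if (seen == cat->inval_counter)
            break;                 /* nothing changed while we waited */
        retry = true;
        old_relid = relid;
    }
    return relid;
}
```

With no concurrent DDL the callback runs once; if the name is rebound while the lock is awaited, the loop looks the name up again, re-runs the callback on the new target, and releases the stale lock.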
2011-11-30 16:12:27 +01:00
|
|
|
relid = RangeVarGetRelid(sequence, NoLock, false);
|
2005-10-03 01:50:16 +02:00
|
|
|
|
2017-04-06 14:33:16 +02:00
|
|
|
PG_RETURN_INT64(nextval_internal(relid, true));
|
2005-10-03 01:50:16 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
Datum
|
|
|
|
nextval_oid(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
Oid relid = PG_GETARG_OID(0);
|
|
|
|
|
2017-04-06 14:33:16 +02:00
|
|
|
PG_RETURN_INT64(nextval_internal(relid, true));
|
2005-10-03 01:50:16 +02:00
|
|
|
}
|
|
|
|
|
2017-04-06 14:33:16 +02:00
|
|
|
int64
|
|
|
|
nextval_internal(Oid relid, bool check_permissions)
|
2005-10-03 01:50:16 +02:00
|
|
|
{
|
1997-09-08 04:41:22 +02:00
|
|
|
SeqTable elm;
|
2002-05-22 23:40:55 +02:00
|
|
|
Relation seqrel;
|
1997-09-08 04:41:22 +02:00
|
|
|
Buffer buf;
|
2002-03-15 20:20:36 +01:00
|
|
|
Page page;
|
2016-12-20 18:00:00 +01:00
|
|
|
HeapTuple pgstuple;
|
|
|
|
Form_pg_sequence pgsform;
|
|
|
|
HeapTupleData seqdatatuple;
|
|
|
|
Form_pg_sequence_data seq;
|
2001-08-16 22:38:56 +02:00
|
|
|
int64 incby,
|
1997-09-08 04:41:22 +02:00
|
|
|
maxv,
|
|
|
|
minv,
|
2000-11-30 02:47:33 +01:00
|
|
|
cache,
|
|
|
|
log,
|
|
|
|
fetch,
|
|
|
|
last;
|
2001-08-16 22:38:56 +02:00
|
|
|
int64 result,
|
1997-09-08 04:41:22 +02:00
|
|
|
next,
|
|
|
|
rescnt = 0;
|
2016-12-20 18:00:00 +01:00
|
|
|
bool cycle;
|
2000-11-30 02:47:33 +01:00
|
|
|
bool logit = false;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
Fix ALTER SEQUENCE locking
2017-05-10 05:35:31 +02:00
|
|
|
/* open and lock sequence */
|
2005-10-03 01:50:16 +02:00
|
|
|
init_sequence(relid, &elm, &seqrel);
|
2000-06-11 22:08:01 +02:00
|
|
|
|
2017-04-06 14:33:16 +02:00
|
|
|
if (check_permissions &&
|
|
|
|
pg_class_aclcheck(elm->relid, GetUserId(),
|
2014-10-23 03:41:43 +02:00
|
|
|
ACL_USAGE | ACL_UPDATE) != ACLCHECK_OK)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
2003-08-01 02:15:26 +02:00
|
|
|
errmsg("permission denied for sequence %s",
|
2005-10-03 01:50:16 +02:00
|
|
|
RelationGetRelationName(seqrel))));
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2010-02-20 22:24:02 +01:00
|
|
|
/* read-only transactions may only modify temp sequences */
|
2012-12-18 02:15:32 +01:00
|
|
|
if (!seqrel->rd_islocaltemp)
|
2010-02-20 22:24:02 +01:00
|
|
|
PreventCommandIfReadOnly("nextval()");
|
|
|
|
|
Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things. First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers. Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes. Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.
Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
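Why `nextval()` is refused in parallel mode can be seen with a toy version of the backend-local cache. In this hypothetical sketch, `ToyBackendCache` loosely mirrors the `last`/`cached` fields of a `SeqTable` entry, and increments are fixed at 1. Copying a cache into a "worker" is an assumption made only to show what naive state sharing would do: two backends serving from the same cached range hand out the same value. Resetting `cached` to `last`, as ALTER SEQUENCE does with `elm->cached = elm->last`, forces the next call to reserve a fresh range from shared state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared sequence state (the on-disk last_value) */
typedef struct ToySharedSeq
{
    int64_t last_value;
} ToySharedSeq;

/* Hypothetical per-backend cache: values (last, cached] are reserved locally */
typedef struct ToyBackendCache
{
    int64_t last;     /* last value handed out by this backend */
    int64_t cached;   /* highest value reserved for this backend */
} ToyBackendCache;

static int64_t
toy_nextval_cached(ToySharedSeq *shared, ToyBackendCache *local, int64_t cache_size)
{
    assert(cache_size > 0);

    /* Served from the local cache: no access to shared state at all */
    if (local->last < local->cached)
        return ++local->last;

    /* Cache exhausted: reserve a fresh block of values in shared state */
    local->last = shared->last_value + 1;
    local->cached = shared->last_value + cache_size;
    shared->last_value = local->cached;
    return local->last;
}
```

With a cache size of 10, a leader that has already returned 1 and 2 holds the range up to 10; a worker started from a copy of that cache would return 3, and so would the leader's next call.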
2015-04-30 21:02:14 +02:00
|
|
|
/*
|
2015-05-24 03:35:49 +02:00
|
|
|
* Forbid this during parallel operation because, to make it work, the
|
|
|
|
* cooperating backends would need to share the backend-local cached
|
Create an infrastructure for parallel computation in PostgreSQL.
2015-04-30 21:02:14 +02:00
|
|
|
* sequence information. Currently, we don't support that.
|
|
|
|
*/
|
|
|
|
PreventCommandIfParallelMode("nextval()");
|
|
|
|
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
|
|
|
if (elm->last != elm->cached) /* some numbers were cached */
|
1997-09-07 07:04:48 +02:00
|
|
|
{
|
2007-10-25 20:54:03 +02:00
|
|
|
Assert(elm->last_valid);
|
|
|
|
Assert(elm->increment != 0);
|
1997-09-07 07:04:48 +02:00
|
|
|
elm->last += elm->increment;
|
2002-05-22 23:40:55 +02:00
|
|
|
relation_close(seqrel, NoLock);
|
2007-10-25 20:54:03 +02:00
|
|
|
last_used_seq = elm;
|
2005-10-03 01:50:16 +02:00
|
|
|
return elm->last;
|
1997-04-02 05:51:23 +02:00
|
|
|
}
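The early return above serves `nextval()` from the backend-local cache whenever values reserved by an earlier call are still unused. It does this without reading or modifying the sequence's buffer. The following is a minimal sketch of that fast path. It uses a hypothetical `SeqCacheEntry` struct whose fields mirror `last`, `cached`, `increment` and `last_valid` in the code; the real cache entry holds more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend-local cache entry ("elm"). */
typedef struct SeqCacheEntry
{
    int64_t last;       /* last value returned to the caller */
    int64_t cached;     /* last value already reserved from the sequence */
    int64_t increment;  /* copy of the sequence's INCREMENT BY */
    bool    last_valid; /* true once "last" holds a returned value */
} SeqCacheEntry;

/*
 * Try to satisfy nextval() from the local cache.  Returns true and stores
 * the next value in *out when reserved-but-unused values remain; returns
 * false when the caller has to go to the sequence page instead.
 */
static bool
cache_nextval(SeqCacheEntry *elm, int64_t *out)
{
    if (elm->last == elm->cached)
        return false;           /* nothing left in the cache */

    assert(elm->last_valid);
    assert(elm->increment != 0);
    elm->last += elm->increment;
    *out = elm->last;
    return true;
}
```

Suppose `last` = 10, `cached` = 13 and the increment is 1. Three calls then return 11, 12 and 13, and the next call reports that the page must be consulted.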
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
pgstuple = SearchSysCache1(SEQRELID, ObjectIdGetDatum(relid));
|
|
|
|
if (!HeapTupleIsValid(pgstuple))
|
|
|
|
elog(ERROR, "cache lookup failed for sequence %u", relid);
|
|
|
|
pgsform = (Form_pg_sequence) GETSTRUCT(pgstuple);
|
|
|
|
incby = pgsform->seqincrement;
|
|
|
|
maxv = pgsform->seqmax;
|
|
|
|
minv = pgsform->seqmin;
|
|
|
|
cache = pgsform->seqcache;
|
2017-02-10 21:12:32 +01:00
|
|
|
cycle = pgsform->seqcycle;
|
2016-12-20 18:00:00 +01:00
|
|
|
ReleaseSysCache(pgstuple);
|
|
|
|
|
2002-05-22 23:40:55 +02:00
|
|
|
/* lock page's buffer and read tuple */
|
2016-12-20 18:00:00 +01:00
|
|
|
seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
|
2016-04-20 15:31:19 +02:00
|
|
|
page = BufferGetPage(buf);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
elm->increment = incby;
|
2000-11-30 02:47:33 +01:00
|
|
|
last = next = result = seq->last_value;
|
2016-12-20 18:00:00 +01:00
|
|
|
fetch = cache;
|
2000-11-30 02:47:33 +01:00
|
|
|
log = seq->log_cnt;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2001-08-16 22:38:56 +02:00
|
|
|
if (!seq->is_called)
|
2000-11-30 02:47:33 +01:00
|
|
|
{
|
Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.
More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence. (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.) The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record. However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false). Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record. A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.
To fix, get rid of the idea of logging an is_called status different from
reality. This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible. In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results. This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.
In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details. It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.
Back-patch to all supported branches.
2012-07-25 23:40:36 +02:00
|
|
|
rescnt++; /* return last_value if not is_called */
|
2000-11-30 02:47:33 +01:00
|
|
|
fetch--;
|
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2002-03-15 20:20:36 +01:00
|
|
|
/*
|
2014-05-06 18:12:18 +02:00
|
|
|
* Decide whether we should emit a WAL log record. If so, force up the
|
2005-10-15 04:49:52 +02:00
|
|
|
* fetch count to grab SEQ_LOG_VALS more values than we actually need to
|
|
|
|
* cache. (These will then be usable without logging.)
|
2002-03-15 20:20:36 +01:00
|
|
|
*
|
2005-11-22 19:17:34 +01:00
|
|
|
* If this is the first nextval after a checkpoint, we must force a new
|
|
|
|
* WAL record to be written anyway, else replay starting from the
|
|
|
|
* checkpoint would fail to advance the sequence past the logged values.
|
|
|
|
* In this case we may as well fetch extra values.
|
2002-03-15 20:20:36 +01:00
|
|
|
*/
|
2012-07-25 23:40:36 +02:00
|
|
|
if (log < fetch || !seq->is_called)
|
2000-11-30 02:47:33 +01:00
|
|
|
{
|
2002-03-15 20:20:36 +01:00
|
|
|
/* forced log to satisfy local demand for values */
|
|
|
|
fetch = log = fetch + SEQ_LOG_VALS;
|
2000-11-30 02:47:33 +01:00
|
|
|
logit = true;
|
|
|
|
}
|
2002-03-15 20:20:36 +01:00
|
|
|
else
|
|
|
|
{
|
|
|
|
XLogRecPtr redoptr = GetRedoRecPtr();
|
|
|
|
|
2012-12-28 17:06:15 +01:00
|
|
|
if (PageGetLSN(page) <= redoptr)
|
2002-03-15 20:20:36 +01:00
|
|
|
{
|
|
|
|
/* last update of seq was before checkpoint */
|
|
|
|
fetch = log = fetch + SEQ_LOG_VALS;
|
|
|
|
logit = true;
|
|
|
|
}
|
|
|
|
}
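The WAL-logging decision above can be written as a standalone function. This is a minimal model with the following assumptions:

- `SEQ_LOG_VALS` is taken to be 32, its value in sequence.c.
- `Lsn` is a hypothetical integer type standing in for `XLogRecPtr`.
- `must_log` is an illustrative name, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_LOG_VALS 32     /* assumption: same value as in sequence.c */

typedef uint64_t Lsn;       /* hypothetical stand-in for XLogRecPtr */

/*
 * Decide whether this nextval() call must emit a WAL record.  When logging
 * is forced, *fetch and *log are both raised to fetch + SEQ_LOG_VALS so the
 * extra values can later be handed out without further WAL records.
 */
static bool
must_log(int64_t *fetch, int64_t *log, bool is_called,
         Lsn page_lsn, Lsn redo_ptr)
{
    assert(*fetch >= 0 && *log >= 0);

    if (*log < *fetch || !is_called)
    {
        /* not enough pre-logged values, or first call after create/reset */
        *fetch = *log = *fetch + SEQ_LOG_VALS;
        return true;
    }
    if (page_lsn <= redo_ptr)
    {
        /* last update of the sequence predates the latest checkpoint */
        *fetch = *log = *fetch + SEQ_LOG_VALS;
        return true;
    }
    return false;
}
```

The checkpoint test matters even when enough values were pre-logged. Replay starts from the checkpoint's redo pointer, so an older WAL record for the sequence would not be replayed, and the sequence would not advance past the logged values.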
|
2000-11-30 02:47:33 +01:00
|
|
|
|
2001-03-22 05:01:46 +01:00
|
|
|
while (fetch) /* try to fetch cache [+ log ] numbers */
|
1997-04-02 05:51:23 +02:00
|
|
|
{
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Check MAXVALUE for ascending sequences and MINVALUE for descending
|
|
|
|
* sequences
|
1997-09-07 07:04:48 +02:00
|
|
|
*/
|
2000-06-11 22:08:01 +02:00
|
|
|
if (incby > 0)
|
1997-09-07 07:04:48 +02:00
|
|
|
{
|
2000-06-11 22:08:01 +02:00
|
|
|
/* ascending sequence */
|
1997-09-07 07:04:48 +02:00
|
|
|
if ((maxv >= 0 && next > maxv - incby) ||
|
|
|
|
(maxv < 0 && next + incby > maxv))
|
|
|
|
{
|
|
|
|
if (rescnt > 0)
|
2000-11-30 02:47:33 +01:00
|
|
|
break; /* stop fetching */
|
2016-12-20 18:00:00 +01:00
|
|
|
if (!cycle)
|
2002-09-03 20:50:54 +02:00
|
|
|
{
|
2002-09-04 22:31:48 +02:00
|
|
|
char buf[100];
|
|
|
|
|
2003-03-20 04:34:57 +01:00
|
|
|
snprintf(buf, sizeof(buf), INT64_FORMAT, maxv);
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
(errcode(ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED),
|
|
|
|
errmsg("nextval: reached maximum value of sequence \"%s\" (%s)",
|
|
|
|
RelationGetRelationName(seqrel), buf)));
|
2002-09-03 20:50:54 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
next = minv;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
next += incby;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2000-06-11 22:08:01 +02:00
|
|
|
/* descending sequence */
|
1997-09-07 07:04:48 +02:00
|
|
|
if ((minv < 0 && next < minv - incby) ||
|
|
|
|
(minv >= 0 && next + incby < minv))
|
|
|
|
{
|
|
|
|
if (rescnt > 0)
|
2000-11-30 02:47:33 +01:00
|
|
|
break; /* stop fetching */
|
2016-12-20 18:00:00 +01:00
|
|
|
if (!cycle)
|
2002-09-03 20:50:54 +02:00
|
|
|
{
|
2002-09-04 22:31:48 +02:00
|
|
|
char buf[100];
|
|
|
|
|
2003-03-20 04:34:57 +01:00
|
|
|
snprintf(buf, sizeof(buf), INT64_FORMAT, minv);
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
2017-06-21 21:35:54 +02:00
|
|
|
(errcode(ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED),
|
|
|
|
errmsg("nextval: reached minimum value of sequence \"%s\" (%s)",
|
|
|
|
RelationGetRelationName(seqrel), buf)));
|
2002-09-03 20:50:54 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
next = maxv;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
next += incby;
|
|
|
|
}
|
2000-11-30 02:47:33 +01:00
|
|
|
fetch--;
|
|
|
|
if (rescnt < cache)
|
|
|
|
{
|
|
|
|
log--;
|
|
|
|
rescnt++;
|
|
|
|
last = next;
|
2001-03-22 05:01:46 +01:00
|
|
|
if (rescnt == 1) /* if it's first result - */
|
|
|
|
result = next; /* it's what to return */
|
2000-11-30 02:47:33 +01:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
}
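The limit checks in the loop above are arranged so that no intermediate expression can overflow `int64`. For an ascending sequence, `maxv - incby` is only computed when `maxv >= 0`. `next + incby` is only computed when `maxv < 0`, in which case `next` is negative as well. The descending case is symmetric. Below is a minimal standalone version. It assumes `minv <= next <= maxv` on entry, and `would_pass_limit` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if advancing "next" by "incby" would step past the sequence
 * limit.  Same tests as the fetch loop; each branch only evaluates
 * expressions that cannot overflow when minv <= next <= maxv.
 */
static bool
would_pass_limit(int64_t next, int64_t incby, int64_t minv, int64_t maxv)
{
    assert(incby != 0);
    if (incby > 0)              /* ascending: compare against MAXVALUE */
        return (maxv >= 0 && next > maxv - incby) ||
               (maxv < 0 && next + incby > maxv);
    else                        /* descending: compare against MINVALUE */
        return (minv < 0 && next < minv - incby) ||
               (minv >= 0 && next + incby < minv);
}
```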
|
|
|
|
|
2002-03-15 20:20:36 +01:00
|
|
|
log -= fetch; /* adjust for any unfetched numbers */
|
|
|
|
Assert(log >= 0);
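The interaction of `fetch`, `log`, `rescnt` and `cache` can be replayed on plain integers. The sketch below covers the whole path, from reading the tuple through to `log -= fetch`. It is a minimal model with these assumptions:

- `simulate_nextval` and `FetchResult` are hypothetical names.
- `after_checkpoint` stands in for the `PageGetLSN(page) <= redoptr` test.
- `SEQ_LOG_VALS` is assumed to be 32.
- A limit overrun on a non-cycling sequence is reported as a flag rather than raised as an ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_LOG_VALS 32     /* assumption: same value as in sequence.c */

typedef struct FetchResult
{
    int64_t result;   /* value this nextval() call returns */
    int64_t last;     /* last value reserved (becomes elm->cached) */
    int64_t log;      /* pre-logged values left afterwards (new log_cnt) */
    bool    logit;    /* whether a WAL record would be written */
    bool    overflow; /* limit reached on a non-cycling sequence */
} FetchResult;

static FetchResult
simulate_nextval(int64_t last_value, bool is_called, int64_t log_cnt,
                 int64_t cache, int64_t incby, int64_t minv, int64_t maxv,
                 bool cycle, bool after_checkpoint)
{
    FetchResult r = {last_value, last_value, log_cnt, false, false};
    int64_t     next = last_value;
    int64_t     fetch = cache;
    int64_t     rescnt = 0;

    assert(incby != 0 && cache >= 1);
    if (!is_called)
    {
        rescnt++;               /* last_value itself is the first result */
        fetch--;
    }
    if (r.log < fetch || !is_called || after_checkpoint)
    {
        fetch = r.log = fetch + SEQ_LOG_VALS;   /* forced WAL record */
        r.logit = true;
    }
    while (fetch)
    {
        bool        past = incby > 0
            ? ((maxv >= 0 && next > maxv - incby) ||
               (maxv < 0 && next + incby > maxv))
            : ((minv < 0 && next < minv - incby) ||
               (minv >= 0 && next + incby < minv));

        if (past)
        {
            if (rescnt > 0)
                break;          /* keep the values already obtained */
            if (!cycle)
            {
                r.overflow = true;  /* the real code raises an ERROR */
                return r;
            }
            next = incby > 0 ? minv : maxv;
        }
        else
            next += incby;
        fetch--;
        if (rescnt < cache)
        {
            r.log--;
            rescnt++;
            r.last = next;
            if (rescnt == 1)
                r.result = next;
        }
    }
    r.log -= fetch;             /* adjust for any unfetched numbers */
    assert(r.log >= 0);
    return r;
}
```

Consider a freshly created ascending sequence with CACHE 1. The model returns 1, leaves `log` at 32 and requests a WAL record. The next call returns 2 and decrements `log` to 31 without logging.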
|
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/* save info in local cache */
|
|
|
|
elm->last = result; /* last returned number */
|
2000-11-30 02:47:33 +01:00
|
|
|
elm->cached = last; /* last fetched number */
|
2007-10-25 20:54:03 +02:00
|
|
|
elm->last_valid = true;
|
2000-11-30 02:47:33 +01:00
|
|
|
|
2005-06-07 09:08:35 +02:00
|
|
|
last_used_seq = elm;
|
|
|
|
|
Reconsider when to wait for WAL flushes/syncrep during commit.
Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have a xid assigned. The primary
reason for that is that sequence's nextval() did not assign a xid, but its
WAL records are worthwhile to wait for on commit.
This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.
This led to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...
This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. The fix here doesn't
properly solve that issue; it merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.
To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records. If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.
The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.
Discussion: 20150223165359.GF30784@awork2.anarazel.de,
369698E947874884A77849D8FE3680C2@maumau,
5CF4ABBA67674088B3941894E22A0D25@maumau
Per complaint from maumau and Thom Brown
Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to be consistent behavior across all maintained branches.
2015-02-26 12:50:07 +01:00
|
|
|
/*
|
|
|
|
* If something needs to be WAL logged, acquire an xid, so this
|
2015-05-24 03:35:49 +02:00
|
|
|
* transaction's commit will trigger a WAL flush and wait for syncrep.
|
|
|
|
* It's sufficient to ensure the toplevel transaction has an xid, no need
|
|
|
|
* to assign xids to subxacts, as that will already trigger an appropriate wait.
|
|
|
|
* (Have to do that here, so we're outside the critical section)
|
2015-02-26 12:50:07 +01:00
|
|
|
*/
|
|
|
|
if (logit && RelationNeedsWAL(seqrel))
|
|
|
|
GetTopTransactionId();
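The commit-time rule is that a transaction waits for WAL flush and synchronous replication only when it both has an xid and wrote WAL. It can be modelled with two flags. This is a hypothetical model, not PostgreSQL's transaction code. It only illustrates why `nextval()` calls `GetTopTransactionId()` before WAL-logging: without an xid, its WAL record alone would not make the commit wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-transaction flags; not PostgreSQL's real structures. */
typedef struct TxnModel
{
    bool has_xid;   /* a top-level xid has been assigned */
    bool wrote_wal; /* the transaction emitted WAL records */
} TxnModel;

/* nextval(): when it WAL-logs a permanent sequence it also takes an xid. */
static void
model_nextval(TxnModel *txn, bool logit, bool needs_wal)
{
    if (logit && needs_wal)
    {
        txn->has_xid = true;    /* corresponds to GetTopTransactionId() */
        txn->wrote_wal = true;  /* the sequence's XLOG record */
    }
}

/* A HOT prune in a read-only transaction writes WAL but assigns no xid. */
static void
model_hot_prune(TxnModel *txn)
{
    txn->wrote_wal = true;
}

/* A write to a temporary table assigns an xid but writes no WAL. */
static void
model_temp_write(TxnModel *txn)
{
    txn->has_xid = true;
}

/* Commit waits for WAL flush / syncrep only when both conditions hold. */
static bool
commit_must_wait(const TxnModel *txn)
{
    return txn->has_xid && txn->wrote_wal;
}
```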
|
|
|
|
|
2012-07-25 23:40:36 +02:00
|
|
|
/* ready to change the on-disk (or really, in-buffer) tuple */
|
2001-01-12 22:54:01 +01:00
|
|
|
START_CRIT_SECTION();
|
2002-08-06 04:36:35 +02:00
|
|
|
|
2012-07-25 23:40:36 +02:00
|
|
|
/*
|
|
|
|
* We must mark the buffer dirty before doing XLogInsert(); see notes in
|
|
|
|
* SyncOneBuffer(). However, we don't apply the desired changes just yet.
|
2013-05-29 22:58:43 +02:00
|
|
|
* This looks like a violation of the buffer update protocol, but it is in
|
2014-05-06 18:12:18 +02:00
|
|
|
* fact safe because we hold exclusive lock on the buffer. Any other
|
2012-07-25 23:40:36 +02:00
|
|
|
* process, including a checkpoint, that tries to examine the buffer
|
|
|
|
* contents will block until we release the lock, and then will see the
|
|
|
|
* final state that we install below.
|
|
|
|
*/
|
2006-04-01 01:32:07 +02:00
|
|
|
MarkBufferDirty(buf);
|
|
|
|
|
2002-08-06 04:36:35 +02:00
|
|
|
/* XLOG stuff */
|
2010-12-13 18:34:26 +01:00
|
|
|
if (logit && RelationNeedsWAL(seqrel))
|
2000-11-30 02:47:33 +01:00
|
|
|
{
|
|
|
|
xl_seq_rec xlrec;
|
|
|
|
XLogRecPtr recptr;
|
|
|
|
|
2012-07-25 23:40:36 +02:00
|
|
|
/*
|
|
|
|
* We don't log the current state of the tuple, but rather the state
|
|
|
|
* as it would appear after "log" more fetches. This lets us skip
|
|
|
|
* that many future WAL records, at the cost that we lose those
|
|
|
|
* sequence values if we crash.
|
|
|
|
*/
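The cost and benefit of logging ahead can be counted directly. This is a minimal sketch under four assumptions: CACHE 1, increment 1, no intervening checkpoint (which would force an extra record) and SEQ_LOG_VALS = 32 as in sequence.c. wal_records_for() is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_LOG_VALS 32   /* value used by sequence.c */

/* WAL records emitted by n consecutive nextval() calls, starting from a
 * state whose pre-logged allowance is used up (log_cnt == 0). */
static int64_t wal_records_for(int64_t n)
{
    int64_t records = 0;
    int64_t log_cnt = 0;      /* fetches still covered by the last record */

    assert(n >= 0);
    for (int64_t i = 0; i < n; i++) {
        if (log_cnt == 0) {
            records++;        /* log the state as of SEQ_LOG_VALS more fetches */
            log_cnt = SEQ_LOG_VALS;
        } else
            log_cnt--;        /* covered by the earlier record: no WAL */
    }
    return records;
}
```

One record covers the call that emits it plus 32 further calls, so 100 calls emit 4 records. A crash discards at most SEQ_LOG_VALS pre-logged values.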
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now dissects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are aligned. Redo routines are now passed the XLogReaderState, which
contains the record in the already-dissected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compensates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
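The register-then-insert calling pattern that replaces XLogRecData chains can be illustrated with a toy builder. The names RecordBuilder, rb_begin, rb_register_data and rb_assemble are invented for this sketch and are not PostgreSQL's xloginsert.c API. The sketch shows only one assumed property: the main-data part is the registered chunks concatenated in registration order. The two XLogRegisterData() calls later in this listing (the xl_seq_rec header, then the tuple) follow that pattern. Block references and the real record header layout are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHUNKS 8

typedef struct { const void *data; size_t len; } Chunk;

typedef struct {
    Chunk chunks[MAX_CHUNKS];
    int   nchunks;
} RecordBuilder;

static void rb_begin(RecordBuilder *rb) { rb->nchunks = 0; }

/* Remember a chunk by reference; nothing is copied until assembly. */
static int rb_register_data(RecordBuilder *rb, const void *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (rb->nchunks == MAX_CHUNKS)
        return -1;
    rb->chunks[rb->nchunks++] = (Chunk){data, len};
    return 0;
}

/* Copy the registered chunks, in order, into out; returns bytes or -1. */
static long rb_assemble(const RecordBuilder *rb, unsigned char *out, size_t cap)
{
    size_t off = 0;
    for (int i = 0; i < rb->nchunks; i++) {
        if (off + rb->chunks[i].len > cap)
            return -1;
        memcpy(out + off, rb->chunks[i].data, rb->chunks[i].len);
        off += rb->chunks[i].len;
    }
    return (long) off;
}
```

Registering by reference and copying once at assembly time lets callers pass pointers to existing structs and tuples without building a linked chain themselves.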
|
|
|
XLogBeginInsert();
|
|
|
|
XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
|
2000-12-28 14:00:29 +01:00
|
|
|
|
2002-08-06 04:36:35 +02:00
|
|
|
/* set values that will be saved in xlog */
|
2000-12-28 14:00:29 +01:00
|
|
|
seq->last_value = next;
|
2001-08-16 22:38:56 +02:00
|
|
|
seq->is_called = true;
|
2000-12-28 14:00:29 +01:00
|
|
|
seq->log_cnt = 0;
|
2002-08-06 04:36:35 +02:00
|
|
|
|
2012-07-25 23:40:36 +02:00
|
|
|
xlrec.node = seqrel->rd_node;
|
|
|
|
|
2014-11-20 16:56:26 +01:00
|
|
|
XLogRegisterData((char *) &xlrec, sizeof(xl_seq_rec));
|
2016-12-20 18:00:00 +01:00
|
|
|
XLogRegisterData((char *) seqdatatuple.t_data, seqdatatuple.t_len);
|
2000-12-28 14:00:29 +01:00
|
|
|
|
2014-11-20 16:56:26 +01:00
|
|
|
recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
|
2000-11-30 02:47:33 +01:00
|
|
|
|
2000-12-28 14:00:29 +01:00
|
|
|
PageSetLSN(page, recptr);
|
2000-11-30 02:47:33 +01:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2012-07-25 23:40:36 +02:00
|
|
|
/* Now update sequence tuple to the intended final state */
|
2000-11-30 02:47:33 +01:00
|
|
|
seq->last_value = last; /* last fetched number */
|
2001-08-16 22:38:56 +02:00
|
|
|
seq->is_called = true;
|
2000-11-30 02:47:33 +01:00
|
|
|
seq->log_cnt = log; /* how much is logged */
|
2002-08-06 04:36:35 +02:00
|
|
|
|
2001-01-12 22:54:01 +01:00
|
|
|
END_CRIT_SECTION();
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2006-04-01 01:32:07 +02:00
|
|
|
UnlockReleaseBuffer(buf);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2002-05-22 23:40:55 +02:00
|
|
|
relation_close(seqrel, NoLock);
|
|
|
|
|
2005-10-03 01:50:16 +02:00
|
|
|
return result;
|
1997-04-02 05:51:23 +02:00
|
|
|
}
|
|
|
|
|
2000-06-11 22:08:01 +02:00
|
|
|
Datum
|
2005-10-03 01:50:16 +02:00
|
|
|
currval_oid(PG_FUNCTION_ARGS)
|
1997-04-02 05:51:23 +02:00
|
|
|
{
|
2005-10-03 01:50:16 +02:00
|
|
|
Oid relid = PG_GETARG_OID(0);
|
|
|
|
int64 result;
|
1997-09-08 04:41:22 +02:00
|
|
|
SeqTable elm;
|
2002-05-22 23:40:55 +02:00
|
|
|
Relation seqrel;
|
2002-03-30 02:02:42 +01:00
|
|
|
|
Fix ALTER SEQUENCE locking
In 1753b1b027035029c2a2a1649065762fafbf63f3, the pg_sequence system
catalog was introduced. This made sequence metadata changes
transactional, while the actual sequence values are still behaving
nontransactionally. This requires some refinement in how ALTER
SEQUENCE, which operates on both, locks the sequence and the catalog.
The main problems were:
- Concurrent ALTER SEQUENCE causes "tuple concurrently updated" error,
caused by updates to pg_sequence catalog.
- Sequence WAL writes and catalog updates are not protected by same
lock, which could lead to inconsistent recovery order.
- nextval() disregarding uncommitted ALTER SEQUENCE changes.
To fix, nextval() and friends now lock the sequence using
RowExclusiveLock instead of AccessShareLock. ALTER SEQUENCE locks the
sequence using ShareRowExclusiveLock. This means that nextval() and
ALTER SEQUENCE block each other, and ALTER SEQUENCE on the same sequence
blocks itself. (This was already the case previously for the OWNER TO,
RENAME, and SET SCHEMA variants.) Also, rearrange some code so that the
entire AlterSequence is protected by the lock on the sequence.
As an exception, use reduced locking for ALTER SEQUENCE ... RESTART.
Since that is basically a setval(), it does not require the full locking
of other ALTER SEQUENCE actions. So check whether we are only running a
RESTART and run with less locking if so.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Jason Petersen <jason@citusdata.com>
Reported-by: Andres Freund <andres@anarazel.de>
2017-05-10 05:35:31 +02:00
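The effect of the new lock modes can be checked against PostgreSQL's table-level lock conflict table, as documented in the "Explicit Locking" chapter. The sketch encodes that table as bitmasks. The enum names and modes_conflict() are hypothetical; only the conflict relationships come from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    LM_ACCESS_SHARE = 1, LM_ROW_SHARE, LM_ROW_EXCLUSIVE,
    LM_SHARE_UPDATE_EXCLUSIVE, LM_SHARE, LM_SHARE_ROW_EXCLUSIVE,
    LM_EXCLUSIVE, LM_ACCESS_EXCLUSIVE
} LockModeSketch;

#define M(x) (1u << (x))

/* conflicts[m] = set of table-level modes that conflict with mode m */
static const unsigned conflicts[] = {
    [LM_ACCESS_SHARE]           = M(LM_ACCESS_EXCLUSIVE),
    [LM_ROW_SHARE]              = M(LM_EXCLUSIVE) | M(LM_ACCESS_EXCLUSIVE),
    [LM_ROW_EXCLUSIVE]          = M(LM_SHARE) | M(LM_SHARE_ROW_EXCLUSIVE) |
                                  M(LM_EXCLUSIVE) | M(LM_ACCESS_EXCLUSIVE),
    [LM_SHARE_UPDATE_EXCLUSIVE] = M(LM_SHARE_UPDATE_EXCLUSIVE) | M(LM_SHARE) |
                                  M(LM_SHARE_ROW_EXCLUSIVE) | M(LM_EXCLUSIVE) |
                                  M(LM_ACCESS_EXCLUSIVE),
    [LM_SHARE]                  = M(LM_ROW_EXCLUSIVE) | M(LM_SHARE_UPDATE_EXCLUSIVE) |
                                  M(LM_SHARE_ROW_EXCLUSIVE) | M(LM_EXCLUSIVE) |
                                  M(LM_ACCESS_EXCLUSIVE),
    [LM_SHARE_ROW_EXCLUSIVE]    = M(LM_ROW_EXCLUSIVE) | M(LM_SHARE_UPDATE_EXCLUSIVE) |
                                  M(LM_SHARE) | M(LM_SHARE_ROW_EXCLUSIVE) |
                                  M(LM_EXCLUSIVE) | M(LM_ACCESS_EXCLUSIVE),
    [LM_EXCLUSIVE]              = M(LM_ROW_SHARE) | M(LM_ROW_EXCLUSIVE) |
                                  M(LM_SHARE_UPDATE_EXCLUSIVE) | M(LM_SHARE) |
                                  M(LM_SHARE_ROW_EXCLUSIVE) | M(LM_EXCLUSIVE) |
                                  M(LM_ACCESS_EXCLUSIVE),
    [LM_ACCESS_EXCLUSIVE]       = M(LM_ACCESS_SHARE) | M(LM_ROW_SHARE) |
                                  M(LM_ROW_EXCLUSIVE) | M(LM_SHARE_UPDATE_EXCLUSIVE) |
                                  M(LM_SHARE) | M(LM_SHARE_ROW_EXCLUSIVE) |
                                  M(LM_EXCLUSIVE) | M(LM_ACCESS_EXCLUSIVE),
};

static bool modes_conflict(LockModeSketch a, LockModeSketch b)
{
    assert(a >= LM_ACCESS_SHARE && a <= LM_ACCESS_EXCLUSIVE);
    assert(b >= LM_ACCESS_SHARE && b <= LM_ACCESS_EXCLUSIVE);
    return (conflicts[a] & M(b)) != 0;
}
```

RowExclusiveLock does not conflict with itself, so concurrent nextval() calls still proceed. ShareRowExclusiveLock conflicts with both RowExclusiveLock and itself, which gives the blocking described above. Under the old AccessShareLock, nextval() did not conflict with ALTER SEQUENCE at all.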
|
|
|
/* open and lock sequence */
|
2005-10-03 01:50:16 +02:00
|
|
|
init_sequence(relid, &elm, &seqrel);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2014-10-23 03:41:43 +02:00
|
|
|
if (pg_class_aclcheck(elm->relid, GetUserId(),
|
|
|
|
ACL_SELECT | ACL_USAGE) != ACLCHECK_OK)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
2003-08-01 02:15:26 +02:00
|
|
|
errmsg("permission denied for sequence %s",
|
2005-10-03 01:50:16 +02:00
|
|
|
RelationGetRelationName(seqrel))));
|
2002-03-22 00:27:25 +01:00
|
|
|
|
2007-10-25 20:54:03 +02:00
|
|
|
if (!elm->last_valid)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
2003-09-25 08:58:07 +02:00
|
|
|
errmsg("currval of sequence \"%s\" is not yet defined in this session",
|
2005-10-03 01:50:16 +02:00
|
|
|
RelationGetRelationName(seqrel))));
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
result = elm->last;
|
|
|
|
|
2002-05-22 23:40:55 +02:00
|
|
|
relation_close(seqrel, NoLock);
|
|
|
|
|
2001-08-16 22:38:56 +02:00
|
|
|
PG_RETURN_INT64(result);
|
1997-04-02 05:51:23 +02:00
|
|
|
}
|
|
|
|
|
2005-06-07 09:08:35 +02:00
|
|
|
Datum
|
|
|
|
lastval(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
Relation seqrel;
|
|
|
|
int64 result;
|
|
|
|
|
|
|
|
if (last_used_seq == NULL)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
|
|
|
errmsg("lastval is not yet defined in this session")));
|
|
|
|
|
|
|
|
/* Someone may have dropped the sequence since the last nextval() */
|
2010-02-14 19:42:19 +01:00
|
|
|
if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(last_used_seq->relid)))
|
2005-06-07 09:08:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
|
|
|
errmsg("lastval is not yet defined in this session")));
|
|
|
|
|
2017-05-10 05:35:31 +02:00
|
|
|
seqrel = lock_and_open_sequence(last_used_seq);
|
2005-06-07 09:08:35 +02:00
|
|
|
|
|
|
|
/* nextval() must have already been called for this sequence */
|
2007-10-25 20:54:03 +02:00
|
|
|
Assert(last_used_seq->last_valid);
|
2005-06-07 09:08:35 +02:00
|
|
|
|
2014-10-23 03:41:43 +02:00
|
|
|
if (pg_class_aclcheck(last_used_seq->relid, GetUserId(),
|
|
|
|
ACL_SELECT | ACL_USAGE) != ACLCHECK_OK)
|
2005-06-07 09:08:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("permission denied for sequence %s",
|
|
|
|
RelationGetRelationName(seqrel))));
|
|
|
|
|
|
|
|
result = last_used_seq->last;
|
|
|
|
relation_close(seqrel, NoLock);
|
2005-10-03 01:50:16 +02:00
|
|
|
|
2005-06-07 09:08:35 +02:00
|
|
|
PG_RETURN_INT64(result);
|
|
|
|
}
|
|
|
|
|
2001-03-22 05:01:46 +01:00
|
|
|
/*
|
2001-02-13 02:57:12 +01:00
|
|
|
* Main internal procedure that handles 2 & 3 arg forms of SETVAL.
|
|
|
|
*
|
|
|
|
* Note that the 3 arg version (which sets the is_called flag) is
|
|
|
|
* only for use in pg_dump, and setting the is_called flag may not
|
2001-03-22 05:01:46 +01:00
|
|
|
* work if multiple users are attached to the database and referencing
|
2001-02-13 02:57:12 +01:00
|
|
|
* the sequence (unlikely if pg_dump is restoring it).
|
|
|
|
*
|
2001-03-22 05:01:46 +01:00
|
|
|
* It is necessary to have the 3 arg version so that pg_dump can
|
2001-02-13 02:57:12 +01:00
|
|
|
* restore the state of a sequence exactly during data-only restores -
|
|
|
|
* it is the only way to clear the is_called flag in an existing
|
|
|
|
* sequence.
|
|
|
|
*/
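The effect of the is_called flag on the following nextval() can be sketched with a toy sequence. This is a minimal sketch that assumes the two-argument setval() behaves like the three-argument form with is_called = true. SeqSketch, sketch_setval() and sketch_nextval() are hypothetical and ignore overflow, CYCLE and caching. The bounds check mirrors the one in do_setval().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t last_value;
    bool    is_called;   /* has last_value already been handed out? */
    int64_t minv, maxv, incby;
} SeqSketch;

/* false corresponds to "setval: value ... is out of bounds" */
static bool sketch_setval(SeqSketch *s, int64_t next, bool iscalled)
{
    if (next < s->minv || next > s->maxv)
        return false;
    s->last_value = next;
    s->is_called = iscalled;
    return true;
}

/* Advance only if the stored value was already returned. */
static int64_t sketch_nextval(SeqSketch *s)
{
    assert(s->incby != 0);
    if (s->is_called)
        s->last_value += s->incby;
    s->is_called = true;
    return s->last_value;
}
```

Calling `setval(seq, 42, false)` therefore makes the next nextval() return 42 itself. This is the never-used state that pg_dump must be able to reproduce.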
|
2000-10-16 19:08:11 +02:00
|
|
|
static void
|
2005-10-03 01:50:16 +02:00
|
|
|
do_setval(Oid relid, int64 next, bool iscalled)
|
1998-08-25 23:25:46 +02:00
|
|
|
{
|
|
|
|
SeqTable elm;
|
2002-05-22 23:40:55 +02:00
|
|
|
Relation seqrel;
|
1998-09-01 06:40:42 +02:00
|
|
|
Buffer buf;
|
2016-12-20 18:00:00 +01:00
|
|
|
HeapTupleData seqdatatuple;
|
|
|
|
Form_pg_sequence_data seq;
|
|
|
|
HeapTuple pgstuple;
|
|
|
|
Form_pg_sequence pgsform;
|
|
|
|
int64 maxv,
|
|
|
|
minv;
|
1998-08-25 23:25:46 +02:00
|
|
|
|
2017-05-10 05:35:31 +02:00
|
|
|
/* open and lock sequence */
|
2005-10-03 01:50:16 +02:00
|
|
|
init_sequence(relid, &elm, &seqrel);
|
2002-03-22 00:27:25 +01:00
|
|
|
|
|
|
|
if (pg_class_aclcheck(elm->relid, GetUserId(), ACL_UPDATE) != ACLCHECK_OK)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
2003-08-01 02:15:26 +02:00
|
|
|
errmsg("permission denied for sequence %s",
|
2005-10-03 01:50:16 +02:00
|
|
|
RelationGetRelationName(seqrel))));
|
1998-08-25 23:25:46 +02:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
pgstuple = SearchSysCache1(SEQRELID, ObjectIdGetDatum(relid));
|
|
|
|
if (!HeapTupleIsValid(pgstuple))
|
|
|
|
elog(ERROR, "cache lookup failed for sequence %u", relid);
|
|
|
|
pgsform = (Form_pg_sequence) GETSTRUCT(pgstuple);
|
|
|
|
maxv = pgsform->seqmax;
|
|
|
|
minv = pgsform->seqmin;
|
|
|
|
ReleaseSysCache(pgstuple);
|
|
|
|
|
2010-02-20 22:24:02 +01:00
|
|
|
/* read-only transactions may only modify temp sequences */
|
2012-12-18 02:15:32 +01:00
|
|
|
if (!seqrel->rd_islocaltemp)
|
2010-02-20 22:24:02 +01:00
|
|
|
PreventCommandIfReadOnly("setval()");
|
|
|
|
|
Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things. First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers. Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes. Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.
Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
2015-04-30 21:02:14 +02:00
|
|
|
/*
|
2015-05-24 03:35:49 +02:00
|
|
|
* Forbid this during parallel operation because, to make it work, the
|
|
|
|
* cooperating backends would need to share the backend-local cached
|
2015-04-30 21:02:14 +02:00
|
|
|
* sequence information. Currently, we don't support that.
|
|
|
|
*/
|
|
|
|
PreventCommandIfParallelMode("setval()");
|
|
|
|
|
2002-03-22 00:27:25 +01:00
|
|
|
/* lock page's buffer and read tuple */
|
2016-12-20 18:00:00 +01:00
|
|
|
seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
|
1998-08-25 23:25:46 +02:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
if ((next < minv) || (next > maxv))
|
2002-09-03 20:50:54 +02:00
|
|
|
{
|
2002-09-04 22:31:48 +02:00
|
|
|
char bufv[100],
|
|
|
|
bufm[100],
|
|
|
|
bufx[100];
|
|
|
|
|
2003-03-20 04:34:57 +01:00
|
|
|
snprintf(bufv, sizeof(bufv), INT64_FORMAT, next);
|
2016-12-20 18:00:00 +01:00
|
|
|
snprintf(bufm, sizeof(bufm), INT64_FORMAT, minv);
|
|
|
|
snprintf(bufx, sizeof(bufx), INT64_FORMAT, maxv);
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
|
2003-09-25 08:58:07 +02:00
|
|
|
errmsg("setval: value %s is out of bounds for sequence \"%s\" (%s..%s)",
|
2005-10-03 01:50:16 +02:00
|
|
|
bufv, RelationGetRelationName(seqrel),
|
|
|
|
bufm, bufx)));
|
2002-09-03 20:50:54 +02:00
|
|
|
}
|
1998-08-25 23:25:46 +02:00
|
|
|
|
2007-10-25 20:54:03 +02:00
|
|
|
/* Set the currval() state only if iscalled = true */
|
|
|
|
if (iscalled)
|
|
|
|
{
|
|
|
|
elm->last = next; /* last returned number */
|
|
|
|
elm->last_valid = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* In any case, forget any future cached numbers */
|
|
|
|
elm->cached = elm->last;
|
1998-08-25 23:25:46 +02:00
|
|
|
|
Reconsider when to wait for WAL flushes/syncrep during commit.
Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have an xid assigned. The primary
reason for that is that a sequence's nextval() did not assign an xid, yet
its WAL records are worth waiting for on commit.
This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.
This led to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...
This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. Fixing the issue here
doesn't properly solve that issue; it merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.
To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired an xid and emitted WAL
records. If only an xid has been assigned, we don't want to wait uselessly
just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.
The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one, it only happens once every SEQ_LOG_VALS (32)
values anyway; for another, only usage of nextval() without using the
result in an insert or similar is affected.
Discussion: 20150223165359.GF30784@awork2.anarazel.de,
369698E947874884A77849D8FE3680C2@maumau,
5CF4ABBA67674088B3941894E22A0D25@maumau
Per complaint from maumau and Thom Brown
Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to have consistent behavior across all maintained branches.
2015-02-26 12:50:07 +01:00
|
|
|
/* check the comment above nextval_internal()'s equivalent call. */
|
|
|
|
if (RelationNeedsWAL(seqrel))
|
|
|
|
GetTopTransactionId();
|
|
|
|
|
2012-07-25 23:40:36 +02:00
|
|
|
/* ready to change the on-disk (or really, in-buffer) tuple */
|
2001-01-12 22:54:01 +01:00
|
|
|
START_CRIT_SECTION();
|
2002-08-06 04:36:35 +02:00
|
|
|
|
2012-07-25 23:40:36 +02:00
|
|
|
seq->last_value = next; /* last fetched number */
|
|
|
|
seq->is_called = iscalled;
|
|
|
|
seq->log_cnt = 0;
|
|
|
|
|
2006-04-01 01:32:07 +02:00
|
|
|
MarkBufferDirty(buf);
|
|
|
|
|
2002-08-06 04:36:35 +02:00
|
|
|
/* XLOG stuff */
|
2010-12-13 18:34:26 +01:00
|
|
|
if (RelationNeedsWAL(seqrel))
|
2000-11-30 02:47:33 +01:00
|
|
|
{
|
|
|
|
xl_seq_rec xlrec;
|
|
|
|
XLogRecPtr recptr;
|
2016-04-20 15:31:19 +02:00
|
|
|
Page page = BufferGetPage(buf);
|
2000-11-30 02:47:33 +01:00
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
XLogBeginInsert();
|
|
|
|
XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
|
2000-12-28 14:00:29 +01:00
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
xlrec.node = seqrel->rd_node;
|
|
|
|
XLogRegisterData((char *) &xlrec, sizeof(xl_seq_rec));
|
2016-12-20 18:00:00 +01:00
|
|
|
XLogRegisterData((char *) seqdatatuple.t_data, seqdatatuple.t_len);
|
2000-12-28 14:00:29 +01:00
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
|
2000-12-28 14:00:29 +01:00
|
|
|
|
|
|
|
PageSetLSN(page, recptr);
|
2000-11-30 02:47:33 +01:00
|
|
|
}
|
2002-08-06 04:36:35 +02:00
|
|
|
|
2001-01-12 22:54:01 +01:00
|
|
|
END_CRIT_SECTION();
|
1998-08-25 23:25:46 +02:00
|
|
|
|
2006-04-01 01:32:07 +02:00
|
|
|
UnlockReleaseBuffer(buf);
|
2002-05-22 23:40:55 +02:00
|
|
|
|
|
|
|
relation_close(seqrel, NoLock);
|
2000-10-11 17:31:34 +02:00
|
|
|
}

/*
 * Implement the 2 arg setval procedure.
 * See do_setval for discussion.
 */
Datum
setval_oid(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	int64		next = PG_GETARG_INT64(1);

	do_setval(relid, next, true);

	PG_RETURN_INT64(next);
}

/*
 * Implement the 3 arg setval procedure.
 * See do_setval for discussion.
 */
Datum
setval3_oid(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	int64		next = PG_GETARG_INT64(1);
	bool		iscalled = PG_GETARG_BOOL(2);

	do_setval(relid, next, iscalled);

	PG_RETURN_INT64(next);
}
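The two setval entry points above differ only in the is_called flag they pass to do_setval(). The minimal standalone sketch below models the user-visible contract that results. With the 2-arg form (is_called = true), the next nextval() returns next + increment. With the 3-arg form and is_called = false, the next nextval() returns next itself. `DemoSeq`, `demo_setval` and `demo_nextval` are hypothetical names. The sketch deliberately ignores the backend cache, MINVALUE/MAXVALUE checks, cycling, locking and WAL logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the sequence tuple fields that
 * do_setval() writes (last_value, is_called); not a PostgreSQL type. */
typedef struct DemoSeq
{
	int64_t		last_value;
	bool		is_called;
	int64_t		increment;
} DemoSeq;

/* Models setval(seq, next, iscalled): just overwrite the two fields. */
static void
demo_setval(DemoSeq *seq, int64_t next, bool iscalled)
{
	seq->last_value = next;
	seq->is_called = iscalled;
}

/* Simplified nextval(): no caching, no limits, no cycling. */
static int64_t
demo_nextval(DemoSeq *seq)
{
	assert(seq->increment != 0);

	if (seq->is_called)
		seq->last_value += seq->increment;	/* advance past the last value */
	else
		seq->is_called = true;	/* hand out last_value itself, once */
	return seq->last_value;
}
```

For example, after `demo_setval(&s, 42, false)` two calls to `demo_nextval(&s)` yield 42 and then 43 with an increment of 1. After `demo_setval(&s, 42, true)`, the first call already yields 43.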

/*
 * Open the sequence and acquire lock if needed
 *
 * If we haven't touched the sequence already in this transaction,
 * we need to acquire a lock.  We arrange for the lock to
 * be owned by the top transaction, so that we don't need to do it
 * more than once per xact.
 */
static Relation
lock_and_open_sequence(SeqTable seq)
{
	LocalTransactionId thislxid = MyProc->lxid;

	/* Get the lock if not already held in this xact */
	if (seq->lxid != thislxid)
	{
		ResourceOwner currentOwner;

		currentOwner = CurrentResourceOwner;
		PG_TRY();
		{
			CurrentResourceOwner = TopTransactionResourceOwner;
			LockRelationOid(seq->relid, RowExclusiveLock);
		}
		PG_CATCH();
		{
			/* Ensure CurrentResourceOwner is restored on error */
			CurrentResourceOwner = currentOwner;
			PG_RE_THROW();
		}
		PG_END_TRY();
		CurrentResourceOwner = currentOwner;

		/* Flag that we have a lock in the current xact */
		seq->lxid = thislxid;
	}

	/* We now know we have the lock, and can safely open the rel */
	return relation_open(seq->relid, NoLock);
}
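The once-per-transaction locking in lock_and_open_sequence() can be illustrated in isolation. The backend remembers the local transaction ID in which it last locked each sequence, and it calls the lock manager again only when that ID changes. The sketch below is an assumption-laden model, not PostgreSQL code. `DemoSeqEntry`, `demo_lock_and_open` and the call counter are hypothetical stand-ins for SeqTableData, lock_and_open_sequence() and LockRelationOid(). The resource-owner switch that makes the real lock belong to the top-level transaction is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for LocalTransactionId and its invalid value. */
typedef uint32_t DemoLxid;
#define DEMO_INVALID_LXID ((DemoLxid) 0)

/* Hypothetical stand-in for the relid/lxid fields of SeqTableData. */
typedef struct DemoSeqEntry
{
	uint32_t	relid;
	DemoLxid	lxid;			/* local xact in which we last took the lock */
} DemoSeqEntry;

/* Counts simulated lock-manager calls (stands in for LockRelationOid()). */
static int	demo_lock_calls = 0;

static void
demo_lock_relation(uint32_t relid)
{
	(void) relid;
	demo_lock_calls++;
}

/* Take the lock at most once per local transaction. */
static void
demo_lock_and_open(DemoSeqEntry *seq, DemoLxid current_lxid)
{
	assert(current_lxid != DEMO_INVALID_LXID);

	if (seq->lxid != current_lxid)
	{
		demo_lock_relation(seq->relid);
		seq->lxid = current_lxid;	/* remember that this xact holds it */
	}
}
```

Repeated calls within one transaction therefore cost only an integer comparison. That is why the real code stamps `seq->lxid` rather than asking the lock manager each time.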

/*
 * Creates the hash table for storing sequence data
 */
static void
create_seq_hashtable(void)
{
	HASHCTL		ctl;

	memset(&ctl, 0, sizeof(ctl));
	ctl.keysize = sizeof(Oid);
	ctl.entrysize = sizeof(SeqTableData);

	seqhashtab = hash_create("Sequence values", 16, &ctl,
							 HASH_ELEM | HASH_BLOBS);
}

/*
 * Given a relation OID, open and lock the sequence.  p_elm and p_rel are
 * output parameters.
 */
static void
init_sequence(Oid relid, SeqTable *p_elm, Relation *p_rel)
{
	SeqTable	elm;
	Relation	seqrel;
	bool		found;

	/* Find or create a hash table entry for this sequence */
	if (seqhashtab == NULL)
		create_seq_hashtable();

	elm = (SeqTable) hash_search(seqhashtab, &relid, HASH_ENTER, &found);

	/*
	 * Initialize the new hash table entry if it did not exist already.
	 *
	 * NOTE: seqtable entries are stored for the life of a backend (unless
	 * explicitly discarded with DISCARD). If the sequence itself is deleted
	 * then the entry becomes wasted memory, but it's small enough that this
	 * should not matter.
	 */
	if (!found)
	{
		/* relid already filled in */
		elm->filenode = InvalidOid;
		elm->lxid = InvalidLocalTransactionId;
		elm->last_valid = false;
		elm->last = elm->cached = 0;
	}

	/*
	 * Open the sequence relation.
	 */
	seqrel = lock_and_open_sequence(elm);

	if (seqrel->rd_rel->relkind != RELKIND_SEQUENCE)
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("\"%s\" is not a sequence",
						RelationGetRelationName(seqrel))));

	/*
	 * If the sequence has been transactionally replaced since we last saw it,
	 * discard any cached-but-unissued values.  We do not touch the currval()
	 * state, however.
	 */
	if (seqrel->rd_rel->relfilenode != elm->filenode)
	{
		elm->filenode = seqrel->rd_rel->relfilenode;
		elm->cached = elm->last;
	}

	/* Return results */
	*p_elm = elm;
	*p_rel = seqrel;
}
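The relfilenode check at the end of init_sequence() can be modeled the same way. This is a hypothetical sketch: `DemoSeqCache` and `demo_check_filenode` are illustrative names, not PostgreSQL APIs. Suppose the sequence's storage file changes, for instance when TRUNCATE ... RESTART IDENTITY gives it a new relfilenode. The backend then drops its cached-but-unissued values by resetting `cached` to `last`, and leaves the currval() state (`last`, `last_valid`) untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cache fields of SeqTableData. */
typedef struct DemoSeqCache
{
	uint32_t	filenode;		/* relfilenode seen on the last access */
	int64_t		last;			/* value last returned by nextval (currval) */
	int64_t		cached;			/* last value already allocated to us */
	bool		last_valid;		/* is 'last' a valid currval() result? */
} DemoSeqCache;

/* Called on each access with the sequence's current relfilenode. */
static void
demo_check_filenode(DemoSeqCache *elm, uint32_t current_filenode)
{
	if (current_filenode != elm->filenode)
	{
		elm->filenode = current_filenode;
		elm->cached = elm->last;	/* discard cached-but-unissued values */
	}
}
```

Because a brand-new entry starts with filenode = InvalidOid in the real code, the first access always takes this branch. That branch is harmless then, since `last` and `cached` are both zero.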
|
|
|
|
|
|
|
|
|
Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.
More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence. (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.) The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record. However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false). Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record. A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.
To fix, get rid of the idea of logging an is_called status different from
reality. This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible. In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results. This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.
In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details. It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.
Back-patch to all supported branches.
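The failure mode this commit message describes can be reproduced in a small toy model. The sketch below is hypothetical and simplified. `ToySeq`, `LOG_AHEAD` and the `toy_*` functions are invented for illustration. It is not the real nextval() code path, and it ignores caching and checkpoint-forced logging. `live` stands for the in-database sequence tuple, and `wal` stands for whatever the most recent WAL record would restore on replay. The model writes a WAL record only when `log_cnt` has reached zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of sequence WAL batching; not PostgreSQL code. */
#define LOG_AHEAD 32            /* values pre-logged per WAL record (assumed) */

typedef struct
{
    long        last_value;
    bool        is_called;      /* has last_value already been handed out? */
    long        log_cnt;        /* values still covered by the last WAL record */
} SeqState;

typedef struct
{
    SeqState    live;           /* in-database (buffer) state */
    SeqState    wal;            /* state that crash replay would restore */
} ToySeq;

static void
toy_create(ToySeq *s, long start, bool buggy)
{
    SeqState    init = {start, false, 0};

    s->live = init;
    s->wal = init;
    if (buggy)
    {
        /* Old behavior: log as if the first nextval() had already happened */
        s->wal.is_called = true;
        s->live.log_cnt = 1;
    }
}

static void
toy_alter(ToySeq *s, bool buggy)
{
    if (!buggy)
        s->live.log_cnt = 0;    /* the fix: force the next nextval to log */
    s->wal = s->live;           /* ALTER logs the actual in-database state */
}

static long
toy_nextval(ToySeq *s)
{
    long        result = s->live.is_called ? s->live.last_value + 1
                                            : s->live.last_value;

    if (s->live.log_cnt == 0)
    {
        /* Log a state LOG_AHEAD values ahead so replay skips past them */
        SeqState    rec = {result + LOG_AHEAD, true, 0};

        s->wal = rec;
        s->live.log_cnt = LOG_AHEAD;
    }
    assert(s->live.log_cnt > 0);
    s->live.last_value = result;
    s->live.is_called = true;
    s->live.log_cnt--;
    return result;
}

static void
toy_crash_and_replay(ToySeq *s)
{
    s->live = s->wal;           /* anything not WAL-logged is lost */
}

/* Create at 1, optionally ALTER, hand out one value, crash, fetch again. */
static long
value_after_crash(bool buggy, bool do_alter)
{
    ToySeq      s;

    toy_create(&s, 1, buggy);
    if (do_alter)
        toy_alter(&s, buggy);
    (void) toy_nextval(&s);     /* hands out 1 */
    toy_crash_and_replay(&s);
    return toy_nextval(&s);
}
```

In this model, `value_after_crash(true, true)` hands out 1 again after replay, which is the symptom from the bug report. `value_after_crash(true, false)` returns 2. Both fixed variants return 34 because replay restores the pre-logged position. The gap after a crash is expected, since sequences guarantee distinct values, not gap-free ones.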
2012-07-25 23:40:36 +02:00
/*
 * Given an opened sequence relation, lock the page buffer and find the tuple
 *
 * *buf receives the reference to the pinned-and-ex-locked buffer
2016-12-20 18:00:00 +01:00
 * *seqdatatuple receives the reference to the sequence tuple proper
2012-07-25 23:40:36 +02:00
 *		(this arg should point to a local variable of type HeapTupleData)
 *
 * Function's return value points to the data payload of the tuple
 */
2016-12-20 18:00:00 +01:00
static Form_pg_sequence_data
read_seq_tuple(Relation rel, Buffer *buf, HeapTuple seqdatatuple)
1997-04-02 05:51:23 +02:00
{
2008-07-13 22:45:47 +02:00
	Page		page;
2002-05-22 23:40:55 +02:00
	ItemId		lp;
	sequence_magic *sm;
2016-12-20 18:00:00 +01:00
	Form_pg_sequence_data seq;
1997-09-07 07:04:48 +02:00

2002-05-22 23:40:55 +02:00
	*buf = ReadBuffer(rel, 0);
	LockBuffer(*buf, BUFFER_LOCK_EXCLUSIVE);

2016-04-20 15:31:19 +02:00
	page = BufferGetPage(*buf);
2002-05-22 23:40:55 +02:00
	sm = (sequence_magic *) PageGetSpecialPointer(page);

	if (sm->magic != SEQ_MAGIC)
2003-07-28 02:09:16 +02:00
		elog(ERROR, "bad magic number in sequence \"%s\": %08X",
			 RelationGetRelationName(rel), sm->magic);
2002-05-22 23:40:55 +02:00

	lp = PageGetItemId(page, FirstOffsetNumber);
2007-09-13 00:10:26 +02:00
	Assert(ItemIdIsNormal(lp));
2012-07-25 23:40:36 +02:00

2016-12-20 18:00:00 +01:00
	/* Note we currently only bother to set these two fields of *seqdatatuple */
	seqdatatuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
	seqdatatuple->t_len = ItemIdGetLength(lp);
2002-05-22 23:40:55 +02:00

2011-06-02 21:30:56 +02:00
	/*
2011-06-09 20:32:50 +02:00
	 * Previous releases of Postgres neglected to prevent SELECT FOR UPDATE on
	 * a sequence, which would leave a non-frozen XID in the sequence tuple's
	 * xmax, which eventually leads to clog access failures or worse. If we
	 * see this has happened, clean up after it. We treat this like a hint
	 * bit update, ie, don't bother to WAL-log it, since we can certainly do
	 * this again if the update gets lost.
2011-06-02 21:30:56 +02:00
	 */
2016-12-20 18:00:00 +01:00
	Assert(!(seqdatatuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI));
	if (HeapTupleHeaderGetRawXmax(seqdatatuple->t_data) != InvalidTransactionId)
2011-06-02 21:30:56 +02:00
	{
2016-12-20 18:00:00 +01:00
		HeapTupleHeaderSetXmax(seqdatatuple->t_data, InvalidTransactionId);
		seqdatatuple->t_data->t_infomask &= ~HEAP_XMAX_COMMITTED;
		seqdatatuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
2013-06-17 17:02:12 +02:00
		MarkBufferDirtyHint(*buf, true);
2011-06-02 21:30:56 +02:00
	}

2016-12-20 18:00:00 +01:00
	seq = (Form_pg_sequence_data) GETSTRUCT(seqdatatuple);
2002-05-22 23:40:55 +02:00

	return seq;
1997-04-02 05:51:23 +02:00
}
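The xmax cleanup at the end of read_seq_tuple() can be sketched outside the backend. This is a hypothetical standalone model. `ToyTupleHeader`, `ToyBuffer`, the `TOY_*` flag values and `toy_clear_stale_xmax()` are invented stand-ins for the tuple header, the HEAP_XMAX_* infomask bits and MarkBufferDirtyHint(), and they are not PostgreSQL definitions. Only the order of the steps follows the function above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real flags live in access/htup_details.h. */
#define TOY_INVALID_XID        0u
#define TOY_XMAX_COMMITTED     0x0001u
#define TOY_XMAX_INVALID       0x0002u
#define TOY_XMAX_IS_MULTI      0x0004u

typedef struct
{
    uint32_t    xmax;           /* stand-in for the raw xmax field */
    uint16_t    infomask;       /* stand-in for t_infomask */
} ToyTupleHeader;

typedef struct
{
    bool        dirty_hint;     /* stand-in for MarkBufferDirtyHint() */
} ToyBuffer;

/*
 * Clear a leftover xmax (e.g. from an old SELECT FOR UPDATE) in the same
 * order as read_seq_tuple().  The change is only marked dirty as a hint and
 * is not WAL-logged; if it is lost, the next read repeats it.
 * Returns true if anything was changed.
 */
static bool
toy_clear_stale_xmax(ToyTupleHeader *tup, ToyBuffer *buf)
{
    /* Mirrors the Assert: a sequence tuple never carries a MultiXact */
    assert(!(tup->infomask & TOY_XMAX_IS_MULTI));

    if (tup->xmax == TOY_INVALID_XID)
        return false;           /* nothing to clean up */

    tup->xmax = TOY_INVALID_XID;
    tup->infomask &= (uint16_t) ~TOY_XMAX_COMMITTED;
    tup->infomask |= TOY_XMAX_INVALID;
    buf->dirty_hint = true;
    return true;
}
```

Calling it a second time returns false and changes nothing. That idempotence is what lets the real code treat the cleanup like a hint-bit update. If the unlogged change is lost in a crash, the next read_seq_tuple() simply redoes it.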

2003-11-24 17:54:07 +01:00
/*
2017-05-02 16:41:48 +02:00
 * init_params: process the options list of CREATE or ALTER SEQUENCE, and
 * store the values into appropriate fields of seqform, for changes that go
2017-06-12 22:57:31 +02:00
 * into the pg_sequence catalog, and fields of seqdataform for changes to the
 * sequence relation itself. Set *need_seq_rewrite to true if we changed any
 * parameters that require rewriting the sequence's relation (interesting for
 * ALTER SEQUENCE). Also set *owned_by to any OWNED BY option, or to NIL if
 * there is none.
2003-11-24 17:54:07 +01:00
 *
 * If isInit is true, fill any unspecified options with default values;
 * otherwise, do not change existing options that aren't explicitly overridden.
2017-06-12 22:57:31 +02:00
 *
 * Note: we force a sequence rewrite whenever we change parameters that affect
 * generation of future sequence values, even if the seqdataform per se is not
 * changed. This allows ALTER SEQUENCE to behave transactionally. Currently,
 * the only option that doesn't cause that is OWNED BY. It's *necessary* for
 * ALTER SEQUENCE OWNED BY to not rewrite the sequence, because that would
 * break pg_upgrade by causing unwanted changes in the sequence's relfilenode.
2003-11-24 17:54:07 +01:00
 */
1997-09-07 07:04:48 +02:00
static void
2017-04-06 14:33:16 +02:00
init_params(ParseState *pstate, List *options, bool for_identity,
			bool isInit,
2016-12-20 18:00:00 +01:00
			Form_pg_sequence seqform,
2017-05-02 16:41:48 +02:00
			Form_pg_sequence_data seqdataform,
2017-06-12 22:57:31 +02:00
			bool *need_seq_rewrite,
2017-05-02 16:41:48 +02:00
			List **owned_by)
1997-04-02 05:51:23 +02:00
{
2017-02-10 21:12:32 +01:00
	DefElem    *as_type = NULL;
2008-05-17 03:20:39 +02:00
	DefElem    *start_value = NULL;
	DefElem    *restart_value = NULL;
1997-09-08 04:41:22 +02:00
	DefElem    *increment_by = NULL;
	DefElem    *max_value = NULL;
	DefElem    *min_value = NULL;
	DefElem    *cache_value = NULL;
2003-11-24 17:54:07 +01:00
	DefElem    *is_cycled = NULL;
2004-05-26 06:41:50 +02:00
	ListCell   *option;
2017-04-04 18:36:15 +02:00
	bool		reset_max_value = false;
	bool		reset_min_value = false;
1997-09-07 07:04:48 +02:00

2017-06-12 22:57:31 +02:00
	*need_seq_rewrite = false;
2006-08-21 02:57:26 +02:00
	*owned_by = NIL;

2003-03-20 08:02:11 +01:00
	foreach(option, options)
1997-09-07 07:04:48 +02:00
	{
1997-09-08 04:41:22 +02:00
		DefElem    *defel = (DefElem *) lfirst(option);
1997-09-07 07:04:48 +02:00

2017-02-10 21:12:32 +01:00
		if (strcmp(defel->defname, "as") == 0)
		{
			if (as_type)
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
			as_type = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2017-02-10 21:12:32 +01:00
		}
		else if (strcmp(defel->defname, "increment") == 0)
2003-02-13 06:25:24 +01:00
		{
			if (increment_by)
2003-07-20 23:56:35 +02:00
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
1997-09-07 07:04:48 +02:00
			increment_by = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2003-02-13 06:25:24 +01:00
		}
2008-05-17 01:36:05 +02:00
		else if (strcmp(defel->defname, "start") == 0)
		{
2008-05-17 03:20:39 +02:00
			if (start_value)
2008-05-17 01:36:05 +02:00
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
2008-05-17 03:20:39 +02:00
			start_value = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2008-05-17 01:36:05 +02:00
		}
		else if (strcmp(defel->defname, "restart") == 0)
2003-02-13 06:25:24 +01:00
		{
2008-05-17 03:20:39 +02:00
			if (restart_value)
2003-07-20 23:56:35 +02:00
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
2008-05-17 03:20:39 +02:00
			restart_value = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2003-02-13 06:25:24 +01:00
		}
2001-10-25 07:50:21 +02:00
		else if (strcmp(defel->defname, "maxvalue") == 0)
2003-02-13 06:25:24 +01:00
		{
			if (max_value)
2003-07-20 23:56:35 +02:00
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
1997-09-07 07:04:48 +02:00
			max_value = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2003-02-13 06:25:24 +01:00
		}
2001-10-25 07:50:21 +02:00
		else if (strcmp(defel->defname, "minvalue") == 0)
2003-02-13 06:25:24 +01:00
		{
			if (min_value)
2003-07-20 23:56:35 +02:00
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
1997-09-07 07:04:48 +02:00
			min_value = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2003-02-13 06:25:24 +01:00
		}
2001-10-25 07:50:21 +02:00
		else if (strcmp(defel->defname, "cache") == 0)
2003-02-13 06:25:24 +01:00
		{
			if (cache_value)
2003-07-20 23:56:35 +02:00
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
1997-09-07 07:04:48 +02:00
			cache_value = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2003-02-13 06:25:24 +01:00
		}
2001-10-25 07:50:21 +02:00
		else if (strcmp(defel->defname, "cycle") == 0)
2003-02-13 06:25:24 +01:00
		{
2003-11-24 17:54:07 +01:00
			if (is_cycled)
2003-07-20 23:56:35 +02:00
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
2003-11-24 17:54:07 +01:00
			is_cycled = defel;
2017-06-12 22:57:31 +02:00
			*need_seq_rewrite = true;
2003-02-13 06:25:24 +01:00
		}
2006-08-21 02:57:26 +02:00
		else if (strcmp(defel->defname, "owned_by") == 0)
		{
			if (*owned_by)
				ereport(ERROR,
						(errcode(ERRCODE_SYNTAX_ERROR),
2016-09-06 18:00:00 +02:00
						 errmsg("conflicting or redundant options"),
						 parser_errposition(pstate, defel->location)));
2006-08-21 02:57:26 +02:00
			*owned_by = defGetQualifiedName(defel);
		}
2017-04-06 14:33:16 +02:00
		else if (strcmp(defel->defname, "sequence_name") == 0)
		{
			/*
			 * The parser allows this, but it is only for identity columns, in
			 * which case it is filtered out in parse_utilcmd.c. We only get
			 * here if someone puts it into a CREATE SEQUENCE.
			 */
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("invalid sequence option SEQUENCE NAME"),
					 parser_errposition(pstate, defel->location)));
		}
1997-09-07 07:04:48 +02:00
		else
2003-07-20 23:56:35 +02:00
			elog(ERROR, "option \"%s\" not recognized",
1997-09-07 07:04:48 +02:00
				 defel->defname);
	}

2012-07-25 23:40:36 +02:00
	/*
2013-05-29 22:58:43 +02:00
	 * We must reset log_cnt when isInit or when changing any parameters that
	 * would affect future nextval allocations.
2012-07-25 23:40:36 +02:00
	 */
	if (isInit)
2016-12-20 18:00:00 +01:00
		seqdataform->log_cnt = 0;
2012-07-25 23:40:36 +02:00

2017-02-10 21:12:32 +01:00
	/* AS type */
	if (as_type != NULL)
	{
2017-05-17 22:31:56 +02:00
		Oid			newtypid = typenameTypeId(pstate, defGetTypeName(as_type));
2017-04-04 18:36:15 +02:00

		if (newtypid != INT2OID &&
			newtypid != INT4OID &&
			newtypid != INT8OID)
2017-02-10 21:12:32 +01:00
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
2017-04-06 14:33:16 +02:00
					 for_identity
					 ? errmsg("identity column type must be smallint, integer, or bigint")
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
					 : errmsg("sequence type must be smallint, integer, or bigint")));
2017-04-04 18:36:15 +02:00

		if (!isInit)
		{
			/*
			 * When changing type and the old sequence min/max values were the
			 * min/max of the old type, adjust sequence min/max values to
			 * min/max of new type. (Otherwise, the user chose explicit
			 * min/max values, which we'll leave alone.)
			 */
			if ((seqform->seqtypid == INT2OID && seqform->seqmax == PG_INT16_MAX) ||
				(seqform->seqtypid == INT4OID && seqform->seqmax == PG_INT32_MAX) ||
2017-06-21 21:35:54 +02:00
				(seqform->seqtypid == INT8OID && seqform->seqmax == PG_INT64_MAX))
2017-04-04 18:36:15 +02:00
				reset_max_value = true;
			if ((seqform->seqtypid == INT2OID && seqform->seqmin == PG_INT16_MIN) ||
				(seqform->seqtypid == INT4OID && seqform->seqmin == PG_INT32_MIN) ||
2017-06-21 21:35:54 +02:00
				(seqform->seqtypid == INT8OID && seqform->seqmin == PG_INT64_MIN))
2017-04-04 18:36:15 +02:00
				reset_min_value = true;
		}

		seqform->seqtypid = newtypid;
2017-02-10 21:12:32 +01:00
	}
	else if (isInit)
2017-05-02 16:41:48 +02:00
	{
2017-02-10 21:12:32 +01:00
		seqform->seqtypid = INT8OID;
2017-05-02 16:41:48 +02:00
	}
2017-02-10 21:12:32 +01:00

2003-03-20 08:02:11 +01:00
	/* INCREMENT BY */
2004-01-07 19:56:30 +01:00
	if (increment_by != NULL)
2003-03-20 08:02:11 +01:00
	{
2016-12-20 18:00:00 +01:00
		seqform->seqincrement = defGetInt64(increment_by);
		if (seqform->seqincrement == 0)
2003-07-20 23:56:35 +02:00
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
2003-09-25 08:58:07 +02:00
					 errmsg("INCREMENT must not be zero")));
2016-12-20 18:00:00 +01:00
		seqdataform->log_cnt = 0;
2003-03-20 08:02:11 +01:00
	}
2003-11-24 17:54:07 +01:00
	else if (isInit)
2017-05-02 16:41:48 +02:00
	{
2016-12-20 18:00:00 +01:00
		seqform->seqincrement = 1;
2017-05-02 16:41:48 +02:00
	}
2003-11-24 17:54:07 +01:00

	/* CYCLE */
2004-01-07 19:56:30 +01:00
	if (is_cycled != NULL)
2003-11-24 17:54:07 +01:00
	{
2016-12-20 18:00:00 +01:00
		seqform->seqcycle = intVal(is_cycled->arg);
		Assert(BoolIsValid(seqform->seqcycle));
		seqdataform->log_cnt = 0;
2003-11-24 17:54:07 +01:00
	}
	else if (isInit)
2017-05-02 16:41:48 +02:00
	{
2016-12-20 18:00:00 +01:00
		seqform->seqcycle = false;
2017-05-02 16:41:48 +02:00
	}
1997-09-07 07:04:48 +02:00

2003-11-24 17:54:07 +01:00
	/* MAXVALUE (null arg means NO MAXVALUE) */
2004-01-07 19:56:30 +01:00
	if (max_value != NULL && max_value->arg)
2012-07-25 23:40:36 +02:00
|
|
|
{
|
2016-12-20 18:00:00 +01:00
|
|
|
seqform->seqmax = defGetInt64(max_value);
|
|
|
|
seqdataform->log_cnt = 0;
|
2012-07-25 23:40:36 +02:00
|
|
|
}
|
2017-04-04 18:36:15 +02:00
|
|
|
else if (isInit || max_value != NULL || reset_max_value)
|
There's a patch attached to fix gcc 2.8.x warnings, except for the
yyerror ones from bison. It also includes a few 'enhancements' to
the C programming style (which are, of course, personal).
The other patch removes the compilation of backend/lib/qsort.c, as
qsort() is a standard function in stdlib.h and can be used any
where else (and it is). It was only used in
backend/optimizer/geqo/geqo_pool.c, backend/optimizer/path/predmig.c,
and backend/storage/page/bufpage.c
> > Some or all of these changes might not be appropriate for v6.3, since we
> > are in beta testing and since they do not affect the current functionality.
> > For those cases, how about submitting patches based on the final v6.3
> > release?
There's more to come. Please review these patches. I ran the
regression tests and they only failed where this was expected
(random, geo, etc).
Cheers,
Jeroen
1998-03-30 18:47:35 +02:00
|
|
|
{
|
2017-04-04 18:36:15 +02:00
|
|
|
if (seqform->seqincrement > 0 || reset_max_value)
|
2017-02-10 21:12:32 +01:00
|
|
|
{
|
|
|
|
/* ascending seq */
|
|
|
|
if (seqform->seqtypid == INT2OID)
|
|
|
|
seqform->seqmax = PG_INT16_MAX;
|
|
|
|
else if (seqform->seqtypid == INT4OID)
|
|
|
|
seqform->seqmax = PG_INT32_MAX;
|
|
|
|
else
|
|
|
|
seqform->seqmax = PG_INT64_MAX;
|
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
else
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
|
|
|
seqform->seqmax = -1; /* descending seq */
|
2016-12-20 18:00:00 +01:00
|
|
|
seqdataform->log_cnt = 0;
|
1998-03-30 18:47:35 +02:00
|
|
|
}
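When no MAXVALUE is given, the default depends on both the sequence's data type and its direction. The following standalone C sketch shows that rule. A hypothetical SeqType enum replaces the INT2OID/INT4OID/INT8OID checks. reset_max_value is treated as an opaque flag, because it is set elsewhere in this function, outside this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tags standing in for INT2OID, INT4OID and INT8OID. */
typedef enum { SEQ_INT2, SEQ_INT4, SEQ_INT8 } SeqType;

static int64_t
type_max(SeqType type)
{
    switch (type)
    {
        case SEQ_INT2:
            return INT16_MAX;
        case SEQ_INT4:
            return INT32_MAX;
        default:
            return INT64_MAX;
    }
}

/*
 * Default MAXVALUE when none is given: the type's maximum for ascending
 * sequences (or when reset_max_value is set), otherwise -1.
 */
static int64_t
default_seqmax(SeqType type, int64_t increment, bool reset_max_value)
{
    assert(increment != 0);     /* already rejected by the INCREMENT check */
    if (increment > 0 || reset_max_value)
        return type_max(type);
    return -1;                  /* descending sequence */
}
```

Note the asymmetry: a descending sequence defaults to a MAXVALUE of -1, not to the type's maximum, unless reset_max_value is set. The MINVALUE branch further down mirrors this, using the type's minimum for descending sequences and 1 for ascending ones.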
|
1997-04-02 05:51:23 +02:00
|
|
|
|
2017-02-10 21:12:32 +01:00
|
|
|
if ((seqform->seqtypid == INT2OID && (seqform->seqmax < PG_INT16_MIN || seqform->seqmax > PG_INT16_MAX))
|
|
|
|
|| (seqform->seqtypid == INT4OID && (seqform->seqmax < PG_INT32_MIN || seqform->seqmax > PG_INT32_MAX))
|
|
|
|
|| (seqform->seqtypid == INT8OID && (seqform->seqmax < PG_INT64_MIN || seqform->seqmax > PG_INT64_MAX)))
|
|
|
|
{
|
|
|
|
char bufx[100];
|
|
|
|
|
|
|
|
snprintf(bufx, sizeof(bufx), INT64_FORMAT, seqform->seqmax);
|
|
|
|
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
errmsg("MAXVALUE (%s) is out of range for sequence data type %s",
|
|
|
|
bufx, format_type_be(seqform->seqtypid))));
|
2017-02-10 21:12:32 +01:00
|
|
|
}
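The range check above rejects a MAXVALUE that does not fit the sequence's data type, and it formats the offending value with INT64_FORMAT. The following standalone sketch performs the same check, with two substitutions:

- C99's PRId64 from inttypes.h takes the place of PostgreSQL's INT64_FORMAT.
- A hypothetical SeqType enum takes the place of the type OIDs.

The names returned by seq_type_name are the ones format_type_be() reports for those types.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tags standing in for INT2OID, INT4OID and INT8OID. */
typedef enum { SEQ_INT2, SEQ_INT4, SEQ_INT8 } SeqType;

/* Type names as format_type_be() reports them for these types. */
static const char *
seq_type_name(SeqType type)
{
    switch (type)
    {
        case SEQ_INT2:
            return "smallint";
        case SEQ_INT4:
            return "integer";
        default:
            return "bigint";
    }
}

static bool
fits_seq_type(SeqType type, int64_t value)
{
    switch (type)
    {
        case SEQ_INT2:
            return value >= INT16_MIN && value <= INT16_MAX;
        case SEQ_INT4:
            return value >= INT32_MIN && value <= INT32_MAX;
        default:
            return true;        /* any int64_t fits in bigint */
    }
}

/*
 * Returns true if seqmax fits the type; otherwise writes an error message
 * modeled on the one above into buf and returns false.
 */
static bool
check_seqmax(SeqType type, int64_t seqmax, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    if (fits_seq_type(type, seqmax))
        return true;
    /* PRId64 plays the role of PostgreSQL's INT64_FORMAT here. */
    snprintf(buf, buflen,
             "MAXVALUE (%" PRId64 ") is out of range for sequence data type %s",
             seqmax, seq_type_name(type));
    return false;
}
```

seqmax is already an int64, so the bigint comparison in the original condition can never be true. The sketch therefore simply returns true for that case.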
|
|
|
|
|
2003-11-24 17:54:07 +01:00
|
|
|
/* MINVALUE (null arg means NO MINVALUE) */
|
2004-01-07 19:56:30 +01:00
|
|
|
if (min_value != NULL && min_value->arg)
|
2012-07-25 23:40:36 +02:00
|
|
|
{
|
2016-12-20 18:00:00 +01:00
|
|
|
seqform->seqmin = defGetInt64(min_value);
|
|
|
|
seqdataform->log_cnt = 0;
|
2012-07-25 23:40:36 +02:00
|
|
|
}
|
2017-04-04 18:36:15 +02:00
|
|
|
else if (isInit || min_value != NULL || reset_min_value)
|
1998-03-30 18:47:35 +02:00
|
|
|
{
|
2017-04-04 18:36:15 +02:00
|
|
|
if (seqform->seqincrement < 0 || reset_min_value)
|
2017-02-10 21:12:32 +01:00
|
|
|
{
|
|
|
|
/* descending seq */
|
|
|
|
if (seqform->seqtypid == INT2OID)
|
|
|
|
seqform->seqmin = PG_INT16_MIN;
|
|
|
|
else if (seqform->seqtypid == INT4OID)
|
|
|
|
seqform->seqmin = PG_INT32_MIN;
|
|
|
|
else
|
|
|
|
seqform->seqmin = PG_INT64_MIN;
|
|
|
|
}
|
2017-04-04 18:36:15 +02:00
|
|
|
else
|
2017-05-17 22:31:56 +02:00
|
|
|
seqform->seqmin = 1; /* ascending seq */
|
2016-12-20 18:00:00 +01:00
|
|
|
seqdataform->log_cnt = 0;
|
1998-03-30 18:47:35 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2017-02-10 21:12:32 +01:00
|
|
|
if ((seqform->seqtypid == INT2OID && (seqform->seqmin < PG_INT16_MIN || seqform->seqmin > PG_INT16_MAX))
|
|
|
|
|| (seqform->seqtypid == INT4OID && (seqform->seqmin < PG_INT32_MIN || seqform->seqmin > PG_INT32_MAX))
|
|
|
|
|| (seqform->seqtypid == INT8OID && (seqform->seqmin < PG_INT64_MIN || seqform->seqmin > PG_INT64_MAX)))
|
|
|
|
{
|
|
|
|
char bufm[100];
|
|
|
|
|
|
|
|
snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin);
|
|
|
|
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
2017-06-21 21:35:54 +02:00
|
|
|
errmsg("MINVALUE (%s) is out of range for sequence data type %s",
|
|
|
|
bufm, format_type_be(seqform->seqtypid))));
|
2017-02-10 21:12:32 +01:00
|
|
|
}
|
|
|
|
|
2003-11-24 17:54:07 +01:00
|
|
|
/* crosscheck min/max */
|
2016-12-20 18:00:00 +01:00
|
|
|
if (seqform->seqmin >= seqform->seqmax)
|
2002-09-03 20:50:54 +02:00
|
|
|
{
|
2002-09-04 22:31:48 +02:00
|
|
|
char bufm[100],
|
|
|
|
bufx[100];
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin);
|
|
|
|
snprintf(bufx, sizeof(bufx), INT64_FORMAT, seqform->seqmax);
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("MINVALUE (%s) must be less than MAXVALUE (%s)",
|
|
|
|
bufm, bufx)));
|
2002-09-03 20:50:54 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2008-05-17 03:20:39 +02:00
|
|
|
/* START WITH */
|
|
|
|
if (start_value != NULL)
|
2017-05-02 16:41:48 +02:00
|
|
|
{
|
2016-12-20 18:00:00 +01:00
|
|
|
seqform->seqstart = defGetInt64(start_value);
|
2017-05-02 16:41:48 +02:00
|
|
|
}
|
2003-11-24 17:54:07 +01:00
|
|
|
else if (isInit)
|
1998-03-30 18:47:35 +02:00
|
|
|
{
|
2016-12-20 18:00:00 +01:00
|
|
|
if (seqform->seqincrement > 0)
|
2017-06-21 21:18:54 +02:00
|
|
|
seqform->seqstart = seqform->seqmin; /* ascending seq */
|
1997-09-07 07:04:48 +02:00
|
|
|
else
|
2017-06-21 21:18:54 +02:00
|
|
|
seqform->seqstart = seqform->seqmax; /* descending seq */
|
1998-03-30 18:47:35 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2008-05-17 01:36:05 +02:00
|
|
|
/* crosscheck START */
|
2016-12-20 18:00:00 +01:00
|
|
|
if (seqform->seqstart < seqform->seqmin)
|
2008-05-17 01:36:05 +02:00
|
|
|
{
|
|
|
|
char bufs[100],
|
|
|
|
bufm[100];
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
snprintf(bufs, sizeof(bufs), INT64_FORMAT, seqform->seqstart);
|
|
|
|
snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin);
|
2008-05-17 01:36:05 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("START value (%s) cannot be less than MINVALUE (%s)",
|
|
|
|
bufs, bufm)));
|
|
|
|
}
|
2016-12-20 18:00:00 +01:00
|
|
|
if (seqform->seqstart > seqform->seqmax)
|
2008-05-17 01:36:05 +02:00
|
|
|
{
|
|
|
|
char bufs[100],
|
|
|
|
bufm[100];
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
snprintf(bufs, sizeof(bufs), INT64_FORMAT, seqform->seqstart);
|
|
|
|
snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmax);
|
2008-05-17 01:36:05 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
2017-06-21 21:35:54 +02:00
|
|
|
errmsg("START value (%s) cannot be greater than MAXVALUE (%s)",
|
|
|
|
bufs, bufm)));
|
2008-05-17 01:36:05 +02:00
|
|
|
}
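START WITH defaults to MINVALUE for ascending sequences and to MAXVALUE for descending ones. The result is then checked against both bounds. The following standalone sketch shows that logic. SeqParams, StartCheck and resolve_start are hypothetical, and the enum results stand in for the two error reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified subset of the sequence parameters. */
typedef struct
{
    int64_t increment;
    int64_t min;
    int64_t max;
    int64_t start;
} SeqParams;

typedef enum { START_OK, START_BELOW_MIN, START_ABOVE_MAX } StartCheck;

static StartCheck
resolve_start(SeqParams *p, bool have_start, int64_t start, bool is_init)
{
    assert(p->min < p->max);    /* guaranteed by the MIN/MAX crosscheck */

    if (have_start)
        p->start = start;
    else if (is_init)
        p->start = (p->increment > 0) ? p->min : p->max;

    if (p->start < p->min)
        return START_BELOW_MIN;     /* "cannot be less than MINVALUE" */
    if (p->start > p->max)
        return START_ABOVE_MAX;     /* "cannot be greater than MAXVALUE" */
    return START_OK;
}
```

The crosscheck also runs on ALTER. As a result, raising MINVALUE above an existing START value is rejected unless START is changed in the same command.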
|
|
|
|
|
2008-05-17 03:20:39 +02:00
|
|
|
/* RESTART [WITH] */
|
|
|
|
if (restart_value != NULL)
|
|
|
|
{
|
|
|
|
if (restart_value->arg != NULL)
|
2016-12-20 18:00:00 +01:00
|
|
|
seqdataform->last_value = defGetInt64(restart_value);
|
2008-05-17 03:20:39 +02:00
|
|
|
else
|
2016-12-20 18:00:00 +01:00
|
|
|
seqdataform->last_value = seqform->seqstart;
|
|
|
|
seqdataform->is_called = false;
|
|
|
|
seqdataform->log_cnt = 0;
|
2008-05-17 03:20:39 +02:00
|
|
|
}
|
|
|
|
else if (isInit)
|
|
|
|
{
|
2016-12-20 18:00:00 +01:00
|
|
|
seqdataform->last_value = seqform->seqstart;
|
|
|
|
seqdataform->is_called = false;
|
2008-05-17 03:20:39 +02:00
|
|
|
}
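RESTART sets last_value and clears is_called. As a result, the next nextval() returns last_value itself rather than last_value plus the increment. The sketch below shows that interaction with a simplified nextval(). SeqData, apply_restart and simple_nextval are hypothetical, and the sketch ignores bounds, CYCLE, caching and WAL logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the seqdataform fields touched here. */
typedef struct
{
    int64_t last_value;
    bool    is_called;
    int64_t log_cnt;
} SeqData;

/*
 * have_restart mirrors "restart_value != NULL"; restart_has_arg mirrors
 * "restart_value->arg != NULL" (RESTART WITH n versus a bare RESTART).
 */
static void
apply_restart(SeqData *d, bool have_restart, bool restart_has_arg,
              int64_t restart_arg, int64_t seqstart, bool is_init)
{
    if (have_restart)
    {
        d->last_value = restart_has_arg ? restart_arg : seqstart;
        d->is_called = false;
        d->log_cnt = 0;
    }
    else if (is_init)
    {
        d->last_value = seqstart;
        d->is_called = false;
    }
}

/* Simplified nextval(): ignores bounds, CYCLE, CACHE and WAL logging. */
static int64_t
simple_nextval(SeqData *d, int64_t increment)
{
    assert(increment != 0);
    if (d->is_called)
        d->last_value += increment;
    d->is_called = true;
    return d->last_value;
}
```

A bare RESTART falls back to seqstart, so it relies on the START handling above having already settled that value.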
|
|
|
|
|
|
|
|
/* crosscheck RESTART (or current value, if changing MIN/MAX) */
|
2016-12-20 18:00:00 +01:00
|
|
|
if (seqdataform->last_value < seqform->seqmin)
|
2002-09-03 20:50:54 +02:00
|
|
|
{
|
2002-09-04 22:31:48 +02:00
|
|
|
char bufs[100],
|
|
|
|
bufm[100];
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
snprintf(bufs, sizeof(bufs), INT64_FORMAT, seqdataform->last_value);
|
|
|
|
snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin);
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
2017-06-21 21:35:54 +02:00
|
|
|
errmsg("RESTART value (%s) cannot be less than MINVALUE (%s)",
|
|
|
|
bufs, bufm)));
|
2002-09-03 20:50:54 +02:00
|
|
|
}
|
2016-12-20 18:00:00 +01:00
|
|
|
if (seqdataform->last_value > seqform->seqmax)
|
2002-09-03 20:50:54 +02:00
|
|
|
{
|
2002-09-04 22:31:48 +02:00
|
|
|
char bufs[100],
|
|
|
|
bufm[100];
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
snprintf(bufs, sizeof(bufs), INT64_FORMAT, seqdataform->last_value);
|
|
|
|
snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmax);
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
2017-06-21 21:35:54 +02:00
|
|
|
errmsg("RESTART value (%s) cannot be greater than MAXVALUE (%s)",
|
|
|
|
bufs, bufm)));
|
2002-09-03 20:50:54 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2003-03-20 08:02:11 +01:00
|
|
|
/* CACHE */
|
2004-01-07 19:56:30 +01:00
|
|
|
if (cache_value != NULL)
|
2002-09-03 20:50:54 +02:00
|
|
|
{
|
2016-12-20 18:00:00 +01:00
|
|
|
seqform->seqcache = defGetInt64(cache_value);
|
|
|
|
if (seqform->seqcache <= 0)
|
2003-11-24 17:54:07 +01:00
|
|
|
{
|
|
|
|
char buf[100];
|
2002-09-04 22:31:48 +02:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
snprintf(buf, sizeof(buf), INT64_FORMAT, seqform->seqcache);
|
2003-11-24 17:54:07 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("CACHE (%s) must be greater than zero",
|
|
|
|
buf)));
|
|
|
|
}
|
2016-12-20 18:00:00 +01:00
|
|
|
seqdataform->log_cnt = 0;
|
2002-09-03 20:50:54 +02:00
|
|
|
}
|
2003-11-24 17:54:07 +01:00
|
|
|
else if (isInit)
|
2017-05-02 16:41:48 +02:00
|
|
|
{
|
2016-12-20 18:00:00 +01:00
|
|
|
seqform->seqcache = 1;
|
2017-05-02 16:41:48 +02:00
|
|
|
}
|
1997-04-02 05:51:23 +02:00
|
|
|
}
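The RESTART bounds check and the CACHE check above reduce to a few integer comparisons. The following is a minimal standalone sketch of that logic, not PostgreSQL code. `SeqCheck`, `check_restart()` and `check_cache()` are hypothetical names. Error reporting is replaced by return codes, and the `PRId64` macro from `<inttypes.h>` stands in for PostgreSQL's `INT64_FORMAT`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result codes, one per ereport(ERROR) branch above. */
typedef enum
{
	SEQ_OK,
	SEQ_RESTART_BELOW_MIN,
	SEQ_RESTART_ABOVE_MAX,
	SEQ_CACHE_NOT_POSITIVE
} SeqCheck;

/*
 * The restart value must lie in [seqmin, seqmax].  On failure the message
 * is formatted into msg, much as the ereport() calls above do.
 */
static SeqCheck
check_restart(int64_t last_value, int64_t seqmin, int64_t seqmax,
			  char *msg, size_t msglen)
{
	assert(msg != NULL && msglen > 0);
	msg[0] = '\0';
	if (last_value < seqmin)
	{
		snprintf(msg, msglen,
				 "RESTART value (%" PRId64 ") cannot be less than MINVALUE (%" PRId64 ")",
				 last_value, seqmin);
		return SEQ_RESTART_BELOW_MIN;
	}
	if (last_value > seqmax)
	{
		snprintf(msg, msglen,
				 "RESTART value (%" PRId64 ") cannot be greater than MAXVALUE (%" PRId64 ")",
				 last_value, seqmax);
		return SEQ_RESTART_ABOVE_MAX;
	}
	return SEQ_OK;
}

/*
 * An explicit CACHE must be > 0.  With no CACHE clause, a newly created
 * sequence defaults to 1 and an existing one keeps its current value.
 */
static SeqCheck
check_cache(bool cache_given, int64_t cache_value, bool is_init,
			int64_t *seqcache)
{
	if (cache_given)
	{
		if (cache_value <= 0)
			return SEQ_CACHE_NOT_POSITIVE;
		*seqcache = cache_value;
	}
	else if (is_init)
		*seqcache = 1;
	return SEQ_OK;
}
```

For example, `check_restart(0, 1, 100, msg, sizeof(msg))` returns `SEQ_RESTART_BELOW_MIN` and leaves `RESTART value (0) cannot be less than MINVALUE (1)` in `msg`. This matches the wording of the first error above.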
|
|
|
|
|
2006-08-21 02:57:26 +02:00
|
|
|
/*
|
|
|
|
* Process an OWNED BY option for CREATE/ALTER SEQUENCE
|
|
|
|
*
|
|
|
|
* Ownership permissions on the sequence are already checked,
|
|
|
|
* but if we are establishing a new owned-by dependency, we must
|
|
|
|
* enforce that the referenced table has the same owner and namespace
|
|
|
|
* as the sequence.
|
|
|
|
*/
|
|
|
|
static void
|
2017-04-06 14:33:16 +02:00
|
|
|
process_owned_by(Relation seqrel, List *owned_by, bool for_identity)
|
2006-08-21 02:57:26 +02:00
|
|
|
{
|
2017-04-06 14:33:16 +02:00
|
|
|
DependencyType deptype;
|
2006-08-21 02:57:26 +02:00
|
|
|
int nnames;
|
|
|
|
Relation tablerel;
|
|
|
|
AttrNumber attnum;
|
|
|
|
|
2017-04-06 14:33:16 +02:00
|
|
|
deptype = for_identity ? DEPENDENCY_INTERNAL : DEPENDENCY_AUTO;
|
|
|
|
|
2006-08-21 02:57:26 +02:00
|
|
|
nnames = list_length(owned_by);
|
|
|
|
Assert(nnames > 0);
|
|
|
|
if (nnames == 1)
|
|
|
|
{
|
|
|
|
/* Must be OWNED BY NONE */
|
|
|
|
if (strcmp(strVal(linitial(owned_by)), "none") != 0)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("invalid OWNED BY option"),
|
2017-06-21 21:35:54 +02:00
|
|
|
errhint("Specify OWNED BY table.column or OWNED BY NONE.")));
|
2006-08-21 02:57:26 +02:00
|
|
|
tablerel = NULL;
|
|
|
|
attnum = 0;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
List *relname;
|
|
|
|
char *attrname;
|
|
|
|
RangeVar *rel;
|
|
|
|
|
|
|
|
/* Separate relname and attr name */
|
|
|
|
relname = list_truncate(list_copy(owned_by), nnames - 1);
|
|
|
|
attrname = strVal(lfirst(list_tail(owned_by)));
|
|
|
|
|
|
|
|
/* Open and lock rel to ensure it won't go away meanwhile */
|
|
|
|
rel = makeRangeVarFromNameList(relname);
|
|
|
|
tablerel = relation_openrv(rel, AccessShareLock);
|
|
|
|
|
2013-05-16 01:03:29 +02:00
|
|
|
/* Must be a table, foreign table, view, or partitioned table */
|
|
|
|
if (!(tablerel->rd_rel->relkind == RELKIND_RELATION ||
|
Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
2016-12-07 19:17:43 +01:00
|
|
|
tablerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
|
2017-04-06 14:33:16 +02:00
|
|
|
tablerel->rd_rel->relkind == RELKIND_VIEW ||
|
2016-12-07 19:17:43 +01:00
|
|
|
tablerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
|
2006-08-21 02:57:26 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
2013-05-16 01:03:29 +02:00
|
|
|
errmsg("referenced relation \"%s\" is not a table or foreign table",
|
2006-08-21 02:57:26 +02:00
|
|
|
RelationGetRelationName(tablerel))));
|
|
|
|
|
|
|
|
/* We insist on same owner and schema */
|
|
|
|
if (seqrel->rd_rel->relowner != tablerel->rd_rel->relowner)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
2007-11-15 22:14:46 +01:00
|
|
|
errmsg("sequence must have same owner as table it is linked to")));
|
2006-08-21 02:57:26 +02:00
|
|
|
if (RelationGetNamespace(seqrel) != RelationGetNamespace(tablerel))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
2006-10-06 19:14:01 +02:00
|
|
|
errmsg("sequence must be in same schema as table it is linked to")));
|
2006-08-21 02:57:26 +02:00
|
|
|
|
|
|
|
/* Now, fetch the attribute number from the system cache */
|
|
|
|
attnum = get_attnum(RelationGetRelid(tablerel), attrname);
|
|
|
|
if (attnum == InvalidAttrNumber)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_COLUMN),
|
|
|
|
errmsg("column \"%s\" of relation \"%s\" does not exist",
|
|
|
|
attrname, RelationGetRelationName(tablerel))));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2017-04-06 14:33:16 +02:00
|
|
|
* Catch user explicitly running OWNED BY on identity sequence.
|
|
|
|
*/
|
|
|
|
if (deptype == DEPENDENCY_AUTO)
|
|
|
|
{
|
|
|
|
Oid tableId;
|
|
|
|
int32 colId;
|
|
|
|
|
|
|
|
if (sequenceIsOwned(RelationGetRelid(seqrel), DEPENDENCY_INTERNAL, &tableId, &colId))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("cannot change ownership of identity sequence"),
|
|
|
|
errdetail("Sequence \"%s\" is linked to table \"%s\".",
|
|
|
|
RelationGetRelationName(seqrel),
|
|
|
|
get_rel_name(tableId))));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* OK, we are ready to update pg_depend. First remove any existing
|
2006-10-04 02:30:14 +02:00
|
|
|
* dependencies for the sequence, then optionally add a new one.
|
2006-08-21 02:57:26 +02:00
|
|
|
*/
|
2017-04-06 14:33:16 +02:00
|
|
|
deleteDependencyRecordsForClass(RelationRelationId, RelationGetRelid(seqrel),
|
|
|
|
RelationRelationId, deptype);
|
2006-08-21 02:57:26 +02:00
|
|
|
|
|
|
|
if (tablerel)
|
|
|
|
{
|
|
|
|
ObjectAddress refobject,
|
|
|
|
depobject;
|
|
|
|
|
|
|
|
refobject.classId = RelationRelationId;
|
|
|
|
refobject.objectId = RelationGetRelid(tablerel);
|
|
|
|
refobject.objectSubId = attnum;
|
|
|
|
depobject.classId = RelationRelationId;
|
|
|
|
depobject.objectId = RelationGetRelid(seqrel);
|
|
|
|
depobject.objectSubId = 0;
|
2017-04-06 14:33:16 +02:00
|
|
|
recordDependencyOn(&depobject, &refobject, deptype);
|
2006-08-21 02:57:26 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Done, but hold lock until commit */
|
|
|
|
if (tablerel)
|
|
|
|
relation_close(tablerel, NoLock);
|
|
|
|
}
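The name handling in process_owned_by() can be illustrated without the catalog machinery. This sketch assumes the name list has already been parsed into plain C strings, with unquoted identifiers case-folded so that `NONE` arrives as `"none"`. `OwnedBySpec`, `parse_owned_by()`, `owned_by_deptype()` and the `DepType` values are hypothetical stand-ins. The real function also opens the table, checks its kind, owner and schema, and updates pg_depend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for DEPENDENCY_AUTO and DEPENDENCY_INTERNAL. */
typedef enum
{
	DEP_AUTO,
	DEP_INTERNAL
} DepType;

typedef struct
{
	bool		is_none;		/* OWNED BY NONE */
	size_t		nrelparts;		/* leading name parts that name the table */
	const char *attrname;		/* last name part: the column */
} OwnedBySpec;

/*
 * Split an OWNED BY name list as process_owned_by() does.  A single element
 * must be "none".  Otherwise every element but the last names the (possibly
 * schema-qualified) table and the last one names the column.  Returns false
 * for the "invalid OWNED BY option" case.
 */
static bool
parse_owned_by(const char *names[], size_t nnames, OwnedBySpec *out)
{
	assert(nnames > 0);
	if (nnames == 1)
	{
		if (strcmp(names[0], "none") != 0)
			return false;
		out->is_none = true;
		out->nrelparts = 0;
		out->attrname = NULL;
		return true;
	}
	out->is_none = false;
	out->nrelparts = nnames - 1;
	out->attrname = names[nnames - 1];
	return true;
}

/* Identity sequences get an internal dependency; others get an auto one. */
static DepType
owned_by_deptype(bool for_identity)
{
	return for_identity ? DEP_INTERNAL : DEP_AUTO;
}
```

Keeping the column as the last list element lets the table name have any number of qualifying parts.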
|
|
|
|
|
2000-11-30 02:47:33 +01:00
|
|
|
|
2017-04-06 14:33:16 +02:00
|
|
|
/*
|
|
|
|
* Return sequence parameters in a list of the form created by the parser.
|
|
|
|
*/
|
|
|
|
List *
|
|
|
|
sequence_options(Oid relid)
|
|
|
|
{
|
|
|
|
HeapTuple pgstuple;
|
|
|
|
Form_pg_sequence pgsform;
|
|
|
|
List *options = NIL;
|
|
|
|
|
|
|
|
pgstuple = SearchSysCache1(SEQRELID, relid);
|
|
|
|
if (!HeapTupleIsValid(pgstuple))
|
|
|
|
elog(ERROR, "cache lookup failed for sequence %u", relid);
|
|
|
|
pgsform = (Form_pg_sequence) GETSTRUCT(pgstuple);
|
|
|
|
|
|
|
|
options = lappend(options, makeDefElem("cache", (Node *) makeInteger(pgsform->seqcache), -1));
|
|
|
|
options = lappend(options, makeDefElem("cycle", (Node *) makeInteger(pgsform->seqcycle), -1));
|
|
|
|
options = lappend(options, makeDefElem("increment", (Node *) makeInteger(pgsform->seqincrement), -1));
|
|
|
|
options = lappend(options, makeDefElem("maxvalue", (Node *) makeInteger(pgsform->seqmax), -1));
|
|
|
|
options = lappend(options, makeDefElem("minvalue", (Node *) makeInteger(pgsform->seqmin), -1));
|
|
|
|
options = lappend(options, makeDefElem("start", (Node *) makeInteger(pgsform->seqstart), -1));
|
|
|
|
|
|
|
|
ReleaseSysCache(pgstuple);
|
|
|
|
|
|
|
|
return options;
|
|
|
|
}
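sequence_options() always emits the same six options in a fixed order and carries the boolean `seqcycle` as an integer. The following is a minimal sketch of that shape. It uses a hypothetical `SeqParams` struct and plain name/value pairs (`SeqOption`) in place of pg_sequence tuples and `DefElem` nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the pg_sequence columns read by sequence_options(). */
typedef struct
{
	int64_t		seqstart;
	int64_t		seqincrement;
	int64_t		seqmax;
	int64_t		seqmin;
	int64_t		seqcache;
	bool		seqcycle;
} SeqParams;

/* Hypothetical name/value pair standing in for a DefElem with an Integer arg. */
typedef struct
{
	const char *name;
	int64_t		value;
} SeqOption;

#define SEQ_NOPTIONS 6

/* Emit the options in the same order as sequence_options() above. */
static size_t
build_sequence_options(const SeqParams *p, SeqOption out[SEQ_NOPTIONS])
{
	size_t		n = 0;

	out[n++] = (SeqOption) {"cache", p->seqcache};
	out[n++] = (SeqOption) {"cycle", p->seqcycle ? 1 : 0};
	out[n++] = (SeqOption) {"increment", p->seqincrement};
	out[n++] = (SeqOption) {"maxvalue", p->seqmax};
	out[n++] = (SeqOption) {"minvalue", p->seqmin};
	out[n++] = (SeqOption) {"start", p->seqstart};
	assert(n == SEQ_NOPTIONS);
	return n;
}
```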
|
|
|
|
|
2011-01-02 14:08:08 +01:00
|
|
|
/*
|
2016-12-20 18:00:00 +01:00
|
|
|
* Return sequence parameters (formerly for use by information schema)
|
2011-01-02 14:08:08 +01:00
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
pg_sequence_parameters(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
Oid relid = PG_GETARG_OID(0);
|
|
|
|
TupleDesc tupdesc;
|
2017-02-10 21:12:32 +01:00
|
|
|
Datum values[7];
|
|
|
|
bool isnull[7];
|
2016-12-20 18:00:00 +01:00
|
|
|
HeapTuple pgstuple;
|
|
|
|
Form_pg_sequence pgsform;
|
2011-01-02 14:08:08 +01:00
|
|
|
|
|
|
|
if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_UPDATE | ACL_USAGE) != ACLCHECK_OK)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("permission denied for sequence %s",
|
2016-12-20 18:00:00 +01:00
|
|
|
get_rel_name(relid))));
|
2011-01-02 14:08:08 +01:00
|
|
|
|
2017-02-10 21:12:32 +01:00
|
|
|
tupdesc = CreateTemplateTupleDesc(7, false);
|
2011-03-26 23:28:40 +01:00
|
|
|
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "start_value",
|
|
|
|
INT8OID, -1, 0);
|
|
|
|
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "minimum_value",
|
|
|
|
INT8OID, -1, 0);
|
|
|
|
TupleDescInitEntry(tupdesc, (AttrNumber) 3, "maximum_value",
|
|
|
|
INT8OID, -1, 0);
|
|
|
|
TupleDescInitEntry(tupdesc, (AttrNumber) 4, "increment",
|
|
|
|
INT8OID, -1, 0);
|
|
|
|
TupleDescInitEntry(tupdesc, (AttrNumber) 5, "cycle_option",
|
|
|
|
BOOLOID, -1, 0);
|
2016-11-18 18:00:00 +01:00
|
|
|
TupleDescInitEntry(tupdesc, (AttrNumber) 6, "cache_size",
|
|
|
|
INT8OID, -1, 0);
|
2017-02-10 21:12:32 +01:00
|
|
|
TupleDescInitEntry(tupdesc, (AttrNumber) 7, "data_type",
|
|
|
|
OIDOID, -1, 0);
|
2011-01-02 14:08:08 +01:00
|
|
|
|
|
|
|
BlessTupleDesc(tupdesc);
|
|
|
|
|
|
|
|
memset(isnull, 0, sizeof(isnull));
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
pgstuple = SearchSysCache1(SEQRELID, relid);
|
|
|
|
if (!HeapTupleIsValid(pgstuple))
|
|
|
|
elog(ERROR, "cache lookup failed for sequence %u", relid);
|
|
|
|
pgsform = (Form_pg_sequence) GETSTRUCT(pgstuple);
|
2011-01-02 14:08:08 +01:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
values[0] = Int64GetDatum(pgsform->seqstart);
|
|
|
|
values[1] = Int64GetDatum(pgsform->seqmin);
|
|
|
|
values[2] = Int64GetDatum(pgsform->seqmax);
|
|
|
|
values[3] = Int64GetDatum(pgsform->seqincrement);
|
|
|
|
values[4] = BoolGetDatum(pgsform->seqcycle);
|
|
|
|
values[5] = Int64GetDatum(pgsform->seqcache);
|
2017-02-10 21:12:32 +01:00
|
|
|
values[6] = ObjectIdGetDatum(pgsform->seqtypid);
|
2011-01-02 14:08:08 +01:00
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
ReleaseSysCache(pgstuple);
|
2011-01-02 14:08:08 +01:00
|
|
|
|
|
|
|
return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, isnull));
|
|
|
|
}
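pg_sequence_parameters() lets a role holding any one of SELECT, UPDATE or USAGE read the parameters. This works because pg_class_aclcheck() with a multi-bit mask succeeds as soon as one requested privilege is held. pg_sequence_last_value(), below, asks only for SELECT or USAGE. The sketch uses illustrative bit values and hypothetical helper names. It is not PostgreSQL's ACL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative privilege bits, not PostgreSQL's ACL_* values. */
typedef uint32_t AclBits;
#define PRIV_SELECT ((AclBits) 1 << 0)
#define PRIV_UPDATE ((AclBits) 1 << 1)
#define PRIV_USAGE  ((AclBits) 1 << 2)

/* "Any of" semantics: at least one requested privilege must be held. */
static bool
has_any_privilege(AclBits held, AclBits requested)
{
	return (held & requested) != 0;
}

/* Mirrors the mask passed by pg_sequence_parameters(). */
static bool
can_read_parameters(AclBits held)
{
	return has_any_privilege(held, PRIV_SELECT | PRIV_UPDATE | PRIV_USAGE);
}

/* Mirrors the mask passed by pg_sequence_last_value(). */
static bool
can_read_last_value(AclBits held)
{
	return has_any_privilege(held, PRIV_SELECT | PRIV_USAGE);
}
```

A role with only UPDATE can therefore read the parameters but not the last value.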
|
|
|
|
|
2016-11-18 18:00:00 +01:00
|
|
|
/*
|
|
|
|
* Return the last value from the sequence
|
|
|
|
*
|
|
|
|
* Note: This has a completely different meaning than lastval().
|
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
pg_sequence_last_value(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
Oid relid = PG_GETARG_OID(0);
|
|
|
|
SeqTable elm;
|
|
|
|
Relation seqrel;
|
|
|
|
Buffer buf;
|
|
|
|
HeapTupleData seqtuple;
|
2016-12-20 18:00:00 +01:00
|
|
|
Form_pg_sequence_data seq;
|
2016-11-18 18:00:00 +01:00
|
|
|
bool is_called;
|
|
|
|
int64 result;
|
|
|
|
|
Fix ALTER SEQUENCE locking
In 1753b1b027035029c2a2a1649065762fafbf63f3, the pg_sequence system
catalog was introduced. This made sequence metadata changes
transactional, while the actual sequence values are still behaving
nontransactionally. This requires some refinement in how ALTER
SEQUENCE, which operates on both, locks the sequence and the catalog.
The main problems were:
- Concurrent ALTER SEQUENCE causes "tuple concurrently updated" error,
caused by updates to pg_sequence catalog.
- Sequence WAL writes and catalog updates are not protected by same
lock, which could lead to inconsistent recovery order.
- nextval() disregarding uncommitted ALTER SEQUENCE changes.
To fix, nextval() and friends now lock the sequence using
RowExclusiveLock instead of AccessShareLock. ALTER SEQUENCE locks the
sequence using ShareRowExclusiveLock. This means that nextval() and
ALTER SEQUENCE block each other, and ALTER SEQUENCE on the same sequence
blocks itself. (This was already the case previously for the OWNER TO,
RENAME, and SET SCHEMA variants.) Also, rearrange some code so that the
entire AlterSequence is protected by the lock on the sequence.
As an exception, use reduced locking for ALTER SEQUENCE ... RESTART.
Since that is basically a setval(), it does not require the full locking
of other ALTER SEQUENCE actions. So check whether we are only running a
RESTART and run with less locking if so.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Jason Petersen <jason@citusdata.com>
Reported-by: Andres Freund <andres@anarazel.de>
2017-05-10 05:35:31 +02:00
|
|
|
/* open and lock sequence */
|
2016-11-18 18:00:00 +01:00
|
|
|
init_sequence(relid, &elm, &seqrel);
|
|
|
|
|
|
|
|
if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_USAGE) != ACLCHECK_OK)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("permission denied for sequence %s",
|
|
|
|
RelationGetRelationName(seqrel))));
|
|
|
|
|
2016-12-20 18:00:00 +01:00
|
|
|
seq = read_seq_tuple(seqrel, &buf, &seqtuple);
|
2016-11-18 18:00:00 +01:00
|
|
|
|
|
|
|
is_called = seq->is_called;
|
|
|
|
result = seq->last_value;
|
|
|
|
|
|
|
|
UnlockReleaseBuffer(buf);
|
|
|
|
relation_close(seqrel, NoLock);
|
|
|
|
|
|
|
|
if (is_called)
|
|
|
|
PG_RETURN_INT64(result);
|
|
|
|
else
|
|
|
|
PG_RETURN_NULL();
|
|
|
|
}
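pg_sequence_last_value() returns NULL until the sequence's `is_called` flag is set. This is the case for a freshly created sequence or after `setval(..., false)`. The function reads the stored sequence tuple, whereas lastval() reports the value most recently returned by nextval() in the current session. The following is a minimal sketch of the NULL-versus-value decision. It uses a hypothetical `SeqData` struct and a `bool` return in place of SQL NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two pg_sequence_data fields read above. */
typedef struct
{
	int64_t		last_value;
	bool		is_called;
} SeqData;

/*
 * Returns true and stores last_value if the sequence has been advanced
 * (is_called).  Returns false, leaving *result untouched, where
 * pg_sequence_last_value() would return NULL.
 */
static bool
sequence_last_value(const SeqData *seq, int64_t *result)
{
	if (!seq->is_called)
		return false;
	*result = seq->last_value;
	return true;
}
```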
|
|
|
|
|
2011-01-02 14:08:08 +01:00
|
|
|
|
2001-03-22 05:01:46 +01:00
|
|
|
void
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now dissects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-dissected format, instead of the plain
XLogRecord.
The new record format also makes the fixed-size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compensates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
seq_redo(XLogReaderState *record)
|
2000-11-30 02:47:33 +01:00
|
|
|
{
|
2014-11-20 16:56:26 +01:00
|
|
|
XLogRecPtr lsn = record->EndRecPtr;
|
|
|
|
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
|
2001-03-22 05:01:46 +01:00
|
|
|
Buffer buffer;
|
|
|
|
Page page;
|
Fix transient clobbering of shared buffers during WAL replay.
RestoreBkpBlocks was in the habit of zeroing and refilling the target
buffer, which was perfectly safe when the code was written, but is unsafe
during Hot Standby operation. The reason is that we have coding rules
that allow backends to continue accessing a tuple in a heap relation while
holding only a pin on its buffer. Such a backend could see transiently
zeroed data, if WAL replay had occasion to change other data on the page.
This has been shown to be the cause of bug #6425 from Duncan Rance (who
deserves kudos for developing a sufficiently-reproducible test case) as
well as Bridget Frey's re-report of bug #6200. It most likely explains the
original report as well, though we don't yet have confirmation of that.
To fix, change the code so that only bytes that are supposed to change will
change, even transiently. This actually saves cycles in RestoreBkpBlocks,
since it's not writing the same bytes twice.
Also fix seq_redo, which has the same disease, though it has to work a bit
harder to meet the requirement.
So far as I can tell, no other WAL replay routines have this type of bug.
In particular, the index-related replay routines, which would certainly be
broken if they had to meet the same standard, are not at risk because we
do not have coding rules that allow access to an index page when not
holding a buffer lock on it.
Back-patch to 9.0 where Hot Standby was added.
2012-02-05 21:49:17 +01:00
|
|
|
Page localpage;
|
2001-03-22 05:01:46 +01:00
|
|
|
char *item;
|
|
|
|
Size itemsz;
|
|
|
|
xl_seq_rec *xlrec = (xl_seq_rec *) XLogRecGetData(record);
|
2000-12-28 14:00:29 +01:00
|
|
|
sequence_magic *sm;
|
2000-11-30 02:47:33 +01:00
|
|
|
|
2000-12-28 14:00:29 +01:00
|
|
|
if (info != XLOG_SEQ_LOG)
|
Commit to match discussed elog() changes. Only update is that LOG is
now just below FATAL in server_min_messages. Added more text to
highlight ordering difference between it and client_min_messages.
---------------------------------------------------------------------------
REALLYFATAL => PANIC
STOP => PANIC
New INFO level that prints to client by default
New LOG level that prints to server log by default
Cause VACUUM information to print only to the client
NOTICE => INFO where purely information messages are sent
DEBUG => LOG for purely server status messages
DEBUG removed, kept as backward compatible
DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1 added
DebugLvl removed in favor of new DEBUG[1-5] symbols
New server_min_messages GUC parameter with values:
DEBUG[5-1], INFO, NOTICE, ERROR, LOG, FATAL, PANIC
New client_min_messages GUC parameter with values:
DEBUG[5-1], LOG, INFO, NOTICE, ERROR, FATAL, PANIC
Server startup now logged with LOG instead of DEBUG
Remove debug_level GUC parameter
elog() numbers now start at 10
Add test to print error message if older elog() values are passed to elog()
Bootstrap mode now has a -d that requires an argument, like postmaster
2002-03-02 22:39:36 +01:00
|
|
|
elog(PANIC, "seq_redo: unknown op code %u", info);
|
2000-11-30 02:47:33 +01:00
|
|
|
|
2014-11-20 16:56:26 +01:00
|
|
|
buffer = XLogInitBufferForRedo(record, 0);
|
2016-04-20 15:31:19 +02:00
|
|
|
page = (Page) BufferGetPage(buffer);
|
2000-11-30 02:47:33 +01:00
|
|
|
|
2012-02-05 21:49:17 +01:00
|
|
|
/*
|
2014-05-06 18:12:18 +02:00
|
|
|
* We always reinit the page. However, since this WAL record type is also
|
|
|
|
* used for updating sequences, it's possible that a hot-standby backend
|
|
|
|
* is examining the page concurrently; so we mustn't transiently trash the
|
|
|
|
* buffer. The solution is to build the correct new page contents in
|
|
|
|
* local workspace and then memcpy into the buffer. Then only bytes that
|
|
|
|
* are supposed to change will change, even transiently. We must palloc
|
|
|
|
* the local page for alignment reasons.
|
2012-02-05 21:49:17 +01:00
|
|
|
*/
|
|
|
|
localpage = (Page) palloc(BufferGetPageSize(buffer));
|
|
|
|
|
|
|
|
PageInit(localpage, BufferGetPageSize(buffer), sizeof(sequence_magic));
|
|
|
|
sm = (sequence_magic *) PageGetSpecialPointer(localpage);
	sm->magic = SEQ_MAGIC;

	item = (char *) xlrec + sizeof(xl_seq_rec);
	itemsz = XLogRecGetDataLen(record) - sizeof(xl_seq_rec);

	if (PageAddItem(localpage, (Item) item, itemsz,
					FirstOffsetNumber, false, false) == InvalidOffsetNumber)
		elog(PANIC, "seq_redo: failed to add item to page");

	PageSetLSN(localpage, lsn);

	memcpy(page, localpage, BufferGetPageSize(buffer));
	MarkBufferDirty(buffer);
	UnlockReleaseBuffer(buffer);

	pfree(localpage);
}

/*
 * Flush cached sequence information.
 */
void
ResetSequenceCaches(void)
{
	if (seqhashtab)
	{
		hash_destroy(seqhashtab);
		seqhashtab = NULL;
	}

	last_used_seq = NULL;
}

/*
 * Mask a Sequence page before performing consistency checks on it.
 */
void
seq_mask(char *page, BlockNumber blkno)
{
	mask_page_lsn(page);

	mask_unused_space(page);
}