Commit Graph

369 Commits

Author SHA1 Message Date
Robert Haas 8f6bb851bd Remove more volatile qualifiers.
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.
This continues work begun in df4077cda2
and 6ba4ecbf47.

Thomas Munro and Michael Paquier
2015-10-06 15:45:02 -04:00
Andres Freund 920218cbc0 Improve errhint() about replication slot naming restrictions.
The existing hint talked about "may only contain letters", but the
actual requirement is more strict: only lower case letters are allowed.

Reported-By: Rushabh Lathia
Author: Rushabh Lathia
Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com
Backpatch: 9.4-, where replication slots were added
2015-10-03 15:29:08 +02:00
Alvaro Herrera 17f5831c81 Fix "sesssion" typo
It was introduced alongside replication origins, by commit
5aa2350426, so backpatch to 9.5.

Pointed out by Fujii Masao
2015-09-28 19:13:42 -03:00
Andres Freund c314ead5be Add ability to reserve WAL upon slot creation via replication protocol.
Since 6fcd885 it is possible to immediately reserve WAL when creating a
slot via pg_create_physical_replication_slot(). Extend the replication
protocol to allow that as well.

Although, in contrast to the SQL interface, it is possible to update the
reserved location via the replication interface, it is still useful
being able to reserve upon creation there. Otherwise the logic in
ReplicationSlotReserveWal() has to be repeated in slot employing
clients.

Author: Michael Paquier
Discussion: CAB7nPqT0Wc1W5mdYGeJ_wbutbwNN+3qgrFR64avXaQCiJMGaYA@mail.gmail.com
2015-09-06 13:30:57 +02:00
Heikki Linnakangas c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Andres Freund e95126cf04 Don't use function definitions looking like old-style ones.
This fixes a bunch of somewhat pedantic warnings with new
compilers. Since by far the majority of other functions definitions use
the (void) style it just seems to be consistent to do so as well in the
remaining few places.
2015-08-15 17:25:00 +02:00
Andres Freund a4b059fdde Remove duplicated assignment in pg_create_physical_replication_slot.
Reported-By: Gurjeet Singh
2015-08-12 17:35:50 +02:00
Alvaro Herrera 4901b2f495 Don't include rel.h when relcache.h is sufficient
Trivial change to reduce exposure of rel.h.
2015-08-11 13:03:14 -03:00
Andres Freund 6fcd88511f Allow pg_create_physical_replication_slot() to reserve WAL.
When creating a physical slot it's often useful to immediately reserve
the current WAL position instead of only doing after the first feedback
message arrives. That e.g. allows slots to guarantee that all the WAL
for a base backup will be available afterwards.

Logical slots already have to reserve WAL during creation, so generalize
that logic into being usable for both physical and logical slots.

Catversion bump because of the new parameter.

Author: Gurjeet Singh
Reviewed-By: Andres Freund
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:34:31 +02:00
Andres Freund 093d0c83c1 Introduce macros determining if a replication slot is physical or logical.
These make the code a bit easier to read, and make it easier to add a
more explicit notion of a slot's type at some point in the future.

Author: Gurjeet Singh
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:32:48 +02:00
Andres Freund 3b425b7c02 Minor cleanups in slot related code.
Fix a bunch of typos, and remove two superflous includes.

Author: Gurjeet Singh
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
Backpatch: 9.4
2015-08-11 12:32:48 +02:00
Andres Freund 3f811c2d6f Add confirmed_flush column to pg_replication_slots.
There's no reason not to expose both restart_lsn and confirmed_flush
since they have rather distinct meanings. The former is the oldest WAL
still required and valid for both physical and logical slots, whereas
the latter is the location up to which a logical slot's consumer has
confirmed receiving data. Most of the time a slot will require older
WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn).

Author: Marko Tiikkaja, editorialized by me
Discussion: 559D110B.1020109@joh.to
2015-08-10 13:28:18 +02:00
Andres Freund 5c4b25acce Fix copy & paste mistake in pg_get_replication_slots().
XLogRecPtr was compared with InvalidTransactionId instead of
InvalidXLogRecPtr. As both are defined to the same value this doesn't
cause any actual problems, but it's still wrong.

Backpatch: 9.4-master, bug was introduced in 9.4
2015-08-10 13:28:18 +02:00
Andres Freund 18e8613564 Address points made in post-commit review of replication origins.
Amit reviewed the replication origins patch and made some good
points. Address them. This fixes typos in error messages, docs and
comments and adds a missing error check (although in a
should-never-happen scenario).

Discussion: CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com
Backpatch: 9.5, where replication origins were introduced.
2015-08-07 15:09:05 +02:00
Andres Freund a855118be3 Fix debug message output when connecting to a logical slot.
Previously the message erroneously printed the same LSN twice as the
assignment to the start_lsn variable was before the message. Correct
that.

Reported-By: Marko Tiikkaja
Author: Marko Tiikkaja
Backpatch: 9.5, where logical decoding was introduced
2015-08-05 13:26:01 +02:00
Andres Freund b2f6f749c7 Fix logical decoding bug leading to inefficient reopening of files.
When spilling transaction data to disk a simple typo caused the output
file to be closed and reopened for every serialized change. That happens
to not have a huge impact on linux, which is why it probably wasn't
noticed so far, but on windows that appears to trigger actual disk
writes after every change. Not fun.

The bug fortunately does not have any impact besides speed. A change
could end up being in the wrong segment (last instead of next), but
since we read all files to the end, that's just ugly, not really
problematic. It's not a problem to upgrade, since transaction spill
files do not persist across restarts.

Bug: #13484
Reported-By: Olivier Gosseaume
Discussion: 20150703090217.1190.63940@wrigleys.postgresql.org

Backpatch to 9.4, where logical decoding was added.
2015-07-07 13:12:46 +02:00
Peter Eisentraut 385522c7dc Fix typo 2015-06-10 21:30:17 -04:00
Tom Lane 32f628be74 Fix assorted inconsistencies in our calls of readlink().
Ensure that we null-terminate the result string (one place in pg_rewind).
Be paranoid about out-of-range results from readlink() (should not happen,
but there is no good reason for some call sites to be careful about it and
others not).  Consistently use the whole buffer, not sometimes one byte
less.  Ensure we emit an appropriate errcode() in all cases.  Spell the
error messages the same way.

The only serious bug here is the missing null-termination in pg_rewind,
which is new code, so no need for a back-patch.

Abhijit Menon-Sen and Tom Lane
2015-05-28 12:17:22 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Magnus Hagander 3b075e9d7b Fix typos in comments
Dmitriy Olshevskiy
2015-05-17 14:58:04 +02:00
Heikki Linnakangas ffd37740ee Add archive_mode='always' option.
In 'always' mode, the standby independently archives all files it receives
from the primary.

Original patch by Fujii Masao, docs and review by me.
2015-05-15 18:55:24 +03:00
Andrew Dunstan 72d422a522 Map basebackup tablespaces using a tablespace_map file
Windows can't reliably restore symbolic links from a tar format, so
instead during backup start we create a tablespace_map file, which is
used by the restoring postgres to create the correct links in pg_tblspc.
The backup protocol also now has an option to request this file to be
included in the backup stream, and this is used by pg_basebackup when
operating in tar mode.

This is done on all platforms, not just Windows.

This means that pg_basebackup will not not work in tar mode against 9.4
and older servers, as this protocol option isn't implemented there.

Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.
2015-05-12 09:29:10 -04:00
Heikki Linnakangas 179cdd0981 Add macros to check if a filename is a WAL segment or other such file.
We had many instances of the strlen + strspn combination to check for that.
This makes the code a bit easier to read.
2015-05-08 21:58:57 +03:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Peter Eisentraut ad8d6d064c Fix typos
Author: Erik Rijkers <er@xs4all.nl>
2015-05-04 20:40:19 -04:00
Andres Freund 2b22795b32 Copy editing of the replication origins patch.
Michael Paquier and myself.
2015-05-01 12:22:13 +02:00
Andres Freund e0f26fc765 Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX.
We can't rely on UINT16_MAX being present, which is why we introduced
PG_UINT16_MAX...

Buildfarm animal bowerbird via Andrew Gierth.
2015-04-30 00:19:36 +02:00
Andres Freund 5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consist out of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in a efficient and
   crash safe manner.
3) The ability to filter out changes performed on the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies. E.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
Andres Freund dfbaed4597 Use a fd opened for read/write when syncing slots during startup.
Some operating systems, including the reporter's windows, return EBADFD
or similar when fsync() is invoked on a O_RDONLY file descriptor.
Unfortunately RestoreSlotFromDisk() does exactly that; which causes
failures after restarts in at least some scenarios.

If you hit the bug the error message will be something like
ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor

Simply use O_RDWR instead of O_RDONLY when opening the relevant file
descriptor to fix the bug.  Unfortunately I have no way of verifying the
fix, but we've seen similar problems in the past.

This bug goes back to 9.4 where slots were introduced. Backpatch
accordingly.

Reported-By: Patrice Drolet
Bug: #13143:
Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org
2015-04-28 00:17:43 +02:00
Andres Freund 6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Andres Freund d811c037ce Add 'active_in' column to pg_replication_slots.
Right now it is visible whether a replication slot is active in any
session, but not in which.  Adding the active_in column, containing the
pid of the backend having acquired the slot, makes it much easier to
associate pg_replication_slots entries with the corresponding
pg_stat_replication/pg_stat_activity row.

This should have been done from the start, but I (Andres) dropped the
ball there somehow.

Author: Craig Ringer, revised by me Discussion:
CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-21 11:51:06 +02:00
Heikki Linnakangas e2999abcd1 Fix assertion failure in logical decoding.
Logical decoding set SnapshotData's regd_count field to avoid the
snapshot manager from prematurely freeing snapshots that are generated
by the decoding system. That was always an abuse of the field, as it was
never supposed to be used outside the snapshot manager. Commit 94028691
made snapshot manager's tracking of the snapshots smarter, and that scheme
fell apart. The snapshot manager got confused and hit the assertion, when
a snapshot that was marked with regd_count==1 was not found in the heap,
where the snapshot manager tracks registered the snapshots.

To fix, don't abuse the regd_count field like that. Logical decoding still
abuses the active_count field for similar purposes, but that's currently
harmless.

The assertion failure was first reported by Michael Paquier
2015-04-16 21:50:07 +03:00
Heikki Linnakangas 4f700bcd20 Reorganize our CRC source files again.
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.

Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-04-14 17:03:42 +03:00
Heikki Linnakangas a10589a512 Remove duplicated words in comments.
David Rowley
2015-04-12 10:46:17 +03:00
Fujii Masao 6e4bf4ecd3 Fix error handling of XLogReaderAllocate in case of OOM
Similarly to previous fix 9b8d478, commit 2c03216 has switched
XLogReaderAllocate() to use a set of palloc calls instead of malloc,
causing any callers of this function to fail with an error instead of
receiving a NULL pointer in case of out-of-memory error. Fix this by
using palloc_extended with MCXT_ALLOC_NO_OOM that will safely return
NULL in case of an OOM.

Michael Paquier, slightly modified by me.
2015-04-03 21:55:37 +09:00
Simon Riggs 7dae3cf68c Correct comment to use RS_EPHEMERAL 2015-04-02 07:45:19 -04:00
Heikki Linnakangas 1d0db8de04 Remove spurious semicolons.
Petr Jelinek
2015-03-31 15:12:27 +03:00
Tom Lane 785941cdc3 Tweak __attribute__-wrapping macros for better pgindent results.
This improves on commit bbfd7edae5 by
making two simple changes:

* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed().  This reduces
pgindent's tendency to misformat declarations involving them.

* attributes are now always attached to function declarations, not
definitions.  Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway.  (It does little good to put a
noreturn annotation where callers can't see it.)  In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.

I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
2015-03-26 14:03:25 -04:00
Tatsuo Ishii 656ea810e5 Make SyncRepWakeQueue to a static function
It is only used in src/backend/replication/syncrep.c.

Back-patch to all supported branches except 9.1 which declares the
function as static.
2015-03-26 10:34:08 +09:00
Andres Freund 4f1b890b13 Merge the various forms of transaction commit & abort records.
Since 465883b0a two versions of commit records have existed. A compact
version that was used when no cache invalidations, smgr unlinks and
similar were needed, and a full version that could deal with all
that. Additionally the full version was embedded into twophase commit
records.

That resulted in a measurable reduction in the size of the logged WAL in
some workloads. But more recently additions like logical decoding, which
e.g. needs information about the database something was executed on,
made it applicable in fewer situations. The static split generally made
it hard to expand the commit record, because concerns over the size made
it hard to add anything to the compact version.

Additionally it's not particularly pretty to have twophase.c insert
RM_XACT records.

Rejigger things so that the commit and abort records only have one form
each, including the twophase equivalents. The presence of the various
optional (in the sense of not being in every record) pieces is indicated
by a bits in the 'xinfo' flag.  That flag previously was not included in
compact commit records. To prevent an increase in size due to its
presence, it's only included if necessary; signalled by a bit in the
xl_info bits available for xact.c, similar to heapam.c's
XLOG_HEAP_OPMASK/XLOG_HEAP_INIT_PAGE.

Twophase commit/aborts are now the same as their normal
counterparts. The original transaction's xid is included in an optional
data field.

This means that commit records generally are smaller, except in the case
of a transaction with subtransactions, but no other special cases; the
increase there is four bytes, which seems acceptable given that the more
common case of not having subtransactions shrank.  The savings are
especially measurable for twophase commits, which previously always used
the full version; but will in practice only infrequently have required
that.

The motivation for this work are not the space savings and and
deduplication though; it's that it makes it easier to extend commit
records with additional information. That's just a few lines of code
now; without impacting the common case where that information is not
needed.

Discussion: 20150220152150.GD4149@awork2.anarazel.de,
    235610.92468.qm%40web29004.mail.ird.yahoo.com

Reviewed-By: Heikki Linnakangas, Simon Riggs
2015-03-15 17:37:07 +01:00
Tatsuo Ishii 364c006c1f Fix integer overflow in debug message of walreceiver
The message tries to tell the replication apply delay which fails if
the first WAL record is not applied yet. Fix is, instead of telling
overflowed minus numeric, showing "N/A" which indicates that the delay
data is not yet available. Problem reported by me and patch by
Fabrízio de Royes Mello.

Back patched to 9.4, 9.3 and 9.2 stable branches (9.1 and 9.0 do not
have the debug message).
2015-03-14 08:16:50 +09:00
Andres Freund bbfd7edae5 Add macros wrapping all usage of gcc's __attribute__.
Until now __attribute__() was defined to be empty for all compilers but
gcc. That's problematic because it prevents using it in other compilers;
which is necessary e.g. for atomics portability.  It's also just
generally dubious to do so in a header as widely included as c.h.

Instead add pg_attribute_format_arg, pg_attribute_printf,
pg_attribute_noreturn macros which are implemented in the compilers that
understand them. Also add pg_attribute_noreturn and pg_attribute_packed,
but don't provide fallbacks, since they can affect functionality.

This means that external code that, possibly unwittingly, relied on
__attribute__ defined to be empty on !gcc compilers may now run into
warnings or errors on those compilers. But there shouldn't be many
occurances of that and it's hard to work around...

Discussion: 54B58BA3.8040302@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.
2015-03-11 14:30:01 +01:00
Fujii Masao f8b031bca8 Fix an obsolete reference to SnapshotNow in comment.
Peter Geoghegan
2015-03-04 12:25:48 +09:00
Peter Eisentraut 23a78352c0 Error when creating names too long for tar format
The tar format (at least the version we are using), does not support
file names or symlink targets longer than 99 bytes.  Until now, the tar
creation code would silently truncate any names that are too long.  (Its
original application was pg_dump, where this never happens.)  This
creates problems when running base backups over the replication
protocol.

The most important problem is when a tablespace path is longer than 99
bytes, which will result in a truncated tablespace path being backed up.
Less importantly, the basebackup protocol also promises to back up any
other files it happens to find in the data directory, which would also
lead to file name truncation if someone put a file with a long name in
there.

Now both of these cases result in an error during the backup.

Add tests that fail when a too-long file name or symlink is attempted to
be backed up.

Reviewed-by: Robert Hass <robertmhaas@gmail.com>
2015-02-24 13:41:07 -05:00
Heikki Linnakangas dd58c6098f Fix typo in README.
Kyotaro Horiguchi
2015-02-24 14:33:26 +02:00
Tom Lane e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
Tom Lane 4f38a281a3 Fix missing PQclear() in libpqrcv_endstreaming().
This omission leaked one PGresult per WAL streaming cycle, which possibly
would never be enough to notice in the real world, but it's still a leak.

Per Coverity.  Back-patch to 9.3 where the error was introduced.
2015-02-11 19:20:49 -05:00
Heikki Linnakangas ff16b40f8c Report WAL flush, not insert, position in replication IDENTIFY_SYSTEM
When beginning streaming replication, the client usually issues the
IDENTIFY_SYSTEM command, which used to return the current WAL insert
position. That's not suitable for the intended purpose of that field,
however. pg_receivexlog uses it to start replication from the reported
point, but if it hasn't been flushed to disk yet, it will fail. Change
IDENTIFY_SYSTEM to report the flush position instead.

Backpatch to 9.1 and above. 9.0 doesn't report any WAL position.
2015-02-06 11:26:50 +02:00