Commit Graph

724 Commits

Author SHA1 Message Date
Magnus Hagander 10d62d1065 Properly use INT64_FORMAT in output
Per buildfarm animal prairiedog, suggestion solution from Tom.
2018-04-03 16:39:29 +02:00
Magnus Hagander a08dc71195 Fix for checksum validation patch
Reorder the check for non-BLCKSZ size reads to make sure we don't abort
sending the file in this case.

Missed in the previous commit.
2018-04-03 13:57:49 +02:00
Magnus Hagander 4eb77d50c2 Validate page level checksums in base backups
When base backups are run over the replication protocol (for example
using pg_basebackup), verify the checksums of all data blocks if
checksums are enabled. If checksum failures are encountered, log them
as warnings but don't abort the backup.

This becomes the default behaviour in pg_basebackup (provided checksums
are enabled on the server), so add a switch (-k) to disable the checks
if necessary.

Author: Michael Banck
Reviewed-By: Magnus Hagander, David Steele
Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de
2018-04-03 13:47:16 +02:00
Fujii Masao 9a895462d9 Enhance pg_stat_wal_receiver view to display host and port of sender server.
Previously there was no way in the standby side to find out the host and port
of the sender server that the walreceiver was currently connected to when
multiple hosts and ports were specified in primary_conninfo. For that purpose,
this patch adds sender_host and sender_port columns into pg_stat_wal_receiver
view. They report the host and port that the active replication connection
currently uses.

Bump catalog version.

Author: Haribabu Kommi
Reviewed-by: Michael Paquier and me

Discussion: https://postgr.es/m/CAJrrPGcV_aq8=cdqkFhVDJKEnDQ70yRTTdY9RODzMnXNrCz2Ow@mail.gmail.com
2018-03-31 07:51:22 +09:00
Bruce Momjian a2894cce54 C comment: fix typo, log -> lag
Reported-by: atorikoshi

Discussion: https://postgr.es/m/b61f2ab9-c0e0-d33d-ce3f-42a228025681@lab.ntt.co.jp

Author: atorikoshi
2018-03-28 18:23:47 -04:00
Fujii Masao 266b6acb31 Make pg_rewind skip files and directories that are removed during server start.
The target cluster that was rewound needs to perform recovery from
the checkpoint created at failover, which leads it to remove or recreate
some files and directories that may have been copied from the source
cluster. So pg_rewind can skip synchronizing such files and directories,
and which reduces the amount of data transferred during a rewind
without changing the usefulness of the operation.

Author: Michael Paquier
Reviewed-by: Anastasia Lubennikova, Stephen Frost and me

Discussion: https://postgr.es/m/20180205071022.GA17337@paquier.xyz
2018-03-29 04:56:52 +09:00
Tom Lane 442accc3fe Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident".  The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to.  The "ident" string, if
wanted, is supplied later.  This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.

The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there.  And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.

All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead.  This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other.  Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.

I also arranged for plancache contexts to use the query source string
as their identifier.  This is basically free for CachedPlanSources, as
they contained a copy of that string already.  We pay an extra pstrdup
to do it for CachedPlans.  That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first).  I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.

This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format.  This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.

The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1.  Seems
better for extension authors to endure just one round of API changes
not two.

Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:51 -04:00
Teodor Sigaev 920a5e500a Skip temp tables from basebackup.
Do not store temp tables in basebackup, they will not be visible anyway, so,
there are not reasons to store them.

Author: David Steel
Reviewed by: me
Discussion: https://www.postgresql.org/message-id/flat/5ea4d26a-a453-c1b7-eff9-5a3ef8f8aceb@pgmasters.net
2018-03-27 16:14:40 +03:00
Tom Lane d0c0c89453 Fix unsafe extraction of the OID part of a relation filename.
Commit 8694cc96b did this randomly differently from other callers of
parse_filename_for_nontemp_relation().  Perhaps unsurprisingly,
the randomly different way is wrong; it fails to ensure the
extracted string is null-terminated.  Per buildfarm member skink.

Discussion: https://postgr.es/m/14453.1522001792@sss.pgh.pa.us
2018-03-25 15:15:40 -04:00
Teodor Sigaev 8694cc96b5 Exclude unlogged tables from base backups
Exclude unlogged tables from base backup entirely except init fork which marks
created unlogged table. The next question is do not backup temp table but
it's a story for separate patch.

Author: David Steele
Review by: Adam Brightwell, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/04791bab-cb04-ba43-e9c0-664a4c1ffb2c@pgmasters.net
2018-03-23 19:14:12 +03:00
Tom Lane feb8254518 Improve style guideline compliance of assorted error-report messages.
Per the project style guide, details and hints should have leading
capitalization and end with a period.  On the other hand, errcontext should
not be capitalized and should not end with a period.  To support well
formatted error contexts in dblink, extend dblink_res_error() to take a
format+arguments rather than a hardcoded string.

Daniel Gustafsson

Discussion: https://postgr.es/m/B3C002C8-21A0-4F53-A06E-8CAB29FCF295@yesql.se
2018-03-22 17:33:10 -04:00
Peter Eisentraut 325f2ec555 Handle heap rewrites even better in logical decoding
Logical decoding should not publish anything about tables created as
part of a heap rewrite during DDL.  Those tables don't exist externally,
so consumers of logical decoding cannot do anything sensible with that
information.  In ab28feae2b, we worked
around this for built-in logical replication, but that was hack.

This is a more proper fix: We mark such transient heaps using the new
field pg_class.relwrite, linking to the original relation OID.  By
default, we ignore them in logical decoding before they get to the
output plugin.  Optionally, a plugin can register their interest in
getting such changes, if they handle DDL specially, in which case the
new field will help them get information about the actual table.

Reviewed-by: Craig Ringer <craig@2ndquadrant.com>
2018-03-21 09:15:04 -04:00
Peter Eisentraut 04700b685f Rename TransactionChain functions
We call this thing a "transaction block" everywhere except in a few
functions, where it is mysteriously called a "transaction chain".  In
the SQL standard, a transaction chain is something different.  So rename
these functions to match the common terminology.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-03-16 13:18:06 -04:00
Alvaro Herrera 24c0a6c649 logical replication: fix OID type mapping mechanism
The logical replication type map seems to have been misused by its only
caller -- it would try to use the remote OID as input for local type
routines, which unsurprisingly could result in bogus "cache lookup
failed for type XYZ" errors, or random other type names being picked up
if they happened to use the right OID.  Fix that, changing
Oid logicalrep_typmap_getid(Oid remoteid) to
char *logicalrep_typmap_gettypname(Oid remoteid)
which is more useful.  If the remote type is not part of the typmap,
this simply prints "unrecognized type" instead of choking trying to
figure out -- a pointless exercise (because the only input for that
comes from replication messages, which are not under the local node's
control) and dangerous to boot, when called from within an error context
callback.

Once that is done, it comes to light that the local OID in the typmap
entry was not being used for anything; the type/schema names are what we
need, so remove local type OID from that struct.

Once you do that, it becomes pointless to attach a callback to regular
syscache invalidation.  So remove that also.

Reported-by: Dang Minh Huong
Author: Masahiko Sawada
Reviewed-by: Álvaro Herrera, Petr Jelínek, Dang Minh Huong, Atsushi Torikoshi
Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A6BE964@BPXM05GP.gisp.nec.co.jp
Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A6C4B0A@BPXM05GP.gisp.nec.co.jp
2018-03-14 21:34:26 -03:00
Alvaro Herrera 8aa75e1384 Refrain from duplicating data in reorderbuffers
If a walsender exits leaving data in reorderbuffers, the next walsender
that tries to decode the same transaction would append its decoded data
in the same spill files without truncating it first, which effectively
duplicate the data.  Avoid that by removing any leftover reorderbuffer
spill files when a walsender starts.

Backpatch to 9.4; this bug has been there from the very beginning of
logical decoding.

Author: Craig Ringer, revised by me
Reviewed by: Álvaro Herrera, Petr Jelínek, Masahiko Sawada
2018-03-06 18:34:29 -03:00
Peter Eisentraut bc1adc651b Fix filtering of unsupported relations in logical replication
In the pgoutput plugin, skip changes for relations that are not
publishable, per is_publishable_class().  This concerns in particular
materialized views and information_schema tables.  While those relations
cannot be part of a publication, per existing checks, they will be
considered by a FOR ALL TABLES publication.  A subscription would not
actually apply changes for those relations, again per existing checks,
but trying to match incoming changes to local tables on the subscriber
would lead to errors if no matching local table exists.  Skipping those
changes on the publisher avoids sending useless changes and eliminates
the error.

Bug: #15044
Reported-by: Chad Trabant <chad@iris.washington.edu>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2018-02-23 22:13:21 -05:00
Alvaro Herrera 6f1d723b63 Fix crash in pg_replication_slot_advance
We were trying to use a LSN variable after releasing its containing slot
structure.

Reported by: tushar
Author: amul sul
Reviewed-by: Petr Jelinek, Masahiko Sawada
Discussion: https://postgr.es/m/94ba999c-f76a-0423-6523-b8d531dfe4c7@enterprisedb.com
2018-02-19 22:25:27 -03:00
Andres Freund ad7dbee368 Allow tupleslots to have a fixed tupledesc, use in executor nodes.
The reason for doing so is that it will allow expression evaluation to
optimize based on the underlying tupledesc. In particular it will
allow to JIT tuple deforming together with the expression itself.

For that expression initialization needs to be moved after the
relevant slots are initialized - mostly unproblematic, except in the
case of nodeWorktablescan.c.

After doing so there's no need for ExecAssignResultType() and
ExecAssignResultTypeFromTL() anymore, as all former callers have been
converted to create a slot with a fixed descriptor.

When creating a slot with a fixed descriptor, tts_values/isnull can be
allocated together with the main slot, reducing allocation overhead
and increasing cache density a bit.

Author: Andres Freund
Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
2018-02-16 21:17:38 -08:00
Tom Lane 49bff412ed Remove some inappropriate #includes.
Other header files should never #include postgres.h (nor postgres_fe.h,
nor c.h), per project policy.  Also, there's no need for any backend .c
file to explicitly include elog.h or palloc.h, because postgres.h pulls
those in already.

Extracted from a larger patch by Kyotaro Horiguchi.  The rest of the
removals he suggests require more study, but these are no-brainers.

Discussion: https://postgr.es/m/20180215.200447.209320006.horiguchi.kyotaro@lab.ntt.co.jp
2018-02-16 12:14:08 -05:00
Peter Eisentraut ebdb42a0d6 Fix typo
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2018-02-12 22:39:52 -05:00
Robert Haas b78d0160da Fix incorrect method name in comment.
Atsushi Torikoshi

Discussion: http://postgr.es/m/1b056262-4bc0-a982-c899-bb67a0a7fd52@lab.ntt.co.jp
2018-02-08 14:35:54 -05:00
Simon Riggs 4e54dd2e0a Fix typo in recent commit
Typo in 9c7d06d606

Reported-by: Masahiko Sawada
2018-01-19 06:36:17 +00:00
Simon Riggs 9c7d06d606 Ability to advance replication slots
Ability to advance both physical and logical replication slots using a
new user function pg_replication_slot_advance().

For logical advance that means records are consumed as fast as possible
and changes are not given to output plugin for sending. Makes 2nd phase
(after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation
faster, especially when there are big transactions as the reorder buffer
does not have to deal with data changes and does not have to spill to
disk.

Author: Petr Jelinek
Reviewed-by: Simon Riggs
2018-01-17 11:38:34 +00:00
Peter Eisentraut 9e945f8626 Fix Latin spelling
"c.f." should be "cf.".
2018-01-11 08:32:01 -05:00
Tom Lane 8a906204ae Fix race condition during replication origin drop.
replorigin_drop() misunderstood the API for condition variables: it
had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
inside its test-and-sleep loop, rather than outside the loop as
intended.  The net effect is a narrow race-condition window wherein,
if the process using a replication slot releases it immediately after
replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
would get into the condition variable's wait list too late and then
wait indefinitely for a signal that won't come.

Because there's a different CV for each replication slot, we can't
just move the ConditionVariablePrepareToSleep call to above the
test-and-sleep loop.  What we can do, in the wake of commit 13db3b936,
is drop the ConditionVariablePrepareToSleep call entirely.  This fix
depends on that commit because (at least in principle) the slot matching
the target replication origin might move around, so that once in a blue
moon successive loop iterations might involve different CVs.  We can now
cope with such a scenario, at the cost of an extra trip through the
retry loop.

(There are ways we could fix this bug without depending on that commit,
but they're all a lot more complicated than this way.)

While at it, upgrade the rather skimpy comments in this function.

Back-patch to v10 where this code came in.

Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us
2018-01-09 12:09:30 -05:00
Simon Riggs 6668a54eb8 Default monitoring roles - errata
25fff40798 introduced
default monitoring roles. Apply these corrections:

* Allow access to pg_stat_get_wal_senders()
  by role pg_read_all_stats

* Correct comment in pg_stat_get_wal_receiver()
  to show it is no longer superuser-only.

Author: Feike Steenbergen
Reviewed-by: Michael Paquier

Apply to HEAD, then later backpatch to 10
2018-01-06 11:48:21 +00:00
Alvaro Herrera df9f682c7b Fix failure to delete spill files of aborted transactions
Logical decoding's reorderbuffer.c may spill transaction files to disk
when transactions are large.  These are supposed to be removed when they
become "too old" by xid; but file removal requires the boundary LSNs of
the transaction to be known.  The final_lsn is only set when we see the
commit or abort record for the transaction, but nothing sets the value
for transactions that crash, so the removal code misbehaves -- in
assertion-enabled builds, it crashes by a failed assertion.

To fix, modify the final_lsn of transactions that don't have a value
set, to the LSN of the very latest change in the transaction.  This
causes the spilled files to be removed appropriately.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada
Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp
2018-01-05 12:17:10 -03:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Magnus Hagander d02974e32e Properly set base backup backends to active in pg_stat_activity
When walsenders were included in pg_stat_activity, only the ones
actually streaming WAL were listed as active when they were active. In
particular, the connections sending base backups were listed as being
idle. Which means that a regular pg_basebackup would show up with one
active and one idle connection, when both were active.

This patch updates to set all walsenders to active when they are
(including those doing very fast things like IDENTIFY_SYSTEM), and then
back to idle. Details about exactly what they are doing is available in
pg_stat_replication.

Patch by me, review by Michael Paquier and David Steele.
2017-12-29 16:28:32 +01:00
Simon Riggs 48c9f49265 Fix race condition when changing synchronous_standby_names
A momentary window exists when synchronous_standby_names
changes that allows commands issued after the change to
continue to act as async until the change becomes visible.
Remove the race by using more appropriate test in syncrep.c

Author: Asim Rama Praveen and Ashwin Agrawal
Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen
Reviewed-by: Michael Paquier, Masahiko Sawada
2017-12-29 14:30:33 +00:00
Fujii Masao 56a95ee511 Fix bug in cancellation of non-exclusive backup to avoid assertion failure.
Previously an assertion failure occurred when pg_stop_backup() for
non-exclusive backup was aborted while it's waiting for WAL files to
be archived. This assertion failure happened in do_pg_abort_backup()
which was called when a non-exclusive backup was canceled.
do_pg_abort_backup() assumes that there is at least one non-exclusive
backup running when it's called. But pg_stop_backup() can be canceled
even after it marks the end of non-exclusive backup (e.g.,
during waiting for WAL archiving). This broke the assumption that
do_pg_abort_backup() relies on, and which caused an assertion failure.

This commit changes do_pg_abort_backup() so that it does nothing
when non-exclusive backup has been already marked as completed.
That is, the asssumption is also changed, and do_pg_abort_backup()
now can handle even the case where it's called when there is
no running backup.

Backpatch to 9.6 where SQL-callable non-exclusive backup was added.

Author: Masahiko Sawada and Michael Paquier
Reviewed-By: Robert Haas and Fujii Masao
Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
2017-12-19 03:46:14 +09:00
Andrew Dunstan 0fedb4ea69 Fix walsender timeouts when decoding a large transaction
The logical slots have a fast code path for sending data so as not to
impose too high a per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that a transaction with a large number of changes may take a
very long time to be processed and sent to the client. This causes the
walsender to ignore interrupts for potentially a long time and more
importantly it will result in the walsender being killed due to
timeout at the end of such a transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when the last keepalive check happened less
than half the walsender timeout ago. Otherwise the slower code path will
be taken.

Backpatched to 9.4

Petr Jelinek, reviewed by  Kyotaro HORIGUCHI, Yura Sokolov,  Craig
Ringer and Robert Haas.

Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
2017-12-14 11:13:14 -05:00
Tom Lane 9fa6f00b13 Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts.  The key ideas are:

* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.

* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).

* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.

The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations.  That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.

Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant.  Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)

The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.

To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option.  AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.

An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.

Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module.  That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.

In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles.  The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.

Also, mark the memory context method-pointer structs "const",
just for cleanliness.

Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:16 -05:00
Tom Lane 066bc21c0e Simplify do_pg_start_backup's API by opening pg_tblspc internally.
do_pg_start_backup() expects its callers to pass in an open DIR pointer
for the pg_tblspc directory, but there's no apparent advantage in that.
It complicates the callers without adding any flexibility, and there's no
robustness advantage, since we surely have to be prepared for errors during
the scan of pg_tblspc anyway.  In fact, by holding an extra kernel resource
during operations like the preliminary checkpoint, we might be making
things a fraction more failure-prone not less.  Hence, remove that argument
and open the directory just for the duration of the actual scan.

Discussion: https://postgr.es/m/28752.1512413887@sss.pgh.pa.us
2017-12-04 18:37:54 -05:00
Tom Lane 2069e6faa0 Clean up assorted messiness around AllocateDir() usage.
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures.  It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode.  And it eliminates a lot of just plain redundant error-handling
code.

In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern

	dir = AllocateDir(path);
	while ((de = ReadDirExtended(dir, path, LOG)) != NULL)

if they'd like to treat directory-open failures as mere LOG conditions
rather than errors.  Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.

Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine.  Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above.  (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.)  In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.

Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported.  There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.

Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming.  (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire.  If so, this function was broken for its
own purposes as well as breaking callers.)

And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.

Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless.  Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes.  The rest of this is
essentially cosmetic and need not get back-patched.

Michael Paquier, with a bit of additional work by me

Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
2017-12-04 17:02:56 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Simon Riggs a4ccc1cef5 Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Author: Tomas Vondra
Reviewed-by: Simon Riggs
2017-11-23 05:45:07 +11:00
Simon Riggs 7e17a6889a Set es_output_cid in replication worker
Allows triggers to operate correctly

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
2017-11-22 16:28:14 +11:00
Peter Eisentraut 0e1539ba0d Add some const decorations to prototypes
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-11-10 13:38:57 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Simon Riggs 98267ee83e Exclude pg_internal.init from BASE_BACKUP
Add docs to explain this for other backup mechanisms

Author: David Steele <david@pgmasters.net>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndQuadrant.com> et al
2017-11-07 12:28:35 -05:00
Peter Eisentraut a9fce66729 Don't reset additional columns on subscriber to NULL on UPDATE
When a publisher table has fewer columns than a subscriber, the update
of a row on the publisher should result in updating of only the columns
in common.  The previous coding mistakenly reset the values of
additional columns on the subscriber to NULL because it failed to skip
updates of columns not found in the attribute map.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-11-03 12:27:59 -04:00
Peter Eisentraut 4211673622 Exclude flex-generated code from coverage testing
Flex generates a lot of functions that are not actually used.  In order
to avoid coverage figures being ruined by that, mark up the part of the
.l files where the generated code appears by lcov exclusion markers.
That way, lcov will typically only reported on coverage for the .l file,
which is under our control, but not for the .c file.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-10-16 16:28:11 -04:00
Andres Freund 31079a4a8e Replace remaining uses of pq_sendint with pq_sendint{8,16,32}.
pq_sendint() remains, so extension code doesn't unnecessarily break.

Author: Andres Freund
Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
2017-10-11 21:00:46 -07:00
Robert Haas c097b271e8 Fix more user-visible elog() calls.
Michael Paquier discovered that this could be triggered via SQL;
give a nicer message instead.

Patch by Michael Paquier, reviewed by Masahiko Sawada.

Discussion: http://postgr.es/m/CAB7nPqQtPg+LKKtzdKN26judHcvPZ0s1gNigzOT4j8CYuuuBYg@mail.gmail.com
2017-10-05 07:58:02 -04:00
Tom Lane 45f9d08684 Fix race condition with unprotected use of a latch pointer variable.
Commit 597a87ccc introduced a latch pointer variable to replace use
of a long-lived shared latch in the shared WalRcvData structure.
This was not well thought out, because there are now hazards of the
pointer variable changing while it's being inspected by another
process.  This could obviously lead to a core dump in code like

	if (WalRcv->latch)
		SetLatch(WalRcv->latch);

and there's a more remote risk of a torn read, if we have any
platforms where reading/writing a pointer is not atomic.

An actual problem would occur only if the walreceiver process
exits (gracefully) while the startup process is trying to
signal it, but that seems well within the realm of possibility.

To fix, treat the pointer variable (not the referenced latch)
as being protected by the WalRcv->mutex spinlock.  There
remains a race condition that we could apply SetLatch to a
process latch that no longer belongs to the walreceiver, but
I believe that's harmless: at worst it'd cause an extra wakeup
of the next process to use that PGPROC structure.

Back-patch to v10 where the faulty code was added.

Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us
2017-10-03 14:00:56 -04:00
Alvaro Herrera 89e434b59c Fix coding rules violations in walreceiver.c
1. Since commit b1a9bad9e7 we had pstrdup() inside a
spinlock-protected critical section; reported by Andreas Seltenreich.
Turn those into strlcpy() to stack-allocated variables instead.
Backpatch to 9.6.

2. Since commit 9ed551e0a4 we had a pfree() uselessly inside a
spinlock-protected critical section.  Tom Lane noticed in code review.
Move down.  Backpatch to 9.6.

3. Since commit 64233902d2 we had GetCurrentTimestamp() (a kernel
call) inside a spinlock-protected critical section.  Tom Lane noticed in
code review.  Move it up.  Backpatch to 9.2.

4. Since commit 1bb2558046 we did elog(PANIC) while holding spinlock.
Tom Lane noticed in code review.  Release spinlock before dying.
Backpatch to 9.2.

Discussion: https://postgr.es/m/87h8vhtgj2.fsf@ansel.ydns.eu
2017-10-03 14:58:25 +02:00
Peter Eisentraut 5373bc2a08 Add background worker type
Add bgw_type field to background worker structure.  It is intended to be
set to the same value for all workers of the same type, so they can be
grouped in pg_stat_activity, for example.

The backend_type column in pg_stat_activity now shows bgw_type for a
background worker.  The ps listing also no longer calls out that a
process is a background worker but just show the bgw_type.  That way,
being a background worker is more of an implementation detail now that
is not shown to the user.  However, most log messages still refer to
'background worker "%s"'; otherwise constructing sensible and
translatable log messages would become tricky.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2017-09-29 11:08:24 -04:00
Peter Eisentraut ab28feae2b Handle heap rewrites better in logical replication
A FOR ALL TABLES publication naturally considers all base tables to be a
candidate for replication.  This includes transient heaps that are
created during a table rewrite during DDL.  This causes failures on the
subscriber side because it will not have a table like pg_temp_16386 to
receive data (and if it did, it would be the wrong table).

The prevent this problem, we filter out any tables that match this
naming pattern and match an actual table from FOR ALL TABLES
publications.  This is only a heuristic, meaning that user tables that
match that naming could accidentally be omitted.  A more robust solution
might require an explicit marking of such tables in pg_class somehow.

Reported-by: yxq <yxq@o2.pl>
Bug: #14785
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-09-26 10:13:43 -04:00
Peter Eisentraut 0c5803b450 Refactor new file permission handling
The file handling functions from fd.c were called with a diverse mix of
notations for the file permissions when they were opening new files.
Almost all files created by the server should have the same permissions
set.  So change the API so that e.g. OpenTransientFile() automatically
uses the standard permissions set, and OpenTransientFilePerm() is a new
function that takes an explicit permissions set for the few cases where
it is needed.  This also saves an unnecessary argument for call sites
that are just opening an existing file.

While we're reviewing these APIs, get rid of the FileName typedef and
use the standard const char * for the file name and mode_t for the file
mode.  This makes these functions match other file handling functions
and removes an unnecessary layer of mysteriousness.  We can also get rid
of a few casts that way.

Author: David Steele <david@pgmasters.net>
2017-09-23 10:16:18 -04:00
Andres Freund fc49e24fa6 Make WAL segment size configurable at initdb time.
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.

But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.

This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured.  For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.

Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
    Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-19 22:03:48 -07:00
Tom Lane 3e1683d37e Fix, or at least ameliorate, bugs in logicalrep_worker_launch().
If we failed to get a background worker slot, the code just walked
away from the logicalrep-worker slot it already had, leaving that
looking like the worker is still starting up.  This led to an indefinite
hang in subscription startup, as reported by Thomas Munro.  We must
release the slot on failure.

Also fix a thinko: we must capture the worker slot's generation before
releasing LogicalRepWorkerLock the first time, else testing to see if
it's changed is pretty meaningless.

BTW, the CHECK_FOR_INTERRUPTS() in WaitForReplicationWorkerAttach is a
ticking time bomb, even without considering the possibility of elog(ERROR)
in one of the other functions it calls.  Really, this entire business needs
a redesign with some actual thought about error recovery.  But for now
I'm just band-aiding the case observed in testing.

Back-patch to v10 where this code was added.

Discussion: https://postgr.es/m/CAEepm=2bP3TBMFBArP6o20AZaRduWjMnjCjt22hSdnA-EvrtCw@mail.gmail.com
2017-09-18 11:39:55 -04:00
Peter Eisentraut 61975d6c2c Improve error message in WAL sender
The previous error message when attempting to run a general SQL command
in a physical replication WAL sender was a bit sloppy.

Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-09-13 08:31:03 -04:00
Peter Eisentraut 821fb8cdbf Message style fixes 2017-09-11 11:21:27 -04:00
Alvaro Herrera ebd346caf4 Correct base backup throttling
Throttling for sending a base backup in walsender is broken for the case
where there is a lot of WAL traffic, because the latch used to put the
walsender to sleep is also signalled by regular WAL traffic (and each
signal causes an additional batch of data to be sent); the net effect is
that there is no or little actual throttling.  This is undesirable, so
rewrite the sleep into a loop to achieve the desired effeect.

Author: Jeff Janes, small tweaks by me
Reviewed-by: Antonin Houska
Discussion: https://postgr.es/m/CAMkU=1xH6mde-yL-Eo1TKBGNd0PB1-TMxvrNvqcAkN-qr2E9mw@mail.gmail.com
2017-09-05 17:27:30 +02:00
Alvaro Herrera a6979c3a68 Restore behavior for replication origin drop
Do for replication origins what the previous commit did for replication
slots: restore the original behavior of replication origin drop to raise
an error rather than blocking, because users might be depending on the
original behavior.  Maintain the blocking behavior when invoked
internally from logical replication subscription handling.

Discussion: https://postgr.es/m/20170830133922.tlpo3lgfejm4n2cs@alvherre.pgsql
2017-09-01 16:30:02 +02:00
Alvaro Herrera be7161566d Add a WAIT option to DROP_REPLICATION_SLOT
Commit 9915de6c1c changed the default behavior of
DROP_REPLICATION_SLOT so that it would wait until any session holding
the slot active would release it, instead of raising an error.  But
users are already depending on the original behavior, so revert to it by
default and add a WAIT option to invoke the new behavior.

Per complaint from Simone Gotti, in
Discussion: https://postgr.es/m/CAEvsy6Wgdf90O6pUvg2wSVXL2omH5OPC-38OD4Zzgk-FXavj3Q@mail.gmail.com
2017-09-01 13:44:14 +02:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Peter Eisentraut 77d05706be Fix up some misusage of appendStringInfo() and friends
Change to appendStringInfoChar() or appendStringInfoString() where those
can be used.

Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
2017-08-15 23:34:39 -04:00
Peter Eisentraut 70b573b267 Fix logical replication protocol comparison logic
Since we currently only have one protocol, this doesn't make much of a
difference other than the error message.

Author: Yugo Nagata <nagata@sraoss.co.jp>
2017-08-15 16:21:19 -04:00
Peter Eisentraut e42351ae07 Simplify some code in logical replication launcher
Avoid unnecessary locking calls when a subscription is disabled.

Author: Yugo Nagata <nagata@sraoss.co.jp>
2017-08-15 15:13:06 -04:00
Tom Lane f3a4d7e7c2 Distinguish wait-for-connection from wait-for-write-ready on Windows.
The API for WaitLatch and friends followed the Unix convention in which
waiting for a socket connection to complete is identical to waiting for
the socket to accept a write.  While Windows provides a select(2)
emulation that agrees with that, the native WaitForMultipleObjects API
treats them as quite different --- and for some bizarre reason, it will
report a not-yet-connected socket as write-ready.  libpq itself has so
far escaped dealing with this because it waits with select(), but in
libpqwalreceiver.c we want to wait using WaitLatchOrSocket.  The semantics
mismatch resulted in replication connection failures on Windows, but only
for remote connections (apparently, localhost connections complete
immediately, or at least too fast for anyone to have noticed the problem
in single-machine testing).

To fix, introduce an additional WL_SOCKET_CONNECTED wait flag for
WaitLatchOrSocket, which is identical to WL_SOCKET_WRITEABLE on
non-Windows, but results in waiting for FD_CONNECT events on Windows.

Ideally, we would also distinguish the two conditions in the API for
PQconnectPoll(), but changing that API at this point seems infeasible.
Instead, cheat by checking for PQstatus() == CONNECTION_STARTED to
determine that we're still waiting for the connection to complete.
(This is a cheat mainly because CONNECTION_STARTED is documented as an
internal state rather than something callers should rely on.  Perhaps
we ought to change the documentation ... but this patch doesn't.)

Per reports from Jobin Augustine and Igor Neyman.  Back-patch to v10
where commit 1e8a85009 exposed this longstanding shortcoming.

Andres Freund, minor fix and some code review/beautification by me

Discussion: https://postgr.es/m/CAHBggj8g2T+ZDcACZ2FmzX9CTxkWjKBsHd6NkYB4i9Ojf6K1Fw@mail.gmail.com
2017-08-15 11:07:57 -04:00
Tom Lane 21d304dfed Final pgindent + perltidy run for v10. 2017-08-14 17:29:33 -04:00
Alvaro Herrera 2336f84284 Reword comment for clarity
Reported by Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com
2017-08-12 23:26:35 -04:00
Peter Eisentraut a1ef920e27 Remove uses of "slave" in replication contexts
This affects mostly code comments, some documentation, and tests.
Official APIs already used "standby".
2017-08-10 22:55:41 -04:00
Alvaro Herrera b2c95a3798 Fix replication origin-related race conditions
Similar to what was fixed in commit 9915de6c1c for replication slots,
but this time it's related to replication origins: DROP SUBSCRIPTION
attempts to drop the replication origin, but that fails if the
replication worker process hasn't yet marked it unused.  This causes
failures in the buildfarm:
ERROR:  could not drop replication origin with OID 1, in use by PID 34069

Like the aforementioned commit, fix by having the process running DROP
SUBSCRIPTION sleep until the worker marks the the replication origin
struct as free.  This uses a condition variable on each replication
origin shmem state struct, so that the session trying to drop can sleep
and expect to be awakened by the process keeping the origin open.

Also fix a SGML markup in the previous commit.

Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql
2017-08-08 16:07:46 -04:00
Alvaro Herrera 030273b7ea Fix inadequacies in recently added wait events
In commit 9915de6c1c, we introduced a new wait point for replication
slots and incorrectly labelled it as wait event PG_WAIT_LOCK.  That's
wrong, so invent an appropriate new wait event instead, and document it
properly.

While at it, fix numerous other problems in the vicinity:
- two different walreceiver wait events were being mixed up in a single
  wait event (which wasn't documented either); split it out so that they
  can be distinguished, and document the new events properly.

- ParallelBitmapPopulate was documented but didn't exist.

- ParallelBitmapScan was not documented (I think this should be called
  "ParallelBitmapScanInit" instead.)

- Logical replication wait events weren't documented

- various symbols had been added in dartboard order in various places.
  Put them in alphabetical order instead, as was originally intended.

Discussion: https://postgr.es/m/20170808181131.mu4fjepuh5m75cyq@alvherre.pgsql
2017-08-08 15:37:44 -04:00
Peter Eisentraut fca17a933b Fix local/remote attribute mix-up in logical replication
This would lead to failures if local and remote tables have a different
column order.  The tests previously didn't catch that because they only
tested the initial data copy.  So add another test that exercises the
apply worker.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-08-07 10:49:08 -04:00
Peter Eisentraut 0e58455dd4 Fix handling of dropped columns in logical replication
The relation attribute map was not initialized for dropped columns,
leading to errors later on.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Scott Milliken <scott@deltaex.com>
Bug: #14769
2017-08-07 10:28:35 -04:00
Andres Freund 5af4456a56 Fix thinko introduced in 2bef06d516 et al.
The callers for GetOldestSafeDecodingTransactionId() all inverted the
argument for the argument introduced in 2bef06d516. Luckily this
appears to be inconsequential for the moment, as we wait for
concurrent in-progress transaction when assembling a
snapshot. Additionally this could only make a difference when adding a
second logical slot, because only a pre-existing slot could cause an
issue by lowering the returned xid dangerously much.

Reported-By: Antonin Houska
Discussion: https://postgr.es/m/32704.1496993134@localhost
Backport: 9.4-, where 2bef06d516 was backpatched to.
2017-08-06 14:20:55 -07:00
Peter Eisentraut 7e174fa793 Only kill sync workers at commit time in subscription DDL
This allows a transaction abort to avoid killing those workers.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-08-04 21:17:47 -04:00
Peter Eisentraut cf65201833 Get a snapshot before COPY in table sync
This fixes a crash if the local table has a function index and the
function makes non-immutable calls.

Reported-by: Scott Milliken <scott@deltaex.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-08-02 11:34:42 -04:00
Alvaro Herrera 5e3254f086 Update copyright in recently added files 2017-07-26 18:17:18 -04:00
Alvaro Herrera 9915de6c1c Fix race conditions in replication slot operations
It is relatively easy to get a replication slot to look as still active
while one process is in the process of getting rid of it; when some
other process tries to "acquire" the slot, it would fail with an error
message of "replication slot XYZ is active for PID N".

The error message in itself is fine, except that when the intention is
to drop the slot, it is unhelpful: the useful behavior would be to wait
until the slot is no longer acquired, so that the drop can proceed.  To
implement this, we use a condition variable so that slot acquisition can
be told to wait on that condition variable if the slot is already
acquired, and we make any change in active_pid broadcast a signal on the
condition variable.  Thus, as soon as the slot is released, the drop
will proceed properly.

Reported by: Tom Lane
Discussion: https://postgr.es/m/11904.1499039688@sss.pgh.pa.us
Authors: Petr Jelínek, Álvaro Herrera
2017-07-25 13:26:49 -04:00
Heikki Linnakangas 49a3360209 Fix ordering of operations in SyncRepWakeQueue to avoid assertion failure.
Commit 14e8803f1 removed the locking in SyncRepWaitForLSN, but that
introduced a race condition, where SyncRepWaitForLSN might see
syncRepState already set to SYNC_REP_WAIT_COMPLETE, but the process was
not yet removed from the queue. That tripped the assertion, that the
process should no longer be in the uqeue. Reorder the operations in
SyncRepWakeQueue to remove the process from the queue first, and update
syncRepState only after that, and add a memory barrier in between to make
sure the operations are made visible to other processes in that order.

Fixes bug #14721 reported by Const Zhang. Analysis and fix by Thomas Munro.
Backpatch down to 9.5, where the locking was removed.

Discussion: https://www.postgresql.org/message-id/20170629023623.1480.26508%40wrigleys.postgresql.org
2017-07-12 15:30:52 +03:00
Tom Lane f32678c016 Reduce delay for last logicalrep feedback message when master goes idle.
The regression tests contain numerous cases where we do some activity on a
master server and then wait till the slave has ack'd flushing its copy of
that transaction.  Because WAL flush on the slave is asynchronous to the
logicalrep worker process, the worker cannot send such a feedback message
during the LogicalRepApplyLoop iteration where it processes the last data
from the master.  In the previous coding, the feedback message would come
out only when the loop's WaitLatchOrSocket call returned WL_TIMEOUT.  That
requires one full second of delay (NAPTIME_PER_CYCLE); and to add insult
to injury, it could take more than that if the WaitLatchOrSocket was
interrupted a few times by latch-setting events.

In reality we can expect the slave's walwriter process to have flushed the
WAL data after, more or less, WalWriterDelay (typically 200ms).  Hence,
if there are unacked transactions pending, make the wait delay only that
long rather than the full NAPTIME_PER_CYCLE.  Also, move one of the
send_feedback() calls into the loop main line, so that we'll check for the
need to send feedback even if we were woken by a latch event and not either
socket data or timeout.

It's not clear how much this matters for production purposes, but
it's definitely helpful for testing.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
2017-07-01 12:15:51 -04:00
Tom Lane 799f8bc76a Shorten timeouts while waiting for logicalrep worker slot attach/detach.
When waiting for a logical replication worker process to start or stop,
we have to busy-wait until we see it add or remove itself from the
LogicalRepWorker slot in shared memory.  Those loops were using a
one-second delay between checks, but on any reasonably modern machine, it
doesn't take more than a couple of msec for a worker to spawn or shut down.
Reduce the loop delays to 10ms to avoid wasting quite so much time in the
related regression tests.

In principle, a better solution would be to fix things so that the waiting
process can be awakened via its latch at the right time.  But that seems
considerably more invasive, which is undesirable for a post-beta fix.
Worker start/stop performance likely isn't of huge interest anyway for
production purposes, so we might not ever get around to it.

In passing, rearrange the second wait loop in logicalrep_worker_stop()
so that the lock is held at the top of the loop, thus saving one lock
acquisition/release per call, and making it look more like the other loop.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
2017-07-01 11:59:44 -04:00
Alvaro Herrera 572d6ee6d4 Fix locking in WAL receiver/sender shmem state structs
In WAL receiver and WAL server, some accesses to their corresponding
shared memory control structs were done without holding any kind of
lock, which could lead to inconsistent and possibly insecure results.

In walsender, fix by clarifying the locking rules and following them
correctly, as documented in the new comment in walsender_private.h;
namely that some members can be read in walsender itself without a lock,
because the only writes occur in the same process.  The rest of the
struct requires spinlock for accesses, as usual.

In walreceiver, fix by always holding spinlock while accessing the
struct.

While there is potentially a problem in all branches, it is minor in
stable ones.  This only became a real problem in pg10 because of quorum
commit in synchronous replication (commit 3901fd70cc), and a potential
security problem in walreceiver because a superuser() check was removed
by default monitoring roles (commit 25fff40798).  Thus, no backpatch.

In passing, clean up some leftover braces which were used to create
unconditional blocks.  Once upon a time these were used for
volatile-izing accesses to those shmem structs, which is no longer
required.  Many other occurrences of this pattern remain.

Author: Michaël Paquier
Reported-by: Michaël Paquier
Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi, Thomas Munro,
	Robert Haas
Discussion: https://postgr.es/m/CAB7nPqTWYqtzD=LN_oDaf9r-hAjUEPAy0B9yRkhcsLdRN8fzrw@mail.gmail.com
2017-06-30 18:06:33 -04:00
Tom Lane 1f201a818a Fix race conditions and missed wakeups in syncrep worker signaling.
When a sync worker is waiting for the associated apply worker to notice
that it's in SYNCWAIT state, wait_for_worker_state_change() would just
patiently wait for that to happen.  This generally required waiting for
the 1-second timeout in LogicalRepApplyLoop to elapse.  Kicking the worker
via its latch makes things significantly snappier.

While at it, fix race conditions that could potentially result in crashes:
we can *not* call logicalrep_worker_wakeup_ptr() once we've released the
LogicalRepWorkerLock, because worker->proc might've been reset to NULL
after we do that (indeed, there's no really solid reason to believe that
the LogicalRepWorker slot even belongs to the same worker anymore).
In logicalrep_worker_wakeup(), we can just move the wakeup inside the
lock scope.  In process_syncing_tables_for_apply(), a bit more code
rearrangement is needed.

Also improve some nearby comments.
2017-06-30 14:57:14 -04:00
Tom Lane 609fa63db6 Check for error during PQendcopy.
Oversight in commit 78c8c8143; noted while nosing around the
walreceiver startup/shutdown code.
2017-06-30 12:22:33 -04:00
Tom Lane fca85f8ef1 Fix walsender to exit promptly if client requests shutdown.
It's possible for WalSndWaitForWal to be asked to wait for WAL that doesn't
exist yet.  That's fine, in fact it's the normal situation if we're caught
up; but when the client requests shutdown we should not keep waiting.
The previous coding could wait indefinitely if the source server was idle.

In passing, improve the rather weak comments in this area, and slightly
rearrange some related code for better readability.

Back-patch to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/14154.1498781234@sss.pgh.pa.us
2017-06-30 12:00:15 -04:00
Simon Riggs 9ea3c64124 Improve replication lag interpolation after idle period
After sitting idle and fully replayed for a while and then encountering
a new burst of WAL activity, we interpolate between an ancient sample and the
not-yet-reached one for the new traffic. That produced a corner case report
of lag after receiving first new reply from standby, which might sometimes
be a large spike.

Correct this by resetting last_read time and handle that new case.

Author: Thomas Munro
2017-06-23 18:58:46 +01:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Andres Freund 3bdea167eb Fix leaking of small spilled subtransactions during logical decoding.
When, during logical decoding, a transaction gets too big, it's
contents get spilled to disk. Not just the top-transaction gets
spilled, but *also* all of its subtransactions, even if they're not
that large themselves.  Unfortunately we didn't clean up
such small spilled subtransactions from disk.

Fix that, by keeping better track of whether a transaction has been
spilled to disk.

Author: Andres Freund
Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello
Discussion:
    https://postgr.es/m/1457621358.355011041@f382.i.mail.ru
    https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com
Backpatch: 9.4-, where logical decoding was introduced
2017-06-18 19:12:56 -07:00
Peter Eisentraut 033370179a Set statement timestamp in apply worker
This ensures that triggers can see an up-to-date timestamp.

Reported-by: Konstantin Evteev <konst583@gmail.com>
2017-06-17 08:54:21 -04:00
Tom Lane a3bed62d44 Fix low-probability leaks of PGresult objects in the backend.
We had three occurrences of essentially the same coding pattern
wherein we tried to retrieve a query result from a libpq connection
without blocking.  In the case where PQconsumeInput failed (typically
indicating a lost connection), all three loops simply gave up and
returned, forgetting to clear any previously-collected PGresult
object.  Since those are malloc'd not palloc'd, the oversight results
in a process-lifespan memory leak.

One instance, in libpqwalreceiver, is of little significance because
the walreceiver process would just quit anyway if its connection fails.
But we might as well fix it.

The other two instances, in postgres_fdw, are somewhat more worrisome
because at least in principle the scenario could be repeated, allowing
the amount of memory leaked to build up to something worth worrying
about.  Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS
calls, as well as other calls that could potentially elog(ERROR),
providing another way to exit without having cleared the PGresult.
Here we need to add PG_TRY logic similar to what exists in quite a
few other places in postgres_fdw.

Coverity noted the libpqwalreceiver bug; I found the other two cases
by checking all calls of PQconsumeInput.

Back-patch to all supported versions as appropriate (9.2 lacks
postgres_fdw, so this is really quite unexciting for that branch).

Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us
2017-06-15 15:03:52 -04:00
Andres Freund 6c2003f8a1 Don't force-assign transaction id when exporting a snapshot.
Previously we required every exported transaction to have an xid
assigned. That was used to check that the exporting transaction is
still running, which in turn is needed to guarantee that that
necessary rows haven't been removed in between exporting and importing
the snapshot.

The exported xid caused unnecessary problems with logical decoding,
because slot creation has to wait for all concurrent xid to finish,
which in turn serializes concurrent slot creation.   It also
prohibited snapshots to be exported on hot-standby replicas.

Instead export the virtual transactionid, which avoids the unnecessary
serialization and the inability to export snapshots on standbys. This
changes the file name of the exported snapshot, but since we never
documented what that one means, that seems ok.

Author: Petr Jelinek, slightly editorialized by me
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f598b4b8-8cd7-0d54-0939-adda763d8c34@2ndquadrant.com
2017-06-14 11:57:21 -07:00
Tom Lane 651902deb1 Re-run pgindent.
This is just to have a clean base state for testing of Piotr Stefaniak's
latest version of FreeBSD indent.  I fixed up a couple of places where
pgindent would have changed format not-nicely.  perltidy not included.

Discussion: https://postgr.es/m/VI1PR03MB119959F4B65F000CA7CD9F6BF2CC0@VI1PR03MB1199.eurprd03.prod.outlook.com
2017-06-13 13:05:59 -04:00
Peter Eisentraut 88c6cff8e7 Improve code comments
Author: Erik Rijkers <er@xs4all.nl>
2017-06-13 10:43:36 -04:00
Peter Eisentraut 8dc7c33812 Improve tablesync behavior with concurrent changes
When a table is removed from a subscription before the tablesync worker
could start, this would previously result in an error when reading
pg_subscription_rel.  Now we just ignore this.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-09 09:20:54 -04:00
Andres Freund 2c48f5db64 Use standard interrupt handling in logical replication launcher.
Previously the exit handling was only able to exit from within the
main loop, and not from within the backend code it calls.  Fix that by
using the standard die() SIGTERM handler, and adding the necessary
CHECK_FOR_INTERRUPTS() call.

This requires adding yet another process-type-specific branch to
ProcessInterrupts(), which hints that we probably should generalize
that handling.  But that's work for another day.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
2017-06-08 15:38:50 -07:00
Andres Freund 5fd56b9f5b Again report a useful error message when walreceiver's connection closes.
Since 7c4f52409a (merged in v10), a shutdown master is reported as
  FATAL:  unexpected result after CommandComplete: server closed the connection unexpectedly
by walsender. It used to be
  LOG:  replication terminated by primary server
  FATAL:  could not send end-of-streaming message to primary: no COPY in progress
while the old message clearly is not perfect, it's definitely better
than what's reported now.

The change comes from the attempt to handle finished COPYs without
erroring out, needed for the new logical replication, which wasn't
needed before.

There's probably better ways to handle this, but for now just
explicitly check for a closed connection.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f7c7dd08-855c-e4ed-41f4-d064a6c0665a@2ndquadrant.com
Backpatch: -
2017-06-08 14:51:43 -07:00
Peter Eisentraut 644ea35fc1 Fix updating of pg_subscription_rel from workers
A logical replication worker should not insert new rows into
pg_subscription_rel, only update existing rows, so that there are no
races if a concurrent refresh removes rows.  Adjust the API to be able
to choose that behavior.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-06-07 13:49:14 -04:00
Peter Eisentraut d4bfc06e29 Consistently use subscription name as application name
The logical replication apply worker uses the subscription name as
application name, except for table sync.  This was incorrectly set to
use the replication slot name, which might be different, in one case.
Also add a comment why the other case is different.
2017-06-06 22:11:22 -04:00
Andres Freund 9206ced1dc Clean up latch related code.
The larger part of this patch replaces usages of MyProc->procLatch
with MyLatch.  The latter works even early during backend startup,
where MyProc->procLatch doesn't yet.  While the affected code
shouldn't run in cases where it's not initialized, it might get copied
into places where it might.  Using MyLatch is simpler and a bit faster
to boot, so there's little point to stick with the previous coding.

While doing so I noticed some weaknesses around newly introduced uses
of latches that could lead to missed events, and an omitted
CHECK_FOR_INTERRUPTS() call in worker_spi.

As all the actual bugs are in v10 code, there doesn't seem to be
sufficient reason to backpatch this.

Author: Andres Freund
Discussion:
    https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de
    https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de
Backpatch: -
2017-06-06 16:13:00 -07:00
Peter Eisentraut e3a815d2fa Improve handover logic between sync and apply workers
Make apply busy wait check the catalog instead of shmem state to ensure
that next transaction will see the expected table synchronization state.

Also make the handover always go through same set of steps to make the
overall process easier to understand and debug.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Tested-by: Erik Rijkers <er@xs4all.nl>
2017-06-06 14:41:04 -04:00
Andres Freund c1abe6c786 Wire up query cancel interrupt for walsender backends.
This allows to cancel commands run over replication connections. While
it might have some use before v10, it has become important now that
normal SQL commands are allowed in database connected walsender
connections.

Author: Petr Jelinek
Reviewed-By: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8c97@2ndquadrant.com
2017-06-05 19:18:16 -07:00
Andres Freund 6e1dd2773e Unify SIGHUP handling between normal and walsender backends.
Because walsender and normal backends share the same main loop it's
problematic to have two different flag variables, set in signal
handlers, indicating a pending configuration reload.  Only certain
walsender commands reach code paths checking for the
variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
... LOGICAL, notably not base backups).

This is a bug present since the introduction of walsender, but has
gotten worse in releases since then which allow walsender to do more.

A later patch, not slated for v10, will similarly unify SIGHUP
handling in other types of processes as well.

Author: Petr Jelinek, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
Backpatch: 9.2-, bug is present since 9.0
2017-06-05 19:18:16 -07:00