Commit Graph

7888 Commits

Author SHA1 Message Date
Andres Freund
cc9f08b6b8 Move ExecProcNode from dispatch to function pointer based model.
This allows us to add stack-depth checks the first time an executor
node is called, and skip that overhead on following
calls. Additionally it yields a nice speedup.

While it'd probably have been a good idea to have that check all
along, it has become more important after the new expression
evaluation framework in b8d7f053c5 - there's no stack depth
check in common paths anymore now. We previously relied on
ExecEvalExpr() being executed somewhere.

We should move towards that model for further routines, but as this is
required for v10, it seems better to only do the necessary (which
already is quite large).

Author: Andres Freund, Tom Lane
Reported-By: Julien Rouhaud
Discussion:
    https://postgr.es/m/22833.1490390175@sss.pgh.pa.us
    https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
2017-07-30 16:18:21 -07:00
Alvaro Herrera
5e3254f086 Update copyright in recently added files 2017-07-26 18:17:18 -04:00
Alvaro Herrera
9915de6c1c Fix race conditions in replication slot operations
It is relatively easy to get a replication slot to look as still active
while one process is in the process of getting rid of it; when some
other process tries to "acquire" the slot, it would fail with an error
message of "replication slot XYZ is active for PID N".

The error message in itself is fine, except that when the intention is
to drop the slot, it is unhelpful: the useful behavior would be to wait
until the slot is no longer acquired, so that the drop can proceed.  To
implement this, we use a condition variable so that slot acquisition can
be told to wait on that condition variable if the slot is already
acquired, and we make any change in active_pid broadcast a signal on the
condition variable.  Thus, as soon as the slot is released, the drop
will proceed properly.

Reported by: Tom Lane
Discussion: https://postgr.es/m/11904.1499039688@sss.pgh.pa.us
Authors: Petr Jelínek, Álvaro Herrera
2017-07-25 13:26:49 -04:00
Dean Rasheed
d363d42bb9 Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition bounds.
Previously, UNBOUNDED meant no lower bound when used in the FROM list,
and no upper bound when used in the TO list, which was OK for
single-column range partitioning, but problematic with multiple
columns. For example, an upper bound of (10.0, UNBOUNDED) would not be
collocated with a lower bound of (10.0, UNBOUNDED), thus making it
difficult or impossible to define contiguous multi-column range
partitions in some cases.

Fix this by using MINVALUE and MAXVALUE instead of UNBOUNDED to
represent a partition column that is unbounded below or above
respectively. This syntax removes any ambiguity, and ensures that if
one partition's lower bound equals another partition's upper bound,
then the partitions are contiguous.

Also drop the constraint prohibiting finite values after an unbounded
column, and just document the fact that any values after MINVALUE or
MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED
multiple times, which was needlessly verbose.

Note: Forces a post-PG 10 beta2 initdb.

Report by Amul Sul, original patch by Amit Langote with some
additional hacking by me.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-21 09:20:47 +01:00
Tom Lane
3cb29c42f9 Add static assertions about pg_control fitting into one disk sector.
When pg_control was first designed, sizeof(ControlFileData) was small
enough that a comment seemed like plenty to document the assumption that
it'd fit into one disk sector.  Now it's nearly 300 bytes, raising the
possibility that somebody would carelessly add enough stuff to create
a problem.  Let's add a StaticAssertStmt() to ensure that the situation
doesn't pass unnoticed if it ever occurs.

While at it, rename PG_CONTROL_SIZE to PG_CONTROL_FILE_SIZE to make it
clearer what that symbol means, and convert the existing runtime
comparisons of sizeof(ControlFileData) vs. PG_CONTROL_FILE_SIZE to be
static asserts --- we didn't have that technology when this code was
first written.

Discussion: https://postgr.es/m/9192.1500490591@sss.pgh.pa.us
2017-07-19 16:16:57 -04:00
Tom Lane
b4c6d31c0b Fix serious performance problems in json(b) to_tsvector().
In an off-list followup to bug #14745, Bob Jones complained that
to_tsvector() on a 2MB jsonb value took an unreasonable amount of
time and space --- enough to draw the wrath of the OOM killer on
his machine.  On my machine, his example proved to require upwards
of 18 seconds and 4GB, which seemed pretty bogus considering that
to_tsvector() on the same data treated as text took just a couple
hundred msec and 10 or so MB.

On investigation, the problem is that the implementation scans each
string element of the json(b) and converts it to tsvector separately,
then applies tsvector_concat() to join those separate tsvectors.
The unreasonable memory usage came from leaking every single one of
the transient tsvectors --- but even without that mistake, this is an
O(N^2) or worse algorithm, because tsvector_concat() has to repeatedly
process the words coming from earlier elements.

We can fix it by accumulating all the lexeme data and applying
make_tsvector() just once.  As a side benefit, that also makes the
desired adjustment of lexeme positions far cheaper, because we can
just tweak the running "pos" counter between JSON elements.

In passing, try to make the explanation of that tweak more intelligible.
(I didn't think that a barely-readable comment far removed from the
actual code was helpful.)  And do some minor other code beautification.
2017-07-18 12:45:51 -04:00
Robert Haas
f81a91db4d Use a real RT index when setting up partition tuple routing.
Before, we always used a dummy value of 1, but that's not right when
the partitioned table being modified is inside of a WITH clause
rather than part of the main query.

Amit Langote, reported and reviewd by Etsuro Fujita, with a comment
change by me.

Discussion: http://postgr.es/m/ee12f648-8907-77b5-afc0-2980bcb0aa37@lab.ntt.co.jp
2017-07-17 21:29:45 -04:00
Tom Lane
decb08ebdf Code review for NextValueExpr expression node type.
Add missing infrastructure for this node type, notably in ruleutils.c where
its lack could demonstrably cause EXPLAIN to fail.  Add outfuncs/readfuncs
support.  (outfuncs support is useful today for debugging purposes.  The
readfuncs support may never be needed, since at present it would only
matter for parallel query and NextValueExpr should never appear in a
parallelizable query; but it seems like a bad idea to have a primnode type
that isn't fully supported here.)  Teach planner infrastructure that
NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node
with cost cpu_operator_cost.  Given its limited scope of usage, there
*might* be no live bug today from the lack of that knowledge, but it's
certainly going to bite us on the rear someday.  Teach pg_stat_statements
about the new node type, too.

While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction,
XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost.
Failing to do this for SQLValueFunction was an oversight in my commit
0bb51aa96.  The others are longer-standing oversights, but no time like the
present to fix them.  (In principle, CoerceToDomain could have cost much
higher than this, but it doesn't presently seem worth trying to examine the
domain's constraints here.)

Modify execExprInterp.c to execute NextValueExpr as an out-of-line
function; it seems quite unlikely to me that it's worth insisting that
it be inlined in all expression eval methods.  Besides, providing the
out-of-line function doesn't stop anyone from inlining if they want to.

Adjust some places where NextValueExpr support had been inserted with the
aid of a dartboard rather than keeping it in the same order as elsewhere.

Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
2017-07-14 15:25:43 -04:00
Tom Lane
42171e2cd2 Stamp 10beta2. 2017-07-10 16:26:20 -04:00
Alvaro Herrera
572d6ee6d4 Fix locking in WAL receiver/sender shmem state structs
In WAL receiver and WAL server, some accesses to their corresponding
shared memory control structs were done without holding any kind of
lock, which could lead to inconsistent and possibly insecure results.

In walsender, fix by clarifying the locking rules and following them
correctly, as documented in the new comment in walsender_private.h;
namely that some members can be read in walsender itself without a lock,
because the only writes occur in the same process.  The rest of the
struct requires spinlock for accesses, as usual.

In walreceiver, fix by always holding spinlock while accessing the
struct.

While there is potentially a problem in all branches, it is minor in
stable ones.  This only became a real problem in pg10 because of quorum
commit in synchronous replication (commit 3901fd70cc), and a potential
security problem in walreceiver because a superuser() check was removed
by default monitoring roles (commit 25fff40798).  Thus, no backpatch.

In passing, clean up some leftover braces which were used to create
unconditional blocks.  Once upon a time these were used for
volatile-izing accesses to those shmem structs, which is no longer
required.  Many other occurrences of this pattern remain.

Author: Michaël Paquier
Reported-by: Michaël Paquier
Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi, Thomas Munro,
	Robert Haas
Discussion: https://postgr.es/m/CAB7nPqTWYqtzD=LN_oDaf9r-hAjUEPAy0B9yRkhcsLdRN8fzrw@mail.gmail.com
2017-06-30 18:06:33 -04:00
Peter Eisentraut
1acc04e404 Remove outdated comment
Author: Thomas Munro <thomas.munro@enterprisedb.com>
2017-06-30 14:43:05 -04:00
Tom Lane
f13ea95f9e Change pg_ctl to detect server-ready by watching status in postmaster.pid.
Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log.  We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.

In commit c61559ec3, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse.  In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read.  This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683f).  pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.

In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order.  This is needed
on Windows where the SHMEM_KEY line will never be written at all.  We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.

Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway.  (Yes, I found that
out the hard way.)

While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious.  This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else.  For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec().  (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)

In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.

Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
2017-06-28 17:31:32 -04:00
Andrew Gierth
8c55244ae3 Fix transition tables for ON CONFLICT.
We now disallow having triggers with both transition tables and ON
INSERT OR UPDATE (which was a PG extension to the spec anyway),
because in this case it's not at all clear how the transition tables
should work for an INSERT ... ON CONFLICT query.  Separate ON INSERT
and ON UPDATE triggers with transition tables are allowed, and the
transition tables for these reflect only the inserted and only the
updated tuples respectively.

Patch by Thomas Munro

Discussion: https://postgr.es/m/CAEepm%3D11KHQ0JmETJQihSvhZB5mUZL2xrqHeXbCeLhDiqQ39%3Dw%40mail.gmail.com
2017-06-28 19:00:55 +01:00
Andrew Gierth
c46c0e5202 Fix transition tables for wCTEs.
The original coding didn't handle this case properly; each separate
DML substatement needs its own set of transitions.

Patch by Thomas Munro

Discussion: https://postgr.es/m/CAL9smLCDQ%3D2o024rBgtD4WihzX8B3C6u_oSQ2K3%2BR5grJrV0bg%40mail.gmail.com
2017-06-28 18:59:01 +01:00
Andrew Gierth
501ed02cf6 Fix transition tables for partition/inheritance.
We disallow row-level triggers with transition tables on child tables.
Transition tables for triggers on the parent table contain only those
columns present in the parent.  (We can't mix tuple formats in a
single transition table.)

Patch by Thomas Munro

Discussion: https://postgr.es/m/CA%2BTgmoZzTBBAsEUh4MazAN7ga%3D8SsMC-Knp-6cetts9yNZUCcg%40mail.gmail.com
2017-06-28 18:55:03 +01:00
Tom Lane
ddb5fdc068 Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU.  This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings.  So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial.  If there are some, we can simply not install the
comment.  (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)

For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales.  I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.

With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings.  This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations.  Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().

Adjust documentation to match.  Also, expand Table 23.1 to show which
encodings are supported by ICU.

catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.

Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 13:54:23 -04:00
Tom Lane
0b13b2a771 Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once.  All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.

The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known".  This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.

While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them.  And make it return a count of the number of collations
it did add, which seems like it might be helpful.

Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities.  This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.

In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.

Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that.  There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.

Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 14:19:58 -04:00
Tom Lane
b6159202c9 Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it.  This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory.  For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c.  I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.

Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place.  Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday.  Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed.  Const-ify icu_from_uchar()'s
input string for consistency.

That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.

Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 12:22:06 -04:00
Peter Eisentraut
113b0045e2 Reformat comments about ResultRelInfo
Also add a comment on its new member PartitionRoot.

Reported-by: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
2017-06-21 15:43:23 -04:00
Tom Lane
382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane
c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane
e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Tom Lane
9ef2dbefc7 Final pgindent run with old pg_bsd_indent (version 1.3).
This is just to have a clean basis for comparison with the results of
the new version (which will indeed end up reverting some of these
changes...)

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:09:24 -04:00
Peter Eisentraut
a2141c42f9 Tweak publication fetching in psql
Viewing a table with \d in psql also shows the publications at table is
in.  If a publication is concurrently dropped, this shows an error,
because the view pg_publication_tables internally uses
pg_get_publication_tables(), which uses a catalog snapshot.  This can be
particularly annoying if a for-all-tables publication is concurrently
dropped.

To avoid that, write the query in psql differently.  Expose the function
pg_relation_is_publishable() to SQL and write the query using that.
That still has a risk of being affected by concurrent catalog changes,
but in this case it would be a table drop that causes problems, and then
the psql \d command wouldn't be interesting anymore anyway.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
2017-06-20 12:35:02 -04:00
Peter Eisentraut
20d7d68b09 Change pg_get_publication_tables to prosecdef false
This was apparently a mistake in the original commit.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
2017-06-20 10:03:35 -04:00
Andres Freund
3bdea167eb Fix leaking of small spilled subtransactions during logical decoding.
When, during logical decoding, a transaction gets too big, it's
contents get spilled to disk. Not just the top-transaction gets
spilled, but *also* all of its subtransactions, even if they're not
that large themselves.  Unfortunately we didn't clean up
such small spilled subtransactions from disk.

Fix that, by keeping better track of whether a transaction has been
spilled to disk.

Author: Andres Freund
Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello
Discussion:
    https://postgr.es/m/1457621358.355011041@f382.i.mail.ru
    https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com
Backpatch: 9.4-, where logical decoding was introduced
2017-06-18 19:12:56 -07:00
Magnus Hagander
bb1f8f9e5b Fix typos in comments
Author: Daniel Gustafsson <daniel@yesql.se>
2017-06-17 10:17:28 +02:00
Peter Eisentraut
e42645ad92 Define HAVE_UCOL_STRCOLLUTF8 on Windows
This should normally be determined by a configure check, but until
someone figures out how to do that on Windows, it's better that the code
uses the new function by default.
2017-06-17 00:17:10 -04:00
Noah Misch
39ac55918f Reconcile nodes/*funcs.c with PostgreSQL 10 work.
The _equalTableFunc() omission of coltypmods has semantic significance,
but I did not track down resulting user-visible bugs, if any.  The other
changes are cosmetic only, affecting order.  catversion bump due to
readfuncs.c field order change.
2017-06-16 00:16:11 -07:00
Alvaro Herrera
3ab7912c18 Rename function for consistency
Avoid using prefix "staext" when everything else uses "statext".

Author: Kyotaro HORIGUCHI
Discussion: https://postgr.es/m/20170615.140041.165731947.horiguchi.kyotaro@lab.ntt.co.jp
2017-06-15 11:44:33 -04:00
Robert Haas
f32d57fd70 Fix problems related to RangeTblEntry members enrname and enrtuples.
Commit 18ce3a4ab2 failed to update
the comments in parsenodes.h for the new members, and made only
incomplete updates to src/backend/nodes

Thomas Munro, per a report from Noah Misch.

Discussion: http://postgr.es/m/20170611062525.GA1628882@rfd.leadboat.com
2017-06-14 16:19:46 -04:00
Andres Freund
6c2003f8a1 Don't force-assign transaction id when exporting a snapshot.
Previously we required every exported transaction to have an xid
assigned. That was used to check that the exporting transaction is
still running, which in turn is needed to guarantee that that
necessary rows haven't been removed in between exporting and importing
the snapshot.

The exported xid caused unnecessary problems with logical decoding,
because slot creation has to wait for all concurrent xid to finish,
which in turn serializes concurrent slot creation.   It also
prohibited snapshots to be exported on hot-standby replicas.

Instead export the virtual transactionid, which avoids the unnecessary
serialization and the inability to export snapshots on standbys. This
changes the file name of the exported snapshot, but since we never
documented what that one means, that seems ok.

Author: Petr Jelinek, slightly editorialized by me
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f598b4b8-8cd7-0d54-0939-adda763d8c34@2ndquadrant.com
2017-06-14 11:57:21 -07:00
Robert Haas
b08df9cab7 Teach predtest.c about CHECK clauses to fix partitioning bugs.
In a CHECK clause, a null result means true, whereas in a WHERE clause
it means false.  predtest.c provided different functions depending on
which set of semantics applied to the predicate being proved, but had
no option to control what a null meant in the clauses provided as
axioms.  Add one.

Use that in the partitioning code when figuring out whether the
validation scan on a new partition can be skipped.  Rip out the
old logic that attempted (not very successfully) to compensate
for the absence of the necessary support in predtest.c.

Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and
incorporating feedback from Tom Lane.

Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-06-14 13:13:11 -04:00
Tom Lane
0436f6bde8 Disallow set-returning functions inside CASE or COALESCE.
When we reimplemented SRFs in commit 69f4b9c85, our initial choice was
to allow the behavior to vary from historical practice in cases where a
SRF call appeared within a conditional-execution construct (currently,
only CASE or COALESCE).  But that was controversial to begin with, and
subsequent discussion has resulted in a consensus that it's better to
throw an error instead of executing the query differently from before,
so long as we can provide a reasonably clear error message and a way to
rewrite the query.

Hence, add a parser mechanism to allow detection of such cases during
parse analysis.  The mechanism just requires storing, in the ParseState,
a pointer to the set-returning FuncExpr or OpExpr most recently emitted
by parse analysis.  Then the parsing functions for CASE and COALESCE can
detect the presence of a SRF in their arguments by noting whether this
pointer changes while analyzing their arguments.  Furthermore, if it does,
it provides a suitable error cursor location for the complaint.  (This
means that if there's more than one SRF in the arguments, the error will
point at the last one to be analyzed not the first.  While connoisseurs of
parsing behavior might find that odd, it's unlikely the average user would
ever notice.)

While at it, we can also provide more specific error messages than before
about some pre-existing restrictions, such as no-SRFs-within-aggregates.
Also, reject at parse time cases where a NULLIF or IS DISTINCT FROM
construct would need to return a set.  We've never supported that, but the
restriction is depended on in more subtle ways now, so it seems wise to
detect it at the start.

Also, provide some documentation about how to rewrite a SRF-within-CASE
query using a custom wrapper SRF.

It turns out that the information_schema.user_mapping_options view
contained an instance of exactly the behavior we're now forbidding; but
rewriting it makes it more clear and safer too.

initdb forced because of user_mapping_options change.

Patch by me, with error message suggestions from Alvaro Herrera and
Andres Freund, pursuant to a complaint from Regina Obe.

Discussion: https://postgr.es/m/000001d2d5de$d8d66170$8a832450$@pcorp.us
2017-06-13 23:46:39 -04:00
Tom Lane
651902deb1 Re-run pgindent.
This is just to have a clean base state for testing of Piotr Stefaniak's
latest version of FreeBSD indent.  I fixed up a couple of places where
pgindent would have changed format not-nicely.  perltidy not included.

Discussion: https://postgr.es/m/VI1PR03MB119959F4B65F000CA7CD9F6BF2CC0@VI1PR03MB1199.eurprd03.prod.outlook.com
2017-06-13 13:05:59 -04:00
Peter Eisentraut
ec7129b781 Fix collprovider of predefined collations
An earlier version of the patch had collprovider as an integer and thus
set these to 0, but the correct setting is now null.
2017-06-13 08:55:09 -04:00
Andres Freund
2c48f5db64 Use standard interrupt handling in logical replication launcher.
Previously the exit handling was only able to exit from within the
main loop, and not from within the backend code it calls.  Fix that by
using the standard die() SIGTERM handler, and adding the necessary
CHECK_FOR_INTERRUPTS() call.

This requires adding yet another process-type-specific branch to
ProcessInterrupts(), which hints that we probably should generalize
that handling.  But that's work for another day.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
2017-06-08 15:38:50 -07:00
Andrew Dunstan
f7e6853e1a Mark to_tsvector(regconfig,json[b]) functions immutable
This make them consistent with the text function and means they can be
used in functional indexes.

Catalog version bumped.

Per gripe from Josh Berkus.
2017-06-08 15:47:10 -04:00
Robert Haas
0eac8e7ff7 Add statistics subdirectory to Makefile.
Commit 7b504eb282 overlooked this.

Report and patch by Kyotaro Horiguchi

Discussion: http://postgr.es/m/20170608.145852.54673832.horiguchi.kyotaro@lab.ntt.co.jp
2017-06-08 11:29:50 -04:00
Peter Eisentraut
644ea35fc1 Fix updating of pg_subscription_rel from workers
A logical replication worker should not insert new rows into
pg_subscription_rel, only update existing rows, so that there are no
races if a concurrent refresh removes rows.  Adjust the API to be able
to choose that behavior.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-06-07 13:49:14 -04:00
Andres Freund
6e1dd2773e Unify SIGHUP handling between normal and walsender backends.
Because walsender and normal backends share the same main loop it's
problematic to have two different flag variables, set in signal
handlers, indicating a pending configuration reload.  Only certain
walsender commands reach code paths checking for the
variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
... LOGICAL, notably not base backups).

This is a bug present since the introduction of walsender, but has
gotten worse in releases since then which allow walsender to do more.

A later patch, not slated for v10, will similarly unify SIGHUP
handling in other types of processes as well.

Author: Petr Jelinek, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
Backpatch: 9.2-, bug is present since 9.0
2017-06-05 19:18:16 -07:00
Andres Freund
c6c3334364 Prevent possibility of panics during shutdown checkpoint.
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so.  At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits).  So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.

To fix this, divide the walsender shutdown into two phases.  First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders.  If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown.  Otherwise this causes the walsender
to switch to the "stopping" state. In this state, the walsender will
reject any further replication commands. The checkpointer begins the
shutdown checkpoint once all walsenders are confirmed as
stopping. When the shutdown checkpoint finishes, the postmaster sends
us SIGUSR2. This instructs walsender to send any outstanding WAL,
including the shutdown checkpoint record, wait for it to be replicated
to the standby, and then exit.

Author: Andres Freund, based on an earlier patch by Michael Paquier
Reported-By: Fujii Masao, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
Backpatch: 9.4, where logical decoding was introduced
2017-06-05 19:18:15 -07:00
Andres Freund
703f148e98 Revert "Prevent panic during shutdown checkpoint"
This reverts commit 086221cf6b, which
was made to master only.

The approach implemented in the above commit has some issues.  While
those could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.

Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
2017-06-05 19:18:15 -07:00
Peter Eisentraut
9907b55ceb Fix ALTER SUBSCRIPTION grammar ambiguity
There was a grammar ambiguity between SET PUBLICATION name REFRESH and
SET PUBLICATION SKIP REFRESH, because SKIP is not a reserved word.  To
resolve that, fold the refresh choice into the WITH options.  Refreshing
is the default now.

Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-06-05 21:43:25 -04:00
Tom Lane
3e60c6f723 Code review for shm_toc.h/.c.
Declare the toc_nentry field as uint32 not Size.  Since shm_toc_lookup()
reads the field without any lock, it has to be atomically readable, and
we do not assume that for fields wider than 32 bits.  Performance would
be impossibly bad for entry counts approaching 2^32 anyway, so there is
no need to try to preserve maximum width here.

This is probably an academic issue, because even if reading int64 isn't
atomic, the high order half would never change in practice.  Still, it's
a coding rule violation, so let's fix it.

Adjust some other not-terribly-well-chosen data types too, and copy-edit
some comments.  Make shm_toc_attach's Asserts consistent with
shm_toc_create's.

None of this looks to be a live bug, so no need for back-patch.

Discussion: https://postgr.es/m/16984.1496679541@sss.pgh.pa.us
2017-06-05 14:50:59 -04:00
Tom Lane
d466335064 Don't be so trusting that shm_toc_lookup() will always succeed.
Given the possibility of race conditions and so on, it seems entirely
unsafe to just assume that shm_toc_lookup() always finds the key it's
looking for --- but that was exactly what all but one call site were
doing.  To fix, add a "bool noError" argument, similarly to what we
have in many other functions, and throw an error on an unexpected
lookup failure.  Remove now-redundant Asserts that a rather random
subset of call sites had.

I doubt this will throw any light on buildfarm member lorikeet's
recent failures, because if an unnoticed lookup failure were involved,
you'd kind of expect a null-pointer-dereference crash rather than the
observed symptom.  But you never know ... and this is better coding
practice even if it never catches anything.

Discussion: https://postgr.es/m/9697.1496675981@sss.pgh.pa.us
2017-06-05 12:05:42 -04:00
Heikki Linnakangas
553e16951c Fix comments in simplehash.h.
Jeff Janes and me.

Discussion: https://www.postgresql.org/message-id/CAMkU=1zYnniLYg+W9itL93DXebCjx6Uk6m_=Xa8p_zM65X3S0Q@mail.gmail.com
2017-06-05 11:07:42 +03:00
Tom Lane
9db7d47f90 #ifdef out assorted unused GEQO code.
I'd always assumed that backend/optimizer/geqo/'s remarkably poor
showing on code coverage metrics was because we weren't exercising
it much in the regression tests.  But it turns out that a good chunk
of the problem is that there's a bunch of code that is physically
unreachable (because the calls to it are #ifdef'd out in geqo_main.c)
but is being built anyway.  Making the called code have #if guards
similar to the calling code saves a couple of kilobytes of executable
size and should make the coverage numbers more reflective of reality.

It's arguable that we should just delete all the unused recombination
mechanisms altogether, but I didn't feel a need to go that far today.
2017-06-04 13:34:05 -04:00
Tom Lane
0d18852666 Disallow CREATE INDEX if table is already in use in current session.
If we allow this, whatever outer command has the table open will not know
about the new index and may fail to update it as needed, as shown in a
report from Laurenz Albe.  We already had such a prohibition in place for
ALTER TABLE, but the CREATE INDEX syntax missed the check.

Fixing it requires an API change for DefineIndex(), which conceivably
would break third-party extensions if we were to back-patch it.  Given
how long this problem has existed without being noticed, fixing it in
the back branches doesn't seem worth that risk.

Discussion: https://postgr.es/m/A737B7A37273E048B164557ADEF4A58B53A4DC9A@ntex2010i.host.magwien.gv.at
2017-06-04 12:02:41 -04:00
Peter Eisentraut
9fcf670c2e Fix signal handling in logical replication workers
The logical replication worker processes now use the normal die()
handler for SIGTERM and CHECK_FOR_INTERRUPTS() instead of custom code.
One problem before was that the apply worker would not exit promptly
when a subscription was dropped, which could lead to deadlocks.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-02 14:49:23 -04:00
Tom Lane
b5b3229141 Avoid -Wconversion warnings from direct use of GET_n_BYTES macros.
The GET/SET_n_BYTES macros are meant to be infrastructure for the
DatumGetFoo/FooGetDatum macros, which include a cast to the intended
target type.  Using them directly without a cast, as DatumGetFloat4
and friends previously did, can yield warnings when -Wconversion is on.
This is of little significance when building Postgres proper, because
there are such a huge number of such warnings in the server that nobody
would think -Wconversion is of any use.  But some extensions build with
-Wconversion due to outside constraints.  Commit 14cca1bf8 did a disservice
to those extensions by moving DatumGetFloat4 et al into postgres.h,
where they can now cause warnings in extension builds.

To fix, use DatumGetInt32 and friends in place of the low-level macros.
This is arguably a bit cleaner anyway.

Chapman Flack

Discussion: https://postgr.es/m/592E4D04.1070609@anastigmatix.net
2017-05-31 11:27:21 -04:00
Tom Lane
54e839fe29 Sort syscache identifiers into alphabetical order.
Not much point in having a convention about this if we don't enforce it.

Mark Dilger

Discussion: https://postgr.es/m/7F67FBEF-C3B3-404E-8EC6-E02ACB15D894@gmail.com
2017-05-30 18:47:13 -04:00
Tom Lane
80f583ffe9 Fix omission of locations in outfuncs/readfuncs partitioning node support.
We could have limped along without this for v10, which was my intention
when I annotated the bug in commit 76a3df6e5.  But consensus is that it's
better to fix it now and take the cost of a post-beta1 initdb (which is
needed because these node types are stored in pg_class.relpartbound).

Since we're forcing initdb anyway, take the opportunity to make the node
type identification strings match the node struct names, instead of being
randomly different from them.

Discussion: https://postgr.es/m/E1dFBEX-0004wt-8t@gemulon.postgresql.org
2017-05-30 11:32:41 -04:00
Tom Lane
76a3df6e5e Code review focused on new node types added by partitioning support.
Fix failure to check that we got a plain Const from const-simplification of
a coercion request.  This is the cause of bug #14666 from Tian Bing: there
is an int4 to money cast, but it's only stable not immutable (because of
dependence on lc_monetary), resulting in a FuncExpr that the code was
miserably unequipped to deal with, or indeed even to notice that it was
failing to deal with.  Add test cases around this coercion behavior.

In view of the above, sprinkle the code liberally with castNode() macros,
in hope of catching the next such bug a bit sooner.  Also, change some
functions that were randomly declared to take Node* to take more specific
pointer types.  And change some struct fields that were declared Node*
but could be given more specific types, allowing removal of assorted
explicit casts.

Place PARTITION_MAX_KEYS check a bit closer to the code it's protecting.
Likewise check only-one-key-for-list-partitioning restriction in a less
random place.

Avoid not-per-project-style usages like !strcmp(...).

Fix assorted failures to avoid scribbling on the input of parse
transformation.  I'm not sure how necessary this is, but it's entirely
silly for these functions to be expending cycles to avoid that and not
getting it right.

Add guards against partitioning on system columns.

Put backend/nodes/ support code into an order that matches handling
of these node types elsewhere.

Annotate the fact that somebody added location fields to PartitionBoundSpec
and PartitionRangeDatum but forgot to handle them in
outfuncs.c/readfuncs.c.  This is fairly harmless for production purposes
(since readfuncs.c would just substitute -1 anyway) but it's still bogus.
It's not worth forcing a post-beta1 initdb just to fix this, but if we
have another reason to force initdb before 10.0, we should go back and
clean this up.

Contrariwise, somebody added location fields to PartitionElem and
PartitionSpec but forgot to teach exprLocation() about them.

Consolidate duplicative code in transformPartitionBound().

Improve a couple of error messages.

Improve assorted commentary.

Re-pgindent the files touched by this patch; this affects a few comment
blocks that must have been added quite recently.

Report: https://postgr.es/m/20170524024550.29935.14396@wrigleys.postgresql.org
2017-05-28 23:20:28 -04:00
Bruce Momjian
a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Peter Eisentraut
944dc0f9ce Check relkind of tables in CREATE/ALTER SUBSCRIPTION
We used to only check for a supported relkind on the subscriber during
replication, which is needed to ensure that the setup is valid and we
don't crash.  But it's also useful to tell the user immediately when
CREATE or ALTER SUBSCRIPTION is executed that the relation being added
to the subscription is not of a supported relkind.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-05-16 22:57:16 -04:00
Tom Lane
c079673dcb Preventive maintenance in advance of pgindent run.
Reformat various places in which pgindent will make a mess, and
fix a few small violations of coding style that I happened to notice
while perusing the diffs from a pgindent dry run.

There is one actual bug fix here: the need-to-enlarge-the-buffer code
path in icu_convert_case was obviously broken.  Perhaps it's unreachable
in our usage?  Or maybe this is just sadly undertested.
2017-05-16 20:36:35 -04:00
Robert Haas
59f40566ca Fix relcache leak when row triggers on partitions are fired by COPY.
Thomas Munro, reviewed by Amit Langote

Discussion: http://postgr.es/m/CAEepm=15Jss-yhFApuKzxcoCuFnb8TR8iQiWMjG=CLYPx48QLw@mail.gmail.com
2017-05-16 12:46:32 -04:00
Tom Lane
5ad367a35b Stamp 10beta1. 2017-05-15 17:20:59 -04:00
Tom Lane
b5b0db19b8 Fix handling of extended statistics during ALTER COLUMN TYPE.
ALTER COLUMN TYPE on a column used by a statistics object fails since
commit 928c4de30, because the relevant switch in ATExecAlterColumnType
is unprepared for columns to have dependencies from OCLASS_STATISTIC_EXT
objects.

Although the existing types of extended statistics don't actually need us
to do any work for a column type change, it seems completely indefensible
that that assumption is hidden behind the failure of an unrelated module
to contain any code for the case.  Hence, create and call an API function
in statscmds.c where the assumption can be explained, and where we could
add code to deal with the problem when it inevitably becomes real.

Also, the reason this wasn't handled before, neither for extended stats
nor for the last half-dozen new OCLASS kinds :-(, is that the default:
in that switch suppresses compiler warnings, allowing people to miss the
need to consider it when adding an OCLASS.  We don't really need a default
because surely getObjectClass should only return valid values of the enum;
so remove it, and add the missed OCLASS entries where they should be.

Discussion: https://postgr.es/m/20170512221010.nglatgt5azzdxjlj@alvherre.pgsql
2017-05-14 12:22:25 -04:00
Tom Lane
f674743487 Remove no-longer-needed fields of Hash plan nodes.
skewColType/skewColTypmod are no longer used in the wake of commit
9aab83fc5, and seem unlikely to be wanted in future, so let's drop 'em.

Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-14 11:07:40 -04:00
Tom Lane
f04c9a6146 Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics".  Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural.  That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.

This commit fixes the code and a lot of comments (though I may have
missed a few).  I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.

I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-14 10:55:01 -04:00
Andres Freund
955a684e04 Fix race condition leading to hanging logical slot creation.
The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records.  Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.

That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions.  That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.

It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.

Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code.  That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
   we cannot use it to build the initial snapshot.  Instead we have to
   wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
   the all transaction perceived as running based on commit/abort
   records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.

Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release.  As this is going to be backpatched, we have to hack
around a bit to keep on-disk compatibility.  A later commit will
rejigger that on master.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
2017-05-13 14:21:00 -07:00
Tom Lane
9aab83fc50 Redesign get_attstatsslot()/free_attstatsslot() for more safety and speed.
The mess cleaned up in commit da0759600 is clear evidence that it's a
bug hazard to expect the caller of get_attstatsslot()/free_attstatsslot()
to provide the correct type OID for the array elements in the slot.
Moreover, we weren't even getting any performance benefit from that,
since get_attstatsslot() was extracting the real type OID from the array
anyway.  So we ought to get rid of that requirement; indeed, it would
make more sense for get_attstatsslot() to pass back the type OID it found,
in case the caller isn't sure what to expect, which is likely in binary-
compatible-operator cases.

Another problem with the current implementation is that if the stats array
element type is pass-by-reference, we incur a palloc/memcpy/pfree cycle
for each element.  That seemed acceptable when the code was written because
we were targeting O(10) array sizes --- but these days, stats arrays are
almost always bigger than that, sometimes much bigger.  We can save a
significant number of cycles by doing one palloc/memcpy/pfree of the whole
array.  Indeed, in the now-probably-common case where the array is toasted,
that happens anyway so this method is basically free.  (Note: although the
catcache code will inline any out-of-line toasted values, it doesn't
decompress them.  At the other end of the size range, it doesn't expand
short-header datums either.  In either case, DatumGetArrayTypeP would have
to make a copy.  We do end up using an extra array copy step if the element
type is pass-by-value and the array length is neither small enough for a
short header nor large enough to have suffered compression.  But that
seems like a very acceptable price for winning in pass-by-ref cases.)

Hence, redesign to take these insights into account.  While at it,
convert to an API in which we fill a struct rather than passing a bunch
of pointers to individual output arguments.  That will make it less
painful if we ever want further expansion of what get_attstatsslot can
pass back.

It's certainly arguable that this is new development and not something to
push post-feature-freeze.  However, I view it as primarily bug-proofing
and therefore something that's better to have sooner not later.  Since
we aren't quite at beta phase yet, let's put it in.

Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-13 15:14:39 -04:00
Robert Haas
1848b73d45 Teach \d+ to show partitioning constraints.
The fact that we didn't have this in the first place is likely why
the problem fixed by f8bffe9e6d
escaped detection.

Patch by Amit Langote, reviewed and slightly adjusted by me.

Discussion: http://postgr.es/m/CA+TgmoYWnV2GMnYLG-Czsix-E1WGAbo4D+0tx7t9NdfYBDMFsA@mail.gmail.com
2017-05-13 12:04:53 -04:00
Robert Haas
f8bffe9e6d Fix multi-column range partitioning constraints.
The old logic was just plain wrong.

Report by Olaf Gawenda.  Patch by Amit Langote, reviewed by
Beena Emerson and by me.  Minor adjustments by me also.
2017-05-13 11:36:41 -04:00
Alvaro Herrera
d99d58cdc8 Complete tab completion for DROP STATISTICS
Tab-completing DROP STATISTICS would only work if you started writing
the schema name containing the statistics object, because the visibility
clause was missing.  To add it, we need to add SQL-callable support for
testing visibility of a statistics object, like all other object types
already have.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-13 01:05:48 -03:00
Tom Lane
2df5d46555 Avoid searching for callback functions in CallSyscacheCallbacks().
We have now grown enough registerable syscache-invalidation callback
functions that the original assumption that there would be few of them
is causing performance problems.  In particular, let's fix things so that
CallSyscacheCallbacks doesn't have to search the whole array to find
which callback(s) to invoke for a given cache ID.  Preserve the original
behavior that callbacks are called in order of registration, just in
case there's someplace that depends on that (which I doubt).

In support of this, export the number of syscaches from syscache.h.
People could have found that out anyway from the enum, but adding a
#define makes that much safer.

This provides a useful additional speedup in Mathieu Fenniak's
logical-decoding test case, although we're reaching the point of
diminishing returns there.  I think any further improvement will have
to come from reducing the number of cache invalidations that are
triggered in the first place.  Still, we can hope that this change
gives some incremental benefit for all invalidation scenarios.

Back-patch to 9.4 where logical decoding was introduced.

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 19:05:27 -04:00
Tom Lane
50ee1c7462 Avoid searching for the target catcache in CatalogCacheIdInvalidate.
A test case provided by Mathieu Fenniak shows that the initial search for
the target catcache in CatalogCacheIdInvalidate consumes a very significant
amount of overhead in cases where cache invalidation is triggered but has
little useful work to do.  There is no good reason for that search to exist
at all, as the index array maintained by syscache.c allows direct lookup of
the catcache from its ID.  We just need a frontend function in syscache.c,
matching the division of labor for most other cache-accessing operations.

While there's more that can be done in this area, this patch alone reduces
the runtime of Mathieu's example by 2X.  We can hope that it offers some
useful benefit in other cases too, although usually cache invalidation
overhead is not such a striking fraction of the total runtime.

Back-patch to 9.4 where logical decoding was introduced.  It might be
worth going further back, but presently the only case we know of where
cache invalidation is really a significant burden is in logical decoding.
Also, older branches have fewer catcaches, reducing the possible benefit.

(Note: although this nominally changes catcache's API, we have always
documented CatalogCacheIdInvalidate as a private function, so I would
have little sympathy for an external module calling it directly.  So
backpatching should be fine.)

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 18:17:29 -04:00
Tom Lane
928c4de309 Fix dependencies for extended statistics objects.
A stats object ought to have a dependency on each individual column
it reads, not the entire table.  Doing this honestly lets us get rid
of the hard-wired logic in RemoveStatisticsExt, which seems to have
been misguidedly modeled on RemoveStatistics; and it will be far easier
to extend to multiple tables later.

Also, add overlooked dependency on owner, and make the dependency on
schema be NORMAL like every other such dependency.

There remains some unfinished work here, which is to allow statistics
objects to be extension members.  That takes more effort than just
adding the dependency call, though, so I left it out for now.

initdb forced because this changes the set of pg_depend records that
should exist for a statistics object.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-12 16:26:31 -04:00
Alvaro Herrera
bc085205c8 Change CREATE STATISTICS syntax
Previously, we had the WITH clause in the middle of the command, where
you'd specify both generic options as well as statistic types.  Few
people liked this, so this commit changes it to remove the WITH keyword
from that clause and makes it accept statistic types only.  (We
currently don't have any generic options, but if we invent in the
future, we will gain a new WITH clause, probably at the end of the
command).

Also, the column list is now specified without parens, which makes the
whole command look more similar to a SELECT command.  This change will
let us expand the command to supporting expressions (not just columns
names) as well as multiple tables and their join conditions.

Tom added lots of code comments and fixed some parts of the CREATE
STATISTICS reference page, too; more changes in this area are
forthcoming.  He also fixed a potential problem in the alter_generic
regression test, reducing verbosity on a cascaded drop to avoid
dependency on message ordering, as we do in other tests.

Tom also closed a security bug: we documented that table ownership was
required in order to create a statistics object on it, but didn't
actually implement it.

Implement tab-completion for statistics objects.  This can stand some
more improvement.

Authors: Alvaro Herrera, with lots of cleanup by Tom Lane
Discussion: https://postgr.es/m/20170420212426.ltvgyhnefvhixm6i@alvherre.pgsql
2017-05-12 14:59:35 -03:00
Peter Eisentraut
d496a65790 Standardize "WAL location" terminology
Other previously used terms were "WAL position" or "log position".
2017-05-12 13:51:27 -04:00
Peter Eisentraut
c1a7f64b4a Replace "transaction log" with "write-ahead log"
This makes documentation and error messages match the renaming of "xlog"
to "wal" in APIs and file naming.
2017-05-12 11:52:43 -04:00
Peter Eisentraut
b807f59828 Rework the options syntax for logical replication commands
For CREATE/ALTER PUBLICATION/SUBSCRIPTION, use similar option style as
other statements that use a WITH clause for options.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-05-12 08:57:49 -04:00
Simon Riggs
024711bb54 Lag tracking for logical replication
Lag tracking is called for each commit, but we introduce
a pacing delay to ensure we don't swamp the lag tracker.

Author: Petr Jelinek, with minor pacing delay code from me
2017-05-12 10:50:56 +01:00
Tom Lane
d10c626de4 Rename WAL-related functions and views to use "lsn" not "location".
Per discussion, "location" is a rather vague term that could refer to
multiple concepts.  "LSN" is an unambiguous term for WAL locations and
should be preferred.  Some function names, view column names, and function
output argument names used "lsn" already, but others used "location",
as well as yet other terms such as "wal_position".  Since we've already
renamed a lot of things in this area from "xlog" to "wal" for v10,
we may as well incur a bit more compatibility pain and make these names
all consistent.

David Rowley, minor additional docs hacking by me

Discussion: https://postgr.es/m/CAKJS1f8O0njDKe8ePFQ-LK5-EjwThsDws6ohJ-+c6nWK+oUxtg@mail.gmail.com
2017-05-11 11:49:59 -04:00
Alvaro Herrera
b66adb7b0c Revert "Permit dump/reload of not-too-large >1GB tuples"
This reverts commits fa2fa99552 and 42f50cb8fa.

While the functionality that was intended to be provided by these
commits is desired, the patch didn't actually solve as many of the
problematic situations as we hoped, and it created a bunch of its own
problems.  Since we're going to require more extensive changes soon for
other reasons and users have been working around these problems for a
long time already, there is no point in spending effort in fixing this
halfway measure.

Per complaint from Tom Lane.
Discussion: https://postgr.es/m/21407.1484606922@sss.pgh.pa.us

(Commit fa2fa99552 had already been reverted in branches 9.5 as
f858524ee4 and 9.6 as e9e44a0953, so this touches master only.
Commit 42f50cb8fa was not present in the older branches.)
2017-05-10 18:41:27 -03:00
Peter Eisentraut
489b96e80b Improve memory use in logical replication apply
Previously, the memory used by the logical replication apply worker for
processing messages would never be freed, so that could end up using a
lot of memory.  To improve that, change the existing ApplyContext memory
context to ApplyMessageContext and reset that after every
message (similar to MessageContext used elsewhere).  For consistency of
naming, rename the ApplyCacheContext to ApplyContext.

Author: Stas Kelvich <s.kelvich@postgrespro.ru>
2017-05-09 14:51:49 -04:00
Peter Eisentraut
013c1178fd Remove the NODROP SLOT option from DROP SUBSCRIPTION
It turned out this approach had problems, because a DROP command should
not have any options other than CASCADE and RESTRICT.  Instead, always
attempt to drop the slot if there is one configured, but also add an
ALTER SUBSCRIPTION action to set the slot to NONE.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/29431.1493730652@sss.pgh.pa.us
2017-05-09 10:20:42 -04:00
Tom Lane
da07596006 Further patch rangetypes_selfuncs.c's statistics slot management.
Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8,
not of the type of the column the statistics are for.

This bug is at least partly the fault of sloppy specification comments
for get_attstatsslot()/free_attstatsslot(): the type OID they want is that
of the stavalues entries, not of the underlying column.  (I double-checked
other callers and they seem to get this right.)  Adjust the comments to be
more correct.

Per buildfarm.

Security: CVE-2017-7484
2017-05-08 15:03:14 -04:00
Peter Eisentraut
e2d4ef8de8 Add security checks to selectivity estimation functions
Some selectivity estimation functions run user-supplied operators over
data obtained from pg_statistic without security checks, which allows
those operators to leak pg_statistic data without having privileges on
the underlying tables.  Fix by checking that one of the following is
satisfied: (1) the user has table or column privileges on the table
underlying the pg_statistic data, or (2) the function implementing the
user-supplied operator is leak-proof.  If neither is satisfied, planning
will proceed as if there are no statistics available.

At least one of these is satisfied in most cases in practice.  The only
situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Peter Eisentraut <peter_e@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>

Security: CVE-2017-7484
2017-05-08 09:26:32 -04:00
Heikki Linnakangas
eb61136dc7 Remove support for password_encryption='off' / 'plain'.
Storing passwords in plaintext hasn't been a good idea for a very long
time, if ever. Now seems like a good time to finally forbid it, since we're
messing with this in PostgreSQL 10 anyway.

Remove the CREATE/ALTER USER UNENCRYPTED PASSSWORD 'foo' syntax, since
storing passwords unencrypted is no longer supported. ENCRYPTED PASSWORD
'foo' is still accepted, but ENCRYPTED is now just a noise-word, it does
the same as just PASSWORD 'foo'.

Likewise, remove the --unencrypted option from createuser, but accept
--encrypted as a no-op for backward compatibility. AFAICS, --encrypted was
a no-op even before this patch, because createuser encrypted the password
before sending it to the server even if --encrypted was not specified. It
added the ENCRYPTED keyword to the SQL command, but since the password was
already in encrypted form, it didn't make any difference. The documentation
was not clear on whether that was intended or not, but it's moot now.

Also, while password_encryption='on' is still accepted as an alias for
'md5', it is now marked as hidden, so that it is not listed as an accepted
value in error hints, for example. That's not directly related to removing
'plain', but it seems better this way.

Reviewed by Michael Paquier

Discussion: https://www.postgresql.org/message-id/16e9b768-fd78-0b12-cfc1-7b6b7f238fde@iki.fi
2017-05-08 11:26:07 +03:00
Peter Eisentraut
086221cf6b Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so.  At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT.  So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.

To fix this, divide the walsender shutdown into two phases.  First, the
postmaster sends a SIGUSR2 signal to all walsenders.  The walsenders
then put themselves into the "stopping" state.  In this state, they
reject any new commands.  (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.)  The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders.  This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-05-05 10:31:42 -04:00
Heikki Linnakangas
0557a5dc2c Make SCRAM salts and nonces longer.
The salt is stored base64-encoded. With the old 10 bytes raw length, it was
always padded to 16 bytes after encoding. We might as well use 12 raw bytes
for the salt, and it's still encoded into 16 bytes.

Similarly for the random nonces, use a raw length that's divisible by 3, so
that there's no padding after base64 encoding. Make the nonces longer while
we're at it. 10 bytes was probably enough to prevent replay attacks, but
there's no reason to be skimpy here.

Per suggestion from Álvaro Hernández Tortosa.

Discussion: https://www.postgresql.org/message-id/df8c6e27-4d8e-5281-96e5-131a4e638fc8@8kdata.com
2017-05-05 10:02:13 +03:00
Heikki Linnakangas
e6e9c4da3a Misc cleanup of SCRAM code.
* Remove is_scram_verifier() function. It was unused.
* Fix sanitize_char() function, used in error messages on protocol
  violations, to print bytes >= 0x7F correctly.
* Change spelling of scram_MockSalt() function to be more consistent with
  the surroundings.
* Change a few more references to "server proof" to "server signature" that
  I missed in commit d981074c24.
2017-05-05 10:01:44 +03:00
Heikki Linnakangas
8f8b9be51f Add PQencryptPasswordConn function to libpq, use it in psql and createuser.
The new function supports creating SCRAM verifiers, in addition to md5
hashes. The algorithm is chosen based on password_encryption, by default.

This fixes the issue reported by Jeff Janes, that there was previously
no way to create a SCRAM verifier with "\password".

Michael Paquier and me

Discussion: https://www.postgresql.org/message-id/CAMkU%3D1wfBgFPbfAMYZQE78p%3DVhZX7nN86aWkp0QcCp%3D%2BKxZ%3Dbg%40mail.gmail.com
2017-05-03 11:19:07 +03:00
Tom Lane
23c6eb0336 Remove create_singleton_array(), hard-coding the case in its sole caller.
create_singleton_array() was not really as useful as we perhaps thought
when we added it.  It had never accreted more than one call site, and is
only saving a dozen lines of code at that one, which is considerably less
bulk than the function itself.  Moreover, because of its insistence on
using the caller's fn_extra cache space, it's arguably a coding hazard.
text_to_array_internal() does not currently use fn_extra in any other way,
but if it did it would be subtly broken, since the conflicting fn_extra
uses could be needed within a single query, in the seldom-tested case that
the field separator varies during the query.  The same objection seems
likely to apply to any other potential caller.

The replacement code is a bit uglier, because it hardwires knowledge of
the storage parameters of type TEXT, but it's not like we haven't got
dozens or hundreds of other places that do the same.  Uglier seems like
a good tradeoff for smaller, faster, and safer.

Per discussion with Neha Khatri.

Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
2017-05-02 20:41:37 -04:00
Tom Lane
92a43e4857 Reduce semijoins with unique inner relations to plain inner joins.
If the inner relation can be proven unique, that is it can have no more
than one matching row for any row of the outer query, then we might as
well implement the semijoin as a plain inner join, allowing substantially
more freedom to the planner.  This is a form of outer join strength
reduction, but it can't be implemented in reduce_outer_joins() because
we don't have enough info about the individual relations at that stage.
Instead do it much like remove_useless_joins(): once we've built base
relations, we can make another pass over the SpecialJoinInfo list and
get rid of any entries representing reducible semijoins.

This is essentially a followon to the inner-unique patch (commit 9c7f5229a)
and makes use of the proof machinery that that patch created.  We need only
minor refactoring of innerrel_is_unique's API to support this usage.

Per performance complaint from Teodor Sigaev.

Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
2017-05-01 14:53:42 -04:00
Peter Eisentraut
9414e41ea7 Fix logical replication launcher wake up and reset
After the logical replication launcher was told to wake up at
commit (for example, by a CREATE SUBSCRIPTION command), the flag to wake
up was not reset, so it would be woken up at every following commit as
well.  So fix that by resetting the flag.

Also, we don't need to wake up anything if the transaction was rolled
back.  Just reset the flag in that case.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-05-01 10:18:09 -04:00
Robert Haas
e180c8aa8c Fire per-statement triggers on partitioned tables.
Even though no actual tuples are ever inserted into a partitioned
table (the actual tuples are in the partitions, not the partitioned
table itself), we still need to have a ResultRelInfo for the
partitioned table, or per-statement triggers won't get fired.

Amit Langote, per a report from Rajkumar Raghuwanshi.  Reviewed by me.

Discussion: http://postgr.es/m/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
2017-05-01 08:23:01 -04:00
Robert Haas
504c2205ab Fix crash when partitioned column specified twice.
Amit Langote, reviewed by Beena Emerson

Discussion: http://postgr.es/m/6ed23d3d-c09d-4cbc-3628-0a8a32f750f4@lab.ntt.co.jp
2017-04-28 13:52:17 -04:00
Heikki Linnakangas
d981074c24 Misc SCRAM code cleanups.
* Move computation of SaltedPassword to a separate function from
  scram_ClientOrServerKey(). This saves a lot of cycles in libpq, by
  computing SaltedPassword only once per authentication. (Computing
  SaltedPassword is expensive by design.)

* Split scram_ClientOrServerKey() into two functions. Improves
  readability, by making the calling code less verbose.

* Rename "server proof" to "server signature", to better match the
  nomenclature used in RFC 5802.

* Rename SCRAM_SALT_LEN to SCRAM_DEFAULT_SALT_LEN, to make it more clear
  that the salt can be of any length, and the constant only specifies how
  long a salt we use when we generate a new verifier. Also rename
  SCRAM_ITERATIONS_DEFAULT to SCRAM_DEFAULT_ITERATIONS, for consistency.

These things caught my eye while working on other upcoming changes.
2017-04-28 15:22:38 +03:00
Andres Freund
56e19d938d Don't use on-disk snapshots for exported logical decoding snapshot.
Logical decoding stores historical snapshots on disk, so that logical
decoding can restart without having to reconstruct a snapshot from
scratch (for which the resources are not guaranteed to be present
anymore).  These serialized snapshots were also used when creating a
new slot via the walsender interface, which can export a "full"
snapshot (i.e. one that can read all tables, not just catalog ones).

The problem is that the serialized snapshots are only useful for
catalogs and not for normal user tables.  Thus the use of such a
serialized snapshot could result in an inconsistent snapshot being
exported, which could lead to queries returning wrong data.  This
would only happen if logical slots are created while another logical
slot already exists.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
2017-04-27 15:29:15 -07:00
Andres Freund
2bef06d516 Preserve required !catalog tuples while computing initial decoding snapshot.
The logical decoding machinery already preserved all the required
catalog tuples, which is sufficient in the course of normal logical
decoding, but did not guarantee that non-catalog tuples were preserved
during computation of the initial snapshot when creating a slot over
the replication protocol.

This could cause a corrupted initial snapshot being exported.  The
time window for issues is usually not terribly large, but on a busy
server it's perfectly possible to it hit it.  Ongoing decoding is not
affected by this bug.

To avoid increased overhead for the SQL API, only retain additional
tuples when a logical slot is being created over the replication
protocol.  To do so this commit changes the signature of
CreateInitDecodingContext(), but it seems unlikely that it's being
used in an extension, so that's probably ok.

In a drive-by fix, fix handling of
ReplicationSlotsComputeRequiredXmin's already_locked argument, which
should only apply to ProcArrayLock, not ReplicationSlotControlLock.

Reported-By: Erik Rijkers
Analyzed-By: Petr Jelinek
Author: Petr Jelinek, heavily editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
2017-04-27 13:13:36 -07:00
Simon Riggs
49e9281549 Rework handling of subtransactions in 2PC recovery
The bug fixed by 0874d4f3e1
caused us to question and rework the handling of
subtransactions in 2PC during and at end of recovery.
Patch adds checks and tests to ensure no further bugs.

This effectively removes the temporary measure put in place
by 546c13e11b.

Author: Simon Riggs
Reviewed-by: Tom Lane, Michael Paquier
Discussion: http://postgr.es/m/CANP8+j+vvXmruL_i2buvdhMeVv5TQu0Hm2+C5N+kdVwHJuor8w@mail.gmail.com
2017-04-27 14:41:22 +02:00
Peter Eisentraut
de43897122 Fix various concurrency issues in logical replication worker launching
The code was originally written with assumption that launcher is the
only process starting the worker.  However that hasn't been true since
commit 7c4f52409 which failed to modify the worker management code
adequately.

This patch adds an in_use field to the LogicalRepWorker struct to
indicate whether the worker slot is being used and uses proper locking
everywhere this flag is set or read.

However if the parent process dies while the new worker is starting and
the new worker fails to attach to shared memory, this flag would never
get cleared.  We solve this rare corner case by adding a sort of garbage
collector for in_use slots.  This uses another field in the
LogicalRepWorker struct named launch_time that contains the time when
the worker was started.  If any request to start a new worker does not
find free slot, we'll check for workers that were supposed to start but
took too long to actually do so, and reuse their slot.

In passing also fix possible race conditions when stopping a worker that
hasn't finished starting yet.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-04-26 10:45:59 -04:00
Tom Lane
64925603c9 Revert "Use pselect(2) not select(2), if available, to wait in postmaster's loop."
This reverts commit 81069a9efc.

Buildfarm results suggest that some platforms have versions of pselect(2)
that are not merely non-atomic, but flat out non-functional.  Revert the
use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART
patch as the source of trouble).  If it's so, we should probably look into
blacklisting specific platforms that have broken pselect.

Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
2017-04-24 18:29:03 -04:00
Tom Lane
81069a9efc Use pselect(2) not select(2), if available, to wait in postmaster's loop.
Traditionally we've unblocked signals, called select(2), and then blocked
signals again.  The code expects that the select() will be cancelled with
EINTR if an interrupt occurs; but there's a race condition, which is that
an already-pending signal will be delivered as soon as we unblock, and then
when we reach select() there will be nothing preventing it from waiting.
This can result in a long delay before we perform any action that
ServerLoop was supposed to have taken in response to the signal.  As with
the somewhat-similar symptoms fixed by commit 893902085, the main practical
problem is slow launching of parallel workers.  The window for trouble is
usually pretty short, corresponding to one iteration of ServerLoop; but
it's not negligible.

To fix, use pselect(2) in place of select(2) where available, as that's
designed to solve exactly this problem.  Where not available, we continue
to use the old way, and are no worse off than before.

pselect(2) has been required by POSIX since about 2001, so most modern
platforms should have it.  A bigger portability issue is that some
implementations are said to be non-atomic, ie pselect() isn't really
any different from unblock/select/reblock.  Still, we're no worse off
than before on such a platform.

There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point this
could be reverted, since we'd be using a self-pipe to solve the race
condition.  But that's not happening before v11 at the earliest.

Back-patch to 9.6.  The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.

Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
2017-04-24 14:03:14 -04:00
Tom Lane
8939020853 Run the postmaster's signal handlers without SA_RESTART.
The postmaster keeps signals blocked everywhere except while waiting
for something to happen in ServerLoop().  The code expects that the
select(2) will be cancelled with EINTR if an interrupt occurs; without
that, followup actions that should be performed by ServerLoop() itself
will be delayed.  However, some platforms interpret the SA_RESTART
signal flag as meaning that they should restart rather than cancel
the select(2).  Worse yet, some of them restart it with the original
timeout delay, meaning that a steady stream of signal interrupts can
prevent ServerLoop() from iterating at all if there are no incoming
connection requests.

Observable symptoms of this, on an affected platform such as HPUX 10,
include extremely slow parallel query startup (possibly as much as
30 seconds) and failure to update timestamps on the postmaster's sockets
and lockfiles when no new connections arrive for a long time.

We can fix this by running the postmaster's signal handlers without
SA_RESTART.  That would be quite a scary change if the range of code
where signals are accepted weren't so tiny, but as it is, it seems
safe enough.  (Note that postmaster children do, and must, reset all
the handlers before unblocking signals; so this change should not
affect any child process.)

There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point it might
be appropriate to revert this patch.  But that's not happening before
v11 at the earliest.

Back-patch to 9.6.  The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.

Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
2017-04-24 13:00:30 -04:00
Fujii Masao
cbc2270e3f Get rid of extern declarations of non-existent functions.
Those extern declartions were mistakenly added by commit 7c4f52409.

Author: Petr Jelinek
2017-04-25 01:31:42 +09:00