Commit Graph

17347 Commits

Author SHA1 Message Date
Tatsuo Ishii 393d47ed0f Add missing comment in postgresql.conf.
current_source requires to restart server to reflect the new
value. Per Yugo Nagata and Masahiko Sawada.

Back patched to 9.2 and beyond.
2017-07-31 11:24:51 +09:00
Tatsuo Ishii 8b015dd723 Add missing comment in postgresql.conf.
dynamic_shared_memory_type requires to restart server to reflect
the new value. Per Yugo Nagata and Masahiko Sawada.

Back pached to 9.4 and beyond.
2017-07-31 11:06:37 +09:00
Tatsuo Ishii 9fe63092b5 Add missing comment in postgresql.conf.
max_logical_replication_workers requires to restart server to reflect
the new value. Per Yugo Nagata. Minor editing by me.
2017-07-31 10:46:32 +09:00
Andres Freund cc9f08b6b8 Move ExecProcNode from dispatch to function pointer based model.
This allows us to add stack-depth checks the first time an executor
node is called, and skip that overhead on following
calls. Additionally it yields a nice speedup.

While it'd probably have been a good idea to have that check all
along, it has become more important after the new expression
evaluation framework in b8d7f053c5 - there's no stack depth
check in common paths anymore now. We previously relied on
ExecEvalExpr() being executed somewhere.

We should move towards that model for further routines, but as this is
required for v10, it seems better to only do the necessary (which
already is quite large).

Author: Andres Freund, Tom Lane
Reported-By: Julien Rouhaud
Discussion:
    https://postgr.es/m/22833.1490390175@sss.pgh.pa.us
    https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
2017-07-30 16:18:21 -07:00
Andres Freund d47cfef711 Move interrupt checking from ExecProcNode() to executor nodes.
In a followup commit ExecProcNode(), and especially the large switch
it contains, will largely be replaced by a function pointer directly
to the correct node. The node functions will then get invoked by a
thin inline function wrapper. To avoid having to include miscadmin.h
in headers - CHECK_FOR_INTERRUPTS() - move the interrupt checks into
the individual executor routines.

While looking through all executor nodes, I noticed a number of
arguably missing interrupt checks, add these too.

Author: Andres Freund, Tom Lane
Reviewed-By: Tom Lane
Discussion:
    https://postgr.es/m/22833.1490390175@sss.pgh.pa.us
2017-07-30 16:06:42 -07:00
Alvaro Herrera 5e3254f086 Update copyright in recently added files 2017-07-26 18:17:18 -04:00
Alvaro Herrera 459c64d322 Fix concurrent locking of tuple update chain
If several sessions are concurrently locking a tuple update chain with
nonconflicting lock modes using an old snapshot, and they all succeed,
it may happen that some of them fail because of restarting the loop (due
to a concurrent Xmax change) and getting an error in the subsequent pass
while trying to obtain a tuple lock that they already have in some tuple
version.

This can only happen with very high concurrency (where a row is being
both updated and FK-checked by multiple transactions concurrently), but
it's been observed in the field and can have unpleasant consequences
such as an FK check failing to see a tuple that definitely exists:
    ERROR:  insert or update on table "child_table" violates foreign key constraint "fk_constraint_name"
    DETAIL:  Key (keyid)=(123456) is not present in table "parent_table".
(where the key is observably present in the table).

Discussion: https://postgr.es/m/20170714210011.r25mrff4nxjhmf3g@alvherre.pgsql
2017-07-26 17:24:16 -04:00
Alvaro Herrera c28e4f4dc6 Remove obsolete comments about functional dependencies
Initial submitted versions of the functional dependencies patch ignored
row groups that were smaller than a configured size.  However, that
consideration was removed in late stages of the patch just before
commit, but some comments referring to it remained.  Remove them to
avoid confusion.

Author: Atsushi Torikoshi
Discussion: https://postgr.es/m/7cfb23fc-4493-9c02-5da9-e505fd0115d2@lab.ntt.co.jp
2017-07-26 11:40:39 -04:00
Alvaro Herrera 9915de6c1c Fix race conditions in replication slot operations
It is relatively easy to get a replication slot to look as still active
while one process is in the process of getting rid of it; when some
other process tries to "acquire" the slot, it would fail with an error
message of "replication slot XYZ is active for PID N".

The error message in itself is fine, except that when the intention is
to drop the slot, it is unhelpful: the useful behavior would be to wait
until the slot is no longer acquired, so that the drop can proceed.  To
implement this, we use a condition variable so that slot acquisition can
be told to wait on that condition variable if the slot is already
acquired, and we make any change in active_pid broadcast a signal on the
condition variable.  Thus, as soon as the slot is released, the drop
will proceed properly.

Reported by: Tom Lane
Discussion: https://postgr.es/m/11904.1499039688@sss.pgh.pa.us
Authors: Petr Jelínek, Álvaro Herrera
2017-07-25 13:26:49 -04:00
Robert Haas 4132dbec69 Fix partitioning crashes during error reporting.
In various places where we reverse-map a tuple before calling
ExecBuildSlotValueDescription, we neglected to ensure that the
slot descriptor matched the tuple stored in it.

Amit Langote and Amit Khandekar, reviewed by Etsuro Fujita

Discussion: http://postgr.es/m/CAJ3gD9cqpP=WvJj=dv1ONkPWjy8ZuUaOM4_x86i3uQPas=0_jg@mail.gmail.com
2017-07-24 18:08:08 -04:00
Tom Lane e2c8100e60 Fix race condition in predicate-lock init code in EXEC_BACKEND builds.
Trading a little too heavily on letting the code path be the same whether
we were creating shared data structures or only attaching to them,
InitPredicateLocks() inserted the "scratch" PredicateLockTargetHash entry
unconditionally.  This is just wrong if we're in a postmaster child,
which would only reach this code in EXEC_BACKEND builds.  Most of the
time, the hash_search(HASH_ENTER) call would simply report that the
entry already existed, causing no visible effect since the code did not
bother to check for that possibility.  However, if this happened while
some other backend had transiently removed the "scratch" entry, then
that other backend's eventual RestoreScratchTarget would suffer an
assert failure; this appears to be the explanation for a recent failure
on buildfarm member culicidae.  In non-assert builds, there would be
no visible consequences there either.  But nonetheless this is a pretty
bad bug for EXEC_BACKEND builds, for two reasons:

1. Each new backend would perform the hash_search(HASH_ENTER) call
without holding any lock that would prevent concurrent access to the
PredicateLockTargetHash hash table.  This creates a low but certainly
nonzero risk of corruption of that hash table.

2. In the event that the race condition occurred, by reinserting the
scratch entry too soon, we were defeating the entire purpose of the
scratch entry, namely to guarantee that transaction commit could move
hash table entries around with no risk of out-of-memory failure.
The odds of an actual OOM failure are quite low, but not zero, and if
it did happen it would again result in corruption of the hash table.

The user-visible symptoms of such corruption are a little hard to predict,
but would presumably amount to misbehavior of SERIALIZABLE transactions
that'd require a crash or postmaster restart to fix.

To fix, just skip the hash insertion if IsUnderPostmaster.  I also
inserted a bunch of assertions that the expected things happen
depending on whether IsUnderPostmaster is true.  That might be overkill,
since most comparable code in other functions isn't quite that paranoid,
but once burnt twice shy.

In passing, also move a couple of lines to places where they seemed
to make more sense.

Diagnosis of problem by Thomas Munro, patch by me.  Back-patch to
all supported branches.

Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us
2017-07-24 16:45:58 -04:00
Robert Haas 7086be6e36 When WCOs are present, disable direct foreign table modification.
If the user modifies a view that has CHECK OPTIONs and this gets
translated into a modification to an underlying relation which happens
to be a foreign table, the check options should be enforced.  In the
normal code path, that was happening properly, but it was not working
properly for "direct" modification because the whole operation gets
pushed to the remote side in that case and we never have an option to
enforce the constraint against individual tuples.  Fix by disabling
direct modification when there is a need to enforce CHECK OPTIONs.

Etsuro Fujita, reviewed by Kyotaro Horiguchi and by me.

Discussion: http://postgr.es/m/f8a48f54-6f02-9c8a-5250-9791603171ee@lab.ntt.co.jp
2017-07-24 15:57:24 -04:00
Tom Lane b4af9e3f37 Ensure that pg_get_ruledef()'s output matches pg_get_viewdef()'s.
Various cases involving renaming of view columns are handled by having
make_viewdef pass down the view's current relation tupledesc to
get_query_def, which then takes care to use the column names from the
tupledesc for the output column names of the SELECT.  For some reason
though, we'd missed teaching make_ruledef to do similarly when it is
printing an ON SELECT rule, even though this is exactly the same case.
The results from pg_get_ruledef would then be different and arguably wrong.
In particular, this breaks pre-v10 versions of pg_dump, which in some
situations would define views by means of emitting a CREATE RULE ... ON
SELECT command.  Third-party tools might not be happy either.

In passing, clean up some crufty code in make_viewdef; we'd apparently
modernized the equivalent code in make_ruledef somewhere along the way,
and missed this copy.

Per report from Gilles Darold.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/ec05659a-40ff-4510-fc45-ca9d965d0838@dalibo.com
2017-07-24 15:16:31 -04:00
Tom Lane 278cb43411 Be more consistent about errors for opfamily member lookup failures.
Add error checks in some places that were calling get_opfamily_member
or get_opfamily_proc and just assuming that the call could never fail.
Also, standardize the wording for such errors in some other places.

None of these errors are expected in normal use, hence they're just
elog not ereport.  But they may be handy for diagnosing omissions in
custom opclasses.

Rushabh Lathia found the oversight in RelationBuildPartitionKey();
I found the others by grepping for all callers of these functions.

Discussion: https://postgr.es/m/CAGPqQf2R9Nk8htpv0FFi+FP776EwMyGuORpc9zYkZKC8sFQE3g@mail.gmail.com
2017-07-24 11:23:27 -04:00
Tom Lane ab2324fd46 Improve comments about partitioned hash table freelists.
While I couldn't find any live bugs in commit 44ca4022f, the comments
seemed pretty far from adequate; in particular it was not made plain that
"borrowing" entries from other freelists is critical for correctness.
Try to improve the commentary.  A couple of very minor code style
tweaks, as well.

Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us
2017-07-22 18:02:26 -04:00
Alvaro Herrera de38489b92 Fix typo in comment
Commit fd31cd2651 renamed the variable to skipping_blocks, but forgot
to update this comment.

Noticed while inspecting code.
2017-07-21 20:08:53 -04:00
Teodor Sigaev 7e1fb4c59e Fix double shared memory allocation.
SLRU buffer lwlocks are allocated twice by oversight in commit
fe702a7b3f where that locks were moved to
separate tranche. The bug doesn't have user-visible effects except small
overspending of shared memory.

Backpatch to 9.6 where it was introduced.

Alexander Korotkov with small editorization by me.
2017-07-21 13:31:20 +03:00
Dean Rasheed d363d42bb9 Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition bounds.
Previously, UNBOUNDED meant no lower bound when used in the FROM list,
and no upper bound when used in the TO list, which was OK for
single-column range partitioning, but problematic with multiple
columns. For example, an upper bound of (10.0, UNBOUNDED) would not be
collocated with a lower bound of (10.0, UNBOUNDED), thus making it
difficult or impossible to define contiguous multi-column range
partitions in some cases.

Fix this by using MINVALUE and MAXVALUE instead of UNBOUNDED to
represent a partition column that is unbounded below or above
respectively. This syntax removes any ambiguity, and ensures that if
one partition's lower bound equals another partition's upper bound,
then the partitions are contiguous.

Also drop the constraint prohibiting finite values after an unbounded
column, and just document the fact that any values after MINVALUE or
MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED
multiple times, which was needlessly verbose.

Note: Forces a post-PG 10 beta2 initdb.

Report by Amul Sul, original patch by Amit Langote with some
additional hacking by me.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-21 09:20:47 +01:00
Tom Lane eb145fdfea Fix dumping of outer joins with empty qual lists.
Normally, a JoinExpr would have empty "quals" only if it came from CROSS
JOIN syntax.  However, it's possible to get to this state by specifying
NATURAL JOIN between two tables with no common column names, and there
might be other ways too.  The code previously printed no ON clause if
"quals" was empty; that's right for CROSS JOIN but syntactically invalid
if it's some type of outer join.  Fix by printing ON TRUE in that case.

This got broken by commit 2ffa740be, which stopped using NATURAL JOIN
syntax in ruleutils output due to its brittleness in the face of
column renamings.  Back-patch to 9.3 where that commit appeared.

Per report from Tushar Ahuja.

Discussion: https://postgr.es/m/98b283cd-6dda-5d3f-f8ac-87db8c76a3da@enterprisedb.com
2017-07-20 11:29:36 -04:00
Tom Lane 3cb29c42f9 Add static assertions about pg_control fitting into one disk sector.
When pg_control was first designed, sizeof(ControlFileData) was small
enough that a comment seemed like plenty to document the assumption that
it'd fit into one disk sector.  Now it's nearly 300 bytes, raising the
possibility that somebody would carelessly add enough stuff to create
a problem.  Let's add a StaticAssertStmt() to ensure that the situation
doesn't pass unnoticed if it ever occurs.

While at it, rename PG_CONTROL_SIZE to PG_CONTROL_FILE_SIZE to make it
clearer what that symbol means, and convert the existing runtime
comparisons of sizeof(ControlFileData) vs. PG_CONTROL_FILE_SIZE to be
static asserts --- we didn't have that technology when this code was
first written.

Discussion: https://postgr.es/m/9192.1500490591@sss.pgh.pa.us
2017-07-19 16:16:57 -04:00
Tom Lane 04a2c7f412 Improve make_tsvector() to handle empty input, and simplify its callers.
It seemed a bit silly that each caller of make_tsvector() was laboriously
special-casing the situation where no lexemes were found, when it would
be easy and much more bullet-proof to make make_tsvector() handle that.
2017-07-18 13:13:47 -04:00
Tom Lane b4c6d31c0b Fix serious performance problems in json(b) to_tsvector().
In an off-list followup to bug #14745, Bob Jones complained that
to_tsvector() on a 2MB jsonb value took an unreasonable amount of
time and space --- enough to draw the wrath of the OOM killer on
his machine.  On my machine, his example proved to require upwards
of 18 seconds and 4GB, which seemed pretty bogus considering that
to_tsvector() on the same data treated as text took just a couple
hundred msec and 10 or so MB.

On investigation, the problem is that the implementation scans each
string element of the json(b) and converts it to tsvector separately,
then applies tsvector_concat() to join those separate tsvectors.
The unreasonable memory usage came from leaking every single one of
the transient tsvectors --- but even without that mistake, this is an
O(N^2) or worse algorithm, because tsvector_concat() has to repeatedly
process the words coming from earlier elements.

We can fix it by accumulating all the lexeme data and applying
make_tsvector() just once.  As a side benefit, that also makes the
desired adjustment of lexeme positions far cheaper, because we can
just tweak the running "pos" counter between JSON elements.

In passing, try to make the explanation of that tweak more intelligible.
(I didn't think that a barely-readable comment far removed from the
actual code was helpful.)  And do some minor other code beautification.
2017-07-18 12:45:51 -04:00
Robert Haas c85ec643ff Reverse-convert row types in ExecWithCheckOptions.
Just as we already do in ExecConstraints, and for the same reason:
to improve the quality of error messages.

Etsuro Fujita, reviewed by Amit Langote

Discussion: http://postgr.es/m/56e0baa8-e458-2bbb-7936-367f7d832e43@lab.ntt.co.jp
2017-07-17 21:56:31 -04:00
Robert Haas f81a91db4d Use a real RT index when setting up partition tuple routing.
Before, we always used a dummy value of 1, but that's not right when
the partitioned table being modified is inside of a WITH clause
rather than part of the main query.

Amit Langote, reported and reviewd by Etsuro Fujita, with a comment
change by me.

Discussion: http://postgr.es/m/ee12f648-8907-77b5-afc0-2980bcb0aa37@lab.ntt.co.jp
2017-07-17 21:29:45 -04:00
Robert Haas 09c2e7cd2f hash: Fix write-ahead logging bugs related to init forks.
One, logging for CREATE INDEX was oblivious to the fact that when
an unlogged table is created, *only* operations on the init fork
should be logged.

Two, init fork buffers need to be flushed after they are written;
otherwise, a filesystem-level copy following recovery may do the
wrong thing.  (There may be a better fix for this issue than the
one used here, but this is transposed from the similar logic already
present in XLogReadBufferForRedoExtended, and a broader refactoring
after beta2 seems inadvisable.)

Amit Kapila, reviewed by Ashutosh Sharma, Kyotaro Horiguchi,
and Michael Paquier

Discussion: http://postgr.es/m/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w@mail.gmail.com
2017-07-17 12:03:35 -04:00
Tom Lane de2af6e001 Improve comments for execExpr.c's handling of FieldStore subexpressions.
Given this code's general eagerness to use subexpressions' output variables
as temporary workspace, it's not exactly clear that it is safe for
FieldStore to tell a newval subexpression that it can write into the same
variable that is being supplied as a potential input.  Document the chain
of assumptions needed for that to be safe.
2017-07-15 16:57:43 -04:00
Tom Lane e9b64824a0 Improve comments for execExpr.c's isAssignmentIndirectionExpr().
I got confused about why this function doesn't need to recursively
search the expression tree for a CaseTestExpr node.  After figuring
that out, add a comment to save the next person some time.
2017-07-15 14:03:39 -04:00
Tom Lane decb08ebdf Code review for NextValueExpr expression node type.
Add missing infrastructure for this node type, notably in ruleutils.c where
its lack could demonstrably cause EXPLAIN to fail.  Add outfuncs/readfuncs
support.  (outfuncs support is useful today for debugging purposes.  The
readfuncs support may never be needed, since at present it would only
matter for parallel query and NextValueExpr should never appear in a
parallelizable query; but it seems like a bad idea to have a primnode type
that isn't fully supported here.)  Teach planner infrastructure that
NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node
with cost cpu_operator_cost.  Given its limited scope of usage, there
*might* be no live bug today from the lack of that knowledge, but it's
certainly going to bite us on the rear someday.  Teach pg_stat_statements
about the new node type, too.

While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction,
XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost.
Failing to do this for SQLValueFunction was an oversight in my commit
0bb51aa96.  The others are longer-standing oversights, but no time like the
present to fix them.  (In principle, CoerceToDomain could have cost much
higher than this, but it doesn't presently seem worth trying to examine the
domain's constraints here.)

Modify execExprInterp.c to execute NextValueExpr as an out-of-line
function; it seems quite unlikely to me that it's worth insisting that
it be inlined in all expression eval methods.  Besides, providing the
out-of-line function doesn't stop anyone from inlining if they want to.

Adjust some places where NextValueExpr support had been inserted with the
aid of a dartboard rather than keeping it in the same order as elsewhere.

Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
2017-07-14 15:25:43 -04:00
Tom Lane a3ca72ae9a Fix dumping of FUNCTION RTEs that contain non-function-call expressions.
The grammar will only accept something syntactically similar to a function
call in a function-in-FROM expression.  However, there are various ways
to input something that ruleutils.c won't deparse that way, potentially
leading to a view or rule that fails dump/reload.  Fix by inserting a
dummy CAST around anything that isn't going to deparse as a function
(which is one of the ways to get something like that in there in the
first place).

In HEAD, also make use of the infrastructure added by this to avoid
emitting unnecessary parentheses in CREATE INDEX deparsing.  I did
not change that in back branches, thinking that people might find it
to be unexpected/unnecessary behavioral change.

In HEAD, also fix incorrect logic for when to add extra parens to
partition key expressions.  Somebody apparently thought they could
get away with simpler logic than pg_get_indexdef_worker has, but
they were wrong --- a counterexample is PARTITION BY LIST ((a[1])).
Ignoring the prettyprint flag for partition expressions isn't exactly
a nice solution anyway.

This has been broken all along, so back-patch to all supported branches.

Discussion: https://postgr.es/m/10477.1499970459@sss.pgh.pa.us
2017-07-13 19:25:03 -04:00
Heikki Linnakangas 74fc83869e Fix race between GetNewTransactionId and GetOldestActiveTransactionId.
The race condition goes like this:

1. GetNewTransactionId advances nextXid e.g. from 100 to 101
2. GetOldestActiveTransactionId reads the new nextXid, 101
3. GetOldestActiveTransactionId loops through the proc array. There are no
   active XIDs there, so it returns 101 as the oldest active XID.
4. GetNewTransactionid stores XID 100 to MyPgXact->xid

So, GetOldestActiveTransactionId returned XID 101, even though 100 only
just started and is surely still running.

This would be hard to hit in practice, and even harder to spot any ill
effect if it happens. GetOldestActiveTransactionId is only used when
creating a checkpoint in a master server, and the race condition can only
happen on an online checkpoint, as there are no backends running during a
shutdown checkpoint. The oldestActiveXid value of an online checkpoint is
only used when starting up a hot standby server, to determine the starting
point where pg_subtrans is initialized from. For the race condition to
happen, there must be no other XIDs in the proc array that would hold back
the oldest-active XID value, which means that the missed XID must be a top
transaction's XID. However, pg_subtrans is not used for top XIDs, so I
believe an off-by-one error is in fact inconsequential. Nevertheless, let's
fix it, as it's clearly wrong and the fix is simple.

This has been wrong ever since hot standby was introduced, so backport to
all supported versions.

Discussion: https://www.postgresql.org/message-id/e7258662-82b6-7a45-56d4-99b337a32bf7@iki.fi
2017-07-13 15:47:02 +03:00
Tom Lane bc2d716ad0 Fix ruleutils.c for domain-over-array cases, too.
Further investigation shows that ruleutils isn't quite up to speed either
for cases where we have a domain-over-array: it needs to be prepared to
look past a CoerceToDomain at the top level of field and element
assignments, else it decompiles them incorrectly.  Potentially this would
result in failure to dump/reload a rule, if it looked like the one in the
new test case.  (I also added a test for EXPLAIN; that output isn't broken,
but clearly we need more test coverage here.)

Like commit b1cb32fb6, this bug is reachable in cases we already support,
so back-patch all the way.
2017-07-12 18:00:04 -04:00
Heikki Linnakangas da11977de9 Reduce memory usage of tsvector type analyze function.
compute_tsvector_stats() detoasted and kept in memory every tsvector value
in the sample, but that can be a lot of memory. The original bug report
described a case using over 10 gigabytes, with statistics target of 10000
(the maximum).

To fix, allocate a separate copy of just the lexemes that we keep around,
and free the detoasted tsvector values as we go. This adds some palloc/pfree
overhead, when you have a lot of distinct lexemes in the sample, but it's
better than running out of memory.

Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to
all supported versions.

Discussion: https://www.postgresql.org/message-id/20170514200602.1451.46797@wrigleys.postgresql.org
2017-07-12 22:06:13 +03:00
Tom Lane 512f67c8d0 Avoid integer overflow while sifting-up a heap in tuplesort.c.
If the number of tuples in the heap exceeds approximately INT_MAX/2,
this loop's calculation "2*i+1" could overflow, resulting in a crash.
Fix it by using unsigned int rather than int for the relevant local
variables; that shouldn't cost anything extra on any popular hardware.
Per bug #14722 from Sergey Koposov.

Original patch by Sergey Koposov, modified by me per a suggestion
from Heikki Linnakangas to use unsigned int not int64.

Back-patch to 9.4, where tuplesort.c grew the ability to sort as many
as INT_MAX tuples in-memory (commit 263865a48).

Discussion: https://postgr.es/m/20170629161637.1478.93109@wrigleys.postgresql.org
2017-07-12 13:24:16 -04:00
Heikki Linnakangas ca906f68f2 Fix variable and type name in comment.
Kyotaro Horiguchi

Discussion: https://www.postgresql.org/message-id/20170711.163441.241981736.horiguchi.kyotaro@lab.ntt.co.jp
2017-07-12 17:07:35 +03:00
Heikki Linnakangas 49a3360209 Fix ordering of operations in SyncRepWakeQueue to avoid assertion failure.
Commit 14e8803f1 removed the locking in SyncRepWaitForLSN, but that
introduced a race condition, where SyncRepWaitForLSN might see
syncRepState already set to SYNC_REP_WAIT_COMPLETE, but the process was
not yet removed from the queue. That tripped the assertion, that the
process should no longer be in the uqeue. Reorder the operations in
SyncRepWakeQueue to remove the process from the queue first, and update
syncRepState only after that, and add a memory barrier in between to make
sure the operations are made visible to other processes in that order.

Fixes bug #14721 reported by Const Zhang. Analysis and fix by Thomas Munro.
Backpatch down to 9.5, where the locking was removed.

Discussion: https://www.postgresql.org/message-id/20170629023623.1480.26508%40wrigleys.postgresql.org
2017-07-12 15:30:52 +03:00
Tom Lane b1cb32fb62 Fix multiple assignments to a column of a domain type.
We allow INSERT and UPDATE commands to assign to the same column more than
once, as long as the assignments are to subfields or elements rather than
the whole column.  However, this failed when the target column was a domain
over array rather than plain array.  Fix by teaching process_matched_tle()
to look through CoerceToDomain nodes, and add relevant test cases.

Also add a group of test cases exercising domains over array of composite.
It's doubtless accidental that CREATE DOMAIN allows this case while not
allowing straight domain over composite; but it does, so we'd better make
sure we don't break it.  (I could not find any documentation mentioning
either side of that, so no doc changes.)

It's been like this for a long time, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
2017-07-11 16:48:59 -04:00
Alvaro Herrera 6c774caf0e Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: c5a8de3653bb1af6b0eb41cc6bf090c5522df52b
2017-07-10 11:53:55 -04:00
Tom Lane 45e004fb4e On Windows, retry process creation if we fail to reserve shared memory.
We've heard occasional reports of backend launch failing because
pgwin32_ReserveSharedMemoryRegion() fails, indicating that something
has already used that address space in the child process.  It's not
very clear what, given that we disable ASLR in Windows builds, but
suspicion falls on antivirus products.  It'd be better if we didn't
have to disable ASLR, anyway.  So let's try to ameliorate the problem
by retrying the process launch after such a failure, up to 100 times.

Patch by me, based on previous work by Amit Kapila and others.
This is a longstanding issue, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CAA4eK1+R6hSx6t_yvwtx+NRzneVp+MRqXAdGJZChcau8Uij-8g@mail.gmail.com
2017-07-10 11:00:09 -04:00
Andrew Gierth 1add0b15f1 Fix COPY's handling of transition tables with indexes.
Commit c46c0e5202 failed to pass the
TransitionCaptureState object to ExecARInsertTriggers() in the case
where it's using heap_multi_insert and there are indexes.  Repair.

Thomas Munro, from a report by David Fetter
Discussion: https://postgr.es/m/20170708084213.GA14720%40fetter.org
2017-07-10 11:40:08 +01:00
Tom Lane ec4073f641 Avoid unreferenced-function warning on low-functionality platforms.
On platforms lacking both locale_t and ICU, collationcmds.c failed
to make any use of its static function is_all_ascii(), thus probably
drawing a compiler warning.  Oversight in my commit ddb5fdc06.
Per buildfarm member gaur.
2017-07-08 12:42:25 -04:00
Alvaro Herrera 46e9151942 Fix typo
Noticed while reviewing code.
2017-07-07 17:10:36 -04:00
Teodor Sigaev 31b8db8e6c Fix potential data corruption during freeze
Fix oversight in 3b97e6823b bug fix. Bitwise AND is used instead of OR and
it cleans all bits in t_infomask heap tuple field.

Backpatch to 9.3
2017-07-06 17:18:55 +03:00
Dean Rasheed f1dae097f2 Clarify the contract of partition_rbound_cmp().
partition_rbound_cmp() is intended to compare range partition bounds
in a way such that if all the bound values are equal but one is an
upper bound and one is a lower bound, the upper bound is treated as
smaller than the lower bound. This particular ordering is required by
RelationBuildPartitionDesc() when building the PartitionBoundInfoData,
so that it can consistently keep only the upper bounds when upper and
lower bounds coincide.

Update the function comment to make that clearer.

Also, fix a (currently unreachable) corner-case bug -- if the bound
values coincide and they contain unbounded values, fall through to the
lower-vs-upper comparison code, rather than immediately returning
0. Currently it is not possible to define coincident upper and lower
bounds containing unbounded columns, but that may change in the
future, so code defensively.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-06 12:46:08 +01:00
Dean Rasheed c03911d945 Simplify the logic checking new range partition bounds.
The previous logic, whilst not actually wrong, was overly complex and
involved doing two binary searches, where only one was really
necessary. This simplifies that logic and improves the comments.

One visible change is that if the new partition overlaps multiple
existing partitions, the error message now always reports the overlap
with the first existing partition (the one with the lowest
bounds). The old code would sometimes report the clash with the first
partition and sometimes with the last one.

Original patch idea from Amit Langote, substantially rewritten by me.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-06 09:58:06 +01:00
Peter Eisentraut d80e73f229 Fix output of char node fields
WRITE_CHAR_FIELD() didn't do any escaping, so that for example a zero
byte would cause the whole output string to be truncated.  To fix, pass
the char through outToken(), so it is escaped like a string.  Adjust the
reading side to handle this.
2017-07-05 10:51:54 -04:00
Peter Eisentraut cb9079cd51 Improve subscription locking
This avoids "tuple concurrently updated" errors when a ALTER or DROP
SUBSCRIPTION writes to pg_subscription_rel at the same time as a worker.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-07-03 22:47:06 -04:00
Heikki Linnakangas b93827c745 Treat clean shutdown of an SSL connection same as the non-SSL case.
If the client closes an SSL connection, treat it the same as EOF on a
non-SSL connection. In particular, don't write a message in the log about
that.

Michael Paquier.

Discussion: https://www.postgresql.org/message-id/CAB7nPqSfyVV42Q2acFo%3DvrvF2gxoZAMJLAPq3S3KkjhZAYi7aw@mail.gmail.com
2017-07-03 14:51:51 +03:00
Peter Eisentraut d8b3c81335 Refine memory allocation in ICU conversions
The simple calculations done to estimate the size of the output buffers
for ucnv_fromUChars() and ucnv_toUChars() could overflow int32_t for
large strings.  To avoid that, go the long way and run the function
first without an output buffer to get the correct output buffer size
requirement.
2017-07-01 23:08:37 -04:00
Tom Lane f32678c016 Reduce delay for last logicalrep feedback message when master goes idle.
The regression tests contain numerous cases where we do some activity on a
master server and then wait till the slave has ack'd flushing its copy of
that transaction.  Because WAL flush on the slave is asynchronous to the
logicalrep worker process, the worker cannot send such a feedback message
during the LogicalRepApplyLoop iteration where it processes the last data
from the master.  In the previous coding, the feedback message would come
out only when the loop's WaitLatchOrSocket call returned WL_TIMEOUT.  That
requires one full second of delay (NAPTIME_PER_CYCLE); and to add insult
to injury, it could take more than that if the WaitLatchOrSocket was
interrupted a few times by latch-setting events.

In reality we can expect the slave's walwriter process to have flushed the
WAL data after, more or less, WalWriterDelay (typically 200ms).  Hence,
if there are unacked transactions pending, make the wait delay only that
long rather than the full NAPTIME_PER_CYCLE.  Also, move one of the
send_feedback() calls into the loop main line, so that we'll check for the
need to send feedback even if we were woken by a latch event and not either
socket data or timeout.

It's not clear how much this matters for production purposes, but
it's definitely helpful for testing.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
2017-07-01 12:15:51 -04:00
Tom Lane 799f8bc76a Shorten timeouts while waiting for logicalrep worker slot attach/detach.
When waiting for a logical replication worker process to start or stop,
we have to busy-wait until we see it add or remove itself from the
LogicalRepWorker slot in shared memory.  Those loops were using a
one-second delay between checks, but on any reasonably modern machine, it
doesn't take more than a couple of msec for a worker to spawn or shut down.
Reduce the loop delays to 10ms to avoid wasting quite so much time in the
related regression tests.

In principle, a better solution would be to fix things so that the waiting
process can be awakened via its latch at the right time.  But that seems
considerably more invasive, which is undesirable for a post-beta fix.
Worker start/stop performance likely isn't of huge interest anyway for
production purposes, so we might not ever get around to it.

In passing, rearrange the second wait loop in logicalrep_worker_stop()
so that the lock is held at the top of the loop, thus saving one lock
acquisition/release per call, and making it look more like the other loop.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
2017-07-01 11:59:44 -04:00