Commit Graph

17923 Commits

Author SHA1 Message Date
Robert Haas 19c47e7c82 Factor error generation out of ExecPartitionCheck.
At present, we always raise an ERROR if the partition constraint
is violated, but a pending patch for UPDATE tuple routing will
consider instead moving the tuple to the correct partition.
Refactor to make that simpler.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.

Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
2018-01-05 15:22:33 -05:00
Alvaro Herrera df9f682c7b Fix failure to delete spill files of aborted transactions
Logical decoding's reorderbuffer.c may spill transaction files to disk
when transactions are large.  These are supposed to be removed when they
become "too old" by xid; but file removal requires the boundary LSNs of
the transaction to be known.  The final_lsn is only set when we see the
commit or abort record for the transaction, but nothing sets the value
for transactions that crash, so the removal code misbehaves -- in
assertion-enabled builds, it crashes by a failed assertion.

To fix, modify the final_lsn of transactions that don't have a value
set, to the LSN of the very latest change in the transaction.  This
causes the spilled files to be removed appropriately.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada
Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp
2018-01-05 12:17:10 -03:00
Peter Eisentraut 054e8c6cdb Another attempt at fixing build with various OpenSSL versions
It seems we can't easily work around the lack of
X509_get_signature_nid(), so revert the previous attempts and just
disable the tls-server-end-point feature if we don't have it.
2018-01-04 19:09:27 -05:00
Peter Eisentraut 1834c1e432 Add missing includes
<openssl/x509.h> is necessary to look into the X509 struct, used by
ac3ff8b1d8.
2018-01-04 17:56:09 -05:00
Robert Haas ef6087ee5f Minor preparatory refactoring for UPDATE row movement.
Generalize is_partition_attr to has_partition_attrs and make it
accessible from outside tablecmds.c.  Change map_partition_varattnos
to clarify that it can be used for mapping between any two relations
in a partitioning hierarchy, not just parent -> child.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Some comment changes by me.

Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
2018-01-04 16:25:49 -05:00
Peter Eisentraut ac3ff8b1d8 Fix build with older OpenSSL versions
Apparently, X509_get_signature_nid() is only in fairly new OpenSSL
versions, so use the lower-level interface it is built on instead.
2018-01-04 16:22:06 -05:00
Robert Haas cc6337d2fe Simplify and encapsulate tuple routing support code.
Instead of having ExecSetupPartitionTupleRouting return multiple out
parameters, have it return a pointer to a structure containing all of
those different things.  Also, provide and use a cleanup function,
ExecCleanupTupleRouting, instead of cleaning up all of the resources
allocated by ExecSetupPartitionTupleRouting individually.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me

Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
2018-01-04 15:48:15 -05:00
Peter Eisentraut d3fb72ea6d Implement channel binding tls-server-end-point for SCRAM
This adds a second standard channel binding type for SCRAM.  It is
mainly intended for third-party clients that cannot implement
tls-unique, for example JDBC.

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-04 15:29:50 -05:00
Peter Eisentraut f3049a603a Refactor channel binding code to fetch cbind_data only when necessary
As things stand now, channel binding data is fetched from OpenSSL and
saved into the SCRAM exchange context for any SSL connection attempted
for a SCRAM authentication, resulting in data fetched but not used if no
channel binding is used or if a different channel binding type is used
than what the data is here for.

Refactor the code in such a way that binding data is fetched from the
SSL stack only when a specific channel binding is used for both the
frontend and the backend.  In order to achieve that, save the libpq
connection context directly in the SCRAM exchange state, and add a
dependency to SSL in the low-level SCRAM routines.

This makes the interface in charge of initializing the SCRAM context
cleaner as all its data comes from either PGconn* (for frontend) or
Port* (for the backend).

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-04 13:55:12 -05:00
Peter Eisentraut 3ad2afc2e9 Define LDAPS_PORT if it's missing and disable implicit LDAPS on Windows
Some versions of Windows don't define LDAPS_PORT.

Also, Windows' ldap_sslinit() is documented to use LDAPS even if you
said secure=0 when the port number happens to be 636 or 3269.  Let's
avoid using the port number to imply that you want LDAPS, so that
connection strings have the same meaning on Windows and Unix.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm%3D23B7GV4AUz3MYH1TKpTv030VHxD2Sn%2BLYWDv8d-qWxww%40mail.gmail.com
2018-01-04 10:34:41 -05:00
Robert Haas c759395617 Code review for Parallel Append.
- Remove unnecessary #include mistakenly added in execnodes.h.
- Fix mistake in comment in choose_next_subplan_for_leader.
- Adjust row estimates in cost_append for a possibly-different
  parallel divisor.
- Clamp row estimates in cost_append after operations that may
  not produce integers.

Amit Kapila, with cosmetic adjustments by me.

Discussion: http://postgr.es/m/CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com
2018-01-04 07:56:09 -05:00
Tom Lane 47c6772eb7 Clean up tupdesc.c for recent changes.
TupleDescCopy needs to have the same effects as CreateTupleDescCopy in
that, since it doesn't copy constraints, it should clear the per-attribute
fields associated with them.  Oversight in commit cc5f81366.

Since TupleDescCopy has already established the presumption that it
can just flat-copy the entire attribute array in one go, propagate
that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr.
(I'm suspicious that this would lead to valgrind complaints if we
had any trailing padding in the struct, but we do not, and anyway
fixing that seems like a job for a separate commit.)

Add some better comments.

Thomas Munro, reviewed by Vik Fearing, some additional hacking by me

Discussion: https://postgr.es/m/CAEepm=0NvOGZ8B6GbQyQe2C_c2m3LKJ9w=8OMBaYRLgZ_Gw6Nw@mail.gmail.com
2018-01-03 17:53:41 -05:00
Alvaro Herrera bab2969867 Fix typo
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no
2018-01-03 19:12:06 -03:00
Alvaro Herrera 3c27944fb2 Make XactLockTableWait work for transactions that are not yet self-locked
XactLockTableWait assumed that its xid argument has already added itself
to the lock table.  That assumption led to another assumption that if
locking the xid has succeeded but the xid is reported as still in
progress, then the input xid must have been a subtransaction.

These assumptions hold true for the original uses of this code in
locking related to on-disk tuples, but they break down in logical
replication slot snapshot building -- in particular, when a standby
snapshot logged contains an xid that's already in ProcArray but not yet
in the lock table.  This leads to assertion failures that can be
reproduced all the way back to 9.4, when logical decoding was
introduced.

To fix, change SubTransGetParent to SubTransGetTopmostTransaction which
has a slightly different API: it returns the argument Xid if there is no
parent, and it goes all the way to the top instead of moving up the
levels one by one.  Also, to avoid busy-waiting, add a 1ms sleep to give
the other process time to register itself in the lock table.

For consistency, change ConditionalXactLockTableWait the same way.

Author: Petr Jelínek
Discussion: https://postgr.es/m/1B3E32D8-FCF4-40B4-AEF9-5C0E3AC57969@postgrespro.ru
Reported-by: Konstantin Knizhnik
Diagnosed-by: Stas Kelvich, Petr Jelínek
Reviewed-by: Andres Freund, Robert Haas
2018-01-03 17:26:20 -03:00
Tom Lane 6fcde24063 Fix some minor errors in new PHJ code.
Correct ExecParallelHashTuplePrealloc's estimate of whether the
space_allowed limit is exceeded.  Be more consistent about tuples that
are exactly HASH_CHUNK_THRESHOLD in size (they're "small", not "large").
Neither of these things explain the current buildfarm unhappiness, but
they're still bugs.

Thomas Munro, per gripe by me

Discussion: https://postgr.es/m/CAEepm=34PDuR69kfYVhmZPgMdy8pSA-MYbpesEN1SR+2oj3Y+w@mail.gmail.com
2018-01-03 12:53:49 -05:00
Tom Lane 3decd150a2 Teach eval_const_expressions() to handle some more cases.
Add some infrastructure (mostly macros) to make it easier to write
typical cases for constant-expression simplification.  Add simplification
processing for ArrayRef, RowExpr, and ScalarArrayOpExpr node types,
which formerly went unsimplified even if all their inputs were constants.
Also teach it to simplify FieldSelect from a composite constant.
Make use of the new infrastructure to reduce the amount of code needed
for the existing ArrayExpr and ArrayCoerceExpr cases.

One existing test case changes output as a result of the fact that
RowExpr can now be folded to a constant.  All the new code is exercised
by existing test cases according to gcov, so I feel no need to add
additional tests.

Tom Lane, reviewed by Dmitry Dolgov

Discussion: https://postgr.es/m/3be3b82c-e29c-b674-2163-bf47d98817b1@iki.fi
2018-01-03 12:35:09 -05:00
Peter Eisentraut 35c0754fad Allow ldaps when using ldap authentication
While ldaptls=1 provides an RFC 4513 conforming way to do LDAP
authentication with TLS encryption, there was an earlier de facto
standard way to do LDAP over SSL called LDAPS.  Even though it's not
enshrined in a standard, it's still widely used and sometimes required
by organizations' network policies.  There seems to be no reason not to
support it when available in the client library.  Therefore, add support
when using OpenLDAP 2.4+ or Windows.  It can be configured with
ldapscheme=ldaps or ldapurl=ldaps://...

Add tests for both ways of requesting LDAPS and a test for the
pre-existing ldaptls=1.  Modify the 001_auth.pl test for "diagnostic
messages", which was previously relying on the server rejecting
ldaptls=1.

Author: Thomas Munro
Reviewed-By: Peter Eisentraut
Discussion: https://postgr.es/m/CAEepm=1s+pA-LZUjQ-9GQz0Z4rX_eK=DFXAF1nBQ+ROPimuOYQ@mail.gmail.com
2018-01-03 10:11:26 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Andres Freund f9ccf92e16 Simplify representation of aggregate transition values a bit.
Previously aggregate transition values for hash and other forms of
aggregation (i.e. sort and no group by) were represented
differently. Hash based aggregation used a grouping set indexed array
pointing to an array of transition values, whereas other forms of
aggregation used one flattened array with the index being computed out
of grouping set and transition offsets.

That made upcoming changes hard, so represent both as grouping set
indexed array of per-group data.

As a nice side-effect this also makes aggregation slightly faster,
because computing offsets with `transno + (setno * numTrans)` turns
out not to be that cheap (too big for x86 lea for example).

Author: Andres Freund
Discussion: https://postgr.es/m/20171128003121.nmxbm2ounxzb6n2t@alap3.anarazel.de
2018-01-02 18:23:37 -08:00
Tom Lane 5dc692f78d Ensure proper alignment of tuples in HashMemoryChunkData buffers.
The previous coding relied (without any documentation) on the data[]
member of HashMemoryChunkData being at a MAXALIGN'ed offset.  If it
was not, the tuples would not be maxaligned either, leading to failures
on alignment-picky machines.  While there seems to be no live bug on any
platform we support, this is clearly pretty fragile: any addition to or
rearrangement of the fields in HashMemoryChunkData could break it.
Let's remove the hazard by getting rid of the data[] member and instead
using pointer arithmetic with an explicitly maxalign'ed offset.

Discussion: https://postgr.es/m/14483.1514938129@sss.pgh.pa.us
2018-01-02 21:23:06 -05:00
Alvaro Herrera 54eff5311d Fix deadlock hazard in CREATE INDEX CONCURRENTLY
Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
supposed to be able to work in parallel, as evidenced by fixes in commit
c3d09b3bd2 specifically to support this case.  In reality, one of the
sessions would be aborted by a misterious "deadlock detected" error.

Jeff Janes diagnosed that this is because of leftover snapshots used for
system catalog scans -- this was broken by 8aa3e47510 keeping track of
(registering) the catalog snapshot.  To fix the deadlocks, it's enough
to de-register that snapshot prior to waiting.

Backpatch to 9.4, which introduced MVCC catalog scans.

Include an isolationtester spec that 8 out of 10 times reproduces the
deadlock with the unpatched code for me (Álvaro).

Author: Jeff Janes
Diagnosed-by: Jeff Janes
Reported-by: Jeremy Finzel
Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com
2018-01-02 19:16:16 -03:00
Peter Eisentraut 438036264a Don't cast between GinNullCategory and bool
The original idea was that we could use an isNull-style bool array
directly as a GinNullCategory array.  However, the existing code already
acknowledges that that doesn't really work, because of the possibility
that bool as currently defined can have arbitrary bit patterns for true
values.  So it has to loop through the nullFlags array to set each bool
value to an acceptable value.  But if we are looping through the whole
array anyway, we might as well build a proper GinNullCategory array
instead and abandon the type casting.  That makes the code much safer in
case bool is ever changed to something else.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-02 12:20:56 -05:00
Andres Freund 93ea78b17c Fix EXPLAIN ANALYZE output for Parallel Hash.
In a race case, EXPLAIN ANALYZE could fail to display correct nbatch
and size information.  Refactor so that participants report only on
batches they worked on rather than trying to report on all of them,
and teach explain.c to consider the HashInstrumentation object from
all participants instead of picking the first one it can find.  This
should fix an occasional build farm failure in the "join" regression
test.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us
2018-01-01 14:38:23 -08:00
Andres Freund b40933101c Perform slot validity checks in a separate pass over expression.
This reduces code duplication a bit, but the primary benefit that it
makes JITing expression evaluation easier. When doing so we can't, as
previously done in the interpreted case, really change opcode without
recompiling. Nor dow we just carry around unnecessary branches to
avoid re-checking over and over.

As a minor side-effect this makes ExecEvalStepOp() O(log(N)) rather
than O(N).

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2017-12-29 12:45:25 -08:00
Andres Freund 4717fdb14c Rely on executor utils to build targetlist for DML RETURNING.
This is useful because it gets rid of the sole direct user of
ExecAssignResultType(). A future commit will likely make use of that
and combine creating the targetlist with the initialization of the
result slot. But it seems like good code hygiene anyway.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2017-12-29 12:26:29 -08:00
Magnus Hagander d02974e32e Properly set base backup backends to active in pg_stat_activity
When walsenders were included in pg_stat_activity, only the ones
actually streaming WAL were listed as active when they were active. In
particular, the connections sending base backups were listed as being
idle. Which means that a regular pg_basebackup would show up with one
active and one idle connection, when both were active.

This patch updates to set all walsenders to active when they are
(including those doing very fast things like IDENTIFY_SYSTEM), and then
back to idle. Details about exactly what they are doing is available in
pg_stat_replication.

Patch by me, review by Michael Paquier and David Steele.
2017-12-29 16:28:32 +01:00
Simon Riggs 48c9f49265 Fix race condition when changing synchronous_standby_names
A momentary window exists when synchronous_standby_names
changes that allows commands issued after the change to
continue to act as async until the change becomes visible.
Remove the race by using more appropriate test in syncrep.c

Author: Asim Rama Praveen and Ashwin Agrawal
Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen
Reviewed-by: Michael Paquier, Masahiko Sawada
2017-12-29 14:30:33 +00:00
Simon Riggs 2958a672b1 Extend near-wraparound hints to include replication slots
Author: Feike Steenbergen
Reviewed-by: Michael Paquier
2017-12-29 14:01:25 +00:00
Andres Freund f83040c62a Fix rare assertion failure in parallel hash join.
When a backend runs out of inner tuples to hash, it should detach from
grow_batch_barrier only after it has flushed all batches to disk and
merged counters, not before.  Otherwise a concurrent backend in
ExecParallelHashIncreaseNumBatches() could stop waiting for this
backend and try to read tuples before they have been written.  This
commit reorders those operations and should fix the assertion failures
seen occasionally on the build farm since commit
1804284042.

Author: Thomas Munro
Discussion: https://postgr.es/m/E1eRwXy-0004IK-TO%40gemulon.postgresql.org
2017-12-28 02:41:53 -08:00
Alvaro Herrera be2343221f Protect against hypothetical memory leaks in RelationGetPartitionKey
Also, fix a comment that commit 8a0596cb65 made obsolete.

Reported-by: Robert Haas
Discussion: http://postgr.es/m/CA+TgmoYbpuUUUp2GhYNwWm0qkah39spiU7uOiNXLz20ASfKYoA@mail.gmail.com
2017-12-27 18:06:14 -03:00
Robert Haas 62d02f39e7 Fix race-under-concurrency in PathNameCreateTemporaryDir.
Thomas Munro

Discussion: http://postgr.es/m/CAEepm=1Vp1e3KtftLtw4B60ZV9teNeKu6HxoaaBptQMsRWjJbQ@mail.gmail.com
2017-12-27 10:56:14 -08:00
Teodor Sigaev ad337c76b6 Update relation's stats in pg_class during vacuum full.
Hash index depends on estimation of numbers of tuples and pages of relations,
incorrect value could be a reason of significantly growing of index. Vacuum
full recreates heap and reindex all indexes before renewal stats. The patch
fixes that, so indexes will see correct values.

Backpatch to v10 only because earlier versions haven't usable hash index and
growing of hash index is a single user-visible symptom.

Author: Amit Kapila
Reviewed-by: Ashutosh Sharma, me
Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net
2017-12-27 18:25:37 +03:00
Teodor Sigaev ff963b393c Add polygon opclass for SP-GiST
Polygon opclass uses compress method feature of SP-GiST added earlier. For now
it's a single operator class which uses this feature. SP-GiST actually indexes
a bounding boxes of input polygons, so part of supported operations are lossy.
Opclass uses most methods of corresponding opclass over boxes of SP-GiST and
treats bounding boxes as point in 4D-space.

Bump catalog version.

Authors: Nikita Glukhov, Alexander Korotkov with minor editorization by me
Reviewed-By: all authors + Darafei Praliaskouski
Discussion: https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru
2017-12-25 18:59:38 +03:00
Andres Freund 4e2970f880 Fix assert with side effects in the new PHJ code.
Instead of asserting the assert just set the value to what it was
supposed to test...

Per coverity.
2017-12-24 02:57:55 -08:00
Tom Lane c4c2885cbb Fix UNION/INTERSECT/EXCEPT over no columns.
Since 9.4, we've allowed the syntax "select union select" and variants
of that.  However, the planner wasn't expecting a no-column set operation
and ended up treating the set operation as if it were UNION ALL.

Turns out it's trivial to fix in v10 and later; we just need to be careful
about not generating a Sort node with no sort keys.  However, since a weird
corner case like this is never going to be exercised by developers, we'd
better have thorough regression tests if we want to consider it supported.

Per report from Victor Yegorov.

Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com
2017-12-22 12:08:06 -05:00
Teodor Sigaev 854823fa33 Add optional compression method to SP-GiST
Patch allows to have different types of column and value stored in leaf tuples
of SP-GiST. The main application of feature is to transform complex column type
to simple indexed type or for truncating too long value, transformation could
be lossy.  Simple example: polygons are converted to their bounding boxes,
this opclass follows.

Authors: me, Heikki Linnakangas, Alexander Korotkov, Nikita Glukhov
Reviewed-By: all authors + Darafei Praliaskouski
Discussions:
https://www.postgresql.org/message-id/5447B3FF.2080406@sigaev.ru
https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru#54907069.1030506@sigaev.ru
2017-12-22 13:33:16 +03:00
Alvaro Herrera 9373baa0f7 Minor edits to catalog files and scripts
This fixes a few typos and small mistakes; it also cleans a few
minor stylistic issues.  The biggest functional change is that
Gen_fmgrtab.pl no longer knows the OID of language 'internal'.

Author: John Naylor
Discussion: https://postgr.es/m/CAJVSVGXAkwbk-A9QHHHf00N905kKisyQbaYwKqaRpze_gPXGfg@mail.gmail.com
2017-12-21 19:07:32 -03:00
Robert Haas cce1ecfc77 Adjust assertion in GetCurrentCommandId.
currentCommandIdUsed is only used to skip redundant increments of the
command counter, and CommandCounterIncrement() is categorically denied
under parallelism anyway.  Therefore, it's OK for
GetCurrentCommandId() to mark the counter value used, as long as it
happens in the leader, not a worker.

Prior to commit e9baa5e9fa, the slightly
incorrect check didn't matter, but now it does.  A test case added by
commit 1804284042 uncovered the problem
by accident; it caused failures with force_parallel_mode=on/regress.

Report and review by Andres Freund.  Patch by me.

Discussion: http://postgr.es/m/20171221143106.5lhtygohvmazli3x@alap3.anarazel.de
2017-12-21 13:19:59 -05:00
Tom Lane 6719b238e8 Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit.
This patch does three interrelated things:

* Create a new expression execution step type EEOP_PARAM_CALLBACK
and add the infrastructure needed for add-on modules to generate that.
As discussed, the best control mechanism for that seems to be to add
another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly, since it can't be retrieved
from the parent plan node's EState.

* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual.  This also lets us get rid
of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
decide which param IDs should be accessible or not.  plpgsql_param_fetch
was already doing the identical masking check, so having callers do it too
seemed redundant.  While I was at it, I added a "speculative" flag to
paramFetch that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide values
of non-DTYPE_VAR variables to the planner for fear of triggering premature
"record not assigned yet" or "field not found" errors during planning.

* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for.  This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal.  copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.)  This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).

Performance testing shows this to be at worst a break-even change,
and it can provide wins ranging up to 20% in test cases involving
accesses to fields of "record" variables.  The fact that values of
such variables can now be exposed to the planner might produce wins
in some situations, too, but I've not pursued that angle.

In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer in a
transient field in ExprState.  The ParamListInfo pointer for a stand-alone
expression is handled the same way; we'd otherwise have had to add
yet another recursively-passed-down argument in expression compilation.

Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
2017-12-21 12:57:45 -05:00
Alvaro Herrera 8a0596cb65 Get rid of copy_partition_key
That function currently exists to avoid leaking memory in
CacheMemoryContext in case of trouble while the partition key is being
built, but there's a better way: allocate everything in a memcxt that
goes away if the current (sub)transaction fails, and once the partition
key is built and no further errors can occur, make the memcxt permanent
by making it a child of CacheMemoryContext.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
2017-12-21 14:21:39 -03:00
Alvaro Herrera 9ef6aba1d3 Fix typo 2017-12-21 13:36:52 -03:00
Tom Lane c98c35cd08 Avoid putting build-location-dependent strings into generated files.
Various Perl scripts we use to generate files were in the habit of
printing things like "generated by $0" into their output files.
That looks like a fine idea at first glance, but it results in
non-reproducible output, because in VPATH builds $0 won't be just
the name of the script file, but a full path for it.  We'd prefer
that you get identical results whether using VPATH or not, so this
is a bad thing.

Some of these places also printed their input file name(s), causing
an additional hazard of the same type.

Hence, establish a policy that thou shalt not print $0, nor input file
pathnames, into output files (they're still allowed in error messages,
though).  Instead just write the script name verbatim.  While we are at
it, we can make these annotations more useful by giving the script's
full relative path name within the PG source tree, eg instead of
Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.

Not all of the changes made here actually affect any files shipped
in finished tarballs today, but it seems best to apply the policy
everyplace so that nobody copies unsafe code into places where it
could matter.

Christoph Berg and Tom Lane

Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
2017-12-21 10:57:06 -05:00
Robert Haas 59d1e2b95a Cancel CV sleep during subtransaction abort.
Generally, error recovery paths that need to do things like
LWLockReleaseAll and pgstat_report_wait_end also need to call
ConditionVariableCancelSleep, but AbortSubTransaction was missed.

Since subtransaction abort might destroy up the DSM segment that
contains the ConditionVariable stored in cv_sleep_target, this
can result in a crash for anything using condition variables.

Reported and diagnosed by Andres Freund.

Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de
2017-12-21 09:24:30 -05:00
Andres Freund 1804284042 Add parallel-aware hash joins.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash.  While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory.  If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables.  This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes.  Another is that
outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-21 00:43:41 -08:00
Robert Haas f94eec490b When passing query strings to workers, pass the terminating \0.
Otherwise, when the query string is read, we might trailing garbage
beyond the end, unless there happens to be a \0 there by good luck.

Report and patch by Thomas Munro. Reviewed by Rafia Sabih.

Discussion: http://postgr.es/m/CAEepm=2SJs7X+_vx8QoDu8d1SMEOxtLhxxLNzZun_BvNkuNhrw@mail.gmail.com
2017-12-20 17:26:50 -05:00
Robert Haas 8526bcb2df Try again to fix accumulation of parallel worker instrumentation.
When a Gather or Gather Merge node is started and stopped multiple
times, accumulate instrumentation data only once, at the end, instead
of after each execution, to avoid recording inflated totals.

Commit 778e78ae9f, the previous attempt
at a fix, instead reset the state after every execution, which worked
for the general instrumentation data but had problems for the additional
instrumentation specific to Sort and Hash nodes.

Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
following a design proposal from Thomas Munro, with a comment tweak
by me.

Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
2017-12-19 12:21:56 -05:00
Robert Haas 38fc54703e Re-fix wrong costing of Sort under Gather Merge.
Commit dc02c7bca4 changed this call
to create_sort_path() to take -1 rather than limit_tuples because,
at that time, there was no way for a Sort beneath a Gather Merge
to become a top-N sort.

Later, commit 3452dc5240 provided
a way for a Sort beneath a Gather Merge to become a top-N sort,
but failed to revert the previous commit in the process.  Do that.

Report and analysis by Jeff Janes; patch by Thomas Munro; review by
Amit Kapila and by me.

Discussion: http://postgr.es/m/CAEepm=1BWtC34vUroA0Uqjw02MaqdUrW+d6WD85_k8SLyPiKHQ@mail.gmail.com
2017-12-19 10:42:17 -05:00
Andres Freund ab9e0e718a Add shared tuplestores.
SharedTuplestore allows multiple participants to write into it and
then read the tuples back from it in parallel.  Each reader receives
partial results.

For now it always uses disk files, but other buffering policies and
other kinds of scans (ie each reader receives complete results) may be
useful in future.

The upcoming parallel hash join feature will use this facility.

Author: Thomas Munro
Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas
Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
2017-12-18 14:23:19 -08:00
Peter Eisentraut 25d532698d Move SCRAM-related name definitions to scram-common.h
Mechanism names for SCRAM and channel binding names have been included
in scram.h by the libpq frontend code, and this header references a set
of routines which are only used by the backend.  scram-common.h is on
the contrary usable by both the backend and libpq, so getting those
names from there seems more reasonable.

Author: Michael Paquier <michael.paquier@gmail.com>
2017-12-18 16:59:48 -05:00
Fujii Masao 56a95ee511 Fix bug in cancellation of non-exclusive backup to avoid assertion failure.
Previously an assertion failure occurred when pg_stop_backup() for
non-exclusive backup was aborted while it's waiting for WAL files to
be archived. This assertion failure happened in do_pg_abort_backup()
which was called when a non-exclusive backup was canceled.
do_pg_abort_backup() assumes that there is at least one non-exclusive
backup running when it's called. But pg_stop_backup() can be canceled
even after it marks the end of non-exclusive backup (e.g.,
during waiting for WAL archiving). This broke the assumption that
do_pg_abort_backup() relies on, and which caused an assertion failure.

This commit changes do_pg_abort_backup() so that it does nothing
when non-exclusive backup has been already marked as completed.
That is, the asssumption is also changed, and do_pg_abort_backup()
now can handle even the case where it's called when there is
no running backup.

Backpatch to 9.6 where SQL-callable non-exclusive backup was added.

Author: Masahiko Sawada and Michael Paquier
Reviewed-By: Robert Haas and Fujii Masao
Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
2017-12-19 03:46:14 +09:00
Robert Haas fd7c0fa732 Fix crashes on plans with multiple Gather (Merge) nodes.
es_query_dsa turns out to be broken by design, because it supposes
that there is only one DSA for the whole query, whereas there is
actually one per Gather (Merge) node.  For now, work around that
problem by setting and clearing the pointer around the sections of
code that might need it.  It's probably a better idea to get rid of
es_query_dsa altogether in favor of having each node keep track
individually of which DSA is relevant, but that seems like more than
we would want to back-patch.

Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit
Kapila, and by me.

Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
2017-12-18 12:22:31 -05:00
Magnus Hagander 7731c32087 Fix typo on comment
Author: David Rowley
2017-12-18 11:24:55 +01:00
Tom Lane b31a9d7dd3 Suppress compiler warning about no function return value.
Compilers that don't know that ereport(ERROR) doesn't return
complained about the new coding in scanint8() introduced by
commit 101c7ee3e.  Tweak coding to avoid the warning.
Per buildfarm.
2017-12-17 00:41:41 -05:00
Andres Freund 699bf7d05c Perform a lot more sanity checks when freezing tuples.
The previous commit has shown that the sanity checks around freezing
aren't strong enough. Strengthening them seems especially important
because the existance of the bug has caused corruption that we don't
want to make even worse during future vacuum cycles.

The errors are emitted with ereport rather than elog, despite being
"should never happen" messages, so a proper error code is emitted. To
avoid superflous translations, mark messages as internal.

Author: Andres Freund and Alvaro Herrera
Reviewed-By: Alvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
2017-12-14 18:20:47 -08:00
Andres Freund 9c2f0a6c3c Fix pruning of locked and updated tuples.
Previously it was possible that a tuple was not pruned during vacuum,
even though its update xmax (i.e. the updating xid in a multixact with
both key share lockers and an updater) was below the cutoff horizon.

As the freezing code assumed, rightly so, that that's not supposed to
happen, xmax would be preserved (as a member of a new multixact or
xmax directly). That causes two problems: For one the tuple is below
the xmin horizon, which can cause problems if the clog is truncated or
once there's an xid wraparound. The bigger problem is that that will
break HOT chains, which in turn can lead two to breakages: First,
failing index lookups, which in turn can e.g lead to constraints being
violated. Second, future hot prunes / vacuums can end up making
invisible tuples visible again. There's other harmful scenarios.

Fix the problem by recognizing that tuples can be DEAD instead of
RECENTLY_DEAD, even if the multixactid has alive members, if the
update_xid is below the xmin horizon. That's safe because newer
versions of the tuple will contain the locking xids.

A followup commit will harden the code somewhat against future similar
bugs and already corrupted data.

Author: Andres Freund, with changes by Alvaro Herrera
Reported-By: Daniel Wood
Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter
   Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier
Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier
Discussion:
    https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
    https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
2017-12-14 18:20:47 -08:00
Andrew Dunstan 0fedb4ea69 Fix walsender timeouts when decoding a large transaction
The logical slots have a fast code path for sending data so as not to
impose too high a per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that a transaction with a large number of changes may take a
very long time to be processed and sent to the client. This causes the
walsender to ignore interrupts for potentially a long time and more
importantly it will result in the walsender being killed due to
timeout at the end of such a transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when the last keepalive check happened less
than half the walsender timeout ago. Otherwise the slower code path will
be taken.

Backpatched to 9.4

Petr Jelinek, reviewed by  Kyotaro HORIGUCHI, Yura Sokolov,  Craig
Ringer and Robert Haas.

Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
2017-12-14 11:13:14 -05:00
Andres Freund 538d114f6d Allow executor nodes to change their ExecProcNode function.
In order for executor nodes to be able to change their ExecProcNode function
after ExecInitNode() has finished, provide ExecSetExecProcNode().  This allows
any wrappers functions that only execProcnode.c knows about to be reinstalled.
The motivation for wanting to change ExecProcNode after ExecInitNode() has
finished is that it is not known until later whether parallel query is
available, so if a parallel variant is to be installed then ExecInitNode()
is too soon to decide.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
2017-12-13 15:47:01 -08:00
Andres Freund 923e8dee88 Add defenses against pre-crash files to BufFileOpenShared().
Crash restarts currently don't clean up temporary files, as a debugging aid.
If a left-over file happens to have the same name as a segment file we're
trying to create, we'll just truncate and reuse it, but there is a problem:
BufFileOpenShared() determines how many segment files exist by trying to open
.0, .1, .2, ... until it finds no more files.  It might be confused by a junk
file that has the next segment number.  To defend against that, make sure we
always create a gap after the end file by unlinking the following name if it
exists.  Also make it an error to try to open a BufFile that doesn't exist
(has no segment 0), so as not to encourage the development of client code
that depends on an interface that we can't reliably provide.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D2jhCbC_GFQJaaDhWxLB4EXtT3vVd5czuRNaqF5CWSTog%40mail.gmail.com
2017-12-13 13:27:41 -08:00
Robert Haas 884a60840c Fix parallel index scan hang with deleted or half-dead pages.
The previous coding forgot to release the scan before seizing
it again, leading to a lockup.

Report by Patrick Hemmer.  Diagnosis by Thomas Munro.  Patch by
Amit Kapila.

Discussion: http://postgr.es/m/CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com
2017-12-13 16:15:44 -05:00
Robert Haas 1d6fb35ad6 Revert "Fix accumulation of parallel worker instrumentation."
This reverts commit 2c09a5c12a.  Per
further discussion, that doesn't seem to be the best possible fix.

Discussion: http://postgr.es/m/CAA4eK1LW2aFKzY3=vwvc=t-juzPPVWP2uT1bpx_MeyEqnM+p8g@mail.gmail.com
2017-12-13 15:19:28 -05:00
Tom Lane 9fa6f00b13 Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts.  The key ideas are:

* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.

* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).

* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.

The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations.  That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.

Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant.  Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)

The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.

To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option.  AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.

An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.

Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module.  That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.

In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles.  The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.

Also, mark the memory context method-pointer structs "const",
just for cleanliness.

Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:16 -05:00
Peter Eisentraut 3d8874224f Fix crash when using CALL on an aggregate
Author: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reported-by: Rushabh Lathia <rushabh.lathia@gmail.com>
2017-12-13 10:37:48 -05:00
Andres Freund 8e211f5391 Add float.h include to int8.c, for isnan().
port.h redirects isnan() to _isnan() on windows, which in turn is
provided by float.h rather than math.h. Therefore include the latter
as well.

Per buildfarm.
2017-12-12 23:32:43 -08:00
Andres Freund f512a6e132 Consistently use PG_INT(16|32|64)_(MIN|MAX).
Per buildfarm animal woodlouse.
2017-12-12 18:19:13 -08:00
Andres Freund 101c7ee3ee Use new overflow aware integer operations.
A previous commit added inline functions that provide fast(er) and
correct overflow checks for signed integer math. Use them in a
significant portion of backend code.  There's more to touch in both
backend and frontend code, but these were the easily identifiable
cases.

The old overflow checks are noticeable in integer heavy workloads.

A secondary benefit is that getting rid of overflow checks that rely
on signed integer overflow wrapping around, will allow us to get rid
of -fwrapv in the future. Which in turn slows down other code.

Author: Andres Freund
Discussion: https://postgr.es/m/20171024103954.ztmatprlglz3rwke@alap3.anarazel.de
2017-12-12 16:55:37 -08:00
Robert Haas 95b52351fe Remove obsolete comment.
Commit 8b304b8b72 removed replacement
selection, but left behind this comment text.  The optimization to
which the comment refers is not relevant without replacement
selection, because if we had so few tuples as to require only one
tape, we would have just completed the sort in memory.

Peter Geoghegan

Discussion: http://postgr.es/m/CAH2-WznqupLA8CMjp+vqzoe0yXu0DYYbQSNZxmgN76tLnAOZ_w@mail.gmail.com
2017-12-12 19:33:50 -05:00
Robert Haas d329dc2ea4 Remove bug from OPTIMIZER_DEBUG code for partition-wise join.
Etsuro Fujita, reviewed by Ashutosh Bapat

Discussion: http://postgr.es/m/5A2A60E6.6000008@lab.ntt.co.jp
2017-12-12 10:52:15 -05:00
Peter Eisentraut 4034db215b Fix comment
Reported-by: Noah Misch <noah@leadboat.com>
2017-12-11 16:37:39 -05:00
Tom Lane 7eb16ab17d Fix corner-case coredump in _SPI_error_callback().
I noticed that _SPI_execute_plan initially sets spierrcontext.arg = NULL,
and only fills it in some time later.  If an error were to happen in
between, _SPI_error_callback would try to dereference the null pointer.
This is unlikely --- there's not much between those points except
push-snapshot calls --- but it's clearly not impossible.  Tweak the
callback to do nothing if the pointer isn't set yet.

It's been like this for awhile, so back-patch to all supported branches.
2017-12-11 16:34:28 -05:00
Robert Haas 01a0ca1bed Improve comment about PartitionBoundInfoData.
Ashutosh Bapat, per discussion with Julien Rouhaund, who also
reviewed this patch.

Discussion: http://postgr.es/m/CAFjFpReBR3ftK9C23LLCZY_TDXhhjB_dgE-L9+mfTnA=gkvdvQ@mail.gmail.com
2017-12-11 12:52:15 -05:00
Magnus Hagander d8f632caec Fix typo
Reported by Robins Tharakan
2017-12-09 11:40:31 +01:00
Peter Eisentraut 005ac298b1 Prohibit identity columns on typed tables and partitions
Those cases currently crash and supporting them is more work then
originally thought, so we'll just prohibit these scenarios for now.

Author: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Reported-by: Мансур Галиев <gomer94@yandex.ru>
Bug: #14866
2017-12-08 12:13:04 -05:00
Peter Eisentraut af9f8b7ca3 Fix mistake in comment
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-12-08 11:23:36 -05:00
Peter Eisentraut 2d2d06b7e2 Apply identity sequence values on COPY
A COPY into a table should apply identity sequence values just like it
does for ordinary defaults.  This was previously forgotten, leading to
null values being inserted, which in turn would fail because identity
columns have not-null constraints.

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Steven Winfield <steven.winfield@cantabcapital.com>
Bug: #14952
2017-12-08 09:18:18 -05:00
Robert Haas 28724fd90d Report failure to start a background worker.
When a worker is flagged as BGW_NEVER_RESTART and we fail to start it,
or if it is not marked BGW_NEVER_RESTART but is terminated before
startup succeeds, what BgwHandleStatus should be reported?  The
previous code really hadn't considered this possibility (as indicated
by the comments which ignore it completely) and would typically return
BGWH_NOT_YET_STARTED, but that's not a good answer, because then
there's no way for code using GetBackgroundWorkerPid() to tell the
difference between a worker that has not started but will start
later and a worker that has not started and will never be started.
So, when this case happens, return BGWH_STOPPED instead.  Update the
comments to reflect this.

The preceding fix by itself is insufficient to fix the problem,
because the old code also didn't send a notification to the process
identified in bgw_notify_pid when startup failed.  That might've
been technically correct under the theory that the status of the
worker was BGWH_NOT_YET_STARTED, because the status would indeed not
change when the worker failed to start, but now that we're more
usefully reporting BGWH_STOPPED, a notification is needed.

Without these fixes, code which starts background workers and then
uses the recommended APIs to wait for those background workers to
start would hang indefinitely if the postmaster failed to fork a
worker.

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com
2017-12-06 08:58:27 -05:00
Robert Haas 9c64ddd414 Fix Parallel Append crash.
Reported by Tom Lane and the buildfarm.

Amul Sul and Amit Khandekar

Discussion: http://postgr.es/m/17868.1512519318@sss.pgh.pa.us
Discussion: http://postgr.es/m/CAJ3gD9cJQ4d-XhmZ6BqM9rMM2KDBfpkdgOAb4+psz56uBuMQ_A@mail.gmail.com
2017-12-06 08:42:50 -05:00
Robert Haas ab72716778 Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention.  We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 17:28:39 -05:00
Robert Haas 2c09a5c12a Fix accumulation of parallel worker instrumentation.
When a Gather or Gather Merge node is started and stopped multiple
times, the old code wouldn't reset the shared state between executions,
potentially resulting in dramatically inflated instrumentation data
for nodes beneath it.  (The per-worker instrumentation ended up OK,
I think, but the overall totals were inflated.)

Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
reviewed and tweaked a bit by me.

Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
2017-12-05 14:35:33 -05:00
Andres Freund 5bcf389ecf Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate.
If a hash join appears in a parallel query, there may be no hash table
available for explain.c to inspect even though a hash table may have
been built in other processes.  This could happen either because
parallel_leader_participation was set to off or because the leader
happened to hit the end of the outer relation immediately (even though
the complete relation is not empty) and decided not to build the hash
table.

Commit bf11e7ee introduced a way for workers to exchange
instrumentation via the DSM segment for Sort nodes even though they
are not parallel-aware.  This commit does the same for Hash nodes, so
that explain.c has a way to find instrumentation data from an
arbitrary participant that actually built the hash table.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com
2017-12-05 10:55:56 -08:00
Tom Lane 8dc3c971a9 Treat directory open failures as hard errors in ResetUnloggedRelations().
Previously, this code just reported such problems at LOG level and kept
going.  The problem with this approach is that transient failures (e.g.,
ENFILE) could prevent us from resetting unlogged relations to empty,
yet allow recovery to appear to complete successfully.  That seems like
a data corruption hazard large enough to treat such problems as reasons
to fail startup.

For the same reason, treat unlink failures for unlogged files as hard
errors not just LOG messages.  It's a little odd that we did it like that
when file-level errors in other steps (copy_file, fsync_fname) are ERRORs.

The sole case that I left alone is that ENOENT failure on a tablespace
(not database) directory is not an error, though it will now be logged
rather than just silently ignored.  This is to cover the scenario where
a previous DROP TABLESPACE removed the tablespace directory but failed
before removing the pg_tblspc symlink.  I'm not sure that that's very
likely in practice, but that seems like the only real excuse for the
old behavior here, so let's allow for it.  (As coded, this will also
allow ENOENT on $PGDATA/base/.  But since we'll fail soon enough if
that's gone, I don't think we need to complicate this code by
distinguishing that from a true tablespace case.)

Discussion: https://postgr.es/m/21040.1512418508@sss.pgh.pa.us
2017-12-04 20:52:59 -05:00
Tom Lane 066bc21c0e Simplify do_pg_start_backup's API by opening pg_tblspc internally.
do_pg_start_backup() expects its callers to pass in an open DIR pointer
for the pg_tblspc directory, but there's no apparent advantage in that.
It complicates the callers without adding any flexibility, and there's no
robustness advantage, since we surely have to be prepared for errors during
the scan of pg_tblspc anyway.  In fact, by holding an extra kernel resource
during operations like the preliminary checkpoint, we might be making
things a fraction more failure-prone not less.  Hence, remove that argument
and open the directory just for the duration of the actual scan.

Discussion: https://postgr.es/m/28752.1512413887@sss.pgh.pa.us
2017-12-04 18:37:54 -05:00
Tom Lane 561885db05 Improve error handling in RemovePgTempFiles().
Modify this function and its subsidiaries so that syscall failures are
reported via ereport(LOG), rather than silently ignored as before.
We don't want to throw a hard ERROR, as that would prevent database
startup, and getting rid of leftover temporary files is not important
enough for that.  On the other hand, not reporting trouble at all
seems like an odd choice not in line with current project norms,
especially since any failure here is quite unexpected.

On the same reasoning, adjust these functions' AllocateDir/ReadDir calls
so that failure to scan a directory results in LOG not ERROR.  I also
removed the previous practice of silently ignoring ENOENT failures during
directory opens --- there are some corner cases where that could happen
given a previous database crash, but that seems like a bad excuse for
ignoring a condition that isn't expected in most cases.  A LOG message
during postmaster start seems OK in such situations, and better than
no output at all.

In passing, make RemovePgTempRelationFiles' test for "is the file name
all digits" look more like the way it's done elsewhere.

Discussion: https://postgr.es/m/19907.1512402254@sss.pgh.pa.us
2017-12-04 17:59:35 -05:00
Tom Lane 2069e6faa0 Clean up assorted messiness around AllocateDir() usage.
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures.  It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode.  And it eliminates a lot of just plain redundant error-handling
code.

In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern

	dir = AllocateDir(path);
	while ((de = ReadDirExtended(dir, path, LOG)) != NULL)

if they'd like to treat directory-open failures as mere LOG conditions
rather than errors.  Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.

Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine.  Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above.  (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.)  In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.

Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported.  There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.

Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming.  (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire.  If so, this function was broken for its
own purposes as well as breaking callers.)

And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.

Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless.  Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes.  The rest of this is
essentially cosmetic and need not get back-patched.

Michael Paquier, with a bit of additional work by me

Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
2017-12-04 17:02:56 -05:00
Robert Haas ab6eaee884 When VACUUM or ANALYZE skips a concurrently dropped table, log it.
Hopefully, the additional logging will help avoid confusion that
could otherwise result.

Nathan Bossart, reviewed by Michael Paquier, Fabrízio Mello, and me
2017-12-04 15:25:55 -05:00
Tom Lane ecc27d55f4 Support boolean columns in functional-dependency statistics.
There's no good reason that the multicolumn stats stuff shouldn't work on
booleans.  But it looked only for "Var = pseudoconstant" clauses, and it
will seldom find those for boolean Vars, since earlier phases of planning
will fold "boolvar = true" or "boolvar = false" to just "boolvar" or
"NOT boolvar" respectively.  Improve dependencies_clauselist_selectivity()
to recognize such clauses as equivalent to equality restrictions.

This fixes a failure of the extended stats mechanism to apply in a case
reported by Vitaliy Garnashevich.  It's not a complete solution to his
problem because the bitmap-scan costing code isn't consulting extended
stats where it should, but that's surely an independent issue.

In passing, improve some comments, get rid of a NumRelids() test that's
redundant with the preceding bms_membership() test, and fix
dependencies_clauselist_selectivity() so that estimatedclauses actually
is a pure output argument as stated by its API contract.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/73a4936d-2814-dc08-ed0c-978f76f435b0@gmail.com
2017-12-04 11:51:43 -05:00
Robert Haas 9f4992e2a9 Remove memory leak protection from Gather and Gather Merge nodes.
Before commit 6b65a7fe62, tqueue.c could
perform tuple remapping and thus leak memory, which is why commit
af33039317 made TupleQueueReaderNext
run in a short-lived context.  Now, however, tqueue.c has been reduced
to a shadow of its former self, and there shouldn't be any chance of
leaks any more.  Accordingly, remove some tuple copying and memory
context manipulation to speed up processing.

Patch by me, reviewed by Amit Kapila.  Some testing by Rafia Sabih.

Discussion: http://postgr.es/m/CAA4eK1LSDydwrNjmYSNkfJ3ZivGSWH9SVswh6QpNzsMdj_oOQA@mail.gmail.com
2017-12-04 10:39:24 -05:00
Andres Freund ec6a040056 Adjust #ifdef EXEC_BACKEND RemovePgTempFilesInDir() call.
Other callers were adjusted in the course of
dc6c4c9dc2.

Per buildfarm.
2017-12-01 17:28:05 -08:00
Andres Freund dc6c4c9dc2 Add infrastructure for sharing temporary files between backends.
SharedFileSet allows temporary files to be created by one backend and
then exported for read-only access by other backends, with clean-up
managed by reference counting associated with a DSM segment.  This
includes changes to fd.c and buffile.c to support the new kind of
temporary file.

This will be used by an upcoming patch adding support for parallel
hash joins.

Author: Thomas Munro
Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas, Rushabh Lathia
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAH2-WznJ_UgLux=_jTgCQ4yFz0iBntudsNKa1we3kN1BAG=88w@mail.gmail.com
2017-12-01 16:30:56 -08:00
Robert Haas 35438e5763 Minor code beautification in partition_bounds_equal.
Use get_greatest_modulus more consistently, instead of doing the
same thing in an ad-hoc manner in this one place.

Ashutosh Bapat

Discussion: http://postgr.es/m/CAFjFpReT9L4RCiJBKOyWC2=i02kv9uG2fx=4Fv7kFY2t0SPCgw@mail.gmail.com
2017-12-01 13:52:59 -05:00
Robert Haas 87c37e3291 Re-allow INSERT .. ON CONFLICT DO NOTHING on partitioned tables.
Commit 8355a011a0 was reverted in
f05230752d, but this attempt is
hopefully better-considered: we now pass the correct value to
ExecOpenIndices, which should avoid the crash that we hit before.

Amit Langote, reviewed by Simon Riggs and by me.  Some final
editing by me.

Discussion: http://postgr.es/m/7ff1e8ec-dc39-96b1-7f47-ff5965dceeac@lab.ntt.co.jp
2017-12-01 12:53:21 -05:00
Robert Haas 1cbc17aaca Try to exclude partitioned tables in toto.
Ashutosh Bapat, reviewed by Jeevan Chalke.  Comment by me.

Discussion: http://postgr.es/m/CAFjFpRcuRaydz88CY_aQekmuvmN2A9ax5z0k=ppT+s8KS8xMRA@mail.gmail.com
2017-12-01 10:59:09 -05:00
Robert Haas 59c8078744 Fix uninitialized memory reference.
Without this, when partdesc->nparts == 0, we end up calling
ExecBuildSlotPartitionKeyDescription without initializing values
and isnull.

Reported by Coverity via Michael Paquier.  Patch by Michael Paquier,
reviewed and revised by Amit Langote.

Discussion: http://postgr.es/m/CAB7nPqQ3mwkdMoPY-ocgTpPnjd8TKOadMxdTtMLvEzF8480Zfg@mail.gmail.com
2017-12-01 10:05:00 -05:00
Peter Eisentraut 86ab28fbd1 Check channel binding flag at end of SCRAM exchange
We need to check whether the channel-binding flag encoded in the
client-final-message is the same one sent in the client-first-message.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-12-01 09:53:26 -05:00
Robert Haas 06ae669c92 Remove extra word from comment.
David Rowley, who also was the primary author of the patch that
added this function; the attribution in my previous commit,
84940644de, was incorrect due to
sloppiness on my part.

Discussion: http://postgr.es/m/CAKJS1f_0iSiLQsf_c06AzOWAc3eS6ePjjVQFpcFv3W-O5aktnQ@mail.gmail.com
2017-11-30 16:22:38 -05:00
Peter Eisentraut e4128ee767 SQL procedures
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar.  This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.

This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL).  There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql.  Support for defining
procedures is available in all the languages supplied by the core
distribution.

While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2017-11-30 11:03:20 -05:00
Robert Haas 1761653bbb Make create_unique_path manage memory like mark_dummy_rel.
Put the unique path in the same context as the owning RelOptInfo, rather
than the toplevel planner context.  This is how this function worked
originally, but commit f41803bb39
changed it without explanation.  mark_dummy_rel adopted the older (or
newer?) technique in commit eca75a12a2,
which also featured a much better explanation of why it is correct.
So, switch back to that technique here, with the same explanation
given there.

Although this fixes a possible memory leak when GEQO is in use, the
leak is minor and probably nobody cares, so no back-patch.

Ashutosh Bapat, reviewed by Tom Lane and by me

Discussion: http://postgr.es/m/CAFjFpRcXkHHrXyD9BCvkgGJV4TnHG2SWJ0PhJfrDu3NAcQvh7g@mail.gmail.com
2017-11-30 09:50:10 -05:00
Tom Lane 7ca25b7de6 Fix neqjoinsel's behavior for semi/anti join cases.
Previously, this function estimated the selectivity as 1 minus eqjoinsel()
for the negator equality operator, regardless of join type (I think there
was an expectation that eqjoinsel would handle the join type).  But
actually this is completely wrong for semijoin cases: the fraction of the
LHS that has a non-matching row is not one minus the fraction of the LHS
that has a matching row.  In reality a semijoin with <> will nearly always
succeed: it can only fail when the RHS is empty, or it contains a single
distinct value that is equal to the particular LHS value, or the LHS value
is null.  The only one of those things we should have much confidence in
estimating is the fraction of LHS values that are null, so let's just take
the selectivity as 1 minus outer nullfrac.

Per coding convention, antijoin should be estimated the same as semijoin.

Arguably this is a bug fix, but in view of the lack of field complaints
and the risk of destabilizing plans, no back-patch.

Thomas Munro, reviewed by Ashutosh Bapat

Discussion: https://postgr.es/m/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com
2017-11-29 22:00:37 -05:00
Andres Freund 1145acc70d Add a barrier primitive for synchronizing backends.
Provide support for dynamic or static parties of processes to wait for
all processes to reach point in the code before continuing.

This is similar to the mechanism of the same name in POSIX threads and
MPI, though has explicit phasing and dynamic party support like the
Java core library's Phaser.

This will be used by an upcoming patch adding support for parallel
hash joins.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=2_y7oi01OjA_wLvYcWMc9_d=LaoxrY3eiROCZkB_qakA@mail.gmail.com
2017-11-29 17:07:16 -08:00
Robert Haas 84940644de New C function: bms_add_range
This will be used by pending patches to improve partition pruning.

Amit Langote and Kyotaro Horiguchi, per a suggestion from David
Rowley.  Review and testing of the larger patch set of which this is a
part by Ashutosh Bapat, David Rowley, Dilip Kumar, Jesper Pedersen,
Rajkumar Raghuwanshi, Beena Emerson, Amul Sul, and Kyotaro Horiguchi.

Discussion: http://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
2017-11-29 17:12:05 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Tom Lane 801386af62 Clarify old comment about qual_is_pushdown_safe's handling of subplans.
This comment glossed over the difference between initplans and subplans,
but they are indeed different for our purposes here.
2017-11-28 23:32:17 -05:00
Alvaro Herrera 3848d21673 Make memset() use sizeof() rather than re-compute size
This is simpler and more closely follows overwhelming precedent.

Report and patch by Mark Dilger.
Discussion: https://postgr.es/m/9A68FB88-5F45-4848-9926-8586E2D777D1@gmail.com
2017-11-29 00:13:45 -03:00
Alvaro Herrera 414cd434ff Fix extstat collection when no stats are produced for a column
This is a mistakenly placed conditional in bf2a691e02.

Reported by Justin Pryzby
Discussion: https://postgr.es/m/20171117214352.GE25796@telsasoft.com
2017-11-28 23:40:11 -03:00
Robert Haas cd482f2951 Fix wrong function name in comment.
Rushabh Lathia

Discussion: http://postgr.es/m/CAGPqQf2z5g+7YmGZSZgKoiFsaUB+63Rzmz8-5PQHuS6hd14FEg@mail.gmail.com
2017-11-28 14:17:21 -05:00
Robert Haas 2d7950f222 If a range-partitioned table has no default partition, reject null keys.
Commit 4e5fe9ad19 introduced this
problem.  Also add a test so it doesn't get broken again.

Report by Rushabh Lathia.  Fix by Amit Langote.  Reviewed by Rushabh
Lathia and Amul Sul.  Tweaked by me.

Discussion: http://postgr.es/m/CAGPqQf0Y1iJyk4QJBdMf=pS9i6Q0JUMM_h5-qkR3OMJ-e04PyA@mail.gmail.com
2017-11-28 14:11:16 -05:00
Robert Haas 445dbd82a3 Fix ReinitializeParallelDSM to tolerate finding no error queues.
Commit d466335064 changed things so
that shm_toc_lookup would fail with an error rather than silently
returning NULL in the hope that such failures would be reported
in a useful way rather than via a system crash.  However, it
overlooked the fact that the lookup of PARALLEL_KEY_ERROR_QUEUE
in ReinitializeParallelDSM is expected to fail when no DSM segment
was created in the first place; in that case, we end up with a
backend-private memory segment that still contains an entry for
PARALLEL_KEY_FIXED but no others.  Consequently a benign failure
to initialize parallelism can escalate into an elog(ERROR);
repair.

Discussion: http://postgr.es/m/CA+Tgmob8LFw55DzH1QEREpBEA9RJ_W_amhBFCVZ6WMwUhVpOqg@mail.gmail.com
2017-11-28 12:15:38 -05:00
Robert Haas c6755e233b Teach bitmap heap scan to cope with absence of a DSA.
If we have a plan that uses parallelism but are unable to execute it
using parallelism, for example due to a lack of available DSM
segments, then the EState's es_query_dsa will be NULL.  Parallel
bitmap heap scan needs to fall back to a non-parallel scan in such
cases.

Patch by me, reviewed by Dilip Kumar

Discussion: http://postgr.es/m/CAEepm=0kADK5inNf_KuemjX=HQ=PuTP0DykM--fO5jS5ePVFEA@mail.gmail.com
2017-11-28 11:44:59 -05:00
Robert Haas 7b88d63a91 Add null test to partition constraint for default range partitions.
Non-default range partitions have a constraint which include null
tests, and both default and non-default list partitions also have a
constraint which includes null tests, but for some reason this was
missed for default range partitions.  This could cause the partition
constraint to evaluate to false for rows that were (correctly) routed
to that partition by insert tuple routing, which could in turn
cause constraint exclusion to prune the default partition in cases
where it should not.

Amit Langote, reviewed by Kyotaro Horiguchi

Discussion: http://postgr.es/m/ba7aaeb1-4399-220e-70b4-62eade1522d0@lab.ntt.co.jp
2017-11-28 10:51:01 -05:00
Robert Haas 487a0c1518 Fix typo.
Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoCq_QG6UEo6yb-purmhLQjDLsCA2_B+8Wh3ah9P2-XmrQ@mail.gmail.com
2017-11-28 08:19:01 -05:00
Tom Lane cb03fa33ae Fix assorted syscache lookup sloppiness in partition-related code.
heap_drop_with_catalog and ATExecDetachPartition neglected to check for
SearchSysCache failures, as noted in bugs #14927 and #14928 from Pan Bian.
Such failures are pretty unlikely, since we should already have some sort
of lock on the rel at these points, but it's neither a good idea nor
per project style to omit a check for failure.

Also, StorePartitionKey contained a syscache lookup that it never did
anything with, including never releasing the result.  Presumably the
reason why we don't see refcount-leak complaints is that the lookup
always fails; but in any case it's pretty useless, so remove it.

All of these errors were evidently introduced by the relation
partitioning feature.  Back-patch to v10 where that came in.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/20171127090105.1463.3962@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20171127091341.1468.72696@wrigleys.postgresql.org
2017-11-27 19:22:08 -05:00
Tom Lane 9a785ad573 Fix creation of resjunk tlist entries for inherited mixed UPDATE/DELETE.
rewriteTargetListUD's processing is dependent on the relkind of the query's
target table.  That was fine at the time it was made to act that way, even
for queries on inheritance trees, because all tables in an inheritance tree
would necessarily be plain tables.  However, the 9.5 feature addition
allowing some members of an inheritance tree to be foreign tables broke the
assumption that rewriteTargetListUD's output tlist could be applied to all
child tables with nothing more than column-number mapping.  This led to
visible failures if foreign child tables had row-level triggers, and would
also break in cases where child tables belonged to FDWs that used methods
other than CTID for row identification.

To fix, delay running rewriteTargetListUD until after the planner has
expanded inheritance, so that it is applied separately to the (already
mapped) tlist for each child table.  We can conveniently call it from
preprocess_targetlist.  Refactor associated code slightly to avoid the
need to heap_open the target relation multiple times during
preprocess_targetlist.  (The APIs remain a bit ugly, particularly around
the point of which steps scribble on parse->targetList and which don't.
But avoiding such scribbling would require a change in FDW callback APIs,
which is more pain than it's worth.)

Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when
we transition from rows providing a CTID to rows that don't.  (That's
really an independent bug, but it manifests in much the same cases.)

Add a regression test checking one manifestation of this problem, which
was that row-level triggers on a foreign child table did not work right.

Back-patch to 9.5 where the problem was introduced.

Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat

Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru
2017-11-27 17:54:07 -05:00
Simon Riggs 59af8d4384 Pad XLogReaderState's per-buffer data_bufsz more aggressively.
Originally, we palloc'd this buffer just barely big enough to hold the
largest xlog backup block seen so far. We now MAXALIGN the palloc size.

The original coding could result in many repeated palloc cycles, in the
worst case where we see a series ofgradually larger xlog records.  We
ameliorate that similarly to 8735978e7a
by imposing a minimum buffer size of BLCKSZ.

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-27 09:34:05 +00:00
Tom Lane 8735978e7a Pad XLogReaderState's main_data buffer more aggressively.
Originally, we palloc'd this buffer just barely big enough to hold the
largest xlog record seen so far.  It turns out that that can result in
valgrind complaints, because some compilers will emit code that assumes
it can safely fetch padding bytes at the end of a struct, and those
padding bytes were unallocated so far as aset.c was concerned.  We can
fix that by MAXALIGN'ing the palloc request size, ensuring that it is big
enough to include any possible padding that might've been omitted from
the on-disk record.

An additional objection to the original coding is that it could result in
many repeated palloc cycles, in the worst case where we see a series of
gradually larger xlog records.  We can ameliorate that cheaply by
imposing a minimum buffer size that's large enough for most xlog records.
BLCKSZ/2 was chosen after a bit of discussion.

In passing, remove an obsolete comment in struct xl_heap_new_cid that the
combocid field is free due to alignment considerations.  Perhaps that was
true at some point, but it's not now.

Back-patch to 9.5 where this code came in.

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-26 15:17:24 -05:00
Joe Conway 752714dd9d Make has_sequence_privilege support WITH GRANT OPTION
The various has_*_privilege() functions all support an optional
WITH GRANT OPTION added to the supported privilege types to test
whether the privilege is held with grant option. That is, all except
has_sequence_privilege() variations. Fix that.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/005147f6-8280-42e9-5a03-dd2c1e4397ef@joeconway.com
2017-11-26 09:49:40 -08:00
Tom Lane 9b63c13f0a Repair failure with SubPlans in multi-row VALUES lists.
When nodeValuesscan.c was written, it was impossible to have a SubPlan in
VALUES --- any sub-SELECT there would have to be uncorrelated and thereby
would produce an InitPlan instead.  We therefore took a shortcut in the
logic that throws away a ValuesScan's per-row expression evaluation data
structures.  This was broken by the introduction of LATERAL however; a
sub-SELECT containing a lateral reference produces a correlated SubPlan.

The cleanest fix for this would be to give up the optimization of
discarding the expression eval state.  But that still seems pretty
unappetizing for long VALUES lists.  It seems to work to just prevent
the subexpressions from hooking into the ValuesScan node's subPlan
list, so let's do that and see how well it works.  (If this breaks,
due to additional connections between the subexpressions and the outer
query structures, we might consider compromises like throwing away data
only for VALUES rows not containing SubPlans.)

Per bug #14924 from Christian Duta.  Back-patch to 9.3 where LATERAL
was introduced.

Discussion: https://postgr.es/m/20171124120836.1463.5310@wrigleys.postgresql.org
2017-11-25 14:15:48 -05:00
Tom Lane ab97aaac8f Update buffile.h/.c comments for removal of non-temp option.
Commit 11e264517 removed BufFile's isTemp flag, thereby eliminating
the possibility of resurrecting BufFileCreate().  But it left that
function in place, as well as a bunch of comments describing how things
worked for the non-temp-file case.  At best, that's now a source of
confusion.  So remove the long-since-commented-out function and change
relevant comments.

I (tgl) wanted to rename BufFileCreateTemp() to BufFileCreate(), but
that seems not to be the consensus position, so leave it as-is.

In passing, fix commit f0828b2fc's failure to update BufFileSeek's
comment to match the change of its argument type from long to off_t.
(I think that might actually have been intentional at the time, but
now that 64-bit off_t is nearly universal, it looks anachronistic.)

Thomas Munro and Tom Lane

Discussion: https://postgr.es/m/E1eFVyl-0008J1-RO@gemulon.postgresql.org
2017-11-25 13:19:43 -05:00
Tom Lane df3a66e282 Improve planner's handling of set-returning functions in grouping columns.
Improve query_is_distinct_for() to accept SRFs in the targetlist when
we can prove distinctness from a DISTINCT clause.  In that case the
de-duplication will surely happen after SRF expansion, so the proof
still works.  Continue to punt in the case where we'd try to prove
distinctness from GROUP BY (or, in the future, source relations).
To do that, we'd have to determine whether the SRFs were in the
grouping columns or elsewhere in the tlist, and it still doesn't
seem worth the trouble.  But this trivial change allows us to
recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces
unique-ified output, which seems worth having.

Also, fix estimate_num_groups() to consider the possibility of SRFs in
the grouping columns.  Its failure to do so was masked before v10 because
grouping_planner() scaled up plan rowcount estimates by the estimated SRF
multiplier after performing grouping.  That doesn't happen anymore, which
is more correct, but it means we need an adjustment in the estimate for
the number of groups.  Failure to do this leads to an underestimate for
the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)"
compared to what 9.6 and earlier estimated, thus breaking plan choices
in some cases.

Per report from Dmitry Shalashov.  Back-patch to v10 to avoid degraded
plan choices compared to previous releases.

Discussion: https://postgr.es/m/CAKPeCUGAeHgoh5O=SvcQxREVkoX7UdeJUMj1F5=aBNvoTa+O8w@mail.gmail.com
2017-11-25 11:48:09 -05:00
Robert Haas b10967eddf Avoid projecting tuples unnecessarily in Gather and Gather Merge.
It's most often the case that the target list for the Gather (Merge)
node matches the target list supplied by the underlying plan node;
when this is so, we can avoid the overhead of projecting.

This depends on commit f455e1125e for
proper functioning.

Idea by Andres Freund.  Patch by me.  Review by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
2017-11-25 10:49:17 -05:00
Tom Lane 0f2458ff5f Improve valgrind logic in aset.c, and fix multiple issues in generation.c.
Revise aset.c so that all the "private" fields of chunk headers are
marked NOACCESS when outside the module, improving on the previous
coding which protected only requested_size.  Fix a couple of corner
case bugs, such as failing to re-protect the header during a failure
exit from AllocSetRealloc, and wrong padding-size calculation for an
oversize allocation request.

Apply the same design to generation.c, and also fix several bugs therein
that I found by dint of hacking the code to use generation.c as the
standard allocator and then running the core regression tests with it.
Notably, we have to track the actual size of each block, else the
wipe_mem call in GenerationReset clears the wrong amount of memory for
an oversize-chunk block; and GenerationCheck needs a way of identifying
freed chunks that isn't fooled by palloc(0).  I chose to fix the latter
by resetting the context pointer to NULL in a freed chunk, roughly like
what happens in a freed aset.c chunk.

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-24 19:28:19 -05:00
Tom Lane f65d21b258 Mostly-cosmetic improvements in memory chunk header alignment coding.
Add commentary about what we're doing and why.  Apply the method used for
padding in GenerationChunk to AllocChunkData, replacing the rather ad-hoc
solution used in commit 7e3aa03b4.  Reorder fields in GenerationChunk so
that the padding calculation will work even if sizeof(size_t) is different
from sizeof(void *) --- likely that will never happen, but we don't need
the assumption if we do it like this.  Improve static assertions about
alignment.

In passing, fix a couple of oversights in the "large chunk" path in
GenerationAlloc().

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-24 15:50:22 -05:00
Tom Lane cc3c4af4a9 Fix bug in generation.c's valgrind support.
This doesn't look like the last such bug, but it's one that the
test_decoding regression test is tripping over.  Per buildfarm.

Tomas Vondra

Discussion: https://postgr.es/m/c903f275-2150-fa52-64bf-dca7b53ebf8d@fuzzy.cz
2017-11-24 13:43:34 -05:00
Dean Rasheed 9c55391f0f RLS comment fixes.
The comments in get_policies_for_relation() say that CREATE POLICY
does not support defining restrictive policies. This is no longer
true, starting from PG10.
2017-11-24 14:14:40 +00:00
Andres Freund 59b71c6fe6 Fix handling of NULLs returned by aggregate combine functions.
When strict aggregate combine functions, used in multi-stage/parallel
aggregation, returned NULL, we didn't check for that, invoking the
combine function with NULL the next round, despite it being strict.

The equivalent code invoking normal transition functions has a check
for that situation, which did not get copied in a7de3dc5c3. Fix the
bug by adding the equivalent check.

Based on a quick look I could not find any strict combine functions in
core actually returning NULL, and it doesn't seem very likely external
users have done so. So this isn't likely to have caused issues in
practice.

Add tests verifying transition / combine functions returning NULL is
tested.

Reported-By: Andres Freund
Author: Andres Freund
Discussion: https://postgr.es/m/20171121033642.7xvmjqrl4jdaaat3@alap3.anarazel.de
Backpatch: 9.6, where parallel aggregation was introduced
2017-11-23 17:15:27 -08:00
Tom Lane 07bd77b95a Ensure sizeof(GenerationChunk) is maxaligned.
Per buildfarm.

Also improve some comments.
2017-11-23 17:02:15 -05:00
Simon Riggs b99661c2ff Tweak code for older compilers
Attempt to quiesce build farm

Author: Tomas Vondra
2017-11-23 06:55:18 +11:00
Simon Riggs a4ccc1cef5 Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Author: Tomas Vondra
Reviewed-by: Simon Riggs
2017-11-23 05:45:07 +11:00
Simon Riggs 7e17a6889a Set es_output_cid in replication worker
Allows triggers to operate correctly

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
2017-11-22 16:28:14 +11:00
Robert Haas ae65f6066d Provide for forward compatibility with future minor protocol versions.
Previously, any attempt to request a 3.x protocol version other than
3.0 would lead to a hard connection failure, which made the minor
protocol version really no different from the major protocol version
and precluded gentle protocol version breaks.  Instead, when the
client requests a 3.x protocol version where x is greater than 0, send
the new NegotiateProtocolVersion message to convey that we support
only 3.0.  This makes it possible to introduce new minor protocol
versions without requiring a connection retry when the server is
older.

In addition, if the startup packet includes name/value pairs where
the name starts with "_pq_.", assume that those are protocol options,
not GUCs.  Include those we don't support (i.e. all of them, at
present) in the NegotiateProtocolVersion message so that the client
knows they were not understood.  This makes it possible for the
client to request previously-unsupported features without bumping
the protocol version at all; the client can tell from the server's
response whether the option was understood.

It will take some time before servers that support these new
facilities become common in the wild; to speed things up and make
things easier for a future 3.1 protocol version, back-patch to all
supported releases.

Robert Haas and Badrul Chowdhury

Discussion: http://postgr.es/m/BN6PR21MB0772FFA0CBD298B76017744CD1730@BN6PR21MB0772.namprd21.prod.outlook.com
Discussion: http://postgr.es/m/30788.1498672033@sss.pgh.pa.us
2017-11-21 13:56:24 -05:00
Robert Haas f3b0897a12 Fix multiple problems with satisfies_hash_partition.
Fix the function header comment to describe the actual behavior.
Check that table OID, modulus, and remainder arguments are not NULL
before accessing them.  Check that the modulus and remainder are
sensible.  If the table OID doesn't exist, return NULL instead of
emitting an internal error, similar to what we do elsewhere.  Check
that the actual argument types match, or at least are binary coercible
to, the expected argument types.  Correctly handle invocation of this
function using the VARIADIC syntax.  Add regression tests.

Robert Haas and Amul Sul, per a report by Andreas Seltenreich and
subsequent followup investigation.

Discussion: http://postgr.es/m/871sl4sdrv.fsf@ansel.ydns.eu
2017-11-21 13:06:32 -05:00
Tom Lane 84669c9b06 Use out-of-line M68K spinlock code for OpenBSD as well as NetBSD.
David Carlier (from a patch being carried by OpenBSD packagers)

Discussion: https://postgr.es/m/CA+XhMqzwFSGVU7MEnfhCecc8YdP98tigXzzpd0AAdwaGwaVXEA@mail.gmail.com
2017-11-20 18:05:17 -05:00
Simon Riggs 2ede45c3a4 Fix pg_control_checkpoint from commit 4b0d28de06
Author: Simon Riggs <simon@2ndQuadrant.com>
Reported-By: Andreas Seltenreich <seltenreich@gmx.de>
2017-11-21 08:00:54 +11:00
Robert Haas 0510421ee0 Tweak use of ExecContextForcesOids by Gather (Merge).
Specifically, pass the outer plan's PlanState instead of our own
PlanState.  At present, ExecContextForcesOids doesn't actually care
which PlanState we pass; it just looks through to the underlying
EState to find the result relation or top-level eflags.  However, in
the future it might care.  If that happens, and if our goal is to get
a tuple descriptor that matches that of the outer plan, then I think
what we care about is whether the outer plan's context forces OIDs,
rather than whether our own context forces OIDs, just as we use the
outer node's target list rather than our own.

Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
2017-11-20 12:34:06 -05:00
Robert Haas f455e1125e Pass eflags down to parallel workers.
Currently, there are no known consequences of this oversight, so no
back-patch.  Several of the EXEC_FLAG_* constants aren't usable in
parallel mode anyway, and potential problems related to the presence
or absence of OIDs (see EXEC_FLAG_WITH_OIDS, EXEC_FLAG_WITHOUT_OIDS)
seem at present to be masked by the unconditional projection step
performed by Gather and Gather Merge.  In general, however, it seems
important that all participants agree on the values of these flags,
which modify executor behavior globally, and a pending patch to skip
projection in Gather (Merge) would be outright broken in certain cases
without this fix.

Patch by me, based on investigation of a test case provided by Amit
Kapila.  This patch was also reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
2017-11-20 12:00:33 -05:00
Simon Riggs c2513365a0 Parameter toast_tuple_target controls TOAST for new rows
Specifies the point at which we try to move long column values
into TOAST tables.

No effect on existing rows.

Discussion: https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2Ya00OyQQ@mail.gmail.com

Author: Simon Riggs <simon@2ndQudrant.com>
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndQuadrant.com>
2017-11-20 09:50:10 +11:00
Tom Lane 52f63bd916 Fix compiler warning in rangetypes_spgist.c.
On gcc 7.2.0, comparing pointer to (Datum) 0 produces a warning.
Treat it as a simple pointer to avoid that; this is more consistent
with comparable code elsewhere, anyway.

Tomas Vondra

Discussion: https://postgr.es/m/99410021-61ef-9a9a-9bc8-f733ece637ee@2ndquadrant.com
2017-11-18 16:46:29 -05:00
Tom Lane 4797f9b519 Merge near-duplicate code in RI triggers.
Merge ri_restrict_del and ri_restrict_upd into one function ri_restrict.
Create a function ri_setnull that is the common implementation of
RI_FKey_setnull_del and RI_FKey_setnull_upd.  Likewise create a function
ri_setdefault that is the common implementation of RI_FKey_setdefault_del
and RI_FKey_setdefault_upd.  All of these pairs of functions were identical
except for needing to check for no-actual-key-change in the UPDATE cases;
the one extra if-test is a small price to pay for saving so much code.

Aside from removing about 400 lines of essentially duplicate code, this
allows us to recognize that we were uselessly caching two identical plans
whenever there were pairs of triggers using these duplicated functions
(which is likely very common).

Ildar Musin, reviewed by Ildus Kurbangaliev

Discussion: https://postgr.es/m/ca7064a7-6adc-6f22-ca47-8615ba9425a5@postgrespro.ru
2017-11-18 16:24:05 -05:00
Tom Lane 976a1a48fc Improve to_date/to_number/to_timestamp behavior with multibyte characters.
The documentation says that these functions skip one input character
per literal (non-pattern) format character.  Actually, though, they
skipped one input *byte* per literal *byte*, which could be hugely
confusing if either data or format contained multibyte characters.

To fix, adjust the FormatNode representation and parse_format() so
that multibyte format characters are stored as one FormatNode not
several, and adjust the data-skipping bits to advance by pg_mblen()
not necessarily one byte.  There's no user-visible behavior change
on the to_char() side, although the internal representation changes.

Commit e87d4965b had already fixed most places where we skip characters
on the basis of non-literal format patterns to advance by characters
not bytes, but this gets one more place, the SKIP_THth macro.  I think
everything in formatting.c gets that right now.

It'd be nice to have some regression test cases covering this behavior;
but of course there's no way to do so in an encoding-agnostic way, and
many of the interesting aspects would also require unportable locale
selections.  So I've not bothered here.

Discussion: https://postgr.es/m/28186.1510957703@sss.pgh.pa.us
2017-11-18 12:42:52 -05:00
Tom Lane 63ca86318d Fix quoted-substring handling in format parsing for to_char/to_number/etc.
This code evidently intended to treat backslash as an escape character
within double-quoted substrings, but it was sufficiently confused that
cases like ..."foo\\"... did not work right: the second backslash
managed to quote the double-quote after it, despite being quoted itself.
Rewrite to get that right, while preserving the existing behavior
outside double-quoted substrings, which is that backslash isn't special
except in the combination \".

Comparing to Oracle, it seems that their version of to_char() for
timestamps allows literal alphanumerics only within double quotes, while
non-alphanumerics are allowed outside quotes; backslashes aren't special
anywhere; there is no way at all to emit a literal double quote.
(Bizarrely, their to_char() for numbers is different; it doesn't allow
literal text at all AFAICT.)  The fact that they don't treat backslash
as special justifies our existing behavior for backslash outside double
quotes.  I considered making backslash inside double quotes act the same
way (ie, special only if before "), which in a green field would be a
more consistent behavior.  But that would likely break more existing SQL
code than what this patch does.

Add some test cases illustrating this behavior.  (Only the last new
case actually changes behavior in this commit.)

Little of this behavior was documented, either, so fix that.

Discussion: https://postgr.es/m/3626.1510949486@sss.pgh.pa.us
2017-11-18 12:16:37 -05:00
Peter Eisentraut 9288d62bb4 Support channel binding 'tls-unique' in SCRAM
This is the basic feature set using OpenSSL to support the feature.  In
order to allow the frontend and the backend to fetch the sent and
expected TLS Finished messages, a PG-like API is added to be able to
make the interface pluggable for other SSL implementations.

This commit also adds a infrastructure to facilitate the addition of
future channel binding types as well as libpq parameters to control the
SASL mechanism names and channel binding names.  Those will be added by
upcoming commits.

Some tests are added to the SSL test suite to test SCRAM authentication
with channel binding.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-11-18 10:15:54 -05:00
Robert Haas 611fe7d479 Update postgresql.conf.sample comment for bgwriter_lru_maxpages
Commit 14ca9abfbe should have done
this, but did not.

Jeff Janes

Discussion: http://postgr.es/m/CAMkU=1yWOvL+YFYzGM9yXSoWjxr_5_Ny78pPzLKQCkfgB7H-JQ@mail.gmail.com
2017-11-17 14:52:00 -05:00
Tom Lane e87d4965bd Prevent to_number() from losing data when template doesn't match exactly.
Non-data template patterns would consume characters whether or not those
characters were what the pattern expected, for example
	SELECT TO_NUMBER('1234', '9,999');
produced 134 because the '2' got eaten by the comma pattern.  This seems
undesirable, not least because it doesn't happen in Oracle.  For the ','
and 'G' template patterns, we can fix this by consuming characters only
if they match what the pattern would output.  For non-data patterns such
as 'L' and 'TH', it seems impractical to tighten things up to the point of
consuming only exact matches to what the pattern would output; but we can
improve matters quite a lot by redefining the behavior as "consume only
characters that aren't digits, signs, decimal point, or comma".

Also, fix it so that the behavior is to consume the number of *characters*
the pattern would output, not the number of *bytes*.  The old coding would
do surprising things with non-ASCII currency symbols, for example.  (It
would be good to apply that rule for literal text as well, but this commit
only fixes it for non-data patterns.)

Oliver Ford, reviewed by Thomas Munro and Nathan Wagner, and whacked around
a bit more by me

Discussion: https://postgr.es/m/CAGMVOdvpbMqPf9XWNzOwBpzJfErkydr_fEGhmuDGa015z97mwg@mail.gmail.com
2017-11-17 12:04:13 -05:00
Andres Freund 11e264517d Remove BufFile's isTemp flag.
The isTemp flag controls whether buffile.c chops BufFile data up into
1GB segments on disk.  Since it was badly named and always true, get
rid of it.

Author: Thomas Munro (based on suggestion by Peter Geoghegan)
Reviewed-By: Peter Geoghegan, Andres Freund
Discussion: https://postgr.es/m/CAH2-Wz%3D%2B9Rfqh5UdvdW9rGezdhrMGGH-JL1X9FXXVZdeeGeOJA%40mail.gmail.com
2017-11-16 17:52:57 -08:00
Andres Freund 7082e614c0 Provide DSM segment to ExecXXXInitializeWorker functions.
Previously, executor nodes running in parallel worker processes didn't
have access to the dsm_segment object used for parallel execution.  In
order to support resource management based on DSM segment lifetime,
they need that.  So create a ParallelWorkerContext object to hold it
and pass it to all InitializeWorker functions.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
2017-11-16 17:39:18 -08:00
Tom Lane 687f096ea9 Make PL/Python handle domain-type conversions correctly.
Fix PL/Python so that it can handle domains over composite, and so that
it enforces domain constraints correctly in other cases that were not
always done properly before.  Notably, it didn't do arrays of domains
right (oversight in commit c12d570fa), and it failed to enforce domain
constraints when returning a composite type containing a domain field,
and if a transform function is being used for a domain's base type then
it failed to enforce domain constraints on the result.  Also, in many
places it missed checking domain constraints on null values, because
the plpy_typeio code simply wasn't called for Py_None.

Rather than try to band-aid these problems, I made a significant
refactoring of the plpy_typeio logic.  The existing design of recursing
for array and composite members is extended to also treat domains as
containers requiring recursion, and the APIs for the module are cleaned
up and simplified.

The patch also modifies plpy_typeio to rely on the typcache more than
it did before (which was pretty much not at all).  This reduces the
need for repetitive lookups, and lets us get rid of an ad-hoc scheme
for detecting changes in composite types.  I added a couple of small
features to typcache to help with that.

Although some of this is fixing bugs that long predate v11, I don't
think we should risk a back-patch: it's a significant amount of code
churn, and there've been no complaints from the field about the bugs.

Tom Lane, reviewed by Anthony Bykov

Discussion: https://postgr.es/m/24449.1509393613@sss.pgh.pa.us
2017-11-16 16:23:04 -05:00
Robert Haas 3b2787e1f8 Fix broken cleanup interlock for GIN pending list.
The pending list must (for correctness) always be cleaned up by vacuum, and
should (for the avoidance of surprising behavior) always be cleaned up
by an explicit call to gin_clean_pending_list, but cleanup is optional
when inserting.  The old logic got this backward: cleanup was forced
if (stats == NULL), but that's going to be *false* when vacuuming and
*true* for inserts.

Masahiko Sawada, reviewed by me.

Discussion: http://postgr.es/m/CAD21AoBLUSyiYKnTYtSAbC+F=XDjiaBrOUEGK+zUXdQ8owfPKw@mail.gmail.com
2017-11-16 14:19:27 -05:00
Robert Haas 6b2cd278a9 Fix typo in comment.
Etsuro Fujita

Discussion: http://postgr.es/m/5A0D7C3D.80803@lab.ntt.co.jp
2017-11-16 14:16:45 -05:00
Robert Haas 79f2d63713 Update postgresql.conf.sample to match pg_settings classificaitons.
A handful of settings, most notably shared_preload_libraries, were
just plain the wrong place compared to their assigned config_group
value in guc.c (and thus pg_settings).  In other cases the names of
the sections in postgresql.conf.sample were mildly different from
the corresponding entries in config_group_names[].  Make it all
consistent.

Adrián Escoms, reviewed by me.

Discussion: http://postgr.es/m/CACksPC2veEmFRYqwYepWYO9U7aFhAx6sYq+WqjTyHw7uV=E=pw@mail.gmail.com
2017-11-16 12:57:17 -05:00
Robert Haas e89a71fb44 Pass InitPlan values to workers via Gather (Merge).
If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan
that computes it is attached to or above the Gather (Merge), force the
value to be computed before starting parallelism and pass it down to all
workers.  This allows us to use parallelism in cases where it previously
would have had to be rejected as unsafe.  We do - in this case - lose the
optimization that the value is only computed if it's actually used.  An
alternative strategy would be to have the first worker that needs the value
compute it, but one downside of that approach is that we'd then need to
select a parallel-safe path to compute the parameter value; it couldn't for
example contain a Gather (Merge) node.  At some point in the future, we
might want to consider both approaches.

Independent of that consideration, there is a great deal more work that
could be done to make more kinds of PARAM_EXEC parameters parallel-safe.
This infrastructure could be used to allow a Gather (Merge) on the inner
side of a nested loop (although that's not a very appealing plan) and
cases where the InitPlan is attached below the Gather (Merge) could be
addressed as well using various techniques.  But this is a good start.

Amit Kapila, reviewed and revised by me.  Reviewing and testing from
Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja.

Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com
2017-11-16 12:06:14 -05:00
Andrew Dunstan 98d54bb779 Back out the session_start and session_end hooks feature.
It's become apparent during testing that there are problems with at
least the testing regime. I don't think we should have it without a
working test regime, and the difficulties might indicate implementation
problems anyway, so I'm backing out the whole thing until that's sorted
out.

This reverts commits 7459484 9989f92 cd8ce3a
2017-11-16 11:35:02 -05:00
Robert Haas 4e5fe9ad19 Centralize executor-related partitioning code.
Some code is moved from partition.c, which has grown very quickly lately;
splitting the executor parts out might help to keep it from getting
totally out of control.  Other code is moved from execMain.c.  All is
moved to a new file execPartition.c.  get_partition_for_tuple now has
a new interface that more clearly separates executor concerns from
generic concerns.

Amit Langote.  A slight comment tweak by me.

Discussion: http://postgr.es/m/1f0985f8-3b61-8bc4-4350-baa6d804cb6d@lab.ntt.co.jp
2017-11-15 10:26:25 -05:00