Commit Graph

18130 Commits

Author SHA1 Message Date
Andres Freund 773aec7aa9 Do execGrouping.c via expression eval machinery.
This has a performance benefit on own, although not hugely so. The
primary benefit is that it will allow for to JIT tuple deforming and
comparator invocations.

Author: Andres Freund
Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
2018-02-15 21:55:31 -08:00
Tom Lane 51940f9760 Cast to void in StaticAssertExpr, not its callers.
Seems a bit silly that many (in fact all, as of today) uses of
StaticAssertExpr would need to cast it to void to avoid warnings from
pickier compilers.  Let's just do the cast right in the macro, instead.

In passing, change StaticAssertExpr to StaticAssertStmt in one
place where that seems more apropos.

Discussion: https://postgr.es/m/16161.1518715186@sss.pgh.pa.us
2018-02-15 13:41:30 -05:00
Tom Lane 9a725f7b5c Silence assorted "variable may be used uninitialized" warnings.
All of these are false positives, but in each case a fair amount of
analysis is needed to see that, and it's not too surprising that not all
compilers are smart enough.  (In particular, in the logtape.c case, a
compiler lacking the knowledge provided by the Assert would almost surely
complain, so that this warning will be seen in any non-assert build.)

Some of these are of long standing while others are pretty recent,
but it only seems worth fixing them in HEAD.

Jaime Casanova, tweaked a bit by me

Discussion: https://postgr.es/m/CAJGNTeMcYAMJdPAom52dppLMtF-UnEZi0dooj==75OEv1EoBZA@mail.gmail.com
2018-02-14 16:06:49 -05:00
Tom Lane 4b93f57999 Make plpgsql use its DTYPE_REC code paths for composite-type variables.
Formerly, DTYPE_REC was used only for variables declared as "record";
variables of named composite types used DTYPE_ROW, which is faster for
some purposes but much less flexible.  In particular, the ROW code paths
are entirely incapable of dealing with DDL-caused changes to the number
or data types of the columns of a row variable, once a particular plpgsql
function has been parsed for the first time in a session.  And, since the
stored representation of a ROW isn't a tuple, there wasn't any easy way
to deal with variables of domain-over-composite types, since the domain
constraint checking code would expect the value to be checked to be a
tuple.  A lesser, but still real, annoyance is that ROW format cannot
represent a true NULL composite value, only a row of per-field NULL
values, which is not exactly the same thing.

Hence, switch to using DTYPE_REC for all composite-typed variables,
whether "record", named composite type, or domain over named composite
type.  DTYPE_ROW remains but is used only for its native purpose, to
represent a fixed-at-compile-time list of variables, for instance the
targets of an INTO clause.

To accomplish this without taking significant performance losses, introduce
infrastructure that allows storing composite-type variables as "expanded
objects", similar to the "expanded array" infrastructure introduced in
commit 1dc5ebc90.  A composite variable's value is thereby kept (most of
the time) in the form of separate Datums, so that field accesses and
updates are not much more expensive than they were in the ROW format.
This holds the line, more or less, on performance of variables of named
composite types in field-access-intensive microbenchmarks, and makes
variables declared "record" perform much better than before in similar
tests.  In addition, the logic involved with enforcing composite-domain
constraints against updates of individual fields is in the expanded
record infrastructure not plpgsql proper, so that it might be reusable
for other purposes.

In further support of this, introduce a typcache feature for assigning a
unique-within-process identifier to each distinct tuple descriptor of
interest; in particular, DDL alterations on composite types result in a new
identifier for that type.  This allows very cheap detection of the need to
refresh tupdesc-dependent data.  This improves on the "tupDescSeqNo" idea
I had in commit 687f096ea: that assigned identifying sequence numbers to
successive versions of individual composite types, but the numbers were not
unique across different types, nor was there support for assigning numbers
to registered record types.

In passing, allow plpgsql functions to accept as well as return type
"record".  There was no good reason for the old restriction, and it
was out of step with most of the other PLs.

Tom Lane, reviewed by Pavel Stehule

Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us
2018-02-13 18:52:21 -05:00
Peter Eisentraut 7a32ac8a66 Add procedure support to pg_get_functiondef
This also makes procedures work in psql's \ef and \sf commands.

Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
2018-02-13 15:13:44 -05:00
Peter Eisentraut ebdb42a0d6 Fix typo
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2018-02-12 22:39:52 -05:00
Alvaro Herrera 8237f27b50 get_relid_attribute_name is dead, long live get_attname
The modern way is to use a missing_ok argument instead of two separate
almost-identical routines, so do that.

Author: Michaël Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz
2018-02-12 19:33:15 -03:00
Robert Haas 88ef48c1cc Fix parallel index builds for dynamic_shared_memory_type=none.
The previous code failed to realize that this setting effectively
disables parallelism, and would crash if it decided to attempt
parallelism anyway.  Instead, treat it as a disabling condition.

Kyotaro Horiguchi, who also reported the issue. Reviewed by Michael
Paquier and Peter Geoghegan.

Discussion: http://postgr.es/m/20180209.170635.256350357.horiguchi.kyotaro@lab.ntt.co.jp
2018-02-12 12:55:12 -05:00
Tom Lane d02d4a6d4f Avoid premature free of pass-by-reference CALL arguments.
Prematurely freeing the EState used to evaluate CALL arguments led, in some
cases, to passing dangling pointers to the procedure.  This was masked in
trivial cases because the argument pointers would point to Const nodes in
the original expression tree, and in some other cases because the result
value would end up in the standalone ExprContext rather than in memory
belonging to the EState --- but that wasn't exactly high quality
programming either, because the standalone ExprContext was never
explicitly freed, breaking assorted API contracts.

In addition, using a separate EState for each argument was just silly.

So let's use just one EState, and one ExprContext, and make the latter
belong to the former rather than be standalone, and clean up the EState
(and hence the ExprContext) post-call.

While at it, improve the function's commentary a bit.

Discussion: https://postgr.es/m/29173.1518282748@sss.pgh.pa.us
2018-02-10 13:37:12 -05:00
Tom Lane 65b1d76785 Fix oversight in CALL argument handling, and do some minor cleanup.
CALL statements cannot support sub-SELECTs in the arguments of the called
procedure, since they just use ExecEvalExpr to evaluate such arguments.
Teach transformSubLink() to reject the case, as it already does for other
contexts in which subqueries are not supported.

In passing, s/EXPR_KIND_CALL/EXPR_KIND_CALL_ARGUMENT/ to make that enum
symbol line up more closely with the phrasing of the error messages it is
associated with.  And fix someone's weak grasp of English grammar in the
preceding EXPR_KIND_PARTITION_EXPRESSION addition.  Also update an
incorrect comment in resolve_unique_index_expr (possibly it was correct
when written, but nowadays transformExpr definitely does reject SRFs here).

Per report from Pavel Stehule --- but this resolves only one of the bugs
he mentions.

Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfWZ3zMGkm568Q@mail.gmail.com
2018-02-10 13:05:14 -05:00
Robert Haas be42015fcc Clear stmt_timeout_active if we disable_all_timeouts.
Otherwise, we can end up with the flag set when the timeout is
actually disabled, leading to misbehavior.  Commit
f8e5f156b3 introduced this bug.

Reported by Peter Eisentraut.  Analysis and fix by Thomas Munro,
tweaked by me.

Discussion: http://postgr.es/m/6a909374-2602-7136-8c70-397330a418f3@2ndquadrant.com
2018-02-09 15:48:18 -05:00
Robert Haas b78d0160da Fix incorrect method name in comment.
Atsushi Torikoshi

Discussion: http://postgr.es/m/1b056262-4bc0-a982-c899-bb67a0a7fd52@lab.ntt.co.jp
2018-02-08 14:35:54 -05:00
Robert Haas e44dd84325 Avoid listing the same ResultRelInfo in more than one EState list.
Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple
times.  Commit 2f17844104 seems to
be to blame for this.

Amit Langote, revieed by Amit Khandekar, Etsuro Fujita, and me.
2018-02-08 14:29:05 -05:00
Robert Haas 88fdc70060 Fix possible infinite loop with Parallel Append.
When the previously-chosen plan was non-partial, all pa_finished
flags for partial plans are now set, and pa_next_plan has not yet
been set to INVALID_SUBPLAN_INDEX, the previous code could go into
an infinite loop.

Report by Rajkumar Raghuwanshi.  Patch by Amit Khandekar and me.
Review by Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CAJ3gD9cf43z78qY=U=H0HvOEN341qfRO-vLpnKPSviHeWgJQ5w@mail.gmail.com
2018-02-08 12:31:48 -05:00
Peter Eisentraut 32ff269117 Add more information_schema columns
- table_constraints.enforced
- triggers.action_order
- triggers.action_reference_old_table
- triggers.action_reference_new_table

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-02-07 10:08:02 -05:00
Robert Haas b98a7cd58f Update out-of-date comment in StartupXLOG.
Commit 4b0d28de06 should have updated
this comment, but did not.

Thomas Munro

Discussion: http://postgr.es/m/CAEepm=0iJ8aqQcF9ij2KerAkuHF3SwrVTzjMdm1H4w++nfBf9A@mail.gmail.com
2018-02-07 08:48:04 -05:00
Tom Lane 0a459cec96 Support all SQL:2011 options for window frame clauses.
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"
frame boundaries in window functions.  We'd punted on that back in the
original patch to add window functions, because it was not clear how to
do it in a reasonably data-type-extensible fashion.  That problem is
resolved here by adding the ability for btree operator classes to provide
an "in_range" support function that defines how to add or subtract the
RANGE offset value.  Factoring it this way also allows the operator class
to avoid overflow problems near the ends of the datatype's range, if it
wishes to expend effort on that.  (In the committed patch, the integer
opclasses handle that issue, but it did not seem worth the trouble to
avoid overflow failures for datetime types.)

The patch includes in_range support for the integer_ops opfamily
(int2/int4/int8) as well as the standard datetime types.  Support for
other numeric types has been requested, but that seems like suitable
material for a follow-on patch.

In addition, the patch adds GROUPS mode which counts the offset in
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion
options specified by SQL:2011.  As far as I can see, we are now fully
up to spec on window framing options.

Existing behaviors remain unchanged, except that I changed the errcode
for a couple of existing error reports to meet the SQL spec's expectation
that negative "offset" values should be reported as SQLSTATE 22013.

Internally and in relevant parts of the documentation, we now consistently
use the terminology "offset PRECEDING/FOLLOWING" rather than "value
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.

Oliver Ford, reviewed and whacked around some by me

Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
2018-02-07 00:06:56 -05:00
Robert Haas 2320945731 Fix incorrect grammar.
Etsuro Fujita

Discussion: http://postgr.es/m/5A7981EA.8020201@lab.ntt.co.jp
2018-02-06 15:50:13 -05:00
Robert Haas 9fafa413ac Avoid valgrind complaint about write() of uninitalized bytes.
LogicalTapeFreeze() may write out its first block when it is dirty but
not full, and then immediately read the first block back in from its
BufFile as a BLCKSZ-width block.  This can only occur in rare cases
where very few tuples were written out, which is currently only
possible with parallel external tuplesorts.  To avoid valgrind
complaints, tell it to treat the tail of logtape.c's buffer as
defined.

Commit 9da0cc3528 exposed this problem
but did not create it.  LogicalTapeFreeze() has always tended to write
out some amount of garbage bytes, but previously never wrote less than
one block of data in total, so the problem was masked.

Per buildfarm members lousyjack and skink.

Peter Geoghegan, based on a suggestion from Tom Lane and me.  Some
comment revisions by me.
2018-02-06 14:24:57 -05:00
Tom Lane 3785f7eee3 Doc: move info for btree opclass implementors into main documentation.
Up to now, useful info for writing a new btree opclass has been buried
in the backend's nbtree/README file.  Let's move it into the SGML docs,
in preparation for extending it with info about "in_range" functions
in the upcoming window RANGE patch.

To do this, I chose to create a new chapter for btree indexes in Part VII
(Internals), parallel to the chapters that exist for the newer index AMs.
This is a pretty short chapter as-is.  At some point somebody might care
to flesh it out with more detail about btree internals, but that is
beyond the scope of my ambition for today.

Discussion: https://postgr.es/m/23141.1517874668@sss.pgh.pa.us
2018-02-06 13:52:27 -05:00
Robert Haas f069c91a57 Fix possible crash in partition-wise join.
The previous code assumed that we'd always succeed in creating
child-joins for a joinrel for which partition-wise join was considered,
but that's not guaranteed, at least in the case where dummy rels
are involved.

Ashutosh Bapat, with some wordsmithing by me.

Discussion: http://postgr.es/m/CAFjFpRf8=uyMYYfeTBjWDMs1tR5t--FgOe2vKZPULxxdYQ4RNw@mail.gmail.com
2018-02-05 17:31:57 -05:00
Tom Lane 3492a0af0b Fix RelationBuildPartitionKey's processing of partition key expressions.
Failure to advance the list pointer while reading partition expressions
from a list results in invoking an input function with inappropriate data,
possibly leading to crashes or, with carefully crafted input, disclosure
of arbitrary backend memory.

Bug discovered independently by Álvaro Herrera and David Rowley.
This patch is by Álvaro but owes something to David's proposed fix.
Back-patch to v10 where the issue was introduced.

Security: CVE-2018-1052
2018-02-05 10:37:30 -05:00
Tom Lane 05d0f13f07 Skip setting up shared instrumentation for Hash node if not needed.
We don't need to set up the shared space for hash join instrumentation data
if instrumentation hasn't been requested.  Let's follow the example of the
similar Sort node code and save a few cycles by skipping that when we can.

This reverts commit d59ff4ab3 and instead allows us to use the safer choice
of passing noError = false to shm_toc_lookup in ExecHashInitializeWorker,
since if we reach that call there should be a TOC entry to be found.

Thomas Munro

Discussion: https://postgr.es/m/E1ehkoZ-0005uW-43%40gemulon.postgresql.org
2018-02-04 22:14:07 -05:00
Tom Lane d59ff4ab31 Fix another instance of unsafe coding for shm_toc_lookup failure.
One or another author of commit 5bcf389ec seems to have thought that
computing an offset from a NULL pointer would yield another NULL pointer.
There may possibly be architectures where that works, but common machines
don't work like that.  Per a quick code review of places calling
shm_toc_lookup and not using noError = false.
2018-02-02 18:32:05 -05:00
Tom Lane 957ff087c8 Be more wary about shm_toc_lookup failure.
Commit 445dbd82a basically missed the point of commit d46633506,
which was that we shouldn't allow shm_toc_lookup() failure to lead
to a core dump or assertion crash, because the odds of such a
failure should never be considered negligible.  It's correct that
we can't expect the PARALLEL_KEY_ERROR_QUEUE TOC entry to be there
if we have no workers.  But if we have no workers, we're not going
to do anything in this function with the lookup result anyway,
so let's just skip it.  That lets the code use the easy-to-prove-safe
noError=false case, rather than anything requiring effort to review.

Back-patch to v10, like the previous commit.

Discussion: https://postgr.es/m/3647.1517601675@sss.pgh.pa.us
2018-02-02 18:26:07 -05:00
Peter Eisentraut 533c5d8bdd Fix application of identity values in some cases
Investigation of 2d2d06b7e2 revealed that
identity values were not applied in some further cases, including
logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD
COLUMN.  To fix all that, apply the identity column expression in
build_column_default() instead of repeating the same logic at each call
site.

For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding
completely ignored that existing rows for the new column should have
values filled in from the identity sequence.  The coding using
build_column_default() fails for this because the sequence ownership
isn't registered until after ALTER TABLE, and we can't do it before
because we don't have the column in the catalog yet.  So we specially
remember in ColumnDef the sequence name that we decided on and build a
custom NextValueExpr using that.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-02-02 14:39:10 -05:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Robert Haas 9aef173163 Refactor code for partition bound searching
Remove partition_bound_cmp() and partition_bound_bsearch(), whose
void * argument could be, depending on the situation, of any of
three different types: PartitionBoundSpec *, PartitionRangeBound *,
Datum *.

Instead, introduce separate bound-searching functions for each
situation: partition_list_bsearch, partition_range_bsearch,
partition_range_datum_bsearch, and partition_hash_bsearch.  This
requires duplicating the code for binary search, but it makes the
code much more type safe, involves fewer branches at runtime, and
at least in my opinion, is much easier to understand.

Along the way, add an option to partition_range_datum_bsearch
allowing the number of keys to be specified, so that we can search
for partitions based on a prefix of the full list of partition
keys.  This is important for pending work to improve partition
pruning.

Amit Langote, per a suggestion from me.

Discussion: http://postgr.es/m/CA+TgmoaVLDLc8=YESRwD32gPhodU_ELmXyKs77gveiYp+JE4vQ@mail.gmail.com
2018-02-02 09:32:44 -05:00
Robert Haas 9222c0d9ed Add new function WaitForParallelWorkersToAttach.
Once this function has been called, we know that all workers have
started and attached to their error queues -- so if any of them
subsequently exit uncleanly, we'll be sure to throw an ERROR promptly.
Otherwise, users of the ParallelContext machinery must be careful not
to wait forever for a worker that has failed to start.  Parallel query
manages to work without needing this for reasons explained in new
comments added by this patch, but it's a useful primitive for other
parallel operations, such as the pending patch to make creating a
btree index run in parallel.

Amit Kapila, revised by me.  Additional review by Peter Geoghegan.

Discussion: http://postgr.es/m/CAA4eK1+e2MzyouF5bg=OtyhDSX+=Ao=3htN=T-r_6s3gCtKFiw@mail.gmail.com
2018-02-02 09:00:59 -05:00
Robert Haas ad25a6b1f2 Fix possible failure to mark hash metapage dirty.
Report and suggested fix by Lixian Zou.  Amit Kapila put it
in the form of a patch and reviewed.

Discussion: http://postgr.es/m/151739848647.1239.12528851873396651946@wrigleys.postgresql.org
2018-02-01 15:23:45 -05:00
Robert Haas 22757960bb Fix typo: colums -> columns.
Along the way, also fix code indentation.

Alexander Lakhin, reviewed by Michael Paquier

Discussion: http://postgr.es/m/45c44aa7-7cfa-7f3b-83fd-d8300677fdda@gmail.com
2018-01-31 16:45:37 -05:00
Robert Haas 3ccdc6f9a5 Fix list partition constraints for partition keys of array type.
The old code generated always generated a constraint of the form
col = ANY(ARRAY[val1, val2, ...]), but that's invalid when col is an
array type.  Instead, generate col = val when there's only one value,
col = val1 OR col = val2 OR ... when there are multiple values and
col is of array type, and the old form when there are multiple values
and col is not of an array type.

As a side benefit, this makes constraint exclusion able to prune
a list partition declared to accept a single Boolean value, which
didn't work before.

Amit Langote, reviewed by Etsuro Fujita

Discussion: http://postgr.es/m/97267195-e235-89d1-a41a-c110198dfce9@lab.ntt.co.jp
2018-01-31 15:43:11 -05:00
Peter Eisentraut 38d485fdaa Fix up references to scram-sha-256
pg_hba_file_rules erroneously reported this as scram-sha256.  Fix that.

To avoid future errors and confusion, also adjust documentation links
and internal symbols to have a separator between "sha" and "256".

Reported-by: Christophe Courtois <christophe.courtois@dalibo.com>
Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-30 16:50:30 -05:00
Peter Eisentraut a044378ce2 Add some noreturn attributes to help static analyzers 2018-01-29 20:44:35 -05:00
Peter Eisentraut 07e524d3e9 Silence complaint about dead assignment
The preferred place for "placate compiler" assignments is after
elog(ERROR), not before it.  Otherwise, scan-build complains about a
dead assignment.
2018-01-29 20:43:43 -05:00
Andres Freund c12693d8f3 Introduce ExecQualAndReset() helper.
It's a common task to evaluate a qual and reset the corresponding
expression context. Currently that requires storing the result of the
qual eval, resetting the context, and then reacting on the result. As
that's awkward several places only reset the context next time through
a node. That's not great, so introduce a helper that evaluates and
resets.

It's a bit ugly that it currently uses MemoryContextReset() instead of
ResetExprContext(), but that seems easier than reordering all of
executor.h.

Author: Andres Freund
Discussion: https://postgr.es/m/20180109222544.f7loxrunqh3xjl5f@alap3.anarazel.de
2018-01-29 12:19:12 -08:00
Tom Lane 97d4445a03 Save a few bytes by removing useless last argument to SearchCatCacheList.
There's never any value in giving a fully specified cache key to
SearchCatCacheList: you might as well call SearchCatCache instead,
since there could be only one match.  So the maximum useful number of
key arguments is one less than the supported number of key columns.
We might as well remove the useless extra argument and save some few
bytes per call site, as well as a cycle or so per call.

I believe the reason it was coded like this is that originally, callers
had to write out all the dummy arguments in each call, and so it seemed
less confusing if SearchCatCache and SearchCatCacheList took the same
number of key arguments.  But since commit e26c539e9, callers only write
their live arguments explicitly, making that a non-factor; and there's
surely been enough time for third-party modules to adapt to that coding
style.  So this is only an ABI break not an API break for callers.

Per discussion with Oliver Ford, this might also make it less confusing
how to use SearchCatCacheList correctly.

Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us
2018-01-29 15:13:17 -05:00
Andres Freund fc96c69425 Initialize unused ExprEvalStep fields.
ExecPushExprSlots didn't initialize ExprEvalStep's resvalue/resnull
steps as it didn't use them. That caused wrong valgrind warnings for
an upcoming patch, so zero-intialize.

Also zero-initialize all scratch ExprEvalStep's allocated on the
stack, to avoid issues with similar future omissions of non-critial
data.
2018-01-29 12:01:07 -08:00
Andres Freund c068f87723 Improve bit perturbation in TupleHashTableHash.
The changes in b81b5a96f4 did not fully
address the issue, because the bit-mixing of the IV into the final
hash-key didn't prevent clustering in the input-data survive in the
output data.

This didn't cause a lot of problems because of the additional growth
conditions added d4c62a6b62. But as we
want to rein those in due to explosive growth in some edges, this
needs to be fixed.

Author: Andres Freund
Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org
Backpatch: 10, where simplehash was introduced
2018-01-29 11:24:57 -08:00
Tom Lane 35a528062c Add stack-overflow guards in set-operation planning.
create_plan_recurse lacked any stack depth check.  This is not per
our normal coding rules, but I'd supposed it was safe because earlier
planner processing is more complex and presumably should eat more
stack.  But bug #15033 from Andrew Grossman shows this isn't true,
at least not for queries having the form of a many-thousand-way
INTERSECT stack.

Further testing showed that recurse_set_operations is also capable
of being crashed in this way, since it likewise will recurse to the
bottom of a parsetree before calling any support functions that
might themselves contain any stack checks.  However, its stack
consumption is only perhaps a third of create_plan_recurse's.

It's possible that this particular problem with create_plan_recurse can
only manifest in 9.6 and later, since before that we didn't build a Path
tree for set operations.  But having seen this example, I now have no
faith in the proposition that create_plan_recurse doesn't need a stack
check, so back-patch to all supported branches.

Discussion: https://postgr.es/m/20180127050845.28812.58244@wrigleys.postgresql.org
2018-01-28 13:39:07 -05:00
Bruce Momjian 010123e144 C includes: Reorder C includes in partition.c
Discussion: https://postgr.es/m/5A69AA50.2060600@lab.ntt.co.jp

Author: Etsuro Fujita
2018-01-27 23:05:52 -05:00
Tom Lane 2e668c522e Avoid crash during EvalPlanQual recheck of an inner indexscan.
Commit 09529a70b changed nodeIndexscan.c and nodeIndexonlyscan.c to
postpone initialization of the indexscan proper until the first tuple
fetch.  It overlooked the question of mark/restore behavior, which means
that if some caller attempts to mark the scan before the first tuple fetch,
you get a null pointer dereference.

The only existing user of mark/restore is nodeMergejoin.c, which (somewhat
accidentally) will never attempt to set a mark before the first inner tuple
unless the inner child node is a Material node.  Hence the case can't arise
normally, so it seems sufficient to document the assumption at both ends.
However, during an EvalPlanQual recheck, ExecScanFetch doesn't call
IndexNext but just returns the jammed-in test tuple.  Therefore, if we're
doing a recheck in a plan tree with a mergejoin with inner indexscan,
it's possible to reach ExecIndexMarkPos with iss_ScanDesc still null,
as reported by Guo Xiang Tan in bug #15032.

Really, when there's a test tuple supplied during an EPQ recheck, touching
the index at all is the wrong thing: rather, the behavior of mark/restore
ought to amount to saving and restoring the es_epqScanDone flag.  We can
avoid finding a place to actually save the flag, for the moment, because
given the assumption that no caller will set a mark before fetching a
tuple, es_epqScanDone must always be set by the time we try to mark.
So the actual behavior change required is just to not reach the index
access if a test tuple is supplied.

The set of plan node types that need to consider this issue are those
that support EPQ test tuples (i.e., call ExecScan()) and also support
mark/restore; which is to say, IndexScan, IndexOnlyScan, and perhaps
CustomScan.  It's tempting to try to fix the problem in one place by
teaching ExecMarkPos() itself about EPQ; but ExecMarkPos supports some
plan types that aren't Scans, and also it seems risky to make assumptions
about what a CustomScan wants to do here.  Also, the most likely future
change here is to decide that we do need to support marks placed before
the first tuple, which would require additional work in IndexScan and
IndexOnlyScan in any case.  Hence, fix the EPQ issue in nodeIndexscan.c
and nodeIndexonlyscan.c, accepting the small amount of code duplicated
thereby, and leave it to CustomScan providers to fix this bug if they
have it.

Back-patch to v10 where commit 09529a70b came in.  In earlier branches,
the index_markpos() call is a waste of cycles when EPQ is active, but
no more than that, so it doesn't seem appropriate to back-patch further.

Discussion: https://postgr.es/m/20180126074932.3098.97815@wrigleys.postgresql.org
2018-01-27 13:52:24 -05:00
Tom Lane fb8697b31a Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily.  Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser.  Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.

This does create a user-visible behavioral change, which is that while
formerly all of these would work:
	alter table foo set (fillfactor = 50);
	alter table foo set (FillFactor = 50);
	alter table foo set ("fillfactor" = 50);
	alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others.  However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.

Daniel Gustafsson, reviewed by Michael Paquier, small changes by me

Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-26 18:25:14 -05:00
Robert Haas 9fd8b7d632 Factor some code out of create_grouping_paths.
This is preparatory refactoring to prepare the way for partition-wise
aggregate, which will reuse the new subroutines for child grouping
rels.  It also does not seem like a bad idea on general principle,
as the function was getting pretty long.

Jeevan Chalke.  The larger patch series of which this patch is a part
was reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi,
Ashutosh Bapat, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, and me.  Some cosmetic changes by me.

Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
2018-01-26 15:03:12 -05:00
Tom Lane 4971d2a322 Remove the obsolete WITH clause of CREATE FUNCTION.
This clause was superseded by SQL-standard syntax back in 7.3.
We've kept it around for backwards-compatibility purposes ever since;
but 15 years seems like long enough for that, especially seeing that
there are undocumented weirdnesses in how it interacts with the
SQL-standard syntax for specifying the same options.

Michael Paquier, per an observation by Daniel Gustafsson;
some small cosmetic adjustments to nearby code by me.

Discussion: https://postgr.es/m/20180115022748.GB1724@paquier.xyz
2018-01-26 12:25:44 -05:00
Peter Eisentraut c1869542b3 Use abstracted SSL API in server connection log messages
The existing "connection authorized" server log messages used OpenSSL
API calls directly, even though similar abstracted API calls exist.
Change to use the latter instead.

Change the function prototype for the functions that return the TLS
version and the cipher to return const char * directly instead of
copying into a buffer.  That makes them slightly easier to use.

Add bits= to the message.  psql shows that, so we might as well show the
same information on the client and server.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-26 09:50:46 -05:00
Tom Lane bb415675d8 Add missing "static" markers.
Per buildfarm.
2018-01-25 14:32:28 -05:00
Alvaro Herrera 05fb5d6619 Ignore partitioned indexes where appropriate
get_relation_info() was too optimistic about opening indexes in
partitioned tables, which would raise errors when any queries were
planned on such tables.  Fix by ignoring any indexes of the partitioned
kind.

CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem.  Fix by
disallowing these commands in partitioned tables.

Fallout from 8b08f7d482.
2018-01-25 16:12:15 -03:00
Peter Eisentraut 0b5e33f667 Remove use of byte-masking macros in record_image_cmp
These were introduced in 4cbb646334, but
after further analysis and testing, they should not be necessary and
probably weren't the part of that commit that fixed anything.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-25 09:41:19 -05:00
Robert Haas 945f71db84 Avoid referencing off the end of subplan_partition_offsets.
Report by buildfarm member skink and Tom Lane.  Analysis by me.
Patch by Amit Khandekar.

Discussion: http://postgr.es/m/CAJ3gD9fVA1iXQYhfqHP5n_TEd4U9=V8TL_cc-oKRnRmxgdvJrQ@mail.gmail.com
2018-01-24 16:34:51 -05:00
Tom Lane 434e6e1484 Improve implementation of pg_attribute_always_inline.
Avoid compiler warnings on MSVC (which doesn't want to see both
__forceinline and inline) and ancient GCC (which doesn't have
__attribute__((always_inline))).

Don't force inline-ing when building at -O0, as the programmer is probably
hoping for exact source-to-object-line correspondence in that case.
(For the moment this only works for GCC; maybe we can extend it later.)

Make pg_attribute_always_inline be syntactically a drop-in replacement
for inline, rather than an additional wart.

And improve the comments.

Thomas Munro and Michail Nikolaev, small tweaks by me

Discussion: https://postgr.es/m/32278.1514863068@sss.pgh.pa.us
Discussion: https://postgr.es/m/CANtu0oiYp74brgntKOxgg1FK5+t8uQ05guSiFU6FYz_5KUhr6Q@mail.gmail.com
2018-01-23 23:07:13 -05:00
Tom Lane bb94ce4d26 Teach reparameterize_path() to handle AppendPaths.
If we're inside a lateral subquery, there may be no unparameterized paths
for a particular child relation of an appendrel, in which case we *must*
be able to create similarly-parameterized paths for each other child
relation, else the planner will fail with "could not devise a query plan
for the given query".  This means that there are situations where we'd
better be able to reparameterize at least one path for each child.

This calls into question the assumption in reparameterize_path() that
it can just punt if it feels like it.  However, the only case that is
known broken right now is where the child is itself an appendrel so that
all its paths are AppendPaths.  (I think possibly I disregarded that in
the original coding on the theory that nested appendrels would get folded
together --- but that only happens *after* reparameterize_path(), so it's
not excused from handling a child AppendPath.)  Given that this code's been
like this since 9.3 when LATERAL was introduced, it seems likely we'd have
heard of other cases by now if there were a larger problem.

Per report from Elvis Pranskevichus.  Back-patch to 9.3.

Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
2018-01-23 16:50:34 -05:00
Alvaro Herrera 95be5ce1bc Remove unnecessary include
autovacuum.c no longer needs dsa.h, since commit 31ae1638ce.
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCWvYyXrvdANSHWWWEWJH5TeAWAkJ_2gqrHhukG+OBo1g@mail.gmail.com
2018-01-23 15:22:13 -03:00
Robert Haas 28e04155f1 Update obsolete sentence in README.parallel.
Since 9.6, heavyweight locking is not an abstract and unhandled
concern of the parallel machinery, but rather something to which
we have a specific approach.
2018-01-23 11:22:47 -05:00
Robert Haas 2badb5afb8 Report an ERROR if a parallel worker fails to start properly.
Commit 28724fd90d fixed things so that
if a background worker fails to start due to fork() failure or because
it is terminated before startup succeeds, BGWH_STOPPED will be
reported.  However, that only helps if the code that uses the
background worker machinery notices the change in status, and the code
in parallel.c did not.

To fix that, do two things.  First, make sure that when a worker
exits, it triggers the leader to read from error queues.  That way, if
a worker which has attached to an error queue exits uncleanly, the
leader is sure to throw some error, either the contents of the
ErrorResponse sent by the worker, or "lost connection to parallel
worker" if it exited without sending one.  To cover the case where
the worker never starts up in the first place or exits before
attaching to the error queue, the ParallelContext now keeps track
of which workers have sent at least one message via the error
queue.  A worker which sends no messages by the time the parallel
operation finishes will be checked to see whether it exited before
attaching to the error queue; if so, a new error message, "parallel
worker failed to initialize", will be reported.  If not, we'll
continue to wait until it either starts up and exits cleanly, starts
up and exits uncleanly, or fails to start, and then take the
appropriate action.

Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoYnBgXgdTu6wk5YPdWhmgabYc9nY_pFLq=tB=FSLYkD8Q@mail.gmail.com
2018-01-23 11:03:03 -05:00
Peter Eisentraut 1c2183403b Extract common bits from OpenSSL implementation
Some things in be-secure-openssl.c and fe-secure-openssl.c were not
actually specific to OpenSSL but could also be used by other
implementations.  In order to avoid copy-and-pasting, move some of that
code to common files.
2018-01-23 07:11:39 -05:00
Peter Eisentraut f966101d19 Move SSL API comments to header files
Move the documentation of the SSL API calls are supposed to do into the
headers files, instead of keeping them in the files for the OpenSSL
implementation.  That way, they don't have to be duplicated or be
inconsistent when other implementations are added.
2018-01-23 07:11:39 -05:00
Peter Eisentraut 573bd08b99 Move EDH support to common files
The EDH support is not really specific to the OpenSSL implementation, so
move the support and documentation comments to common files.
2018-01-23 07:11:38 -05:00
Peter Eisentraut 7404e77cc1 Split out documentation of SSL parameters into their own section
Split the "Authentication and Security" section into two separate
sections "Authentication" and "SSL".  The latter part has gotten much
longer over time, and doesn't primarily have to do with authentication.

Also, the row_security parameter was inconsistently categorized, so
clean that up while we're here.
2018-01-23 07:11:38 -05:00
Peter Eisentraut 8561e4840c Transaction control in PL procedures
In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language.  Add similar underlying functions to SPI.  Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call.  Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.

- SPI

Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags.  The only option flag right now is
SPI_OPT_NONATOMIC.  A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed.  This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called.  A nonatomic SPI connection uses different
memory management.  A normal SPI connection allocates its memory in
TopTransactionContext.  For nonatomic connections we use PortalContext
instead.  As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.

SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.

- portalmem.c

Some adjustments were made in the code that cleans up portals at
transaction abort.  The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.

In AtAbort_Portals(), remove the code that marks an active portal as
failed.  As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort.  And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception.  So the code in AtAbort_Portals() is never used
anyway.

In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much.  This mirrors similar code in
PreCommit_Portals().

- PL/Perl

Gets new functions spi_commit() and spi_rollback()

- PL/pgSQL

Gets new commands COMMIT and ROLLBACK.

Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.

- PL/Python

Gets new functions plpy.commit and plpy.rollback.

- PL/Tcl

Gets new commands commit and rollback.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2018-01-22 08:43:06 -05:00
Magnus Hagander 1cc4f536ef Support huge pages on Windows
Add support for huge pages (called large pages on Windows) to the
Windows build.

This (probably) breaks compatibility with Windows versions prior to
Windows 2003 or Windows Vista.

Authors: Takayuki Tsunakawa and Thomas Munro
Reviewed by: Magnus Hagander, Amit Kapila
2018-01-21 15:40:46 +01:00
Tom Lane 96102a32a3 Suppress possibly-uninitialized-variable warnings.
Apparently, Peter's compiler has faith that the switch test values here
could never not be valid values of their enums.  Mine does not, and
I tend to agree with it.
2018-01-19 22:16:25 -05:00
Robert Haas 2f17844104 Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint.  In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one.  This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented.  (There is a pending patch to improve the
situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me.  A few final revisions by me.

Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
2018-01-19 15:33:06 -05:00
Alvaro Herrera 7f17fd6fc7 Fix CompareIndexInfo's attnum comparisons
When an index column is an expression, it makes no sense to compare its
attribute numbers.

This seems to account for remaining buildfarm fallout from 8b08f7d482.
At least, it solves the issue in my local 32bit VM -- let's see what the
rest thinks.
2018-01-19 16:56:42 -03:00
Peter Eisentraut 8b9e9644dc Replace AclObjectKind with ObjectType
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that.  It's only used in
aclcheck_error.  By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-19 14:01:15 -05:00
Peter Eisentraut 2c6f37ed62 Replace GrantObjectType with ObjectType
There used to be a lot of different *Type and *Kind symbol groups to
address objects within different commands, most of which have been
replaced by ObjectType, starting with
b256f24264.  But this conversion was never
done for the ACL commands until now.

This change ends up being just a plain replacement of the types and
symbols, without any code restructuring needed, except deleting some now
redundant code.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Stephen Frost <sfrost@snowman.net>
2018-01-19 14:01:14 -05:00
Alvaro Herrera 8b08f7d482 Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.

As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones.  Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.

To support pg_dump'ing these indexes, add commands
    CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
    ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index).  These reconstruct prior
database state exactly.

Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
	Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 11:49:22 -03:00
Alvaro Herrera 1ef61ddce9 Fix StoreCatalogInheritance1 to use 32bit inhseqno
For no apparent reason, this function was using a 16bit-wide inhseqno
value, rather than the correct 32 bit width which is what is stored in
the pg_inherits catalog.  This becomes evident if you try to create a
table with more than 65535 parents, because this error appears:

ERROR:  duplicate key value violates unique constraint «pg_inherits_relid_seqno_index»
DETAIL:  Key (inhrelid, inhseqno)=(329371, 0) already exists.

Needless to say, having so many parents is an uncommon situations, which
explains why this error has never been reported despite being having
been introduced with the Postgres95 1.01 sources in commit d31084e9d111:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349

Backpatch all the way back.

David Rowley noticed this while reviewing a patch of mine.
Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com
2018-01-19 10:17:54 -03:00
Robert Haas 29d58fd3ad Transfer state pertaining to pending REINDEX operations to workers.
This will allow the pending patch for parallel CREATE INDEX to work
on system catalogs, and to provide the same level of protection
against use of user indexes while they are being rebuilt that we
have for non-parallel CREATE INDEX.

Patch by me, reviewed by Peter Geoghegan.

Discussion: http://postgr.es/m/CA+TgmoYN-YQU9JsGQcqFLovZ-C+Xgp1_xhJQad=cunGG-_p5gg@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wzkv4UNkXYhqQRqk-u9rS7h5c-4cCW+EqQ8K_WSeS43aZg@mail.gmail.com
2018-01-19 07:48:54 -05:00
Simon Riggs 4e54dd2e0a Fix typo in recent commit
Typo in 9c7d06d606

Reported-by: Masahiko Sawada
2018-01-19 06:36:17 +00:00
Peter Eisentraut a228e44ce4 Update comment
The "callback" that this comment was referring to was removed by commit
c0a15e07cd, so update to match the current
code.
2018-01-18 19:36:34 -05:00
Bruce Momjian f033462d8f Reorder C includes
Reorder header files in joinrels.c and pathnode.c in alphabetical order,
removing unnecessary ones.

Author: Etsuro Fujita
2018-01-17 18:10:05 -05:00
Tom Lane dca48d145e Remove useless lookup of root partitioned rel in ExecInitModifyTable().
node->partitioned_rels is only set in UPDATE/DELETE cases, but
ExecInitModifyTable only uses its "rel" variable in INSERT cases,
so the extra logic to find the root rel is just a waste of complexity
and cycles.

Etsuro Fujita, reviewed by Amit Langote

Discussion: https://postgr.es/m/93cf9816-2f7d-0f67-8ed2-4a4e497a6ab8@lab.ntt.co.jp
2018-01-17 14:44:15 -05:00
Simon Riggs 9c7d06d606 Ability to advance replication slots
Ability to advance both physical and logical replication slots using a
new user function pg_replication_slot_advance().

For logical advance that means records are consumed as fast as possible
and changes are not given to output plugin for sending. Makes 2nd phase
(after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation
faster, especially when there are big transactions as the reorder buffer
does not have to deal with data changes and does not have to spill to
disk.

Author: Petr Jelinek
Reviewed-by: Simon Riggs
2018-01-17 11:38:34 +00:00
Andrew Dunstan cc4feded0a Centralize json and jsonb handling of datetime types
The creates a single function JsonEncodeDateTime which will format these
data types in an efficient and consistent manner. This will be all the
more important when we come to jsonpath so we don't have to implement yet
more code doing the same thing in two more places.

This also extends the code to handle time and timetz types which were
not previously handled specially. This requires exposing the time2tm and
timetz2tm functions.

Patch from Nikita Glukhov
2018-01-16 19:07:13 -05:00
Peter Eisentraut d91da5eced Remove useless use of bit-masking macros
In this case, the macros SET_8_BYTES(), GET_8_BYTES(), SET_4_BYTES(),
GET_4_BYTES() are no-ops, so we can just remove them.

The plan is to perhaps remove them from the source code altogether, so
we'll start here.

Discussion: https://www.postgresql.org/message-id/5d51721a-69ef-2053-9172-599b539f0628@2ndquadrant.com
2018-01-16 17:12:16 -05:00
Tom Lane 680d540502 Avoid unnecessary failure in SELECT concurrent with ALTER NO INHERIT.
If a query against an inheritance tree runs concurrently with an ALTER
TABLE that's disinheriting one of the tree members, it's possible to get
a "could not find inherited attribute" error because after obtaining lock
on the removed member, make_inh_translation_list sees that its columns
have attinhcount=0 and decides they aren't the columns it's looking for.

An ideal fix, perhaps, would avoid including such a just-removed member
table in the query at all; but there seems no way to accomplish that
without adding expensive catalog rechecks or creating a likelihood of
deadlocks.  Instead, let's just drop the check on attinhcount.  In this
way, a query that's included a just-disinherited child will still
succeed, which is not a completely unreasonable behavior.

This problem has existed for a long time, so back-patch to all supported
branches.  Also add an isolation test verifying related behaviors.

Patch by me; the new isolation test is based on Kyotaro Horiguchi's work.

Discussion: https://postgr.es/m/20170626.174612.23936762.horiguchi.kyotaro@lab.ntt.co.jp
2018-01-12 15:46:37 -05:00
Tom Lane 90947674fc Fix incorrect handling of subquery pullup in the presence of grouping sets.
If we flatten a subquery whose target list contains constants or
expressions, when those output columns are used in GROUPING SET columns,
the planner was capable of doing the wrong thing by merging a pulled-up
expression into the surrounding expression during const-simplification.
Then the late processing that attempts to match subexpressions to grouping
sets would fail to match those subexpressions to grouping sets, with the
effect that they'd not go to null when expected.

To fix, wrap such subquery outputs in PlaceHolderVars, ensuring that
they preserve their separate identity throughout the planner's expression
processing.  This is a bit of a band-aid, because the wrapper defeats
const-simplification even in places where it would be safe to allow.
But a nicer fix would likely be too invasive to back-patch, and the
consequences of the missed optimizations probably aren't large in most
cases.

Back-patch to 9.5 where grouping sets were introduced.

Heikki Linnakangas, with small mods and better test cases by me;
additional review by Andrew Gierth

Discussion: https://postgr.es/m/7dbdcf5c-b5a6-ef89-4958-da212fe10176@iki.fi
2018-01-12 12:24:50 -05:00
Alvaro Herrera 49c784ece7 Remove hard-coded schema knowledge about pg_attribute from genbki.pl
Add the ability to label a column's default value in the catalog header,
and implement this for pg_attribute.  A new function in Catalog.pm is
used to fill in a tuple with defaults.  The build process will complain
loudly if a catalog entry is incomplete,

Commit 8137f2c323 labeled variable length columns for the C preprocessor.
Expose that label to genbki.pl so we can exclude those columns from schema
macros in a general fashion. Also, format schema macro entries according
to their types.

This means slightly less code maintenance, but more importantly it's a
proving ground for mechanisms intended to be used in later commits.

While at it, I (Álvaro) couldn't resist making some changes in
genbki.pl: rename some functions to actually indicate their purpose
instead of actively misleading onlookers; and don't iterate on the whole
of pg_type to find the entry for each catalog row, using a hash instead
of an array.

Author: John Naylor, some changes by Álvaro Herrera
Discussion: https://postgr.es/m/CAJVSVGVJHwD8sfDfZW9TbCHWKf=C1YDRM-rF=2JenRU_y+VcFg@mail.gmail.com
2018-01-12 11:21:42 -03:00
Bruce Momjian bdb70c12b3 C comment: fix "the the" mentions in C comments
Reported-by: Christoph Dreis

Discussion: https://postgr.es/m/007e01d3519e$2734ca10$759e5e30$@freenet.de

Author: Christoph Dreis
2018-01-11 21:50:21 -05:00
Tom Lane 4d41b2e092 Add QueryEnvironment to ExplainOneQuery_hook's parameter list.
This should have been done in commit 18ce3a4ab, which added that parameter
to ExplainOneQuery, but it was overlooked.  This makes it impossible for
a user of the hook to pass the queryEnv down to ExplainOnePlan.

It's too late to change this API in v10, I suppose, but fortunately
passing NULL to ExplainOnePlan will work in nearly all interesting
cases in v10.  That might not be true forever, so we'd better fix it.

Tatsuro Yamada, reviewed by Thomas Munro

Discussion: https://postgr.es/m/890e8dd9-c1c7-a422-6892-874f5eaee048@lab.ntt.co.jp
2018-01-11 12:16:18 -05:00
Peter Eisentraut 9e945f8626 Fix Latin spelling
"c.f." should be "cf.".
2018-01-11 08:32:01 -05:00
Peter Eisentraut b48b2f8793 Revert "Move portal pinning from PL/pgSQL to SPI"
This reverts commit b3617cdfbb.

This broke returning unnamed cursors from PL/pgSQL functions.
Apparently, there are no test cases for this.
2018-01-10 16:01:17 -05:00
Tom Lane 3afd75eaac Remove dubious micro-optimization in ckpt_buforder_comparator().
It seems incorrect to assume that the list of CkptSortItems can never
contain duplicate page numbers: concurrent activity could result in some
page getting dropped from a low-numbered buffer and later loaded into a
high-numbered buffer while BufferSync is scanning the buffer pool.
If that happened, the comparator would give self-inconsistent results,
potentially confusing qsort().  Saving one comparison step is not worth
possibly getting the sort wrong.

So far as I can tell, nothing would actually go wrong given our current
implementation of qsort().  It might get a bit slower than expected
if there were a large number of duplicates of one value, but that's
surely a probability-epsilon case.  Still, the comment is wrong,
and if we ever switched to another sort implementation it might be
less forgiving.

In passing, avoid casting away const-ness of the argument pointers;
I've not seen any compiler complaints from that, but it seems likely
that some compilers would not like it.

Back-patch to 9.6 where this code came in, just in case I've underestimated
the possible consequences.

Discussion: https://postgr.es/m/18437.1515607610@sss.pgh.pa.us
2018-01-10 15:50:54 -05:00
Robert Haas 2fd58096f0 Add missing "return" statement to accumulate_append_subpath.
Without this, Parallel Append can end up with extra children.

Report by Rajkumar Raghuwanshi.  Fix by Amit Khandekar.  Brown
paper bag bug by me.

Discussion: http://postgr.es/m/CAKcux6mBF-NiddyEe9LwymoUC5+wh8bQJ=uk2gGkOE+L8cv=LA@mail.gmail.com
2018-01-10 11:21:20 -05:00
Peter Eisentraut b3617cdfbb Move portal pinning from PL/pgSQL to SPI
PL/pgSQL "pins" internally generated (unnamed) portals so that user code
cannot close them by guessing their names.  This logic is also useful in
other languages and really for any code.  So move that logic into SPI.
An unnamed portal obtained through SPI_cursor_open() and related
functions is now automatically pinned, and SPI_cursor_close()
automatically unpins a portal that is pinned.

In the core distribution, this affects PL/Perl and PL/Python, preventing
users from manually closing cursors created by spi_query and
plpy.cursor, respectively.  (PL/Tcl does not currently offer any cursor
functionality.)

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2018-01-10 10:20:51 -05:00
Peter Eisentraut acc67ffd0a Give more accurate error message for dropping pinned portal
The previous code gave the same error message for attempting to drop
pinned and active portals, but those are separate states, so give
separate error messages.
2018-01-10 09:22:07 -05:00
Andres Freund 69c3936a14 Expression evaluation based aggregate transition invocation.
Previously aggregate transition and combination functions were invoked
by special case code in nodeAgg.c, evaluating input and filters
separately using the expression evaluation machinery. That turns out
to not be great for performance for several reasons:

- repeated expression evaluations have some cost
- the transition functions invocations are poorly predicted, as
  commonly there are multiple aggregates in a query, resulting in the
  same call-stack invoking different functions.
- filter and input computation had to be done separately
- the special case code made it hard to implement JITing of the whole
  transition function invocation

Address this by building one large expression that computes input,
evaluates filters, and invokes transition functions.

This leads to moderate speedups in queries bottlenecked by aggregate
computations, and enables large speedups for similar cases once JITing
is done.

There's potential for further improvement:
- It'd be nice if we could simplify the somewhat expensive
  aggstate->all_pergroups lookups.
- right now there's still an advance_transition_function invocation in
  nodeAgg.c, leading to some code duplication.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-01-09 13:25:38 -08:00
Alvaro Herrera 272c2ab9fd Change some bogus PageGetLSN calls to BufferGetLSNAtomic
As src/backend/access/transam/README says, PageGetLSN may only be called
by processes holding either exclusive lock on buffer, or a shared lock
on buffer plus buffer header lock.  Therefore any place that only holds
a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN,
which internally obtains buffer header lock prior to reading the LSN.

A few callsites failed to comply with this rule.  This was detected by
running all tests under a new (not committed) assertion that verifies
PageGetLSN locking contract.  All but one of the callsites that failed
the assertion are fixed by this patch.  Remaining callsites were
inspected manually and determined not to need any change.

The exception (unfixed callsite) is in TestForOldSnapshot, which only
has a Page argument, making it impossible to access the corresponding
Buffer from it.  Fixing that seems a much larger patch that will have to
be done separately; and that's just as well, since it was only
introduced in 9.6 and other bugs are much older.

Some of these bugs are ancient; backpatch all the way back to 9.3.

Authors: Jacob Champion, Asim Praveen, Ashwin Agrawal
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com
2018-01-09 17:06:31 -03:00
Andrew Dunstan 11b623dd0a Implement TZH and TZM timestamp format patterns
These are compatible with Oracle and required for the datetime template
language for jsonpath in an upcoming patch.

Nikita Glukhov and Andrew Dunstan, reviewed by Pavel Stehule.
2018-01-09 14:25:05 -05:00
Peter Eisentraut a77dd53f30 Remove PortalGetQueryDesc()
After having gotten rid of PortalGetHeapMemory(), there seems little
reason to keep one Portal access macro around that offers no actual
abstraction and isn't consistently used anyway.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Peter Eisentraut 0f7c49e855 Update portal-related memory context names and API
Rename PortalMemory to TopPortalContext, to avoid confusion with
PortalContext and align naming with similar top-level memory contexts.

Rename PortalData's "heap" field to portalContext.  The "heap" naming
seems quite antiquated and confusing.  Also get rid of the
PortalGetHeapMemory() macro and access the field directly, which we do
for other portal fields, so this abstraction doesn't buy anything.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Tom Lane 3cb1b2a880 Rewrite list_qsort() to avoid trashing its input list.
The initial implementation of list_qsort(), from commit ab7271677,
re-used the ListCells of the input list while not touching the List
header.  This meant that anybody who still had a pointer to the
original header would now be in possession of a corrupted list,
a problem that seems sure to bite us eventually.

One possible solution is to re-use the original List header as well,
giving the function the semantics of update-in-place.  However, that
doesn't seem like a very good idea either given the way that the
function is used in the planner: create_path functions aren't normally
supposed to modify their input lists.  It doesn't look like there would
be a problem today, but it's not hard to foresee a time when modifying
a list of Paths in-place could have side-effects on some other append
path.

On the whole, and in view of the likelihood that this function might
be used in other contexts in the future, it seems best to get rid of
the micro-optimization of re-using the input list cells.  Just build
a new list.

Discussion: https://postgr.es/m/16912.1515449066@sss.pgh.pa.us
2018-01-09 13:25:53 -05:00
Tom Lane 624e440a47 Improve the heuristic for ordering child paths of a parallel append.
Commit ab7271677 introduced code that attempts to order the child
scans of a Parallel Append node in a way that will minimize execution
time, based on total cost and startup cost.  However, it failed to
think hard about what to do when estimated costs are exactly equal;
a case that's particularly likely to occur when comparing on startup
cost.  In such a case the ordering of the child paths would be left
to the whims of qsort, an algorithm that isn't even stable.

We can improve matters by applying the rule used elsewhere in the
planner: if total costs are equal, sort on startup cost, and
vice versa.  When both cost estimates are exactly equal, rather
than letting qsort do something unpredictable, sort based on the
child paths' relids, which should typically result in sorting in
inheritance order.  (The latter provision requires inventing a
qsort-style comparator for bitmapsets, but maybe we'll have use
for that for other reasons in future.)

This results in a few plan changes in the select_parallel test,
but those all look more reasonable than before, when the actual
underlying cost numbers are taken into account.

Discussion: https://postgr.es/m/4944.1515446989@sss.pgh.pa.us
2018-01-09 13:07:52 -05:00
Tom Lane 80259d4dbf While waiting for a condition variable, detect postmaster death.
The general assumption for postmaster child processes is that they
should just exit(1), reasonably promptly, if the postmaster disappears.
condition_variable.c neglected this consideration and could be left
waiting forever, if the counterpart process it is waiting for has
done the right thing and exited.

We had some discussion of adjusting the WaitEventSet API to make it
harder to make this type of mistake in future; but for the moment,
and for v10, let's make this narrow fix.

Discussion: https://postgr.es/m/20412.1515456143@sss.pgh.pa.us
2018-01-09 12:34:57 -05:00
Tom Lane 8a906204ae Fix race condition during replication origin drop.
replorigin_drop() misunderstood the API for condition variables: it
had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
inside its test-and-sleep loop, rather than outside the loop as
intended.  The net effect is a narrow race-condition window wherein,
if the process using a replication slot releases it immediately after
replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
would get into the condition variable's wait list too late and then
wait indefinitely for a signal that won't come.

Because there's a different CV for each replication slot, we can't
just move the ConditionVariablePrepareToSleep call to above the
test-and-sleep loop.  What we can do, in the wake of commit 13db3b936,
is drop the ConditionVariablePrepareToSleep call entirely.  This fix
depends on that commit because (at least in principle) the slot matching
the target replication origin might move around, so that once in a blue
moon successive loop iterations might involve different CVs.  We can now
cope with such a scenario, at the cost of an extra trip through the
retry loop.

(There are ways we could fix this bug without depending on that commit,
but they're all a lot more complicated than this way.)

While at it, upgrade the rather skimpy comments in this function.

Back-patch to v10 where this code came in.

Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us
2018-01-09 12:09:30 -05:00
Tom Lane 13db3b9363 Allow ConditionVariable[PrepareTo]Sleep to auto-switch between CVs.
The original coding here insisted that callers manually cancel any prepared
sleep for one condition variable before starting a sleep on another one.
While that's not a huge burden today, it seems like a gotcha that will bite
us in future if the use of condition variables increases; anything we can
do to make the use of this API simpler and more robust is attractive.
Hence, allow these functions to automatically switch their attention to
a different CV when required.  This is safe for the same reason it was OK
for commit aced5a92b to let a broadcast operation cancel any prepared CV
sleep: whenever we return to the other test-and-sleep loop, we will
automatically re-prepare that CV, paying at most an extra test of that
loop's exit condition.

Back-patch to v10 where condition variables were introduced.  Ordinarily
we would probably not back-patch a change like this, but since it does not
invalidate any coding pattern that was legal before, it seems safe enough.
Furthermore, there's an open bug in replorigin_drop() for which the
simplest fix requires this.  Even if we chose to fix that in some more
complicated way, the hazard would remain that we might back-patch some
other bug fix that requires this behavior.

Patch by me, reviewed by Thomas Munro.

Discussion: https://postgr.es/m/2437.1515368316@sss.pgh.pa.us
2018-01-09 11:39:10 -05:00
Robert Haas 921059bd66 Don't allow VACUUM VERBOSE ANALYZE VERBOSE.
There are plans to extend the syntax for ANALYZE, so we need to break
the link between VacuumStmt and AnalyzeStmt.  But apart from that, the
syntax above is undocumented and, if discovered by users, might give
the impression that the VERBOSE option for VACUUM differs from the
verbose option from ANALYZE, which it does not.

Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada

Discussion: http://postgr.es/m/D3FC73E2-9B1A-4DB4-8180-55F57D116B4E@amazon.com
2018-01-09 10:20:48 -05:00
Robert Haas 63008b19ee Fix comment.
RELATION_IS_OTHER_TEMP is tested in the caller, not here.

Discussion: http://postgr.es/m/5A5438E4.3090709@lab.ntt.co.jp
2018-01-09 09:40:31 -05:00
Tom Lane e35dba475a Cosmetic improvements in condition_variable.[hc].
Clarify a bunch of comments.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-08 18:28:03 -05:00
Tom Lane ea8e1bbc53 Improve error detection capability in proclists.
Previously, although the initial state of a proclist_node is expected
to be next == prev == 0, proclist_delete_offset would reset nodes to
next == prev == INVALID_PGPROCNO when removing them from a list.
This is the same state that a node in a singleton list has, so that
it's impossible to distinguish not-in-a-list from in-a-list.  Change
proclist_delete_offset to reset removed nodes to next == prev == 0,
making it possible to distinguish those cases, and then add Asserts
to the list add and delete functions that the supplied node isn't
or is in a list at entry.  Also tighten assertions about the node
being in the particular list (not some other one) where it is possible
to check that in O(1) time.

In ConditionVariablePrepareToSleep, since we don't expect the process's
cvWaitLink to already be in a list, remove the more-or-less-useless
proclist_contains check; we'd rather have proclist_push_tail's new
assertion fire if that happens.

Improve various comments related to proclists, too.

Patch by me, reviewed by Thomas Munro.  This isn't back-patchable, since
there could theoretically be inlined copies of proclist_delete_offset in
third-party modules.  But it's only improving debuggability anyway.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-08 18:07:04 -05:00
Tom Lane eeb3c2df42 Back off chattiness in RemovePgTempFiles().
In commit 561885db0, as part of normalizing RemovePgTempFiles's error
handling, I removed its behavior of silently ignoring ENOENT failures
during directory opens.  Thomas Munro points out that this is a bad idea at
the top level, because we don't create pgsql_tmp directories until needed.
Thus this coding could produce LOG messages in perfectly normal situations,
which isn't what I intended.  Restore the suppression of ENOENT logging,
but only at top level --- it would still be unexpected for a nested temp
directory to disappear between seeing it in the parent directory and
opening it.

Discussion: https://postgr.es/m/CAEepm=2y06SehAkTnd5sU_eVqdv5P-=Srt1y5vYNQk6yVDVaPw@mail.gmail.com
2018-01-07 20:40:40 -05:00
Simon Riggs 6271fceb8a Add TIMELINE to backup_label file
Allows new test to confirm timelines match

Author: Michael Paquier
Reviewed-by: David Steele
2018-01-06 12:24:19 +00:00
Simon Riggs 6668a54eb8 Default monitoring roles - errata
25fff40798 introduced
default monitoring roles. Apply these corrections:

* Allow access to pg_stat_get_wal_senders()
  by role pg_read_all_stats

* Correct comment in pg_stat_get_wal_receiver()
  to show it is no longer superuser-only.

Author: Feike Steenbergen
Reviewed-by: Michael Paquier

Apply to HEAD, then later backpatch to 10
2018-01-06 11:48:21 +00:00
Tom Lane ccf312a448 Remove return values of ConditionVariableSignal/Broadcast.
In the wake of commit aced5a92b, the semantics of these results are
a bit squishy: we can tell whether we signaled some other process(es),
but we do not know which ones were real waiters versus mere sentinels
for ConditionVariableBroadcast operations.  It does not help much that
ConditionVariableBroadcast will attempt to pass on the signal to the
next real waiter, because (a) there might not be one, and (b) that will
only happen awhile later, anyway.  So these results could overstate how
much effect the calls really had.

However, no existing caller of either function pays any attention to its
result value, so it seems reasonable to just define that as a required
property of a correct algorithm.  To encourage correctness and save some
tiny number of cycles, change both functions to return void.

Patch by me, per an observation by Thomas Munro.  No back-patch, since
if any third parties happen to be using these functions, they might not
appreciate an API break in a minor release.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 20:33:26 -05:00
Tom Lane 3cac0ec859 Reorder steps in ConditionVariablePrepareToSleep for more safety.
In the admittedly-very-unlikely case that AddWaitEventToSet fails,
ConditionVariablePrepareToSleep would error out after already having
set cv_sleep_target, which is probably bad, and after having already
set cv_wait_event_set, which is very bad.  Transaction abort might or
might not clean up cv_sleep_target properly; but there is nothing
that would be aware that the WaitEventSet wasn't fully constructed,
so that all future condition variable sleeps would be broken.
We can easily guard against these hazards with slight restructuring.

Back-patch to v10 where condition_variable.c was introduced.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 19:42:49 -05:00
Tom Lane aced5a92bf Rewrite ConditionVariableBroadcast() to avoid live-lock.
The original implementation of ConditionVariableBroadcast was, per its
self-description, "the dumbest way possible".  Thomas Munro found out
it was a bit too dumb.  An awakened process may immediately re-queue
itself, if the specific condition it's waiting for is not yet satisfied.
If this happens before ConditionVariableBroadcast is able to see the wait
queue as empty, then ConditionVariableBroadcast will re-awaken the same
process, repeating the cycle.  Given unlucky timing this back-and-forth
can repeat indefinitely; loops lasting thousands of seconds have been
seen in testing.

To fix, add our own process to the end of the wait queue to serve as a
sentinel, and exit the broadcast loop once our process is not there
anymore.  There are various special considerations described in the
comments, the principal disadvantage being that wakers can no longer
be sure whether they awakened a real waiter or just a sentinel.  But in
practice nobody pays attention to the result of ConditionVariableSignal
or ConditionVariableBroadcast anyway, so that problem seems hypothetical.

Back-patch to v10 where condition_variable.c was introduced.

Tom Lane and Thomas Munro

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 19:21:30 -05:00
Robert Haas 19c47e7c82 Factor error generation out of ExecPartitionCheck.
At present, we always raise an ERROR if the partition constraint
is violated, but a pending patch for UPDATE tuple routing will
consider instead moving the tuple to the correct partition.
Refactor to make that simpler.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.

Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
2018-01-05 15:22:33 -05:00
Alvaro Herrera df9f682c7b Fix failure to delete spill files of aborted transactions
Logical decoding's reorderbuffer.c may spill transaction files to disk
when transactions are large.  These are supposed to be removed when they
become "too old" by xid; but file removal requires the boundary LSNs of
the transaction to be known.  The final_lsn is only set when we see the
commit or abort record for the transaction, but nothing sets the value
for transactions that crash, so the removal code misbehaves -- in
assertion-enabled builds, it crashes by a failed assertion.

To fix, modify the final_lsn of transactions that don't have a value
set, to the LSN of the very latest change in the transaction.  This
causes the spilled files to be removed appropriately.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada
Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp
2018-01-05 12:17:10 -03:00
Peter Eisentraut 054e8c6cdb Another attempt at fixing build with various OpenSSL versions
It seems we can't easily work around the lack of
X509_get_signature_nid(), so revert the previous attempts and just
disable the tls-server-end-point feature if we don't have it.
2018-01-04 19:09:27 -05:00
Peter Eisentraut 1834c1e432 Add missing includes
<openssl/x509.h> is necessary to look into the X509 struct, used by
ac3ff8b1d8.
2018-01-04 17:56:09 -05:00
Robert Haas ef6087ee5f Minor preparatory refactoring for UPDATE row movement.
Generalize is_partition_attr to has_partition_attrs and make it
accessible from outside tablecmds.c.  Change map_partition_varattnos
to clarify that it can be used for mapping between any two relations
in a partitioning hierarchy, not just parent -> child.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Some comment changes by me.

Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
2018-01-04 16:25:49 -05:00
Peter Eisentraut ac3ff8b1d8 Fix build with older OpenSSL versions
Apparently, X509_get_signature_nid() is only in fairly new OpenSSL
versions, so use the lower-level interface it is built on instead.
2018-01-04 16:22:06 -05:00
Robert Haas cc6337d2fe Simplify and encapsulate tuple routing support code.
Instead of having ExecSetupPartitionTupleRouting return multiple out
parameters, have it return a pointer to a structure containing all of
those different things.  Also, provide and use a cleanup function,
ExecCleanupTupleRouting, instead of cleaning up all of the resources
allocated by ExecSetupPartitionTupleRouting individually.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me

Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
2018-01-04 15:48:15 -05:00
Peter Eisentraut d3fb72ea6d Implement channel binding tls-server-end-point for SCRAM
This adds a second standard channel binding type for SCRAM.  It is
mainly intended for third-party clients that cannot implement
tls-unique, for example JDBC.

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-04 15:29:50 -05:00
Peter Eisentraut f3049a603a Refactor channel binding code to fetch cbind_data only when necessary
As things stand now, channel binding data is fetched from OpenSSL and
saved into the SCRAM exchange context for any SSL connection attempted
for a SCRAM authentication, resulting in data fetched but not used if no
channel binding is used or if a different channel binding type is used
than what the data is here for.

Refactor the code in such a way that binding data is fetched from the
SSL stack only when a specific channel binding is used for both the
frontend and the backend.  In order to achieve that, save the libpq
connection context directly in the SCRAM exchange state, and add a
dependency to SSL in the low-level SCRAM routines.

This makes the interface in charge of initializing the SCRAM context
cleaner as all its data comes from either PGconn* (for frontend) or
Port* (for the backend).

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-04 13:55:12 -05:00
Peter Eisentraut 3ad2afc2e9 Define LDAPS_PORT if it's missing and disable implicit LDAPS on Windows
Some versions of Windows don't define LDAPS_PORT.

Also, Windows' ldap_sslinit() is documented to use LDAPS even if you
said secure=0 when the port number happens to be 636 or 3269.  Let's
avoid using the port number to imply that you want LDAPS, so that
connection strings have the same meaning on Windows and Unix.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm%3D23B7GV4AUz3MYH1TKpTv030VHxD2Sn%2BLYWDv8d-qWxww%40mail.gmail.com
2018-01-04 10:34:41 -05:00
Robert Haas c759395617 Code review for Parallel Append.
- Remove unnecessary #include mistakenly added in execnodes.h.
- Fix mistake in comment in choose_next_subplan_for_leader.
- Adjust row estimates in cost_append for a possibly-different
  parallel divisor.
- Clamp row estimates in cost_append after operations that may
  not produce integers.

Amit Kapila, with cosmetic adjustments by me.

Discussion: http://postgr.es/m/CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com
2018-01-04 07:56:09 -05:00
Tom Lane 47c6772eb7 Clean up tupdesc.c for recent changes.
TupleDescCopy needs to have the same effects as CreateTupleDescCopy in
that, since it doesn't copy constraints, it should clear the per-attribute
fields associated with them.  Oversight in commit cc5f81366.

Since TupleDescCopy has already established the presumption that it
can just flat-copy the entire attribute array in one go, propagate
that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr.
(I'm suspicious that this would lead to valgrind complaints if we
had any trailing padding in the struct, but we do not, and anyway
fixing that seems like a job for a separate commit.)

Add some better comments.

Thomas Munro, reviewed by Vik Fearing, some additional hacking by me

Discussion: https://postgr.es/m/CAEepm=0NvOGZ8B6GbQyQe2C_c2m3LKJ9w=8OMBaYRLgZ_Gw6Nw@mail.gmail.com
2018-01-03 17:53:41 -05:00
Alvaro Herrera bab2969867 Fix typo
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no
2018-01-03 19:12:06 -03:00
Alvaro Herrera 3c27944fb2 Make XactLockTableWait work for transactions that are not yet self-locked
XactLockTableWait assumed that its xid argument has already added itself
to the lock table.  That assumption led to another assumption that if
locking the xid has succeeded but the xid is reported as still in
progress, then the input xid must have been a subtransaction.

These assumptions hold true for the original uses of this code in
locking related to on-disk tuples, but they break down in logical
replication slot snapshot building -- in particular, when a standby
snapshot logged contains an xid that's already in ProcArray but not yet
in the lock table.  This leads to assertion failures that can be
reproduced all the way back to 9.4, when logical decoding was
introduced.

To fix, change SubTransGetParent to SubTransGetTopmostTransaction which
has a slightly different API: it returns the argument Xid if there is no
parent, and it goes all the way to the top instead of moving up the
levels one by one.  Also, to avoid busy-waiting, add a 1ms sleep to give
the other process time to register itself in the lock table.

For consistency, change ConditionalXactLockTableWait the same way.

Author: Petr Jelínek
Discussion: https://postgr.es/m/1B3E32D8-FCF4-40B4-AEF9-5C0E3AC57969@postgrespro.ru
Reported-by: Konstantin Knizhnik
Diagnosed-by: Stas Kelvich, Petr Jelínek
Reviewed-by: Andres Freund, Robert Haas
2018-01-03 17:26:20 -03:00
Tom Lane 6fcde24063 Fix some minor errors in new PHJ code.
Correct ExecParallelHashTuplePrealloc's estimate of whether the
space_allowed limit is exceeded.  Be more consistent about tuples that
are exactly HASH_CHUNK_THRESHOLD in size (they're "small", not "large").
Neither of these things explain the current buildfarm unhappiness, but
they're still bugs.

Thomas Munro, per gripe by me

Discussion: https://postgr.es/m/CAEepm=34PDuR69kfYVhmZPgMdy8pSA-MYbpesEN1SR+2oj3Y+w@mail.gmail.com
2018-01-03 12:53:49 -05:00
Tom Lane 3decd150a2 Teach eval_const_expressions() to handle some more cases.
Add some infrastructure (mostly macros) to make it easier to write
typical cases for constant-expression simplification.  Add simplification
processing for ArrayRef, RowExpr, and ScalarArrayOpExpr node types,
which formerly went unsimplified even if all their inputs were constants.
Also teach it to simplify FieldSelect from a composite constant.
Make use of the new infrastructure to reduce the amount of code needed
for the existing ArrayExpr and ArrayCoerceExpr cases.

One existing test case changes output as a result of the fact that
RowExpr can now be folded to a constant.  All the new code is exercised
by existing test cases according to gcov, so I feel no need to add
additional tests.

Tom Lane, reviewed by Dmitry Dolgov

Discussion: https://postgr.es/m/3be3b82c-e29c-b674-2163-bf47d98817b1@iki.fi
2018-01-03 12:35:09 -05:00
Peter Eisentraut 35c0754fad Allow ldaps when using ldap authentication
While ldaptls=1 provides an RFC 4513 conforming way to do LDAP
authentication with TLS encryption, there was an earlier de facto
standard way to do LDAP over SSL called LDAPS.  Even though it's not
enshrined in a standard, it's still widely used and sometimes required
by organizations' network policies.  There seems to be no reason not to
support it when available in the client library.  Therefore, add support
when using OpenLDAP 2.4+ or Windows.  It can be configured with
ldapscheme=ldaps or ldapurl=ldaps://...

Add tests for both ways of requesting LDAPS and a test for the
pre-existing ldaptls=1.  Modify the 001_auth.pl test for "diagnostic
messages", which was previously relying on the server rejecting
ldaptls=1.

Author: Thomas Munro
Reviewed-By: Peter Eisentraut
Discussion: https://postgr.es/m/CAEepm=1s+pA-LZUjQ-9GQz0Z4rX_eK=DFXAF1nBQ+ROPimuOYQ@mail.gmail.com
2018-01-03 10:11:26 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Andres Freund f9ccf92e16 Simplify representation of aggregate transition values a bit.
Previously aggregate transition values for hash and other forms of
aggregation (i.e. sort and no group by) were represented
differently. Hash based aggregation used a grouping set indexed array
pointing to an array of transition values, whereas other forms of
aggregation used one flattened array with the index being computed out
of grouping set and transition offsets.

That made upcoming changes hard, so represent both as grouping set
indexed array of per-group data.

As a nice side-effect this also makes aggregation slightly faster,
because computing offsets with `transno + (setno * numTrans)` turns
out not to be that cheap (too big for x86 lea for example).

Author: Andres Freund
Discussion: https://postgr.es/m/20171128003121.nmxbm2ounxzb6n2t@alap3.anarazel.de
2018-01-02 18:23:37 -08:00
Tom Lane 5dc692f78d Ensure proper alignment of tuples in HashMemoryChunkData buffers.
The previous coding relied (without any documentation) on the data[]
member of HashMemoryChunkData being at a MAXALIGN'ed offset.  If it
was not, the tuples would not be maxaligned either, leading to failures
on alignment-picky machines.  While there seems to be no live bug on any
platform we support, this is clearly pretty fragile: any addition to or
rearrangement of the fields in HashMemoryChunkData could break it.
Let's remove the hazard by getting rid of the data[] member and instead
using pointer arithmetic with an explicitly maxalign'ed offset.

Discussion: https://postgr.es/m/14483.1514938129@sss.pgh.pa.us
2018-01-02 21:23:06 -05:00
Alvaro Herrera 54eff5311d Fix deadlock hazard in CREATE INDEX CONCURRENTLY
Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
supposed to be able to work in parallel, as evidenced by fixes in commit
c3d09b3bd2 specifically to support this case.  In reality, one of the
sessions would be aborted by a misterious "deadlock detected" error.

Jeff Janes diagnosed that this is because of leftover snapshots used for
system catalog scans -- this was broken by 8aa3e47510 keeping track of
(registering) the catalog snapshot.  To fix the deadlocks, it's enough
to de-register that snapshot prior to waiting.

Backpatch to 9.4, which introduced MVCC catalog scans.

Include an isolationtester spec that 8 out of 10 times reproduces the
deadlock with the unpatched code for me (Álvaro).

Author: Jeff Janes
Diagnosed-by: Jeff Janes
Reported-by: Jeremy Finzel
Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com
2018-01-02 19:16:16 -03:00
Peter Eisentraut 438036264a Don't cast between GinNullCategory and bool
The original idea was that we could use an isNull-style bool array
directly as a GinNullCategory array.  However, the existing code already
acknowledges that that doesn't really work, because of the possibility
that bool as currently defined can have arbitrary bit patterns for true
values.  So it has to loop through the nullFlags array to set each bool
value to an acceptable value.  But if we are looping through the whole
array anyway, we might as well build a proper GinNullCategory array
instead and abandon the type casting.  That makes the code much safer in
case bool is ever changed to something else.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-02 12:20:56 -05:00
Andres Freund 93ea78b17c Fix EXPLAIN ANALYZE output for Parallel Hash.
In a race case, EXPLAIN ANALYZE could fail to display correct nbatch
and size information.  Refactor so that participants report only on
batches they worked on rather than trying to report on all of them,
and teach explain.c to consider the HashInstrumentation object from
all participants instead of picking the first one it can find.  This
should fix an occasional build farm failure in the "join" regression
test.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us
2018-01-01 14:38:23 -08:00
Andres Freund b40933101c Perform slot validity checks in a separate pass over expression.
This reduces code duplication a bit, but the primary benefit that it
makes JITing expression evaluation easier. When doing so we can't, as
previously done in the interpreted case, really change opcode without
recompiling. Nor dow we just carry around unnecessary branches to
avoid re-checking over and over.

As a minor side-effect this makes ExecEvalStepOp() O(log(N)) rather
than O(N).

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2017-12-29 12:45:25 -08:00
Andres Freund 4717fdb14c Rely on executor utils to build targetlist for DML RETURNING.
This is useful because it gets rid of the sole direct user of
ExecAssignResultType(). A future commit will likely make use of that
and combine creating the targetlist with the initialization of the
result slot. But it seems like good code hygiene anyway.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2017-12-29 12:26:29 -08:00
Magnus Hagander d02974e32e Properly set base backup backends to active in pg_stat_activity
When walsenders were included in pg_stat_activity, only the ones
actually streaming WAL were listed as active when they were active. In
particular, the connections sending base backups were listed as being
idle. Which means that a regular pg_basebackup would show up with one
active and one idle connection, when both were active.

This patch updates to set all walsenders to active when they are
(including those doing very fast things like IDENTIFY_SYSTEM), and then
back to idle. Details about exactly what they are doing is available in
pg_stat_replication.

Patch by me, review by Michael Paquier and David Steele.
2017-12-29 16:28:32 +01:00
Simon Riggs 48c9f49265 Fix race condition when changing synchronous_standby_names
A momentary window exists when synchronous_standby_names
changes that allows commands issued after the change to
continue to act as async until the change becomes visible.
Remove the race by using more appropriate test in syncrep.c

Author: Asim Rama Praveen and Ashwin Agrawal
Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen
Reviewed-by: Michael Paquier, Masahiko Sawada
2017-12-29 14:30:33 +00:00
Simon Riggs 2958a672b1 Extend near-wraparound hints to include replication slots
Author: Feike Steenbergen
Reviewed-by: Michael Paquier
2017-12-29 14:01:25 +00:00
Andres Freund f83040c62a Fix rare assertion failure in parallel hash join.
When a backend runs out of inner tuples to hash, it should detach from
grow_batch_barrier only after it has flushed all batches to disk and
merged counters, not before.  Otherwise a concurrent backend in
ExecParallelHashIncreaseNumBatches() could stop waiting for this
backend and try to read tuples before they have been written.  This
commit reorders those operations and should fix the assertion failures
seen occasionally on the build farm since commit
1804284042.

Author: Thomas Munro
Discussion: https://postgr.es/m/E1eRwXy-0004IK-TO%40gemulon.postgresql.org
2017-12-28 02:41:53 -08:00
Alvaro Herrera be2343221f Protect against hypothetical memory leaks in RelationGetPartitionKey
Also, fix a comment that commit 8a0596cb65 made obsolete.

Reported-by: Robert Haas
Discussion: http://postgr.es/m/CA+TgmoYbpuUUUp2GhYNwWm0qkah39spiU7uOiNXLz20ASfKYoA@mail.gmail.com
2017-12-27 18:06:14 -03:00
Robert Haas 62d02f39e7 Fix race-under-concurrency in PathNameCreateTemporaryDir.
Thomas Munro

Discussion: http://postgr.es/m/CAEepm=1Vp1e3KtftLtw4B60ZV9teNeKu6HxoaaBptQMsRWjJbQ@mail.gmail.com
2017-12-27 10:56:14 -08:00
Teodor Sigaev ad337c76b6 Update relation's stats in pg_class during vacuum full.
Hash index depends on estimation of numbers of tuples and pages of relations,
incorrect value could be a reason of significantly growing of index. Vacuum
full recreates heap and reindex all indexes before renewal stats. The patch
fixes that, so indexes will see correct values.

Backpatch to v10 only because earlier versions haven't usable hash index and
growing of hash index is a single user-visible symptom.

Author: Amit Kapila
Reviewed-by: Ashutosh Sharma, me
Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net
2017-12-27 18:25:37 +03:00
Teodor Sigaev ff963b393c Add polygon opclass for SP-GiST
Polygon opclass uses compress method feature of SP-GiST added earlier. For now
it's a single operator class which uses this feature. SP-GiST actually indexes
a bounding boxes of input polygons, so part of supported operations are lossy.
Opclass uses most methods of corresponding opclass over boxes of SP-GiST and
treats bounding boxes as point in 4D-space.

Bump catalog version.

Authors: Nikita Glukhov, Alexander Korotkov with minor editorization by me
Reviewed-By: all authors + Darafei Praliaskouski
Discussion: https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru
2017-12-25 18:59:38 +03:00
Andres Freund 4e2970f880 Fix assert with side effects in the new PHJ code.
Instead of asserting the assert just set the value to what it was
supposed to test...

Per coverity.
2017-12-24 02:57:55 -08:00
Tom Lane c4c2885cbb Fix UNION/INTERSECT/EXCEPT over no columns.
Since 9.4, we've allowed the syntax "select union select" and variants
of that.  However, the planner wasn't expecting a no-column set operation
and ended up treating the set operation as if it were UNION ALL.

Turns out it's trivial to fix in v10 and later; we just need to be careful
about not generating a Sort node with no sort keys.  However, since a weird
corner case like this is never going to be exercised by developers, we'd
better have thorough regression tests if we want to consider it supported.

Per report from Victor Yegorov.

Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com
2017-12-22 12:08:06 -05:00
Teodor Sigaev 854823fa33 Add optional compression method to SP-GiST
Patch allows to have different types of column and value stored in leaf tuples
of SP-GiST. The main application of feature is to transform complex column type
to simple indexed type or for truncating too long value, transformation could
be lossy.  Simple example: polygons are converted to their bounding boxes,
this opclass follows.

Authors: me, Heikki Linnakangas, Alexander Korotkov, Nikita Glukhov
Reviewed-By: all authors + Darafei Praliaskouski
Discussions:
https://www.postgresql.org/message-id/5447B3FF.2080406@sigaev.ru
https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru#54907069.1030506@sigaev.ru
2017-12-22 13:33:16 +03:00
Alvaro Herrera 9373baa0f7 Minor edits to catalog files and scripts
This fixes a few typos and small mistakes; it also cleans a few
minor stylistic issues.  The biggest functional change is that
Gen_fmgrtab.pl no longer knows the OID of language 'internal'.

Author: John Naylor
Discussion: https://postgr.es/m/CAJVSVGXAkwbk-A9QHHHf00N905kKisyQbaYwKqaRpze_gPXGfg@mail.gmail.com
2017-12-21 19:07:32 -03:00
Robert Haas cce1ecfc77 Adjust assertion in GetCurrentCommandId.
currentCommandIdUsed is only used to skip redundant increments of the
command counter, and CommandCounterIncrement() is categorically denied
under parallelism anyway.  Therefore, it's OK for
GetCurrentCommandId() to mark the counter value used, as long as it
happens in the leader, not a worker.

Prior to commit e9baa5e9fa, the slightly
incorrect check didn't matter, but now it does.  A test case added by
commit 1804284042 uncovered the problem
by accident; it caused failures with force_parallel_mode=on/regress.

Report and review by Andres Freund.  Patch by me.

Discussion: http://postgr.es/m/20171221143106.5lhtygohvmazli3x@alap3.anarazel.de
2017-12-21 13:19:59 -05:00
Tom Lane 6719b238e8 Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit.
This patch does three interrelated things:

* Create a new expression execution step type EEOP_PARAM_CALLBACK
and add the infrastructure needed for add-on modules to generate that.
As discussed, the best control mechanism for that seems to be to add
another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly, since it can't be retrieved
from the parent plan node's EState.

* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual.  This also lets us get rid
of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
decide which param IDs should be accessible or not.  plpgsql_param_fetch
was already doing the identical masking check, so having callers do it too
seemed redundant.  While I was at it, I added a "speculative" flag to
paramFetch that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide values
of non-DTYPE_VAR variables to the planner for fear of triggering premature
"record not assigned yet" or "field not found" errors during planning.

* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for.  This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal.  copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.)  This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).

Performance testing shows this to be at worst a break-even change,
and it can provide wins ranging up to 20% in test cases involving
accesses to fields of "record" variables.  The fact that values of
such variables can now be exposed to the planner might produce wins
in some situations, too, but I've not pursued that angle.

In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer in a
transient field in ExprState.  The ParamListInfo pointer for a stand-alone
expression is handled the same way; we'd otherwise have had to add
yet another recursively-passed-down argument in expression compilation.

Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
2017-12-21 12:57:45 -05:00
Alvaro Herrera 8a0596cb65 Get rid of copy_partition_key
That function currently exists to avoid leaking memory in
CacheMemoryContext in case of trouble while the partition key is being
built, but there's a better way: allocate everything in a memcxt that
goes away if the current (sub)transaction fails, and once the partition
key is built and no further errors can occur, make the memcxt permanent
by making it a child of CacheMemoryContext.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
2017-12-21 14:21:39 -03:00
Alvaro Herrera 9ef6aba1d3 Fix typo 2017-12-21 13:36:52 -03:00
Tom Lane c98c35cd08 Avoid putting build-location-dependent strings into generated files.
Various Perl scripts we use to generate files were in the habit of
printing things like "generated by $0" into their output files.
That looks like a fine idea at first glance, but it results in
non-reproducible output, because in VPATH builds $0 won't be just
the name of the script file, but a full path for it.  We'd prefer
that you get identical results whether using VPATH or not, so this
is a bad thing.

Some of these places also printed their input file name(s), causing
an additional hazard of the same type.

Hence, establish a policy that thou shalt not print $0, nor input file
pathnames, into output files (they're still allowed in error messages,
though).  Instead just write the script name verbatim.  While we are at
it, we can make these annotations more useful by giving the script's
full relative path name within the PG source tree, eg instead of
Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.

Not all of the changes made here actually affect any files shipped
in finished tarballs today, but it seems best to apply the policy
everyplace so that nobody copies unsafe code into places where it
could matter.

Christoph Berg and Tom Lane

Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
2017-12-21 10:57:06 -05:00
Robert Haas 59d1e2b95a Cancel CV sleep during subtransaction abort.
Generally, error recovery paths that need to do things like
LWLockReleaseAll and pgstat_report_wait_end also need to call
ConditionVariableCancelSleep, but AbortSubTransaction was missed.

Since subtransaction abort might destroy up the DSM segment that
contains the ConditionVariable stored in cv_sleep_target, this
can result in a crash for anything using condition variables.

Reported and diagnosed by Andres Freund.

Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de
2017-12-21 09:24:30 -05:00
Andres Freund 1804284042 Add parallel-aware hash joins.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash.  While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory.  If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables.  This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes.  Another is that
outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-21 00:43:41 -08:00
Robert Haas f94eec490b When passing query strings to workers, pass the terminating \0.
Otherwise, when the query string is read, we might trailing garbage
beyond the end, unless there happens to be a \0 there by good luck.

Report and patch by Thomas Munro. Reviewed by Rafia Sabih.

Discussion: http://postgr.es/m/CAEepm=2SJs7X+_vx8QoDu8d1SMEOxtLhxxLNzZun_BvNkuNhrw@mail.gmail.com
2017-12-20 17:26:50 -05:00
Robert Haas 8526bcb2df Try again to fix accumulation of parallel worker instrumentation.
When a Gather or Gather Merge node is started and stopped multiple
times, accumulate instrumentation data only once, at the end, instead
of after each execution, to avoid recording inflated totals.

Commit 778e78ae9f, the previous attempt
at a fix, instead reset the state after every execution, which worked
for the general instrumentation data but had problems for the additional
instrumentation specific to Sort and Hash nodes.

Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
following a design proposal from Thomas Munro, with a comment tweak
by me.

Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
2017-12-19 12:21:56 -05:00
Robert Haas 38fc54703e Re-fix wrong costing of Sort under Gather Merge.
Commit dc02c7bca4 changed this call
to create_sort_path() to take -1 rather than limit_tuples because,
at that time, there was no way for a Sort beneath a Gather Merge
to become a top-N sort.

Later, commit 3452dc5240 provided
a way for a Sort beneath a Gather Merge to become a top-N sort,
but failed to revert the previous commit in the process.  Do that.

Report and analysis by Jeff Janes; patch by Thomas Munro; review by
Amit Kapila and by me.

Discussion: http://postgr.es/m/CAEepm=1BWtC34vUroA0Uqjw02MaqdUrW+d6WD85_k8SLyPiKHQ@mail.gmail.com
2017-12-19 10:42:17 -05:00
Andres Freund ab9e0e718a Add shared tuplestores.
SharedTuplestore allows multiple participants to write into it and
then read the tuples back from it in parallel.  Each reader receives
partial results.

For now it always uses disk files, but other buffering policies and
other kinds of scans (ie each reader receives complete results) may be
useful in future.

The upcoming parallel hash join feature will use this facility.

Author: Thomas Munro
Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas
Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
2017-12-18 14:23:19 -08:00
Peter Eisentraut 25d532698d Move SCRAM-related name definitions to scram-common.h
Mechanism names for SCRAM and channel binding names have been included
in scram.h by the libpq frontend code, and this header references a set
of routines which are only used by the backend.  scram-common.h is on
the contrary usable by both the backend and libpq, so getting those
names from there seems more reasonable.

Author: Michael Paquier <michael.paquier@gmail.com>
2017-12-18 16:59:48 -05:00
Fujii Masao 56a95ee511 Fix bug in cancellation of non-exclusive backup to avoid assertion failure.
Previously an assertion failure occurred when pg_stop_backup() for
non-exclusive backup was aborted while it's waiting for WAL files to
be archived. This assertion failure happened in do_pg_abort_backup()
which was called when a non-exclusive backup was canceled.
do_pg_abort_backup() assumes that there is at least one non-exclusive
backup running when it's called. But pg_stop_backup() can be canceled
even after it marks the end of non-exclusive backup (e.g.,
during waiting for WAL archiving). This broke the assumption that
do_pg_abort_backup() relies on, and which caused an assertion failure.

This commit changes do_pg_abort_backup() so that it does nothing
when non-exclusive backup has been already marked as completed.
That is, the asssumption is also changed, and do_pg_abort_backup()
now can handle even the case where it's called when there is
no running backup.

Backpatch to 9.6 where SQL-callable non-exclusive backup was added.

Author: Masahiko Sawada and Michael Paquier
Reviewed-By: Robert Haas and Fujii Masao
Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
2017-12-19 03:46:14 +09:00
Robert Haas fd7c0fa732 Fix crashes on plans with multiple Gather (Merge) nodes.
es_query_dsa turns out to be broken by design, because it supposes
that there is only one DSA for the whole query, whereas there is
actually one per Gather (Merge) node.  For now, work around that
problem by setting and clearing the pointer around the sections of
code that might need it.  It's probably a better idea to get rid of
es_query_dsa altogether in favor of having each node keep track
individually of which DSA is relevant, but that seems like more than
we would want to back-patch.

Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit
Kapila, and by me.

Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
2017-12-18 12:22:31 -05:00
Magnus Hagander 7731c32087 Fix typo on comment
Author: David Rowley
2017-12-18 11:24:55 +01:00
Tom Lane b31a9d7dd3 Suppress compiler warning about no function return value.
Compilers that don't know that ereport(ERROR) doesn't return
complained about the new coding in scanint8() introduced by
commit 101c7ee3e.  Tweak coding to avoid the warning.
Per buildfarm.
2017-12-17 00:41:41 -05:00
Andres Freund 699bf7d05c Perform a lot more sanity checks when freezing tuples.
The previous commit has shown that the sanity checks around freezing
aren't strong enough. Strengthening them seems especially important
because the existance of the bug has caused corruption that we don't
want to make even worse during future vacuum cycles.

The errors are emitted with ereport rather than elog, despite being
"should never happen" messages, so a proper error code is emitted. To
avoid superflous translations, mark messages as internal.

Author: Andres Freund and Alvaro Herrera
Reviewed-By: Alvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
2017-12-14 18:20:47 -08:00
Andres Freund 9c2f0a6c3c Fix pruning of locked and updated tuples.
Previously it was possible that a tuple was not pruned during vacuum,
even though its update xmax (i.e. the updating xid in a multixact with
both key share lockers and an updater) was below the cutoff horizon.

As the freezing code assumed, rightly so, that that's not supposed to
happen, xmax would be preserved (as a member of a new multixact or
xmax directly). That causes two problems: For one the tuple is below
the xmin horizon, which can cause problems if the clog is truncated or
once there's an xid wraparound. The bigger problem is that that will
break HOT chains, which in turn can lead two to breakages: First,
failing index lookups, which in turn can e.g lead to constraints being
violated. Second, future hot prunes / vacuums can end up making
invisible tuples visible again. There's other harmful scenarios.

Fix the problem by recognizing that tuples can be DEAD instead of
RECENTLY_DEAD, even if the multixactid has alive members, if the
update_xid is below the xmin horizon. That's safe because newer
versions of the tuple will contain the locking xids.

A followup commit will harden the code somewhat against future similar
bugs and already corrupted data.

Author: Andres Freund, with changes by Alvaro Herrera
Reported-By: Daniel Wood
Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter
   Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier
Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier
Discussion:
    https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
    https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
2017-12-14 18:20:47 -08:00
Andrew Dunstan 0fedb4ea69 Fix walsender timeouts when decoding a large transaction
The logical slots have a fast code path for sending data so as not to
impose too high a per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that a transaction with a large number of changes may take a
very long time to be processed and sent to the client. This causes the
walsender to ignore interrupts for potentially a long time and more
importantly it will result in the walsender being killed due to
timeout at the end of such a transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when the last keepalive check happened less
than half the walsender timeout ago. Otherwise the slower code path will
be taken.

Backpatched to 9.4

Petr Jelinek, reviewed by  Kyotaro HORIGUCHI, Yura Sokolov,  Craig
Ringer and Robert Haas.

Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
2017-12-14 11:13:14 -05:00
Andres Freund 538d114f6d Allow executor nodes to change their ExecProcNode function.
In order for executor nodes to be able to change their ExecProcNode function
after ExecInitNode() has finished, provide ExecSetExecProcNode().  This allows
any wrappers functions that only execProcnode.c knows about to be reinstalled.
The motivation for wanting to change ExecProcNode after ExecInitNode() has
finished is that it is not known until later whether parallel query is
available, so if a parallel variant is to be installed then ExecInitNode()
is too soon to decide.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
2017-12-13 15:47:01 -08:00
Andres Freund 923e8dee88 Add defenses against pre-crash files to BufFileOpenShared().
Crash restarts currently don't clean up temporary files, as a debugging aid.
If a left-over file happens to have the same name as a segment file we're
trying to create, we'll just truncate and reuse it, but there is a problem:
BufFileOpenShared() determines how many segment files exist by trying to open
.0, .1, .2, ... until it finds no more files.  It might be confused by a junk
file that has the next segment number.  To defend against that, make sure we
always create a gap after the end file by unlinking the following name if it
exists.  Also make it an error to try to open a BufFile that doesn't exist
(has no segment 0), so as not to encourage the development of client code
that depends on an interface that we can't reliably provide.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D2jhCbC_GFQJaaDhWxLB4EXtT3vVd5czuRNaqF5CWSTog%40mail.gmail.com
2017-12-13 13:27:41 -08:00
Robert Haas 884a60840c Fix parallel index scan hang with deleted or half-dead pages.
The previous coding forgot to release the scan before seizing
it again, leading to a lockup.

Report by Patrick Hemmer.  Diagnosis by Thomas Munro.  Patch by
Amit Kapila.

Discussion: http://postgr.es/m/CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com
2017-12-13 16:15:44 -05:00
Robert Haas 1d6fb35ad6 Revert "Fix accumulation of parallel worker instrumentation."
This reverts commit 2c09a5c12a.  Per
further discussion, that doesn't seem to be the best possible fix.

Discussion: http://postgr.es/m/CAA4eK1LW2aFKzY3=vwvc=t-juzPPVWP2uT1bpx_MeyEqnM+p8g@mail.gmail.com
2017-12-13 15:19:28 -05:00
Tom Lane 9fa6f00b13 Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts.  The key ideas are:

* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.

* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).

* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.

The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations.  That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.

Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant.  Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)

The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.

To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option.  AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.

An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.

Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module.  That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.

In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles.  The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.

Also, mark the memory context method-pointer structs "const",
just for cleanliness.

Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:16 -05:00
Peter Eisentraut 3d8874224f Fix crash when using CALL on an aggregate
Author: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reported-by: Rushabh Lathia <rushabh.lathia@gmail.com>
2017-12-13 10:37:48 -05:00
Andres Freund 8e211f5391 Add float.h include to int8.c, for isnan().
port.h redirects isnan() to _isnan() on windows, which in turn is
provided by float.h rather than math.h. Therefore include the latter
as well.

Per buildfarm.
2017-12-12 23:32:43 -08:00
Andres Freund f512a6e132 Consistently use PG_INT(16|32|64)_(MIN|MAX).
Per buildfarm animal woodlouse.
2017-12-12 18:19:13 -08:00
Andres Freund 101c7ee3ee Use new overflow aware integer operations.
A previous commit added inline functions that provide fast(er) and
correct overflow checks for signed integer math. Use them in a
significant portion of backend code.  There's more to touch in both
backend and frontend code, but these were the easily identifiable
cases.

The old overflow checks are noticeable in integer heavy workloads.

A secondary benefit is that getting rid of overflow checks that rely
on signed integer overflow wrapping around, will allow us to get rid
of -fwrapv in the future. Which in turn slows down other code.

Author: Andres Freund
Discussion: https://postgr.es/m/20171024103954.ztmatprlglz3rwke@alap3.anarazel.de
2017-12-12 16:55:37 -08:00
Robert Haas 95b52351fe Remove obsolete comment.
Commit 8b304b8b72 removed replacement
selection, but left behind this comment text.  The optimization to
which the comment refers is not relevant without replacement
selection, because if we had so few tuples as to require only one
tape, we would have just completed the sort in memory.

Peter Geoghegan

Discussion: http://postgr.es/m/CAH2-WznqupLA8CMjp+vqzoe0yXu0DYYbQSNZxmgN76tLnAOZ_w@mail.gmail.com
2017-12-12 19:33:50 -05:00
Robert Haas d329dc2ea4 Remove bug from OPTIMIZER_DEBUG code for partition-wise join.
Etsuro Fujita, reviewed by Ashutosh Bapat

Discussion: http://postgr.es/m/5A2A60E6.6000008@lab.ntt.co.jp
2017-12-12 10:52:15 -05:00
Peter Eisentraut 4034db215b Fix comment
Reported-by: Noah Misch <noah@leadboat.com>
2017-12-11 16:37:39 -05:00
Tom Lane 7eb16ab17d Fix corner-case coredump in _SPI_error_callback().
I noticed that _SPI_execute_plan initially sets spierrcontext.arg = NULL,
and only fills it in some time later.  If an error were to happen in
between, _SPI_error_callback would try to dereference the null pointer.
This is unlikely --- there's not much between those points except
push-snapshot calls --- but it's clearly not impossible.  Tweak the
callback to do nothing if the pointer isn't set yet.

It's been like this for awhile, so back-patch to all supported branches.
2017-12-11 16:34:28 -05:00
Robert Haas 01a0ca1bed Improve comment about PartitionBoundInfoData.
Ashutosh Bapat, per discussion with Julien Rouhaund, who also
reviewed this patch.

Discussion: http://postgr.es/m/CAFjFpReBR3ftK9C23LLCZY_TDXhhjB_dgE-L9+mfTnA=gkvdvQ@mail.gmail.com
2017-12-11 12:52:15 -05:00
Magnus Hagander d8f632caec Fix typo
Reported by Robins Tharakan
2017-12-09 11:40:31 +01:00
Peter Eisentraut 005ac298b1 Prohibit identity columns on typed tables and partitions
Those cases currently crash and supporting them is more work then
originally thought, so we'll just prohibit these scenarios for now.

Author: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Reported-by: Мансур Галиев <gomer94@yandex.ru>
Bug: #14866
2017-12-08 12:13:04 -05:00
Peter Eisentraut af9f8b7ca3 Fix mistake in comment
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-12-08 11:23:36 -05:00
Peter Eisentraut 2d2d06b7e2 Apply identity sequence values on COPY
A COPY into a table should apply identity sequence values just like it
does for ordinary defaults.  This was previously forgotten, leading to
null values being inserted, which in turn would fail because identity
columns have not-null constraints.

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Steven Winfield <steven.winfield@cantabcapital.com>
Bug: #14952
2017-12-08 09:18:18 -05:00
Robert Haas 28724fd90d Report failure to start a background worker.
When a worker is flagged as BGW_NEVER_RESTART and we fail to start it,
or if it is not marked BGW_NEVER_RESTART but is terminated before
startup succeeds, what BgwHandleStatus should be reported?  The
previous code really hadn't considered this possibility (as indicated
by the comments which ignore it completely) and would typically return
BGWH_NOT_YET_STARTED, but that's not a good answer, because then
there's no way for code using GetBackgroundWorkerPid() to tell the
difference between a worker that has not started but will start
later and a worker that has not started and will never be started.
So, when this case happens, return BGWH_STOPPED instead.  Update the
comments to reflect this.

The preceding fix by itself is insufficient to fix the problem,
because the old code also didn't send a notification to the process
identified in bgw_notify_pid when startup failed.  That might've
been technically correct under the theory that the status of the
worker was BGWH_NOT_YET_STARTED, because the status would indeed not
change when the worker failed to start, but now that we're more
usefully reporting BGWH_STOPPED, a notification is needed.

Without these fixes, code which starts background workers and then
uses the recommended APIs to wait for those background workers to
start would hang indefinitely if the postmaster failed to fork a
worker.

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com
2017-12-06 08:58:27 -05:00
Robert Haas 9c64ddd414 Fix Parallel Append crash.
Reported by Tom Lane and the buildfarm.

Amul Sul and Amit Khandekar

Discussion: http://postgr.es/m/17868.1512519318@sss.pgh.pa.us
Discussion: http://postgr.es/m/CAJ3gD9cJQ4d-XhmZ6BqM9rMM2KDBfpkdgOAb4+psz56uBuMQ_A@mail.gmail.com
2017-12-06 08:42:50 -05:00
Robert Haas ab72716778 Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention.  We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 17:28:39 -05:00
Robert Haas 2c09a5c12a Fix accumulation of parallel worker instrumentation.
When a Gather or Gather Merge node is started and stopped multiple
times, the old code wouldn't reset the shared state between executions,
potentially resulting in dramatically inflated instrumentation data
for nodes beneath it.  (The per-worker instrumentation ended up OK,
I think, but the overall totals were inflated.)

Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
reviewed and tweaked a bit by me.

Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
2017-12-05 14:35:33 -05:00
Andres Freund 5bcf389ecf Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate.
If a hash join appears in a parallel query, there may be no hash table
available for explain.c to inspect even though a hash table may have
been built in other processes.  This could happen either because
parallel_leader_participation was set to off or because the leader
happened to hit the end of the outer relation immediately (even though
the complete relation is not empty) and decided not to build the hash
table.

Commit bf11e7ee introduced a way for workers to exchange
instrumentation via the DSM segment for Sort nodes even though they
are not parallel-aware.  This commit does the same for Hash nodes, so
that explain.c has a way to find instrumentation data from an
arbitrary participant that actually built the hash table.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com
2017-12-05 10:55:56 -08:00
Tom Lane 8dc3c971a9 Treat directory open failures as hard errors in ResetUnloggedRelations().
Previously, this code just reported such problems at LOG level and kept
going.  The problem with this approach is that transient failures (e.g.,
ENFILE) could prevent us from resetting unlogged relations to empty,
yet allow recovery to appear to complete successfully.  That seems like
a data corruption hazard large enough to treat such problems as reasons
to fail startup.

For the same reason, treat unlink failures for unlogged files as hard
errors not just LOG messages.  It's a little odd that we did it like that
when file-level errors in other steps (copy_file, fsync_fname) are ERRORs.

The sole case that I left alone is that ENOENT failure on a tablespace
(not database) directory is not an error, though it will now be logged
rather than just silently ignored.  This is to cover the scenario where
a previous DROP TABLESPACE removed the tablespace directory but failed
before removing the pg_tblspc symlink.  I'm not sure that that's very
likely in practice, but that seems like the only real excuse for the
old behavior here, so let's allow for it.  (As coded, this will also
allow ENOENT on $PGDATA/base/.  But since we'll fail soon enough if
that's gone, I don't think we need to complicate this code by
distinguishing that from a true tablespace case.)

Discussion: https://postgr.es/m/21040.1512418508@sss.pgh.pa.us
2017-12-04 20:52:59 -05:00
Tom Lane 066bc21c0e Simplify do_pg_start_backup's API by opening pg_tblspc internally.
do_pg_start_backup() expects its callers to pass in an open DIR pointer
for the pg_tblspc directory, but there's no apparent advantage in that.
It complicates the callers without adding any flexibility, and there's no
robustness advantage, since we surely have to be prepared for errors during
the scan of pg_tblspc anyway.  In fact, by holding an extra kernel resource
during operations like the preliminary checkpoint, we might be making
things a fraction more failure-prone not less.  Hence, remove that argument
and open the directory just for the duration of the actual scan.

Discussion: https://postgr.es/m/28752.1512413887@sss.pgh.pa.us
2017-12-04 18:37:54 -05:00
Tom Lane 561885db05 Improve error handling in RemovePgTempFiles().
Modify this function and its subsidiaries so that syscall failures are
reported via ereport(LOG), rather than silently ignored as before.
We don't want to throw a hard ERROR, as that would prevent database
startup, and getting rid of leftover temporary files is not important
enough for that.  On the other hand, not reporting trouble at all
seems like an odd choice not in line with current project norms,
especially since any failure here is quite unexpected.

On the same reasoning, adjust these functions' AllocateDir/ReadDir calls
so that failure to scan a directory results in LOG not ERROR.  I also
removed the previous practice of silently ignoring ENOENT failures during
directory opens --- there are some corner cases where that could happen
given a previous database crash, but that seems like a bad excuse for
ignoring a condition that isn't expected in most cases.  A LOG message
during postmaster start seems OK in such situations, and better than
no output at all.

In passing, make RemovePgTempRelationFiles' test for "is the file name
all digits" look more like the way it's done elsewhere.

Discussion: https://postgr.es/m/19907.1512402254@sss.pgh.pa.us
2017-12-04 17:59:35 -05:00
Tom Lane 2069e6faa0 Clean up assorted messiness around AllocateDir() usage.
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures.  It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode.  And it eliminates a lot of just plain redundant error-handling
code.

In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern

	dir = AllocateDir(path);
	while ((de = ReadDirExtended(dir, path, LOG)) != NULL)

if they'd like to treat directory-open failures as mere LOG conditions
rather than errors.  Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.

Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine.  Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above.  (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.)  In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.

Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported.  There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.

Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming.  (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire.  If so, this function was broken for its
own purposes as well as breaking callers.)

And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.

Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless.  Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes.  The rest of this is
essentially cosmetic and need not get back-patched.

Michael Paquier, with a bit of additional work by me

Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
2017-12-04 17:02:56 -05:00
Robert Haas ab6eaee884 When VACUUM or ANALYZE skips a concurrently dropped table, log it.
Hopefully, the additional logging will help avoid confusion that
could otherwise result.

Nathan Bossart, reviewed by Michael Paquier, Fabrízio Mello, and me
2017-12-04 15:25:55 -05:00
Tom Lane ecc27d55f4 Support boolean columns in functional-dependency statistics.
There's no good reason that the multicolumn stats stuff shouldn't work on
booleans.  But it looked only for "Var = pseudoconstant" clauses, and it
will seldom find those for boolean Vars, since earlier phases of planning
will fold "boolvar = true" or "boolvar = false" to just "boolvar" or
"NOT boolvar" respectively.  Improve dependencies_clauselist_selectivity()
to recognize such clauses as equivalent to equality restrictions.

This fixes a failure of the extended stats mechanism to apply in a case
reported by Vitaliy Garnashevich.  It's not a complete solution to his
problem because the bitmap-scan costing code isn't consulting extended
stats where it should, but that's surely an independent issue.

In passing, improve some comments, get rid of a NumRelids() test that's
redundant with the preceding bms_membership() test, and fix
dependencies_clauselist_selectivity() so that estimatedclauses actually
is a pure output argument as stated by its API contract.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/73a4936d-2814-dc08-ed0c-978f76f435b0@gmail.com
2017-12-04 11:51:43 -05:00
Robert Haas 9f4992e2a9 Remove memory leak protection from Gather and Gather Merge nodes.
Before commit 6b65a7fe62, tqueue.c could
perform tuple remapping and thus leak memory, which is why commit
af33039317 made TupleQueueReaderNext
run in a short-lived context.  Now, however, tqueue.c has been reduced
to a shadow of its former self, and there shouldn't be any chance of
leaks any more.  Accordingly, remove some tuple copying and memory
context manipulation to speed up processing.

Patch by me, reviewed by Amit Kapila.  Some testing by Rafia Sabih.

Discussion: http://postgr.es/m/CAA4eK1LSDydwrNjmYSNkfJ3ZivGSWH9SVswh6QpNzsMdj_oOQA@mail.gmail.com
2017-12-04 10:39:24 -05:00
Andres Freund ec6a040056 Adjust #ifdef EXEC_BACKEND RemovePgTempFilesInDir() call.
Other callers were adjusted in the course of
dc6c4c9dc2.

Per buildfarm.
2017-12-01 17:28:05 -08:00
Andres Freund dc6c4c9dc2 Add infrastructure for sharing temporary files between backends.
SharedFileSet allows temporary files to be created by one backend and
then exported for read-only access by other backends, with clean-up
managed by reference counting associated with a DSM segment.  This
includes changes to fd.c and buffile.c to support the new kind of
temporary file.

This will be used by an upcoming patch adding support for parallel
hash joins.

Author: Thomas Munro
Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas, Rushabh Lathia
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAH2-WznJ_UgLux=_jTgCQ4yFz0iBntudsNKa1we3kN1BAG=88w@mail.gmail.com
2017-12-01 16:30:56 -08:00
Robert Haas 35438e5763 Minor code beautification in partition_bounds_equal.
Use get_greatest_modulus more consistently, instead of doing the
same thing in an ad-hoc manner in this one place.

Ashutosh Bapat

Discussion: http://postgr.es/m/CAFjFpReT9L4RCiJBKOyWC2=i02kv9uG2fx=4Fv7kFY2t0SPCgw@mail.gmail.com
2017-12-01 13:52:59 -05:00
Robert Haas 87c37e3291 Re-allow INSERT .. ON CONFLICT DO NOTHING on partitioned tables.
Commit 8355a011a0 was reverted in
f05230752d, but this attempt is
hopefully better-considered: we now pass the correct value to
ExecOpenIndices, which should avoid the crash that we hit before.

Amit Langote, reviewed by Simon Riggs and by me.  Some final
editing by me.

Discussion: http://postgr.es/m/7ff1e8ec-dc39-96b1-7f47-ff5965dceeac@lab.ntt.co.jp
2017-12-01 12:53:21 -05:00
Robert Haas 1cbc17aaca Try to exclude partitioned tables in toto.
Ashutosh Bapat, reviewed by Jeevan Chalke.  Comment by me.

Discussion: http://postgr.es/m/CAFjFpRcuRaydz88CY_aQekmuvmN2A9ax5z0k=ppT+s8KS8xMRA@mail.gmail.com
2017-12-01 10:59:09 -05:00
Robert Haas 59c8078744 Fix uninitialized memory reference.
Without this, when partdesc->nparts == 0, we end up calling
ExecBuildSlotPartitionKeyDescription without initializing values
and isnull.

Reported by Coverity via Michael Paquier.  Patch by Michael Paquier,
reviewed and revised by Amit Langote.

Discussion: http://postgr.es/m/CAB7nPqQ3mwkdMoPY-ocgTpPnjd8TKOadMxdTtMLvEzF8480Zfg@mail.gmail.com
2017-12-01 10:05:00 -05:00
Peter Eisentraut 86ab28fbd1 Check channel binding flag at end of SCRAM exchange
We need to check whether the channel-binding flag encoded in the
client-final-message is the same one sent in the client-first-message.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-12-01 09:53:26 -05:00
Robert Haas 06ae669c92 Remove extra word from comment.
David Rowley, who also was the primary author of the patch that
added this function; the attribution in my previous commit,
84940644de, was incorrect due to
sloppiness on my part.

Discussion: http://postgr.es/m/CAKJS1f_0iSiLQsf_c06AzOWAc3eS6ePjjVQFpcFv3W-O5aktnQ@mail.gmail.com
2017-11-30 16:22:38 -05:00
Peter Eisentraut e4128ee767 SQL procedures
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar.  This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.

This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL).  There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql.  Support for defining
procedures is available in all the languages supplied by the core
distribution.

While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2017-11-30 11:03:20 -05:00
Robert Haas 1761653bbb Make create_unique_path manage memory like mark_dummy_rel.
Put the unique path in the same context as the owning RelOptInfo, rather
than the toplevel planner context.  This is how this function worked
originally, but commit f41803bb39
changed it without explanation.  mark_dummy_rel adopted the older (or
newer?) technique in commit eca75a12a2,
which also featured a much better explanation of why it is correct.
So, switch back to that technique here, with the same explanation
given there.

Although this fixes a possible memory leak when GEQO is in use, the
leak is minor and probably nobody cares, so no back-patch.

Ashutosh Bapat, reviewed by Tom Lane and by me

Discussion: http://postgr.es/m/CAFjFpRcXkHHrXyD9BCvkgGJV4TnHG2SWJ0PhJfrDu3NAcQvh7g@mail.gmail.com
2017-11-30 09:50:10 -05:00
Tom Lane 7ca25b7de6 Fix neqjoinsel's behavior for semi/anti join cases.
Previously, this function estimated the selectivity as 1 minus eqjoinsel()
for the negator equality operator, regardless of join type (I think there
was an expectation that eqjoinsel would handle the join type).  But
actually this is completely wrong for semijoin cases: the fraction of the
LHS that has a non-matching row is not one minus the fraction of the LHS
that has a matching row.  In reality a semijoin with <> will nearly always
succeed: it can only fail when the RHS is empty, or it contains a single
distinct value that is equal to the particular LHS value, or the LHS value
is null.  The only one of those things we should have much confidence in
estimating is the fraction of LHS values that are null, so let's just take
the selectivity as 1 minus outer nullfrac.

Per coding convention, antijoin should be estimated the same as semijoin.

Arguably this is a bug fix, but in view of the lack of field complaints
and the risk of destabilizing plans, no back-patch.

Thomas Munro, reviewed by Ashutosh Bapat

Discussion: https://postgr.es/m/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com
2017-11-29 22:00:37 -05:00
Andres Freund 1145acc70d Add a barrier primitive for synchronizing backends.
Provide support for dynamic or static parties of processes to wait for
all processes to reach point in the code before continuing.

This is similar to the mechanism of the same name in POSIX threads and
MPI, though has explicit phasing and dynamic party support like the
Java core library's Phaser.

This will be used by an upcoming patch adding support for parallel
hash joins.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=2_y7oi01OjA_wLvYcWMc9_d=LaoxrY3eiROCZkB_qakA@mail.gmail.com
2017-11-29 17:07:16 -08:00
Robert Haas 84940644de New C function: bms_add_range
This will be used by pending patches to improve partition pruning.

Amit Langote and Kyotaro Horiguchi, per a suggestion from David
Rowley.  Review and testing of the larger patch set of which this is a
part by Ashutosh Bapat, David Rowley, Dilip Kumar, Jesper Pedersen,
Rajkumar Raghuwanshi, Beena Emerson, Amul Sul, and Kyotaro Horiguchi.

Discussion: http://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
2017-11-29 17:12:05 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Tom Lane 801386af62 Clarify old comment about qual_is_pushdown_safe's handling of subplans.
This comment glossed over the difference between initplans and subplans,
but they are indeed different for our purposes here.
2017-11-28 23:32:17 -05:00
Alvaro Herrera 3848d21673 Make memset() use sizeof() rather than re-compute size
This is simpler and more closely follows overwhelming precedent.

Report and patch by Mark Dilger.
Discussion: https://postgr.es/m/9A68FB88-5F45-4848-9926-8586E2D777D1@gmail.com
2017-11-29 00:13:45 -03:00
Alvaro Herrera 414cd434ff Fix extstat collection when no stats are produced for a column
This is a mistakenly placed conditional in bf2a691e02.

Reported by Justin Pryzby
Discussion: https://postgr.es/m/20171117214352.GE25796@telsasoft.com
2017-11-28 23:40:11 -03:00
Robert Haas cd482f2951 Fix wrong function name in comment.
Rushabh Lathia

Discussion: http://postgr.es/m/CAGPqQf2z5g+7YmGZSZgKoiFsaUB+63Rzmz8-5PQHuS6hd14FEg@mail.gmail.com
2017-11-28 14:17:21 -05:00
Robert Haas 2d7950f222 If a range-partitioned table has no default partition, reject null keys.
Commit 4e5fe9ad19 introduced this
problem.  Also add a test so it doesn't get broken again.

Report by Rushabh Lathia.  Fix by Amit Langote.  Reviewed by Rushabh
Lathia and Amul Sul.  Tweaked by me.

Discussion: http://postgr.es/m/CAGPqQf0Y1iJyk4QJBdMf=pS9i6Q0JUMM_h5-qkR3OMJ-e04PyA@mail.gmail.com
2017-11-28 14:11:16 -05:00
Robert Haas 445dbd82a3 Fix ReinitializeParallelDSM to tolerate finding no error queues.
Commit d466335064 changed things so
that shm_toc_lookup would fail with an error rather than silently
returning NULL in the hope that such failures would be reported
in a useful way rather than via a system crash.  However, it
overlooked the fact that the lookup of PARALLEL_KEY_ERROR_QUEUE
in ReinitializeParallelDSM is expected to fail when no DSM segment
was created in the first place; in that case, we end up with a
backend-private memory segment that still contains an entry for
PARALLEL_KEY_FIXED but no others.  Consequently a benign failure
to initialize parallelism can escalate into an elog(ERROR);
repair.

Discussion: http://postgr.es/m/CA+Tgmob8LFw55DzH1QEREpBEA9RJ_W_amhBFCVZ6WMwUhVpOqg@mail.gmail.com
2017-11-28 12:15:38 -05:00
Robert Haas c6755e233b Teach bitmap heap scan to cope with absence of a DSA.
If we have a plan that uses parallelism but are unable to execute it
using parallelism, for example due to a lack of available DSM
segments, then the EState's es_query_dsa will be NULL.  Parallel
bitmap heap scan needs to fall back to a non-parallel scan in such
cases.

Patch by me, reviewed by Dilip Kumar

Discussion: http://postgr.es/m/CAEepm=0kADK5inNf_KuemjX=HQ=PuTP0DykM--fO5jS5ePVFEA@mail.gmail.com
2017-11-28 11:44:59 -05:00
Robert Haas 7b88d63a91 Add null test to partition constraint for default range partitions.
Non-default range partitions have a constraint which include null
tests, and both default and non-default list partitions also have a
constraint which includes null tests, but for some reason this was
missed for default range partitions.  This could cause the partition
constraint to evaluate to false for rows that were (correctly) routed
to that partition by insert tuple routing, which could in turn
cause constraint exclusion to prune the default partition in cases
where it should not.

Amit Langote, reviewed by Kyotaro Horiguchi

Discussion: http://postgr.es/m/ba7aaeb1-4399-220e-70b4-62eade1522d0@lab.ntt.co.jp
2017-11-28 10:51:01 -05:00
Robert Haas 487a0c1518 Fix typo.
Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoCq_QG6UEo6yb-purmhLQjDLsCA2_B+8Wh3ah9P2-XmrQ@mail.gmail.com
2017-11-28 08:19:01 -05:00
Tom Lane cb03fa33ae Fix assorted syscache lookup sloppiness in partition-related code.
heap_drop_with_catalog and ATExecDetachPartition neglected to check for
SearchSysCache failures, as noted in bugs #14927 and #14928 from Pan Bian.
Such failures are pretty unlikely, since we should already have some sort
of lock on the rel at these points, but it's neither a good idea nor
per project style to omit a check for failure.

Also, StorePartitionKey contained a syscache lookup that it never did
anything with, including never releasing the result.  Presumably the
reason why we don't see refcount-leak complaints is that the lookup
always fails; but in any case it's pretty useless, so remove it.

All of these errors were evidently introduced by the relation
partitioning feature.  Back-patch to v10 where that came in.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/20171127090105.1463.3962@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20171127091341.1468.72696@wrigleys.postgresql.org
2017-11-27 19:22:08 -05:00
Tom Lane 9a785ad573 Fix creation of resjunk tlist entries for inherited mixed UPDATE/DELETE.
rewriteTargetListUD's processing is dependent on the relkind of the query's
target table.  That was fine at the time it was made to act that way, even
for queries on inheritance trees, because all tables in an inheritance tree
would necessarily be plain tables.  However, the 9.5 feature addition
allowing some members of an inheritance tree to be foreign tables broke the
assumption that rewriteTargetListUD's output tlist could be applied to all
child tables with nothing more than column-number mapping.  This led to
visible failures if foreign child tables had row-level triggers, and would
also break in cases where child tables belonged to FDWs that used methods
other than CTID for row identification.

To fix, delay running rewriteTargetListUD until after the planner has
expanded inheritance, so that it is applied separately to the (already
mapped) tlist for each child table.  We can conveniently call it from
preprocess_targetlist.  Refactor associated code slightly to avoid the
need to heap_open the target relation multiple times during
preprocess_targetlist.  (The APIs remain a bit ugly, particularly around
the point of which steps scribble on parse->targetList and which don't.
But avoiding such scribbling would require a change in FDW callback APIs,
which is more pain than it's worth.)

Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when
we transition from rows providing a CTID to rows that don't.  (That's
really an independent bug, but it manifests in much the same cases.)

Add a regression test checking one manifestation of this problem, which
was that row-level triggers on a foreign child table did not work right.

Back-patch to 9.5 where the problem was introduced.

Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat

Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru
2017-11-27 17:54:07 -05:00
Simon Riggs 59af8d4384 Pad XLogReaderState's per-buffer data_bufsz more aggressively.
Originally, we palloc'd this buffer just barely big enough to hold the
largest xlog backup block seen so far. We now MAXALIGN the palloc size.

The original coding could result in many repeated palloc cycles, in the
worst case where we see a series ofgradually larger xlog records.  We
ameliorate that similarly to 8735978e7a
by imposing a minimum buffer size of BLCKSZ.

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-27 09:34:05 +00:00
Tom Lane 8735978e7a Pad XLogReaderState's main_data buffer more aggressively.
Originally, we palloc'd this buffer just barely big enough to hold the
largest xlog record seen so far.  It turns out that that can result in
valgrind complaints, because some compilers will emit code that assumes
it can safely fetch padding bytes at the end of a struct, and those
padding bytes were unallocated so far as aset.c was concerned.  We can
fix that by MAXALIGN'ing the palloc request size, ensuring that it is big
enough to include any possible padding that might've been omitted from
the on-disk record.

An additional objection to the original coding is that it could result in
many repeated palloc cycles, in the worst case where we see a series of
gradually larger xlog records.  We can ameliorate that cheaply by
imposing a minimum buffer size that's large enough for most xlog records.
BLCKSZ/2 was chosen after a bit of discussion.

In passing, remove an obsolete comment in struct xl_heap_new_cid that the
combocid field is free due to alignment considerations.  Perhaps that was
true at some point, but it's not now.

Back-patch to 9.5 where this code came in.

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-26 15:17:24 -05:00
Joe Conway 752714dd9d Make has_sequence_privilege support WITH GRANT OPTION
The various has_*_privilege() functions all support an optional
WITH GRANT OPTION added to the supported privilege types to test
whether the privilege is held with grant option. That is, all except
has_sequence_privilege() variations. Fix that.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/005147f6-8280-42e9-5a03-dd2c1e4397ef@joeconway.com
2017-11-26 09:49:40 -08:00
Tom Lane 9b63c13f0a Repair failure with SubPlans in multi-row VALUES lists.
When nodeValuesscan.c was written, it was impossible to have a SubPlan in
VALUES --- any sub-SELECT there would have to be uncorrelated and thereby
would produce an InitPlan instead.  We therefore took a shortcut in the
logic that throws away a ValuesScan's per-row expression evaluation data
structures.  This was broken by the introduction of LATERAL however; a
sub-SELECT containing a lateral reference produces a correlated SubPlan.

The cleanest fix for this would be to give up the optimization of
discarding the expression eval state.  But that still seems pretty
unappetizing for long VALUES lists.  It seems to work to just prevent
the subexpressions from hooking into the ValuesScan node's subPlan
list, so let's do that and see how well it works.  (If this breaks,
due to additional connections between the subexpressions and the outer
query structures, we might consider compromises like throwing away data
only for VALUES rows not containing SubPlans.)

Per bug #14924 from Christian Duta.  Back-patch to 9.3 where LATERAL
was introduced.

Discussion: https://postgr.es/m/20171124120836.1463.5310@wrigleys.postgresql.org
2017-11-25 14:15:48 -05:00
Tom Lane ab97aaac8f Update buffile.h/.c comments for removal of non-temp option.
Commit 11e264517 removed BufFile's isTemp flag, thereby eliminating
the possibility of resurrecting BufFileCreate().  But it left that
function in place, as well as a bunch of comments describing how things
worked for the non-temp-file case.  At best, that's now a source of
confusion.  So remove the long-since-commented-out function and change
relevant comments.

I (tgl) wanted to rename BufFileCreateTemp() to BufFileCreate(), but
that seems not to be the consensus position, so leave it as-is.

In passing, fix commit f0828b2fc's failure to update BufFileSeek's
comment to match the change of its argument type from long to off_t.
(I think that might actually have been intentional at the time, but
now that 64-bit off_t is nearly universal, it looks anachronistic.)

Thomas Munro and Tom Lane

Discussion: https://postgr.es/m/E1eFVyl-0008J1-RO@gemulon.postgresql.org
2017-11-25 13:19:43 -05:00
Tom Lane df3a66e282 Improve planner's handling of set-returning functions in grouping columns.
Improve query_is_distinct_for() to accept SRFs in the targetlist when
we can prove distinctness from a DISTINCT clause.  In that case the
de-duplication will surely happen after SRF expansion, so the proof
still works.  Continue to punt in the case where we'd try to prove
distinctness from GROUP BY (or, in the future, source relations).
To do that, we'd have to determine whether the SRFs were in the
grouping columns or elsewhere in the tlist, and it still doesn't
seem worth the trouble.  But this trivial change allows us to
recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces
unique-ified output, which seems worth having.

Also, fix estimate_num_groups() to consider the possibility of SRFs in
the grouping columns.  Its failure to do so was masked before v10 because
grouping_planner() scaled up plan rowcount estimates by the estimated SRF
multiplier after performing grouping.  That doesn't happen anymore, which
is more correct, but it means we need an adjustment in the estimate for
the number of groups.  Failure to do this leads to an underestimate for
the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)"
compared to what 9.6 and earlier estimated, thus breaking plan choices
in some cases.

Per report from Dmitry Shalashov.  Back-patch to v10 to avoid degraded
plan choices compared to previous releases.

Discussion: https://postgr.es/m/CAKPeCUGAeHgoh5O=SvcQxREVkoX7UdeJUMj1F5=aBNvoTa+O8w@mail.gmail.com
2017-11-25 11:48:09 -05:00
Robert Haas b10967eddf Avoid projecting tuples unnecessarily in Gather and Gather Merge.
It's most often the case that the target list for the Gather (Merge)
node matches the target list supplied by the underlying plan node;
when this is so, we can avoid the overhead of projecting.

This depends on commit f455e1125e for
proper functioning.

Idea by Andres Freund.  Patch by me.  Review by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
2017-11-25 10:49:17 -05:00
Tom Lane 0f2458ff5f Improve valgrind logic in aset.c, and fix multiple issues in generation.c.
Revise aset.c so that all the "private" fields of chunk headers are
marked NOACCESS when outside the module, improving on the previous
coding which protected only requested_size.  Fix a couple of corner
case bugs, such as failing to re-protect the header during a failure
exit from AllocSetRealloc, and wrong padding-size calculation for an
oversize allocation request.

Apply the same design to generation.c, and also fix several bugs therein
that I found by dint of hacking the code to use generation.c as the
standard allocator and then running the core regression tests with it.
Notably, we have to track the actual size of each block, else the
wipe_mem call in GenerationReset clears the wrong amount of memory for
an oversize-chunk block; and GenerationCheck needs a way of identifying
freed chunks that isn't fooled by palloc(0).  I chose to fix the latter
by resetting the context pointer to NULL in a freed chunk, roughly like
what happens in a freed aset.c chunk.

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-24 19:28:19 -05:00
Tom Lane f65d21b258 Mostly-cosmetic improvements in memory chunk header alignment coding.
Add commentary about what we're doing and why.  Apply the method used for
padding in GenerationChunk to AllocChunkData, replacing the rather ad-hoc
solution used in commit 7e3aa03b4.  Reorder fields in GenerationChunk so
that the padding calculation will work even if sizeof(size_t) is different
from sizeof(void *) --- likely that will never happen, but we don't need
the assumption if we do it like this.  Improve static assertions about
alignment.

In passing, fix a couple of oversights in the "large chunk" path in
GenerationAlloc().

Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
2017-11-24 15:50:22 -05:00
Tom Lane cc3c4af4a9 Fix bug in generation.c's valgrind support.
This doesn't look like the last such bug, but it's one that the
test_decoding regression test is tripping over.  Per buildfarm.

Tomas Vondra

Discussion: https://postgr.es/m/c903f275-2150-fa52-64bf-dca7b53ebf8d@fuzzy.cz
2017-11-24 13:43:34 -05:00
Dean Rasheed 9c55391f0f RLS comment fixes.
The comments in get_policies_for_relation() say that CREATE POLICY
does not support defining restrictive policies. This is no longer
true, starting from PG10.
2017-11-24 14:14:40 +00:00
Andres Freund 59b71c6fe6 Fix handling of NULLs returned by aggregate combine functions.
When strict aggregate combine functions, used in multi-stage/parallel
aggregation, returned NULL, we didn't check for that, invoking the
combine function with NULL the next round, despite it being strict.

The equivalent code invoking normal transition functions has a check
for that situation, which did not get copied in a7de3dc5c3. Fix the
bug by adding the equivalent check.

Based on a quick look I could not find any strict combine functions in
core actually returning NULL, and it doesn't seem very likely external
users have done so. So this isn't likely to have caused issues in
practice.

Add tests verifying transition / combine functions returning NULL is
tested.

Reported-By: Andres Freund
Author: Andres Freund
Discussion: https://postgr.es/m/20171121033642.7xvmjqrl4jdaaat3@alap3.anarazel.de
Backpatch: 9.6, where parallel aggregation was introduced
2017-11-23 17:15:27 -08:00
Tom Lane 07bd77b95a Ensure sizeof(GenerationChunk) is maxaligned.
Per buildfarm.

Also improve some comments.
2017-11-23 17:02:15 -05:00
Simon Riggs b99661c2ff Tweak code for older compilers
Attempt to quiesce build farm

Author: Tomas Vondra
2017-11-23 06:55:18 +11:00
Simon Riggs a4ccc1cef5 Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Author: Tomas Vondra
Reviewed-by: Simon Riggs
2017-11-23 05:45:07 +11:00
Simon Riggs 7e17a6889a Set es_output_cid in replication worker
Allows triggers to operate correctly

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
2017-11-22 16:28:14 +11:00
Robert Haas ae65f6066d Provide for forward compatibility with future minor protocol versions.
Previously, any attempt to request a 3.x protocol version other than
3.0 would lead to a hard connection failure, which made the minor
protocol version really no different from the major protocol version
and precluded gentle protocol version breaks.  Instead, when the
client requests a 3.x protocol version where x is greater than 0, send
the new NegotiateProtocolVersion message to convey that we support
only 3.0.  This makes it possible to introduce new minor protocol
versions without requiring a connection retry when the server is
older.

In addition, if the startup packet includes name/value pairs where
the name starts with "_pq_.", assume that those are protocol options,
not GUCs.  Include those we don't support (i.e. all of them, at
present) in the NegotiateProtocolVersion message so that the client
knows they were not understood.  This makes it possible for the
client to request previously-unsupported features without bumping
the protocol version at all; the client can tell from the server's
response whether the option was understood.

It will take some time before servers that support these new
facilities become common in the wild; to speed things up and make
things easier for a future 3.1 protocol version, back-patch to all
supported releases.

Robert Haas and Badrul Chowdhury

Discussion: http://postgr.es/m/BN6PR21MB0772FFA0CBD298B76017744CD1730@BN6PR21MB0772.namprd21.prod.outlook.com
Discussion: http://postgr.es/m/30788.1498672033@sss.pgh.pa.us
2017-11-21 13:56:24 -05:00
Robert Haas f3b0897a12 Fix multiple problems with satisfies_hash_partition.
Fix the function header comment to describe the actual behavior.
Check that table OID, modulus, and remainder arguments are not NULL
before accessing them.  Check that the modulus and remainder are
sensible.  If the table OID doesn't exist, return NULL instead of
emitting an internal error, similar to what we do elsewhere.  Check
that the actual argument types match, or at least are binary coercible
to, the expected argument types.  Correctly handle invocation of this
function using the VARIADIC syntax.  Add regression tests.

Robert Haas and Amul Sul, per a report by Andreas Seltenreich and
subsequent followup investigation.

Discussion: http://postgr.es/m/871sl4sdrv.fsf@ansel.ydns.eu
2017-11-21 13:06:32 -05:00
Tom Lane 84669c9b06 Use out-of-line M68K spinlock code for OpenBSD as well as NetBSD.
David Carlier (from a patch being carried by OpenBSD packagers)

Discussion: https://postgr.es/m/CA+XhMqzwFSGVU7MEnfhCecc8YdP98tigXzzpd0AAdwaGwaVXEA@mail.gmail.com
2017-11-20 18:05:17 -05:00
Simon Riggs 2ede45c3a4 Fix pg_control_checkpoint from commit 4b0d28de06
Author: Simon Riggs <simon@2ndQuadrant.com>
Reported-By: Andreas Seltenreich <seltenreich@gmx.de>
2017-11-21 08:00:54 +11:00
Robert Haas 0510421ee0 Tweak use of ExecContextForcesOids by Gather (Merge).
Specifically, pass the outer plan's PlanState instead of our own
PlanState.  At present, ExecContextForcesOids doesn't actually care
which PlanState we pass; it just looks through to the underlying
EState to find the result relation or top-level eflags.  However, in
the future it might care.  If that happens, and if our goal is to get
a tuple descriptor that matches that of the outer plan, then I think
what we care about is whether the outer plan's context forces OIDs,
rather than whether our own context forces OIDs, just as we use the
outer node's target list rather than our own.

Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
2017-11-20 12:34:06 -05:00
Robert Haas f455e1125e Pass eflags down to parallel workers.
Currently, there are no known consequences of this oversight, so no
back-patch.  Several of the EXEC_FLAG_* constants aren't usable in
parallel mode anyway, and potential problems related to the presence
or absence of OIDs (see EXEC_FLAG_WITH_OIDS, EXEC_FLAG_WITHOUT_OIDS)
seem at present to be masked by the unconditional projection step
performed by Gather and Gather Merge.  In general, however, it seems
important that all participants agree on the values of these flags,
which modify executor behavior globally, and a pending patch to skip
projection in Gather (Merge) would be outright broken in certain cases
without this fix.

Patch by me, based on investigation of a test case provided by Amit
Kapila.  This patch was also reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
2017-11-20 12:00:33 -05:00
Simon Riggs c2513365a0 Parameter toast_tuple_target controls TOAST for new rows
Specifies the point at which we try to move long column values
into TOAST tables.

No effect on existing rows.

Discussion: https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2Ya00OyQQ@mail.gmail.com

Author: Simon Riggs <simon@2ndQudrant.com>
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndQuadrant.com>
2017-11-20 09:50:10 +11:00
Tom Lane 52f63bd916 Fix compiler warning in rangetypes_spgist.c.
On gcc 7.2.0, comparing pointer to (Datum) 0 produces a warning.
Treat it as a simple pointer to avoid that; this is more consistent
with comparable code elsewhere, anyway.

Tomas Vondra

Discussion: https://postgr.es/m/99410021-61ef-9a9a-9bc8-f733ece637ee@2ndquadrant.com
2017-11-18 16:46:29 -05:00
Tom Lane 4797f9b519 Merge near-duplicate code in RI triggers.
Merge ri_restrict_del and ri_restrict_upd into one function ri_restrict.
Create a function ri_setnull that is the common implementation of
RI_FKey_setnull_del and RI_FKey_setnull_upd.  Likewise create a function
ri_setdefault that is the common implementation of RI_FKey_setdefault_del
and RI_FKey_setdefault_upd.  All of these pairs of functions were identical
except for needing to check for no-actual-key-change in the UPDATE cases;
the one extra if-test is a small price to pay for saving so much code.

Aside from removing about 400 lines of essentially duplicate code, this
allows us to recognize that we were uselessly caching two identical plans
whenever there were pairs of triggers using these duplicated functions
(which is likely very common).

Ildar Musin, reviewed by Ildus Kurbangaliev

Discussion: https://postgr.es/m/ca7064a7-6adc-6f22-ca47-8615ba9425a5@postgrespro.ru
2017-11-18 16:24:05 -05:00
Tom Lane 976a1a48fc Improve to_date/to_number/to_timestamp behavior with multibyte characters.
The documentation says that these functions skip one input character
per literal (non-pattern) format character.  Actually, though, they
skipped one input *byte* per literal *byte*, which could be hugely
confusing if either data or format contained multibyte characters.

To fix, adjust the FormatNode representation and parse_format() so
that multibyte format characters are stored as one FormatNode not
several, and adjust the data-skipping bits to advance by pg_mblen()
not necessarily one byte.  There's no user-visible behavior change
on the to_char() side, although the internal representation changes.

Commit e87d4965b had already fixed most places where we skip characters
on the basis of non-literal format patterns to advance by characters
not bytes, but this gets one more place, the SKIP_THth macro.  I think
everything in formatting.c gets that right now.

It'd be nice to have some regression test cases covering this behavior;
but of course there's no way to do so in an encoding-agnostic way, and
many of the interesting aspects would also require unportable locale
selections.  So I've not bothered here.

Discussion: https://postgr.es/m/28186.1510957703@sss.pgh.pa.us
2017-11-18 12:42:52 -05:00
Tom Lane 63ca86318d Fix quoted-substring handling in format parsing for to_char/to_number/etc.
This code evidently intended to treat backslash as an escape character
within double-quoted substrings, but it was sufficiently confused that
cases like ..."foo\\"... did not work right: the second backslash
managed to quote the double-quote after it, despite being quoted itself.
Rewrite to get that right, while preserving the existing behavior
outside double-quoted substrings, which is that backslash isn't special
except in the combination \".

Comparing to Oracle, it seems that their version of to_char() for
timestamps allows literal alphanumerics only within double quotes, while
non-alphanumerics are allowed outside quotes; backslashes aren't special
anywhere; there is no way at all to emit a literal double quote.
(Bizarrely, their to_char() for numbers is different; it doesn't allow
literal text at all AFAICT.)  The fact that they don't treat backslash
as special justifies our existing behavior for backslash outside double
quotes.  I considered making backslash inside double quotes act the same
way (ie, special only if before "), which in a green field would be a
more consistent behavior.  But that would likely break more existing SQL
code than what this patch does.

Add some test cases illustrating this behavior.  (Only the last new
case actually changes behavior in this commit.)

Little of this behavior was documented, either, so fix that.

Discussion: https://postgr.es/m/3626.1510949486@sss.pgh.pa.us
2017-11-18 12:16:37 -05:00
Peter Eisentraut 9288d62bb4 Support channel binding 'tls-unique' in SCRAM
This is the basic feature set using OpenSSL to support the feature.  In
order to allow the frontend and the backend to fetch the sent and
expected TLS Finished messages, a PG-like API is added to be able to
make the interface pluggable for other SSL implementations.

This commit also adds a infrastructure to facilitate the addition of
future channel binding types as well as libpq parameters to control the
SASL mechanism names and channel binding names.  Those will be added by
upcoming commits.

Some tests are added to the SSL test suite to test SCRAM authentication
with channel binding.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-11-18 10:15:54 -05:00
Robert Haas 611fe7d479 Update postgresql.conf.sample comment for bgwriter_lru_maxpages
Commit 14ca9abfbe should have done
this, but did not.

Jeff Janes

Discussion: http://postgr.es/m/CAMkU=1yWOvL+YFYzGM9yXSoWjxr_5_Ny78pPzLKQCkfgB7H-JQ@mail.gmail.com
2017-11-17 14:52:00 -05:00
Tom Lane e87d4965bd Prevent to_number() from losing data when template doesn't match exactly.
Non-data template patterns would consume characters whether or not those
characters were what the pattern expected, for example
	SELECT TO_NUMBER('1234', '9,999');
produced 134 because the '2' got eaten by the comma pattern.  This seems
undesirable, not least because it doesn't happen in Oracle.  For the ','
and 'G' template patterns, we can fix this by consuming characters only
if they match what the pattern would output.  For non-data patterns such
as 'L' and 'TH', it seems impractical to tighten things up to the point of
consuming only exact matches to what the pattern would output; but we can
improve matters quite a lot by redefining the behavior as "consume only
characters that aren't digits, signs, decimal point, or comma".

Also, fix it so that the behavior is to consume the number of *characters*
the pattern would output, not the number of *bytes*.  The old coding would
do surprising things with non-ASCII currency symbols, for example.  (It
would be good to apply that rule for literal text as well, but this commit
only fixes it for non-data patterns.)

Oliver Ford, reviewed by Thomas Munro and Nathan Wagner, and whacked around
a bit more by me

Discussion: https://postgr.es/m/CAGMVOdvpbMqPf9XWNzOwBpzJfErkydr_fEGhmuDGa015z97mwg@mail.gmail.com
2017-11-17 12:04:13 -05:00
Andres Freund 11e264517d Remove BufFile's isTemp flag.
The isTemp flag controls whether buffile.c chops BufFile data up into
1GB segments on disk.  Since it was badly named and always true, get
rid of it.

Author: Thomas Munro (based on suggestion by Peter Geoghegan)
Reviewed-By: Peter Geoghegan, Andres Freund
Discussion: https://postgr.es/m/CAH2-Wz%3D%2B9Rfqh5UdvdW9rGezdhrMGGH-JL1X9FXXVZdeeGeOJA%40mail.gmail.com
2017-11-16 17:52:57 -08:00
Andres Freund 7082e614c0 Provide DSM segment to ExecXXXInitializeWorker functions.
Previously, executor nodes running in parallel worker processes didn't
have access to the dsm_segment object used for parallel execution.  In
order to support resource management based on DSM segment lifetime,
they need that.  So create a ParallelWorkerContext object to hold it
and pass it to all InitializeWorker functions.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
2017-11-16 17:39:18 -08:00