Turns out we have enough functions that the binary search is quite
noticeable in profiles.
Thus have Gen_fmgrtab.pl build a new mapping from a builtin function's
oid to an index in the existing fmgr_builtins array. That keeps the
additional memory usage at a reasonable amount.
Author: Andres Freund, with input from Tom Lane
Discussion: https://postgr.es/m/20170914065128.a5sk7z4xde5uy3ei@alap3.anarazel.de
Not much to say about this; does what it says on the tin.
However, formerly, if there was a column list then the ANALYZE action was
implied; now it must be specified, or you get an error. This is because
it would otherwise be a bit unclear what the user meant if some tables
have column lists and some don't.
Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some
editorialization by me
Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com
Commit 597a87ccc introduced a latch pointer variable to replace use
of a long-lived shared latch in the shared WalRcvData structure.
This was not well thought out, because there are now hazards of the
pointer variable changing while it's being inspected by another
process. This could obviously lead to a core dump in code like
if (WalRcv->latch)
SetLatch(WalRcv->latch);
and there's a more remote risk of a torn read, if we have any
platforms where reading/writing a pointer is not atomic.
An actual problem would occur only if the walreceiver process
exits (gracefully) while the startup process is trying to
signal it, but that seems well within the realm of possibility.
To fix, treat the pointer variable (not the referenced latch)
as being protected by the WalRcv->mutex spinlock. There
remains a race condition that we could apply SetLatch to a
process latch that no longer belongs to the walreceiver, but
I believe that's harmless: at worst it'd cause an extra wakeup
of the next process to use that PGPROC structure.
Back-patch to v10 where the faulty code was added.
Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us
1. Since commit b1a9bad9e7 we had pstrdup() inside a
spinlock-protected critical section; reported by Andreas Seltenreich.
Turn those into strlcpy() to stack-allocated variables instead.
Backpatch to 9.6.
2. Since commit 9ed551e0a4 we had a pfree() uselessly inside a
spinlock-protected critical section. Tom Lane noticed in code review.
Move down. Backpatch to 9.6.
3. Since commit 64233902d2 we had GetCurrentTimestamp() (a kernel
call) inside a spinlock-protected critical section. Tom Lane noticed in
code review. Move it up. Backpatch to 9.2.
4. Since commit 1bb2558046 we did elog(PANIC) while holding spinlock.
Tom Lane noticed in code review. Release spinlock before dying.
Backpatch to 9.2.
Discussion: https://postgr.es/m/87h8vhtgj2.fsf@ansel.ydns.eu
All postgres internal usages are replaced, it's just libpq example
usages that haven't been converted. External users of libpq can't
generally rely on including postgres internal headers.
Note that this includes replacing open-coded byte swapping of 64bit
integers (using two 32 bit swaps) with a single 64bit swap.
Where it looked applicable, I have removed netinet/in.h and
arpa/inet.h usage, which previously provided the relevant
functionality. It's perfectly possible that I missed other reasons for
including those, the buildfarm will tell.
Author: Andres Freund
Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de
Previously that was disallowed out of an abundance of
caution. Providing KILL support however is helpful to make the
013_crash_restart.pl test portable, and there's no actual issue with
allowing it. SIGABRT, which has similar consequences except it also
dumps core, was already allowed.
Author: Andres Freund
Discussion: https://postgr.es/m/45d42d41-6145-9be1-7261-84acf6d9e344@2ndQuadrant.com
Buildfarm members skink and sungazer have both recently failed this
test, with symptoms indicating that the default 3-second timeout
isn't quite enough for those very slow systems. There's no reason
to be miserly with this timeout, so boost it to 60 seconds.
Back-patch to all versions containing this test. That may be overkill,
because the failure has only been observed in the v10 branch, but
I don't feel like having to revisit this later.
If --rate was used to throttle pgbench, it failed to sleep when it had
nothing to do, leading to a busy-wait with 100% CPU usage. This bug was
introduced in the refactoring in v10. Before that, sleep() was called with
a timeout, even when there were no file descriptors to wait for.
Reported by Jeff Janes, patch by Fabien COELHO. Backpatch to v10.
Discussion: https://www.postgresql.org/message-id/CAMkU%3D1x5hoX0pLLKPRnXCy0T8uHoDvXdq%2B7kAM9eoC9_z72ucw%40mail.gmail.com
During a binary upgrade, all type OIDs are supposed to be assigned by
pg_dump based on their values in the old cluster. But now that domains
have arrays, there's nothing to base the arrays' type OIDs on, if we're
upgrading from a pre-v11 cluster. Make pg_dump search for an unused type
OID to use for this purpose. Per buildfarm.
Discussion: https://postgr.es/m/E1dyLlE-0002gT-H5@gemulon.postgresql.org
Allowing arrays with a domain type as their element type was left un-done
in the original domain patch, but not for any very good reason. This
omission leads to such surprising results as array_agg() not working on
a domain column, because the parser can't identify a suitable output type
for the polymorphic aggregate.
In order to fix this, first clean up the APIs of coerce_to_domain() and
some internal functions in parse_coerce.c so that we consistently pass
around a CoercionContext along with CoercionForm. Previously, we sometimes
passed an "isExplicit" boolean flag instead, which is strictly less
information; and coerce_to_domain() didn't even get that, but instead had
to reverse-engineer isExplicit from CoercionForm. That's contrary to the
documentation in primnodes.h that says that CoercionForm only affects
display and not semantics. I don't think this change fixes any live bugs,
but it makes things more consistent. The main reason for doing it though
is that now build_coercion_expression() receives ccontext, which it needs
in order to be able to recursively invoke coerce_to_target_type().
Next, reimplement ArrayCoerceExpr so that the node does not directly know
any details of what has to be done to the individual array elements while
performing the array coercion. Instead, the per-element processing is
represented by a sub-expression whose input is a source array element and
whose output is a target array element. This simplifies life in
parse_coerce.c, because it can build that sub-expression by a recursive
invocation of coerce_to_target_type(). The executor now handles the
per-element processing as a compiled expression instead of hard-wired code.
The main advantage of this is that we can use a single ArrayCoerceExpr to
handle as many as three successive steps per element: base type conversion,
typmod coercion, and domain constraint checking. The old code used two
stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty
inefficient, and adding yet another array deconstruction to do domain
constraint checking seemed very unappetizing.
In the case where we just need a single, very simple coercion function,
doing this straightforwardly leads to a noticeable increase in the
per-array-element runtime cost. Hence, add an additional shortcut evalfunc
in execExprInterp.c that skips unnecessary overhead for that specific form
of expression. The runtime speed of simple cases is within 1% or so of
where it was before, while cases that previously required two levels of
array processing are significantly faster.
Finally, create an implicit array type for every domain type, as we do for
base types, enums, etc. Everything except the array-coercion case seems
to just work without further effort.
Tom Lane, reviewed by Andrew Dunstan
Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
Upcoming patches are going to address performance issues that involve
slow system provided ntohs/htons etc. To address that expand
pg_bswap.h to provide pg_ntoh{16,32,64}, pg_hton{16,32,64} and
optimize their respective implementations by using compiler intrinsics
for gcc compatible compilers and msvc. Fall back to manual
implementations using shifts etc otherwise.
Additionally remove multiple evaluation hazards from the existing
BSWAP32/64 macros, by replacing them with inline functions when
necessary. In the course of that the naming scheme is changed to
pg_bswap16/32/64.
Author: Andres Freund
Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de
get_rel_oids used to not take any relation locks at all, but that stopped
being a good idea with commit 3c3bb9933, which inserted a syscache lookup
into the function. A concurrent DROP TABLE could now produce "cache lookup
failed", which we don't want to have happen in normal operation. The best
solution seems to be to transiently take a lock on the relation named by
the RangeVar (which also makes the result of RangeVarGetRelid a lot less
spongy). But we shouldn't hold the lock beyond this function, because we
don't want VACUUM to lock more than one table at a time. (That would not
be a big problem right now, but it will become one after the pending
feature patch to allow multiple tables to be named in VACUUM.)
In passing, adjust vacuum_rel and analyze_rel to document that we don't
trust the passed RangeVar to be accurate, and allow the RangeVar to
possibly be NULL --- which it is anyway for a whole-database VACUUM,
though we accidentally didn't crash for that case.
The passed RangeVar is in fact inaccurate when dealing with a child
partition, as of v10, and it has been wrong for a whole long time in the
case of vacuum_rel() recursing to a TOAST table. None of these things
present visible bugs up to now, because the passed RangeVar is in fact
only consulted for autovacuum logging, and in that particular context it's
always accurate because autovacuum doesn't let vacuum.c expand partitions
nor recurse to toast tables. Still, this seems like trouble waiting to
happen, so let's nail the door at least partly shut. (Further cleanup
is planned, in HEAD only, as part of the pending feature patch.)
Fix some sadly inaccurate/obsolete comments too. Back-patch to v10.
Michael Paquier and Tom Lane
Discussion: https://postgr.es/m/25023.1506107590@sss.pgh.pa.us
If \d rather than \d+ is used, then verbose is false and we don't ask
the server for the partition constraint; so we shouldn't print it in
that case either.
Maksim Milyutin, per a report from Jesper Pedersen. Reviewed by
Jesper Pedersen and Amit Langote.
Discussion: http://postgr.es/m/2af5fc4d-7bcc-daa8-4fe6-86274bea363c@redhat.com
For \d sequencename, the psql code just did SELECT * FROM sequencename
to get the information to display, but this does not contain much
interesting information anymore in PostgreSQL 10, because the metadata
has been moved to a separate system catalog.
This patch creates a newly designed sequence display that is not merely
an extension of the general relation/table display as it was previously.
Example:
PostgreSQL 9.6:
=> \d foobar
Sequence "public.foobar"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | foobar
last_value | bigint | 1
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 0
is_cycled | boolean | f
is_called | boolean | f
PostgreSQL 10 before this change:
=> \d foobar
Sequence "public.foobar"
Column | Type | Value
------------+---------+-------
last_value | bigint | 1
log_cnt | bigint | 0
is_called | boolean | f
New:
=> \d foobar
Sequence "public.foobar"
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
bigint | 1 | 1 | 9223372036854775807 | 1 | no | 1
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Avoid the coding pattern "*op->resvalue = f();", as some compilers think
that requires them to evaluate "op->resvalue" before the function call.
Unless there are lots of free registers, this can lead to a useless
register spill and reload across the call.
I changed all the cases like this in ExecInterpExpr(), but didn't bother
in the out-of-line opcode eval subroutines, since those are presumably
not as performance-critical.
Discussion: https://postgr.es/m/2508.1506630094@sss.pgh.pa.us
Add bgw_type field to background worker structure. It is intended to be
set to the same value for all workers of the same type, so they can be
grouped in pg_stat_activity, for example.
The backend_type column in pg_stat_activity now shows bgw_type for a
background worker. The ps listing also no longer calls out that a
process is a background worker but just show the bgw_type. That way,
being a background worker is more of an implementation detail now that
is not shown to the user. However, most log messages still refer to
'background worker "%s"'; otherwise constructing sensible and
translatable log messages would become tricky.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
At the time replacement_sort_tuples was introduced, there were still
cases where replacement selection sort noticeably outperformed using
quicksort even for the first run. However, those cases seem to have
evaporated as a result of further improvements made since that time
(and perhaps also advances in CPU technology). So remove replacement
selection and the controlling GUC entirely. This makes tuplesort.c
noticeably simpler and probably paves the way for further
optimizations someone might want to do later.
Peter Geoghegan, with review and testing by Tomas Vondra and me.
Discussion: https://postgr.es/m/CAH2-WzmmNjG_K0R9nqYwMq3zjyJJK+hCbiZYNGhAy-Zyjs64GQ@mail.gmail.com
Also make overriding the title easier. That helps telling where the
report came from and labeling different variants of a report.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
By just running lcov on the produced .gcda data files, we don't account
for source files that are not touched by tests at all. To fix that, run
lcov --initial to create a base line info file with all zero counters,
and merge that with the actual counters when creating the final report.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But
concurrent transaction commit/abort may turn DEAD some of the HOT tuples
that survived the prune, before HeapTupleSatisfiesVacuum tests them.
This happens to activate the code that decides to freeze the tuple ...
which resuscitates it, duplicating data.
(This is especially bad if there's any unique constraints, because those
are now internally violated due to the duplicate entries, though you
won't know until you try to REINDEX or dump/restore the table.)
One possible fix would be to simply skip doing anything to the tuple,
and hope that the next HOT prune would remove it. But there is a
problem: if the tuple is older than freeze horizon, this would leave an
unfrozen XID behind, and if no HOT prune happens to clean it up before
the containing pg_clog segment is truncated away, it'd later cause an
error when the XID is looked up.
Fix the problem by having the tuple freezing routines cope with the
situation: don't freeze the tuple (and keep it dead). In the cases that
the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
so that there is no need to look up the XID in pg_clog later on.
An isolation test is included, authored by Michael Paquier, loosely
based on Daniel Wood's original reproducer. It only tests one
particular scenario, though, not all the possible ways for this problem
to surface; it be good to have a more reliable way to test this more
fully, but it'd require more work.
In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql
I outlined another test case (more closely matching Dan Wood's) that
exposed a few more ways for the problem to occur.
Backpatch all the way back to 9.3, where this problem was introduced by
multixact juggling. In branches 9.3 and 9.4, this includes a backpatch
of commit e5ff9fefcd50 (of 9.5 era), since the original is not
correctable without matching the coding pattern in 9.5 up.
Reported-by: Daniel Wood
Diagnosed-by: Daniel Wood
Reviewed-by: Yi Wen Wong, Michaël Paquier
Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
Call lcov with --no-external option to exclude external files (for
example, system headers with inline functions) from output.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This is the way lcov was intended to be used. It is much faster and
more robust and makes the makefiles simpler than running it in each
subdirectory.
The previous coding ran gcov before lcov, but that is useless because
lcov/geninfo call gcov internally and use that information. Moreover,
this led to complications and failures during parallel make. This
separates the two targets: You either use "make coverage" to get
textual output from gcov or "make coverage-html" to get an HTML report
via lcov. (Using both is still problematic because they write the same
output files.)
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
float8_numeric() and float4_numeric() failed to consider the possibility
that the input is an IEEE infinity. The results depended on the
platform-specific behavior of sprintf(): on most platforms you'd get
something like
ERROR: invalid input syntax for type numeric: "inf"
but at least on Windows it's possible for the conversion to succeed and
deliver a finite value (typically 1), due to a nonstandard output format
from sprintf and lack of syntax error checking in these functions.
Since our numeric type lacks the concept of infinity, a suitable conversion
is impossible; the best thing to do is throw an explicit error before
letting sprintf do its thing.
While at it, let's use snprintf not sprintf. Overrunning the buffer
should be impossible if sprintf does what it's supposed to, but this
is cheap insurance against a stack smash if it doesn't.
Problem reported by Taiki Kondo. Patch by me based on fix suggestion
from KaiGai Kohei. Back-patch to all supported branches.
Discussion: https://postgr.es/m/12A9442FBAE80D4E8953883E0B84E088C8C7A2@BPXM01GP.gisp.nec.co.jp
This reverts commit 15bc038f9, along with the followon commits 1635e80d3
and 984c92074 that tried to clean up the problems exposed by bug #14825.
The result was incomplete because it failed to address parallel-query
requirements. With 10.0 release so close upon us, now does not seem like
the time to be adding more code to fix that. I hope we can un-revert this
code and add the missing parallel query support during the v11 cycle.
Back-patch to v10.
Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
The changes in 639928c988 turned out to
require Perl 5.9.3, which is newer than our minimum required version.
So revert back to the old code for the normal case and only use the new
variant when both coverage and vpath are used. As the minimum Perl
version moves forward, we can drop the old code sometime.
Run xsubpp with the -output option instead of redirecting stdout. That
ensures that the #line directives in the output file point to the right
place in a vpath build. This in turn fixes an error in coverage builds
that it can't find the source files.
Refactor the makefile rules while we're here.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
When requesting a particular replication slot, the new pg_basebackup
option -C/--create-slot creates it before starting to replicate from it.
Further refactor the slot creation logic to include the temporary slot
creation logic into the same function. Add new arguments is_temporary
and preserve_wal to CreateReplicationSlot(). Print in --verbose mode
that a slot has been created.
Author: Michael Banck <michael.banck@credativ.de>
Add some more tests for the --create-slot and --drop-slot options,
verifying that the right kind of slot was created and that the slot was
dropped. While working on an unrelated patch for pg_basebackup, some of
this was temporarily broken without any tests noticing.
posix_fallocate() is not quite a drop-in replacement for fallocate(),
because it is defined to return the error code as its function result,
not in "errno". I (tgl) missed this because RHEL6's version seems
to set errno as well. That is not the case on more modern Linuxen,
though, as per buildfarm results.
Aside from fixing the return-convention confusion, remove the test
for ENOSYS; we expect that glibc will mask that for posix_fallocate,
though it does not for fallocate. Keep the test for EINTR, because
POSIX specifies that as a possible result, and buildfarm results
suggest that it can happen in practice.
Back-patch to 9.4, like the previous commit.
Thomas Munro
Discussion: https://postgr.es/m/1002664500.12301802.1471008223422.JavaMail.yahoo@mail.yahoo.com
The blacklist mechanism added by the preceding commit directly fixes
most of the practical cases that the same-transaction test was meant
to cover. What remains is use-cases like
begin;
create type e as enum('x');
alter type e add value 'y';
-- use 'y' somehow
commit;
However, because the same-transaction test is heuristic, it fails on
small variants of that, such as renaming the type or changing its
owner. Rather than try to explain the behavior to users, let's
remove it and just have a rule that the newly added value can't be
used before being committed, full stop. Perhaps later it will be
worth the implementation effort and overhead to have a more accurate
test for type-was-created-in-this-transaction. We'll wait for some
field experience with v10 before deciding to do that.
Back-patch to v10.
Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
Commit 15bc038f9 allowed ALTER TYPE ADD VALUE to be executed inside
transaction blocks, by disallowing the use of the added value later
in the same transaction, except under limited circumstances. However,
the test for "limited circumstances" was heuristic and could reject
references to enum values that were created during CREATE TYPE AS ENUM,
not just later. This breaks the use-case of restoring pg_dump scripts
in a single transaction, as reported in bug #14825 from Balazs Szilfai.
We can improve this by keeping a "blacklist" table of enum value OIDs
created by ALTER TYPE ADD VALUE during the current transaction. Any
visible-but-uncommitted value whose OID is not in the blacklist must
have been created by CREATE TYPE AS ENUM, and can be used safely
because it could not have a lifespan shorter than its parent enum type.
This change also removes the restriction that a renamed enum value
can't be used before being committed (unless it was on the blacklist).
Andrew Dunstan, with cosmetic improvements by me.
Back-patch to v10.
Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
The --slot option somehow ended up under options controlling the output,
and some other options were in a nonsensical place or were not moved
after recent renamings, so tidy all that up a bit.
A FOR ALL TABLES publication naturally considers all base tables to be a
candidate for replication. This includes transient heaps that are
created during a table rewrite during DDL. This causes failures on the
subscriber side because it will not have a table like pg_temp_16386 to
receive data (and if it did, it would be the wrong table).
The prevent this problem, we filter out any tables that match this
naming pattern and match an actual table from FOR ALL TABLES
publications. This is only a heuristic, meaning that user tables that
match that naming could accidentally be omitted. A more robust solution
might require an explicit marking of such tables in pg_class somehow.
Reported-by: yxq <yxq@o2.pl>
Bug: #14785
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
This was intended as infrastructure for weakening VACUUM's locking
requirements, similar to what was done for btree indexes in commit
2ed5b87f96. However, for hash indexes,
it seems that the improvements which are possible are actually
extremely marginal. Furthermore, performing the LSN cross-check will
end up skipping cleanup far more often than is necessary; we only care
about page modifications due to a VACUUM, but the LSN check will fail
if ANY modification has occurred. So, rather than pressing forward
with that "optimization", just rip the LSN field out.
Patch by me, reviewed by Ashutosh Sharma and Amit Kapila
Discussion: http://postgr.es/m/CAA4eK1JxqqcuC5Un7YLQVhOYSZBS+t=3xqZuEkt5RyquyuxpwQ@mail.gmail.com
On Linux, shared memory segments created with shm_open() are backed by
swap files created in tmpfs. If the swap file needs to be extended,
but there's no tmpfs space left, you get a very unfriendly SIGBUS trap.
To avoid this, force allocation of the full request size when we create
the segment. This adds a few cycles, but none that we wouldn't expend
later anyway, assuming the request isn't hugely bigger than the actual
need.
Make this code #ifdef __linux__, because (a) there's not currently a
reason to think the same problem exists on other platforms, and (b)
applying posix_fallocate() to an FD created by shm_open() isn't very
portable anyway.
Back-patch to 9.4 where the DSM code came in.
Thomas Munro, per a bug report from Amul Sul
Discussion: https://postgr.es/m/1002664500.12301802.1471008223422.JavaMail.yahoo@mail.yahoo.com
If construct_array() or construct_md_array() were given a dimension of
zero, they'd produce an array that contains no elements but has positive
dimension. This violates a general expectation that empty arrays should
have ndims = 0; in particular, while arrays like this print as empty,
they don't compare equal to other empty arrays.
Up to now we've expected callers to avoid making such calls and instead
be careful to call construct_empty_array() if there would be no elements.
But this has always been an easily missed case, and we've repeatedly had to
fix callers to do it right. In bug #14826, Erwin Brandstetter pointed out
yet another such oversight, in ts_lexize(); and a bit of examination of
other call sites found at least two more with similar issues. So let's
fix the problem centrally and permanently by changing these two functions
to construct a proper zero-D empty array whenever the array would be empty.
This renders a few explicit calls of construct_empty_array() redundant,
but the only such place I found that really seemed worth changing was in
ExecEvalArrayExpr().
Although this fixes some very old bugs, no back-patch: the problem is
pretty minor and the risk of changing behavior seems to outweigh the
benefit in stable branches.
Discussion: https://postgr.es/m/20170923125723.1448.39412@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20570.1506198383@sss.pgh.pa.us
There is no reason to ever prevent the use of SortSupport on Windows
when ICU locales are used. We previously avoided SortSupport on Windows
with UTF-8 server encoding and a non C-locale due to restrictions in
Windows' libc functionality.
This is now considered to be a restriction in one platform's libc
collation provider, and not a more general platform restriction.
Reported-by: Peter Geoghegan <pg@bowt.ie>
One test case was meant to check that pg_basebackup does not succeed
when a slot is specified with -S but WAL streaming is not selected,
which used to require specifying -X stream. Since -X stream is the
default in PostgreSQL 10, this test case no longer covers that meaning,
but the pg_basebackup invocation happened to fail anyway for the
unrelated reason that the specified replication slot does not exist. To
fix, move the test case to later in the file where the slot does exist,
and add -X none to the invocation so that it covers the originally meant
behavior.
extracted from a patch by Michael Banck <michael.banck@credativ.de>
Invoke vacuum(), as well as "work item" processing, in the PortalContext
that do_autovacuum() has manufactured, which will be reset before each
such invocation. This ensures cleanup of any memory leaked by these
operations. It also avoids the rather dangerous practice of calling
vacuum() in a context that vacuum() itself will destroy while it runs.
There's no known live bug there, but it's not hard to imagine introducing
one if we leave it like this.
Tom Lane, reviewed by Michael Paquier and Alvaro Herrera
Discussion: https://postgr.es/m/13849.1506114543@sss.pgh.pa.us
Buildfarm member skink shows that this is even more flaky than
I thought. There are probably some actual pgbench bugs here
as well as a timing dependency. But we can't have stuff this
unstable in the buildfarm, it obscures other issues.
The file handling functions from fd.c were called with a diverse mix of
notations for the file permissions when they were opening new files.
Almost all files created by the server should have the same permissions
set. So change the API so that e.g. OpenTransientFile() automatically
uses the standard permissions set, and OpenTransientFilePerm() is a new
function that takes an explicit permissions set for the few cases where
it is needed. This also saves an unnecessary argument for call sites
that are just opening an existing file.
While we're reviewing these APIs, get rid of the FileName typedef and
use the standard const char * for the file name and mode_t for the file
mode. This makes these functions match other file handling functions
and removes an unnecessary layer of mysteriousness. We can also get rid
of a few casts that way.
Author: David Steele <david@pgmasters.net>
In two cases, we set a different umask for some piece of code and
restore it afterwards. But if the contained code errors out, the umask
is not restored. So add TRY/CATCH blocks to fix that.
This reverts commit f41e56c76e.
The build farm client would run the pg_upgrade tests twice, once as part
of the existing pg_upgrade check run and once as part of picking up all
TAP tests by looking for "t" directories. Since the pg_upgrade tests
are pretty slow, we will need a better solution or possibly a build farm
client change before we can proceed with this.
Commit 09cb5c0e7d added a similar
optimization to btree back in 2006, but nobody bothered to implement
the same thing for hash indexes, probably because they weren't
WAL-logged and had lots of other performance problems as well. As
with the corresponding btree case, this eliminates the problem of
potentially needing to refind our position within the page, and cuts
down on pin/unpin traffic as well.
Ashutosh Sharma, reviewed by Alexander Korotkov, Jesper Pedersen,
Amit Kapila, and me. Some final edits to comments and README by
me.
Discussion: http://postgr.es/m/CAE9k0Pm3KTx93K8_5j6VMzG4h5F+SyknxUwXrN-zqSZ9X8ZS3w@mail.gmail.com
There seems to be some considerable imprecision in the timing of -P
progress reports. Nominally each thread ought to produce 2 reports
in this test, but about 10% of the time we only get one, and 1% of
the time we get three, as per buildfarm results so far. Pending
further investigation, treat the last case as a "pass". (I, tgl,
am suspicious that this still might not be lax enough, now that it's
obvious that the behavior is load-dependent; but there's not yet
buildfarm evidence to confirm that suspicion.)
Fabien Coelho
Discussion: https://postgr.es/m/26654.1505232433@sss.pgh.pa.us
Adjust commentary in regc_pg_locale.c to remove mention of the possibility
of not having <wctype.h> functions, since we no longer consider that.
Eliminate duplicate code in wparser_def.c by generalizing the p_iswhat
macro to take a parameter saying what to return for non-ASCII chars
in C locale. (That's not really a consequence of the
USE_WIDE_UPPER_LOWER-ectomy, but I noticed it while doing that.)
These functions are required by SUS v2, which is our minimum baseline
for Unix platforms, and are present on all interesting Windows versions
as well. Even our oldest buildfarm members have them. Thus, we were not
testing the "!USE_WIDE_UPPER_LOWER" code paths, which explains why the bug
fixed in commit e6023ee7f escaped detection. Per discussion, there seems
to be no more real-world value in maintaining this option. Hence, remove
the configure-time tests for wcstombs() and towlower(), remove the
USE_WIDE_UPPER_LOWER symbol, and remove all the !USE_WIDE_UPPER_LOWER code.
There's not actually all that much of the latter, but simplifying the #if
nests is a win in itself.
Discussion: https://postgr.es/m/20170921052928.GA188913@rfd.leadboat.com
The placement of the ifdef blocks in formatting.c was pretty bogus, so
the code failed to compile if USE_WIDE_UPPER_LOWER was not defined.
Reported-by: Peter Geoghegan <pg@bowt.ie>
Reported-by: Noah Misch <noah@leadboat.com>
This patch absorbs a few unreleased fixes in the IANA code.
It corresponds to commit 2d8b944c1cec0808ac4f7a9ee1a463c28f9cd00a
in https://github.com/eggert/tz. Non-cosmetic changes include:
TZDEFRULESTRING is updated to match current US DST practice,
rather than what it was over ten years ago. This only matters
for interpretation of POSIX-style zone names (e.g., "EST5EDT"),
and only if the timezone database doesn't include either an exact
match for the zone name or a "posixrules" entry. The latter
should not be true in any current Postgres installation, but
this could possibly matter when using --with-system-tzdata.
Get rid of a nonportable use of "++var" on a bool var.
This is part of a larger fix that eliminates some vestigial
support for consecutive leap seconds, and adds checks to
the "zic" compiler that the data files do not specify that.
Remove a couple of ancient compatibility hacks. The IANA
crew think these are obsolete, and I tend to agree. But
perhaps our buildfarm will think different.
Back-patch to all supported branches, in line with our policy
that all branches should be using current IANA code. Before v10,
this includes application of current pgindent rules, to avoid
whitespace problems in future back-patches.
Discussion: https://postgr.es/m/E1dsWhf-0000pT-F9@gemulon.postgresql.org
"\if :{?variable_name}" will be translated to "\if TRUE" if the variable
exists and "\if FALSE" otherwise. Thus it will be possible to execute code
conditionally on the existence of the variable, regardless of its value.
Fabien Coelho, with some review by Robins Tharakan and some light text
editing by me.
Discussion: https://postgr.es/m/alpine.DEB.2.20.1708260835520.3627@lancre
Previously, the code didn't think about this case and would just try to
analyze such a column twice. That would fail at the point of inserting
the second version of the pg_statistic row, with obscure error messsages
like "duplicate key value violates unique constraint" or "tuple already
updated by self", depending on context and PG version. We could allow
the case by ignoring duplicate column specifications, but it seems better
to reject it explicitly.
The bogus error messages seem like arguably a bug, so back-patch to
all supported versions.
Nathan Bossart, per a report from Michael Paquier, and whacked
around a bit by me.
Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com
These variables are only ever written to in assertion-enabled builds,
and the latest Microsoft compilers complain about such variables in
non-assertion-enabled builds.
Apparently they don't worry so much about variables that are written to
but not read from, so most of our PG_USED_FOR_ASSERTS_ONLY variables
don't cause the problem.
Discussion: https://postgr.es/m/7800.1505950322@sss.pgh.pa.us
This is not used for anything yet, but it is necessary infrastructure
for partition-wise join and for partition pruning without constraint
exclusion.
Ashutosh Bapat, reviewed by Amit Langote and with quite a few changes,
mostly cosmetic, by me. Additional review and testing of this patch
series by Antonin Houska, Amit Khandekar, Rafia Sabih, Rajkumar
Raghuwanshi, Thomas Munro, and Dilip Kumar.
Discussion: http://postgr.es/m/CAFjFpRfneFG3H+F6BaiXemMrKF+FY-POpx3Ocy+RiH3yBmXSNw@mail.gmail.com
pg_newlocale_from_collation() used malloc() and strdup() directly,
which is generally not per backend coding style, and it didn't bother
to check for failure results, but would just SIGSEGV instead. Also,
if one of the numerous error checks in the middle of the function
failed, the already-allocated memory would be leaked permanently.
Admittedly, it's not a lot of memory, but it could build up if this
function were called repeatedly for a bad collation.
The first two problems are easily cured by palloc'ing in TopMemoryContext
instead of calling libc directly. We can fairly easily dodge the leakage
problem for the struct pg_locale_struct by filling in a temporary variable
and allocating permanent storage only once we reach the bottom of the
function. It's harder to get rid of the potential leakage for ICU's copy
of the collcollate string, but at least that's only allocated after most
of the error checks; so live with that aspect.
Back-patch to v10 where this code came in, with one or another of the
ICU patches.
005_encoding.pl neglected to wait for the subscriber's initial
synchronization to happen. While we have not seen this fail in
the buildfarm, it's pretty easy to demonstrate there's an issue
by hacking logicalrep_worker_launch() to fail most of the time.
Michael Paquier
Discussion: https://postgr.es/m/27032.1505749806@sss.pgh.pa.us
Remove gratuitous differences in the process names shown in
pg_stat_activity.backend_type and the ps output.
Reviewed-by: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
As it turns out we can't rely that the script's monitoring session is
terminated with a proper error by the server, because the session
might be terminated while already trying to send data.
Also improve robustness and error reporting facilities of the test,
developed while debugging this issue.
Discussion: https://postgr.es/m/20170920020038.kllxgilo7xzwmtto@alap3.anarazel.de
The preceding patch allowed us to remove useless GiST support functions.
This patch actually does that for all the no-op cases in the core GiST
code. This buys us whatever performance gain is to be had, and more
importantly exercises the preceding patch.
There remain no-op functions in the contrib GiST opclasses, but those
will take more work to remove.
Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com
There are common use-cases in which the compress and/or decompress
functions can be omitted, with the result being that we make no
data transformation when storing or retrieving index values.
Previously, you had to provide a no-op function anyway, but this
patch allows such opclass support functions to be omitted.
Furthermore, if the compress function is omitted, then the core code
knows that the stored representation is the same as the original data.
This means we can allow index-only scans without requiring a fetch
function to be provided either. Previously you had to provide a
no-op fetch function if you wanted IOS to work.
This reportedly provides a small performance benefit in such cases,
but IMO the real reason for doing it is just to reduce the amount of
useless boilerplate code that has to be written for GiST opclasses.
Andrey Borodin, reviewed by Dmitriy Sarafannikov
Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com
The plan is to convert the current pg_upgrade test to the TAP
framework. This commit just puts a basic TAP test in place so that we
can see how the build farm behaves, since the build farm client has some
special knowledge of the pg_upgrade tests.
Author: Michael Paquier <michael.paquier@gmail.com>
The use of strnlen rather than strlen was just paranoia. Instead of
giving up on the paranoia, just implement the safeguard
differently. And add a comment explaining why we're careful.
Author: Andres Freund
Discussion: https://postgr.es/m/E1duOkJ-0001Mc-U5@gemulon.postgresql.org
Previously multi-byte aware truncation was done on every
pgstat_report_activity() call - proving to be a bottleneck for
workloads with long query strings that execute quickly.
Instead move the truncation to the read side, which commonly is
executed far less frequently. That's possible because all server
encodings allow to determine the length of a multi-byte string from
the first byte.
Rename PgBackendStatus.st_activity to st_activity_raw so existing
extension users of the field break - their code has to be adjusted to
use pgstat_clip_activity().
Author: Andres Freund
Tested-By: Khuntal Ghosh
Reviewed-By: Robert Haas, Tom Lane
Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkviecpz@alap3.anarazel.de
Add timeouts in case psql doesn't deliver the expected output, and try
to cause the monitoring psql to be fully connected to a backend. This
isn't necessarily everything needed, but at least the timeouts should
reduce the pain for buildfarm owners.
Author: Andres Freund
Reported-By: Tom Lane, BF animals prairiedog and calliphoridae
Discussion: https://postgr.es/m/E1du6ZT-00043I-91@gemulon.postgresql.org
Previously statement_timeout, in the extended protocol, affected all
messages till a Sync message. For clients that pipeline/batch query
execution that's problematic.
Instead disable timeout after each Execute message, and enable, if
necessary, the timer in start_xact_command(). As that's done only for
Execute and not Parse / Bind, pipelining the latter two could still
cause undesirable timeouts. But a survey of protocol implementations
shows that all drivers issue Sync messages when preparing, and adding
timeout rearming to both is fairly expensive for the common parse /
bind / execute sequence.
Author: Tatsuo Ishii, editorialized by Andres Freund
Reviewed-By: Takayuki Tsunakawa, Andres Freund
Discussion: https://postgr.es/m/20170222.115044.1665674502985097185.t-ishii@sraoss.co.jp
The bug was caused by not re-reading the control file during crash
recovery restarts, which lead to an attempt to pfree() shared memory
contents. The fix is to re-read the control file, which seems good
anyway.
It's unclear as of this moment, whether we want to keep the
refactoring introduced in the commit referenced above, or come up with
an alternative approach. But fixing the bug in the mean time seems
like a good idea regardless.
A followup commit will introduce regression test coverage for crash
restarts.
Reported-By: Tom Lane
Discussion: https://postgr.es/m/14134.1505572349@sss.pgh.pa.us
Make the btree page-flags test macros (P_ISLEAF and friends) return clean
boolean values, rather than values that might not fit in a bool. Use them
in a few places that were randomly referencing the flag bits directly.
In passing, change access/nbtree/'s only direct use of BUFFER_LOCK_SHARE to
BT_READ. (Some think we should go the other way, but as long as we have
BT_READ/BT_WRITE, let's use them consistently.)
Masahiko Sawada, reviewed by Doug Doole
Discussion: https://postgr.es/m/CAD21AoBmWPeN=WBB5Jvyz_Nt3rmW1ebUyAnk3ZbJP3RMXALJog@mail.gmail.com
By project convention, these names should include "P" when dealing with a
pointer type; that is, if the result of a GETARG macro is of type FOO *,
it should be called PG_GETARG_FOO_P not just PG_GETARG_FOO. Some newer
types such as JSONB and ranges had not followed the convention, and a
number of contrib modules hadn't gotten that memo either. Rename the
offending macros to improve consistency.
In passing, fix a few places that thought PG_DETOAST_DATUM() returns
a Datum; it does not, it returns "struct varlena *". Applying
DatumGetPointer to that happens not to cause any bad effects today,
but it's formally wrong. Also, adjust an ltree macro that was designed
without any thought for what pgindent would do with it.
This is all cosmetic and shouldn't have any impact on generated code.
Mark Dilger, some further tweaks by me
Discussion: https://postgr.es/m/EA5676F4-766F-4F38-8348-ECC7DB427C6A@gmail.com
If we failed to get a background worker slot, the code just walked
away from the logicalrep-worker slot it already had, leaving that
looking like the worker is still starting up. This led to an indefinite
hang in subscription startup, as reported by Thomas Munro. We must
release the slot on failure.
Also fix a thinko: we must capture the worker slot's generation before
releasing LogicalRepWorkerLock the first time, else testing to see if
it's changed is pretty meaningless.
BTW, the CHECK_FOR_INTERRUPTS() in WaitForReplicationWorkerAttach is a
ticking time bomb, even without considering the possibility of elog(ERROR)
in one of the other functions it calls. Really, this entire business needs
a redesign with some actual thought about error recovery. But for now
I'm just band-aiding the case observed in testing.
Back-patch to v10 where this code was added.
Discussion: https://postgr.es/m/CAEepm=2bP3TBMFBArP6o20AZaRduWjMnjCjt22hSdnA-EvrtCw@mail.gmail.com
When ALTER SUBSCRIPTION DISABLE is run in the same transaction before
DROP SUBSCRIPTION, the latter will hang because workers will still be
running, not having seen the DISABLE committed, and DROP SUBSCRIPTION
will wait until the workers have vacated the replication origin slots.
Previously, DROP SUBSCRIPTION killed the logical replication workers
immediately only if it was going to drop the replication slot, otherwise
it scheduled the worker killing for the end of the transaction, as a
result of 7e174fa793. This, however,
causes the present problem. To fix, kill the workers immediately in all
cases. This covers all cases: A subscription that doesn't have a
replication slot must be disabled. It was either disabled in the same
transaction, or it was already disabled before the current transaction,
but then there shouldn't be any workers left and this won't make a
difference.
Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/87mv6av84w.fsf%40ars-thinkpad
AfterTriggerEndQuery correctly notes that the query_stack could get
repalloc'd during a trigger firing, but it nonetheless passes the address
of a query_stack entry to afterTriggerInvokeEvents, so that if such a
repalloc occurs, afterTriggerInvokeEvents is already working with an
obsolete dangling pointer while it scans the rest of the events. Oops.
The only code at risk is its "delete_ok" cleanup code, so we can
prevent unsafe behavior by passing delete_ok = false instead of true.
However, that could have a significant performance penalty, because the
point of passing delete_ok = true is to not have to re-scan possibly
a large number of dead trigger events on the next time through the loop.
There's more than one way to skin that cat, though. What we can do is
delete all the "chunks" in the event list except the last one, since
we know all events in them must be dead. Deleting the chunks is work
we'd have had to do later in AfterTriggerEndQuery anyway, and it ends
up saving rescanning of just about the same events we'd have gotten
rid of with delete_ok = true.
In v10 and HEAD, we also have to be careful to mop up any per-table
after_trig_events pointers that would become dangling. This is slightly
annoying, but I don't think that normal use-cases will traverse this code
path often enough for it to be a performance problem.
It's pretty hard to hit this in practice because of the unlikelihood
of the query_stack getting resized at just the wrong time. Nonetheless,
it's definitely a live bug of ancient standing, so back-patch to all
supported branches.
Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us
Commit 0f79440fb introduced mechanism to keep AFTER STATEMENT triggers
from firing more than once per statement, which was formerly possible
if more than one FK enforcement action had to be applied to a given
table. Add a similar mechanism for BEFORE STATEMENT triggers, so that
we don't have the unexpected situation of firing BEFORE STATEMENT
triggers more often than AFTER STATEMENT.
As with the previous patch, back-patch to v10.
Discussion: https://postgr.es/m/22315.1505584992@sss.pgh.pa.us
The elements of RecordCacheArray are TupleDesc, not TupleDesc *.
Those are actually the same size, so that this error is harmless,
but it's still wrong --- and it might bite us someday, if TupleDesc
ever became a struct, say.
Per Coverity.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects. It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table. We weren't doing it
like that.
Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can. There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table. It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.
Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec. (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)
Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.
The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes. This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.
In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.
I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.
Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.
Patch by me, with help and review from Thomas Munro.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
This code is unsafe, as proven by buildfarm failures, because it tries
to access shared memory that might already be gone. It's also unnecessary,
because we're about to exit the process anyway and so the record type cache
should never be accessed again. The idea was to lay some foundations for
someday recycling workers --- which would require attaching to a different
shared tupdesc registry --- but that will require considerably more
thought. In the meantime let's save some bytes by just removing the
nonfunctional code.
Problem identification, and proposal to fix by removing functionality
from the detach function, by Thomas Munro. I went a bit further by
removing the function altogether.
Discussion: https://postgr.es/m/E1dsguX-00056N-9x@gemulon.postgresql.org
Tuples can have type RECORDOID and a typmod number that identifies a blessed
TupleDesc in a backend-private cache. To support the sharing of such tuples
through shared memory and temporary files, provide a typmod registry in
shared memory.
To achieve that, introduce per-session DSM segments, created on demand when a
backend first runs a parallel query. The per-session DSM segment has a
table-of-contents just like the per-query DSM segment, and initially the
contents are a shared record typmod registry and a DSA area to provide the
space it needs to grow.
State relating to the current session is accessed via a Session object
reached through global variable CurrentSession that may require significant
redesign further down the road as we figure out what else needs to be shared
or remodelled.
Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
Previously we read the control file in multiple places. But soon the
segment size will be configurable and stored in the control file, and
that needs to be available earlier than it currently is needed.
Instead of adding yet another place where it's read, refactor things
so there's a single processing of the control file during startup (in
EXEC_BACKEND that's every individual backend's startup).
Author: Andres Freund
Discussion: http://postgr.es/m/20170913092828.aozd3gvvmw67gmyc@alap3.anarazel.de
Flattening the partitioning hierarchy at this stage makes various
desirable optimizations difficult. The original use case for this
patch was partition-wise join, which wants to match up the partitions
in one partitioning hierarchy with those in another such hierarchy.
However, it now seems that it will also be useful in making partition
pruning work using the PartitionDesc rather than constraint exclusion,
because with a flattened expansion, we have no easy way to figure out
which PartitionDescs apply to which leaf tables in a multi-level
partition hierarchy.
As it turns out, we end up creating both rte->inh and !rte->inh RTEs
for each intermediate partitioned table, just as we previously did for
the root table. This seems unnecessary since the partitioned tables
have no storage and are not scanned. We might want to go back and
rejigger things so that no partitioned tables (including the parent)
need !rte->inh RTEs, but that seems to require some adjustments not
related to the core purpose of this patch.
Ashutosh Bapat, reviewed by me and by Amit Langote. Some final
adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRd=1venqLL7oGU=C1dEkuvk2DJgvF+7uKbnPHaum1mvHQ@mail.gmail.com
It's not necessary for such a small program, and it causes unnecessary
extra work to get the correct definition of bool, more so if we are
going to introduce stdbool.h later.
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
With this change, the order of leaf partitions as returned by
RelationGetPartitionDispatchInfo should now be the same as the
order used by expand_inherited_rtentry. This will make it simpler
for future patches to match up the partition dispatch information
with the planner data structures. The new code is also, in my
opinion anyway, simpler and easier to understand.
Amit Langote, reviewed by Amit Khandekar. I also reviewed and
made a few cosmetic revisions.
Discussion: http://postgr.es/m/d98d4761-5071-1762-501e-0e15047c714b@lab.ntt.co.jp
Using ++ on a bool variable doesn't work well when stdbool.h is in use.
The original BSD code appears to use int here, so use that instead.
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
During the development of d47cfef711 the CFI()s in ExecScan() were
moved back and forth, ending up in the wrong place. Thus queries that
largely spend their time in ExecScan(), and have neither projection
nor a qual, can't be cancelled in a timely manner.
Reported-By: Jeff Janes
Author: Andres Freund
Discussion: https://postgr.es/m/CAMkU=1weDXp8eLLPt9SO1LEUsJYYK9cScaGhLKpuN+WbYo9b5g@mail.gmail.com
Backpatch: 10, as d47cfef711
The order in which GRANTs are output is important as GRANTs which have
been GRANT'd by individuals via WITH GRANT OPTION GRANTs have to come
after the GRANT which included the WITH GRANT OPTION. This happens
naturally in the backend during normal operation as we only change
existing ACLs in-place, only add new ACLs to the end, and when removing
an ACL we remove any which depend on it also.
Also, adjust the comments in acl.h to make this clear.
Unfortunately, the updates to pg_dump to handle initial privileges
involved pulling apart ACLs and then combining them back together and
could end up putting them back together in an invalid order, leading to
dumps which wouldn't restore.
Fix this by adjusting the queries used by pg_dump to ensure that the
ACLs are rebuilt in the same order in which they were originally.
Back-patch to 9.6 where the changes for initial privileges were done.
Test queries added by commit 69835bc89 are giving unexpected results
on some smaller buildfarm critters. I think probably the seqscan
logic is kicking in to cause the scans to not start at the beginning
of the table. Add ORDER BY to make them be indexscans instead.
Per buildfarm member chipmunk.
Historically, the selectivity functions have simply not distinguished
< from <=, or > from >=, arguing that the fraction of the population that
satisfies the "=" aspect can be considered to be vanishingly small, if the
comparison value isn't any of the most-common-values for the variable.
(If it is, the code path that executes the operator against each MCV will
take care of things properly.) But that isn't really true unless we're
dealing with a continuum of variable values, and in practice we seldom are.
If "x = const" would estimate a nonzero number of rows for a given const
value, then it follows that we ought to estimate different numbers of rows
for "x < const" and "x <= const", even if the const is not one of the MCVs.
Handling this more honestly makes a significant difference in edge cases,
such as the estimate for a tight range (x BETWEEN y AND z where y and z
are close together).
Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly
split scalargtsel into scalargtsel/scalargesel. Adjust <= and >=
operator definitions to reference the new selectivity functions.
Improve the core ineq_histogram_selectivity() function to make a
correction for equality. (Along the way, I learned quite a bit about
exactly why that function gives good answers, which I tried to memorialize
in improved comments.)
The corresponding join selectivity functions were, and remain, just stubs.
But I chose to split them similarly, to avoid confusion and to prevent the
need for doing this exercise again if someone ever makes them less stubby.
In passing, change ineq_histogram_selectivity's clamp for extreme
probability estimates so that it varies depending on the histogram
size, instead of being hardwired at 0.0001. With the default histogram
size of 100 entries, you still get the old clamp value, but bigger
histograms should allow us to put more faith in edge values.
Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh
Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us
The previous error message when attempting to run a general SQL command
in a physical replication WAL sender was a bit sloppy.
Reported-by: Fujii Masao <masao.fujii@gmail.com>
Commit 83aaac41c6 introduced the use of
LDAP_NO_ATTRS to avoid requesting a dummy attribute when doing search+bind
LDAP authentication. It turns out that not all LDAP implementations define
that macro, but its value is fixed by the protocol so we can define it
ourselves if it's missing.
Author: Thomas Munro
Reported-By: Ashutosh Sharma
Discussion: https://postgr.es/m/CAE9k0Pm6FKCfPCiAr26-L_SMGOA7dT_k0%2B3pEbB8%2B-oT39xRpw%40mail.gmail.com
This patch adds ERROR, SQLSTATE, and ROW_COUNT, which are updated after
every query, as well as LAST_ERROR_MESSAGE and LAST_ERROR_SQLSTATE,
which are updated only when a query fails. The expected usage of these
is for scripting.
Fabien Coelho, reviewed by Pavel Stehule
Discussion: https://postgr.es/m/alpine.DEB.2.20.1704042158020.12290@lancre
Before, only filters of the form "(<ldapsearchattribute>=<user>)"
could be used to search an LDAP server. Introduce ldapsearchfilter
so that more general filters can be configured using patterns, like
"(|(uid=$username)(mail=$username))" and "(&(uid=$username)
(objectClass=posixAccount))". Also allow search filters to be included
in an LDAP URL.
Author: Thomas Munro
Reviewed-By: Peter Eisentraut, Mark Cave-Ayland, Magnus Hagander
Discussion: https://postgr.es/m/CAEepm=0XTkYvMci0WRubZcf_1am8=gP=7oJErpsUfRYcKF2gwg@mail.gmail.com
When copying from an active database tree, it's possible for files to be
deleted after we see them in a readdir() scan but before we can open them.
(Once we've got a file open, we don't expect any further errors from it
getting unlinked, though.) Tweak RecursiveCopy so it can cope with this
case, so as to avoid irreproducible test failures.
Back-patch to 9.6 where this code was added. In v10 and HEAD, also
remove unused "use RecursiveCopy" in one recovery test script.
Michael Paquier and Tom Lane
Discussion: https://postgr.es/m/24621.1504924323@sss.pgh.pa.us
This is primarily useful for making tests of this utility more
deterministic, to avoid the complexity of starting pg_receivewal as a
deamon in TAP tests.
While this is less useful than the equivalent pg_recvlogical option,
users can as well use it for example to enforce WAL streaming up to a
end-of-backup position, to save only a minimal amount of WAL.
Use this new option to stream WAL data in a deterministic way within a
new set of TAP tests.
Author: Michael Paquier <michael.paquier@gmail.com>
This allows the compiler/linker to move the static variables to a
read-only segment. Not all the signature changes are necessary, but
it seems better to apply const in a consistent manner.
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20170910232154.asgml44ji2b7lv3d@alap3.anarazel.de
AFTER triggers using transition tables crashed if they were fired due
to a foreign key ON CASCADE update. This is because ExecEndModifyTable
flushes the transition tables, on the assumption that any trigger that
could need them was already fired during ExecutorFinish. Normally
that's true, because we don't allow transition-table-using triggers
to be deferred. However, foreign key CASCADE updates force any
triggers on the referencing table to be deferred to the outer query
level, by means of the EXEC_FLAG_SKIP_TRIGGERS flag. I don't recall
all the details of why it's like that and am pretty loath to redesign
it right now. Instead, just teach ExecEndModifyTable to skip destroying
the TransitionCaptureState when that flag is set. This will allow the
transition table data to survive until end of the current subtransaction.
This isn't a terribly satisfactory solution, because (1) we might be
leaking the transition tables for much longer than really necessary,
and (2) as things stand, an AFTER STATEMENT trigger will fire once per
RI updating query, ie once per row updated or deleted in the referenced
table. I suspect that is not per SQL spec. But redesigning this is a
research project that we're certainly not going to get done for v10.
So let's go with this hackish answer for now.
In passing, tweak AfterTriggerSaveEvent to not save the transition_capture
pointer into the event record for a deferrable trigger. This is not
necessary to fix the current bug, but it avoids letting dangling pointers
to long-gone transition tables persist in the trigger event queue. That's
at least a safety feature. It might also allow merging shared trigger
states in more cases than before.
I added a regression test that demonstrates the crash on unpatched code,
and also exposes the behavior of firing the AFTER STATEMENT triggers
once per row update.
Per bug #14808 from Philippe Beaudoin. Back-patch to v10.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
This improves the regression tests' coverage of rbtree.c from pretty
awful (because some of the functions aren't used yet) to basically 100%.
Victor Drobny, reviewed by Aleksander Alekseev and myself
Discussion: https://postgr.es/m/c9d61310e16e75f8acaf6cb1c48b7b77@postgrespro.ru
This code isn't used, and there's no clear reason why anybody would ever
want to use it. These traversal mechanisms don't yield a visitation order
that is semantically meaningful for any external purpose, nor are they
any faster or simpler than the left-to-right or right-to-left traversals.
(In fact, some rough testing suggests they are slower :-(.) Moreover,
these mechanisms are impossible to test in any arm's-length fashion; doing
so requires knowledge of the red-black tree's internal implementation.
Hence, let's just jettison them.
Discussion: https://postgr.es/m/17735.1505003111@sss.pgh.pa.us
The previous coding of get_qual_for_list() was careful to copy everything
it was using from the input data structure. The new version missed
making a copy of pass-by-ref datum values that it's inserting into Consts.
This is not optional, however, as revealed by buildfarm failures on
machines running -DRELCACHE_FORCE_RELEASE: we're copying from a relcache
entry that could go away before the required lifespan of our output
expression. I'm pretty sure -DCLOBBER_CACHE_ALWAYS machines won't like
this either, but none of them have reported in yet.
map_partition_varattnos() failed to set its found_whole_row output
parameter if the given expression list was NIL. This seems to be
a pre-existing bug that chanced to be exposed by commit 6f6b99d13.
It might be unreachable in v10, but I have little faith in that
proposition, so back-patch.
Per buildfarm.
Not completely sure, but I think bowerbird is spitting up on attempting
to include ">" in a temporary file name. (Why in the world are we
writing this stuff into files at all? A hash would be a better answer.)
* Remove no-such-user test case, output isn't stable, and we really
don't need to be testing such cases here anyway.
* Fix the process exit code test logic to match PostgresNode::psql
(but I didn't bother with looking at the "core" flag).
* Give up on inf/nan tests.
Per buildfarm.
The encoding ID was converted between string and number too many times,
probably a remnant from the shell script days.
Reviewed-by: Aleksandr Parfenov <a.parfenov@postgrespro.ru>
* Our own version of getopt_long doesn't support abbreviation of
long options.
* It doesn't do automatic rearrangement of non-option arguments to the end,
either.
* Test was way too optimistic about the platform independence of
NaN and Infinity outputs. I rather imagine we might have to lose
those tests altogether, but for the moment just allow case variation
and fully spelled out Infinity.
Per buildfarm.
In commit fccebe421, we hacked get_actual_variable_range() to scan the
index with SnapshotDirty, so that if there are many uncommitted tuples
at the end of the index range, it wouldn't laboriously scan through all
of them looking for a live value to return. However, that didn't fix it
for the case of many recently-dead tuples at the end of the index;
SnapshotDirty recognizes those as committed dead and so we're back to
the same problem.
To improve the situation, invent a "SnapshotNonVacuumable" snapshot type
and use that instead. The reason this helps is that, if the snapshot
rejects a given index entry, we know that the indexscan will mark that
index entry as killed. This means the next get_actual_variable_range()
scan will proceed past that entry without visiting the heap, making the
scan a lot faster. We may end up accepting a recently-dead tuple as
being the estimated extremal value, but that doesn't seem much worse than
the compromise we made before to accept not-yet-committed extremal values.
The cost of the scan is still proportional to the number of dead index
entries at the end of the range, so in the interval after a mass delete
but before VACUUM's cleaned up the mess, it's still possible for
get_actual_variable_range() to take a noticeable amount of time, if you've
got enough such dead entries. But the constant factor is much much better
than before, since all we need to do with each index entry is test its
"killed" bit.
We chose to back-patch commit fccebe421 at the time, but I'm hesitant to
do so here, because this form of the problem seems to affect many fewer
people. Also, even when it happens, it's less bad than the case fixed
by commit fccebe421 because we don't get the contention effects from
expensive TransactionIdIsInProgress tests.
Dmitriy Sarafannikov, reviewed by Andrey Borodin
Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru
It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These
two styles have been applied inconsistently. After discussion, we'll
use the more verbose style for plain function pointer variables, to make
it clear that it's a variable, and the shorter style when the function
pointer is in a struct (s.func() or s->func()), because then it's clear
that it's not a plain function name, and otherwise the excessive
punctuation makes some of those invocations hard to read.
Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
This doesn't allow routing tuple to the foreign partitions themselves,
but it permits tuples to be routed to regular partitions despite the
presence of foreign partitions in the same inheritance hierarchy.
Etsuro Fujita, reviewed by Amit Langote and by me.
Discussion: http://postgr.es/m/bc3db4c1-1693-3b8a-559f-33ad2b50b7ad@lab.ntt.co.jp
Issuing a savepoint-related command in a Query message that contains
multiple SQL statements led to a FATAL exit with a complaint about
"unexpected state STARTED". This is a shortcoming of commit 4f896dac1,
which attempted to prevent such misbehaviors in multi-statement strings;
its quick hack of marking the individual statements as "not top-level"
does the wrong thing in this case, and isn't a very accurate description
of the situation anyway.
To fix, let's introduce into xact.c an explicit model of what happens for
multi-statement Query strings. This is an "implicit transaction block
in progress" state, which for many purposes works like the normal
TBLOCK_INPROGRESS state --- in particular, IsTransactionBlock returns true,
causing the desired result that PreventTransactionChain will throw error.
But in case of error abort it works like TBLOCK_STARTED, allowing the
transaction to be cancelled without need for an explicit ROLLBACK command.
Commit 4f896dac1 is reverted in toto, so that we go back to treating the
individual statements as "top level". We could have left it as-is, but
this allows sharpening the error message for PreventTransactionChain
calls inside functions.
Except for getting a normal error instead of a FATAL exit for savepoint
commands, this patch should result in no user-visible behavioral change
(other than that one error message rewording). There are some things
we might want to do in the line of changing the appearance or wording of
error and warning messages around this behavior, which would be much
simpler to do now that it's an explicitly modeled state. But I haven't
done them here.
Although this fixes a long-standing bug, no backpatch. The consequences
of the bug don't seem severe enough to justify the risk that this commit
itself creates some new issue.
Patch by me, but it owes something to previous investigation by
Takayuki Tsunakawa, who also reported the bug in the first place.
Also thanks to Michael Paquier for reviewing.
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05
In the generic atomic ops that rely on a loop around a CAS primitive,
there's no need to force the initial read of the "old" value to be atomic.
In the typically-rare case that we get a torn value, that simply means
that the first CAS attempt will fail; but it will update "old" to the
atomically-read value, so the next attempt has a chance of succeeding.
It was already being done that way in pg_atomic_exchange_u64_impl(),
but let's duplicate the approach in the rest.
(Given the current coding of the pg_atomic_read functions, this change
is a no-op anyway on popular platforms; it only makes a difference where
pg_atomic_read_u64_impl() is implemented as a CAS.)
In passing, also remove unnecessary take-a-pointer-and-dereference-it
coding in the pg_atomic_read functions. That seems to have been based
on a misunderstanding of what the C standard requires. What actually
matters is that the pointer be declared as pointing to volatile, which
it is.
I don't believe this will change the assembly code at all on x86
platforms (even ignoring the likelihood that these implementations
get overridden by others); but it may help on less-mainstream CPUs.
Discussion: https://postgr.es/m/13707.1504718238@sss.pgh.pa.us
Index columns are referenced by ordinal number rather than name, e.g.
CREATE INDEX coord_idx ON measured (x, y, (z + t));
ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000;
Incompatibility note for release notes:
\d+ for indexes now also displays Stats Target
Authors: Alexander Korotkov, with contribution by Adrien NAYRAT
Review: Adrien NAYRAT, Simon Riggs
Wordsmith: Simon Riggs
In addition to __sync_fetch_and_add, gcc offers __sync_fetch_and_sub,
__sync_fetch_and_and, and __sync_fetch_and_or, which correspond directly
to primitive atomic ops that we want. Testing shows that in some cases
they generate better code than our generic implementations, so use them.
We've assumed that configure's test for __sync_val_compare_and_swap is
sufficient to allow assuming that __sync_fetch_and_add is available, so
make the same assumption for these functions. Should that prove to be
wrong, we can add more configure tests.
Yura Sokolov, reviewed by Jesper Pedersen and myself
Discussion: https://postgr.es/m/7f65886daca545067f82bf2b463b218d@postgrespro.ru
The pg_atomic_compare_exchange_xxx functions are defined to update
*expected to whatever they read from the target variable. Therefore,
there's no need to do additional explicit reads after we've initialized
the "old" variable. The actual benefit of this is somewhat debatable,
but it seems fairly unlikely to hurt anything, especially since we
will override the generic implementations in most performance-sensitive
cases.
Yura Sokolov, reviewed by Jesper Pedersen and myself
Discussion: https://postgr.es/m/7f65886daca545067f82bf2b463b218d@postgrespro.ru
The NAMEDTUPLESTORE patch piggybacked on the infrastructure for
TABLEFUNC/VALUES/CTE RTEs, none of which can ever have dropped columns,
so the possibility was ignored most places. Fix that, including adding a
specification to parsenodes.h about what it's supposed to look like.
In passing, clean up assorted comments that hadn't been maintained
properly by said patch.
Per bug #14799 from Philippe Beaudoin. Back-patch to v10.
Discussion: https://postgr.es/m/20170906120005.25630.84360@wrigleys.postgresql.org
This command acts somewhat like \g, but instead of executing the query
buffer, it merely prints a description of the columns that the query
result would have. (Of course, this still requires parsing the query;
if parse analysis fails, you get an error anyway.) We accomplish this
using an unnamed prepared statement, which should be invisible to psql
users.
Pavel Stehule, reviewed by Fabien Coelho
Discussion: https://postgr.es/m/CAFj8pRBhYVvO34FU=EKb=nAF5t3b++krKt1FneCmR0kuF5m-QA@mail.gmail.com
It has not been used in a long time, and it doesn't seem safe anyway, so
drop it.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
The parenthesized style has only been used in a few modules. Change
that to use the style that is predominant across the whole tree.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
This moves the data directories from using temporary directories with
randomness in the directory name to a static name, to make it easier to
debug. The data directory will be retained if tests fail or the test
code dies/exits with failure, and is automatically removed on the next
make check.
If the environment variable PG_TEST_NOCLEAN is defined, the data
directories will be retained regardless of test or exit status.
Author: Daniel Gustafsson <daniel@yesql.se>
Throttling for sending a base backup in walsender is broken for the case
where there is a lot of WAL traffic, because the latch used to put the
walsender to sleep is also signalled by regular WAL traffic (and each
signal causes an additional batch of data to be sent); the net effect is
that there is no or little actual throttling. This is undesirable, so
rewrite the sleep into a loop to achieve the desired effeect.
Author: Jeff Janes, small tweaks by me
Reviewed-by: Antonin Houska
Discussion: https://postgr.es/m/CAMkU=1xH6mde-yL-Eo1TKBGNd0PB1-TMxvrNvqcAkN-qr2E9mw@mail.gmail.com
We already had a psql variable VERSION that shows the verbose form of
psql's own version. Add VERSION_NAME to show the short form (e.g.,
"11devel") and VERSION_NUM to show the numeric form (e.g., 110000).
Also add SERVER_VERSION_NAME and SERVER_VERSION_NUM to show the short and
numeric forms of the server's version. (We'd probably add SERVER_VERSION
with the verbose string if it were readily available; but adding another
network round trip to get it seems too expensive.)
The numeric forms, in particular, are expected to be useful for scripting
purposes, now that psql can do conditional tests.
Fabien Coelho, reviewed by Pavel Stehule
Discussion: https://postgr.es/m/alpine.DEB.2.20.1704020917220.4632@lancre
The previous format with variable names and descriptions in separate
columns was extremely constraining about the length of the descriptions.
We'd dealt with that in several inconsistent ways over the years,
including letting the lines run over 80 characters, breaking descriptions
into multiple lines, or shoving the description onto a separate line.
But it's been a long time since the output could realistically fit onto
a single screen vertically, so let's just rely even more heavily on the
pager to deal with the vertical distance, and split each entry into two
(or more) lines, in the format
variable-name
variable description goes here
Each variable name + description remains a single translatable string,
in hopes of reducing translator confusion; we're just changing the
embedded whitespace.
I failed to resist the temptation to copy-edit one or two of the
descriptions while at it.
Discussion: https://postgr.es/m/2947.1504542679@sss.pgh.pa.us
process_backslash_command would drop the last character of the input
command on the assumption that it was a newline. Given a non newline
terminated input file, this could result in dropping the last character
of the command. Fix that by doing an actual test that we're removing
a newline.
While at it, allow for Windows newlines (\r\n), and suppress multiple
newlines if any. I do not think either of those cases really occur,
since (a) we read script files in text mode and (b) the lexer stops
when it hits a newline. But it's cheap enough and it provides a
stronger guarantee about what the result string looks like.
This is just cosmetic, I think, since the possibly-overly-chomped
line was only used for display not for further processing. So
it doesn't seem necessary to back-patch.
Fabien Coelho, reviewed by Nikolay Shaplov, whacked around a bit by me
Discussion: https://postgr.es/m/alpine.DEB.2.20.1704171422500.4025@lancre
With --latency-limit, transactions might get skipped even beyond the
transaction count limit specified by -t, throwing off the expected
number of transactions and thus the denominator for later stats.
Be sure to stop skipping transactions once -t is reached.
Also, include skipped transactions in the "cnt" fields; this requires
discounting them again in a couple of places, but most places are
better off with this definition. In particular this is needed to
get correct overall stats for the combination of -R/-L/-t options.
Merge some more processing into processXactStats() to simplify this.
In passing, add a check that --progress-timestamp is specified only
when --progress is.
We might consider back-patching this, but given that it only matters
for a combination of options, and given the lack of field complaints,
consensus seems to be not to bother.
Fabien Coelho, reviewed by Nikolay Shaplov
Discussion: https://postgr.es/m/alpine.DEB.2.20.1704171422500.4025@lancre
Some compilers complain, not unreasonably, about left-shifting an
int32 "1" and then assigning the result to an int64. In practice
I sure hope that this data structure never gets large enough that
an overflow would actually occur; but let's cast the constant to
the right type to avoid the hazard.
In passing, fix a typo in dshash.h.
Amit Kapila, adjusted as per comment from Thomas Munro.
Discussion: https://postgr.es/m/CAA4eK1+5vfVMYtjK_NX8O3-42yM3o80qdqWnQzGquPrbq6mb+A@mail.gmail.com
In commit 9d6b160d7, I tweaked pg_config.h.win32 to use
"#define HAVE_LONG_LONG_INT_64 1" rather than defining it as empty,
for consistency with what happens in an autoconf'd build.
But Solution.pm injects another definition of that macro into
ecpg_config.h, leading to justifiable (though harmless) compiler whining.
Make that one consistent too. Back-patch, like the previous patch.
Discussion: https://postgr.es/m/CAEepm=1dWsXROuSbRg8PbKLh0S=8Ou-V8sr05DxmJOF5chBxqQ@mail.gmail.com
Move the responsibility for creating/destroying TupleQueueReaders into
execParallel.c, to avoid duplicative coding in nodeGather.c and
nodeGatherMerge.c. Also, instead of having DestroyTupleQueueReader do
shm_mq_detach, do it in the caller (which is now only ExecParallelFinish).
This means execParallel.c does both the attaching and detaching of the
tuple-queue-reader shm_mqs, which seems less weird than the previous
arrangement.
These changes also eliminate a vestigial memory leak (of the pei->tqueue
array). It's now demonstrable that rescans of Gather or GatherMerge don't
leak memory.
Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
Add the maxrss field to the getrusage output (log_*_stats). This was
previously omitted because of portability concerns, but we feel this
might not be a concern anymore.
based on patch by Justin Pryzby <pryzby@telsasoft.com>
Instead of using a cast to force the constant to be the right width,
assume we can plaster on an L, UL, LL, or ULL suffix as appropriate.
The old approach to this is very hoary, dating from before we were
willing to require compilers to have working int64 types.
This fix makes the PG_INT64_MIN, PG_INT64_MAX, and PG_UINT64_MAX
constants safe to use in preprocessor conditions, where a cast
doesn't work. Other symbolic constants that might be defined using
[U]INT64CONST are likewise safer than before.
Also fix the SIZE_MAX macro to be similarly safe, if we are forced
to provide a definition for that. The test added in commit 2e70d6b5e
happens to do what we want even with the hack "(size_t) -1" definition,
but we could easily get burnt on other tests in future.
Back-patch to all supported branches, like the previous commits.
Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us
Pre-C99 platforms may lack <stdint.h> and thereby SIZE_MAX. We have
a couple of places using the hack "(size_t) -1" as a fallback, but
it wasn't universally available; which means the code added in commit
2e70d6b5e fails to compile everywhere. Move that hack to c.h so that
we can rely on having SIZE_MAX everywhere.
Per discussion, it'd be a good idea to make the macro's value safe
for use in #if-tests, but that will take a bit more work. This is
just a quick expedient to get the buildfarm green again.
Back-patch to all supported branches, like the previous commit.
Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us
Commit 0e141c0fbb introduced a mechanism
to reduce contention on ProcArrayLock by having a single process clear
XIDs in the procArray on behalf of multiple processes, reducing the
need to hand the lock around. A previous attempt to introduce a similar
mechanism for CLogControlLock in ccce90b398
crashed and burned, but the design problem which resulted in those
failures is believed to have been corrected in this version.
Amit Kapila, with some cosmetic changes by me. See the previous commit
message for additional credits.
Discussion: http://postgr.es/m/CAA4eK1KudxzgWhuywY_X=yeSAhJMT4DwCjroV5Ay60xaeB2Eew@mail.gmail.com
Do for replication origins what the previous commit did for replication
slots: restore the original behavior of replication origin drop to raise
an error rather than blocking, because users might be depending on the
original behavior. Maintain the blocking behavior when invoked
internally from logical replication subscription handling.
Discussion: https://postgr.es/m/20170830133922.tlpo3lgfejm4n2cs@alvherre.pgsql
Commit 9915de6c1c changed the default behavior of
DROP_REPLICATION_SLOT so that it would wait until any session holding
the slot active would release it, instead of raising an error. But
users are already depending on the original behavior, so revert to it by
default and add a WAIT option to invoke the new behavior.
Per complaint from Simone Gotti, in
Discussion: https://postgr.es/m/CAEvsy6Wgdf90O6pUvg2wSVXL2omH5OPC-38OD4Zzgk-FXavj3Q@mail.gmail.com
This will be useful for hash partitioning, which needs a way to seed
the hash functions to avoid problems such as a hash index on a hash
partitioned table clumping all values into a small portion of the
bucket space; it's also useful for anything that wants a 64-bit hash
value rather than a 32-bit hash value.
Just in case somebody wants a 64-bit hash value that is compatible
with the existing 32-bit hash values, make the low 32-bits of the
64-bit hash value match the 32-bit hash value when the seed is 0.
Robert Haas and Amul Sul
Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com