Add a new module backend/partitioning/partprune.c, implementing a more
sophisticated algorithm for partition pruning. The new module uses each
partition's "boundinfo" for pruning instead of constraint exclusion,
based on an idea proposed by Robert Haas of a "pruning program": a list
of steps generated from the query quals which are run iteratively to
obtain a list of partitions that must be scanned in order to satisfy
those quals.
At present, this targets planner-time partition pruning, but there exist
further patches to apply partition pruning at execution time as well.
This commit also moves some definitions from include/catalog/partition.h
to a new file include/partitioning/partbounds.h, in an attempt to
rationalize partitioning related code.
Authors: Amit Langote, David Rowley, Dilip Kumar
Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen.
Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
This provides a newer version of adminpack which works with the newly
added default roles to support GRANT'ing to non-superusers access to
read and write files, along with related functions (unlinking files,
getting file length, renaming/removing files, scanning the log file
directory) which are supported through adminpack.
Note that new versions of the functions are required because an
environment might have an updated version of the library but still have
the old adminpack 1.0 catalog definitions (where EXECUTE is GRANT'd to
PUBLIC for the functions).
This patch also removes the long-deprecated alternative names for
functions that adminpack used to include and which are now included in
the backend, in adminpack v1.1. Applications using the deprecated names
should be updated to use the backend functions instead. Existing
installations which continue to use adminpack v1.0 should continue to
function until/unless adminpack is upgraded.
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
This patch adds new default roles named 'pg_read_server_files',
'pg_write_server_files', 'pg_execute_server_program' which
allow an administrator to GRANT to a non-superuser role the ability to
access server-side files or run programs through PostgreSQL (as the user
the database is running as). Having one of these roles allows a
non-superuser to use server-side COPY to read, write, or with a program,
and to use file_fdw (if installed by a superuser and GRANT'd USAGE on
it) to read from files or run a program.
The existing misc file functions are also changed to allow a user with
the 'pg_read_server_files' default role to read any files on the
filesystem, matching the privileges given to that role through COPY and
file_fdw from above.
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
This removes the explicit superuser checks in the various file-access
functions in the backend, specifically pg_ls_dir(), pg_read_file(),
pg_read_binary_file(), and pg_stat_file(). Instead, EXECUTE is REVOKE'd
from public for these, meaning that only a superuser is able to run them
by default, but access to them can be GRANT'd to other roles.
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
The previous coding inadvertently checked the constraints for the
partitioned table rather than the target partition, which could
lead to data in a partition that fails to satisfy some constraint
on that partition. This problem seems to date back to when
table partitioning was introduced; prior to that, there was only
one target table for a COPY, so the problem didn't occur, and the
code just didn't get updated.
Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat
Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp
We don't actually need the insert-or-update logic, so it's clearer to
have separate functions for the inserting and updating.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
In case the subscription is removed before the worker is fully started,
give a specific error message instead of the generic "cache lookup"
error.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
This makes it possible to turn checksums on in a live cluster, without
the previous need for dump/reload or logical replication (and to turn it
off).
Enabling checkusm starts a background process in the form of a
launcher/worker combination that goes through the entire database and
recalculates checksums on each and every page. Only when all pages have
been checksummed are they fully enabled in the cluster. Any failure of
the process will revert to checksums off and the process has to be
started.
This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.
Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
A normal SQL command run inside PL/pgSQL acquires a snapshot, but SET
TRANSACTION does not work anymore if a snapshot is set. So we have to
handle this separately.
Reviewed-by: Alexander Korotkov <a.korotkov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
If we are not going to save the plan, then we need to unset expr->plan
after we are done, also in error cases. Otherwise, we get a dangling
pointer next time around.
This is not the ideal solution. It would be better if we could convince
SPI not to associate a cached plan with a resource owner, and then we
could just save the plan in all cases. But that would require bigger
surgery.
Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
THis adds a "flags" field to the BackgroundWorkerInitializeConnection()
and BackgroundWorkerInitializeConnectionByOid(). For now only one flag,
BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore
datallowconn.
Error-tolerant conversion function with web-like syntax for search query,
it simplifies constraining search engine with close to habitual interface for
users.
Bump catalog version
Authors: Victor Drobny, Dmitry Ivanov with editorization by me
Reviewed by: Aleksander Alekseev, Tomas Vondra, Thomas Munro, Aleksandr Parfenov
Discussion: https://www.postgresql.org/message-id/flat/fe931111ff7e9ad79196486ada79e268@postgrespro.ru
857f9c36 bumps B-tree metapage version while upgrade is performed "on the fly"
when needed. However, some asserts fired when old version metapage was
cached to rel->rd_amcache. Despite new metadata fields are never used from
rel->rd_amcache, that needs to be fixed. This patch introduces metadata
upgrade during its caching, which fills unavailable fields with their default
values. contrib/pageinspect is also patched to handle non-upgraded metapages
in the same way.
Author: Alexander Korotkov
Review comments from Andres Freund
* Consolidate code into AfterTriggerGetTransitionTable()
* Rename nodeMerge.c to execMerge.c
* Rename nodeMerge.h to execMerge.h
* Move MERGE handling in ExecInitModifyTable()
into a execMerge.c ExecInitMerge()
* Move mt_merge_subcommands flags into execMerge.h
* Rename opt_and_condition to opt_merge_when_and_condition
* Wordsmith various comments
Author: Pavan Deolasee
Reviewer: Simon Riggs
Maintainers of out-of-tree PLs typically need access to the set of
error codes. To avoid the need to duplicate that information in some
form in PL source trees, provide errcodes.txt as part of a server
installation.
Thomas Munro, based on a suggestion from Andrew Gierth
Discussion: https://postgr.es/m/87woykk7mu.fsf%40news-spur.riddles.org.uk
This is a blind fix, since I don't have SE-Linux to verify it.
Per unwanted change in rhinoceros, running sepgsql tests. Noted by Tom
Lane.
Discussion: https://postgr.es/m/32347.1522865050@sss.pgh.pa.us
This reworks how the tests to run are defined. Instead of having to
define all runs for all tests, we define those tests which should pass
(generally using one of the defined broad hashes), add in any which
should be specific for this test, and exclude any specific runs that
shouldn't pass for this test. This ends up removing some 4k+ lines
(more than half the file) but, more importantly, greatly simplifies the
way runs-to-be-tested are defined.
As discussed in the updated comments, for example, take the test which
does CREATE TABLE test_table. That CREATE TABLE should show up in all
'full' runs of pg_dump, except those cases where 'test_table' is
excluded, of course, and that's exactly how the test gets defined now
(modulo a few other related cases, like where we dump only that table,
or we dump the schema it's in, or we exclude the schema it's in):
like => {
%full_runs,
%dump_test_schema_runs,
only_dump_test_table => 1,
section_pre_data => 1, },
unlike => {
exclude_dump_test_schema => 1,
exclude_test_table => 1, }, },
Next, we no longer expect every run to be listed for every test. If a
run is listed in 'like' (directly or through a hash) then it's a 'like',
unless it's listed in 'unlike' in which case it's an 'unlike'. If it
isn't listed in either, then it's considered an 'unlike' automatically.
Lastly, this changes the code to no longer use like/unlike but rather to
use 'ok()' with 'diag()' which allows much more control over what gets
spit out to the screen. Gone are the days of the entire dump being sent
to the console, now you'll just get a couple of lines for each failing
test which say the test that failed and the run that it failed on.
This covers both the pg_dump TAP tests in src/bin/pg_dump and those in
src/test/modules/test_pg_dump.
BRIN indexes like to propagate additions of free space into the upper pages
of their free space maps as soon as the new space is known, even when it's
just on one individual index page. Previously this required calling
FreeSpaceMapVacuum, which is quite an expensive thing if the map is large.
Use the FreeSpaceMapVacuumRange function recently added by commit c79f6df75
to reduce the amount of work done for this purpose.
Fix a couple of places that neglected to do the upper-page vacuuming at all
after recording new free space. If the policy is to be that BRIN should do
that, it should do it everywhere.
Do RecordPageWithFreeSpace unconditionally in brin_page_cleanup, and do
FreeSpaceMapVacuum unconditionally in brin_vacuum_scan. Because of the
FSM's imprecise storage of free space, the old complications here seldom
bought anything, they just slowed things down. This approach also
provides a predictable path for FSM corruption to be repaired.
Remove premature RecordPageWithFreeSpace call in brin_getinsertbuffer
where it's about to return an extended page to the caller. The caller
should do that, instead, after it's inserted its new tuple. Fix the
one caller that forgot to do so.
Simplify logic in brin_doupdate's same-page-update case by postponing
brin_initialize_empty_new_buffer to after the critical section; I see
little point in doing it before.
Avoid repeat calls of RelationGetNumberOfBlocks in brin_vacuum_scan.
Avoid duplicate BufferGetBlockNumber and BufferGetPage calls in
a couple of places where we already had the right values.
Move a BRIN_elog debug logging call out of a critical section; that's
pretty unsafe and I don't think it buys us anything to not wait till
after the critical section.
Move the "*extended = false" step in brin_getinsertbuffer into the
routine's main loop. There's no actual bug there, since the loop can't
iterate with *extended still true, but it doesn't seem very future-proof
as coded; and it's certainly not documented as a loop invariant.
This is all from follow-on investigation inspired by commit c79f6df75.
Discussion: https://postgr.es/m/5801.1522429460@sss.pgh.pa.us
Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete
calls and one amvacuumcleanup call. When workload on particular table
is append-only, then autovacuum isn't intended to touch this table. However,
user may run vacuum manually in order to fill visibility map and get benefits
of index-only scans. Then ambulkdelete wouldn't be called for indexes
of such table (because no heap tuples were deleted), only amvacuumcleanup would
be called In this case, amvacuumcleanup would perform full index scan for
two objectives: put recyclable pages into free space map and update index
statistics.
This patch allows btvacuumclanup to skip full index scan when two conditions
are satisfied: no pages are going to be put into free space map and index
statistics isn't stalled. In order to check first condition, we store
oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then
there are some recyclable pages. In order to check second condition we store
number of heap tuples observed during previous full index scan by cleanup.
If fraction of newly inserted tuples is less than
vacuum_cleanup_index_scale_factor, then statistics isn't considered to be
stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default).
This patch bumps B-tree meta-page version. Upgrade of meta-page is performed
"on the fly": during VACUUM meta-page is rewritten with new version. No special
handling in pg_upgrade is required.
Author: Masahiko Sawada, Alexander Korotkov
Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov
Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
The code before the main loop, to handle the possible 1-7 unaligned bytes
at the beginning of the input, was broken, and read past the input, if the
the input was very short.
Hopefully fix the fact that these checks are unstable, by introducing
the corruption in a separate table from pg_class, and also explicitly
disable autovacuum on those tables. Also make sure PostgreSQL is
stopped while the corruption is introduced to avoid possible caching
effects.
Author: Michael Banck
ARMv8 introduced special CPU instructions for calculating CRC-32C. Use
them, when available, for speed.
Like with the similar Intel CRC instructions, several factors affect
whether the instructions can be used. The compiler intrinsics for them must
be supported by the compiler, and the instructions must be supported by the
target architecture. If the compilation target architecture does not
support the instructions, but adding "-march=armv8-a+crc" makes them
available, then we compile the code with a runtime check to determine if
the host we're running on supports them or not.
For the runtime check, use glibc getauxval() function. Unfortunately,
that's not very portable, but I couldn't find any more portable way to do
it. If getauxval() is not available, the CRC instructions will still be
used if the target architecture supports them without any additional
compiler flags, but the runtime check will not be available.
Original patch by Yuqi Gu, heavily modified by me. Reviewed by Andres
Freund, Thomas Munro.
Discussion: https://www.postgresql.org/message-id/HE1PR0801MB1323D171938EABC04FFE7FA9E3110%40HE1PR0801MB1323.eurprd08.prod.outlook.com
Trigger cloning to partitions was supposed to occur for user-visible
triggers only, but during development the protection that prevented it
from occurring to internal triggers was lost. Reinstate it, as well as
add a test case to ensure internal triggers (in the tested case,
triggers implementing a deferred unique constraint) are not cloned.
Without the code fix, the partitions in the test end up with different
numbers of triggers, which is clearly wrong ...
Bug in 86f575948c.
Discussion: https://postgr.es/m/20180403214903.ozfagwjcpk337uw7@alvherre.pgsql
Make buffer 1 byte larger to fit a sign. It's actually impossible for
there to be a sign in practice, but this is still required to keep GCC 7
happy.
Cleanup from commit 51bc271790.
Based on a suggestion from Peter Eisentraut.
Author: Peter Geoghegan
Reported-By: Peter Eisentraut
Discussion: https://postgr.es/m/d1cc82ed-d07d-cef2-7c00-2e987f121648@2ndquadrant.com
Previous coding was passing the wrong table's tuple descriptor, which
accidentally fails to fail because no existing test case exercises a
foreign key in which the referenced attributes are further to the right
of the referencing attributes.
Add a test so that further breakage is visible.
This got broken in 16828d5c02.
Discussion: https://postgr.es/m/20180403204723.fqte755nukgm42uf@alvherre.pgsql
We were being careless in some places about the order of -L switches in
link command lines, such that -L switches referring to external directories
could come before those referring to directories within the build tree.
This made it possible to accidentally link a system-supplied library, for
example /usr/lib/libpq.so, in place of the one built in the build tree.
Hilarity ensued, the more so the older the system-supplied library is.
To fix, break LDFLAGS into two parts, a sub-variable LDFLAGS_INTERNAL
and the main LDFLAGS variable, both of which are "recursively expanded"
so that they can be incrementally adjusted by different makefiles.
Establish a policy that -L switches for directories in the build tree
must always be added to LDFLAGS_INTERNAL, while -L switches for external
directories must always be added to LDFLAGS. This is sufficient to
ensure a safe search order. For simplicity, we typically also put -l
switches for the respective libraries into those same variables.
(Traditional make usage would have us put -l switches into LIBS, but
cleaning that up is a project for another day, as there's no clear
need for it.)
This turns out to also require separating SHLIB_LINK into two variables,
SHLIB_LINK and SHLIB_LINK_INTERNAL, with a similar rule about which
switches go into which variable. And likewise for PG_LIBS.
Although this change might appear to affect external users of pgxs.mk,
I think it doesn't; they shouldn't have any need to touch the _INTERNAL
variables.
In passing, tweak src/common/Makefile so that the value of CPPFLAGS
recorded in pg_config lacks "-DFRONTEND" and the recorded value of
LDFLAGS lacks "-L../../../src/common". Both of those things are
mistakes, apparently introduced during prior code rearrangements,
as old versions of pg_config don't print them. In general we don't
want anything that's specific to the src/common subdirectory to
appear in those outputs.
This is certainly a bug fix, but in view of the lack of field
complaints, I'm unsure whether it's worth the risk of back-patching.
In any case it seems wise to see what the buildfarm makes of it first.
Discussion: https://postgr.es/m/25214.1522604295@sss.pgh.pa.us
The prefix operator along with SP-GiST indexes can be used as an alternative
for LIKE 'word%' commands and it doesn't have a limitation of string/prefix
length as B-Tree has.
Bump catalog version
Author: Ildus Kurbangaliev with some editorization by me
Review by: Arthur Zakirov, Alexander Korotkov, and me
Discussion: https://www.postgresql.org/message-id/flat/20180202180327.222b04b3@wp.localdomain
When base backups are run over the replication protocol (for example
using pg_basebackup), verify the checksums of all data blocks if
checksums are enabled. If checksum failures are encountered, log them
as warnings but don't abort the backup.
This becomes the default behaviour in pg_basebackup (provided checksums
are enabled on the server), so add a switch (-k) to disable the checks
if necessary.
Author: Michael Banck
Reviewed-By: Magnus Hagander, David Steele
Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows
a task that would other require multiple PL statements.
e.g.
MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
DO NOTHING;
MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.
MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.
MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.
Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.
This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.
Various issues reported via sqlsmith by Andreas Seltenreich
Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs
Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.comhttps://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows
a task that would other require multiple PL statements.
e.g.
MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
DO NOTHING;
MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.
MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.
MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.
Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.
This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.
Various issues reported via sqlsmith by Andreas Seltenreich
Authors: Pavan Deolasee, Simon Riggs
Reviewers: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs
Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.comhttps://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Coverity complained about possible buffer overrun in two places added by
commit 1eb6d6527, and AFAICS it's reasonable to worry: even granting that
the WAL originator properly truncated the commit GID to GIDSIZE, we should
not really bet our lives on that having the same value as it does in the
current build. Hence, use strlcpy() not strcpy(), and adjust the pointer
advancement logic to be sure we skip over the whole source string even if
strlcpy() truncated it.
Recent commit 8a3d9425 has introduced be-secure-common.c, which is aimed
at including backend-side APIs that can be used by any SSL
implementation. The purpose is similar to fe-secure-common.c for the
frontend-side APIs.
However, this has forgotten to include check_ssl_key_file_permissions()
in the move, which causes a double dependency between be-secure.c and
be-secure-openssl.c.
Refactor the code in a more logical way. This also puts into light an
API which is usable by future SSL implementations for permissions on SSL
key files.
Author: Michael Paquier <michael@paquier.xyz>
Since commit 7012b132d0, postgres_fdw
has been able to push down the toplevel aggregation operation to the
remote server. Commit e2f1eb0ee3 made
it possible to break down the toplevel aggregation into one
aggregate per partition. This commit lets postgres_fdw push down
aggregation in that case just as it does at the top level.
In order to make this work, this commit adds an additional argument
to the GetForeignUpperPaths FDW API. A matching argument is added
to the signature for create_upper_paths_hook. Third-party code using
either of these will need to be updated.
Also adjust create_foreignscan_plan() so that it picks up the correct
set of relids in this case.
Jeevan Chalke, reviewed by Ashutosh Bapat and by me and with some
adjustments by me. The larger patch series of which this patch is a
part was also reviewed and tested by Antonin Houska, Rajkumar
Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal
Legrand, and Rafia Sabih.
Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
round() is from C99. Use rint() instead. There are behavioral
differences between round() and rint(), but they should not matter to
the Bloom filter optimal_k() function. We already assume POSIX
behavior for rint(), so there is no question of rint() not using
"rounds towards nearest" as its rounding mode.
Cleanup from commit 51bc271790.
Per buildfarm member thrips.
Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn76eCGUonARy-wrVtMHsf+4cvbK_oJAWTLfORTU5ki0w@mail.gmail.com
A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership. Callers will sometimes incur false
positives, but never false negatives. The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.
Two classic applications of Bloom filters are cache filtering, and data
synchronization testing. Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.
This commit adds a test harness extension module, test_bloomfilter. It
can be used to get a sense of how the Bloom filter implementation
performs under varying conditions.
This is infrastructure for the upcoming "heapallindexed" amcheck patch,
which verifies the consistency of a heap relation against one of its
indexes.
Author: Peter Geoghegan
Reviewed-By: Andrey Borodin, Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g@mail.gmail.com
Avoid storing the result of PQsocket() in a pgsocket variable; it's
declared as int, and the no-socket test is properly written as "x < 0"
not "x == PGINVALID_SOCKET". This accidentally had no bad effect
because we never got to init_slot() with a bad connection, but it's
still wrong.
Actually, it seems like we should avoid storing the result for a long
period at all. The function's not so expensive that it's worth avoiding,
and the existing coding technique here would fail if anyone tried to
PQreset the connection during the life of the program. Hence, just
re-call PQsocket every time we construct a select(2) mask.
Speaking of select(), GetIdleSlot imagined that it could compute the
select mask once and continue to use it over multiple calls to
select_loop(), which is pretty bogus since that would stomp on the
mask on return. This could only matter if the function's outer loop
iterated more than once, which is unlikely (it'd take some connection
receiving data, but not enough to complete its command). But if it
did happen, we'd acquire "tunnel vision" and stop watching the other
connections for query termination, with the effect of losing parallelism.
Another way in which GetIdleSlot could lose parallelism is that once
PQisBusy returns false, it would lock in on that connection and do
PQgetResult until that returns NULL; in some cases that could result
in blocking. (Perhaps this can never happen in vacuumdb due to the
limited set of commands that it can issue, but I'm not quite sure
of that, and even if true today it's not a future-proof assumption.)
Refactor the code to do that properly, so that it risks blocking in
PQgetResult only in cases where we need to wait anyway.
Another loss-of-parallelism problem, which *is* easily demonstrable,
is that any setup queries issued during prepare_vacuum_command() were
always issued on the last-to-be-created connection, whether or not
that was idle. Long-running operations on that connection thus
prevented issuance of additional operations on the other ones, except
in the limited cases where no preparatory query was needed. Instead,
wait till we've identified a free connection and use that one.
Also, avoid core dump due to undersized malloc request in the case
that no tables are identified to be vacuumed.
The bogus no-socket test was noted by CharSyam, the other problems
identified in my own code review. Back-patch to 9.5 where parallel
vacuumdb was introduced.
Discussion: https://postgr.es/m/CAMrLSE6etb33-192DTEUGkV-TsvEcxtBDxGWG1tgNOMnQHwgDA@mail.gmail.com
Compilation failed for lack of an #ifdef on builds without
pg_strong_random(). Also fix relevant error messages to meet
project style guidelines.
Fabien Coelho, further adjusted by me
Discussion: https://postgr.es/m/32390.1522464534@sss.pgh.pa.us
So far as I can find, NI_MAXHOST isn't actually required anywhere by
POSIX. Nonetheless, commit 9a895462d supposed that it could rely on
having that symbol without any ceremony at all. We do have a hack
for providing it if the platform doesn't, in getaddrinfo.h, so fix
the problem by #including that file. Per buildfarm.
In 9956ddc191, ten years ago, the
current objfile.txt based linking model was introduced. It's time to
retire the old SUBSYS.o based model.
This primarily is pertinent because the bitcode files for LLVM based
inlining are not produced when using PARTIAL_LINKING. It does not seem
worth to fix PARTIAL_LINKING to support that.
Author: Andres Freund
Discussion: https://postgr.es/m/20180121204356.d5oeu34jetqhmdv2@alap3.anarazel.de
LockViewRecurese() obtains view relation using heap_open() and passes
it to get_view_query() to get view info. It immediately closes the
relation then uses the returned view info by calling
LockViewRecurse_walker(). Since get_view_query() returns a pointer
within the relcache, the relcache should be kept until
LockViewRecurse_walker() returns. Otherwise the relation could point
to a garbage memory area.
Fix is moving the heap_close() call after LockViewRecurse_walker().
Problem reported by Tom Lane (buildfarm is unhappy, especially prion
since it enables -DRELCACHE_FORCE_RELEASE cpp flag), fix by me.
A followup patch will add a SKIP_LOCKED option. To avoid introducing
evermore arguments, breaking existing callers each time, introduce a
flags argument. This'll no doubt break a few external users...
Also change the MISSING_OK behaviour so a DEBUG1 debug message is
emitted when a relation is not found.
Author: Nathan Bossart
Reviewed-By: Michael Paquier and Andres Freund
Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de
Previously there was no way in the standby side to find out the host and port
of the sender server that the walreceiver was currently connected to when
multiple hosts and ports were specified in primary_conninfo. For that purpose,
this patch adds sender_host and sender_port columns into pg_stat_wal_receiver
view. They report the host and port that the active replication connection
currently uses.
Bump catalog version.
Author: Haribabu Kommi
Reviewed-by: Michael Paquier and me
Discussion: https://postgr.es/m/CAJrrPGcV_aq8=cdqkFhVDJKEnDQ70yRTTdY9RODzMnXNrCz2Ow@mail.gmail.com
Richard Yen reported that pg_upgrade failed if the target cluster had
force_parallel_mode = on, because binary_upgrade_create_empty_extension()
is marked parallel restricted, allowing it to be executed in parallel
mode, which complains because it tries to acquire an XID.
In general, no function that might try to modify database data should
be considered parallel safe or restricted, since execution of it might
force XID acquisition. We found several other examples of this mistake.
Furthermore, functions that execute user-supplied SQL queries or query
fragments, or pull data from user-supplied cursors, had better be marked
both volatile and parallel unsafe, because we don't know what the supplied
query or cursor might try to do. There were several tsquery and XML
functions that had the wrong proparallel marking for this, and some of
them were even mislabeled as to volatility.
All these bugs are old, dating back to 9.6 for the proparallel mistakes
and much further for the provolatile mistakes. We can't force a
catversion bump in the back branches, but we can at least ensure that
installations initdb'd in future have the right values.
Thomas Munro and Tom Lane
Discussion: https://postgr.es/m/CAEepm=2sNDScSLTfyMYu32Q=ob98ZGW-vM_2oLxinzSABGQ6VA@mail.gmail.com
In the previous coding, skipped pages were mostly zeroes, but they still
had valid WAL page headers. That makes them very much less compressible
than an unbroken string of zeroes would be --- about 10X worse for bzip2
compression, for instance. We don't need those headers, so tweak the logic
so that we zero them out.
Chapman Flack, reviewed by Daniel Gustafsson
Discussion: https://postgr.es/m/579297F8.7020107@anastigmatix.net
When SSI was developed, slru.c was limited to segment files with names in
the range 0000-FFFF. This didn't allow enough space for predicate.c to
store every possible XID when spilling old transactions to disk, so it
would wrap around sooner and print warnings. Since commits 638cf09e and
73c986ad increased the number of segment files slru.c could manage, that
behavior is unnecessary. Therefore remove that code.
Also remove the macro OldSerXidSegment, which has been unused since
4cd3fb6e.
Thomas Munro, reviewed by Anastasia Lubennikova
Discussion: https://postgr.es/m/CAEepm=3XfsTSxgEbEOmxu0QDiXy0o18NUg2nC89JZcCGE+XFPA@mail.gmail.com
Add the target context's name to the errdetail field of "out of memory"
errors in mcxt.c. Per discussion, this seems likely to be useful to
help narrow down the cause of a reported failure, and it costs little.
Also, now that context names are required to be compile-time constants
in all cases, there's little reason to be concerned about security
issues from exposing these names to users. (Because of such concerns,
we are *not* including the context "ident" field.)
In passing, add unlikely() markers to the allocation-failed tests,
just to be sure the compiler is on the right page about that.
Also, in palloc and friends, copy CurrentMemoryContext into a local
variable, as that's almost surely cheaper to reference than a global.
Discussion: https://postgr.es/m/1099.1522285628@sss.pgh.pa.us
In btree and SP-GiST indexes, move the responsibility for calling
IndexFreeSpaceMapVacuum from the vacuumcleanup phase to the bulkdelete
phase, and do it if and only if we found some pages that could be put into
FSM. As in commit 851a26e26, the idea is to make free pages visible to FSM
searchers sooner when vacuuming very large tables (large enough to need
multiple bulkdelete scans). This adds more redundant work than that commit
did, since we have to scan the entire index FSM each time rather than being
able to localize what needs to be updated; but it still seems worthwhile.
However, we can buy something back by not touching the FSM at all when
there are no pages that can be put in it. That will result in slower
recovery from corrupt upper FSM pages in such a scenario, but it doesn't
seem like that's a case we need to optimize for.
Hash indexes don't use FSM at all. GIN, GiST, and bloom indexes update
FSM during the vacuumcleanup phase not bulkdelete, so that doing something
comparable to this would be a much more invasive change, and it's not clear
it's worth it. BRIN indexes do things sufficiently differently that this
change doesn't apply to them, either.
Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me
Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
Unlike the previous coding, this might result in a Gather per Append
subplan when the target list is parallel-restricted, but such a plan
is probably worth considering in that case, since a single Gather
on top of the entire Append is impossible.
Per Andres Freund and the buildfarm.
Discussion: http://postgr.es/m/20180330050351.bmxx4cdtz67czjda@alap3.anarazel.de
Predicate locks are used on per page basis only if fastupdate = off, in
opposite case predicate lock on pending list will effectively lock whole index,
to reduce locking overhead, just lock a relation. Entry and posting trees are
essentially B-tree, so locks are acquired on leaf pages only.
Author: Shubham Barai with some editorization by me and Dmitry Ivanov
Review by: Alexander Korotkov, Dmitry Ivanov, Fedor Sigaev
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPt5sWW+EwTaKUGFL5_XFcZ0MuGBcyJ70oqbWqr42YKR8Q@mail.gmail.com
If the toplevel scan/join target list is parallel-safe, postpone
generating Gather (or Gather Merge) paths until after the toplevel has
been adjusted to return it. This (correctly) makes queries with
expensive functions in the target list more likely to choose a
parallel plan, since the cost of the plan now reflects the fact that
the evaluation will happen in the workers rather than the leader.
The original complaint about this problem was from Jeff Janes.
If the toplevel scan/join relation is partitioned, recursively apply
the changes to all partitions. This sometimes allows us to get rid of
Result nodes, because Append is not projection-capable but its
children may be. It also cleans up what appears to be incorrect SRF
handling from commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old
code had no knowledge of SRFs for child scan/join rels.
Because we now use create_projection_path() in some cases where we
formerly used apply_projection_to_path(), this changes the ordering
of columns in some queries generated by postgres_fdw. Update
regression outputs accordingly.
Patch by me, reviewed by Amit Kapila and by Ashutosh Bapat. Other
fixes for this problem (substantially different from this version)
were reviewed by Dilip Kumar, Amit Khandekar, and Marina Polyakova.
Discussion: http://postgr.es/m/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com
Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths. Instead, do the work in
grouping_planner. By itself, this gains nothing; in fact it loses
slightly because we end up calling set_cheapest() for the topmost
scan/join rel twice rather than once. However, it paves the way for
a future commit which will postpone generate_gather_paths for the
topmost scan/join relation even further, allowing more accurate
costing of parallel paths.
Amit Kapila and Robert Haas. Earlier versions of this patch (which
different substantially) were reviewed by Dilip Kumar, Amit
Khandekar, Marina Polyakova, and Ashutosh Bapat.
We sometimes insert a ProjectionPath into a plan tree when projection
is not strictly required. The existing code already arranges to avoid
emitting a Result node when the ProjectionPath's subpath can perform
the projection itself, but previously it didn't consider the
possibility that the parent node might not actually require the
projection to be performed at all.
Skipping projection when it's not required can not only avoid Result
nodes that aren't needed, but also avoid losing the "physical tlist"
optimization unneccessarily.
Patch by me, reviewed by Amit Kapila.
Discussion: http://postgr.es/m/CA+TgmoakT5gmahbPWGqrR2nAdFOMAOnOXYoWHRdVfGWs34t6_A@mail.gmail.com
Just noticed that these were quite redundant, since we're holding the
page address in a local variable anyway, and we have pin on the buffer
throughout.
Also improve a comment.
FreeSpaceMapVacuumRange has the same effect, is more efficient if many
pages are involved, and makes fewer assumptions about how it's used.
Notably, Claudio Freire pointed out that UpdateFreeSpaceMap could fail
if the specified freespace value isn't the maximum possible. This isn't
a problem for the single existing user, but the function represents an
attractive nuisance IMO, because it's named as though it were a
general-purpose update function and its limitations are undocumented.
In any case we don't need multiple ways to get the same result.
In passing, do some code review and cleanup in RelationAddExtraBlocks.
In particular, I see no excuse for it to omit the PageIsNew safety check
that's done in the mainline extension path in RelationGetBufferForTuple.
Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
VACUUM updates leaf-level FSM entries immediately after cleaning the
corresponding heap blocks. fsmpage.c updates the intra-page search trees
on the leaf-level FSM pages when this happens, but it does not touch the
upper-level FSM pages, so that the released space might not actually be
findable by searchers. Previously, updating the upper-level pages happened
only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum()
call. This is bad because the VACUUM might get canceled before ever
reaching that point, so that from the point of view of searchers no space
has been freed at all, leading to table bloat.
We can improve matters by updating the upper pages immediately after each
cycle of index-cleaning and heap-cleaning, processing just the FSM pages
corresponding to the range of heap blocks we have now fully cleaned.
This adds a small amount of extra work, since the FSM pages leading down
to each range boundary will be touched twice, but it's pretty negligible
compared to everything else going on in a large VACUUM.
If there are no indexes, VACUUM doesn't work in cycles but just cleans
each heap page on first visit. In that case we just arbitrarily update
upper FSM pages after each 8GB of heap. That maintains the goal of not
letting all this work slide until the very end, and it doesn't seem worth
expending extra complexity on a case that so seldom occurs in practice.
In either case, the FSM is fully up to date before any attempt is made
to truncate the relation, so that the most likely scenario for VACUUM
cancellation no longer results in out-of-date upper FSM pages. When
we do successfully truncate, adjusting the FSM to reflect that is now
fully handled within FreeSpaceMapTruncateRel.
Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me
Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
Add explicit cast from scalar jsonb to all numeric and bool types. It would be
better to have cast from scalar jsonb to text too but there is already a cast
from jsonb to text as just text representation of json. There is no way to have
two different casts for the same type's pair.
Bump catalog version
Author: Anastasia Lubennikova with editorization by Nikita Glukhov and me
Review by: Aleksander Alekseev, Nikita Glukhov, Darafei Praliaskouski
Discussion: https://www.postgresql.org/message-id/flat/0154d35a-24ae-f063-5273-9ffcdf1c7f2e@postgrespro.ru
Previously, committing or aborting inside a cursor loop was prohibited
because that would close and remove the cursor. To allow that,
automatically convert such cursors to holdable cursors so they survive
commits or rollbacks. Portals now have a new state "auto-held", which
means they have been converted automatically from pinned. An auto-held
portal is kept on transaction commit or rollback, but is still removed
when returning to the main loop on error.
This supports all languages that have cursor loop constructs: PL/pgSQL,
PL/Python, PL/Perl.
Reviewed-by: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>
As promised in earlier commits, this adds documentation about the new
build options, the new GUCs, about the planner logic when JIT is used,
and the benefits of JIT in general.
Also adds a more implementation oriented README.
I'm sure we're going to want to expand this further, but I think this
is a reasonable start.
Author: Andres Freund, with contributions by Thomas Munro
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
This just shows a few details about JITing, e.g. how many functions
have been JITed, and how long that took. To avoid noise in regression
tests with functions sometimes being JITed in --with-llvm builds,
disable display when COSTS OFF is specified.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
This provides infrastructure to allow JITed code to inline code
implemented in C. This e.g. can be postgres internal functions or
extension code.
This already speeds up long running queries, by allowing the LLVM
optimizer to optimize across function boundaries. The optimization
potential currently doesn't reach its full potential because LLVM
cannot optimize the FunctionCallInfoData argument fully away, because
it's allocated on the heap rather than the stack. Fixing that is
beyond what's realistic for v11.
To be able to do that, use CLANG to convert C code to LLVM bitcode,
and have LLVM build a summary for it. That bitcode can then be used to
to inline functions at runtime. For that the bitcode needs to be
installed. Postgres bitcode goes into $pkglibdir/bitcode/postgres,
extensions go into equivalent directories. PGXS has been modified so
that happens automatically if postgres has been compiled with LLVM
support.
Currently this isn't the fastest inline implementation, modules are
reloaded from disk during inlining. That's to work around an apparent
LLVM bug, triggering an apparently spurious error in LLVM assertion
enabled builds. Once that is resolved we can remove the superfluous
read from disk.
Docs will follow in a later commit containing docs for the whole JIT
feature.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
When compiling with clang glibc's definition of isinf() ends up
leading to and external libc function call. That's because there was a
bug in the builtin in an old gcc version, and clang claims
compatibility with an older version. That causes clang to be
measurably slower for floating point heavy workloads than gcc.
To fix simply redirect isinf when using clang and clang confirms it
has __builtin_isinf().
The target cluster that was rewound needs to perform recovery from
the checkpoint created at failover, which leads it to remove or recreate
some files and directories that may have been copied from the source
cluster. So pg_rewind can skip synchronizing such files and directories,
and which reduces the amount of data transferred during a rewind
without changing the usefulness of the operation.
Author: Michael Paquier
Reviewed-by: Anastasia Lubennikova, Stephen Frost and me
Discussion: https://postgr.es/m/20180205071022.GA17337@paquier.xyz
After processing the filemap to build the list of chunks that will be
fetched from the source to rewing the target server, it is possible that
a file which was previously processed is removed from the source. A
simple example of such an occurence is a WAL segment which gets recycled
on the target in-between. When the filemap is processed, files not
categorized as relation files are first truncated to prepare for its
full copy of which is going to be taken from the source, divided into a
set of junks. However, for a recycled WAL segment, this would result in
a segment which has a zero-byte size. With such an empty file,
post-rewind recovery thinks that records are saved but they are actually
not because of the truncation which happened when processing the
filemap, resulting in data loss.
In order to fix the problem, make sure that files which are found as
removed on the source when receiving chunks of them are as well deleted
on the target server for consistency.
Back-patch to 9.5 where pg_rewind was added.
Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier
Reported-by: Tsunakawa Takayuki
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8DAAA2%40G01JPEXMBYT05
So far, a nested CALL or DO in PL/pgSQL would not establish a context
where transaction control statements were allowed. This fixes that by
handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic
execution context through and doing the required management around
transaction boundaries.
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
tuplesort_gettupleslot() passed back tuples allocated in the tuplesort's
own memory context, even when the caller was responsible to free them.
This created a double-free hazard, because some callers might destroy
the tuplesort object (via tuplesort_end) before trying to clean up the
last returned tuple. To avoid this, change the API to specify that the
tuple is allocated in the caller's memory context. v10 and HEAD already
did things that way, but in 9.5 and 9.6 this is a live bug that can
demonstrably cause crashes with some grouping-set usages.
In 9.5 and 9.6, this requires doing an extra tuple copy in some cases,
which is unfortunate. But the amount of refactoring needed to avoid it
seems excessive for a back-patched change, especially since the cases
where an extra copy happens are less performance-critical.
Likewise change tuplesort_getdatum() to return pass-by-reference Datums
in the caller's context not the tuplesort's context. There seem to be
no live bugs among its callers, but clearly the same sort of situation
could happen in future.
For other tuplesort fetch routines, continue to allocate the memory in
the tuplesort's context. This is a little inconsistent with what we now
do for tuplesort_gettupleslot() and tuplesort_getdatum(), but that's
preferable to adding new copy overhead in the back branches where it's
clearly unnecessary. These other fetch routines provide the weakest
possible guarantees about tuple memory lifespan from v10 on, anyway,
so this actually seems more consistent overall.
Adjust relevant comments to reflect these API redefinitions.
Arguably, we should change the pre-9.5 branches as well, but since
there are no known failure cases there, it seems not worth the risk.
Peter Geoghegan, per report from Bernd Helmle. Reviewed by Kyotaro
Horiguchi; thanks also to Andreas Seltenreich for extracting a
self-contained test case.
Discussion: https://postgr.es/m/1512661638.9720.34.camel@oopsware.de
Store GID of 2PC in commit/abort WAL records when wal_level = logical.
This allows logical decoding to send the SAME gid to subscribers
across restarts of logical replication.
Track relica origin replay progress for 2PC.
(Edited from patch 0003 in the logical decoding 2PC series.)
Authors: Nikhil Sontakke, Stas Kelvich
Reviewed-by: Simon Riggs, Andres Freund
Resolve build farm failures from c203d6cf81,
diagnosed by Tom Lane.
The output of pg_stat_get_xact_tuples_hot_updated() and friends
is not guaranteed to show anything after the transaction completes.
Data is flushed slowly to stats collector, so using them can
give timing issues.
Instead using memset to set tts_isnull, call the new
slot_getmissingattrs().
Also fix a bug (= instead of >=) in the code generation. Normally = is
correct, but when repeatedly deforming fields not in a
tuple (e.g. deform up to natts + 1 and then natts + 2) >= is needed.
Discussion: https://postgr.es/m/20180328010053.i2qvsuuusst4lgmc@alap3.anarazel.de
Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.
Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.
The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g. catalog relations) NULL can be passed for this
argument.
Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.
Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com
It seems that all buildfarm members are now using the <stdbool.h> code
path, so that none of them report "bool" as a typedef. We still need it
to be treated that way, so adjust pgindent to force that whether or not
it's in the given list.
Also, the recent introduction of LLVM infrastructure has caused the
appearance of some typedef names that we definitely *don't* want
treated as typedefs, such as "string" and "abs". Extend the existing
blacklist to include these. (Additions based on comparing v10's
typedefs list to what the buildfarm is currently emitting.)
Rearrange the code so that the lists of whitelisted/blacklisted
names are a bit easier to find and modify.
Andrew Dunstan and Tom Lane
Discussion: https://postgr.es/m/28690.1521912334@sss.pgh.pa.us
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
If the value of an index expression is unchanged after UPDATE,
allow HOT updates where previously we disallowed them, giving
a significant performance boost in those cases.
Particularly useful for indexes such as JSON->>field where the
JSON value changes but the indexed value does not.
Submitted as "surjective indexes" patch, now enabled by use
of new "recheck_on_update" parameter.
Author: Konstantin Knizhnik
Reviewer: Simon Riggs, with much wordsmithing and some cleanup
Previously, PQhost didn't return the connected host details when the
connection type was CHT_HOST_ADDRESS (i.e., via hostaddr). Instead, it
returned the complete host connection parameter (which could contain
multiple hosts) or the default host details, which was confusing and
arguably incorrect.
Change this to return the actually connected host or hostaddr
irrespective of the connection type. When hostaddr but no host was
specified, hostaddr is now returned. Never return the original host
connection parameter, and document that PQhost cannot be relied on
before the connection is established.
PQport is similarly changed to always return the active connection port
and never the original connection parameter.
Author: Hari Babu <kommi.haribabu@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Add page-level predicate locking, due to gist's code organization, patch seems
close to trivial: add check before page changing, add predicate lock before page
scanning. Although choosing right place to check is not simple: it should not
be called during index build, it should support insertion of new downlink and so
on.
Author: Shubham Barai with editorization by me and Alexander Korotkov
Reviewed by: Alexander Korotkov, Andrey Borodin, me
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPtdcANpw5ePU3LvnTP8HCENFw6wygupQAyNBgD-sG3h0g@mail.gmail.com
This is mostly done to be able to validate features and fixes
submitted to LLVM. Given the size of these changes that seems
acceptable.
Author: Andres Freund
Due to the differing APIs between versions, I forgot to deallocate the
generated module in older LLVM versions, leading to a memory leak.
Author: Andres Freund
Performing JIT compilation for deforming gains performance benefits
over unJITed deforming from compile-time knowledge of the tuple
descriptor. Fixed column widths, NOT NULLness, etc can be taken
advantage of.
Right now the JITed deforming is only used when deforming tuples as
part of expression evaluation (and obviously only if the descriptor is
known). It's likely to be beneficial in other cases, too.
By default tuple deforming is JITed whenever an expression is JIT
compiled. There's a separate boolean GUC controlling it, but that's
expected to be primarily useful for development and benchmarking.
Docs will follow in a later commit containing docs for the whole JIT
feature.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
Setting random could increase reproducibility of test in some cases. Patch
suggests three providers for seed: time (default), strong random
generator (if available) and unsigned constant. Seed could be set from
command line or enviroment variable.
Author: Fabien Coelho
Reviewed by: Chapman Flack
Discussion: https://www.postgresql.org/message-id/flat/20160407082711.q7iq3ykffqxcszkv@alap3.anarazel.de
The listed numbers disagreed with the ones being used in the symbols;
but instead of just fixing the numbers in the comment, use the symbolic
name instead, which seems clearer.
This has been wrong all along, so apply back to 9.5 where BRIN was
introduced.
Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/5ff514f2-8b1e-6366-b11c-8e2ed442562d@2ndquadrant.com
Test 'triggers' fails when another one creates triggers concurrently at
some precise time, because of a missing WHERE clause.
Per buildfarm members snapper, desmoxytes.
Commit eb7ed3f306 enabled unique constraints on partitioned tables,
but one thing that was not working properly is INSERT/ON CONFLICT.
This commit introduces a new node keeps state related to the ON CONFLICT
clause per partition, and fills it when that partition is about to be
used for tuple routing.
Author: Amit Langote, Álvaro Herrera
Reviewed-by: Etsuro Fujita, Pavan Deolasee
Discussion: https://postgr.es/m/20180228004602.cwdyralmg5ejdqkq@alvherre.pgsql
Remember the last page of an index insert if it's the rightmost leaf
page. If the next entry belongs on and can fit in the remembered page,
insert the new entry there as long as we can get a lock on the page.
Otherwise, fall back on the more expensive method of searching for
the right place to insert the entry.
This provides a performance improvement for the common case where an
index entry is for monotonically increasing or nearly monotonically
increasing value such as an identity field or a current timestamp.
Pavan Deolasee
Reviewed by Claudio Freire, Simon Riggs and Peter Geoghegan
Discussion: https://postgr.es/m/CABOikdM9DrupjyKZZFM5k8-0RCDs1wk6JzEkg7UgSW6QzOwMZw@mail.gmail.com
Commit 8694cc96b did this randomly differently from other callers of
parse_filename_for_nontemp_relation(). Perhaps unsurprisingly,
the randomly different way is wrong; it fails to ensure the
extracted string is null-terminated. Per buildfarm member skink.
Discussion: https://postgr.es/m/14453.1522001792@sss.pgh.pa.us
This adds a new option --wal-segsize (analogous to initdb) that changes
the WAL segment size in pg_control.
Author: Nathan Bossart <bossartn@amazon.com>
Coverity complained that this check is pointless, and it's right.
There is no case where we'd call ExecutorStart with a null plannedstmt,
and if we did, it'd have crashed before here. Thinko in commit cc415a56d.
If random() returns a result sufficiently close to zero, float8out
switches to scientific notation, breaking this test case's expectation
that the output should look like '0.xxxxxxxxx'. Casting to numeric
should fix that. Per buildfarm member pogona.
Discussion: https://postgr.es/m/20180324212502.wt4serghfidge2on@alap3.anarazel.de
We were running out of good single-letter options for some upcoming
pg_resetwal functionality, so add long options to create more
possibilities. Add to pg_controldata as well for symmetry.
based on patch by Bossart, Nathan <bossartn@amazon.com>
In the case that PostgreSQL uses stdbool.h but Perl doesn't, we need to
prevent Perl from defining bool, to prevent compiler warnings about
redefinition.
For years, our makefiles have correctly observed that "there is no correct
way to write a rule that generates two files". However, what we did is to
provide empty rules that "generate" the secondary output files from the
primary one, and that's not right either. Depending on the details of
the creating process, the primary file might end up timestamped later than
one or more secondary files, causing subsequent make runs to consider the
secondary file(s) out of date. That's harmless in a plain build, since
make will just re-execute the empty rule and nothing happens. But it's
fatal in a VPATH build, since make will expect the secondary file to be
rebuilt in the build directory. This would manifest as "file not found"
failures during VPATH builds from tarballs, if we were ever unlucky enough
to ship a tarball with apparently out-of-date secondary files. (It's not
clear whether that has ever actually happened, but it definitely could.)
To ensure that secondary output files have timestamps >= their primary's,
change our makefile convention to be that we provide a "touch $@" action
not an empty rule. Also, make sure that this rule actually gets invoked
during a distprep run, else the hazard remains.
It's been like this a long time, so back-patch to all supported branches.
In HEAD, I skipped the changes in src/backend/catalog/Makefile, because
those rules are due to get replaced soon in the bootstrap data format
patch, and there seems no need to create a merge issue for that patch.
If for some reason we fail to land that patch in v11, we'll need to
back-fill the changes in that one makefile from v10.
Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us
Revert the PL/Perl-specific change in
9a95a77d9d. We must not prevent Perl from
using stdbool.h when it has been built to do so, even if it uses an
incompatible size. Otherwise, we would be imposing our bool on Perl,
which will lead to crashes because of the size mismatch.
Instead, we undef bool after including the Perl headers, as we did
previously, but now only if we are not using stdbool.h ourselves.
Record that choice in c.h as USE_STDBOOL. This will also make it easier
to apply that coding pattern elsewhere if necessary.
Previously, FOR EACH ROW triggers were not allowed in partitioned
tables. Now we allow AFTER triggers on them, and on trigger creation we
cascade to create an identical trigger in each partition. We also clone
the triggers to each partition that is created or attached later.
This means that deferred unique keys are allowed on partitioned tables,
too.
Author: Álvaro Herrera
Reviewed-by: Peter Eisentraut, Simon Riggs, Amit Langote, Robert Haas,
Thomas Munro
Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql