31898 Commits

Author SHA1 Message Date
Alvaro Herrera 9fdb675fc5 Faster partition pruning
Add a new module backend/partitioning/partprune.c, implementing a more
sophisticated algorithm for partition pruning.  The new module uses each
partition's "boundinfo" for pruning instead of constraint exclusion,
based on an idea proposed by Robert Haas of a "pruning program": a list
of steps generated from the query quals which are run iteratively to
obtain a list of partitions that must be scanned in order to satisfy
those quals.
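
As an illustration of the "pruning program" idea only, here is a minimal Python sketch; it is not the partprune.c implementation. It assumes hypothetical range partitions given as [lower, upper) bounds on an integer key and two made-up step kinds: an "op" step that keeps the partitions whose bounds can satisfy one comparison, and a "combine" step that intersects or unions the results of earlier steps.

```python
# Toy model of a "pruning program": steps run in order, each yielding a set
# of partition indexes. All names here are illustrative, not PostgreSQL's.

# Hypothetical range partitions: index -> [lower, upper) bound on an int key.
BOUNDS = [(0, 10), (10, 20), (20, 30), (30, 40)]


def op_step(op, value):
    """Return indexes of partitions that may hold rows with `key op value`."""
    keep = set()
    for i, (lo, hi) in enumerate(BOUNDS):
        if op == "=" and lo <= value < hi:
            keep.add(i)
        elif op == "<" and lo < value:
            keep.add(i)
        elif op == ">=" and hi > value:
            keep.add(i)
    return keep


def run_program(steps):
    """Execute the steps in order; the last step's result is the scan list."""
    results = []
    for step in steps:
        if step[0] == "op":
            _, op, value = step
            results.append(op_step(op, value))
        else:  # ("combine", "and" | "or", [indexes of earlier steps])
            _, how, sources = step
            sets = [results[s] for s in sources]
            if how == "and":
                results.append(set.intersection(*sets))
            else:
                results.append(set.union(*sets))
    return sorted(results[-1])


# WHERE key >= 12 AND key < 25
program = [("op", ">=", 12), ("op", "<", 25), ("combine", "and", [0, 1])]
print(run_program(program))
```

Only the partitions returned by the final step would need to be scanned; the bounds alone decide this, without testing each partition's constraint against the quals.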

At present, this targets planner-time partition pruning, but there exist
further patches to apply partition pruning at execution time as well.

This commit also moves some definitions from include/catalog/partition.h
to a new file include/partitioning/partbounds.h, in an attempt to
rationalize partitioning related code.

Authors: Amit Langote, David Rowley, Dilip Kumar
Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen.
Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
2018-04-06 16:44:05 -03:00
Stephen Frost 11523e860f Support new default roles with adminpack
This provides a newer version of adminpack which works with the newly
added default roles to support GRANT'ing to non-superusers access to
read and write files, along with related functions (unlinking files,
getting file length, renaming/removing files, scanning the log file
directory) which are supported through adminpack.

Note that new versions of the functions are required because an
environment might have an updated version of the library but still have
the old adminpack 1.0 catalog definitions (where EXECUTE is GRANT'd to
PUBLIC for the functions).

This patch also removes the long-deprecated alternative names for
functions that adminpack used to include and which are now included in
the backend, in adminpack v1.1.  Applications using the deprecated names
should be updated to use the backend functions instead.  Existing
installations which continue to use adminpack v1.0 should continue to
function until/unless adminpack is upgraded.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
2018-04-06 14:47:10 -04:00
Stephen Frost 0fdc8495bf Add default roles for file/program access
This patch adds new default roles named 'pg_read_server_files',
'pg_write_server_files', 'pg_execute_server_program' which
allow an administrator to GRANT to a non-superuser role the ability to
access server-side files or run programs through PostgreSQL (as the user
the database is running as).  Having one of these roles allows a
non-superuser to use server-side COPY to read, write, or with a program,
and to use file_fdw (if installed by a superuser and GRANT'd USAGE on
it) to read from files or run a program.

The existing misc file functions are also changed to allow a user with
the 'pg_read_server_files' default role to read any files on the
filesystem, matching the privileges given to that role through COPY and
file_fdw from above.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
2018-04-06 14:47:10 -04:00
Stephen Frost e79350fef2 Remove explicit superuser checks in favor of ACLs
This removes the explicit superuser checks in the various file-access
functions in the backend, specifically pg_ls_dir(), pg_read_file(),
pg_read_binary_file(), and pg_stat_file().  Instead, EXECUTE is REVOKE'd
from public for these, meaning that only a superuser is able to run them
by default, but access to them can be GRANT'd to other roles.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
2018-04-06 14:47:10 -04:00
Peter Eisentraut 94c1f9ba11 Add memory context identifier to portal context
Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us
2018-04-06 12:37:54 -04:00
Peter Eisentraut bbca77623f Rename MemoryContextCopySetIdentifier() for clarity
MemoryContextCopySetIdentifier -> MemoryContextCopyAndSetIdentifier

Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us
2018-04-06 12:37:54 -04:00
Robert Haas cfbecf8100 Enforce child constraints during COPY TO a partitioned table.
The previous coding inadvertently checked the constraints for the
partitioned table rather than the target partition, which could
lead to data in a partition that fails to satisfy some constraint
on that partition.  This problem seems to date back to when
table partitioning was introduced; prior to that, there was only
one target table for a COPY, so the problem didn't occur, and the
code just didn't get updated.

Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat

Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp
2018-04-06 11:42:28 -04:00
Peter Eisentraut bcf79b5bb6 Split the SetSubscriptionRelState function into two
We don't actually need the insert-or-update logic, so it's clearer to
have separate functions for the inserting and updating.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
2018-04-06 10:00:26 -04:00
Peter Eisentraut c25304a945 Improve messaging during logical replication worker startup
In case the subscription is removed before the worker is fully started,
give a specific error message instead of the generic "cache lookup"
error.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
2018-04-06 09:07:09 -04:00
Peter Eisentraut 2cd6520e78 Fix compiler warning about format truncation 2018-04-06 08:43:50 -04:00
Simon Riggs f1464c5380 Improve parse representation for MERGE
Separation of parser data structures from executor, as
requested by Tom Lane. Further improvements possible.

While there, implement error for multiple VALUES clauses via parser
to allow line number of error, as requested by Andres Freund.

Author: Pavan Deolasee

Discussion: https://www.postgresql.org/message-id/CABOikdPpqjectFchg0FyTOpsGXyPoqwgC==OLKWuxgBOsrDDZw@mail.gmail.com
2018-04-06 09:38:59 +01:00
Magnus Hagander 3b0b4f31f7 Attempt to fix win32 build of pg_verify_checksums
S_ISLNK doesn't exist on Win32, instead we should use
pgwin32_is_junction().
2018-04-05 22:38:03 +02:00
Magnus Hagander 1fde38beaa Allow on-line enabling and disabling of data checksums
This makes it possible to turn checksums on in a live cluster, without
the previous need for dump/reload or logical replication (and to turn it
off).

Enabling checksums starts a background process in the form of a
launcher/worker combination that goes through the entire database and
recalculates checksums on each and every page. Only when all pages have
been checksummed are they fully enabled in the cluster. Any failure of
the process will revert to checksums off and the process has to be
started again.

This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.

Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
2018-04-05 22:04:48 +02:00
Peter Eisentraut b981275b65 PL/pgSQL: Add support for SET TRANSACTION
A normal SQL command run inside PL/pgSQL acquires a snapshot, but SET
TRANSACTION does not work anymore if a snapshot is set.  So we have to
handle this separately.

Reviewed-by: Alexander Korotkov <a.korotkov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
2018-04-05 15:30:24 -04:00
Simon Riggs 530e69e59b Allow cpluspluscheck to pass by renaming variable
Use of a C++ keyword as a function name caused problems

Reported-by: Álvaro Herrera
2018-04-05 20:06:02 +01:00
Peter Eisentraut b9986551e0 Fix plan cache issue in PL/pgSQL CALL
If we are not going to save the plan, then we need to unset expr->plan
after we are done, also in error cases.  Otherwise, we get a dangling
pointer next time around.

This is not the ideal solution.  It would be better if we could convince
SPI not to associate a cached plan with a resource owner, and then we
could just save the plan in all cases.  But that would require bigger
surgery.

Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
2018-04-05 14:51:56 -04:00
Magnus Hagander 6a5f796b48 Fix worker_spi for new parameter to initialize connection
Missed in previous commit.

Spotted by Teodor and the buildfarm
2018-04-05 19:14:50 +02:00
Teodor Sigaev 1a8c95365e Remove tsearch test containing Russian characters, missed in
1664ae1978
2018-04-05 20:05:04 +03:00
Magnus Hagander eed1ce72e1 Allow background workers to bypass datallowconn
This adds a "flags" field to the BackgroundWorkerInitializeConnection()
and BackgroundWorkerInitializeConnectionByOid(). For now only one flag,
BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore
datallowconn.
2018-04-05 19:02:45 +02:00
Teodor Sigaev 1664ae1978 Add websearch_to_tsquery
Error-tolerant conversion function with web-like syntax for search queries;
it simplifies building a search engine with an interface close to what users
are used to.

Bump catalog version

Authors: Victor Drobny, Dmitry Ivanov with edits by me
Reviewed by: Aleksander Alekseev, Tomas Vondra, Thomas Munro, Aleksandr Parfenov
Discussion: https://www.postgresql.org/message-id/flat/fe931111ff7e9ad79196486ada79e268@postgrespro.ru
2018-04-05 19:55:11 +03:00
Alvaro Herrera fbc27330b8 Add missing include
Newly added prototype broke cpluspluscheck.

Minor buglet in commit 8694cc96b5.
2018-04-05 12:20:17 -03:00
Teodor Sigaev 0a64b45152 Fix handling of non-upgraded B-tree metapages
857f9c36 bumps the B-tree metapage version, with the upgrade performed "on the fly"
when needed. However, some asserts fired when an old-version metapage was
cached in rel->rd_amcache. Although the new metadata fields are never used from
rel->rd_amcache, that needs to be fixed. This patch upgrades the metadata
when it is cached, filling the unavailable fields with their default
values. contrib/pageinspect is also patched to handle non-upgraded metapages
in the same way.

Author: Alexander Korotkov
2018-04-05 17:56:00 +03:00
Simon Riggs 01b88b4df5 MERGE minor errata 2018-04-05 13:19:13 +01:00
Simon Riggs 3af7b2b0d4 MERGE fix variable warning in non-assert builds
Author: Jesper Pedersen
2018-04-05 13:02:29 +01:00
Teodor Sigaev 17d8beb4f5 Remove unused vars and mark assert-only vars
Kyotaro HORIGUCHI
2018-04-05 13:16:15 +03:00
Teodor Sigaev 51e6562324 Fix typo
Masahiko Sawada
2018-04-05 13:04:18 +03:00
Simon Riggs 4b2d44031f MERGE post-commit review
Review comments from Andres Freund

* Consolidate code into AfterTriggerGetTransitionTable()
* Rename nodeMerge.c to execMerge.c
* Rename nodeMerge.h to execMerge.h
* Move MERGE handling in ExecInitModifyTable()
  into a execMerge.c ExecInitMerge()
* Move mt_merge_subcommands flags into execMerge.h
* Rename opt_and_condition to opt_merge_when_and_condition
* Wordsmith various comments

Author: Pavan Deolasee
Reviewer: Simon Riggs
2018-04-05 09:54:07 +01:00
Andrew Gierth 1fd8690668 Install errcodes.txt for use by extensions.
Maintainers of out-of-tree PLs typically need access to the set of
error codes. To avoid the need to duplicate that information in some
form in PL source trees, provide errcodes.txt as part of a server
installation.

Thomas Munro, based on a suggestion from Andrew Gierth
Discussion: https://postgr.es/m/87woykk7mu.fsf%40news-spur.riddles.org.uk
2018-04-05 04:05:40 +01:00
Alvaro Herrera 7d7c99790b Restore erroneously removed ONLY from PK check
This is a blind fix, since I don't have SE-Linux to verify it.

Per unwanted change in rhinoceros, running sepgsql tests.  Noted by Tom
Lane.

Discussion: https://postgr.es/m/32347.1522865050@sss.pgh.pa.us
2018-04-04 16:38:11 -03:00
Stephen Frost 446f7f5d78 Rewrite pg_dump TAP tests
This reworks how the tests to run are defined.  Instead of having to
define all runs for all tests, we define those tests which should pass
(generally using one of the defined broad hashes), add in any which
should be specific for this test, and exclude any specific runs that
shouldn't pass for this test.  This ends up removing some 4k+ lines
(more than half the file) but, more importantly, greatly simplifies the
way runs-to-be-tested are defined.

As discussed in the updated comments, for example, take the test which
does CREATE TABLE test_table.  That CREATE TABLE should show up in all
'full' runs of pg_dump, except those cases where 'test_table' is
excluded, of course, and that's exactly how the test gets defined now
(modulo a few other related cases, like where we dump only that table,
or we dump the schema it's in, or we exclude the schema it's in):

like => {
    %full_runs,
    %dump_test_schema_runs,
    only_dump_test_table    => 1,
    section_pre_data        => 1, },
unlike => {
    exclude_dump_test_schema => 1,
    exclude_test_table => 1, }, },

Next, we no longer expect every run to be listed for every test.  If a
run is listed in 'like' (directly or through a hash) then it's a 'like',
unless it's listed in 'unlike' in which case it's an 'unlike'.  If it
isn't listed in either, then it's considered an 'unlike' automatically.
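
The classification rule above can be sketched in a few lines of Python (the real tests are Perl; this is only a model of the rule). The run names in `full_runs` are hypothetical placeholders, and merging a hash such as %full_runs into 'like' is modelled with dict unpacking.

```python
def expected_to_match(run, like, unlike):
    """A run is a 'like' only if it is listed in `like` and not in `unlike`.

    Runs listed in neither dict are treated as 'unlike' automatically.
    """
    return bool(like.get(run)) and not unlike.get(run)


# Hypothetical broad hash of run names, standing in for %full_runs.
full_runs = {"clean": 1, "defaults": 1, "exclude_test_table": 1}

like = {**full_runs, "only_dump_test_table": 1}
unlike = {"exclude_test_table": 1}

for run in ("defaults", "only_dump_test_table", "exclude_test_table", "data_only"):
    print(run, expected_to_match(run, like, unlike))
```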

Lastly, this changes the code to no longer use like/unlike but rather to
use 'ok()' with 'diag()' which allows much more control over what gets
spit out to the screen.  Gone are the days of the entire dump being sent
to the console, now you'll just get a couple of lines for each failing
test which say the test that failed and the run that it failed on.

This covers both the pg_dump TAP tests in src/bin/pg_dump and those in
src/test/modules/test_pg_dump.
2018-04-04 15:26:51 -04:00
Tom Lane 1383e2a1a9 Improve FSM management for BRIN indexes.
BRIN indexes like to propagate additions of free space into the upper pages
of their free space maps as soon as the new space is known, even when it's
just on one individual index page.  Previously this required calling
FreeSpaceMapVacuum, which is quite an expensive thing if the map is large.
Use the FreeSpaceMapVacuumRange function recently added by commit c79f6df75
to reduce the amount of work done for this purpose.

Fix a couple of places that neglected to do the upper-page vacuuming at all
after recording new free space.  If the policy is to be that BRIN should do
that, it should do it everywhere.

Do RecordPageWithFreeSpace unconditionally in brin_page_cleanup, and do
FreeSpaceMapVacuum unconditionally in brin_vacuum_scan.  Because of the
FSM's imprecise storage of free space, the old complications here seldom
bought anything, they just slowed things down.  This approach also
provides a predictable path for FSM corruption to be repaired.

Remove premature RecordPageWithFreeSpace call in brin_getinsertbuffer
where it's about to return an extended page to the caller.  The caller
should do that, instead, after it's inserted its new tuple.  Fix the
one caller that forgot to do so.

Simplify logic in brin_doupdate's same-page-update case by postponing
brin_initialize_empty_new_buffer to after the critical section; I see
little point in doing it before.

Avoid repeat calls of RelationGetNumberOfBlocks in brin_vacuum_scan.
Avoid duplicate BufferGetBlockNumber and BufferGetPage calls in
a couple of places where we already had the right values.

Move a BRIN_elog debug logging call out of a critical section; that's
pretty unsafe and I don't think it buys us anything to not wait till
after the critical section.

Move the "*extended = false" step in brin_getinsertbuffer into the
routine's main loop.  There's no actual bug there, since the loop can't
iterate with *extended still true, but it doesn't seem very future-proof
as coded; and it's certainly not documented as a loop invariant.

This is all from follow-on investigation inspired by commit c79f6df75.

Discussion: https://postgr.es/m/5801.1522429460@sss.pgh.pa.us
2018-04-04 14:26:04 -04:00
Alvaro Herrera 3de241dba8 Foreign keys on partitioned tables
Author: Álvaro Herrera
Discussion: https://postgr.es/m/20171231194359.cvojcour423ulha4@alvherre.pgsql
Reviewed-by: Peter Eisentraut
2018-04-04 14:02:49 -03:00
Teodor Sigaev 857f9c36cd Skip full index scan during cleanup of B-tree indexes when possible
Vacuum of an index consists of two stages: multiple (zero or more) ambulkdelete
calls and one amvacuumcleanup call. When the workload on a particular table
is append-only, autovacuum isn't intended to touch this table. However,
a user may run vacuum manually in order to fill the visibility map and get the
benefits of index-only scans. Then ambulkdelete wouldn't be called for the indexes
of such a table (because no heap tuples were deleted); only amvacuumcleanup would
be called. In this case, amvacuumcleanup would perform a full index scan for
two objectives: putting recyclable pages into the free space map and updating index
statistics.

This patch allows btvacuumcleanup to skip the full index scan when two conditions
are satisfied: no pages are going to be put into the free space map and the index
statistics aren't stale. To check the first condition, we store the
oldest btpo_xact in the meta-page. When it precedes RecentGlobalXmin,
there are some recyclable pages. To check the second condition, we store the
number of heap tuples observed during the previous full index scan by cleanup.
If the fraction of newly inserted tuples is less than
vacuum_cleanup_index_scale_factor, the statistics aren't considered to be
stale. vacuum_cleanup_index_scale_factor can be defined as both a reloption and a GUC (default).
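
The two conditions can be sketched as a small Python decision function. This is a simplified model, not the B-tree code: transaction IDs are compared as plain integers (ignoring wraparound), `None` stands for "no deleted pages recorded", and treating a missing previous tuple count as "scan needed" is an assumption of this sketch.

```python
def needs_full_cleanup_scan(oldest_btpo_xact, recent_global_xmin,
                            prev_heap_tuples, heap_tuples, scale_factor):
    """Return True if cleanup should still perform the full index scan."""
    # Condition 1: some deleted pages have become recyclable.
    if oldest_btpo_xact is not None and oldest_btpo_xact < recent_global_xmin:
        return True

    # Assumption: with no usable previous count, fall back to scanning.
    if prev_heap_tuples <= 0:
        return True

    # Condition 2: the statistics are stale once the fraction of newly
    # inserted tuples reaches the scale factor.
    inserted_fraction = (heap_tuples - prev_heap_tuples) / prev_heap_tuples
    return inserted_fraction >= scale_factor


# 5% growth with a 10% threshold and nothing to recycle: the scan is skipped.
print(needs_full_cleanup_scan(None, 1000, 1000, 1050, 0.1))
```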

This patch bumps B-tree meta-page version. Upgrade of meta-page is performed
"on the fly": during VACUUM meta-page is rewritten with new version. No special
handling in pg_upgrade is required.

Author: Masahiko Sawada, Alexander Korotkov
Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov
Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
2018-04-04 19:29:00 +03:00
Heikki Linnakangas 3a5e0a91bb Fix the new ARMv8 CRC code for short and unaligned input.
The code before the main loop, to handle the possible 1-7 unaligned bytes
at the beginning of the input, was broken, and read past the input, if the
input was very short.
2018-04-04 14:40:39 +03:00
Magnus Hagander ee9e145531 Fix pg_basebackup checksum tests
Hopefully fix the fact that these checks are unstable, by introducing
the corruption in a separate table from pg_class, and also explicitly
disable autovacuum on those tables. Also make sure PostgreSQL is
stopped while the corruption is introduced to avoid possible caching
effects.

Author: Michael Banck
2018-04-04 11:37:55 +02:00
Heikki Linnakangas f044d71e33 Use ARMv8 CRC instructions where available.
ARMv8 introduced special CPU instructions for calculating CRC-32C. Use
them, when available, for speed.

Like with the similar Intel CRC instructions, several factors affect
whether the instructions can be used. The compiler intrinsics for them must
be supported by the compiler, and the instructions must be supported by the
target architecture. If the compilation target architecture does not
support the instructions, but adding "-march=armv8-a+crc" makes them
available, then we compile the code with a runtime check to determine if
the host we're running on supports them or not.

For the runtime check, use glibc getauxval() function. Unfortunately,
that's not very portable, but I couldn't find any more portable way to do
it. If getauxval() is not available, the CRC instructions will still be
used if the target architecture supports them without any additional
compiler flags, but the runtime check will not be available.
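
The selection logic described in the two paragraphs above can be summarised as a Python sketch. The function and its three boolean inputs are hypothetical; the actual decision is made by configure tests and C preprocessor symbols, and the "software" fallback label is an assumption standing in for whatever non-ARMv8 implementation is otherwise used.

```python
def choose_crc_impl(works_without_flags, works_with_crc_flag, have_getauxval):
    """Pick a CRC-32C strategy from three build-time facts.

    works_without_flags: intrinsics compile for the target as-is
    works_with_crc_flag: intrinsics compile only with -march=armv8-a+crc
    have_getauxval:      getauxval() is available for the runtime check
    """
    if works_without_flags:
        # The target guarantees the instructions; no runtime check is needed.
        return "armv8"
    if works_with_crc_flag and have_getauxval:
        # Built with the extra flag; the host CPU is probed at run time.
        return "armv8-with-runtime-check"
    # Without a way to probe the host, the instructions cannot be used safely.
    return "software"


print(choose_crc_impl(False, True, True))
```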

Original patch by Yuqi Gu, heavily modified by me. Reviewed by Andres
Freund, Thomas Munro.

Discussion: https://www.postgresql.org/message-id/HE1PR0801MB1323D171938EABC04FFE7FA9E3110%40HE1PR0801MB1323.eurprd08.prod.outlook.com
2018-04-04 12:22:45 +03:00
Heikki Linnakangas 638a199fa9 Also fix the descriptions in pg_config.h.win32.
I missed pg_config.h.win32 in the previous commit that fixed these in
pg_config.h.in.
2018-04-04 11:33:39 +03:00
Heikki Linnakangas 8989f52b1b Fix incorrect description of USE_SLICING_BY_8_CRC32C.
And a typo in the description of USE_SSE42_CRC32C_WITH_RUNTIME_CHECK,
spotted by Daniel Gustafsson.
2018-04-04 11:20:53 +03:00
Alvaro Herrera 851f4b4e14 Don't clone internal triggers to partitions
Trigger cloning to partitions was supposed to occur for user-visible
triggers only, but during development the protection that prevented it
from occurring to internal triggers was lost.  Reinstate it, as well as
add a test case to ensure internal triggers (in the tested case,
triggers implementing a deferred unique constraint) are not cloned.
Without the code fix, the partitions in the test end up with different
numbers of triggers, which is clearly wrong ...

Bug in 86f575948c.

Discussion: https://postgr.es/m/20180403214903.ozfagwjcpk337uw7@alvherre.pgsql
2018-04-03 19:08:25 -03:00
Andres Freund 2b3031559a Fix GCC 7 snprintf() compiler warning.
Make buffer 1 byte larger to fit a sign.  It's actually impossible for
there to be a sign in practice, but this is still required to keep GCC 7
happy.

Cleanup from commit 51bc271790.

Based on a suggestion from Peter Eisentraut.

Author: Peter Geoghegan
Reported-By: Peter Eisentraut
Discussion: https://postgr.es/m/d1cc82ed-d07d-cef2-7c00-2e987f121648@2ndquadrant.com
2018-04-03 14:08:41 -07:00
Alvaro Herrera cd5005bc12 Pass correct TupDesc to ri_NullCheck() in Assert
Previous coding was passing the wrong table's tuple descriptor, which
accidentally fails to fail because no existing test case exercises a
foreign key in which the referenced attributes are further to the right
of the referencing attributes.

Add a test so that further breakage is visible.

This got broken in 16828d5c02.

Discussion: https://postgr.es/m/20180403204723.fqte755nukgm42uf@alvherre.pgsql
2018-04-03 18:04:50 -03:00
Tom Lane dddfc4cb2e Prevent accidental linking of system-supplied copies of libpq.so etc.
We were being careless in some places about the order of -L switches in
link command lines, such that -L switches referring to external directories
could come before those referring to directories within the build tree.
This made it possible to accidentally link a system-supplied library, for
example /usr/lib/libpq.so, in place of the one built in the build tree.
Hilarity ensued, the more so the older the system-supplied library is.

To fix, break LDFLAGS into two parts, a sub-variable LDFLAGS_INTERNAL
and the main LDFLAGS variable, both of which are "recursively expanded"
so that they can be incrementally adjusted by different makefiles.
Establish a policy that -L switches for directories in the build tree
must always be added to LDFLAGS_INTERNAL, while -L switches for external
directories must always be added to LDFLAGS.  This is sufficient to
ensure a safe search order.  For simplicity, we typically also put -l
switches for the respective libraries into those same variables.
(Traditional make usage would have us put -l switches into LIBS, but
cleaning that up is a project for another day, as there's no clear
need for it.)
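
A small Python model of why the split yields a safe search order, assuming the linker searches -L directories in command-line order. The directory names and their contents are hypothetical, and the helper functions are illustrative rather than part of the makefiles.

```python
def link_flags(ldflags_internal, ldflags):
    """Build-tree switches always precede switches for external directories."""
    return ldflags_internal + ldflags


def first_dir_with(lib, flags, contents):
    """Return the first -L directory, in order, that contains `lib`."""
    for flag in flags:
        if flag.startswith("-L") and lib in contents.get(flag[2:], ()):
            return flag[2:]
    return None


# Hypothetical layout: both directories hold a libpq.so.
contents = {
    "../../src/interfaces/libpq": {"libpq.so"},
    "/usr/lib": {"libpq.so", "libssl.so"},
}
internal = ["-L../../src/interfaces/libpq", "-lpq"]
external = ["-L/usr/lib", "-lssl"]

flags = link_flags(internal, external)
print(first_dir_with("libpq.so", flags, contents))
```

With the external switches placed first instead, the same lookup would resolve libpq.so to /usr/lib, which is the accident the commit guards against.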

This turns out to also require separating SHLIB_LINK into two variables,
SHLIB_LINK and SHLIB_LINK_INTERNAL, with a similar rule about which
switches go into which variable.  And likewise for PG_LIBS.

Although this change might appear to affect external users of pgxs.mk,
I think it doesn't; they shouldn't have any need to touch the _INTERNAL
variables.

In passing, tweak src/common/Makefile so that the value of CPPFLAGS
recorded in pg_config lacks "-DFRONTEND" and the recorded value of
LDFLAGS lacks "-L../../../src/common".  Both of those things are
mistakes, apparently introduced during prior code rearrangements,
as old versions of pg_config don't print them.  In general we don't
want anything that's specific to the src/common subdirectory to
appear in those outputs.

This is certainly a bug fix, but in view of the lack of field
complaints, I'm unsure whether it's worth the risk of back-patching.
In any case it seems wise to see what the buildfarm makes of it first.

Discussion: https://postgr.es/m/25214.1522604295@sss.pgh.pa.us
2018-04-03 16:26:05 -04:00
Bruce Momjian 242408dbef C comment: mention null handling in BuildTupleFromCStrings()
Discussion: https://postgr.es/m/CAFjFpRcF-wNbe0w-m3NpkEwr9shmOZ=GoESOzd2Wog9h55J8sA@mail.gmail.com

Author: Ashutosh Bapat
2018-04-03 14:01:14 -04:00
Teodor Sigaev 710d90da1f Add prefix operator for TEXT type.
The prefix operator along with SP-GiST indexes can be used as an alternative
to LIKE 'word%' queries, and it doesn't have the limitation on string/prefix
length that B-Tree has.

Bump catalog version

Author: Ildus Kurbangaliev with some edits by me
Review by: Arthur Zakirov, Alexander Korotkov, and me
Discussion: https://www.postgresql.org/message-id/flat/20180202180327.222b04b3@wp.localdomain
2018-04-03 19:46:45 +03:00
Peter Eisentraut 4ab2999815 Attempt to fix jsonb_plperl build on Windows 2018-04-03 10:43:41 -04:00
Magnus Hagander 10d62d1065 Properly use INT64_FORMAT in output
Per buildfarm animal prairiedog, suggested solution from Tom.
2018-04-03 16:39:29 +02:00
Magnus Hagander a08dc71195 Fix for checksum validation patch
Reorder the check for non-BLCKSZ size reads to make sure we don't abort
sending the file in this case.

Missed in the previous commit.
2018-04-03 13:57:49 +02:00
Magnus Hagander 4eb77d50c2 Validate page level checksums in base backups
When base backups are run over the replication protocol (for example
using pg_basebackup), verify the checksums of all data blocks if
checksums are enabled. If checksum failures are encountered, log them
as warnings but don't abort the backup.

This becomes the default behaviour in pg_basebackup (provided checksums
are enabled on the server), so add a switch (-k) to disable the checks
if necessary.

Author: Michael Banck
Reviewed-By: Magnus Hagander, David Steele
Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de
2018-04-03 13:47:16 +02:00
Simon Riggs 4923550c20 Tab completion for MERGE
Author: Pavan Deolasee
2018-04-03 12:18:25 +01:00
Simon Riggs aa3faa3c7a WITH support in MERGE
Author: Peter Geoghegan
Recursive support removed, no tests
Docs added by me
2018-04-03 12:13:59 +01:00
Simon Riggs 83454e3c2b New files for MERGE 2018-04-03 10:22:21 +01:00
Simon Riggs d204ef6377 MERGE SQL Command following SQL:2016
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows,
a task that would otherwise require multiple PL statements.
e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;
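
To make the effect of the WHEN clauses concrete, here is a toy Python model of the statement above. It is a sketch of the semantics only: the target is a dict from tid to balance, the source is a list of (sid, delta) pairs, and it assumes each sid appears at most once in the source. For each source row, the first WHEN clause whose condition holds is the one applied.

```python
def merge(target, source):
    """Apply the example MERGE to `target` (tid -> balance) in place."""
    for sid, delta in source:
        if sid in target:
            if target[sid] > delta:
                # WHEN MATCHED AND t.balance > s.delta THEN UPDATE
                target[sid] -= delta
            else:
                # WHEN MATCHED THEN DELETE
                del target[sid]
        elif delta > 0:
            # WHEN NOT MATCHED AND s.delta > 0 THEN INSERT
            target[sid] = delta
        # WHEN NOT MATCHED THEN DO NOTHING
    return target


accounts = {1: 100, 2: 10}
changes = [(1, 30), (2, 50), (3, 5), (4, -1)]
print(merge(accounts, changes))
```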

MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs

Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
2018-04-03 09:28:16 +01:00
Simon Riggs aa5877bb26 Revert "MERGE SQL Command following SQL:2016"
This reverts commit e6597dc353.
2018-04-02 21:36:38 +01:00
Simon Riggs 7cf8a5c302 Revert "Modified files for MERGE"
This reverts commit 354f13855e.
2018-04-02 21:34:15 +01:00
Simon Riggs 354f13855e Modified files for MERGE 2018-04-02 21:12:47 +01:00
Simon Riggs e6597dc353 MERGE SQL Command following SQL:2016
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows,
a task that would otherwise require multiple PL statements.
e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewers: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs

Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
2018-04-02 21:04:35 +01:00
Tom Lane b01f32c313 Fix some dubious WAL-parsing code.
Coverity complained about possible buffer overrun in two places added by
commit 1eb6d6527, and AFAICS it's reasonable to worry: even granting that
the WAL originator properly truncated the commit GID to GIDSIZE, we should
not really bet our lives on that having the same value as it does in the
current build.  Hence, use strlcpy() not strcpy(), and adjust the pointer
advancement logic to be sure we skip over the whole source string even if
strlcpy() truncated it.
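
The pointer-advancement point can be modelled in Python with byte strings. This is a sketch of the idea, not the WAL-parsing code: `read_gid` and the sample buffer are hypothetical, and `gidsize` stands in for the destination buffer size. The copy is truncated the way strlcpy() would truncate it, but the offset advances past the whole NUL-terminated source string.

```python
def read_gid(buf, offset, gidsize):
    """Return (gid, next_offset) for a NUL-terminated string at `offset`."""
    end = buf.index(b"\0", offset)          # position of the terminating NUL
    gid = buf[offset:end][:gidsize - 1]     # keep at most gidsize - 1 bytes
    next_offset = end + 1                   # skip the full source string + NUL
    return gid, next_offset


record = b"abcdefgh\0NEXT"
gid, pos = read_gid(record, 0, 4)
print(gid, record[pos:])
```

Advancing by the length of the truncated copy instead would leave the offset inside the source string, so the fields that follow would be misparsed.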
2018-04-02 13:46:21 -04:00
Peter Eisentraut 05e85d35af psql: Fix \ef, \sf tab completion
\ef and \sf take any kind of routine, not just normal functions.

Author: Pavel Stehule <pavel.stehule@gmail.com>
2018-04-02 12:46:24 -04:00
Peter Eisentraut 2764d5dcfa Make be-secure-common.c more consistent for future SSL implementations
Recent commit 8a3d9425 has introduced be-secure-common.c, which is aimed
at including backend-side APIs that can be used by any SSL
implementation.  The purpose is similar to fe-secure-common.c for the
frontend-side APIs.

However, that commit forgot to include check_ssl_key_file_permissions()
in the move, which causes a double dependency between be-secure.c and
be-secure-openssl.c.

Refactor the code in a more logical way.  This also brings to light an
API which is usable by future SSL implementations for permissions on SSL
key files.

Author: Michael Paquier <michael@paquier.xyz>
2018-04-02 11:37:40 -04:00
Robert Haas 7e0d64c7a5 postgres_fdw: Push down partition-wise aggregation.
Since commit 7012b132d0, postgres_fdw
has been able to push down the toplevel aggregation operation to the
remote server.  Commit e2f1eb0ee3 made
it possible to break down the toplevel aggregation into one
aggregate per partition.  This commit lets postgres_fdw push down
aggregation in that case just as it does at the top level.

In order to make this work, this commit adds an additional argument
to the GetForeignUpperPaths FDW API.  A matching argument is added
to the signature for create_upper_paths_hook.  Third-party code using
either of these will need to be updated.

Also adjust create_foreignscan_plan() so that it picks up the correct
set of relids in this case.

Jeevan Chalke, reviewed by Ashutosh Bapat and by me and with some
adjustments by me.  The larger patch series of which this patch is a
part was also reviewed and tested by Antonin Houska, Rajkumar
Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal
Legrand, and Rafia Sabih.

Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
2018-04-02 10:51:50 -04:00
Tom Lane 0b11a674fb Fix a boatload of typos in C comments.
Justin Pryzby

Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
2018-04-01 15:01:28 -04:00
Andres Freund 686d399f2b Fix non-portable use of round().
round() is from C99.  Use rint() instead.  There are behavioral
differences between round() and rint(), but they should not matter to
the Bloom filter optimal_k() function.  We already assume POSIX
behavior for rint(), so there is no question of rint() not using
"rounds towards nearest" as its rounding mode.

Cleanup from commit 51bc271790.

Per buildfarm member thrips.

Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn76eCGUonARy-wrVtMHsf+4cvbK_oJAWTLfORTU5ki0w@mail.gmail.com
2018-03-31 20:26:47 -07:00
Andres Freund 51bc271790 Add Bloom filter implementation.
A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership.  Callers will sometimes incur false
positives, but never false negatives.  The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.

Two classic applications of Bloom filters are cache filtering, and data
synchronization testing.  Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.

This commit adds a test harness extension module, test_bloomfilter.  It
can be used to get a sense of how the Bloom filter implementation
performs under varying conditions.

This is infrastructure for the upcoming "heapallindexed" amcheck patch,
which verifies the consistency of a heap relation against one of its
indexes.

Author: Peter Geoghegan
Reviewed-By: Andrey Borodin, Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g@mail.gmail.com
2018-03-31 17:49:41 -07:00
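To make the "false positives, but never false negatives" property from 51bc271790 above concrete, here is a minimal, self-contained Bloom filter sketch. It is not the implementation added by that commit: the fixed bitset size, the number of hash functions, the FNV-1a hash and all demo_* names are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOOM_BITS   1024u		/* assumed bitset size, in bits */
#define DEMO_BLOOM_HASHES 3u		/* assumed number of hash functions */

typedef struct demo_bloom
{
	uint8_t		bits[DEMO_BLOOM_BITS / 8];
} demo_bloom;

/* 32-bit FNV-1a over a NUL-terminated key, perturbed by a seed. */
static uint32_t
demo_fnv1a(const char *key, uint32_t seed)
{
	uint32_t	h = 2166136261u ^ seed;

	for (; *key != '\0'; key++)
	{
		h ^= (uint8_t) *key;
		h *= 16777619u;
	}
	return h;
}

/* Derive the i'th bit position from two base hashes (double hashing). */
static uint32_t
demo_bit_index(const char *key, uint32_t i)
{
	uint32_t	h1 = demo_fnv1a(key, 0);
	uint32_t	h2 = demo_fnv1a(key, 0x9e3779b9u) | 1u;

	return (h1 + i * h2) % DEMO_BLOOM_BITS;
}

static void
demo_bloom_init(demo_bloom *filter)
{
	assert(filter != NULL);
	memset(filter->bits, 0, sizeof(filter->bits));
}

/* Set every bit selected by the key's hash functions. */
static void
demo_bloom_add(demo_bloom *filter, const char *key)
{
	for (uint32_t i = 0; i < DEMO_BLOOM_HASHES; i++)
	{
		uint32_t	bit = demo_bit_index(key, i);

		filter->bits[bit / 8] |= (uint8_t) (1u << (bit % 8));
	}
}

/*
 * Returns false only if the key was definitely never added; true means
 * "possibly present" and may be a false positive.
 */
static bool
demo_bloom_might_contain(const demo_bloom *filter, const char *key)
{
	for (uint32_t i = 0; i < DEMO_BLOOM_HASHES; i++)
	{
		uint32_t	bit = demo_bit_index(key, i);

		if ((filter->bits[bit / 8] & (1u << (bit % 8))) == 0)
			return false;
	}
	return true;
}
```

Because adding a key only ever sets bits, a key that was added always tests as possibly present; the false-positive rate rises as more of the bitset fills up.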
Andrew Dunstan ed69864350 Small cleanups in fast default code.
Problems identified by Andres Freund and Haribabu Kommi
2018-04-01 08:16:18 +09:30
Tom Lane 94173d3eeb Fix assorted issues in parallel vacuumdb.
Avoid storing the result of PQsocket() in a pgsocket variable; it's
declared as int, and the no-socket test is properly written as "x < 0"
not "x == PGINVALID_SOCKET".  This accidentally had no bad effect
because we never got to init_slot() with a bad connection, but it's
still wrong.

Actually, it seems like we should avoid storing the result for a long
period at all.  The function's not so expensive that it's worth avoiding,
and the existing coding technique here would fail if anyone tried to
PQreset the connection during the life of the program.  Hence, just
re-call PQsocket every time we construct a select(2) mask.

Speaking of select(), GetIdleSlot imagined that it could compute the
select mask once and continue to use it over multiple calls to
select_loop(), which is pretty bogus since that would stomp on the
mask on return.  This could only matter if the function's outer loop
iterated more than once, which is unlikely (it'd take some connection
receiving data, but not enough to complete its command).  But if it
did happen, we'd acquire "tunnel vision" and stop watching the other
connections for query termination, with the effect of losing parallelism.

Another way in which GetIdleSlot could lose parallelism is that once
PQisBusy returns false, it would lock in on that connection and do
PQgetResult until that returns NULL; in some cases that could result
in blocking.  (Perhaps this can never happen in vacuumdb due to the
limited set of commands that it can issue, but I'm not quite sure
of that, and even if true today it's not a future-proof assumption.)
Refactor the code to do that properly, so that it risks blocking in
PQgetResult only in cases where we need to wait anyway.

Another loss-of-parallelism problem, which *is* easily demonstrable,
is that any setup queries issued during prepare_vacuum_command() were
always issued on the last-to-be-created connection, whether or not
that was idle.  Long-running operations on that connection thus
prevented issuance of additional operations on the other ones, except
in the limited cases where no preparatory query was needed.  Instead,
wait till we've identified a free connection and use that one.

Also, avoid core dump due to undersized malloc request in the case
that no tables are identified to be vacuumed.

The bogus no-socket test was noted by CharSyam, the other problems
identified in my own code review.  Back-patch to 9.5 where parallel
vacuumdb was introduced.

Discussion: https://postgr.es/m/CAMrLSE6etb33-192DTEUGkV-TsvEcxtBDxGWG1tgNOMnQHwgDA@mail.gmail.com
2018-03-31 16:28:52 -04:00
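Two of the points in 94173d3eeb above (test descriptors with "x < 0", and rebuild the select(2) mask on every call because select() overwrites it) can be illustrated with a generic POSIX sketch. It does not use libpq and is not the vacuumdb code; demo_wait_readable() and demo_consume_byte() are hypothetical helpers, and EINTR handling is deliberately omitted.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Block until one of the descriptors is readable and return its index in
 * fds[], or -1 on error or if there is nothing to wait for.  Negative
 * entries are treated as "no socket" and skipped.
 */
static int
demo_wait_readable(const int *fds, int nfds)
{
	fd_set		mask;
	int			maxfd = -1;
	int			i;

	assert(fds != NULL);

	/* Build a fresh mask each time; select() modifies it on return. */
	FD_ZERO(&mask);
	for (i = 0; i < nfds; i++)
	{
		if (fds[i] < 0)
			continue;
		FD_SET(fds[i], &mask);
		if (fds[i] > maxfd)
			maxfd = fds[i];
	}
	if (maxfd < 0)
		return -1;

	if (select(maxfd + 1, &mask, NULL, NULL, NULL) < 0)
		return -1;

	/* After select(), the mask holds only the ready descriptors. */
	for (i = 0; i < nfds; i++)
	{
		if (fds[i] >= 0 && FD_ISSET(fds[i], &mask))
			return i;
	}
	return -1;
}

/* Read one byte so that the descriptor stops reporting as readable. */
static int
demo_consume_byte(int fd)
{
	char		c;

	return (int) read(fd, &c, 1);
}
```

Reusing the mask from a previous select() call would leave only the previously ready descriptors in it, which is the "tunnel vision" effect the commit message describes.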
Tom Lane 5635c7aa67 Fix portability and translatability issues in commit 64f85894a.
Compilation failed for lack of an #ifdef on builds without
pg_strong_random().  Also fix relevant error messages to meet
project style guidelines.

Fabien Coelho, further adjusted by me

Discussion: https://postgr.es/m/32390.1522464534@sss.pgh.pa.us
2018-03-31 12:32:35 -04:00
Tom Lane b0c90c85fc Portability fix for commit 9a895462d.
So far as I can find, NI_MAXHOST isn't actually required anywhere by
POSIX.  Nonetheless, commit 9a895462d supposed that it could rely on
having that symbol without any ceremony at all.  We do have a hack
for providing it if the platform doesn't, in getaddrinfo.h, so fix
the problem by #including that file.  Per buildfarm.
2018-03-30 20:52:13 -04:00
Andres Freund a4ebbd2752 Remove PARTIAL_LINKING build mode.
In 9956ddc191, ten years ago, the
current objfile.txt based linking model was introduced.  It's time to
retire the old SUBSYS.o based model.

This is primarily pertinent because the bitcode files for LLVM based
inlining are not produced when using PARTIAL_LINKING.  It does not seem
worth fixing PARTIAL_LINKING to support that.

Author: Andres Freund
Discussion: https://postgr.es/m/20180121204356.d5oeu34jetqhmdv2@alap3.anarazel.de
2018-03-30 17:33:04 -07:00
Tatsuo Ishii 1b26bd4089 Fix bug with view locking code.
LockViewRecurse() obtains the view relation using heap_open() and passes
it to get_view_query() to get the view info.  It then immediately closes
the relation and uses the returned view info by calling
LockViewRecurse_walker().  Since get_view_query() returns a pointer
within the relcache, the relcache entry should be kept until
LockViewRecurse_walker() returns.  Otherwise the returned pointer could
point to a garbage memory area.

Fix is moving the heap_close() call after LockViewRecurse_walker().

Problem reported by Tom Lane (buildfarm is unhappy, especially prion
since it enables -DRELCACHE_FORCE_RELEASE cpp flag), fix by me.
2018-03-31 09:26:43 +09:00
Andres Freund 3e256e5506 Add SKIP_LOCKED option to RangeVarGetRelidExtended().
This will be used for VACUUM (SKIP LOCKED).

Author: Nathan Bossart
Reviewed-By: Michael Paquier and Andres Freund
Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de
2018-03-30 17:05:16 -07:00
Andres Freund d87510a524 Combine options for RangeVarGetRelidExtended() into a flags argument.
A followup patch will add a SKIP_LOCKED option. To avoid introducing
ever more arguments, breaking existing callers each time, introduce a
flags argument. This'll no doubt break a few external users...

Also change the MISSING_OK behaviour so a DEBUG1 debug message is
emitted when a relation is not found.

Author: Nathan Bossart
Reviewed-By: Michael Paquier and Andres Freund
Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de
2018-03-30 17:05:16 -07:00
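The design choice in d87510a524 above, folding several boolean options into one flags argument so that a later option such as SKIP_LOCKED does not change the function signature, can be sketched as follows. The names demo_lookup(), DEMO_MISSING_OK and so on are hypothetical and the lookup itself is simulated; this is not the signature or behavior of RangeVarGetRelidExtended().

```c
#include <assert.h>
#include <stdbool.h>

/* Each option occupies its own bit, so options can be OR'ed together. */
typedef enum demo_lookup_option
{
	DEMO_MISSING_OK = 1 << 0,	/* do not fail if the object is missing */
	DEMO_NOWAIT = 1 << 1,		/* fail instead of waiting for a lock */
	DEMO_SKIP_LOCKED = 1 << 2	/* quietly skip if the lock is unavailable */
} demo_lookup_option;

typedef enum demo_lookup_result
{
	DEMO_FOUND,
	DEMO_NOT_FOUND,
	DEMO_SKIPPED,
	DEMO_ERROR
} demo_lookup_result;

/*
 * Simulated lookup: "exists" and "locked" stand in for the real catalog
 * and lock manager state.
 */
static demo_lookup_result
demo_lookup(bool exists, bool locked, unsigned int flags)
{
	/* Assumption of this sketch: the two lock options are exclusive. */
	assert(!((flags & DEMO_NOWAIT) && (flags & DEMO_SKIP_LOCKED)));

	if (!exists)
		return (flags & DEMO_MISSING_OK) ? DEMO_NOT_FOUND : DEMO_ERROR;

	if (locked)
	{
		if (flags & DEMO_SKIP_LOCKED)
			return DEMO_SKIPPED;
		if (flags & DEMO_NOWAIT)
			return DEMO_ERROR;
		/* A real implementation would wait here; the sketch just proceeds. */
	}
	return DEMO_FOUND;
}
```

Adding another option later means adding one more enum bit; existing callers that pass 0 or a combination of the old bits compile unchanged.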
Fujii Masao 9a895462d9 Enhance pg_stat_wal_receiver view to display host and port of sender server.
Previously there was no way on the standby side to find out the host and port
of the sender server that the walreceiver was currently connected to when
multiple hosts and ports were specified in primary_conninfo. For that purpose,
this patch adds sender_host and sender_port columns into pg_stat_wal_receiver
view. They report the host and port that the active replication connection
currently uses.

Bump catalog version.

Author: Haribabu Kommi
Reviewed-by: Michael Paquier and me

Discussion: https://postgr.es/m/CAJrrPGcV_aq8=cdqkFhVDJKEnDQ70yRTTdY9RODzMnXNrCz2Ow@mail.gmail.com
2018-03-31 07:51:22 +09:00
Tom Lane 11002f8afa Fix bogus provolatile/proparallel markings on a few built-in functions.
Richard Yen reported that pg_upgrade failed if the target cluster had
force_parallel_mode = on, because binary_upgrade_create_empty_extension()
is marked parallel restricted, allowing it to be executed in parallel
mode, which complains because it tries to acquire an XID.

In general, no function that might try to modify database data should
be considered parallel safe or restricted, since execution of it might
force XID acquisition.  We found several other examples of this mistake.

Furthermore, functions that execute user-supplied SQL queries or query
fragments, or pull data from user-supplied cursors, had better be marked
both volatile and parallel unsafe, because we don't know what the supplied
query or cursor might try to do.  There were several tsquery and XML
functions that had the wrong proparallel marking for this, and some of
them were even mislabeled as to volatility.

All these bugs are old, dating back to 9.6 for the proparallel mistakes
and much further for the provolatile mistakes.  We can't force a
catversion bump in the back branches, but we can at least ensure that
installations initdb'd in future have the right values.

Thomas Munro and Tom Lane

Discussion: https://postgr.es/m/CAEepm=2sNDScSLTfyMYu32Q=ob98ZGW-vM_2oLxinzSABGQ6VA@mail.gmail.com
2018-03-30 18:14:51 -04:00
Tom Lane 4a33bb59df Ensure that WAL pages skipped by a forced WAL switch are zero-filled.
In the previous coding, skipped pages were mostly zeroes, but they still
had valid WAL page headers.  That makes them very much less compressible
than an unbroken string of zeroes would be --- about 10X worse for bzip2
compression, for instance.  We don't need those headers, so tweak the logic
so that we zero them out.

Chapman Flack, reviewed by Daniel Gustafsson

Discussion: https://postgr.es/m/579297F8.7020107@anastigmatix.net
2018-03-30 16:18:18 -04:00
Tom Lane e5eb4fa873 Remove obsolete SLRU wrapping and warnings from predicate.c.
When SSI was developed, slru.c was limited to segment files with names in
the range 0000-FFFF.  This didn't allow enough space for predicate.c to
store every possible XID when spilling old transactions to disk, so it
would wrap around sooner and print warnings.  Since commits 638cf09e and
73c986ad increased the number of segment files slru.c could manage, that
behavior is unnecessary.  Therefore remove that code.

Also remove the macro OldSerXidSegment, which has been unused since
4cd3fb6e.

Thomas Munro, reviewed by Anastasia Lubennikova

Discussion: https://postgr.es/m/CAEepm=3XfsTSxgEbEOmxu0QDiXy0o18NUg2nC89JZcCGE+XFPA@mail.gmail.com
2018-03-30 15:11:39 -04:00
Tom Lane 1bb9e731e1 Improve out-of-memory error reports by including memory context name.
Add the target context's name to the errdetail field of "out of memory"
errors in mcxt.c.  Per discussion, this seems likely to be useful to
help narrow down the cause of a reported failure, and it costs little.
Also, now that context names are required to be compile-time constants
in all cases, there's little reason to be concerned about security
issues from exposing these names to users.  (Because of such concerns,
we are *not* including the context "ident" field.)

In passing, add unlikely() markers to the allocation-failed tests,
just to be sure the compiler is on the right page about that.
Also, in palloc and friends, copy CurrentMemoryContext into a local
variable, as that's almost surely cheaper to reference than a global.

Discussion: https://postgr.es/m/1099.1522285628@sss.pgh.pa.us
2018-03-30 13:53:33 -04:00
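As a rough illustration of 1bb9e731e1 above, the sketch below marks the allocation-failed test as unlikely and includes the context's name in the error text. It is not the mcxt.c code: demo_context, demo_alloc(), demo_failing_alloc() and the message wording are assumptions, and demo_unlikely() is a local macro built on the GCC/clang __builtin_expect() builtin with a plain fallback for other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hint that the condition is rarely true; falls back to a plain test. */
#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define demo_unlikely(x) ((x) != 0)
#endif

typedef struct demo_context
{
	const char *name;					/* compile-time constant string */
	void	   *(*alloc_fn) (size_t);	/* underlying allocator */
} demo_context;

/* Allocator that always fails, for exercising the error path. */
static void *
demo_failing_alloc(size_t size)
{
	(void) size;
	return NULL;
}

/*
 * Allocate from the given context.  On failure, write a message naming the
 * context into errbuf and return NULL; on success, errbuf is left empty.
 */
static void *
demo_alloc(const demo_context *context, size_t size,
		   char *errbuf, size_t errbuflen)
{
	void	   *ret;

	assert(context != NULL && context->name != NULL);
	assert(errbuf != NULL && errbuflen > 0);

	errbuf[0] = '\0';
	ret = context->alloc_fn(size);
	if (demo_unlikely(ret == NULL))
		snprintf(errbuf, errbuflen,
				 "out of memory: failed on request of size %zu "
				 "in memory context \"%s\"",
				 size, context->name);
	return ret;
}
```

A caller could initialize a context as { "DemoContext", malloc } for normal use, or with demo_failing_alloc to see the message produced on failure.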
Tom Lane c79f6df75d Do index FSM vacuuming sooner.
In btree and SP-GiST indexes, move the responsibility for calling
IndexFreeSpaceMapVacuum from the vacuumcleanup phase to the bulkdelete
phase, and do it if and only if we found some pages that could be put into
FSM.  As in commit 851a26e26, the idea is to make free pages visible to FSM
searchers sooner when vacuuming very large tables (large enough to need
multiple bulkdelete scans).  This adds more redundant work than that commit
did, since we have to scan the entire index FSM each time rather than being
able to localize what needs to be updated; but it still seems worthwhile.
However, we can buy something back by not touching the FSM at all when
there are no pages that can be put in it.  That will result in slower
recovery from corrupt upper FSM pages in such a scenario, but it doesn't
seem like that's a case we need to optimize for.

Hash indexes don't use FSM at all.  GIN, GiST, and bloom indexes update
FSM during the vacuumcleanup phase not bulkdelete, so that doing something
comparable to this would be a much more invasive change, and it's not clear
it's worth it.  BRIN indexes do things sufficiently differently that this
change doesn't apply to them, either.

Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-30 11:48:20 -04:00
Robert Haas 96030f9a48 Don't call IS_DUMMY_REL() when cheapest_total_path might be junk.
Unlike the previous coding, this might result in a Gather per Append
subplan when the target list is parallel-restricted, but such a plan
is probably worth considering in that case, since a single Gather
on top of the entire Append is impossible.

Per Andres Freund and the buildfarm.

Discussion: http://postgr.es/m/20180330050351.bmxx4cdtz67czjda@alap3.anarazel.de
2018-03-30 11:40:41 -04:00
Teodor Sigaev 43d1ed60fd Predicate locking in GIN index
Predicate locks are taken on a per-page basis only if fastupdate = off.  In
the opposite case, a predicate lock on the pending list would effectively lock
the whole index, so to reduce locking overhead we just lock the relation.
Entry and posting trees are essentially B-trees, so locks are acquired on leaf
pages only.

Author: Shubham Barai with some editorization by me and Dmitry Ivanov
Review by: Alexander Korotkov, Dmitry Ivanov, Fedor Sigaev
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPt5sWW+EwTaKUGFL5_XFcZ0MuGBcyJ70oqbWqr42YKR8Q@mail.gmail.com
2018-03-30 14:23:17 +03:00
Magnus Hagander 019fa576ca Fix typo in comment
Author: Michael Paquier <michael@paquier.xyz>
2018-03-30 12:35:13 +02:00
Tatsuo Ishii 34c20de4d0 Allow to lock views.
Now all tables used in view definitions can be recursively locked by a
LOCK command.

Author: Yugo Nagata
Reviewed by Robert Haas, Thomas Munro and me.

Discussion: https://postgr.es/m/20171011183629.eb2817b3.nagata%40sraoss.co.jp
2018-03-30 09:18:02 +09:00
Andres Freund fb60478011 Improve JIT docs.
Author: John Naylor and Andres Freund
Discussion: https://postgr.es/m/CAJVSVGUs-VcwSY7-Kx-GQe__8hvWuA4Uhyf3gxoMXeiZqebE9g@mail.gmail.com
2018-03-29 16:13:40 -07:00
Robert Haas c1de1a3a8b Remove 'target' from GroupPathExtraData.
It's not needed.

Jeevan Chalke

Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
2018-03-29 16:17:18 -04:00
Robert Haas 11cf92f6e2 Rewrite the code that applies scan/join targets to paths.
If the toplevel scan/join target list is parallel-safe, postpone
generating Gather (or Gather Merge) paths until after the toplevel has
been adjusted to return it.  This (correctly) makes queries with
expensive functions in the target list more likely to choose a
parallel plan, since the cost of the plan now reflects the fact that
the evaluation will happen in the workers rather than the leader.
The original complaint about this problem was from Jeff Janes.

If the toplevel scan/join relation is partitioned, recursively apply
the changes to all partitions.  This sometimes allows us to get rid of
Result nodes, because Append is not projection-capable but its
children may be.  It also cleans up what appears to be incorrect SRF
handling from commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old
code had no knowledge of SRFs for child scan/join rels.

Because we now use create_projection_path() in some cases where we
formerly used apply_projection_to_path(), this changes the ordering
of columns in some queries generated by postgres_fdw.  Update
regression outputs accordingly.

Patch by me, reviewed by Amit Kapila and by Ashutosh Bapat.  Other
fixes for this problem (substantially different from this version)
were reviewed by Dilip Kumar, Amit Khandekar, and Marina Polyakova.

Discussion: http://postgr.es/m/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com
2018-03-29 15:49:31 -04:00
Robert Haas 3f90ec8597 Postpone generate_gather_paths for topmost scan/join rel.
Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths.  Instead, do the work in
grouping_planner.  By itself, this gains nothing; in fact it loses
slightly because we end up calling set_cheapest() for the topmost
scan/join rel twice rather than once.  However, it paves the way for
a future commit which will postpone generate_gather_paths for the
topmost scan/join relation even further, allowing more accurate
costing of parallel paths.

Amit Kapila and Robert Haas.  Earlier versions of this patch (which
differed substantially) were reviewed by Dilip Kumar, Amit
Khandekar, Marina Polyakova, and Ashutosh Bapat.
2018-03-29 15:40:40 -04:00
Robert Haas d7c19e62a8 Teach create_projection_plan to omit projection where possible.
We sometimes insert a ProjectionPath into a plan tree when projection
is not strictly required. The existing code already arranges to avoid
emitting a Result node when the ProjectionPath's subpath can perform
the projection itself, but previously it didn't consider the
possibility that the parent node might not actually require the
projection to be performed at all.

Skipping projection when it's not required can not only avoid Result
nodes that aren't needed, but also avoid losing the "physical tlist"
optimization unnecessarily.

Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoakT5gmahbPWGqrR2nAdFOMAOnOXYoWHRdVfGWs34t6_A@mail.gmail.com
2018-03-29 15:37:48 -04:00
Bruce Momjian 20b4323bd1 C comments: "a" <--> "an" corrections
Reported-by: Michael Paquier, Abhijit Menon-Sen

Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz

Author: Michael Paquier, Abhijit Menon-Sen, me
2018-03-29 15:18:53 -04:00
Bruce Momjian 3282c4c136 README change: update for hash access method
Reported-by: Thomas Munro, Justin Pryzby

Discussion: https://postgr.es/m/CAEepm=1_682z-09DNHj4GkCJAqWK-D6h9Oq5ea84T1oqq1-Utg@mail.gmail.com
2018-03-29 14:38:39 -04:00
Magnus Hagander 8cdc834647 Fix incorrect copy/paste in comment
Author: Alexander Korotkov <a.korotkov@postgrespro.ru>
2018-03-29 19:11:05 +02:00
Magnus Hagander 9778d5c180 Fix typo in comment
Author: Daniel Gustafsson <daniel@yesql.se>
2018-03-29 19:10:04 +02:00
Tom Lane 2b1759e267 Remove unnecessary BufferGetPage() calls in fsm_vacuum_page().
Just noticed that these were quite redundant, since we're holding the
page address in a local variable anyway, and we have pin on the buffer
throughout.

Also improve a comment.
2018-03-29 12:44:19 -04:00
Tom Lane a063baaced Remove UpdateFreeSpaceMap(), use FreeSpaceMapVacuumRange() instead.
FreeSpaceMapVacuumRange has the same effect, is more efficient if many
pages are involved, and makes fewer assumptions about how it's used.
Notably, Claudio Freire pointed out that UpdateFreeSpaceMap could fail
if the specified freespace value isn't the maximum possible.  This isn't
a problem for the single existing user, but the function represents an
attractive nuisance IMO, because it's named as though it were a
general-purpose update function and its limitations are undocumented.
In any case we don't need multiple ways to get the same result.

In passing, do some code review and cleanup in RelationAddExtraBlocks.
In particular, I see no excuse for it to omit the PageIsNew safety check
that's done in the mainline extension path in RelationGetBufferForTuple.

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 12:22:44 -04:00
Bruce Momjian bc0021ef09 C comment: fix wording about shared memory message queue
Reported-by: Tels

Discussion: https://postgr.es/m/e66e05bc55f5ce904e361ad17a3395ae.squirrel@sm.webmail.pair.com
2018-03-29 12:18:42 -04:00
Tom Lane 851a26e266 While vacuuming a large table, update upper-level FSM data every so often.
VACUUM updates leaf-level FSM entries immediately after cleaning the
corresponding heap blocks.  fsmpage.c updates the intra-page search trees
on the leaf-level FSM pages when this happens, but it does not touch the
upper-level FSM pages, so that the released space might not actually be
findable by searchers.  Previously, updating the upper-level pages happened
only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum()
call.  This is bad because the VACUUM might get canceled before ever
reaching that point, so that from the point of view of searchers no space
has been freed at all, leading to table bloat.

We can improve matters by updating the upper pages immediately after each
cycle of index-cleaning and heap-cleaning, processing just the FSM pages
corresponding to the range of heap blocks we have now fully cleaned.
This adds a small amount of extra work, since the FSM pages leading down
to each range boundary will be touched twice, but it's pretty negligible
compared to everything else going on in a large VACUUM.

If there are no indexes, VACUUM doesn't work in cycles but just cleans
each heap page on first visit.  In that case we just arbitrarily update
upper FSM pages after each 8GB of heap.  That maintains the goal of not
letting all this work slide until the very end, and it doesn't seem worth
expending extra complexity on a case that so seldom occurs in practice.

In either case, the FSM is fully up to date before any attempt is made
to truncate the relation, so that the most likely scenario for VACUUM
cancellation no longer results in out-of-date upper FSM pages.  When
we do successfully truncate, adjusting the FSM to reflect that is now
fully handled within FreeSpaceMapTruncateRel.

Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 11:29:54 -04:00
Teodor Sigaev c0cbe00fee Add casts from jsonb
Add explicit casts from scalar jsonb to all numeric and bool types.  It would
be better to have a cast from scalar jsonb to text too, but there is already a
cast from jsonb to text that just yields the text representation of the json,
and there is no way to have two different casts for the same pair of types.

Bump catalog version

Author: Anastasia Lubennikova with editorization by Nikita Glukhov and me
Review by: Aleksander Alekseev, Nikita Glukhov, Darafei Praliaskouski
Discussion: https://www.postgresql.org/message-id/flat/0154d35a-24ae-f063-5273-9ffcdf1c7f2e@postgrespro.ru
2018-03-29 16:33:56 +03:00
Magnus Hagander 669820a3d9 Fix typo in comment
Arthur Zakirov, confirmed by Thomas Munro
2018-03-29 11:42:32 +02:00
Peter Eisentraut 056a5a3f63 Allow committing inside cursor loop
Previously, committing or aborting inside a cursor loop was prohibited
because that would close and remove the cursor.  To allow that,
automatically convert such cursors to holdable cursors so they survive
commits or rollbacks.  Portals now have a new state "auto-held", which
means they have been converted automatically from pinned.  An auto-held
portal is kept on transaction commit or rollback, but is still removed
when returning to the main loop on error.

This supports all languages that have cursor loop constructs: PL/pgSQL,
PL/Python, PL/Perl.

Reviewed-by: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>
2018-03-28 19:03:26 -04:00
Bruce Momjian a2894cce54 C comment: fix typo, log -> lag
Reported-by: atorikoshi

Discussion: https://postgr.es/m/b61f2ab9-c0e0-d33d-ce3f-42a228025681@lab.ntt.co.jp

Author: atorikoshi
2018-03-28 18:23:47 -04:00
Andres Freund a0a08c1d85 Fix mistakes in the just added JIT docs.
Reported-By: Lukas Fittl
Author: Andres Freund
2018-03-28 15:07:08 -07:00
Andres Freund e6c039d13e Add documentation for the JIT feature.
As promised in earlier commits, this adds documentation about the new
build options, the new GUCs, about the planner logic when JIT is used,
and the benefits of JIT in general.

Also adds a more implementation oriented README.

I'm sure we're going to want to expand this further, but I think this
is a reasonable start.

Author: Andres Freund, with contributions by Thomas Munro
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-28 14:22:42 -07:00
Andres Freund 1f0c6a9e7d Add EXPLAIN support for JIT.
This just shows a few details about JITing, e.g. how many functions
have been JITed, and how long that took.  To avoid noise in regression
tests with functions sometimes being JITed in --with-llvm builds,
disable display when COSTS OFF is specified.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-28 13:26:51 -07:00
Andres Freund 9370462e9a Add inlining support to LLVM JIT provider.
This provides infrastructure to allow JITed code to inline code
implemented in C. This e.g. can be postgres internal functions or
extension code.

This already speeds up long running queries, by allowing the LLVM
optimizer to optimize across function boundaries. The optimization
currently doesn't reach its full potential because LLVM
cannot optimize the FunctionCallInfoData argument fully away, because
it's allocated on the heap rather than the stack. Fixing that is
beyond what's realistic for v11.

To be able to do that, use CLANG to convert C code to LLVM bitcode,
and have LLVM build a summary for it. That bitcode can then be used
to inline functions at runtime. For that the bitcode needs to be
installed. Postgres bitcode goes into $pkglibdir/bitcode/postgres,
extensions go into equivalent directories.  PGXS has been modified so
that happens automatically if postgres has been compiled with LLVM
support.

Currently this isn't the fastest inline implementation: modules are
reloaded from disk during inlining. That's to work around an apparent
LLVM bug, triggering an apparently spurious error in LLVM assertion
enabled builds.  Once that is resolved we can remove the superfluous
read from disk.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-28 13:19:08 -07:00
Andres Freund 8a934d6778 Use isinf builtin for clang, for performance.
When compiling with clang, glibc's definition of isinf() ends up
leading to an external libc function call. That's because there was a
bug in the builtin in an old gcc version, and clang claims
compatibility with an older version.  That causes clang to be
measurably slower for floating point heavy workloads than gcc.

To fix, simply redirect isinf when using clang and clang confirms it
has __builtin_isinf().
2018-03-28 13:12:15 -07:00
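The redirection described in 8a934d6778 above might look roughly like the following. This is a sketch under assumptions rather than the committed header change: it relies on clang's __has_builtin() feature test, and demo_is_infinite() is a hypothetical wrapper added only so the effect can be exercised. On other compilers the preprocessor block is skipped and the library's isinf() is used as-is.

```c
#include <math.h>
#include <stdbool.h>

/*
 * When building with clang, and only if clang reports that it provides
 * __builtin_isinf, route isinf() to the builtin instead of whatever
 * <math.h> defined.  <math.h> must be included first so that its own
 * definition does not override the redirect later.
 */
#if defined(__clang__) && defined(__has_builtin)
#if __has_builtin(__builtin_isinf)
#undef isinf
#define isinf __builtin_isinf
#endif
#endif

/* Hypothetical wrapper: true for +infinity and -infinity only. */
static bool
demo_is_infinite(double x)
{
	return isinf(x) != 0;
}
```

The nested #if keeps __has_builtin() from being evaluated by preprocessors that do not define it.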
Fujii Masao 266b6acb31 Make pg_rewind skip files and directories that are removed during server start.
The target cluster that was rewound needs to perform recovery from
the checkpoint created at failover, which leads it to remove or recreate
some files and directories that may have been copied from the source
cluster. So pg_rewind can skip synchronizing such files and directories,
which reduces the amount of data transferred during a rewind
without changing the usefulness of the operation.

Author: Michael Paquier
Reviewed-by: Anastasia Lubennikova, Stephen Frost and me

Discussion: https://postgr.es/m/20180205071022.GA17337@paquier.xyz
2018-03-29 04:56:52 +09:00
Fujii Masao 09e96b3f35 Fix handling of files that source server removes while pg_rewind is running.
After processing the filemap to build the list of chunks that will be
fetched from the source to rewind the target server, it is possible that
a file which was previously processed is removed from the source.  A
simple example of such an occurrence is a WAL segment which gets recycled
on the target in-between.  When the filemap is processed, files not
categorized as relation files are first truncated to prepare for their
full copy, which is going to be taken from the source, divided into a
set of chunks.  However, for a recycled WAL segment, this would result in
a segment which has a zero-byte size.  With such an empty file,
post-rewind recovery thinks that records are saved but they are actually
not because of the truncation which happened when processing the
filemap, resulting in data loss.

In order to fix the problem, make sure that files which are found to
have been removed on the source when receiving chunks of them are also
deleted on the target server for consistency.

Back-patch to 9.5 where pg_rewind was added.

Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier
Reported-by: Tsunakawa Takayuki

Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8DAAA2%40G01JPEXMBYT05
2018-03-29 04:00:21 +09:00
Peter Eisentraut d92bc83c48 PL/pgSQL: Nested CALL with transactions
So far, a nested CALL or DO in PL/pgSQL would not establish a context
where transaction control statements were allowed.  This fixes that by
handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic
execution context through and doing the required management around
transaction boundaries.

Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
2018-03-28 13:31:27 -04:00
Tom Lane c2d4eb1b1f Fix actual and potential double-frees around tuplesort usage.
tuplesort_gettupleslot() passed back tuples allocated in the tuplesort's
own memory context, even when the caller was responsible for freeing them.
This created a double-free hazard, because some callers might destroy
the tuplesort object (via tuplesort_end) before trying to clean up the
last returned tuple.  To avoid this, change the API to specify that the
tuple is allocated in the caller's memory context.  v10 and HEAD already
did things that way, but in 9.5 and 9.6 this is a live bug that can
demonstrably cause crashes with some grouping-set usages.

In 9.5 and 9.6, this requires doing an extra tuple copy in some cases,
which is unfortunate.  But the amount of refactoring needed to avoid it
seems excessive for a back-patched change, especially since the cases
where an extra copy happens are less performance-critical.

Likewise change tuplesort_getdatum() to return pass-by-reference Datums
in the caller's context not the tuplesort's context.  There seem to be
no live bugs among its callers, but clearly the same sort of situation
could happen in future.

For other tuplesort fetch routines, continue to allocate the memory in
the tuplesort's context.  This is a little inconsistent with what we now
do for tuplesort_gettupleslot() and tuplesort_getdatum(), but that's
preferable to adding new copy overhead in the back branches where it's
clearly unnecessary.  These other fetch routines provide the weakest
possible guarantees about tuple memory lifespan from v10 on, anyway,
so this actually seems more consistent overall.

Adjust relevant comments to reflect these API redefinitions.

Arguably, we should change the pre-9.5 branches as well, but since
there are no known failure cases there, it seems not worth the risk.

Peter Geoghegan, per report from Bernd Helmle.  Reviewed by Kyotaro
Horiguchi; thanks also to Andreas Seltenreich for extracting a
self-contained test case.

Discussion: https://postgr.es/m/1512661638.9720.34.camel@oopsware.de
2018-03-28 13:26:57 -04:00
Simon Riggs 1eb6d6527a Store 2PC GID in commit/abort WAL recs for logical decoding
Store GID of 2PC in commit/abort WAL records when wal_level = logical.
This allows logical decoding to send the SAME gid to subscribers
across restarts of logical replication.

Track replica origin replay progress for 2PC.

(Edited from patch 0003 in the logical decoding 2PC series.)

Authors: Nikhil Sontakke, Stas Kelvich
Reviewed-by: Simon Riggs, Andres Freund
2018-03-28 17:42:50 +01:00
Peter Eisentraut 75e95dd79b Attempt to fix jsonb_plpython build on Windows 2018-03-28 11:49:23 -04:00
Andrew Dunstan a437551a22 Make fast_default regression tests locale independent 2018-03-28 17:06:45 +10:30
Simon Riggs 5b0d7f6996 Use pg_stat_get_xact* functions within xacts
Resolve build farm failures from c203d6cf81,
diagnosed by Tom Lane.

The output of pg_stat_get_xact_tuples_hot_updated() and friends
is not guaranteed to show anything after the transaction completes.
Data is flushed slowly to stats collector, so using them can
give timing issues.
2018-03-28 05:21:00 +01:00
Andres Freund f4f5845b31 Quick adaption of JIT tuple deforming to the fast default patch.
Instead of using memset to set tts_isnull, call the new
slot_getmissingattrs().

Also fix a bug (= instead of >=) in the code generation. Normally = is
correct, but when repeatedly deforming fields not in a
tuple (e.g. deform up to natts + 1 and then natts + 2) >= is needed.

Discussion: https://postgr.es/m/20180328010053.i2qvsuuusst4lgmc@alap3.anarazel.de
2018-03-27 21:03:10 -07:00
Andres Freund b4013b8e4a Add catversion bump missed in 16828d5c0.
Given that pg_attribute changed its layout...
2018-03-27 19:07:39 -07:00
Andrew Dunstan 16828d5c02 Fast ALTER TABLE ADD COLUMN with a non-NULL default
Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.

Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.
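
The mechanism described above can be sketched with a toy model in Python.  This is not the server's implementation; the `Table` class and its method names are hypothetical, and `missing` merely plays the role of attmissingval/atthasmissing:

```python
class Table:
    """Toy model: a stored row may be shorter than the current column list."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.missing = {}   # column name -> stored default (stand-in for attmissingval)
        self.rows = []

    def insert(self, *values):
        self.rows.append(list(values))

    def add_column(self, name, default):
        # The default is evaluated once and remembered; existing rows are untouched.
        self.columns.append(name)
        self.missing[name] = default

    def fetch(self, i):
        row = self.rows[i]
        # Columns beyond the end of the stored row are supplied from the stored defaults.
        tail = [self.missing.get(c) for c in self.columns[len(row):]]
        return row + tail

    def rewrite(self):
        # A rewrite materializes every value, so the stored defaults can be cleared.
        self.rows = [self.fetch(i) for i in range(len(self.rows))]
        self.missing.clear()
```

Adding a column costs the same regardless of how many rows exist, because only `columns` and `missing` change; the rows are filled in lazily by `fetch`.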

The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g.  catalog relations) NULL can be passed for this
argument.

Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.

Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com
2018-03-28 10:43:52 +10:30
Tom Lane ef1978d6ed Update pgindent's typedefs blacklist, and make it easier to adjust.
It seems that all buildfarm members are now using the <stdbool.h> code
path, so that none of them report "bool" as a typedef.  We still need it
to be treated that way, so adjust pgindent to force that whether or not
it's in the given list.

Also, the recent introduction of LLVM infrastructure has caused the
appearance of some typedef names that we definitely *don't* want
treated as typedefs, such as "string" and "abs".  Extend the existing
blacklist to include these.  (Additions based on comparing v10's
typedefs list to what the buildfarm is currently emitting.)

Rearrange the code so that the lists of whitelisted/blacklisted
names are a bit easier to find and modify.

Andrew Dunstan and Tom Lane

Discussion: https://postgr.es/m/28690.1521912334@sss.pgh.pa.us
2018-03-27 18:15:39 -04:00
Tom Lane 442accc3fe Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident".  The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to.  The "ident" string, if
wanted, is supplied later.  This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.

The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there.  And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.

All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead.  This
allows more effective identification of those contexts in many cases;
for example, subsidiary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other.  Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.

I also arranged for plancache contexts to use the query source string
as their identifier.  This is basically free for CachedPlanSources, as
they contained a copy of that string already.  We pay an extra pstrdup
to do it for CachedPlans.  That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first).  I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.

This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format.  This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.

The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1.  Seems
better for extension authors to endure just one round of API changes
not two.

Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:51 -04:00
Simon Riggs c203d6cf81 Allow HOT updates for some expression indexes
If the value of an index expression is unchanged after UPDATE,
allow HOT updates where previously we disallowed them, giving
a significant performance boost in those cases.

Particularly useful for indexes such as JSON->>field where the
JSON value changes but the indexed value does not.
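
A rough illustration of the check, in Python.  The function names are hypothetical, and the sketch deliberately ignores every other condition a HOT update must satisfy; it only shows the comparison of the indexed expression before and after the update:

```python
import json


def index_expr(doc, field):
    """Stand-in for an expression index such as (doc->>'field')."""
    return json.loads(doc).get(field)


def hot_update_allowed(old_doc, new_doc, field):
    # The existing index entry stays valid if the indexed expression
    # yields the same value for the old and the new row version.
    return index_expr(old_doc, field) == index_expr(new_doc, field)
```

For example, changing an unindexed key of the document leaves the result of `index_expr` unchanged, so the sketch would permit the update.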

Submitted as "surjective indexes" patch, now enabled by use
of new "recheck_on_update" parameter.

Author: Konstantin Knizhnik
Reviewer: Simon Riggs, with much wordsmithing and some cleanup
2018-03-27 19:57:02 +01:00
Peter Eisentraut 1944cdc982 libpq: PQhost to return active connected host or hostaddr
Previously, PQhost didn't return the connected host details when the
connection type was CHT_HOST_ADDRESS (i.e., via hostaddr).  Instead, it
returned the complete host connection parameter (which could contain
multiple hosts) or the default host details, which was confusing and
arguably incorrect.

Change this to return the actually connected host or hostaddr
irrespective of the connection type.  When hostaddr but no host was
specified, hostaddr is now returned.  Never return the original host
connection parameter, and document that PQhost cannot be relied on
before the connection is established.

PQport is similarly changed to always return the active connection port
and never the original connection parameter.

Author: Hari Babu <kommi.haribabu@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
2018-03-27 12:32:18 -04:00
Teodor Sigaev 44bd95846a Fix count of skipped test of basebackup on Windows
Commit 920a5e500a added tests which should be
skipped on Windows boxes, but the patch didn't contain the right count of them.

David Steele
2018-03-27 17:40:56 +03:00
Teodor Sigaev 920a5e500a Skip temp tables from basebackup.
Do not store temp tables in a base backup; they would not be visible anyway, so
there is no reason to store them.

Author: David Steele
Reviewed by: me
Discussion: https://www.postgresql.org/message-id/flat/5ea4d26a-a453-c1b7-eff9-5a3ef8f8aceb@pgmasters.net
2018-03-27 16:14:40 +03:00
Teodor Sigaev 3ad55863e9 Add predicate locking for GiST
Add page-level predicate locking.  Thanks to GiST's code organization, the patch
is close to trivial: add a check before changing a page, and take a predicate
lock before scanning a page.  Choosing the right place for the check is not
simple, though: it must not be called during index build, it must support
insertion of a new downlink, and so on.

Author: Shubham Barai, with editing by me and Alexander Korotkov
Reviewed by: Alexander Korotkov, Andrey Borodin, me
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPtdcANpw5ePU3LvnTP8HCENFw6wygupQAyNBgD-sG3h0g@mail.gmail.com
2018-03-27 15:43:19 +03:00
Andres Freund 4b9094eb6e Adapt to LLVM 7+ Orc API changes.
This is mostly done to be able to validate features and fixes
submitted to LLVM. Given the size of these changes that seems
acceptable.

Author: Andres Freund
2018-03-26 16:04:53 -07:00
Andres Freund 071371bc43 LLVMJIT: Free created module in LLVM < 5.
Due to the differing APIs between versions, I forgot to deallocate the
generated module in older LLVM versions, leading to a memory leak.

Author: Andres Freund
2018-03-26 16:04:39 -07:00
Andres Freund 0976c4ddd4 Make new regression independent of max_parallel_workers_per_gather.
The tests in e2f1eb0ee3 ("Implement partition-wise
grouping/aggregation.") weren't independent of the server's
max_parallel_workers_per_gather setting.  I (Andres) find it useful to
locally run with that disabled, and the aforementioned patch broke
this.

Author: Jeevan Chalke
Discussion:
    https://postgr.es/m/20180322210703.qmga3vsxqmiiypci@alap3.anarazel.de
    https://postgr.es/m/CAM2+6=UNWGKTgh9aOn4=SQ72HfFzbVFseh9=5N54bD6KB+D9OQ@mail.gmail.com
2018-03-26 14:59:37 -07:00
Andres Freund 96b5eac918 Correct some typos in the new JIT code.
Author: Thomas Munro
2018-03-26 12:58:17 -07:00
Andres Freund 32af96b2b1 JIT tuple deforming in LLVM JIT provider.
Performing JIT compilation for deforming gains performance benefits
over unJITed deforming from compile-time knowledge of the tuple
descriptor. Fixed column widths, NOT NULLness, etc can be taken
advantage of.

Right now the JITed deforming is only used when deforming tuples as
part of expression evaluation (and obviously only if the descriptor is
known). It's likely to be beneficial in other cases, too.

By default tuple deforming is JITed whenever an expression is JIT
compiled. There's a separate boolean GUC controlling it, but that's
expected to be primarily useful for development and benchmarking.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-26 12:57:19 -07:00
Teodor Sigaev 64f85894ad Set random seed for pgbench.
Setting the random seed can make test runs more reproducible in some cases.
The patch offers three sources for the seed: time (default), a strong random
generator (if available), and an unsigned constant.  The seed can be set from
the command line or an environment variable.
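
A sketch of how such a three-way choice could look, in Python.  The spellings "time" and "rand" and the variable name PGBENCH_RANDOM_SEED are assumptions that do not appear in the commit message, and `choose_seed` is a hypothetical helper, not pgbench code:

```python
import os
import secrets
import time


def choose_seed(spec=None, environ=None):
    """Pick a 32-bit seed from a command-line value or an environment variable."""
    environ = os.environ if environ is None else environ
    if spec is None:
        # Assumed variable name; an explicit command-line value takes precedence.
        spec = environ.get("PGBENCH_RANDOM_SEED", "time")
    if spec == "time":
        return time.time_ns() & 0xFFFFFFFF   # default: derived from the clock
    if spec == "rand":
        return secrets.randbits(32)          # strong random generator
    seed = int(spec)                         # raises ValueError if not numeric
    if not 0 <= seed <= 0xFFFFFFFF:
        raise ValueError("seed must be an unsigned 32-bit integer")
    return seed
```

Only the constant form makes two runs use the same seed; the other two are useful when the chosen seed is reported so a run can be replayed later.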

Author: Fabien Coelho
Reviewed by: Chapman Flack
Discussion: https://www.postgresql.org/message-id/flat/20160407082711.q7iq3ykffqxcszkv@alap3.anarazel.de
2018-03-26 18:26:27 +03:00
Alvaro Herrera 530bcf7581 Fix thinko in comment
The listed numbers disagreed with the ones being used in the symbols;
but instead of just fixing the numbers in the comment, use the symbolic
name instead, which seems clearer.

This has been wrong all along, so apply back to 9.5 where BRIN was
introduced.

Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/5ff514f2-8b1e-6366-b11c-8e2ed442562d@2ndquadrant.com
2018-03-26 12:03:42 -03:00
Alvaro Herrera 186b6df2e6 Fix test unpredictability
Test 'triggers' fails when another one creates triggers concurrently at
some precise time, because of a missing WHERE clause.

Per buildfarm members snapper, desmoxytes.
2018-03-26 11:46:04 -03:00
Alvaro Herrera 555ee77a96 Handle INSERT .. ON CONFLICT with partitioned tables
Commit eb7ed3f306 enabled unique constraints on partitioned tables,
but one thing that was not working properly is INSERT/ON CONFLICT.
This commit introduces a new node that keeps state related to the ON CONFLICT
clause per partition, and fills it when that partition is about to be
used for tuple routing.

Author: Amit Langote, Álvaro Herrera
Reviewed-by: Etsuro Fujita, Pavan Deolasee
Discussion: https://postgr.es/m/20180228004602.cwdyralmg5ejdqkq@alvherre.pgsql
2018-03-26 10:43:54 -03:00
Alvaro Herrera 1b89c2188b Fix typo 2018-03-26 09:56:41 -03:00
Andrew Dunstan 1d494b622f Remove two tests inadvertently added in 2b27273435 2018-03-26 22:53:02 +10:30
Andrew Dunstan 2b27273435 Optimize btree insertions for common case of increasing values
Remember the last page of an index insert if it's the rightmost leaf
page. If the next entry belongs on and can fit in the remembered page,
insert the new entry there as long as we can get a lock on the page.
Otherwise, fall back on the more expensive method of searching for
the right place to insert the entry.

This provides a performance improvement for the common case where an
index entry is for monotonically increasing or nearly monotonically
increasing value such as an identity field or a current timestamp.
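
The fast path can be illustrated with a toy model in Python.  This is a sketch under simplifying assumptions, not nbtree code: leaves are plain sorted lists, there are no internal pages and no locking, and `ToyIndex` is a hypothetical name:

```python
import bisect


class ToyIndex:
    """Leaves are sorted lists kept in key order; the last one is the rightmost leaf."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.leaves = [[]]
        self.fast_path_hits = 0

    def insert(self, key):
        last = self.leaves[-1]
        # Fast path: the key belongs on the rightmost leaf (it sorts after
        # that leaf's first key) and the leaf still has room.
        if last and key > last[0] and len(last) < self.capacity:
            bisect.insort(last, key)
            self.fast_path_hits += 1
            return
        self._search_and_insert(key)

    def _search_and_insert(self, key):
        # Slow path: scan for the first leaf whose largest key is >= key,
        # falling back to the rightmost leaf.
        for i, leaf in enumerate(self.leaves):
            if i == len(self.leaves) - 1 or (leaf and key <= leaf[-1]):
                break
        bisect.insort(leaf, key)
        if len(leaf) > self.capacity:
            mid = len(leaf) // 2
            self.leaves.insert(i + 1, leaf[mid:])  # split off the upper half
            del leaf[mid:]
```

With monotonically increasing keys, most inserts take the fast path and only the inserts that hit a full leaf pay for the search and split.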

Pavan Deolasee
Reviewed by Claudio Freire, Simon Riggs and Peter Geoghegan

Discussion: https://postgr.es/m/CABOikdM9DrupjyKZZFM5k8-0RCDs1wk6JzEkg7UgSW6QzOwMZw@mail.gmail.com
2018-03-26 22:39:24 +10:30
Tom Lane d0c0c89453 Fix unsafe extraction of the OID part of a relation filename.
Commit 8694cc96b did this randomly differently from other callers of
parse_filename_for_nontemp_relation().  Perhaps unsurprisingly,
the randomly different way is wrong; it fails to ensure the
extracted string is null-terminated.  Per buildfarm member skink.

Discussion: https://postgr.es/m/14453.1522001792@sss.pgh.pa.us
2018-03-25 15:15:40 -04:00
Peter Eisentraut bf4a8676c3 pg_resetwal: Allow users to change the WAL segment size
This adds a new option --wal-segsize (analogous to initdb) that changes
the WAL segment size in pg_control.

Author: Nathan Bossart <bossartn@amazon.com>
2018-03-25 14:58:49 -04:00
Peter Eisentraut 8ad8d916f9 initdb: Further polishing of --wal-segsize option
Extend documentation.  Improve option parsing in case no argument was
specified.
2018-03-25 14:58:21 -04:00
Tom Lane 3a2cb59887 Remove useless if-test.
Coverity complained that this check is pointless, and it's right.
There is no case where we'd call ExecutorStart with a null plannedstmt,
and if we did, it'd have crashed before here.  Thinko in commit cc415a56d.
2018-03-25 14:54:16 -04:00
Peter Eisentraut cc547cf08f pg_resetwal: Fix logical typo in code
introduced in f1a074b146
2018-03-25 09:09:04 -04:00
Tom Lane 2dd3f969f5 Add #includes missed in commit e22b27f0cb.
Leaving out getopt_long.h works on some platforms, but not all.
Per buildfarm.

Discussion: https://postgr.es/m/20180325030552.f462zqmohs6cqekg@alap3.anarazel.de
2018-03-25 00:46:43 -04:00
Tom Lane 038a2ed139 Stabilize regression test result.
If random() returns a result sufficiently close to zero, float8out
switches to scientific notation, breaking this test case's expectation
that the output should look like '0.xxxxxxxxx'.  Casting to numeric
should fix that.  Per buildfarm member pogona.
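
Python's float formatting behaves in an analogous way, which makes the failure mode easy to see.  This is only an analogy and says nothing about float8out's exact thresholds; the two helper names are hypothetical:

```python
from decimal import Decimal


def as_float_text(x):
    """Default float formatting; switches to exponent form for tiny values."""
    return repr(x)


def as_fixed_text(x):
    """Go through an exact decimal type to force plain fixed-point output."""
    return format(Decimal(repr(x)), "f")
```

A value such as 0.00001234 comes out in exponent form from the first helper and as a plain '0.…' string from the second, which is the shape the test expects.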

Discussion: https://postgr.es/m/20180324212502.wt4serghfidge2on@alap3.anarazel.de
2018-03-25 00:09:26 -04:00
Peter Eisentraut e22b27f0cb Add long options to pg_resetwal and pg_controldata
We were running out of good single-letter options for some upcoming
pg_resetwal functionality, so add long options to create more
possibilities.  Add to pg_controldata as well for symmetry.

based on patch by Bossart, Nathan <bossartn@amazon.com>
2018-03-24 21:49:53 -04:00
Peter Eisentraut 496d56670a initdb: Improve --wal-segsize handling
Give separate error messages for when the argument is not a number and
when it is not the right kind of number.
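
A sketch of the two-step validation, in Python.  The rule used for "the right kind of number" (a power of 2 between 1 and 1024, in megabytes) and the message wording are assumptions not stated in this commit message; `parse_wal_segsize` is a hypothetical helper:

```python
def parse_wal_segsize(arg):
    """Return the segment size in bytes, with a distinct error per failure mode."""
    try:
        megabytes = int(arg)
    except ValueError:
        # First failure mode: the argument is not a number at all.
        raise ValueError(f'argument of --wal-segsize must be a number: "{arg}"') from None
    is_power_of_two = megabytes > 0 and megabytes & (megabytes - 1) == 0
    if not (is_power_of_two and megabytes <= 1024):
        # Second failure mode: a number, but not an acceptable one.
        raise ValueError("argument of --wal-segsize must be a power of 2 between 1 and 1024")
    return megabytes * 1024 * 1024
```

Keeping the two checks separate lets the user tell a typo ("abc") apart from an unsupported value ("3").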

Fix wording in the help message.
2018-03-24 15:40:21 -04:00
Peter Eisentraut 52f3a9d6a3 Small refactoring
Put the "atomic" argument of ExecuteDoStmt() and ExecuteCallStmt() into
a variable instead of repeating the formula.
2018-03-23 17:18:22 -04:00
Peter Eisentraut 66ee8513d1 Further fix interaction of Perl and stdbool.h
In the case that PostgreSQL uses stdbool.h but Perl doesn't, we need to
prevent Perl from defining bool, to prevent compiler warnings about
redefinition.
2018-03-23 16:31:49 -04:00
Tom Lane 4b538727e2 Fix make rules that generate multiple output files.
For years, our makefiles have correctly observed that "there is no correct
way to write a rule that generates two files".  However, what we did is to
provide empty rules that "generate" the secondary output files from the
primary one, and that's not right either.  Depending on the details of
the creating process, the primary file might end up timestamped later than
one or more secondary files, causing subsequent make runs to consider the
secondary file(s) out of date.  That's harmless in a plain build, since
make will just re-execute the empty rule and nothing happens.  But it's
fatal in a VPATH build, since make will expect the secondary file to be
rebuilt in the build directory.  This would manifest as "file not found"
failures during VPATH builds from tarballs, if we were ever unlucky enough
to ship a tarball with apparently out-of-date secondary files.  (It's not
clear whether that has ever actually happened, but it definitely could.)

To ensure that secondary output files have timestamps >= their primary's,
change our makefile convention to be that we provide a "touch $@" action
not an empty rule.  Also, make sure that this rule actually gets invoked
during a distprep run, else the hazard remains.
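
The timestamp problem can be reproduced outside make with a few lines of Python.  This is a simplified model of make's staleness test, not a replacement for the makefile change; all three function names are hypothetical:

```python
import os


def is_out_of_date(target, prerequisite):
    """Basic staleness test: the target is older than its prerequisite."""
    return os.path.getmtime(target) < os.path.getmtime(prerequisite)


def empty_rule(target):
    """An empty recipe: nothing runs, so the target's timestamp never changes."""


def touch_rule(target):
    """Equivalent of a `touch $@` recipe: bump the target's mtime to now."""
    os.utime(target, None)
```

If the primary output ends up newer than a secondary one, `empty_rule` leaves the secondary file permanently stale, while `touch_rule` brings its timestamp up to date.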

It's been like this a long time, so back-patch to all supported branches.

In HEAD, I skipped the changes in src/backend/catalog/Makefile, because
those rules are due to get replaced soon in the bootstrap data format
patch, and there seems no need to create a merge issue for that patch.
If for some reason we fail to land that patch in v11, we'll need to
back-fill the changes in that one makefile from v10.

Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us
2018-03-23 13:46:00 -04:00
Teodor Sigaev 8694cc96b5 Exclude unlogged tables from base backups
Exclude unlogged tables from base backups entirely, except for the init fork,
which marks a created unlogged table. The next question is whether to also skip
temp tables in backups, but that's a story for a separate patch.
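
A sketch of the filtering rule, in Python.  It assumes relation files are named like "16384", "16384_fsm", "16384_init" or "16384.1" and that the presence of an init fork identifies an unlogged relation; `files_to_back_up` is a hypothetical helper, not the server's code:

```python
import re

# <oid>, optional fork suffix, optional segment number
_REL_FILE = re.compile(r"^(\d+)(?:_(fsm|vm|init))?(?:\.\d+)?$")


def files_to_back_up(filenames):
    """Drop every fork of an unlogged relation except its init fork."""
    parsed = [(name, _REL_FILE.match(name)) for name in filenames]
    # A relation with an init fork is treated as unlogged.
    unlogged = {m.group(1) for _, m in parsed if m and m.group(2) == "init"}
    keep = []
    for name, m in parsed:
        if m and m.group(1) in unlogged and m.group(2) != "init":
            continue  # main, fsm, vm forks and extra segments are skipped
        keep.append(name)
    return keep
```

Files that do not look like relation files are passed through untouched.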

Author: David Steele
Reviewed by: Adam Brightwell, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/04791bab-cb04-ba43-e9c0-664a4c1ffb2c@pgmasters.net
2018-03-23 19:14:12 +03:00
Peter Eisentraut 7ba7986fb4 Fix interaction of Perl and stdbool.h
Revert the PL/Perl-specific change in
9a95a77d9d.  We must not prevent Perl from
using stdbool.h when it has been built to do so, even if it uses an
incompatible size.  Otherwise, we would be imposing our bool on Perl,
which will lead to crashes because of the size mismatch.

Instead, we undef bool after including the Perl headers, as we did
previously, but now only if we are not using stdbool.h ourselves.
Record that choice in c.h as USE_STDBOOL.  This will also make it easier
to apply that coding pattern elsewhere if necessary.
2018-03-23 10:31:10 -04:00
Peter Eisentraut f1a074b146 pg_resetwal: Prevent division-by-zero errors
Handle the case where the pg_control file specifies a WAL segment size
of 0 bytes.  This would previously have led to a division by zero error.
Change this to assume the whole file is corrupt and fall back to
guessing everything.
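
A minimal sketch of the guard, in Python.  The dictionary layout, the 16MB fallback and both function names are assumptions made for illustration; they are not taken from pg_resetwal's source:

```python
DEFAULT_WAL_SEGSIZE = 16 * 1024 * 1024  # assumed fallback value


def guess_control_values():
    """Hypothetical stand-in for rebuilding control data from defaults."""
    return {"xlog_seg_size": DEFAULT_WAL_SEGSIZE}


def segments_per_xlogid(control):
    seg_size = control.get("xlog_seg_size", 0)
    if seg_size == 0:
        # A zero segment size cannot be valid: treat the whole file as
        # corrupt and guess everything, rather than dividing by zero below.
        control = guess_control_values()
        seg_size = control["xlog_seg_size"]
    return 0x100000000 // seg_size
```

The check has to come before the first division that uses the segment size, which is why it sits at the top of the function.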

Discussion: https://www.postgresql.org/message-id/a6163ad7-cc99-fdd1-dfad-25df73032ab8%402ndquadrant.com
2018-03-23 10:14:25 -04:00
Alvaro Herrera 86f575948c Allow FOR EACH ROW triggers on partitioned tables
Previously, FOR EACH ROW triggers were not allowed in partitioned
tables.  Now we allow AFTER triggers on them, and on trigger creation we
cascade to create an identical trigger in each partition.  We also clone
the triggers to each partition that is created or attached later.

This means that deferred unique keys are allowed on partitioned tables,
too.

Author: Álvaro Herrera
Reviewed-by: Peter Eisentraut, Simon Riggs, Amit Langote, Robert Haas,
	Thomas Munro
Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql
2018-03-23 10:48:22 -03:00
Peter Eisentraut 5700aa1301 pg_resetwal: Add simple test suite
Some subsequent patches will add to this, but to avoid conflicts, set up
the basics separately.
2018-03-23 08:42:25 -04:00