Commit Graph

1648 Commits

Author SHA1 Message Date
Tom Lane f0c2a5bba6 Avoid leaking memory in RestoreGUCState(), and improve comments.
RestoreGUCState applied InitializeOneGUCOption to already-live
GUC entries, causing any malloc'd subsidiary data to be forgotten.
We do want the effect of resetting the GUC to its compiled-in
default, and InitializeOneGUCOption seems like the best way to do
that, so add code to free any existing subsidiary data beforehand.

The interaction between can_skip_gucvar, SerializeGUCState, and
RestoreGUCState is way more subtle than their opaque comments
would suggest to an unwary reader.  Rewrite and enlarge the
comments to try to make it clearer what's happening.

Remove a long-obsolete assertion in read_nondefault_variables: the
behavior of set_config_option hasn't depended on IsInitProcessingMode
since f5d9698a8 installed a better way of controlling it.

Although this is fixing a clear memory leak, the leak is quite unlikely
to involve any large amount of data, and it can only happen once in the
lifetime of a worker process.  So it seems unnecessary to take any
risk of back-patching.

Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us
2021-03-19 23:03:17 -04:00
Thomas Munro 61752afb26 Provide recovery_init_sync_method=syncfs.
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery.  This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.

Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory.  This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.

The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.

Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
2021-03-20 12:07:28 +13:00
Robert Haas bbe0a81db6 Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
2021-03-19 15:10:38 -04:00
Tom Lane 377b7a8300 Don't leak malloc'd strings when a GUC setting is rejected.
Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks.  Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tomas Vondra cd91de0d17 Remove temporary files after backend crash
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
2021-03-18 17:38:28 +01:00
Amit Kapila c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
2021-03-18 07:25:27 +05:30
Peter Geoghegan 9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Fujii Masao ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds new GUC track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite writes and issue_xlog_fsync syncs
WAL data to disk are counted in pg_stat_wal. This information would be
useful to check how much WAL write and sync affect the performance.

Enabling track_wal_io_timing will make the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid such
additional overhead in the server with track_io_timing enabled,
this commit introduces track_wal_io_timing as a separate parameter from
track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit makes the server also track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. This counters can be
used to calculate the WAL write and sync time per request, for example.

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Heikki Linnakangas 3174d69fb9 Remove server and libpq support for old FE/BE protocol version 2.
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.

The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.

Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
2021-03-04 10:45:55 +02:00
Peter Eisentraut e527a99055 Some copy-editing of GUC descriptions 2021-03-03 07:14:35 +01:00
Tom Lane d16f8c8e41 Mark default_transaction_read_only as GUC_REPORT.
This allows clients to find out the setting at connection time without
having to expend a query round trip to do so; which is helpful when
trying to identify read/write servers.  (One must also look at
in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.)
Modifying libpq to make use of this will come soon, but I felt it
cleaner to push the server change separately.

Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-03-02 13:53:54 -05:00
Alvaro Herrera d9d076222f
VACUUM: ignore indexing operations with CONCURRENTLY
As envisioned in commit c98763bf51, it is possible for VACUUM to
ignore certain transactions that are executing CREATE INDEX CONCURRENTLY
and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's
because we know those transactions are not going to examine any other
tables, and are not going to execute anything else in the same
transaction.  (Only operations on "safe" indexes can be ignored: those
on indexes that are neither partial nor expressional).

This is extremely useful in cases where CIC/RC can run for a very long
time, because that used to be a significant headache for concurrent
vacuuming of other tables.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
2021-02-23 12:15:09 -03:00
Thomas Munro db8374d804 Remove outdated reference to RAID spindles.
Commit b09ff536 left behind some outdated advice in the long_desc field
of the GUC "effective_io_concurrency".  Remove it.

Back-patch to 13.

Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJyyWqFBxL9gEj-qtjBThGjhAOBE8GBnF8MUJOJ3vrfag%40mail.gmail.com
2021-02-22 14:42:15 +13:00
Peter Eisentraut f5465fade9 Allow specifying CRL directory
Add another method to specify CRLs, hashed directory method, for both
server and client side.  This offers a means for server or libpq to
load only CRLs that are required to verify a certificate.  The CRL
directory is specifed by separate GUC variables or connection options
ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
sslcrl, so both methods can be used at the same time.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
2021-02-18 07:59:10 +01:00
Thomas Munro f900a79ecd Default to wal_sync_method=fdatasync on FreeBSD.
FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
choose open_datasync as its default value.  That may not be a good
choice for all systems, and performs worse than fdatasync in some
scenarios.  Let's preserve the existing default behavior for now.

Like commit 576477e73c, which did the same for Linux, back-patch to all
supported releases.

Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com
2021-02-15 16:04:59 +13:00
Peter Geoghegan e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Fujii Masao 0650ff2303 Add GUC to log long wait times on recovery conflicts.
This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.

Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.

Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-08 00:47:03 +09:00
Tom Lane 9486e7b666 Improve commentary in timeout.c.
On re-reading I realized that I'd missed one race condition in the new
timeout code.  It's safe, but add a comment explaining it.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 22:09:16 -05:00
Tom Lane 9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Tom Lane 09cf1d5226 Improve timeout.c's handling of repeated timeout set/cancel.
A very common usage pattern is that we set a timeout that we don't
expect to reach, cancel it after a little bit, and later repeat.
With the original implementation of timeout.c, this results in one
setitimer() call per timeout set or cancel.  We can do a lot better
by being lazy about changing the timeout interrupt request, namely:
(1) never cancel the outstanding interrupt, even when we have no
active timeout events;
(2) if we need to set an interrupt, but there already is one pending
at or before the required time, leave it alone.  When the interrupt
happens, the signal handler will reschedule it at whatever time is
then needed.

For example, with a one-second setting for statement_timeout, this
method results in having to interact with the kernel only a little
more than once a second, no matter how many statements we execute
in between.  The mainline code might never call setitimer() at all
after the first time, while each time the signal handler fires,
it sees that the then-pending request is most of a second away,
and that's when it sets the next interrupt request for.  Each
mainline timeout-set request after that will observe that the time
it wants is past the pending interrupt request time, and do nothing.

This also works pretty well for cases where a few different timeout
lengths are in use, as long as none of them are very short.  But
that describes our usage well.

Idea and original patch by Thomas Munro; I fixed a race condition
and improved the comments.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 18:28:52 -05:00
Peter Eisentraut 4656e3d668 Replace CLOBBER_CACHE_ALWAYS with run-time GUC
Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical
to use for testing in PostgreSQL because it's so slow and because it's
toggled on/off only at build time.  It is helpful when hunting bugs in
any code that uses the sycache/relcache because causes cache
invalidations to be injected whenever it would be possible for an
invalidation to occur, whether or not one was really pending.

Address this by providing run-time control over cache clobber
behaviour using the new debug_invalidate_system_caches_always GUC.
Support is not compiled in at all unless assertions are enabled or
CLOBBER_CACHE_ENABLED is explicitly defined at compile time.  It
defaults to 0 if compiled in, so it has negligible effect on assert
build performance by default.

When support is compiled in, test code can now set
debug_invalidate_system_caches_always=1 locally to a backend to test
specific queries, functions, extensions, etc.  Or tests can toggle it
globally for a specific test case while retaining normal performance
during test setup and teardown.

For backwards compatibility with existing test harnesses and scripts,
debug_invalidate_system_caches_always defaults to 1 if
CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE
is defined.

CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.

Author: Craig Ringer <craig.ringer@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMsr+YF=+ctXBZj3ywmvKNUjWpxmuTuUKuv-rgbHGX5i5pLstQ@mail.gmail.com
2021-01-06 10:46:44 +01:00
Tom Lane bf8a662c9a Introduce a new GUC_REPORT setting "in_hot_standby".
Aside from being queriable via SHOW, this value is sent to the client
immediately at session startup, and again later on if the server gets
promoted to primary during the session.  The immediate report will be
used in an upcoming patch to avoid an extra round trip when trying to
connect to a primary server.

Haribabu Kommi, Greg Nancarrow, Tom Lane; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-01-05 16:18:05 -05:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane 860fe27ee1 Fix up usage of krb_server_keyfile GUC parameter.
secure_open_gssapi() installed the krb_server_keyfile setting as
KRB5_KTNAME unconditionally, so long as it's not empty.  However,
pg_GSS_recvauth() only installed it if KRB5_KTNAME wasn't set already,
leading to a troubling inconsistency: in theory, clients could see
different sets of server principal names depending on whether they
use GSSAPI encryption.  Always using krb_server_keyfile seems like
the right thing, so make both places do that.  Also fix up
secure_open_gssapi()'s lack of a check for setenv() failure ---
it's unlikely, surely, but security-critical actions are no place
to be sloppy.

Also improve the associated documentation.

This patch does nothing about secure_open_gssapi()'s use of setenv(),
and indeed causes pg_GSS_recvauth() to use it too.  That's nominally
against project portability rules, but since this code is only built
with --with-gssapi, I do not feel a need to do something about this
in the back branches.  A fix will be forthcoming for HEAD though.

Back-patch to v12 where GSSAPI encryption was introduced.  The
dubious behavior in pg_GSS_recvauth() goes back further, but it
didn't have anything to be inconsistent with, so let it be.

Discussion: https://postgr.es/m/2187460.1609263156@sss.pgh.pa.us
2020-12-30 11:38:42 -05:00
Bruce Momjian 3187ef7c46 Revert "Add key management system" (978f869b99) & later commits
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.

Reported-by: Tom Lane, Michael Paquier

Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
2020-12-27 21:37:42 -05:00
Bruce Momjian 978f869b99 Add key management system
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits.  The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode.  A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start.  New parameters allow a file descriptor open to the terminal to
be passed.  pg_upgrade support has also been added.

Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us

Author: Masahiko Sawada, me, Stephen Frost
2020-12-25 10:19:44 -05:00
Michael Paquier 6db27037b9 Fix portability issues with parsing of recovery_target_xid
The parsing of this parameter has been using strtoul(), which is not
portable across platforms.  On most Unix platforms, unsigned long has a
size of 64 bits, while on Windows it is 32 bits.  It is common in
recovery scenarios to rely on the output of txid_current() or even the
newer pg_current_xact_id() to get a transaction ID for setting up
recovery_target_xid.  The value returned by those functions includes the
epoch in the computed result, which would cause strtoul() to fail where
unsigned long has a size of 32 bits once the epoch is incremented.

WAL records and 2PC data include only information about 32-bit XIDs and
it is not possible to have XIDs across more than one epoch, so
discarding the high bits from the transaction ID set has no impact on
recovery.  On the contrary, the use of strtoul() prevents a consistent
behavior across platforms depending on the size of unsigned long.

This commit changes the parsing of recovery_target_xid to use
pg_strtouint64() instead, available down to 9.6.  There is one TAP test
stressing recovery with recovery_target_xid, where a tweak based on
pg_reset{xlog,wal} is added to bump the XID epoch so as this change gets
tested, as per an idea from Alexander Lakhin.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16780-107fd0c0385b1035@postgresql.org
Backpatch-through: 9.6
2020-12-23 12:51:22 +09:00
Tom Lane a676386b58 Remove operator_precedence_warning.
This GUC was always intended as a temporary solution to help with
finding 9.4-to-9.5 migration issues.  Now that all pre-9.5 branches
are out of support, and 9.5 will be too before v14 is released,
it seems like it's okay to drop it.  Doing so allows removal of
several hundred lines of poorly-tested code in parse_expr.c,
which have been a fertile source of bugs when people did use this.

Discussion: https://postgr.es/m/2234320.1607117945@sss.pgh.pa.us
2020-12-08 16:29:52 -05:00
Peter Eisentraut eb93f3a0b6 Convert elog(LOG) calls to ereport() where appropriate
User-visible log messages should go through ereport(), so they are
subject to translation.  Many remaining elog(LOG) calls are really
debugging calls.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
2020-12-04 14:25:23 +01:00
Fujii Masao 942305a363 Allow restore_command parameter to be changed with reload.
This commit changes restore_command from PGC_POSTMASTER to PGC_SIGHUP.

As the side effect of this commit, restore_command can be reset to
empty during archive recovery. In this setting, archive recovery
tries to replay only WAL files available in pg_wal directory. This is
the same behavior as when the command that always fails is specified
in restore_command.

Note that restore_command still must be specified (not empty) when
starting archive recovery, even after applying this commit. This is
necessary as the safeguard to prevent users from forgetting to
specify restore_command and starting archive recovery.

Thanks to Peter Eisentraut, Michael Paquier, Andres Freund,
Robert Haas and Anastasia Lubennikova for discussion.

Author: Sergei Kornilov
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/2317771549527294@sas2-985f744271ca.qloud-c.yandex.net
2020-12-02 11:00:15 +09:00
Tom Lane 2432b1a040 Avoid spamming the client with multiple ParameterStatus messages.
Up to now, we sent a ParameterStatus message to the client immediately
upon any change in the active value of any GUC_REPORT variable.  This
was only barely okay when the feature was designed; now that we have
things like function SET clauses, there are very plausible use-cases
where a GUC_REPORT variable might change many times within a query
--- and even end up back at its original value, perhaps.  Fortunately
most of our GUC_REPORT variables are unlikely to be changed often;
but there are proposals in play to enlarge that set, or even make it
user-configurable.

Hence, let's fix things to not generate more than one ParameterStatus
message per variable per query, and to not send any message at all
unless the end-of-query value is different from what we last reported.

Discussion: https://postgr.es/m/5708.1601145259@sss.pgh.pa.us
2020-11-25 11:40:44 -05:00
Michael Paquier a05dbf477b Add GUC_LIST_INPUT and GUC_LIST_QUOTE to unix_socket_directories
This should have been done in the initial commit that made
unix_socket_directories a list as of c9b0cbe.  This change allows to
support correctly the case of ALTER SYSTEM, where it is possible to
specify multiple paths as a list, like the following pattern where
flattening is applied to each item:
ALTER SYSTEM SET unix_socket_directories = '/path1', '/path2';

Any parameters specified in postgresql.conf are parsed the same way, so
there is no compatibility change.  pg_dump has a hardcoded list of
parameters marked with GUC_LIST_QUOTE, that gets its routine update.
These are reordered alphabetically for clarity.

Author: Ian Lawrence Barwick
Reviewed-by: Peter Eisentraunt, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAB8KJ=iMOtNY6_sUwV=LQVCJ2zgYHBDyNzVfvE5GN3WQ3v9kQg@mail.gmail.com
2020-11-07 10:30:22 +09:00
Tom Lane f90149e628 Don't use custom OID symbols in pg_type.dat, either.
On the same reasoning as in commit 36b931214, forbid using custom
oid_symbol macros in pg_type as well as pg_proc, so that we always
rely on the predictable macro names generated by genbki.pl.

We do continue to grant grandfather status to the names CASHOID and
LSNOID, although those are now considered deprecated aliases for the
preferred names MONEYOID and PG_LSNOID.  This is because there's
likely to be client-side code using the old names, and this bout of
neatnik-ism doesn't quite seem worth breaking client code.

There might be a case for grandfathering EVTTRIGGEROID, too, since
externally-maintained PLs may reference that symbol.  But renaming
such references to EVENT_TRIGGEROID doesn't seem like a particularly
heavy lift --- we make far more significant backend API changes in
every major release.  For now I didn't add that, but we could
reconsider if there's pushback.

The other names changed here seem pretty unlikely to have any outside
uses.  Again, we could add alias macros if there are complaints, but
for now I didn't.

As before, no need for a catversion bump.

John Naylor

Discussion: https://postgr.es/m/CAFBsxsHpCbjfoddNGpnnnY5pHwckWfiYkMYSF74PmP1su0+ZOw@mail.gmail.com
2020-10-29 13:33:38 -04:00
Tom Lane 397ea901e8 Fix memory leak when guc.c decides a setting can't be applied now.
The prohibitValueChange code paths in set_config_option(), which
are executed whenever we re-read a PGC_POSTMASTER variable from
postgresql.conf, neglected to free anything before exiting.  Thus
we'd leak the proposed new value of a PGC_STRING variable, as noted
by BoChen in bug #16666.  For all variable types, if the check hook
creates an "extra" chunk, we'd also leak that.

These are malloc not palloc chunks, so there is no mechanism for
recovering the leaks before process exit.  Fortunately, the values
are typically not very large, meaning you'd have to go through an
awful lot of SIGHUP configuration-reload cycles to make the leakage
amount to anything.  Still, for a long-lived postmaster process it
could potentially be a problem.

Oversight in commit 2594cf0e8.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/16666-2c41a4eec61b03e1@postgresql.org
2020-10-12 13:31:24 -04:00
Tom Lane 97b6144826 Make postgres.bki use the same literal-string syntax as postgresql.conf.
The BKI file's string quoting conventions were previously quite weird,
perhaps as a result of repurposing a function built to scan
single-quoted strings to scan double-quoted ones.  Change to use the
same rules as we use in GUC files, allowing some simplifications in
genbki.pl and initdb.c.

While at it, completely remove the backend's scanstr() function, which
was essentially a duplicate of the string dequoting code in guc-file.l.
Instead export that one (under a less generic name than it had) and let
bootscanner.l use it.  Now we can clarify that scansup.c exists only to
support the main lexer. We could alternatively have removed GUC_scanstr,
but this way seems better since the previous arrangement could mislead
a reader into thinking that scanstr() had something to do with the main
lexer's handling of string literals.  Maybe it did once, but if so it
was a long time ago.

This patch does not bump catversion, since the initially-installed
catalog contents don't change.  Note however that successful initdb
after applying this patch will require up-to-date postgres.bki as well
as postgres and initdb executables.

In passing, remove a bunch of very-long-obsolete #include's in
bootparse.y and bootscanner.l.

John Naylor

Discussion: https://postgr.es/m/CACPNZCtDpd18T0KATTmCggO2GdVC4ow86ypiq5ENff1VnauL8g@mail.gmail.com
2020-10-04 16:09:55 -04:00
Peter Eisentraut 3e0242b24c Message fixes and style improvements 2020-09-14 06:42:30 +02:00
Tom Lane f3e1e66196 Minor fixes in docs and error messages.
Alexander Lakhin

Discussion: https://postgr.es/m/ce7debdd-c943-d7a7-9b41-687107b27831@gmail.com
2020-09-09 11:53:39 -04:00
Andres Freund fea10a6434 Rename VariableCacheData.nextFullXid to nextXid.
Including Full in variable names duplicates the type information and
leads to overly long names. As FullTransactionId cannot accidentally
be casted to TransactionId that does not seem necessary.

Author: Andres Freund
Discussion: https://postgr.es/m/20200724011143.jccsyvsvymuiqfxu@alap3.anarazel.de
2020-08-11 12:07:14 -07:00
Michael Paquier b8fdee7d0c Add %P to log_line_prefix for parallel group leader
This is useful for monitoring purposes with log parsing.  Similarly to
pg_stat_activity, the leader's PID is shown only for active parallel
workers, minimizing the log footprint for the leaders as the equivalent
shared memory field is set as long as a backend is alive.

Author: Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier, Julien Rouhaud, Tom Lane
Discussion: https://postgr.es/m/20200315111831.GA21492@telsasoft.com
2020-08-03 13:38:48 +09:00
Thomas Munro 7be04496a9 Fix compiler warning from Clang.
Per build farm.

Discussion: https://postgr.es/m/20200731062626.GD3317%40paquier.xyz
2020-07-31 19:08:09 +12:00
Thomas Munro 84b1c63ad4 Preallocate some DSM space at startup.
Create an optional region in the main shared memory segment that can be
used to acquire and release "fast" DSM segments, and can benefit from
huge pages allocated at cluster startup time, if configured.  Fall back
to the existing mechanisms when that space is full.  The size is
controlled by a new GUC min_dynamic_shared_memory, defaulting to 0.

Main region DSM segments initially contain whatever garbage the memory
held last time they were used, rather than zeroes.  That change revealed
that DSA areas failed to initialize themselves correctly in memory that
wasn't zeroed first, so fix that problem.

Discussion: https://postgr.es/m/CA%2BhUKGLAE2QBv-WgGp%2BD9P_J-%3Dyne3zof9nfMaqq1h3EGHFXYQ%40mail.gmail.com
2020-07-31 17:49:58 +12:00
Peter Geoghegan d6c08e29e7 Add hash_mem_multiplier GUC.
Add a GUC that acts as a multiplier on work_mem.  It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.

The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit.  It is intended to enable admin
tuning of the executor's memory usage.  Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).

The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value.  This means that hash-based nodes continue to apply
work_mem in the traditional way by default.

hash_mem_multiplier is generally useful.  However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0).  While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited.  Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).

hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem.  This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit.  Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).

The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation.  That can be taken care of by a later commit.

Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-29 14:14:58 -07:00
Peter Geoghegan bcbf9446a2 Remove hashagg_avoid_disk_plan GUC.
Note: This GUC was originally named enable_hashagg_disk when it appeared
in commit 1f39bce0, which added disk-based hash aggregation.  It was
subsequently renamed in commit 92c58fd9.

Author: Peter Geoghegan
Reviewed-By: Jeff Davis, Álvaro Herrera
Discussion: https://postgr.es/m/9d9d1e1252a52ea1bad84ea40dbebfd54e672a0f.camel%40j-davis.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-27 17:53:19 -07:00
Fujii Masao c3fe108c02 Rename wal_keep_segments to wal_keep_size.
max_slot_wal_keep_size that was added in v13 and wal_keep_segments are
the GUC parameters to specify how much WAL files to retain for
the standby servers. While max_slot_wal_keep_size accepts the number of
bytes of WAL files, wal_keep_segments accepts the number of WAL files.
This difference of setting units between those similar parameters could
be confusing to users.

To alleviate this situation, this commit renames wal_keep_segments to
wal_keep_size, and make users specify the WAL size in it instead of
the number of WAL files.

There was also the idea to rename max_slot_wal_keep_size to
max_slot_wal_keep_segments, in the discussion. But we have been moving
away from measuring in segments, for example, checkpoint_segments was
replaced by max_wal_size. So we concluded to rename wal_keep_segments
to wal_keep_size.

Back-patch to v13 where max_slot_wal_keep_size was added.

Author: Fujii Masao
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, David Steele
Discussion: https://postgr.es/m/574b4ea3-e0f9-b175-ead2-ebea7faea855@oss.nttdata.com
2020-07-20 13:30:18 +09:00
Thomas Munro d2bddc2500 Add huge_page_size setting for use on Linux.
This allows the huge page size to be set explicitly.  The default is 0,
meaning it will use the system default, as before.

Author: Odin Ugedal <odin@ugedal.com>
Discussion: https://postgr.es/m/20200608154639.20254-1-odin%40ugedal.com
2020-07-17 14:33:00 +12:00
Andres Freund e07633646a code: replace 'master' with 'leader' where appropriate.
Leader already is the more widely used terminology, but a few places
didn't get the message.

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:58:32 -07:00
Andres Freund 5e7bbb5286 code: replace 'master' with 'primary' where appropriate.
Also changed "in the primary" to "on the primary", and added a few
"the" before "primary".

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:57:23 -07:00
Peter Eisentraut e61225ffab Rename enable_incrementalsort for clarity
Author: James Coleman <jtc331@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/df652910-e985-9547-152c-9d4357dc3979%402ndquadrant.com
2020-07-05 11:43:08 +02:00
Jeff Davis 92c58fd948 Rework HashAgg GUCs.
Eliminate enable_groupingsets_hash_disk, which was primarily useful
for testing grouping sets that use HashAgg and spill. Instead, hack
the table stats to convince the planner to choose hashed aggregation
for grouping sets that will spill to disk. Suggested by Melanie
Plageman.

Rename enable_hashagg_disk to hashagg_avoid_disk_plan, and invert the
meaning of on/off. The new name indicates more strongly that it only
affects the planner. Also, the word "avoid" is less definite, which
should avoid surprises when HashAgg still needs to use the
disk. Change suggested by Justin Pryzby, though I chose a different
GUC name.

Discussion: https://postgr.es/m/CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA%40mail.gmail.com
Discussion: https://postgr.es/m/20200610021544.GA14879@telsasoft.com
Backpatch-through: 13
2020-06-11 12:57:43 -07:00
Peter Eisentraut c7eab0e97e Change default of password_encryption to scram-sha-256
Also, the legacy values on/true/yes/1 for password_encryption that
mapped to md5 are removed.  The only valid values are now
scram-sha-256 and md5.

Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Discussion: https://www.postgresql.org/message-id/flat/d5b0ad33-7d94-bdd1-caac-43a1c782cab2%402ndquadrant.com
2020-06-10 16:42:55 +02:00
Peter Eisentraut 5a4ada71a8 Update description of parameter password_encryption
The previous description string still described the pre-PostgreSQL
10 (pre eb61136dc7) behavior of
selecting between encrypted and unencrypted, but it is now choosing
between encryption algorithms.
2020-06-10 11:57:41 +02:00
Peter Eisentraut f4c88ce1a2 Formatting and punctuation improvements in postgresql.conf.sample 2020-06-07 14:35:12 +02:00
Michael Paquier 55ca50deb8 Fix some mentions to memory units in postgresql.conf.sample
The default unit for max_slot_wal_keep_size is megabytes.  While on it,
also change temp_file_limit to use a more consistent wording.

Reported-by: Jeff Janes, Fujii Masao
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAMkU=1wWZhhjpwRFKJ9waQGxxROeC0P6UqPvb90fAaGz7dhoHA@mail.gmail.com
2020-05-28 15:39:05 +09:00
Noah Misch 3350fb5d1f Clear some style deviations. 2020-05-21 08:31:16 -07:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Alvaro Herrera 17cc133f01
Dial back -Wimplicit-fallthrough to level 3
The additional pain from level 4 is excessive for the gain.

Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.

Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
2020-05-13 15:31:14 -04:00
Alvaro Herrera 3e9744465d
Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGS
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.

(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)

Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
2020-05-12 16:07:30 -04:00
Alvaro Herrera c655077639
Allow users to limit storage reserved by replication slots
Replication slots are useful to retain data that may be needed by a
replication system.  But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space.  This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size.  Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
2020-04-07 18:35:00 -04:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produce results after each
  prefix group, which is beneficial for plans where startup cost matters
  (like for example queries with LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without check of equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Noah Misch c6b92041d3 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-04-04 12:25:34 -07:00
Tom Lane 0b34e7d307 Improve user control over truncation of logged bind-parameter values.
This patch replaces the boolean GUC log_parameters_on_error introduced
by commit ba79cb5dc with an integer log_parameter_max_length_on_error,
adding the ability to specify how many bytes to trim each logged
parameter value to.  (The previous coding hard-wired that choice at
64 bytes.)

In addition, add a new parameter log_parameter_max_length that provides
similar control over truncation of query parameters that are logged in
response to statement-logging options, as opposed to errors.  Previous
releases always logged such parameters in full, possibly causing log
bloat.

For backwards compatibility with prior releases,
log_parameter_max_length defaults to -1 (log in full), while
log_parameter_max_length_on_error defaults to 0 (no logging).

Per discussion, log_parameter_max_length is SUSET since the DBA should
control routine logging behavior, but log_parameter_max_length_on_error
is USERSET because it also affects errcontext data sent back to the
client.

Alexey Bashtanov, editorialized a little by me

Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc
2020-04-02 15:04:51 -04:00
Thomas Munro 37b3794dfc Add maintenance_io_concurrency to postgresql.conf.sample.
New GUC from commit fc34b0d9.
2020-04-02 16:50:36 +13:00
David Rowley b07642dbcd Trigger autovacuum based on number of INSERTs
Traditionally autovacuum has only ever invoked a worker based on the
estimated number of dead tuples in a table and for anti-wraparound
purposes. For the latter, with certain classes of tables such as
insert-only tables, anti-wraparound vacuums could be the first vacuum that
the table ever receives. This could often lead to autovacuum workers being
busy for extended periods of time due to having to potentially freeze
every page in the table. This could be particularly bad for very large
tables. New clusters, or recently pg_restored clusters could suffer even
more as many large tables may have the same relfrozenxid, which could
result in large numbers of tables requiring an anti-wraparound vacuum all
at once.

Here we aim to reduce the work required by anti-wraparound and aggressive
vacuums in general, by triggering autovacuum when the table has received
enough INSERTs. This is controlled by adding two new GUCs and reloptions;
autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor. These work exactly the same as the
existing scale factor and threshold controls, only base themselves off the
number of inserts since the last vacuum, rather than the number of dead
tuples. New controls were added rather than reusing the existing
controls, to allow these new vacuums to be tuned independently and perhaps
even completely disabled altogether, which can be done by setting
autovacuum_vacuum_insert_threshold to -1.

We make no attempt to skip index cleanup operations on these vacuums as
they may trigger for an insert-mostly table which continually doesn't have
enough dead tuples to trigger an autovacuum for the purpose of removing
those dead tuples. If we were to skip cleaning the indexes in this case,
then it is possible for the index(es) to become bloated over time.

There are additional benefits to triggering autovacuums based on inserts,
as tables which never contain enough dead tuples to trigger an autovacuum
are now more likely to receive a vacuum, which can mark more of the table
as "allvisible" and encourage the query planner to make use of Index Only
Scans.

Currently, we still obey vacuum_freeze_min_age when triggering these new
autovacuums based on INSERTs. For large insert-only tables, it may be
beneficial to lower the table's autovacuum_freeze_min_age so that tuples
are eligible to be frozen sooner. Here we've opted not to zero that for
these types of vacuums, since the table may just be insert-mostly and we
may otherwise freeze tuples that are still destined to be updated or
removed in the near future.

There was some debate to what exactly the new scale factor and threshold
should default to. For now, these are set to 0.2 and 1000, respectively.
There may be some motivation to adjust these before the release.

Author: Laurenz Albe, Darafei Praliaskouski
Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby
Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com
2020-03-28 19:20:12 +13:00
Alvaro Herrera 1e6148032e
Allow walreceiver configuration to change on reload
The parameters primary_conninfo, primary_slot_name and
wal_receiver_create_temp_slot can now be changed with a simple "reload"
signal, no longer requiring a server restart.  This is achieved by
signalling the walreceiver process to terminate and having it start
again with the new values.

Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
2020-03-27 19:51:37 -03:00
Alvaro Herrera 092c6936de
Set wal_receiver_create_temp_slot PGC_POSTMASTER
Commit 3297308278 gave walreceiver the ability to create and use a
temporary replication slot, and made it controllable by a GUC (enabled
by default) that can be changed with SIGHUP.  That's useful but has two
problems: one, it's possible to cause the origin server to fill its disk
if the slot doesn't advance in time; and also there's a disconnect
between state passed down via the startup process and GUCs that
walreceiver reads directly.

We handle the first problem by setting the option to disabled by
default.  If the user enables it, its on their head to make sure that
disk doesn't fill up.

We handle the second problem by passing the flag via startup rather than
having walreceiver acquire it directly, and making it PGC_POSTMASTER
(which ensures a walreceiver always has the fresh value).  A future
commit can relax this (to PGC_SIGHUP again) by having the startup
process signal walreceiver to shutdown whenever the value changes.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz
2020-03-27 16:20:33 -03:00
Peter Eisentraut f15ace7935 Fix compiler warning on Cygwin
bf68b79e50 introduced an unused variable
compiler warning on Cygwin.
2020-03-24 19:31:02 +01:00
Noah Misch de9396326e Revert "Skip WAL for new relfilenodes, under wal_level=minimal."
This reverts commit cb2fd7eac2.  Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64.  Back-patch to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
2020-03-22 09:24:09 -07:00
Noah Misch cb2fd7eac2 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Back-patch to 9.5 (all supported versions).  This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC.  As
always, update standby systems before master systems.  This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions.  (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-03-21 09:38:26 -07:00
Jeff Davis 1f39bce021 Disk-based Hash Aggregation.
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".

In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.

The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).

Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
2020-03-18 15:42:02 -07:00
Thomas Munro fc34b0d9de Introduce a maintenance_io_concurrency setting.
Introduce a GUC and a tablespace option to control I/O prefetching, much
like effective_io_concurrency, but for work that is done on behalf of
many client sessions.

Use the new setting in heapam.c instead of the hard-coded formula
effective_io_concurrency + 10 introduced by commit 558a9165e0.  Go with
a default value of 10 for now, because it's a round number pretty close
to the value used for that existing case.

Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
2020-03-16 17:14:26 +13:00
Thomas Munro b09ff53667 Simplify the effective_io_concurrency setting.
The effective_io_concurrency GUC and equivalent tablespace option were
previously passed through a formula based on a theory about RAID
spindles and probabilities, to arrive at the number of pages to prefetch
in bitmap heap scans.  Tomas Vondra, Andres Freund and others argued
that it was anachronistic and hard to justify, and commit 558a9165e0
already started down the path of bypassing it in new code.  We agreed to
drop that logic and use the value directly.

For the default setting of 1, there is no change in effect.  Higher
settings can be converted from the old meaning to the new with:

  select round(sum(OLD / n::float)) from generate_series(1, OLD) s(n);

We might want to consider renaming the GUC before the next release given
the change in meaning, but it's not clear that many users had set it
very carefully anyway.  That decision is deferred for now.

Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
2020-03-16 17:14:26 +13:00
Peter Eisentraut 70a7b4776b Add backend type to csvlog and optionally log_line_prefix
The backend type, which corresponds to what
pg_stat_activity.backend_type shows, is added as a column to the
csvlog and can optionally be added to log_line_prefix using the new %b
placeholder.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-15 11:20:21 +01:00
Peter Eisentraut 8e8a0becb3 Unify several ways to tracking backend type
Add a new global variable MyBackendType that uses the same BackendType
enum that was previously only used by the stats collector.  That way
several duplicate ways of checking what type a particular process is
can be simplified.  Since it's no longer just for stats, move to
miscinit.c and rename existing functions to match the expanded
purpose.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-13 14:01:10 +01:00
Peter Eisentraut bf68b79e50 Refactor ps_status.c API
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary.  The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.

Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.

BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened.  So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-11 16:38:31 +01:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Fujii Masao d9249441ef Mark ssl_passphrase_command as GUC_SUPERUSER_ONLY.
This commit changes the GUC ssl_passphrase_command so that
it's examinable by only superuser and a member of pg_read_all_settings.
Per discussion, we determined to do this because the parameter may
contain a sensitive informtaion like a passphrase itself.

Author: Insung Moon
Reviewed-by: Keisuke Kuroda
Discussion: https://postgr.es/m/CAEMmqBuHVGayc+QkYKgx3gWSdqwTAQGw+0DYn3WhcX-eNa2ntA@mail.gmail.com
2020-03-09 11:41:31 +09:00
Tom Lane 3ed2005ff5 Introduce macros for typalign and typstorage constants.
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code.  But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage.  It's never
too late to make it better though, so let's do that.

The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.

I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references.  But that's not fatal --- there's no expectation that
we'd actually change any of these values.  We can clean up stragglers
over time.

Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
2020-03-04 10:34:25 -05:00
Tom Lane 3d475515a1 Account explicitly for long-lived FDs that are allocated outside fd.c.
The comments in fd.c have long claimed that all file allocations should
go through that module, but in reality that's not always practical.
fd.c doesn't supply APIs for invoking some FD-producing syscalls like
pipe() or epoll_create(); and the APIs it does supply for non-virtual
FDs are mostly insistent on releasing those FDs at transaction end;
and in some cases the actual open() call is in code that can't be made
to use fd.c, such as libpq.

This has led to a situation where, in a modern server, there are likely
to be seven or so long-lived FDs per backend process that are not known
to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
few spare FDs if max_files_per_process is >= the system ulimit and
fd.c had opened all the files it thought it safely could.  The
contrib/postgres_fdw regression test, in particular, could easily be
made to fall over by running it under a restrictive ulimit.

To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
that allow outside callers to tell fd.c that they have or want to allocate
a FD that's not directly managed by fd.c.  Add calls to track all the
fixed FDs in a standard backend session, so that we are honestly
guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
limit in a backend's idle state.  The coding rules for these functions say
that there's no need to call them in code that just allocates one FD over
a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
That means that there aren't all that many places where we need to worry.
But postgres_fdw and dblink must use this facility to account for
long-lived FDs consumed by libpq connections.  There may be other places
where it's worth doing such accounting, too, but this seems like enough
to solve the immediate problem.

Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
(Callers can choose to ignore this limit, but of course it's unwise
to do so except for fixed file allocations.)  I also reduced the limit
on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
Conceivably a smarter rule could be used here --- but in practice,
on reasonable systems, max_safe_fds should be large enough that this
isn't much of an issue, so KISS for now.  To avoid possible regression
in the number of external or allocated files that can be opened,
increase FD_MINFREE and the lower limit on max_files_per_process a
little bit; we now insist that the effective "ulimit -n" be at least 64.

This seems like pretty clearly a bug fix, but in view of the lack of
field complaints, I'll refrain from risking a back-patch.

Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
2020-02-24 17:28:33 -05:00
Michael Paquier 414c2fd1e1 Revert "Add GUC checks for ssl_min_protocol_version and ssl_max_protocol_version"
This reverts commit 41aadee, as the GUC checks could run on older values
with the new values used, and result in incorrect errors if both
parameters are changed at the same time.

Per complaint from Tom Lane.

Discussion: https://postgr.es/m/27574.1581015893@sss.pgh.pa.us
Backpatch-through: 12
2020-02-07 08:10:40 +09:00
Michael Paquier f1f10a1ba9 Add declaration-level assertions for compile-time checks
Those new assertions can be used at file scope, outside of any function
for compilation checks.  This commit provides implementations for C and
C++, and fallback implementations.

Author: Peter Smith
Reviewed-by: Andres Freund, Kyotaro Horiguchi, Dagfinn Ilmari Mannsåker,
Michael Paquier
Discussion: https://postgr.es/m/201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217
2020-02-03 14:48:42 +09:00
Alvaro Herrera c9d2977519 Clean up newlines following left parentheses
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars.  However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason.  Remove those newlines, and reflow some
of those lines for some extra naturality.

Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
2020-01-30 13:42:14 -03:00
Alvaro Herrera 4e89c79a52 Remove excess parens in ereport() calls
Cosmetic cleanup, not worth backpatching.

Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
Reviewed-by: Tom Lane, Michael Paquier
2020-01-30 13:32:04 -03:00
Tom Lane 3ec20c7091 Fix EXPLAIN (SETTINGS) to follow policy about when to print empty fields.
In non-TEXT output formats, the "Settings" field should appear when
requested, even if it would be empty.

Also, get rid of the premature optimization of counting all the
GUC_EXPLAIN variables at startup.  Since there was no provision for
adjusting that count later, all it'd take would be some extension marking
a parameter as GUC_EXPLAIN to risk an assertion failure or memory stomp.
We could make get_explain_guc_options() count those variables on-the-fly,
or dynamically resize its array ... but TBH I do not think that making a
transient array of pointers a bit smaller is worth any extra complication,
especially when you consider all the other transient space EXPLAIN eats.
So just allocate that array at the max possible size.

In HEAD, also add some regression test coverage for this feature.

Because of the memory-stomp hazard, back-patch to v12 where this
feature was added.

Discussion: https://postgr.es/m/19416.1580069629@sss.pgh.pa.us
2020-01-26 16:32:19 -05:00
Fujii Masao 41c184bc64 Add GUC ignore_invalid_pages.
Detection of WAL records having references to invalid pages
during recovery causes PostgreSQL to raise a PANIC-level error,
aborting the recovery. Setting ignore_invalid_pages to on causes
the system to ignore those WAL records (but still report a warning),
and continue recovery. This behavior may cause crashes, data loss,
propagate or hide corruption, or other serious problems.
However, it may allow you to get past the PANIC-level error,
to finish the recovery, and to cause the server to start up.

Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAHGQGwHCK6f77yeZD4MHOnN+PaTf6XiJfEB+Ce7SksSHjeAWtg@mail.gmail.com
2020-01-22 11:56:34 +09:00
Michael Paquier 41aadeeb12 Add GUC checks for ssl_min_protocol_version and ssl_max_protocol_version
Mixing incorrect bounds set in the SSL context leads to confusing error
messages generated by OpenSSL which are hard to act on.  New checks are
added within the GUC machinery to improve the user experience as they
apply to any SSL implementation, not only OpenSSL, and doing the checks
beforehand avoids the creation of a SSL during a reload (or startup)
which we know will never be used anyway.

Backpatch down to 12, as those parameters have been introduced by
e73e67c.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200114035420.GE1515@paquier.xyz
Backpatch-through: 12
2020-01-18 12:32:43 +09:00
Alvaro Herrera a166d408eb Report progress of ANALYZE commands
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for ANALYZE.

Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
	Vignesh C, Amit Langote
2020-01-15 11:14:39 -03:00
Peter Eisentraut 3297308278 walreceiver uses a temporary replication slot by default
If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot.  A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.

Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com
2020-01-14 14:40:41 +01:00
Robert Haas 8147278589 Increase the maximum value of track_activity_query_size.
This one-line change provoked a lot of discussion, but ultimately
the consensus seems to be that allowing a larger value might be
useful to somebody, and probably won't hurt anyone who chooses
not to take advantage of the higher maximum limit.

Vyacheslav Makarov, reviewed by many people.

Discussion: http://postgr.es/m/7b5ecc5a9991045e2f13c84e3047541d@postgrespro.ru
2020-01-07 12:14:19 -05:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Alvaro Herrera c4dcd9144b Avoid splitting C string literals with \-newline
Using \ is unnecessary and ugly, so remove that.  While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.

Leave contrib/tablefunc alone.

Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
2019-12-24 12:44:12 -03:00
Peter Eisentraut 8c6d30f211 Fix compiler warnings on MSYS2
The PS_USE_NONE case in ps_status.c left a couple of unused variables
exposed.

Discussion: https://www.postgresql.org/message-id/flat/6b467edc-4018-521f-ab18-171f098557ca%402ndquadrant.com
2019-12-20 08:16:44 +01:00
Alvaro Herrera ba79cb5dc8 Emit parameter values during query bind/execute errors
This makes such log entries more useful, since the cause of the error
can be dependent on the parameter values.

Author: Alexey Bashtanov, Álvaro Herrera
Discussion: https://postgr.es/m/0146a67b-a22a-0519-9082-bc29756b93a2@imap.cc
Reviewed-by: Peter Eisentraut, Andres Freund, Tom Lane
2019-12-11 18:03:35 -03:00
Tom Lane 8729fa7248 Fix tuple column count in pg_control_init().
Oversight in commit 2e4db241b.

Nathan Bossart

Discussion: https://postgr.es/m/1B616360-396A-4482-AA28-375566C86160@amazon.com
2019-12-10 17:52:13 -05:00
Peter Eisentraut b1abfec825 Update minimum SSL version
Change default of ssl_min_protocol_version to TLSv1.2 (from TLSv1,
which means 1.0).  Older versions are still supported, just not by
default.

TLS 1.0 is widely deprecated, and TLS 1.1 only slightly less so.  All
OpenSSL versions that support TLS 1.1 also support TLS 1.2, so there
would be very little reason to, say, set the default to TLS 1.1
instead on grounds of better compatibility.

The test suite overrides this new setting, so it can still run with
older OpenSSL versions.

Discussion: https://www.postgresql.org/message-id/flat/b327f8df-da98-054d-0cc5-b76a857cfed9%402ndquadrant.com
2019-12-04 22:07:43 +01:00
Peter Eisentraut c4a7a392ec Make allow_system_table_mods settable at run time
Make allow_system_table_mods settable at run time by superusers.  It
was previously postmaster start only.

We don't want to make system catalog DDL wide-open, but there are
occasionally useful things to do like setting reloptions or statistics
on a busy system table, and blocking those doesn't help anyone.  Also,
this enables the possibility of writing a test suite for this setting.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8b00ea5e-28a7-88ba-e848-21528b632354%402ndquadrant.com
2019-11-29 10:22:13 +01:00
Peter Eisentraut 2e4db241bf Remove configure --disable-float4-byval
This build option was only useful to maintain compatibility for
version-0 functions, but those are no longer supported, so this option
can be removed.

float4 is now always pass-by-value; the pass-by-reference code path is
completely removed.

Discussion: https://www.postgresql.org/message-id/flat/f3e1e576-2749-bbd7-2d57-3f9dcf75255a@2ndquadrant.com
2019-11-21 18:29:21 +01:00
Amit Kapila cec2edfa78 Add logical_decoding_work_mem to limit ReorderBuffer memory usage.
Instead of deciding to serialize a transaction merely based on the
number of changes in that xact (toplevel or subxact), this makes
the decisions based on amount of memory consumed by the changes.

The memory limit is defined by a new logical_decoding_work_mem GUC,
so for example we can do this

    SET logical_decoding_work_mem = '128kB'

to reduce the memory usage of walsenders or set the higher value to
reduce disk writes. The minimum value is 64kB.

When adding a change to a transaction, we account for the size in
two places. Firstly, in the ReorderBuffer, which is then used to
decide if we reached the total memory limit. And secondly in the
transaction the change belongs to, so that we can pick the largest
transaction to evict (and serialize to disk).

We still use max_changes_in_memory when loading changes serialized
to disk. The trouble is we can't use the memory limit directly as
there might be multiple subxact serialized, we need to read all of
them but we don't know how many are there (and which subxact to
read first).

We do not serialize the ReorderBufferTXN entries, so if there is a
transaction with many subxacts, most memory may be in this type of
objects. Those records are not included in the memory accounting.

We also do not account for INTERNAL_TUPLECID changes, which are
kept in a separate list and not evicted from memory. Transactions
with many CTID changes may consume significant amounts of memory,
but we can't really do much about that.

The current eviction algorithm is very simple - the transaction is
picked merely by size, while it might be useful to also consider age
(LSN) of the changes for example. With the new Generational memory
allocator, evicting the oldest changes would make it more likely
the memory gets actually pfreed.

The logical_decoding_work_mem can be set in postgresql.conf, in which
case it serves as the default for all publishers on that instance.

Author: Tomas Vondra, with changes by Dilip Kumar and Amit Kapila
Reviewed-by: Dilip Kumar and Amit Kapila
Tested-By: Vignesh C
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2019-11-19 07:32:36 +05:30
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Alvaro Herrera 71a8a4f6e3 Add backtrace support for error reporting
Add some support for automatically showing backtraces in certain error
situations in the server.  Backtraces are shown on assertion failure;
also, a new setting backtrace_functions can be set to a list of C
function names, and all ereport()s and elog()s from the mentioned
functions will have backtraces generated.  Finally, the function
errbacktrace() can be manually added to an ereport() call to generate a
backtrace for that call.

Authors: Peter Eisentraut, Álvaro Herrera
Discussion: https://postgr.es/m//5f48cb47-bf1e-05b6-7aae-3bf2cd01586d@2ndquadrant.com
Discussion: https://postgr.es/m/CAMsr+YGL+yfWE=JvbUbnpWtrRZNey7hJ07+zT4bYJdVp4Szdrg@mail.gmail.com
2019-11-08 15:44:20 -03:00
Tomas Vondra 6e3e6cc0e8 Allow sampling of statements depending on duration
This allows logging a sample of statements, without incurring excessive
log traffic (which may impact performance).  This can be useful when
analyzing workloads with lots of short queries.

The sampling is configured using two new GUC parameters:

 * log_min_duration_sample - minimum required statement duration

 * log_statement_sample_rate - sample rate (0.0 - 1.0)

Only statements with duration exceeding log_min_duration_sample are
considered for sampling. To enable sampling, both those GUCs have to
be set correctly.

The existing log_min_duration_statement GUC has a higher priority, i.e.
statements with duration exceeding log_min_duration_statement will be
always logged, irrespectedly of how the sampling is configured. This
means only configurations

  log_min_duration_sample < log_min_duration_statement

do actually sample the statements, instead of logging everything.

Author: Adrien Nayrat
Reviewed-by: David Rowley, Vik Fearing, Tomas Vondra
Discussion: https://postgr.es/m/bbe0a1a8-a8f7-3be2-155a-888e661cc06c@anayrat.info
2019-11-06 19:11:07 +01:00
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Tom Lane 22f6f2c1cc Improve management of statement timeouts.
Commit f8e5f156b added private state in postgres.c to track whether
a statement timeout is running.  This seems like bad design to me;
timeout.c's private state should be the single source of truth about
that.  We already fixed one bug associated with failure to keep those
states in sync (cf. be42015fc), and I've got little faith that we
won't find more in future.  So get rid of postgres.c's local variable
by exposing a way to ask timeout.c whether a timeout is running.
(Obviously, such an inquiry is subject to race conditions, but it
seems fine for the purpose at hand.)

To make get_timeout_active() as cheap as possible, add a flag in
the per-timeout struct showing whether that timeout is active.
This allows some small savings elsewhere in timeout.c, mainly
elimination of unnecessary searches of the active_timeouts array.

While at it, fix enable_statement_timeout to not call disable_timeout
when statement_timeout is 0 and the timeout is not running.  This
avoids a useless deschedule-and-reschedule-timeouts cycle, which
represents a significant savings (at least one kernel call) when
there is any other active timeout.  Right now, there usually isn't,
but there are proposals around to change that.

Discussion: https://postgr.es/m/16035-456e6e69ebfd4374@postgresql.org
2019-10-25 11:41:16 -04:00
Thomas Munro 3c8c55dd54 When restoring GUCs in parallel workers, show an error context.
Otherwise it can be hard to see where an error is coming from, when
the parallel worker sets all the GUCs that it received from the
leader.  Bug #15726.  Back-patch to 9.5, where RestoreGUCState()
appeared.

Reported-by: Tiago Anastacio
Reviewed-by: Daniel Gustafsson, Tom Lane
Discussion: https://postgr.es/m/15726-6d67e4fa14f027b3%40postgresql.org
2019-10-17 13:47:01 +13:00
Peter Eisentraut 887248e97e Message style fixes 2019-09-23 13:38:39 +02:00
Tom Lane 6e42130568 Reject empty names and recursion in config-file include directives.
An empty file name or subdirectory name leads join_path_components() to
just produce the parent directory name, which leads to weird failures or
recursive inclusions.  Let's throw a specific error for that.  It takes
only slightly more code to detect all-blank names, so do so.

Also, detect direct recursion, ie a file calling itself.  As coded
this will also detect recursion via "include_dir '.'", which is
perhaps more likely than explicitly including the file itself.

Detecting indirect recursion would require API changes for guc-file.l
functions, which seems not worth it since extensions might call them.
The nesting depth limit will catch such cases eventually, just not
with such an on-point error message.

In passing, adjust the example usages in postgresql.conf.sample
to perhaps eliminate the problem at the source: there's no reason
for the examples to suggest that an empty value is valid.

Per a trouble report from Brent Bates.  Back-patch to 9.5; the
issue is old, but the code in 9.4 is enough different that the
patch doesn't apply easily, and it doesn't seem worth the trouble
to fix there.

Ian Barwick and Tom Lane

Discussion: https://postgr.es/m/8c8bcbca-3bd9-dc6e-8986-04a5abdef142@2ndquadrant.com
2019-08-27 14:44:26 -04:00
Andres Freund f7db0ac7d5 Add default_table_access_method to postgresql.conf.sample.
Reported-By: Heikki Linnakangas
Author: Michael Paquier
Discussion: https://postgr.es/m/d6ffbebb-a0d2-181c-811d-b029b2225ed7@iki.fi
Backpatch: 12-, where pluggable table access methods were introduced
2019-08-16 15:24:22 -07:00
Tom Lane f1bf619acd Fix ALTER SYSTEM to cope with duplicate entries in postgresql.auto.conf.
ALTER SYSTEM itself normally won't make duplicate entries (although
up till this patch, it was possible to confuse it by writing case
variants of a GUC's name).  However, if some external tool has appended
entries to the file, that could result in duplicate entries for a single
GUC name.  In such a situation, ALTER SYSTEM did exactly the wrong thing,
because it replaced or removed only the first matching entry, leaving
the later one(s) still there and hence still determining the active value.

This patch fixes that by making ALTER SYSTEM sweep through the file and
remove all matching entries, then (if not ALTER SYSTEM RESET) append the
new setting to the end.  This means entries will be in order of last
setting rather than first setting, but that shouldn't hurt anything.

Also, make the comparisons case-insensitive so that the right things
happen if you do, say, ALTER SYSTEM SET "TimeZone" = 'whatever'.

This has been broken since ALTER SYSTEM was invented, so back-patch
to all supported branches.

Ian Barwick, with minor mods by me

Discussion: https://postgr.es/m/aed6cc9f-98f3-2693-ac81-52bb0052307e@2ndquadrant.com
2019-08-14 15:09:42 -04:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Tomas Vondra 75506195da Revert "Add log_statement_sample_rate parameter"
This reverts commit 88bdbd3f74.

As committed, statement sampling used the existing duration threshold
(log_min_duration_statement) when decide which statements to sample.
The issue is that even the longest statements are subject to sampling,
and so may not end up logged. An improvement was proposed, introducing
a second duration threshold, but it would not be backwards compatible.
So we've decided to revert this feature - the separate threshold should
be part of the feature itself.

Discussion: https://postgr.es/m/CAFj8pRDS8tQ3Wviw9%3DAvODyUciPSrGeMhJi_WPE%2BEB8%2B4gLL-Q%40mail.gmail.com
2019-08-04 23:38:27 +02:00
Bruce Momjian ba09342518 Adjust ssl_ciphers to be specific to OpenSSL
Syntax is OpenSSL-specific, so only use it for OpenSSL.

Discussion: https://postgr.es/m/8232E273-7B25-47F4-B0E7-3D4264106F82@yesql.se

Author: Daniel Gustafsson

Backpatch-through: head
2019-07-08 19:39:48 -04:00
Michael Paquier 6b8548964b Fix inconsistencies in the code
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
2019-07-08 13:15:09 +09:00
Thomas Munro e8fdcacc6c Improve comment in postgresql.conf.sample.
The Unix manual section that "man tcp" appears in varies, so let's
just leave it out of the command to run.
2019-07-05 21:03:51 +12:00
Tom Lane 9e1c9f9594 pgindent run prior to branching v12.
pgperltidy and reformat-dat-files too, though the latter didn't
find anything to change.
2019-07-01 12:37:52 -04:00
Peter Eisentraut 21f428ebde Don't call data type input functions in GUC check hooks
Instead of calling pg_lsn_in() in check_recovery_target_lsn and
timestamptz_in() in check_recovery_target_time, reorganize the
respective code so that we don't raise any errors in the check hooks.
The previous code tried to use PG_TRY/PG_CATCH to handle errors in a
way that is not safe, so now the code contains no ereport() calls and
can operate safely within the GUC error handling system.

Moreover, since the interpretation of the recovery_target_time string
may depend on the time zone, we cannot do the final processing of that
string until all the GUC processing is done.  Instead,
check_recovery_target_time() now does some parsing for syntax
checking, but the actual conversion to a timestamptz value is done
later in the recovery code that uses it.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190611061115.njjwkagvxp4qujhp%40alap3.anarazel.de
2019-06-30 10:27:43 +02:00
Amit Kapila 9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Etsuro Fujita 7d9eca59cf Fix typo. 2019-05-14 16:05:37 +09:00
Alvaro Herrera 9f8b717a80 Message style fixes 2019-04-30 10:33:37 -04:00
Bruce Momjian fb9c475597 postgresql.conf.sample: add proper defaults for include actions
Previously, include actions include_dir, include_if_exists, and include
listed commented-out values which were not the defaults, which is
inconsistent with other entries.  Instead, replace them with '', which
is the default value.

Reported-by: Emanuel Araújo

Discussion: https://postgr.es/m/CAMuTAkYMx6Q27wpELDR3_v9aG443y7ZjeXu15_+1nGUjhMWOJA@mail.gmail.com

Backpatch-through: 9.4
2019-04-17 18:12:10 -04:00
Michael Paquier 249d649996 Add support TCP user timeout in libpq and the backend server
Similarly to the set of parameters for keepalive, a connection parameter
for libpq is added as well as a backend GUC, called tcp_user_timeout.

Increasing the TCP user timeout is useful to allow a connection to
survive extended periods without end-to-end connection, and decreasing
it allows application to fail faster.  By default, the parameter is 0,
which makes the connection use the system default, and follows a logic
close to the keepalive parameters in its handling.  When connecting
through a Unix-socket domain, the parameters have no effect.

Author: Ryohei Nagaura
Reviewed-by: Fabien Coelho, Robert Haas, Kyotaro Horiguchi, Kirk
Jamison, Mikalai Keida, Takayuki Tsunakawa, Andrei Yahorau
Discussion: https://postgr.es/m/EDA4195584F5064680D8130B1CA91C45367328@G01JPEXMBYT04
2019-04-06 15:23:37 +09:00
Tomas Vondra ea569d64ac Add SETTINGS option to EXPLAIN, to print modified settings.
Query planning is affected by a number of configuration options, and it
may be crucial to know which of those options were set to non-default
values.  With this patch you can say EXPLAIN (SETTINGS ON) to include
that information in the query plan.  Only options affecting planning,
with values different from the built-in default are printed.

This patch also adds auto_explain.log_settings option, providing the
same capability in auto_explain module.

Author: Tomas Vondra
Reviewed-by: Rafia Sabih, John Naylor
Discussion: https://postgr.es/m/e1791b4c-df9c-be02-edc5-7c8874944be0@2ndquadrant.com
2019-04-04 00:04:31 +02:00
Alvaro Herrera d1f04b96b9 Tweak docs for log_statement_sample_rate
Author: Justin Pryzby, partly after a suggestion from Masahiko Sawada
Discussion: https://postgr.es/m/20190328135918.GA27808@telsasoft.com
Discussion: https://postgr.es/m/CAD21AoB9+y8N4+Fan-ne-_7J5yTybPttxeVKfwUocKp4zT1vNQ@mail.gmail.com
2019-04-03 18:56:56 -03:00
Alvaro Herrera 799e220346 Log all statements from a sample of transactions
This is useful to obtain a view of the different transaction types in an
application, regardless of the durations of the statements each runs.

Author: Adrien Nayrat
Reviewed-by: Masahiko Sawada, Hayato Kuroda, Andres Freund
2019-04-03 18:43:59 -03:00
Thomas Munro 475861b261 Add wal_recycle and wal_init_zero GUCs.
On at least ZFS, it can be beneficial to create new WAL files every
time and not to bother zero-filling them.  Since it's not clear which
other filesystems might benefit from one or both of those things,
add individual GUCs to control those two behaviors independently and
make only very general statements in the docs.

Author: Jerry Jelinek, with some adjustments by Thomas Munro
Reviewed-by: Alvaro Herrera, Andres Freund, Tomas Vondra, Robert Haas and others
Discussion: https://postgr.es/m/CACPQ5Fo00QR7LNAcd1ZjgoBi4y97%2BK760YABs0vQHH5dLdkkMA%40mail.gmail.com
2019-04-02 14:37:14 +13:00
Peter Eisentraut cc8d415117 Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.

Features:

- Program name is automatically prefixed.

- Message string does not end with newline.  This removes a common
  source of inconsistencies and omissions.

- Additionally, a final newline is automatically stripped, simplifying
  use of PQerrorMessage() etc., another common source of mistakes.

- I converted error message strings to use %m where possible.

- As a result of the above several points, more translatable message
  strings can be shared between different components and between
  frontends and backend, without gratuitous punctuation or whitespace
  differences.

- There is support for setting a "log level".  This is not meant to be
  user-facing, but can be used internally to implement debug or
  verbose modes.

- Lazy argument evaluation, so no significant overhead if logging at
  some level is disabled.

- Some color in the messages, similar to gcc and clang.  Set
  PG_COLOR=auto to try it out.  Some colors are predefined, but can be
  customized by setting PG_COLORS.

- Common files (common/, fe_utils/, etc.) can handle logging much more
  simply by just using one API without worrying too much about the
  context of the calling program, requiring callbacks, or having to
  pass "progname" around everywhere.

- Some programs called setvbuf() to make sure that stderr is
  unbuffered, even on Windows.  But not all programs did that.  This
  is now done centrally.

Soft goals:

- Reduces vertical space use and visual complexity of error reporting
  in the source code.

- Encourages more deliberate classification of messages.  For example,
  in some cases it wasn't clear without analyzing the surrounding code
  whether a message was meant as an error or just an info.

- Concepts and terms are vaguely aligned with popular logging
  frameworks such as log4j and Python logging.

This is all just about printing stuff out.  Nothing affects program
flow (e.g., fatal exits).  The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.

I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded.  One significant change is that
pg_rewind used to write all error messages to stdout.  That is now
changed to stderr.

Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 20:01:35 +02:00
Thomas Munro 2fc7af5e96 Add basic infrastructure for 64 bit transaction IDs.
Instead of inferring epoch progress from xids and checkpoints,
introduce a 64 bit FullTransactionId type and use it to track xid
generation.  This fixes an unlikely bug where the epoch is reported
incorrectly if the range of active xids wraps around more than once
between checkpoints.

The only user-visible effect of this commit is to correct the epoch
used by txid_current() and txid_status(), also visible with
pg_controldata, in those rare circumstances.  It also creates some
basic infrastructure so that later patches can use 64 bit
transaction IDs in more places.

The new type is a struct that we pass by value, as a form of strong
typedef.  This prevents the sort of accidental confusion between
TransactionId and FullTransactionId that would be possible if we
were to use a plain old uint64.

Author: Thomas Munro
Reported-by: Amit Kapila
Reviewed-by: Andres Freund, Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com
2019-03-28 18:12:20 +13:00
Peter Eisentraut 893d6f8a1f Avoid casting away a const 2019-03-16 10:13:03 +01:00
Tom Lane 1a83a80a2f Allow fractional input values for integer GUCs, and improve rounding logic.
Historically guc.c has just refused examples like set work_mem = '30.1GB',
but it seems more useful for it to take that and round off the value to
some reasonable approximation of what the user said.  Just rounding to
the parameter's native unit would work, but it would lead to rather
silly-looking settings, such as 31562138kB for this example.  Instead
let's round to the nearest multiple of the next smaller unit (if any),
producing 30822MB.

Also, do the units conversion math in floating point and round to integer
(if needed) only at the end.  This produces saner results for inputs that
aren't exact multiples of the parameter's native unit, and removes another
difference in the behavior for integer vs. float parameters.

In passing, document the ability to use hex or octal input where it
ought to be documented.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-11 19:13:55 -04:00
Tom Lane d9c5e9629b Give up on testing guc.c's behavior for "infinity" inputs.
Further buildfarm testing shows that on the machines that are failing
ac75959cd's test case, what we're actually getting from strtod("-infinity")
is a syntax error (endptr == value) not ERANGE at all.  This test case
is not worth carrying two sets of expected output for, so just remove it,
and revert commit b212245f9's misguided attempt to work around the platform
dependency.

Discussion: https://postgr.es/m/E1h33xk-0001Og-Gs@gemulon.postgresql.org
2019-03-11 17:53:09 -04:00
Tom Lane b212245f96 In guc.c, ignore ERANGE errors from strtod().
Instead, just proceed with the infinity or zero result that it should
return for overflow/underflow.  This avoids a platform dependency,
in that various versions of strtod are inconsistent about whether they
signal ERANGE for a value that's specified as infinity.

It's possible this won't be enough to remove the buildfarm failures
we're seeing from ac75959cd, in which case I'll take out the infinity
test case that commit added.  But first let's see if we can fix it.

Discussion: https://postgr.es/m/E1h33xk-0001Og-Gs@gemulon.postgresql.org
2019-03-11 11:25:26 -04:00
Tom Lane cbccac371c Reduce the default value of autovacuum_vacuum_cost_delay to 2ms.
This is a better way to implement the desired change of increasing
autovacuum's default resource consumption.

Discussion: https://postgr.es/m/28720.1552101086@sss.pgh.pa.us
2019-03-10 15:16:21 -04:00
Tom Lane 52985e4fea Revert "Increase the default vacuum_cost_limit from 200 to 2000"
This reverts commit bd09503e63.

Per discussion, it seems like what we should do instead is to
reduce the default value of autovacuum_vacuum_cost_delay by the
same factor.  That's functionally equivalent as long as the
platform can accurately service the smaller delay request, which
should be true on anything released in the last 10 years or more.
And smaller, more-closely-spaced delays are better in terms of
providing a steady I/O load.

Discussion: https://postgr.es/m/28720.1552101086@sss.pgh.pa.us
2019-03-10 15:05:25 -04:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Tom Lane 28a65fc360 Include GUC's unit, if it has one, in out-of-range error messages.
This should reduce confusion in cases where we've applied a units
conversion, so that the number being reported (and the quoted range
limits) are in some other units than what the user gave in the
setting we're rejecting.

Some of the changes here assume that float GUCs can have units,
which isn't true just yet, but will be shortly.

Discussion: https://postgr.es/m/3811.1552169665@sss.pgh.pa.us
2019-03-10 13:18:17 -04:00
Tom Lane ac75959cdc Disallow NaN as a value for floating-point GUCs.
None of the code that uses GUC values is really prepared for them to
hold NaN, but parse_real() didn't have any defense against accepting
such a value.  Treat it the same as a syntax error.

I haven't attempted to analyze the exact consequences of setting any
of the float GUCs to NaN, but since they're quite unlikely to be good,
this seems like a back-patchable bug fix.

Note: we don't need an explicit test for +-Infinity because those will
be rejected by existing range checks.  I added a regression test for
that in HEAD, but not older branches because the spelling of the value
in the error message will be platform-dependent in branches where we
don't always use port/snprintf.c.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 12:59:16 -04:00
Andres Freund 8586bf7ed8 tableam: introduce table AM infrastructure.
This introduces the concept of table access methods, i.e. CREATE
  ACCESS METHOD ... TYPE TABLE and
  CREATE TABLE ... USING (storage-engine).
No table access functionality is delegated to table AMs as of this
commit, that'll be done in following commits.

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

Docs will be updated at the end, as adding them incrementally would
likely make them less coherent, and definitely is a lot more work,
without a lot of benefit.

Table access methods are specified similar to index access methods,
i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with
callbacks. In contrast to index AMs that struct needs to live as long
as a backend, typically that's achieved by just returning a pointer to
a constant struct.

Psql's \d+ now displays a table's access method. That can be disabled
with HIDE_TABLEAM=true, which is mainly useful so regression tests can
be run against different AMs.  It's quite possible that this behaviour
still needs to be fine tuned.

For now it's not allowed to set a table AM for a partitioned table, as
we've not resolved how partitions would inherit that. Disallowing
allows us to introduce, if we decide that's the way forward, such a
behaviour without a compatibility break.

Catversion bumped, to add the heap table AM and references to it.

Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
    https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
    https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
2019-03-06 09:54:38 -08:00
Andrew Dunstan bd09503e63 Increase the default vacuum_cost_limit from 200 to 2000
The original 200 default value was set back in f425b605f4 when the cost
delay settings were first added.  Hardware has improved quite a bit since
then and we've also made improvements such as sorting buffers during
checkpoints (9cd00c457e) which should result in less random writes.

This low default value was reportedly causing problems with badly
configured servers and in the absence of a native method to remove
excessive bloat from tables without incurring an AccessExclusiveLock, this
often made cleaning up the damage caused by badly configured auto-vacuums
difficult.

It seems more likely that someone will notice that auto-vacuum is running
too quickly than too slowly, so let's go all out and multiple the default
value for the setting by 10.  With the default vacuum_cost_page_dirty and
autovacuum_vacuum_cost_delay (assuming a page size of 8192 bytes), this
allows autovacuum a theoretical maximum dirty write rate of around 39MB/s
instead of just 3.9MB/s.

Author: David Rowley

Discussion: https://postgr.es/m/CAKJS1f_YbXC2qTMPyCbmsPiKvZYwpuQNQMohiRXLj1r=8_rYvw@mail.gmail.com
2019-03-06 09:10:12 -05:00
Michael Paquier b3a156858a Improve documentation of data_sync_retry
Reflecting an updated parameter value requires a server restart, which
was not mentioned in the documentation and in postgresql.conf.sample.

Reported-by: Thomas Poty
Discussion: https://postgr.es/m/15659-0cd812f13027a2d8@postgresql.org
2019-02-28 11:02:11 +09:00
Andrew Gierth 02ddd49932 Change floating-point output format for improved performance.
Previously, floating-point output was done by rounding to a specific
decimal precision; by default, to 6 or 15 decimal digits (losing
information) or as requested using extra_float_digits. Drivers that
wanted exact float values, and applications like pg_dump that must
preserve values exactly, set extra_float_digits=3 (or sometimes 2 for
historical reasons, though this isn't enough for float4).

Unfortunately, decimal rounded output is slow enough to become a
noticable bottleneck when dealing with large result sets or COPY of
large tables when many floating-point values are involved.

Floating-point output can be done much faster when the output is not
rounded to a specific decimal length, but rather is chosen as the
shortest decimal representation that is closer to the original float
value than to any other value representable in the same precision. The
recently published Ryu algorithm by Ulf Adams is both relatively
simple and remarkably fast.

Accordingly, change float4out/float8out to output shortest decimal
representations if extra_float_digits is greater than 0, and make that
the new default. Applications that need rounded output can set
extra_float_digits back to 0 or below, and take the resulting
performance hit.

We make one concession to portability for systems with buggy
floating-point input: we do not output decimal values that fall
exactly halfway between adjacent representable binary values (which
would rely on the reader doing round-to-nearest-even correctly). This
is known to be a problem at least for VS2013 on Windows.

Our version of the Ryu code originates from
https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
following (significant) modifications:

 - Output format is changed to use fixed-point notation for small
   exponents, as printf would, and also to use lowercase 'e', a
   minimum of 2 exponent digits, and a mandatory sign on the exponent,
   to keep the formatting as close as possible to previous output.

 - The output of exact midpoint values is disabled as noted above.

 - The integer fast-path code is changed somewhat (since we have
   fixed-point output and the upstream did not).

 - Our project style has been largely applied to the code with the
   exception of C99 declaration-after-statement, which has been
   retained as an exception to our present policy.

 - Most of upstream's debugging and conditionals are removed, and we
   use our own configure tests to determine things like uint128
   availability.

Changing the float output format obviously affects a number of
regression tests. This patch uses an explicit setting of
extra_float_digits=0 for test output that is not expected to be
exactly reproducible (e.g. due to numerical instability or differing
algorithms for transcendental functions).

Conversions from floats to numeric are unchanged by this patch. These
may appear in index expressions and it is not yet clear whether any
change should be made, so that can be left for another day.

This patch assumes that the only supported floating point format is
now IEEE format, and the documentation is updated to reflect that.

Code by me, adapting the work of Ulf Adams and other contributors.

References:
https://dl.acm.org/citation.cfm?id=3192369

Reviewed-by: Tom Lane, Andres Freund, Donald Dong
Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13 15:20:33 +00:00
Peter Eisentraut 37d9916020 More unconstify use
Replace casts whose only purpose is to cast away const with the
unconstify() macro.

Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
2019-02-13 11:50:16 +01:00
Michael Paquier ea92368cd1 Move max_wal_senders out of max_connections for connection slot handling
Since its introduction, max_wal_senders is counted as part of
max_connections when it comes to define how many connection slots can be
used for replication connections with a WAL sender context.  This can
lead to confusion for some users, as it could be possible to block a
base backup or replication from happening because other backend sessions
are already taken for other purposes by an application, and
superuser-only connection slots are not a correct solution to handle
that case.

This commit makes max_wal_senders independent of max_connections for its
handling of PGPROC entries in ProcGlobal, meaning that connection slots
for WAL senders are handled using their own free queue, like autovacuum
workers and bgworkers.

One compatibility issue that this change creates is that a standby now
requires to have a value of max_wal_senders at least equal to its
primary.  So, if a standby created enforces the value of
max_wal_senders to be lower than that, then this could break failovers.
Normally this should not be an issue though, as any settings of a
standby are inherited from its primary as postgresql.conf gets normally
copied as part of a base backup, so parameters would be consistent.

Author: Alexander Kukushkin
Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii
Kliukin
Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com
2019-02-12 10:07:56 +09:00
Peter Eisentraut 13b89f96d0 Allow some recovery parameters to be changed with reload
Change

archive_cleanup_command
promote_trigger_file
recovery_end_command
recovery_min_apply_delay

from PGC_POSTMASTER to PGC_SIGHUP.  This did not require any further
changes.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/ca28011a-cfaa-565c-d622-c1907c33ecf7%402ndquadrant.com
2019-02-07 08:34:48 +01:00
Thomas Munro f1bebef60e Add shared_memory_type GUC.
Since 9.3 we have used anonymous shared mmap for our main shared memory
region, except in EXEC_BACKEND builds.  Provide a GUC so that users
can opt for System V shared memory once again, like in 9.2 and earlier.

A later patch proposes to add huge/large page support for AIX, which
requires System V shared memory and provided the motivation to revive
this possibility.  It may also be useful on some BSDs.

Author: Andres Freund (revived and documented by Thomas Munro)
Discussion: https://postgr.es/m/HE1PR0202MB28126DB4E0B6621CC6A1A91286D90%40HE1PR0202MB2812.eurprd02.prod.outlook.com
Discussion: https://postgr.es/m/2AE143D2-87D3-4AD1-AC78-CE2258230C05%40FreeBSD.org
2019-02-03 12:47:26 +01:00
Tom Lane f09346a9c6 Refactor planner's header files.
Create a new header optimizer/optimizer.h, which exposes just the
planner functions that can be used "at arm's length", without need
to access Paths or the other planner-internal data structures defined
in nodes/relation.h.  This is intended to provide the whole planner
API seen by most of the rest of the system; although FDWs still need
to use additional stuff, and more thought is also needed about just
what selfuncs.c should rely on.

The main point of doing this now is to limit the amount of new
#include baggage that will be needed by "planner support functions",
which I expect to introduce later, and which will be in relevant
datatype modules rather than anywhere near the planner.

This commit just moves relevant declarations into optimizer.h from
other header files (a couple of which go away because everything
got moved), and adjusts #include lists to match.  There's further
cleanup that could be done if we want to decide that some stuff
being exposed by optimizer.h doesn't belong in the planner at all,
but I'll leave that for another day.

Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
2019-01-29 15:48:51 -05:00
Andres Freund e0c4ec0728 Replace uses of heap_open et al with the corresponding table_* function.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Andres Freund 111944c5ee Replace heapam.h includes with {table, relation}.h where applicable.
A lot of files only included heapam.h for relation_open, heap_open etc
- replace the heapam.h include in those files with the narrower
header.

Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Andres Freund de66987adb Re-add default_with_oids GUC to avoid breaking old dump files.
After 578b229718 / the removal of WITH OIDS support, older dump files
containing
    SET default_with_oids = false;
either report unnecessary errors (as the subsequent tables have no
oids) or even fail to restore entirely (when using transaction mode).
To avoid that, re-add the GUC, but don't allow setting it to true.

Per complaint from Tom Lane.

Author: Amit Khandekar, editorialized by me
Discussion: https://postgr.es/m/CAJ3gD9dZyxrtL0rJfoNoOj6v7fJSDaXBngi9wy5XU8m-ioXhAA@mail.gmail.com
2019-01-14 15:30:24 -08:00
Peter Eisentraut 0acb3bc33a Change default of recovery_target_timeline to 'latest'
This is what one usually wants for recovery and almost always wants
for a standby.

Discussion: https://www.postgresql.org/message-id/flat/6dd2c23a-4162-8469-410f-bfe146e28c0c@2ndquadrant.com/
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-01-13 10:01:05 +01:00
Peter Eisentraut ff85306055 Add value 'current' for recovery_target_timeline
This value represents the default behavior of using the current
timeline.  Previously, this was represented by an empty string.

(Before the removal of recovery.conf, this setting could not be chosen
explicitly but was used when recovery_target_timeline was not
mentioned at all.)

Discussion: https://www.postgresql.org/message-id/flat/6dd2c23a-4162-8469-410f-bfe146e28c0c@2ndquadrant.com/
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-01-11 11:02:03 +01:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Michael Paquier 1707a0d2aa Remove configure switch --disable-strong-random
This removes a portion of infrastructure introduced by fe0a0b5 to allow
compilation of Postgres in environments where no strong random source is
available, meaning that there is no linking to OpenSSL and no
/dev/urandom (Windows having its own CryptoAPI).  No systems shipped
this century lack /dev/urandom, and the buildfarm is actually not
testing this switch at all, so just remove it.  This simplifies
particularly some backend code which included a fallback implementation
using shared memory, and removes a set of alternate regression output
files from pgcrypto.

Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20181230063219.GG608@paquier.xyz
2019-01-01 20:05:51 +09:00
Michael Paquier 730422afcd Fix some errhint and errdetail strings missing a period
As per the error message style guide of the documentation, those should
be full sentences.

Author: Daniel Gustafsson
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://1E8D49B4-16BC-4420-B4ED-58501D9E076B@yesql.se
2018-12-07 07:47:42 +09:00
Tom Lane d2b0b60e71 Improve our response to invalid format strings, and detect more cases.
Places that are testing for *printf failure ought to include the format
string in their error reports, since bad-format-string is one of the
more likely causes of such failure.  This both makes it easier to find
and repair the mistake, and provides at least some useful info to the
user who stumbles across such a problem.

Also, tighten snprintf.c to report EINVAL for an invalid flag or
final character in a format %-spec (including the case where the
%-spec is missing a final character altogether).  This seems like
better project policy, and it also allows removing an instruction
or two from the hot code path.

Back-patch the error reporting change in pvsnprintf, since it should be
harmless and may be helpful; but not the snprintf.c change.

Per discussion of bug #15511 from Ertuğrul Kahveci, which reported an
invalid translated format string.  These changes don't fix that error,
but they should improve matters next time we make such a mistake.

Discussion: https://postgr.es/m/15511-1d8b6a0bc874112f@postgresql.org
2018-12-06 15:08:44 -05:00
Alvaro Herrera 88bdbd3f74 Add log_statement_sample_rate parameter
This allows to set a lower log_min_duration_statement value without
incurring excessive log traffic (which reduces performance).  This can
be useful to analyze workloads with lots of short queries.

Author: Adrien Nayrat
Reviewed-by: David Rowley, Vik Fearing
Discussion: https://postgr.es/m/c30ee535-ee1e-db9f-fa97-146b9f62caed@anayrat.info
2018-11-29 18:42:53 -03:00
Peter Eisentraut f2cbffc7a6 Only allow one recovery target setting
The previous recovery.conf regime accepted multiple recovery_target*
settings and used the last one.  This does not translate well to the
general GUC system.  Specifically, under EXEC_BACKEND, the settings
are written out not in any particular order, so the order in which
they were originally set is not available to new processes.

Rather than redesign the GUC system, it was decided to abandon the old
behavior and only allow one recovery target setting.  A second setting
will cause an error.  However, it is allowed to set the same parameter
multiple times or unset a parameter and set a different one.

Discussion: https://www.postgresql.org/message-id/flat/27802171543235530%40iva2-6ec8f0a6115e.qloud-c.yandex.net#701a59c837ad0bf8c244344aaf3ef5a4
2018-11-28 13:55:54 +01:00
Peter Eisentraut 2dedf4d9a8 Integrate recovery.conf into postgresql.conf
recovery.conf settings are now set in postgresql.conf (or other GUC
sources).  Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.

Recovery is now initiated by a file recovery.signal.  Standby mode is
initiated by a file standby.signal.  The standby_mode setting is
gone.  If a recovery.conf file is found, an error is issued.

The trigger_file setting has been renamed to promote_trigger_file as
part of the move.

The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".

pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.

Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/
2018-11-25 16:33:40 +01:00
Andres Freund 578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Peter Eisentraut e73e67c719 Add settings to control SSL/TLS protocol version
For example:

    ssl_min_protocol_version = 'TLSv1.1'
    ssl_max_protocol_version = 'TLSv1.2'

Reviewed-by: Steve Singer <steve@ssinger.info>
Discussion: https://www.postgresql.org/message-id/flat/1822da87-b862-041a-9fc2-d0310c3da173@2ndquadrant.com
2018-11-20 22:12:10 +01:00
Peter Eisentraut a568cadaff Refine some guc.c help texts
These settings apply to communication with the sending server, which
is not necessarily a primary.

Author: Sergei Kornilov <sk@zsrv.org>
2018-11-20 06:38:34 +01:00
Thomas Munro 9ccdd7f66e PANIC on fsync() failure.
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure.  In that case the only remaining copy of the
data is in the WAL.  A subsequent fsync() could appear to succeed,
but not have flushed the data.  That means that a future checkpoint
could apparently complete successfully but have lost data.

Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure.  Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.

Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses.  If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail.  The GUC defaults
to off, meaning that we panic.

Back-patch to all supported releases.

There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error.  A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.

Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
2018-11-19 17:41:26 +13:00
Andres Freund 1a0586de36 Introduce notion of different types of slots (without implementing them).
Upcoming work intends to allow pluggable ways to introduce new ways of
storing table data. Accessing those table access methods from the
executor requires TupleTableSlots to be carry tuples in the native
format of such storage methods; otherwise there'll be a significant
conversion overhead.

Different access methods will require different data to store tuples
efficiently (just like virtual, minimal, heap already require fields
in TupleTableSlot). To allow that without requiring additional pointer
indirections, we want to have different structs (embedding
TupleTableSlot) for different types of slots.  Thus different types of
slots are needed, which requires adapting creators of slots.

The slot that most efficiently can represent a type of tuple in an
executor node will often depend on the type of slot a child node
uses. Therefore we need to track the type of slot is returned by
nodes, so parent slots can create slots based on that.

Relatedly, JIT compilation of tuple deforming needs to know which type
of slot a certain expression refers to, so it can create an
appropriate deforming function for the type of tuple in the slot.

But not all nodes will only return one type of slot, e.g. an append
node will potentially return different types of slots for each of its
subplans.

Therefore add function that allows to query the type of a node's
result slot, and whether it'll always be the same type (whether it's
fixed). This can be queried using ExecGetResultSlotOps().

The scan, result, inner, outer type of slots are automatically
inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(),
left/right subtrees respectively. If that's not correct for a node,
that can be overwritten using new fields in PlanState.

This commit does not introduce the actually abstracted implementation
of different kind of TupleTableSlots, that will be left for a followup
commit.  The different types of slots introduced will, for now, still
use the same backing implementation.

While this already partially invalidates the big comment in
tuptable.h, it seems to make more sense to update it later, when the
different TupleTableSlot implementations actually exist.

Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-15 22:00:30 -08:00
Tom Lane 3d360e20c9 Disallow setting client_min_messages higher than ERROR.
Previously it was possible to set client_min_messages to FATAL or PANIC,
which had the effect of suppressing transmission of regular ERROR messages
to the client.  Perhaps that seemed like a useful option in the past, but
the trouble with it is that it breaks guarantees that are explicitly made
in our FE/BE protocol spec about how a query cycle can end.  While libpq
and psql manage to cope with the omission, that's mostly because they
are not very bright; client libraries that have more semantic knowledge
are likely to get confused.  Notably, pgODBC doesn't behave very sanely.
Let's fix this by getting rid of the ability to set client_min_messages
above ERROR.

In HEAD, just remove the FATAL and PANIC options from the set of allowed
enum values for client_min_messages.  (This change also affects
trace_recovery_messages, but that's OK since these aren't useful values
for that variable either.)

In the back branches, there was concern that rejecting these values might
break applications that are explicitly setting things that way.  I'm
pretty skeptical of that argument, but accommodate it by accepting these
values and then internally setting the variable to ERROR anyway.

In all branches, this allows a couple of tiny simplifications in the
logic in elog.c, so do that.

Also respond to the point that was made that client_min_messages has
exactly nothing to do with the server's logging behavior, and therefore
does not belong in the "When To Log" subsection of the documentation.
The "Statement Behavior" subsection is a better match, so move it there.

Jonah Harris and Tom Lane

Discussion: https://postgr.es/m/7809.1541521180@sss.pgh.pa.us
Discussion: https://postgr.es/m/15479-ef0f4cc2fd995ca2@postgresql.org
2018-11-08 17:33:43 -05:00
Bruce Momjian b43df566b3 GUC: adjust effective_cache_size SQL descriptions
Follow on patch for commit 3e0f1a4741.

Reported-by: Peter Eisentraut

Discussion: https://postgr.es/m/369ec766-b947-51bd-4dad-6fb9e026439f@2ndquadrant.com

Backpatch-through: 9.4
2018-11-06 13:40:03 -05:00
Bruce Momjian 3e0f1a4741 GUC: adjust effective_cache_size docs and SQL description
Clarify that effective_cache_size is both kernel buffers and shared
buffers.

Reported-by: nat@makarevitch.org

Discussion: https://postgr.es/m/153685164808.22334.15432535018443165207@wrigleys.postgresql.org

Backpatch-through: 9.3
2018-11-02 09:11:00 -04:00
Peter Eisentraut f8c10f616f Turn transaction_isolation into GUC enum
It was previously a string setting that was converted into an enum by
custom code, but using the GUC enum facility seems much simpler and
doesn't change any functionality, except that

    set transaction_isolation='default';

no longer works, but that was never documented and doesn't work with
any other transaction characteristics.  (Note that this is not the
same as RESET or SET TO DEFAULT, which still work.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/457db615-e84c-4838-310e-43841eb806e5@iki.fi
2018-10-09 21:26:00 +02:00
Stephen Frost 8bddc86400 Add application_name to connection authorized msg
The connection authorized message has quite a bit of useful information
in it, but didn't include the application_name (when provided), so let's
add that as it can be very useful.

Note that at the point where we're emitting the connection authorized
message, we haven't processed GUCs, so it's not possible to get this by
using log_line_prefix (which pulls from the GUC).  There's also
something to be said for having this included in the connection
authorized message and then not needing to repeat it for every line, as
having it in log_line_prefix would do.

The GUC cleans the application name to pure-ascii, so do that here too,
but pull out the logic for cleaning up a string into its own function
in common and re-use it from those places, and check_cluster_name which
was doing the same thing.

Author: Don Seiler <don@seiler.us>
Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com
2018-09-28 19:04:50 -04:00
Tom Lane 2b04dfc472 Improve error reporting for unsupported effective_io_concurrency setting.
Give a specific error complaining about lack of posix_fadvise() when
someone tries to set effective_io_concurrency > 0 on platforms
without that.

This probably isn't worth extensive back-patching, but I (tgl) felt
cramming it into v11 was reasonable.

James Robinson

Discussion: https://postgr.es/m/153771876450.14994.560017943128223619@wrigleys.postgresql.org
Discussion: https://postgr.es/m/A3942987-5BC7-4F05-B54D-2A0EC2914B33@jlr-photo.com
2018-09-28 16:12:13 -04:00
Michael Paquier db361db2fc Make GUC wal_sender_timeout user-settable
Being able to use a value that can be changed on a connection basis is
useful with clusters distributed geographically, and makes failure
detection more flexible.  A note is added in the documentation about the
use of "options" in primary_conninfo, which can be hard to grasp for
newcomers with the need of two single quotes when listing a set of
parameters.

Author: Tsunakawa Takayuki
Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1FAAD3AE@G01JPEXMBYT05
2018-09-22 15:23:59 +09:00
Tom Lane 8f32bacc00 In v11, disable JIT by default (it's still enabled by default in HEAD).
Per discussion, JIT isn't quite mature enough to ship enabled-by-default.

I failed to resist the temptation to do a bunch of copy-editing on the
related documentation.  Also, clean up some inconsistencies in which
section of config.sgml the JIT GUCs are documented in vs. what guc.c
and postgresql.config.sample had.

Discussion: https://postgr.es/m/20180914222657.mw25esrzbcnu6qlu@alap3.anarazel.de
2018-09-15 17:24:35 -04:00
Thomas Munro af63926cf5 Wrap long line in postgresql.conf.sample.
Per complaint from Michael Paquier.
2018-08-22 21:34:50 +12:00
Thomas Munro f9fe269ca2 Provide plan_cache_mode options in postgresql.conf.sample.
Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f8YkwojSTSg8YjNYCLCXzx0fR7wBR3Gf%2BrA9_52eoPZKg%40mail.gmail.com
2018-08-22 18:19:39 +12:00
Michael Paquier 72be8c29a1 Fix set of NLS translation issues
While monitoring the code, a couple of issues related to string
translation has showed up:
- Some routines for auto-updatable views return an error string, which
sometimes missed the shot.  A comment regarding string translation is
added for each routine to help with future features.
- GSSAPI authentication missed two translations.
- vacuumdb handles non-translated strings.
- GetConfigOptionByNum should translate strings.  This part is not
back-patched as after a minor upgrade this could be surprising for
users.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch-through: 9.3
2018-08-21 15:17:13 +09:00
Michael Paquier d8c83800c3 Fix typo in description of enable_parallel_hash
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180821.115841.93250330.horiguchi.kyotaro@lab.ntt.co.jp
2018-08-21 12:13:16 +09:00
Tom Lane e1d19c902e Require a C99-compliant snprintf(), and remove related workarounds.
Since our substitute snprintf now returns a C99-compliant result,
there's no need anymore to have complicated code to cope with pre-C99
behavior.  We can just make configure substitute snprintf.c if it finds
that the system snprintf() is pre-C99.  (Note: I do not believe that
there are any platforms where this test will trigger that weren't
already being rejected due to our other C99-ish feature requirements for
snprintf.  But let's add the check for paranoia's sake.)  Then, simplify
the call sites that had logic to cope with the pre-C99 definition.

I also dropped some stuff that was being paranoid about the possibility
of snprintf overrunning the given buffer.  The only reports we've ever
heard of that being a problem were for Solaris 7, which is long dead,
and we've sure not heard any reports of these assertions triggering in
a long time.  So let's drop that complexity too.

Likewise, drop some code that wasn't trusting snprintf to set errno
when it returns -1.  That would be not-per-spec, and again there's
no real reason to believe it is a live issue, especially not for
snprintfs that pass all of configure's feature checks.

Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
2018-08-16 13:01:09 -04:00
Tom Lane cc4f6b7786 Clean up assorted misuses of snprintf()'s result value.
Fix a small number of places that were testing the result of snprintf()
but doing so incorrectly.  The right test for buffer overrun, per C99,
is "result >= bufsize" not "result > bufsize".  Some places were also
checking for failure with "result == -1", but the standard only says
that a negative value is delivered on failure.

(Note that this only makes these places correct if snprintf() delivers
C99-compliant results.  But at least now these places are consistent
with all the other places where we assume that.)

Also, make psql_start_test() and isolation_start_test() check for
buffer overrun while constructing their shell commands.  There seems
like a higher risk of overrun, with more severe consequences, here
than there is for the individual file paths that are made elsewhere
in the same functions, so this seemed like a worthwhile change.

Also fix guc.c's do_serialize() to initialize errno = 0 before
calling vsnprintf.  In principle, this should be unnecessary because
vsnprintf should have set errno if it returns a failure indication ...
but the other two places this coding pattern is cribbed from don't
assume that, so let's be consistent.

These errors are all very old, so back-patch as appropriate.  I think
that only the shell command overrun cases are even theoretically
reachable in practice, but there's not much point in erroneous error
checks.

Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
2018-08-15 16:29:31 -04:00
Peter Eisentraut 98efa76fe3 Add ssl_library preset parameter
This allows querying the SSL implementation used on the server side.
It's analogous to using PQsslAttribute(conn, "library") in libpq.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2018-07-30 13:46:27 +02:00
Tomas Vondra 6bf0bc842b Provide separate header file for built-in float types
Some data types under adt/ have separate header files, but most simple
ones do not, and their public functions are defined in builtins.h.  As
the patches improving geometric types will require making additional
functions public, this seems like a good opportunity to create a header
for floats types.

Commit 1acf757255 made _cmp functions public to solve NaN issues locally
for GiST indexes.  This patch reworks it in favour of a more widely
applicable API.  The API uses inline functions, as they are easier to
use compared to macros, and avoid double-evaluation hazards.

Author: Emre Hasegeli
Reviewed-by: Kyotaro Horiguchi

Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com
2018-07-29 03:30:48 +02:00
Thomas Munro 1bc180cd2a Use setproctitle_fast() to update the ps status, if available.
FreeBSD has introduced a faster variant of setproctitle().  Use it,
where available.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=1wKMTi81uodJ=1KbJAz5WedOg=cr8ewEXrUFeaxWEgww@mail.gmail.com
2018-07-24 13:09:22 +12:00
Peter Eisentraut f7cb2842bf Add plan_cache_mode setting
This allows overriding the choice of custom or generic plan.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com
2018-07-16 13:35:41 +02:00
Alexander Korotkov edf59c40dd Fix more wrong paths in header comments
It appears that there are more files, whose header comment paths are
wrong.  So, fix those paths.  No backpatching per proposal of Tom Lane.

Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com
2018-07-11 17:57:04 +03:00
Alvaro Herrera f2c587067a Rethink how to get float.h in old Windows API for isnan/isinf
We include <float.h> in every place that needs isnan(), because MSVC
used to require it.  However, since MSVC 2013 that's no longer necessary
(cf. commit cec8394b5c), so we can retire the inclusion to a
version-specific stanza in win32_port.h, where it doesn't need to
pollute random .c files.  The header is of course still needed in a few
places for other reasons.

I (Álvaro) removed float.h from a few more files than in Emre's original
patch.  This doesn't break the build in my system, but we'll see what
the buildfarm has to say about it all.

Author: Emre Hasegeli
Discussion: https://postgr.es/m/CAE2gYzyc0+5uG+Cd9-BSL7NKC8LSHLNg1Aq2=8ubjnUwut4_iw@mail.gmail.com
2018-07-11 09:11:48 -04:00
Peter Eisentraut bcbd940806 Remove dynamic_shared_memory_type=none
PostgreSQL nowadays offers some kind of dynamic shared memory feature on
all supported platforms.  Having the choice of "none" prevents us from
relying on DSM in core features.  So this patch removes the choice of
"none".

Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
2018-07-10 18:35:24 +02:00
Andrew Dunstan 8fb68aa265 Print DEBUG2 like that rather than as DEBUG
DEBUG is an alias for DEBUG2, but we want DEBUG2 to show in the settings
no matter how it was spelled.

Takeshi Ideriha

Discussion: https://postgr.es/m/4E72940DA2BF16479384A86D54D0988A5678EC03@G01JPEXMBKW04
2018-07-06 07:29:12 -04:00
Alexander Korotkov 4d54543efa Fix upper limit for vacuum_cleanup_index_scale_factor
6ca33a88 sets upper limit for vacuum_cleanup_index_scale_factor to
DBL_MAX.  DBL_MAX appears to be platform-dependent. That causes
many buildfarm animals to fail, because we check boundaries of
vacuum_cleanup_index_scale_factor in regression tests.

This commit changes upper limit from DBL_MAX to just "large enough"
limit, which was arbitrary selected as 1e10.

Author: Alexander Korotkov
Reported-by: Tom Lane, Darafei Praliaskouski
Discussion: https://postgr.es/m/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
Discussion: https://postgr.es/m/CAC8Q8tLYFOpKNaPS_E7V8KtPdE%3D_TnAn16t%3DA3LuL%3DXjfOO-BQ%40mail.gmail.com
2018-06-26 21:55:59 +03:00
Alexander Korotkov 6ca33a885b Increase upper limit for vacuum_cleanup_index_scale_factor
Upper limits for vacuum_cleanup_index_scale_factor GUC and reloption
were initially set to 100.0 in 857f9c36.  However, after further
discussion, it appears that some users like to disable B-tree cleanup
index scan completely (assuming there are no deleted pages).

vacuum_cleanup_index_scale_factor is used barely to protect against
stalled index statistics.  And after detailed consideration it appears
that risk of stalled index statistics is low.  And it would be nice to
allow advanced users setting higher values of
vacuum_cleanup_index_scale_factor.  So, set upper limit for these
GUC and reloption to DBL_MAX.

Author: Alexander Korotkov
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8tJCb%3DgxhzcV7T6ctx7PY-Ux1oA-AsTJc6cAVNsQiYcCzA%40mail.gmail.com
2018-06-26 15:00:51 +03:00
Alexander Korotkov 9a994e37e0 Fixes for vacuum_cleanup_index_scale_factor GUC option
vacuum_cleanup_index_scale_factor was located in autovacuum group of
GUCs.  However, it affects not only autovacuum, but also manually run
VACUUM.  It appears that "client connection defaults" group of GUCs
is more appropriate for vacuum_cleanup_index_scale_factor, because
vacuum_*_age options are already located there.

Also, vacuum_cleanup_index_scale_factor was missed in
postgresql.conf.sample.  So, add it there with appropriate comment.

Author: Masahiko Sawada with minor editorization by me
Discussion: https://postgr.es/m/CAD21AoArsoXMLKudXSKN679FRzs6oubEchM53bHwn8Tp%3D2boNg%40mail.gmail.com
2018-06-22 12:26:21 +03:00
Alvaro Herrera 0c8910a0ca Teach SHOW ALL to honor pg_read_all_settings membership
Also, fix the pg_settings view to display source filename and line
number when invoked by a pg_read_all_settings member.  This addition by
me (Álvaro).

Also, fix wording of the comment in GetConfigOption regarding the
restriction it implements, renaming the parameter for extra clarity.
Noted by Michaël.

These were all oversight in commit 25fff40798fc; backpatch to pg10,
where that commit first appeared.

Author: Laurenz Albe
Reviewed-by: Michaël Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/1519917758.6586.8.camel@cybertec.at
2018-06-08 16:19:05 -04:00
Peter Eisentraut 2241ad1556 Fix typo 2018-06-04 15:34:42 -04:00
Heikki Linnakangas b06d8e58b5 Accept "B" in all memory-unit GUCs, and improve error messages.
Commit 6e7baa3227 added support for "B" unit, for specifying config options
in bytes. However, it was only accepted in GUC_UNIT_BYTE settings,
wal_segment_size and track_activity_query_size, and not e.g. in work_mem.
This patch makes it consistent, so that "B" accepted in all the same
contexts where "kB", "MB", and so forth are accepted.

Add "B" to the list of accepted units in the error hint, along with "kB",
"MB", etc.

Add an entry in the conversion table for "TB" to "B" conversion. A terabyte
is out of range for any GUC_UNIT_BYTE option, so you always get an "out of
range" error with that, but without it, you get a confusing error message
that claims that "TB" is not an accepted unit, with a hint that nevertheless
lists "TB" as an accepted unit.

Reviewed-by: Alexander Korotkov, Andres Freund
Discussion: https://www.postgresql.org/message-id/1bfe7f4a-7e22-aa6e-7b37-f4d222ed2d67@iki.fi
2018-05-23 10:22:26 +03:00
Teodor Sigaev 8e12f4a250 Various improvements of skipping index scan during vacuum technics
- Change vacuum_cleanup_index_scale_factor GUC to PGC_USERSET.
  vacuum_cleanup_index_scale_factor GUC was defined as PGC_SIGHUP.  But this
  GUC affects not only autovacuum.  So it might be useful to change it from user
  session in order to influence manually runned VACUUM.
- Add missing tab-complete support for vacuum_cleanup_index_scale_factor
  reloption.
- Fix condition for B-tree index cleanup.
  Zero value of vacuum_cleanup_index_scale_factor means that user wants B-tree
  index cleanup to be never skipped.
- Documentation and comment improvements

Authors: Justin Pryzby, Alexander Korotkov, Liudmila Mantrova
Reviewed by: all authors and Robert Haas
Discussion: https://www.postgresql.org/message-id/flat/20180502023025.GD7631%40telsasoft.com
2018-05-10 13:31:47 +03:00
Tom Lane fbb2e9a030 Fix assorted compiler warnings seen in the buildfarm.
Failure to use DatumGetFoo/FooGetDatum macros correctly, or at all,
causes some warnings about sign conversion.  This is just cosmetic
at the moment but in principle it's a type violation, so clean up
the instances I could find.

autoprewarm.c and sharedfileset.c contained code that unportably
assumed that pid_t is the same size as int.  We've variously dealt
with this by casting pid_t to int or to unsigned long for printing
purposes; I went with the latter.

Fix uninitialized-variable warning in RestoreGUCState.  This is
a live bug in some sense, but of no great significance given that
nobody is very likely to care what "line number" is associated with
a GUC that hasn't got a source file recorded.
2018-05-02 15:52:54 -04:00
Tom Lane 41c912cad1 Clean up warnings from -Wimplicit-fallthrough.
Recent gcc can warn about switch-case fall throughs that are not
explicitly labeled as intentional.  This seems like a good thing,
so clean up the warnings exposed thereby by labeling all such
cases with comments that gcc will recognize.

In files that already had one or more suitable comments, I generally
matched the existing style of those.  Otherwise I went with
/* FALLTHROUGH */, which is one of the spellings approved at the
more-restrictive-than-default level -Wimplicit-fallthrough=4.
(At the default level you can also spell it /* FALL ?THRU */,
and it's not picky about case.  What you can't do is include
additional text in the same comment, so some existing comments
containing versions of this aren't good enough.)

Testing with gcc 8.0.1 (Fedora 28's current version), I found that
I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR);
apparently, for this purpose gcc doesn't recognize that those don't
return.  That seems like possibly a gcc bug, but it's fine because
in most places we did that anyway; so this amounts to a visit from the
style police.

Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us
2018-05-01 19:35:08 -04:00
Tom Lane bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Alvaro Herrera 055fb8d33d Add GUC enable_partition_pruning
This controls both plan-time and execution-time new-style partition
pruning.  While finer-grain control is possible (maybe using an enum GUC
instead of boolean), there doesn't seem to be much need for that.

This new parameter controls partition pruning for all queries:
trivially, SELECT queries that affect partitioned tables are naturally
under its control since they are using the new technology.  However,
while UPDATE/DELETE queries do not use the new code, we make the new GUC
control their behavior also (stealing control from
constraint_exclusion), because it is more natural, and it leads to a
more natural transition to the future in which those queries will also
use the new pruning code.

Constraint exclusion still controls pruning for regular inheritance
situations (those not involving partitioned tables).

Author: David Rowley
Review: Amit Langote, Ashutosh Bapat, Justin Pryzby, David G. Johnston
Discussion: https://postgr.es/m/CAKJS1f_0HwsxJG9m+nzU+CizxSdGtfe6iF_ykPYBiYft302DCw@mail.gmail.com
2018-04-23 17:57:43 -03:00
Alvaro Herrera da6f3e45dd Reorganize partitioning code
There's been a massive addition of partitioning code in PostgreSQL 11,
with little oversight on its placement, resulting in a
catalog/partition.c with poorly defined boundaries and responsibilities.
This commit tries to set a couple of distinct modules to separate things
a little bit.  There are no code changes here, only code movement.

There are three new files:
  src/backend/utils/cache/partcache.c
  src/include/partitioning/partdefs.h
  src/include/utils/partcache.h

The previous arrangement of #including catalog/partition.h almost
everywhere is no more.

Authors: Amit Langote and Álvaro Herrera
Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp
	https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp
	https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp
	https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql
2018-04-14 21:12:14 -03:00
Magnus Hagander a228cc13ae Revert "Allow on-line enabling and disabling of data checksums"
This reverts the backend sides of commit 1fde38beaa.
I have, at least for now, left the pg_verify_checksums tool in place, as
this tool can be very valuable without the rest of the patch as well,
and since it's a read-only tool that only runs when the cluster is down
it should be a lot safer.
2018-04-09 19:03:42 +02:00
Stephen Frost c37b3d08ca Allow group access on PGDATA
Allow the cluster to be optionally init'd with read access for the
group.

This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.

The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates.  This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.

Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein.  If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required.  Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.

Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.

Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Magnus Hagander 1fde38beaa Allow on-line enabling and disabling of data checksums
This makes it possible to turn checksums on in a live cluster, without
the previous need for dump/reload or logical replication (and to turn it
off).

Enabling checkusm starts a background process in the form of a
launcher/worker combination that goes through the entire database and
recalculates checksums on each and every page. Only when all pages have
been checksummed are they fully enabled in the cluster. Any failure of
the process will revert to checksums off and the process has to be
started.

This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.

Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
2018-04-05 22:04:48 +02:00
Teodor Sigaev 857f9c36cd Skip full index scan during cleanup of B-tree indexes when possible
Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete
calls and one amvacuumcleanup call. When workload on particular table
is append-only, then autovacuum isn't intended to touch this table. However,
user may run vacuum manually in order to fill visibility map and get benefits
of index-only scans. Then ambulkdelete wouldn't be called for indexes
of such table (because no heap tuples were deleted), only amvacuumcleanup would
be called In this case, amvacuumcleanup would perform full index scan for
two objectives: put recyclable pages into free space map and update index
statistics.

This patch allows btvacuumclanup to skip full index scan when two conditions
are satisfied: no pages are going to be put into free space map and index
statistics isn't stalled. In order to check first condition, we store
oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then
there are some recyclable pages. In order to check second condition we store
number of heap tuples observed during previous full index scan by cleanup.
If fraction of newly inserted tuples is less than
vacuum_cleanup_index_scale_factor, then statistics isn't considered to be
stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default).

This patch bumps B-tree meta-page version. Upgrade of meta-page is performed
"on the fly": during VACUUM meta-page is rewritten with new version. No special
handling in pg_upgrade is required.

Author: Masahiko Sawada, Alexander Korotkov
Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov
Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
2018-04-04 19:29:00 +03:00
Andres Freund 9370462e9a Add inlining support to LLVM JIT provider.
This provides infrastructure to allow JITed code to inline code
implemented in C. This e.g. can be postgres internal functions or
extension code.

This already speeds up long running queries, by allowing the LLVM
optimizer to optimize across function boundaries. The optimization
potential currently doesn't reach its full potential because LLVM
cannot optimize the FunctionCallInfoData argument fully away, because
it's allocated on the heap rather than the stack. Fixing that is
beyond what's realistic for v11.

To be able to do that, use CLANG to convert C code to LLVM bitcode,
and have LLVM build a summary for it. That bitcode can then be used to
to inline functions at runtime. For that the bitcode needs to be
installed. Postgres bitcode goes into $pkglibdir/bitcode/postgres,
extensions go into equivalent directories.  PGXS has been modified so
that happens automatically if postgres has been compiled with LLVM
support.

Currently this isn't the fastest inline implementation, modules are
reloaded from disk during inlining. That's to work around an apparent
LLVM bug, triggering an apparently spurious error in LLVM assertion
enabled builds.  Once that is resolved we can remove the superfluous
read from disk.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-28 13:19:08 -07:00
Andres Freund 32af96b2b1 JIT tuple deforming in LLVM JIT provider.
Performing JIT compilation for deforming gains performance benefits
over unJITed deforming from compile-time knowledge of the tuple
descriptor. Fixed column widths, NOT NULLness, etc can be taken
advantage of.

Right now the JITed deforming is only used when deforming tuples as
part of expression evaluation (and obviously only if the descriptor is
known). It's likely to be beneficial in other cases, too.

By default tuple deforming is JITed whenever an expression is JIT
compiled. There's a separate boolean GUC controlling it, but that's
expected to be primarily useful for development and benchmarking.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-26 12:57:19 -07:00
Andres Freund 2a0faed9d7 Add expression compilation support to LLVM JIT provider.
In addition to the interpretation of expressions (which back
evaluation of WHERE clauses, target list projection, aggregates
transition values etc) support compiling expressions to native code,
using the infrastructure added in earlier commits.

To avoid duplicating a lot of code, only support emitting code for
cases that are likely to be performance critical. For expression steps
that aren't deemed that, use the existing interpreter.

The generated code isn't great - some architectural changes are
required to address that. But this already yields a significant
speedup for some analytics queries, particularly with WHERE clauses
filtering a lot, or computing multiple aggregates.

Author: Andres Freund
Tested-By: Thomas Munro
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

Disable JITing for VALUES() nodes.

VALUES() nodes are only ever executed once. This is primarily helpful
for debugging, when forcing JITing even for cheap queries.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-22 14:45:59 -07:00
Andres Freund cc415a56d0 Basic planner and executor integration for JIT.
This adds simple cost based plan time decision about whether JIT
should be performed. jit_above_cost, jit_optimize_above_cost are
compared with the total cost of a plan, and if the cost is above them
JIT is performed / optimization is performed respectively.

For that PlannedStmt and EState have a jitFlags (es_jit_flags) field
that stores information about what JIT operations should be performed.

EState now also has a new es_jit field, which can store a
JitContext. When there are no errors the context is released in
standard_ExecutorEnd().

It is likely that the default values for jit_[optimize_]above_cost
will need to be adapted further, but in my test these values seem to
work reasonably.

Author: Andres Freund, with feedback by Peter Eisentraut
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-22 11:51:58 -07:00
Andres Freund 250bca7fc1 Debugging and profiling support for LLVM JIT provider.
This currently requires patches to the LLVM codebase to be
effective (submitted upstream), the GUCs are available without those
patches however.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-22 11:07:55 -07:00
Andres Freund b96d550eb0 Support for optimizing and emitting code in LLVM JIT provider.
This commit introduces the ability to actually generate code using
LLVM. In particular, this adds:

- Ability to emit code both in heavily optimized and largely
  unoptimized fashion
- Batching facility to allow functions to be defined in small
  increments, but optimized and emitted in executable form in larger
  batches (for performance and memory efficiency)
- Type and function declaration synchronization between runtime
  generated code and normal postgres code. This is critical to be able
  to access struct fields etc.
- Developer oriented jit_dump_bitcode GUC, for inspecting / debugging
  the generated code.
- per JitContext statistics of number of functions, time spent
  generating code, optimizing, and emitting it.  This will later be
  employed for EXPLAIN support.

This commit doesn't yet contain any code actually generating
functions. That'll follow in later commits.

Documentation for GUCs added, and for JIT in general, will be added in
later commits.

Author: Andres Freund, with contributions by Pierre Ducroquet
Testing-By: Thomas Munro, Peter Eisentraut
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-22 11:05:22 -07:00
Robert Haas e2f1eb0ee3 Implement partition-wise grouping/aggregation.
If the partition keys of input relation are part of the GROUP BY
clause, all the rows belonging to a given group come from a single
partition.  This allows aggregation/grouping over a partitioned
relation to be broken down * into aggregation/grouping on each
partition.  This should be no worse, and often better, than the normal
approach.

If the GROUP BY clause does not contain all the partition keys, we can
still perform partial aggregation for each partition and then finalize
aggregation after appending the partial results.  This is less certain
to be a win, but it's still useful.

Jeevan Chalke, Ashutosh Bapat, Robert Haas.  The larger patch series
of which this patch is a part was also reviewed and tested by Antonin
Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin
Knizhnik, Pascal Legrand, and Rafia Sabih.

Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
2018-03-22 12:49:48 -04:00
Andres Freund 432bb9e04d Basic JIT provider and error handling infrastructure.
This commit introduces:

1) JIT provider abstraction, which allows JIT functionality to be
   implemented in separate shared libraries. That's desirable because
   it allows to install JIT support as a separate package, and because
   it allows experimentation with different forms of JITing.
2) JITContexts which can be, using functions introduced in follow up
   commits, used to emit JITed functions, and have them be cleaned up
   on error.
3) The outline of a LLVM JIT provider, which will be fleshed out in
   subsequent commits.

Documentation for GUCs added, and for JIT in general, will be added in
later commits.

Author: Andres Freund, with architectural input from Jeff Davis
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-21 19:28:28 -07:00
Tom Lane 846b5a5257 Prevent extensions from creating custom GUCs that are GUC_LIST_QUOTE.
Pending some solution for the problems noted in commit 742869946,
disallow dynamic creation of GUC_LIST_QUOTE variables.

If there are any extensions out there using this feature, they'd not
be happy for us to start enforcing this rule in minor releases, so
this is a HEAD-only change.  The previous commit didn't make things
any worse than they already were for such cases.

Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz
2018-03-21 20:11:07 -04:00
Tom Lane 742869946f Fix mishandling of quoted-list GUC values in pg_dump and ruleutils.c.
Code that prints out the contents of setconfig or proconfig arrays in
SQL format needs to handle GUC_LIST_QUOTE variables differently from
other ones, because for those variables, flatten_set_variable_args()
already applied a layer of quoting.  The value can therefore safely
be printed as-is, and indeed must be, or flatten_set_variable_args()
will muck it up completely on reload.  For all other GUC variables,
it's necessary and sufficient to quote the value as a SQL literal.

We'd recognized the need for this long ago, but mis-analyzed the
need slightly, thinking that all GUC_LIST_INPUT variables needed
the special treatment.  That's actually wrong, since a valid value
of a LIST variable might include characters that need quoting,
although no existing variables accept such values.

More to the point, we hadn't made any particular effort to keep the
various places that deal with this up-to-date with the set of variables
that actually need special treatment, meaning that we'd do the wrong
thing with, for example, temp_tablespaces values.  This affects dumping
of SET clauses attached to functions, as well as ALTER DATABASE/ROLE SET
commands.

In ruleutils.c we can fix it reasonably honestly by exporting a guc.c
function that allows discovering the flags for a given GUC variable.
But pg_dump doesn't have easy access to that, so continue the old method
of having a hard-wired list of affected variable names.  At least we can
fix it to have just one list not two, and update the list to match
current reality.

A remaining problem with this is that it only works for built-in
GUC variables.  pg_dump's list obvious knows nothing of third-party
extensions, and even the "ask guc.c" method isn't bulletproof since
the relevant extension might not be loaded.  There's no obvious
solution to that, so for now, we'll just have to discourage extension
authors from inventing custom GUCs that need GUC_LIST_QUOTE.

This has been busted for a long time, so back-patch to all supported
branches.

Michael Paquier and Tom Lane, reviewed by Kyotaro Horiguchi and
Pavel Stehule

Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz
2018-03-21 20:03:28 -04:00
Peter Eisentraut 8a3d942529 Add ssl_passphrase_command setting
This allows specifying an external command for prompting for or
otherwise obtaining passphrases for SSL key files.  This is useful
because in many cases there is no TTY easily available during service
startup.

Also add a setting ssl_passphrase_command_supports_reload, which allows
supporting SSL configuration reload even if SSL files need passphrases.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2018-03-17 08:28:51 -04:00
Peter Eisentraut 04700b685f Rename TransactionChain functions
We call this thing a "transaction block" everywhere except in a few
functions, where it is mysteriously called a "transaction chain".  In
the SQL standard, a transaction chain is something different.  So rename
these functions to match the common terminology.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-03-16 13:18:06 -04:00
Peter Eisentraut 3a4b891964 Fix more format truncation issues
Fix the warnings created by the compiler warning options
-Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7.  This
is a more aggressive variant of the fixes in
6275f5d28a, which GCC 7 warned about by
default.

The issues are all harmless, but some dubious coding patterns are
cleaned up.

One issue that is of external interest is that BGW_MAXLEN is increased
from 64 to 96.  Apparently, the old value would cause the bgw_name of
logical replication workers to be truncated in some circumstances.

But this doesn't actually add those warning options.  It appears that
the warnings depend a bit on compilation and optimization options, so it
would be annoying to have to keep up with that.  This is more of a
once-in-a-while cleanup.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-15 11:41:42 -04:00
Peter Eisentraut 6cf86f4354 Change internal integer representation of Value node
A Value node would store an integer as a long.  This causes needless
portability risks, as long can be of varying sizes.  Change it to use
int instead.  All code using this was already careful to only store
32-bit values anyway.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-13 09:56:25 -04:00
Tom Lane 4e0c743c18 Fix cross-checking of ReservedBackends/max_wal_senders/MaxConnections.
We were independently checking ReservedBackends < MaxConnections and
max_wal_senders < MaxConnections, but because walsenders aren't allowed
to use superuser-reserved connections, that's really the wrong thing.
Correct behavior is to insist on ReservedBackends + max_wal_senders being
less than MaxConnections.  Fix the code and associated documentation.

This has been wrong for a long time, but since the situation probably
hardly ever arises in the field (especially pre-v10, when the default
for max_wal_senders was zero), no back-patch.

Discussion: https://postgr.es/m/28271.1520195491@sss.pgh.pa.us
2018-03-08 11:25:26 -05:00
Alvaro Herrera f4a2842ac3 Fix typo
Author: Kyotaro HORIGUCHI
Discussion: https://postgr.es/m/20180307.163428.209919771.horiguchi.kyotaro@lab.ntt.co.jp
2018-03-07 07:08:38 -03:00
Tom Lane 49bff412ed Remove some inappropriate #includes.
Other header files should never #include postgres.h (nor postgres_fe.h,
nor c.h), per project policy.  Also, there's no need for any backend .c
file to explicitly include elog.h or palloc.h, because postgres.h pulls
those in already.

Extracted from a larger patch by Kyotaro Horiguchi.  The rest of the
removals he suggests require more study, but these are no-brainers.

Discussion: https://postgr.es/m/20180215.200447.209320006.horiguchi.kyotaro@lab.ntt.co.jp
2018-02-16 12:14:08 -05:00
Peter Eisentraut 2fb1abaeb0 Rename enable_partition_wise_join to enable_partitionwise_join
Discussion: https://www.postgresql.org/message-id/flat/ad24e4f4-6481-066e-e3fb-6ef4a3121882%402ndquadrant.com
2018-02-16 10:33:59 -05:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Peter Eisentraut 7404e77cc1 Split out documentation of SSL parameters into their own section
Split the "Authentication and Security" section into two separate
sections "Authentication" and "SSL".  The latter part has gotten much
longer over time, and doesn't primarily have to do with authentication.

Also, the row_security parameter was inconsistently categorized, so
clean that up while we're here.
2018-01-23 07:11:38 -05:00
Magnus Hagander 1cc4f536ef Support huge pages on Windows
Add support for huge pages (called large pages on Windows) to the
Windows build.

This (probably) breaks compatibility with Windows versions prior to
Windows 2003 or Windows Vista.

Authors: Takayuki Tsunakawa and Thomas Munro
Reviewed by: Magnus Hagander, Amit Kapila
2018-01-21 15:40:46 +01:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Andres Freund 1804284042 Add parallel-aware hash joins.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash.  While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory.  If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables.  This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes.  Another is that
outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-21 00:43:41 -08:00
Robert Haas ab72716778 Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention.  We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 17:28:39 -05:00
Simon Riggs 2ede45c3a4 Fix pg_control_checkpoint from commit 4b0d28de06
Author: Simon Riggs <simon@2ndQuadrant.com>
Reported-By: Andreas Seltenreich <seltenreich@gmx.de>
2017-11-21 08:00:54 +11:00
Robert Haas 611fe7d479 Update postgresql.conf.sample comment for bgwriter_lru_maxpages
Commit 14ca9abfbe should have done
this, but did not.

Jeff Janes

Discussion: http://postgr.es/m/CAMkU=1yWOvL+YFYzGM9yXSoWjxr_5_Ny78pPzLKQCkfgB7H-JQ@mail.gmail.com
2017-11-17 14:52:00 -05:00
Robert Haas 79f2d63713 Update postgresql.conf.sample to match pg_settings classificaitons.
A handful of settings, most notably shared_preload_libraries, were
just plain the wrong place compared to their assigned config_group
value in guc.c (and thus pg_settings).  In other cases the names of
the sections in postgresql.conf.sample were mildly different from
the corresponding entries in config_group_names[].  Make it all
consistent.

Adrián Escoms, reviewed by me.

Discussion: http://postgr.es/m/CACksPC2veEmFRYqwYepWYO9U7aFhAx6sYq+WqjTyHw7uV=E=pw@mail.gmail.com
2017-11-16 12:57:17 -05:00
Robert Haas ebc189e122 Fix typo.
Jesper Pedersen

Discussion: http://postgr.es/m/000f92d6-f623-95a5-b341-46e2c0495cea@redhat.com
2017-11-15 08:37:41 -05:00
Robert Haas e5253fdc4f Add parallel_leader_participation GUC.
Sometimes, for testing, it's useful to have the leader do nothing but
read tuples from workers; and it's possible that could work out better
even in production.

Thomas Munro, reviewed by Amit Kapila and by me.  A few final tweaks
by me.

Discussion: http://postgr.es/m/CAEepm=2U++Lp3bNTv2Bv_kkr5NE2pOyHhxU=G0YTa4ZhSYhHiw@mail.gmail.com
2017-11-15 08:23:18 -05:00
Tom Lane ae20b23a9e Refactor permissions checks for large objects.
Up to now, ACL checks for large objects happened at the level of
the SQL-callable functions, which led to CVE-2017-7548 because of a
missing check.  Push them down to be enforced in inv_api.c as much
as possible, in hopes of preventing future bugs.  This does have the
effect of moving read and write permission errors to happen at lo_open
time not loread or lowrite time, but that seems acceptable.

Michael Paquier and Tom Lane

Discussion: https://postgr.es/m/CAB7nPqRHmNOYbETnc_2EjsuzSM00Z+BWKv9sy6tnvSd5gWT_JA@mail.gmail.com
2017-11-09 12:56:07 -05:00
Tom Lane 6c3a7ba5bb Fix typo in ALTER SYSTEM output.
The header comment written into postgresql.auto.conf by ALTER SYSTEM
should match what initdb put there originally.

Feike Steenbergen

Discussion: https://postgr.es/m/CAK_s-G0KcKdO=0hqZkwb3s+tqZuuHwWqmF5BDsmoO9FtX75r0g@mail.gmail.com
2017-11-09 11:57:20 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Simon Riggs 4b0d28de06 Remove secondary checkpoint
Previously server reserved WAL for last two checkpoints,
which used too much disk space for small servers.

Bumps PG_CONTROL_VERSION

Author: Simon Riggs <simon@2ndQuadrant.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-07 12:56:30 -05:00
Robert Haas 846fcc8516 Fix problems with the "role" GUC and parallel query.
Without this fix, dropping a role can sometimes result in parallel
query failures in sessions that have used "SET ROLE" to assume the
dropped role, even if that setting isn't active any more.

Report by Pavan Deolasee.  Patch by Amit Kapila, reviewed by me.

Discussion: http://postgr.es/m/CABOikdOomRcZsLsLK+Z+qENM1zxyaWnAvFh3MJZzZnnKiF+REg@mail.gmail.com
2017-10-29 12:58:40 +05:30
Peter Eisentraut 4211673622 Exclude flex-generated code from coverage testing
Flex generates a lot of functions that are not actually used.  In order
to avoid coverage figures being ruined by that, mark up the part of the
.l files where the generated code appears by lcov exclusion markers.
That way, lcov will typically only reported on coverage for the .l file,
which is under our control, but not for the .c file.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-10-16 16:28:11 -04:00
Robert Haas f49842d1ee Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually.  This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels.  This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation.  In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side.  All the same, for now,
turn this feature off by default.

Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical.  It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.

Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me.  A few final adjustments by me.

Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-06 11:11:10 -04:00
Robert Haas 8b304b8b72 Remove replacement selection sort.
At the time replacement_sort_tuples was introduced, there were still
cases where replacement selection sort noticeably outperformed using
quicksort even for the first run.  However, those cases seem to have
evaporated as a result of further improvements made since that time
(and perhaps also advances in CPU technology).  So remove replacement
selection and the controlling GUC entirely.  This makes tuplesort.c
noticeably simpler and probably paves the way for further
optimizations someone might want to do later.

Peter Geoghegan, with review and testing by Tomas Vondra and me.

Discussion: https://postgr.es/m/CAH2-WzmmNjG_K0R9nqYwMq3zjyJJK+hCbiZYNGhAy-Zyjs64GQ@mail.gmail.com
2017-09-29 10:25:44 -04:00
Peter Eisentraut 0c5803b450 Refactor new file permission handling
The file handling functions from fd.c were called with a diverse mix of
notations for the file permissions when they were opening new files.
Almost all files created by the server should have the same permissions
set.  So change the API so that e.g. OpenTransientFile() automatically
uses the standard permissions set, and OpenTransientFilePerm() is a new
function that takes an explicit permissions set for the few cases where
it is needed.  This also saves an unnecessary argument for call sites
that are just opening an existing file.

While we're reviewing these APIs, get rid of the FileName typedef and
use the standard const char * for the file name and mode_t for the file
mode.  This makes these functions match other file handling functions
and removes an unnecessary layer of mysteriousness.  We can also get rid
of a few casts that way.

Author: David Steele <david@pgmasters.net>
2017-09-23 10:16:18 -04:00
Andres Freund fc49e24fa6 Make WAL segment size configurable at initdb time.
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.

But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.

This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured.  For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.

Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
    Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-19 22:03:48 -07:00
Andres Freund 6e7baa3227 Introduce BYTES unit for GUCs.
This is already useful for track_activity_query_size, and will further
be used in a later commit making the WAL segment size configurable.

Author: Beena Emerson
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAOG9ApEu8bXVwBxkOO9J7ZpM76TASK_vFMEEiCEjwhMmSLiaqQ@mail.gmail.com
2017-09-12 12:13:12 -07:00
Peter Eisentraut 821fb8cdbf Message style fixes 2017-09-11 11:21:27 -04:00
Tom Lane 3cf17c9d47 Remove mention of password_encryption = plain in postgresql.conf.sample.
Evidently missed in commit eb61136dc.

Spotted by Oleg Bartunov.

Discussion: https://postgr.es/m/CAF4Au4wz_iK5r4fnTnnd8XqioAZQs-P7-VsEAfivW34zMVpAmw@mail.gmail.com
2017-09-08 14:38:54 -04:00
Peter Eisentraut 1356f78ea9 Reduce excessive dereferencing of function pointers
It is equivalent in ANSI C to write (*funcptr) () and funcptr().  These
two styles have been applied inconsistently.  After discussion, we'll
use the more verbose style for plain function pointer variables, to make
it clear that it's a variable, and the shorter style when the function
pointer is in a struct (s.func() or s->func()), because then it's clear
that it's not a plain function name, and otherwise the excessive
punctuation makes some of those invocations hard to read.

Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
2017-09-07 13:56:09 -04:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Heikki Linnakangas c0a15e07cd Always use 2048 bit DH parameters for OpenSSL ephemeral DH ciphers.
1024 bits is considered weak these days, but OpenSSL always passes 1024 as
the key length to the tmp_dh callback. All the code to handle other key
lengths is, in fact, dead.

To remedy those issues:

* Only include hard-coded 2048-bit parameters.
* Set the parameters directly with SSL_CTX_set_tmp_dh(), without the
  callback
* The name of the file containing the DH parameters is now a GUC. This
  replaces the old hardcoded "dh1024.pem" filename. (The files for other
  key lengths, dh512.pem, dh2048.pem, etc. were never actually used.)

This is not a new problem, but it doesn't seem worth the risk and churn to
backport. If you care enough about the strength of the DH parameters on
old versions, you can create custom DH parameters, with as many bits as you
wish, and put them in the "dh1024.pem" file.

Per report by Nicolas Guini and Damian Quiroga. Reviewed by Michael Paquier.

Discussion: https://www.postgresql.org/message-id/CAMxBoUyjOOautVozN6ofzym828aNrDjuCcOTcCquxjwS-L2hGQ@mail.gmail.com
2017-07-31 22:36:09 +03:00
Tatsuo Ishii 393d47ed0f Add missing comment in postgresql.conf.
current_source requires to restart server to reflect the new
value. Per Yugo Nagata and Masahiko Sawada.

Back patched to 9.2 and beyond.
2017-07-31 11:24:51 +09:00
Tatsuo Ishii 8b015dd723 Add missing comment in postgresql.conf.
dynamic_shared_memory_type requires to restart server to reflect
the new value. Per Yugo Nagata and Masahiko Sawada.

Back pached to 9.4 and beyond.
2017-07-31 11:06:37 +09:00
Tatsuo Ishii 9fe63092b5 Add missing comment in postgresql.conf.
max_logical_replication_workers requires to restart server to reflect
the new value. Per Yugo Nagata. Minor editing by me.
2017-07-31 10:46:32 +09:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00