Commit Graph

833 Commits

Author SHA1 Message Date
Michael Paquier 797b0fc0b0 doc: Move remove_temp_files_after_crash to section for developer options
The main goal of this option is to allow inspecting temporary files for
debugging purposes, so moving the parameter there is natural.

Oversight in cd91de0.

Reported-by: Justin Pryzby
Author: Euler Taveira
Discussion: https://postgr.es/m/20210612004347.GP16435@telsasoft.com
2021-06-25 08:40:16 +09:00
Michael Paquier f80979f659 doc: Apply markup <productname> to OpenSSL more consistently
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/CE12DD5C-4BB3-4166-BC9A-39779568734C@yesql.se
2021-06-18 14:22:31 +09:00
David Rowley 04539e73fa Use the correct article for abbreviations
We've accumulated quite a mix of instances of "an SQL" and "a SQL" in the
documents.  It would be good to be a bit more consistent with these.

The most recent version of the SQL standard I looked at seems to prefer
"an SQL".  That seems like a good lead to follow, so here we change all
instances of "a SQL" to become "an SQL".  Most instances correctly use
"an SQL" already, so it also makes sense to use the dominant variation in
order to minimise churn.

Additionally, there were some other abbreviations that needed to be
adjusted. FSM, SSPI, SRF and a few others.  Also fix some pronounceable,
abbreviations to use "a" instead of "an".  For example, "a SASL" instead
of "an SASL".

Here I've only adjusted the documents and error messages.  Many others
still exist in source code comments.  Translator hint comments seem to be
the biggest culprit.  It currently does not seem worth the churn to change
these.

Discussion: https://postgr.es/m/CAApHDvpML27UqFXnrYO1MJddsKVMQoiZisPvsAGhKE_tsKXquw%40mail.gmail.com
2021-06-11 13:38:04 +12:00
Tom Lane ba356a397d Doc: improve libpq service-file docs, avoid overspecifying pathnames.
Clarify libpq.sgml's description of service file locations and
semantics.  Avoid use of backtick'ed pg_config calls to describe
paths; that doesn't work on Windows, and even on Unix it's an
idiom that not all readers may be instantly familiar with.

Don't overspecify the locations of include files, instead writing
only as much as you'd use in #include directives.  The previous text
in these places was incorrect for some installations, depending on
where "postgresql" is in the install path.

Our convention for referencing the user's home directory seems
to be "~", so change the one place that spelled it "$HOME".

install-windows.sgml follows the platform convention of spelling
file paths with "\", so change the one place that used "/".

Haiying Tang and Tom Lane

Discussion: https://postgr.es/m/162149020918.26174.7150424047314144297@wrigleys.postgresql.org
2021-05-29 14:27:37 -04:00
Tom Lane e6241d8e03 Rethink definition of pg_attribute.attcompression.
Redefine '\0' (InvalidCompressionMethod) as meaning "if we need to
compress, use the current setting of default_toast_compression".
This allows '\0' to be a suitable default choice regardless of
datatype, greatly simplifying code paths that initialize tupledescs
and the like.  It seems like a more user-friendly approach as well,
because now the default compression choice doesn't migrate into table
definitions, meaning that changing default_toast_compression is
usually sufficient to flip an installation's behavior; one needn't
tediously issue per-column ALTER SET COMPRESSION commands.

Along the way, fix a few minor bugs and documentation issues
with the per-column-compression feature.  Adopt more robust
APIs for SetIndexStorageProperties and GetAttributeCompression.

Bump catversion because typical contents of attcompression will now
be different.  We could get away without doing that, but it seems
better to ensure v14 installations all agree on this.  (We already
forced initdb for beta2, anyway.)

Discussion: https://postgr.es/m/626613.1621787110@sss.pgh.pa.us
2021-05-27 13:24:27 -04:00
Michael Paquier 2941138e60 doc: Fix description of some GUCs in docs and postgresql.conf.sample
The following parameters have been imprecise, or incorrect, about their
description (PGC_POSTMASTER or PGC_SIGHUP):
- autovacuum_work_mem (docs, as of 9.6~)
- huge_page_size (docs, as of 14~)
- max_logical_replication_workers (docs, as of 10~)
- max_sync_workers_per_subscription (docs, as of 10~)
- min_dynamic_shared_memory (docs, as of 14~)
- recovery_init_sync_method (postgresql.conf.sample, as of 14~)
- remove_temp_files_after_crash (docs, as of 14~)
- restart_after_crash (docs, as of 9.6~)
- ssl_min_protocol_version (docs, as of 12~)
- ssl_max_protocol_version (docs, as of 12~)

This commit adjusts the description of all these parameters to be more
consistent with the practice used for the others.

Revewed-by: Justin Pryzby
Discussion: https://postgr.es/m/YK2ltuLpe+FbRXzA@paquier.xyz
Backpatch-through: 9.6
2021-05-27 14:57:28 +09:00
Alvaro Herrera cafde58b33
Allow compute_query_id to be set to 'auto' and make it default
Allowing only on/off meant that all either all existing configuration
guides would become obsolete if we disabled it by default, or that we
would have to accept a performance loss in the default config if we
enabled it by default.  By allowing 'auto' as a middle ground, the
performance cost is only paid by those who enable pg_stat_statements and
similar modules.

I only edited the release notes to comment-out a paragraph that is now
factually wrong; further edits are probably needed to describe the
related change in more detail.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20210513002623.eugftm4nk2lvvks3@nol
2021-05-15 14:13:09 -04:00
Thomas Munro c2dc19342e Revert recovery prefetching feature.
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.

Commits reverted:

dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery."
1d257577: Optionally prefetch referenced data in recovery.
f003d9f8: Add circular WAL decoding buffer.
323cbe7c: Remove read_page callback from XLogReader.

Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the
corresponding section of config.sgml is now reverted.

Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
2021-05-10 16:06:09 +12:00
Michael Paquier 02a93e7ef9 doc: Fix some gaps with the documentation related to LZ4
The upstream project is officially named "LZ4", and the documentation
was confused with the option value that can be used with DDLs supporting
this option, and the project name.

Documentation related to the configure option --with-lz4 was missing, so
add something for that.

Author: Dilip Kumar, Michael Paquier
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/YJaOZQDXBVySq+Cc@paquier.xyz
2021-05-10 09:32:56 +09:00
Tom Lane f9b809e7fb Doc: copy-editing for debug_invalidate_system_caches_always description.
I came to fix "useful only useful", but the more I looked at the text
the more things I thought could be improved.
2021-05-08 11:33:13 -04:00
Alvaro Herrera 94b9cb7225
Improve documentation for default_tablespace on partitioned tables
Backpatch to 12, where 87259588d0 introduced the current behavior.

Per note from Justin Pryzby.

Co-authored-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20210416143135.GI3315@telsasoft.com
2021-04-29 11:31:24 -04:00
Michael Paquier 1599e7b375 doc: Move parallel_leader_participation to its correct category
parallel_leader_participation got introduced in e5253fd, where it was
listed under RESOURCES_ASYNCHRONOUS in guc.c, but the documentation
did not reflect that and listed it with the other planner-related
options.  This commit fixes this inconsistency as the parameter is
intended to be an asynchronous one.

While on it, reorganize a bit the section dedicated to asynchronous
parameters, backend_flush_after being moved first to do better in terms
of alphabetical order of the options listed.

Reported-by: Yanliang Lei
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/16972-42d4b0c15aa1d5f5@postgresql.org
2021-04-22 09:47:43 +09:00
Bruce Momjian db01f797dd Fix interaction of log_line_prefix's query_id and log_statement
log_statement is issued before query_id can be computed, so properly
clear the value, and document the interaction.

Reported-by: Fujii Masao, Michael Paquier

Discussion: https://postgr.es/m/YHPkU8hFi4no4NSw@paquier.xyz

Author: Julien Rouhaud
2021-04-20 12:57:59 -04:00
Michael Paquier ac725ee0f9 doc: Move force_parallel_mode to section for developer options
This GUC has always been classified as a planner option since its
introduction in 7c944bd, and was listed in postgresql.conf.sample.  As
this parameter exists for testing purposes, move it to the section
dedicated to developer parameters and hence remove it from
postgresql.conf.sample.  This will avoid any temptation to play with it
on production servers for users that should never really have to touch
this parameter.

The general description used for developer options is reworded a bit, to
take into account the inclusion of force_parallel_mode, per a suggestion
from Tom Lane.

Per discussion between Tom Lane, Bruce Momjian, Justin Pryzby, Bharath
Rupireddy and me.

Author: Justin Pryzby, Tom Lane
Discussion: https://postgr.es/m/20210403152402.GA8049@momjian.us
2021-04-14 15:55:55 +09:00
Michael Paquier b094063cd1 Move log_autovacuum_min_duration into its correct sections
This GUC has already been classified as LOGGING_WHAT, but its location
in postgresql.conf.sample and the documentation did not reflect that, so
fix those inconsistencies.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210404012546.GK6592@telsasoft.com
2021-04-12 13:53:17 +09:00
Tom Lane 07b76833b1 Doc: update documentation of check_function_bodies.
Adjust docs and description string to note that check_function_bodies
applies to procedures too.  (In hindsight it should have been named
check_routine_bodies, but it seems too late for that now.)

Daniel Westermann

Discussion: https://postgr.es/m/GV0P278MB04834A9EB9A74B036DC7CE49D2739@GV0P278MB0483.CHEP278.PROD.OUTLOOK.COM
2021-04-10 12:08:28 -04:00
Thomas Munro dc88460c24 Doc: Review for "Optionally prefetch referenced data in recovery."
Typos, corrections and language improvements in the docs, and a few in
code comments too.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20210409033703.GP6592%40telsasoft.com
2021-04-10 08:21:53 +12:00
Michael Paquier 609b0652af Fix typos and grammar in documentation and code comments
Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com
Backpatch-through: 9.6
2021-04-09 13:53:07 +09:00
Thomas Munro 1d257577e0 Optionally prefetch referenced data in recovery.
Introduce a new GUC recovery_prefetch, disabled by default.  When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.

The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.

The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2021-04-08 23:20:42 +12:00
Bruce Momjian f57a2f5e03 Add csvlog output for the new query_id value
This also adjusts the printf format for query id used by log_line_prefix
(%Q).

Reported-by: Justin Pryzby

Discussion: https://postgr.es/m/20210408005402.GG24239@momjian.us

Author: Julien Rouhaud, Bruce Momjian
2021-04-07 22:30:30 -04:00
Peter Geoghegan 1e55e7d175 Add wraparound failsafe to VACUUM.
Add a failsafe mechanism that is triggered by VACUUM when it notices
that the table's relfrozenxid and/or relminmxid are dangerously far in
the past.  VACUUM checks the age of the table dynamically, at regular
intervals.

When the failsafe triggers, VACUUM takes extraordinary measures to
finish as quickly as possible so that relfrozenxid and/or relminmxid can
be advanced.  VACUUM will stop applying any cost-based delay that may be
in effect.  VACUUM will also bypass any further index vacuuming and heap
vacuuming -- it only completes whatever remaining pruning and freezing
is required.  Bypassing index/heap vacuuming is enabled by commit
8523492d, which made it possible to dynamically trigger the mechanism
already used within VACUUM when it is run with INDEX_CLEANUP off.

It is expected that the failsafe will almost always trigger within an
autovacuum to prevent wraparound, long after the autovacuum began.
However, the failsafe mechanism can trigger in any VACUUM operation.
Even in a non-aggressive VACUUM, where we're likely to not advance
relfrozenxid, it still seems like a good idea to finish off remaining
pruning and freezing.   An aggressive/anti-wraparound VACUUM will be
launched immediately afterwards.  Note that the anti-wraparound VACUUM
that follows will itself trigger the failsafe, usually before it even
begins its first (and only) pass over the heap.

The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and
vacuum_multixact_failsafe_age.  There are no equivalent reloptions,
since that isn't expected to be useful.  The GUCs have rather high
defaults (both default to 1.6 billion), and are expected to generally
only be used to make the failsafe trigger sooner/more frequently.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 12:37:45 -07:00
Bruce Momjian 4f0b0966c8 Make use of in-core query id added by commit 5fd9dfa5f5
Use the in-core query id computation for pg_stat_activity,
log_line_prefix, and EXPLAIN VERBOSE.

Similar to other fields in pg_stat_activity, only the queryid from the
top level statements are exposed, and if the backends status isn't
active then the queryid from the last executed statements is displayed.

Add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.

For EXPLAIN VERBOSE, if a query identifier has been computed, either by
enabling compute_query_id or using a third-party module, display it.

Bump catalog version.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 14:04:06 -04:00
Bruce Momjian 5fd9dfa5f5 Move pg_stat_statements query jumbling to core.
Add compute_query_id GUC to control whether a query identifier should be
computed by the core (off by default).  It's thefore now possible to
disable core queryid computation and use pg_stat_statements with a
different algorithm to compute the query identifier by using a
third-party module.

To ensure that a single source of query identifier can be used and is
well defined, modules that calculate a query identifier should throw an
error if compute_query_id specified to compute a query id and if a query
idenfitier was already calculated.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 13:06:56 -04:00
Michael Paquier 9afffcb833 Add some information about authenticated identity via log_connections
The "authenticated identity" is the string used by an authentication
method to identify a particular user.  In many common cases, this is the
same as the PostgreSQL username, but for some third-party authentication
methods, the identifier in use may be shortened or otherwise translated
(e.g. through pg_ident user mappings) before the server stores it.

To help administrators see who has actually interacted with the system,
this commit adds the capability to store the original identity when
authentication succeeds within the backend's Port, and generates a log
entry when log_connections is enabled.  The log entries generated look
something like this (where a local user named "foouser" is connecting to
the database as the database user called "admin"):

  LOG:  connection received: host=[local]
  LOG:  connection authenticated: identity="foouser" method=peer (/data/pg_hba.conf:88)
  LOG:  connection authorized: user=admin database=postgres application_name=psql

Port->authn_id is set according to the authentication method:

  bsd: the PostgreSQL username (aka the local username)
  cert: the client's Subject DN
  gss: the user principal
  ident: the remote username
  ldap: the final bind DN
  pam: the PostgreSQL username (aka PAM username)
  password (and all pw-challenge methods): the PostgreSQL username
  peer: the peer's pw_name
  radius: the PostgreSQL username (aka the RADIUS username)
  sspi: either the down-level (SAM-compatible) logon name, if
        compat_realm=1, or the User Principal Name if compat_realm=0

The trust auth method does not set an authenticated identity.  Neither
does clientcert=verify-full.

Port->authn_id could be used for other purposes, like a superuser-only
extra column in pg_stat_activity, but this is left as future work.

PostgresNode::connect_{ok,fails}() have been modified to let tests check
the backend log files for required or prohibited patterns, using the
new log_like and log_unlike parameters.  This uses a method based on a
truncation of the existing server log file, like issues_sql_like().
Tests are added to the ldap, kerberos, authentication and SSL test
suites.

Author: Jacob Champion
Reviewed-by: Stephen Frost, Magnus Hagander, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com
2021-04-07 10:16:39 +09:00
Fujii Masao 9de9294b0c Stop archive recovery if WAL generated with wal_level=minimal is found.
Previously if hot standby was enabled, archive recovery exited with
an error when it found WAL generated with wal_level=minimal.
But if hot standby was disabled, it just reported a warning and
continued in that case. Which could lead to data loss or errors
during normal operation. A warning was emitted, but users could
easily miss that and not notice this serious situation until
they encountered the actual errors.

To improve this situation, this commit changes archive recovery
so that it exits with FATAL error when it finds WAL generated with
wal_level=minimal whatever the setting of hot standby. This enables
users to notice the serious situation soon.

The FATAL error is thrown if archive recovery starts from a base
backup taken before wal_level is changed to minimal. When archive
recovery exits with the error, if users have a base backup taken
after setting wal_level to higher than minimal, they can recover
the database by starting archive recovery from that newer backup.
But note that if such backup doesn't exist, there is no easy way to
complete archive recovery, which may make the database server
unstartable and users may lose whole database. The commit adds
the note about this risk into the document.

Even in the case of unstartable database server, previously by just
disabling hot standby users could avoid the error during archive
recovery, forcibly start up the server and salvage data from it.
But note that this commit makes this procedure unavailable at all.

Author: Takamichi Osumi
Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao
Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com
2021-04-06 22:56:51 +09:00
Thomas Munro c30f54ad73 Detect POLLHUP/POLLRDHUP while running queries.
Provide a new GUC check_client_connection_interval that can be used to
check whether the client connection has gone away, while running very
long queries.  It is disabled by default.

For now this uses a non-standard Linux extension (also adopted by at
least one other OS).  POLLRDHUP is not defined by POSIX, and other OSes
don't have a reliable way to know if a connection was closed without
actually trying to read or write.

In future we might consider trying to send a no-op/heartbeat message
instead, but that could require protocol changes.

Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2021-04-03 09:02:41 +13:00
David Rowley 9eacee2e62 Add Result Cache executor node (take 2)
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-02 14:10:56 +13:00
David Rowley 28b3e3905c Revert b6002a796
This removes "Add Result Cache executor node".  It seems that something
weird is going on with the tracking of cache hits and misses as
highlighted by many buildfarm animals.  It's not yet clear what the
problem is as other parts of the plan indicate that the cache did work
correctly, it's just the hits and misses that were being reported as 0.

This is especially a bad time to have the buildfarm so broken, so
reverting before too many more animals go red.

Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
2021-04-01 13:33:23 +13:00
David Rowley b6002a796d Add Result Cache executor node
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-01 12:32:22 +13:00
Stephen Frost 3b0c647bbf Add a docs section for obsoleted and renamed functions and settings
The new appendix groups information on renamed or removed settings,
commands, etc into an out-of-the-way part of the docs.

The original id elements are retained in each subsection to ensure that
the same filenames are produced for HTML docs. This prevents /current/
links on the web from breaking, and allows users of the web docs
to follow links from old version pages to info on the changes in the
new version. Prior to this change, a link to /current/ for renamed
sections like the recovery.conf docs would just 404. Similarly if
someone searched for recovery.conf they would find the pg11 docs,
but there would be no /12/ or /current/ link, so they couldn't easily
find out that it was removed in pg12 or how to adapt.

Index entries are also added so that there's a breadcrumb trail for
users to follow when they know the old name, but not what we changed it
to. So a user who is trying to find out how to set standby_mode in
PostgreSQL 12+, or where pg_resetxlog went, now has more chance of
finding that information.

Craig Ringer and Stephen Frost
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/CAGRY4nzPNOyYQ_1-pWYToUVqQ0ThqP5jdURnJMZPm539fdizOg%40mail.gmail.com
Backpatch-through: 10
2021-03-31 16:23:25 -04:00
Etsuro Fujita 27e1f14563 Add support for asynchronous execution.
This implements asynchronous execution, which runs multiple parts of a
non-parallel-aware Append concurrently rather than serially to improve
performance when possible.  Currently, the only node type that can be
run concurrently is a ForeignScan that is an immediate child of such an
Append.  In the case where such ForeignScans access data on different
remote servers, this would run those ForeignScans concurrently, and
overlap the remote operations to be performed simultaneously, so it'll
improve the performance especially when the operations involve
time-consuming ones such as remote join and remote aggregation.

We may extend this to other node types such as joins or aggregates over
ForeignScans in the future.

This also adds the support for postgres_fdw, which is enabled by the
table-level/server-level option "async_capable".  The default is false.

Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself.  This commit
is mostly based on the patch proposed by Robert Haas, but also uses
stuff from the patch proposed by Kyotaro Horiguchi and from the patch
proposed by Thomas Munro.  Reviewed by Kyotaro Horiguchi, Konstantin
Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and
others.

Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com
Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
2021-03-31 18:45:00 +09:00
Stephen Frost bbcc4eb2e0 Change checkpoint_completion_target default to 0.9
Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
Nathan Bossart
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
2021-03-24 13:07:51 -04:00
Robert Haas e5595de03e Tidy up more loose ends related to configurable TOAST compression.
Change the default_toast_compression GUC to be an enum rather than
a string. Earlier, uncommitted versions of the patch supported using
CREATE ACCESS METHOD to add new compression methods to a running
system, but that idea was dropped before commit. So, we can simplify
the GUC handling as well, which has the nice side effect of improving
the error messages.

While updating the documentation to reflect the new GUC type, also
move it back to the right place in the list. I moved this while
revising what became commit 24f0e395ac,
but apparently the intended ordering is "alphabetical" rather than
"whatever Robert thinks looks nice."

Rejigger things to avoid having access/toast_compression.h depend on
utils/guc.h, so that we don't end up with every file that includes
it also depending on something largely unrelated. Move a few
inline functions back into the C source file partly to help reduce
dependencies and partly just to avoid clutter. A few very minor
cosmetic fixes.

Original patch by Justin Pryzby, but very heavily edited by me,
and reverse reviewed by him and also reviewed by by Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoYp=GT_ztUCeZg2i4hkHAQv8o=-nVJ1-TKWTG1zQOmOpg@mail.gmail.com
2021-03-24 12:36:08 -04:00
Amit Kapila 26acb54a13 Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."."
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.

List of commits reverted, in reverse chronological order:

ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".

Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
2021-03-24 11:29:15 +05:30
Robert Haas 24f0e395ac docs: Fix omissions related to configurable TOAST compression.
Previously, the default_toast_compression GUC was not documented,
and neither was pg_dump's new --no-toast-compression option.

Justin Pryzby and Robert Haas

Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
2021-03-22 10:34:10 -04:00
Thomas Munro 61752afb26 Provide recovery_init_sync_method=syncfs.
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery.  This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.

Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory.  This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.

The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.

Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
2021-03-20 12:07:28 +13:00
Tomas Vondra cd91de0d17 Remove temporary files after backend crash
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
2021-03-18 17:38:28 +01:00
Amit Kapila c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
2021-03-18 07:25:27 +05:30
Stephen Frost 94d13d474d Improve logging of auto-vacuum and auto-analyze
When logging auto-vacuum and auto-analyze activity, include the I/O
timing if track_io_timing is enabled.  Also, for auto-analyze, add the
read rate and the dirty rate, similar to how that information has
historically been logged for auto-vacuum.

Stephen Frost and Jakub Wartak

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2021-03-16 14:46:48 -04:00
Peter Geoghegan 9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Fujii Masao ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds new GUC track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite writes and issue_xlog_fsync syncs
WAL data to disk are counted in pg_stat_wal. This information would be
useful to check how much WAL write and sync affect the performance.

Enabling track_wal_io_timing will make the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid such
additional overhead in the server with track_io_timing enabled,
this commit introduces track_wal_io_timing as a separate parameter from
track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit makes the server also track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. This counters can be
used to calculate the WAL write and sync time per request, for example.

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Amit Kapila 8af3c233e4 Clarify the usage of max_replication_slots on the subscriber side.
It was not clear in the docs that the max_replication_slots is also used
to track replication origins on the subscriber side.

Author: Paul Martinez
Reviewed-by: Amit Kapila
Backpatch-through: 10 where logical replication was introduced
Discussion: https://postgr.es/m/CACqFVBZgwCN_pHnW6dMNCrOS7tiHCw6Retf_=U2Vvj3aUSeATw@mail.gmail.com
2021-03-03 12:01:56 +05:30
Peter Geoghegan e5d8a99903 Use full 64-bit XIDs in deleted nbtree pages.
Otherwise we risk "leaking" deleted pages by making them non-recyclable
indefinitely.  Commit 6655a729 did the same thing for deleted pages in
GiST indexes.  That work was used as a starting point here.

Stop storing an XID indicating the oldest bpto.xact across all deleted
though unrecycled pages in nbtree metapages.  There is no longer any
reason to care about that condition/the oldest XID.  It only ever made
sense when wraparound was something _bt_vacuum_needs_cleanup() had to
consider.

The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
It is now btm_last_cleanup_num_delpages, which is used to remember how
many non-recycled deleted pages remain from the last VACUUM (in practice
its value is usually the precise number of pages that were _newly
deleted_ during the specific VACUUM operation that last set the field).

The general idea behind storing btm_last_cleanup_num_delpages is to use
it to give _some_ consideration to non-recycled deleted pages inside
_bt_vacuum_needs_cleanup() -- though never too much.  We only really
need to avoid leaving a truly excessive number of deleted pages in an
unrecycled state forever.  We only do this to cover certain narrow cases
where no other factor makes VACUUM do a full scan, and yet the index
continues to grow (and so actually misses out on recycling existing
deleted pages).

These metapage changes result in a clear user-visible benefit: We no
longer trigger full index scans during VACUUM operations solely due to
the presence of only 1 or 2 known deleted (though unrecycled) blocks
from a very large index.  All that matters now is keeping the costs and
benefits in balance over time.

Fix an issue that has been around since commit 857f9c36, which added the
"skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
logic).  The accuracy of btm_last_cleanup_num_heap_tuples accidentally
hinged upon _when_ the source value gets stored.  We now always store
btm_last_cleanup_num_heap_tuples in btvacuumcleanup().  This fixes the
issue because IndexVacuumInfo.num_heap_tuples (the source field) is
expected to accurately indicate the state of the table _after_ the
VACUUM completes inside btvacuumcleanup().

A backpatchable fix cannot easily be extracted from this commit.  A
targeted fix for the issue will follow in a later commit, though that
won't happen today.

I (pgeoghegan) have chosen to remove any mention of deleted pages in the
documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
the presence of deleted (though unrecycled) pages is no longer of much
concern to users.  The vacuum_cleanup_index_scale_factor description in
the docs now seems rather unclear in any case, and it should probably be
rewritten in the near future.  Perhaps some passing mention of page
deletion will be added back at the same time.

Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
2021-02-24 18:41:34 -08:00
Michael Paquier c82d59d64e doc: Improve description of wal_receiver_status_interval
This parameter description was previously confusing, telling that a
value of 0 disabled completely status updates.  This is not true as
there are cases where an update is sent while ignoring this parameter
value.  The documentation is improved to outline the difference of
treatment for scheduled status messages and when these are forced.

Reported-by: Dmitriy Kuzmin
Author: Michael Paquier
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/161346024420.3455.1345266601055047937@wrigleys.postgresql.org
2021-02-24 11:15:58 +09:00
Peter Eisentraut f5465fade9 Allow specifying CRL directory
Add another method to specify CRLs, hashed directory method, for both
server and client side.  This offers a means for server or libpq to
load only CRLs that are required to verify a certificate.  The CRL
directory is specifed by separate GUC variables or connection options
ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
sslcrl, so both methods can be used at the same time.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
2021-02-18 07:59:10 +01:00
Thomas Munro f900a79ecd Default to wal_sync_method=fdatasync on FreeBSD.
FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
choose open_datasync as its default value.  That may not be a good
choice for all systems, and performs worse than fdatasync in some
scenarios.  Let's preserve the existing default behavior for now.

Like commit 576477e73c, which did the same for Linux, back-patch to all
supported releases.

Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com
2021-02-15 16:04:59 +13:00
Peter Eisentraut 3c78e0569c Refactor Windows error message for easier translation
In the error messages referring to the user right "Lock pages in
memory", this is a term from the Windows OS, so it should be
translated in accordance with the OS localization.  Refactor the error
messages so this is easier and clearer.  Also fix the capitalization
to match the existing capitalization in the OS.
2021-02-04 13:31:13 +01:00
Tom Lane f743a2bbd4 Doc: improve cross-references for SET/SHOW.
The corresponding functions set_config and current_setting were
mostly not hyperlinked.  Clarify their descriptions a tad, too.

Discussion: https://postgr.es/m/161183356250.4077.687338658090583892@wrigleys.postgresql.org
2021-01-29 10:46:14 -05:00
Peter Geoghegan e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Fujii Masao 39b03690b5 Log long wait time on recovery conflict when it's resolved.
This is a follow-up of the work done in commit 0650ff2303. This commit
extends log_recovery_conflict_waits so that a log message is produced
also when recovery conflict has already been resolved after deadlock_timeout
passes, i.e., when the startup process finishes waiting for recovery
conflict after deadlock_timeout. This is useful in investigating how long
recovery conflicts prevented the recovery from applying WAL.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-13 22:59:17 +09:00