Commit Graph

7659 Commits

Author SHA1 Message Date
Thomas Munro 1d257577e0 Optionally prefetch referenced data in recovery.
Introduce a new GUC recovery_prefetch, disabled by default.  When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.

The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.

The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2021-04-08 23:20:42 +12:00
Magnus Hagander aaf0432572 Add functions to wait for backend termination
This adds a function, pg_wait_for_backend_termination(), and a new
timeout argument to pg_terminate_backend(), which will wait for the
backend to actually terminate (with or without signaling it to do so
depending on which function is called). The default behaviour of
pg_terminate_backend() remains being timeout=0 which does not waiting.
For pg_wait_for_backend_termination() the default wait is 5 seconds.

Author: Bharath Rupireddy
Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama,
             Hou Zhijie, Magnus Hagander
Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com
2021-04-08 11:40:54 +02:00
Amit Kapila 8ffb003591 Fix typo in jsonfuncs.c.
Author: Tatsuro Yamada
Discussion: https://postgr.es/m/7c166a60-2808-6b89-9524-feefc6233748@nttcom.co.jp_1
2021-04-08 10:24:00 +05:30
Bruce Momjian f57a2f5e03 Add csvlog output for the new query_id value
This also adjusts the printf format for query id used by log_line_prefix
(%Q).

Reported-by: Justin Pryzby

Discussion: https://postgr.es/m/20210408005402.GG24239@momjian.us

Author: Julien Rouhaud, Bruce Momjian
2021-04-07 22:30:30 -04:00
Peter Eisentraut e717a9a18b SQL-standard function body
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.

Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

    CREATE FUNCTION add(a integer, b integer) RETURNS integer
        LANGUAGE SQL
        RETURN a + b;

or as a block

    CREATE PROCEDURE insert_data(a integer, b integer)
    LANGUAGE SQL
    BEGIN ATOMIC
      INSERT INTO tbl VALUES (a);
      INSERT INTO tbl VALUES (b);
    END;

The function body is parsed at function definition time and stored as
expression nodes in a new pg_proc column prosqlbody.  So at run time,
no further parsing is required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com
2021-04-07 21:47:55 +02:00
Peter Geoghegan 1e55e7d175 Add wraparound failsafe to VACUUM.
Add a failsafe mechanism that is triggered by VACUUM when it notices
that the table's relfrozenxid and/or relminmxid are dangerously far in
the past.  VACUUM checks the age of the table dynamically, at regular
intervals.

When the failsafe triggers, VACUUM takes extraordinary measures to
finish as quickly as possible so that relfrozenxid and/or relminmxid can
be advanced.  VACUUM will stop applying any cost-based delay that may be
in effect.  VACUUM will also bypass any further index vacuuming and heap
vacuuming -- it only completes whatever remaining pruning and freezing
is required.  Bypassing index/heap vacuuming is enabled by commit
8523492d, which made it possible to dynamically trigger the mechanism
already used within VACUUM when it is run with INDEX_CLEANUP off.

It is expected that the failsafe will almost always trigger within an
autovacuum to prevent wraparound, long after the autovacuum began.
However, the failsafe mechanism can trigger in any VACUUM operation.
Even in a non-aggressive VACUUM, where we're likely to not advance
relfrozenxid, it still seems like a good idea to finish off remaining
pruning and freezing.   An aggressive/anti-wraparound VACUUM will be
launched immediately afterwards.  Note that the anti-wraparound VACUUM
that follows will itself trigger the failsafe, usually before it even
begins its first (and only) pass over the heap.

The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and
vacuum_multixact_failsafe_age.  There are no equivalent reloptions,
since that isn't expected to be useful.  The GUCs have rather high
defaults (both default to 1.6 billion), and are expected to generally
only be used to make the failsafe trigger sooner/more frequently.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 12:37:45 -07:00
Bruce Momjian 4f0b0966c8 Make use of in-core query id added by commit 5fd9dfa5f5
Use the in-core query id computation for pg_stat_activity,
log_line_prefix, and EXPLAIN VERBOSE.

Similar to other fields in pg_stat_activity, only the queryid from the
top level statements are exposed, and if the backends status isn't
active then the queryid from the last executed statements is displayed.

Add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.

For EXPLAIN VERBOSE, if a query identifier has been computed, either by
enabling compute_query_id or using a third-party module, display it.

Bump catalog version.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 14:04:06 -04:00
Bruce Momjian 5fd9dfa5f5 Move pg_stat_statements query jumbling to core.
Add compute_query_id GUC to control whether a query identifier should be
computed by the core (off by default).  It's thefore now possible to
disable core queryid computation and use pg_stat_statements with a
different algorithm to compute the query identifier by using a
third-party module.

To ensure that a single source of query identifier can be used and is
well defined, modules that calculate a query identifier should throw an
error if compute_query_id specified to compute a query id and if a query
idenfitier was already calculated.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 13:06:56 -04:00
Tom Lane 3db826bd55 Tighten up allowed names for custom GUC parameters.
Formerly we were pretty lax about what a custom GUC's name could
be; so long as it had at least one dot in it, we'd take it.
However, corner cases such as dashes or equal signs in the name
would cause various bits of functionality to misbehave.  Rather
than trying to make the world perfectly safe for that, let's
just require that custom names look like "identifier.identifier",
where "identifier" means something that scan.l would accept
without double quotes.

Along the way, this patch refactors things slightly in guc.c
so that find_option() is responsible for reporting GUC-not-found
cases, allowing removal of duplicative code from its callers.

Per report from Hubert Depesz Lubaczewski.  No back-patch,
since the consequences of the problem don't seem to warrant
changing behavior in stable branches.

Discussion: https://postgr.es/m/951335.1612910077@sss.pgh.pa.us
2021-04-07 11:22:22 -04:00
Tom Lane 091e22b2e6 Clean up treatment of missing default and CHECK-constraint records.
Andrew Gierth reported that it's possible to crash the backend if no
pg_attrdef record is found to match an attribute that has atthasdef set.
AttrDefaultFetch warns about this situation, but then leaves behind
a relation tupdesc that has null "adbin" pointer(s), which most places
don't guard against.

We considered promoting the warning to an error, but throwing errors
during relcache load is pretty drastic: it effectively locks one out
of using the relation at all.  What seems better is to leave the
load-time behavior as a warning, but then throw an error in any code
path that wants to use a default and can't find it.  This confines
the error to a subset of INSERT/UPDATE operations on the table, and
in particular will at least allow a pg_dump to succeed.

Also, we should fix AttrDefaultFetch to not leave any null pointers
in the tupdesc, because that just creates an untested bug hazard.

While at it, apply the same philosophy of "warn at load, throw error
only upon use of the known-missing info" to CHECK constraints.
CheckConstraintFetch is very nearly the same logic as AttrDefaultFetch,
but for reasons lost in the mists of time, it was throwing ERROR for
the same cases that AttrDefaultFetch treats as WARNING.  Make the two
functions more nearly alike.

In passing, get rid of potentially-O(N^2) loops in equalTupleDesc
by making AttrDefaultFetch sort the entries after fetching them,
so that equalTupleDesc can assume that entries in two equal tupdescs
must be in matching order.  (CheckConstraintFetch already was sorting
CHECK constraints, but equalTupleDesc hadn't been told about it.)

There's some argument for back-patching this, but with such a small
number of field reports, I'm content to fix it in HEAD.

Discussion: https://postgr.es/m/87pmzaq4gx.fsf@news-spur.riddles.org.uk
2021-04-06 10:34:39 -04:00
Peter Eisentraut a2da77cdb4 Change return type of EXTRACT to numeric
The previous implementation of EXTRACT mapped internally to
date_part(), which returned type double precision (since it was
implemented long before the numeric type existed).  This can lead to
imprecise output in some cases, so returning numeric would be
preferrable.  Changing the return type of an existing function is a
bit risky, so instead we do the following:  We implement a new set of
functions, which are now called "extract", in parallel to the existing
date_part functions.  They work the same way internally but use
numeric instead of float8.  The EXTRACT construct is now mapped by the
parser to these new extract functions.  That way, dumps of views
etc. from old versions (which would use date_part) continue to work
unchanged, but new uses will map to the new extract functions.

Additionally, the reverse compilation of EXTRACT now reproduces the
original syntax, using the new mechanism introduced in
40c24bfef9.

The following minor changes of behavior result from the new
implementation:

- The column name from an isolated EXTRACT call is now "extract"
  instead of "date_part".

- Extract from date now rejects inappropriate field names such as
  HOUR.  It was previously mapped internally to extract from
  timestamp, so it would silently accept everything appropriate for
  timestamp.

- Return values when extracting fields with possibly fractional
  values, such as second and epoch, now have the full scale that the
  value has internally (so, for example, '1.000000' instead of just
  '1').

Reported-by: Petr Fedorov <petr.fedorov@phystech.edu>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2021-04-06 07:20:42 +02:00
Fujii Masao 43620e3286 Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.

Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.

When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.

There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.

Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 13:44:15 +09:00
Tom Lane dfc843d465 Fix more confusion in SP-GiST.
spg_box_quad_leaf_consistent unconditionally returned the leaf
datum as leafValue, even though in its usage for poly_ops that
value is of completely the wrong type.

In versions before 12, that was harmless because the core code did
nothing with leafValue in non-index-only scans ... but since commit
2a6368343, if we were doing a KNN-style scan, spgNewHeapItem would
unconditionally try to copy the value using the wrong datatype
parameters.  Said copying is a waste of time and space if we're not
going to return the data, but it accidentally failed to fail until
I fixed the datatype confusion in ac9099fc1.

Hence, change spgNewHeapItem to not copy the datum unless we're
actually going to return it later.  This saves cycles and dodges
the question of whether lossy opclasses are returning the right
type.  Also change spg_box_quad_leaf_consistent to not return
data that might be of the wrong type, as insurance against
somebody introducing a similar bug into the core code in future.

It seems like a good idea to back-patch these two changes into
v12 and v13, although I'm afraid to change spgNewHeapItem's
mistaken idea of which datatype to use in those branches.

Per buildfarm results from ac9099fc1.

Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us
2021-04-04 17:57:07 -04:00
Andres Freund 225a22b19e Improve efficiency of wait event reporting, remove proc.h dependency.
pgstat_report_wait_start() and pgstat_report_wait_end() required two
conditional branches so far. One to check if MyProc is NULL, the other to
check if pgstat_track_activities is set. As wait events are used around
comparatively lightweight operations, and are inlined (reducing branch
predictor effectiveness), that's not great.

The dependency on MyProc has a second disadvantage: Low-level subsystems, like
storage/file/fd.c, report wait events, but architecturally it is preferable
for them to not depend on inter-process subsystems like proc.h (defining
PGPROC).  After this change including pgstat.h (nor obviously its
sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC
related headers anymore.

These goals, efficiency and abstraction, are achieved by having
pgstat_report_wait_start/end() not interact with MyProc, but instead a new
my_wait_event_info variable. At backend startup it points to a local variable,
removing the need to check for MyProc being NULL. During process
initialization my_wait_event_info is redirected to MyProc->wait_event_info. At
shutdown this is reversed. Because wait event reporting now does not need to
know about where the wait event is stored, it does not need to know about
PGPROC anymore.

The removal of the branch for checking pgstat_track_activities is simpler:
Don't check anymore. The cost due to the branch are often higher than the
store - and even if not, pgstat_track_activities is rarely disabled.

The main motivator to commit this work now is that removing the (indirect)
pgproc.h include from pgstat.h simplifies a patch to move statistics reporting
to shared memory (which still has a chance to get into 14).

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
2021-04-03 12:03:45 -07:00
Andres Freund e1025044cd Split backend status and progress related functionality out of pgstat.c.
Backend status (supporting pg_stat_activity) and command
progress (supporting pg_stat_progress*) related code is largely
independent from the rest of pgstat.[ch] (supporting views like
pg_stat_all_tables that accumulate data over time). See also
a333476b92.

This commit doesn't rename the function names to make the distinction
from the rest of pgstat_ clearer - that'd be more invasive and not
clearly beneficial. If we were to decide to do such a rename at some
point, it's better done separately from moving the code as well.

Robert's review was of an earlier version.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-03 11:42:52 -07:00
Michael Paquier e6bdfd9700 Refactor HMAC implementations
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently).  If there
is no such libraries, a fallback implementation is available.  Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.

HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC.  This
refactoring is an advantage mainly for SCRAM, that included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.

This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1.  I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.

Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
2021-04-03 17:30:49 +09:00
Andres Freund 1d9c5d0ce2 Do not rely on pgstat.h to indirectly include storage/ headers.
An upcoming patch might remove the (now indirect) proc.h
include (which in turn includes other headers), and it's cleaner for
the modified files to include their dependencies directly anyway...

Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
2021-04-02 20:02:47 -07:00
Andres Freund a333476b92 Split wait event related code from pgstat.[ch] into wait_event.[ch].
The wait event related code is independent from the rest of the
pgstat.[ch] code, of nontrivial size and changes on a regular
basis. Put it into its own set of files.

As there doesn't seem to be a good pre-existing directory for code
like this, add src/backend/utils/activity.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-02 20:02:26 -07:00
Thomas Munro c30f54ad73 Detect POLLHUP/POLLRDHUP while running queries.
Provide a new GUC check_client_connection_interval that can be used to
check whether the client connection has gone away, while running very
long queries.  It is disabled by default.

For now this uses a non-standard Linux extension (also adopted by at
least one other OS).  POLLRDHUP is not defined by POSIX, and other OSes
don't have a reliable way to know if a connection was closed without
actually trying to read or write.

In future we might consider trying to send a no-op/heartbeat message
instead, but that could require protocol changes.

Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2021-04-03 09:02:41 +13:00
Tom Lane 53aafdb9ff Strip file names reported in error messages on Windows, too.
Commit dd136052b established a policy that error message FILE items
should include only the base name of the reporting source file, for
uniformity and succinctness.  We now observe that some Windows compilers
use backslashes in __FILE__ strings, so truncate at backslashes as well.

This is expected to fix some platform variation in the results of the
new libpq_pipeline test module.

Discussion: https://postgr.es/m/3650140.1617372290@sss.pgh.pa.us
2021-04-02 10:43:54 -04:00
David Rowley 9eacee2e62 Add Result Cache executor node (take 2)
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-02 14:10:56 +13:00
Stephen Frost c9c41c7a33 Rename Default Roles to Predefined Roles
The term 'default roles' wasn't quite apt as these roles aren't able to
be modified or removed after installation, so rename them to be
'Predefined Roles' instead, adding an entry into the newly added
Obsolete Appendix to help users of current releases find the new
documentation.

Bruce Momjian and Stephen Frost

Discussion: https://postgr.es/m/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org
and https://www.postgresql.org/message-id/20201120211304.GG16415@tamriel.snowman.net
2021-04-01 15:32:06 -04:00
Peter Eisentraut 91e7c90329 Fix internal extract(timezone_minute) formulas
Through various refactorings over time, the extract(timezone_minute
from time with time zone) and extract(timezone_minute from timestamp
with time zone) implementations ended up with two different but
equally nonsensical formulas by using SECS_PER_MINUTE and
MINS_PER_HOUR interchangeably.  Since those two are of course both the
same number, the formulas do work, but for readability, fix them to be
semantically correct.
2021-04-01 16:12:53 +02:00
Heikki Linnakangas ea1b99a661 Add 'noError' argument to encoding conversion functions.
With the 'noError' argument, you can try to convert a buffer without
knowing the character boundaries beforehand. The functions now need to
return the number of input bytes successfully converted.

This is is a backwards-incompatible change, if you have created a custom
encoding conversion with CREATE CONVERSION. This adds a check to
pg_upgrade for that, refusing the upgrade if there are any user-defined
encoding conversions. Custom conversions are very rare, there are no
commonly used extensions that I know of that uses that feature. No other
objects can depend on conversions, so if you do have one, you can fairly
easily drop it before upgrading, and recreate it after the upgrade with
an updated version.

Add regression tests for built-in encoding conversions. This doesn't cover
every conversion, but it covers all the internal functions in conv.c that
are used to implement the conversions.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi
2021-04-01 11:45:22 +03:00
David Rowley 28b3e3905c Revert b6002a796
This removes "Add Result Cache executor node".  It seems that something
weird is going on with the tracking of cache hits and misses as
highlighted by many buildfarm animals.  It's not yet clear what the
problem is as other parts of the plan indicate that the cache did work
correctly, it's just the hits and misses that were being reported as 0.

This is especially a bad time to have the buildfarm so broken, so
reverting before too many more animals go red.

Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
2021-04-01 13:33:23 +13:00
David Rowley b6002a796d Add Result Cache executor node
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-01 12:32:22 +13:00
Joe Conway b12bd4869b Fix has_column_privilege function corner case
According to the comments, when an invalid or dropped column oid is passed
to has_column_privilege(), the intention has always been to return NULL.
However, when the caller had table level privilege the invalid/missing
column was never discovered, because table permissions were checked first.

Fix that by introducing extended versions of pg_attribute_acl(check|mask)
and pg_class_acl(check|mask) which take a new argument, is_missing. When
is_missing is NULL, the old behavior is preserved. But when is_missing is
passed by the caller, no ERROR is thrown for dropped or missing
columns/relations, and is_missing is flipped to true. This in turn allows
has_column_privilege to check for column privileges first, providing the
desired semantics.

Not backpatched since it is a user visible behavioral change with no previous
complaints, and the fix is a bit on the invasive side.

Author: Joe Conway
Reviewed-By: Tom Lane
Reported by: Ian Barwick
Discussion: https://postgr.es/m/flat/9b5f4311-157b-4164-7fe7-077b4fe8ed84%40joeconway.com
2021-03-31 13:55:25 -04:00
Tom Lane 86dc90056d Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns.  The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree.  The disadvantage of course
is an extra fetch of each tuple to be updated.  However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost.  At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that.  To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information.  This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.

Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy.  Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD.  (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query.  Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable.  See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 11:52:37 -04:00
Peter Eisentraut 055fee7eb4 Allow an alias to be attached to a JOIN ... USING
This allows something like

    SELECT ... FROM t1 JOIN t2 USING (a, b, c) AS x

where x has the columns a, b, c and unlike a regular alias it does not
hide the range variables of the tables being joined t1 and t2.

Per SQL:2016 feature F404 "Range variable for common column names".

Reviewed-by: Vik Fearing <vik.fearing@2ndquadrant.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/454638cf-d563-ab76-a585-2564428062af@2ndquadrant.com
2021-03-31 17:10:50 +02:00
Etsuro Fujita 27e1f14563 Add support for asynchronous execution.
This implements asynchronous execution, which runs multiple parts of a
non-parallel-aware Append concurrently rather than serially to improve
performance when possible.  Currently, the only node type that can be
run concurrently is a ForeignScan that is an immediate child of such an
Append.  In the case where such ForeignScans access data on different
remote servers, this would run those ForeignScans concurrently, and
overlap the remote operations to be performed simultaneously, so it'll
improve the performance especially when the operations involve
time-consuming ones such as remote join and remote aggregation.

We may extend this to other node types such as joins or aggregates over
ForeignScans in the future.

This also adds the support for postgres_fdw, which is enabled by the
table-level/server-level option "async_capable".  The default is false.

Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself.  This commit
is mostly based on the patch proposed by Robert Haas, but also uses
stuff from the patch proposed by Kyotaro Horiguchi and from the patch
proposed by Thomas Munro.  Reviewed by Kyotaro Horiguchi, Konstantin
Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and
others.

Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com
Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
2021-03-31 18:45:00 +09:00
Peter Eisentraut 91c5a8caaa Add errhint_plural() function and make use of it
Similar to existing errmsg_plural() and errdetail_plural().  Some
errhint() calls hadn't received the proper plural treatment yet.
2021-03-31 09:16:25 +02:00
Bruce Momjian 5da9868ed9 In messages, use singular nouns for -1, like we do for +1.
This outputs "-1 year", not "-1 years".

Reported-by: neverov.max@gmail.com

Bug: 16939

Discussion: https://postgr.es/m/16939-cceeb03fb72736ee@postgresql.org
2021-03-30 18:34:27 -04:00
David Rowley ed934d4fa3 Allow estimate_num_groups() to pass back further details about the estimation
Here we add a new output parameter to estimate_num_groups() to allow it to
inform the caller of additional, possibly useful information about the
estimation.

The new output parameter is a struct that currently contains just a single
field with a set of flags.  This was done rather than having the flags as
an output parameter to allow future fields to be added without having to
change the signature of the function at a later date when we want to pass
back further information that might not be suitable to store in the flags
field.

It seems reasonable that one day in the future that the planner would want
to know more about the estimation. For example, how many individual sets
of statistics was the estimation generated from?  The planner may want to
take that into account if we ever want to consider risks as well as costs
when generating plans.

For now, there's only 1 flag we set in the flags field.  This is to
indicate if the estimation fell back on using the hard-coded constants in
any part of the estimation. Callers may like to change their behavior if
this is set, and this gives them the ability to do so.  Callers may pass
the flag pointer as NULL if they have no interest in obtaining any
additional information about the estimate.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.  Additionally, we're also not
making any changes to add support for clauselist_selectivity() and
clauselist_selectivity_ext().  However, if this is required in the future
then the same struct being added here should be fine to use as a new
output argument for those functions too.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com
2021-03-30 20:52:46 +13:00
David Rowley efd9d92bb3 Fix compiler warning in unistr function
Some compilers are not aware that elog/ereport ERROR does not return.
2021-03-30 20:28:09 +13:00
Peter Eisentraut f37fec837c Add unistr function
This allows decoding a string with Unicode escape sequences.  It is
similar to Unicode escape strings, but offers some more flexibility.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com
2021-03-29 11:56:53 +02:00
Tomas Vondra a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
was an expression index (but without the index maintenance overhead).
The statistics is stored in pg_statistics_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case in which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.

ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it
treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Tomas Vondra 33e52ad9a3 Fix ndistinct estimates with system attributes
When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to strange
situation that

    SELECT 1 FROM t GROUP BY ctid;

could have produced higher estimate (equal to pg_class.reltuples) than

    SELECT 1 FROM t GROUP BY a, b, ctid;

with extended statistics on (a,b). Fixed by retaining information about
the system attribute.

Backpatch all the way to 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10
2021-03-26 22:34:58 +01:00
Noah Misch a14a0118a1 Add "pg_database_owner" default role.
Membership consists, implicitly, of the current database owner.  Expect
use in template databases.  Once pg_database_owner has rights within a
template, each owner of a database instantiated from that template will
exercise those rights.

Reviewed by John Naylor.

Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com
2021-03-26 10:42:17 -07:00
Noah Misch f687bf61ed Merge similar algorithms into roles_is_member_of().
The next commit would have complicated two or three algorithms, so take
this opportunity to consolidate.  No functional changes.

Reviewed by John Naylor.

Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com
2021-03-26 10:42:16 -07:00
Alvaro Herrera 71f4c8c6f7
ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive in the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in optimizer they are ignored if the row is marked
committed for the snapshot; in executor they are always included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Michael Paquier a1999a01bb Sanitize the term "combo CID" in code comments
Combo CIDs were referred in the code comments using different terms
across various places of the code, so unify a bit the term used with
what is currently in use in some of the READMEs.

Author: "Hou, Zhijie"
Discussion: https://postgr.es/m/1d42865c91404f46af4562532fdbea31@G08CNEXMBPEKD05.g08.fujitsu.local
2021-03-25 16:08:03 +09:00
Peter Eisentraut 37c99d304d Fix stray double semicolons
Reported-by: John Naylor <john.naylor@enterprisedb.com>
2021-03-24 20:42:51 +01:00
Stephen Frost bbcc4eb2e0 Change checkpoint_completion_target default to 0.9
Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
Nathan Bossart
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
2021-03-24 13:07:51 -04:00
Robert Haas e5595de03e Tidy up more loose ends related to configurable TOAST compression.
Change the default_toast_compression GUC to be an enum rather than
a string. Earlier, uncommitted versions of the patch supported using
CREATE ACCESS METHOD to add new compression methods to a running
system, but that idea was dropped before commit. So, we can simplify
the GUC handling as well, which has the nice side effect of improving
the error messages.

While updating the documentation to reflect the new GUC type, also
move it back to the right place in the list. I moved this while
revising what became commit 24f0e395ac,
but apparently the intended ordering is "alphabetical" rather than
"whatever Robert thinks looks nice."

Rejigger things to avoid having access/toast_compression.h depend on
utils/guc.h, so that we don't end up with every file that includes
it also depending on something largely unrelated. Move a few
inline functions back into the C source file partly to help reduce
dependencies and partly just to avoid clutter. A few very minor
cosmetic fixes.

Original patch by Justin Pryzby, but very heavily edited by me,
and reverse reviewed by him and also reviewed by by Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoYp=GT_ztUCeZg2i4hkHAQv8o=-nVJ1-TKWTG1zQOmOpg@mail.gmail.com
2021-03-24 12:36:08 -04:00
Peter Eisentraut 49ab61f0bd Add date_bin function
Similar to date_trunc, but allows binning by an arbitrary interval
rather than just full units.

Author: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Artur Zakirov <zaartur@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
2021-03-24 16:18:24 +01:00
Amit Kapila 26acb54a13 Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."."
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.

List of commits reverted, in reverse chronological order:

ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".

Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
2021-03-24 11:29:15 +05:30
Peter Eisentraut a6715af1e7 Add bit_count SQL function
This function for bit and bytea counts the set bits in the bit or byte
string.  Internally, we use the existing popcount functionality.

For the name, after some discussion, we settled on bit_count, which
also exists with this meaning in MySQL, Java, and Python.

Author: David Fetter <david@fetter.org>
Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
2021-03-23 10:13:58 +01:00
Bruce Momjian 95d77149c5 Add macro RelationIsPermanent() to report relation permanence
Previously, to check relation permanence, the Relation's Form_pg_class
structure member relpersistence was compared to the value
RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro
RelationIsPermanent() and is used in appropirate places to simplify the
code.  This matches other RelationIs* macros.

This macro will be used in more places in future cluster file encryption
patches.

Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
2021-03-22 20:23:52 -04:00
Tom Lane f0c2a5bba6 Avoid leaking memory in RestoreGUCState(), and improve comments.
RestoreGUCState applied InitializeOneGUCOption to already-live
GUC entries, causing any malloc'd subsidiary data to be forgotten.
We do want the effect of resetting the GUC to its compiled-in
default, and InitializeOneGUCOption seems like the best way to do
that, so add code to free any existing subsidiary data beforehand.

The interaction between can_skip_gucvar, SerializeGUCState, and
RestoreGUCState is way more subtle than their opaque comments
would suggest to an unwary reader.  Rewrite and enlarge the
comments to try to make it clearer what's happening.

Remove a long-obsolete assertion in read_nondefault_variables: the
behavior of set_config_option hasn't depended on IsInitProcessingMode
since f5d9698a8 installed a better way of controlling it.

Although this is fixing a clear memory leak, the leak is quite unlikely
to involve any large amount of data, and it can only happen once in the
lifetime of a worker process.  So it seems unnecessary to take any
risk of back-patching.

Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us
2021-03-19 23:03:17 -04:00
Thomas Munro 61752afb26 Provide recovery_init_sync_method=syncfs.
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery.  This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.

Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory.  This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.

The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.

Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
2021-03-20 12:07:28 +13:00
Robert Haas bbe0a81db6 Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
2021-03-19 15:10:38 -04:00
Tom Lane 377b7a8300 Don't leak malloc'd strings when a GUC setting is rejected.
Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks.  Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 415ffdc220 Don't run RelationInitTableAccessMethod in a long-lived context.
Some code paths in this function perform syscache lookups, which
can lead to table accesses and possibly leakage of cruft into
the caller's context.  If said context is CacheMemoryContext,
we eventually will have visible bloat.  But fixing this is no
harder than moving one memory context switch step.  (The other
callers don't have a problem.)

Andres Freund and I independently found this via valgrind testing.
Back-patch to v12 where this code was added.

Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 28644fac10 Don't leak rd_statlist when a relcache entry is dropped.
Although these lists are usually NIL, and even when not empty
are unlikely to be large, constant relcache update traffic could
eventually result in visible bloat of CacheMemoryContext.

Found via valgrind testing.
Back-patch to v10 where this field was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tomas Vondra be45be9c33 Implement GROUP BY DISTINCT
With grouping sets, it's possible that some of the grouping sets are
duplicate.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-18 18:22:18 +01:00
Tomas Vondra cd91de0d17 Remove temporary files after backend crash
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
2021-03-18 17:38:28 +01:00
Magnus Hagander da18d829c2 Fix function name in error hint
pg_read_file() is the function that's in core, pg_file_read() is in
adminpack. But when using pg_file_read() in adminpack it calls the *C*
level function pg_read_file() in core, which probably threw the original
author off. But the error hint should be about the SQL function.

Reported-By: Sergei Kornilov
Backpatch-through: 11
Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru
2021-03-18 11:22:20 +01:00
Amit Kapila c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
2021-03-18 07:25:27 +05:30
Peter Geoghegan 9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Tom Lane c3ffe34863 Avoid creating duplicate cached plans for inherited FK constraints.
When a foreign key constraint is applied to a partitioned table, each
leaf partition inherits a similar FK constraint.  We were processing all
of those constraints independently, meaning that in large partitioning
trees we'd build up large collections of cached FK-checking query plans.
However, in all cases but one, the generated queries are actually
identical for all members of the inheritance tree (because, in most
cases, the query only mentions the topmost table of the other side of
the FK relationship).  So we can share a single cached plan among all
the partitions, saving memory, not to mention time to build and maintain
the cached plans.

Keisuke Kuroda and Amit Langote

Discussion: https://postgr.es/m/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
2021-03-10 14:22:31 -05:00
Amit Kapila 05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Parallel SELECT can't be utilized for INSERT in the following cases:
- INSERT statement uses the ON CONFLICT DO UPDATE clause
- Target table has a parallel-unsafe: trigger, index expression or
  predicate, column default expression or check constraint
- Target table has a parallel-unsafe domain constraint on any column
- Target table is a partitioned table with a parallel-unsafe partition key
  expression or support function

The planner is updated to perform additional parallel-safety checks for
the cases listed above, for determining whether it is safe to run INSERT
in parallel-mode with an underlying parallel SELECT. The planner will
consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided
nothing unsafe is found from the additional parallel-safety checks, or
from the existing parallel-safety checks for SELECT.

While checking parallel-safety, we need to check it for all the partitions
on the table which can be costly especially when we decide not to use a
parallel plan. So, in a separate patch, we will introduce a GUC and or a
reloption to enable/disable parallelism for Insert statements.

Prior to entering parallel-mode for the execution of INSERT with parallel
SELECT, a TransactionId is acquired and assigned to the current
transaction state. This is necessary to prevent the INSERT from attempting
to assign the TransactionId whilst in parallel-mode, which is not allowed.
This approach has a disadvantage in that if the underlying SELECT does not
return any rows, then the TransactionId is not used, however that
shouldn't happen in practice in many cases.

Author: Greg Nancarrow, Amit Langote, Amit Kapila
Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila
Tested-by: Tang, Haiying
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-03-10 07:38:58 +05:30
Fujii Masao ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds new GUC track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite writes and issue_xlog_fsync syncs
WAL data to disk are counted in pg_stat_wal. This information would be
useful to check how much WAL write and sync affect the performance.

Enabling track_wal_io_timing will make the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid such
additional overhead in the server with track_io_timing enabled,
this commit introduces track_wal_io_timing as a separate parameter from
track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit makes the server also track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. This counters can be
used to calculate the WAL write and sync time per request, for example.

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Michael Paquier f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d and the documentation
recommends against using it since.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself, removing support for the libpq
parameter sslcompression (parameter still listed for compatibility
reasons with existing connection strings, just ignored), and removes
the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check after
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Heikki Linnakangas 3174d69fb9 Remove server and libpq support for old FE/BE protocol version 2.
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.

The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.

Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
2021-03-04 10:45:55 +02:00
Tom Lane 0a687c8f10 Add trim_array() function.
This has been in the SQL spec since 2008.  It's a pretty thin
wrapper around the array slice functionality, but the spec
says we should have it, so here it is.

Vik Fearing, reviewed by Dian Fay

Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org
2021-03-03 16:39:57 -05:00
Peter Eisentraut e527a99055 Some copy-editing of GUC descriptions 2021-03-03 07:14:35 +01:00
Thomas Munro 8eda3eba30 Use sort_template.h for qsort_tuple() and qsort_ssup().
Replace the Perl code previously used to generate specialized sort
functions with sort_template.h.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-03 17:02:32 +13:00
Tom Lane d16f8c8e41 Mark default_transaction_read_only as GUC_REPORT.
This allows clients to find out the setting at connection time without
having to expend a query round trip to do so; which is helpful when
trying to identify read/write servers.  (One must also look at
in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.)
Modifying libpq to make use of this will come soon, but I felt it
cleaner to push the server change separately.

Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-03-02 13:53:54 -05:00
Tom Lane ffd3944ab9 Improve reporting for syntax errors in multi-line JSON data.
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was.  Simplify the logic a bit.  Add test cases.

Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself

Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
2021-03-01 16:44:17 -05:00
Thomas Munro a042ba2ba7 Introduce symbolic names for FeBeWaitSet positions.
Previously we used 0 and 1 to refer to the socket and latch in far flung
parts of the tree, without any explanation.  Also use PGINVALID_SOCKET
rather than -1 in a couple of places that didn't already do that.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:10:16 +13:00
Thomas Munro 6a2a70a020 Use signalfd(2) for epoll latches.
Cut down on system calls and other overheads by reading from a signalfd
instead of using a signal handler and self-pipe.  Affects Linux sytems,
and possibly others including illumos that implement the Linux epoll and
signalfd interfaces.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:12:02 +13:00
Peter Eisentraut f4adc41c4f Enhanced cycle mark values
Per SQL:202x draft, in the CYCLE clause of a recursive query, the
cycle mark values can be of type boolean and can be omitted, in which
case they default to TRUE and FALSE.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-27 08:13:24 +01:00
Thomas Munro 8556267b2b Revert "pg_collation_actual_version() -> pg_collation_current_version()."
This reverts commit 9cf184cc05.  Name
change less well received than anticipated.

Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
2021-02-26 15:29:27 +13:00
Michael Paquier bcf2667bf6 Fix some typos, grammar and style in docs and comments
The portions fixing the documentation are backpatched where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210210235557.GQ20012@telsasoft.com
backpatch-through: 9.6
2021-02-24 16:13:17 +09:00
Alvaro Herrera d9d076222f
VACUUM: ignore indexing operations with CONCURRENTLY
As envisioned in commit c98763bf51, it is possible for VACUUM to
ignore certain transactions that are executing CREATE INDEX CONCURRENTLY
and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's
because we know those transactions are not going to examine any other
tables, and are not going to execute anything else in the same
transaction.  (Only operations on "safe" indexes can be ignored: those
on indexes that are neither partial nor expressional).

This is extremely useful in cases where CIC/RC can run for a very long
time, because that used to be a significant headache for concurrent
vacuuming of other tables.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
2021-02-23 12:15:09 -03:00
Peter Eisentraut 6f6f284c7e Simplify printing of LSNs
Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs.
Convert all applicable code to use it.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
2021-02-23 10:27:02 +01:00
Thomas Munro beb4480c85 Refactor get_collation_current_version().
The code paths for three different OSes finished up with three different
ways of excluding C[.xxx] and POSIX from consideration.  Merge them.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 9cf184cc05 pg_collation_actual_version() -> pg_collation_current_version().
The new name seems a bit more natural.

Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 0fb0a0503b Hide internal error for pg_collation_actual_version(<bad OID>).
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:01:20 +13:00
Thomas Munro db8374d804 Remove outdated reference to RAID spindles.
Commit b09ff536 left behind some outdated advice in the long_desc field
of the GUC "effective_io_concurrency".  Remove it.

Back-patch to 13.

Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJyyWqFBxL9gEj-qtjBThGjhAOBE8GBnF8MUJOJ3vrfag%40mail.gmail.com
2021-02-22 14:42:15 +13:00
Peter Eisentraut f5465fade9 Allow specifying CRL directory
Add another method to specify CRLs, hashed directory method, for both
server and client side.  This offers a means for server or libpq to
load only CRLs that are required to verify a certificate.  The CRL
directory is specifed by separate GUC variables or connection options
ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
sslcrl, so both methods can be used at the same time.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
2021-02-18 07:59:10 +01:00
Peter Eisentraut 0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Tom Lane 38bb3aef35 Convert tsginidx.c's GIN indexing logic to fully ternary operation.
Commit 2f2007fbb did this partially, but there were two remaining
warts.  checkcondition_gin handled some uncertain cases by setting
the out-of-band recheck flag, some by returning TS_MAYBE, and some
by doing both.  Meanwhile, TS_execute arbitrarily converted a
TS_MAYBE result to TS_YES.  Thus, if checkcondition_gin chose to
only return TS_MAYBE, the outcome would be TS_YES with no recheck
flag, potentially resulting in wrong query outputs.

The case where this'd happen is if there were GIN_MAYBE entries
in the indexscan results passed to gin_tsquery_[tri]consistent,
which so far as I can see would only happen if the tidbitmap used
to accumulate indexscan results grew large enough to become lossy.

I initially thought of fixing this by ensuring we always set the
recheck flag as well as returning TS_MAYBE in uncertain cases.
But that errs in the other direction, potentially forcing rechecks
of rows that provably match the query (since the recheck flag
remains set even if TS_execute later finds that the answer must be
TS_YES).  Instead, let's get rid of the out-of-band recheck flag
altogether and rely on returning TS_MAYBE.  This requires exporting
a version of TS_execute that will actually return the full ternary
result of the evaluation ... but we likely should have done that
to start with.

Unfortunately it doesn't seem practical to add a regression test case
that covers this: the amount of data needed to cause the GIN bitmap to
become lossy results in a longer runtime than I think we want to have
in the tests.  (I'm wondering about allowing smaller work_mem settings
to ameliorate that, but it'd be a matter for a separate patch.)

Per bug #16865 from Dimitri Nüscheler.  Back-patch to v13 where
the faulty commit came in.

Discussion: https://postgr.es/m/16865-4ffdc3e682e6d75b@postgresql.org
2021-02-16 12:07:14 -05:00
Fujii Masao 46d6e5f567 Display the time when the process started waiting for the lock, in pg_locks, take 2
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

The first attempt of this patch (commit 3b733fcd04) caused the buildfarm
member "rorqual" (built with --disable-atomics --disable-spinlocks) to report
the failure of the regression test. It was reverted by commit 890d2182a2.
The cause of this failure was that the atomic variable for "waitstart"
in the dummy process entry created at the end of prepare transaction was
not initialized. This second attempt fixes that issue.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-15 15:13:37 +09:00
Thomas Munro f900a79ecd Default to wal_sync_method=fdatasync on FreeBSD.
FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
choose open_datasync as its default value.  That may not be a good
choice for all systems, and performs worse than fdatasync in some
scenarios.  Let's preserve the existing default behavior for now.

Like commit 576477e73c, which did the same for Linux, back-patch to all
supported releases.

Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com
2021-02-15 16:04:59 +13:00
Michael Paquier b83dcf7928 Add result size as argument of pg_cryptohash_final() for overflow checks
With its current design, a careless use of pg_cryptohash_final() could
would result in an out-of-bound write in memory as the size of the
destination buffer to store the result digest is not known to the
cryptohash internals, without the caller knowing about that.  This
commit adds a new argument to pg_cryptohash_final() to allow such sanity
checks, and implements such defenses.

The internals of SCRAM for HMAC could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN with uses particular to this code
there is no need to complicate its interface more than necessary, and
this comes back to the refactoring of HMAC in core.  Except that, this
minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes.  In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of a MD5 digest.

This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.

Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
2021-02-15 10:18:34 +09:00
Tom Lane ae4867ec74 Avoid divide-by-zero in regex_selectivity() with long fixed prefix.
Given a regex pattern with a very long fixed prefix (approaching 500
characters), the result of pow(FIXED_CHAR_SEL, fixed_prefix_len) can
underflow to zero.  Typically the preceding selectivity calculation
would have underflowed as well, so that we compute 0/0 and get NaN.
In released branches this leads to an assertion failure later on.
That doesn't happen in HEAD, for reasons I've not explored yet,
but it's surely still a bug.

To fix, just skip the division when the pow() result is zero, so
that we'll (most likely) return a zero selectivity estimate.  In
the edge cases where "sel" didn't yet underflow, perhaps this
isn't desirable, but I'm not sure that the case is worth spending
a lot of effort on.  The results of regex_selectivity_sub() are
barely worth the electrons they're written on anyway :-(

Per report from Alexander Lakhin.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/6de0a0c3-ada9-cd0c-3e4e-2fa9964b41e3@gmail.com
2021-02-12 16:26:47 -05:00
Tom Lane 69036aafb9 Simplify jsonfuncs.c code by using strtoint() not strtol().
Explicitly testing for INT_MIN and INT_MAX isn't particularly good
style; it's tedious and may draw useless compiler warnings on
machines where int and long are the same width.  We invented
strtoint() precisely for this usage, so use that instead.

While here, remove gratuitous variations in the way the tests for
did-strtoint-succeed were spelled.  Also, avoid attempting to
negate INT_MIN; that would probably work given that the result
is implicitly cast to uint32, but I think it's nominally undefined
behavior.

Per gripe from Ranier Vilela, though this isn't his proposed patch.

Discussion: https://postgr.es/m/CAEudQAqge3QfzoBRhe59QrB_5g+NmQUj2QpzqZ9Nc7QepXGAEw@mail.gmail.com
2021-02-11 12:49:22 -05:00
Michael Paquier bd12080980 Preserve pg_attribute.attstattarget across REINDEX CONCURRENTLY
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS.  This data was lost on the new index after REINDEX
CONCURRENTLY.

The update of this field is done when the old and new indexes are
swapped to make the fix back-patchable.  Another approach we could look
after in the long-term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage this can be done only on HEAD.

Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
2021-02-10 13:06:48 +09:00
Fujii Masao 890d2182a2 Revert "Display the time when the process started waiting for the lock, in pg_locks."
This reverts commit 3b733fcd04.

Per buildfarm members prion and rorqual.
2021-02-09 18:30:40 +09:00
Fujii Masao 3b733fcd04 Display the time when the process started waiting for the lock, in pg_locks.
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-09 18:10:19 +09:00
Tom Lane 62f34097c8 Build in some knowledge about foreign-key relationships in the catalogs.
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs.  While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL.  Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation.  Nonetheless, we can collect and use knowledge about these
relationships.

This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships.  (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.)  Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h".  The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL.  The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins.  (As of this
commit, 217 relationships are checked by the test, versus 181 before.)

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 17:11:55 -05:00
Peter Eisentraut 3696a600e2 SEARCH and CYCLE clauses
This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-01 14:32:51 +01:00
Alexander Korotkov bb513b364b Get rid of unnecessary memory allocation in jsonb_subscript_assign()
Current code allocates memory for JsonbValue, but it could be placed locally.
2021-02-01 14:06:02 +03:00
Tom Lane 7c5d57caed Fix portability issue in new jsonbsubs code.
On machines where sizeof(Datum) > sizeof(Oid) (that is, any 64-bit
platform), the previous coding would compute a misaligned
workspace->index pointer if nupper is odd.  Architectures where
misaligned access is a hard no-no would then fail.  This appears
to explain why thorntail is unhappy but other buildfarm members
are not.
2021-02-01 02:03:59 -05:00
Alexander Korotkov aa6e46daf5 Throw error when assigning jsonb scalar instead of a composite object
During the jsonb subscripting assignment, the provided path might assume an
object or an array where the source jsonb has a scalar value.  Initial
subscripting assignment logic will skip such an update operation with no
message shown.  This commit makes it throw an error to indicate this type
of situation.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:06 +03:00
Alexander Korotkov 81fcc72e66 Filling array gaps during jsonb subscripting
This commit introduces two new flags for jsonb assignment:

* JB_PATH_FILL_GAPS: Appending array elements on the specified position, gaps
  are filled with nulls (similar to the JavaScript behavior).  This mode also
  instructs to   create the whole path in a jsonb object if some part of the
  path (more than just the last element) is not present.

* JB_PATH_CONSISTENT_POSITION: Assigning keeps array positions consistent by
  preventing prepending of elements.

Both flags are used only in jsonb subscripting assignment.

Initially proposed by Nikita Glukhov based on polymorphic subscripting
patch, but transformed into an independent change.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:01 +03:00
Alexander Korotkov 676887a3b0 Implementation of subscripting for jsonb
Subscripting for jsonb does not support slices, does not have a limit for the
number of subscripts, and an assignment expects a replace value to have jsonb
type.  There is also one functional difference between assignment via
subscripting and assignment via jsonb_set().  When an original jsonb container
is NULL, the subscripting replaces it with an empty jsonb and proceeds with
an assignment.

For the sake of code reuse, we rearrange some parts of jsonb functionality
to allow the usage of the same functions for jsonb_set and assign subscripting
operation.

The original idea belongs to Oleg Bartunov.

Catversion is bumped.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:50:40 +03:00
Noah Misch 7da83415e5 Revive "snapshot too old" with wal_level=minimal and SET TABLESPACE.
Given a permanent relation rewritten in the current transaction, the
old_snapshot_threshold mechanism assumed the relation had never been
subject to early pruning.  Hence, a query could fail to report "snapshot
too old" when the rewrite followed an early truncation.  ALTER TABLE SET
TABLESPACE is probably the only rewrite mechanism capable of exposing
this bug.  REINDEX sets indcheckxmin, avoiding the problem.  CLUSTER has
zeroed page LSNs since before old_snapshot_threshold existed, so
old_snapshot_threshold has never cooperated with it.  ALTER TABLE
... SET DATA TYPE makes the table look empty to every past snapshot,
which is strictly worse.  Back-patch to v13, where commit
c6b92041d3 broke this.

Kyotaro Horiguchi and Noah Misch

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:12:18 -08:00
Heikki Linnakangas 6c5576075b Add direct conversion routines between EUC_TW and Big5.
Conversions between EUC_TW and Big5 were previously implemented by
converting the whole input to MIC first, and then from MIC to the target
encoding. Implement functions to convert directly between the two.

The reason to do this now is that I'm working on a patch that will change
the conversion function signature so that if the input is invalid, we
convert as much as we can and return the number of bytes successfully
converted. That's not possible if we use an intermediary format, because
if an error happens in the intermediary -> final conversion, we lose track
of the location of the invalid character in the original input. Avoiding
the intermediate step makes the conversions faster, too.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
2021-01-28 14:53:03 +02:00
Heikki Linnakangas b80e10638e Add mbverifystr() functions specific to each encoding.
This makes pg_verify_mbstr() function faster, by allowing more efficient
encoding-specific implementations. All the implementations included in
this commit are pretty naive, they just call the same encoding-specific
verifychar functions that were used previously, but that already gives a
performance boost because the tight character-at-a-time loop is simpler.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi
2021-01-28 14:40:07 +02:00
Michael Paquier f854c69a5b Refactor SQL functions of SHA-2 in cryptohashfuncs.c
The same code pattern was repeated four times when compiling a SHA-2
hash.  This refactoring has the advantage to issue a compilation warning
if a new value is added to pg_cryptohash_type, so as anybody doing an
addition in this area would need to consider if support for a new SQL
function is needed or not.

Author: Sehrope Sarkuni, Michael Paquier
Discussion: https://postgr.es/m/YA7DvLRn2xnTgsMc@paquier.xyz
2021-01-28 16:13:26 +09:00
Peter Geoghegan e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Tom Lane ee895a655c Improve performance of repeated CALLs within plpgsql procedures.
This patch essentially is cleaning up technical debt left behind
by the original implementation of plpgsql procedures, particularly
commit d92bc83c4.  That patch (or more precisely, follow-on patches
fixing its worst bugs) forced us to re-plan CALL and DO statements
each time through, if we're in a non-atomic context.  That wasn't
for any fundamental reason, but just because use of a saved plan
requires having a ResourceOwner to hold a reference count for the
plan, and we had no suitable resowner at hand, nor would the
available APIs support using one if we did.  While it's not that
expensive to create a "plan" for CALL/DO, the cycles do add up
in repeated executions.

This patch therefore makes the following API changes:

* GetCachedPlan/ReleaseCachedPlan are modified to let the caller
specify which resowner to use to pin the plan, rather than forcing
use of CurrentResourceOwner.

* spi.c gains a "SPI_execute_plan_extended" entry point that lets
callers say which resowner to use to pin the plan.  This borrows the
idea of an options struct from the recently added SPI_prepare_extended,
hopefully allowing future options to be added without more API breaks.
This supersedes SPI_execute_plan_with_paramlist (which I've marked
deprecated) as well as SPI_execute_plan_with_receiver (which is new
in v14, so I just took it out altogether).

* I also took the opportunity to remove the crude hack of letting
plpgsql reach into SPI private data structures to mark SPI plans as
"no_snapshot".  It's better to treat that as an option of
SPI_prepare_extended.

Now, when running a non-atomic procedure or DO block that contains
any CALL or DO commands, plpgsql creates a ResourceOwner that
will be used to pin the plans of the CALL/DO commands.  (In an
atomic context, we just use CurrentResourceOwner, as before.)
Having done this, we can just save CALL/DO plans normally,
whether or not they are used across transaction boundaries.
This seems to be good for something like 2X speedup of a CALL
of a trivial procedure with a few simple argument expressions.
By restricting the creation of an extra ResourceOwner like this,
there's essentially zero penalty in cases that can't benefit.

Pavel Stehule, with some further hacking by me

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-25 22:28:29 -05:00
Tom Lane 07d46fceb4 Fix broken ruleutils support for function TRANSFORM clauses.
I chanced to notice that this dumped core due to a faulty Assert.
To add insult to injury, the output has been misformatted since v11.
Obviously we need some regression testing here.

Discussion: https://postgr.es/m/d1cc628c-3953-4209-957b-29427acc38c8@www.fastmail.com
2021-01-25 13:03:43 -05:00
Tom Lane 55dc86eca7 Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins.  This could result in a malformed plan.  The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.

The right value to use is the join level we actually intend to evaluate
the PHV at.  We can get that from the ph_eval_at field of the associated
PlaceHolderInfo.  However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level.  (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.)  Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.

The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos.  We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter.  (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)

Back-patch to v12.  The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.

Per report from Stephan Springl.

Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-21 15:37:23 -05:00
Tom Lane a6cf3df4eb Add bytea equivalents of ltrim() and rtrim().
We had bytea btrim() already, but for some reason not the other two.

Joel Jacobson

Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
2021-01-18 15:11:32 -05:00
Magnus Hagander 960869da08 Add pg_stat_database counters for sessions and session time
This add counters for number of sessions, the different kind of session
termination types, and timers for how much time is spent in active vs
idle in a database to pg_stat_database.

Internally this also renames the parameter "force" to disconnect. This
was the only use-case for the parameter before, so repurposing it to
this mroe narrow usecase makes things cleaner than inventing something
new.

Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
2021-01-17 13:52:31 +01:00
Michael Paquier aef8948f38 Rework refactoring of hex and encoding routines
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
2021-01-14 11:13:24 +09:00
Michael Paquier fce7d0e6ef Fix routine name in comment of catcache.c
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUDXLAkf_XxQO9tAUtnTNGi3Lmd8fANd+vBJbcHn1HoWA@mail.gmail.com
2021-01-13 10:32:21 +09:00
Tom Lane 4edf96846a Rethink SQLSTATE code for ERRCODE_IDLE_SESSION_TIMEOUT.
Move it to class 57 (Operator Intervention), which seems like a
better choice given that from the client's standpoint it behaves
a heck of a lot like, e.g., ERRCODE_ADMIN_SHUTDOWN.

In a green field I'd put ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT
here as well.  But that's been around for a few years, so it's
probably too late to change its SQLSTATE code.

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-11 14:53:42 -05:00
Fujii Masao 0650ff2303 Add GUC to log long wait times on recovery conflicts.
This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.

Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.

Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-08 00:47:03 +09:00
Tom Lane 9486e7b666 Improve commentary in timeout.c.
On re-reading I realized that I'd missed one race condition in the new
timeout code.  It's safe, but add a comment explaining it.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 22:09:16 -05:00
Tom Lane 9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Tom Lane 09cf1d5226 Improve timeout.c's handling of repeated timeout set/cancel.
A very common usage pattern is that we set a timeout that we don't
expect to reach, cancel it after a little bit, and later repeat.
With the original implementation of timeout.c, this results in one
setitimer() call per timeout set or cancel.  We can do a lot better
by being lazy about changing the timeout interrupt request, namely:
(1) never cancel the outstanding interrupt, even when we have no
active timeout events;
(2) if we need to set an interrupt, but there already is one pending
at or before the required time, leave it alone.  When the interrupt
happens, the signal handler will reschedule it at whatever time is
then needed.

For example, with a one-second setting for statement_timeout, this
method results in having to interact with the kernel only a little
more than once a second, no matter how many statements we execute
in between.  The mainline code might never call setitimer() at all
after the first time, while each time the signal handler fires,
it sees that the then-pending request is most of a second away,
and that's when it sets the next interrupt request for.  Each
mainline timeout-set request after that will observe that the time
it wants is past the pending interrupt request time, and do nothing.

This also works pretty well for cases where a few different timeout
lengths are in use, as long as none of them are very short.  But
that describes our usage well.

Idea and original patch by Thomas Munro; I fixed a race condition
and improved the comments.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 18:28:52 -05:00
Tomas Vondra 8a4f618e7a Report progress of COPY commands
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).

Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
2021-01-06 21:51:06 +01:00
Peter Eisentraut 4656e3d668 Replace CLOBBER_CACHE_ALWAYS with run-time GUC
Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical
to use for testing in PostgreSQL because it's so slow and because it's
toggled on/off only at build time.  It is helpful when hunting bugs in
any code that uses the sycache/relcache because causes cache
invalidations to be injected whenever it would be possible for an
invalidation to occur, whether or not one was really pending.

Address this by providing run-time control over cache clobber
behaviour using the new debug_invalidate_system_caches_always GUC.
Support is not compiled in at all unless assertions are enabled or
CLOBBER_CACHE_ENABLED is explicitly defined at compile time.  It
defaults to 0 if compiled in, so it has negligible effect on assert
build performance by default.

When support is compiled in, test code can now set
debug_invalidate_system_caches_always=1 locally to a backend to test
specific queries, functions, extensions, etc.  Or tests can toggle it
globally for a specific test case while retaining normal performance
during test setup and teardown.

For backwards compatibility with existing test harnesses and scripts,
debug_invalidate_system_caches_always defaults to 1 if
CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE
is defined.

CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.

Author: Craig Ringer <craig.ringer@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMsr+YF=+ctXBZj3ywmvKNUjWpxmuTuUKuv-rgbHGX5i5pLstQ@mail.gmail.com
2021-01-06 10:46:44 +01:00
Tom Lane bf8a662c9a Introduce a new GUC_REPORT setting "in_hot_standby".
Aside from being queriable via SHOW, this value is sent to the client
immediately at session startup, and again later on if the server gets
promoted to primary during the session.  The immediate report will be
used in an upcoming patch to avoid an extra round trip when trying to
connect to a primary server.

Haribabu Kommi, Greg Nancarrow, Tom Lane; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-01-05 16:18:05 -05:00
Dean Rasheed fead67c24a Add an explicit cast to double when using fabs().
Commit bc43b7c2c0 used fabs() directly on an int variable, which
apparently requires an explicit cast on some platforms.

Per buildfarm.
2021-01-05 11:52:42 +00:00
Dean Rasheed bc43b7c2c0 Fix numeric_power() when the exponent is INT_MIN.
In power_var_int(), the computation of the number of significant
digits to use in the computation used log(Abs(exp)), which isn't safe
because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs()
instead of Abs(), so that the exponent is cast to a double before the
absolute value is taken.

Back-patch to 9.6, where this was introduced (by 7d9a4737c2).

Discussion: https://postgr.es/m/CAEZATCVd6pMkz=BrZEgBKyqqJrt2xghr=fNc8+Z=5xC6cgWrWA@mail.gmail.com
2021-01-05 11:15:28 +00:00
Thomas Munro 034510c820 Replace remaining uses of "whitelist".
Instead describe the action that the list effects, or just use "list"
where the meaning is obvious from context.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 14:00:16 +13:00
Thomas Munro c0d4f6d897 Rename "enum blacklist" to "uncommitted enums".
We agreed to remove this terminology and use something more descriptive.

Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 12:38:48 +13:00
Tom Lane 4bd3fad80e Fix integer-overflow corner cases in substring() functions.
If the substring start index and length overflow when added together,
substring() misbehaved, either throwing a bogus "negative substring
length" error on a case that should succeed, or failing to complain that
a negative length is negative (and instead returning the whole string,
in most cases).  Unsurprisingly, the text, bytea, and bit variants of
the function all had this issue.  Rearrange the logic to ensure that
negative lengths are always rejected, and add an overflow check to
handle the other case.

Also install similar guards into detoast_attr_slice() (nee
heap_tuple_untoast_attr_slice()), since it's far from clear that
no other code paths leading to that function could pass it values
that would overflow.

Patch by myself and Pavel Stehule, per bug #16804 from Rafi Shamim.

Back-patch to v11.  While these bugs are old, the common/int.h
infrastructure for overflow-detecting arithmetic didn't exist before
commit 4d6ad3125, and it doesn't seem like these misbehaviors are bad
enough to justify developing a standalone fix for the older branches.

Discussion: https://postgr.es/m/16804-f4eeeb6c11ba71d4@postgresql.org
2021-01-04 18:32:44 -05:00
Tom Lane 1c1cbe279b Rethink the "read/write parameter" mechanism in pl/pgsql.
Performance issues with the preceding patch to re-implement array
element assignment within pl/pgsql led me to realize that the read/write
parameter mechanism is misdesigned.  Instead of requiring the assignment
source expression to be such that *all* its references to the target
variable could be passed as R/W, we really want to identify *one*
reference to the target variable to be passed as R/W, allowing any other
ones to be passed read/only as they would be by default.  As long as the
R/W reference is a direct argument to the top-level (hence last to be
executed) function in the expression, there is no harm in R/O references
being passed to other lower parts of the expression.  Nor is there any
use-case for more than one argument of the top-level function being R/W.

Hence, rewrite that logic to identify one single Param that references
the target variable, and make only that Param pass a read/write
reference, not any other Params referencing the target variable.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 12:39:27 -05:00
Michael Paquier b49154b3b7 Simplify some comments in xml.c
Author: Justin Pryzby
Discussion: https://postgr.es/m/X/Ff7jfnvJUab013@paquier.xyz
2021-01-04 19:47:58 +09:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Alexander Korotkov 16d531a30a Refactor multirange_in()
This commit preserves the logic of multirange_in() but makes it more clear
what's going on.  Also, this commit fixes the compiler warning spotted by the
buildfarm.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/2246043.1609290699%40sss.pgh.pa.us
2020-12-30 21:17:34 +03:00
Tom Lane 7ca37fb040 Use setenv() in preference to putenv().
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX.  However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function.  And setenv() has
cleaner semantics too.  So, let's reverse that old policy.

This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field).  More importantly, extend
win32env.c to also support setenv().  Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.

Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention.  I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.

Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
2020-12-30 12:56:06 -05:00
Alexander Korotkov 62097a4cc8 Fix selectivity estimation @> (anymultirange, anyrange) operator
Attempt to get selectivity estimation for @> (anymultirange, anyrange) operator
caused an error in buildfarm, because this operator was missed in switch()
of calc_hist_selectivity().  Fix that and also make regression tests reliably
check that selectivity estimation for (multi)ranges doesn't fall.  Previously,
whether we test selectivity estimation for (multi)ranges depended on
whether autovacuum managed to gather concurrently to the test.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/X%2BwmgjRItuvHNBeV%40paquier.xyz
2020-12-30 20:31:15 +03:00
Tom Lane 860fe27ee1 Fix up usage of krb_server_keyfile GUC parameter.
secure_open_gssapi() installed the krb_server_keyfile setting as
KRB5_KTNAME unconditionally, so long as it's not empty.  However,
pg_GSS_recvauth() only installed it if KRB5_KTNAME wasn't set already,
leading to a troubling inconsistency: in theory, clients could see
different sets of server principal names depending on whether they
use GSSAPI encryption.  Always using krb_server_keyfile seems like
the right thing, so make both places do that.  Also fix up
secure_open_gssapi()'s lack of a check for setenv() failure ---
it's unlikely, surely, but security-critical actions are no place
to be sloppy.

Also improve the associated documentation.

This patch does nothing about secure_open_gssapi()'s use of setenv(),
and indeed causes pg_GSS_recvauth() to use it too.  That's nominally
against project portability rules, but since this code is only built
with --with-gssapi, I do not feel a need to do something about this
in the back branches.  A fix will be forthcoming for HEAD though.

Back-patch to v12 where GSSAPI encryption was introduced.  The
dubious behavior in pg_GSS_recvauth() goes back further, but it
didn't have anything to be inconsistent with, so let it be.

Discussion: https://postgr.es/m/2187460.1609263156@sss.pgh.pa.us
2020-12-30 11:38:42 -05:00
Tom Lane 1f9158ba48 Suppress log spam from multiple reports of SIGQUIT shutdown.
When the postmaster sends SIGQUIT to its children, there's no real
need for all the children to log that fact; the postmaster already
made a log entry about it, so adding perhaps dozens or hundreds of
child-process log entries adds nothing of value.  So, let's introduce
a new ereport level to specify "WARNING, but never send to log" and
use that for these messages.

Such a change wouldn't have been desirable before commit 7e784d1dc,
because if someone manually SIGQUIT's a backend, we *do* want to log
that.  But now we can tell the difference between a signal that was
issued by the postmaster and one that was not with reasonable
certainty.

While we're here, also clear error_context_stack before ereport'ing,
to prevent error callbacks from being invoked in the signal-handler
context.  This should reduce the odds of getting hung up while trying
to notify the client.

Per a suggestion from Andres Freund.

Discussion: https://postgr.es/m/20201225230331.hru3u6obyy6j53tk@alap3.anarazel.de
2020-12-29 18:02:38 -05:00
Alexander Korotkov db6335b5b1 Add support of multirange matching to the existing range GiST indexes
6df7a9698b has introduced a set of operators between ranges and multiranges.
Existing GiST indexes for ranges could easily support majority of them.
This commit adds support for new operators to the existing range GiST indexes.
New operators resides the same strategy numbers as existing ones.  Appropriate
check function is determined using the subtype.

Catversion is bumped.
2020-12-29 23:36:43 +03:00
Alexander Korotkov d1d61a8b23 Improve the signature of internal multirange functions
There is a set of *_internal() functions exposed in
include/utils/multirangetypes.h.  This commit improves the signatures of these
functions in two ways.
 * Add const qualifies where applicable.
 * Replace multirange typecache argument with range typecache argument.
   Multirange typecache was used solely to find the range typecache.  At the
   same time, range typecache is easier for the caller to find.
2020-12-29 23:35:38 +03:00
Alexander Korotkov 4d7684cc75 Implement operators for checking if the range contains a multirange
We have operators for checking if the multirange contains a range but don't
have the opposite.  This commit improves completeness of the operator set by
adding two new operators: @> (anyrange,anymultirange) and
<@(anymultirange,anyrange).

Catversion is bumped.
2020-12-29 23:35:33 +03:00
Alexander Korotkov a5b81b6f00 Fix bugs in comparison functions for multirange_bsearch_match()
Two functions multirange_range_overlaps_bsearch_comparison() and
multirange_range_contains_bsearch_comparison() contain bugs of returning -1
instead of 1.  This commit fixes these bugs and adds corresponding regression
tests.
2020-12-29 23:35:26 +03:00
Tom Lane 622ae4621e Fix assorted issues in backend's GSSAPI encryption support.
Unrecoverable errors detected by GSSAPI encryption can't just be
reported with elog(ERROR) or elog(FATAL), because attempting to
send the error report to the client is likely to lead to infinite
recursion or loss of protocol sync.  Instead make this code do what
the SSL encryption code has long done, which is to just report any
such failure to the server log (with elevel COMMERROR), then pretend
we've lost the connection by returning errno = ECONNRESET.

Along the way, fix confusion about whether message translation is done
by pg_GSS_error() or its callers (the latter should do it), and make
the backend version of that function work more like the frontend
version.

Avoid allocating the port->gss struct until it's needed; we surely
don't need to allocate it in the postmaster.

Improve logging of "connection authorized" messages with GSS enabled.
(As part of this, I back-patched the code changes from dc11f31a1.)

Make BackendStatusShmemSize() account for the GSS-related space that
will be allocated by CreateSharedBackendStatus().  This omission
could possibly cause out-of-shared-memory problems with very high
max_connections settings.

Remove arbitrary, pointless restriction that only GSS authentication
can be used on a GSS-encrypted connection.

Improve documentation; notably, document the fact that libpq now
prefers GSS encryption over SSL encryption if both are possible.

Per report from Mikael Gustavsson.  Back-patch to v12 where
this code was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:44:17 -05:00
Michael Paquier 643428c54b Fix inconsistent code with shared invalidations of snapshots
The code in charge of processing a single invalidation message has been
using since 568d413 the structure for relation mapping messages.  This
had fortunately no consequence as both locate the database ID at the
same location, but it could become a problem in the future if this area
of the code changes.

Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/8044c223-4d3a-2cdb-42bf-29940840ce94@postgrespro.ru
Backpatch-through: 9.5
2020-12-28 22:16:49 +09:00
Bruce Momjian 3187ef7c46 Revert "Add key management system" (978f869b99) & later commits
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.

Reported-by: Tom Lane, Michael Paquier

Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
2020-12-27 21:37:42 -05:00
Noah Misch 08db7c63f3 Invalidate acl.c caches when pg_authid changes.
This makes existing sessions reflect "ALTER ROLE ... [NO]INHERIT" as
quickly as they have been reflecting "GRANT role_name".  Back-patch to
9.5 (all supported versions).

Reviewed by Nathan Bossart.

Discussion: https://postgr.es/m/20201221095028.GB3777719@rfd.leadboat.com
2020-12-25 10:41:59 -08:00
Bruce Momjian 978f869b99 Add key management system
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits.  The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode.  A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start.  New parameters allow a file descriptor open to the terminal to
be passed.  pg_upgrade support has also been added.

Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us

Author: Masahiko Sawada, me, Stephen Frost
2020-12-25 10:19:44 -05:00
Bruce Momjian c3826f831e move hex_decode() to /common so it can be called from frontend
This allows removal of a copy of hex_decode() from ecpg, and will be
used by the soon-to-be added pg_alterckey command.

Backpatch-through: master
2020-12-24 17:25:48 -05:00
Michael Paquier 6db27037b9 Fix portability issues with parsing of recovery_target_xid
The parsing of this parameter has been using strtoul(), which is not
portable across platforms.  On most Unix platforms, unsigned long has a
size of 64 bits, while on Windows it is 32 bits.  It is common in
recovery scenarios to rely on the output of txid_current() or even the
newer pg_current_xact_id() to get a transaction ID for setting up
recovery_target_xid.  The value returned by those functions includes the
epoch in the computed result, which would cause strtoul() to fail where
unsigned long has a size of 32 bits once the epoch is incremented.

WAL records and 2PC data include only information about 32-bit XIDs and
it is not possible to have XIDs across more than one epoch, so
discarding the high bits from the transaction ID set has no impact on
recovery.  On the contrary, the use of strtoul() prevents a consistent
behavior across platforms depending on the size of unsigned long.

This commit changes the parsing of recovery_target_xid to use
pg_strtouint64() instead, available down to 9.6.  There is one TAP test
stressing recovery with recovery_target_xid, where a tweak based on
pg_reset{xlog,wal} is added to bump the XID epoch so as this change gets
tested, as per an idea from Alexander Lakhin.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16780-107fd0c0385b1035@postgresql.org
Backpatch-through: 9.6
2020-12-23 12:51:22 +09:00
Tom Lane ff5d5611c0 Remove "invalid concatenation of jsonb objects" error case.
The jsonb || jsonb operator arbitrarily rejected certain combinations
of scalar and non-scalar inputs, while being willing to concatenate
other combinations.  This was of course quite undocumented.  Rather
than trying to document it, let's just remove the restriction,
creating a uniform rule that unless we are handling an object-to-object
concatenation, non-array inputs are converted to one-element arrays,
resulting in an array-to-array concatenation.  (This does not change
the behavior for any case that didn't throw an error before.)

Per complaint from Joel Jacobson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/163099.1608312033@sss.pgh.pa.us
2020-12-21 13:11:50 -05:00
Alexander Korotkov 29f8f54676 Fix compiler warning in multirange_constructor0()
Discussion: https://postgr.es/m/X%2BBP8XE0UpIB6Yvh%40paquier.xyz
Author: Michael Paquier
2020-12-21 14:25:32 +03:00
Alexander Korotkov 4e1ee79e31 Fix typalign in rangetypes statistics
6df7a9698b introduces multirange types, whose typanalyze function shares
infrastructure with range types typanalyze function.  Since 6df7a9698b,
information about type gathered by statistics is filled from typcache.
But typalign is mistakenly always set to double.  This commit fixes this
oversight.
2020-12-21 00:31:11 +03:00
Alexander Korotkov 6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
Tom Lane 53d4f5fef0 Avoid memcpy() with same source and destination during relmapper init.
A narrow reading of the C standard says that memcpy(x,x,n) is undefined,
although it's hard to envision an implementation that would really
misbehave.  However, analysis tools such as valgrind might whine about
this; accordingly, let's band-aid relmapper.c to not do it.

See also 5b630501e, d3f4e8a8a, ad7b48ea0, and other similar fixes.
Apparently, none of those folk tried valgrinding initdb?  This has been
like this for long enough that I'm surprised it hasn't been reported
before.

Back-patch, just in case anybody wants to use a back branch on a platform
that complains about this; we back-patched those earlier fixes too.

Discussion: https://postgr.es/m/161790.1608310142@sss.pgh.pa.us
2020-12-18 15:46:44 -05:00
Tom Lane b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Tom Lane 653aa603f5 Provide an error cursor for "can't subscript" error messages.
Commit c7aba7c14 didn't add this, but after more fooling with the
feature I feel that it'd be useful.  To make this possible, refactor
getSubscriptingRoutines() so that the caller is responsible for
throwing any error.  (In clauses.c, I just chose to make the
most conservative assumption rather than throwing an error.  We don't
expect failures there anyway really, so the code space for an error
message would be a poor investment.)
2020-12-11 18:58:21 -05:00
Tom Lane c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Peter Eisentraut 8b069ef5dc Change get_constraint_index() to use pg_constraint.conindid
It was still using a scan of pg_depend instead of using the conindid
column that has been added since.

Since it is now just a catalog lookup wrapper and not related to
pg_depend, move from pg_depend.c to lsyscache.c.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/4688d55c-9a2e-9a5a-d166-5f24fe0bf8db%40enterprisedb.com
2020-12-09 15:41:45 +01:00
Tom Lane a676386b58 Remove operator_precedence_warning.
This GUC was always intended as a temporary solution to help with
finding 9.4-to-9.5 migration issues.  Now that all pre-9.5 branches
are out of support, and 9.5 will be too before v14 is released,
it seems like it's okay to drop it.  Doing so allows removal of
several hundred lines of poorly-tested code in parse_expr.c,
which have been a fertile source of bugs when people did use this.

Discussion: https://postgr.es/m/2234320.1607117945@sss.pgh.pa.us
2020-12-08 16:29:52 -05:00
Peter Eisentraut eb93f3a0b6 Convert elog(LOG) calls to ereport() where appropriate
User-visible log messages should go through ereport(), so they are
subject to translation.  Many remaining elog(LOG) calls are really
debugging calls.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
2020-12-04 14:25:23 +01:00
Michael Paquier bd94a9c04e Rename cryptohashes.c to cryptohashfuncs.c
87ae969 has created two new files called cryptohash{_openssl}.c in
src/common/, whose names overlap with the existing backend file called
cryptohashes.c dedicated to the SQL wrappers for SHA2 and MD5.  This
file is renamed to cryptohashfuncs.c to be more consistent with the
surroundings and reduce the confusion with the new cryptohash interface
of src/common/.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/X8hHhaQgbMbW+aGU@paquier.xyz
2020-12-04 12:58:44 +09:00
Michael Paquier 4f48a6fbe2 Change SHA2 implementation based on OpenSSL to use EVP digest routines
The use of low-level hash routines is not recommended by upstream
OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67.
This takes advantage of the refactoring done in 87ae969 that has
introduced the allocation and free routines for cryptographic hashes.

Since 1.1.0, OpenSSL does not publish the contents of the cryptohash
contexts, forcing any consumers to rely on OpenSSL for all allocations.
Hence, the resource owner callback mechanism gains a new set of routines
to track and free cryptohash contexts when using OpenSSL, preventing any
risks of leaks in the backend.  Nothing is needed in the frontend thanks
to the refactoring of 87ae969, and the resowner knowledge is isolated
into cryptohash_openssl.c.

Note that this also fixes a failure with SCRAM authentication when using
FIPS in OpenSSL, but as there have been few complaints about this
problem and as this causes an ABI breakage, no backpatch is done.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson, Heikki Linnakangas
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
2020-12-04 10:49:23 +09:00
Stephen Frost dc11f31a1a Add GSS information to connection authorized log message
GSS information (if used) such as if the connection was authorized using
GSS or if it was encrypted using GSS, and perhaps most importantly, what
the GSS principal used for the authentication was, is extremely useful
but wasn't being included in the connection authorized log message.

Therefore, add to the connection authorized log message that
information, in a similar manner to how we log SSL information when SSL
is used for a connection.

Author: Vignesh C
Reviewed-by: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CALDaNm2N1385_Ltoo%3DS7VGT-ESu_bRQa-sC1wg6ikrM2L2Z49w%40mail.gmail.com
2020-12-02 14:41:53 -05:00
Fujii Masao 01469241b2 Track total number of WAL records, FPIs and bytes generated in the cluster.
Commit 6b466bf5f2 allowed pg_stat_statements to track the number of
WAL records, full page images and bytes that each statement generated.
Similarly this commit allows us to track the cluster-wide WAL statistics
counters.

New columns wal_records, wal_fpi and wal_bytes are added into the
pg_stat_wal view, and reports the total number of WAL records,
full page images and bytes generated in the , respectively.

Author: Masahiro Ikeda
Reviewed-by: Amit Kapila, Movead Li, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/35ef960128b90bfae3b3fdf60a3a860f@oss.nttdata.com
2020-12-02 13:00:15 +09:00
Fujii Masao 942305a363 Allow restore_command parameter to be changed with reload.
This commit changes restore_command from PGC_POSTMASTER to PGC_SIGHUP.

As the side effect of this commit, restore_command can be reset to
empty during archive recovery. In this setting, archive recovery
tries to replay only WAL files available in pg_wal directory. This is
the same behavior as when the command that always fails is specified
in restore_command.

Note that restore_command still must be specified (not empty) when
starting archive recovery, even after applying this commit. This is
necessary as the safeguard to prevent users from forgetting to
specify restore_command and starting archive recovery.

Thanks to Peter Eisentraut, Michael Paquier, Andres Freund,
Robert Haas and Anastasia Lubennikova for discussion.

Author: Sergei Kornilov
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/2317771549527294@sas2-985f744271ca.qloud-c.yandex.net
2020-12-02 11:00:15 +09:00
Michael Paquier 87ae9691d2 Move SHA2 routines to a new generic API layer for crypto hashes
Two new routines to allocate a hash context and to free it are created,
as these become necessary for the goal behind this refactoring: switch
the all cryptohash implementations for OpenSSL to use EVP (for FIPS and
also because upstream does not recommend the use of low-level cryptohash
functions for 20 years).  Note that OpenSSL hides the internals of
cryptohash contexts since 1.1.0, so it is necessary to leave the
allocation to OpenSSL itself, explaining the need for those two new
routines.  This part is going to require more work to properly track
hash contexts with resource owners, but this not introduced here.
Still, this refactoring makes the move possible.

This reduces the number of routines for all SHA2 implementations from
twelve (SHA{224,256,386,512} with init, update and final calls) to five
(create, free, init, update and final calls) by incorporating the hash
type directly into the hash context data.

The new cryptohash routines are moved to a new file, called cryptohash.c
for the fallback implementations, with SHA2 specifics becoming a part
internal to src/common/.  OpenSSL specifics are part of
cryptohash_openssl.c.  This infrastructure is usable for more hash
types, like MD5 or HMAC.

Any code paths using the internal SHA2 routines are adapted to report
correctly errors, which are most of the changes of this commit.  The
zones mostly impacted are checksum manifests, libpq and SCRAM.

Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it
lacked the refactoring needed for libpq, as done here.

This patch has been tested on Linux and Windows, with and without
OpenSSL, and down to 1.0.1, the oldest version supported on HEAD.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
2020-12-02 10:37:20 +09:00
Tom Lane 2432b1a040 Avoid spamming the client with multiple ParameterStatus messages.
Up to now, we sent a ParameterStatus message to the client immediately
upon any change in the active value of any GUC_REPORT variable.  This
was only barely okay when the feature was designed; now that we have
things like function SET clauses, there are very plausible use-cases
where a GUC_REPORT variable might change many times within a query
--- and even end up back at its original value, perhaps.  Fortunately
most of our GUC_REPORT variables are unlikely to be changed often;
but there are proposals in play to enlarge that set, or even make it
user-configurable.

Hence, let's fix things to not generate more than one ParameterStatus
message per variable per query, and to not send any message at all
unless the end-of-query value is different from what we last reported.

Discussion: https://postgr.es/m/5708.1601145259@sss.pgh.pa.us
2020-11-25 11:40:44 -05:00
Amit Kapila 805b816305 Remove obsolete comment atop ri_PlanCheck.
Commit 5b7ba75f7f removed the unused parameter but forgot to update the
nearby comments.

Author: Li Japin
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/0E2F62A2-B2F1-4052-83AE-F0BEC8A75789@hotmail.com
2020-11-25 09:14:45 +05:30
Michael Paquier 7b94e99960 Remove catalog function currtid()
currtid() and currtid2() are an undocumented set of functions whose sole
known user is the Postgres ODBC driver, able to retrieve the latest TID
version for a tuple given by the caller of those functions.

As used by Postgres ODBC, currtid() is a shortcut able to retrieve the
last TID loaded into a backend by passing an OID of 0 (magic value)
after a tuple insertion.  This is removed in this commit, as it became
obsolete after the driver began using "RETURNING ctid" with inserts, a
clause supported since Postgres 8.2 (using RETURNING is better for
performance anyway as it reduces the number of round-trips to the
backend).

currtid2() is still used by the driver, so this remains around for now.
Note that this function is kept in its original shape for backward
compatibility reasons.

Per discussion with many people, including Andres Freund, Peter
Eisentraut, Álvaro Herrera, Hiroshi Inoue, Tom Lane and myself.

Bump catalog version.

Discussion: https://postgr.es/m/20200603021448.GB89559@paquier.xyz
2020-11-25 12:18:26 +09:00
Heikki Linnakangas 0a2bc5d61e Move per-agg and per-trans duplicate finding to the planner.
This has the advantage that the cost estimates for aggregates can count
the number of calls to transition and final functions correctly.

Bump catalog version, because views can contain Aggrefs.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b2e3536b-1dbc-8303-c97e-89cb0b4a9a48%40iki.fi
2020-11-24 10:45:00 +02:00
Tom Lane 789b938bf2 Centralize logic for skipping useless ereport/elog calls.
While ereport() and elog() themselves are quite cheap when the
error message level is too low to be printed, some places need to do
substantial work before they can call those macros at all.  To allow
optimizing away such setup work when nothing is to be printed, make
elog.c export a new function message_level_is_interesting(elevel)
that reports whether ereport/elog will do anything.  Make use of that
in various places that had ad-hoc direct tests of log_min_messages etc.
Also teach ProcSleep to use it to avoid some work.  (There may well
be other places that could usefully use this; I didn't search hard.)

Within elog.c, refactor a little bit to avoid having duplicate copies
of the policy-setting logic.  When that code was written, we weren't
relying on the availability of inline functions; so it had some
duplications in the name of efficiency, which I got rid of.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
2020-11-23 19:10:46 -05:00
David Rowley 913ec71d68 Improve compiler code layout in elog/ereport ERROR calls
Here we use a bit of preprocessor trickery to coax supporting compilers
into laying out their generated code so that the code that's in the same
branch as elog(ERROR)/ereport(ERROR) calls is moved away from the hot
path.  Effectively, this reduces the size of the hot code meaning that it
can sit on fewer cache lines.

Performance improvements of between 10-15% have been seen on highly CPU
bound workloads using pgbench's TPC-b benchmark.

What's achieved here is very similar to putting the error condition inside
an unlikely() macro. For example;

if (unlikely(x < 0))
    elog(ERROR, "invalid x value");

now there's no need to make use of unlikely() here as the common macro
used by elog and ereport will now see that elevel is >= ERROR and make use
of a pg_attribute_cold marked version of errstart().

When elevel < ERROR or if it cannot be determined to be constant, the
original behavior is maintained.

Author: David Rowley
Reviewed-by: Andres Freund, Peter Eisentraut
Discussion: https://postgr.es/m/CAApHDvrVpasrEzLL2er7p9iwZFZ%3DJj6WisePcFeunwfrV0js_A%40mail.gmail.com
2020-11-24 12:04:42 +13:00
Alvaro Herrera fe05129155
Make some sanity-check elogs more verbose
A few sanity checks in funcapi.c were not mentioning all the possible
clauses for failure, confusing developers who fat-fingered catalog data
additions.  Make the errors more detailed to avoid wasting time in
pinpointing mistakes.

Per complaint from Craig Ringer.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMsr+YH7Kd87A3cU5m_wKo46HPQ46zFv5wesFNL0YWxkGhGv3g@mail.gmail.com
2020-11-23 13:10:03 -03:00
Tom Lane 9fe649ea29 In geo_ops.c, represent infinite slope as Infinity, not DBL_MAX.
Since we're assuming IEEE floats these days, there seems little
reason not to do this.  It has the advantage that when the slope is
computed as infinite due to the presence of Inf coordinates, we get
saner behavior than before from line_construct(), and thence also
in some dependent operations such as finding the closest point.

Also fix line_construct() to special-case slope zero.  The previous
coding got the right answer in most cases, but it could compute
C as NaN when the point has Inf coordinates.

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-21 17:24:07 -05:00
Tom Lane 8597a48d01 Fix FPeq() and friends to get the right answers for infinities.
"FPeq(infinity, infinity)" returned false, on account of getting NaN
when it subtracts the two inputs.  Fix that by adding a separate
check for exact equality.

FPle() and FPge() similarly got the wrong answer for two like-signed
infinities.  In those cases, we can just rearrange the comparisons
to avoid potentially subtracting infinities.

While the sibling functions FPne() etc accidentally gave the right
answers even with the internal NaN results, it seems best to make
similar adjustments to them to avoid depending on this.

FPeq() has to be converted to an inline function to avoid double
evaluations of its arguments, and I did the same for the others
just for consistency.

In passing, make the handling of NaN cases in line_eq() and
point_eq_point() simpler and easier to reason about, and perhaps
faster.

This results in just one visible regression test change: slope()
now gives DBL_MAX for two inputs of (inf,1e300), which is consistent
with what it does for (1e300,inf), so that seems like a bug fix.

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-21 16:46:43 -05:00
Thomas Munro ca051d8b10 Add collation versions for FreeBSD.
On FreeBSD 13, use querylocale() to read the current version of libc
collations.  Similar to commits 352f6f2d for Windows and d5ac14f9 for
GNU/Linux.

Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-20 21:49:57 +13:00
Peter Eisentraut 01e658fa74 Hash support for row types
Add hash functions for the record type as well as a hash operator
family and operator class for the record type.  This enables all the
hash functionality for the record type such as hash-based plans for
UNION/INTERSECT/EXCEPT DISTINCT, recursive queries using UNION
DISTINCT, hash joins, and hash partitioning.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com
2020-11-19 09:32:47 +01:00
Tom Lane ff94205787 Suppress "warning: variable 'collcollate' set but not used".
Buildfarm members that lack both HAVE_LOCALE_T and USE_ICU have been
complaining about pg_newlocale_from_collation's collcollate variable.
This is evidently fallout from commit 7d1297df0, which removed the
only usage outside those two #ifdef'd code paths.  Mark the variable
pg_attribute_unused(), like its sibling collctype, which has been that
way for a long time.
2020-11-15 12:39:49 -05:00
Michael Paquier 788dd0b839 Fix some typos
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/C36ADFDF-D09A-4EE5-B186-CB46C3653F4C@yesql.se
2020-11-14 11:43:10 +09:00
Tom Lane ec0294fb2c Support negative indexes in split_part().
This provides a handy way to get, say, the last field of the string.
Use of a negative index in this way has precedent in the nearby
left() and right() functions.

The implementation scans the string twice when N < -1, but it seems
likely that N = -1 will be the huge majority of actual use cases,
so I'm not really excited about adding complexity to avoid that.

Nikhil Benesch, reviewed by Jacob Champion; cosmetic tweakage by me

Discussion: https://postgr.es/m/cbb7f861-6162-3a51-9823-97bc3aa0b638@gmail.com
2020-11-13 13:49:48 -05:00
Peter Eisentraut 0af302af40 Fix some stray whitespace in parser files 2020-11-11 17:37:18 +01:00
Tom Lane ec29427ce2 Fix and simplify some usages of TimestampDifference().
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format.  This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.

Two of these call sites were in fact wrong:

* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.

* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended.  Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units.  This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.

There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds.  It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.

Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.

Alexey Kondratov and Tom Lane

Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
2020-11-10 22:51:54 -05:00
Thomas Munro d50e3b1f8d Fix assertion in collation version lookup.
Commit 257836a7 included an assertion that a version lookup routine is
not trying to look up "C" or "POSIX", but that case is reachable with
the user-facing SQL function pg_collation_actual_version().  Remove the
assertion.
2020-11-08 20:45:29 +13:00
Peter Eisentraut bdc4edbea6 Move catalog index declarations
Move the system catalog index declarations from catalog/indexing.h to
the respective parent tables' catalog/pg_*.h files.  The original
reason for having it split was that the old genbki system produced the
output in the order of the catalog files it read, so all the indexing
stuff needed to come separately.  But this is no longer the case, and
keeping it together makes more sense.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/c7cc82d6-f976-75d6-2e3e-b03d2cab26bb@2ndquadrant.com
2020-11-07 12:26:24 +01:00
Michael Paquier a05dbf477b Add GUC_LIST_INPUT and GUC_LIST_QUOTE to unix_socket_directories
This should have been done in the initial commit that made
unix_socket_directories a list as of c9b0cbe.  This change allows to
support correctly the case of ALTER SYSTEM, where it is possible to
specify multiple paths as a list, like the following pattern where
flattening is applied to each item:
ALTER SYSTEM SET unix_socket_directories = '/path1', '/path2';

Any parameters specified in postgresql.conf are parsed the same way, so
there is no compatibility change.  pg_dump has a hardcoded list of
parameters marked with GUC_LIST_QUOTE, that gets its routine update.
These are reordered alphabetically for clarity.

Author: Ian Lawrence Barwick
Reviewed-by: Peter Eisentraunt, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAB8KJ=iMOtNY6_sUwV=LQVCJ2zgYHBDyNzVfvE5GN3WQ3v9kQg@mail.gmail.com
2020-11-07 10:30:22 +09:00
Tom Lane 40c24bfef9 Improve our ability to regurgitate SQL-syntax function calls.
The SQL spec calls out nonstandard syntax for certain function calls,
for example substring() with numeric position info is supposed to be
spelled "SUBSTRING(string FROM start FOR count)".  We accept many
of these things, but up to now would not print them in the same format,
instead simplifying down to "substring"(string, start, count).
That's long annoyed me because it creates an interoperability
problem: we're gratuitously injecting Postgres-specific syntax into
what might otherwise be a perfectly spec-compliant view definition.
However, the real reason for addressing it right now is to support
a planned change in the semantics of EXTRACT() a/k/a date_part().
When we switch that to returning numeric, we'll have the parser
translate EXTRACT() to some new function name (might as well be
"extract" if you ask me) and then teach ruleutils.c to reverse-list
that per SQL spec.  In this way existing calls to date_part() will
continue to have the old semantics.

To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX,
and make the parser insert that rather than COERCE_EXPLICIT_CALL when
the input has SQL-spec decoration.  (But if the input has the form of
a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even
if it's calling one of these functions.)  Then ruleutils.c recognizes
COERCE_SQL_SYNTAX as a cue to emit SQL call syntax.  It can know
which decoration to emit using hard-wired knowledge about the
functions that could be called this way.  (While this solution isn't
extensible without manual additions, neither is the grammar, so this
doesn't seem unmaintainable.)  Notice that this solution will
reverse-list a function call with SQL decoration only if it was
entered that way; so dump-and-reload will not by itself produce any
changes in the appearance of views.

This requires adding a CoercionForm field to struct FuncCall.
(I couldn't resist the temptation to rearrange that struct's
field order a tad while I was at it.)  FuncCall doesn't appear
in stored rules, so that change isn't a reason for a catversion
bump, but I did one anyway because the new enum value for
CoercionForm fields could confuse old backend code.

Possible future work:

* Perhaps CoercionForm should now be renamed to DisplayForm,
or something like that, to reflect its more general meaning.
This'd require touching a couple hundred places, so it's not
clear it's worth the code churn.

* The SQLValueFunction node type, which was invented partly for
the same goal of improving SQL-compatibility of view output,
could perhaps be replaced with regular function calls marked
with COERCE_SQL_SYNTAX.  It's unclear if this would be a net
code savings, however.

Discussion: https://postgr.es/m/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2020-11-04 12:34:50 -05:00
Tom Lane f21636e5d5 Remove useless entries for aggregate functions from fmgrtab.c.
Gen_fmgrtab.pl treated aggregate functions the same as other built-in
functions, which is wasteful because there is no real need to have
entries for them in the fmgr_builtins[] table.  Suppressing those
entries saves about 3KB in the compiled table on my machine; which
is not a lot but it's not nothing either, considering that that
table is pretty "hot".  The only outside code change needed is
that ExecInitWindowAgg() can't be allowed to call fmgr_info_cxt()
on a plain aggregate function.  But that saves a few cycles anyway.

Having done that, the aggregate_dummy() function is unreferenced
and might as well be dropped.  Using "aggregate_dummy" as the prosrc
value for an aggregate is now just a documentation convention not
something that matters.  There was some discussion of using NULL
instead to save a few bytes in pg_proc, but we'd have to remove
prosrc's BKI_FORCE_NOT_NULL marking which doesn't seem a great idea.
Anyway, it's possible there's client-side code that expects to
see "aggregate_dummy" there, so I'm loath to change it without a
strong reason.

Discussion: https://postgr.es/m/533989.1604263665@sss.pgh.pa.us
2020-11-04 11:25:56 -05:00
Peter Eisentraut 560564d3ad Enable hash partitioning of text arrays
hash_array_extended() needs to pass PG_GET_COLLATION() to the hash
function of the element type.  Otherwise, the hash function of a
collation-aware data type such as text will error out, since the
introduction of nondeterministic collation made hash functions require
a collation, too.

The consequence of this is that before this change, hash partitioning
using an array over text in the partition key would not work.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/32c1fdae-95c6-5dc6-058a-a90330a3b621%40enterprisedb.com
2020-11-04 12:46:28 +01:00
Thomas Munro 9f12a3b95d Tolerate version lookup failure for old style Windows locale names.
Accept that we can't get versions for such locale names for now.  Users
will need to specify the newer language tag format to enable the
collation versioning feature.  It's not clear that we can do automatic
conversion from the old style to the new style reliably enough for this
purpose.

Unfortunately, this means that collation versioning probably won't work
for the default collation unless you provide something like en-US at
initdb or CREATE DATABASE time (though, for reasons not yet understood,
it does seem to work on some systems).  It'd be nice to find a better
solution, or document this quirk if we settle on it, but this should
unbreak the 3 failing build farm animals in the meantime.

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-04 15:13:08 +13:00
Michael Paquier e152506ade Revert pg_relation_check_pages()
This reverts the following set of commits, following complaints about
the lack of portability of the central part of the code in bufmgr.c as
well as the use of partition mapping locks during page reads:
c780a7a9
f2b88396
b787d4ce
ce7f772c
60a51c6b

Per discussion with Andres Freund, Robert Haas and myself.

Bump catalog version.

Discussion: https://postgr.es/m/20201029181729.2nrub47u7yqncsv7@alap3.anarazel.de
2020-11-04 10:21:46 +09:00
Tom Lane e1339bfc7a Remove special checks for pg_rewrite.ev_qual and ev_action being NULL.
make_ruledef() and make_viewdef() were coded to cope with possible
null-ness of these columns, but they've been marked BKI_FORCE_NOT_NULL
for some time.  So there's not really any need to do more than what
we do for the other columns of pg_rewrite, i.e. just Assert that
we got non-null results.

(There is a school of thought that says Asserts aren't the thing
to do to check for corrupt data, but surely here is not the place
to start if we want such a policy.)

Also, remove long-dead-if-indeed-it-ever-wasn't-dead handling of
an empty actions list in make_ruledef().  That's an error case
and should be treated as such.  (DO INSTEAD NOTHING is represented
by a CMD_NOTHING Query, not an empty list; cf transformRuleStmt.)

Kyotaro Horiguchi, some changes by me

Discussion: https://postgr.es/m/CAEudQApoA=tMTic6xEPYP_hsNZ8XtToVThK_0x7D_aFQYowq3w@mail.gmail.com
2020-11-02 14:34:34 -05:00
Tom Lane 8e1f37c07a Rethink the generation rule for fmgroids.h macros.
Traditionally, the names of fmgroids.h macros for pg_proc OIDs
have been constructed from the prosrc field.  But sometimes the
same C function underlies multiple pg_proc entries, forcing us
to make an arbitrary choice of which OID to reference; the other
entries are then not namable via fmgroids.h.  Moreover, we could
not have macros at all for pg_proc entries that aren't for
C-coded functions.

Instead, use the proname field, and append the proargtypes field
(replacing inter-argument spaces with underscores) if proname is
not unique.  Special-casing unique entries such as F_OIDEQ removes
the need to change a lot of code.  Indeed, I can only find two
places in the tree that need to be adjusted; while this changes
quite a few existing entries in fmgroids.h, few of them are
referenced from C code.

With this patch, all entries in pg_proc.dat have macros in fmgroids.h.

Discussion: https://postgr.es/m/472274.1604258384@sss.pgh.pa.us
2020-11-02 11:57:28 -05:00
Peter Eisentraut dd26a0ad76 Use PG_GETARG_TRANSACTIONID where appropriate
Some places were using PG_GETARG_UINT32 where PG_GETARG_TRANSACTIONID
would be more appropriate.  (Of course, they are the same internally,
so there is no externally visible effect.)  To do that, export
PG_GETARG_TRANSACTIONID outside of xid.c.  We also export
PG_RETURN_TRANSACTIONID for symmetry, even though there are currently
no external users.

Author: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/d8f6bdd536df403b9b33816e9f7e0b9d@G08CNEXMBPEKD05.g08.fujitsu.local
2020-11-02 16:48:22 +01:00
Thomas Munro 257836a755 Track collation versions for indexes.
Record the current version of dependent collations in pg_depend when
creating or rebuilding an index.  When accessing the index later, warn
that the index may be corrupted if the current version doesn't match.

Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
Michael Paquier, Robert Haas, Tom Lane and others for very helpful
discussion.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 01:19:50 +13:00
Thomas Munro 7d1297df08 Remove pg_collation.collversion.
This model couldn't be extended to cover the default collation, and
didn't have any information about the affected database objects when the
version changed.  Remove, in preparation for a follow-up commit that
will add a new mechanism.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 00:44:59 +13:00
Michael Paquier 8a15e735be Fix some grammar and typos in comments and docs
The documentation fixes are backpatched down to where they apply.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201031020801.GD3080@telsasoft.com
Backpatch-through: 9.6
2020-11-02 15:14:41 +09:00
Tom Lane f90149e628 Don't use custom OID symbols in pg_type.dat, either.
On the same reasoning as in commit 36b931214, forbid using custom
oid_symbol macros in pg_type as well as pg_proc, so that we always
rely on the predictable macro names generated by genbki.pl.

We do continue to grant grandfather status to the names CASHOID and
LSNOID, although those are now considered deprecated aliases for the
preferred names MONEYOID and PG_LSNOID.  This is because there's
likely to be client-side code using the old names, and this bout of
neatnik-ism doesn't quite seem worth breaking client code.

There might be a case for grandfathering EVTTRIGGEROID, too, since
externally-maintained PLs may reference that symbol.  But renaming
such references to EVENT_TRIGGEROID doesn't seem like a particularly
heavy lift --- we make far more significant backend API changes in
every major release.  For now I didn't add that, but we could
reconsider if there's pushback.

The other names changed here seem pretty unlikely to have any outside
uses.  Again, we could add alias macros if there are complaints, but
for now I didn't.

As before, no need for a catversion bump.

John Naylor

Discussion: https://postgr.es/m/CAFBsxsHpCbjfoddNGpnnnY5pHwckWfiYkMYSF74PmP1su0+ZOw@mail.gmail.com
2020-10-29 13:33:38 -04:00
Amit Kapila 8e90ec5580 Track statistics for streaming of changes from ReorderBuffer.
This adds the statistics about transactions streamed to the decoding
output plugin from ReorderBuffer. Users can query the
pg_stat_replication_slots view to check these stats and call
pg_stat_reset_replication_slot to reset the stats of a particular slot.
Users can pass NULL in pg_stat_reset_replication_slot to reset stats of
all the slots.

Commit 9868167500 has added the basic infrastructure to capture the stats
of slot and this commit extends the statistics collector to track
additional information about slots.

Bump the catversion as we have added new columns in the catalog entry.

Author: Ajin Cherian and Amit Kapila
Reviewed-by: Sawada Masahiko and Dilip Kumar
Discussion: https://postgr.es/m/CAA4eK1+chpEomLzgSoky-D31qev19AmECNiEAietPQUGEFhtVA@mail.gmail.com
2020-10-29 09:11:51 +05:30
Michael Paquier 60a51c6b32 Fix incorrect placement of pfree() in pg_relation_check_pages()
This would cause the function to crash when more than one page is
considered as broken and reported in the SRF.

Reported-by: Noriyoshi Shinoda
Discussion: https://postgr.es/m/TU4PR8401MB11523D42C315AAF822E74275EE170@TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM
2020-10-29 09:17:34 +09:00
Tom Lane 36b9312143 Don't use custom OID symbols in pg_proc.dat.
We have a perfectly good convention for OID macros for built-in functions
already, so making custom symbols is just introducing unnecessary
deviation from the convention.  Remove the one case that had snuck in,
and add an error check in genbki.pl to discourage future instances.

Although this touches pg_proc.dat, there's no need for a catversion
bump since the actual catalog data isn't changed.

John Naylor

Discussion: https://postgr.es/m/CAFBsxsHpCbjfoddNGpnnnY5pHwckWfiYkMYSF74PmP1su0+ZOw@mail.gmail.com
2020-10-28 12:18:45 -04:00
Michael Paquier ce7f772c5e Use correct GetDatum() in pg_relation_check_pages()
UInt32GetDatum() was getting used, while the result needs
Int64GetDatum().  Oversight in f2b8839.

Per buildfarm member florican.

Discussion: https://postgr.es/m/1226629.1603859189@sss.pgh.pa.us
2020-10-28 13:59:18 +09:00
Michael Paquier f2b8839695 Add pg_relation_check_pages() to check on-disk pages of a relation
This makes use of CheckBuffer() introduced in c780a7a, adding a SQL
wrapper able to do checks for all the pages of a relation.  By default,
all the fork types of a relation are checked, and it is possible to
check only a given relation fork.  Note that if the relation given in
input has no physical storage or is temporary, then no errors are
generated, allowing full-database checks when coupled with a simple scan
of pg_class for example.  This is not limited to clusters with data
checksums enabled, as clusters without data checksums can still apply
checks on pages using the page headers or for the case of a page full of
zeros.

This function returns a set of tuples consisting of:
- The physical file where a broken page has been detected (without the
segment number as that can be AM-dependent, which can be guessed from
the block number for heap).  A relative path from PGPATH is used.
- The block number of the broken page.

By default, only superusers have an access to this function but
execution rights can be granted to other users.

The feature introduced here is still minimal, and more improvements
could be done, like:
- Addition of a start and end block number to run checks on a range
of blocks, which would apply only if one fork type is checked.
- Addition of some progress reporting.
- Throttling, with configuration parameters in function input or
potentially some cost-based GUCs.

Regression tests are added for positive cases in the main regression
test suite, and TAP tests are added for cases involving the emulation of
page corruptions.

Bump catalog version.

Author: Julien Rouhaud, Michael Paquier
Reviewed-by: Masahiko Sawada, Justin Pryzby
Discussion: https://postgr.es/m/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com
2020-10-28 12:15:00 +09:00
David Rowley e83c9f913c Fix incorrect parameter name in a function header comment
Author: Zhijie Hou
Discussion: https://postgr.es/m/14cd74ea00204cc8a7ea5d738ac82cd1@G08CNEXMBPEKD05.g08.fujitsu.local
Backpatch-through: 12, where the mistake was introduced
2020-10-25 22:40:03 +13:00
Amit Kapila 03d51b776d Change the attribute name in pg_stat_replication_slots view.
Change the attribute 'name' to 'slot_name' in pg_stat_replication_slots
view to make it clear and that way we will be consistent with the other
places like pg_stat_wal_receiver view where we display the same attribute.

In the passing, fix the typo in one of the macros in the related code.

Bump the catversion as we have modified the name in the catalog as well.

Reported-by: Noriyoshi Shinoda
Author: Noriyoshi Shinoda
Reviewed-by: Sawada  Masahiko and Amit Kapila
Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
2020-10-20 10:24:36 +05:30
David Rowley 110d81728a Fixup some appendStringInfo and appendPQExpBuffer calls
A number of places were using appendStringInfo() when they could have been
using appendStringInfoString() instead.  While there's no functionality
change there, it's just more efficient to use appendStringInfoString()
when no formatting is required.  Likewise for some
appendStringInfoString() calls which were just appending a single char.
We can just use appendStringInfoChar() for that.

Additionally, many places were using appendPQExpBuffer() when they could
have used appendPQExpBufferStr(). Change those too.

Patch by Zhijie Hou, but further searching by me found significantly more
places that deserved the same treatment.

Author: Zhijie Hou, David Rowley
Discussion: https://postgr.es/m/cb172cf4361e4c7ba7167429070979d4@G08CNEXMBPEKD05.g08.fujitsu.local
2020-10-15 20:35:17 +13:00
Tom Lane 397ea901e8 Fix memory leak when guc.c decides a setting can't be applied now.
The prohibitValueChange code paths in set_config_option(), which
are executed whenever we re-read a PGC_POSTMASTER variable from
postgresql.conf, neglected to free anything before exiting.  Thus
we'd leak the proposed new value of a PGC_STRING variable, as noted
by BoChen in bug #16666.  For all variable types, if the check hook
creates an "extra" chunk, we'd also leak that.

These are malloc not palloc chunks, so there is no mechanism for
recovering the leaks before process exit.  Fortunately, the values
are typically not very large, meaning you'd have to go through an
awful lot of SIGHUP configuration-reload cycles to make the leakage
amount to anything.  Still, for a long-lived postmaster process it
could potentially be a problem.

Oversight in commit 2594cf0e8.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/16666-2c41a4eec61b03e1@postgresql.org
2020-10-12 13:31:24 -04:00
Tom Lane fe27009cbb Recognize network-failure errnos as indicating hard connection loss.
Up to now, only ECONNRESET (and EPIPE, in most but not quite all places)
received special treatment in our error handling logic.  This patch
changes things so that related error codes such as ECONNABORTED are
also recognized as indicating that the connection's dead and unlikely
to come back.

We continue to think, however, that only ECONNRESET and EPIPE should be
reported as probable server crashes; the other cases indicate network
connectivity problems but prove little about the server's state.  Thus,
there's no change in the error message texts that are output for such
cases.  The key practical effect is that errcode_for_socket_access()
will report ERRCODE_CONNECTION_FAILURE rather than
ERRCODE_INTERNAL_ERROR for a network failure.  It's expected that this
will fix buildfarm member lorikeet's failures since commit 32a9c0bdf,
as that seems to be due to not treating ECONNABORTED equivalently to
ECONNRESET.

The set of errnos treated this way now includes ECONNABORTED, EHOSTDOWN,
EHOSTUNREACH, ENETDOWN, ENETRESET, and ENETUNREACH.  Several of these
were second-class citizens in terms of their handling in places like
get_errno_symbol(), so upgrade the infrastructure where necessary.

As committed, this patch assumes that all these symbols are defined
everywhere.  POSIX specifies all of them except EHOSTDOWN, but that
seems to exist on all platforms of interest; we'll see what the
buildfarm says about that.

Probably this should be back-patched, but let's see what the buildfarm
thinks of it first.

Fujii Masao and Tom Lane

Discussion: https://postgr.es/m/2621622.1602184554@sss.pgh.pa.us
2020-10-10 13:28:12 -04:00
Tom Lane 7538708394 Avoid gratuitous inaccuracy in numeric width_bucket().
Multiply before dividing, not the reverse, so that cases that should
produce exact results do produce exact results.  (width_bucket_float8
got this right already.)  Even when the result is inexact, this avoids
making it more inexact, since only the division step introduces any
imprecision.

While at it, fix compute_bucket() to not uselessly repeat the sign
check already done by its caller, and avoid duplicating the
multiply/divide steps by adjusting variable usage.

Per complaint from Martin Visser.  Although this seems like a bug fix,
I'm hesitant to risk changing width_bucket()'s results in stable
branches, so no back-patch.

Discussion: https://postgr.es/m/6FA5117D-6AED-4656-8FEF-B74AC18FAD85@brytlyt.com
2020-10-08 13:06:27 -04:00
Tom Lane 8ce423b191 Fix numeric width_bucket() to allow its first argument to be infinite.
While the calculation is not well-defined if the bounds arguments are
infinite, there is a perfectly sane outcome if the test operand is
infinite: it's just like any other value that's before the first bucket
or after the last one.  width_bucket_float8() got this right, but
I was too hasty about the case when adding infinities to numerics
(commit a57d312a7), so that width_bucket_numeric() just rejected it.
Fix that, and sync the relevant error message strings.

No back-patch needed, since infinities-in-numeric haven't shipped yet.

Discussion: https://postgr.es/m/2465409.1602170063@sss.pgh.pa.us
2020-10-08 12:37:59 -04:00
Amit Kapila 9868167500 Track statistics for spilling of changes from ReorderBuffer.
This adds the statistics about transactions spilled to disk from
ReorderBuffer. Users can query the pg_stat_replication_slots view to check
these stats and call pg_stat_reset_replication_slot to reset the stats of
a particular slot. Users can pass NULL in pg_stat_reset_replication_slot
to reset stats of all the slots.

This commit extends the statistics collector to track this information
about slots.

Author: Sawada Masahiko and Amit Kapila
Reviewed-by: Amit Kapila and Dilip Kumar
Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
2020-10-08 09:09:08 +05:30
Tom Lane 3db322eaab Prevent internal overflows in date-vs-timestamp and related comparisons.
The date-vs-timestamp, date-vs-timestamptz, and timestamp-vs-timestamptz
comparators all worked by promoting the first type to the second and
then doing a simple same-type comparison.  This works fine, except
when the conversion result is out of range, in which case we throw an
entirely avoidable error.  The sources of such failures are
(a) type date can represent dates much farther in the future than
the timestamp types can;
(b) timezone rotation might cause a just-in-range timestamp value to
become a just-out-of-range timestamptz value.

Up to now we just ignored these corner-case issues, but now we have
an actual user complaint (bug #16657 from Huss EL-Sheikh), so let's
do something about it.

It turns out that commit 52ad1e659 already built all the necessary
infrastructure to support error-free comparisons, but neglected to
actually use it in the main-line code paths.  Fix that, do a little
bit of code style review, and remove the now-duplicate logic in
jsonpath_exec.c.

Back-patch to v13 where 52ad1e659 came in.  We could take this back
further by back-patching said infrastructure, but given the small
number of complaints so far, I don't feel a great need to.

Discussion: https://postgr.es/m/16657-cde2f876d8cc7971@postgresql.org
2020-10-07 17:10:26 -04:00
Tom Lane 18c170a08e Include the process PID in assertion-failure messages.
This should help to identify what happened when studying the postmaster
log after-the-fact.

While here, clean up some old comments in the same function.

Discussion: https://postgr.es/m/1568983.1601845687@sss.pgh.pa.us
2020-10-05 13:40:28 -04:00
Peter Eisentraut 2453ea1422 Support for OUT parameters in procedures
Unlike for functions, OUT parameters for procedures are part of the
signature.  Therefore, they have to be listed in pg_proc.proargtypes
as well as mentioned in ALTER PROCEDURE and DROP PROCEDURE.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/2b8490fe-51af-e671-c504-47359dc453c5@2ndquadrant.com
2020-10-05 09:21:43 +02:00
Tom Lane 97b6144826 Make postgres.bki use the same literal-string syntax as postgresql.conf.
The BKI file's string quoting conventions were previously quite weird,
perhaps as a result of repurposing a function built to scan
single-quoted strings to scan double-quoted ones.  Change to use the
same rules as we use in GUC files, allowing some simplifications in
genbki.pl and initdb.c.

While at it, completely remove the backend's scanstr() function, which
was essentially a duplicate of the string dequoting code in guc-file.l.
Instead export that one (under a less generic name than it had) and let
bootscanner.l use it.  Now we can clarify that scansup.c exists only to
support the main lexer. We could alternatively have removed GUC_scanstr,
but this way seems better since the previous arrangement could mislead
a reader into thinking that scanstr() had something to do with the main
lexer's handling of string literals.  Maybe it did once, but if so it
was a long time ago.

This patch does not bump catversion, since the initially-installed
catalog contents don't change.  Note however that successful initdb
after applying this patch will require up-to-date postgres.bki as well
as postgres and initdb executables.

In passing, remove a bunch of very-long-obsolete #include's in
bootparse.y and bootscanner.l.

John Naylor

Discussion: https://postgr.es/m/CACPNZCtDpd18T0KATTmCggO2GdVC4ow86ypiq5ENff1VnauL8g@mail.gmail.com
2020-10-04 16:09:55 -04:00
Fujii Masao 8d9a935965 Add pg_stat_wal statistics view.
This view shows the statistics about WAL activity. Currently it has only
two columns: wal_buffers_full and stats_reset. wal_buffers_full column
indicates the number of times WAL data was written to the disk because
WAL buffers got full. This information is useful when tuning wal_buffers.
stats_reset column indicates the time at which these statistics were
last reset.

pg_stat_wal view is also the basic infrastructure to expose other
various statistics about WAL activity later.

Bump PGSTAT_FILE_FORMAT_ID due to the change in pgstat format.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-by: Takayuki Tsunakawa, Kyotaro Horiguchi, Amit Kapila, Fujii Masao
Discussion: https://postgr.es/m/188bd3f2d2233cf97753b5ced02bb050@oss.nttdata.com
2020-10-02 10:17:11 +09:00
Tom Lane 489c9c3407 Fix handling of BC years in to_date/to_timestamp.
Previously, a conversion such as
	to_date('-44-02-01','YYYY-MM-DD')
would result in '0045-02-01 BC', as the code attempted to interpret
the negative year as BC, but failed to apply the correction needed
for our internal handling of BC years.  Fix the off-by-one problem.

Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD.  This is how
the negative-century case works, so it seems sane to do likewise.

Continue to read "year 0000" as 1 BC.  Oracle would throw an error,
but we've accepted that case for a long time so I'm hesitant to
change it in a back-patch.

Per bug #16419 from Saeed Hubaishan.  Back-patch to all supported
branches.

Dar Alathar-Yemen and Tom Lane

Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
2020-09-30 15:40:23 -04:00
Tom Lane a094c8ff53 Fix make_timestamp[tz] to accept negative years as meaning BC.
Previously we threw an error.  But make_date already allowed the case,
so it is inconsistent as well as unhelpful for make_timestamp not to.

Both functions continue to reject year zero.

Code and test fixes by Peter Eisentraut, doc changes by me

Discussion: https://postgr.es/m/13c0992c-f15a-a0ca-d839-91d3efd965d9@2ndquadrant.com
2020-09-29 13:48:06 -04:00
Alexander Korotkov 927d9abb65 Support for ISO 8601 in the jsonpath .datetime() method
The SQL standard doesn't require jsonpath .datetime() method to support the
ISO 8601 format.  But our to_json[b]() functions convert timestamps to text in
the ISO 8601 format in the sake of compatibility with javascript.  So, we add
support of the  ISO 8601 to the jsonpath .datetime() in the sake compatibility
with to_json[b]().

The standard mode of datetime parsing currently supports just template patterns
and separators in the format string.  In order to implement ISO 8601, we have to
add support of the format string double quotes to the standard parsing mode.

Discussion: https://postgr.es/m/94321be0-cc96-1a81-b6df-796f437f7c66%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Backpatch-through: 13
2020-09-29 12:00:04 +03:00
Alexander Korotkov c2aa562ea5 Remove excess space from jsonpath .datetime() default format string
bffe1bd684 has introduced jsonpath .datetime() method, but default formats
for time and timestamp contain excess space between time and timezone.  This
commit removes this excess space making behavior of .datetime() method
standard-compliant.

Discussion: https://postgr.es/m/94321be0-cc96-1a81-b6df-796f437f7c66%40postgrespro.ru
Author: Nikita Glukhov
Backpatch-through: 13
2020-09-29 11:00:22 +03:00
Tom Lane 56fe008996 Add for_each_from, to simplify loops starting from non-first list cells.
We have a dozen or so places that need to iterate over all but the
first cell of a List.  Prior to v13 this was typically written as
	for_each_cell(lc, lnext(list_head(list)))
Commit 1cff1b95a changed these to
	for_each_cell(lc, list, list_second_cell(list))
This patch introduces a new macro for_each_from() which expresses
the start point as a list index, allowing these to be written as
	for_each_from(lc, list, 1)
This is marginally more efficient, since ForEachState.i can be
initialized directly instead of backing into it from a ListCell
address.  It also seems clearer and less typo-prone.

Some of the remaining uses of for_each_cell() look like they could
profitably be changed to for_each_from(), but here I confined myself
to changing uses of list_second_cell().

Also, fix for_each_cell_setup() and for_both_cell_setup() to
const-ify their arguments; that's a simple oversight in 1cff1b95a.

Back-patch into v13, on the grounds that (1) the const-ification
is a minor bug fix, and (2) it's better for back-patching purposes
if we only have two ways to write these loops rather than three.

In HEAD, also remove list_third_cell() and list_fourth_cell(),
which were also introduced in 1cff1b95a, and are unused as of
cc99baa43.  It seems unlikely that any third-party code would
have started to use them already; anyone who has can be directed
to list_nth_cell instead.

Discussion: https://postgr.es/m/CAApHDvpo1zj9KhEpU2cCRZfSM3Q6XGdhzuAS2v79PH7WJBkYVA@mail.gmail.com
2020-09-28 20:33:13 -04:00
Tom Lane 0a87ddff5c Cache the result of converting now() to a struct pg_tm.
SQL operations such as CURRENT_DATE, CURRENT_TIME, LOCALTIME, and
conversion of "now" in a datetime input string have to obtain the
transaction start timestamp ("now()") as a broken-down struct pg_tm.
This is a remarkably expensive conversion, and since now() does not
change intra-transaction, it doesn't really need to be done more than
once per transaction.  Introducing a simple cache provides visible
speedups in queries that compute these values many times, for example
insertion of many rows that use a default value of CURRENT_DATE.

Peter Smith, with a bit of kibitzing by me

Discussion: https://postgr.es/m/CAHut+Pu89TWjq530V2gY5O6SWi=OEJMQ_VHMt8bdZB_9JFna5A@mail.gmail.com
2020-09-28 12:05:03 -04:00
Tom Lane 41efb83408 Move resolution of AlternativeSubPlan choices to the planner.
When commit bd3daddaf introduced AlternativeSubPlans, I had some
ambitions towards allowing the choice of subplan to change during
execution.  That has not happened, or even been thought about, in the
ensuing twelve years; so it seems like a failed experiment.  So let's
rip that out and resolve the choice of subplan at the end of planning
(in setrefs.c) rather than during executor startup.  This has a number
of positive benefits:

* Removal of a few hundred lines of executor code, since
AlternativeSubPlans need no longer be supported there.

* Removal of executor-startup overhead (particularly, initialization
of subplans that won't be used).

* Removal of incidental costs of having a larger plan tree, such as
tree-scanning and copying costs in the plancache; not to mention
setrefs.c's own costs of processing the discarded subplans.

* EXPLAIN no longer has to print a weird (and undocumented)
representation of an AlternativeSubPlan choice; it sees only the
subplan actually used.  This should mean less confusion for users.

* Since setrefs.c knows which subexpression of a plan node it's
working on at any instant, it's possible to adjust the estimated
number of executions of the subplan based on that.  For example,
we should usually estimate more executions of a qual expression
than a targetlist expression.  The implementation used here is
pretty simplistic, because we don't want to expend a lot of cycles
on the issue; but it's better than ignoring the point entirely,
as the executor had to.

That last point might possibly result in shifting the choice
between hashed and non-hashed EXISTS subplans in a few cases,
but in general this patch isn't meant to change planner choices.
Since we're doing the resolution so late, it's really impossible
to change any plan choices outside the AlternativeSubPlan itself.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us
2020-09-27 12:51:28 -04:00
Robert Haas 55b7e2f4d7 Fix two bugs in MaintainOldSnapshotTimeMapping.
The previous coding was confused about whether head_timestamp was
intended to represent the timestamp for the newest bucket in the
mapping or the oldest timestamp for the oldest bucket in the mapping.
Decide that it's intended to be the oldest one, and repair
accordingly.

To do that, we need to do two things. First, when advancing to a
new bucket, don't categorically set head_timestamp to the new
timestamp. Do this only if we're blowing out the map completely
because a lot of time has passed since we last maintained it. If
we're replacing entries one by one, advance head_timestamp by
1 minute for each; if we're filling in unused entries, don't
advance head_timestamp at all.

Second, fix the computation of how many buckets we need to advance.
The previous formula would be correct if head_timestamp were the
timestamp for the new bucket, but we're now making all the code
agree that it's the timestamp for the oldest bucket, so adjust the
formula accordingly.

This is certainly a bug fix, but I don't feel good about
back-patching it without the introspection tools added by commit
aecf5ee2bb, and perhaps also some
actual tests. Since back-patching the introspection tools might
not attract sufficient support and since there are no automated
tests of these fixes yet, I'm just committing this to master for
now.

Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
2020-09-24 15:27:19 -04:00
Robert Haas f5ea92e8d6 Expose oldSnapshotControl definition via new header.
This makes it possible for code outside snapmgr.c to examine the
contents of this data structure. This commit does not add any code
which actually does so; a subsequent commit will make that change.

Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
2020-09-24 13:33:09 -04:00
Tom Lane f859c2ffa0 Fix a few more generator scripts to produce pgindent-clean output.
This completes the project of making all our derived files be
pgindent-clean (or else explicitly excluded from indentation),
so that no surprises result when running pgindent in a built-out
development tree.

Discussion: https://postgr.es/m/79ed5348-be7a-b647-dd40-742207186a22@2ndquadrant.com
2020-09-21 13:58:26 -04:00
Tom Lane 9436041ed8 Copy editing: fix a bunch of misspellings and poor wording.
99% of this is docs, but also a couple of comments.  No code changes.

Justin Pryzby

Discussion: https://postgr.es/m/20200919175804.GE30557@telsasoft.com
2020-09-21 12:43:42 -04:00
Peter Eisentraut 80fc96eceb Standardize order of use strict and use warnings in Perl code
The standard order in PostgreSQL and other code is use strict first,
but some code was uselessly inconsistent about this.
2020-09-21 17:04:36 +02:00
Thomas Munro ff28809feb Code review for dynahash change.
Commit be0a6666 left behind a comment about the order of some tests that
didn't make sense without the expensive division, and in fact we might
as well change the order to one that fails more cheaply most of the time
as a micro-optimization.  Also, remove the "+ 1" applied to max_bucket,
to drop an instruction and match the original behavior.  Per review
from Tom Lane.

Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2020-09-19 15:53:06 +12:00
Thomas Munro be0a666665 Remove large fill factor support from dynahash.c.
Since ancient times we have had support for a fill factor (maximum load
factor) to be set for a dynahash hash table, but:

1.  It was an integer, whereas for in-memory hash tables interesting
load factor targets are probably somewhere near the 0.75-1.0 range.

2.  It was implemented in a way that performed an expensive division
operation that regularly showed up in profiles.

3.  We are not aware of anyone ever having used a non-default value.

Therefore, remove support, effectively fixing it at 1.

Author: Jakub Wartak <Jakub.Wartak@tomtom.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2020-09-19 11:40:39 +12:00
Tom Lane 06a7c3154f Allow most keywords to be used as column labels without requiring AS.
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is not
any known keyword.  This is rather unfriendly considering that we have
so many keywords and are constantly growing more.  In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.

We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS.  Instead, invent an entirely orthogonal keyword property
"can be bare column label", and mark all keywords that way for which
we don't get shift/reduce errors by doing so.

It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%.  This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.

Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
2020-09-18 16:46:36 -04:00
Tom Lane 1ed6b89563 Remove support for postfix (right-unary) operators.
This feature has been a thorn in our sides for a long time, causing
many grammatical ambiguity problems.  It doesn't seem worth the
pain to continue to support it, so remove it.

There are some follow-on improvements we can make in the grammar,
but this commit only removes the bare minimum number of productions,
plus assorted backend support code.

Note that pg_dump and psql continue to have full support, since
they may be used against older servers.  However, pg_dump warns
about postfix operators.  There is also a check in pg_upgrade.

Documentation-wise, I (tgl) largely removed the "left unary"
terminology in favor of saying "prefix operator", which is
a more standard and IMO less confusing term.

I included a catversion bump, although no initial catalog data
changes here, to mark the boundary at which oprkind = 'r'
stopped being valid in pg_operator.

Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
2020-09-17 19:38:05 -04:00
Peter Eisentraut 45b9805706 Allow CURRENT_ROLE where CURRENT_USER is accepted
In the particular case of GRANTED BY, this is specified in the SQL
standard.  Since in PostgreSQL, CURRENT_ROLE is equivalent to
CURRENT_USER, and CURRENT_USER is already supported here, adding
CURRENT_ROLE is trivial.  The other cases are PostgreSQL extensions,
but for the same reason it also makes sense there.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9%402ndquadrant.com
2020-09-17 11:40:08 +02:00
Heikki Linnakangas 16fa9b2b30 Add support for building GiST index by sorting.
This adds a new optional support function to the GiST access method:
sortsupport. If it is defined, the GiST index is built by sorting all data
to the order defined by the sortsupport's comparator function, and packing
the tuples in that order to GiST pages. This is similar to how B-tree
index build works, and is much faster than inserting the tuples one by
one. The resulting index is smaller too, because the pages are packed more
tightly, upto 'fillfactor'. The normal build method works by splitting
pages, which tends to lead to more wasted space.

The quality of the resulting index depends on how good the opclass-defined
sort order is. A good order preserves locality of the input data.

As the first user of this facility, add 'sortsupport' function to the
point_ops opclass. It sorts the points in Z-order (aka Morton Code), by
interleaving the bits of the X and Y coordinates.

Author: Andrey Borodin
Reviewed-by: Pavel Borisov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
2020-09-17 11:33:40 +03:00
Tom Lane 44fc6e259b Centralize setup of SIGQUIT handling for postmaster child processes.
We decided that the policy established in commit 7634bd4f6 for
the bgwriter, checkpointer, walwriter, and walreceiver processes,
namely that they should accept SIGQUIT at all times, really ought
to apply uniformly to all postmaster children.  Therefore, get
rid of the duplicative and inconsistent per-process code for
establishing that signal handler and removing SIGQUIT from BlockSig.
Instead, make InitPostmasterChild do it.

The handler set up by InitPostmasterChild is SignalHandlerForCrashExit,
which just summarily does _exit(2).  In interactive backends, we
almost immediately replace that with quickdie, since we would prefer
to try to tell the client that we're dying.  However, this patch is
changing the behavior of autovacuum (both launcher and workers), as
well as walsenders.  Those processes formerly also used quickdie,
but AFAICS that was just mindless copy-and-paste: they don't have
any interactive client that's likely to benefit from being told this.

The stats collector continues to be an outlier, in that it thinks
SIGQUIT means normal exit.  That should probably be changed for
consistency, but there's another patch set where that's being
dealt with, so I didn't do so here.

Discussion: https://postgr.es/m/644875.1599933441@sss.pgh.pa.us
2020-09-16 16:04:36 -04:00
Jeff Davis c8aeaf3ab3 Change LogicalTapeSetBlocks() to use nBlocksWritten.
Previously, it was based on nBlocksAllocated to account for tapes with
open write buffers that may not have made it to the BufFile yet.

That was unnecessary, because callers do not need to get the number of
blocks while a tape has an open write buffer; and it also conflicted
with the preallocation logic added for HashAgg.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/ce5af05900fdbd0e9185747825a7423c48501964.camel@j-davis.com
Backpatch-through: 13
2020-09-15 21:42:25 -07:00
David Rowley fe4f36bcde Fix compiler warning
Introduced in 0aa8f7640.

MSVC warned about performing 32-bit bit shifting when it appeared like we
might like a 64-bit result.  We did, but it just so happened that none of
the calls to this function could have caused the 32-bit shift to overflow.

Here we just cast the constant to int64 to make the compiler happy.

Discussion: https://postgr.es/m/CAApHDvofA_vsrpC13mq_hZyuye5B-ssKEaer04OouXYCO5-uXQ@mail.gmail.com
2020-09-15 15:07:57 +12:00
Fujii Masao 95233011a0 Fix typos.
Author: Naoki Nakamichi
Discussion: https://postgr.es/m/b6919d145af00295a8e86ce4d034b7cd@oss.nttdata.com
2020-09-14 14:16:07 +09:00
Peter Eisentraut 3e0242b24c Message fixes and style improvements 2020-09-14 06:42:30 +02:00
Jeff Davis 0758964963 logtape.c: do not preallocate for tapes when sorting
The preallocation logic is only useful for HashAgg, so disable it when
sorting.

Also, adjust an out-of-date comment.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn_o7tE2+hRVvwSFghRb75AJ5g-nqGzDUqLYMexjOAe=g@mail.gmail.com
Backpatch-through: 13
2020-09-11 17:10:02 -07:00
Noah Misch fe4d022c8e Fix rd_firstRelfilenodeSubid for nailed relations, in parallel workers.
Move applicable code out of RelationBuildDesc(), which nailed relations
bypass.  Non-assert builds experienced no known problems.  Back-patch to
v13, where commit c6b92041d3 introduced
rd_firstRelfilenodeSubid.

Kyotaro Horiguchi.  Reported by Justin Pryzby.

Discussion: https://postgr.es/m/20200907023737.GA7158@telsasoft.com
2020-09-09 18:50:24 -07:00
Peter Eisentraut 0aa8f76408 Expose internal function for converting int64 to numeric
Existing callers had to take complicated detours via
DirectFunctionCall1().  This simplifies a lot of code.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2020-09-09 20:16:28 +02:00
Tom Lane f3e1e66196 Minor fixes in docs and error messages.
Alexander Lakhin

Discussion: https://postgr.es/m/ce7debdd-c943-d7a7-9b41-687107b27831@gmail.com
2020-09-09 11:53:39 -04:00
Tom Lane 3438c988fd Use plain memset() in numeric.c, not MemSet and friends.
This essentially reverts a micro-optimization I made years ago,
as part of the much larger commit d72f6c750.  It's doubtful
that there was any hard evidence for it being helpful even then,
and the case is even more dubious now that modern compilers
are so much smarter about inlining memset().

The proximate reason for undoing it is to get rid of the type punning
inherent in MemSet, for fear that that may cause problems now that
we're applying additional optimization switches to numeric.c.
At the very least this'll silence some warnings from a few old
buildfarm animals.

(It's probably past time for another look at whether MemSet is still
worth anything at all, but I do not propose to tackle that question
right now.)

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
2020-09-08 11:47:37 -04:00
Peter Eisentraut 728d4bc16b Use <unnamed> for name of unnamed portal's memory context
Otherwise just printing an empty string makes the memory context debug
output slightly confusing.

Discussion: https://www.postgresql.org/message-id/flat/ccb353ef-89ff-09b3-8046-1d2514624b9c%402ndquadrant.com
2020-09-08 17:15:00 +02:00
Tom Lane 9c79e646c6 Frob numeric.c loop so that clang will auto-vectorize it too.
Experimentation shows that clang will auto-vectorize the critical
multiplication loop if the termination condition is written "i2 < limit"
rather than "i2 <= limit".  This seems unbelievably stupid, but I've
reproduced it on both clang 9.0.1 (RHEL8) and 11.0.3 (macOS Catalina).
gcc doesn't care, so tweak the code to do it that way.

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
2020-09-07 12:03:04 -04:00
Tom Lane 8870917623 Apply auto-vectorization to the inner loop of numeric multiplication.
Compile numeric.c with -ftree-vectorize where available, and adjust
the innermost loop of mul_var() so that it is amenable to being
auto-vectorized.  (Mainly, that involves making it process the arrays
left-to-right not right-to-left.)

Applying -ftree-vectorize actually makes numeric.o smaller, at least
with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too.
Independently of that, fixing the inner loop to be vectorizable also
makes things a bit faster.  But doing both is a huge win for
multiplications with lots of digits.  For me, the numeric regression
test is the same speed to within measurement noise, but numeric_big
is a full 45% faster.

We also looked into applying -funroll-loops, but that makes numeric.o
bloat quite a bit, and the additional speed improvement is very
marginal.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
2020-09-06 21:40:39 -04:00
Tom Lane a5cc4dab6d Yet more elimination of dead stores and useless initializations.
I'm not sure what tool Ranier was using, but the ones I contributed
were found by using a newer version of scan-build than I tried before.

Ranier Vilela and Tom Lane

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-05 13:17:32 -04:00
Tom Lane 9a851039aa Remove still more useless assignments.
Fix some more things scan-build pointed to as dead stores.  In some of
these cases, rearranging the code a little leads to more readable
code IMO.  It's all cosmetic, though.

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-04 20:33:36 -04:00
Jeff Davis 0852006a94 Fix bogus MaxAllocSize check in logtape.c.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=NZPZc3-fkdmvu=w2itx0PiB-G6QpxHXZOjuvFAzPdZw@mail.gmail.com
Backpatch-through: 13
2020-09-04 12:09:52 -07:00
Bruce Momjian e36e936e0e remove redundant initializations
Reported-by: Ranier Vilela

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com

Author: Ranier Vilela

Backpatch-through: master
2020-09-03 22:57:35 -04:00
Tom Lane 66f1630680 Add string_to_table() function.
This splits a string at occurrences of a delimiter.  It is exactly like
string_to_array() except for producing a set of values instead of an
array of values.  Thus, the relationship of these two functions is
the same as between regexp_split_to_table() and regexp_split_to_array().

Although the same results could be had from unnest(string_to_array()),
this is somewhat faster than that, and anyway it seems reasonable to
have it for symmetry with the regexp functions.

Pavel Stehule, reviewed by Peter Smith

Discussion: https://postgr.es/m/CAFj8pRD8HOpjq2TqeTBhSo_QkzjLOhXzGCpKJ4nCs7Y9SQkuPw@mail.gmail.com
2020-09-02 18:23:56 -04:00
Tom Lane 3d351d916b Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.
Historically, we've considered the state with relpages and reltuples
both zero as indicating that we do not know the table's tuple density.
This is problematic because it's impossible to distinguish "never yet
vacuumed" from "vacuumed and seen to be empty".  In particular, a user
cannot use VACUUM or ANALYZE to override the planner's normal heuristic
that an empty table should not be believed to be empty because it is
probably about to get populated.  That heuristic is a good safety
measure, so I don't care to abandon it, but there should be a way to
override it if the table is indeed intended to stay empty.

Hence, represent the initial state of ignorance by setting reltuples
to -1 (relpages is still set to zero), and apply the minimum-ten-pages
heuristic only when reltuples is still -1.  If the table is empty,
VACUUM or ANALYZE (but not CREATE INDEX) will override that to
reltuples = relpages = 0, and then we'll plan on that basis.

This requires a bunch of fiddly little changes, but we can get rid of
some ugly kluges that were formerly needed to maintain the old definition.

One notable point is that FDWs' GetForeignRelSize methods will see
baserel->tuples = -1 when no ANALYZE has been done on the foreign table.
That seems like a net improvement, since those methods were formerly
also in the dark about what baserel->tuples = 0 really meant.  Still,
it is an API change.

I bumped catversion because code predating this change would get confused
by seeing reltuples = -1.

Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
2020-08-30 12:21:51 -04:00
Amit Kapila 808e13b282 Extend the BufFile interface.
Allow BufFile to support temporary files that can be used by the single
backend when the corresponding files need to be survived across the
transaction and need to be opened and closed multiple times. Such files
need to be created as a member of a SharedFileSet.

Additionally, this commit implements the interface for BufFileTruncate to
allow files to be truncated up to a particular offset and extends the
BufFileSeek API to support the SEEK_END case. This also adds an option to
provide a mode while opening the shared BufFiles instead of always opening
in read-only mode.

These enhancements in BufFile interface are required for the upcoming
patch to allow the replication apply worker, to handle streamed
in-progress transactions.

Author: Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-08-26 07:36:43 +05:30
Fujii Masao 50db5964ee Move codes for pg_backend_memory_contexts from mmgr/mcxt.c to adt/mcxtfuncs.c.
Previously the codes for pg_backend_memory_contexts were in
src/backend/utils/mmgr/mcxt.c. This commit moves them to
src/backend/utils/adt/mcxtfuncs.c so that mcxt.c basically includes
only the low-level interface for memory contexts.

Author: Atsushi Torikoshi
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/20200819135545.GC19121@paquier.xyz
2020-08-26 10:51:31 +09:00
Fujii Masao 3e98c0bafb Add pg_backend_memory_contexts system view.
This view displays the usages of all the memory contexts of the server
process attached to the current session. This information is useful to
investigate the cause of backend-local memory bloat.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But this technique
cannot be uesd in some environments because no debugger is available there.
And it outputs lots of text messages and it's not easy to analyze them.
So, pg_backend_memory_contexts view allows us to access to backend-local
memory contexts information more easily.

Bump catalog version.

Author: Atsushi Torikoshi, Fujii Masao
Reviewed-by: Tatsuhito Kasahara, Andres Freund, Daniel Gustafsson, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com
2020-08-19 15:34:43 +09:00
Andres Freund 623a9ba79b snapshot scalability: cache snapshots using a xact completion counter.
Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot is still faster. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:

This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.

We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built.  Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, some careful analysis is required. Even with the lock this is a
significant win on its own.

On a smaller two socket machine this gains another ~1.03x, on a larger
machine the effect is roughly double (earlier patch version tested
though).  If we were able to safely avoid the lock there'd be another
significant gain on top of that.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-17 21:08:30 -07:00
Heikki Linnakangas 3941eb6341 Make xact.h usable in frontend.
xact.h included utils/datetime.h, which cannot be used in the frontend
(it includes fmgr.h, which needs Datum). But xact.h only needs the
definition of TimestampTz from it, which is available directly in
datatypes/timestamp.h. Change xact.h to include that instead of
utils/datetime.h, so that it can be used in client programs.
2020-08-17 10:50:13 +03:00