It has been spotted that multiranges lack the ability to be decomposed into
individual ranges. Subscripting and a proper expanded object representation
require substantial work, and it's too late for v14. This commit
provides the implementation of unnest(multirange) and cast multirange as
an array of ranges, which is quite trivial.
unnest(multirange) is defined as a polymorphic procedure. The catalog
description of the procedure underlying the cast is duplicated for each
multirange type because we don't have an anyrangearray polymorphic type to use
here.
Catversion is bumped.
Reported-by: Jonathan S. Katz
Discussion: https://postgr.es/m/flat/60258efe-bd7e-4886-82e1-196e0cac5433%40postgresql.org
Author: Alexander Korotkov
Reviewed-by: Justin Pryzby, Jonathan S. Katz, Zhihong Yu
It was unable to wait on a backend that had already left the procarray.
Users tolerant of that limitation can poll pg_stat_activity. Other
users can employ the "timeout" argument of pg_terminate_backend().
Reviewed by Bharath Rupireddy.
Discussion: https://postgr.es/m/20210605013236.GA208701@rfd.leadboat.com
Revert the pg_description entry to its v13 form, since those messages
usually remain shorter and don't discuss individual parameters. No
catversion bump, since pg_description content does not impair backend
compatibility or application compatibility.
Justin Pryzby
Discussion: https://postgr.es/m/20210612182743.GY16435@telsasoft.com
We've accumulated quite a mix of instances of "an SQL" and "a SQL" in the
documents. It would be good to be a bit more consistent with these.
The most recent version of the SQL standard I looked at seems to prefer
"an SQL". That seems like a good lead to follow, so here we change all
instances of "a SQL" to become "an SQL". Most instances correctly use
"an SQL" already, so it also makes sense to use the dominant variation in
order to minimise churn.
Additionally, some other abbreviations needed to be adjusted: FSM, SSPI,
SRF and a few others. Also fix some pronounceable abbreviations to use
"a" instead of "an". For example, "a SASL" instead of "an SASL".
Here I've only adjusted the documents and error messages. Many others
still exist in source code comments. Translator hint comments seem to be
the biggest culprit. It currently does not seem worth the churn to change
these.
Discussion: https://postgr.es/m/CAApHDvpML27UqFXnrYO1MJddsKVMQoiZisPvsAGhKE_tsKXquw%40mail.gmail.com
Redefine '\0' (InvalidCompressionMethod) as meaning "if we need to
compress, use the current setting of default_toast_compression".
This allows '\0' to be a suitable default choice regardless of
datatype, greatly simplifying code paths that initialize tupledescs
and the like. It seems like a more user-friendly approach as well,
because now the default compression choice doesn't migrate into table
definitions, meaning that changing default_toast_compression is
usually sufficient to flip an installation's behavior; one needn't
tediously issue per-column ALTER SET COMPRESSION commands.
Along the way, fix a few minor bugs and documentation issues
with the per-column-compression feature. Adopt more robust
APIs for SetIndexStorageProperties and GetAttributeCompression.
Bump catversion because typical contents of attcompression will now
be different. We could get away without doing that, but it seems
better to ensure v14 installations all agree on this. (We already
forced initdb for beta2, anyway.)
Discussion: https://postgr.es/m/626613.1621787110@sss.pgh.pa.us
Make sample like_regex match string values of the root object instead of the
whole document. The corrected example seems to represent a more relevant
use case.
Backpatch to 12, when jsonpath was introduced.
Discussion: https://postgr.es/m/13440f8b-4c1f-5875-c8e3-f3f65606af2f%40xs4all.nl
Author: Erik Rijkers
Reviewed-by: Michael Paquier, Alexander Korotkov
Backpatch-through: 12
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose. We're out of time for this release so we'll revert
and try again.
Commits reverted:
1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.
Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
The grammar changes in commit bbe0a81db6
allow SET COMPRESSION to be used with ALTER MATERIALIZED VIEW as
well as with ALTER TABLE, so update those docs to say that it works.
Also, update the documentation for pg_column_compression()
to explain that it will return NULL when there's no relevant value.
Patch by me, per concerns from Michael Paquier.
Discussion: http://postgr.es/m/CA+Tgmob9h5u4iNL9KM0drZgkY-JL4oCVW0dWrMqtLPQ1zHkquA@mail.gmail.com
Previously, a lot of information about type regclass existed only
in the discussion of the sequence functions. Maybe that made sense
in the beginning, because I think originally those were the only
functions taking regclass. But it doesn't make sense anymore.
Move that material to the "Object Identifier Types" section in
datatype.sgml, generalize it to talk about the other reg* types
as well, and add more examples.
Per bug #16991 from Federico Caselli.
Discussion: https://postgr.es/m/16991-bcaeaafa17e0a723@postgresql.org
Before now, looking up "multirange" in the index only led to the
multirange() function. To make this more useful, also add an entry
pointing to the range types section.
For some reason, the "julian" option for extract()/date_part() has
never gotten listed in the manual. Also, while Appendix B mentioned
in passing that we don't conform to the usual astronomical definition
that a Julian date starts at noon UTC, it was kind of vague about what
we do instead. Clarify that, and add an example showing how to get
the astronomical definition if you want it.
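As a rough illustration of the two conventions (a sketch in Python, not
PostgreSQL's code; both helper names are made up for this example and
timezone-aware inputs are assumed), the astronomical Julian Date starts
each day at noon UTC, while the midnight-based form starts each day at
midnight of the timestamp's own wall-clock time:

```python
from datetime import datetime, timezone

# Astronomical Julian Date of the Unix epoch, 1970-01-01 00:00 UTC.
UNIX_EPOCH_JD = 2440587.5
# Offset between Python's proleptic Gregorian ordinal and a Julian day
# number whose days start at midnight (1970-01-01 maps to 2440588).
ORDINAL_TO_JULIAN = 1721425


def astronomical_julian(ts: datetime) -> float:
    """Julian Date with days starting at noon UTC (hypothetical helper)."""
    return ts.timestamp() / 86400 + UNIX_EPOCH_JD


def midnight_julian(ts: datetime) -> float:
    """Julian date with days starting at midnight of ts's own wall clock."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    return ts.toordinal() + ORDINAL_TO_JULIAN + seconds_into_day / 86400


noon = datetime(1970, 1, 1, 12, 0, tzinfo=timezone.utc)
midnight = datetime(1970, 1, 1, 0, 0, tzinfo=timezone.utc)
print(astronomical_julian(noon), midnight_julian(midnight))
```

Both calls print the same day number, which shows the half-day offset
between the two conventions.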
It's been like this for ages, so back-patch to all supported branches.
Discussion: https://postgr.es/m/1197050.1619123213@sss.pgh.pa.us
Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com
Backpatch-through: 9.6
This adds a function, pg_wait_for_backend_termination(), and a new
timeout argument to pg_terminate_backend(), which will wait for the
backend to actually terminate (with or without signaling it to do so
depending on which function is called). The default behaviour of
pg_terminate_backend() remains timeout=0, which does not wait.
For pg_wait_for_backend_termination() the default wait is 5 seconds.
Author: Bharath Rupireddy
Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama,
Hou Zhijie, Magnus Hagander
Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com
The previous implementation of EXTRACT mapped internally to
date_part(), which returned type double precision (since it was
implemented long before the numeric type existed). This can lead to
imprecise output in some cases, so returning numeric would be
preferable. Changing the return type of an existing function is a
bit risky, so instead we do the following: We implement a new set of
functions, which are now called "extract", in parallel to the existing
date_part functions. They work the same way internally but use
numeric instead of float8. The EXTRACT construct is now mapped by the
parser to these new extract functions. That way, dumps of views
etc. from old versions (which would use date_part) continue to work
unchanged, but new uses will map to the new extract functions.
Additionally, the reverse compilation of EXTRACT now reproduces the
original syntax, using the new mechanism introduced in
40c24bfef9.
The following minor changes of behavior result from the new
implementation:
- The column name from an isolated EXTRACT call is now "extract"
  instead of "date_part".
- Extract from date now rejects inappropriate field names such as
  HOUR. It was previously mapped internally to extract from
  timestamp, so it would silently accept everything appropriate for
  timestamp.
- Return values when extracting fields with possibly fractional
  values, such as second and epoch, now have the full scale that the
  value has internally (so, for example, '1.000000' instead of just
  '1').
Reported-by: Petr Fedorov <petr.fedorov@phystech.edu>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However, its target process
is limited to the backend that is accessing the view. So it is
not very convenient for investigating local memory bloat in another
backend process. To improve this situation, this commit adds the
pg_log_backend_memory_contexts() function, which requests logging of
the memory contexts of the specified backend process.
This information can also be collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of the specified backend more easily.
Only superusers are allowed to request logging of the memory contexts,
because allowing any user to issue this request at an unbounded rate
would cause lots of log messages, which can lead to denial of service.
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context,
because buffering all memory contexts into a StringInfo to log them
as one message could require the buffer to be enlarged very much
and lead to an OOM error, since there can be a large number of memory
contexts in a backend.
When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.
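The per-parent cap can be pictured with the following Python sketch. It is
only an illustration of the idea, not the backend's C code: the Context
class, the log_contexts() helper, and the wording of the summary line are
all invented for this example; only the limit of 100 comes from the text
above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_CHILDREN = 100  # cap on children logged per parent, as described above


@dataclass
class Context:
    """Hypothetical stand-in for a memory context node."""
    name: str
    children: List["Context"] = field(default_factory=list)


def log_contexts(ctx: Context, level: int = 0,
                 out: Optional[List[str]] = None) -> List[str]:
    """Emit one line per context, visiting at most MAX_CHILDREN per parent."""
    out = [] if out is None else out
    out.append("  " * level + ctx.name)
    for child in ctx.children[:MAX_CHILDREN]:
        log_contexts(child, level + 1, out)
    skipped = len(ctx.children) - MAX_CHILDREN
    if skipped > 0:
        # Summarize the siblings beyond the cap instead of logging each one.
        out.append("  " * (level + 1) + f"{skipped} more child contexts not shown")
    return out


top = Context("TopMemoryContext", [Context(f"child{i}") for i in range(150)])
lines = log_contexts(top)
print(len(lines), lines[-1].strip())
```

With 150 siblings under one parent, the sketch emits the parent, the first
100 children, and a single summary line for the remaining 50.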
There was another proposed patch in the discussion, adding a function
that returns the memory contexts of the specified backend as a result set
instead of logging them. However, that patch is
not included in this commit because it had several issues to address.
Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.
Bump catalog version.
Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
Similar to date_trunc, but allows binning by an arbitrary interval
rather than just full units.
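To make the binning idea concrete, here is a minimal Python sketch. It
assumes the argument order (stride, source, origin) and floors toward the
start of the bin; it is an illustration of the arithmetic, not the C
implementation, and it makes no claim about how the real function treats
sources earlier than the origin.

```python
from datetime import datetime, timedelta


def date_bin(stride: timedelta, source: datetime, origin: datetime) -> datetime:
    """Return the start of the stride-wide bin, aligned to origin, holding source."""
    if stride <= timedelta(0):
        raise ValueError("stride must be greater than zero")
    # timedelta // timedelta yields the integer count of whole strides.
    whole_strides = (source - origin) // stride
    return origin + whole_strides * stride


print(date_bin(timedelta(minutes=15),
               datetime(2020, 2, 11, 15, 44, 17),
               datetime(2001, 1, 1)))
```

Unlike truncation to a unit, the stride here can be any positive interval,
such as 15 minutes or 90 seconds.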
Author: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Artur Zakirov <zaartur@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
This function for bit and bytea counts the set bits in the bit or byte
string. Internally, we use the existing popcount functionality.
For the name, after some discussion, we settled on bit_count, which
also exists with this meaning in MySQL, Java, and Python.
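For readers unfamiliar with the operation, a minimal Python sketch of
counting set bits in a byte string follows. It only mirrors the semantics
described above; the helper is written for this example and says nothing
about how the backend's popcount code works.

```python
def bit_count(data: bytes) -> int:
    """Count the bits set to 1 across all bytes of data."""
    return sum(bin(byte).count("1") for byte in data)


print(bit_count(bytes.fromhex("1234567890")))
```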
Author: David Fetter <david@fetter.org>
Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.
Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compressed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compression types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.
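The bit-borrowing trick can be sketched in Python as follows. The exact
layout and the numeric method IDs below are assumptions made for this
illustration, not the macros used in the source tree; the sketch only shows
why a size bounded by 1GB leaves two spare bits in a 32-bit word, and why
two bits allow four methods in total.

```python
SIZE_BITS = 30                    # sizes below 1GB (2**30) fit in 30 bits
SIZE_MASK = (1 << SIZE_BITS) - 1
PGLZ_ID, LZ4_ID = 0, 1            # illustrative IDs, assumed for this sketch


def pack_size_and_method(size: int, method: int) -> int:
    """Store a compression method ID in the top two bits of a 32-bit word."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("size does not fit in 30 bits")
    if not 0 <= method < 4:
        raise ValueError("only four method IDs fit in two bits")
    return (method << SIZE_BITS) | size


def unpack_size_and_method(word: int) -> tuple:
    """Recover (size, method) from a packed 32-bit word."""
    return word & SIZE_MASK, word >> SIZE_BITS


word = pack_size_and_method(1000, LZ4_ID)
print(unpack_size_and_method(word))
```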
It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.
Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allowing a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Pryzby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.
Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
Previously, the code and documentation seem to have essentially
assumed that a call to pg_wal_replay_pause() would take effect
immediately, but that's not the case, because we only check for a
pause in certain places. This means that a tool that uses this
function and then wants to do something else afterward that is
dependent on the pause having taken effect doesn't know how long it
needs to wait to be sure that no more WAL is going to be replayed.
To avoid that, add a new function pg_get_wal_replay_pause_state()
which returns either 'not paused', 'pause requested', or 'paused'.
After calling pg_wal_replay_pause() the status will immediately change
from 'not paused' to 'pause requested'; when the startup process
has noticed this, the status will change to 'paused'. For backward
compatibility, pg_is_wal_replay_paused() still exists and returns
the same thing as before: true if a pause has been requested,
whether or not it has taken effect yet; and false if not.
The documentation is updated to clarify.
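The state transitions described above can be modelled with a small Python
sketch. The class and method names are hypothetical and exist only for this
illustration; the three state strings and the backward-compatible boolean
come from the description above.

```python
class ReplayPauseModel:
    """Toy model of the three pause states; not the server's implementation."""

    def __init__(self) -> None:
        self.state = "not paused"

    def request_pause(self) -> None:
        # Analogous to calling pg_wal_replay_pause(): only a request is recorded.
        if self.state == "not paused":
            self.state = "pause requested"

    def startup_process_notices(self) -> None:
        # The startup process confirms the pause when it next checks.
        if self.state == "pause requested":
            self.state = "paused"

    def resume(self) -> None:
        self.state = "not paused"

    def is_paused_legacy(self) -> bool:
        # Mirrors the old boolean: true once requested, confirmed or not.
        return self.state != "not paused"


model = ReplayPauseModel()
model.request_pause()
print(model.state, model.is_paused_legacy())
model.startup_process_notices()
print(model.state, model.is_paused_legacy())
```

A tool that needs certainty that no more WAL will be replayed would wait
for the 'paused' state rather than relying on the boolean.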
To improve the chances that a pause request is quickly confirmed
effective, adjust things so that WaitForWALToBecomeAvailable will
swiftly reach a call to recoveryPausesHere() when a pause request
is made.
Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata,
Masahiko Sawada, and Bharath Rupireddy.
Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com
Commit 0aa8a01d04 extended the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem: the first
time changes are fetched via pg_logical_slot_get_changes() without the
two_phase option enabled, the prepared transaction is not returned even
though the prepare is after the consistent snapshot. The next time changes
are fetched, if the two_phase option is enabled, the prepare can be skipped
because by that time the start decoding point has been moved. So the user
will only get the commit prepared.
Allow this option to be enabled/disabled at slot creation time, with a
default of false. This will break existing slots, which is fine in a major
release.
Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
POSIX defines the behavior of back-references thus:
    The back-reference expression '\n' shall match the same (possibly
    empty) string of characters as was matched by a subexpression
    enclosed between "\(" and "\)" preceding the '\n'.
As far as I can see, the back-reference is supposed to consider only
the data characters matched by the referenced subexpression. However,
because our engine copies the NFA constructed from the referenced
subexpression, it effectively enforces any constraints therein, too.
As an example, '(^.)\1' ought to match 'xx', or any other string
starting with two occurrences of the same character; but in our code
it does not, and indeed can't match anything, because the '^' anchor
constraint is included in the backref's copied NFA. If POSIX intended
that, you'd think they'd mention it. Perl for one doesn't act that
way, so it's hard to conclude that this isn't a bug.
Fix by modifying the backref's NFA immediately after it's copied from
the reference, replacing all constraint arcs by EMPTY arcs so that the
constraints are treated as automatically satisfied. This still allows
us to enforce matching rules that depend only on the data characters;
for example, in '(^\d+).*\1' the NFA matching step will still know
that the backref can only match strings of digits.
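For comparison, Python's re module (used here purely as a stand-in for
another engine; it is not our regex code) also treats a back-reference as
matching only the captured characters, so the two patterns discussed above
can be tried directly:

```python
import re

# The '^' inside the group is not re-applied when \1 is matched.
anchored = re.compile(r"(^.)\1")
matches_xx = anchored.search("xx") is not None
matches_xy = anchored.search("xy") is not None

# The back-reference must repeat the digits captured at the start.
digits = re.compile(r"(^\d+).*\1")
matches_repeat = digits.search("12ab12") is not None
matches_no_repeat = digits.search("12abcd") is not None

print(matches_xx, matches_xy, matches_repeat, matches_no_repeat)
```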
Perhaps surprisingly, this change does not affect the results of any
of a rather large corpus of real-world regexes. Nonetheless, I would
not consider back-patching it, since it's a clear compatibility break.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
Newline is certainly not a digit, nor a word character, so it is
sensible that it should match these complemented character classes.
Previously, \D and \W acted that way by default, but in
newline-sensitive mode ('n' or 'p' flag) they did not match newlines.
This behavior was previously forced because explicit complemented
character classes don't match newlines in newline-sensitive mode;
but as of the previous commit that implementation constraint no
longer exists. It seems useful to change this because the primary
real-world use for newline-sensitive mode seems to be to match the
default behavior of other regex engines such as Perl and Javascript
... and their default behavior is that these match newlines.
The old behavior can be kept by writing an explicit complemented
character class, i.e. [^[:digit:]] or [^[:word:]]. (This means
that \D and \W are not exactly equivalent to those strings, but
they weren't anyway.)
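As a point of comparison with other engines, the following Python snippet
shows \D and \W matching a newline. Python is only a stand-in here, and its
MULTILINE flag is a loose analogue of newline-sensitive mode at best, since
it changes only how ^ and $ behave. Note also that in Python an explicit
complemented class such as [^0-9] matches newlines as well.

```python
import re

text = "12\n34"

# The only non-digit and the only non-word character is the newline.
non_digits = re.findall(r"\D", text, flags=re.MULTILINE)
non_words = re.findall(r"\W", text, flags=re.MULTILINE)

print(non_digits, non_words)
```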
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
The complement-class escapes \D, \S, \W are now allowed within
bracket expressions. There is no semantic difficulty with doing
that, but the rather hokey macro-expansion-based implementation
previously used here couldn't cope.
Also, invent "word" as an allowed character class name, thus "\w"
is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
within brackets. POSIX allows such implementation-specific
extensions, and the same name is used in e.g. bash.
One surprising compatibility issue this raises is that constructs
such as "[\w-_]" are now disallowed, as our documentation has always
said they should be: character classes can't be endpoints of a range.
Previously, because \w was just a macro for "[:alnum:]_", such a
construct was read as "[[:alnum:]_-_]", so it was accepted so long as
the character after "-" was numerically greater than or equal to "_".
Some implementation cleanup along the way:
* Remove the lexnest() hack, and in consequence clean up wordchrs()
  to not interact with the lexer.
* Fix colorcomplement() to not be O(N^2) in the number of colors
  involved.
* Get rid of useless-as-far-as-I-can-see calls of element()
  on single-character character element names in brackpart().
  element() always maps these to the character itself, and things
  would be quite broken if it didn't --- should "[a]" match something
  different than "a" does? Besides, the shortcut path in brackpart()
  wasn't doing this anyway, making it even more inconsistent.
Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
The manual did not mention whether its return value is (first arg -
second arg) or (second arg - first arg). The order matters because the
return value could have a sign. Fix the manual so that it mentions the
function returns (first arg - second arg).
Patch reviewed by Tom Lane.
Back-patch through v13. The doc format of older versions makes it difficult
to add more description.
Discussion: https://postgr.es/m/flat/20210206.151125.960423226279810864.t-ishii%40sraoss.co.jp
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs. While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL. Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation. Nonetheless, we can collect and use knowledge about these
relationships.
This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships. (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.) Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h". The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL. The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins. (As of this
commit, 217 relationships are checked by the test, versus 181 before.)
Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
Writing unnecessary '.*' at start and end of a POSIX regex doesn't
do much except confuse the reader about whether that might be
necessary after all. Make the examples in table 9.16 a tad more
realistic, and try to turn the next group of examples into something
self-contained.
Per gripe from rmzgrimes. Back-patch to v13 because it's easy.
Discussion: https://postgr.es/m/161215841824.14653.8969016349304314299@wrigleys.postgresql.org
When the .** jsonpath accessor handles an array, it selects both the array and
each of its elements. When using lax mode, subsequent accessors automatically
unwrap arrays. So, the content of each array element may be selected twice.
Even though this behavior is counterintuitive, it's correct because everything
works as designed. This commit documents it.
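The double selection can be reproduced with a small Python simulation. This
is a sketch of the described behavior only, not the jsonpath executor: the
two helpers are invented for this example, and the non-lax branch simply
skips non-objects, whereas real strict mode has its own error rules.

```python
def descend(value):
    """Yield value and every nested member, roughly like the .** accessor."""
    yield value
    if isinstance(value, dict):
        for member in value.values():
            yield from descend(member)
    elif isinstance(value, list):
        for member in value:
            yield from descend(member)


def key_accessor(items, key, lax):
    """Apply a .key accessor to each item; lax mode unwraps arrays first."""
    for item in items:
        candidates = item if lax and isinstance(item, list) else [item]
        for candidate in candidates:
            if isinstance(candidate, dict) and key in candidate:
                yield candidate[key]


doc = {"segments": [{"HR": 73}, {"HR": 135}]}
print(list(key_accessor(descend(doc), "HR", lax=True)))
print(list(key_accessor(descend(doc), "HR", lax=False)))
```

In the lax run each HR value appears twice: once via the unwrapped array
and once via the element that .** selected on its own.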
Backpatch to 12 where the jsonpath language was introduced.
Reported-by: Thomas Kellerer
Bug: #16828
Discussion: https://postgr.es/m/16828-2b0229babfad2d8c%40postgresql.org
Discussion: https://postgr.es/m/CAPpHfdtS-nNidT%3DEqZbAYOPcnNOWh_sd6skVdu2CAQUGdvpT8Q%40mail.gmail.com
Author: Alexander Korotkov, revised by Tom Lane
Reviewed-by: Alvaro Herrera, Thomas Kellerer, Tom Lane
Backpatch-through: 12
Per a user question, spell out that UNNEST() returns array elements
in storage order; also provide an example to clarify the behavior for
multi-dimensional arrays.
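As a language-neutral picture of "storage order", this Python sketch
flattens a nested list row by row. It assumes nested lists as a stand-in
for a multi-dimensional array and is not the server's implementation.

```python
def unnest(array):
    """Yield the elements of a nested list in row-major (storage) order."""
    for item in array:
        if isinstance(item, list):
            yield from unnest(item)
        else:
            yield item


elements = list(unnest([[1, 2], [3, 4]]))
# Pairing each element with a 1-based position mimics WITH ORDINALITY.
numbered = list(enumerate(elements, start=1))
print(elements, numbered)
```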
While here, also clarify the SELECT reference page's description of
WITH ORDINALITY. These details were already given in 7.2.1.4, but
a reference page should not omit details.
Back-patch to v13; there's not room in the table in older versions.
Discussion: https://postgr.es/m/FF1FB31F-0507-4F18-9559-2DE6E07E3B43@gmail.com
The behavior of cross-type comparisons among date/time data types was
not really explained anywhere. You could probably infer it if you
recognized the applicability of comments elsewhere about datatype
conversions, but it seems worthy of explicit documentation.
Per bug #16797 from Dana Burd.
Discussion: https://postgr.es/m/16797-f264b0b980b53b8b@postgresql.org
We have operators for checking if a multirange contains a range but don't
have the opposite. This commit improves completeness of the operator set by
adding two new operators: @> (anyrange,anymultirange) and
<@ (anymultirange,anyrange).
Catversion is bumped.
The jsonb || jsonb operator arbitrarily rejected certain combinations
of scalar and non-scalar inputs, while being willing to concatenate
other combinations. This was of course quite undocumented. Rather
than trying to document it, let's just remove the restriction,
creating a uniform rule that unless we are handling an object-to-object
concatenation, non-array inputs are converted to one-element arrays,
resulting in an array-to-array concatenation. (This does not change
the behavior for any case that didn't throw an error before.)
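The uniform rule can be sketched in Python, with dicts standing in for
objects and lists for arrays. The helper is written for this illustration
only and is not the jsonb code.

```python
def jsonb_concat(left, right):
    """Concatenate two JSON values following the rule described above."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Object-to-object: merge, with the right-hand value winning on
        # duplicate keys.
        return {**left, **right}

    def as_array(value):
        # Non-array inputs become one-element arrays.
        return value if isinstance(value, list) else [value]

    return as_array(left) + as_array(right)


print(jsonb_concat({"a": 1}, {"b": 2}))
print(jsonb_concat({"a": 1}, 2))
print(jsonb_concat(1, [2, 3]))
```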
Per complaint from Joel Jacobson. Back-patch to all supported branches.
Discussion: https://postgr.es/m/163099.1608312033@sss.pgh.pa.us
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.
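A minimal Python sketch of that idea follows, assuming half-open integer
ranges written as (lower, upper) tuples. The helper names are invented for
this example and the sketch ignores unbounded ranges and range bound
flavors.

```python
def make_multirange(ranges):
    """Normalize ranges into a sorted list of non-overlapping ranges."""
    result = []
    # Drop empty ranges, then sort by lower bound.
    for lower, upper in sorted(r for r in ranges if r[0] < r[1]):
        if result and lower <= result[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            result[-1] = (result[-1][0], max(result[-1][1], upper))
        else:
            result.append((lower, upper))
    return result


def contains_range(multirange, rng):
    """True if some member range fully covers rng."""
    return any(lower <= rng[0] and rng[1] <= upper for lower, upper in multirange)


mr = make_multirange([(5, 8), (1, 3), (2, 4)])
print(mr, contains_range(mr, (2, 4)), contains_range(mr, (3, 6)))
```

Because the stored form is already sorted and disjoint, decomposing it back
into individual ranges amounts to iterating over the list.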
Since v14, each range type automatically gets a corresponding multirange
datatype. There are both manual and automatic mechanisms for naming multirange
types. One can specify the multirange type name using the multirange_type_name
attribute in CREATE TYPE. Otherwise, a multirange type name is generated
automatically. If the range type name contains "range" then we change that to
"multirange". Otherwise, we add "_multirange" to the end.
Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids. Altogether this format allows fetching a particular range by its index
in O(n).
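Returning to the automatic naming rule described earlier, it can be
sketched in Python as below. The helper name is hypothetical, and replacing
only the first occurrence of "range" is an assumption of this sketch, since
the text above does not say which occurrence is replaced.

```python
def default_multirange_name(range_type_name: str) -> str:
    """Derive a multirange type name from a range type name."""
    if "range" in range_type_name:
        # Assumption: only the first occurrence is replaced.
        return range_type_name.replace("range", "multirange", 1)
    return range_type_name + "_multirange"


print(default_multirange_name("int4range"))
print(default_multirange_name("floatrng"))
```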
Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, a stored multirange is approximated as a union range without gaps.
This field will likely need improvements in the future.
Catversion is bumped.
Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
Historically these were called >^ and <^, but that is inconsistent
with the similar box, polygon, and circle operators, which are named
|>> and <<| respectively. Worse, the >^ and <^ names are used for
*not* strict above/below tests for the box type.
Hence, invent new operators following the more common naming. The
old operators remain available for now, and are still accepted by
the relevant index opclasses too. But there's a deprecation notice,
so maybe we can get rid of them someday.
Emre Hasegeli, reviewed by Pavel Borisov
Discussion: https://postgr.es/m/24348.1587444160@sss.pgh.pa.us
This feature was added a long time ago, in 7c1e67bd5 and eb121ba2c,
but never documented in any user-facing way. (Documentation added
in 6126d3e70 was commented out almost immediately, in 8272fc3f7.)
That's because, while this syntax is defined by SQL:99, our
implementation is only vaguely related to the standard's semantics.
The standard appears to intend a run-time not parse-time test, and
it definitely intends that the test should understand subtype
relationships.
No one has stepped up to fix that in the intervening years, but
people keep coming across the code and asking why it's not documented.
Let's just get rid of it: if anyone ever wants to make it work per
spec, they can easily recover whatever parts of this code are still
of value from our git history.
If there's anyone out there who's actually using this despite its
undocumented status, they can switch to using pg_typeof() instead,
e.g. "pg_typeof(something) = 'mytype'::regtype". That gives
essentially the same semantics as what our IS OF code did.
(We didn't have that function last time this was discussed, or
we would have ripped out IS OF then.)
Discussion: https://postgr.es/m/CAKFQuwZ2pTc-DSkOiTfjauqLYkNREeNZvWmeg12Q-_69D+sYZA@mail.gmail.com
Discussion: https://postgr.es/m/BAY20-F23E9F2B4DAB3E4E88D3623F99B0@phx.gbl
Discussion: https://postgr.es/m/3E7CF81D.1000203@joeconway.com
Bug #16652 complains that pg_reload_conf() returned true, even though
the configuration file contained errors. That's the way pg_reload_conf()
works, by design, but the documentation wasn't very clear on it. Clarify
that a 'true' return value only means that the signal was sent
successfully. Also add links to the system views that can be used to
check the configuration files for errors.
David G. Johnston, with some rewording by me.
Discussion: https://www.postgresql.org/message-id/CAKFQuwax6GxhUQEes0D045UtXG-fBraM39_6UMd5JyR5K1HWCQ%40mail.gmail.com
This provides a handy way to get, say, the last field of the string.
Use of a negative index in this way has precedent in the nearby
left() and right() functions.
The implementation scans the string twice when N < -1, but it seems
likely that N = -1 will be the huge majority of actual use cases,
so I'm not really excited about adding complexity to avoid that.
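The negative-index semantics can be sketched in Python as follows. This is
an illustration only: it assumes a non-empty delimiter, the error text is
invented, and it does not reproduce the two-pass scan of the C code.

```python
def split_part(string: str, delimiter: str, n: int) -> str:
    """Return the n'th field; negative n counts from the end of the string."""
    if n == 0:
        raise ValueError("field position must not be zero")
    fields = string.split(delimiter)
    index = n - 1 if n > 0 else len(fields) + n
    # Out-of-range positions yield an empty string.
    return fields[index] if 0 <= index < len(fields) else ""


print(split_part("abc,def,ghi,jkl", ",", -1))
print(split_part("abc,def,ghi,jkl", ",", -2))
```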
Nikhil Benesch, reviewed by Jacob Champion; cosmetic tweakage by me
Discussion: https://postgr.es/m/cbb7f861-6162-3a51-9823-97bc3aa0b638@gmail.com
Convert array_append, array_prepend, array_cat, array_position,
array_positions, array_remove, array_replace, and width_bucket
to use anycompatiblearray. This is a simple extension of commit
5c292e6b9 to hit some other places where there's a pretty obvious
gain in usability from doing so.
Ideally we'd also modify other functions taking multiple old-style
polymorphic arguments. But most of the remainder are tied into one
or more operator classes, making any such change a much larger can of
worms than I desire to open right now.
Discussion: https://postgr.es/m/77675130-89da-dab1-51dd-492c93dcf5d1@postgresfriends.org
This allows use of a "default" expression that doesn't slavishly
match the data column's type. Formerly you got something like
"function lag(numeric, integer, integer) does not exist", which
is not just unhelpful but actively misleading.
The SQL spec suggests that the default should be coerced to the data
column's type, but this implementation instead chooses the common
supertype, which seems at least as reasonable.
(Note: I took the opportunity to run "make reformat-dat-files" on
pg_proc.dat, so this commit includes some cosmetic changes to
recently-added entries that aren't related to lead/lag.)
Vik Fearing
Discussion: https://postgr.es/m/77675130-89da-dab1-51dd-492c93dcf5d1@postgresfriends.org
Remove old containment operators @ and ~ for built-in geometry data
types. These have been deprecated; use <@ and @> instead.
(Some contrib modules still contain the same deprecated operators.
That will be dealt with separately.)
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://www.postgresql.org/message-id/flat/20201027032511.GF9241@telsasoft.com
Record the current version of dependent collations in pg_depend when
creating or rebuilding an index. When accessing the index later, warn
that the index may be corrupted if the current version doesn't match.
Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
Michael Paquier, Robert Haas, Tom Lane and others for very helpful
discussion.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
This model couldn't be extended to cover the default collation, and
didn't have any information about the affected database objects when the
version changed. Remove, in preparation for a follow-up commit that
will add a new mechanism.
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
Commit f2b883969 did not get the memo about the new formatting
style for tables documenting built-in functions. I noticed because
of a PDF build warning about an overwidth table.
This makes use of CheckBuffer() introduced in c780a7a, adding a SQL
wrapper able to do checks for all the pages of a relation. By default,
all the fork types of a relation are checked, and it is possible to
check only a given relation fork. Note that if the relation given in
input has no physical storage or is temporary, then no errors are
generated, allowing full-database checks when coupled with a simple scan
of pg_class for example. This is not limited to clusters with data
checksums enabled, as clusters without data checksums can still apply
checks on pages using the page headers or for the case of a page full of
zeros.
This function returns a set of tuples consisting of:
- The physical file where a broken page has been detected (without the
segment number as that can be AM-dependent, which can be guessed from
the block number for heap). A relative path from PGDATA is used.
- The block number of the broken page.
By default, only superusers have access to this function, but
execution rights can be granted to other users.
The feature introduced here is still minimal, and more improvements
could be done, like:
- Addition of a start and end block number to run checks on a range
of blocks, which would apply only if one fork type is checked.
- Addition of some progress reporting.
- Throttling, with configuration parameters in function input or
potentially some cost-based GUCs.
Regression tests are added for positive cases in the main regression
test suite, and TAP tests are added for cases involving the emulation of
page corruptions.
Bump catalog version.
Author: Julien Rouhaud, Michael Paquier
Reviewed-by: Masahiko Sawada, Justin Pryzby
Discussion: https://postgr.es/m/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com
- Misc grammar and punctuation fixes.
- Stylistic cleanup: use spaces between function arguments and JSON fields
in examples. For example "foo(a,b)" -> "foo(a, b)". Add semicolon after
last END in a few PL/pgSQL examples that were missing them.
- Make the sentence that talked about the "..." and ".." operators clearer,
by not ending it with "..", which made the operator look the same
as "...".
- Fix syntax description for HAVING: HAVING conditions cannot be repeated
Patch by Justin Pryzby, per Yaroslav Schekin's report. Backpatch to all
supported versions, to the extent that the patch applies easily.
Discussion: https://www.postgresql.org/message-id/20201005191922.GE17626%40telsasoft.com
Section 8.5.1.4, which defines these literals, made only a vague
reference to the fact that they might be evaluated too soon to be
safe in non-interactive contexts. Provide a more explicit caution
against misuse. Also, generalize the wording in the related tip in
section 9.9.4: while it clearly described this problem, it implied
(or really, stated outright) that the problem only applies to table
DEFAULT clauses.
Per gripe from Tijs van Dam. Back-patch to all supported branches.
Discussion: https://postgr.es/m/c2LuRv9BiRT3bqIo5mMQiVraEXey_25B4vUn0kDqVqilwOEu_iVF1tbtvLnyQK7yDG3PFaz_GxLLPil2SDkj1MCObNRVaac-7j1dVdFERk8=@thalex.com
The descriptions of make_interval() and pg_options_to_table()
were randomly different from the reality embedded in pg_proc.
(These are not all the discrepancies I found in a quick search,
but the others perhaps require more discussion, since there's
at least a case to be made for changing pg_proc not the docs.)
make_interval issue noted by Thomas Kellerer.
Discussion: https://postgr.es/m/7b154ef0-9f22-90b9-7734-4bf23686695b@gmx.net
Previously, a conversion such as
to_date('-44-02-01','YYYY-MM-DD')
would result in '0045-02-01 BC', as the code attempted to interpret
the negative year as BC, but failed to apply the correction needed
for our internal handling of BC years. Fix the off-by-one problem.
Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD. This is how
the negative-century case works, so it seems sane to do likewise.
Continue to read "year 0000" as 1 BC. Oracle would throw an error,
but we've accepted that case for a long time so I'm hesitant to
change it in a back-patch.
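The off-by-one can be sketched in Python. This is only a model of the
arithmetic, assuming astronomical year numbering internally (0 means 1 BC,
-1 means 2 BC, and so on); the helper names internal_year and display_year
are invented for the illustration:

```python
def internal_year(year: int, bc_marker: bool = False) -> int:
    """Map a parsed year field to astronomical numbering (0 = 1 BC, -1 = 2 BC)."""
    signed = -year if bc_marker else year  # an explicit "BC" flips the sign
    if signed < 0:
        signed += 1  # the correction that was missing for negative year fields
    return signed


def display_year(internal: int) -> str:
    """Render an astronomical year the way a date would be printed."""
    return f"{internal:04d}" if internal > 0 else f"{1 - internal:04d} BC"


print(display_year(-44))                                 # uncorrected: 0045 BC
print(display_year(internal_year(-44)))                  # corrected:   0044 BC
print(display_year(internal_year(-44, bc_marker=True)))  # cancels out: 0044
print(display_year(internal_year(0)))                    # year 0000:   0001 BC
```

The first call shows what happens when the negative field is used as the
internal value directly, which is the behavior being fixed.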
Per bug #16419 from Saeed Hubaishan. Back-patch to all supported
branches.
Dar Alathar-Yemen and Tom Lane
Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
Previously we threw an error. But make_date already allowed the case,
so it is inconsistent as well as unhelpful for make_timestamp not to.
Both functions continue to reject year zero.
Code and test fixes by Peter Eisentraut, doc changes by me
Discussion: https://postgr.es/m/13c0992c-f15a-a0ca-d839-91d3efd965d9@2ndquadrant.com
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is not
any known keyword. This is rather unfriendly considering that we have
so many keywords and are constantly growing more. In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.
We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS. Instead, invent an entirely orthogonal keyword property
"can be bare column label", and mark all keywords that way for which
we don't get shift/reduce errors by doing so.
It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%. This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
The "!" operator is our only built-in postfix operator. Remove it,
on the way to removal of grammar support for postfix operators.
There is also a "!!" prefix operator, but since it's been marked
deprecated for most of its existence, we might as well remove it too.
Also zap the SQL alias function numeric_fac(), which seems to have
equally little reason to live.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
This splits a string at occurrences of a delimiter. It is exactly like
string_to_array() except for producing a set of values instead of an
array of values. Thus, the relationship of these two functions is
the same as between regexp_split_to_table() and regexp_split_to_array().
Although the same results could be had from unnest(string_to_array()),
this is somewhat faster than that, and anyway it seems reasonable to
have it for symmetry with the regexp functions.
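The array-versus-set relationship can be pictured with a Python list and a
generator. This is a loose analogy only: the function names are invented,
and the model covers just the case of a non-empty delimiter and non-empty
input.

```python
from typing import Iterator, List


def string_to_array_model(text: str, delimiter: str) -> List[str]:
    """Build the whole result as one array-like value."""
    return text.split(delimiter)


def string_to_table_model(text: str, delimiter: str) -> Iterator[str]:
    """Produce the same fields one at a time, without an intermediate array."""
    if not delimiter:
        raise ValueError("this sketch requires a non-empty delimiter")
    start = 0
    while True:
        end = text.find(delimiter, start)
        if end == -1:
            yield text[start:]  # final field, after the last delimiter
            return
        yield text[start:end]
        start = end + len(delimiter)


for row in string_to_table_model("a,b,,c", ","):
    print(repr(row))
```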
Pavel Stehule, reviewed by Peter Smith
Discussion: https://postgr.es/m/CAFj8pRD8HOpjq2TqeTBhSo_QkzjLOhXzGCpKJ4nCs7Y9SQkuPw@mail.gmail.com
Per discussion, we're planning to remove parser support for postfix
operators in order to simplify the grammar. So it behooves us to
put out a deprecation notice at least one release before that.
There is only one built-in postfix operator, ! for factorial.
Label it deprecated in the docs and in pg_description, and adjust
some examples that formerly relied on it. (The sister prefix
operator !! is also deprecated. We don't really have to remove
that one, but since we're suggesting that people use factorial()
instead, it seems better to remove both operators.)
Also state in the CREATE OPERATOR ref page that postfix operators
in general are going away.
Although this changes the initial contents of pg_description,
I did not force a catversion bump; it doesn't seem essential.
In v13, also back-patch 4c5cf5431, so that there's someplace for
the <link>s to point to.
Mark Dilger and John Naylor, with some adjustments by me
Discussion: https://postgr.es/m/BE2DF53D-251A-4E26-972F-930E523580E9@enterprisedb.com
Make these examples self-contained by providing declarations of the
user-defined row types they rely on. There wasn't room to do this
in the old doc format, but now there is, and I think it makes the
examples a good bit less confusing.
When using the following functions, users could see errors of the form
"cache lookup failed for OID XXX" raised with elog(), which is meant only
for internal errors:
* pg_describe_object()
* pg_identify_object()
* pg_identify_object_as_address()
The set of APIs managing object addresses for all object types is made
smarter by gaining a new argument "missing_ok" that allows any caller to
control if an error is raised or not on an undefined object. The SQL
functions listed above are changed to handle the case where an object is
missing.
Regression tests are added for all object types for the cases where
these are undefined. Before this commit, these cases failed with cache
lookup errors, and now they basically return NULL (minus the name of the
object type requested).
Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
This includes two changes:
- Addition of a new function pg_xact_commit_timestamp_origin() able, for
a given transaction ID, to return the commit timestamp and replication
origin of this transaction. An equivalent function existed in
pglogical.
- Addition of the replication origin to pg_last_committed_xact().
The commit timestamp manager already includes APIs able to return the
replication origin of a transaction on top of its commit timestamp, but
the code paths for replication origins were never stressed as those
functions have never looked for a replication origin, and the SQL
functions available have never included this information since their
introduction in 73c986a.
While at it, refactor a test of modules/commit_ts/ to use tstzrange() to
check that a transaction timestamp is within the wanted range, making
the test a bit easier to read.
Bump catalog version.
Author: Movead Li
Reviewed-by: Madan Kumar, Michael Paquier
Discussion: https://postgr.es/m/2020051116430836450630@highgo.ca
SQL:1999 had syntax
SUBSTRING(text FROM pattern FOR escapechar)
but this was replaced in SQL:2003 by the clearer
SUBSTRING(text SIMILAR pattern ESCAPE escapechar)
which was never implemented in PostgreSQL. This patch adds that
new syntax as an alternative in the parser, and updates documentation
and tests to indicate that this is the preferred alternative now.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/a15db31c-d0f8-8ce0-9039-578a31758adb%402ndquadrant.com
The IANA time zone folk have deprecated use of a "posixrules" file in
the tz database. While for now it's our choice whether to keep
supplying one in our own builds, installations built with
--with-system-tzdata will soon be needing to cope with that file not
being present, at least on some platforms.
This causes a problem for the horology test, which expected the
nonstandard POSIX zone spec "CST7CDT" to apply pre-2007 US daylight
savings rules. That does happen if the posixrules file supplies such
information, but otherwise the test produces undesired results.
To fix, add an explicit transition date rule that matches 2005 practice.
(We could alternatively have switched the test to use some real time
zone, but it seems useful to have coverage of this type of zone spec.)
While at it, update a documentation example that also relied on
"CST7CDT"; use a real-world zone name instead. Also, document why
the zone names EST5EDT, CST6CDT, MST7MDT, PST8PDT aren't subject to
similar failures when "posixrules" is missing.
Back-patch to all supported branches, since the hazard is the same
for all.
Discussion: https://postgr.es/m/1665379.1592581287@sss.pgh.pa.us
This patch removes the hardcoded check for superuser privileges when
executing replication origin functions. Instead, execution is revoked
from public, meaning that those functions can be executed by a superuser
and that access to them can be granted.
Author: Martín Marqués
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAPdiE1xJMZOKQL3dgHMUrPqysZkgwzSMXETfKkHYnBAB7-0VRQ@mail.gmail.com
Use xreflabel attributes instead of endterm attributes to control the
appearance of links to subsections of SQL command reference pages.
This is simpler, it matches what we do elsewhere (e.g. for GUC variables),
and it doesn't draw "Unresolved ID reference" warnings from the PDF
toolchain.
Fix some places where the text was absolutely dependent on an <xref>
rendering exactly so, by using a <link> around the required text
instead. At least one of those spots had already been turned into
bad grammar by subsequent changes, and the whole idea is just too
fragile for my taste. <xref> does NOT have fixed output, so don't write
as if it does.
Consistently include a page-level link in cross-man-page references,
because otherwise they are useless/nonsensical in man-page output.
Likewise, be consistent about mentioning "below" or "above" in same-page
references; we were doing that in about 90% of the cases, but now it's
100%.
Also get rid of another nonfunctional-in-PDF idea, of making
cross-references to functions by sticking ID tags on <row> constructs.
We can put the IDs on <indexterm>s instead --- which is probably not any
more sensible in abstract terms, but it works where the other doesn't.
(There is talk of attaching cross-reference IDs to most or all of
the docs' function descriptions, but for now I just fixed the two
that exist.)
Discussion: https://postgr.es/m/14480.1589154358@sss.pgh.pa.us
We had a mishmash of <replaceable>, <replaceable class="parameter">,
and <parameter> markup for operator/function arguments. Use <parameter>
consistently for things that are in fact names of parameters (including
OUT parameters), reserving <replaceable> for things that aren't. The
latter class includes some made-up-by-the-docs type class names, like
"numeric_type", as well as placeholders for arguments that don't have
well-defined types. Possibly we could do better with those categories
as well, but for the moment I'm content not to have parameter names
marked up in different ways in different places.
(This commit aligns the earlier sections of chapter 9 with a policy
that I'd arrived at while working on commit 1ad23335f, which is why
the last few sections need no changes.)
Make the markup a bit less ad-hoc. A function-table cell now contains
several <para> units, and we label the ones that contain function
signatures with role="func_signature". The CSS or FO stylesheets then
key off of that to decide how to set the indentation. A very useful
win from this approach is that we can have more than one signature
entry per table cell, simplifying the documentation of closely-related
operators and functions.
This patch mostly just replaces the markup in the tables I converted so
far. But I did alter a couple of places where multiple signatures were
helpful.
Discussion: https://postgr.es/m/5561.1587922854@sss.pgh.pa.us
This includes the usual amount of editorial cleanup, such as
correcting wrong or less-helpful-than-they-could-be examples.
I moved the two tsvector-updating triggers into "9.28 Trigger
Functions", which seems like a better home for them. (I believe
that section didn't exist when this text was originally written.)
Also rearrange that page a bit for more consistency and less
duplication.
In passing, fix erroneous examples of the results of abbrev(cidr)
in datatype.sgml, and do a bit of copy-editing there.
David Johnston reminded me that the per-point calculations being done
by these operators are equivalent to complex multiplication/division.
(Once I would've recognized that immediately, but it's been too long
since I did any of that sort of math.)
Also put in a footnote mentioning that "rotation" of a box doesn't do
what you might expect, as I'd griped about in the referenced thread.
Discussion: https://postgr.es/m/158110996889.1089.4224139874633222837@wrigleys.postgresql.org
This also makes an attempt to flesh out the docs for some of the more
severely underdocumented geometric operators and functions.
This effort exposed that the point <^ point (point_below) and
point >^ point (point_above) operators are misnamed; they should be
<<| and |>>, because they act like the other operators named that
way and not like the other operators named <^ and >^. But I just
documented them that way; fixing it is a matter for another patch.
The haphazard datatype coverage of many of the operators is also
now depressingly obvious.
Discussion: https://postgr.es/m/158110996889.1089.4224139874633222837@wrigleys.postgresql.org
Along the way, update the older examples for bytea to use "hex"
output format. That lets us get rid of the lame disclaimer about
how the examples assume bytea_output = escape, which was only half
true anyway because none of the more-recently-added examples had
paid any attention to that.
I took the opportunity to do some copy-editing in this area as well,
and to add some new material such as a note about BETWEEN's syntactical
peculiarities.
Of note is that quite a few of the examples of transcendental functions
needed to be updated, because the displayed output no longer matched
what you get on a modern server. I believe some of these cases are
side-effects of the new Ryu algorithm in float8out. Others appear to be
because the examples predate the addition of type numeric, and were
expecting that float8 calculations would be done although the given
syntax would actually lead to calling the numeric function nowadays.
The table layout ideas proposed in commit e894c6183 were not as widely
popular as I'd hoped. After discussion, we've settled on a layout
that's effectively a single-column table with cell contents much like a
<varlistentry> description of the function or operator; though we're not
actually using <varlistentry>, because it'd add way too much vertical
space. Instead the effect is accomplished using line-break processing
instructions to separate the description and example(s), plus CSS or FO
customizations to produce indentation of all but the first line in each
cell. While technically this is a bit grotty, it does have the
advantage that we won't need to write nearly as much boilerplate markup.
This patch updates tables 9.30, 9.31, and 9.33 (which were touched by
the previous patch) to the revised style, and additionally converts
table 9.10. A lot of work still remains to do, but hopefully it won't
be too controversial.
Thanks to Andrew Dunstan, Pierre Giraud, Robert Haas, Alvaro Herrera,
David Johnston, Jonathan Katz, Isaac Morland for valuable ideas.
Discussion: https://postgr.es/m/8691.1586798003@sss.pgh.pa.us
We've long fought with the draconian space limitations of our
traditional table layout for describing SQL functions and operators.
This commit introduces a new approach, though so far I've only applied
it to a few of those tables. The new way makes use of DocBook's support
for different layouts in different rows of a table, and allows the
descriptions and examples for a function or operator to run to several
lines without as much ugliness and wasted space as before.
The core layout concept is now
    Name    Signature
            Description
            Example    Example Result
so that a function or operator really has three table rows not one,
but we group them to look like one row by having the name column
have only one entry for all three rows. (Actually, there could be
four or more rows if you wanted to have more than one example, which
is another thing that was painful before but works easily now.)
This is handled by a "morerows" annotation on the name entry, which
isn't perfect (notably, the toolchain is not smart enough to avoid
breaking these row groups across PDF pages) but there seems no better
solution in DocBook. The name column is normally fairly narrow,
allowing plenty of space for the other column(s), and not wasting too
much space when one of the other components runs to multiple lines.
The varying row layout is managed by defining named "spans" and then
tagging entries with a "spanname" of "name", "sig", "desc", "example",
or "exresult". This provides a bit of semantic annotation to go with
the formatting improvement, which seems like a good thing. (It seems
that we have to re-define these spans afresh for each table, which is
annoying, but it's not any worse than the duplication involved in
the table headers. At least that gives us an opportunity to vary the
relative column widths per-table, which is handy since function tables
sometimes need much wider name columns than operator tables.)
Signature entries should be written in the style
<function>fname</function>(<type>typename</type> ...)
<returnvalue>typename</returnvalue>
The <returnvalue> tag produces a right arrow before the result type
name. (I'll document that convention in a user-visible place later.)
While this provides significantly more horizontal space than before
for examples, it's still true that PDF output is a lot narrower than
typical webpage viewing windows, so some examples need to be broken
in places where there is no whitespace. I've added &zwsp; markers in
suitable places to allow the tables to render warning-free in PDF.
I've so far converted only the date/time operator, date/time function,
and enum function tables in sections 9.9 and 9.10; these were chosen
to provide a reasonable sample of the formatting problems that need
to be solved. Assuming that this looks good on the website and doesn't
provoke howls of anguish, I'll work on the other similar tables in the
near future.
There's a moderate amount of new editorial content in this patch along
with the raw formatting changes; for instance I had to write text
descriptions for operators that lacked them. I failed to resist the
temptation to improve some other descriptions and examples, too.
Patch by me, with thanks to Alexander Lakhin for assistance with
figuring out some formatting issues.
Discussion: https://postgr.es/m/9326.1581457869@sss.pgh.pa.us
We already had a couple of places using zero-width spaces for formatting
hackery, and we're going to need more if we ever want the PDF manuals to
look decent. But please let's not write hard-coded Unicode escapes.
We can avoid that by using a custom entity, which also provides a place
to put a teeny bit of documentation about what it is and how to use it.
I'd previously posted a patch using "&break;" for this, but on reflection
that would be horrible to grep for. Instead let's use "&zwsp;", based
on the name of the Unicode symbol ("zero width space").
Discussion: https://postgr.es/m/9326.1581457869@sss.pgh.pa.us
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value. Lift that restriction by redeclaring the
bit number arguments as int8 (which requires a catversion bump,
hence is not back-patchable).
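The 256MB figure follows from dividing the largest int4 bit number by eight.
The short Python check below works that out and also models how a bit number
maps to a byte and a bit within it (bits counted from the least significant
end of each byte). The helper name get_bit_model is invented for this sketch.

```python
INT32_MAX = 2**31 - 1

# Highest byte offset reachable when the bit number is a signed 32-bit integer.
max_byte_offset = INT32_MAX // 8
print(max_byte_offset + 1 == 256 * 1024 * 1024)  # number of reachable bytes


def get_bit_model(data: bytes, n: int) -> int:
    """Return bit n of data, where bit 0 is the lowest bit of the first byte."""
    byte_index, bit_index = divmod(n, 8)
    return (data[byte_index] >> bit_index) & 1


print(get_bit_model(bytes.fromhex("1234567890"), 30))
```

Python integers are unbounded, so the model has no counterpart of the
int4 limit; with an int8 bit number the same divmod step can address any
byte of a value up to the 1GB size limit.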
The similarly-named functions for bit/varbit don't really have a
problem because we restrict those types to at most VARBITMAXLEN bits;
hence leave them alone.
While here, extend the encode/decode functions in utils/adt/encode.c
to allow dealing with values wider than 1GB. This is not a live bug
or restriction in current usage, because no input could be more than
1GB, and since none of the encoders can expand a string more than 4X,
the result size couldn't overflow uint32. But it might be desirable
to support more in future, so make the input length values size_t
and the potential-output-length values uint64.
Also add some test cases to improve the miserable code coverage
of these functions.
Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat
Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to
users as int8. Now that we have an SQL type xid8 for FullTransactionId,
define a new set of functions including pg_current_xact_id() and
pg_current_snapshot() based on that. Keep the old functions around too,
for now.
It's a bit sneaky to use the same C functions for both, but since the
binary representation is identical except for the signedness of the
type, and since older functions are the ones using the wrong signedness,
and since we'll presumably drop the older ones after a reasonable period
of time, it seems reasonable to switch to FullTransactionId internally
and share the code for both.
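To see why sharing the code is workable, the following Python sketch
reinterprets the same eight bytes as unsigned and as signed. It is an
illustration of the signedness point only; as_int8 is a made-up helper,
not anything in the backend.

```python
import struct


def as_int8(value_u64: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed without changing its bytes."""
    return struct.unpack("<q", struct.pack("<Q", value_u64))[0]


small = 123456
huge = 2**63 + 5  # only representable when the value is read as unsigned

print(as_int8(small))  # unchanged: the two readings agree below 2**63
print(as_int8(huge))   # negative when read as a signed int8
```

The two interpretations differ only once the top bit is set, which is the
range where the int8-based functions report the wrong sign.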
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and
check Unicode normal forms, per SQL standard.
To support fast IS NORMALIZED tests, we pull in a new data file
DerivedNormalizationProps.txt from Unicode and build a lookup table
from that, using techniques similar to ones already used for other
Unicode data. make update-unicode will keep it up to date. We only
build and use these tables for the NFC and NFKC forms, because they
are too big for NFD and NFKD and the improvement is not significant
enough there.
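For readers unfamiliar with normal forms, the same two operations exist in
Python's standard unicodedata module (is_normalized needs Python 3.8 or
later). This shows the Unicode concept only and says nothing about how the
server implements it:

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

# Converting between forms, analogous to NORMALIZE(str, NFC) and NORMALIZE(str, NFD).
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)

# Checking a form without converting, analogous to "str IS NFC NORMALIZED".
print(unicodedata.is_normalized("NFC", composed))
print(unicodedata.is_normalized("NFC", decomposed))
```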
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
Previously if a promotion was triggered while recovery was paused,
the paused state continued. Also recovery could be paused by executing
pg_wal_replay_pause() even while a promotion was ongoing. That is,
recovery pause had higher priority than a standby promotion.
But this behavior was not desirable because most users basically wanted
the recovery to complete as soon as possible and the server to become
the master when they requested a promotion.
This commit changes recovery so that it prefers a promotion over
recovery pause. That is, if a promotion is triggered while recovery
is paused, the paused state ends and the promotion continues. Also
this commit makes recovery pause functions like pg_wal_replay_pause()
throw an error if they are executed while a promotion is ongoing.
Internally, this commit adds a new function PromoteIsTriggered()
that returns true if a promotion is triggered. Since the name of
this function and the existing function IsPromoteTriggered() are
confusingly similar, the commit renames IsPromoteTriggered()
to IsPromoteSignaled(), a more appropriate name.
Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi, Sergei Kornilov
Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com
Add missing index entries, add missing information on pg_upgrade man
page, order things alphabetically instead of (apparently) in the order
they were implemented, reduce repetitiveness a bit.
It's strange that a directory-listing function does not list all entries
in a directory, so let's at least document it. This involves
pg_ls_logdir
pg_ls_waldir
pg_ls_archive_statusdir
pg_ls_tmpdir
Backpatch as far back as it applies cleanly (and as far as each
function exists). REL_10_STABLE uses different wording, but hopefully
people are not reading docs so old to write new apps anyway.
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200305161838.GJ684@telsasoft.com
to_char() has long allowed the TM (translation mode) prefix to
specify output of translated month or day names; but that prefix
had no effect in input format strings. Now it does. to_date()
and to_timestamp() will now recognize the same month or day names
that to_char() would output for the same format code. Matching is
case-insensitive (per the active collation's notion of what that
means), just as it has always been for English month/day names
without the TM prefix.
(As per the discussion thread, there are lots of cases that this
feature will not handle, such as alternate day names. But being
able to accept what to_char() will output seems useful enough.)
In passing, fix some shaky English and violations of message
style guidelines in jsonpath errors for the .datetime() method,
which depends on this code.
Juan José Santamaría Flecha, reviewed and modified by me,
with other commentary from Alvaro Herrera, Tomas Vondra,
Arthur Zakirov, Peter Eisentraut, Mark Dilger.
Discussion: https://postgr.es/m/CAC+AXB3u1jTngJcoC1nAHBf=M3v-jrEfo86UFtCqCjzbWS9QhA@mail.gmail.com
Deduplication reduces the storage overhead of duplicates in indexes that
use the standard nbtree index access method. The deduplication process
is applied lazily, after the point where opportunistic deletion of
LP_DEAD-marked index tuples occurs. Deduplication is only applied at
the point where a leaf page split would otherwise be required. New
posting list tuples are formed by merging together existing duplicate
tuples. The physical representation of the items on an nbtree leaf page
is made more space efficient by deduplication, but the logical contents
of the page are not changed. Even unique indexes make use of
deduplication as a way of controlling bloat from duplicates whose TIDs
point to different versions of the same logical table row.
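As a rough illustration of the posting list idea described above, the following Python sketch merges duplicate keys on a toy "leaf page". It is not the nbtree code: the (key, tid) tuples and the deduplicate() helper are hypothetical stand-ins, and the real implementation also has to respect page space limits and posting list size limits.

```python
from itertools import groupby


def deduplicate(leaf_items):
    """Merge items with equal keys into (key, [tid, ...]) posting lists.

    leaf_items must already be sorted by (key, tid), as items on a leaf
    page are once heap TID acts as a tiebreaker column.
    """
    merged = []
    for key, group in groupby(leaf_items, key=lambda item: item[0]):
        merged.append((key, [tid for _, tid in group]))
    return merged


# Hypothetical page contents: (key, (block, offset)) pairs.
page = [(10, (1, 1)), (10, (1, 2)), (10, (2, 5)), (42, (3, 1))]
posting = deduplicate(page)

# The logical contents are unchanged: expanding the posting lists
# gives back exactly the original items.
expanded = [(key, tid) for key, tids in posting for tid in tids]
```

Four physical tuples become two, while `expanded` still equals `page`.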
The lazy approach taken by nbtree has significant advantages over a GIN
style eager approach. Most individual inserts of index tuples have
exactly the same overhead as before. The extra overhead of
deduplication is amortized across insertions, just like the overhead of
page splits. The key space of indexes works in the same way as it has
since commit dd299df8 (the commit that made heap TID a tiebreaker
column).
Testing has shown that nbtree deduplication can generally make indexes
with about 10 or 15 tuples for each distinct key value about 2.5X - 4X
smaller, even with single column integer indexes (e.g., an index on a
referencing column that accompanies a foreign key). The final size of
single column nbtree indexes comes close to the final size of a similar
contrib/btree_gin index, at least in cases where GIN's posting list
compression isn't very effective. This can significantly improve
transaction throughput, and significantly reduce the cost of vacuuming
indexes.
A new index storage parameter (deduplicate_items) controls the use of
deduplication. The default setting is 'on', so all new B-Tree indexes
automatically use deduplication where possible. This decision will be
reviewed at the end of the Postgres 13 beta period.
There is a regression of approximately 2% of transaction throughput with
synthetic workloads that consist of append-only inserts into a table
with several non-unique indexes, where all indexes have few or no
repeated values. The underlying issue is that cycles are wasted on
unsuccessful attempts at deduplicating items in non-unique indexes.
There doesn't seem to be a way around it short of disabling
deduplication entirely. Note that deduplication of items in unique
indexes is fairly well targeted in general, which avoids the problem
there (we can use a special heuristic to trigger deduplication passes in
unique indexes, since we're specifically targeting "version bloat").
Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.
No bump in BTREE_VERSION, since the representation of posting list
tuples works in a way that's backwards compatible with version 4 indexes
(i.e. indexes built on PostgreSQL 12). However, users must still
REINDEX a pg_upgrade'd index to use deduplication, regardless of the
Postgres version they've upgraded from. This is the only way to set the
new nbtree metapage flag indicating that deduplication is generally
safe.
Author: Anastasia Lubennikova, Peter Geoghegan
Reviewed-By: Peter Geoghegan, Heikki Linnakangas
Discussion: https://postgr.es/m/55E4051B.7020209@postgrespro.ru
Discussion: https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru
Advancing a physical replication slot with pg_replication_slot_advance()
did not mark the slot as dirty when any advancing was done, preventing the
follow-up checkpoint from flushing the slot data to disk. This caused the
advancing to be lost even on clean restarts. This does not happen for
logical slots, as any advancing marks the slot as dirty. Per
discussion, the original feature was implemented such that, in the event
of a crash, the slot may move backwards to a past LSN. This property is
kept, and more documentation is added about it.
This commit adds some new TAP tests to check the persistency of physical
and logical slots after advancing across clean restarts.
Author: Alexey Kondratov, Michael Paquier
Reviewed-by: Andres Freund, Kyotaro Horiguchi, Craig Ringer
Discussion: https://postgr.es/m/059cc53a-8b14-653a-a24d-5f867503b0ee@postgrespro.ru
Backpatch-through: 11
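The pg_replication_slot_advance() problem described above can be pictured with a small Python model. This is only a sketch under simplifying assumptions: the Slot class, its fields, and position_after_restart() are hypothetical and do not mirror the server's data structures.

```python
class Slot:
    """Toy model of a replication slot; all names here are hypothetical."""

    def __init__(self, lsn):
        self.lsn = lsn        # in-memory position
        self.on_disk = lsn    # last position flushed by a checkpoint
        self.dirty = False

    def advance(self, target, mark_dirty):
        if target > self.lsn:
            self.lsn = target
            if mark_dirty:
                self.dirty = True

    def checkpoint(self):
        # Only dirty slots are written out.
        if self.dirty:
            self.on_disk = self.lsn
            self.dirty = False

    def clean_restart(self):
        # After a restart the slot is reloaded from disk.
        self.lsn = self.on_disk
        self.dirty = False


def position_after_restart(mark_dirty):
    slot = Slot(lsn=100)
    slot.advance(200, mark_dirty)
    slot.checkpoint()
    slot.clean_restart()
    return slot.lsn
```

With mark_dirty=False (the old behavior for physical slots in this model) the advance is lost across the restart; with mark_dirty=True it survives.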
Rather than intermixing the discussion of text-string and binary-string
functions, make a clean break, moving all discussion of binary-string
operations into section 9.5. This involves some duplication of
function descriptions between 9.4 and 9.5, but it seems cleaner on the
whole since the individual descriptions are clearer (and on the other
side of the coin, it gets rid of some duplicated descriptions, too).
Move the convert*/encode/decode functions to a separate table, because
they don't quite seem to fit under the heading of "binary string
functions".
Also provide full documentation of the textual formats supported by
encode() and decode() (which was the original goal of this patch
series, many moons ago).
Also move the table of built-in encoding conversions out of section 9.4,
where it no longer had any relevance whatsoever, and put it into section
23.3 about character sets. I chose to put both that and table 23.2
(multibyte-translation-table) into a new <sect2> so as not to break up
the flow of discussion in 23.3.3.
Also do a bunch of minor copy-editing on the function descriptions
in 9.4 and 9.5.
Karl Pinc, reviewed by Fabien Coelho, further hacking by me
Discussion: https://postgr.es/m/20190304163347.7bca4897@slate.meme.com
jsonb_set_lax() is the same as jsonb_set, except that it takes an extra
argument that specifies what to do if the value argument is NULL. The
default is 'use_json_null'. Other possibilities are 'raise_exception',
'return_target' and 'delete_key', all these behaviours having been
suggested as reasonable by various users.
Discussion: https://postgr.es/m/375873e2-c957-3a8d-64f9-26c43c2b16e7@2ndQuadrant.com
Reviewed by: Pavel Stehule
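The four NULL treatments of jsonb_set_lax() named above can be sketched over a flat Python dict. This is a toy model, not the SQL function: set_lax(), JSON_NULL, and the single-key "path" are hypothetical simplifications, with Python's None standing in for SQL NULL.

```python
JSON_NULL = object()  # stand-in for a JSON null stored inside the document


def set_lax(target, key, value, null_value_treatment="use_json_null"):
    """Set target[key] to value, with configurable handling of None."""
    if value is not None:
        return {**target, key: value}
    if null_value_treatment == "use_json_null":
        return {**target, key: JSON_NULL}
    if null_value_treatment == "raise_exception":
        raise ValueError("value argument must not be NULL")
    if null_value_treatment == "return_target":
        return dict(target)
    if null_value_treatment == "delete_key":
        return {k: v for k, v in target.items() if k != key}
    raise ValueError(f"unknown treatment: {null_value_treatment}")


doc = {"a": 1, "b": 2}
```

For example, set_lax(doc, "a", None, "delete_key") drops the key, while the default stores a JSON null under it.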
This commit implements the jsonpath .datetime() method as specified in the
SQL/JSON standard. There are no-argument and single-argument versions of
this method. The no-argument version selects the first of the ISO datetime
formats matching the input string. The single-argument version accepts a
template string as its argument.
In addition to the .datetime() method itself, this commit also implements
comparison of the resulting date and time values. There is some difficulty
because the existing jsonb_path_*() functions are immutable, while comparison of
timezoned and non-timezoned types involves the current timezone. First, the
current timezone can be changed within a session. Moreover, timezones themselves
are not immutable and can be updated. This is why we let the existing immutable
functions throw errors on such non-immutable comparisons. At the same time, this
commit provides jsonb_path_*_tz() functions, which are stable and support
operations involving timezones. As new functions are added to the system catalog,
catversion is bumped.
Support of the .datetime() method was the only blocker preventing T832 from being
marked as supported. sql_features.txt is updated correspondingly.
Extracted from original patch by Nikita Glukhov, Teodor Sigaev, Oleg Bartunov.
Heavily revised by me. Comments were adjusted by Liudmila Mantrova.
Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Alexander Korotkov, Nikita Glukhov, Teodor Sigaev, Oleg Bartunov, Liudmila Mantrova
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
The array <@ and @> operators do not worry about duplicates: if every
member of array X matches some element of array Y, then X is contained
in Y, even if several members of X get matched to the same Y member.
This was not explicitly stated in the docs though, so improve matters.
Discussion: https://postgr.es/m/156614120484.1310.310161642239149585@wrigleys.postgresql.org
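The containment rule for the array <@ and @> operators described above can be written as a short Python sketch. The contained_in() helper is hypothetical and models only the duplicate-insensitive matching, not element type handling or NULL semantics.

```python
def contained_in(x, y):
    """True if every member of x matches some element of y.

    Several members of x may be matched to the same element of y,
    so duplicates in x do not matter.
    """
    return all(any(a == b for b in y) for a in x)


dup_case = contained_in([1, 1, 1], [1, 2])
missing_case = contained_in([1, 3], [1, 2])
```

Here dup_case is true even though x has three copies of 1 and y has only one, while missing_case is false because 3 has no match in y.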
Provide some documentation about the differences between XQuery
regular expressions and those supported by Spencer's regex engine.
Since SQL now exposes XQuery regexps with the LIKE_REGEX operator,
I made this a standalone section designed to help somebody who
has to translate a LIKE_REGEX query to Postgres. (Eventually we might
extend Spencer's engine to allow precise implementation of XQuery,
but not today.)
Reference that in the jsonpath docs, provide definitions of the
XQuery flag letters, and add a description of the JavaScript-inspired
string literal syntax used within jsonpath. Also point out explicitly
that backslashes used within like_regex patterns will need to be doubled.
This also syncs the docs with the decision implemented in commit
d5b90cd64 to desupport XQuery's 'x' flag for now.
Jonathan Katz and Tom Lane
Discussion: https://postgr.es/m/CAPpHfdvDci4iqNF9fhRkTqhe-5_8HmzeLt56drH%2B_Rv2rNRqfg@mail.gmail.com
SQL Standard 2016 defines the SSSSS format pattern for seconds past midnight in
the jsonpath .datetime() method and the CAST (... FORMAT ...) SQL clause. Our
datetime parsing engine currently supports it under the name SSSS.
This commit adds SSSSS as an alias for SSSS. The alias is added for the benefit
of the upcoming jsonpath .datetime() method, but it is also supported in
to_date()/to_timestamp() as a positive side effect.
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Nikita Glukhov, Alexander Korotkov
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
SQL Standard 2016 defines FF1-FF9 format patterns for fractions of seconds in
the jsonpath .datetime() method and the CAST (... FORMAT ...) SQL clause. The
parsing engine of the upcoming .datetime() method will be shared with to_date()/
to_timestamp().
This patch implements the FF1-FF6 format patterns for the upcoming jsonpath
.datetime() method. The to_date()/to_timestamp() functions also gain support for
these format patterns as a positive side effect. FF7-FF9 are not supported due to
lack of precision in our internal timestamp representation.
Extracted from original patch by Nikita Glukhov, Teodor Sigaev, Oleg Bartunov.
Heavily revised by me.
Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Nikita Glukhov, Teodor Sigaev, Oleg Bartunov, Alexander Korotkov
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
As a result of some long-ago quick hacks, the SIMILAR TO operator
and the corresponding flavor of substring() interpreted "ESCAPE NULL"
as selecting the default escape character '\'. This is both
surprising and not per spec: the standard is clear that these
functions should return NULL for NULL input.
Additionally, because of inconsistency of the strictness markings
of 3-argument substring() and similar_escape(), the planner could not
inline the SQL definition of substring(), resulting in a substantial
performance penalty compared to the underlying POSIX substring()
function.
The simplest fix for this would be to change the strictness marking
of similar_escape(), but if we do that we risk breaking existing views
that depend on that function. Hence, leave similar_escape() as-is
as a compatibility function, and instead invent a new function
similar_to_escape() that comes in two strict variants.
There are a couple of other behaviors in this area that are also
not per spec, but they are documented and seem generally at least
as sane as the spec's definition, so leave them alone. But improve
the documentation to describe them fully.
Patch by me; thanks to Álvaro Herrera and Andrew Gierth for review
and discussion.
Discussion: https://postgr.es/m/14047.1557708214@sss.pgh.pa.us
Section 4.2.7 says that unless otherwise specified, built-in
aggregates ignore rows in which any input is null. This is
not true of the JSON aggregates, but it wasn't documented.
Fix that.
Of the other entries in table 9.55, some were explicit about
ignoring nulls, and some weren't; for consistency and
self-contained-ness, make them all say it explicitly.
Per bug #15884 from Tim Möhlmann. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/15884-c32d848f787fcae3@postgresql.org
The ids for linking to libpq functions were previously all lower-case.
Change to mixed-case, matching the actual function name, for easier
readability in the source. The output isn't changed in a significant
way, since the ids are converted to lower or upper case for file names
and anchors.
This adds a built-in function to generate UUIDs.
PostgreSQL hasn't had a built-in function to generate a UUID yet,
relying on external modules such as uuid-ossp and pgcrypto to provide
one. Now that we have a strong random number generator built-in, we
can easily provide a version 4 (random) UUID generation function.
This patch takes the existing function gen_random_uuid() from pgcrypto
and makes it a built-in function. The pgcrypto implementation now
internally redirects to the built-in one.
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/6a65610c-46fc-2323-6b78-e8086340a325@2ndquadrant.com
The code for conversions SQL_ASCII <-> MULE_INTERNAL and
SQL_ASCII <-> UTF8 was unreachable, because we long ago changed
the wrapper functions pg_do_encoding_conversion() et al so that
they have hard-wired behaviors for conversions involving SQL_ASCII.
(At least some of those fast paths date back to 2002, though it
looks like we may not have been totally consistent about this until
later.) Given the lack of complaints, nobody is dissatisfied with
this state of affairs. Hence, let's just remove the unreachable code.
Also, change CREATE CONVERSION so that it rejects attempts to
define such conversions. Since we consider that SQL_ASCII represents
lack of knowledge about the encoding in use, such a conversion would
be semantically dubious even if it were reachable.
Adjust a couple of regression test cases that had randomly decided
to rely on these conversion functions rather than any other ones.
Discussion: https://postgr.es/m/41163.1559156593@sss.pgh.pa.us
Since extended statistics were introduced in PostgreSQL 10, there has been a
single catalog pg_statistic_ext storing both the definitions and the built
statistics. That is problematic, however, when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistics (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with the introduction
of MCV lists, which do include the most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
json_to_record(), when an output column is declared as type json or jsonb,
should emit the corresponding field of the input JSON object. But it got
this slightly wrong when the field is just a string literal: it failed to
escape the contents of the string. That typically resulted in syntax
errors if the string contained any double quotes or backslashes.
jsonb_to_record() handles such cases correctly, but I added corresponding
test cases for it too, to prevent future backsliding.
Improve the documentation, as it provided only a very hand-wavy
description of the conversion rules used by these functions.
Per bug report from Robert Vollmert. Back-patch to v10 where the
error was introduced (by commit cf35346e8).
Note that PG 9.4 - 9.6 also get this case wrong, but differently so:
they feed the de-escaped contents of the string literal to json[b]_in.
That behavior is less obviously wrong, so possibly it's being depended on
in the field, so I won't risk trying to make the older branches behave
like the newer ones.
Discussion: https://postgr.es/m/D6921B37-BD8E-4664-8D5F-DB3525765DCD@vllmrt.net
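The json_to_record() escaping bug described above has the same shape as the following Python sketch, which uses the standard json module rather than the server's code. The sample string and variable names are made up for illustration.

```python
import json

# Contents of a JSON string field after de-escaping (hypothetical sample).
raw = 'say "hi" \\ bye'

# Wrapping the contents in quotes without escaping them yields invalid JSON.
unescaped = '"' + raw + '"'

# Escaping double quotes and backslashes yields a valid JSON string literal.
escaped = json.dumps(raw)

try:
    json.loads(unescaped)
    unescaped_is_valid = True
except json.JSONDecodeError:
    unescaped_is_valid = False

round_trip = json.loads(escaped)
```

The unescaped form fails to parse, which corresponds to the syntax errors mentioned above, while the escaped form round-trips to the original contents.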
Define the meanings of the POSIX-spec character classes in line,
rather than referring to the ctype(3) man page. That man page
doesn't even exist on many modern systems, and if it does exist
it probably says the wrong things about non-ASCII characters.
Also document our non-POSIX-spec "ascii" character class.
Also, point out here that this behavior is controlled by collation or
LC_CTYPE, since the existing text explaining that is pretty far away.
Per gripe from Geert Lobbestael. Given the lack of prior complaints,
I'm not excited about back-patching this.
Discussion: https://postgr.es/m/155837022049.1359.2948065118562813468@wrigleys.postgresql.org
SQL's regular-expression substring() function is defined to have a
pattern argument that's separated into three subpatterns by escape-
double-quote markers; the function result is the part of the input
matching the second subpattern. The standard makes it clear that
if there is ambiguity about how to match the input to the subpatterns,
the first and third subpatterns should be taken to match the smallest
possible amount of text (i.e., they're "non greedy", in the terms of
our regex code). We were not doing it that way: the first subpattern
would eat the largest possible amount of text, causing the function
result to be shorter than what the spec requires.
Fix that by attaching explicit greediness quantifiers to the
subpatterns. (This depends on the regex fix in commit 8a29ed053;
before that, this didn't reliably change the regex engine's behavior.)
Also, by adding parentheses around each subpattern, we ensure that
"|" (OR) in the subpatterns behave sanely. Previously, "|" in the
first or third subpatterns didn't work.
This patch also makes the function throw error if you write more than
two escape-double-quote markers, and do something sane if you write
just one, and document that behavior. Previously, an odd number of
markers led to a confusing complaint about unbalanced parentheses,
while extra pairs of markers were just ignored. (Note that the spec
requires exactly two markers, but we've historically allowed there
to be none, and this patch preserves the old behavior for that case.)
In passing, adjust some substring() test cases that didn't really
prove what they said they were testing for: they used patterns
that didn't match the data string, so that the output would be
NULL whether or not the function was really strict.
Although this is certainly a bug fix, changing the behavior in back
branches seems undesirable: applications could perhaps be depending on
the old behavior, since it's not obviously wrong unless you read the
spec very closely. Hence, no back-patch.
Discussion: https://postgr.es/m/5bb27a41-350d-37bf-901e-9d26f5592dd0@charter.net
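The effect of making the first and third subpatterns of substring() non-greedy can be seen in a small Python sketch. It uses Python's re module as a stand-in, so the syntax and matching rules are not those of Spencer's regex engine, and the sample patterns are invented for illustration.

```python
import re

text = "xaaay"

# Three parenthesized subpatterns; the result is whatever the second matches.
greedy_outer = r"^(.*)(a+)(.*)$"    # old behavior: first subpattern is greedy
lazy_outer = r"^(.*?)(a+)(.*?)$"    # spec behavior: outer subpatterns are lazy

old_result = re.match(greedy_outer, text).group(2)
new_result = re.match(lazy_outer, text).group(2)
```

With the greedy first subpattern the middle part shrinks to a single "a"; with lazy outer subpatterns it is the full "aaa", which is the longer result the spec calls for.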
Previously it was documented that use of replication functions is
restricted to superusers. This is true for the functions which
use replication origins, but not for pg_logical_emit_message() and
the functions which use replication slots. For example, not only
superusers but also users with the REPLICATION privilege are allowed
to use the functions for replication slots. This commit fixes
the documentation of the privileges required for those replication
functions.
Back-patch to 9.4 (all supported versions).
Author: Matsumura Ryo
Discussion: https://postgr.es/m/03040DFF97E6E54E88D3BFEE5F5480F74ABA6E16@G01JPEXMBYT04
The word "singleton" is hard for users to understand, especially given that
it is used in only one place in the docs and is not defined anywhere.
Use clearer wording instead.
Discussion: https://postgr.es/m/23737.1556550645%40sss.pgh.pa.us
This commit adds the description that "non-exclusive" pg_start_backup
and pg_stop_backup can be executed even during recovery. Previously
it was wrongly documented that those functions are not allowed to be
executed during recovery.
Back-patch to 9.6 where non-exclusive backup API was added.
Discussion: https://postgr.es/m/CAHGQGwEuAYrEX7Yhmf2MCrTK81HDkkg-JqsOUh8zw6+zYC5zzw@mail.gmail.com
This allows the user to create duplicates of existing replication slots,
either logical or physical, and even to change properties such as whether
they are temporary or the output plugin used.
There are multiple uses for this, such as initializing multiple replicas
using the slot for one base backup, investigating logical
replication issues, and selecting a different output plugin.
Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Andres Freund, Petr Jelinek
Discussion: https://postgr.es/m/CAD21AoAm7XX8y_tOPP6j4Nzzch12FvA1wPqiO690RCk+uYVstg@mail.gmail.com
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.
This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.
The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support
this new flag.
- Many executor nodes and auxiliary code are extended to track
collations. Previously, this code would just throw away collation
information, because the eventually-called user-defined functions
didn't use it since they only cared about equality, which didn't
need collation information.
- String data type functions that do equality comparisons and hashing
are changed to take the (non-)deterministic flag into account. For
comparison, this just means skipping various shortcuts and tie
breakers that use byte-wise comparison. For hashing, we first need
to convert the input string to a canonical "sort key" using the ICU
analogue of strxfrm().
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
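The hashing rule for nondeterministic collations described above (hash a canonical sort key, not the raw bytes) can be sketched in Python. This is an analogy only: sort_key() below uses Unicode normalization plus case folding as a hypothetical stand-in for an ICU sort key, and the function names are not from the patch.

```python
import unicodedata


def sort_key(s):
    """Hypothetical canonical key: NFC normalization plus case folding."""
    return unicodedata.normalize("NFC", s).casefold()


def collation_hash(s):
    # Hash the canonical key so strings that compare equal hash equally.
    return hash(sort_key(s))


a = "Caf\u00e9"    # precomposed e-acute
b = "CAFE\u0301"   # upper case, E followed by a combining acute accent

bytewise_equal = a.encode("utf-8") == b.encode("utf-8")
keys_equal = sort_key(a) == sort_key(b)
hashes_equal = collation_hash(a) == collation_hash(b)
```

The two strings differ byte-wise, so a byte-wise hash would generally place them in different buckets, but their canonical keys and therefore their hashes agree.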
Add support of numeric error suppression to jsonpath, as required by the
standard. This commit doesn't use PG_TRY()/PG_CATCH() to implement
that. Instead, it provides internal versions of the numeric functions used, which
support error suppression.
Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Alexander Korotkov, Nikita Glukhov
Reviewed-by: Tomas Vondra
The SQL 2016 standard, among other things, contains a set of SQL/JSON features
for JSON processing inside a relational database. The core of SQL/JSON is the
JSON path language, which allows accessing parts of JSON documents and making
computations over them. This commit implements partial support for the JSON path
language as a separate datatype called "jsonpath". The implementation is partial
because it lacks datetime support and suppression of numeric errors. The missing
features will be added later by separate commits.
Support of the SQL/JSON features requires implementation of separate nodes, and it
will be considered in subsequent patches. This commit includes the following
set of plain functions, which allow executing jsonpath over jsonb values:
* jsonb_path_exists(jsonb, jsonpath[, jsonb, bool]),
* jsonb_path_match(jsonb, jsonpath[, jsonb, bool]),
* jsonb_path_query(jsonb, jsonpath[, jsonb, bool]),
* jsonb_path_query_array(jsonb, jsonpath[, jsonb, bool]),
* jsonb_path_query_first(jsonb, jsonpath[, jsonb, bool]).
This commit also implements "jsonb @? jsonpath" and "jsonb @@ jsonpath", which
are wrappers over jsonpath_exists(jsonb, jsonpath) and jsonpath_predicate(jsonb,
jsonpath) respectively. These operators will have index support
(implemented in subsequent patches).
Catversion bumped, to add new functions and operators.
Code was written by Nikita Glukhov and Teodor Sigaev, revised by me.
Documentation was written by Oleg Bartunov and Liudmila Mantrova. The work
was inspired by Oleg Bartunov.
Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Nikita Glukhov, Teodor Sigaev, Alexander Korotkov, Oleg Bartunov, Liudmila Mantrova
Reviewed-by: Tomas Vondra, Andrew Dunstan, Pavel Stehule, Alexander Korotkov
The SQL:2016 standard adds support for the hyperbolic functions
sinh(), cosh(), and tanh(). POSIX has long required libm to
provide those functions as well as their inverses asinh(),
acosh(), atanh(). Hence, let's just expose the libm functions
to the SQL level. As with the trig functions, we only implement
versions for float8, not numeric.
For the moment, we'll assume that all platforms actually do have
these functions; if experience teaches otherwise, some autoconf
effort may be needed.
SQL:2016 also adds support for base-10 logarithm, but with the
function name log10(), whereas the name we've long used is log().
Add aliases named log10() for the float8 and numeric versions.
Lætitia Avrot
Discussion: https://postgr.es/m/CAB_COdguG22LO=rnxDQ2DW1uzv8aQoUzyDQNJjrR4k00XSgm5w@mail.gmail.com
COALESCE, GREATEST and LEAST all look like functions taking variable
numbers of arguments, but in fact they are not functions, and so
VARIADIC array arguments don't work with them. Add a note to the docs
explaining this fact.
The consensus is not to try to make this work, but just to document the
limitation.
Discussion: https://postgr.es/m/CAFj8pRCaAtuXuRtvXf5GmPbAVriUQrNMo7-=TXUFN025S31R_w@mail.gmail.com
Correctly process nodes of more types than previously. In some cases,
nodes were being ignored (nothing was output); in other cases, trying to
return them resulted in errors about unrecognized nodes. In yet other
cases, necessary escaping (of XML special characters) was not being
done. Fix all those (as far as the authors could find) and add
regression test cases verifying the new behavior.
I (Álvaro) was of two minds about backpatching these changes. They do
seem bugfixes that would benefit most users of the affected functions;
but on the other hand it would change established behavior in minor
releases, so it seems prudent not to.
Authors: Pavel Stehule, Markus Winand, Chapman Flack
Discussion: https://postgr.es/m/CAFj8pRA6J25CtAZ2TuRvxK3gat7-bBUYh0rfE2yM7Hj9GD14Dg@mail.gmail.com
Discussion: https://postgr.es/m/8BDB0627-2105-4564-AA76-7849F028B96E@winand.at
The elephant in the room as pointed out by Chapman Flack, not fixed in
this commit, is that we still have XMLTABLE operating on XPath 1.0
instead of the standard-mandated XQuery (or even its subset XPath 2.0).
Fixing that is a major undertaking, however.
This clause is used to indicate the passing mode of an XML document, but
we were doing it wrong: we accepted BY REF and ignored it, and rejected
BY VALUE as a syntax error. The reality, however, is that documents are
always passed BY VALUE, so rejecting that clause was silly. Change
things so that we accept BY VALUE.
BY REF continues to be accepted, and continues to be ignored.
Author: Chapman Flack
Reviewed-by: Pavel Stehule
Discussion: https://postgr.es/m/5C297BB7.9070509@anastigmatix.net
This is useful when looking at partition trees with multiple layers, and
combined with pg_partition_tree, it makes it possible to display
an entire tree by just knowing one member at any level.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Amit Langote
Discussion: https://postgr.es/m/20181207014015.GP2407@paquier.xyz
These have been found while cross-checking for the use of unique words
in the documentation, and a wait event was not getting generated in a way
consistent with what the documentation provided.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/9b5a3a85-899a-ae62-dbab-1e7943aa5ab1@gmail.com
Previously, the SQL random() function depended on libc's random(3),
and setseed() invoked srandom(3). This results in interference between
these functions and backend-internal uses of random(3). We'd never paid
too much mind to that, but in the wake of commit 88bdbd3f7 which added
log_statement_sample_rate, the interference arguably has a security
consequence: if log_statement_sample_rate is active then an unprivileged
user could probably control which if any of his SQL commands get logged,
by issuing setseed() at the right times. That seems bad.
To fix this reliably, we need random() and setseed() to use their own
private random state variable. Standard random(3) isn't amenable to such
usage, so let's switch to pg_erand48(). It's hard to say whether that's
more or less "random" than any particular platform's version of random(3),
but it does have a wider seed value and a longer period than are required
by POSIX, so we can hope that this isn't a big downgrade. Also, we should
now have uniform behavior of random() across platforms, which is worth
something.
While at it, upgrade the per-process seed initialization method to use
pg_strong_random() if available, greatly reducing the predictability
of the initial seed value. (I'll separately do something similar for
the internal uses of random().)
In addition to forestalling the possible security problem, this has a
benefit in the other direction, which is that we can now document
setseed() as guaranteeing a reproducible sequence of random() values.
Previously, because of the possibility of internal calls of random(3),
we could not promise any such thing.
Discussion: https://postgr.es/m/3859.1545849900@sss.pgh.pa.us
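The benefit of giving random() and setseed() their own private state, as described above, can be illustrated with Python's random module. This is an analogy, not the pg_erand48() code: sql_rng, setseed(), sql_random(), and backend_internal_use() are hypothetical names.

```python
import random

# Private generator state for the SQL-level functions.
sql_rng = random.Random()


def setseed(seed):
    sql_rng.seed(seed)


def sql_random():
    return sql_rng.random()


def backend_internal_use():
    # Draws from the shared module-level generator, not from sql_rng.
    return random.random()


setseed(0.5)
first = [sql_random() for _ in range(3)]

setseed(0.5)
backend_internal_use()          # internal call in between
second = [sql_random() for _ in range(3)]
```

Because the internal call never touches sql_rng, the same seed reproduces the same sequence, which is the guarantee the commit message says can now be documented.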
Expand section 5.6 "Privileges" to include the full definition of
each privilege type, and an explanation of aclitem privilege displays,
along with some helpful summary tables. Most of this material came
out of the GRANT reference page, although some of it is new.
Adjust a bunch of links that were pointing to GRANT to point to 5.6.
Fabien Coelho and Tom Lane, reviewed by Bradley DeJong
Discussion: https://postgr.es/m/alpine.DEB.2.21.1807311735200.20743@lancre
recovery.conf settings are now set in postgresql.conf (or other GUC
sources). Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.
Recovery is now initiated by a file recovery.signal. Standby mode is
initiated by a file standby.signal. The standby_mode setting is
gone. If a recovery.conf file is found, an error is issued.
The trigger_file setting has been renamed to promote_trigger_file as
part of the move.
The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".
pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.
Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/
date_trunc(field, timestamptz, zone_name) performs truncation using
the named time zone as reference, rather than working in the session
time zone as is the default behavior. It's equivalent to
date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name
but it's faster, easier to type, and arguably easier to understand.
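
The equivalence can be illustrated outside the database with a small Python
sketch: convert to the target zone, truncate, and interpret the result in
that same zone. The helper name is hypothetical, only the 'day' field is
handled, and a fixed UTC+11 offset stands in for a named zone such as
Australia/Sydney in summer; a zoneinfo.ZoneInfo object could be passed
instead where the time zone database is available.

```python
from datetime import datetime, timedelta, timezone


def date_trunc_day_in_zone(ts, zone):
    """Truncate an aware datetime to midnight of its day as seen in `zone`."""
    local = ts.astimezone(zone)          # like: ts AT TIME ZONE zone
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


# Assumption: fixed offset used as a stand-in for a named time zone.
sydney_summer = timezone(timedelta(hours=11))
ts = datetime(2001, 2, 16, 20, 38, 40, tzinfo=timezone.utc)

truncated = date_trunc_day_in_zone(ts, sydney_summer)
print(truncated.astimezone(timezone.utc))  # 2001-02-16 13:00:00+00:00
```

Truncating the same instant in UTC would instead give 2001-02-16 00:00:00+00,
which is why the reference zone matters.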
Vik Fearing and Tom Lane
Discussion: https://postgr.es/m/6249ffc4-2b22-4c1b-4e7d-7af84fedd7c6@2ndquadrant.com
This new function displays the full tree of partitions for a partitioned
table given as input, and avoids the need for a complex WITH RECURSIVE
query when looking at partition trees that are multiple levels deep.
It returns a set of records, one for each partition, containing the
partition's name, its immediate parent's name, a boolean value telling
whether the relation is a leaf in the tree, and an integer telling its level
in the partition tree with the given table considered as root, beginning at
zero for the root, and incrementing by one each time the scan goes one
level down.
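
The shape of the result can be sketched in Python with a plain dictionary
standing in for the catalog. The function name, the table names, and the
field order are illustrative assumptions; the sketch treats any relation
without children as a leaf and makes no claim about the row order the real
function produces.

```python
def partition_tree(root, children):
    """Return (name, parent, is_leaf, level) rows for a partition tree.

    `children` maps a relation name to the list of its direct partitions.
    """
    rows = []

    def walk(rel, parent, level):
        kids = children.get(rel, [])
        # Simplification: a relation with no partitions counts as a leaf.
        rows.append((rel, parent, not kids, level))
        for kid in kids:
            walk(kid, rel, level + 1)

    walk(root, None, 0)      # the root sits at level zero with no parent
    return rows


# Hypothetical two-level partition layout.
layout = {
    "measurements": ["m_2018", "m_2019"],
    "m_2019": ["m_2019_q1", "m_2019_q2"],
}
for row in partition_tree("measurements", layout):
    print(row)
```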
Author: Amit Langote
Reviewed-by: Jesper Pedersen, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/8d00e51a-9a51-ad02-d53e-ba6bf50b2e52@lab.ntt.co.jp
A standby can now be promoted with this new SQL-callable
function. Execution access can be granted to non-superusers so that
failover tools can observe the principle of least privilege.
Catalog version is bumped.
Author: Laurenz Albe
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/6e7c79b3ec916cf49742fb8849ed17cd87aed620.camel@cybertec.at
This function lists the contents of the WAL archive status directory,
and is intended to be used by monitoring tools. Unlike pg_ls_dir(),
access to it can be granted to non-superusers so that those monitoring
tools can observe the principle of least privilege. Access is also
given by default to members of pg_monitor.
Author: Christoph Moench-Tegeder
Reviewed-by: Aya Iwata
Discussion: https://postgr.es/m/20180930205920.GA64534@elch.exwg.net
This lists the contents of a temporary directory associated with a given
tablespace, which is useful for getting information about on-disk
consumption caused by temporary files used by a session's queries. By
default, pg_default is scanned, and a tablespace can be specified as an
argument.
This function is intended to be used by monitoring tools, and, unlike
pg_ls_dir(), access to it can be granted to non-superusers so that
those monitoring tools can observe the principle of least privilege.
Access is also given by default to members of pg_monitor.
Author: Nathan Bossart
Reviewed-by: Laurenz Albe
Discussion: https://postgr.es/m/92F458A2-6459-44B8-A7F2-2ADD3225046A@amazon.com
aclitem functions and operators have been heretofore undocumented.
Fix that. While at it, ensure the non-operator aclitem functions have
pg_description strings.
Does not seem worthwhile to back-patch.
Author: Fabien Coelho, with pg_description from John Naylor, and significant
refactoring and editorialization by me.
Reviewed by: Tom Lane
Discussion: https://postgr.es/m/flat/alpine.DEB.2.21.1808010825490.18204%40lancre
The to_timestamp()/to_date() functions were introduced mainly for Oracle
compatibility, and became very popular among PostgreSQL users. However, some
behaviors of the to_timestamp()/to_date() functions are both incompatible with
Oracle and confusing for our users. These behaviors relate to the handling of
spaces and separators in non-FX (fixed format) mode. This commit reworks this
behavior, making it less confusing, better documented and more compatible with
Oracle.
Nevertheless, the following incompatibilities with Oracle remain.
1) We don't insist that there are no format string patterns unmatched to
the input string.
2) In FX mode we don't insist that spaces and separators in the format string
exactly match the input string.
3) When format string patterns are divided by a mix of spaces and separators,
we don't distinguish them, while Oracle takes into account only the last group
of spaces/separators.
Discussion: https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com
Author: Artur Zakirov, Alexander Korotkov, Liudmila Mantrova
Review: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston
pg_get_object_address and pg_identify_object_as_address are supposed
to be inverses, but they disagreed as to the names of the arguments
representing the textual form of an object address. Moreover, the
documented argument names didn't agree with reality at all, either
for these functions or pg_identify_object.
In HEAD and v11, I think we can get away with renaming the input
arguments of pg_get_object_address to match the outputs of
pg_identify_object_as_address. In theory that might break queries
using named-argument notation to call pg_get_object_address, but
it seems really unlikely that anybody is doing that, or that they'd
have much trouble adjusting if they were. In older branches, we'll
just live with the lack of consistency.
Aside from fixing the documentation of these functions to match reality,
I couldn't resist the temptation to do some copy-editing.
Per complaint from Jean-Pierre Pelletier. Back-patch to 9.5 where these
functions were introduced. (Before v11, this is a documentation change
only.)
Discussion: https://postgr.es/m/CANGqjDnWH8wsTY_GzDUxbt4i=y-85SJreZin4Hm8uOqv1vzRQA@mail.gmail.com