Commit Graph

4610 Commits

Author SHA1 Message Date
Jeff Davis a326aac8f1 Wrap ICU ucol_open().
Hide details of supporting older ICU versions in a wrapper
function. The current code only needs to handle
icu_set_collation_attributes(), but a subsequent commit will add
additional version-specific code.

Discussion: https://postgr.es/m/7ee414ad-deb5-1144-8a0e-b34ae3b71cd5@enterprisedb.com
Reviewed-by: Peter Eisentraut
2023-03-23 09:15:25 -07:00
Jeff Davis 869650fa86 Support language tags in older ICU versions (53 and earlier).
By calling uloc_canonicalize() before parsing the attributes, the
existing locale attribute parsing logic works on language tags as
well.

Fix a small memory leak, too.

Discussion: http://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com
Reviewed-by: Peter Eisentraut
2023-03-21 16:12:37 -07:00
Tom Lane 75bd846b68 Add functions to do timestamptz arithmetic in a non-default timezone.
Add versions of timestamptz + interval, timestamptz - interval, and
generate_series(timestamptz, ...) in which a timezone can be specified
explicitly instead of defaulting to the TimeZone GUC setting.

The new functions for the first two are named date_add and
date_subtract.  This might seem too generic, but we could use
overloading to add additional variants if that seems useful.

Along the way, improve the docs' pretty inadequate explanation
of how timestamptz +- interval works.

Przemysław Sztoch and Gurjeet Singh; cosmetic changes and most of
the docs work by me

Discussion: https://postgr.es/m/01a84551-48dd-1359-bf7e-f6b0203a6bd0@sztoch.pl
2023-03-18 14:12:16 -04:00
Tom Lane 3e59e5048d Refactor datetime functions' timezone lookup code to reduce duplication.
We already had five copies of essentially the same logic, and an
upcoming patch introduces yet another use-case.  That's past my
threshold of pain, so introduce a common subroutine.  There's not
that much net code savings, but the chance of typos should go down.

Inspired by a patch from Przemysław Sztoch, but different in detail.

Discussion: https://postgr.es/m/01a84551-48dd-1359-bf7e-f6b0203a6bd0@sztoch.pl
2023-03-17 17:47:19 -04:00
Jeff Davis f413941f41 Fix t_isspace(), etc., when datlocprovider=i and datctype=C.
Check whether the datctype is C to determine whether t_isspace() and
related functions use isspace() or iswspace().

Previously, t_isspace() checked whether the database default collation
was C; which is incorrect when the default collation uses the ICU
provider.

Discussion: https://postgr.es/m/79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com
Reviewed-by: Peter Eisentraut
Backpatch-through: 15
2023-03-17 12:08:46 -07:00
Andres Freund 0dc40196f2 Work around spurious compiler warning in inet operators
gcc 12+ has complaints like the following:

../../../../../pgsql/src/backend/utils/adt/network.c: In function 'inetnot':
../../../../../pgsql/src/backend/utils/adt/network.c:1893:34: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
 1893 |                         pdst[nb] = ~pip[nb];
      |                         ~~~~~~~~~^~~~~~~~~~
../../../../../pgsql/src/include/utils/inet.h:27:23: note: at offset -1 into destination object 'ipaddr' of size 16
   27 |         unsigned char ipaddr[16];       /* up to 128 bits of address */
      |                       ^~~~~~
../../../../../pgsql/src/include/utils/inet.h:27:23: note: at offset -1 into destination object 'ipaddr' of size 16

This is due to a compiler bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104986

It has been a year since the bug has been reported without getting fixed. As
the warnings are verbose and use of gcc 12 is becoming more common, it seems
worth working around the bug. Particularly because a simple reformulation of
the loop condition fixes the issue and isn't any less readable.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/144536.1648326206@sss.pgh.pa.us
Backpatch: 11-
2023-03-16 14:48:45 -07:00
Tom Lane 5b3c595355 Tighten error checks in datetime input, and remove bogus "ISO" format.
DecodeDateTime and DecodeTimeOnly had support for date input in the
style "Y2023M03D16", which the comments claimed to be an "ISO" format.
However, so far as I can find there is no such format in ISO 8601;
they write units before numbers in intervals, but not in datetimes.
Furthermore, the lesser-known ISO 8601-2 spec actually defines an
incompatible format "2023Y03M16D".  None of our documentation mentions
such a format either.  So let's just drop it.

That leaves us with only two cases for a prefix unit specifier in
datetimes: Julian dates written as Jnnnn, and the "T" separator
defined by ISO 8601.  Add checks to catch misuse of these specifiers,
that is consecutive specifiers or a dangling specifier at the end of
the string.  We do not however disallow a specifier that is separated
from the field that it disambiguates (by noise words or unrelated
fields).  That being the case, remove some overly-aggressive error
checks from the ISOTIME cases.

Joseph Koshakow, editorialized a bit by me; thanks also to
Peter Eisentraut for some standards-reading.

Discussion: https://postgr.es/m/CAAvxfHf2Q1gKLiHGnuPOiyf0ASvKUM4BnMfsXuwgtYEb_Gx0Zw@mail.gmail.com
2023-03-16 14:18:33 -04:00
Tom Lane 2333803d84 Use "data directory" not "current directory" in error messages.
The user receiving the message might not understand where the
server's "current directory" is.  "Data directory" seems clearer.
(This would not be good for frontend code, but both of these
messages are only issued in the backend.)

Kyotaro Horiguchi

Discussion: https://postgr.es/m/20230316.111646.1564684434328830712.horikyota.ntt@gmail.com
2023-03-16 12:04:08 -04:00
Michael Paquier e731aeac89 Remove PgStat_BackendFunctionEntry
This structure included only PgStat_FunctionCounts, and removing it
facilitates some upcoming refactoring for pgstatfuncs.c to use more
macros rather that mostly-duplicated functions.

Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/11d531fe-52fc-c6ea-7e8e-62f1b6ec626e@gmail.com
2023-03-16 14:22:34 +09:00
Tom Lane 483bdb2afe Support [NO] INDENT option in XMLSERIALIZE().
This adds the ability to pretty-print XML documents ... according to
libxml's somewhat idiosyncratic notions of what's pretty, anyway.
One notable divergence from a strict reading of the spec is that
libxml is willing to collapse empty nodes "<node></node>" to just
"<node/>", whereas SQL and the underlying XML spec say that this
option should only result in whitespace tweaks.  Nonetheless,
it seems close enough to justify using the SQL-standard syntax.

Jim Jones, reviewed by Peter Smith and myself

Discussion: https://postgr.es/m/2f5df461-dad8-6d7d-4568-08e10608a69b@uni-muenster.de
2023-03-15 16:59:09 -04:00
Tom Lane b081fe4199 Fix corner case bug in numeric to_char() some more.
The band-aid applied in commit f0bedf3e4 turns out to still need
some work: it made sure we didn't set Np->last_relevant too small
(to the left of the decimal point), but it didn't prevent setting
it too large (off the end of the partially-converted string).
This could result in fetching data beyond the end of the allocated
space, which with very bad luck could cause a SIGSEGV, though
I don't see any hazard of interesting memory disclosure.

Per bug #17839 from Thiago Nunes.  The bug's pretty ancient,
so back-patch to all supported versions.

Discussion: https://postgr.es/m/17839-aada50db24d7b0da@postgresql.org
2023-03-14 19:17:31 -04:00
Dean Rasheed d5d574146d Add support for the error functions erf() and erfc().
Expose the standard error functions as SQL-callable functions. These
are expected to be useful to people working with normal distributions,
and we use them here to test the distribution from random_normal().

Since these functions are defined in the POSIX and C99 standards, they
should in theory be available on all supported platforms. If that
turns out not to be the case, more work will be needed.

On all platforms tested so far, using extra_float_digits = -1 in the
regression tests is sufficient to allow for variations between
implementations. However, past experience has shown that there are
almost certainly going to be additional unexpected portability issues,
so these tests may well need further adjustments, based on the
buildfarm results.

Dean Rasheed, reviewed by Nathan Bossart and Thomas Munro.

Discussion: https://postgr.es/m/CAEZATCXv5fi7+Vu-POiyai+ucF95+YMcCMafxV+eZuN1B-=MkQ@mail.gmail.com
2023-03-14 09:17:36 +00:00
Tom Lane 25a7812cd0 Fix JSON error reporting for many cases of erroneous string values.
The majority of error exit cases in json_lex_string() failed to
set lex->token_terminator, causing problems for the error context
reporting code: it would see token_terminator less than token_start
and do something more or less nuts.  In v14 and up the end result
could be as bad as a crash in report_json_context().  Older
versions accidentally avoided that fate; but all versions produce
error context lines that are far less useful than intended,
because they'd stop at the end of the prior token instead of
continuing to where the actually-bad input is.

To fix, invent some macros that make it less notationally painful
to do the right thing.  Also add documentation about what the
function is actually required to do; and in >= v14, add an assertion
in report_json_context about token_terminator being sufficiently
far advanced.

Per report from Nikolay Shaplov.  Back-patch to all supported
versions.

Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
2023-03-13 15:19:00 -04:00
Tom Lane bcc704b524 Reject combining "epoch" and "infinity" with other datetime fields.
Datetime input formerly accepted combinations such as
'1995-08-06 infinity', but this seems like a clear error.
Reject any combination of regular y/m/d/h/m/s fields with
these special tokens.

Joseph Koshakow, reviewed by Keisuke Kuroda and myself

Discussion: https://postgr.es/m/CAAvxfHdm8wwXwG_FFRaJ1nTHiMWb7YXS2YKCzCt8Q0a2ZoMcHg@mail.gmail.com
2023-03-09 16:49:03 -05:00
Peter Eisentraut 30a53b7929 Allow tailoring of ICU locales with custom rules
This exposes the ICU facility to add custom collation rules to a
standard collation.

New options are added to CREATE COLLATION, CREATE DATABASE, createdb,
and initdb to set the rules.

Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
2023-03-08 16:56:37 +01:00
Peter Eisentraut ce1215d9b0 Add support for unit "B" to pg_size_bytes()
This makes it consistent with the units support in GUC.

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0106914a-9eb5-22be-40d8-652cc88c827d%40enterprisedb.com
2023-03-07 20:31:16 +01:00
David Rowley cf96907aad Fix incorrect comment in pg_get_partkeydef()
The comment claimed the output of the function was prefixed by "PARTITION
BY".  This is incorrect.

Author: Japin Li
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/MEYP282MB166923B446FF5FE55B9DACB7B6B69@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2023-03-07 14:33:28 +13:00
Peter Eisentraut 102a5c164a SQL JSON path enhanced numeric literals
Add support for non-decimal integer literals and underscores in
numeric literals to SQL JSON path language.  This follows the rules of
ECMAScript, as referred to by the SQL standard.

Internally, all the numeric literal parsing of jsonpath goes through
numeric_in, which already supports all this, so this patch is just a
bit of lexer work and some tests and documentation.

Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b11b25bb-6ec1-d42f-cedd-311eae59e1fb@enterprisedb.com
2023-03-05 15:19:58 +01:00
Peter Eisentraut b1307b8b60 Fix incorrect format placeholders 2023-03-03 07:01:18 +01:00
Tom Lane d7056bc1c7 Avoid fetching one past the end of translate()'s "to" parameter.
This is usually harmless, but if you were very unlucky it could
provoke a segfault due to the "to" string being right up against
the end of memory.  Found via valgrind testing (so we might've
found it earlier, except that our regression tests lacked any
exercise of translate()'s deletion feature).

Fix by switching the order of the test-for-end-of-string and
advance-pointer steps.  While here, compute "to_ptr + tolen"
just once.  (Smarter compilers might figure that out for
themselves, but let's just make sure.)

Report and fix by Daniil Anisimov, in bug #17816.

Discussion: https://postgr.es/m/17816-70f3d2764e88a108@postgresql.org
2023-03-01 11:30:31 -05:00
Michael Paquier b8da37b3ad Rework pg_input_error_message(), now renamed pg_input_error_info()
pg_input_error_info() is now a SQL function able to return a row with
more than just the error message generated for incorrect data type
inputs when these are able to handle soft failures, returning more
contents of ErrorData, as of:
- The error message (same as before).
- The error detail, if set.
- The error hint, if set.
- SQL error code.

All the regression tests that relied on pg_input_error_message() are
updated to reflect the effects of the rename.

Per discussion with Tom Lane and Andrew Dunstan.

Author: Nathan Bossart
Discussion: https://postgr.es/m/139a68e1-bd1f-a9a7-b5fe-0be9845c6311@dunslane.net
2023-02-28 08:04:13 +09:00
Tom Lane 728560db7d Suppress compiler warnings in new pgstats code.
Some clang versions whine about comparing an enum variable to
a value outside the range of the enum, on the grounds that the
result must be constant.  In the cases we fix here, the loops
will terminate only if the enum variable can in fact hold a
value one beyond its declared range.  While that's very likely
to always be true for these enum types, it still seems like a
poor coding practice to assume it; so use "int" loop variables
instead to silence the warnings.  (This matches what we've done
in other places, for example loops over the range of ForkNumber.)

While at it, let's drop the XXX_FIRST macros for these enums and just
write zeroes for the loop start values.  The apparent flexibility
seems rather illusory given that iterating up to one-less-than-
the-number-of-values is only correct for a zero-based range.

Melanie Plageman

Discussion: https://postgr.es/m/20520.1677435600@sss.pgh.pa.us
2023-02-27 17:21:31 -05:00
Tom Lane ded7b7bbc3 Silence more compiler warnings introduced by d87d548cd0.
Per buildfarm, there are still a couple of functions where we
get warnings from compilers that don't know that elog(ERROR)
doesn't return.
2023-02-26 14:05:03 -05:00
Jeff Davis 05fc551796 Silence compiler warnings introduced by d87d548cd0.
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20230224002029.GQ1653@telsasoft.com
2023-02-24 09:11:35 -08:00
Jeff Davis e0b3074e89 Remove unnecessary #ifdef USE_ICU and branch.
Now that the provider-independent API pg_strnxfrm() is available, we
no longer need the special cases for ICU in hashfunc.c and varchar.c.

Reviewed-by: Peter Eisentraut, Peter Geoghegan
Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
2023-02-23 11:20:00 -08:00
Jeff Davis 6974a8f768 Refactor to introduce pg_locale_deterministic().
Avoids the need of callers to test for NULL, and also avoids the need
to access the pg_locale_t structure directly.

Reviewed-by: Peter Eisentraut, Peter Geoghegan
Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
2023-02-23 11:17:41 -08:00
Jeff Davis d87d548cd0 Refactor to add pg_strcoll(), pg_strxfrm(), and variants.
Offers a generally better separation of responsibilities for collation
code. Also, a step towards multi-lib ICU, which should be based on a
clean separation of the routines required for collation providers.

Callers with NUL-terminated strings should call pg_strcoll() or
pg_strxfrm(); callers with strings and their length should call the
variants pg_strncoll() or pg_strnxfrm().

Reviewed-by: Peter Eisentraut, Peter Geoghegan
Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
2023-02-23 10:55:20 -08:00
Peter Eisentraut 2ddab010c2 Implement ANY_VALUE aggregate
SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an
implementation-dependent (i.e. non-deterministic) value from the
aggregated rows.

Author: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5cff866c-10a8-d2df-32cb-e9072e6b04a2@postgresfriends.org
2023-02-22 09:33:07 +01:00
Tom Lane 8028e294b4 Detect overflow in timestamp[tz] subtraction.
It's possible to overflow the int64 microseconds field of the
output interval when subtracting two timestamps.  Detect that
instead of silently returning a bogus result.

Nick Babadzhanian

Discussion: https://postgr.es/m/CABw73Uq2oJ3E+kYvvDuY04EkhhkChim2e-PaghBDjOmgUAMWGw@mail.gmail.com
2023-02-20 17:26:25 -05:00
Tom Lane f0d0394e84 Fix parsing of ISO-8601 interval fields with exponential notation.
Historically we've accepted interval input like 'P.1e10D'.  This
is probably an accident of having used strtod() to do the parsing,
rather than something anyone intended, but it's been that way for
a long time.  Commit e39f99046 broke this by trying to parse the
integer and fractional parts separately, without accounting for
the possibility of an exponent.  In principle that coding allowed
for precise conversions of field values wider than 15 decimal
digits, but that does not seem like a goal worth sweating bullets
for.  So, rather than trying to manage an exponent on top of the
existing complexity, let's just revert to the previous coding that
used strtod() by itself.  We can still improve on the old code to
the extent of allowing the value to range up to 1.0e15 rather than
only INT_MAX.  (Allowing more than that risks creating problems
due to precision loss: the converted fractional part might have
absolute value more than 1.  Perhaps that could be dealt with in
some way, but it really does not seem worth additional effort.)

Per bug #17795 from Alexander Lakhin.  Back-patch to v15 where
the faulty code came in.

Discussion: https://postgr.es/m/17795-748d6db3ed95d313@postgresql.org
2023-02-20 16:55:59 -05:00
Tom Lane 393430f575 Print the correct aliases for DML target tables in ruleutils.
ruleutils.c blindly printed the user-given alias (or nothing if there
hadn't been one) for the target table of INSERT/UPDATE/DELETE queries.
That works a large percentage of the time, but not always: for queries
appearing in WITH, it's possible that we chose a different alias to
avoid conflict with outer-scope names.  Since the chosen alias would
be used in any Var references to the target table, this'd lead to an
inconsistent printout with consequences such as dump/restore failures.

The correct logic for printing (or not) a relation alias was embedded
in get_from_clause_item.  Factor it out to a separate function so that
we don't need a jointree node to use it.  (Only a limited part of that
function can be reached from these new call sites, but this seems like
the cleanest non-duplicative factorization.)

In passing, I got rid of a redundant "\d+ rules_src" step in rules.sql.

Initial report from Jonathan Katz; thanks to Vignesh C for analysis.
This has been broken for a long time, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org
Discussion: https://postgr.es/m/CALDaNm1MMntjmT_NJGp-Z=xbF02qHGAyuSHfYHias3TqQbPF2w@mail.gmail.com
2023-02-17 16:40:34 -05:00
Peter Eisentraut 3b12e68a5c Change argument type of pq_sendbytes from char * to void *
This is a follow-up to 1f605b82ba.  It
allows getting rid of further casts at call sites.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/783a4edb-84f9-6df2-7470-2ef5ccc6607a@enterprisedb.com
2023-02-14 13:32:19 +01:00
Peter Eisentraut bd944884e9 Consolidate ItemPointer to Datum conversion functions
Instead of defining the same set of macros several times, define it
once in an appropriate header file.  In passing, convert to inline
functions.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/844dd4c5-e5a1-3df1-bfaf-d1e1c2a16e45%40enterprisedb.com
2023-02-13 09:57:15 +01:00
Tom Lane 5e80d35154 Avoid dereferencing an undefined pointer in DecodeInterval().
Commit e39f99046 moved some code up closer to the start of
DecodeInterval(), without noticing that it had been implicitly
relying on previous checks to reject the case of empty input.
Given empty input, we'd now dereference a pointer that hadn't been
set, possibly leading to a core dump.  (But if we fail to provoke
a SIGSEGV, nothing bad happens, and the expected syntax error is
thrown a bit later.)

Per bug #17788 from Alexander Lakhin.  Back-patch to v15 where
the fault was introduced.

Discussion: https://postgr.es/m/17788-dabac9f98f7eafd5@postgresql.org
2023-02-12 12:50:55 -05:00
Andres Freund a9c70b46db Add pg_stat_io view, providing more detailed IO statistics
Builds on 28e626bde0 and f30d62c2fc. See the former for motivation.

Rows of the view show IO operations for a particular backend type, IO target
object, IO context combination (e.g. a client backend's operations on
permanent relations in shared buffers) and each column in the view is the
total number of IO Operations done (e.g. writes). So a cell in the view would
be, for example, the number of blocks of relation data written from shared
buffers by client backends since the last stats reset.

In anticipation of tracking WAL IO and non-block-oriented IO (such as
temporary file IO), the "op_bytes" column specifies the unit of the "reads",
"writes", and "extends" columns for a given row.

Rows for combinations of IO operation, backend type, target object and context
that never occur, are ommitted entirely. For example, checkpointer will never
operate on temporary relations.

Similarly, if an IO operation never occurs for such a combination, the IO
operation's cell will be null, to distinguish from 0 observed IO
operations. For example, bgwriter should not perform reads.

Note that some of the cells in the view are redundant with fields in
pg_stat_bgwriter (e.g. buffers_backend). For now, these have been kept for
backwards compatibility.

Bumps catversion.

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Samay Sharma <smilingsamay@gmail.com>
Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de
2023-02-11 09:52:15 -08:00
Andres Freund 28e626bde0 pgstat: Infrastructure for more detailed IO statistics
This commit adds the infrastructure for more detailed IO statistics. The calls
to actually count IOs, a system view to access the new statistics,
documentation and tests will be added in subsequent commits, to make review
easier.

While we already had some IO statistics, e.g. in pg_stat_bgwriter and
pg_stat_database, they did not provide sufficient detail to understand what
the main sources of IO are, or whether configuration changes could avoid
IO. E.g., pg_stat_bgwriter.buffers_backend does contain the number of buffers
written out by a backend, but as that includes extending relations (always
done by backends) and writes triggered by the use of buffer access strategies,
it cannot easily be used to tune background writer or checkpointer. Similarly,
pg_stat_database.blks_read cannot easily be used to tune shared_buffers /
compute a cache hit ratio, as the use of buffer access strategies will often
prevent a large fraction of the read blocks to end up in shared_buffers.

The new IO statistics count IO operations (evict, extend, fsync, read, reuse,
and write), and are aggregated for each combination of backend type (backend,
autovacuum worker, bgwriter, etc), target object of the IO (relations, temp
relations) and context of the IO (normal, vacuum, bulkread, bulkwrite).

What is tracked in this series of patches, is sufficient to perform the
aforementioned analyses. Further details, e.g. tracking the number of buffer
hits, would make that even easier, but was left out for now, to keep the scope
of the already large patchset manageable.

Bumps PGSTAT_FILE_FORMAT_ID.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de
2023-02-08 20:53:42 -08:00
Peter Eisentraut aa69541046 Remove useless casts to (void *) in arguments of some system functions
The affected functions are: bsearch, memcmp, memcpy, memset, memmove,
qsort, repalloc

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-02-07 06:57:59 +01:00
Peter Eisentraut 54a177a948 Remove useless casts to (void *) in hash_search() calls
Some of these appear to be leftovers from when hash_search() took a
char * argument (changed in 5999e78fc4).

Since after this there is some more horizontal space available, do
some light reformatting where suitable.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-02-06 09:41:01 +01:00
Dean Rasheed faff8f8e47 Allow underscores in integer and numeric constants.
This allows underscores to be used in integer and numeric literals,
and their corresponding type input functions, for visual grouping.
For example:

    1_500_000_000
    3.14159_26535_89793
    0xffff_ffff
    0b_1001_0001

A single underscore is allowed between any 2 digits, or immediately
after the base prefix indicator of non-decimal integers, per SQL:202x
draft.

Peter Eisentraut and Dean Rasheed

Discussion: https://postgr.es/m/84aae844-dc55-a4be-86d9-4f0fa405cc97%40enterprisedb.com
2023-02-04 09:48:51 +00:00
Peter Eisentraut 1b6f632a35 Remove unused code related to unknown type
These are leftovers obsoleted by
cfd9be939e.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/e7887965-9e70-fd01-c2d1-5bc02f9169aa%40enterprisedb.com
2023-02-04 07:56:09 +01:00
Dean Rasheed b2d4792890 Make int64_div_fast_to_numeric() more robust.
The prior coding of int64_div_fast_to_numeric() had a number of bugs
that would cause it to fail under different circumstances, such as
with log10val2 <= 0, or log10val2 a multiple of 4, or in the "slow"
numeric path with log10val2 >= 10.

None of those could be triggered by any of our current code, which
only uses log10val2 = 3 or 6. However, they made it a hazard for any
future code that might use it. Also, since this is exported by
numeric.c, users writing their own C code might choose to use it.

Therefore fix, and back-patch to v14, where it was introduced.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCW8gXgW0tgPxPgHDPhVX71%2BSWFRkhnXy%2BTfGDsKLepu2g%40mail.gmail.com
2023-02-03 11:13:34 +00:00
Dean Rasheed 0736fc1ceb Clarify the choice of rscale in numeric_sqrt().
Improve the comment explaining the choice of rscale in numeric_sqrt(),
and ensure that the code works consistently when other values of
NBASE/DEC_DIGITS are used.

Note that, in practice, we always expect DEC_DIGITS == 4, and this
does not change the computation in that case.

Joel Jacobson and Dean Rasheed

Discussion: https://postgr.es/m/06712c29-98e9-43b3-98da-f234d81c6e49%40app.fastmail.com
2023-02-02 09:41:22 +00:00
Dean Rasheed 9a84f2947b Ensure that numeric.c compiles with other NBASE values.
As noted in the comments, support for different NBASE values is really
only of historical interest, but as long as we're keeping it, we might
as well make sure that it compiles.

Joel Jacobson

Discussion: https://postgr.es/m/06712c29-98e9-43b3-98da-f234d81c6e49%40app.fastmail.com
2023-02-02 09:39:08 +00:00
Tom Lane 2489d76c49 Make Vars be outer-join-aware.
Traditionally we used the same Var struct to represent the value
of a table column everywhere in parse and plan trees.  This choice
predates our support for SQL outer joins, and it's really a pretty
bad idea with outer joins, because the Var's value can depend on
where it is in the tree: it might go to NULL above an outer join.
So expression nodes that are equal() per equalfuncs.c might not
represent the same value, which is a huge correctness hazard for
the planner.

To improve this, decorate Var nodes with a bitmapset showing
which outer joins (identified by RTE indexes) may have nulled
them at the point in the parse tree where the Var appears.
This allows us to trust that equal() Vars represent the same value.
A certain amount of klugery is still needed to cope with cases
where we re-order two outer joins, but it's possible to make it
work without sacrificing that core principle.  PlaceHolderVars
receive similar decoration for the same reason.

In the planner, we include these outer join bitmapsets into the relids
that an expression is considered to depend on, and in consequence also
add outer-join relids to the relids of join RelOptInfos.  This allows
us to correctly perceive whether an expression can be calculated above
or below a particular outer join.

This change affects FDWs that want to plan foreign joins.  They *must*
follow suit when labeling foreign joins in order to match with the
core planner, but for many purposes (if postgres_fdw is any guide)
they'd prefer to consider only base relations within the join.
To support both requirements, redefine ForeignScan.fs_relids as
base+OJ relids, and add a new field fs_base_relids that's set up by
the core planner.

Large though it is, this commit just does the minimum necessary to
install the new mechanisms and get check-world passing again.
Follow-up patches will perform some cleanup.  (The README additions
and comments mention some stuff that will appear in the follow-up.)

Patch by me; thanks to Richard Guo for review.

Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
2023-01-30 13:16:20 -05:00
David Rowley 456fa635a9 Teach planner about more monotonic window functions
9d9c02ccd introduced runConditions for window functions to allow
monotonic window function evaluation to be made more efficient when the
window function value went beyond some value that it would never go back
from due to its monotonic nature.  That commit added prosupport functions
to inform the planner that row_number(), rank(), dense_rank() and some
forms of count(*) were monotonic.  Here we add support for ntile(),
cume_dist() and percent_rank().

Reviewed-by: Melanie Plageman
Discussion: https://postgr.es/m/CAApHDvqR+VqB8s+xR-24bzJbU8xyFrBszJ17qKgECf7cWxLCaA@mail.gmail.com
2023-01-27 16:08:41 +13:00
Tom Lane 3a28d78089 Improve TimestampDifferenceMilliseconds to cope with overflow sanely.
We'd like to use TimestampDifferenceMilliseconds with the stop_time
possibly being TIMESTAMP_INFINITY, but up to now it's disclaimed
responsibility for overflow cases.  Define it to clamp its output to
the range [0, INT_MAX], handling overflow correctly.  (INT_MAX rather
than LONG_MAX seems appropriate, because the function is already
described as being intended for calculating wait times for WaitLatch
et al, and that infrastructure only handles waits up to INT_MAX.
Also, this choice gets rid of cross-platform behavioral differences.)

Having done that, we can replace some ad-hoc code in walreceiver.c
with a simple call to TimestampDifferenceMilliseconds.

While at it, fix some buglets in existing callers of
TimestampDifferenceMilliseconds: basebackup_copy.c had not read the
memo about TimestampDifferenceMilliseconds never returning a negative
value, and postmaster.c had not read the memo about Min() and Max()
being macros with multiple-evaluation hazards.  Neither of these
quite seem worth back-patching.

Patch by me; thanks to Nathan Bossart for review.

Discussion: https://postgr.es/m/3126727.1674759248@sss.pgh.pa.us
2023-01-26 17:09:12 -05:00
Dean Rasheed 6dfacbf72b Add non-decimal integer support to type numeric.
This enhances the numeric type input function, adding support for
hexadecimal, octal, and binary integers of any size, up to the limits
of the numeric type.

Since 6fcda9aba8, such non-decimal integers have been accepted by the
parser as integer literals and passed through to numeric_in(). This
commit gives numeric_in() the ability to handle them.

While at it, simplify the handling of NaN and infinities, reducing the
number of calls to pg_strncasecmp(), and arrange for pg_strncasecmp()
to not be called at all for regular numbers. This gives a significant
performance improvement for decimal inputs, more than offsetting the
small performance hit of checking for non-decimal input.

Discussion: https://postgr.es/m/CAEZATCV8XShnmT9HZy25C%2Bo78CVOFmUN5EM9FRAZ5xvYTggPMg%40mail.gmail.com
2023-01-23 19:21:22 +00:00
Dean Rasheed 0aa38db56b Optimise numeric division for 3 and 4 base-NBASE digit divisors.
On platforms with 128-bit integer support, introduce a new function
div_var_int64(), along the same lines as div_var_int() added in
d1b307eef2 for divisors with 1 or 2 base-NBASE digits, and use it to
speed up div_var() and div_var_fast() in a similar way when the
divisor has 3 or 4 base-NBASE digits.

This gives significant performance gains for divisors with 9-16
decimal digits.

Joel Jacobson.

Discussion:
  https://postgr.es/m/b7a5893d-af18-4c0b-8918-96de5f1bbf39%40app.fastmail.com
  https://postgr.es/m/CAEZATCXGm%3DDyTq%3DFrcOqC0gPMVveKUYTaD5KRRoajrUTiWxVMw%40mail.gmail.com
2023-01-23 11:58:28 +00:00
David Rowley 16fd03e956 Allow parallel aggregate on string_agg and array_agg
This adds combine, serial and deserial functions for the array_agg() and
string_agg() aggregate functions, thus allowing these aggregates to
partake in partial aggregations.  This allows both parallel aggregation to
take place when these aggregates are present and also allows additional
partition-wise aggregation plan shapes to include plans that require
additional aggregation once the partially aggregated results from the
partitions have been combined.

Author: David Rowley
Reviewed-by: Andres Freund, Tomas Vondra, Stephen Frost, Tom Lane
Discussion: https://postgr.es/m/CAKJS1f9sx_6GTcvd6TMuZnNtCh0VhBzhX6FZqw17TgVFH-ga_A@mail.gmail.com
2023-01-23 17:35:01 +13:00
David Rowley 9f1ca6ce65 Use appendStringInfoSpaces in more places
This adjusts a few places which were appending a string constant
containing spaces onto a StringInfo.  We have appendStringInfoSpaces for
that job, so let's use that instead.

For the change to jsonb.c's add_indent() function, appendStringInfoString
was being called inside a loop to append 4 spaces on each loop.  This
meant that enlargeStringInfo would get called once per loop.  Here it
should be much more efficient to get rid of the loop and just calculate
the number of spaces with "level * 4" and just append all the spaces in
one go.

Here we additionally adjust the appendStringInfoSpaces function so it
makes use of memset rather than a while loop to apply the required spaces
to the StringInfo.  One of the problems with the while loop was that it
was incrementing one variable and decrementing another variable once per
loop.  That's more work than what's required to get the job done.  We may
as well use memset for this rather than trying to optimize the existing
loop.  Some testing has shown memset is faster even for very small sizes.

Discussion: https://postgr.es/m/CAApHDvp_rKkvwudBKgBHniNRg67bzXVjyvVKfX0G2zS967K43A@mail.gmail.com
2023-01-20 13:07:24 +13:00