Commit Graph

415 Commits

Author SHA1 Message Date
Robert Haas 174c480508 Add a new WAL summarizer process.
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.

In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.

A new parameter summarize_wal enables or disables this new background
process.  The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.

Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 08:42:28 -05:00
Michael Paquier 1301c80b21 Remove MSVC scripts
This commit removes all the scripts located in src/tools/msvc/ to build
PostgreSQL with Visual Studio on Windows, meson becoming the recommended
way to achieve that.  The scripts held some information that is still
relevant with meson, information kept and moved to better locations.
Comments that referred directly to the scripts are removed.

All the documentation still relevant that was in install-windows.sgml
has been moved to installation.sgml under a new subsection for Visual.
All the content specific to the scripts is removed.  Some adjustments
for the documentation are planned in a follow-up set of changes.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/ZQzp_VMJcerM1Cs_@paquier.xyz
2023-12-20 09:44:37 +09:00
Robert Haas aafc07c7a1 Move src/bin/pg_verifybackup/parse_manifest.c into src/common.
This makes it possible for the code to be easily reused by other
client-side tools, and/or by the server.

Patch by me. Review of this patch in particular by at least Peter
Eisentraut; reviewers for the patch series in general include Dilip
Kumar, Andres Fruend, David Steele, Álvaro Herrera, and Jakub Wartak.

Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com
2023-12-19 15:24:31 -05:00
Thomas Munro 0c6be59f5e Provide helper for retrying partial vectored I/O.
compute_remaining_iovec() is a re-usable routine for retrying after
pg_readv() or pg_writev() reports a short transfer.  This will gain new
users in a later commit, but can already replace the open-coded
equivalent code in the existing pg_pwritev_with_retry() function.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12 10:57:18 +13:00
Jeff Davis 719b342d36 Shrink Unicode category table.
Missing entries can implicitly be considered "unassigned".

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com
2023-12-07 15:44:03 -08:00
Michael Paquier 14f2f9eb1a Add CHECK_FOR_INTERRUPTS() in scram_SaltedPassword() for the backend
scram_SaltedPassword() could take a long time to compute when the number
of iterations used is large enough, and this code uses a tight loop to
compute a salted password.

Note that the same issue exists in libpq when using \password and a
large iteration number, but this cannot be interrupted.  A CFI in the
backend is useful for server-side computations, at least.

Backpatch down to 16, where the user-settable GUC scram_iterations has
been added.

Author: Bowen Shi
Reviewed-by: Aleksander Alekseev, Daniel Gustafsson
Discussion: https://postgr.es/m/CAM_vCueV6xfr08KczfaCEk5J_qeTZtgqN7+orkNLx=g+phE82Q@mail.gmail.com
Backpatch-through: 16
2023-11-28 08:35:50 +09:00
Heikki Linnakangas b8bff07daa Make ResourceOwners more easily extensible.
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.

The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.

All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.

The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.

Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget.  There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.

Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.

There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.

Add tests in src/test/modules/test_resowner.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
2023-11-08 13:30:50 +02:00
Peter Eisentraut 721856ff24 Remove distprep
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-11-06 15:18:04 +01:00
Peter Eisentraut 2c7c6c417f More consistent behavior of GetDataDirectoryCreatePerm on Windows
On Windows, GetDataDirectoryCreatePerm() just did nothing.  The way
the code in some callers is structured, this is the first function
that tries to access the data directory.  So it also ends up the place
that is responsible for reporting that a data directory does not exist
or similar.  Therefore, on Windows, these scenarios end up on
potentially completely different code paths.

To unify this, to make testing more consistent across platforms, have
GetDataDirectoryCreatePerm() run the stat() call on Windows as well,
even though it won't do anything with the result.  That way, file
system errors are reporting to callers in the same way as on
non-Windows.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/15a59bca-0383-183c-9383-0446da9b87e1%40eisentraut.org
2023-11-05 21:59:04 +01:00
Jeff Davis a02b37fc08 Additional unicode primitive functions.
Introduce unicode_version(), icu_unicode_version(), and
unicode_assigned().

The latter requires introducing a new lookup table for the Unicode
General Category, which is generated along with the other Unicode
lookup tables.

Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
Reviewed-by: Peter Eisentraut
2023-11-01 22:47:06 -07:00
Peter Eisentraut 611806cd72 Add trailing commas to enum definitions
Since C99, there can be a trailing comma after the last value in an
enum definition.  A lot of new code has been introducing this style on
the fly.  Some new patches are now taking an inconsistent approach to
this.  Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one.  We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.

I omitted a few places where there was a fixed "last" value that will
always stay last.  I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers.  There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.

Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-26 09:20:54 +02:00
David Rowley f0efa5aec1 Introduce the concept of read-only StringInfos
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer.  There wasn't much consistency here as
to what fields like maxlen got set to and in one location we didn't
correctly ensure that the buffer was correctly NUL terminated at len
bytes, as per what was documented as required in stringinfo.h

Here we introduce 2 new functions to initialize StringInfos.  One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc.  Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation.  Having
this as a function allows us to verify the buffer is correctly NUL
terminated.  StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.

The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all.  StringInfos initialized this way
are deemed as "read-only".  This means that it's not possible to
append to them or reset them.

For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator or save us from having to temporarily add a NUL only to have to
put the original char back again later.

Incompatibility note:

Here we also forego adding the NUL in a few places where it does not
seem to be required.  These locations are passing the given StringInfo
into a type's receive function.  It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this.  It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.

Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
2023-10-26 16:31:48 +13:00
Tom Lane 9b103f861e Improve pglz_decompress's defenses against corrupt compressed data.
When processing a match tag, check to see if the claimed "off"
is more than the distance back to the output buffer start.
If it is, then the data is corrupt, and what's more we would
fetch from outside the buffer boundaries and potentially incur
a SIGSEGV.  (Although the odds of that seem relatively low, given
that "off" can't be more than 4K.)

Back-patch to v13; before that, this function wasn't really
trying to protect against bad data.

Report and fix by Flavien Guedez.

Discussion: https://postgr.es/m/01fc0593-e31e-463d-902c-dd43174acee2@oopacity.net
2023-10-18 20:43:27 -04:00
Thomas Munro 63a582222c Try to handle torn reads of pg_control in frontend.
Some of our src/bin tools read the control file without any kind of
interlocking against concurrent writes from the server.  At least ext4
and ntfs can expose partially modified contents when you do that.

For now, we'll try to tolerate this by retrying up to 10 times if the
checksum doesn't match, until we get two reads in a row with the same
bad checksum.  This is not guaranteed to reach the right conclusion, but
it seems very likely to.  Thanks to Tom Lane for this suggestion.

Various ideas for interlocking or atomicity were considered too
complicated, unportable or expensive given the lack of field reports,
but remain open for future reconsideration.

Back-patch as far as 12.  It doesn't seem like a good idea to put a
heuristic change for a very rare problem into the final release of 11.

Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru>
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
2023-10-16 17:33:08 +13:00
Tom Lane b6c7cfac88 Restore proper linkage of pg_char_to_encoding() and friends.
Back in the 8.3 era we discovered that it was problematic if
libpq.so had encoding ID assignments different from the backend,
which is possible because on some platforms libpq.so might be
of a different major version from the calling programs.
psql should use libpq's assignments, but initdb has to use the
backend's, else it will put wrong values into pg_database.
The solution devised in commit 8468146b0 relied on giving initdb
its own copy of encnames.c rather than relying on the functions
exported by libpq.  Later, that metamorphosed into ensuring that
libpgcommon got linked before libpq -- which made things OK for
initdb but broke psql.  We didn't notice for lack of any changes
in enum pg_enc since then.  Commit 06843df4a reversed that, fixing
the latent bug in psql but adding one in initdb.  The meson build
infrastructure is also not being sufficiently careful about link
order, and trying to make it so would be equally fragile.

Hence, let's use a new scheme based on giving the libpq-exported
symbols different real names than the same functions exported from
libpgcommon.a or libpgcommon_srv.a.  (We could distinguish those
two cases as well, but there seems no need to.)  libpq gets the
official names to avoid an ABI break for libpq clients, while the
other cases use #define's to make the real names "xxx_private"
rather than "xxx".  By controlling where the #define's are
applied, we can force any particular client program to use one
set or the other of the encnames.c functions.

We cannot back-patch this, since it'd be an ABI break for backend
loadable modules, but there seems little need to.  We're just
trying to ensure that the world is safe for hypothetical future
additions to enum pg_enc.

In passing this should fix "duplicate symbol" linker warnings
that we've been seeing on AIX buildfarm members since commit
06843df4a.  It's not very clear why that linker is complaining
now, when there were strictly *more* duplicates visible before,
but in any case this should remove the reason for complaint.

Patch by me; thanks to Andres Freund for review.

Discussion: https://postgr.es/m/2385119.1696354473@sss.pgh.pa.us
2023-10-07 12:08:10 -04:00
Alvaro Herrera 1c99cde2f3
Improve JsonLexContext's freeability
Previously, the JSON code didn't have to worry too much about freeing
JsonLexContext, because it was never too long-lived.  With new features
being added for SQL/JSON this is no longer the case.  Add a routine
that knows how to free this struct and apply that to a few places, to
prevent this from becoming problematic.

At the same time, we change the API of makeJsonLexContextCstringLen to
make it receive a pointer to JsonLexContext for callers that want it to
be stack-allocated; it can also be passed as NULL to get the original
behavior of a palloc'ed one.

This also causes an ABI break due to the addition of flags to
JsonLexContext, so we can't easily backpatch it.  AFAICS that's not much
of a problem; apparently some leaks might exist in JSON usage of
text-search, for example via json_to_tsvector, but I haven't seen any
complaints about that.

Per Coverity complaint about datum_to_jsonb_internal().

Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql
2023-10-05 10:59:08 +02:00
Nathan Bossart c103d07381 Add function for removing arbitrary nodes in binaryheap.
This commit introduces binaryheap_remove_node(), which can be used
to remove any node from a binary heap.  The implementation is
straightforward.  The target node is replaced with the last node in
the heap, and then we sift as needed to preserve the heap property.
This new function is intended for use in a follow-up commit that
will improve the performance of pg_restore.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
2023-09-18 14:06:08 -07:00
Nathan Bossart 5af0263afd Make binaryheap available to frontend code.
There are a couple of places in frontend code that could make use
of this simple binary heap implementation.  This commit makes
binaryheap usable in frontend code, much like commit 26aaf97b68 did
for StringInfo.  Like StringInfo, the header file is left in lib/
to reduce the likelihood of unnecessary breakage.

The frontend version of binaryheap exposes a void *-based API since
frontend code does not have access to the Datum definitions.  This
seemed like a better approach than switching all existing uses to
void * or making the Datum definitions available to frontend code.

Reviewed-by: Tom Lane, Alvaro Herrera
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
2023-09-18 12:18:33 -07:00
Peter Eisentraut 9d17e5f16f Update Unicode data to Unicode 15.1.0 2023-09-18 07:26:34 +02:00
Peter Eisentraut 5c08927d36 Make Unicode script fit for future versions
Between Unicode 15.0.0 and 15.1.0, the whitespace in
EastAsianWidth.txt has changed a bit, such as from

0020;Na          # Zs         SPACE

to

0020           ; Na # Zs         SPACE

with space around the semicolon.  Adjust the script to be able to
parse that.
2023-09-18 07:25:46 +02:00
Nathan Bossart cccc6cdeb3 Add support for syncfs() in frontend support functions.
This commit adds support for using syncfs() in fsync_pgdata() and
fsync_dir_recurse() (which have been renamed to sync_pgdata() and
sync_dir_recurse()).  Like recovery_init_sync_method,
sync_pgdata() calls syncfs() for the data directory, each
tablespace, and pg_wal (if it is a symlink).  For now, all of the
frontend utilities that use these support functions are hard-coded
to use fsync(), but a follow-up commit will allow specifying
syncfs().

Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
2023-09-06 16:27:00 -07:00
Michael Paquier f1e9f6bbfa Avoid memory leak in rmtree() when path cannot be opened
An allocation done for the directory names to recurse into for their
deletion was done before OPENDIR(), so, assuming that a failure happens,
this could leak a bit of memory.

Author: Ranier Vilela
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CAEudQAoN3-2ZKBALThnEk_q2hu8En5A0WG9O+5siJTQKVZzoWQ@mail.gmail.com
2023-07-31 11:36:44 +09:00
Michael Paquier fa88928470 Generate automatically code and documentation related to wait events
The documentation and the code is generated automatically from a new
file called wait_event_names.txt, formatted in sections dedicated to
each wait event class (Timeout, Lock, IO, etc.) with three tab-separated
fields:
- C symbol in enums
- Format in the system views
- Description in the docs

Using this approach has several advantages, as we have proved to be
rather bad in maintaining this area of the tree across the years:
- The order of each item in the documentation and the code, which should
be alphabetical, has become incorrect multiple times, and the script
generating the code and documentation has a few rules to enforce that,
making the maintenance a no-brainer.
- Some wait events were added to the code, but not documented, so this
cannot be missed now.
- The order of the tables for each wait event class is enforced in the
documentation (the input .txt file does so as well for clarity, though
this is not mandatory).
- Less code, shaving 1.2k lines from the tree, with 1/3 of the savings
coming from the code, the rest from the documentation.

The wait event types "Lock" and "LWLock" still have their own code path
for their code, hence only the documentation is created for them.  These
classes are listed with a special marker called WAIT_EVENT_DOCONLY in
the input file.

Adding a new wait event now requires only an update of
wait_event_names.txt, with "Lock" and "LWLock" treated as exceptions.

This commit has been tested with configure/Makefile, the CI and VPATH
build.  clean, distclean and maintainer-clean were working fine.

Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
2023-07-05 10:53:11 +09:00
Andres Freund a1cd982098 meson: Add dependencies to perl modules to various script invocations
Eventually it is likely worth trying to deal with this in a more expansive
way, by generating dependency files generated within the scripts. But it's not
entirely obvious how to do that in perl and is work more suitable for 17
anyway.

Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Discussion: https://postgr.es/m/87v8g7s6bf.fsf@wibble.ilmari.org
2023-06-09 20:12:16 -07:00
Tom Lane d98ed080bb Fix small overestimation of base64 encoding output length.
pg_base64_enc_len() and its clones overestimated the output
length by up to 2 bytes, as a result of sloppy thinking about
where to divide.  No callers require a precise estimate, so
this has no consequences worse than palloc'ing a byte or two
more than necessary.  We might as well get it right though.

This bug is very ancient, dating to commit 79d78bb26 which
added encode.c.  (The other instances were presumably copied
from there.)  Still, it doesn't quite seem worth back-patching.

Oleg Tselebrovskiy

Discussion: https://postgr.es/m/f94da55286a63022150bc266afdab754@postgrespro.ru
2023-06-08 11:24:31 -04:00
Tom Lane 0245f8db36 Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

This set of diffs is a bit larger than typical.  We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop).  We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up.  Going
forward, that should make for fewer random-seeming changes to existing
code.

Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19 17:24:48 -04:00
Peter Eisentraut 8e7912e73d Message style improvements 2023-05-19 18:45:29 +02:00
Peter Eisentraut 009bd237bf Fix error message wordings
The original patch for percentrepl.c c96de2ce17 adopted the error
messages from basebackup_to_shell, but that uses terminology that
doesn't really fit with the new API naming.
2023-05-17 21:33:47 +02:00
Thomas Munro faeedbcefd Introduce PG_IO_ALIGN_SIZE and align all I/O buffers.
In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a
later commit, we need the addresses of user space buffers to be well
aligned.  The exact requirements vary by OS and file system (typically
sectors and/or memory pages).  The address alignment size is set to
4096, which is enough for currently known systems: it matches modern
sectors and common memory page size.  There is no standard governing
O_DIRECT's requirements so we might eventually have to reconsider this
with more information from the field or future systems.

Aligning I/O buffers on memory pages is also known to improve regular
buffered I/O performance.

Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap buffers are now allocated with the new palloc_aligned() or
MemoryContextAllocAligned() functions introduced by commit 439f6175.
(2) Stack buffers now use a new struct PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler.  (3) The buffer
pool is also aligned in shared memory.

WAL buffers were already aligned on XLOG_BLCKSZ.  It's possible for
XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus
for O_DIRECT WAL writes to fail to be well aligned, but that's a
pre-existing condition and will be addressed by a later commit.

BufFiles are not yet addressed (there's no current plan to use O_DIRECT
for those, but they could potentially get some incidental speedup even
in plain buffered I/O operations through better alignment).

If we can't align stack objects suitably using the compiler extensions
we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to
0.  This avoids the need to consider systems that have O_DIRECT but
can't align stack objects the way we want; such systems could in theory
be supported with more work but we don't currently know of any such
machines, so it's easier to pretend there is no O_DIRECT support
instead.  That's an existing and tested class of system.

Add assertions that all buffers passed into smgrread(), smgrwrite() and
smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack
alignment tricks may be unavailable) or the block size has been set too
small to allow arrays of buffers to be all aligned.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
2023-04-08 16:34:50 +12:00
Tomas Vondra 2820adf775 Support long distance matching for zstd compression
zstd compression supports a special mode for finding matched in distant
past, which may result in better compression ratio, at the expense of
using more memory (the window size is 128MB).

To enable this optional mode, use the "long" keyword when specifying the
compression method (--compress=zstd:long).

Author: Justin Pryzby
Reviewed-by: Tomas Vondra, Jacob Champion
Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com
Discussion: https://postgr.es/m/20220327205020.GM28503@telsasoft.com
2023-04-06 17:18:42 +02:00
Daniel Gustafsson b577743000 Make SCRAM iteration count configurable
Replace the hardcoded value with a GUC such that the iteration
count can be raised in order to increase protection against
brute-force attacks.  The hardcoded value for SCRAM iteration
count was defined to be 4096, which is taken from RFC 7677, so
set the default for the GUC to 4096 to match.  In RFC 7677 the
recommendation is at least 15000 iterations but 4096 is listed
as a SHOULD requirement given that it's estimated to yield a
0.5s processing time on a mobile handset of the time of RFC
writing (late 2015).

Raising the iteration count of SCRAM will make stored passwords
more resilient to brute-force attacks at a higher computational
cost during connection establishment.  Lowering the count will
reduce computational overhead during connections at the tradeoff
of reducing strength against brute-force attacks.

There are however platforms where even a modest iteration count
yields a too high computational overhead, with weaker password
encryption schemes chosen as a result.  In these situations,
SCRAM with a very low iteration count still gives benefits over
weaker schemes like md5, so we allow the iteration count to be
set to one at the low end.

The new GUC is intentionally generically named such that it can
be made to support future SCRAM standards should they emerge.
At that point the value can be made into key:value pairs with
an undefined key as a default which will be backwards compatible
with this.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se
2023-03-27 09:46:29 +02:00
Tom Lane 11a0a8b529 Implement find_my_exec()'s path normalization using realpath(3).
Replace the symlink-chasing logic in find_my_exec with realpath(3),
which has been required by POSIX since SUSv2.  (Windows lacks
realpath(), but there we can use _fullpath() which is functionally
equivalent.)  The main benefit of this is that -- on all modern
platforms at least -- realpath() avoids the chdir() shenanigans
we used to perform while interpreting symlinks.  That had various
corner-case failure modes so it's good to get rid of it.

There is still ongoing discussion about whether we could skip the
replacement of symlinks in some cases, but that's really matter
for a separate patch.  Meanwhile I want to push this before we get
too close to feature freeze, so that we can find out if there are
showstopper portability issues.

Discussion: https://postgr.es/m/797232.1662075573@sss.pgh.pa.us
2023-03-23 18:17:49 -04:00
Tom Lane b0d8f2d983 Add SHELL_ERROR and SHELL_EXIT_CODE magic variables to psql.
These are set after a \! command or a backtick substitution.
SHELL_ERROR is just "true" for error (nonzero exit status) or "false"
for success, while SHELL_EXIT_CODE records the actual exit status
following standard shell/system(3) conventions.

Corey Huinker, reviewed by Maxim Orlov and myself

Discussion: https://postgr.es/m/CADkLM=cWao2x2f+UDw15W1JkVFr_bsxfstw=NGea7r9m4j-7rQ@mail.gmail.com
2023-03-21 13:03:56 -04:00
Andres Freund 2b7259f855 Silence pedantic compiler warning introduced in ce340e530d
.../src/common/file_utils.c: In function ‘pg_pwrite_zeros’:
.../src/common/file_utils.c:543:9: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
  543 |         const static PGAlignedBlock zbuffer = {{0}};    /* worth BLCKSZ */
2023-03-16 09:41:13 -07:00
Tom Lane 25a7812cd0 Fix JSON error reporting for many cases of erroneous string values.
The majority of error exit cases in json_lex_string() failed to
set lex->token_terminator, causing problems for the error context
reporting code: it would see token_terminator less than token_start
and do something more or less nuts.  In v14 and up the end result
could be as bad as a crash in report_json_context().  Older
versions accidentally avoided that fate; but all versions produce
error context lines that are far less useful than intended,
because they'd stop at the end of the prior token instead of
continuing to where the actually-bad input is.

To fix, invent some macros that make it less notationally painful
to do the right thing.  Also add documentation about what the
function is actually required to do; and in >= v14, add an assertion
in report_json_context about token_terminator being sufficiently
far advanced.

Per report from Nikolay Shaplov.  Back-patch to all supported
versions.

Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
2023-03-13 15:19:00 -04:00
Peter Eisentraut 36ea345f8f Improve/correct comments
Change comments for pg_cryptohash_init(), pg_cryptohash_update(),
pg_cryptohash_final() in cryptohash.c to match cryptohash_openssl.c.
In particular, the claim that these functions were "designed" to never
fail was incorrect, since by design callers need to be prepared to
handle failures, for compatibility with the cryptohash_openssl.c
versions.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/301F4EDD-27B9-460F-B462-B9DB2BDE4ACF@yesql.se
2023-03-09 09:59:46 +01:00
Andres Freund 401874ab02 meson: don't require 'touch' binary, make use of 'cp' optional
We already didn't use touch (some earlier version of the meson build did ),
and cp is only used for updating unicode files. The latter already depends on
the optional availability of 'wget', so doing the same for 'cp' makes sense.

Eventually we probably want a portable command for updating source code as
part of a target, but for now...

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/70e96c34-64ee-e549-8c4a-f91a7a668804@dunslane.net
2023-03-07 18:44:42 -08:00
Michael Paquier d937904cce Silence -Wmissing-braces complaints in file_utils.c
Per buildfarm member lapwing, coupled with an offline poke from Julien
Rouhaud.

6392f2a was a similar case.
2023-03-07 07:42:36 +09:00
Michael Paquier ce340e530d Revise pg_pwrite_zeros()
The following changes are made to pg_write_zeros(), the API able to
write series of zeros using vectored I/O:
- Add of an "offset" parameter, to write the size from this position
(the 'p' of "pwrite" seems to mean position, though POSIX does not
outline ythat directly), hence the name of the routine is incorrect if
it is not able to handle offsets.
- Avoid memset() of "zbuffer" on every call.
- Avoid initialization of the whole IOV array if not needed.
- Group the trailing write() call with the main write() call,
simplifying the function logic.

Author: Andres Freund
Reviewed-by: Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/20230215005525.mrrlmqrxzjzhaipl@awork3.anarazel.de
2023-03-06 13:21:33 +09:00
Michael Paquier 2f6e15ac93 Revert refactoring of restore command code to shell_restore.c
This reverts commits 24c35ec and 57169ad.  PreRestoreCommand() and
PostRestoreCommand() need to be put closer to the system() call calling
a restore_command, as they enable in_restore_command for the startup
process which would in turn trigger an immediate proc_exit() in the
SIGTERM handler.  Perhaps we could get rid of this behavior entirely,
but 24c35ec has made the window where the flag is enabled much larger
than it was, and any Postgres-like actions (palloc, etc.) taken by code
paths while the flag is enabled could lead to more severe issues in the
shutdown processing.

Note that curculio has showed that there are much more problems in this
area, unrelated to this change, actually, hence the issues related to
that had better be addressed first.  Keeping the code of HEAD in line
with the stable branches should make that a bit easier.

Per discussion with Andres Freund and Nathan Bossart.

Discussion: https://postgr.es/m/Y979NR3U5VnWrTwB@paquier.xyz
2023-02-06 08:28:42 +09:00
Thomas Munro 54e72b66ed Refactor rmtree() to use get_dirent_type().
Switch to get_dirent_type() instead of lstat() while traversing a
directory tree, to see if that fixes the intermittent ENOTEMPTY failures
seen in recent pg_upgrade tests, on Windows CI.  While refactoring, also
use AllocateDir() instead of opendir() in the backend, which knows how
to handle descriptor pressure.

Our CI system currently uses Windows Server 2019, a version known not to
have POSIX unlink semantics enabled by default yet, unlike typical
Windows 10 and 11 systems.  That might explain why we see this flapping
on CI but (apparently) not in the build farm, though the frequency is
quite low.

The theory is that some directory entry must be in state
STATUS_DELETE_PENDING, which lstat() would report as ENOENT, though
unfortunately we don't know exactly why yet.  With this change, rmtree()
will not skip them, and try to unlink (again).  Our unlink() wrapper
should either wait a short time for them to go away when some other
process closes the handle, or log a message to tell us the path of the
problem file if not, so we can dig further.

Discussion: https://postgr.es/m/20220919213217.ptqfdlcc5idk5xup%40awork3.anarazel.de
2023-01-31 13:46:25 +13:00
David Rowley 9f1ca6ce65 Use appendStringInfoSpaces in more places
This adjusts a few places which were appending a string constant
containing spaces onto a StringInfo.  We have appendStringInfoSpaces for
that job, so let's use that instead.

For the change to jsonb.c's add_indent() function, appendStringInfoString
was being called inside a loop to append 4 spaces on each loop.  This
meant that enlargeStringInfo would get called once per loop.  Here it
should be much more efficient to get rid of the loop and just calculate
the number of spaces with "level * 4" and just append all the spaces in
one go.

Here we additionally adjust the appendStringInfoSpaces function so it
makes use of memset rather than a while loop to apply the required spaces
to the StringInfo.  One of the problems with the while loop was that it
was incrementing one variable and decrementing another variable once per
loop.  That's more work than what's required to get the job done.  We may
as well use memset for this rather than trying to optimize the existing
loop.  Some testing has shown memset is faster even for very small sizes.

Discussion: https://postgr.es/m/CAApHDvp_rKkvwudBKgBHniNRg67bzXVjyvVKfX0G2zS967K43A@mail.gmail.com
2023-01-20 13:07:24 +13:00
Michael Paquier 14bdb3f13d Refactor code for restoring files via shell commands
Presently, restore_command uses a different code path than
archive_cleanup_command and recovery_end_command.  These code paths
are similar and can be easily combined, as long as it is possible to
identify if a command should:
- Issue a FATAL on signal.
- Exit immediately on SIGTERM.

While on it, this removes src/common/archive.c and its associated
header.  Since the introduction of c96de2c, BuildRestoreCommand() has
become a simple wrapper of replace_percent_placeholders() able to call
make_native_path().  This simplifies shell_restore.c as long as
RestoreArchivedFile() includes a call to make_native_path().

Author: Nathan Bossart
Reviewed-by: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/20221227192449.GA3672473@nathanxps13
2023-01-18 11:15:48 +09:00
Peter Eisentraut 881fa869c6 Code cleanup
for commit c96de2ce17

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/20230111185434.GA1912982@nathanxps13
2023-01-12 07:37:39 +01:00
Peter Eisentraut c96de2ce17 Common function for percent placeholder replacement
There are a number of places where a shell command is constructed with
percent-placeholders (like %x).  It's cumbersome to have to open-code
this several times.  This factors out this logic into a separate
function.  This also allows us to ensure consistency for and document
some subtle behaviors, such as what to do with unrecognized
placeholders.

The unified handling is now that incorrect and unknown placeholders
are an error, where previously in most cases they were skipped or
ignored.  This affects the following settings:

- archive_cleanup_command
- archive_command
- recovery_end_command
- restore_command
- ssl_passphrase_command

The following settings are part of this refactoring but already had
stricter error handling and should be unchanged in their behavior:

- basebackup_to_shell.command

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5238bbed-0b01-83a6-d4b2-7eb0562a054e%40enterprisedb.com
2023-01-11 10:42:35 +01:00
Tom Lane 38d81760c4 Invent random_normal() to provide normally-distributed random numbers.
There is already a version of this in contrib/tablefunc, but it
seems sufficiently widely useful to justify having it in core.

Paul Ramsey

Discussion: https://postgr.es/m/CACowWR0DqHAvOKUCNxTrASFkWsDLqKMd6WiXvVvaWg4pV1BMnQ@mail.gmail.com
2023-01-09 12:44:00 -05:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Peter Eisentraut 1f605b82ba Change argument of appendBinaryStringInfo from char * to void *
There is some code that uses this function to assemble some kind of
packed binary layout, which requires a bunch of casts because of this.
Functions taking binary data plus length should take void * instead,
like memcpy() for example.

Discussion: https://www.postgresql.org/message-id/flat/a0086cfc-ff0f-2827-20fe-52b591d2666c%40enterprisedb.com
2022-12-30 11:05:09 +01:00
Peter Eisentraut 24b55cd949 Reorder some object files in makefiles
This restores some once-intended alphabetical orders and makes the
lists consistent between the different build systems.
2022-12-28 15:10:27 +01:00
Andrew Dunstan 8284cf5f74 Add copyright notices to meson files
Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
2022-12-20 07:54:39 -05:00
Michael Paquier b3bb7d12af Remove hardcoded dependency to cryptohash type in the internals of SCRAM
SCRAM_KEY_LEN was a variable used in the internal routines of SCRAM to
size a set of fixed-sized arrays used in the SHA and HMAC computations
during the SASL exchange or when building a SCRAM password.  This had a
hard dependency on SHA-256, reducing the flexibility of SCRAM when it
comes to the addition of more hash methods.  A second issue was that
SHA-256 is assumed as the cryptohash method to use all the time.

This commit renames SCRAM_KEY_LEN to a more generic SCRAM_KEY_MAX_LEN,
which is used as the size of the buffers used by the internal routines
of SCRAM.  This is aimed at tracking centrally the maximum size
necessary for all the hash methods supported by SCRAM.  A global
variable has the advantage of keeping the code in its simplest form,
reducing the need of more alloc/free logic for all the buffers used in
the hash calculations.

A second change is that the key length (SHA digest length) and hash
types are now tracked by the state data in the backend and the frontend,
the common portions being extended to handle these as arguments by the
internal routines of SCRAM.  There are a few RFC proposals floating
around to extend the SCRAM protocol, including some to use stronger
cryptohash algorithms, so this lifts some of the existing restrictions
in the code.

The code in charge of parsing and building SCRAM secrets is extended to
rely on the key length and on the cryptohash type used for the exchange,
assuming currently that only SHA-256 is supported for the moment.  Note
that the mock authentication simply enforces SHA-256.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Jonathan Katz
Discussion: https://postgr.es/m/Y5k3Qiweo/1g9CG6@paquier.xyz
2022-12-20 08:53:22 +09:00
Peter Eisentraut 75f49221c2 Static assertions cleanup
Because we added StaticAssertStmt() first before StaticAssertDecl(),
some uses as well as the instructions in c.h are now a bit backwards
from the "native" way static assertions are meant to be used in C.
This updates the guidance and moves some static assertions to better
places.

Specifically, since the addition of StaticAssertDecl(), we can put
static assertions at the file level.  This moves a number of static
assertions out of function bodies, where they might have been stuck
out of necessity, to perhaps better places at the file level or in
header files.

Also, when the static assertion appears in a position where a
declaration is allowed, then using StaticAssertDecl() is more native
than StaticAssertStmt().

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
2022-12-15 10:10:32 +01:00
Tom Lane c60c9badba Convert json_in and jsonb_in to report errors softly.
This requires a bit of further infrastructure-extension to allow
trapping errors reported by numeric_in and pg_unicode_to_server,
but otherwise it's pretty straightforward.

In the case of jsonb_in, we are only capturing errors reported
during the initial "parse" phase.  The value-construction phase
(JsonbValueToJsonb) can also throw errors if assorted implementation
limits are exceeded.  We should improve that, but it seems like a
separable project.

Andrew Dunstan and Tom Lane

Discussion: https://postgr.es/m/3bac9841-fe07-713d-fa42-606c225567d6@dunslane.net
2022-12-11 11:28:15 -05:00
Tom Lane 50428a301d Change JsonSemAction to allow non-throw error reporting.
Formerly, semantic action functions for the JSON parser returned void,
so that there was no way for them to affect the parser's behavior.
That means in particular that they can't force an error exit except by
longjmp'ing.  That won't do in the context of our project to make input
functions return errors softly.  Hence, change them to return the same
JsonParseErrorType enum value as the parser itself uses.  If an action
function returns anything besides JSON_SUCCESS, the parse is abandoned
and that error code is returned.

Action functions can thus easily return the same error conditions that
the parser already knows about.  As an escape hatch for expansion, also
invent a code JSON_SEM_ACTION_FAILED that the core parser does not know
the exact meaning of.  When returning this code, an action function
must use some out-of-band mechanism for reporting the error details.

This commit simply makes the API change and causes all the existing
action functions to return JSON_SUCCESS, so that there is no actual
change in behavior here.  This is long enough and boring enough that
it seemed best to commit it separately from the changes that make
real use of the new mechanism.

In passing, remove a duplicate assignment of
transform_string_values_scalar.

Discussion: https://postgr.es/m/1436686.1670701118@sss.pgh.pa.us
2022-12-11 10:39:05 -05:00
Andres Freund 5bdd0cfb91 meson: Add basic PGXS compatibility
Generate a Makefile.global that's complete enough for PGXS to work for some
extensions. It is likely that this compatibility layer will not suffice for
every extension and not all platforms - we can expand it over time.

This allows extensions to use a single buildsystem across all the supported
postgres versions. Once all supported PG versions support meson, we can remove
the compatibility layer.

Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6clig6@awork3.anarazel.de
2022-12-06 18:56:46 -08:00
Michael Paquier d18655cc03 Refactor code parsing compression option values (-Z/--compress)
This commit moves the code in charge of deparsing the method and detail
strings fed later to parse_compress_specification() to a common routine,
where the backward-compatible case of only an integer being found (N
= 0 => "none", N > 1 => gzip at level N) is handled.

Note that this has a side-effect for pg_basebackup, as we now attempt to
detect "server-" and "client-" before checking for the integer-only
pre-14 grammar, where values like server-N and client-N (without the
follow-up detail string) are now valid rather than failing because of an
unsupported method name.  Past grammars are still handled the same way,
but these flavors are now authorized, and would now switch to consider N
= 0 as no compression and N > 1 as gzip with the compression level used
as N, with the caller still controlling if the compression method should
be done server-side, client-side or is unspecified.  The documentation
of pg_basebackup is updated to reflect that.

This benefits other code paths that would like to rely on the same logic
as pg_basebackup and pg_receivewal with option values used for
compression specifications, one area discussed lately being pg_dump.

Author: Georgios Kokolatos, Michael Paquier
Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
2022-11-30 09:34:32 +09:00
Peter Eisentraut 2fe3bdbd69 Check return value of pclose() correctly
Some callers didn't check the return value of pclose() or
ClosePipeStream() correctly.  Either they didn't check it at all or
they treated it like the return of fclose().

The correct way is to first check whether the return value is -1, and
then report errno, and then check the return value like a result from
system(), for which we already have wait_result_to_str() to make it
simpler.  To make this more compact, expand wait_result_to_str() to
also handle -1 explicitly.

Reviewed-by: Ankit Kumar Pandey <itsankitkp@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8cd9fb02-bc26-65f1-a809-b1cb360eef73@enterprisedb.com
2022-11-15 15:36:51 +01:00
Michael Paquier 3bdbdf5d06 Introduce pg_pwrite_zeros() in fileutils.c
This routine is designed to write zeros to a file using vectored I/O,
for a size given by its caller, being useful when it comes to
initializing a file with a final size already known.

XLogFileInitInternal() in xlog.c is changed to use this new routine when
initializing WAL segments with zeros (wal_init_zero enabled).  Note that
the aligned buffers used for the vectored I/O writes have a size of
XLOG_BLCKSZ, and not BLCKSZ anymore, as pg_pwrite_zeros() relies on
PGAlignedBlock while xlog.c originally used PGAlignedXLogBlock.

This routine will be used in a follow-up patch to do the pre-padding of
WAL segments for pg_receivewal and pg_basebackup when these are not
compressed.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Andres Freund, Thomas Munro, Michael
Paquier
Discussion: https://www.postgresql.org/message-id/CALj2ACUq7nAb7%3DbJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA%40mail.gmail.com
2022-11-08 12:23:46 +09:00
Peter Eisentraut b1099eca8f Remove AssertArg and AssertState
These don't offer anything over plain Assert, and their usage had
already been declared obsolescent.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
2022-10-28 09:19:06 +02:00
Michael Paquier 4ab8c81bd9 Move pg_pwritev_with_retry() to src/common/file_utils.c
This commit moves pg_pwritev_with_retry(), a convenience wrapper of
pg_writev() able to handle partial writes, to common/file_utils.c so
that the frontend code is able to use it.  A first use-case targetted
for this routine is pg_basebackup and pg_receivewal, for the
zero-padding of a newly-initialized WAL segment.  This is used currently
in the backend when the GUC wal_init_zero is enabled (default).

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Thomas Munro
Discussion: https://postgr.es/m/CALj2ACUq7nAb7=bJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA@mail.gmail.com
2022-10-27 14:39:42 +09:00
Andres Freund e5555657ba meson: Add support for building with precompiled headers
This substantially speeds up building for windows, due to the vast amount of
headers included via windows.h. A cross build from linux targetting mingw goes
from

994.11user 136.43system 0:31.58elapsed 3579%CPU
to
422.41user 89.05system 0:14.35elapsed 3562%CPU

The wins on windows are similar-ish (but I don't have a system at hand just
now for actual numbers). Targetting other operating systems the wins are far
smaller (tested linux, macOS, FreeBSD).

For now precompiled headers are disabled by default, it's not clear how well
they work on all platforms. E.g. on FreeBSD gcc doesn't seem to have working
support, but clang does.

When doing a full build precompiled headers are only beneficial for targets
with multiple .c files, as meson builds a separate precompiled header for each
target (so that different compilation options take effect). This commit
therefore only changes target with at least two .c files to use precompiled
headers.

Because this commit adds b_pch=false to the default_options new build
directories will have precompiled headers disabled by default, however
existing build directories will continue use the default value of b_pch, which
is true.

Note that using precompiled headers with ccache requires setting
CCACHE_SLOPPINESS=pch_defines,time_macros to get hits.

Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKG+50eOUbN++ocDc0Qnp9Pvmou23DSXu=ZA6fepOcftKqA@mail.gmail.com
Discussion: https://postgr.es/m/c5736f70-bb6d-8d25-e35c-e3d886e4e905@enterprisedb.com
Discussion: https://postgr.es/m/20190826054000.GE7005%40paquier.xyz
2022-10-06 17:19:30 -07:00
Alvaro Herrera d84a7b290f
Change some errdetail() to errdetail_internal()
This prevents marking the argument string for translation for gettext,
and it also prevents the given string (which is already translated) from
being translated at runtime.

Also, mark the strings used as arguments to check_rolespec_name for
translation.

Backpatch all the way back as appropriate.  None of this is caught by
any tests (necessarily so), so I verified it manually.
2022-09-28 17:14:53 +02:00
Robert Haas a448e49bcb Revert 56-bit relfilenode change and follow-up commits.
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
2022-09-28 09:55:28 -04:00
Robert Haas 05d4cbf9b6 Increase width of RelFileNumbers from 32 bits to 56 bits.
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.

If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.

This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.

Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.

Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
2022-09-27 13:25:21 -04:00
Peter Eisentraut 26f7802beb Message style improvements 2022-09-24 18:41:25 -04:00
Michael Paquier 18ac08f0b4 Use min/max bounds defined by Zstd for compression level
The bounds hardcoded in compression.c since ffd5365 (minimum at 1 and
maximum at 22) do not match the reality of what zstd is able to
handle, these values being available via ZSTD_maxCLevel() and
ZSTD_minCLevel() at run-time.  The maximum of 22 is actually correct
in recent versions, but the minimum was not as the library can go down
to -131720 by design.  This commit changes the code to use the run-time
values in the code instead of some hardcoded ones.

Zstd seems to assume that these bounds could change in the future, and
Postgres will be able to adapt automatically to such changes thanks to
what's being done in this commit.

Reported-by: Justin Prysby
Discussion: https://postgr.es/m/20220922033716.GL31833@telsasoft.com
Backpatch-through: 15
2022-09-22 20:02:40 +09:00
Andres Freund e6927270cd meson: Add initial version of meson based build system
Autoconf is showing its age, fewer and fewer contributors know how to wrangle
it. Recursive make has a lot of hard to resolve dependency issues and slow
incremental rebuilds. Our home-grown MSVC build system is hard to maintain for
developers not using Windows and runs tests serially. While these and other
issues could individually be addressed with incremental improvements, together
they seem best addressed by moving to a more modern build system.

After evaluating different build system choices, we chose to use meson, to a
good degree based on the adoption by other open source projects.

We decided that it's more realistic to commit a relatively early version of
the new build system and mature it in tree.

This commit adds an initial version of a meson based build system. It supports
building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD,
Solaris and Windows (however only gcc is supported on aix, solaris). For
Windows/MSVC postgres can now be built with ninja (faster, particularly for
incremental builds) and msbuild (supporting the visual studio GUI, but
building slower).

Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM
bitcode generation, documentation adjustments) are done in subsequent commits
requiring further review. Other aspects (e.g. not installing test-only
extensions) are not yet addressed.

When building on Windows with msbuild, builds are slower when using a visual
studio version older than 2019, because those versions do not support
MultiToolTask, required by meson for intra-target parallelism.

The plan is to remove the MSVC specific build system in src/tools/msvc soon
after reaching feature parity. However, we're not planning to remove the
autoconf/make build system in the near future. Likely we're going to keep at
least the parts required for PGXS to keep working around until all supported
versions build with meson.

Some initial help for postgres developers is at
https://wiki.postgresql.org/wiki/Meson

With contributions from Thomas Munro, John Naylor, Stone Tickle and others.

Author: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-21 22:37:17 -07:00
Michael Paquier f352e2d08a Simplify handling of compression level with compression specifications
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic, and instead the compression level is always
assigned with each library's default if nothing is directly given.  This
centralizes the checks on the compression methods supported by a given
build, and always assigns a default compression level when parsing a
compression specification.  This results in complaining at an earlier
stage than previously if a build supports a compression method or not,
aka when parsing a specification in the backend or the frontend, and not
when processing it.  zstd, lz4 and zlib are able to handle in their
respective routines setting up the compression level the case of a
default value, hence the backend or frontend code (pg_receivewal or
pg_basebackup) has now no need to know what the default compression
level should be if nothing is specified: the logic is now done so as the
specification parsing assigns it.  It can also be enforced by passing
down a "level" set to the default value, that the backend will accept
(the replication protocol is for example able to handle a command like
BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')).

This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as of pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default level.
The origin of the confusion comes from the handling of the default
compression level of gzip (-1 or Z_DEFAULT_COMPRESSION) and the value of
0 was getting assigned, which is what walmethods.c would consider
as equivalent to no compression when streaming WAL segments with its tar
methods.  Assigning always the compression level removes the confusion
of some code paths considering a value of 0 set in a specification as
either no compression or a default compression level.

Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip.  This is a result of the
code simplification, as gzip specifications cannot be used if a build
does not support it.

Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
2022-09-14 12:16:57 +09:00
Peter Eisentraut 45b1a67a0f pg_clean_ascii(): escape bytes rather than lose them
Rather than replace each unprintable byte with a '?' character, replace
it with a hex escape instead. The API now allocates a copy rather than
modifying the input in place.

Author: Jacob Champion <jchampion@timescale.com>
Discussion: https://www.postgresql.org/message-id/CAAWbhmgsvHrH9wLU2kYc3pOi1KSenHSLAHBbCVmmddW6-mc_=w@mail.gmail.com
2022-09-13 16:10:44 +02:00
John Naylor 0bd9c62973 Treat Unicode codepoints of category "Format" as non-spacing
Commit d8594d123 updated the list of non-spacing codepoints used
for calculating display width, but in doing so inadvertently removed
some, since the script used for that commit only considered combining
characters.

For complete coverage for zero-width characters, include codepoints in
the category Cf (Format). To reflect the wider purpose, also rename files
and update comments that referred specifically to combining characters.

Some of these ranges have been missing since v12, but due to lack of
field complaints it was determined not important enough to justify adding
special-case logic the backbranches.

Kyotaro Horiguchi

Report by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRBE8yvpQ0FSkPCoe0Ny1jAAsAQ6j3qMgVwWvkqAoaaNmQ%40mail.gmail.com
2022-09-13 16:13:33 +07:00
Peter Eisentraut 5015e1e1b5 Assorted examples of expanded type-safer palloc/pg_malloc API
This adds some uses of the new palloc/pg_malloc variants here and
there as a demonstration and test.  This is kept separate from the
actual API patch, since the latter might be backpatched at some point.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
2022-09-12 08:45:03 +02:00
Michael Paquier 47bd0b3fa6 Replace load of functions by direct calls for some WIN32
This commit changes the following code paths to do direct system calls
to some WIN32 functions rather than loading them from an external
library, shaving some code in the process:
- Creation of restricted tokens in pg_ctl.c, introduced by a25cd81.
- QuerySecurityContextToken() in auth.c for SSPI authentication in the
backend, introduced in d602592.
- CreateRestrictedToken() in src/common/.  This change is similar to the
case of pg_ctl.c.

Most of these functions were loaded rather than directly called because,
as mentioned in the code comments, MinGW headers were not declaring
them.  I have double-checked the recent MinGW code, and all the
functions changed here are declared in its headers, so this change
should be safe.  Note that I do not have a MinGW environment at hand so
I have not tested it directly, but that MSVC was fine with the change.
The buildfarm will tell soon enough if this change is appropriate or not
for a much broader set of environments.

A few code paths still use GetProcAddress() to load some functions:
- LDAP authentication for ldap_start_tls_sA(), where I am not confident
that this change would work.
- win32env.c and win32ntdll.c where we have a per-MSVC version
dependency for the name of the library loaded.
- crashdump.c for MiniDumpWriteDump() and EnumDirTree(), where direct
calls were not able to work after testing.

Reported-by: Thomas Munro
Reviewed-by: Justin Prysby
Discussion: https://postgr.es/m/CA+hUKG+BMdcaCe=P-EjMoLTCr3zrrzqbcVE=8h5LyNsSVHKXZA@mail.gmail.com
2022-09-09 10:52:17 +09:00
John Naylor 0a8de93a48 Speed up lexing of long JSON strings
Use optimized linear search when looking ahead for end quotes,
backslashes, and non-printable characters. This results in nearly 40%
faster JSON parsing on x86-64 when most values are long strings, and
all platforms should see some improvement.

Reviewed by Andres Freund and Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
2022-09-02 09:36:22 +07:00
Tom Lane 7fed801135 Clean up inconsistent use of fflush().
More than twenty years ago (79fcde48b), we hacked the postmaster
to avoid a core-dump on systems that didn't support fflush(NULL).
We've mostly, though not completely, hewed to that rule ever since.
But such systems are surely gone in the wild, so in the spirit of
cleaning out no-longer-needed portability hacks let's get rid of
multiple per-file fflush() calls in favor of using fflush(NULL).

Also, we were fairly inconsistent about whether to fflush() before
popen() and system() calls.  While we've received no bug reports
about that, it seems likely that at least some of these call sites
are at risk of odd behavior, such as error messages appearing in
an unexpected order.  Rather than expend a lot of brain cells
figuring out which places are at hazard, let's just establish a
uniform coding rule that we should fflush(NULL) before these calls.
A no-op fflush() is surely of trivial cost compared to launching
a sub-process via a shell; while if it's not a no-op then we likely
need it.

Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
2022-08-29 13:55:41 -04:00
John Naylor 121d2d3d70 Use SSE2 in is_valid_ascii() where available.
Per flame graph from Jelte Fennema, COPY FROM ... USING BINARY shows
input validation taking at least 5% of the profile, so it's worth trying
to be more efficient here. With this change, validation of pure ASCII is
nearly 40% faster on contemporary Intel hardware. To make this change
legible and easier to adopt to additional architectures, use helper
functions to abstract the platform details away.

Reviewed by Nathan Bossart

Discussion: https://www.postgresql.org/message-id/CAFBsxsG%3Dk8t%3DC457FXnoBXb%3D8iA4OaZkbFogFMachWif7mNnww%40mail.gmail.com
2022-08-26 15:48:49 +07:00
Thomas Munro c981879814 Don't bother to set sockaddr_un.sun_len.
It's not necessary to fill in sun_len when calling bind() or connect(),
on all known systems that have it.

Discussion: https://postgr.es/m/2781112.1644819528%40sss.pgh.pa.us
2022-08-24 00:09:37 +12:00
Thomas Munro 64ef572c06 Remove configure probes for sockaddr_storage members.
Remove four probes for members of sockaddr_storage.  Keep only the probe
for sockaddr's sa_len, which is enough for our two remaining places that
know about _len fields:

1.  ifaddr.c needs to know if sockaddr has sa_len to understand the
result of ioctl(SIOCGIFCONF).  Only AIX is still using the relevant code
today, but it seems like a good idea to keep it compilable on Linux.

2.  ip.c was testing for presence of ss_len to decide whether to fill in
sun_len in our getaddrinfo_unix() function.  It's just as good to test
for sa_len.  If you have one, you have them all.

(The code in #2 isn't actually needed at all on several OSes I checked
since modern versions ignore sa_len on input to system calls.  Proving
that's the case for all relevant OSes is left for another day, but
wouldn't get rid of that last probe anyway if we still want it for #1.)

Discussion: https://postgr.es/m/CA%2BhUKGJJjF2AqdU_Aug5n2MAc1gr%3DGykNjVBZq%2Bd6Jrcp3Dyvg%40mail.gmail.com
2022-08-22 17:50:30 +12:00
Thomas Munro 2492fe49dc Remove configure probe for netinet/tcp.h.
<netinet/tcp.h> is in SUSv3 and all targeted Unix systems have it.
For Windows, we can provide a stub include file, to avoid some #ifdef
noise.

Discussion: https://postgr.es/m/CA+hUKGKErNfhmvb_H0UprEmp4LPzGN06yR2_0tYikjzB-2ECMw@mail.gmail.com
2022-08-18 16:31:11 +12:00
Thomas Munro f558088285 Remove HAVE_UNIX_SOCKETS.
Since HAVE_UNIX_SOCKETS is now defined unconditionally, remove the macro
and drop a small amount of dead code.

The last known systems not to have them (as far as I know at least) were
QNX, which we de-supported years ago, and Windows, which now has them.

If a new OS ever shows up with the POSIX sockets API but without working
AF_UNIX, it'll presumably still be able to compile the code, and fail at
runtime with an unsupported address family error.  We might want to
consider adding a HINT that you should turn off the option to use it if
your network stack doesn't support it at that point, but it doesn't seem
worth making the relevant code conditional at compile time.

Also adjust a couple of places in the docs and comments that referred to
builds without Unix-domain sockets, since there aren't any.  Windows
still gets a special mention in those places, though, because we don't
try to use them by default there yet.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
2022-08-14 08:46:53 +12:00
Thomas Munro 5fc88c5d53 Replace pgwin32_is_junction() with lstat().
Now that lstat() reports junction points with S_IFLNK/S_ISLINK(), and
unlink() can unlink them, there is no need for conditional code for
Windows in a few places.  That was expressed by testing for WIN32 or
S_ISLNK, which we can now constant-fold.

The coding around pgwin32_is_junction() was a bit suspect anyway, as we
never checked for errors, and we also know that errors can be spuriously
reported because of transient sharing violations on this OS.  The
lstat()-based code has handling for that.

This also reverts 4fc6b6ee on master only.  That was done because
lstat() didn't previously work for symlinks (junction points), but now
it does.

Tested-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
2022-08-06 12:50:59 +12:00
Thomas Munro 2b1f580ee2 Remove configure probes for symlink/readlink, and dead code.
symlink() and readlink() are in SUSv2 and all targeted Unix systems have
them.  We have partial emulation on Windows.  Code that raised runtime
errors on systems without it has been dead for years, so we can remove
that and also references to such systems in the documentation.

Define HAVE_READLINK and HAVE_SYMLINK macros on Unix.  Our Windows
replacement functions based on junction points can't be used for
relative paths or for non-directories, so the macros can be used to
check for full symlink support.  The places that deal with tablespaces
can just use symlink functions without checking the macros.  (If they
did check the macros, they'd need to provide an #else branch with a
runtime or compile time error, and it'd be dead code.)

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 09:22:56 +12:00
Thomas Munro 4fc6b6eefc Fix get_dirent_type() for symlinks on MinGW/MSYS.
On Windows with MSVC, get_dirent_type() was recently made to return
DT_LNK for junction points by commit 9d3444dc, which fixed some
defective dirent.c code.

On Windows with Cygwin, get_dirent_type() already worked for symlinks,
as it does on POSIX systems, because Cygwin has its own fake symlinks
that behave like POSIX (on closer inspection, Cygwin's dirent has the
BSD d_type extension but it's probably always DT_UNKNOWN, so we fall
back to lstat(), which understands Cygwin symlinks with S_ISLNK()).

On Windows with MinGW/MSYS, we need extra code, because the MinGW
runtime has its own readdir() without d_type, and the lstat()-based
fallback has no knowledge of our convention for treating junctions as
symlinks.

Back-patch to 14, where get_dirent_type() landed.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/b9ddf605-6b36-f90d-7c30-7b3e95c46276%40dunslane.net
2022-07-28 14:26:12 +12:00
Andres Freund c8a9246e09 Add output directory argument to generate-unicode_norm_table.pl
This is in preparation for building postgres with meson / ninja.

When building with meson, commands are run at the root of the build tree. Add
an option to put build output into the appropriate place.

Author: Andres Freund <andres@anarazel.de>
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/5e216522-ba3c-f0e6-7f97-5276d0270029@enterprisedb.com
2022-07-18 12:24:39 -07:00
Peter Eisentraut 9fd45870c1 Replace many MemSet calls with struct initialization
This replaces all MemSet() calls with struct initialization where that
is easily and obviously possible.  (For example, some cases have to
worry about padding bits, so I left those.)

(The same could be done with appropriate memset() calls, but this
patch is part of an effort to phase out MemSet(), so it doesn't touch
memset() calls.)

Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com
2022-07-16 08:50:49 +02:00
Tom Lane 920072339f Improve error reporting from validate_exec().
validate_exec() didn't guarantee to set errno to something appropriate
after a failure, leading to callers not being able to print an on-point
message.  Improve that.

Noted by Kyotaro Horiguchi, though this isn't exactly his proposal.

Discussion: https://postgr.es/m/20220615.131403.1791191615590453058.horikyota.ntt@gmail.com
2022-07-12 15:37:39 -04:00
John Naylor d3117fc1a3 Fix out-of-bounds read in json_lex_string
Commit 3838fa269 added a lookahead loop to allow building strings multiple
bytes at a time. This loop could exit because it reached the end of input,
yet did not check for that before checking if we reached the end of a
valid string. To fix, put the end of string check back in the outer loop.

Per Valgrind animal skink
2022-07-12 11:25:47 +07:00
John Naylor 3838fa269c Build de-escaped JSON strings in larger chunks during lexing
During COPY BINARY with large JSONB blobs, it was found that half
the time was spent parsing JSON, with much of that spent in separate
appendStringInfoChar() calls for each input byte.

Add lookahead loop to json_lex_string() to allow batching multiple bytes
via appendBinaryStringInfo(). Also use this same logic when de-escaping
is not done, to avoid code duplication.

Report and proof of concept patch by Jelte Fennema, reworked by Andres
Freund and John Naylor

Discussion: https://www.postgresql.org/message-id/CAGECzQQuXbies_nKgSiYifZUjBk6nOf2%3DTSXqRjj2BhUh8CTeA%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/PR3PR83MB0476F098CBCF68AF7A1CA89FF7B49@PR3PR83MB0476.EURPRD83.prod.outlook.com
2022-07-11 11:11:36 +07:00
John Naylor 3de359f18f Simplify json lexing state
Instead of updating the length as we go, use a const pointer to end of
the input, which we know already at the start.

This simplifies the coding and may improve performance slightly, but
the real motivation for doing this is to make further changes in this
area easier to reason about.

Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
2022-07-08 14:53:20 +07:00
Robert Haas b0a55e4329 Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.

Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".

Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.

On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.

Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.

Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.

Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 11:39:09 -04:00
Peter Eisentraut 02c408e21a Remove redundant null pointer checks before free()
Per applicable standards, free() with a null pointer is a no-op.
Systems that don't observe that are ancient and no longer relevant.
Some PostgreSQL code already required this behavior, so this change
does not introduce any new requirements, just makes the code more
consistent.

Discussion: https://www.postgresql.org/message-id/flat/dac5d2d0-98f5-94d9-8e69-46da2413593d%40enterprisedb.com
2022-07-03 11:47:15 +02:00
Peter Eisentraut a8cca6026e logging: Also add the command prefix to detail and hint messages
This makes the output line up better and allows filtering messages by
command.

Discussion: https://www.postgresql.org/message-id/ba6d4fac-9e33-91f9-94fb-1e4c144a48b9@enterprisedb.com
2022-05-30 07:26:06 +02:00
Tom Lane 23e7b38bfe Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
2022-05-12 15:17:30 -04:00
Daniel Gustafsson 17ec5fa502 Clear the OpenSSL error queue before cryptohash operations
Setting up an EVP context for ciphers banned under FIPS generate
two OpenSSL errors in the queue, and as we only consume one from
the queue the other is at the head for the next invocation:

  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: unsupported
  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: initialization error

Clearing the error queue when creating the context ensures that
we don't pull in an error from an earlier operation.

Discussion: https://postgr.es/m/C89D932C-501E-4473-9750-638CFCD9095E@yesql.se
2022-05-06 14:41:31 +02:00
Andrew Dunstan b787c554c2 Inhibit mingw CRT's auto-globbing of command line arguments
For some reason by default the mingw C Runtime takes it upon itself to
expand program arguments that look like shell globbing characters. That
has caused much scratching of heads and mis-attribution of the causes of
some TAP test failures, so stop doing that.

This removes an inconsistency with Windows binaries built with MSVC,
which have no such behaviour.

Per suggestion from Noah Misch.

Backpatch to all live branches.

Discussion: https://postgr.es/m/20220423025927.GA1274057@rfd.leadboat.com
2022-04-25 15:47:55 -04:00
Tom Lane 587de223f0 Add missing error handling in pg_md5_hash().
It failed to provide an error string as expected for the
admittedly-unlikely case of OOM in pg_cryptohash_create().
Also, make it initialize *errstr to NULL for success,
as pg_md5_binary() does.

Also add missing comments.  Readers should not have to
reverse-engineer the API spec for a publicly visible routine.
2022-04-18 20:04:55 -04:00
Alvaro Herrera 24d2b2680a
Remove extraneous blank lines before block-closing braces
These are useless and distracting.  We wouldn't have written the code
with them to begin with, so there's no reason to keep them.

Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-13 19:16:02 +02:00
Tom Lane 2c9381840f Remove not-very-useful early checks of __pg_log_level in logging.h.
Enforce __pg_log_level message filtering centrally in logging.c,
instead of relying on the calling macros to do it.  This is more
reliable (e.g. it works correctly for direct calls to pg_log_generic)
and it saves a percent or so of total code size because we get rid of
so many duplicate checks of __pg_log_level.

This does mean that argument expressions in a logging macro will be
evaluated even if we end up not printing anything.  That seems of
little concern for INFO and higher levels as those messages are printed
by default, and most of our frontend programs don't even offer a way to
turn them off.  I left the unlikely() checks in place for DEBUG
messages, though.

Discussion: https://postgr.es/m/3993549.1649449609@sss.pgh.pa.us
2022-04-12 13:25:29 -04:00
Michael Paquier a4b57543ac Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.

pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup.  This cleanup needs to be done
before releasing a beta of 15.  pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.

Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 13:38:54 +09:00
Peter Eisentraut c215cc7b61 Add color support for new frontend detail/hint messages
As before, the defaults are similar to gcc's default appearance.
2022-04-11 17:36:44 +02:00
Tom Lane 9a374b77fb Improve frontend error logging style.
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless.  This mostly involves
s/pg_log_fatal/pg_log_error/g.

Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1).  Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.

Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.

Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.

Patch by me.  Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.

Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
2022-04-08 14:55:14 -04:00