We have various requirements when using a dlist_head to keep track of the
number of items in the list. This, traditionally, has been done by
maintaining a counter variable in the calling code. Here we tidy this up
by adding "dclist", which is very similar to dlist but also keeps track of
the number of items stored in the list.
Callers may use the new dclist_count() function when they need to know how
many items are stored. Obtaining the count is an O(1) operation.
For simplicity reasons, dclist and dlist both use dlist_node as their node
type and dlist_iter/dlist_mutable_iter as their iterator type. dclists
have all of the same functionality as dlists except there is no function
named dclist_delete(). To remove an item from a list dclist_delete_from()
must be used. This requires knowing which dclist the given item is stored
in.
Additionally, here we also convert some dlists where additional code
exists to keep track of the number of items stored and to make these use
dclists instead.
Author: David Rowley
Reviewed-by: Bharath Rupireddy, Aleksander Alekseev
Discussion: https://postgr.es/m/CAApHDvrtVxr+FXEX0VbViCFKDGxA3tWDgw9oFewNXCJMmwLjLg@mail.gmail.com
This is similar to 7d25958, and this commit takes care of all the
remaining inconsistencies between the initial value used in the C
variable associated to a GUC and its default value stored in the GUC
tables (as of pg_settings.boot_val).
Some of the initial values of the GUCs updated rely on a compile-time
default. These are refactored so as the GUC table and its C declaration
use the same values. This makes everything consistent with other
places, backend_flush_after, bgwriter_flush_after, port,
checkpoint_flush_after doing so already, for example.
Extracted from a larger patch by Peter Smith. The spots updated in the
modules are from me.
Author: Peter Smith, Michael Paquier
Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby
Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com
Here we add a new 'copy' parameter to tuplesort_getdatum so that we can
instruct the function not to datumCopy() byref Datums before returning.
Similar to 91e9e89dc, this can provide significant performance
improvements in nodeSort when sorting by a single byref column and the
sort's targetlist contains only that column.
This allows us to re-enable Datum sorts for byref types which was disabled
in 3a5817695 due to a reported memory leak.
Additionally, here we slightly optimize DISTINCT aggregates so that we no
longer perform any datumCopy() when we find the current value not to be
distinct from the previous value. Previously the code would always take a
copy of the most recent Datum and pfree the previous value, even when the
values were the same. Testing shows a small but noticeable performance
increase when aggregate transitions are skipped due to the current
transition value being the same as the prior one.
Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqS6wC5U==k9Hd26E4EQXH3QR67-T4=Q1rQ36NGvjfVSg@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvqHonfe9G1cVaKeHbDx70R_zCrM3qP2AGXpGrieSKGnhA@mail.gmail.com
This commit moves pg_pwritev_with_retry(), a convenience wrapper of
pg_writev() able to handle partial writes, to common/file_utils.c so
that the frontend code is able to use it. A first use-case targetted
for this routine is pg_basebackup and pg_receivewal, for the
zero-padding of a newly-initialized WAL segment. This is used currently
in the backend when the GUC wal_init_zero is enabled (default).
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Thomas Munro
Discussion: https://postgr.es/m/CALj2ACUq7nAb7=bJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA@mail.gmail.com
These numbers are strictly-monotone identifiers assigned to each rule
of pg_hba_file_rules and each map of pg_ident_file_mappings when loading
the HBA and ident configuration files, indicating the order in which
they are checked at authentication time, until a match is found.
With only one file loaded currently, this is equivalent to the line
numbers assigned to the entries loaded if one wants to know their order,
but this becomes mandatory once the inclusion of external files is
added to the HBA and ident files to be able to know in which order the
rules and/or maps are applied at authentication. Note that NULL is used
when a HBA or ident entry cannot be parsed or validated, aka when an
error exists, contrary to the line number.
Bump catalog version.
Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
This has the advantage to limit the presence of the GUC values
hba_file and ident_file to the code paths where these files are loaded,
easing the introduction of an upcoming feature aimed at adding inclusion
logic for files and directories in HBA and ident files.
Note that this needs the addition of the source file name to HbaLine, in
addition to the line number, which is something needed by the backend in
two places of auth.c (authentication failure details and auth_id log
when log_connections is enabled).
While on it, adjust a log generated on authentication failure to report
the name of the actual HBA file on which the connection attempt matched,
where the line number and the raw line written in the HBA file were
already included. This was previously hardcoded as pg_hba.conf, which
would be incorrect when a custom value is used at postmaster startup for
the GUC hba_file.
Extracted from a larger patch by the same author.
Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
Specifically, when pg_basebackup is invoked with -Tx=y, don't error
out if x could plausibly be an absolute path either on Windows or on
non-Windows systems. We don't know whether the remote system is
running the same OS as the local system, so it's not appropriate to
assume that our local rule about absolute pathnames is the same as
the rule on the remote system.
Patch by me, reviewed by Tom Lane, Andrew Dunstan, and
Davinder Singh.
Discussion: http://postgr.es/m/CA+TgmoY+jC3YiskomvYKDPK3FbrmsDU7_8+wMHt02HOdJeRb0g@mail.gmail.com
AuthToken gains a regular expression, and IdentLine is changed so as it
uses an AuthToken rather than tracking separately the ident user string
used for the regex compilation and its generated regex_t. In the case
of pg_ident.conf, a set of AuthTokens is built in the pre-parsing phase
of the file, and an extra regular expression is compiled when building
the list of IdentLines, after checking the sanity of the fields in a
pre-parsed entry.
The logic in charge of computing and executing regular expressions is
now done in a new set of routines called respectively
regcomp_auth_token() and regexec_auth_token() that are wrappers around
pg_regcomp() and pg_regexec(), working on AuthTokens. While on it, this
patch adds a routine able to free an AuthToken, free_auth_token(), to
simplify a bit the logic around the requirement of using a specific free
routine for computed regular expressions. Note that there are no
functional or behavior changes introduced by this commit.
The goal of this patch is to ease the use of regular expressions with
more items of pg_hba.conf (user list, database list, potentially
hostnames) where AuthTokens are used extensively. This will be tackled
later in a separate patch.
Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/fff0d7c1-8ad4-76a1-9db3-0ab6ec338bf7@amazon.com
These routines have been renamed in a19e5ce. There is no need to keep
the compatibility declarations on HEAD, as once an extension moves to
the new routine name when compiling with v16~ the code would work the
same way when recompiled on v15. No backpatch to v15 for this one,
because ABI compatibility has to be maintained there.
Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de
Per discussion, the existing routine name able to initialize a SRF
function with materialize mode is unpopular, so rename it. Equally, the
flags of this function are renamed, as of:
- SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC
- SRF_SINGLE_BLESS -> MAT_SRF_BLESS
The previous function and flags introduced in 9e98583 are kept around
for compatibility purposes, so as any extension code already compiled
with v15 continues to work as-is. The declarations introduced here for
compatibility will be removed from HEAD in a follow-up commit.
The new names have been suggested by Andres Freund and Melanie
Plageman.
Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de
Backpatch-through: 15
When creating a cast that uses a conversion function, we've
historically allowed the input and result types to be
binary-compatible with the function's input and result types,
rather than necessarily being identical. This means that the new
cast is logically dependent on the binary-compatible cast or casts
that it references: if those are defined by pg_cast entries, and you
try to restore the new cast without having defined them, it'll fail.
Hence, we should make pg_depend entries to record these dependencies
so that pg_dump knows that there is an ordering requirement.
This is not the only place where we allow such shortcuts; aggregate
functions for example are similarly lax, and in principle should gain
similar dependencies. However, for now it seems sufficient to fix
the cast-versus-cast case, as pg_dump's other ordering heuristics
should keep it out of trouble for other object types.
Per report from David Turoň; thanks also to Robert Haas for
preliminary investigation. I considered back-patching, but
seeing that this issue has existed for many years without
previous reports, it's not clear it's worth the trouble.
Moreover, back-patching wouldn't be enough to ensure that the
new pg_depend entries exist in existing databases anyway.
Discussion: https://postgr.es/m/OF0A160F3E.578B15D1-ONC12588DA.003E4857-C12588DA.0045A428@notes.linuxbox.cz
In the latest version of Apple's macOS SDK, <sys/socket.h>
fails to compile if "REF" is #define'd as something.
Apple may or may not agree that this is a bug, and even if
they do accept the bug report I filed, they probably won't
fix it very quickly. In the meantime, our back branches will all
fail to compile gram.y. v15 and HEAD currently escape the problem
thanks to the refactoring done in 98e93a1fc, but that's purely
accidental. Moreover, since that patch removed a widely-visible
inclusion of <netdb.h>, back-patching it seems too likely to break
third-party code.
Instead, change the token's code name to REF_P, following our usual
convention for naming parser tokens that are likely to have symbol
conflicts. The effects of that should be localized to the grammar
and immediately surrounding files, so it seems like a safer answer.
Per project policy that we want to keep recently-out-of-support
branches buildable on modern systems, back-patch all the way to 9.2.
Discussion: https://postgr.es/m/1803927.1665938411@sss.pgh.pa.us
It can be useful to know when a relation has last been used, e.g., when
evaluating whether an index is still required. It was already possible to
infer the time of the last usage by tracking, e.g.,
pg_stat_all_indexes.idx_scan over time. But far from everybody does so.
To make it easier to detect the last time a relation has been scanned, track
that time in each relation's pgstat entry. To minimize overhead a) the
timestamp is updated only when the backend pending stats entry is flushed to
shared stats b) the last transaction's stop timestamp is used as the
timestamp.
Bumps catalog and stats format versions.
Author: Dave Page <dpage@pgadmin.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://postgr.es/m/CA+OCxozrVHNFVEPkweUHMZje+t1tfY816d9MZYc6eZwOOusOaQ@mail.gmail.com
The previous patch made addition of new GUCs cheap, but other GUC
operations aren't improved and indeed get a bit slower, because
hash_seq_search() is slower than just scanning a pointer array.
However, most performance-critical GUC operations only need
to touch a relatively small fraction of the GUCs; especially
so for AtEOXact_GUC(). We can improve matters at the cost
of a bit more space by adding dlist or slist links to the
GUC data structures. This patch invents lists that track
(1) all GUCs with non-default "source";
(2) all GUCs with nonempty state stack (implying they've
been changed in the current transaction);
(3) all GUCs due for reporting to the client.
All of guc.c's performance-critical cases can make use of one or
another of these lists to avoid searching the whole hash table.
In particular, the stack list means that transaction end
doesn't take time proportional to the number of GUCs, but
only to the number changed in the current transaction.
Discussion: https://postgr.es/m/2982579.1662416866@sss.pgh.pa.us
This gets rid of bsearch() in favor of hashed lookup. The main
advantage is that it becomes far cheaper to add new GUCs, since
we needn't re-sort the pointer array. Adding N new GUCs had
been O(N^2 log N), but now it's closer to O(N). We need to
sort only in SHOW ALL and equivalent functions, which are
hopefully not performance-critical to anybody.
Also, merge GetNumConfigOptions() into get_guc_variables(),
because in a world where the set of GUCs isn't fairly static
you really want to consider those two results as tied together
not independent.
Discussion: https://postgr.es/m/2982579.1662416866@sss.pgh.pa.us
The only real argument for using malloc directly was that we needed
the ability to not throw error on OOM; but mcxt.c grew that feature
awhile ago.
Keeping the data in a memory context improves accountability and
debuggability --- for example, without this it's almost impossible
to detect memory leaks in the GUC code with anything less costly
than valgrind. Moreover, the next patch in this series will add a
hash table for GUC lookup, and it'd be pretty silly to be using
palloc-dependent hash facilities alongside malloc'd storage of the
underlying data.
This is a bit invasive though, in particular causing an API break
for GUC check hooks that want to modify the GUC's value or use an
"extra" data structure. They must now use guc_malloc() and
guc_free() instead of malloc() and free(). Failure to change
affected code will result in assertion failures or worse; but
thanks to recent effort in the mcxt infrastructure, it shouldn't
be too hard to diagnose such oversights (at least in assert-enabled
builds).
One note is that this changes ParseLongOption() to return short-lived
palloc'd not malloc'd data. There wasn't any caller for which the
previous definition was better.
Discussion: https://postgr.es/m/2982579.1662416866@sss.pgh.pa.us
We lack a version of repalloc() that supports MCXT_ALLOC_NO_OOM
semantics, so invent repalloc_extended() with the usual set of
flags. repalloc_huge() becomes a legacy wrapper for that.
Also, fix dynahash.c so that it can support HASH_ENTER_NULL
requests when using the default palloc-based allocator.
The only reason it didn't do that already was the lack of the
MCXT_ALLOC_NO_OOM option when that code was written, ages ago.
While here, simplify a few overcomplicated tests in mcxt.c.
Discussion: https://postgr.es/m/2982579.1662416866@sss.pgh.pa.us
Commit 3d956d956 allowed the COPY, but it's done by inserting individual
rows to the foreign table, so it can be inefficient due to the overhead
caused by each round-trip to the foreign server. To improve performance
of the COPY in such a case, this patch allows batch insertion, by
extending the multi-insert machinery in CopyFrom() to the foreign-table
case so that we insert multiple rows to the foreign table at once using
the FDW callback routine added by commit b663a4136. This patch also
allows this for postgres_fdw. It is enabled by the "batch_size" option
added by commit b663a4136, which is disabled by default.
When doing batch insertion, we update progress of the COPY command after
performing the FDW callback routine, to count rows not suppressed by the
FDW as well as a BEFORE ROW INSERT trigger. For consistency, this patch
changes the timing of updating it for plain tables: previously, we
updated it immediately after adding each row to the multi-insert buffer,
but we do so only after writing the rows stored in the buffer out to the
table using table_multi_insert(), which I think would be consistent even
with non-batching mode, because in that mode we update it after writing
each row out to the table using table_tuple_insert().
Andrey Lepikhov, heavily revised by me, with review from Ian Barwick,
Andrey Lepikhov, and Zhihong Yu.
Discussion: https://postgr.es/m/bc489202-9855-7550-d64c-ad2d83c24867%40postgrespro.ru
This file needs xlogreader.h only for the XLogReaderState typedef; but
we can dodge that by forward-declaring it. Many files use xlog.h for
reasons other than reading WAL, and it's not good to force all those
files to include xlogreader.h, so take it out.
Surprisingly, there is no fallout in core code from making this change.
Perhaps external code will have to start including xlogreader.h.
Make a common replication origin name formatting function to replace
multiple snprintf() expressions. This also includes logic previously done
by ReplicationOriginNameForTablesync().
This makes the code to generate the origin name consistent among apply
worker and tablesync worker.
Author: Peter Smith
Reviewed-By: Aleksander Alekseev
Discussion: https://postgr.es/m/CAHut%2BPsa8hhfSE6ozUK-ih7GkQziAVAf4f3bqiXEj2nQiu-43g%40mail.gmail.com
This is useful as a way for extensions to process COPY TO rows in the
way they see fit (say auditing, analytics, backend, etc.) without the
need to invoke an external process running as the OS user running the
backend through PROGRAM that requires superuser rights. COPY FROM
already provides a similar callback for logical replication. For COPY
TO, the callback is triggered when we are ready to send a row in
CopySendEndOfRow(), which is the same code path as when sending a row
to a frontend or a pipe/file.
A small test module, test_copy_callbacks, is added to provide some
coverage for this facility.
Author: Bilva Sanaba, Nathan Bossart
Discussion: https://postgr.es/m/253C21D1-FCEB-41D9-A2AF-E6517015B7D7@amazon.com
Remove the Trap and TrapMacro macros, which were nearly unused
and confusingly had the opposite condition polarity from the
otherwise-functionally-equivalent Assert macros.
Having done that, it's very hard to justify carrying the errorType
argument of ExceptionalCondition, so drop that too, and just
let it assume everything's an Assert. This saves about 64K
of code space as of current HEAD.
Discussion: https://postgr.es/m/3928703.1665345117@sss.pgh.pa.us
Instead of Abs() for int64, use the C standard functions labs() or
llabs() as appropriate. Define a small wrapper around them that
matches our definition of int64. (labs() is C90, llabs() is C99.)
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/4beb42b5-216b-bce8-d452-d924d5794c63%40enterprisedb.com
Previously PgStat_StatReplSlotEntry contained the slotname, which was mainly
used when writing out the stats during shutdown, to identify the slot in the
serialized data (at runtime the index in ReplicationSlotCtl->replication_slots
is used, but that can change during a restart). Unfortunately the slotname was
overwritten when the slot's stats were reset.
That turned out to only cause "real" problems if the slot was active during
the reset, triggering an assertion failure at the next
pgstat_report_replslot(). In other paths the stats were re-initialized during
pgstat_acquire_replslot().
Fix this by removing slotname from PgStat_StatReplSlotEntry. Instead we can
get the slot's name from the slot itself. Besides fixing a bug, this also is
architecturally cleaner (a name is not really statistics). This is safe
because stats, for a slot removed while shut down, will not be restored at
startup.
In 15 the slotname is not removed, but renamed, to avoid changing the stats
format. In master, bump PGSTAT_FILE_FORMAT_ID.
This commit does not contain a test for the fix. I think this can only be
tested by a tap test starting pg_recvlogical in the background and checking
pg_recvlogical's output. That type of test is notoriously hard to be reliable,
so committing it shortly before the release is wrapped seems like a bad idea.
Reported-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/YxfagaTXUNa9ggLb@ahch-to
Backpatch: 15-, where the bug was introduced in 5891c7a8ed
Commit c6e0fe1f2 was a shade too trusting that any pointer passed
to pfree, repalloc, etc will point at a valid chunk. Notably,
passing a pointer that was actually obtained from malloc tended
to result in obscure assertion failures, if not worse. (On FreeBSD
I've seen such mistakes take down the entire cluster, seemingly as
a result of clobbering shared memory.)
To improve matters, extend the mcxt_methods[] array so that it
has entries for every possible MemoryContextMethodID bit-pattern,
with the currently unassigned ID codes pointing to error-reporting
functions. Then, fiddle with the ID assignments so that patterns
likely to be associated with bad pointers aren't valid ID codes.
In particular, we should avoid assigning bit patterns 000 (zeroed
memory) and 111 (wipe_mem'd memory).
It turns out that on glibc (Linux), malloc uses chunk headers that
have flag bits in the same place we keep MemoryContextMethodID,
and that the bit patterns 000, 001, 010 are the only ones we'll
see as long as the backend isn't threaded. So we can have very
robust detection of pfree'ing a malloc-assigned block on that
platform, at least so long as we can refrain from using up those
ID codes. On other platforms, we don't have such a good guarantee,
but keeping 000 reserved will be enough to catch many such cases.
While here, make GetMemoryChunkMethodID() local to mcxt.c, as there
seems no need for it to be exposed even in memutils_internal.h.
Patch by me, with suggestions from Andres Freund and David Rowley.
Discussion: https://postgr.es/m/2910981.1665080361@sss.pgh.pa.us
This substantially speeds up building for windows, due to the vast amount of
headers included via windows.h. A cross build from linux targetting mingw goes
from
994.11user 136.43system 0:31.58elapsed 3579%CPU
to
422.41user 89.05system 0:14.35elapsed 3562%CPU
The wins on windows are similar-ish (but I don't have a system at hand just
now for actual numbers). Targetting other operating systems the wins are far
smaller (tested linux, macOS, FreeBSD).
For now precompiled headers are disabled by default, it's not clear how well
they work on all platforms. E.g. on FreeBSD gcc doesn't seem to have working
support, but clang does.
When doing a full build precompiled headers are only beneficial for targets
with multiple .c files, as meson builds a separate precompiled header for each
target (so that different compilation options take effect). This commit
therefore only changes target with at least two .c files to use precompiled
headers.
Because this commit adds b_pch=false to the default_options new build
directories will have precompiled headers disabled by default, however
existing build directories will continue use the default value of b_pch, which
is true.
Note that using precompiled headers with ccache requires setting
CCACHE_SLOPPINESS=pch_defines,time_macros to get hits.
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKG+50eOUbN++ocDc0Qnp9Pvmou23DSXu=ZA6fepOcftKqA@mail.gmail.com
Discussion: https://postgr.es/m/c5736f70-bb6d-8d25-e35c-e3d886e4e905@enterprisedb.com
Discussion: https://postgr.es/m/20190826054000.GE7005%40paquier.xyz
MemoryContextContains is no longer reliable in the wake of c6e0fe1f2,
because there's no longer very much redundancy in chunk headers.
(It wasn't *completely* reliable even before that, as there was a
chance of a false positive if you passed it something that didn't
point to an mcxt chunk at all. But it was generally good enough.)
Hence, remove it. There is no remaining core code that requires it.
Extensions that have been using it might be able to substitute a
test like "GetMemoryChunkContext(ptr) == context", recognizing that
this explicitly requires that the pointer point to some chunk.
Tom Lane and David Rowley
Discussion: https://postgr.es/m/1913788.1664898906@sss.pgh.pa.us
ts_locale.c omitted support for "isalnum" tests, perhaps on the
grounds that there were initially no use-cases for that. However,
both ltree and pg_trgm need such tests, and we do also have one
use-case now in the core backend. The workaround of testing
isalpha and isdigit separately seems quite inefficient, especially
when dealing with multibyte characters; so let's fill in the
missing support.
Discussion: https://postgr.es/m/2548310.1664999615@sss.pgh.pa.us
This optional parameter can be specified in cases where there are nested
PG_TRY() statements within a function in order to stop the compiler from
issuing warnings about shadowed local variables when compiling with
-Wshadow. The optional parameter is used as a suffix on the variable
names declared within the PG_TRY(), PG_CATCH(), PG_FINALLY() and
PG_END_TRY() macros. The parameter, if specified, must be the same in
each component macro of the given PG_TRY() block.
This also adjusts the single case where we have nested PG_TRY() statements
to add a parameter to the inner-most PG_TRY().
This reduces the number of compiler warnings when compiling with
-Wshadow=compatible-local from 5 down to 1.
Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqWGMdB_pATeUqE=JCtNqNxObPOJ00jFEa2_sZ20j_Wvg@mail.gmail.com
In a similar effort to f01592f91, here we mostly rename shadowed local
variables to remove the warnings produced when compiling with
-Wshadow=compatible-local.
This fixes 63 warnings and leaves just 5.
Author: Justin Pryzby, David Rowley
Reviewed-by: Justin Pryzby
Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
This reverts commit db0d67db24 and
several follow-on fixes. The idea of making a cost-based choice
of the order of the sorting columns is not fundamentally unsound,
but it requires cost information and data statistics that we don't
really have. For example, relying on procost to distinguish the
relative costs of different sort comparators is pretty pointless
so long as most such comparator functions are labeled with cost 1.0.
Moreover, estimating the number of comparisons done by Quicksort
requires more than just an estimate of the number of distinct values
in the input: you also need some idea of the sizes of the larger
groups, if you want an estimate that's good to better than a factor of
three or so. That's data that's often unknown or not very reliable.
Worse, to arrive at estimates of the number of calls made to the
lower-order-column comparison functions, the code needs to make
estimates of the numbers of distinct values of multiple columns,
which are necessarily even less trustworthy than per-column stats.
Even if all the inputs are perfectly reliable, the cost algorithm
as-implemented cannot offer useful information about how to order
sorting columns beyond the point at which the average group size
is estimated to drop to 1.
Close inspection of the code added by db0d67db2 shows that there
are also multiple small bugs. These could have been fixed, but
there's not much point if we don't trust the estimates to be
accurate in-principle.
Finally, the changes in cost_sort's behavior made for very large
changes (often a factor of 2 or so) in the cost estimates for all
sorting operations, not only those for multi-column GROUP BY.
That naturally changes plan choices in many situations, and there's
precious little evidence to show that the changes are for the better.
Given the above doubts about whether the new estimates are really
trustworthy, it's hard to summon much confidence that these changes
are better on the average.
Since we're hard up against the release deadline for v15, let's
revert these changes for now. We can always try again later.
Note: in v15, I left T_PathKeyInfo in place in nodes.h even though
it's unreferenced. Removing it would be an ABI break, and it seems
a bit late in the release cycle for that.
Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
Up to now, the ID values returned by pg_stat_get_backend_idset() and
used by pg_stat_get_backend_activity() and allied functions were just
indexes into a local array of sessions seen by the last stats refresh.
This is problematic for a few reasons. The "ID" of a session can vary
over its existence, which is surprising. Also, while these numbers
often match the "backend ID" used for purposes like temp schema
assignment, that isn't reliably true. We can fairly cheaply switch
things around to make these numbers actually be the sessions' backend
IDs. The added test case illustrates that with this definition, the
temp schema used by a given session can be obtained given its PID.
While here, delete some dead code that guarded against getting
a NULL return from pgstat_fetch_stat_local_beentry(). That can't
happen as long as the caller is careful to pass an in-range array
index, as all the callers are. (This code may not have been dead
when written, but it surely is now.)
Nathan Bossart
Discussion: https://postgr.es/m/20220815205811.GA250990@nathanxps13
SYSTEM_USER is a reserved keyword of the SQL specification that,
roughly described, is aimed at reporting some information about the
system user who has connected to the database server. It may include
implementation-specific information about the means by the user
connected, like an authentication method.
This commit implements SYSTEM_USER as of auth_method:identity, where
"auth_method" is a keyword about the authentication method used to log
into the server (like peer, md5, scram-sha-256, gss, etc.) and
"identity" is the authentication identity as introduced by 9afffcb (peer
sets authn to the OS user name, gss to the user principal, etc.). This
format has been suggested by Tom Lane.
Note that thanks to d951052, SYSTEM_USER is available to parallel
workers.
Bump catalog version.
Author: Bertrand Drouvot
Reviewed-by: Jacob Champion, Joe Conway, Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/7e692b8c-0b11-45db-1cad-3afc5b57409f@amazon.com
We'd like to use precompiled headers on windows to reduce compile times. Right
now we rely on defining UMDF_USING_NTSTATUS before including postgres.h in a few
select places - which doesn't work with precompiled headers. Instead define
it globally.
When UMDF_USING_NTSTATUS is defined we need to explicitly include ntstatus.h,
winternl.h to get a comparable set of symbols. Right now these includes would
be required in a number of non-platform-specific .c files - to avoid that,
include them in win32_port.h. Based on my measurements that doesn't increase
compile times measurably.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220927011951.j3h4o7n6bhf7dwau@awork3.anarazel.de
Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of
the pg_ prefixes where we use pread(), pwrite() and vectored variants.
We dropped support for ancient Unixes where we needed to use lseek() to
implement replacements for those, but it turns out that Windows also
changes the current position even when you pass in an offset to
ReadFile() and WriteFile() if the file handle is synchronous, despite
its documentation saying otherwise.
Switching to asynchronous file handles would fix that, but have other
complications. For now let's just put back the pg_ prefix and add some
comments to highlight the non-standard side-effect, which we can now
describe as Windows-only.
Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
SharedInvalSmgrMsg can't require 8-byte alignment, because then
SharedInvalidationMessage will require 8-byte alignment, which will
then cause ParseCommitRecord to fail on machines that are picky
about alignment, because it assumes that everything that gets
packed into a commit record requires only 4-byte alignment.
Another problem with 05d4cbf9b6.
Discussion: http://postgr.es/m/3825454.1664310917@sss.pgh.pa.us
The previous macro implementations just cast the argument to a target
type but did not check whether the input type was appropriate. The
function implementation can do better type checking of the input type.
For the *GetDatumFast() macros, converting to an inline function
doesn't work in the !USE_FLOAT8_BYVAL case, but we can use
AssertVariableIsOfTypeMacro() to get a similar level of type checking.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.
If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.
This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.
Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Previously, these were declared in postgres_ext.h, but they are not
needed nearly so widely as the OID declarations, so that doesn't
necessarily make sense. Also, because postgres_ext.h is included
before most of c.h has been processed, the previous location creates
some problems for a pending patch.
Patch by me, reviewed by Dilip Kumar.
Discussion: http://postgr.es/m/CA+TgmoYc8oevMqRokZQ4y_6aRn-7XQny1JBr5DyWR_jiFtONHw@mail.gmail.com
Push the units fields over to the left so that all the single-bit
flags can be together. I considered rearranging the single-bit
flags to try to group flags with similar purposes, but eventually
decided that that involved too many judgment calls.
Discussion: https://postgr.es/m/17385-9ee529fb091f0ce5@postgresql.org
Previously, the transaction-property GUCs such as transaction_isolation
could be reset after starting a transaction, because we marked them
as GUC_NO_RESET_ALL but still allowed a targeted RESET. That leads to
assertion failures or worse, because those properties aren't supposed
to change after we've acquired a transaction snapshot.
There are some NO_RESET_ALL variables for which RESET is okay, so
we can't just redefine the semantics of that flag. Instead introduce
a separate GUC_NO_RESET flag. Mark "seed", as well as the transaction
property GUCs, as GUC_NO_RESET.
We have to disallow GUC_ACTION_SAVE as well as straight RESET, because
otherwise a function having a "SET transaction_isolation" clause can
still break things: the end-of-function restore action is equivalent
to a RESET.
No back-patch, as it's conceivable that someone is doing something
this patch will forbid (like resetting one of these GUCs at transaction
start, or "CREATE FUNCTION ... SET transaction_read_only = 1") and not
running into problems with it today. Given how long we've had this
issue and not noticed, the side effects in non-assert builds can't be
too serious.
Per bug #17385 from Andrew Bille.
Masahiko Sawada
Discussion: https://postgr.es/m/17385-9ee529fb091f0ce5@postgresql.org
This was used as the returned result type of the generated contents for
the backup_label and backup history files. This is replaced by a simple
string, reducing the cleanup burden of all the callers of
build_backup_content().
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/YzERvNPaZivHEKZJ@paquier.xyz
This change simplifies some of the logic related to the generation and
creation of the backup_label and backup history files, which has become
unnecessarily complicated since the removal of the exclusive backup mode
in commit 39969e2. The code was previously generating the contents of
these files as a string (start phase for the backup_label and stop phase
for the backup history file), one problem being that the contents of the
backup_label string were scanned to grab some of its internal contents
at the stop phase.
This commit changes the logic so as we store the data required to build
these files in an intermediate structure named BackupState. The
backup_label file and backup history file strings are generated when
they are ready to be sent back to the client. Both files are now
generated with the same code path. While on it, this commit renames
some variables for clarity.
Two new files named xlogbackup.{c,h} are introduced in this commit, to
remove from xlog.c some of the logic around base backups. Note that
more could be moved to this new set of files.
Author: Bharath Rupireddy, Michael Paquier
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CALj2ACXWwTDgJqCjdaPyfR7djwm6SrybGcrZyrvojzcsmt4FFw@mail.gmail.com
The node types A_Const, Constraint, and A_Expr had custom output
functions, but no read functions were implemented so far.
The A_Expr output format had to be tweaked a bit to make it easier to
parse.
Be a bit more cautious about applying strncmp to unterminated strings.
Also error out if an unrecognized enum value is found in each case,
instead of just printing a placeholder value. That was maybe ok for
debugging but won't work if we want to have robust round-tripping.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/4159834.1657405226@sss.pgh.pa.us
Autoconf is showing its age, fewer and fewer contributors know how to wrangle
it. Recursive make has a lot of hard to resolve dependency issues and slow
incremental rebuilds. Our home-grown MSVC build system is hard to maintain for
developers not using Windows and runs tests serially. While these and other
issues could individually be addressed with incremental improvements, together
they seem best addressed by moving to a more modern build system.
After evaluating different build system choices, we chose to use meson, to a
good degree based on the adoption by other open source projects.
We decided that it's more realistic to commit a relatively early version of
the new build system and mature it in tree.
This commit adds an initial version of a meson based build system. It supports
building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD,
Solaris and Windows (however only gcc is supported on aix, solaris). For
Windows/MSVC postgres can now be built with ninja (faster, particularly for
incremental builds) and msbuild (supporting the visual studio GUI, but
building slower).
Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM
bitcode generation, documentation adjustments) are done in subsequent commits
requiring further review. Other aspects (e.g. not installing test-only
extensions) are not yet addressed.
When building on Windows with msbuild, builds are slower when using a visual
studio version older than 2019, because those versions do not support
MultiToolTask, required by meson for intra-target parallelism.
The plan is to remove the MSVC specific build system in src/tools/msvc soon
after reaching feature parity. However, we're not planning to remove the
autoconf/make build system in the near future. Likely we're going to keep at
least the parts required for PGXS to keep working around until all supported
versions build with meson.
Some initial help for postgres developers is at
https://wiki.postgresql.org/wiki/Meson
With contributions from Thomas Munro, John Naylor, Stone Tickle and others.
Author: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
Make sure that function declarations use names that exactly match the
corresponding names from function definitions for several "lexer
adjacent" backend functions.
These functions were missed by recent commits because they were obscured
by clang-tidy warnings about functions whose signature is directly under
the control of the lexer (flex seems to always generate function
declarations with unnamed parameters). We probably can't fix most of
the warnings it generates for translation units that get built from .l
and .y files, but we can at least do this much.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Previously the following snprintf() wrappers:
* ReplicationSlotNameForTablesync()
* ReplicationOriginNameForTablesync()
... used int as a second argument of snprintf() while the actual type of it
is size_t. Although it doesn't fail at present better replace it with Size
for consistency with the rest of the system.
Author: Aleksander Alekseev
Reviewed-By: Peter Smith
Discussion: https://postgr.es/m/CAHut%2BPsa8hhfSE6ozUK-ih7GkQziAVAf4f3bqiXEj2nQiu-43g%40mail.gmail.com
Visual Studio 2015+ has support for a macro to control the alignement of
structures as of __declspec(align(#)), and this commit adds a definition
of pg_attribute_aligned() based on that. It happens that this was
already used in the implementation of atomics for MSVC. Note that there
is still no definition fo pg_attribute_packed(), so this does not impact
itemptr.h.
Author: James Coleman
Discussion: https://postgr.es/m/CAAaqYe-HbtZvR3msoMtk+hYW2S0e0OapzMW8icSMYTMA+mN8Aw@mail.gmail.com
expression_tree_walker and allied functions have traditionally
declared their callback functions as, say, "bool (*walker) ()"
to allow for variation in the declared types of the callback
functions' context argument. This is apparently going to be
forbidden by the next version of the C standard, and the latest
version of clang warns about that. In any case it's always
been pretty poor for error-detection purposes, so fixing it is
a good thing to do.
What we want to do is change the callback argument declarations to
be like "bool (*walker) (Node *node, void *context)", which is
correct so far as expression_tree_walker and friends are concerned,
but not change the actual callback functions. Strict compliance with
the C standard would require changing them to declare their arguments
as "void *context" and then cast to the appropriate context struct
type internally. That'd be very invasive and it would also introduce
a bunch of opportunities for future bugs, since we'd no longer have
any check that the correct sort of context object is passed by outside
callers or internal recursion cases. Therefore, we're just going
to ignore the standard's position that "void *" isn't necessarily
compatible with struct pointers. No machine built in the last forty
or so years actually behaves that way, so it's not worth introducing
bug hazards for compatibility with long-dead hardware.
Therefore, to silence these compiler warnings, introduce a layer of
macro wrappers that cast the supplied function name to the official
argument type. Thanks to our use of -Wcast-function-type, this will
still produce a warning if the supplied function is seriously
incompatible with the required signature, without going as far as
the official spec restriction does.
This method fixes the problem without any need for source code changes
outside nodeFuncs.h/.c. However, it is an ABI break because the
physically called functions now have names ending in "_impl". Hence
we can only fix it this way in HEAD. In the back branches, we'll have
to settle for disabling -Wdeprecated-non-prototype.
Discussion: https://postgr.es/m/CA+hUKGKpHPDTv67Y+s6yiC8KH5OXeDg6a-twWo_xznKTcG0kSA@mail.gmail.com
Fix selfuncs.h cpluspluscheck complaint, without reintroducing a
parameter name inconsistency (restore the original declaration names,
and then make corresponding function definitions consistent with that).
Oversight in commit a601366a.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Andres Freund <andres@anarazel.de>
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in optimizer, parser,
utility, libpq, and "commands" code, as well as in remaining library
code. Do the same for all code related to frontend programs (with the
exception of pg_dump/pg_dumpall related code).
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy. Later commits will handle
ecpg and pg_dump/pg_dumpall.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in storage, catalog,
access method, executor, and logical replication code, as well as in
miscellaneous utility/library code.
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy. Later commits will do the
same for other parts of the codebase.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Make sure that function declarations use names that exactly match the
corresponding names from function definitions. Having parameter names
that are reliably consistent in this way will make it easier to reason
about groups of related C functions from the same translation unit as a
module. It will also make certain refactoring tasks easier.
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy. Later commits will do the
same for other parts of the codebase.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Make regex code consistently use named parameters in function
declarations. Also make sure that parameter names from each function's
declaration match corresponding definition parameter names.
This makes Henry Spencer's regex code follow Postgres coding standards.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
The API contract for planstate_tree_walker() callbacks is that they
take a PlanState pointer and a context pointer. Somebody figured
they could save a couple lines of code by ignoring that, and passing
ExecShutdownNode itself as the walker even though it has but one
argument. Somewhat remarkably, we've gotten away with that so far.
However, it seems clear that the upcoming C2x standard means to
forbid such cases, and compilers that actively break such code
likely won't be far behind. So spend the extra few lines of code
to do it honestly with a separate walker function.
In HEAD, we might as well go further and remove ExecShutdownNode's
useless return value. I left that as-is in back branches though,
to forestall complaints about ABI breakage.
Back-patch, with the thought that this might become of practical
importance before our stable branches are all out of service.
It doesn't seem to be fixing any live bug on any currently known
platform, however.
Discussion: https://postgr.es/m/208054.1663534665@sss.pgh.pa.us
Make reorderbuffer.h function declarations consistently use named
parameters. Also make sure that the declarations use names that match
corresponding names from function definitions in reorderbuffer.c. This
makes the definitions easier to follow, especially in the case of
functions that happen to have adjoining arguments of the same type.
This patch was written with help from clang-tidy. Specifically, its
"readability-inconsistent-declaration-parameter-name" check and its
"readability-named-parameter" check were used.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/3955318.1663377656@sss.pgh.pa.us
The function has a bool argument named "case_insensitive", but that was
spelled "case_sensitive" in the declaration. Make them consistent now
to avoid confusion in the future.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Backpatch: 10-
For some reason we'd never decorated pg_v*printf() with
pg_attribute_printf() annotations. There is a convention for
how to label va_list-using printf functions (write zero for the
second argument), and we use that liberally elsewhere in the
code, but these core functions lacked it. It's not clear how
much useful checking the compiler can do for calls of these,
but we might as well add the annotations.
Also, sync win32security.c's log_error() with our normal convention
that pg_attribute_printf must be attached to a function's declaration
not definition. Apparently this file is only compiled with compilers
that aren't picky about that, but still it'd be better to be
consistent.
No back-patch since there's little reason to think we would catch
anything.
Discussion: https://postgr.es/m/3492412.1663283395@sss.pgh.pa.us
Various bits of code were declaring signal handlers manually,
using "int signum" or variants of that. We evidently have no
platforms where that's actually wrong, but let's use our
SIGNAL_ARGS macro everywhere anyway. If nothing else, it's
good for finding signal handlers easily.
No need for back-patch, since this is just cosmetic AFAICS.
Discussion: https://postgr.es/m/2684964.1663167995@sss.pgh.pa.us
This header is semi-private, being used only in files related to
raw parsing, so move to the backend directory where those files
live. This allows removal of Makefile rules that symlink gram.h to
src/include/parser, since gramparse.h can now include gram.h from
within the same directory. This has the side-effect of no longer
installing gram.h and gramparse.h, but there doesn't seem to be a
good reason to continue doing so.
Per suggestion from Andres Freund and Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/20220904181759.px6uosll6zbxcum5%40awork3.anarazel.de
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic, and instead the compression level is always
assigned with each library's default if nothing is directly given. This
centralizes the checks on the compression methods supported by a given
build, and always assigns a default compression level when parsing a
compression specification. This results in complaining at an earlier
stage than previously if a build supports a compression method or not,
aka when parsing a specification in the backend or the frontend, and not
when processing it. zstd, lz4 and zlib are able to handle in their
respective routines setting up the compression level the case of a
default value, hence the backend or frontend code (pg_receivewal or
pg_basebackup) has now no need to know what the default compression
level should be if nothing is specified: the logic is now done so as the
specification parsing assigns it. It can also be enforced by passing
down a "level" set to the default value, that the backend will accept
(the replication protocol is for example able to handle a command like
BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')).
This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as of pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default level.
The origin of the confusion comes from the handling of the default
compression level of gzip (-1 or Z_DEFAULT_COMPRESSION) and the value of
0 was getting assigned, which is what walmethods.c would consider
as equivalent to no compression when streaming WAL segments with its tar
methods. Assigning always the compression level removes the confusion
of some code paths considering a value of 0 set in a specification as
either no compression or a default compression level.
Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip. This is a result of the
code simplification, as gzip specifications cannot be used if a build
does not support it.
Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
guc.c has grown to be one of our largest .c files, making it
a bottleneck for compilation. It's also acquired a bunch of
knowledge that'd be better kept elsewhere, because of our not
very good habit of putting variable-specific check hooks here.
Hence, split it up along these lines:
* guc.c itself retains just the core GUC housekeeping mechanisms.
* New file guc_funcs.c contains the SET/SHOW interfaces and some
SQL-accessible functions for GUC manipulation.
* New file guc_tables.c contains the data arrays that define the
built-in GUC variables, along with some already-exported constant
tables.
* GUC check/assign/show hook functions are moved to the variable's
home module, whenever that's clearly identifiable. A few hard-
to-classify hooks ended up in commands/variable.c, which was
already a home for miscellaneous GUC hook functions.
To avoid cluttering a lot more header files with #include "guc.h",
I also invented a new header file utils/guc_hooks.h and put all
the GUC hook functions' declarations there, regardless of their
originating module. That allowed removal of #include "guc.h"
from some existing headers. The fallout from that (hopefully
all caught here) demonstrates clearly why such inclusions are
best minimized: there are a lot of files that, for example,
were getting array.h at two or more levels of remove, despite
not having any connection at all to GUCs in themselves.
There is some very minor code beautification here, such as
renaming a couple of inconsistently-named hook functions
and improving some comments. But mostly this just moves
code from point A to point B and deals with the ensuing
needs for #include adjustments and exporting a few functions
that previously weren't exported.
Patch by me, per a suggestion from Andres Freund; thanks also
to Michael Paquier for the idea to invent guc_funcs.c.
Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
Commit d8594d123 updated the list of non-spacing codepoints used
for calculating display width, but in doing so inadvertently removed
some, since the script used for that commit only considered combining
characters.
For complete coverage for zero-width characters, include codepoints in
the category Cf (Format). To reflect the wider purpose, also rename files
and update comments that referred specifically to combining characters.
Some of these ranges have been missing since v12, but due to lack of
field complaints it was determined not important enough to justify adding
special-case logic the backbranches.
Kyotaro Horiguchi
Report by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRBE8yvpQ0FSkPCoe0Ny1jAAsAQ6j3qMgVwWvkqAoaaNmQ%40mail.gmail.com
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.
Specifically, this adds palloc_object(), palloc_array(), and
repalloc_array(), which take the type name of the object to be
allocated as its first argument and cast the return as a pointer to
that type. There are also palloc0_object() and palloc0_array()
variants for initializing with zero, and pg_malloc_*() variants of all
of the above.
Inspired by the talloc library.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
We should process completed IOs *before* trying to start more, so that
it is always possible to decode one more record when the decoded record
queue is empty, even if maintenance_io_concurrency is set so low that a
single earlier WAL record might have saturated the IO queue.
That bug was hidden because the effect of maintenance_io_concurrency was
arbitrarily clamped to be at least 2. Fix the ordering, and also remove
that clamp. We need a special case for 0, which is now treated the same
as recovery_prefetch=off, but otherwise the number is used directly.
This allows for testing with 1, which would have made the problem
obvious in simple test scenarios.
Also add an explicit error message for missing contrecords. It was a
bit strange that we didn't report an error already, and became a latent
bug with prefetching, since the internal state that tracks aborted
contrecords would not survive retrying, as revealed by
026_overwrite_contrecord.pl with this adjustment. Reporting an error
prevents that.
Back-patch to 15.
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
The addition of published column names forgot to filter on attisdropped,
leading to cases where you could see "........pg.dropped.1........"
or the like as a reportedly-published column.
While we're here, rewrite the new subquery to get a more efficient plan
for it.
Hou Zhijie, per report from Jaime Casanova. Back-patch to v15 where
the bug was introduced. (Sadly, this means we need a post-beta4
catversion bump before beta4 has even hit the streets. I see no
good alternative though.)
Discussion: https://postgr.es/m/Yxa1SU4nH2HfN3/i@ahch-to
XLogPageRead() can retry internally after a pread() system call has
succeeded, in the case of short reads, and page validation failures
while in standby mode (see commit 0668719801). Due to an oversight in
commit 3f1ce973, these cases could leave stale data in the internal
cache of xlogreader.c without marking it invalid. The main defense
against stale cached data on failure to read a page was in the error
handling path of the calling function ReadPageInternal(), but that
wasn't quite enough for errors handled internally by XLogPageRead()'s
retry loop if we then exited with XLREAD_WOULDBLOCK.
1. ReadPageInternal() now marks the cache invalid before calling the
page_read callback, by setting state->readLen to 0. It'll be set to
a non-zero value only after a successful read. It'll stay valid as
long as the caller requests data in the cached range.
2. XLogPageRead() no long performs internal retries while reading
ahead. While such retries should work, the general philosophy is
that we should give up prefetching if anything unusual happens so we
can handle it when recovery catches up, to reduce the complexity of
the system. Let's do that here too.
3. While here, a new function XLogReaderResetError() improves the
separation between xlogrecovery.c and xlogreader.c, where the former
previously clobbered the latter's internal error buffer directly.
The new function makes this more explicit, and also clears a related
flag, without which a standby would needlessly retry in the outer
function.
Thanks to Noah Misch for tracking down the conditions required for a
rare build farm failure in src/bin/pg_ctl/t/003_promote.pl, and
providing a reproducer.
Back-patch to 15.
Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20220807003627.GA4168930%40rfd.leadboat.com
The planner has to special-case indexes on boolean columns, because
what we need for an indexscan on such a column is a qual of the shape
of "boolvar = pseudoconstant". For plain bool constants, previous
simplification will have reduced this to "boolvar" or "NOT boolvar",
and we have to reverse that if we want to make an indexqual. There is
existing code to do so, but it only fires when the index's opfamily
is BOOL_BTREE_FAM_OID or BOOL_HASH_FAM_OID. Thus extension AMs, or
extension opclasses such as contrib/btree_gin, are out in the cold.
The reason for hard-wiring the set of relevant opfamilies was mostly
to avoid a catalog lookup in a hot code path. We can improve matters
while not taking much of a performance hit by relying on the
hard-wired set when the opfamily OID is visibly built-in, and only
checking the catalogs when dealing with an extension opfamily.
While here, rename IsBooleanOpfamily to IsBuiltinBooleanOpfamily
to remind future users of that macro of its limitations. At some
point we might want to make indxpath.c's improved version of the
test globally accessible, but it's not presently needed elsewhere.
Zongliang Quan and Tom Lane
Discussion: https://postgr.es/m/f293b91d-1d46-d386-b6bb-4b06ff5c667b@yeah.net
The reverts the following and makes some associated cleanups:
commit f79b803dc: Common SQL/JSON clauses
commit f4fb45d15: SQL/JSON constructors
commit 5f0adec25: Make STRING an unreserved_keyword.
commit 33a377608: IS JSON predicate
commit 1a36bc9db: SQL/JSON query functions
commit 606948b05: SQL JSON functions
commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR()
commit 4e34747c8: JSON_TABLE
commit fadb48b00: PLAN clauses for JSON_TABLE
commit 2ef6f11b0: Reduce running time of jsonb_sqljson test
commit 14d3f24fa: Further improve jsonb_sqljson parallel test
commit a6baa4bad: Documentation for SQL/JSON features
commit b46bcf7a4: Improve readability of SQL/JSON documentation.
commit 112fdb352: Fix finalization for json_objectagg and friends
commit fcdb35c32: Fix transformJsonBehavior
commit 4cd8717af: Improve a couple of sql/json error messages
commit f7a605f63: Small cleanups in SQL/JSON code
commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug
commit a79153b7a: Claim SQL standard compliance for SQL/JSON features
commit a1e7616d6: Rework SQL/JSON documentation
commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types.
commit 3c633f32b: Only allow returning string types or bytea from json_serialize
commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.
The release notes are also adjusted.
Backpatch to release 15.
Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
Before now, the cutoffs that VACUUM used to determine which XIDs/MXIDs
to freeze were determined at the start of each VACUUM by taking related
cutoffs that represent which XIDs/MXIDs VACUUM should treat as still
running, and subtracting an XID/MXID age based value controlled by GUCs
like vacuum_freeze_min_age. The FreezeLimit cutoff (XID freeze cutoff)
was derived by subtracting an XID age value from OldestXmin, while the
MultiXactCutoff cutoff (MXID freeze cutoff) was derived by subtracting
an MXID age value from OldestMxact. This approach didn't match the
approach used nearby to determine whether this VACUUM operation should
be an aggressive VACUUM or not.
VACUUM now uses the standard approach instead: it subtracts the same
age-based values from next XID/next MXID (rather than subtracting from
OldestXmin/OldestMxact). This approach is simpler and more uniform.
Most of the time it will have only a negligible impact on how and when
VACUUM freezes. It will occasionally make VACUUM more robust in the
event of problems caused by long running transaction. These are cases
where OldestXmin and OldestMxact are held back by so much that they
attain an age that is a significant fraction of the value of age-based
settings like vacuum_freeze_min_age.
There is no principled reason why freezing should be affected in any way
by the presence of a long-running transaction -- at least not before the
point that the OldestXmin and OldestMxact limits used by each VACUUM
operation attain an age that makes it unsafe to freeze some of the
XIDs/MXIDs whose age exceeds the value of the relevant age-based
settings. The new approach should at least make freezing degrade more
gracefully than before, even in the most extreme cases.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com
Robert Haas reported that his older clang compiler didn't like the two
Asserts which were verifying that the given MemoryContextMethodID was <=
MEMORY_CONTEXT_METHODID_MASK when building with
-Wtautological-constant-out-of-range-compare. In my (David's) opinion,
the compiler is wrong to warn about that. Newer versions of clang don't
warn about the out of range enum value, so perhaps this was a bug that has
now been fixed. To keep older clang versions happy, let's just cast the
enum value to int to stop the compiler complaining.
The main reason for the Asserts mentioned above to exist are to inform
future developers which are adding new MemoryContexts if they run out of
bit space in MemoryChunk to store the MemoryContextMethodID. As pointed
out by Tom Lane, it seems wise to also add a comment to the header for
that enum to document the restriction on these enum values.
Additionally, also fix an incorrect usage of UINT64CONST() which was
introduced in c6e0fe1f2.
Author: Robert Haas, David Rowley
Discussion: https://postgr.es/m/CA+TgmoYGG2C7Vbw1cjkQRRBL3zOk8SmhrQnsJgzscX=N9AwPrw@mail.gmail.com
This reverts commit df0f4feef. It turns out the problem which was causing
the 32-bit ARM and PPC animals to fail was due to a MAXALIGN problem in
slab.c. This was fixed by d5ee4db0e. The padding that was added in
df0f4feef would only do anything on machines where uint64 was not aligned
to 8 bytes. The 32-bit machines which were failing are not in that
category, so revert this commit.
Discussion: https://postgr.es/m/3209100.1661787561@sss.pgh.pa.us
Buildfarm animals skate, grison and mamba are Assert failing on the
pointer being given to repalloc not being MAXALIGNED. c6e0fe1f2a made
changes in that area.
All of these animals are 32-bit with a MAXIMUM_ALIGNOF of 8 and a
SIZEOF_VOID_P of 4. I suspect that the pointer is not properly aligned due
to the lack of padding in the MemoryChunk struct.
Here we add the same type of padding that was previously used in
AllocChunkData and GenerationChunk that c6e0fe1f2a neglected to add.
Discussion: https://postgr.es/m/CAA4eK1%2B1JyW5TiL%3DyV-3Uq1CrfnTyn0Xrk5uArt31Z%3D8rgPhXQ%40mail.gmail.com
NEON support is required on the Aarch64 architecture for standard
implementations. Hardware designers for specialized markets can choose
not to support it, but that's true of floating point as well, which
we assume is supported. As with x86, some SIMD support is available
on 32-bit platforms, but those are not interesting from a performance
standpoint and would require an inconvenient runtime check.
Nathan Bossart
Reviewed by John Naylor, Andres Freund, Thomas Munro, and Tom Lane
Discussion: https://www.postgresql.org/message-id/flat/CAFBsxsEyR9JkfbPcDXBRYEfdfC__OkwVGdwEAgY4Rv0cvw35EA%40mail.gmail.com#aba7a64b11503494ffd8dd27067626a9
Whenever we palloc a chunk of memory, traditionally, we prefix the
returned pointer with a pointer to the memory context to which the chunk
belongs. This is required so that we're able to easily determine the
owning context when performing operations such as pfree() and repalloc().
For the AllocSet context, prior to this commit we additionally prefixed
the pointer to the owning context with the size of the chunk. This made
the header 16 bytes in size. This 16-byte overhead was required for all
AllocSet allocations regardless of the allocation size.
For the generation context, the problem was worse; in addition to the
pointer to the owning context and chunk size, we also stored a pointer to
the owning block so that we could track the number of freed chunks on a
block.
The slab allocator had a 16-byte chunk header.
The changes being made here reduce the chunk header size down to just 8
bytes for all 3 of our memory context types. For small to medium sized
allocations, this significantly increases the number of chunks that we can
fit on a given block which results in much more efficient use of memory.
Additionally, this commit completely changes the rule that pointers to
palloc'd memory must be directly prefixed by a pointer to the owning
memory context and instead, we now insist that they're directly prefixed
by an 8-byte value where the least significant 3-bits are set to a value
to indicate which type of memory context the pointer belongs to. Using
those 3 bits as an index (known as MemoryContextMethodID) to a new array
which stores the methods for each memory context type, we're now able to
pass the pointer given to functions such as pfree() and repalloc() to the
function specific to that context implementation to allow them to devise
their own methods of finding the memory context which owns the given
allocated chunk of memory.
The reason we're able to reduce the chunk header down to just 8 bytes is
because of the way we make use of the remaining 61 bits of the required
8-byte chunk header. Here we also implement a general-purpose MemoryChunk
struct which makes use of those 61 remaining bits to allow the storage of
a 30-bit value which the MemoryContext is free to use as it pleases, and
also the number of bytes which must be subtracted from the chunk to get a
reference to the block that the chunk is stored on (also 30 bits). The 1
additional remaining bit is to denote if the chunk is an "external" chunk
or not. External here means that the chunk header does not store the
30-bit value or the block offset. The MemoryContext can use these
external chunks at any time, but must use them if any of the two 30-bit
fields are not large enough for the value(s) that need to be stored in
them. When the chunk is marked as external, it is up to the MemoryContext
to devise its own means to determine the block offset.
Using 3-bits for the MemoryContextMethodID does mean we're limiting
ourselves to only having a maximum of 8 different memory context types.
We could reduce the bit space for the 30-bit value a little to make way
for more than 3 bits, but it seems like it might be better to do that only
if we ever need more than 8 context types. This would only be a problem
if some future memory context type which does not use MemoryChunk really
couldn't give up any of the 61 remaining bits in the chunk header.
With this MemoryChunk, each of our 3 memory context types can quickly
obtain a reference to the block any given chunk is located on. AllocSet
is able to find the context to which the chunk is owned, by first
obtaining a reference to the block by subtracting the block offset as is
stored in the 'hdrmask' field and then referencing the block's 'aset'
field. The Generation context uses the same method, but GenerationBlock
did not have a field pointing back to the owning context, so one is added
by this commit.
In aset.c and generation.c, all allocations larger than allocChunkLimit
are stored on dedicated blocks. When there's just a single chunk on a
block like this, it's easy to find the block from the chunk, we just
subtract the size of the block header from the chunk pointer. The size of
these chunks is also known as we store the endptr on the block, so we can
just subtract the pointer to the allocated memory from that. Because we
can easily find the owning block and the size of the chunk for these
dedicated blocks, we just always use external chunks for allocation sizes
larger than allocChunkLimit. For generation.c, this sidesteps the problem
of non-external MemoryChunks being unable to represent chunk sizes >= 1GB.
This is less of a problem for aset.c as we store the free list index in
the MemoryChunk's spare 30-bit field (the value of which will never be
close to using all 30-bits). We can easily reverse engineer the chunk size
from this when needed. Storing this saves AllocSetFree() from having to
make a call to AllocSetFreeIndex() to determine which free list to put the
newly freed chunk on.
For the slab allocator, this commit adds a new restriction that slab
chunks cannot be >= 1GB in size. If there happened to be any users of
slab.c which used chunk sizes this large, they really should be using
AllocSet instead.
Here we also add a restriction that normal non-dedicated blocks cannot be
1GB or larger. It's now not possible to pass a 'maxBlockSize' >= 1GB
during the creation of an AllocSet or Generation context. Allocations can
still be larger than 1GB, it's just these will always be on dedicated
blocks (which do not have the 1GB restriction).
Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=vw@mail.gmail.com
Per flame graph from Jelte Fennema, COPY FROM ... USING BINARY shows
input validation taking at least 5% of the profile, so it's worth trying
to be more efficient here. With this change, validation of pure ASCII is
nearly 40% faster on contemporary Intel hardware. To make this change
legible and easier to adopt to additional architectures, use helper
functions to abstract the platform details away.
Reviewed by Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsG%3Dk8t%3DC457FXnoBXb%3D8iA4OaZkbFogFMachWif7mNnww%40mail.gmail.com
SUSv3 <netinet/in.h> defines struct sockaddr_in6, and all targeted Unix
systems have it. Windows has it in <ws2ipdef.h>. Remove the configure
probe, the macro and a small amount of dead code.
Also remove a mention of IPv6-less builds from the documentation, since
there aren't any.
This is similar to commits f5580882 and 077bf2f2 for Unix sockets. Even
though AF_INET6 is an "optional" component of SUSv3, there are no known
modern operating system without it, and it seems even less likely to be
omitted from future systems than AF_UNIX.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGKErNfhmvb_H0UprEmp4LPzGN06yR2_0tYikjzB-2ECMw@mail.gmail.com
The GRANT statement can now specify WITH INHERIT TRUE or WITH
INHERIT FALSE to control whether the member inherits the granted
role's permissions. For symmetry, you can now likewise write
WITH ADMIN TRUE or WITH ADMIN FALSE to turn ADMIN OPTION on or off.
If a GRANT does not specify WITH INHERIT, the behavior based on
whether the member role is marked INHERIT or NOINHERIT. This means
that if all roles are marked INHERIT or NOINHERIT before any role
grants are performed, the behavior is identical to what we had before;
otherwise, it's different, because ALTER ROLE [NO]INHERIT now only
changes the default behavior of future grants, and has no effect on
existing ones.
Patch by me. Reviewed and testing by Nathan Bossart and Tushar Ahuja,
with design-level comments from various others.
Discussion: http://postgr.es/m/CA+Tgmoa5Sf4PiWrfxA=sGzDKg0Ojo3dADw=wAHOhR9dggV=RmQ@mail.gmail.com
This is preparatory work for a project to increase the number of bits
in a RelFileNumber from 32 to 56.
Along the way, introduce static inline accessor functions for a couple
of BufferTag fields.
Dilip Kumar, reviewed by me. The overall patch series has also had
review at various times from Andres Freund, Ashutosh Sharma, Hannu
Krosing, Vignesh C, Álvaro Herrera, and Tom Lane.
Discussion: http://postgr.es/m/CAFiTN-trubju5YbWAq-BSpZ90-Z6xCVBQE8BVqXqANOZAF1Znw@mail.gmail.com
This commit moves authn_id into a new global structure called
ClientConnectionInfo (mapping to a MyClientConnectionInfo for each
backend) which is intended to hold all the client information that
should be shared between the backend and any of its parallel workers,
access for extensions and triggers being the primary use case. There is
no need to push all the data of Port to the workers, and authn_id is
quite a generic concept so using a separate structure provides the best
balance (the name of the structure has been suggested by Robert Haas).
While on it, and per discussion as this would be useful for a potential
SYSTEM_USER that can be accessed through parallel workers, a second
field is added for the authentication method, copied directly from
Port.
ClientConnectionInfo is serialized and restored using a new parallel
key and a structure tracks the length of the authn_id, making the
addition of more fields straight-forward.
Author: Jacob Champion
Reviewed-by: Bertrand Drouvot, Stephen Frost, Robert Haas, Tom Lane,
Michael Paquier, Julien Rouhaud
Discussion: https://postgr.es/m/793d990837ae5c06a558d58d62de9378ab525d83.camel@vmware.com
I added this in commit 153f40067, out of paranoia about kernels
possibly rejecting very large listen backlog requests. However,
POSIX has said for decades that the kernel must silently reduce
any value it considers too large, and there's no evidence that
any current system doesn't obey that. Let's just drop this limit
and save some complication.
While we're here, compute the request as twice MaxConnections not
twice MaxBackends; the latter no longer means what it did in 2001.
Per discussion of a report from Kevin McKibbin.
Discussion: https://postgr.es/m/CADc_NKg2d+oZY9mg4DdQdoUcGzN2kOYXBu-3--RW_hEe0tUV=g@mail.gmail.com
All backends should have a BackendType to enable statistics reporting
per BackendType.
Add a new BackendType for standalone backends, B_STANDALONE_BACKEND (and
alphabetize the BackendTypes). Both the bootstrap backend and single
user mode backends will have BackendType B_STANDALONE_BACKEND.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/CAAKRu_aaq33UnG4TXq3S-OSXGWj1QGf0sU%2BECH4tNwGFNERkZA%40mail.gmail.com
Somewhere during the development of the patch acquiring a lock during read
access to variable-numbered stats got lost. The missing lock acquisition won't
cause corruption, but can lead to reading torn values when accessing
stats. Add the missing lock acquisitions.
Reported-by: Greg Stark <stark@mit.edu>
Reviewed-by: "Drouvot, Bertrand" <bdrouvot@amazon.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CAM-w4HMYkM_DkYhWtUGV+qE_rrBxKOzOF0+5faozxO3vXrc9wA@mail.gmail.com
Backpatch: 15-
Previously, membership of role A in role B could be recorded in the
catalog tables only once. This meant that a new grant of role A to
role B would overwrite the previous grant. For other object types, a
new grant of permission on an object - in this case role A - exists
along side the existing grant provided that the grantor is different.
Either grant can be revoked independently of the other, and
permissions remain so long as at least one grant remains. Make role
grants work similarly.
Previously, when granting membership in a role, the superuser could
specify any role whatsoever as the grantor, but for other object types,
the grantor of record must be either the owner of the object, or a
role that currently has privileges to perform a similar GRANT.
Implement the same scheme for role grants, treating the bootstrap
superuser as the role owner since roles do not have owners. This means
that attempting to revoke a grant, or admin option on a grant, can now
fail if there are dependent privileges, and that CASCADE can be used
to revoke these. It also means that you can't grant ADMIN OPTION on
a role back to a user who granted it directly or indirectly to you,
similar to how you can't give WITH GRANT OPTION on a privilege back
to a role which granted it directly or indirectly to you.
Previously, only the superuser could specify GRANTED BY with a user
other than the current user. Relax that rule to allow the grantor
to be any role whose privileges the current user posseses. This
doesn't improve compatibility with what we do for other object types,
where support for GRANTED BY is entirely vestigial, but it makes this
feature more usable and seems to make sense to change at the same time
we're changing related behaviors.
Along the way, fix "ALTER GROUP group_name ADD USER user_name" to
require the same privileges as "GRANT group_name TO user_name".
Previously, CREATEROLE privileges were sufficient for either, but
only the former form was permissible with ADMIN OPTION on the role.
Now, either CREATEROLE or ADMIN OPTION on the role suffices for
either spelling.
Patch by me, reviewed by Stephen Frost.
Discussion: http://postgr.es/m/CA+TgmoaFr-RZeQ+WoQ5nKPv97oT9+aDgK_a5+qWHSgbDsMp1Vg@mail.gmail.com
Remove four probes for members of sockaddr_storage. Keep only the probe
for sockaddr's sa_len, which is enough for our two remaining places that
know about _len fields:
1. ifaddr.c needs to know if sockaddr has sa_len to understand the
result of ioctl(SIOCGIFCONF). Only AIX is still using the relevant code
today, but it seems like a good idea to keep it compilable on Linux.
2. ip.c was testing for presence of ss_len to decide whether to fill in
sun_len in our getaddrinfo_unix() function. It's just as good to test
for sa_len. If you have one, you have them all.
(The code in #2 isn't actually needed at all on several OSes I checked
since modern versions ignore sa_len on input to system calls. Proving
that's the case for all relevant OSes is left for another day, but
wouldn't get rid of that last probe anyway if we still want it for #1.)
Discussion: https://postgr.es/m/CA%2BhUKGJJjF2AqdU_Aug5n2MAc1gr%3DGykNjVBZq%2Bd6Jrcp3Dyvg%40mail.gmail.com
Previously, "GRANT foo TO bar" or "GRANT foo TO bar GRANTED BY baz"
would record the OID of the grantor in pg_auth_members.grantor, but
that role could later be dropped without modifying or removing the
pg_auth_members record. That's not great, because we typically try
to avoid dangling references in catalog data.
Now, a role grant depends on the grantor, and the grantor can't be
dropped without removing the grant or changing the grantor. "DROP
OWNED BY" will remove the grant, just as it does for other kinds of
privileges. "REASSIGN OWNED BY" will not, again just like what we do
in other cases involving privileges.
pg_auth_members now has an OID column, because that is needed in order
for dependencies to work. It also now has an index on the grantor
column, because otherwise dropping a role would require a sequential
scan of the entire table to see whether the role's OID is in use as
a grantor. That probably wouldn't be too large a problem in practice,
but it seems better to have an index just in case.
A follow-on patch is planned with the goal of more thoroughly
rationalizing the behavior of role grants. This patch is just trying
to do enough to make sure that the data we store in the catalogs is at
some basic level valid.
Patch by me, reviewed by Stephen Frost
Discussion: http://postgr.es/m/CA+TgmoaFr-RZeQ+WoQ5nKPv97oT9+aDgK_a5+qWHSgbDsMp1Vg@mail.gmail.com
The present implementations of adjust_appendrel_attrs_multilevel and
its sibling adjust_child_relids_multilevel are very messy, because
they work by reconstructing the relids of the child's immediate
parent and then seeing if that's bms_equal to the relids of the
target parent. Aside from being quite inefficient, this will not
work with planned future changes to make joinrels' relid sets
contain outer-join relids in addition to baserels.
The whole thing can be solved at a stroke by adding explicit parent
and top_parent links to child RelOptInfos, and making these functions
work with RelOptInfo pointers instead of relids. Doing that is
simpler for most callers, too.
In my original version of this patch, I got rid of
RelOptInfo.top_parent_relids on the grounds that it was now redundant.
However, that adds a lot of code churn in places that otherwise would
not need changing, and arguably the extra indirection needed to fetch
top_parent->relids in those places costs something. So this version
leaves that field in place.
Discussion: https://postgr.es/m/553080.1657481916@sss.pgh.pa.us
Commit 5579388d was confused about why gai_strerror() didn't work, and
used gai_strerrorA(). It turns out that we had explicitly undefined
Windows' own macro for that somewhere else. Get rid of all that, and
use the system headers' definition of gai_sterror() directly as
intended.
Discussion: https://postgr.es/m/CA+hUKGKErNfhmvb_H0UprEmp4LPzGN06yR2_0tYikjzB-2ECMw@mail.gmail.com
On BSD-family systems, header <sys/sockio.h> defines socket ioctl
numbers like SIOCGIFCONF. Only AIX is using those now, but it defines
them in <net/if.h> anyway.
Supposing some PostgreSQL hacker wants to test that AIX-only code path
on a more common development system by pretending not to have
getifaddrs(). It's enough to include <sys/ioctl.h>, at least on macOS,
FreeBSD and Linux, and we're already doing that.
Up to now, callers of find_placeholder_info() were required to pass
a flag indicating if it's OK to make a new PlaceHolderInfo. That'd
be fine if the callers had free choice, but they do not. Once we
begin deconstruct_jointree() it's no longer OK to make more PHIs;
while callers before that always want to create a PHI if it's not
there already. So there's no freedom of action, only the opportunity
to cause bugs by creating PHIs too late. Let's get rid of that in
favor of adding a state flag PlannerInfo.placeholdersFrozen, which
we can set at the point where it's no longer OK to make more PHIs.
This patch also simplifies a couple of call sites that were using
complicated logic to avoid calling find_placeholder_info() as much
as possible. Now that that lookup is O(1) thanks to the previous
commit, the extra bitmap manipulations are probably a net negative.
Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
Up to now we've just searched the placeholder_list when we want to
find the PlaceHolderInfo with a given ID. While there's no evidence
of that being a problem in the field, an upcoming patch will add
find_placeholder_info() calls in build_joinrel_tlist(), which seems
likely to make it more of an issue: a joinrel emitting lots of
PlaceHolderVars would incur O(N^2) cost, and we might be building
a lot of joinrels in complex queries. Hence, add an array that
can be indexed directly by phid to make the lookups constant-time.
Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
This event can happen when using SET ACCESS METHOD, as the data files of
the materialized need a full refresh but this command tag was not
updated to reflect that. The documentation is updated to track this
behavior.
Author: Onder Kalaci
Discussion: https://postgr.es/m/CACawEhXwHN3X34FiwoYG8vXR-oyUdrp7qcfRWSzS+NPahS5gSw@mail.gmail.com
Backpatch-through: 15
SUSv3, all targeted Unixes and modern Windows have getaddrinfo() and
related interfaces. Drop the replacement implementation, and adjust
some headers slightly to make sure that the APIs are visible everywhere
using standard POSIX headers and names.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
Since HAVE_UNIX_SOCKETS is now defined unconditionally, remove the macro
and drop a small amount of dead code.
The last known systems not to have them (as far as I know at least) were
QNX, which we de-supported years ago, and Windows, which now has them.
If a new OS ever shows up with the POSIX sockets API but without working
AF_UNIX, it'll presumably still be able to compile the code, and fail at
runtime with an unsupported address family error. We might want to
consider adding a HINT that you should turn off the option to use it if
your network stack doesn't support it at that point, but it doesn't seem
worth making the relevant code conditional at compile time.
Also adjust a couple of places in the docs and comments that referred to
builds without Unix-domain sockets, since there aren't any. Windows
still gets a special mention in those places, though, because we don't
try to use them by default there yet.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
Assume that we can use LWARX hint flags and the LWSYNC instruction
on any PPC machine. The check on the assembler's behavior was only
needed for Apple's old assembler, which is no longer of interest
now that we've de-supported all PPC-era versions of macOS (thanks
to them not having clock_gettime()). Also, given an up-to-date
assembler these instructions work even on Apple's old hardware.
It seems quite unlikely that anyone would be interested in running
current Postgres on PPC hardware that's so old as to not have
these instructions.
Hence, rip out associated configure test and manual configuration
options, and just use the modernized instructions all the time.
Also, update atomics/arch-ppc.h to use these instructions as well.
(It was already using LWSYNC unconditionally in another place,
providing further proof that nobody is using PG on hardware old
enough to have a problem with that.)
Discussion: https://postgr.es/m/166622.1660323391@sss.pgh.pa.us
<sys/resource.h> is in SUSv2 and is on all targeted Unix systems. We
have a replacement for getrusage() on Windows, so let's just move its
declarations into src/include/port/win32/sys/resource.h so that we can
use a standard-looking #include. Also remove an obsolete reference to
CLK_TCK. Also rename src/port/getrusage.c to win32getrusage.c,
following the convention for Windows-only fallback code.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
<sys/un.h> is in SUSv3 and every targeted Unix has it. Some Windows
tool chains may still lack the approximately equivalent header
<afunix.h>, so we already defined struct sockaddr_un ourselves on that
OS for now. To harmonize things a bit, move our definition into a new
header src/include/port/win32/sys/un.h.
HAVE_UNIX_SOCKETS is now defined unconditionally. We migh remove that
in a separate commit, pending discussion.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
<sys/uio.h> is in SUSv2, and all targeted Unix system have it, so we
might as well drop the probe (in fact we never really needed this one).
It's where struct iovec is defined, and as a common extension, it's also
where non-standard preadv() and pwritev() are declared on systems that
have them.
We should also be able to assume that IOV_MAX is defined on Unix.
To spell out what our pg_iovec.h header does for the OSes in the build
farm as of today:
Windows: our own struct and functions
Solaris, Cygwin: <sys/uio.h>'s struct, our own functions
Every other Unix: <sys/uio.h>'s struct and functions
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION
records to know if the transaction has modified the catalog, and that
information is not serialized to snapshot. Therefore, after the restart,
if the logical decoding decodes only the commit record of the transaction
that has actually modified a catalog, we will miss adding its XID to the
snapshot. Thus, we will end up looking at catalogs with the wrong
snapshot.
To fix this problem, this change adds the list of transaction IDs and
sub-transaction IDs, that have modified catalogs and are running during
snapshot serialization, to the serialized snapshot. After restart or
otherwise, when we restore from such a serialized snapshot, the
corresponding list is restored in memory. Now, when decoding a COMMIT
record, we check both the list and the ReorderBuffer to see if the
transaction has modified catalogs.
Since this adds additional information to the serialized snapshot, we
cannot backpatch it. For back branches, we took another approach.
We remember the last-running-xacts list of the decoded RUNNING_XACTS
record after restoring the previously serialized snapshot. Then, we mark
the transaction as containing catalog changes if it's in the list of
initial running transactions and its commit record has
XACT_XINFO_HAS_INVALS. This doesn't require any file format changes but
the transaction will end up being added to the snapshot even if it has
only relcache invalidations. But that won't be a problem since we use
snapshot built during decoding only to read system catalogs.
This commit bumps SNAPBUILD_VERSION because of a change in SnapBuild.
Reported-by: Mike Oh
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi
Backpatch-through: 10
Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
Use SSE2 intrinsics to speed up the search, where available. Otherwise,
use a simple 'for' loop. The motivation to add this now is to speed up
XidInMVCCSnapshot(), which is the reason only unsigned 32-bit integer
arrays are optimized. Other types are left for future work, as is the
extension of this technique to non-x86 platforms.
Nathan Bossart
Reviewed by: Andres Freund, Bharath Rupireddy, Masahiko Sawada
Discussion: https://postgr.es/m/20220713170950.GA3116318%40nathanxps13
This commit addresses a few things around GUCs:
- The TCP-related parameters (the four tcp_keepalives_* and
client_connection_check_interval are listed in postgresql.conf.sample in
a subsection called "TCP settings" of "CONNECTIONS AND AUTHENTICATION",
but they did not have their own group name in guc.c.
- enable_group_by_reordering, stats_fetch_consistency and
recovery_prefetch had an inconsistent description, missing a dot at the
end.
- In postgresql.conf.sample, "Process title" should not have a section
of its own, but it should be a subsection of "REPORTING AND LOGGING".
This impacts the contents of pg_settings, which could be seen as a
compatibility break, so no backpatch is done. This is similar to the
cleanup done in a55a984.
Author: Shinya Kato
Discussion: https://postgr.es/m/5e0c9c608624eafbba910c344282cb14@oss.nttdata.com
Commit 623cc673 removed gettimeofday(), and commits 24c3ce8f and
495ed0ef removed support for very old Windows releases with low accuracy
timers, but references to those things were left behind in comments.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/295419.1659918447%40sss.pgh.pa.us
Previously, if an extension script did CREATE OR REPLACE and there was
an existing object not belonging to the extension, it would overwrite
the object and adopt it into the extension. This is problematic, first
because the overwrite is probably unintentional, and second because we
didn't change the object's ownership. Thus a hostile user could create
an object in advance of an expected CREATE EXTENSION command, and would
then have ownership rights on an extension object, which could be
modified for trojan-horse-type attacks.
Hence, forbid CREATE OR REPLACE of an existing object unless it already
belongs to the extension. (Note that we've always forbidden replacing
an object that belongs to some other extension; only the behavior for
previously-free-standing objects changes here.)
For the same reason, also fail CREATE IF NOT EXISTS when there is
an existing object that doesn't belong to the extension.
Our thanks to Sven Klemm for reporting this problem.
Security: CVE-2022-2625
Previously we fell back to __FUNCTION__ and then NULL. As __func__ is in C99
that shouldn't be necessary anymore.
Solution.pm defined HAVE_FUNCNAME__FUNCTION instead of
HAVE_FUNCNAME__FUNC (originating in 4164e6636e), as at some point in the past
MSVC only supported __FUNCTION__. Our minimum version supports __func__.
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220807012914.ydz73yte6j3coulo@awork3.anarazel.de
strtof() is in C99 and all targeted systems have it. We can remove the
configure probe and some dead code, but we still need replacement code
for a couple of systems that have known buggy implementations selected
via platform template.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/152683.1659830125%40sss.pgh.pa.us
Previously we bothered to forward-declare struct timezone, following man
pages on typical systems, but POSIX actually says the argument (which we
ignore anyway) is void *. Drop a line.
While here, add an assertion that nobody actually uses the tzp argument.
Previously we did extra work to select between Windows APIs needed on
older releases, but now we can just use the higher resolution function
directly.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKwRpvGfcfq2qNVAQS2Wg1B9eA9QRhAmVSyJt1zsCN2sQ%40mail.gmail.com
There's no known supported system needing 1 argument gettimeofday()
support. The test for it was added a long time ago (92c6bf9775). Remove.
Until now we tested whether a gettimeofday() fallback is needed when
targetting windows. Which lead to the odd result that HAVE_GETTIMEOFDAY only
being defined when targetting MinGW (which has gettimeofday() since at least
2007). As the fallback is specific to msvc, remove the configure code and
rename src/port/gettimeofday.c to src/port/win32gettimeofday.c.
While at it, also remove the definition of struct timezone, a forward
declaration of the struct is sufficient.
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220806000311.ywx65iuchvj4qn2k@awork3.anarazel.de
Now that lstat() reports junction points with S_IFLNK/S_ISLINK(), and
unlink() can unlink them, there is no need for conditional code for
Windows in a few places. That was expressed by testing for WIN32 or
S_ISLNK, which we can now constant-fold.
The coding around pgwin32_is_junction() was a bit suspect anyway, as we
never checked for errors, and we also know that errors can be spuriously
reported because of transient sharing violations on this OS. The
lstat()-based code has handling for that.
This also reverts 4fc6b6ee on master only. That was done because
lstat() didn't previously work for symlinks (junction points), but now
it does.
Tested-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
Junction points will be reported with S_ISLNK(x.st_mode), simulating
POSIX lstat(). stat() will follow pseudo-symlinks, like in POSIX (but
only one level before giving up, unlike in POSIX).
This completes a TODO left by commit bed90759fc.
Tested-by: Andrew Dunstan <andrew@dunslane.net> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
strtoll was backfilled with either __strtoll or strtoq on systems without
strtoll. The last such system on the buildfarm was an ancient HP-UX animal. We
don't support HP-UX anymore, so remove.
On other systems strtoll was present, but did not have a declaration. The last
known instance on the buildfarm was running an ancient OSX and shut down in
2019.
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220804013546.h65najrzig764jar@awork3.anarazel.de
Using ATSimpleRecursion() in ATPrepCmd() to do so as bbb927b4db did is
not correct, because ATPrepCmd() can't distinguish between triggers that
may be cloned and those that may not, so would wrongly try to recurse
for the latter category of triggers.
So this commit restores the code in EnableDisableTrigger() that
86f575948c had added to do the recursion, which would do it only for
triggers that may be cloned, that is, row-level triggers. This also
changes tablecmds.c such that ATExecCmd() is able to pass the value of
ONLY flag down to EnableDisableTrigger() using its new 'recurse'
parameter.
This also fixes what seems like an oversight of 86f575948c that the
recursion to partition triggers would only occur if EnableDisableTrigger()
had actually changed the trigger. It is more apt to recurse to inspect
partition triggers even if the parent's trigger didn't need to be
changed: only then can we be certain that all descendants share the same
state afterwards.
Backpatch all the way back to 11, like bbb927b4db. Care is taken not
to break ABI compatibility (and that no catversion bump is needed.)
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Discussion: https://postgr.es/m/CA+HiwqG-cZT3XzGAnEgZQLoQbyfJApVwOTQaCaas1mhpf+4V5A@mail.gmail.com
clock_gettime() is in SUSv2 and all targeted Unix systems have it.
Remove a chunk of fallback code for old Unix is no longer reachable on
modern systems, and untested as of the retirement of build farm animal
prairiedog.
There is no need to retain a HAVE_CLOCK_GETTIME macro here, because it
is already used in a context with Unix and Windows code paths.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
Commit dd299df818, which made heap TID a tiebreaker nbtree index
column, introduced new rules on page space management to make suffix
truncation safe for v4+ indexes. New pivot tuples (generated by suffix
truncation during leaf page splits) sometimes require dedicated extra
space at the end of a new leaf page high key/pivot to store a heap TID
using a special representation (a representation only used in pivot
tuples).
The definition of "1/3 of a page" was reduced by a single MAXALIGN()
quantum for v4 indexes to make sure that the final enlarged pivot tuple
always fit, even with a split point whose firstright tuple happened to
already be at the "1/3 of a page" limit (limit for non-pivot tuples).
Internal pages (which only contain pivot tuples) stuck with the original
"1/3 of a page" definition. This scheme made it impossible for any page
split to fail to free enough space for its newitem, which is never okay.
The macro that determines whether non-pivot tuples exceed their "1/3 of
a leaf page" restriction was structured as if space was needed for all
three tuples during a leaf page split (the new pivot plus two very large
adjoining non-pivots that are separated by the split). This was subtly
wrong, in that it accidentally relied on implementation details that
could (at least in theory) change in the future.
To fix, make the macro subtract a single MAXALIGN() quantum, once. The
macro evaluates to exactly the same value as before in practice. But it
no longer depends on the current layout of nbtree's special area struct.
No backpatch, since this isn't a live bug.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Robert Haas <robertmhaas@gmail.com>
Diagnosed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+Tgmoa7UBxivM7f6Ocx_qbq4=ky3uXc+WZNOBcVX+kvJvWOEA@mail.gmail.com
preadv() and pwritev() are not standardized by POSIX, but appeared in
NetBSD in 1999 and were adopted by at least OpenBSD, FreeBSD,
DragonFlyBSD, Linux, AIX, illumos and macOS. We don't use them much
yet, but an active proposal uses them heavily.
In 15, we had two replacement implementations for other OSes: one based
on lseek() + -v function if available for true vector I/O, and the other
based on a loop over p- function.
The former would be an obstacle to hypothetical future multi-threaded
code sharing file descriptors, while the latter would not, since commit
cf112c12. Furthermore, the number of targeted systems that could
benefit from the former's potential upside has dwindled to just one
niche OS, since macOS added the functions and we de-supported HP-UX.
That doesn't seem like a good trade-off.
Therefore, drop the lseek()-based variant, and also the pg_ prefix now
that the file position portability hazard is gone.
At the time of writing, the only systems in our build farm that lack
native preadv/pwritev and thus use fallback code are:
* Solaris (but not illumos)
* macOS before release 11.0
* Windows
With this commit, the above systems will now use the *same* fallback
code, the version that loops over pread()/pwrite(). Windows already
used that (though a later proposal may include true vector I/O for
Windows), so this decision really only affects Solaris, until it gets
around to adding these system calls.
Also remove some useless includes while here.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
pread() and pwrite() are in SUSv2, and all targeted Unix systems have
them.
Previously, we defined pg_pread and pg_pwrite to emulate these function
with lseek() on old Unixen. The names with a pg_ prefix were a reminder
of a portability hazard: they might change the current file position.
That hazard is gone, so we can drop the prefixes.
Since the remaining replacement code is Windows-only, move it into
src/port/win32p{read,write}.c, and move the declarations into
src/include/port/win32_port.h.
No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only
used for declarations in port.h which have now moved into win32_port.h.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
setenv() and unsetenv() are in SUSv3 and targeted Unix systems have
them. We still need special code for these on Windows, but that doesn't
require a configure probe.
This marks the first time we require a SUSv3 (POSIX.1-2001) facility
(rather than SUSv2). The replacement code removed here was not needed
on any targeted system or any known non-EOL'd Unix system, and was
therefore dead and untested.
No need for vestigial HAVE_SETENV and HAVE_UNSETENV macros, because we
provide a replacement for Windows, and we didn't previously test the
macros.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
poll() and <poll.h> are in SUSv2 and all targeted Unix systems have
them.
Retain HAVE_POLL and HAVE_POLL_H macros for readability. There's an
error in latch.c that is now unreachable (since we always have one of
WIN32 or HAVE_POLL defined), but that falls out of a decision to keep
using defined(HAVE_POLL) instead of !defined(WIN32) to guard the poll()
code.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
link() is in SUSv2 and all targeted Unix systems have it. We have
replacement code for Windows that doesn't require a configure probe.
Since only Windows needs it, rename src/port/link.c to win32link.c like
other similar things.
There is no need for a vestigial HAVE_LINK macro, because we expect all
Unix and, with our replacement function, Windows systems to have it, so
we didn't have any tests around link() usage.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
symlink() and readlink() are in SUSv2 and all targeted Unix systems have
them. We have partial emulation on Windows. Code that raised runtime
errors on systems without it has been dead for years, so we can remove
that and also references to such systems in the documentation.
Define HAVE_READLINK and HAVE_SYMLINK macros on Unix. Our Windows
replacement functions based on junction points can't be used for
relative paths or for non-directories, so the macros can be used to
check for full symlink support. The places that deal with tablespaces
can just use symlink functions without checking the macros. (If they
did check the macros, they'd need to provide an #else branch with a
runtime or compile time error, and it'd be dead code.)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
setsid() is in SUSv2 and all targeted Unix systems have it. Retain a
HAVE_SETSID macro, defined on Unix only. That's easier to understand
than !defined(WIN32), for the optional code it governs.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
shm_open() is in SUSv2 and all targeted Unix systems have it.
We retain a HAVE_SHM_OPEN macro, because it's clearer to readers than
something like !defined(WIN32).
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
getrlimit() is in SUSv2 and all targeted systems have it.
Windows doesn't have it. We could just use #ifndef WIN32, but for a
little more explanation about why we're making things conditional, let's
retain the HAVE_GETRLIMIT macro. It's defined in port.h for Unix systems.
On systems that have it, it's not necessary to test for RLIMIT_CORE,
RLIMIT_STACK or RLIMIT_NOFILE macros, since SUSv2 requires those and all
targeted systems have them. Also remove references to a pre-historic
alternative spelling of RLIMIT_NOFILE, and coding that seemed to believe
that Cygwin didn't have it.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
dlopen() is in SUSv2 and all targeted Unix systems have it. We still
need replacement functions for Windows, but we don't need a configure
probe for that.
Since it's no longer needed by other operating systems, rename dlopen.c
to win32dlopen.c and move the declarations into win32_port.h.
Likewise, the macros RTLD_NOW and RTLD_GLOBAL now only need to be
defined on Windows, since all targeted Unix systems have 'em.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
SSE2 vector instructions are part of the spec for the 64-bit x86
architecture. Until now we have relied on the compiler to autovectorize
in some limited situations, but some useful coding idioms can only be
expressed explicitly via compiler intrinsics. To this end, add a header
that defines USE_SSE2 where available. While x86-only for now, we can
add other architectures in the future. This will also be the intended
place for helper functions that use vector operations.
Reviewed by Nathan Bossart and Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/CAFBsxsE2G_H_5Wbw%2BNOPm70-BK4xxKf86-mRzY%3DL2sLoQqM%2B-Q%40mail.gmail.com
Commit fac1b470a thought we could check for set-returning functions
by testing only the top-level node in an expression tree. This is
wrong in itself, and to make matters worse it encouraged others
to make the same mistake, by exporting tlist.c's special-purpose
IS_SRF_CALL() as a widely-visible macro. I can't find any evidence
that anyone's taken the bait, but it was only a matter of time.
Use expression_returns_set() instead, and stuff the IS_SRF_CALL()
genie back in its bottle, this time with a warning label. I also
added a couple of cross-reference comments.
After a fair amount of fooling around, I've despaired of making
a robust test case that exposes the bug reliably, so no test case
here. (Note that the test case added by fac1b470a is itself
broken, in that it doesn't notice if you remove the code change.
The repro given by the bug submitter currently doesn't fail either
in v15 or HEAD, though I suspect that may indicate an unrelated bug.)
Per bug #17564 from Martijn van Oosterhout. Back-patch to v13,
as the faulty patch was.
Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org
ORDER BY / DISTINCT aggreagtes have, since implemented in Postgres, been
executed by always performing a sort in nodeAgg.c to sort the tuples in
the current group into the correct order before calling the transition
function on the sorted tuples. This was not great as often there might be
an index that could have provided pre-sorted input and allowed the
transition functions to be called as the rows come in, rather than having
to store them in a tuplestore in order to sort them once all the tuples
for the group have arrived.
Here we change the planner so it requests a path with a sort order which
supports the most amount of ORDER BY / DISTINCT aggregate functions and
add new code to the executor to allow it to support the processing of
ORDER BY / DISTINCT aggregates where the tuples are already sorted in the
correct order.
Since there can be many ORDER BY / DISTINCT aggregates in any given query
level, it's very possible that we can't find an order that suits all of
these aggregates. The sort order that the planner chooses is simply the
one that suits the most aggregate functions. We take the most strictly
sorted variation of each order and see how many aggregate functions can
use that, then we try again with the order of the remaining aggregates to
see if another order would suit more aggregate functions. For example:
SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...
would request the sort order to be {a, b} because {a} is a subset of the
sort order of {a,b}, but;
SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...
would just pick a plan ordered by {a} (we give precedence to aggregates
which are earlier in the targetlist).
SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ...
would choose to order by {b} since two aggregates suit that vs just one
that requires input ordered by {a}.
Author: David Rowley
Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane
Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
Here we add code which detects when ExecFindPartition() continually finds
the same partition and add a caching layer to improve partition lookup
performance for such cases.
Both RANGE and LIST partitioned tables traditionally require a binary
search for the set of Datums that a partition needs to be found for. This
binary search is commonly visible in profiles when bulk loading into a
partitioned table. Here we aim to reduce the overhead of bulk-loading
into partitioned tables for cases where many consecutive tuples belong to
the same partition and make the performance of this operation closer to
what it is with a traditional non-partitioned table.
When we find the same partition 16 times in a row, the next search will
result in us simply just checking if the current set of values belongs to
the last found partition. For LIST partitioning we record the index into
the PartitionBoundInfo's datum array. This allows us to check if the
current Datum is the same as the Datum that was last looked up. This
means if any given LIST partition supports storing multiple different
Datum values, then the caching only works when we find the same value as
we did the last time. For RANGE partitioning we simply check if the given
Datums are in the same range as the previously found partition.
We store the details of the cached partition in PartitionDesc (i.e.
relcache) so that the cached values are maintained over multiple
statements.
No caching is done for HASH partitions. The majority of the cost in HASH
partition lookups are in the hashing function(s), which would also have to
be executed if we were to try to do caching for HASH partitioned tables.
Since most of the cost is already incurred, we just don't bother. We also
don't do any caching for LIST partitions when we continually find the
values being looked up belong to the DEFAULT partition. We've no
corresponding index in the PartitionBoundInfo's datum array for this case.
We also don't cache when we find the given values match to a LIST
partitioned table's NULL partition. This is so cheap that there's no
point in doing any caching for this. We also don't cache for a RANGE
partitioned table's DEFAULT partition.
There have been a number of different patches submitted to improve
partition lookups. Hou, Zhijie submitted a patch to detect when the value
belonging to the partition key column(s) were constant and added code to
cache the partition in that case. Amit Langote then implemented an idea
suggested by me to remember the last found partition and start to check if
the current values work for that partition. The final patch here was
written by me and was done by taking many of the ideas I liked from the
patches in the thread and redesigning other aspects.
Discussion: https://postgr.es/m/OS0PR01MB571649B27E912EA6CC4EEF03942D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com
Author: Amit Langote, Hou Zhijie, David Rowley
Reviewed-by: Amit Langote, Hou Zhijie
There wasn't an especially nice way to read all of a file while
passing missing_ok = true. Add an additional overloaded variant
to support that use-case.
While here, refactor the C code to avoid a rats-nest of PG_NARGS
checks, instead handling the argument collection in the outer
wrapper functions. It's a bit longer this way, but far more
straightforward.
(Upon looking at the code coverage report for genfile.c, I was
impelled to also add a test case for pg_stat_file() -- tgl)
Kyotaro Horiguchi
Discussion: https://postgr.es/m/20220607.160520.1984541900138970018.horikyota.ntt@gmail.com
Commit b0a55e4329 missed a few places
where we are referring to the number used as a part of the relation
filename as an "OID". We now want to call that a "RelFileNumber".
Some of these places actually made it sound like the OID in question
is pg_class.oid rather than pg_class.relfilenode, which is especially
good to clean up.
Dilip Kumar with some editing by me.
The catalog contents haven't changed, but it's good to make clear
that initdb is required. Changing RELMAPPER_FILEMAGIC would be more
appropriate, but that doesn't actually produce a useful diagnostic,
so cheat by doing this instead.
Discussion: http://postgr.es/m/20220727171939.6ixixqcjt5riil2o@alvherre.pgsql
GetSubscriptionRelations() and GetSubscriptionNotReadyRelations() share
mostly the same code, which scans pg_subscription_rel and fetches all
the relations of a given subscription. The only difference is that the
second routine looks for all the relations not in a ready state. This
commit refactors the code to use a single routine, shaving a bit of
code.
Author: Vignesh C
Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Michael Paquier, Peter
Smith
Discussion: https://postgr.es/m/CALDaNm0eW-9g4G_EzHebnFT5zZoasWCS_EzZQ5BgnLZny9S=pg@mail.gmail.com
This commit puts the implementation of Tuple sort variants into the separate
file tuplesortvariants.c. That gives better separation of the code and
serves well as the demonstration that Tuple sort variant can be defined outside
of tuplesort.c.
Discussion: https://postgr.es/m/CAPpHfdvjix0Ahx-H3Jp1M2R%2B_74P-zKnGGygx4OWr%3DbUQ8BNdw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Pavel Borisov, Maxim Orlov, Matthias van de Meent
Reviewed-by: Andres Freund, John Naylor
XLogRecordBlockHeader, the header holding the information for the data
related to a block, tracks the length of the data appended to the WAL
record with data_length (uint16). This limitation in size was not
enforced by the public routine in charge of registering the data
assembled later to form the WAL record inserted, XLogRegisterBufData().
Incorrectly used, it could lead to the generation of records with some
of its data overflowed. This commit adds some safeguards to prevent
that for the block data, complaining immediately if attempting to add to
a record block information with a size larger than UINT16_MAX, which is
the limit implied by the internal logic.
Note that this also adjusts XLogRegisterData() and XLogRegisterBufData()
so as the length of the WAL record data given by the caller is unsigned,
matching with what gets stored in XLogRecData->len.
Extracted from a larger patch by the same author. The original patch
includes more protections when assembling a record in full that will be
looked at separately later.
Author: Matthias van de Meent
Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier, David
Zhang
Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com
We have a few commands that "can't run in a transaction block",
meaning that if they complete their processing but then we fail
to COMMIT, we'll be left with inconsistent on-disk state.
However, the existing defenses for this are only watertight for
simple query protocol. In extended protocol, we didn't commit
until receiving a Sync message. Since the client is allowed to
issue another command instead of Sync, we're in trouble if that
command fails or is an explicit ROLLBACK. In any case, sitting
in an inconsistent state while waiting for a client message
that might not come seems pretty risky.
This case wasn't reachable via libpq before we introduced pipeline
mode, but it's always been an intended aspect of extended query
protocol, and likely there are other clients that could reach it
before.
To fix, set a flag in PreventInTransactionBlock that tells
exec_execute_message to force an immediate commit. This seems
to be the approach that does least damage to existing working
cases while still preventing the undesirable outcomes.
While here, add some documentation to protocol.sgml that explicitly
says how to use pipelining. That's latent in the existing docs if
you know what to look for, but it's better to spell it out; and it
provides a place to document this new behavior.
Per bug #17434 from Yugo Nagata. It's been wrong for ages,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
The setting controls tha maximum length of the header line in expanded
format output. Possible settings are full, column, page, or an integer.
the default is full, the current behaviour, and in this case the header
line is the length of the widest line of output. column causes the
header to be truncated to the width of the first column, page causes it
to be truncated to the width of the terminal page, and an integer causes
it to be truncated to that value. If the full value is less than the
page or integer value no truncation occurs. If given without an argument
this option prints its current setting.
Platon Pronko, somewhat modified by me.
Discussion: https://postgr.es/m/f03d38a3-db96-a56e-d1bc-dbbc80bbde4d@gmail.com
Previously we did this after InitPostgres, at a somewhat randomly chosen
place within PostgresMain. However, since commit a0ffa885e doing this
outside a transaction can cause a crash, if we need to check permissions
while replacing a placeholder GUC. (Besides which, a preloaded library
could itself want to do database access within _PG_init.)
To avoid needing an additional transaction start/end in every session,
move the process_session_preload_libraries call to within InitPostgres's
transaction. That requires teaching the code not to call it when
InitPostgres is called from somewhere other than PostgresMain, since
we don't want session_preload_libraries to affect background workers.
The most future-proof solution here seems to be to add an additional
flag parameter to InitPostgres; fortunately, we're not yet very worried
about API stability for v15.
Doing this also exposed the fact that we're currently honoring
session_preload_libraries in walsenders, even those not connected to
any database. This seems, at minimum, a POLA violation: walsenders
are not interactive sessions. Let's stop doing that.
(All these comments also apply to local_preload_libraries, of course.)
Per report from Gurjeet Singh (thanks also to Nathan Bossart and Kyotaro
Horiguchi for review). Backpatch to v15 where a0ffa885e came in.
Discussion: https://postgr.es/m/CABwTF4VEpwTHhRQ+q5MiC5ucngN-whN-PdcKeufX7eLSoAfbZA@mail.gmail.com
sigwait() is in SUSv2 and all targeted Unix systems have it. An earlier
pre-standard function prototype existed on some older systems, but we
no longer need a workaround for that.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
getrusage() is in SUSv2 and all targeted Unix systems have it.
Note that POSIX only covers ru_utime and ru_stime and we rely on many
more fields without any kind of configure probe, but that predates this
commit.
The only supported system we need replacement code for now is Windows,
and that can be done without a configure probe.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
O_FSYNC was a pre-POSIX way of spelling O_SYNC, supported since commit
9d645fd84c for non-conforming operating systems of the time. It's not
needed on any modern system. We can just use standard O_SYNC directly
if it exists (= all targeted systems except Windows), and get rid of our
OPEN_SYNC_FLAG macro.
Similarly for standard O_DSYNC, we can just use that directly if it
exists (= all targeted systems except DragonFlyBSD), and get rid of our
OPEN_DATASYNC_FLAG macro.
We still avoid choosing open_datasync as a default value for
wal_sync_method if O_DSYNC has the same value as O_SYNC (= only
OpenBSD), so there is no change in default behavior.
Discussion: https://postgr.es/m/CA%2BhUKGJE7y92NY7FG2ftUbZUaqohBU65_Ys_7xF5mUHo4wirTQ%40mail.gmail.com
Commit 4f658dc8 provided the traditional BSD fls() function in
src/port/fls.c so it could be used in several places. Later we added a
bunch of similar facilities in pg_bitutils.h, based on compiler
builtins that map to hardware instructions. It's a bit confusing to
have both 1-based and 0-based variants of this operation in use in
different parts of the tree, and neither is blessed by a standard.
Let's drop fls.c and the configure probe, and reuse the newer code.
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2B7dSX1XF8yFGmYk-%3D48dbjH2kmzZj16XvhbrWP-9BzRg%40mail.gmail.com
This patch adds a new SUBSCRIPTION parameter "origin". It specifies
whether the subscription will request the publisher to only send changes
that don't have an origin or send changes regardless of origin. Setting it
to "none" means that the subscription will request the publisher to only
send changes that have no origin associated. Setting it to "any" means
that the publisher sends changes regardless of their origin. The default
is "any".
Usage:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=9999'
PUBLICATION pub1 WITH (origin = none);
This can be used to avoid loops (infinite replication of the same data)
among replication nodes.
This feature allows filtering only the replication data originating from
WAL but for initial sync (initial copy of table data) we don't have such a
facility as we can only distinguish the data based on origin from WAL. As
a follow-up patch, we are planning to forbid the initial sync if the
origin is specified as none and we notice that the publication tables were
also replicated from other publishers to avoid duplicate data or loops.
We forbid to allow creating origin with names 'none' and 'any' to avoid
confusion with the same name options.
Author: Vignesh C, Amit Kapila
Reviewed-By: Peter Smith, Amit Kapila, Dilip Kumar, Shi yu, Ashutosh Bapat, Hayato Kuroda
Discussion: https://postgr.es/m/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
Commit 964d01ae9 marked a lot of fields as read_write_ignore
to stay consistent with what was dumped by the manually-maintained
outfuncs.c code. However, it seems that a pretty fair number
of those omissions were either flat-out oversights, or a shortcut
taken because hand-written code seemed like it'd be too much trouble.
Let's upgrade things where it seems to make sense to dump.
To do this, we need to add support to gen_node_support.pl and
outfuncs.c for variable-length arrays of Node pointers. That's
pretty straightforward given the model of the existing code
for arrays of scalars, but I found I needed to tighten the
type-recognizing regexes in gen_node_support.pl. (As they
stood, they mistook "foo **" for "foo *". Make sure they're
all fully anchored to prevent additional problems.)
The main thing left un-done here is that a lot of partitioning-related
structs are still not dumped, because they are bare structs not Nodes.
I'm not sure about the wisdom of that choice ... but changing it would
be fairly invasive, so it probably requires more justification than
just making planner node dumps more complete.
Discussion: https://postgr.es/m/1295668.1658258637@sss.pgh.pa.us
When the ability to print variable-length-array fields was first
added to outfuncs.c, there was no corresponding read capability,
as it was used only for debug dumps of planner-internal Nodes.
Not a lot of thought seems to have been put into the output format:
it's just the space-separated array elements and nothing else.
Later such fields appeared in Plan nodes, and still later we grew
read support so that Plans could be transferred to parallel workers,
but the original text format wasn't rethought. It seems inadequate
to me because (a) no cross-check is possible that we got the right
number of array entries, (b) we can't tell the difference between
a NULL pointer and a zero-length array, and (c) except for
WRITE_INDEX_ARRAY, we'd crash if a non-zero length is specified
when the pointer is NULL, a situation that can arise in some fields
that we currently conveniently avoid printing.
Since we're currently in a campaign to make the Node infrastructure
generally more it-just-works-without-thinking-about-it, now seems
like a good time to improve this.
Let's adopt a format similar to that used for Lists, that is "<>"
for a NULL pointer or "(item item item)" otherwise. Also retool
the code to not have so many copies of the identical logic.
I bumped catversion out of an abundance of caution, although I think
that we don't use any such array fields in Nodes that can get into
the catalogs.
Discussion: https://postgr.es/m/1528424.1658272135@sss.pgh.pa.us
This allows aliases for sub-SELECTs and VALUES clauses in the FROM
clause to be omitted.
This is an extension of the SQL standard, supported by some other
database systems, and so eases the transition from such systems, as
well as removing the minor inconvenience caused by requiring these
aliases.
Patch by me, reviewed by Tom Lane.
Discussion: https://postgr.es/m/CAEZATCUCGCf82=hxd9N5n6xGHPyYpQnxW8HneeH+uP7yNALkWA@mail.gmail.com
Windows 10 gained support for flushing NTFS files with fdatasync()
semantics. The main advantage over open_datasync (in Windows API terms
FILE_FLAG_WRITE_THROUGH) is that the latter does not flush SATA drive
caches. The default setting is not changed, so users have to opt in to
this.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
We allow users to set the values of not-yet-loaded extension GUCs,
remembering those values in "placeholder" GUC entries. When/if
the extension is loaded later in the session, we need to verify that
the user had permissions to set the GUC. That was done correctly
before commit a0ffa885e, but as of that commit, we'd check the
permissions of the active role when the LOAD happens, not the role
that had set the value. (This'd be a security bug if it had made it
into a released version.)
In principle this is simple enough to fix: we just need to remember
the exact role OID that set each GUC value, and use that not
GetUserID() when verifying permissions. Maintaining that data in
the guc.c data structures is slightly tedious, but fortunately it's
all basically just copy-n-paste of the logic for tracking the
GucSource of each setting, as we were already doing.
Another oversight is that validate_option_array_item() hadn't
been taught to check for granted GUC privileges. This appears
to manifest only in that ALTER ROLE/DATABASE RESET ALL will
fail to reset settings that the user should be allowed to reset.
Patch by myself and Nathan Bossart, per report from Nathan Bossart.
Back-patch to v15 where the faulty code came in.
Discussion: https://postgr.es/m/20220706224727.GA2158260@nathanxps13
This is mostly just to get outfuncs.c support for them, so that
the agginfos and aggtransinfos lists can be dumped when dumping
the contents of PlannerInfo.
While here, improve some related comments; notably, clean up
obsolete comments left over from when preprocess_minmax_aggregates
had to make its own scan of the query tree.
Discussion: https://postgr.es/m/742479.1658160504@sss.pgh.pa.us
setrefs.c contains logic to discard no-op SubqueryScan nodes, that is,
ones that have no qual to check and copy the input targetlist unchanged.
(Formally it's not very nice to be applying such optimizations so late
in the planner, but there are practical reasons for it; mostly that we
can't unify relids between the subquery and the parent query until we
flatten the rangetable during setrefs.c.) This behavior falsifies our
previous cost estimates, since we would've charged cpu_tuple_cost per
row just to pass data through the node. Most of the time that's little
enough to not matter, but there are cases where this effect visibly
changes the plan compared to what you would've gotten with no
sub-select.
To improve the situation, make the callers of cost_subqueryscan tell
it whether they think the targetlist is trivial. cost_subqueryscan
already has the qual list, so it can check the other half of the
condition easily. It could make its own determination of tlist
triviality too, but doing so would be repetitive (for callers that
may call it several times) or unnecessarily expensive (for callers
that can determine this more cheaply than a general test would do).
This isn't a 100% solution, because createplan.c also does things
that can falsify any earlier estimate of whether the tlist is
trivial. However, it fixes nearly all cases in practice, if results
for the regression tests are anything to go by.
setrefs.c also contains logic to discard no-op Append and MergeAppend
nodes. We did have knowledge of that behavior at costing time, but
somebody failed to update it when a check on parallel-awareness was
added to the setrefs.c logic. Fix that while we're here.
These changes result in two minor changes in query plans shown in
our regression tests. Neither is relevant to the purposes of its
test case AFAICT.
Patch by me; thanks to Richard Guo for review.
Discussion: https://postgr.es/m/2581077.1651703520@sss.pgh.pa.us
Because they are not available we've used _fileno(stdin) in some places, but
that doesn't reliably work, because stdin might be closed. This is the
prerequisite of the subsequent commit, fixing a failure introduced in
76e38b37a5.
Author: Andres Freund <andres@anarazel.de>
Reported-By: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>
Message-Id: 20220520164558.ozb7lm6unakqzezi@alap3.anarazel.de (on pgsql-packagers)
Backpatch: 15-, where 76e38b37a5 came in
This is in preparation for defaulting to -fvisibility=hidden in extensions,
instead of relying on all symbols in extensions to be exported.
This should have been committed before 089480c077, but something in my commit
scripts went wrong.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20211101020311.av6hphdl6xbjbuif@alap3.anarazel.de
Until now postgres built extension libraries with global visibility, i.e.
exporting all symbols. On the one platform where that behavior is not
natively available, namely windows, we emulate it by analyzing the input files
to the shared library and exporting all the symbols therein.
Not exporting all symbols is actually desirable, as it can improve loading
speed, reduces the likelihood of symbol conflicts and can improve intra
extension library function call performance. It also makes the non-windows
builds more similar to windows builds.
Additionally, with meson implementing the export-all-symbols behavior for
windows, turns out to be more verbose than desirable.
This patch adds support for hiding symbols by default and, to counteract that,
explicit symbol visibility annotation for compilers that support
__attribute__((visibility("default"))) and -fvisibility=hidden. That is
expected to be most, if not all, compilers except msvc (for which we already
support explicit symbol export annotations).
Now that extension library symbols are explicitly exported, we don't need to
export all symbols on windows anymore, hence remove that behavior from
src/tools/msvc. The supporting code can't be removed, as we still need to
export all symbols from the main postgres binary.
Author: Andres Freund <andres@anarazel.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20211101020311.av6hphdl6xbjbuif@alap3.anarazel.de
This is in preparation for defaulting to -fvisibility=hidden in extensions,
instead of exporting all symbols. For that symbols intended to be exported
need to be tagged with PGDLLEXPORT. Most extensions only need to do so for
_PG_init() and functions defined with PG_FUNCTION_INFO_V1. Adding central
declarations avoids each extension having to add PGDLLEXPORT. Any existing
declarations in extensions will continue to work if fmgr.h is included before
them, otherwise compilation for Windows will fail.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20211101020311.av6hphdl6xbjbuif@alap3.anarazel.de
Since commit a65e0864, we've required Unix systems to have
sigprocmask(). As noted in that commit's message, we were still
emulating the historical pre-standard sigsetmask() function in our
Windows support code. Emulate standard sigprocmask() instead, for
consistency.
The PG_SETMASK() abstraction is now redundant and all calls could in
theory be replaced by plain sigprocmask() calls, but that isn't done by
this commit.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3153247.1657834482%40sss.pgh.pa.us
For some systems, we need to avoid unsatisfied-external-reference
errors in static inlines. See
27d2693187 for example. In order to
test that on other systems, the gcc option -fkeep-inline-functions can
be used. But it actually is a bit stricter than what we currently
have in place, so fix up a few more places along the lines of the
above commit. (This undoes part of commit
2cd2569c72b8920048e35c31c9be30a6170e1410.)
(Note, this does not add that gcc option anywhere to the build system,
it just makes it possible to use it successfully manually.)
Discussion: https://www.postgresql.org/message-id/flat/E1oBgIW-002ehP-VJ%40gemulon.postgresql.org
Noticed while working in this area. This code was introduced in PG15,
which is still in beta, so backpatch to there for consistency.
Backpatch-through: 15
Teach this script to handle function pointer fields honestly.
Previously they were just silently ignored, but that's not likely to
be a behavior we can accept indefinitely. This mostly entails fixing
it so that a field declaration spanning multiple lines can be parsed,
because we have a bunch of such fields that're laid out that way.
But that's a good improvement in its own right.
With that change and a minor regex adjustment, the only struct it
fails to parse in the node-defining headers is A_Const, because
of the embedded union. The path of least resistance is to move
that union declaration outside the struct.
Having done those things, we can make it error out if it finds
any within-struct syntax it doesn't understand, which seems like
a pretty important property for robustness.
This commit doesn't change the output files at all; it's just in
the way of future-proofing.
Discussion: https://postgr.es/m/2593369.1657759779@sss.pgh.pa.us
Previously we displayed "DSMFillZeroWrite" while in posix_fallocate(),
because we shared the same wait event for "mmap" and "posix" DSM types.
Let's introduce a new wait event "DSMAllocate", to be more accurate.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220711174518.yldckniicknsxgzl%40awork3.anarazel.de
No members of the buildfarm are using this version of Visual Studio,
resulting in all the code cleaned up here as being mostly dead, and
VS2017 is the oldest version still supported.
More versions could be cut, but the gain would be minimal, while
removing only VS2013 has the advantage to remove from the core code all
the dependencies on the value defined by _MSC_VER, where compatibility
tweaks have accumulated across the years mostly around locales and
strtof(), so that's a nice isolated cleanup.
Note that this commit additionally allows a revert of 3154e16. The
versions of Visual Studio now supported range from 2015 to 2022.
Author: Michael Paquier
Reviewed-by: Juan José Santamaría Flecha, Tom Lane, Thomas Munro, Justin
Pryzby
Discussion: https://postgr.es/m/YoH2IMtxcS3ncWn+@paquier.xyz
Commit 9c727360b neglected the lesson we've learned before:
protect references to backend global variables with #ifndef FRONTEND.
Since there's already a place for static inlines in this file,
move the just-converted functions to that stanza. Undo the
entirely gratuitous de-macroization of RelationGetNumberOfBlocks
(that one may be okay, since it has no global variable references,
but it's also pointless).
Per buildfarm.
The initial version of gen_node_support.pl manually excluded most
utility statement node types from having out/read support, and
also some raw-parse-tree-only node types. That was mostly to keep
the output comparable to the old hand-maintained code. We'd like
to have out/read support for utility statements, for debugging
purposes and so that they can be included in new-style SQL functions;
so it's time to lift that restriction.
Most if not all of the previously-excluded raw-parse-tree-only node
types can appear in expression subtrees of utility statements, so
they have to be handled too.
We don't quite have full read support yet; certain custom_read_write
node types need to have their handwritten read functions implemented
before that will work.
Doing this allows us to drop the previous hack in _outQuery to not
dump the utilityStmt field in most cases, which means we no longer
need manually-maintained out/read functions for Query, so get rid
of those in favor of auto-generating them.
Fix a couple of omissions in gen_node_support.pl that are exposed
through having to handle more node types.
catversion bump forced because somebody was sloppy about the field
order in the manually-maintained Query out/read functions.
(Committers should note that almost all changes in parsenodes.h
are now grounds for a catversion bump.)
Previously, the STORAGE specification was only available in ALTER
TABLE. This makes it available in CREATE TABLE as well.
Also make the code and the documentation for STORAGE and COMPRESSION
attributes consistent.
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: wenjing zeng <wjzeng2012@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b@sigaev.ru
There are a few things that we could do a little better within
get_cheapest_group_keys_order():
1. We should be using list_free() rather than pfree() on a List.
2. We should use for_each_from() instead of manually coding a for loop to
skip the first n elements of a List
3. list_truncate(list_copy(...), n) is not a great way to copy the first n
elements of a list. Let's invent list_copy_head() for that. That way we
don't need to copy the entire list just to truncate it directly
afterwards.
4. We can simplify finding the cheapest cost by setting the cheapest cost
variable to DBL_MAX. That allows us to skip special-casing the initial
iteration of the loop.
Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvrGyL3ft8waEkncG9y5HDMu5TFFJB1paoTC8zi9YK97Nw@mail.gmail.com
Backpatch-through: 15, where get_cheapest_group_keys_order was added.
Justin Pryzby reported that some scenarios could cause gathering
of extended statistics to spend many seconds in an un-cancelable
qsort() operation. To fix, invent qsort_interruptible(), which is
just like qsort_arg() except that it will also do CHECK_FOR_INTERRUPTS
every so often. This bloats the backend by a couple of kB, which
seems like a good investment. (We considered just enabling
CHECK_FOR_INTERRUPTS in the existing qsort and qsort_arg functions,
but there are some callers for which that'd demonstrably be unsafe.
Opt-in seems like a better way.)
For now, just apply qsort_interruptible() in statistics collection.
There's probably more places where it could be useful, but we can
always change other call sites as we find problems.
Back-patch to v14. Before that we didn't have extended stats on
expressions, so that the problem was less severe. Also, this patch
depends on the sort_template infrastructure introduced in v14.
Tom Lane and Justin Pryzby
Discussion: https://postgr.es/m/20220509000108.GQ28830@telsasoft.com
Having different build systems producing different contents of the
NodeTag enum would be catastrophic for extension ABI stability.
But that ordering depends on the order in which gen_node_support.pl
processes its input files. It seems too fragile to let the Makefiles,
MSVC build scripts, and soon meson build scripts all set this order
independently. As a klugy but serviceable solution, put a canonical
copy of the file list into gen_node_support.pl itself, and check that
against the files given on the command line.
Also, while it's fine to add and delete node tags during development,
we must not let the assigned NodeTag values change unexpectedly in
stable branches. Add a cross-check that can be enabled when a branch
is forked off (or later, but that is a time when we're unlikely to
miss doing it). It just checks that the last auto-assigned number
doesn't change, which is simplistic but will catch the most likely
sorts of mistakes.
From time to time we do need to add a node tag in a stable branch.
To support doing that without changing the branch's auto-assigned
tag numbers, invent pg_node_attr(nodetag_number(VALUE)) which can
be used to give such a node a hand-assigned tag above the last
auto-assigned one.
Discussion: https://postgr.es/m/1249010.1657574337@sss.pgh.pa.us
This allows explaining gen_node_support.pl's handling of execnodes.h
and some other input files as being a shortcut for explicit marking
of all their node declarations as pg_node_attr(nodetag_only).
I foresee that someday we might need to be more fine-grained about
that, and this change provides the infrastructure needed to do so.
For now, it just allows removal of the script's klugy special case
for CallContext and InlineCodeBlock.
Discussion: https://postgr.es/m/75063.1657410615@sss.pgh.pa.us
Further to commit 92d70b77, let's drop the code we carry for the
following untested architectures: M68K, M88K, M32R, SuperH. We have no
idea if anything actually works there, and surely as vintage hardware
and microcontrollers they would be underpowered for modern purposes.
We could always consider re-adding SuperH based on evidence of usage and
build farm support, if someone shows up to provide it.
While here, SPARC is usually written in all caps.
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (the idea, not the patch)
Discussion: https://postgr.es/m/959917.1657522169%40sss.pgh.pa.us
Refactor so that log_line_prefix() is a thin wrapper over a new
function log_status_format(), and move the implementation to the
latter. Export log_status_format() so that it can be used by an
emit_log_hook.
Discussion: https://postgr.es/m/39c8197652f4d3050aedafae79fa5af31096505f.camel%40j-davis.com
Reviewed-by: Michael Paquier, Alvaro Herrera
Remove PageIsValid() and PageSizeIsValid(), which weren't used and
seem unnecessary.
Some code using these formerly-macros needs some adjustments because
it was previously playing loose with the Page vs. PageHeader types,
which is no longer possible with the functions instead of macros.
Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517%40enterprisedb.com
dshash.c previously maintained flags to be able to assert that you
didn't hold any partition lock. These flags could get out of sync with
reality in error scenarios.
Get rid of all that, and make assertions about the locks themselves
instead. Since LWLockHeldByMe() loops internally, we don't want to put
that inside another loop over all partition locks. Introduce a new
debugging-only interface LWLockAnyHeldByMe() to avoid that.
This problem was noted by Tom and Andres while reviewing changes to
support the new shared memory stats system, and later showed up in
reality while working on commit 389869af.
Back-patch to 11, where dshash.c arrived.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220311012712.botrpsikaufzteyt@alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BhUKGJ31Wce6HJ7xnVTKWjFUWQZPBngxfJVx4q0E98pDr3kAw%40mail.gmail.com
* Remove arbitrary mention of certain endianness and bitness variants;
it's enough to say that applicable variants are expected to work.
* List RISC-V (known to work, being tested).
* List SuperH and M88K (code exists, unknown status, like M68K).
* De-list VAX and remove code (known not to work).
* Remove stray trace of Alpha (support was removed years ago).
* List illumos, DragonFlyBSD (known to work, being tested).
* No need to single Windows out by listing a specific version, when we
don't do that for other OSes; it's enough to say that we support
current versions of the listed OSes (when 16 ships, that'll be
Windows 10+).
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Discussion: https://postgr.es/m/CA%2BhUKGKk7NZO1UnJM0PyixcZPpCGqjBXW_0bzFZpJBGAf84XKg%40mail.gmail.com
copyfuncs.c and friends no longer seem like great places to put
high-level remarks about what's covered and what isn't. Move that
material to backend/nodes/README and other more-prominent places.
Add back (versions of) some remarks that disappeared in 2be87f092.
Discussion: https://postgr.es/m/3843645.1657385930@sss.pgh.pa.us
In the same vein as commit 251154beb, make it clear that we never
instantiate PlanState.
Also mark MemoryContextData as abstract. This has no effect right now,
since memnodes.h isn't one of the files fed to gen_node_support.pl.
But it seems like good documentation and future-proofing.
Add a script to automatically generate the node support functions
(copy, equal, out, and read, as well as the node tags enum) from the
struct definitions.
For each of the four node support files, it creates two include files,
e.g., copyfuncs.funcs.c and copyfuncs.switch.c, to include in the main
file. All the scaffolding of the main file stays in place.
I have tried to mostly make the coverage of the output match what is
currently there. For example, one could now do out/read coverage of
utility statement nodes, but I have manually excluded those for now.
The reason is mainly that it's easier to diff the before and after,
and adding a bunch of stuff like this might require a separate
analysis and review.
Subtyping (TidScan -> Scan) is supported.
For the hard cases, you can just write a manual function and exclude
generating one. For the not so hard cases, there is a way of
annotating struct fields to get special behaviors. For example,
pg_node_attr(equal_ignore) has the field ignored in equal functions.
(In this patch, I have only ifdef'ed out the code to could be removed,
mainly so that it won't constantly have merge conflicts. It will be
deleted in a separate patch. All the code comments that are worth
keeping from those sections have already been moved to the header
files where the structs are defined.)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce%40enterprisedb.com
PostgreSQL contains the implementation of the red-black tree. The red-black
tree is the ordered data structure, and one of its advantages is the ability
to do inequality searches. This commit adds rbt_find_less() and
rbt_find_great() functions implementing these searches. While these searches
aren't yet used in the core code, they might be useful for extensions.
Discussion: https://postgr.es/m/CAGRrpzYE8-7GCoaPjOiL9T_HY605MRax-2jgTtLq236uksZ1Sw%40mail.gmail.com
Author: Steve Chavez, Alexander Korotkov
Reviewed-by: Alexander Korotkov
This CPU architecture has been discontinued. We already removed HP-UX
support, we never supported Windows/Itanium, and the open source
operating systems that a vintage hardware owner might hope to run have
all either ended Itanium support or never fully released support (NetBSD
may eventually). The extra code we carry for this rare ISA is now
untested. It seems like a good time to remove it.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1415825.1656893299%40sss.pgh.pa.us
HP-UX hardware is no longer produced, build farm coverage recently
ended, and there are no known active maintainers targeting this OS.
Since there is a major rewrite of the build system in the pipeline for
PostgreSQL 16, and that requires development, testing and maintainance
for each OS and tool chain, it seems like a good time to drop support
for:
* HP-UX, the operating system.
* HP aCC, the HP-UX native compiler.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/1415825.1656893299%40sss.pgh.pa.us
This commit bumps the runtime value of _WIN32_WINNT to be 0x0A00 for any
builds on Windows. Hence, this makes Windows 10 the minimal requirement
when running PostgreSQL under WIN32, be it for builds of Cygwin, MinGW
or Visual Studio.
The previous minimal runtime version was either Windows Vista when
building with at least Visual Studio 2015 or Windows XP for the rest.
Windows 10 is the most modern version supported by Microsoft, and per
discussion, as we don't have buildfarm members that run older versions
anymore, this is the minimal supported version that suits better for our
needs. This will actually make easier the development of some patches,
two being async I/O and large page handling by avoiding a lot of
compatibility gotchas, on platforms that have most likely few users
anyway.
It is possible to remove MIN_WINNT in win32.h and the macros
IsWindowsXXXOrGreater() that were used in the code at runtime to check
which version of Windows was getting used. The change in pg_locale.c
comes from Juan. Note that all my tests passed, and that the CI is
green. The buildfarm will quickly tell if this needs more adjustments.
Author: Michael Paquier, Juan José Santamaría Flecha
Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/Yo7tHKD8VCkeNi71@paquier.xyz
40af10b57 changed things so we make use of a generation memory context for
storing tuples to be sorted by tuplesort.c. That change does not play
nicely with the changes made in 9f03ca915 (back in 2014). That commit
changed things so that index_form_tuple() is called while switched into
the tuplestore's tuplecontext. In order to fetch the tuple from the index,
index_form_tuple() must do various memory allocations which are unrelated
to the storage of the final returned tuple. Although all of these
allocations are pfree'd, the fact that we now use a generation context
means that the memory for these pfree'd allocations won't be used again by
any other allocation due to generation.c's lack of freelists. This could
result in sorts used for building indexes exceeding maintenance_work_mem
by a very large amount.
Here we fix it so we no longer allocate anything apart from the tuple
itself into the generation context by adding a new version of
index_form_tuple named index_form_tuple_context, which can be called to
specify the MemoryContext to allocate the tuple into.
Discussion: https://postgr.es/m/CAApHDvrHQkiFRHiGiAS-LMOvJN-eK-s762=tVzBz8ZqUea-a_A@mail.gmail.com
Backpatch-through: 15, where 40af10b57 was added.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
50e17ad28 increased the size of ExprEvalStep from 64 bytes up to 88 bytes.
Lots of effort was spent during the development of the current expression
evaluation code to make an instance of this struct as small as possible.
Making this struct larger than needed reduces CPU cache efficiency during
expression evaluation which causes noticeable slowdowns during query
execution.
In order to reduce the size of the struct, here we remove the fn_addr
field. The values from this field can be obtained via fcinfo, just with
some extra pointer dereferencing. The extra indirection does not seem to
cause any noticeable slowdowns.
Various other fields have been moved into the ScalarArrayOpExprHashTable
struct. These fields are only used when the ScalarArrayOpExprHashTable
pointer has already been dereferenced, so no additional pointer
dereferences occur for these. Here we also make hash_fcinfo_data the last
field in ScalarArrayOpExprHashTable so that we can avoid a further pointer
dereference to get the FunctionCallInfoBaseData. This also saves a call to
palloc().
50e17ad28 was added in 14, but it's too late to adjust the size of the
ExprEvalStep in that version, so here we just backpatch to 15, which is
currently in beta.
Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Backpatch-through: 15
auto_explain.log_parameter_max_length is a new GUC part of the
extension, similar to the corresponding core setting, that controls the
inclusion of query parameters in the logged explain output.
More tests are added to check the behavior of this new parameter: when
parameters logged in full (the default of -1), when disabled (value of
0) and when partially truncated (value different than the two others).
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87ee09mohb.fsf@wibble.ilmari.org
The new expression step types increased the size of ExprEvalStep by ~4 for all
types of expression steps, slowing down expression evaluation noticeably. Move
them out of line.
There's other issues with these expression steps, but addressing them is
largely independent of this aspect.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Backpatch: 15-
This reverts most of 91c0570a79, f28bf667f6, fe0972ee5e, afdeff1052. The
only thing left is the retry loop in 019_replslot_limit.pl that avoids
spurious failures by retrying a couple times.
We haven't seen any hard evidence that this is caused by anything but slow
process shutdown. We did not find any cases where walsenders did not vanish
after waiting for longer. Therefore there's no reason for this debugging code
to remain.
Discussion: https://postgr.es/m/20220530190155.47wr3x2prdwyciah@alap3.anarazel.de
Backpatch: 15-
A previous commit replaced all the calls to this function with
durable_rename() as of dac1ff3, making it used nowhere in the tree.
Using it in extension code is also risky based on the issues described
in this previous commit, so let's remove it. This makes possible the
removal of HAVE_WORKING_LINK.
Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
Use it for RelationSyncEntry->streamed_txns, which is currently using an
integer list.
The API support is not complete, not because it is hard to write but
because it's unclear that it's worth the code space, there being so
little use of XID lists.
Discussion: https://postgr.es/m/202205130830.g5ntonhztspb@alvherre.pgsql
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
pg_attribute_nonnull(...) can be used to generate compiler warnings
when a function is called with the specified arguments set to NULL, as
per an idea from Andres Freund. An empty argument list indicates that
no pointer arguments can be NULL. pg_attribute_nonnull() only works for
compilers that support the nonnull function attribute. If nonnull is
not supported, pg_attribute_nonnull() has no effect.
As a beginning, this commit uses it for the DefineCustomXXXVariable()
functions to generate warnings when the "name" and "value" arguments are
set to NULL. This will likely be expanded to other places in the
future, where it makes sense.
Author: Nathan Bossart
Reviewed by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20220525061739.ur7x535vtzyzkmqo@alap3.anarazel.de
There were many calls to construct_array() and deconstruct_array() for
built-in types, for example, when dealing with system catalog columns.
These all hardcoded the type attributes necessary to pass to these
functions.
To simplify this a bit, add construct_array_builtin(),
deconstruct_array_builtin() as wrappers that centralize this hardcoded
knowledge. This simplifies many call sites and reduces the amount of
hardcoded stuff that is spread around.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2914356f-9e5f-8c59-2995-5997fc48bcba%40enterprisedb.com
TransactionIdIsInProgress had a fast path to return 'false' if the
single-item CLOG cache said that the transaction was known to be
committed. However, that was wrong, because a transaction is first
marked as committed in the CLOG but doesn't become visible to others
until it has removed its XID from the proc array. That could lead to an
error:
ERROR: t_xmin is uncommitted in tuple to be updated
or for an UPDATE to go ahead without blocking, before the previous
UPDATE on the same row was made visible.
The window is usually very short, but synchronous replication makes it
much wider, because the wait for synchronous replica happens in that
window.
Another thing that makes it hard to hit is that it's hard to get such
a commit-in-progress transaction into the single item CLOG cache.
Normally, if you call TransactionIdIsInProgress on such a transaction,
it determines that the XID is in progress without checking the CLOG
and without populating the cache. One way to prime the cache is to
explicitly call pg_xact_status() on the XID. Another way is to use a
lot of subtransactions, so that the subxid cache in the proc array is
overflown, making TransactionIdIsInProgress rely on pg_subtrans and
CLOG checks.
This has been broken ever since it was introduced in 2008, but the race
condition is very hard to hit, especially without synchronous
replication. There were a couple of reports of the error starting from
summer 2021, but no one was able to find the root cause then.
TransactionIdIsKnownCompleted() is now unused. In 'master', remove it,
but I left it in place in backbranches in case it's used by extensions.
Also change pg_xact_status() to check TransactionIdIsInProgress().
Previously, it only checked the CLOG, and returned "committed" before
the transaction was actually made visible to other queries. Note that
this also means that you cannot use pg_xact_status() to reproduce the
bug anymore, even if the code wasn't fixed.
Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with
the pg_xact_status() change added by me.
Author: Simon Riggs
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
Previously, we encoded both NULL and the first byte at the base address
as 0. That confusion led to the assertion in commit e07d4ddc, which
failed when min_dynamic_shared_memory was used. Give them distinct
encodings, by switching to 1-based offsets for non-NULL pointers. Also
improve macro hygiene in passing (missing/misplaced parentheses), and
remove open-coded access to the raw offset value from freepage.c/h.
Although e07d4ddc was back-patched to 10, the only code that actually
makes use of relptr at the base address arrived in 84b1c63a, so no need
to back-patch further than 14 for now.
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20220519193839.GT19626%40telsasoft.com
This reverts commits 5753d4ee32 and fe60b67250 that modified HOT to
ignore BRIN indexes. The commit message for 5753d4ee32 claims that:
When determining whether an index update may be skipped by using
HOT, we can ignore attributes indexed only by BRIN indexes. There
are no index pointers to individual tuples in BRIN, and the page
range summary will be updated anyway as it relies on visibility
info.
This is partially incorrect - it's true BRIN indexes don't point to
individual tuples, so HOT chains are not an issue, but the visibitlity
info is not sufficient to keep the index up to date. This can easily
result in corrupted indexes, as demonstrated in the hackers thread.
This does not mean relaxing the HOT restrictions for BRIN is a lost
cause, but it needs to handle the two aspects (allowing HOT chains and
updating the page range summaries) as separate. But that requires a
major changes, and it's too late for that in the current dev cycle.
Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
We were not updating the partition map cache in the subscriber even when
the corresponding remote rel is changed. Due to this data was getting
incorrectly replicated for partition tables after the publisher has
changed the table schema.
Fix it by resetting the required entries in the partition map cache after
receiving a new relation mapping from the publisher.
Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
The original advice for hard-wired SetConfigOption calls was to use
PGC_S_OVERRIDE, particularly for PGC_INTERNAL GUCs. However,
that's really overkill for PGC_INTERNAL GUCs, since there is no
possibility that we need to override a user-provided setting.
Instead use PGC_S_DYNAMIC_DEFAULT in most places, so that the
value will appear with source = 'default' in pg_settings and thereby
not be shown by psql's new \dconfig command. The one exception is
that when changing in_hot_standby in a hot-standby session, we still
use PGC_S_OVERRIDE, because people felt that seeing that in \dconfig
would be a good thing.
Similarly use PGC_S_DYNAMIC_DEFAULT for the auto-tune value of
wal_buffers (if possible, that is if wal_buffers wasn't explicitly
set to -1), and for the typical 2MB value of max_stack_depth.
In combination these changes remove four not-very-interesting
entries from the typical output of \dconfig, all of which people
fingered as "why is that showing up?" in the discussion thread.
Discussion: https://postgr.es/m/3118455.1649267333@sss.pgh.pa.us
With the old logic, when the reciever had not yet attached, we would
never call shm_mq_inc_bytes_written(), even if force_flush = true
was specified. That could result in a situation where data that the
sender believes it has sent is never received.
Along the way, remove a useless function prototype for a nonexistent
function from shm_mq.h.
Commit 46846433a0 introduced these
problems.
Pavan Deolasee, with a few changes by me.
Discussion: https://postgr.es/m/CABOikdPkwtLLCTnzzmpSMXo3QZa2yXq0J7Q61ssdLFAJYrOVvQ@mail.gmail.com
Commit 1a36bc9db (SQL/JSON query functions) introduced STRING as a
type_func_name_keyword, thereby breaking applications that use
"string" as a table name, column name, function parameter name, etc.
That seems like a pretty bad thing, not least because the SQL spec
says that STRING is an unreserved keyword.
This is easy enough to fix so far as the core grammar is concerned.
However, doing so causes some ECPG test cases to fail, specifically
those that use "string" as a typedef name. It turns out this is
because portions of the ECPG grammar allow type_func_name_keywords
but not unreserved_keywords as typedef names. That's pretty horrid,
and it's mildly astonishing that we've not heard complaints about it
before. We can fix two of those uses trivially, but the ones in the
var_type production are less easy. As a stopgap, hard-code STRING as
an allowed alternative in var_type.
Per report from Alastair McKinley.
Discussion: https://postgr.es/m/3661437.1653855582@sss.pgh.pa.us
Several places in the planner tried to clamp a double value to fit
in a "long" by doing
(long) Min(x, (double) LONG_MAX);
This is subtly incorrect, because it casts LONG_MAX to double and
potentially back again. If long is 64 bits then the double value
is inexact, and the platform might round it up to LONG_MAX+1
resulting in an overflow and an undesirably negative output.
While it's not hard to rewrite the expression into a safe form,
let's put it into a common function to reduce the risk of someone
doing it wrong in future.
In principle this is a bug fix, but since the problem could only
manifest with group count estimates exceeding 2^63, it seems unlikely
that anyone has actually hit this or will do so anytime soon. We're
fixing it mainly to satisfy fuzzer-type tools. That being the case,
a HEAD-only fix seems sufficient.
Andrey Lepikhov
Discussion: https://postgr.es/m/ebbc2efb-7ef9-bf2f-1ada-d6ec48f70e58@postgrespro.ru
This is a slight, convenient semantics change from what commit
0f0cfb4940 ("Fix parallel operations that prevent oldest xmin from
advancing") introduced that lets us simplify the coding in the one place
where it is used.
Backpatch to 13. This is related to commit 6fea65508a ("Tighten
ComputeXidHorizons' handling of walsenders") rewriting the code site
where this is used, which has not yet been backpatched, but it may well
be in the future.
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/202204191637.eldwa2exvguw@alvherre.pgsql
Commit 923def9a53 and 52e4f0cd47 allowed to specify column lists and row
filters for publication tables. This commit extends the
pg_publication_tables view and pg_get_publication_tables function to
display that information.
This information will be useful to users and we also need this for the
later commit that prohibits combining multiple publications with different
column lists for the same table.
Author: Hou Zhijie
Reviewed By: Amit Kapila, Alvaro Herrera, Shi Yu, Takamichi Osumi
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
I started out with the intention to rename value_type to item_type to
avoid a collision with a typedef name that appears on some platforms.
Along the way, I noticed that the adjacent field "format" was not being
correctly handled by the backend/nodes/ infrastructure functions:
copyfuncs.c erroneously treated it as a scalar, while equalfuncs,
outfuncs, and readfuncs omitted handling it at all. This looks like
it might be cosmetic at the moment because the field is always NULL
after parse analysis; but that's likely a bug in itself, and the code's
certainly not very future-proof. Let's fix it while we can still do so
without forcing an initdb on beta testers.
Further study found a few other inconsistencies in the backend/nodes/
infrastructure for the recently-added JSON node types, so fix those too.
catversion bumped because of potential change in stored rules.
Discussion: https://postgr.es/m/526703.1652385613@sss.pgh.pa.us
Currently, preloaded libraries are expected to request additional
shared memory and LWLocks in _PG_init(). However, it is not unusal
for such requests to depend on MaxBackends, which won't be
initialized at that time. Such requests could also depend on GUCs
that other modules might change. This introduces a new hook where
modules can safely use MaxBackends and GUCs to request additional
shared memory and LWLocks.
Furthermore, this change restricts requests for shared memory and
LWLocks to this hook. Previously, libraries could make requests
until the size of the main shared memory segment was calculated.
Unlike before, we no longer silently ignore requests received at
invalid times. Instead, we FATAL if someone tries to request
additional shared memory or LWLocks outside of the hook.
Nathan Bossart and Julien Rouhaud
Discussion: https://postgr.es/m/20220412210112.GA2065815%40nathanxps13
Discussion: https://postgr.es/m/Yn2jE/lmDhKtkUdr@paquier.xyz
Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta
tasks specified by RELEASE_CHANGES. For reference, the command was
./renumber_oids.pl --first-mapped-oid 8000 --target-oid 6205
Three variables in pqsignal.h (UnBlockSig, BlockSig and StartupBlockSig)
were not marked with PGDLLIMPORT, as they are declared in a way that
prevents mark_pgdllimport.pl to detect them. These variables are
redefined in a style more consistent with the other headers, allowing
the script to find and mark them.
PGDLLIMPORT was missing for __pg_log_level in logging.h, so add it
back. The marking got accidentally removed in 9a374b77, just after its
addition in 8ec5694.
While on it, add a comment in mark_pgdllimport.pl explaining what are
the arguments needed by the script (aka a list of header paths).
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20220506234924.6mxxotl3xl63db3l@alap3.anarazel.de
The code for unloading a library has been commented-out for over 12
years, ever since commit 602a9ef5a7, and we're
no closer to supporting it now than we were back then.
Nathan Bossart, reviewed by Michael Paquier and by me.
Discussion: http://postgr.es/m/Ynsc9bRL1caUSBSE@paquier.xyz