Use __builtin_clz() where available. Where it isn't, we can still win
a little by using the pg_leftmost_one_pos[] lookup table instead of
having a private table.
Also drop the initial right shift by ALLOC_MINBITS in favor of
subtracting ALLOC_MINBITS from the leftmost-one-pos result. This
is a win because the compiler can fold that adjustment into other
constants it'd have to add anyway, making the shift-removal free.
Also, we can explain this coding as an unrolled form of
pg_leftmost_one_pos32(), even though that's a bit ahistorical
since it long predates pg_bitutils.h.
John Naylor, with some cosmetic adjustments by me
Discussion: https://postgr.es/m/CACPNZCuNUGMxjK7WTn_=WZnRbfASDdBxmjsVf2+m9MdmeNw_sg@mail.gmail.com
This operation was possible for the owner of the schema or a superuser.
Down to 9.4, doing this operation would cause inconsistencies in a
session whose temporary schema was dropped, particularly if trying to
create new temporary objects after the drop. A more annoying
consequence is a crash of autovacuum on an assertion failure when
logging information about an orphaned temp table dropped. Note that
because of 246a6c8 (present in v11~), which has made the removal of
orphaned temporary tables more aggressive, the failure could be
triggered more easily, but it is possible to reproduce down to 9.4.
Reported-by: Mahendra Singh, Prabhat Sahu
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Mahendra Singh
Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com
Backpatch-through: 9.4
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.
Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
When revalidate_rectypeid() acts to update a stale record type OID
in plpgsql's data structures, it fixes the active PLpgSQL_rec struct
as well as the PLpgSQL_type struct it references. However, the latter
is shared across function executions while the former is not. In a
later function execution, the PLpgSQL_rec struct would be reinitialized
by copy_plpgsql_datums and would then contain a stale type OID,
typically leading to "could not open relation with OID NNNN" errors.
revalidate_rectypeid() can easily fix this, fortunately, just by
treating typ->typoid as authoritative.
Per report and diagnosis from Ashutosh Sharma, though this is not his
suggested fix. Back-patch to v11 where this code came in.
Discussion: https://postgr.es/m/CAE9k0Pkd4dZwt9J5pS9xhJFWpUtqs05C9xk_GEwPzYdV=GxwWg@mail.gmail.com
Instead of passing around a pointer to the RangeTblEntry that
provides the desired column, pass a pointer to the associated
ParseNamespaceItem. The RTE is trivially reachable from the nsitem,
and having the ParseNamespaceItem allows access to additional
information. As proof of concept for that, add the rangetable index
to ParseNamespaceItem, and use that to get rid of RTERangeTablePosn
searches.
(I have in mind to teach the parser to generate some different
representation for Vars that are nullable by outer joins, and
keeping the necessary information in ParseNamespaceItems seems
like a reasonable approach to that. But whether that ever
happens or not, this seems like good cleanup.)
Also refactor the code around scanRTEForColumn so that the
"fuzzy match" stuff does not leak out of parse_relation.c.
Discussion: https://postgr.es/m/26144.1576858373@sss.pgh.pa.us
The part in charge of doing the vacuum on all the indexes of a relation
was duplicated, with the same handling for progress reporting done.
While on it, update the progress reporting for heap vacuuming in the
subroutine doing the actual work, keeping the status update local. This
way, any future caller of lazy_vacuum_heap() does not have to worry
about doing any progress reporting update.
Author: Justin Pryzby, Michael Paquier
Discussion: https://postgr.es/m/20191120210600.GC30362@telsasoft.com
In the wake of commit 5b9312378, there's no particular reason
for this restriction (previously, it was problematic because of
the implied rowtype reference). A simple constraint on a whole-row
Var probably isn't that useful, but conceivably somebody would want
to pass one to a function that extracts a partitioning key. Besides
which, we're expending much more code to enforce the restriction than
we save by having it, since the latter quantity is now zero.
So drop the restriction.
Amit Langote
Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Formerly the rd_partkey and rd_partdesc data structures were always
populated immediately when a relcache entry was built or rebuilt.
This patch changes things so that they are populated only when they
are first requested. (Hence, callers *must* now always use
RelationGetPartitionKey or RelationGetPartitionDesc; just fetching
the pointer directly is no longer acceptable.)
This seems to have some performance benefits, but the main reason to do
it is that it eliminates a recursive-reload failure that occurs if the
partkey or partdesc expressions contain any references to the relation's
rowtype (as discovered by Amit Langote). In retrospect, since loading
these data structures might result in execution of nearly-arbitrary code
via eval_const_expressions, it was a dumb idea to require that to happen
during relcache entry rebuild.
Also, fix things so that old copies of a relcache partition descriptor
will be dropped when the cache entry's refcount goes to zero. In the
previous coding it was possible for such copies to survive for the
lifetime of the session, as I'd complained of in a previous discussion.
(This management technique still isn't perfect, but it's better than
before.) Improve the commentary explaining how that works and why
it's safe to hand out direct pointers to these relcache substructures.
In passing, improve RelationBuildPartitionDesc by using the same
memory-context-parent-swap approach used by RelationBuildPartitionKey,
thereby making it less dependent on strong assumptions about what
partition_bounds_copy does. Avoid doing get_rel_relkind in the
critical section, too.
Patch by Amit Langote and Tom Lane; Robert Haas deserves some credit
for prior work in the area, too. Although this is a pre-existing
problem, no back-patch: the patch seems too invasive to be safe to
back-patch, and the bug it fixes is a corner case that seems
relatively unlikely to cause problems in the field.
Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c. Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.
This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.
Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
Using \ is unnecessary and ugly, so remove that. While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.
Leave contrib/tablefunc alone.
Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
Since d6c55de1, src/port/snprintf.c is able to use %m instead of
strerror(). A couple of utilities in src/bin/ have already done the
switch, and do it now for pg_waldump as this reduces the workload for
translators.
Note that more could be done, particularly with pgbench. Thanks to
Kyotaro Horiguchi for the discussion.
Discussion: https://postgr.es/m/20191129065115.GM2505@paquier.xyz
Our algorithm for choosing batch numbers turned out not to work
effectively for multi-billion key inner relations. We would use
more hash bits than we have, and effectively concentrate all tuples
into a smaller number of batches than we intended. While ideally
we should switch to wider hashes, for now, change the algorithm to
one that effectively gives up bits from the bucket number when we
don't have enough bits. That means we'll finish up with longer
bucket chains than would be ideal, but that's better than having
batches that don't fit in work_mem and can't be divided.
Batch-patch to all supported releases.
Author: Thomas Munro
Reviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussion
Reported-by: James Coleman
Discussion: https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
This wasn't checked originally, but it should have been, because
in general pseudo-types can't be stored to and retrieved from disk.
Notably, partition bound values of type "record" would not be
interpretable by another session.
In v12 and HEAD, add another flag to CheckAttributeType's repertoire
so that it can produce a specific error message for this case. That's
infeasible in older branches without an ABI break, so fall back to
a slightly-less-nicely-worded error message in v10 and v11.
Problem noted by Amit Langote, though this patch is not his initial
solution. Back-patch to v10 where partitioning was introduced.
Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
We probably should have thought of this case when ranges were added,
but we didn't. (It's not the fault of commit eb51af71f, because
ranges didn't exist then.)
It's an old bug, so back-patch to all supported branches.
Discussion: https://postgr.es/m/7782.1577051475@sss.pgh.pa.us
Comments about the consequences of clearing the BTP_HAS_GARBAGE page
flag bit that apply only to VACUUM were added to code that deals with
opportunistic deletion of LP_DEAD items by commit a760893d. The same
comment block was added to both _bt_delitems_vacuum() and
_bt_delitems_delete(). Correct _bt_delitems_delete()'s copy of the
comment block.
_bt_delitems_delete() reliably deletes items that were found by caller
to have their LP_DEAD bit set. There is no question about whether or
not unsetting the BTP_HAS_GARBAGE bit can miss some LP_DEAD items that
were set recently.
Also tweak a related section of the nbtree README.
If the first transaction block in these tests were entered exactly
at midnight (California time), they'd report a bogus failure due
to 'now' and 'midnight' having the same values. Commit 8c2ac75c5
had dismissed this as being of negligible probability, but we've
now seen it happen in the buildfarm, so let's prevent it. We can
get pretty much the same test coverage without an it's-not-midnight
assumption by moving the does-'now'-work cases into their own test step.
While here, apply commit 47169c255's s/DELETE/TRUNCATE/ change to
timestamptz as well as timestamp (not sure why that didn't
occur to me at the time; the risk of failure is the same).
Back-patch to all supported branches, since the main point is
to get rid of potential buildfarm failures.
Discussion: https://postgr.es/m/14821.1577031117@sss.pgh.pa.us
This fixes a performance problem introduced by commit 6d7547c21.
ERROR_ACCESS_DENIED is returned in some other cases besides the
delete-pending case considered by that commit; notably, if the
given path names a directory instead of a plain file. In that
case we'll uselessly loop for 1 second before returning the
failure condition. That slows down some usage scenarios enough
to cause test timeout failures on our Windows buildfarm critters.
To fix, try to stat() the file, and sleep/loop only if that fails.
It will fail in the delete-pending case, and also in the case where
the deletion completed before we could stat(), so we have the cases
where we want to loop covered. In the directory case, the stat()
should succeed, letting us exit without a wait.
One case where we'll still wait uselessly is if the access-denied
problem pertains to a directory in the given pathname. But we don't
expect that to happen in any performance-critical code path.
There might be room to refine this further, but I'll push it now
in hopes of making the buildfarm green again.
Back-patch, like the preceding commit.
Alexander Lakhin and Tom Lane
Discussion: https://postgr.es/m/23073.1576626626@sss.pgh.pa.us
We realized years ago that it's better for libpq to accept all
connection parameters syntactically, even if some are ignored or
restricted due to lack of the feature in a particular build.
However, that lesson from the SSL support was for some reason never
applied to the GSSAPI support. This is causing various buildfarm
members to have problems with a test case added by commit 6136e94dc,
and it's just a bad idea from a user-experience standpoint anyway,
so fix it.
While at it, fix some places where parameter-related infrastructure
was added with the aid of a dartboard, or perhaps with the aid of
the anti-pattern "add new stuff at the end". It should be safe
to rearrange the contents of struct pg_conn even in released
branches, since that's private to libpq (and we'd have to move
some fields in some builds to fix this, anyway).
Back-patch to all supported branches.
Discussion: https://postgr.es/m/11297.1576868677@sss.pgh.pa.us
Most of the MSVC Perl code uses forward slashes for file paths. Make
the few places that use backslashes the same. This also helps running
that code on non-Windows.
Previously, the Windows MSVC build generated pg_config.h from a
hard-coded pg_config.h.win32 with some ad hoc postprocessing. The
pg_config.h.win32 file required manual maintenance and was as a result
frequently out of date.
Instead, have the MSVC build scripts emulate what configure and
config.status do: collect a list of defines and then create
pg_config.h from pg_config.h.in by changing the appropriate lines.
The previous setup was made to support old Windows build systems that
didn't have any text processing capabilities, but the current system
has Perl, so it's not a problem. pg_config.h.win32 is removed.
In order to try to keep the Windows side of things more up to date in
the future, we now also require that all symbols found in
pg_config.h.in are defined in the MSVC build system. So if there is a
change in configure that results in a new symbol, an update in
Solution.pm will be required.
The other headers managed by AC_CONFIG_HEADERS in configure, namely
src/include/pg_config_ext.h and
src/interfaces/ecpg/include/ecpg_config.h, get the same treatment, so
this removes even more ad hoc code in the MSVC build scripts.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/1441b834-f434-e0bf-46ed-9c4d5c29c2d4%402ndquadrant.com
A new function EmitProcSignalBarrier() can be used to emit a global
barrier which all backends that participate in the ProcSignal
mechanism must absorb, and a new function WaitForProcSignalBarrier()
can be used to wait until all relevant backends have in fact
absorbed the barrier.
This can be used to coordinate global state changes, such as turning
checksums on while the system is running.
There's no real client of this mechanism yet, although two are
proposed, but an enum has to have at least one element, so this
includes a placeholder type (PROCSIGNAL_BARRIER_PLACEHOLDER) which
should be replaced by the first real client of this mechanism to
get committed.
Andres Freund and Robert Haas, reviewed by Daniel Gustafsson and,
in earlier versions, by Magnus Hagander.
Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
The REDO routine for nbtree's xl_btree_vacuum record type hasn't
performed a "pin scan" since commit 3e4b7d87 went in, so clearly there
isn't any point in VACUUM WAL-logging information that won't actually be
used. Finish off the work of commit 3e4b7d87 (and the closely related
preceding commit 687f2cd7) by removing the code that generates this
unused information. Also remove the REDO routine code disabled by
commit 3e4b7d87.
Replace the unneeded lastBlockVacuumed field in xl_btree_vacuum with a
new "ndeleted" field. The new field isn't actually needed right now,
since we could continue to infer the array length from the overall
record length. However, an upcoming patch to add deduplication to
nbtree needs to add an "items updated" field to xl_btree_vacuum, so we
might as well start being explicit about the number of items now.
(Besides, it doesn't seem like a good idea to leave the xl_btree_vacuum
struct without any fields; the C standard says that that's undefined.)
nbtree VACUUM no longer forces writing a WAL record for the last block
in the index. Writing out a WAL record with no items for the final
block was supposed to force processing of a lastBlockVacuumed field by a
pin scan.
Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.
Discussion: https://postgr.es/m/CAH2-WzmY_mT7UnTzFB5LBQDBkKpdV5UxP3B5bLb7uP%3D%3D6UQJRQ%40mail.gmail.com
The previous coding imagined that it could call before_shmem_exit()
when a non-exclusive backup began and then remove the previously-added
handler by calling cancel_before_shmem_exit() when that backup
ended. However, this only works provided that nothing else in the
system has registered a before_shmem_exit() hook in the interim,
because cancel_before_shmem_exit() is documented to remove a callback
only if it is the latest callback registered. It also only works
if nothing can ERROR out between the time that sessionBackupState
is reset and the time that cancel_before_shmem_exit(), which doesn't
seem to be strictly true.
To fix, leave the handler installed for the lifetime of the session,
arrange to install it just once, and teach it to quietly do nothing if
there isn't a non-exclusive backup in process.
This is a bug, but for now I'm not going to back-patch, because the
consequences are minor. It's possible to cause a spurious warning
to be generated, but that doesn't really matter. It's also possible
to trigger an assertion failure, but production builds shouldn't
have assertions enabled.
Patch by me, reviewed by Kyotaro Horiguchi, Michael Paquier (who
preferred a different approach, but got outvoted), Fujii Masao,
and Tom Lane, and with comments by various others.
Discussion: http://postgr.es/m/CA+TgmobMjnyBfNhGTKQEDbqXYE3_rXWpc4CM63fhyerNCes3mA@mail.gmail.com
Commit 7dbfea3c45 thought it could get
away with removing this, but Thomas Munro reports, on behalf of the
buildfarm, that it's still needed at least on Windows to avoid
compiler warnings.
The new function, heap_fetch_toast_slice, is shared between
toast_fetch_datum_slice and toast_fetch_datum, and does all the
work of scanning the TOAST table, fetching chunks, and storing
them into the space allocated for the result varlena.
As an incidental side effect, this allows toast_fetch_datum_slice
to perform the scan with only a single scankey if all chunks are
being fetched, which might have some tiny performance benefit.
Discussion: http://postgr.es/m/CA+TgmobBzxwFojJ0zV0Own3dr09y43hp+OzU2VW+nos4PMXWEg@mail.gmail.com
Older gcc versions are not happy with having multiple declarations
for the same typedef name (not struct name). I'm a bit dubious
as to how well-thought-out that patch was at all, but for the moment
just fix it enough so I can get some work done today.
Discussion: https://postgr.es/m/20191218101338.GB325369@paquier.xyz
Tuple conversion support in tupconvert.c is able to convert rowtypes
between two relations, inner and outer, which are logically equivalent
but have a different ordering or even dropped columns (used mainly for
inheritance tree and partitions). This makes use of attribute mappings,
which are simple arrays made of AttrNumber elements with a length
matching the number of attributes of the outer relation. The length of
the attribute mapping has been treated as completely independent of the
mapping itself until now, making it easy to pass down an incorrect
mapping length.
This commit refactors the code related to attribute mappings and moves
it into an independent facility called attmap.c, extracted from
tupconvert.c. This merges the attribute mapping with its length,
avoiding to try to guess what is the length of a mapping to use as this
is computed once, when the map is built.
This will avoid mistakes like what has been fixed in dc816e58, which has
used an incorrect mapping length by matching it with the number of
attributes of an inner relation (a child partition) instead of an outer
relation (a partitioned table).
Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
This patch allows building the local relmap cache for a subscribed
relation after processing pending invalidation messages and potential
relcache updates. Without this, the attributes in the local cache don't
tally with the updated relcache entry leading to invalid memory access.
Reported-by Jehan-Guillaume de Rorthais
Author: Jehan-Guillaume de Rorthais and Vignesh C
Reviewed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/20191025175929.7e90dbf5@firost
This changes the routines in charge of recycling WAL segments past the
last redo LSN to not use anymore "RedoRecPtr" as a local variable, which
is also available in the context of the session as a static declaration,
replacing it with "lastredoptr". This confusion has been introduced by
d9fadbf, so backpatch down to v11 like the other commit.
Thanks to Tom Lane, Robert Haas, Alvaro Herrera, Mark Dilger and Kyotaro
Horiguchi for the input provided.
Author: Ranier Vilela
Discussion: https://postgr.es/m/MN2PR18MB2927F7B5F690065E1194B258E35D0@MN2PR18MB2927.namprd18.prod.outlook.com
Backpatch-through: 11
If CheckAttributeType() threw an error about the datatype of an
index expression column, it would report an empty column name,
which is pretty unhelpful and certainly not the intended behavior.
I (tgl) evidently broke this in commit cfc5008a5, by not noticing
that the column's attname was used above where I'd placed the
assignment of it.
In HEAD and v12, this is trivially fixable by moving up the
assignment of attname. Before v12 the code is a bit more messy;
to avoid doing substantial refactoring, I took the lazy way out
and just put in two copies of the assignment code.
Report and patch by Amit Langote. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/CA+HiwqFA+BGyBFimjiYXXMa2Hc3fcL0+OJOyzUNjhU4NCa_XXw@mail.gmail.com
Commit d5406dea25 used a slightly
novel, and wrong, approach to compute the length of the last
toast chunk. It worked fine unless the last chunk happened to
have the largest possible size.
Rework some of the checks for bad TOAST chunks to be a bit simpler
and easier to understand. These checks verify that (1) we get all
and only the chunk numbers we expect to see and (2) each chunk has
the expected size. However, the existing code was a bit hard to
understand, at least for me; try to make it clearer.
As part of that, have toast_fetch_datum_slice check the relationship
between endchunk and totalchunks only with an Assert() rather than
checking every chunk number against both values. There's no need to
check that relationship in production builds because it's not a
function of whether on-disk corruption is present; it's just a
question of whether the code does the right math.
Also, have toast_fetch_datum_slice() use ereport(ERROR) rather than
elog(ERROR). Commit fd6ec93bf8 made
the two functions inconsistent with each other.
Rename assorted variables for better clarity and consistency, and
move assorted variables from function scope to the function's main
loop. Remove a few variables that are used only once entirely.
Patch by me, reviewed by Peter Eisentraut.
Discussion: http://postgr.es/m/CA+TgmobBzxwFojJ0zV0Own3dr09y43hp+OzU2VW+nos4PMXWEg@mail.gmail.com
Commit 48995040d5 removed the largest
barrier to use of simplehash in frontend code, but there's one more
problem: it uses elog(ERROR, ...) or elog(LOG, ...) in a couple of
places. Work around that by changing those to pg_log_error() and
pg_log_info() when FRONTEND is defined.
Patch by me, reviewed by Andres Freund.
Discussion: http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@mail.gmail.com
If the SH_RAW_ALLOCATOR is defined, it will be used to allocate bytes
for the hash table, and no dependencies on MemoryContext will exist.
This means, in particular, that the SH_CREATE function will not take
a MemoryContext argument.
Patch by me, reviewed by Andres Freund.
Discussion: http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@mail.gmail.com
Where possible, share signal handler code and main loop interrupt
checking. This saves quite a bit of code and should simplify
maintenance, too.
This commit intends not to change the way anything works, even
though that might allow more code to be unified. It does unify
a bunch of individual variables into a ShutdownRequestPending
flag that has is now used by a bunch of different process types,
though.
Patch by me, reviewed by Andres Freund and Daniel Gustafsson.
Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
There seems to be no reason for every background process to have
its own flag indicating that a config-file reload is needed.
Instead, let's just use ConfigFilePending for that purpose
everywhere.
Patch by me, reviewed by Andres Freund and Daniel Gustafsson.
Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
Some auxiliary processes, as well as the autovacuum launcher,
have interrupt handling code directly in their main loops.
Try to abstract things a little better by moving it into
separate functions.
This doesn't make any functional difference, and leaves
in place relatively large differences among processes in how
interrupts are handled, but hopefully it at least makes it
easier to see the commonalities and differences across
process types.
Patch by me, reviewed by Andres Freund and Daniel Gustafsson.
Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
This Assert thought that an overflowed transaction can never get registered
for the group update. But that is not true, because even when the number
of children for a transaction got reduced, the overflow flag is not
changed. And, for group update, we only care about the current number of
children for a transaction that is being committed.
Based on comments by Andres Freund, remove a redundant Assert in
TransactionIdSetPageStatus as we already had a static Assert for the same
condition a few lines earlier.
Reported-by: Vignesh C
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 11
Discussion: https://postgr.es/m/CAFiTN-s5=uJw-Z6JC9gcqtBSjXsrHnU63PXBrA=pnBjqnkm5UA@mail.gmail.com
The refactoring done in a4fd3aa for query cancellation has messed up
with the logic in psql by mixing CancelRequested and cancel_pressed,
breaking for example \watch. The former would be switched to true if a
cancellation request has been attempted and that it actually succeeded,
and the latter tracks if a cancellation attempt has been done.
This commit brings back the code of psql to a state consistent to what
it was before a4fd3aa, without giving up on the refactoring pieces
introduced. It should be actually possible to merge more both flags as
their concepts are close enough, however note that psql's --single-step
mode relies on cancel_pressed to be always set, so this requires more
careful analysis left for later.
While on it, fix the declarations of CancelRequested (in cancel.c) and
cancel_pressed (in psql) to be volatile sig_atomic_t. Previously,
both were declared as booleans, which should be fine on modern
platforms, but the C standard recommends the use of sig_atomic_t for
variables used in signal handlers. Note that since its introduction in
a1792320, CancelRequested declaration was not volatile.
Reported-by: Jeff Janes
Author: Michael Paquier
Discussion: https://postgr.es/m/CAMkU=1zpoUDGKqWKuMWkj7t-bOCaJDx0r=5te_-d0B2HVLABXg@mail.gmail.com
force_parallel_mode = regress is supposed to force use of a Gather
node without having any impact on EXPLAIN output. But it failed to
accomplish that if both ANALYZE and VERBOSE are given, because that
enables per-worker output data that you wouldn't see if the Gather
hadn't been inserted. Improve the logic so that we suppress the
per-worker data too.
This allows putting the new test case added by commit 5935917ce
back into the originally intended form (cf. 776a2c887, 22864f6e0).
We can also get rid of a kluge in subselect.sql, which previously
had to clean up after force_parallel_mode's failure to do what it
said on the tin.
Discussion: https://postgr.es/m/18445.1576177309@sss.pgh.pa.us
Attempting to open a file fails with ERROR_ACCESS_DENIED if the file
is flagged for deletion but not yet actually gone (another in a long
list of reasons why Windows is broken, if you ask me). This seems
likely to explain a lot of irreproducible failures we see in the
buildfarm. This state generally persists for only a millisecond or so,
so just wait a bit and retry. If it's a real permissions problem,
we'll eventually give up and report it as such. If it's the pending
deletion case, we'll see file-not-found and report that after the
deletion completes, and the caller will treat that in an appropriate
way.
In passing, rejigger the existing retry logic for some other error
cases so that we don't uselessly wait an extra time when we're
not going to retry anymore.
Alexander Lakhin (with cosmetic tweaks by me). Back-patch to all
supported branches, since this seems like a pretty safe change and
the problem is definitely real.
Discussion: https://postgr.es/m/16161-7a985d2f1bbe8f71@postgresql.org
recoveryDelayUntilTime was introduced by commit 36da3cfb45 as a global
because its method of operation was devilishly intrincate. Commit
c945af80cf removed all that complexity and could have turned it into a
local variable, but didn't. Do so now.
Discussion: https://postgr.es/m/20191213200751.GA10731@alvherre.pgsql
Reviewed-by: Michaël Paquier, Daniel Gustafsson
Commit a7ee7c8513 fixed a bug in GiST page split during index creation,
where we failed to re-find the position of a downlink after the page
containing it was split. However, that fix was incomplete; the other call
to gistinserttuples() in the same function needs to also clear
'downlinkoffnum'.
Fixes bug #16134 reported by Alexander Lakhin, for real this time. The
previous fix was enough to fix the crash with the reproducer script for
bug #16162, but the original script for #16134 was still crashing.
Backpatch to v12, like the previous incomplete fix.
Discussion: https://www.postgresql.org/message-id/d869f537-abe4-d2ea-0510-38cd053f5152%40gmail.com
Commit f14413b684 broke the build of
Perl-using modules on Windows.
Perl might have its own definitions of uid_t and gid_t, so we hide
ours, but then we can't use ours in our header files such as port.h
which don't see the Perl definition.
Hide our definition of getpeereid() on Windows in Perl-using modules,
using PLPERL_HAVE_UID_GID define. That means we can't portably use
getpeeruid() is such modules right now, but there is no need anyway.
The getpeereid() uses have so far been protected by HAVE_UNIX_SOCKETS,
so they didn't ever care about Windows support. But in anticipation
of Unix-domain socket support on Windows, that needs to be handled
differently.
Windows doesn't support getpeereid() at this time, so we use the
existing not-supported code path. We let configure do its usual thing
of picking up the replacement from libpgport, instead of the custom
overrides that it was doing before.
But then Windows doesn't have struct passwd, so this patch sprinkles
some additional #ifdef WIN32 around to make it work. This is similar
to existing code that deals with this issue.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/5974caea-1267-7708-40f2-6009a9d653b0@2ndquadrant.com
This has been introduced by c16dc1a since progress reporting for VACUUM
has been added. As this issue just causes some extra work and is
harmless, no backpatch is done.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20191213030831.GT2082@telsasoft.com
It appears that a concurrent autovacuum/autoanalyze run can cause
changes in the plans expected by this test. To prevent that, change
the tables it uses to be temp tables --- there's no need for them
to be permanent, and this should save a few cycles too.
Discussion: https://postgr.es/m/3244.1576160824@sss.pgh.pa.us
The RTE_RESULT simplification logic added by commit 4be058fe9 had a
flaw: it would collapse out a RTE_RESULT that is due to compute a
PlaceHolderVar, and reassign the PHV to the parent join level, even if
another input relation of the join contained a lateral reference to
the PHV. That can't work because the PHV would be computed too late.
In practice it led to failures of internal sanity checks later in
planning (either assertion failures or errors such as "failed to
construct the join relation").
To fix, add code to check for the presence of such PHVs in relevant
portions of the query tree. Notably, this required refactoring
range_table_walker so that a caller could ask to walk individual RTEs
not the whole list. (It might be a good idea to refactor
range_table_mutator in the same way, if only to keep those functions
looking similar; but I didn't do so here as it wasn't necessary for
the bug fix.)
This exercise also taught me that find_dependent_phvs(), as it stood,
could only safely be used on the entire Query, not on subtrees.
Adjust its API to reflect that; which in passing allows it to have
a fast path for the common case of no PHVs anywhere.
Per report from Will Leinweber. Back-patch to v12 where the bug
was introduced.
Discussion: https://postgr.es/m/CALLb-4xJMd4GZt2YCecMC95H-PafuWNKcmps4HLRx2NHNBfB4g@mail.gmail.com
When loading DH parameters used for the generation of ephemeral DH keys
in the backend, the code has never bothered releasing the memory used
for the DH information loaded from a file or from libpq's default. This
commit makes sure that the information is properly free()'d.
Note that as SSL parameters can be reloaded, this can cause an accumulation
of memory leaked. As the leak is minor, no backpatch is done.
Reported-by: Dmitry Uspenskiy
Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
The previous commit failed to consider that FileGetRawDesc() might
not return a valid fd, as discovered on the build farm. Switch to
using the File interface only.
Back-patch to 12, like the previous commit.
_mdfd_getseg() opens all segments up to the requested one. That
causes problems for mdsyncfiletag(), if mdunlinkfork() has
already unlinked other segment files. Open the file we want
directly by name instead, if it's not already open.
The consequence of this bug was a rare panic in the checkpointer,
made more likely if you saturated the sync request queue so that
the SYNC_FORGET_REQUEST messages for a given relation were more
likely to be absorbed in separate cycles by the checkpointer.
Back-patch to 12. Defect in commit 3eb77eba.
Author: Thomas Munro
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20191119115759.GI30362%40telsasoft.com
The bug was similar to the one that was fixed in commit 22251686f0. When
we split page X and insert the downlink for the new page, the parent page
might also need to be split. When that happens, the downlink offset number
we remembered for X is no longer valid. We correctly called
gistFindCorrectParent() to re-find it, but gistFindCorrectParent() doesn't
do anything if the LSN of the page hasn't changed, and we stopped updating
LSNs during index build in commit 9155580fd5. The buggy codepath was taken
if the page was split into three or more pages, and inserting the downlink
caused the parent page to split. To fix, explicitly mark the downlink
offset number as invalid, to force gistFindCorrectParent() to re-find it.
Fixes bug #16134 reported by Alexander Lakhin, reported again as #16162 by
Andreas Kunert. Thanks to Jeff Janes, Tom Lane and Tomas Vondra for
debugging. Backpatch to v12, where we stopped WAL-logging during index
build.
Discussion: https://www.postgresql.org/message-id/16134-0423f729671dec64%40postgresql.org
Discussion: https://www.postgresql.org/message-id/16162-45d21b7b6c1a3105%40postgresql.org
Prefer to call "rl_filename_completion_function" and
"rl_completion_matches", rather than using the names without the rl_
prefix. This matches Readline's documentation, and makes our code
a little clearer about which names are external. On platforms that
only have the un-prefixed names (just some very ancient versions of
libedit, AFAICT), reverse the direction of the compatibility macro
definitions to match.
Also, remove our extern declaration of "filename_completion_function";
whatever libedit versions may have failed to declare that are surely
dead and buried.
Discussion: https://postgr.es/m/23608.1576248145@sss.pgh.pa.us
This undoes my hurried commit 776a2c887, restoring the removed test case
in a form that passes with or without force_parallel_mode = regress.
It turns out that force_parallel_mode = regress simply fails to mask
the Worker lines that will be produced by EXPLAIN (ANALYZE, VERBOSE).
I'd say that's a bug in that feature, as its entire alleged reason
for existence is to make the EXPLAIN output the same. It's certainly
not a bug in the plan node pruning logic. Fortunately, this test case
doesn't really need to use ANALYZE, so just drop that.
Discussion: https://postgr.es/m/18891.1576109690@sss.pgh.pa.us
The DTK_DOW/DTK_ISODOW and DTK_DOY switch cases in timestamp_part() and
timestamptz_part() contained calls of timestamp2tm() that were fully
redundant with the ones done just above the switch. This evidently crept
in during commit 258ee1b63, which relocated that code from another place
where the calls were indeed needed. Just delete the redundant calls.
I (tgl) noted that our test coverage of these functions left quite a
bit to be desired, so extend timestamp.sql and timestamptz.sql to
cover all the branches.
Back-patch to all supported branches, as the previous commit was.
There's no real issue here other than some wasted cycles in some
not-too-heavily-used code paths, but the test coverage seems valuable.
Report and patch by Li Japin; test case adjustments by me.
Discussion: https://postgr.es/m/SG2PR06MB37762CAE45DB0F6CA7001EA9B6550@SG2PR06MB3776.apcprd06.prod.outlook.com
gcc-based Windows buildfarm animals are not happy about a multiline
regular expression I added recently. Try to accomodate; existing
pg_basebackup tests suggest that \n should work instead of a bare
newline, but throw in \r also. This being perl, TIMTOWTDI.
Also remove the pointless $ at the end of the pattern, for extra luck.
(If this doesn't work, I'll probably just split the regex in two.)
Per buildfarm members jacana and fairywren.
Discussion: https://postgr.es/m/3562.1576161217@sss.pgh.pa.us
This is made necessary by the fact that commit 6ef77cf46 added
AppendRelInfos to plan trees. I'd concluded that this extra code was
not necessary because we don't transmit that data to parallel workers
... but I forgot about -DWRITE_READ_PARSE_PLAN_TREES. Per buildfarm.
The buildfarm says this produces some unexpected output with
force_parallel_mode = regress. There's probably a bug underneath
that, but for the moment just delete the test case to make the
buildfarm green again.
(I now notice that the case had also failed to get updated to follow
commit d52eaa094, which made plan_cache_mode = force_generic_plan
prevail throughout partition_prune.sql; it was thereby managing to
break a later test. When/if we put this back in, *don't* include the
SET and RESET commands.)
Previously, if the startup pruning logic proved that all child nodes
of an Append or MergeAppend could be pruned, we still kept one, just
to keep EXPLAIN from failing. The previous commit removed the
ruleutils.c limitation that required this kluge, so drop it. That
results in less-confusing EXPLAIN output, as per a complaint from
Yuzuko Hosoya.
David Rowley
Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
This patch causes EXPLAIN to always assign a separate table alias to the
parent RTE of an append relation (inheritance set); before, such RTEs
were ignored if not actually scanned by the plan. Since the child RTEs
now always have that same alias to start with (cf. commit 55a1954da),
the net effect is that the parent RTE usually gets the alias used or
implied by the query text, and the children all get that alias with "_N"
appended. (The exception to "usually" is if there are duplicate aliases
in different subtrees of the original query; then some of those original
RTEs will also have "_N" appended.)
This results in more uniform output for partitioned-table plans than
we had before: the partitioned table itself gets the original alias,
and all child tables have aliases with "_N", rather than the previous
behavior where one of the children would get an alias without "_N".
The reason for giving the parent RTE an alias, even if it isn't scanned
by the plan, is that we now use the parent's alias to qualify Vars that
refer to an appendrel output column and appear above the Append or
MergeAppend that computes the appendrel. But below the append, Vars
refer to some one of the child relations, and are displayed that way.
This seems clearer than the old behavior where a Var that could carry
values from any child relation was displayed as if it referred to only
one of them.
While at it, change ruleutils.c so that the code paths used by EXPLAIN
deal in Plan trees not PlanState trees. This effectively reverts a
decision made in commit 1cc29fe7c, which seemed like a good idea at
the time to make ruleutils.c consistent with explain.c. However,
it's problematic because we'd really like to allow executor startup
pruning to remove all the children of an append node when possible,
leaving no child PlanState to resolve Vars against. (That's not done
here, but will be in the next patch.) This requires different handling
of subplans and initplans than before, but is otherwise a pretty
straightforward change.
Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
This makes such log entries more useful, since the cause of the error
can be dependent on the parameter values.
Author: Alexey Bashtanov, Álvaro Herrera
Discussion: https://postgr.es/m/0146a67b-a22a-0519-9082-bc29756b93a2@imap.cc
Reviewed-by: Peter Eisentraut, Andres Freund, Tom Lane
Since its inception, our Windows signal emulation code has worked by
running a main signal thread that just watches for incoming signal
requests, and then spawns a new thread to handle each such request.
That design is meant for servers in which requests can take substantial
effort to process, and it's worth parallelizing the handling of
requests. But those assumptions are just bogus for our signal code.
It's not much more than pg_queue_signal(), which is cheap and can't
parallelize at all, plus we don't really expect lots of signals to
arrive at the same backend at once. More importantly, this approach
creates failure modes that we could do without: either inability to
spawn a new thread or inability to create a new pipe handle will risk
loss of signals. Hence, dispense with the separate per-signal threads
and just service each request in-line in the main signal thread. This
should be a bit faster (for the normal case of one signal at a time)
as well as more robust.
Patch by me; thanks to Andrew Dunstan for testing and Amit Kapila
for review.
Discussion: https://postgr.es/m/4412.1575748586@sss.pgh.pa.us
It was once possible to do ALTER TABLE ... SET STATISTICS on system
tables without allow_sytem_table_mods. This was changed apparently by
accident between PostgreSQL 9.1 and 9.2, but a code comment still
claimed this was possible. Without that functionality, having a
separate ATPrepSetStatistics() is useless, so use the generic
ATSimplePermissions() instead and move the remaining custom code into
ATExecSetStatistics().
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/cc8d2648-a0ec-7a86-13e5-db473484e19e%402ndquadrant.com
gcc-7 used with a sufficient optimization level complains about warnings
around do_to_timestamp() regarding the initialization and handling of
some of its variables. Recent commits 66c74f8 and d589f94 made things
made the interface more confusing, so document which variables are
always expected and initialize properly the optional ones when they are
set.
Author: Andrey Lepikhov, Michael Paquier
Discussion: https://postgr.es/m/a7e28b83-27b1-4e1c-c76b-4268c4b785bc@postgrespro.ru
Clean up some comments (some generated by old versions of autoconf)
and some random ordering differences, so it's easier to diff this
against the default pg_config.h or pg_config.h.in. Remove LOCALEDIR
handling from pg_config.h.win32 altogether because it's already in
pg_config_paths.h.
This provides a mechanism to emit literal values in informative
messages, such as query parameters. The new code is more complex than
what it replaces, primarily because it wants to be more efficient.
It also has the (currently unused) additional optional capability of
specifying a maximum size to print.
The new function lives out of common/stringinfo.c so that frontend users
of that file need not pull in unnecessary multibyte-encoding support
code.
Author: Álvaro Herrera and Alexey Bashtanov, after a suggestion from Andres Freund
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20190920203905.xkv5udsd5dxfs6tr@alap3.anarazel.de
On Windows, we use CMD.EXE to redirect the postmaster's stdout/stderr
into a log file. CMD.EXE will open that file with non-sharing-friendly
parameters, and the file will remain open for a short time after the
postmaster has removed postmaster.pid. This can result in an
ERROR_SHARING_VIOLATION failure if we attempt to start a new postmaster
immediately with the same log file (e.g. during "pg_ctl restart").
This seems to explain intermittent buildfarm failures we've been seeing
on Windows machines.
To fix, just open and close the log file using our own pgwin32_open(),
which will wait if necessary to avoid the failure. (Perhaps someday
we should stop using CMD.EXE, but that would be a far more complex
patch, and it doesn't seem worth the trouble ... yet.)
Back-patch to v12. This only solves the problem when frontend fopen()
is redirected to pgwin32_fopen(), which has only been true since commit
0ba06e0bf. Hence, no point in back-patching further, unless we care
to back-patch that change too.
Diagnosis and patch by Alexander Lakhin (bug #16154).
Discussion: https://postgr.es/m/16154-1ccf0b537b24d5e0@postgresql.org
AfterTriggerExecute() retrieves a fresh tuple or pair of tuples from a
tuplestore and then stores the tuple(s) in the passed-in slot(s) if
AFTER_TRIGGER_FDW_FETCH, while it uses the most-recently-retrieved
tuple(s) stored in the slot(s) if AFTER_TRIGGER_FDW_REUSE. This was
done correctly before 12, but commit ff11e7f4b broke it by mistakenly
clearing the tuple(s) stored in the slot(s) in that function, leading to
an assertion failure as reported in bug #16139 from Alexander Lakhin.
Also, fix some other issues with the aforementioned commit in passing:
* For tg_newslot, which is a slot added to the TriggerData struct by the
commit to store new updated tuples, it didn't ensure the slot was NULL
if there was no such tuple.
* The commit failed to update the documentation about the trigger
interface.
Author: Etsuro Fujita
Backpatch-through: 12
Discussion: https://postgr.es/m/16139-94f9ccf0db6119ec%40postgresql.org
pg_signal_dispatch_thread() responded to the client (signal sender)
and disconnected the pipe before actually setting the shared variables
that make the signal visible to the backend process's main thread.
In the worst case, it seems, effective delivery of the signal could be
postponed for as long as the machine has any other work to do.
To fix, just move the pg_queue_signal() call so that we do it before
responding to the client. This essentially makes pgkill() synchronous,
which is a stronger guarantee than we have on Unix. That may be
overkill, but on the other hand we have not seen comparable timing bugs
on any Unix platform.
While at it, add some comments to this sadly underdocumented code.
Problem diagnosis and fix by Amit Kapila; I just added the comments.
Back-patch to all supported versions, as it appears that this can cause
visible NOTIFY timing oddities on all of them, and there might be
other misbehavior due to slow delivery of other signals.
Discussion: https://postgr.es/m/32745.1575303812@sss.pgh.pa.us
isolationtester.c had a hard-wired limit of 3 minutes per test step.
It now emerges that this isn't quite enough for some of the slowest
buildfarm animals. This isn't the first time we've had to raise
this limit (cf. 1db439ad4), so let's make it configurable. This
patch raises the default to 5 minutes, and introduces an environment
variable PGISOLATIONTIMEOUT that can be set if more time is needed,
following the precedent of PGCTLTIMEOUT.
Also, modify isolationtester so that when the timeout is hit,
it explicitly reports having sent a cancel. This makes the regression
failure log considerably more intelligible. (In the worst case, a
timed-out test might actually be reported as "passing" without this
extra output, so arguably this is a bug fix in itself.)
In passing, update the README file, which had apparently not gotten
touched when we added "make check" support here.
Back-patch to 9.6; older versions don't have comparable timeout logic.
Discussion: https://postgr.es/m/22964.1575842935@sss.pgh.pa.us
This partially reverts commit 4dc6355210.
The information returned by the function can be obtained by calling
PQconninfo(), so the function is redundant.
While fooling around with the EXPLAIN improvements I've been working
on, I noticed that there were some large gaps in our test coverage
of ruleutils.c, according to the code coverage report. This commit
just adds a few test cases to improve coverage of:
get_name_for_var_field()
get_update_query_targetlist_def()
isSimpleNode()
get_sublink_expr()
When creating a uniqueness constraint using a pre-existing index,
we have always required that the index have the same properties you'd
get if you just let a new index get built. However, when collations
were added, we forgot to add the index's collation to that check.
It's hard to trip over this without intentionally trying to break it:
you'd have to explicitly specify a different collation in CREATE
INDEX, then convert it to a pkey or unique constraint. Still, if you
did that, pg_dump would emit a script that fails to reproduce the
index's collation. The main practical problem is that after a
pg_upgrade the index would be corrupt, because its actual physical
order wouldn't match what pg_index says. A more theoretical issue,
which is new as of v12, is that if you create the index with a
nondeterministic collation then it wouldn't be enforcing the normal
notion of uniqueness, causing the constraint to mean something
different from a normally-created constraint.
To fix, just add collation to the conditions checked for index
acceptability in ADD PRIMARY KEY/UNIQUE USING INDEX. We won't try
to clean up after anybody who's already created such a situation;
it seems improbable enough to not be worth the effort involved.
(If you do get into trouble, a REINDEX should be enough to fix it.)
In principle this is a long-standing bug, but I chose not to
back-patch --- the odds of causing trouble seem about as great
as the odds of preventing it, and both risks are very low anyway.
Per report from Alexey Bashtanov, though this is not his preferred
fix.
Discussion: https://postgr.es/m/b05ce36a-cefb-ca5e-b386-a400535b1c0b@imap.cc