Commit Graph

2447 Commits

Author SHA1 Message Date
Robert Haas 10ea0f924a Expose some information about backend subxact status.
A new function pg_stat_get_backend_subxact() can be used to get
information about the number of subtransactions in the cache of
a particular backend and whether that cache has overflowed. This
can be useful for tracking down performance problems that can
result from overflowed snapshots.

Dilip Kumar, reviewed by Zhihong Yu, Nikolay Samokhvalov,
Justin Pryzby, Nathan Bossart, Ashutosh Sharma, Julien
Rouhaud. Additional design comments from Andres Freund,
Tom Lane, Bruce Momjian, and David G. Johnston.

Discussion: http://postgr.es/m/CAFiTN-ut0uwkRJDQJeDPXpVyTWD46m3gt3JDToE02hTfONEN=Q@mail.gmail.com
2022-12-19 14:43:09 -05:00
Peter Eisentraut 75f49221c2 Static assertions cleanup
Because we added StaticAssertStmt() first before StaticAssertDecl(),
some uses as well as the instructions in c.h are now a bit backwards
from the "native" way static assertions are meant to be used in C.
This updates the guidance and moves some static assertions to better
places.

Specifically, since the addition of StaticAssertDecl(), we can put
static assertions at the file level.  This moves a number of static
assertions out of function bodies, where they might have been stuck
out of necessity, to perhaps better places at the file level or in
header files.

Also, when the static assertion appears in a position where a
declaration is allowed, then using StaticAssertDecl() is more native
than StaticAssertStmt().

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
2022-12-15 10:10:32 +01:00
Peter Eisentraut 2d4f1ba6cf Update types in File API
Make the argument types of the File API match stdio better:

- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.

In passing, change the FilePrefetch() amount argument from int to
off_t, to match the underlying posix_fadvise().

Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
2022-12-08 08:58:15 +01:00
Michael Paquier 43351557d0 Make materialized views participate in predicate locking
Matviews have been discarded from needing predicate locks since 3bf3ab8
and their introduction.  At this point, there was no concurrent flavor
of REFRESH yet, hence there was no meaning in having materialized views
look at read/write conflicts with concurrent transactions using
themselves the serializable isolation level because they could only be
refreshed with an access exclusive lock.  CONCURRENTLY, on the contrary,
allows reads and writes during a refresh as it holds a share update
exclusive lock.

Some isolation tests are added to show the effect of the change, with a
combination of one table and a matview based on it, using a mix of
REFRESH CONCURRENTLY and read/write queries.

This could arguably be considered as a bug, but as it is a subtle
behavior change potentially impacting applications no backpatch is
done.

Author: Yugo Nagata
Reviewed-by: Richard Guo, Dilip Kumar, Michael Paquier
Discussion: https://postgr.es/m/20220726164434.42d4e33911b4b4fcf751c4e7@sraoss.co.jp
2022-12-01 15:41:13 +09:00
Tom Lane 8242752f9c Improve heuristics for compressing the KnownAssignedXids array.
Previously, we'd compress only when the active range of array entries
reached Max(4 * PROCARRAY_MAXPROCS, 2 * pArray->numKnownAssignedXids).
If max_connections is large, the first term could result in not
compressing for a long time, resulting in much wastage of cycles in
hot-standby backends scanning the array to take snapshots.  Get rid
of that term, and just bound it to 2 * pArray->numKnownAssignedXids.

That however creates the opposite risk, that we might spend too much
effort compressing.  Hence, consider compressing only once every 128
commit records.  (This frequency was chosen by benchmarking.  While
we only tried one benchmark scenario, the results seem stable over
a fairly wide range of frequencies.)

Also, force compression when processing RecoveryInfo WAL records
(which should be infrequent); the old code could perform compression
then, but would do so only after the same array-range check as for
the transaction-commit path.

Also, opportunistically run compression if the startup process is about
to wait for WAL, though not oftener than once a second.  This should
prevent cases where we waste lots of time by leaving the array
not-compressed for long intervals due to low WAL traffic.

Lastly, add a simple check to keep us from uselessly compressing
when the array storage is already compact.

Back-patch, as the performance problem is worse in pre-v14 branches
than in HEAD.

Simon Riggs and Michail Nikolaev, with help from Tom Lane and
Andres Freund.

Discussion: https://postgr.es/m/CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
2022-11-29 15:43:17 -05:00
Alvaro Herrera 0557e17702
Ignore invalidated slots while computing oldest catalog Xmin
Once a logical slot has acquired a catalog_xmin, it doesn't let go of
it, even when invalidated by exceeding the max_slot_wal_keep_size, which
means that dead catalog tuples are not removed by vacuum anymore since
the point is invalidated, until the slot is dropped.  This could be
catastrophic if catalog churn is high.

Change the computation of Xmin to ignore invalidated slots,
to prevent dead rows from accumulating.

Backpatch to 13, where slot invalidation appeared.

Author: Sirisha Chamarthi <sirichamarthi22@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAKrAKeUEDeqquN9vwzNeG-CN8wuVsfRYbeOUV9qKO_RHok=j+g@mail.gmail.com
2022-11-22 10:56:07 +01:00
Andres Freund 92daeca45d Add wait event for pg_usleep() in perform_spin_delay()
The lwlock wait queue scalability issue fixed in a4adc31f69 was quite hard to
find because of the exponential backoff and because we adjust spins_per_delay
over time within a backend.

To make it easier to find similar issues in the future, add a wait event for
the pg_usleep() in perform_spin_delay(). Showing a wait event while spinning
without sleeping would increase the overhead of spinlocks, which we do not
want.

We may at some later point want to have more granular wait events, but that'd
be a substantial amount of work. This provides at least some insights into
something currently hard to observe.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
https://postgr.es/m/20221120204310.xywrhyxyytsajuuq@awork3.anarazel.de
2022-11-21 20:34:17 -08:00
Andres Freund a4adc31f69 lwlock: Fix quadratic behavior with very long wait lists
Until now LWLockDequeueSelf() sequentially searched the list of waiters to see
if the current proc is still is on the list of waiters, or has already been
removed. In extreme workloads, where the wait lists are very long, this leads
to a quadratic behavior. #backends iterating over a list #backends
long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in
the first place also increases with the increased length of the wait queue, as
it becomes more likely that a lock is released while waiting for the wait list
lock, which is held for longer during lock release.

Due to the exponential back-off in perform_spin_delay() this is surprisingly
hard to detect. We should make that easier, e.g. by adding a wait event around
the pg_usleep() - but that's a separate patch.

The fix is simple - track whether a proc is currently waiting in the wait list
or already removed but waiting to be woken up in PGPROC->lwWaiting.

In some workloads with a lot of clients contending for a small number of
lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput.

As the quadratic behavior arguably is a bug, we might want to decide to
backpatch this fix in the future.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de
Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
2022-11-20 11:56:32 -08:00
Andres Freund 8c954168cf Fix mislabeling of PROC_QUEUE->links as PGPROC, fixing UBSan on 32bit
ProcSleep() used a PGPROC* variable to point to PROC_QUEUE->links.next,
because that does "the right thing" with SHMQueueInsertBefore(). While that
largely works, it's certainly not correct and unnecessary - we can just use
SHM_QUEUE* to point to the insertion point.

Noticed when testing a 32bit of postgres with undefined behavior
sanitizer. UBSan noticed that sometimes the supposed PGPROC wasn't
sufficiently aligned (required since 46d6e5f567, ensured indirectly, via
ShmemAllocRaw() guaranteeing cacheline alignment).

For now fix this by using a SHM_QUEUE* for the insertion point. Subsequently
we should replace all the use of PROC_QUEUE and SHM_QUEUE with ilist.h, but
that's a larger change that we don't want to backpatch.

Backpatch to all supported versions - it's useful to be able to run postgres
under UBSan.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20221117014230.op5kmgypdv2dtqsf@awork3.anarazel.de
Backpatch: 11-
2022-11-19 12:22:04 -08:00
Peter Geoghegan 1489b1ce72 Standardize rmgrdesc recovery conflict XID output.
Standardize on the name snapshotConflictHorizon for all XID fields from
WAL records that generate recovery conflicts when in hot standby mode.
This supersedes the previous latestRemovedXid naming convention.

The new naming convention places emphasis on how the values are actually
used by REDO routines.  How the values are generated during original
execution (details of which vary by record type) is deemphasized.  Users
of tools like pg_waldump can now grep for snapshotConflictHorizon to see
all potential sources of recovery conflicts in a standardized way,
without necessarily having to consider which specific record types might
be involved.

Also bring a couple of WAL record types that didn't follow any kind of
naming convention into line.  These are heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type.  Now every WAL record whose REDO
routine calls ResolveRecoveryConflictWithSnapshot() passes through the
snapshotConflictHorizon field from its WAL record.  This is follow-up
work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL
records use a standard snapshotConflictHorizon style XID cutoff.

No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL
records doesn't change.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
2022-11-17 14:55:08 -08:00
Amit Kapila 8b5262fa0e Improve comments referring snapshot's subxip array.
It was referred to as subxact array in a few places and subxip array in
others. By changing it to subxip array, we make it consistent with similar
references to xip array.

Author: Japin Li
Reviewd by: Julien Rouhaud, Richard Guo
Discussion: https://postgr.es/m/MEYP282MB1669DCE7AC193A947CED2A95B6009@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2022-11-15 09:37:19 +05:30
Peter Eisentraut b4b7ce8061 Add repalloc0 and repalloc0_array
These zero out the space added by repalloc.  This is a common pattern
that is quite hairy to code by hand.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/b66dfc89-9365-cb57-4e1f-b7d31813eeec@enterprisedb.com
2022-11-12 20:34:44 +01:00
Tom Lane 85d8b30724 Apply a better fix to mdunlinkfork().
Replace the stopgap fix I made in 0e758ae89 with a cleaner one.

The real problem with 4ab5dae94 is that it contorted this function's
logic substantially, by introducing a third code path that required
different behavior in the function's main loop.  That seems quite
unnecessary on closer inspection: the new IsBinaryUpgrade case can
just share the behavior of the other immediate-unlink cases.  Hence,
revert 4ab5dae94 and most of 0e758ae89 (keeping the latter's
save/restore errno fix), and add IsBinaryUpgrade to the set of
conditions tested to choose immediate unlink.

Also fix some additional places with sloppy handling of errno,
to ensure we have an invariant that we always continue processing
after any non-ENOENT failure of do_truncate.  I doubt that that's
fixing any bug of field importance, so I don't feel it necessary to
back-patch; but we might as well get it right while we're here.

Also improve the comments, which had drifted a bit from what the
code actually does, and neglected to mention some important
considerations.

Back-patch to v15, not because this is fixing any bug but because
it doesn't seem like a good idea for v15's mdunlinkfork logic to be
significantly different from both v14 and v16.

Discussion: https://postgr.es/m/3797575.1667924888@sss.pgh.pa.us
2022-11-09 14:15:38 -05:00
Tom Lane 0e758ae89a Fix failure to remove non-first segments of temporary tables.
Commit 4ab5dae94 broke mdunlinkfork's logic for removing additional
segments of a multi-gigabyte table, because it neglected to advance
"segno" after unlinking the first segment, in the code path where it
chooses to unlink that one immediately.  Then the main remove loop
gets ENOENT at segment zero and figures it's done, so we never remove
whatever additional segments might exist.

The main problem here is with large temporary tables, but WAL replay
of a drop of a large regular table would also fail to remove extra
segments.  The third case where this path is taken is for non-main
forks; but I doubt it matters for those since they probably never
exceed 1GB.

The simplest fix is just to increment segno after that unlink().
(Probably this logic could do with a more thorough rethink, but not
with mere hours to go before 15.1 wraps.)

While here, also fix an incautious assumption that
register_forget_request cannot change errno.  I don't think that
that has any really bad consequences, as we'd end up trying to unlink
the zero'th segment either way, but it greatly complicates reasoning
about what could happen here.  Also make a couple of other cosmetic
fixes.

Per bug #17679 from Balazs Szilfai.  Back-patch into v15, as the
faulty patch was.

Discussion: https://postgr.es/m/17679-1095d04450cf6a6e@postgresql.org
2022-11-07 11:36:45 -05:00
Michael Paquier 2a71de8915 Remove unneeded includes of <sys/stat.h>
Since bfb9dfd, none of the files updated in this commit have any stat()
calls, so these inclusions are not necessary, for the same reasons as
233cf6e.

Per discussion with John Naylor.

Discussion: https://postgr.es/m/CAFBsxsGGGX7KD6RxbNoSJzuSc8Gz3hOxcfhTOMLB_hJcm68dKQ@mail.gmail.com
2022-11-05 12:31:28 +09:00
Michael Paquier d9d873bac6 Clean up some inconsistencies with GUC declarations
This is similar to 7d25958, and this commit takes care of all the
remaining inconsistencies between the initial value used in the C
variable associated to a GUC and its default value stored in the GUC
tables (as of pg_settings.boot_val).

Some of the initial values of the GUCs updated rely on a compile-time
default.  These are refactored so as the GUC table and its C declaration
use the same values.  This makes everything consistent with other
places, backend_flush_after, bgwriter_flush_after, port,
checkpoint_flush_after doing so already, for example.

Extracted from a larger patch by Peter Smith.  The spots updated in the
modules are from me.

Author: Peter Smith, Michael Paquier
Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby
Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com
2022-10-31 12:44:48 +09:00
Peter Eisentraut b1099eca8f Remove AssertArg and AssertState
These don't offer anything over plain Assert, and their usage had
already been declared obsolescent.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
2022-10-28 09:19:06 +02:00
Michael Paquier 4ab8c81bd9 Move pg_pwritev_with_retry() to src/common/file_utils.c
This commit moves pg_pwritev_with_retry(), a convenience wrapper of
pg_writev() able to handle partial writes, to common/file_utils.c so
that the frontend code is able to use it.  A first use-case targetted
for this routine is pg_basebackup and pg_receivewal, for the
zero-padding of a newly-initialized WAL segment.  This is used currently
in the backend when the GUC wal_init_zero is enabled (default).

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Thomas Munro
Discussion: https://postgr.es/m/CALj2ACUq7nAb7=bJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA@mail.gmail.com
2022-10-27 14:39:42 +09:00
Michael Paquier a19e5cee63 Rename SetSingleFuncCall() to InitMaterializedSRF()
Per discussion, the existing routine name able to initialize a SRF
function with materialize mode is unpopular, so rename it.  Equally, the
flags of this function are renamed, as of:
- SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC
- SRF_SINGLE_BLESS -> MAT_SRF_BLESS
The previous function and flags introduced in 9e98583 are kept around
for compatibility purposes, so as any extension code already compiled
with v15 continues to work as-is.  The declarations introduced here for
compatibility will be removed from HEAD in a follow-up commit.

The new names have been suggested by Andres Freund and Melanie
Plageman.

Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de
Backpatch-through: 15
2022-10-18 10:22:35 +09:00
Peter Eisentraut 1b11561cc1 Standardize format for printing PIDs
Most code prints PIDs as %d, but some code tried to print them as long
or unsigned long.  While this is in theory allowed, the fact that PIDs
fit into int is deeply baked into all PostgreSQL code, so these random
deviations don't accomplish anything except confusion.

Note that we still need casts from pid_t to int, because on 64-bit
MinGW, pid_t is long long int.  (But per above, actually supporting
that range in PostgreSQL code would be major surgery and probably not
useful.)

Discussion: https://www.postgresql.org/message-id/289c2e45-c7d9-5ce4-7eff-a9e2a33e1580@enterprisedb.com
2022-10-14 08:38:53 +02:00
Tom Lane 18a4a620e2 Harden pmsignal.c against clobbered shared memory.
The postmaster is not supposed to do anything that depends
fundamentally on shared memory contents, because that creates
the risk that a backend crash that trashes shared memory will
take the postmaster down with it, preventing automatic recovery.
In commit 969d7cd43 I lost sight of this principle and coded
AssignPostmasterChildSlot() in such a way that it could fail
or even crash if the shared PMSignalState structure became
corrupted.  Remarkably, we've not seen field reports of such
crashes; but I managed to induce one while testing the recent
changes around palloc chunk headers.

To fix, make a semi-duplicative state array inside the postmaster
so that we need consult only local state while choosing a "child
slot" for a new backend.  Ensure that other postmaster-executed
routines in pmsignal.c don't have critical dependencies on the
shared state, either.  Corruption of PMSignalState might now
lead ReleasePostmasterChildSlot() to conclude that backend X
failed, when actually backend Y was the one that trashed things.
But that doesn't matter, because we'll force a cluster-wide reset
regardless.

Back-patch to all supported branches, since this is an old bug.

Discussion: https://postgr.es/m/3436789.1665187055@sss.pgh.pa.us
2022-10-11 18:54:31 -04:00
Tom Lane 66c2922e76 Take care to de-duplicate entries in standby.c's table of locks.
The RecoveryLockLists data structure, which tracks all exclusive
locks that the startup process is holding on behalf of transactions
being replayed, did not have any provision for avoiding duplicate
entries for the same lock.  Maybe that was okay when the code was
first written.  However, modern practice is for checkpoints to
write fresh lists of all active exclusive locks into the WAL.
Thus, an exclusive lock that survives across multiple checkpoints
causes bloat in standbys' startup processes.  If there are a lot
of such locks this can look like a memory leak, and it's even
possible to drive the startup process into a palloc failure from
an over-length List.

To fix, use a hash table instead of simple lists to track the
locks being held.  Allowing for dynahash overhead, this requires
a little more space per lock than the old way (although it's the
same size as what we were allocating prior to c6e0fe1f2).  It's
probably a shade slower too.  However, testing indicates that the
penalty is negligible on ordinary workloads, so let's make this
change to improve robustness in extreme cases.

Patch by me, per report from Dmitriy Kuzmin.  No back-patch
(for now anyway), since it seems that a significant improvement
would only occur in corner cases.

Discussion: https://postgr.es/m/CAHLDt=_ts0A7Agn=hCpUh+RCFkxd+G6uuT=kcTfqFtGur0dp=A@mail.gmail.com
2022-10-06 12:27:36 -04:00
David Rowley 2d0bbedda7 Rename shadowed local variables
In a similar effort to f01592f91, here we mostly rename shadowed local
variables to remove the warnings produced when compiling with
-Wshadow=compatible-local.

This fixes 63 warnings and leaves just 5.

Author: Justin Pryzby, David Rowley
Reviewed-by: Justin Pryzby
Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-10-05 21:01:41 +13:00
Michael Paquier 65b158ae4e Remove useless argument from UnpinBuffer()
The last caller of UnpinBuffer() that did not want to adjust
CurrentResourceOwner was removed in 2d115e4, and nothing has been
introduced in bufmgr.c to do the same thing since.  This simplifies 10
code paths.

Author: Aleksander Alekseev
Reviewed-by: Nathan Bossart, Zhang Mingli, Bharath Rupireddy
Discussion: https://postgr.es/m/CAJ7c6TOmmFpb6ohurLhTC7hKNJWGzdwf8s4EAtAZxD48g-e6Jw@mail.gmail.com
2022-09-30 15:57:47 +09:00
Thomas Munro b6d8a60aba Restore pg_pread and friends.
Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of
the pg_ prefixes where we use pread(), pwrite() and vectored variants.

We dropped support for ancient Unixes where we needed to use lseek() to
implement replacements for those, but it turns out that Windows also
changes the current position even when you pass in an offset to
ReadFile() and WriteFile() if the file handle is synchronous, despite
its documentation saying otherwise.

Switching to asynchronous file handles would fix that, but have other
complications.  For now let's just put back the pg_ prefix and add some
comments to highlight the non-standard side-effect, which we can now
describe as Windows-only.

Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
2022-09-29 13:12:11 +13:00
Robert Haas a448e49bcb Revert 56-bit relfilenode change and follow-up commits.
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
2022-09-28 09:55:28 -04:00
Robert Haas 05d4cbf9b6 Increase width of RelFileNumbers from 32 bits to 56 bits.
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.

If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.

This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.

Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.

Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
2022-09-27 13:25:21 -04:00
Andres Freund e6927270cd meson: Add initial version of meson based build system
Autoconf is showing its age, fewer and fewer contributors know how to wrangle
it. Recursive make has a lot of hard to resolve dependency issues and slow
incremental rebuilds. Our home-grown MSVC build system is hard to maintain for
developers not using Windows and runs tests serially. While these and other
issues could individually be addressed with incremental improvements, together
they seem best addressed by moving to a more modern build system.

After evaluating different build system choices, we chose to use meson, to a
good degree based on the adoption by other open source projects.

We decided that it's more realistic to commit a relatively early version of
the new build system and mature it in tree.

This commit adds an initial version of a meson based build system. It supports
building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD,
Solaris and Windows (however only gcc is supported on aix, solaris). For
Windows/MSVC postgres can now be built with ninja (faster, particularly for
incremental builds) and msbuild (supporting the visual studio GUI, but
building slower).

Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM
bitcode generation, documentation adjustments) are done in subsequent commits
requiring further review. Other aspects (e.g. not installing test-only
extensions) are not yet addressed.

When building on Windows with msbuild, builds are slower when using a visual
studio version older than 2019, because those versions do not support
MultiToolTask, required by meson for intra-target parallelism.

The plan is to remove the MSVC specific build system in src/tools/msvc soon
after reaching feature parity. However, we're not planning to remove the
autoconf/make build system in the near future. Likely we're going to keep at
least the parts required for PGXS to keep working around until all supported
versions build with meson.

Some initial help for postgres developers is at
https://wiki.postgresql.org/wiki/Meson

With contributions from Thomas Munro, John Naylor, Stone Tickle and others.

Author: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-21 22:37:17 -07:00
Michael Paquier 14ff44f80c Used optimized linear search in more code paths
This commit updates two code paths to use pg_lfind32() introduced by
b6ef167 with TransactionId arrays:
- At the end of TransactionIdIsInProgress(), when checking for the case
of still running but overflowed subxids.
- XidIsConcurrent(), when checking for a serializable conflict.

These cases are less impactful than 37a6e5d, but a bit of
micro-benchmarking of this API shows that linear search speeds up by
~20% depending on the number of items involved (x86_64 and amd64 looked
at here).

Author: Nathan Bossart
Reviewed-by: Richard Guo, Michael Paquier
Discussion: https://postgr.es/m/20220901185153.GA783106@nathanxps13
2022-09-22 09:47:28 +09:00
Amit Kapila 6971a839cc Improve some error messages.
It is not our usual style to use "we" in the error messages.

Author: Kyotaro Horiguchi
Reviewed-By: Amit Kapila
Discussion: https://postgr.es/m/20220914.111507.13049297635620898.horikyota.ntt@gmail.com
2022-09-21 09:43:59 +05:30
Peter Geoghegan bfcf1b3480 Harmonize parameter names in storage and AM code.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in storage, catalog,
access method, executor, and logical replication code, as well as in
miscellaneous utility/library code.

Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.  Later commits will do the
same for other parts of the codebase.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-19 19:18:36 -07:00
Tom Lane b66fbd8afe Use SIGNAL_ARGS consistently to declare signal handlers.
Various bits of code were declaring signal handlers manually,
using "int signum" or variants of that.  We evidently have no
platforms where that's actually wrong, but let's use our
SIGNAL_ARGS macro everywhere anyway.  If nothing else, it's
good for finding signal handlers easily.

No need for back-patch, since this is just cosmetic AFAICS.

Discussion: https://postgr.es/m/2684964.1663167995@sss.pgh.pa.us
2022-09-14 14:44:50 -04:00
Tom Lane 0a20ff54f5 Split up guc.c for better build speed and ease of maintenance.
guc.c has grown to be one of our largest .c files, making it
a bottleneck for compilation.  It's also acquired a bunch of
knowledge that'd be better kept elsewhere, because of our not
very good habit of putting variable-specific check hooks here.
Hence, split it up along these lines:

* guc.c itself retains just the core GUC housekeeping mechanisms.
* New file guc_funcs.c contains the SET/SHOW interfaces and some
  SQL-accessible functions for GUC manipulation.
* New file guc_tables.c contains the data arrays that define the
  built-in GUC variables, along with some already-exported constant
  tables.
* GUC check/assign/show hook functions are moved to the variable's
  home module, whenever that's clearly identifiable.  A few hard-
  to-classify hooks ended up in commands/variable.c, which was
  already a home for miscellaneous GUC hook functions.

To avoid cluttering a lot more header files with #include "guc.h",
I also invented a new header file utils/guc_hooks.h and put all
the GUC hook functions' declarations there, regardless of their
originating module.  That allowed removal of #include "guc.h"
from some existing headers.  The fallout from that (hopefully
all caught here) demonstrates clearly why such inclusions are
best minimized: there are a lot of files that, for example,
were getting array.h at two or more levels of remove, despite
not having any connection at all to GUCs in themselves.

There is some very minor code beautification here, such as
renaming a couple of inconsistently-named hook functions
and improving some comments.  But mostly this just moves
code from point A to point B and deals with the ensuing
needs for #include adjustments and exporting a few functions
that previously weren't exported.

Patch by me, per a suggestion from Andres Freund; thanks also
to Michael Paquier for the idea to invent guc_funcs.c.

Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
2022-09-13 11:11:45 -04:00
Michael Paquier bfb9dfd937 Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat()
Several backend-side loops scanning one or more directories with
ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory
copy, temporary file removal, configuration file parsing, some logical
decoding logic and some pgtz stuff) already know the type of the entry
being scanned thanks to the dirent structure associated to the entry, on
platforms where we know about DT_REG, DT_DIR and DT_LNK to make the
difference between a regular file, a directory and a symbolic link.

Relying on the direct structure of an entry saves a few system calls to
stat() and lstat() in the loops updated here, shaving some code while on
it.  The logic of the code remains the same, calling stat() or lstat()
depending on if it is necessary to look through symlinks.

Authors: Nathan Bossart, Bharath Rupireddy
Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com
2022-09-02 16:58:06 +09:00
Tom Lane 7fed801135 Clean up inconsistent use of fflush().
More than twenty years ago (79fcde48b), we hacked the postmaster
to avoid a core-dump on systems that didn't support fflush(NULL).
We've mostly, though not completely, hewed to that rule ever since.
But such systems are surely gone in the wild, so in the spirit of
cleaning out no-longer-needed portability hacks let's get rid of
multiple per-file fflush() calls in favor of using fflush(NULL).

Also, we were fairly inconsistent about whether to fflush() before
popen() and system() calls.  While we've received no bug reports
about that, it seems likely that at least some of these call sites
are at risk of odd behavior, such as error messages appearing in
an unexpected order.  Rather than expend a lot of brain cells
figuring out which places are at hazard, let's just establish a
uniform coding rule that we should fflush(NULL) before these calls.
A no-op fflush() is surely of trivial cost compared to launching
a sub-process via a shell; while if it's not a no-op then we likely
need it.

Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
2022-08-29 13:55:41 -04:00
Robert Haas 82ac34db20 Include RelFileLocator fields individually in BufferTag.
This is preparatory work for a project to increase the number of bits
in a RelFileNumber from 32 to 56.

Along the way, introduce static inline accessor functions for a couple
of BufferTag fields.

Dilip Kumar, reviewed by me. The overall patch series has also had
review at various times from Andres Freund, Ashutosh Sharma, Hannu
Krosing, Vignesh C, Álvaro Herrera, and Tom Lane.

Discussion: http://postgr.es/m/CAFiTN-trubju5YbWAq-BSpZ90-Z6xCVBQE8BVqXqANOZAF1Znw@mail.gmail.com
2022-08-24 15:50:48 -04:00
David Rowley 421892a192 Further reduce warnings with -Wshadow=compatible-local
In a similar effort to f01592f91, here we're targetting fixing the
warnings that -Wshadow=compatible-local produces that we can fix by moving
a variable to an inner scope to stop that variable from being shadowed by
another variable declared somewhere later in the function.

All of the warnings being fixed here are changing the scope of variables
which are being used as an iterator for a "for" loop.  In each instance,
the fix happens to be changing the for loop to use the C99 type
initialization.  Much of this code likely pre-dates our use of C99.

Reducing the scope of the outer scoped variable seems like the safest way
to fix these.  Renaming seems more likely to risk patches using the wrong
variable.  Reducing the scope is more likely to result in a compilation
failure after applying some future patch rather than introducing bugs with
it.

By my count, this takes the warning count from 129 down to 114.

Author: Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-08-24 12:27:12 +12:00
Robert Haas 3e63e8462f When using the WAL-logged CREATE DATABASE strategy, bulk extend.
This should improve performance, and was suggested by Andres Freund.
Back-patch to v15 to keep the code consistent across branches.

Dilip Kumar

Discussion: http://postgr.es/m/C3458199-FEDD-4356-865A-08DFAA5D4065@anarazel.de
Discussion: http://postgr.es/m/CAFiTN-sJ0vVpJrZ=R5M+g7Tr8=NN4wKOtrqOcDEsfFfnZgivVA@mail.gmail.com
2022-08-18 11:26:34 -04:00
Tom Lane efd0c16bec Avoid using list_length() to test for empty list.
The standard way to check for list emptiness is to compare the
List pointer to NIL; our list code goes out of its way to ensure
that that is the only representation of an empty list.  (An
acceptable alternative is a plain boolean test for non-null
pointer, but explicit mention of NIL is usually preferable.)

Various places didn't get that memo and expressed the condition
with list_length(), which might not be so bad except that there
were such a variety of ways to check it exactly: equal to zero,
less than or equal to zero, less than one, yadda yadda.  In the
name of code readability, let's standardize all those spellings
as "list == NIL" or "list != NIL".  (There's probably some
microscopic efficiency gain too, though few of these look to be
at all performance-critical.)

A very small number of cases were left as-is because they seemed
more consistent with other adjacent list_length tests that way.

Peter Smith, with bikeshedding from a number of us

Discussion: https://postgr.es/m/CAHut+PtQYe+ENX5KrONMfugf0q6NHg4hR5dAhqEXEc2eefFeig@mail.gmail.com
2022-08-17 11:12:35 -04:00
Thomas Munro 36b3d52459 Remove configure probe for sys/resource.h and refactor.
<sys/resource.h> is in SUSv2 and is on all targeted Unix systems.  We
have a replacement for getrusage() on Windows, so let's just move its
declarations into src/include/port/win32/sys/resource.h so that we can
use a standard-looking #include.  Also remove an obsolete reference to
CLK_TCK.  Also rename src/port/getrusage.c to win32getrusage.c,
following the convention for Windows-only fallback code.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
2022-08-14 00:09:47 +12:00
Thomas Munro 37a65d1db1 Remove configure probes for sys/ipc.h, sys/sem.h, sys/shm.h.
These are in SUSv2 and every targeted Unix system has them.  It's not
hard to avoid including them on Windows system because they're mostly
used in platform-specific translation units.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
2022-08-14 00:09:47 +12:00
Robert Haas 76733b399c Avoid using a fake relcache entry to own an SmgrRelation.
If an error occurs before we close the fake relcache entry, the the
fake relcache entry will be destroyed by the SmgrRelation will
survive until end of transaction. Its smgr_owner pointer ends up
pointing to already-freed memory.

The original reason for using a fake relcache entry here was to try
to avoid reusing an SMgrRelation across a relevant invalidation. To
avoid that problem, just call smgropen() again each time we need a
reference to it. Hopefully someday we will come up with a more
elegant approach, but accessing uninitialized memory is bad so let's
do this for now.

Dilip Kumar, reviewed by Andres Freund and Tom Lane. Report by
Justin Pryzby.

Discussion: http://postgr.es/m/20220802175043.GA13682@telsasoft.com
Discussion: http://postgr.es/m/CAFiTN-vSFeE6_W9z698XNtFROOA_nSqUXWqLcG0emob_kJ+dEQ@mail.gmail.com
2022-08-12 08:25:41 -04:00
Andres Freund 320f92b744 Rely on __func__ being supported
Previously we fell back to __FUNCTION__ and then NULL. As __func__ is in C99
that shouldn't be necessary anymore.

Solution.pm defined HAVE_FUNCNAME__FUNCTION instead of
HAVE_FUNCNAME__FUNC (originating in 4164e6636e), as at some point in the past
MSVC only supported __FUNCTION__. Our minimum version supports __func__.

Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220807012914.ydz73yte6j3coulo@awork3.anarazel.de
2022-08-07 09:36:01 -07:00
Tom Lane 692df425b6 Fix data-corruption hazard in WAL-logged CREATE DATABASE.
RelationCopyStorageUsingBuffer thought it could skip copying
empty pages, but of course that does not work at all, because
subsequent blocks will be out of place.

Also fix it to acquire share lock on the source buffer.  It *might*
be safe to not do that, but it's not very certain, and I don't think
this code deserves any benefit of the doubt.

Dilip Kumar, per complaint from me

Discussion: https://postgr.es/m/3679800.1659654066@sss.pgh.pa.us
2022-08-06 11:50:34 -04:00
Thomas Munro 5fc88c5d53 Replace pgwin32_is_junction() with lstat().
Now that lstat() reports junction points with S_IFLNK/S_ISLINK(), and
unlink() can unlink them, there is no need for conditional code for
Windows in a few places.  That was expressed by testing for WIN32 or
S_ISLNK, which we can now constant-fold.

The coding around pgwin32_is_junction() was a bit suspect anyway, as we
never checked for errors, and we also know that errors can be spuriously
reported because of transient sharing violations on this OS.  The
lstat()-based code has handling for that.

This also reverts 4fc6b6ee on master only.  That was done because
lstat() didn't previously work for symlinks (junction points), but now
it does.

Tested-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
2022-08-06 12:50:59 +12:00
Thomas Munro d2e150831a Remove configure probe for fdatasync.
fdatasync() is in SUSv2, and all targeted Unix systems have it.  We have
a replacement function for Windows.

We retain the probe for the function declaration, which allows us to
supply the mysteriously missing declaration for macOS, and also for
Windows.  No need to keep a HAVE_FDATASYNC macro around.

Also rename src/port/fdatasync.c to win32fdatasync.c since it's only for
Windows.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
2022-08-05 16:37:38 +12:00
Thomas Munro a0dc827112 Simplify replacement code for preadv and pwritev.
preadv() and pwritev() are not standardized by POSIX, but appeared in
NetBSD in 1999 and were adopted by at least OpenBSD, FreeBSD,
DragonFlyBSD, Linux, AIX, illumos and macOS.  We don't use them much
yet, but an active proposal uses them heavily.

In 15, we had two replacement implementations for other OSes: one based
on lseek() + -v function if available for true vector I/O, and the other
based on a loop over p- function.

The former would be an obstacle to hypothetical future multi-threaded
code sharing file descriptors, while the latter would not, since commit
cf112c12.  Furthermore, the number of targeted systems that could
benefit from the former's potential upside has dwindled to just one
niche OS, since macOS added the functions and we de-supported HP-UX.
That doesn't seem like a good trade-off.

Therefore, drop the lseek()-based variant, and also the pg_ prefix now
that the file position portability hazard is gone.

At the time of writing, the only systems in our build farm that lack
native preadv/pwritev and thus use fallback code are:

 * Solaris (but not illumos)
 * macOS before release 11.0
 * Windows

With this commit, the above systems will now use the *same* fallback
code, the version that loops over pread()/pwrite().  Windows already
used that (though a later proposal may include true vector I/O for
Windows), so this decision really only affects Solaris, until it gets
around to adding these system calls.

Also remove some useless includes while here.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 14:04:02 +12:00
Thomas Munro cf112c1220 Remove dead pread and pwrite replacement code.
pread() and pwrite() are in SUSv2, and all targeted Unix systems have
them.

Previously, we defined pg_pread and pg_pwrite to emulate these function
with lseek() on old Unixen.  The names with a pg_ prefix were a reminder
of a portability hazard: they might change the current file position.
That hazard is gone, so we can drop the prefixes.

Since the remaining replacement code is Windows-only, move it into
src/port/win32p{read,write}.c, and move the declarations into
src/include/port/win32_port.h.

No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only
used for declarations in port.h which have now moved into win32_port.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 09:49:21 +12:00
Thomas Munro bdb657edd6 Remove configure probe and related tests for getrlimit.
getrlimit() is in SUSv2 and all targeted systems have it.

Windows doesn't have it.  We could just use #ifndef WIN32, but for a
little more explanation about why we're making things conditional, let's
retain the HAVE_GETRLIMIT macro.  It's defined in port.h for Unix systems.

On systems that have it, it's not necessary to test for RLIMIT_CORE,
RLIMIT_STACK or RLIMIT_NOFILE macros, since SUSv2 requires those and all
targeted systems have them.  Also remove references to a pre-historic
alternative spelling of RLIMIT_NOFILE, and coding that seemed to believe
that Cygwin didn't have it.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 09:18:34 +12:00
Robert Haas bbe08b8869 Use TRUNCATE to preserve relfilenode for pg_largeobject + index.
Commit 9a974cbcba arranged to preserve
the relfilenode of user tables across pg_upgrade, but failed to notice
that pg_upgrade treats pg_largeobject as a user table and thus it needs
the same treatment. Otherwise, large objects will appear to vanish
after a  pg_upgrade.

Commit d498e052b4 fixed this problem
by teaching pg_dump to UPDATE pg_class.relfilenode for pg_largeobject
and its index. However, because an UPDATE on the catalog rows doesn't
change anything on disk, this can leave stray files behind in the new
cluster. They will normally be empty, but it's a little bit untidy.

Hence, this commit arranges to do the same thing using DDL. Specifically,
it makes TRUNCATE work for the pg_largeobject catalog when in
binary-upgrade mode, and it then uses that command in binary-upgrade
dumps as a way of setting pg_class.relfilenode for pg_largeobject and
its index. That way, the old files are removed from the new cluster.

Discussion: http://postgr.es/m/CA+TgmoYYMXGUJO5GZk1-MByJGu_bB8CbOL6GJQC8=Bzt6x6vDg@mail.gmail.com
2022-07-28 16:03:42 -04:00