Commit Graph

362 Commits

Author SHA1 Message Date
Andres Freund 12605414a7 Use dlists instead of SHM_QUEUE for syncrep queue
Part of a series to remove SHM_QUEUE. ilist.h style lists are more widely used
and have an easier to use interface.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an older version)
Discussion: https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2@awork3.anarazel.de
Discussion: https://postgr.es/m/20200211042229.msv23badgqljrdg2@alap3.anarazel.de
2023-01-18 12:15:05 -08:00
Andres Freund 5764f611e1 Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locks
Part of a series to remove SHM_QUEUE. ilist.h style lists are more widely used
and have an easier to use interface.

As PROC_QUEUE is now unused, remove it.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an older version)
Discussion: https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2@awork3.anarazel.de
Discussion: https://postgr.es/m/20200211042229.msv23badgqljrdg2@alap3.anarazel.de
2023-01-18 11:41:14 -08:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Andres Freund a4adc31f69 lwlock: Fix quadratic behavior with very long wait lists
Until now LWLockDequeueSelf() sequentially searched the list of waiters to see
if the current proc is still is on the list of waiters, or has already been
removed. In extreme workloads, where the wait lists are very long, this leads
to a quadratic behavior. #backends iterating over a list #backends
long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in
the first place also increases with the increased length of the wait queue, as
it becomes more likely that a lock is released while waiting for the wait list
lock, which is held for longer during lock release.

Due to the exponential back-off in perform_spin_delay() this is surprisingly
hard to detect. We should make that easier, e.g. by adding a wait event around
the pg_usleep() - but that's a separate patch.

The fix is simple - track whether a proc is currently waiting in the wait list
or already removed but waiting to be woken up in PGPROC->lwWaiting.

In some workloads with a lot of clients contending for a small number of
lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput.

As the quadratic behavior arguably is a bug, we might want to decide to
backpatch this fix in the future.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de
Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
2022-11-20 11:56:32 -08:00
Andres Freund 8c954168cf Fix mislabeling of PROC_QUEUE->links as PGPROC, fixing UBSan on 32bit
ProcSleep() used a PGPROC* variable to point to PROC_QUEUE->links.next,
because that does "the right thing" with SHMQueueInsertBefore(). While that
largely works, it's certainly not correct and unnecessary - we can just use
SHM_QUEUE* to point to the insertion point.

Noticed when testing a 32bit of postgres with undefined behavior
sanitizer. UBSan noticed that sometimes the supposed PGPROC wasn't
sufficiently aligned (required since 46d6e5f567, ensured indirectly, via
ShmemAllocRaw() guaranteeing cacheline alignment).

For now fix this by using a SHM_QUEUE* for the insertion point. Subsequently
we should replace all the use of PROC_QUEUE and SHM_QUEUE with ilist.h, but
that's a larger change that we don't want to backpatch.

Backpatch to all supported versions - it's useful to be able to run postgres
under UBSan.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20221117014230.op5kmgypdv2dtqsf@awork3.anarazel.de
Backpatch: 11-
2022-11-19 12:22:04 -08:00
David Rowley 2d0bbedda7 Rename shadowed local variables
In a similar effort to f01592f91, here we mostly rename shadowed local
variables to remove the warnings produced when compiling with
-Wshadow=compatible-local.

This fixes 63 warnings and leaves just 5.

Author: Justin Pryzby, David Rowley
Reviewed-by: Justin Pryzby
Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-10-05 21:01:41 +13:00
Robert Haas 7fc0e7de9f Revert the addition of GetMaxBackends() and related stuff.
This reverts commits 0147fc7, 4567596, aa64f23, and 5ecd018.
There is no longer agreement that introducing this function
was the right way to address the problem. The consensus now
seems to favor trying to make a correct value for MaxBackends
available to mdules executing their _PG_init() functions.

Nathan Bossart

Discussion: http://postgr.es/m/20220323045229.i23skfscdbvrsuxa@jrouhaud
2022-04-12 14:45:23 -04:00
Robert Haas f37015a161 Rename delayChkpt to delayChkptFlags.
Before commit 412ad7a556, delayChkpt
was a Boolean. Now it's an integer. Extensions using it need to be
appropriately updated, so let's rename the field to make sure that
a hard compilation failure occurs.

Replacing delayChkpt with delayChkptFlags made a few comments extend
past 80 characters, so I reflowed them and changed some wording very
slightly.

The back-branches will need a different change to restore compatibility
with existing minor releases; this is just for master.

Per suggestion from Tom Lane.

Discussion: http://postgr.es/m/a7880f4d-1d74-582a-ada7-dad168d046d1@enterprisedb.com
2022-04-08 11:44:17 -04:00
Robert Haas 412ad7a556 Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.
If TRUNCATE causes some buffers to be invalidated and thus the
checkpoint does not flush them, TRUNCATE must also ensure that the
corresponding files are truncated on disk. Otherwise, a replay
from the checkpoint might find that the buffers exist but have
the wrong contents, which may cause replay to fail.

Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design
suggestion from Heikki Linnakangas, with some changes to the
comments by me. Review of this and a prior patch that approached
the issue differently by Heikki Linnakangas, Andres Freund, Álvaro
Herrera, Masahiko Sawada, and Tom Lane.

Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
2022-03-24 14:52:28 -04:00
Andres Freund 2f6501fa3c Move replication slot release to before_shmem_exit().
Previously, replication slots were released in ProcKill() on error, resulting
in reporting replication slot drop of ephemeral slots after the stats
subsystem was already shut down.

To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.

Patch by Masahiko Sawada, with very minor polishing by me.

I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-14 17:08:17 -08:00
Robert Haas aa64f23b02 Remove MaxBackends variable in favor of GetMaxBackends() function.
Previously, it was really easy to write code that accessed MaxBackends
before we'd actually initialized it, especially when coding up an
extension. To make this less error-prune, introduce a new function
GetMaxBackends() which should be used to obtain the correct value.
This will ERROR if called too early. Demote the global variable to
a file-level static, so that nobody can peak at it directly.

Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane,
by Michael Paquier (who had doubts about the approach), and by me.

Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
2022-02-08 15:53:19 -05:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Thomas Munro a13db0e164 Change ProcSendSignal() to take pgprocno.
Instead of referring to target backends by pid, use pgprocno.  This
means that we don't have to scan the ProcArray and we can drop some
special case code for dealing with the startup process.

Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com
Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
2021-12-16 15:56:03 +13:00
Heikki Linnakangas 317632f307 Move InRecovery and standbyState global vars to xlogutils.c.
They are used in code that runs both during normal operation and during
WAL replay, and needs to behave differently during replay. Move them to
xlogutils.c, because that's where we have other helper functions used by
redo routines.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi
2021-07-31 09:50:26 +03:00
Tom Lane def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Andres Freund 225a22b19e Improve efficiency of wait event reporting, remove proc.h dependency.
pgstat_report_wait_start() and pgstat_report_wait_end() required two
conditional branches so far. One to check if MyProc is NULL, the other to
check if pgstat_track_activities is set. As wait events are used around
comparatively lightweight operations, and are inlined (reducing branch
predictor effectiveness), that's not great.

The dependency on MyProc has a second disadvantage: Low-level subsystems, like
storage/file/fd.c, report wait events, but architecturally it is preferable
for them to not depend on inter-process subsystems like proc.h (defining
PGPROC).  After this change including pgstat.h (nor obviously its
sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC
related headers anymore.

These goals, efficiency and abstraction, are achieved by having
pgstat_report_wait_start/end() not interact with MyProc, but instead a new
my_wait_event_info variable. At backend startup it points to a local variable,
removing the need to check for MyProc being NULL. During process
initialization my_wait_event_info is redirected to MyProc->wait_event_info. At
shutdown this is reversed. Because wait event reporting now does not need to
know about where the wait event is stored, it does not need to know about
PGPROC anymore.

The removal of the branch for checking pgstat_track_activities is simpler:
Don't check anymore. The cost due to the branch are often higher than the
store - and even if not, pgstat_track_activities is rarely disabled.

The main motivator to commit this work now is that removing the (indirect)
pgproc.h include from pgstat.h simplifies a patch to move statistics reporting
to shared memory (which still has a chance to get into 14).

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
2021-04-03 12:03:45 -07:00
Fujii Masao f05ed5a5cf Initialize atomic variable waitStart in PGPROC, at postmaster startup.
Commit 46d6e5f567 added the atomic variable "waitStart" into PGPROC struct,
to store the time at which wait for lock acquisition started. Previously
this variable was initialized every time each backend started. Instead,
this commit makes postmaster initialize it at the startup, to ensure that
the variable should be initialized before any use of it.

This commit also moves the code to initialize "waitStart" variable for
prepare transaction, from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().
Because MarkAsPreparingGuts() is more proper place to do that since
it initializes other PGPROC variables.

Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com
2021-02-22 18:25:00 +09:00
Peter Eisentraut 0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Fujii Masao 46d6e5f567 Display the time when the process started waiting for the lock, in pg_locks, take 2
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

The first attempt of this patch (commit 3b733fcd04) caused the buildfarm
member "rorqual" (built with --disable-atomics --disable-spinlocks) to report
the failure of the regression test. It was reverted by commit 890d2182a2.
The cause of this failure was that the atomic variable for "waitstart"
in the dummy process entry created at the end of prepare transaction was
not initialized. This second attempt fixes that issue.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-15 15:13:37 +09:00
Fujii Masao 890d2182a2 Revert "Display the time when the process started waiting for the lock, in pg_locks."
This reverts commit 3b733fcd04.

Per buildfarm members prion and rorqual.
2021-02-09 18:30:40 +09:00
Fujii Masao 3b733fcd04 Display the time when the process started waiting for the lock, in pg_locks.
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-09 18:10:19 +09:00
Fujii Masao 39b03690b5 Log long wait time on recovery conflict when it's resolved.
This is a follow-up of the work done in commit 0650ff2303. This commit
extends log_recovery_conflict_waits so that a log message is produced
also when recovery conflict has already been resolved after deadlock_timeout
passes, i.e., when the startup process finishes waiting for recovery
conflict after deadlock_timeout. This is useful in investigating how long
recovery conflicts prevented the recovery from applying WAL.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-13 22:59:17 +09:00
Fujii Masao 0650ff2303 Add GUC to log long wait times on recovery conflicts.
This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.

Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.

Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-08 00:47:03 +09:00
Tom Lane 9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Fujii Masao 8900b5a9d5 Detect the deadlocks between backends and the startup process.
The deadlocks that the recovery conflict on lock is involved in can
happen between hot-standby backends and the startup process.
If a backend takes an access exclusive lock on the table and which
finally triggers the deadlock, that deadlock can be detected
as expected. On the other hand, previously, if the startup process
took an access exclusive lock and which finally triggered the deadlock,
that deadlock could not be detected and could remain even after
deadlock_timeout passed. This is a bug.

The cause of this bug was that the code for handling the recovery
conflict on lock didn't take care of deadlock case at all. It assumed
that deadlocks involving the startup process and backends were able
to be detected by the deadlock detector invoked within backends.
But this assumption was incorrect. The startup process also should
have invoked the deadlock detector if necessary.

To fix this bug, this commit makes the startup process invoke
the deadlock detector if deadlock_timeout is reached while handling
the recovery conflict on lock. Specifically, in that case, the startup
process requests all the backends holding the conflicting locks to
check themselves for deadlocks.

Back-patch to v9.6. v9.5 has also this bug, but per discussion we decided
not to back-patch the fix to v9.5. Because v9.5 doesn't have some
infrastructure codes (e.g., 37c54863cf) that this bug fix patch depends on.
We can apply those codes for the back-patch, but since the next minor
version release is the final one for v9.5, it's risky to do that. If we
unexpectedly introduce new bug to v9.5 by the back-patch, there is no
chance to fix that. We determined that the back-patch to v9.5 would give
more risk than gain.

Author: Fujii Masao
Reviewed-by: Bertrand Drouvot, Masahiko Sawada, Kyotaro Horiguchi
Discussion: https://postgr.es/m/4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com
2021-01-06 12:39:18 +09:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane 789b938bf2 Centralize logic for skipping useless ereport/elog calls.
While ereport() and elog() themselves are quite cheap when the
error message level is too low to be printed, some places need to do
substantial work before they can call those macros at all.  To allow
optimizing away such setup work when nothing is to be printed, make
elog.c export a new function message_level_is_interesting(elevel)
that reports whether ereport/elog will do anything.  Make use of that
in various places that had ad-hoc direct tests of log_min_messages etc.
Also teach ProcSleep to use it to avoid some work.  (There may well
be other places that could usefully use this; I didn't search hard.)

Within elog.c, refactor a little bit to avoid having duplicate copies
of the policy-setting logic.  When that code was written, we weren't
relying on the availability of inline functions; so it had some
duplications in the name of efficiency, which I got rid of.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
2020-11-23 19:10:46 -05:00
Alvaro Herrera 450c8230b1
Don't hold ProcArrayLock longer than needed in rare cases
While cancelling an autovacuum worker, we hold ProcArrayLock while
formatting a debugging log string.  We can make this shorter by saving
the data we need to produce the message and doing the formatting outside
the locked region.

This isn't terribly critical, as it only occurs pretty rarely: when a
backend runs deadlock detection and it happens to be blocked by a
autovacuum running autovacuum.  Still, there's no need to cause a hiccup
in ProcArrayLock processing, which can be very high-traffic in some
cases.

While at it, rework code so that we only print the string when it is
really going to be used, as suggested by Michael Paquier.

Discussion: https://postgr.es/m/20201118214127.GA3179@alvherre.pgsql
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2020-11-23 18:55:23 -03:00
Alvaro Herrera cd9c1b3e19
Rename PGPROC->vacuumFlags to statusFlags
With more flags associated to a PGPROC entry that are not related to
vacuum (currently existing or planned), the name "statusFlags" describes
its purpose better.

(The same is done to the mirroring PROC_HDR->vacuumFlags.)

No functional changes in this commit.

This was suggested first by Hari Babu Kommi in [1] and then by Michael
Paquier at [2].

[1] https://postgr.es/m/CAJrrPGcsDC-oy1AhqH0JkXYa0Z2AgbuXzHPpByLoBGMxfOZMEQ@mail.gmail.com
[2] https://postgr.es/m/20200820060929.GB3730@paquier.xyz

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20201116182446.qcg3o6szo2zookyr@localhost
2020-11-16 19:42:55 -03:00
Michael Paquier aad546bd0a doc: Fix some grammar and inconsistencies
Some comments are fixed while on it.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200818171702.GK17022@telsasoft.com
Backpatch-through: 9.6
2020-09-10 15:50:19 +09:00
Andres Freund 5871f09c98 Fix autovacuum cancellation.
The problem is caused by me (Andres) having ProcSleep() look at the
wrong PGPROC entry in 5788e258bb.

Unfortunately it seems hard to write a reliable test for autovacuum
cancellations. Perhaps somebody will come up with a good approach, but
it seems worth fixing the issue even without a test.

Reported-By: Jeff Janes <jeff.janes@gmail.com>
Author: Jeff Janes <jeff.janes@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1wH2aUy+wDRDz+5RZALdcUnEofV1t9PzXS_gBJO9vZZ0Q@mail.gmail.com
2020-09-08 11:25:34 -07:00
Andres Freund 73487a60fc snapshot scalability: Move subxact info to ProcGlobal, remove PGXACT.
Similar to the previous changes this increases the chance that data
frequently needed by GetSnapshotData() stays in l2 cache. In many
workloads subtransactions are very rare, and this makes the check for
that considerably cheaper.

As this removes the last member of PGXACT, there is no need to keep it
around anymore.

On a larger 2 socket machine this and the two preceding commits result
in a ~1.07x performance increase in read-only pgbench. For read-heavy
mixed r/w workloads without row level contention, I see about 1.1x.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-14 15:33:35 -07:00
Andres Freund 5788e258bb snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags.
Similar to the previous commit this increases the chance that data
frequently needed by GetSnapshotData() stays in l2 cache. As we now
take care to not unnecessarily write to ProcGlobal->vacuumFlags, there
should be very few modifications to the ProcGlobal->vacuumFlags array.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-14 15:33:35 -07:00
Andres Freund 941697c3c1 snapshot scalability: Introduce dense array of in-progress xids.
The new array contains the xids for all connected backends / in-use
PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT
arrays which can have unused entries interspersed).

This improves performance because GetSnapshotData() always needs to
scan the xids of all live procarray entries and now there's no need to
go through the procArray->pgprocnos indirection anymore.

As the set of running top-level xids changes rarely, compared to the
number of snapshots taken, this substantially increases the likelihood
of most data required for a snapshot being in l2 cache.  In
read-mostly workloads scanning the xids[] array will sufficient to
build a snapshot, as most backends will not have an xid assigned.

To keep the xid array dense ProcArrayRemove() needs to move entries
behind the to-be-removed proc's one further up in the array. Obviously
moving array entries cannot happen while a backend sets it
xid. I.e. locking needs to prevent that array entries are moved while
a backend modifies its xid.

To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot
spot already - ProcArrayAdd() / ProcArrayRemove() now needs to hold
XidGenLock in addition to ProcArrayLock. Adding / Removing a procarray
entry is not a very frequent operation, even taking 2PC into account.

Due to the above, the dense array entries can only be read or modified
while holding ProcArrayLock and/or XidGenLock. This prevents a
concurrent ProcArrayRemove() from shifting the dense array while it is
accessed concurrently.

While the new dense array is very good when needing to look at all
xids it is less suitable when accessing a single backend's xid. In
particular it would be problematic to have to acquire a lock to access
a backend's own xid. Therefore a backend's xid is not just stored in
the dense array, but also in PGPROC. This also allows a backend to
only access the shared xid value when the backend had acquired an
xid.

The infrastructure added in this commit will be used for the remaining
PGXACT fields in subsequent commits. They are kept separate to make
review easier.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-14 15:33:35 -07:00
Andres Freund 1f51c17c68 snapshot scalability: Move PGXACT->xmin back to PGPROC.
Now that xmin isn't needed for GetSnapshotData() anymore, it leads to
unnecessary cacheline ping-pong to have it in PGXACT, as it is updated
considerably more frequently than the other PGXACT members.

After the changes in dc7420c2c9, this is a very straight-forward change.

For highly concurrent, snapshot acquisition heavy, workloads this change alone
can significantly increase scalability. E.g. plain pgbench on a smaller 2
socket machine gains 1.07x for read-only pgbench, 1.22x for read-only pgbench
when submitting queries in batches of 100, and 2.85x for batches of 100
'SELECT';.  The latter numbers are obviously not to be expected in the
real-world, but micro-benchmark the snapshot computation
scalability (previously spending ~80% of the time in GetSnapshotData()).

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-13 16:25:21 -07:00
Peter Eisentraut a513f1dfbf Remove STATUS_WAITING
Add a separate enum for use in the locking APIs, which were the only
user.

Discussion: https://www.postgresql.org/message-id/flat/a6f91ead-0ce4-2a34-062b-7ab9813ea308%402ndquadrant.com
2020-06-17 09:14:37 +02:00
Tom Lane 36ac359d36 Rename assorted LWLock tranches.
Choose names that fit into the conventions for wait event names
(particularly, that multi-word names are in the style MultiWordName)
and hopefully convey more information to non-hacker users than the
previous names did.

Also rename SerializablePredicateLockListLock to
SerializablePredicateListLock; the old name was long enough to cause
table formatting problems, plus the double occurrence of "Lock" seems
confusing/error-prone.

Also change a couple of particularly opaque LWLock field names.

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 18:11:07 -04:00
Alexander Korotkov 1aac32df89 Revert 0f5ca02f53
0f5ca02f53 introduces 3 new keywords.  It appears to be too much for relatively
small feature.  Given now we past feature freeze, it's already late for
discussion of the new syntax.  So, revert.

Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
2020-04-08 11:37:27 +03:00
Andres Freund 75848bc744 snapshot scalability: Move delayChkpt from PGXACT to PGPROC.
The goal of separating hotly accessed per-backend data from PGPROC
into PGXACT is to make accesses fast (GetSnapshotData() in
particular). But delayChkpt is not actually accessed frequently; only
when starting a checkpoint. As it is frequently modified (multiple
times in the course of a single transaction), storing it in the same
cacheline as hotly accessed data unnecessarily dirties a contended
cacheline.

Therefore move delayChkpt to PGPROC.

This is part of a larger series of patches intending to improve
GetSnapshotData() scalability. It is committed and pushed separately,
as it is independently beneficial (small but measurable win, limited
by the other frequent modifications of PGXACT).

Author: Andres Freund
Reviewed-By: Robert Haas, Thomas Munro, David Rowley
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-04-07 17:36:23 -07:00
Alexander Korotkov 0f5ca02f53 Implement waiting for given lsn at transaction start
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.

  WAIT FOR LSN lsn [ TIMEOUT timeout ]

New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.

New shared memory struct is used to track awaited lsn per backend.  Recovery
process wakes up backend once required lsn is applied.

Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
2020-04-07 23:51:10 +03:00
Amit Kapila 3ba59ccc89 Allow page lock to conflict among parallel group members.
This is required as it is no safer for two related processes to perform
clean up in gin indexes at a time than for unrelated processes to do the
same.  After acquiring page locks, we can acquire relation extension lock
but reverse never happens which means these will also not participate in
deadlock.  So, avoid checking wait edges from this lock.

Currently, the parallel mode is strictly read-only, but after this patch
we have the infrastructure to allow parallel inserts and parallel copy.

Author: Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko
Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com
2020-03-21 08:48:06 +05:30
Amit Kapila 85f6b49c2c Allow relation extension lock to conflict among parallel group members.
This is required as it is no safer for two related processes to extend the
same relation at a time than for unrelated processes to do the same.  We
don't acquire a heavyweight lock on any other object after relation
extension lock which means such a lock can never participate in the
deadlock cycle.  So, avoid checking wait edges from this lock.

This provides an infrastructure to allow parallel operations like insert,
copy, etc. which were earlier not possible as parallel group members won't
conflict for relation extension lock.

Author: Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko
Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com
2020-03-20 08:20:56 +05:30
Michael Paquier 5ac4e9a12c Fix typo in proc.c
Author: Julien Rouhaud
Discussion: https://postgr.es/m/20200206082333.GA95343@nol
2020-02-07 12:41:10 +09:00
Peter Eisentraut c096a804d9 Remove STATUS_FOUND
Replace the solitary use with a bool.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/a6f91ead-0ce4-2a34-062b-7ab9813ea308%402ndquadrant.com
2020-01-11 07:48:57 +01:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Michael Paquier 6ca86bb7e9 Fix typos in the code
Author: Vignesh C
Reviewed-by: Dilip Kumar, Michael Paquier
Discussion: https://postgr.es/m/CALDaNm0ni+GAOe4+fbXiOxNrVudajMYmhJFtXGX-zBPoN8ixhw@mail.gmail.com
2019-10-30 10:03:00 +09:00
Thomas Munro 25b93a2967 Remove obsolete comments about sempahores from proc.c.
Commit 6753333f switched from a semaphore-based wait to a latch-based
wait for ProcSleep()/ProcWakeup(), but left behind some stray references
to semaphores.

Back-patch to 9.5.

Reviewed-by: Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/CA+hUKGLs5H6zhmgTijZ1OaJvC1sG0=AFXc1aHuce32tKiQrdEA@mail.gmail.com
2019-06-21 10:57:07 +12:00
Michael Paquier ea92368cd1 Move max_wal_senders out of max_connections for connection slot handling
Since its introduction, max_wal_senders is counted as part of
max_connections when it comes to define how many connection slots can be
used for replication connections with a WAL sender context.  This can
lead to confusion for some users, as it could be possible to block a
base backup or replication from happening because other backend sessions
are already taken for other purposes by an application, and
superuser-only connection slots are not a correct solution to handle
that case.

This commit makes max_wal_senders independent of max_connections for its
handling of PGPROC entries in ProcGlobal, meaning that connection slots
for WAL senders are handled using their own free queue, like autovacuum
workers and bgworkers.

One compatibility issue that this change creates is that a standby now
requires to have a value of max_wal_senders at least equal to its
primary.  So, if a standby created enforces the value of
max_wal_senders to be lower than that, then this could break failovers.
Normally this should not be an issue though, as any settings of a
standby are inherited from its primary as postgresql.conf gets normally
copied as part of a base backup, so parameters would be consistent.

Author: Alexander Kukushkin
Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii
Kliukin
Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com
2019-02-12 10:07:56 +09:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00