postgresql

Commit Graph

Author	SHA1	Message	Date
Andres Freund	32914d900f	Fix race condition in stats.sql added in `5264add784` Very occasionally the stats test failed due to the number of sessions not being updated yet. Likely this requires that there is contention on the database's stats entry. Solve this by forcing pending stats to be flushed before fetching the stats. I verified that there are no other test failures after making pgstat_report_stat() only flush stats when force = true. Per message from Tom Lane and buildfarm member crake. Discussion: https://postgr.es/m/3428246.1663271992@sss.pgh.pa.us Backpatch: 15-, where `5264add784` added the test	2022-09-16 11:28:20 -07:00
Tomas Vondra	0df4eb3f70	Reinstate tests accidentally removed by `e3fcca0d0d` Commit `e3fcca0d0d` reverted modifications to HOT for BRIN, but it also removed a couple unrelated tests from stats.sql. Reinstate those tests. Reported-by: Peter Eisentraut	2022-07-18 19:16:44 +02:00
Tomas Vondra	e3fcca0d0d	Revert changes in HOT handling of BRIN indexes This reverts commits `5753d4ee32` and `fe60b67250` that modified HOT to ignore BRIN indexes. The commit message for `5753d4ee32` claims that: When determining whether an index update may be skipped by using HOT, we can ignore attributes indexed only by BRIN indexes. There are no index pointers to individual tuples in BRIN, and the page range summary will be updated anyway as it relies on visibility info. This is partially incorrect - it's true BRIN indexes don't point to individual tuples, so HOT chains are not an issue, but the visibitlity info is not sufficient to keep the index up to date. This can easily result in corrupted indexes, as demonstrated in the hackers thread. This does not mean relaxing the HOT restrictions for BRIN is a lost cause, but it needs to handle the two aspects (allowing HOT chains and updating the page range summaries) as separate. But that requires a major changes, and it's too late for that in the current dev cycle. Reported-by: Tomas Vondra Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com	2022-06-16 15:02:49 +02:00
Andres Freund	5264add784	pgstat: add/extend tests for resetting various kinds of stats. - subscriber stats reset path was untested - slot stat sreset path for all slots was untested - pg_stat_database.sessions etc was untested - pg_stat_reset_shared() was untested, for any kind of shared stats - pg_stat_reset() was untested Author: Melanie Plageman <melanieplageman@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de	2022-04-07 15:43:43 -07:00
Andres Freund	e349c95d3e	pgstat: add tests for transaction behaviour, 2PC, function stats. Author: Andres Freund <andres@anarazel.de> Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de	2022-04-07 00:22:49 -07:00
Andres Freund	0f96965c65	pgstat: add pg_stat_force_next_flush(), use it to simplify tests. In the stats collector days it was hard to write tests for the stats system, because fundamentally delivery of stats messages over UDP was not synchronous (nor guaranteed). Now we easily can force pending stats updates to be flushed synchronously. This moves stats.sql into a parallel group, there isn't a reason for it to run in isolation anymore. And it may shake out some bugs. Bumps catversion. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de	2022-04-06 23:35:56 -07:00
Andres Freund	5891c7a8ed	pgstat: store statistics in shared memory. Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful statistics. Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory. The header for pgstat.c contains an overview of the architecture. The stats collector is not needed anymore, remove it. By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot "leak" anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On systems with many small relations pgstat_vacuum_stat() could be quite expensive. Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica. Subsequent commits will perform some further code cleanup, adapt docs and add tests. Bumps PGSTAT_FILE_FORMAT_ID. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version) Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version) Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version) Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de	2022-04-06 21:29:46 -07:00
Andres Freund	bdbd3d9064	pgstat: stats collector references in comments. Soon the stats collector will be no more, with statistics instead getting stored in shared memory. There are a lot of references to the stats collector in comments. This commit replaces most of these references with "cumulative statistics system", with the remaining ones getting replaced as part of subsequent commits. This is done separately from the - quite large - shared memory statistics patch to make review easier. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de	2022-04-06 13:56:06 -07:00
Andres Freund	da4b56662f	Mark pg_stat_get_subscription_stats() strict. It accidentally was marked as non-strict. As it was introduced only in HEAD, we can just fix the catalog. Bumps catversion. Discussion: https://postgr.es/m/20220326212432.s5n2maw6kugnpyxw@alap3.anarazel.de	2022-03-27 21:47:26 -07:00
Andres Freund	43a7dc96eb	Fix NULL input behaviour of pg_stat_get_replication_slot(). pg_stat_get_replication_slot() accidentally was marked as non-strict, crashing when called with NULL input. As it's already released, introduce an explicit NULL check in 14, fix the catalog in HEAD. Bumps catversion in HEAD. Discussion: https://postgr.es/m/20220326212432.s5n2maw6kugnpyxw@alap3.anarazel.de Backpatch: 14-, where replication slot stats were introduced	2022-03-27 21:46:23 -07:00
Noah Misch	766075105c	Use PG_TEST_TIMEOUT_DEFAULT for pg_regress suite non-elapsing timeouts. Currently, only contrib/test_decoding has this property. Use \getenv to load the timeout value. Discussion: https://postgr.es/m/20220218052842.GA3627003@rfd.leadboat.com	2022-03-04 18:53:13 -08:00
Tomas Vondra	fe60b67250	Move test for BRIN HOT behavior to stats.sql The test added by `5753d4ee32` relies on statistics collector, and so it may occasionally fail when the UDP packet gets lost. Some machines may be susceptible to this, probably depending on load etc. Move the test to stats.sql, which is known to already have this issue and people know to ignore it. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com	2021-12-11 05:32:35 +01:00
Tom Lane	b43f7c117e	Partially revert "Insert temporary debugging output in regression tests." This reverts much of commit `f03a9ca436`, but leaves the relpages/reltuples probe in select_parallel.sql. The pg_stat_all_tables probes are unstable enough to be annoying, and it no longer seems likely that they will teach us anything more about the underlying problem. I'd still like some more confirmation though that the observed plan instability is caused by VACUUM leaving relpages/reltuples as zero for one of these tables. Discussion: https://postgr.es/m/CA+hUKG+0CxrKRWRMf5ymN3gm+BECHna2B-q1w8onKBep4HasUw@mail.gmail.com	2019-08-11 18:55:32 -04:00
Tom Lane	f03a9ca436	Insert temporary debugging output in regression tests. We're seeing occasional instability in the plans generated for parallel queries on the "a_star" table hierarchy. This suggests that something is changing the planner's stats for those tables, but that should not be happening within a regression test run. To try to gather some information about what's happening, insert additional queries to check the basic page/tuple counts for these tables, as well as whether any vacuums or analyzes have happened on them. (We expect that only the database-wide VACUUM in sanity_check.sql will have touched them.) I added the probes not only in select_parallel.sql itself, but also in stats.sql, bearing in mind that the stats collector's lag may prevent the initial query from reporting current truth. If any extra vacuum/analyze has happened, the recheck in stats.sql definitely ought to see it. This commit can be reverted once we figure out what's going on. Per suggestion from David Rowley, though I changed the queries around. Discussion: https://postgr.es/m/CA+hUKG+0CxrKRWRMf5ymN3gm+BECHna2B-q1w8onKBep4HasUw@mail.gmail.com	2019-05-21 12:23:21 -04:00
Tom Lane	5deadfef28	Fix misapplication of pgstat_count_truncate to wrong relation. The stanza of ExecuteTruncate[Guts] that truncates a target table's toast relation re-used the loop local variable "rel" to reference the toast rel. This was safe enough when written, but commit `d42358efb` added code below that that supposed "rel" still pointed to the parent table. Therefore, the stats counter update was applied to the wrong relcache entry (the toast rel not the user rel); and if we were unlucky and that relcache entry had been flushed during reindex_relation, very bad things could ensue. (I'm surprised that CLOBBER_CACHE_ALWAYS testing hasn't found this. I'm even more surprised that the problem wasn't detected during the development of d42358efb; it must not have been tested in any case with a toast table, as the incorrect stats counts are very obvious.) To fix, replace use of "rel" in that code branch with a more local variable. Adjust test cases added by `d42358efb` so that some of them use tables with toast tables. Per bug #15540 from Pan Bian. Back-patch to 9.5 where `d42358efb` came in. Discussion: https://postgr.es/m/15540-01078812338195c0@postgresql.org	2018-12-07 12:11:59 -05:00
Tom Lane	7c70996ebf	Allow bitmap scans to operate as index-only scans when possible. If we don't have to return any columns from heap tuples, and there's no need to recheck qual conditions, and the heap page is all-visible, then we can skip fetching the heap page altogether. Skip prefetching pages too, when possible, on the assumption that the recheck flag will remain the same from one page to the next. While that assumption is hardly bulletproof, it seems like a good bet most of the time, and better than prefetching pages we don't need. This commit installs the executor infrastructure, but doesn't change any planner cost estimates, thus possibly causing bitmap scans to not be chosen in cases where this change renders them the best choice. I (tgl) am not entirely convinced that we need to account for this behavior in the planner, because I think typically the bitmap scan would get chosen anyway if it's the best bet. In any case the submitted patch took way too many shortcuts, resulting in too many clearly-bad choices, to be committable. Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around rather heavily by me. Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru	2017-11-01 17:38:20 -04:00
Tom Lane	eda4ef8151	stats regression test's wait_for_stats() must check timestamp too. pg_stat_get_snapshot_timestamp() returns the timestamp seen in the "global" stats file. Because pgstat_write_statsfiles() writes per-DB stats files before the global file (or at least before renaming it into place), there is a window where the test backend can see all the stats updates that wait_for_stats() was checking for (all of which come from the per-DB file) but also see the same global stats file it had seen at the start of the test script. This results in a failure in only the "snapshot_newer" query, as reported by a couple of buildfarm members recently. I suspect that this ought to be back-patched. Commit `4e37b3e15` has evidently increased the probability of this window getting hit, but it's not apparent why it could not have been hit before. I'll refrain for the moment though.	2017-05-14 23:33:18 -04:00
Tom Lane	7606bbb3de	Make stats regression test more robust in the face of parallel query. Commit `60690a6fe` attempted to fix the wait_for_stats() function in this test so that it would wait properly if the tenk2 scans were done in parallel workers instead of the main session (typically as a consequence of force_parallel_mode being turned on). However, we made it test for whether the main session's actions had been reported by looking for inserts on 'trunc_stats_test'. This is the Wrong Thing, because those aren't the last updates we expect the main session to do. As shown by recent failures on buildfarm member frogmouth, it's entirely likely that the trunc_stats_test updates will be reported in a separate message from later updates, which means there can be a window in which wait_for_stats() will exit but not all the updates we are expecting to see will have arrived. We should test for the last updates we're expecting, namely those on 'trunc_stats_test4'. Unfortunately, I doubt that this explains frogmouth's failures, because there's no reason to believe that it's running the tenk2 queries in parallel. Still, the test is wrong on its own terms, so fix and back-patch to 9.6 where parallel query came in.	2017-05-14 21:39:10 -04:00
Tom Lane	4e37b3e15c	Avoid hard-wired sleep delays in stats regression test. On faster machines, the overall runtime for running the core regression tests is under twenty seconds these days, of which the hard-wired delays in the stats test are a significant fraction. But on closer inspection, it seems like we shouldn't need those. The initial 2-second delay is there only to reduce the risk of the test's stats messages not getting sent due to contention. But analysis of the last ten years' worth of buildfarm runs shows no evidence that such failures actually occur. (We do see failures that look like stats messages not getting sent, particularly on Windows; but there is little reason to believe that the initial delay reduces their frequency.) The later 1-second delay is there to ensure that our session's stats will have gotten sent. But we could also do that by starting a fresh session, which takes well under 1 second even on very slow machines. Hence, let's remove both delays and see what happens. The first delay was the only test of pg_sleep_for() in the regression tests, but we can move that responsibility into wait_for_stats(). Discussion: https://postgr.es/m/17795.1493869423@sss.pgh.pa.us	2017-05-13 09:42:12 -04:00
Tom Lane	60690a6fe8	Make stats regression test robust in the face of parallel query. Historically, the wait_for_stats() function in this test has simply checked for a report of an indexscan on tenk2, corresponding to the last command issued before we expect stats updates to appear. However, with parallel query that indexscan could be done by a parallel worker that will emit its stats counters to the collector before the session's main backend does (a full second before, in fact, thanks to the "pg_sleep(1.0)" added by commit `957d08c81f`). That leaves a sizable window in which an autovacuum-triggered write of the stats files would present a state in which the indexscan on tenk2 appears to have been done, but none of the write updates performed by the test have been. This is evidently the explanation for intermittent failures seen by me and on buildfarm member mandrill. To fix, we should check separately for both the tenk2 seqscan and indexscan counts, since those might be reported by different processes that could be delayed arbitrarily on an overloaded test machine. And we need to check for at least one update-related count. If we ever allow parallel workers to do writes, this will get even more complicated ... but in view of all the other hard problems that will entail, I don't feel a need to solve this one today. Per research by Rahila Syed and myself; part of this patch is Rahila's.	2016-03-04 16:20:49 -05:00
Alvaro Herrera	d42358efb1	Have TRUNCATE update pgstat tuple counters This works by keeping a per-subtransaction record of the ins/upd/del counters before the truncate, and then resetting them; this record is useful to return to the previous state in case the truncate is rolled back, either in a subtransaction or whole transaction. The state is propagated upwards as subtransactions commit. When the per-table data is sent to the stats collector, a flag indicates to reset the live/dead counters to zero as well. Catalog version bumped due to the change in pgstat format. Author: Alexander Shulgin Discussion: 1007.1207238291@sss.pgh.pa.us Discussion: 548F7D38.2000401@BlueTreble.com Reviewed-by: Álvaro Herrera, Jim Nasby	2015-02-20 12:10:01 -03:00
Tom Lane	2fb7a75f37	Add pg_stat_get_snapshot_timestamp() to show statistics snapshot timestamp. Per discussion, this could be useful for purposes such as programmatically detecting a nonresponding stats collector. We already have the timestamp anyway, it's just a matter of providing a SQL-accessible function to fetch it. Matt Kelly, reviewed by Jim Nasby	2015-02-19 21:36:50 -05:00
Robert Haas	760c770ff6	Add convenience functions pg_sleep_for and pg_sleep_until. Vik Fearing, reviewed by Pavel Stehule and myself	2014-01-30 15:47:56 -05:00
Tom Lane	45401c1c25	Prevent index-only scans in stats regression test. This bollixes the test because it's expecting to see the idx_tup_fetch counter increase, which won't happen if heap fetches were avoided by use of an index-only scan. Per buildfarm results. While at it, let's just make sure that enable_seqscan and enable_indexscan are ON for this test ...	2011-10-08 23:45:58 -04:00
Tom Lane	48f7e64395	Simplify and rename some GUC variables, per various recent discussions: * stats_start_collector goes away; we always start the collector process, unless prevented by a problem with setting up the stats UDP socket. * stats_reset_on_server_start goes away; it seems useless in view of the availability of pg_stat_reset(). * stats_block_level and stats_row_level are merged into a single variable "track_counts", which controls all reports sent to the collector process. * stats_command_string is renamed to track_activities. * log_autovacuum is renamed to log_autovacuum_min_duration to better reflect its meaning. The log_autovacuum change is not a compatibility issue since it didn't exist before 8.3 anyway. The other changes need to be release-noted.	2007-09-24 03:12:23 +00:00
Tom Lane	957d08c81f	Implement rate-limiting logic on how often backends will attempt to send messages to the stats collector. This avoids the problem that enabling stats_row_level for autovacuum has a significant overhead for short read-only transactions, as noted by Arjen van der Meijden. We can avoid an extra gettimeofday call by piggybacking on the one done for WAL-logging xact commit or abort (although that doesn't help read-only transactions, since they don't WAL-log anything). In my proposal for this, I noted that we could change the WAL log entries for commit/abort to record full TimestampTz precision, instead of only time_t as at present. That's not done in this patch, but will be committed separately.	2007-04-30 03:23:49 +00:00
Tom Lane	aec4cf1c8c	Add a function pg_stat_clear_snapshot() that discards any statistics snapshot already collected in the current transaction; this allows plpgsql functions to watch for stats updates even though they are confined to a single transaction. Use this instead of the previous kluge involving pg_stat_file() to wait for the stats collector to update in the stats regression test. Internally, decouple storage of stats snapshots from transaction boundaries; they'll now stick around until someone calls pgstat_clear_snapshot --- which xact.c still does at transaction end, to maintain the previous behavior. This makes the logic a lot cleaner, at the price of a couple dozen cycles per transaction exit.	2007-02-07 23:11:30 +00:00
Tom Lane	d9ce68872f	Modify the stats regression test to delay until the stats file actually changes (with an upper limit of 30 seconds), and record the delay time in the postmaster log. This should give us some info about what's happening with the intermittent stats failures in buildfarm. After an idea of Andrew Dunstan's.	2007-02-07 18:34:56 +00:00
Tom Lane	9bf559dee3	Add a delay at the start of the stats test, to let any prior stats activity quiesce. Possibly this will fix the large increase in non-reproducible stats test failures we've noted since turning on stats_row_level by default.	2007-01-28 03:02:31 +00:00
Tom Lane	782eefc580	Create a standard function pg_sleep() to sleep for a specified amount of time. Replace the former ad-hoc implementation used in the regression tests. Joachim Wieland	2006-01-11 20:12:43 +00:00
Tom Lane	cb8b6618ce	Revise pgstats stuff to fix the problems with not counting accesses generated by bitmap index scans. Along the way, simplify and speed up the code for counting sequential and index scans; it was both confusing and inefficient to be taking care of that in the per-tuple loops, IMHO. initdb forced because of internal changes in pg_stat view definitions.	2005-10-06 02:29:23 +00:00
Tom Lane	6c61b0d93c	In the stats test, delay for the stats collector to catch up using a function that actually sleeps, instead of busy-waiting. Perhaps this will resolve some of the intermittent stats failures we keep seeing.	2005-07-23 14:18:57 +00:00
Tom Lane	bc843d3960	First cut at planner support for bitmap index scans. Lots to do yet, but the code is basically working. Along the way, rewrite the entire approach to processing OR index conditions, and make it work in join cases for the first time ever. orindxpath.c is now basically obsolete, but I left it in for the time being to allow easy comparison testing against the old implementation.	2005-04-22 21:58:32 +00:00
Peter Eisentraut	f7e5e9d493	More whitespace fixes. Do people write the expected files by hand?	2003-11-01 03:18:20 +00:00
Peter Eisentraut	32ab99ba10	Fix hidden whitespace differences between expected and result files.	2003-11-01 03:07:07 +00:00
Bruce Momjian	cd47a4d3c4	With pg_autovacuum becoming increasingly popular it's important to have a working stats collector. This test is able to discover the problem that was present in 7.4 Beta 2. Manfred Koizar	2003-09-13 16:44:49 +00:00

36 Commits