Commit Graph

42 Commits

Author SHA1 Message Date
Amit Kapila 69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Amit Kapila a6fea120a7 Comments and doc fixes for commit 40d964ec99.
Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila and Sawada Masahiko
Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com
2020-04-14 08:10:27 +05:30
Amit Kapila ef08ca113f Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-13 15:31:16 +05:30
Amit Kapila b7ce6de93b Allow autovacuum to log WAL usage statistics.
This commit allows autovacuum to log WAL usage statistics added by commit
df3b181499.

Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-06 16:24:51 +05:30
Amit Kapila df3b181499 Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution.  The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.

This helps the users who have write-intensive workload to see the impact
of I/O due to WAL.  This further enables us to see approximately what
percentage of overall WAL is due to full page writes.

In the future, we can extend this functionality to allow us to compute the
the exact amount of WAL data due to full page writes.

This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.

Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-04 10:02:08 +05:30
Amit Kapila 3a5e22138a Allow parallel vacuum to accumulate buffer usage.
Commit 40d964ec99 allowed vacuum command to process indexes in parallel but
forgot to accumulate the buffer usage stats of parallel workers.  This
allows leader backend to accumulate buffer usage stats of all the parallel
workers.

Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
2020-04-02 08:04:58 +05:30
Amit Kapila ef75140fe7 Avoid calls to RelationGetRelationName() and RelationGetNamespace() in
vacuum code.

After commit b61d161c14, during vacuum, we cache the information of
relation name and relation namespace in local structure LVRelStats so that
we can use it in an error callback function.  We can use the cached
information to avoid the calls to RelationGetRelationName(),
RelationGetNamespace() and get_namespace_name().  This is mainly for the
consistent in vacuum code path but it will avoid the extra syscache lookup
we do in get_namespace_name().

Author: Justin Pryzby
Reviewed-by: Amit Kapila
Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
2020-03-31 09:34:49 +05:30
Michael Paquier dd9ac7d5d8 Revert "Skip redundant anti-wraparound vacuums"
This reverts commit 2aa6e33, that added a fast path to skip
anti-wraparound and non-aggressive autovacuum jobs (these have no sense
as anti-wraparound implies aggressive).  With a cluster using a high
amount of relations with a portion of them being heavily updated, this
could cause autovacuum to lock down, with autovacuum workers attempting
repeatedly those jobs on the same relations for the same database, that
just kept being skipped.  This lock down can be solved with a manual
VACUUM FREEZE.

Justin King has reported one environment where the issue happened, and
Julien Rouhaud and I have been able to reproduce it in a second
environment.  With a very aggressive autovacuum_freeze_max_age,
triggering those jobs with pgbench is a matter of minutes, and hitting
the lock down is a lot harder (my local tests failed to do that).

Note that anti-wraparound and non-aggressive jobs can only be triggered
on a subset of shared catalogs:
- pg_auth_members
- pg_authid
- pg_database
- pg_replication_origin
- pg_shseclabel
- pg_subscription
- pg_tablespace
While the lock down was possible down to v12, the root cause of those
jobs is a much older issue, which needs more analysis.

Bonus thanks to Andres Freund for the discussion.

Reported-by: Justin King
Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com
Backpatch-through: 12
2020-03-31 08:27:47 +09:00
Andres Freund d4b34f60c5 Deduplicate PageIsNew() check in lazy_scan_heap().
The recheck isn't needed anymore, as RelationGetBufferForTuple() now
extends the relation with RBM_ZERO_AND_LOCK. Previously we needed to
handle the fact that relation extension extended the relation and then
separately acquired a lock on the page - while expecting that the page
is empty.

Reported-By: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQArA_=J0D5T258xsCY6Xtf6wiH4b=QDPDgVS+WZUN10WDw@mail.gmail.com
2020-03-30 13:56:40 -07:00
Amit Kapila b61d161c14 Introduce vacuum errcontext to display additional information.
The additional information displayed will be block number for error
occurring while processing heap and index name for error occurring
while processing the index.

This will help us in diagnosing the problems that occur during a vacuum.
For ex. due to corruption (either caused by bad hardware or by some bug)
if we get some error while vacuuming, it can help us identify the block
in heap and or additional index information.

It sets up an error context callback to display additional information
with the error.  During different phases of vacuum (heap scan, heap
vacuum, index vacuum, index clean up, heap truncate), we update the error
context callback to display appropriate information.  We can extend it to
a bit more granular level like adding the phases for FSM operations or for
prefetching the blocks while truncating. However, I felt that it requires
adding many more error callback function calls and can make the code a bit
complex, so left those for now.

Author: Justin Pryzby, with few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier
and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
2020-03-30 07:33:38 +05:30
Fujii Masao 0074919794 Fix mesurement of elapsed time during truncating heap in VACUUM.
VACUUM may truncate heap in several batches. The activity report
is logged for each batch, and contains the number of pages in the table
before and after the truncation, and also the elapsed time during
the truncation. Previously the elapsed time reported in each batch was
the total elapsed time since starting the truncation until finishing
each batch. For example, if the truncation was processed dividing into
three batches, the second batch reported the accumulated time elapsed
during both first and second batches. This is strange and confusing
because the number of pages in the table reported together is not
total. Instead, each batch should report the time elapsed during
only that batch.

The cause of this issue was that the resource usage snapshot was
initialized only at the beginning of the truncation and was never
reset later. This commit fixes the issue by changing VACUUM so that
the resource usage snapshot is reset at each batch.

Back-patch to all supported branches.

Reported-by: Tatsuhito Kasahara
Author: Tatsuhito Kasahara
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com
2020-02-19 20:37:26 +09:00
Alvaro Herrera 15d13e8291 Make vacuum buffer counters 64 bits wide
Using 32 bit counters means they can now realistically wrap around when
vacuuming extremely large tables.  Because they're signed integers,
stats printed by vacuum look very odd when they do.

We'd love to backpatch this, but refrain because the variables are
exported and could cause third-party code to break.

Reviewed-by: Julien Rouhaud, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/20200131205926.GA16367@alvherre.pgsql
2020-02-05 16:59:29 -03:00
Michael Paquier f942dfb952 Clarify some comments in vacuumlazy.c
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200113004542.GA26045@telsasoft.com
2020-01-23 15:56:56 +09:00
Amit Kapila 79a3efb84d Fix the computation of max dead tuples during the vacuum.
In commit 40d964ec99, we changed the way memory is allocated for dead
tuples but forgot to update the place where we compute the maximum
number of dead tuples.  This could lead to invalid memory requests.

Reported-by: Andres Freund
Diagnosed-by: Andres Freund
Author: Masahiko Sawada
Reviewed-by: Amit Kapila and Dilip Kumar
Discussion: https://postgr.es/m/20200121060020.e3cr7s7fj5rw4lok@alap3.anarazel.de
2020-01-22 07:43:51 +05:30
Amit Kapila 40d964ec99 Allow vacuum command to process indexes in parallel.
This feature allows the vacuum to leverage multiple CPUs in order to
process indexes.  This enables us to perform index vacuuming and index
cleanup with background workers.  This adds a PARALLEL option to VACUUM
command where the user can specify the number of workers that can be used
to perform the command which is limited by the number of indexes on a
table.  Specifying zero as a number of workers will disable parallelism.
This option can't be used with the FULL option.

Each index is processed by at most one vacuum process.  Therefore parallel
vacuum can be used when the table has at least two indexes.

The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers.  The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.

Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
Mahendra Singh and Sergei Kornilov
Tested-by: Mahendra Singh and Prabhat Sahu
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com
2020-01-20 07:57:49 +05:30
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Michael Paquier 1ab41a3c8e Refactor code dedicated to index vacuuming in vacuumlazy.c
The part in charge of doing the vacuum on all the indexes of a relation
was duplicated, with the same handling for progress reporting done.
While on it, update the progress reporting for heap vacuuming in the
subroutine doing the actual work, keeping the status update local.  This
way, any future caller of lazy_vacuum_heap() does not have to worry
about doing any progress reporting update.

Author: Justin Pryzby, Michael Paquier
Discussion: https://postgr.es/m/20191120210600.GC30362@telsasoft.com
2019-12-26 17:01:23 +09:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Michael Paquier e5a02e0fc6 Remove duplicated progress reporting during heap scan of VACUUM
This has been introduced by c16dc1a since progress reporting for VACUUM
has been added.  As this issue just causes some extra work and is
harmless, no backpatch is done.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20191213030831.GT2082@telsasoft.com
2019-12-15 22:05:33 +09:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Peter Geoghegan ae7291acbc Standardize ItemIdData terminology.
The term "item pointer" should not be used to refer to ItemIdData
variables, since that is needlessly ambiguous.  Only
ItemPointerData/ItemPointer variables should be called item pointers.

To fix, establish the convention that ItemIdData variables should always
be referred to either as "item identifiers" or "line pointers".  The
term "item identifier" already predominates in docs and translatable
messages, and so should be the preferred alternative there.

Discussion: https://postgr.es/m/CAH2-Wz=c=MZQjUzde3o9+2PLAPuHTpVZPPdYxN=E4ndQ2--8ew@mail.gmail.com
2019-05-13 15:53:39 -07:00
Fujii Masao b84dbc8eb8 Add TRUNCATE parameter to VACUUM.
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.

This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
2019-05-08 02:10:33 +09:00
Amit Kapila 7db0cde6b5 Revert "Avoid the creation of the free space map for small heap relations".
This feature was using a process local map to track the first few blocks
in the relation.  The map was reset each time we get the block with enough
freespace.  It was discussed that it would be better to track this map on
a per-relation basis in relcache and then invalidate the same whenever
vacuum frees up some space in the page or when FSM is created.  The new
design would be better both in terms of API design and performance.

List of commits reverted, in reverse chronological order:

06c8a5090e  Improve code comments in b0eaa4c51b.
13e8643bfc  During pg_upgrade, conditionally skip transfer of FSMs.
6f918159a9  Add more tests for FSM.
9c32e4c350  Clear the local map when not used.
29d108cdec  Update the documentation for FSM behavior..
08ecdfe7e5  Make FSM test portable.
b0eaa4c51b  Avoid creation of the free space map for small heap relations.

Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
2019-05-07 09:30:24 +05:30
Robert Haas dd69597988 Fix some problems with VACUUM (INDEX_CLEANUP FALSE).
The new nleft_dead_tuples and nleft_dead_itemids fields are confusing
and do not seem like the correct way forward.  One of them is tested
via an assertion that can fail, as it has already done on buildfarm
member topminnow.  Remove the assertion and the fields.

Change the logic for the case where a tuple is not initially pruned
by heap_page_prune but later diagnosed HEAPTUPLE_DEAD by
HeapTupleSatisfiesVacuum.  Previously, tupgone = true was set in
that case, which leads to treating the tuple as one that will be
removed.  In a normal vacuum, that's OK, because we'll remove
index entries for it and then the second heap pass will remove the
tuple itself, but when index cleanup is disabled, those things
don't happen, so we must instead treat it as a recently-dead
tuple that we have voluntarily chosen to keep.

Report and analysis by Tom Lane.  This patch loosely based on one
from Masahiko Sawada, but I changed most of it.
2019-05-02 10:07:13 -04:00
Andres Freund fdc7efcc30 Allow pg_class xid & multixid horizons to not be set.
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.

Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry.  This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value.  Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23 21:42:12 -07:00
Fujii Masao 119dcfad98 Add vacuum_truncate reloption.
vacuum_truncate controls whether vacuum tries to truncate off
any empty pages at the end of the table. Previously vacuum always
tried to do the truncation. However, the truncation could cause
some problems; for example, ACCESS EXCLUSIVE lock needs to
be taken on the table during the truncation and can cause
the query cancellation on the standby even if hot_standby_feedback
is true. Setting this reloption to false can be helpful to avoid
such problems.

Author: Tsunakawa Takayuki
Reviewed-By: Julien Rouhaud, Masahiko Sawada, Michael Paquier, Kirk Jamison and Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwE5UqFqSq1=kV3QtTUtXphTdyHA-8rAj4A=Y+e4kyp3BQ@mail.gmail.com
2019-04-08 16:43:57 +09:00
Robert Haas a96c41feec Allow VACUUM to be run with index cleanup disabled.
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.

Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
2019-04-04 15:04:43 -04:00
Alvaro Herrera 5f768045a1 Correctly initialize newly added struct member
Valgrind was rightly complaining that IndexVacuumInfo->report_progress
(added by commit ab0dfc961b) was not being initialized in some code
paths.  Repair.

Per buildfarm member lousyjack.
2019-04-03 09:58:47 -03:00
Michael Paquier 2aa6e331ea Skip redundant anti-wraparound vacuums
An anti-wraparound vacuum has to be by definition aggressive as it needs
to work on all the pages of a relation.  However it can happen that due
to some concurrent activity an anti-wraparound vacuum is marked as
non-aggressive, which makes it redundant with a previous run, and
it is actually useless as an anti-wraparound vacuum should process all
the pages of a relation.  This commit makes such vacuums to be skipped.

An anti-wraparound vacuum not aggressive can be found easily by mixing
low values of autovacuum_freeze_max_age (to control anti-wraparound) and
autovacuum_freeze_table_age (to control the aggressiveness).

28a8fa9 has added some extra logging printing all the possible
combinations of anti-wraparound and aggressive vacuums, which now gets
simplified as an anti-wraparound vacuum also non-aggressive gets
skipped.

Per discussion mainly between Andrew Dunstan, Robert Haas, Álvaro
Herrera, Kyotaro Horiguchi, Masahiko Sawada, and myself.

Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Andrew Dunstan, Álvaro Herrera
Discussion: https://postgr.es/m/20180914153554.562muwr3uwujno75@alvherre.pgsql
2019-03-31 22:59:12 +09:00
Robert Haas f41551f61f Fold vacuum's 'int options' parameter into VacuumParams.
Many places need both, so this allows a few functions to take one
fewer parameter.  More importantly, as soon as we add a VACUUM
option that takes a non-Boolean parameter, we need to replace
'int options' with a struct, and it seems better to think
of adding more fields to VacuumParams rather than passing around
both VacuumParams and a separate struct as well.

Patch by me, reviewed by Masahiko Sawada

Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com
2019-03-18 13:57:33 -04:00
Amit Kapila b0eaa4c51b Avoid creation of the free space map for small heap relations, take 2.
Previously, all heaps had FSMs. For very small tables, this means that the
FSM took up more space than the heap did. This is wasteful, so now we
refrain from creating the FSM for heaps with 4 pages or fewer. If the last
known target block has insufficient space, we still try to insert into some
other page before giving up and extending the relation, since doing
otherwise leads to table bloat. Testing showed that trying every page
penalized performance slightly, so we compromise and try every other page.
This way, we visit at most two pages. Any pages with wasted free space
become visible at next relation extension, so we still control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.

Once the FSM is created for a heap we don't remove it even if somebody
deletes all the rows from the corresponding relation.  We don't think it is
a useful optimization as it is quite likely that relation will again grow
to the same size.

Author: John Naylor, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
2019-02-04 07:49:15 +05:30
Andres Freund 0d1fe9f74e Move page initialization from RelationAddExtraBlocks() to use, take 2.
Previously we initialized pages when bulk extending in
RelationAddExtraBlocks(). That has a major disadvantage: It ties
RelationAddExtraBlocks() to heap, as other types of storage are likely
to need different amounts of special space, have different amount of
free space (previously determined by PageGetHeapFreeSpace()).

That we're relying on initializing pages, but not WAL logging the
initialization, also means the risk for getting
"WARNING:  relation \"%s\" page %u is uninitialized --- fixing"
style warnings in vacuums after crashes/immediate shutdowns, is
considerably higher. The warning sounds much more serious than what
they are.

Fix those two issues together by not initializing pages in
RelationAddExtraPages() (but continue to do so in
RelationGetBufferForTuple(), which is linked much more closely to
heap), and accepting uninitialized pages as normal in
vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it
to the FSM, but does nothing else.  We chose to not issue a debug
message, much less a warning in that case - it seems rarely useful,
and quite likely to scare people unnecessarily.

For now empty pages aren't added to the VM, because standbys would not
re-discover such pages after a promotion. In contrast to other sources
for empty pages, there's no corresponding WAL records triggering FSM
updates during replay.

Previously when extending the relation, there was a moment between
extending the relation, and acquiring an exclusive lock on the new
page, in which another backend could lock the page. To avoid new
content being put on that new page, vacuumlazy needed to acquire the
extension lock for a brief moment when encountering a new page. A
second corner case, only working somewhat by accident, was that
RelationGetBufferForTuple() sometimes checks the last page in a
relation for free space, without consulting the FSM; that only worked
because PageGetHeapFreeSpace() interprets the zero page header in a
new page as no free space.  The lack of handling this properly
required reverting the previous attempt in 684200543b.

This issue can be solved by using RBM_ZERO_AND_LOCK when extending the
relation, thereby avoiding this window. There's some added complexity
when RelationGetBufferForTuple() is called with another buffer (for
updates), to avoid deadlocks, but that's rarely hit at runtime.

Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 01:27:19 -08:00
Andres Freund 684200543b Revert "Move page initialization from RelationAddExtraBlocks() to use."
This reverts commit fc02e6724f and
e6799d5a53.

Parts of the buildfarm error out with
ERROR: page %u of relation "%s" should be empty but is not
errors, and so far I/we do not know why. fc02e672 didn't fix the
issue.  As I cannot reproduce the issue locally, it seems best to get
the buildfarm green again, and reproduce the issue without time
pressure.
2019-01-28 17:16:56 -08:00
Andres Freund fc02e6724f Fix race condition between relation extension and vacuum.
In e6799d5a53 I removed vacuumlazy.c trickery around re-checking
whether a page is actually empty after acquiring an extension lock on
the relation, because the page is not PageInit()ed anymore, and
entries in the FSM ought not to lead to user-visible errors.

As reported by various buildfarm animals that is not correct, given
the way to code currently stands: If vacuum processes a page that's
just been newly added by either RelationGetBufferForTuple() or
RelationAddExtraBlocks(), it could add that page to the FSM and it
could be reused by other backends, before those two functions check
whether the newly added page is actually new.  That's a relatively
narrow race, but several buildfarm machines appear to be able to hit
it.

While it seems wrong that the FSM, given it's lack of durability and
approximative nature, can trigger errors like this, that seems better
fixed in a separate commit. Especially given that a good portion of
the buildfarm is red, and this is just re-introducing logic that
existed a few hours ago.

Author: Andres Freund
Discussion: https://postgr.es/m/20190128222259.zhi7ovzgtkft6em6@alap3.anarazel.de
2019-01-28 15:44:12 -08:00
Andres Freund e6799d5a53 Move page initialization from RelationAddExtraBlocks() to use.
Previously we initialized pages when bulk extending in
RelationAddExtraBlocks(). That has a major disadvantage: It ties
RelationAddExtraBlocks() to heap, as other types of storage are likely
to need different amounts of special space, have different amount of
free space (previously determined by PageGetHeapFreeSpace()).

That we're relying on initializing pages, but not WAL logging the
initialization, also means the risk for getting
"WARNING:  relation \"%s\" page %u is uninitialized --- fixing"
style warnings in vacuums after crashes/immediate shutdowns, is
considerably higher. The warning sounds much more serious than what
they are.

Fix those two issues together by not initializing pages in
RelationAddExtraPages() (but continue to do so in
RelationGetBufferForTuple(), which is linked much more closely to
heap), and accepting uninitialized pages as normal in
vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it
to the FSM, but does nothing else.  We chose to not issue a debug
message, much less a warning in that case - it seems rarely useful,
and quite likely to scare people unnecessarily.

For now empty pages aren't added to the VM, because standbys would not
re-discover such pages after a promotion. In contrast to other sources
for empty pages, there's no corresponding WAL records triggering FSM
updates during replay.

Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-01-28 13:15:11 -08:00
Amit Kapila a23676503b Revert "Avoid creation of the free space map for small heap relations."
This reverts commit ac88d2962a.
2019-01-28 11:31:44 +05:30
Amit Kapila ac88d2962a Avoid creation of the free space map for small heap relations.
Previously, all heaps had FSMs. For very small tables, this means that the
FSM took up more space than the heap did. This is wasteful, so now we
refrain from creating the FSM for heaps with 4 pages or fewer. If the last
known target block has insufficient space, we still try to insert into some
other page before giving up and extending the relation, since doing
otherwise leads to table bloat. Testing showed that trying every page
penalized performance slightly, so we compromise and try every other page.
This way, we visit at most two pages. Any pages with wasted free space
become visible at next relation extension, so we still control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.

Once the FSM is created for a heap we don't remove it even if somebody
deletes all the rows from the corresponding relation.  We don't think it is
a useful optimization as it is quite likely that relation will again grow
to the same size.

Author: John Naylor with design inputs and some code contribution by Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
2019-01-28 08:14:06 +05:30
Andres Freund c91560defc Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c.
Given these routines are heap specific, and that there will be more
generic visibility support in via table AM, it makes sense to move the
prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will
not be exposed in a generic fashion, because they are too storage
specific).

Similarly, the code in tqual.c is specific to heap, so moving it into
access/heap/ makes sense.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 17:07:10 -08:00
Michael Paquier 80971bc206 Fix incorrect relation name in comment of vacuumlazy.c
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoBiOiapB7YGbWRfNZji3cs1gkEwv=uGLTemaZ9yNKK1DA@mail.gmail.com
2019-01-18 13:53:43 +09:00
Andres Freund 285d8e1205 Move vacuumlazy.c into access/heap.
It's heap table storage specific code that can't realistically be
generalized into table AM agnostic code.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-15 12:06:19 -08:00