Commit Graph

109 Commits

Author SHA1 Message Date
Heikki Linnakangas d661532e27 Also trigger restartpoints based on max_wal_size on standby.
When archive recovery and restartpoints were initially introduced,
checkpoint_segments was ignored on the grounds that the files restored from
archive don't consume any space in the recovery server. That was changed in
later releases, but even then it was arguably a feature rather than a bug,
as performing restartpoints as often as checkpoints during normal operation
might be excessive, but you might nevertheless not want to waste a lot of
space for pre-allocated WAL by setting checkpoint_segments to a high value.
But now that we have separate min_wal_size and max_wal_size settings, you
can bound WAL usage with max_wal_size, and still avoid consuming excessive
space usage by setting min_wal_size to a lower value, so that argument is
moot.

There are still some issues with actually limiting the space usage to
max_wal_size: restartpoints in recovery can only start after seeing the
checkpoint record, while a checkpoint starts flushing buffers as soon as
the redo-pointer is set. Restartpoint is paced to happen at the same
leisurily speed, determined by checkpoint_completion_target, as checkpoints,
but because they are started later, max_wal_size can be exceeded by upto
one checkpoint cycle's worth of WAL, depending on
checkpoint_completion_target. But that seems better than not trying at all,
and max_wal_size is a soft limit anyway.

The documentation already claimed that max_wal_size is obeyed in recovery,
so this just fixes the behaviour to match the docs. However, add some
weasel-words there to mention that max_wal_size may well be exceeded by
some amount in recovery.
2015-06-29 00:09:10 +03:00
Andres Freund a0f5954af1 Increase max_wal_size's default from 128MB to 1GB.
The introduction of min_wal_size & max_wal_size in 88e9823026 makes it
feasible to increase the default upper bound in checkpoint
size. Previously raising the default would lead to a increased disk
footprint, even if more segments weren't beneficial.  The low default of
checkpoint size is one of common performance problem users have thus
increasing the default makes sense.  Setups where the increase in
maximum disk usage is a problem will very likely have to run with a
modified configuration anyway.

Discussion: 54F4EFB8.40202@agliodbs.com,
    CA+TgmoZEAgX5oMGJOHVj8L7XOkAe05Gnf45rP40m-K3FhZRVKg@mail.gmail.com

Author: Josh Berkus, after a discussion involving lots of people.
2015-03-15 17:37:07 +01:00
Heikki Linnakangas 88e9823026 Replace checkpoint_segments with min_wal_size and max_wal_size.
Instead of having a single knob (checkpoint_segments) that both triggers
checkpoints, and determines how many checkpoints to recycle, they are now
separate concerns. There is still an internal variable called
CheckpointSegments, which triggers checkpoints. But it no longer determines
how many segments to recycle at a checkpoint. That is now auto-tuned by
keeping a moving average of the distance between checkpoints (in bytes),
and trying to keep that many segments in reserve. The advantage of this is
that you can set max_wal_size very high, but the system won't actually
consume that much space if there isn't any need for it. The min_wal_size
sets a floor for that; you can effectively disable the auto-tuning behavior
by setting min_wal_size equal to max_wal_size.

The max_wal_size setting is now the actual target size of WAL at which a
new checkpoint is triggered, instead of the distance between checkpoints.
Previously, you could calculate the actual WAL usage with the formula
"(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this
patch, you set the desired WAL usage with max_wal_size, and the system
calculates the appropriate CheckpointSegments with the reverse of that
formula. That's a lot more intuitive for administrators to set.

Reviewed by Amit Kapila and Venkata Balaji N.
2015-02-23 18:53:02 +02:00
Magnus Hagander 6d6cade05b Fix missing space in documentation
Ian Barwick
2014-12-01 12:12:07 +01:00
Heikki Linnakangas 2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Peter Eisentraut 220bb39dee doc: Reflect renaming of Mac OS X to OS X
bug #10528
2014-09-09 13:56:29 -04:00
Bruce Momjian 85317e88cc doc: mention data page checksums in WAL section
Backpatch to 9.3

Adjusted patch from Ian Lawrence Barwick
2014-01-31 19:05:00 -05:00
Peter Eisentraut 256f6ba78a Documentation spell checking and markup improvements 2013-05-20 21:13:13 -04:00
Peter Eisentraut 540ec93e33 doc: Mention SATA alongside IDE for Linux
suggested by Jov
2013-04-20 15:56:22 -04:00
Simon Riggs 1a091002cf Clarify assumption of filesystem metadata integrity.
Jeff Davis
2013-03-19 08:57:29 +00:00
Simon Riggs 8c3b87ca10 Correction that 2pc state files use CRC-32.
Jeff Davis
2013-03-19 08:51:35 +00:00
Simon Riggs 2266db392c Add reliability docs about storage/memory corruptions.
Add section to the Reliability section about what is and is not protected for
various file types.
Further edits welcome.

Designed to allow 1-2 line change when/if checksums are committed.

Inspired by docs written by Jeff Davis, though completely different from his
patch.
2013-03-18 22:38:07 +00:00
Tom Lane 70ec2f8f43 Improve the documentation about commit_delay.
Clarify the docs explaining what commit_delay does, and add a
recommendation about a useful value for it, namely half of the single-page
fsync time reported by pg_test_fsync.  This is informed by testing of
the new-in-9.3 implementation of commit_delay; in prior versions it
was far harder to arrive at a useful setting.

In passing, do some wordsmithing and markup-fixing in the same general
area.

Also, change pg_test_fsync's default time-per-test from 2 seconds to 5.
The old value was about the minimum at which the results could be taken
seriously at all, and so seems a tad optimistic as a default.

Peter Geoghegan, reviewed by Noah Misch; some additional editing by me
2013-03-15 17:41:47 -04:00
Bruce Momjian 7682c5947d Update URLs that pointed to sun.com; either repoint them or remove
them.
2012-09-02 09:16:26 -04:00
Robert Haas 9bedfbd02b Fix checkpoint_timeout documentation to reflect current behavior.
Jeff Janes
2012-08-30 15:08:36 -04:00
Bruce Momjian 04d2956f0d Now that the diskchecker.pl author has updated the download link on his
website, revert the separate link to the download git repository.

Backpatch from 9.0 to current.
2012-07-30 10:15:57 -04:00
Bruce Momjian c9a2532c83 Update doc mention of diskchecker.pl to add URL for script; retain URL
for description.

Patch to 9.0 and later, where script is mentioned.
2012-07-26 21:25:26 -04:00
Magnus Hagander 817d870cf9 Remove reference to default wal_buffers being 8
This hasn't been true since 9.1, when the default was changed to -1.
Remove the reference completely, keeping the discussion of the parameter
and it's shared memory effects on the config page.
2012-07-04 09:23:51 +02:00
Robert Haas f11e8be3e8 Make commit_delay much smarter.
Instead of letting every backend participating in a group commit wait
independently, have the first one that becomes ready to flush WAL wait
for the configured delay, and let all the others wait just long enough
for that first process to complete its flush.  This greatly increases
the chances of being able to configure a commit_delay setting that
actually improves performance.

As a side consequence of this change, commit_delay now affects all WAL
flushes, rather than just commits.  There was some discussion on
pgsql-hackers about whether to rename the GUC to, say, wal_flush_delay,
but in the absence of consensus I am leaving it alone for now.

Peter Geoghegan, with some changes, mostly to the documentation, by me.
2012-07-02 10:26:31 -04:00
Bruce Momjian 58d746213d Improve fsync documentation by stating that -W _0_ turns of write
caching.
2012-02-14 17:42:06 -05:00
Heikki Linnakangas d4bad4e1e1 Fix sentence in docs: checkpoints are not done by bgwriter anymore. 2012-01-26 19:08:20 +02:00
Robert Haas bd2396988a Minor grammar improvements. 2011-11-07 12:27:26 -05:00
Simon Riggs 4334289186 Improve docs for timing and skipping of checkpoints
Greg Smith
2011-11-03 08:52:20 +00:00
Robert Haas e8bb5f7245 Improve documentation of how to fiddle with SCSI drives on FreeBSD.
Per suggestions from Achilleas Mantzios and Greg Smith.
2011-10-10 13:21:35 -04:00
Bruce Momjian 4869d10afc Update documentation on FreeBSD write cache control. 2011-03-11 11:36:42 -05:00
Bruce Momjian 159e3d8629 Update contrib documention mentions to point to actual documentation
sections, rather than just calling it "/contrib/module_name".

Also update pg_test_fsync build instructions now that it is in /contrib.
2011-01-26 09:22:21 -05:00
Bruce Momjian 5925aa09a9 Update SGML docs to point to new /contrib/pg_test_fsync. 2011-01-21 12:52:16 -05:00
Bruce Momjian 7a1ca8977f Document that BBU's do not allow partial page writes to be safely turned
off unless they guarantee that all writes to the BBU arrive in 8kB chunks.

Per discussion with Greg Smith
2010-12-22 21:12:00 -05:00
Tom Lane 576477e73c Force default wal_sync_method to be fdatasync on Linux.
Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux.  open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.

Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option.  Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.

In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.

Back-patch to all supported branches, since any of them might get used
on modern Linux versions.
2010-12-08 20:01:09 -05:00
Peter Eisentraut fc946c39ae Remove useless whitespace at end of lines 2010-11-23 22:34:55 +02:00
Peter Eisentraut 2999f4ef35 Remove tabs from SGML 2010-10-28 17:40:27 +03:00
Robert Haas 2cae0aeb9c Revert "Correct WAL space calculation formula in docs."
This reverts commit 915116bc62.

Per discussion, the previous formula was in fact correct.

http://archives.postgresql.org/pgsql-docs/2010-10/msg00038.php
2010-10-27 21:24:02 -04:00
Robert Haas 0d5deebe11 Reorganize OS-specific details about write caching into a list.
Along the way, clarify that sdparm can be used on Linux as well as FreeBSD.
2010-10-27 21:20:58 -04:00
Bruce Momjian f75d6a1b19 Add mention of using tools/fsync to test fsync methods. Restructure
recent wal_sync_method doc paragraph to be clearer.
2010-10-19 14:56:53 +00:00
Simon Riggs 915116bc62 Correct WAL space calculation formula in docs.
Error pointed out by Fujii Masao, though not his patch.
2010-10-15 10:17:12 +01:00
Robert Haas 694c56af2b Improve WAL reliability documentation, and add more cross-references to it.
In particular, we are now more explicit about the fact that you may need
wal_sync_method=fsync_writethrough for crash-safety on some platforms,
including MaxOS X.  There's also now an explicit caution against assuming
that the default setting of wal_sync_method is either crash-safe or best
for performance.
2010-10-07 12:22:00 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Peter Eisentraut 5194b9d049 Spell and markup checking 2010-08-17 04:37:21 +00:00
Peter Eisentraut d33cfbd2e0 Spelling fixes 2010-07-27 19:01:16 +00:00
Heikki Linnakangas 6b0937cd58 Fix typo spotted by Thom Brown. 2010-07-16 11:35:40 +00:00
Heikki Linnakangas 8f9c461175 Add a paragraph explaining what restartpoints are. Mention that
wal_keep_segments does not take effect during recovery.

Fujii Masao
2010-07-16 11:20:23 +00:00
Bruce Momjian c9b142d965 Doc change: effected -> affected, per correction from Matthew Wakeling 2010-07-08 16:44:12 +00:00
Bruce Momjian e3243488b0 Document the interaction of write-barrier-enabled file systems, and BBU
caches, per June email thread.
2010-07-07 14:42:09 +00:00
Bruce Momjian ea9c103237 Add "SSD" acronym mention for solid state drive mention. 2010-04-13 14:15:25 +00:00
Peter Eisentraut a95e51962d Update broken and permanently moved links 2010-03-17 17:12:31 +00:00
Bruce Momjian 9295eea839 Document ATAPI FLUSH CACHE EXT. 2010-02-27 20:16:17 +00:00
Bruce Momjian adfb444581 Document ATAPI drive flush command, and mention SSD drives. 2010-02-27 01:39:46 +00:00
Bruce Momjian de9ec65431 Document that many solid-state drives have volatile write-back caches. 2010-02-20 18:28:37 +00:00
Bruce Momjian bf62b1a078 Proofreading improvements for the Administration documentation book. 2010-02-03 17:25:06 +00:00
Bruce Momjian cb98f61538 fsync test tools
Add link to exteran fsync testing script and our fsync test tool.
2009-11-28 16:21:31 +00:00