Bruce Momjian 9243664dd4 This patch updates pg_autovacuum in several ways:
* A few bug fixes
* fixes Solaris compile and crash issues
* decouple vacuum analyze and analyze thresholds
* detach from tty (daemonize)
* improved logging layout
* more conservative default configuration
* improved, expanded and updated README

Please apply at first convenience, or before code freeze, whichever comes
first :-)

At this point I think I have brought pg_autovacuum and its client-side
design as far as it should go.  It works, keeps file sizes in
check, helps performance, and gives the administrator a fair amount of
flexibility in configuring it.

Next up is the FSM-based design that is integrated into the
backend.

p.s. Thanks to Christopher Browne for his help.

Matthew T. O'Connor
2003-06-12 01:36:44 +00:00
Makefile I have updated my pg_autovacuum program (formerly pg_avd, the name 2003-03-20 18:14:46 +00:00
pg_autovacuum.c This patch updates pg_autovacuum in several ways: 2003-06-12 01:36:44 +00:00
pg_autovacuum.h This patch updates pg_autovacuum in several ways: 2003-06-12 01:36:44 +00:00
README.pg_autovacuum This patch updates pg_autovacuum in several ways: 2003-06-12 01:36:44 +00:00
TODO This patch updates pg_autovacuum in several ways: 2003-06-12 01:36:44 +00:00

pg_autovacuum README
--------------------

pg_autovacuum is a libpq client program that monitors all the
databases associated with a PostgreSQL server.  It uses the stats
collector to monitor insert, update and delete activity.

When a table's activity exceeds its analyze or vacuum threshold (more
detail on thresholds below), that table will be analyzed or vacuumed.

This allows PostgreSQL to keep the FSM (free space map) and table
statistics up to date, and eliminates the need to schedule periodic
vacuums.

The primary benefit of pg_autovacuum is that the FSM and table
statistics are updated as needed.  When a table is actively
changing, pg_autovacuum will perform the necessary vacuums and
analyzes, whereas if a table remains static, no cycles will be wasted
performing unnecessary vacuums/analyzes.

A secondary benefit of pg_autovacuum is that it ensures that a
database-wide vacuum is performed before transaction ID (XID)
wraparound.  This is an important, if rare, problem, as failing to do
so can result in major data loss.


KNOWN ISSUES:
-------------
pg_autovacuum has been tested under Red Hat Linux (by me) and Solaris (by
Christopher B. Browne), and all known bugs have been resolved.  Please report
any problems to the hackers list.

pg_autovacuum is not started automatically by either the postmaster or
pg_ctl.  Along the same lines, nothing notifies pg_autovacuum when the
postmaster exits.  The result is that at the start of its next loop,
pg_autovacuum fails to connect to the server and exits.  Any time it fails
to connect, pg_autovacuum exits.

pg_autovacuum requires that the stats system be enabled and reporting
row-level stats.  The overhead of the stats system has been shown to be
significant under certain workloads.  For instance, a tight loop of queries
performing "select 1" was nearly 30% slower with stats enabled.  However,
in practice with more realistic workloads, the stats system overhead is
usually minimal.


INSTALL:
--------

As of PostgreSQL v7.4, pg_autovacuum is included in the main source tree
under contrib.  Therefore you just run make && make install (as with most
other contrib modules) and it will be installed for you.

If you are using an earlier version of PostgreSQL, just uncompress the tar.gz
into the contrib directory and modify contrib/Makefile to include the
pg_autovacuum directory.  pg_autovacuum will then be built as part of the
standard PostgreSQL install.

Make sure that the following are set in postgresql.conf:

  stats_start_collector = true
  stats_row_level = true

Start up the postmaster, then run the pg_autovacuum executable.


Command line arguments:
-----------------------

pg_autovacuum has the following optional arguments:

-d debug: 0 silent, 1 basic info, 2 more debug info, etc.
-D daemonize: Detach from tty and run in background.
-s sleep base value: see "Sleeping" below.
-S sleep scaling factor: see "Sleeping" below.
-v vacuum base threshold: see "Vacuum and Analyze" below.
-V vacuum scaling factor: see "Vacuum and Analyze" below.
-a analyze base threshold: see "Vacuum and Analyze" below.
-A analyze scaling factor: see "Vacuum and Analyze" below.
-L log file: Name of the file to which output is written; if not
   specified, output goes to STDERR.
-U username: Username pg_autovacuum will use to connect with; if not
   specified, the current username is used.
-P password: Password pg_autovacuum will use to connect with.
-H host: Host name or IP address to connect to.
-p port: Port used for the connection.
-h help: List the command line options.

All arguments have default values defined in pg_autovacuum.h.  At the
time of writing they are:

-d 1
-v 1000
-V 2
-a 500 (half of -v if not specified)
-A 1   (half of -V if not specified)
-s 300 (5 minutes)
-S 2
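To make the option list and the "half of -v/-V" defaulting rule concrete, here is a minimal sketch of how such arguments could be parsed with POSIX getopt(). It is not the code in pg_autovacuum.c: the av_args struct, its field names, the -1 "not given" sentinel and parse_args() are all hypothetical, chosen only to illustrate the rule that the analyze settings fall back to half of the vacuum settings.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getopt() even in strict C modes */
#include <assert.h>
#include <stdlib.h>   /* atoi(), atof() */
#include <unistd.h>   /* getopt(), optarg */

/* Hypothetical settings struct: field names are illustrative and are
 * not taken from pg_autovacuum.h. */
typedef struct {
    int         debug;          /* -d */
    int         daemonize;      /* -D */
    int         sleep_base;     /* -s, seconds */
    double      sleep_scale;    /* -S */
    int         vacuum_base;    /* -v */
    double      vacuum_scale;   /* -V */
    int         analyze_base;   /* -a, -1 means "not given" */
    double      analyze_scale;  /* -A, -1 means "not given" */
    const char *logfile;        /* -L, NULL means STDERR */
    const char *user;           /* -U, NULL means current user */
    const char *password;       /* -P */
    const char *host;           /* -H */
    const char *port;           /* -p */
} av_args;

/* Fills *a from argv; returns 0 on success, -1 for -h or an unknown
 * option.  getopt()'s global optind is not reset, so call this once. */
static int parse_args(int argc, char *argv[], av_args *a)
{
    int c;

    assert(argc >= 1 && argv != NULL && a != NULL);

    /* Defaults as listed in this README. */
    *a = (av_args){ .debug = 1, .sleep_base = 300, .sleep_scale = 2,
                    .vacuum_base = 1000, .vacuum_scale = 2,
                    .analyze_base = -1, .analyze_scale = -1 };

    while ((c = getopt(argc, argv, "d:Ds:S:v:V:a:A:L:U:P:H:p:h")) != -1) {
        switch (c) {
        case 'd': a->debug = atoi(optarg); break;
        case 'D': a->daemonize = 1; break;
        case 's': a->sleep_base = atoi(optarg); break;
        case 'S': a->sleep_scale = atof(optarg); break;
        case 'v': a->vacuum_base = atoi(optarg); break;
        case 'V': a->vacuum_scale = atof(optarg); break;
        case 'a': a->analyze_base = atoi(optarg); break;
        case 'A': a->analyze_scale = atof(optarg); break;
        case 'L': a->logfile = optarg; break;
        case 'U': a->user = optarg; break;
        case 'P': a->password = optarg; break;
        case 'H': a->host = optarg; break;
        case 'p': a->port = optarg; break;
        default:  return -1;  /* 'h' (help) or '?' (unknown option) */
        }
    }

    /* -a and -A default to half of -v and -V respectively. */
    if (a->analyze_base < 0)
        a->analyze_base = a->vacuum_base / 2;
    if (a->analyze_scale < 0)
        a->analyze_scale = a->vacuum_scale / 2;
    return 0;
}
```

With this sketch, running with "-v 2000 -V 3" and no -a/-A yields an analyze base of 1000 and an analyze scaling factor of 1.5, matching the "half of" rule above.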


Vacuum and Analyze:
-------------------

pg_autovacuum performs either a vacuum analyze or just analyze depending
on the quantity and type of table activity (insert, update, or delete):

- If the number of (inserts + updates + deletes) > AnalyzeThreshold, then
  only an analyze is performed.

- If the number of (deletes + updates) > VacuumThreshold, then a
  vacuum analyze is performed.
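The two rules above can be sketched as a single decision function.  This is an illustrative sketch, not pg_autovacuum's actual code: the av_action enum and choose_action() are hypothetical names, and it assumes one set of insert/update/delete counters per table, whereas the real program may track changes since the last vacuum and since the last analyze separately.

```c
#include <assert.h>

/* Hypothetical names for the three possible outcomes of one check. */
typedef enum { AV_NOTHING, AV_ANALYZE, AV_VACUUM_ANALYZE } av_action;

/* ins, upd, del: tuples inserted, updated and deleted since the table was
 * last processed (assumed here to be a single set of counters). */
static av_action choose_action(long ins, long upd, long del,
                               double analyze_threshold,
                               double vacuum_threshold)
{
    assert(ins >= 0 && upd >= 0 && del >= 0);

    /* Test the vacuum rule first: VACUUM ANALYZE also refreshes the
     * planner statistics, so a separate ANALYZE would be redundant. */
    if ((double) (del + upd) > vacuum_threshold)
        return AV_VACUUM_ANALYZE;
    if ((double) (ins + upd + del) > analyze_threshold)
        return AV_ANALYZE;
    return AV_NOTHING;
}
```

Checking the vacuum rule first is a design choice of this sketch: whenever both thresholds are exceeded, one vacuum analyze covers both needs.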

VacuumThreshold is equal to:
    vacuum_base_value + (vacuum_scaling_factor * "number of tuples in the table")

AnalyzeThreshold is equal to:
    analyze_base_value + (analyze_scaling_factor * "number of tuples in the table")
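As a worked illustration of the two threshold formulas, the sketch below computes both thresholds for one table.  The av_thresholds struct and compute_thresholds() are hypothetical names; the tuple count is assumed to come from an estimate such as pg_class.reltuples, which is an assumption about where the "number of tuples in the table" is read from.

```c
#include <assert.h>

/* Hypothetical holder for one table's two thresholds. */
typedef struct {
    double vacuum;   /* VacuumThreshold  */
    double analyze;  /* AnalyzeThreshold */
} av_thresholds;

/* reltuples: "number of tuples in the table", e.g. the planner's
 * estimate (assumed source, see lead-in). */
static av_thresholds compute_thresholds(double reltuples,
                                        double vacuum_base_value,
                                        double vacuum_scaling_factor,
                                        double analyze_base_value,
                                        double analyze_scaling_factor)
{
    av_thresholds t;

    assert(reltuples >= 0);
    t.vacuum  = vacuum_base_value  + vacuum_scaling_factor  * reltuples;
    t.analyze = analyze_base_value + analyze_scaling_factor * reltuples;
    return t;
}
```

With the default settings (-v 1000 -V 2 -a 500 -A 1), a table holding 10,000 tuples gets VacuumThreshold = 1000 + 2 * 10000 = 21000 and AnalyzeThreshold = 500 + 1 * 10000 = 10500.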

The AnalyzeThreshold defaults to half of the VacuumThreshold since it
represents a much less expensive operation (approximately 5%-10% of the
cost of a vacuum), and running it more often should not substantially
degrade system performance.

Sleeping:
---------

pg_autovacuum sleeps for a while after it is done checking all the
databases.  It does this in order to limit the amount of system
resources it consumes.  This also allows the system administrator to
configure pg_autovacuum to be more or less aggressive.

Reducing the sleep time will cause pg_autovacuum to respond more
quickly to changes, whether they be database addition/removal, table
addition/removal, or just normal table activity.

On the other hand, setting pg_autovacuum's sleep values too aggressively
(for too short a period of time) can have a negative effect on server
performance.  If a table gets vacuumed 5 times during the course of a
large update, this is likely to take much longer than if the table was
vacuumed only once, at the end.

The total time it sleeps is equal to:

  base_sleep_value + sleep_scaling_factor * "duration of the previous loop"
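The sleep formula can be sketched in C as follows.  This is a minimal sketch under stated assumptions: next_sleep() is a hypothetical helper, and it assumes the loop duration is measured in whole seconds with time() before and after each pass over the databases.

```c
#include <assert.h>
#include <time.h>   /* time_t, difftime() */

/* Seconds to sleep after a pass over all databases:
 *   base_sleep_value + sleep_scaling_factor * (duration of that pass)
 * loop_start/loop_end are assumed to come from time(NULL), so the
 * duration has one-second resolution. */
static unsigned int next_sleep(time_t loop_start, time_t loop_end,
                               unsigned int base_sleep_value,
                               double sleep_scaling_factor)
{
    double loop_secs = difftime(loop_end, loop_start);

    assert(loop_secs >= 0 && sleep_scaling_factor >= 0);
    return base_sleep_value
        + (unsigned int) (sleep_scaling_factor * loop_secs);
}
```

In a main loop, the result could be passed to POSIX sleep().  With the defaults (-s 300 -S 2), a pass that takes 30 seconds is followed by 300 + 2 * 30 = 360 seconds of sleep, so a slow pass automatically stretches the interval before the next one.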

Note that timing measurements are made in seconds; specifying
"pg_autovacuum -s 1" means pg_autovacuum could poll the database up to 60
times per minute.  In a system with large tables where vacuums may run for
several minutes, longer times between vacuums are likely to be appropriate.

What pg_autovacuum monitors:
----------------------------

pg_autovacuum dynamically generates a list of all databases and tables that
exist on the server.  While pg_autovacuum is running, it dynamically adds
databases and tables that are created and removes those that are dropped
from the database server.  Overhead is fairly small per object.  For example,
10 databases with 10 tables each appear to use less than 10k of memory on
my Linux box.