postgresql/contrib/pg_autovacuum
Bruce Momjian bd18c50ba8 I have updated my pg_autovacuum program (formerly pg_avd, the name
changed as per discussion on the patches list).

This version should be a good bit better.  It addresses all the issues
pointed out by Neil Conway. Vacuum and Analyze are now handled
separately.  It now monitors for xid wraparound.  The number of database
connections and queries has been significantly reduced compared to the
previous version.  I have moved it from bin to contrib.  More detail on
the changes is in the TODO file.

I have not tested the xid wraparound code as I have to let my AthlonXP
1600 run select 1 in a tight loop for approx. two days in order to
perform the required 500,000,000 xacts.

Matthew T. O'Connor
2003-03-20 18:14:46 +00:00
Makefile
README
TODO
pg_autovacuum.c
pg_autovacuum.h

pg_autovacuum README

pg_autovacuum is a libpq client program that monitors all the databases of a
PostgreSQL server.  It uses the stats collector to monitor insert, update and
delete activity.  When an individual table exceeds its insert or delete
threshold (more detail on thresholds below), that table is vacuumed or
analyzed.  This allows PostgreSQL to keep the FSM (free space map) and table
statistics up to date without having to schedule periodic vacuums with cron
regardless of need.

The primary benefit of pg_autovacuum is that the FSM and table statistics
are updated as needed.  When a table is actively changed, pg_autovacuum performs the
necessary vacuums and analyzes; when a table is inactive, no cycles are wasted
performing vacuums and analyzes that are not needed.

A secondary benefit of pg_autovacuum is that it guarantees that a database-wide
vacuum is performed prior to xid wraparound.  This is important, as failing to do
so can result in major data loss.

INSTALL:
To use pg_autovacuum, uncompress the tar.gz into the contrib directory and modify
contrib/Makefile to include the pg_autovacuum directory.  pg_autovacuum will then be built as
part of the standard PostgreSQL install.

Make sure that the following are set in postgresql.conf:
stats_start_collector = true
stats_row_level = true

Start up the postmaster, then execute the pg_autovacuum executable.


Command line arguments:
pg_autovacuum has the following optional arguments:
-d debug: 0 silent, 1 basic info, 2 more debug info,  etc...
-s sleep base value: see "Sleeping" below.
-S sleep scaling factor: see "Sleeping" below.
-t tuple base threshold: see "Vacuum and Analyze" below.
-T tuple scaling factor: see "Vacuum and Analyze" below.
-U username: Username pg_autovacuum will use to connect with.  If not specified, the
   current username is used.
-P password: Password pg_autovacuum will use to connect with.
-H host: host name or IP to connect to.
-p port: port used for connection.
-h help: list of command line options.

All arguments have default values defined in pg_autovacuum.h.  At the time of this
writing they are:
#define AUTOVACUUM_DEBUG    1
#define BASETHRESHOLD       100
#define SCALINGFACTOR       2
#define SLEEPVALUE          3
#define SLEEPSCALINGFACTOR  2
#define UPDATE_INTERVAL     2
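
As an illustration of how the options above could be mapped onto these defaults,
here is a minimal sketch using POSIX getopt().  It is not taken from
pg_autovacuum.c: the cmd_args struct, its field names and types, and the
parse_args() function are assumptions made for this example.  Only the option
letters and the default values come from this README.

```c
#define _POSIX_C_SOURCE 200809L /* ask for getopt() under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>             /* getopt(), optarg, optind (POSIX) */

/* Defaults as quoted in this README from pg_autovacuum.h. */
#define AUTOVACUUM_DEBUG    1
#define BASETHRESHOLD       100
#define SCALINGFACTOR       2
#define SLEEPVALUE          3
#define SLEEPSCALINGFACTOR  2

/* Hypothetical option container; names and types are assumptions. */
typedef struct
{
    int         debug;          /* -d */
    long        sleep_base;     /* -s */
    double      sleep_scaling;  /* -S */
    long        tuple_base;     /* -t */
    double      tuple_scaling;  /* -T */
    const char *user;           /* -U, NULL means "current username" */
    const char *password;       /* -P */
    const char *host;           /* -H */
    const char *port;           /* -p */
    int         show_help;      /* -h */
} cmd_args;

/* Returns 0 on success, -1 on an unknown option or a missing argument. */
static int
parse_args(int argc, char *argv[], cmd_args *args)
{
    int c;

    assert(args != NULL);

    /* Start from the compiled-in defaults. */
    args->debug = AUTOVACUUM_DEBUG;
    args->sleep_base = SLEEPVALUE;
    args->sleep_scaling = SLEEPSCALINGFACTOR;
    args->tuple_base = BASETHRESHOLD;
    args->tuple_scaling = SCALINGFACTOR;
    args->user = args->password = args->host = args->port = NULL;
    args->show_help = 0;

    optind = 1;                 /* restart scanning at the first argument */
    while ((c = getopt(argc, argv, "d:s:S:t:T:U:P:H:p:h")) != -1)
    {
        switch (c)
        {
            case 'd': args->debug = atoi(optarg); break;
            case 's': args->sleep_base = strtol(optarg, NULL, 10); break;
            case 'S': args->sleep_scaling = strtod(optarg, NULL); break;
            case 't': args->tuple_base = strtol(optarg, NULL, 10); break;
            case 'T': args->tuple_scaling = strtod(optarg, NULL); break;
            case 'U': args->user = optarg; break;
            case 'P': args->password = optarg; break;
            case 'H': args->host = optarg; break;
            case 'p': args->port = optarg; break;
            case 'h': args->show_help = 1; break;
            default:  return -1;    /* getopt() already printed a message */
        }
    }
    return 0;
}
```

Any option that is not given on the command line keeps its compiled-in default,
which matches the "optional arguments" behavior described above.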


Vacuum and Analyze:
pg_autovacuum performs either a vacuum analyze or just an analyze, depending on the table activity.
If the number of (inserts + updates) > insertThreshold, then only an analyze is performed.
If the number of (deletes + updates) > deleteThreshold, then a vacuum analyze is performed.
deleteThreshold is equal to: tuple_base_value + (tuple_scaling_factor * "number of tuples in the table")
insertThreshold is equal to: 0.5 * (tuple_base_value + (tuple_scaling_factor * "number of tuples in the table"))
The insertThreshold is half the deleteThreshold because an analyze is a much lighter operation than a vacuum
(approx. 5%-10% of the cost of a vacuum), so running it more often costs little in performance degradation.
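
The threshold rules above can be sketched in C as follows.  This is only an
illustration, not the code in pg_autovacuum.c: the function and enum names are
hypothetical, insertThreshold is taken to be exactly half of deleteThreshold as
the text states, and checking the vacuum condition before the analyze condition
is an assumption.

```c
#include <assert.h>

/* Defaults as quoted in this README from pg_autovacuum.h. */
#define BASETHRESHOLD   100
#define SCALINGFACTOR   2

/* Hypothetical names, introduced for this example only. */
typedef enum
{
    ACTION_NONE,
    ACTION_ANALYZE,
    ACTION_VACUUM_ANALYZE
} av_action;

/* deleteThreshold = tuple_base_value + tuple_scaling_factor * reltuples */
static double
delete_threshold(double base, double scaling, double reltuples)
{
    assert(reltuples >= 0.0);
    return base + scaling * reltuples;
}

/* insertThreshold is read here as half of deleteThreshold. */
static double
insert_threshold(double base, double scaling, double reltuples)
{
    return 0.5 * delete_threshold(base, scaling, reltuples);
}

/*
 * Pick an action from the activity counters of one table.  The vacuum
 * analyze test comes first because it also refreshes the statistics.
 */
static av_action
choose_action(long inserts, long updates, long deletes, double reltuples)
{
    double del_thr = delete_threshold(BASETHRESHOLD, SCALINGFACTOR, reltuples);
    double ins_thr = insert_threshold(BASETHRESHOLD, SCALINGFACTOR, reltuples);

    if ((double) (deletes + updates) > del_thr)
        return ACTION_VACUUM_ANALYZE;
    if ((double) (inserts + updates) > ins_thr)
        return ACTION_ANALYZE;
    return ACTION_NONE;
}
```

With the default values and a table of 1000 tuples, this gives a
deleteThreshold of 100 + 2 * 1000 = 2100 and an insertThreshold of 1050.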

Sleeping:
pg_autovacuum sleeps after it is done checking all the databases.  It does this to
limit the amount of system resources it consumes.  This also allows the system
administrator to configure pg_autovacuum to be more or less aggressive.  Reducing the
sleep time will cause pg_autovacuum to respond more quickly to changes, be they database
addition / removal, table addition / removal, or just normal table activity.  However,
setting these values too low can have a negative net effect on the server.  If a table
gets vacuumed 5 times during the course of a large update, the update might take much longer
than if the table were vacuumed only once.
The total time it sleeps is equal to:
base_sleep_value + sleep_scaling_factor * "duration of the previous loop"
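
The sleep formula above, written out as a small C helper.  The function name
is hypothetical, and treating all values as whole seconds is an assumption, as
this README does not state the unit.

```c
#include <assert.h>

/* Defaults as quoted in this README from pg_autovacuum.h. */
#define SLEEPVALUE          3
#define SLEEPSCALINGFACTOR  2

/*
 * Time to sleep before the next pass over all databases:
 * base_sleep_value + sleep_scaling_factor * duration of the previous loop.
 * A slow pass therefore produces a proportionally longer pause.
 */
static long
sleep_time(long base_sleep_value, long sleep_scaling_factor,
           long previous_loop_duration)
{
    assert(previous_loop_duration >= 0);
    return base_sleep_value + sleep_scaling_factor * previous_loop_duration;
}
```

For example, with the default values and a previous loop that took 10 units of
time, sleep_time(SLEEPVALUE, SLEEPSCALINGFACTOR, 10) returns 3 + 2 * 10 = 23.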

What it monitors:
pg_autovacuum dynamically generates a list of databases and tables to monitor.  In
addition, it will dynamically add and remove databases and tables that are
added to or removed from the database server while pg_autovacuum is running.
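
One way such a dynamic list could be kept in sync is sketched below.  This is
not how pg_autovacuum.c is necessarily written: the monitored_list type, the
fixed array sizes and the reconcile() function are all assumptions made for
illustration.  The idea is to compare the names currently being monitored with
a fresh list of names from the server, drop the ones that have disappeared,
and append the ones that are new.

```c
#include <assert.h>
#include <string.h>

#define MAX_MONITORED   64      /* hypothetical fixed capacity */
#define NAME_LEN        64      /* hypothetical maximum name length */

/* Hypothetical in-memory list of monitored database or table names. */
typedef struct
{
    char names[MAX_MONITORED][NAME_LEN];
    int  count;
} monitored_list;

/* Returns 1 if name occurs in the array of n strings, else 0. */
static int
name_in(const char *const *list, int n, const char *name)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 1 if name is already monitored, else 0. */
static int
is_monitored(const monitored_list *mon, const char *name)
{
    int i;

    for (i = 0; i < mon->count; i++)
        if (strcmp(mon->names[i], name) == 0)
            return 1;
    return 0;
}

/* Bring mon in line with the names currently present on the server. */
static void
reconcile(monitored_list *mon, const char *const *current, int ncurrent)
{
    int i;
    int kept = 0;

    assert(mon != NULL);

    /* Pass 1: keep only the entries that still exist on the server. */
    for (i = 0; i < mon->count; i++)
    {
        if (name_in(current, ncurrent, mon->names[i]))
        {
            if (kept != i)
                strcpy(mon->names[kept], mon->names[i]);
            kept++;
        }
    }
    mon->count = kept;

    /* Pass 2: append names that are not monitored yet. */
    for (i = 0; i < ncurrent; i++)
    {
        if (!is_monitored(mon, current[i]) && mon->count < MAX_MONITORED)
        {
            strncpy(mon->names[mon->count], current[i], NAME_LEN - 1);
            mon->names[mon->count][NAME_LEN - 1] = '\0';
            mon->count++;
        }
    }
}
```

Entries that survive a reconcile keep their position relative to each other,
so any per-table state stored alongside the names would not have to be rebuilt
on every pass.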