Move most /contrib README files into SGML. Some still need conversion or will never be converted.
This commit is contained in:
Bruce Momjian 2007-11-10 23:30:46 +00:00
parent 6e414a171e
commit c3c69ab4fd
60 changed files with 9280 additions and 5635 deletions

View File

@@ -1,48 +0,0 @@
PostgreSQL Administration Functions
===================================
This directory is a PostgreSQL 'contrib' module which implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality if installed on a server.
Installation
============
This module is normally distributed as a PostgreSQL 'contrib' module. To
install it from a pre-configured source tree run the following commands
as a user with appropriate privileges from the adminpack source directory:
make
make install
Alternatively, if you have a PostgreSQL 8.2 or higher installation but no
source tree, you can install using PGXS. Simply run the following commands from the
adminpack source directory:
make USE_PGXS=1
make USE_PGXS=1 install
pgAdmin will look for the functions in the Maintenance Database (usually
"postgres" for 8.2 servers) specified in the connection dialogue for the server.
To install the functions in the database, either run the adminpack.sql script
using the pgAdmin SQL tool (and then close and reopen the connection to the
freshly instrumented server), or run the script using psql, e.g.:
psql -U postgres postgres < adminpack.sql
Other administration tools that use this module may have different requirements;
please consult the tool's documentation for further details.
Objects implemented (superuser only)
====================================
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
text pg_catalog.pg_file_read(fname text, offset int8, len int8)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
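For example, once the module is installed, a superuser can exercise these functions
directly from SQL (the file name below is purely illustrative and is relative to the
data directory):
SELECT pg_catalog.pg_file_write('admin_test.txt', 'written by adminpack', false);
SELECT pg_catalog.pg_file_rename('admin_test.txt', 'admin_test.old');
SELECT pg_catalog.pg_file_unlink('admin_test.old');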

View File

@@ -1,55 +0,0 @@
This is a B-Tree implementation using GiST that supports the int2, int4,
int8, float4, float8, timestamp with/without time zone, time
with/without time zone, date, interval, oid, money, macaddr, char,
varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
All work was done by Teodor Sigaev (teodor@stack.net), Oleg Bartunov
(oleg@sai.msu.su), and Janko Richter (jankorichter@yahoo.de).
See http://www.sai.msu.su/~megera/postgres/gist for additional
information.
NEWS:
Apr 17, 2004 - Performance optimizations
Jan 21, 2004 - Added support for bytea, numeric, bit, varbit, inet/cidr
Jan 17, 2004 - Reorganized code and added support for char, varchar/text
Jan 10, 2004 - btree_gist now supports oid, timestamp with time zone,
time with and without time zone, date, interval,
money, macaddr
Feb 5, 2003 - btree_gist now supports int2, int8, float4, float8
NOTICE:
This version will only work with PostgreSQL version 7.4 and above
because of changes in the system catalogs and the function call
interface.
If you want to index varchar attributes, you have to index using
the function text(<varchar>):
Example:
CREATE TABLE test ( a varchar(23) );
CREATE INDEX testidx ON test USING GIST ( text(a) );
INSTALLATION:
gmake
gmake install
-- load functions
psql <database> < btree_gist.sql
REGRESSION TEST:
gmake installcheck
EXAMPLE USAGE:
create table test (a int4);
-- create index
create index testidx on test using gist (a);
-- query
select * from test where a < 10;

View File

@@ -1,56 +0,0 @@
$PostgreSQL: pgsql/contrib/chkpass/README.chkpass,v 1.5 2007/10/01 19:06:48 darcy Exp $
Chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
It also returns an error if the code determines that the password is easily
crackable. This is currently a stub that does nothing.
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon then use the raw() function. This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
The encryption uses the standard Unix function crypt(), and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
Here is some sample usage:
test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
INSERT 0 1
test=# select * from test;
p
----------------
:dVGkpXdOrE3ko
(1 row)
test=# select raw(p) from test;
raw
---------------
dVGkpXdOrE3ko
(1 row)
test=# select p = 'hello' from test;
?column?
----------
t
(1 row)
test=# select p = 'goodbye' from test;
?column?
----------
f
(1 row)
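To load a password that is already encrypted, prefix it with a colon so that
encryption and checking are skipped (a small sketch reusing the encrypted value
shown above); both rows should then compare equal to 'hello':
test=# insert into test values (':dVGkpXdOrE3ko');
INSERT 0 1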
D'Arcy J.M. Cain
darcy@druid.net

View File

@@ -1,355 +0,0 @@
This directory contains the code for the user-defined type,
CUBE, representing multidimensional cubes.
FILES
-----
Makefile building instructions for the shared library
README.cube the file you are now reading
cube.c the implementation of this data type in C
cube.sql.in SQL code needed to register this type with postgres
(transformed to cube.sql by make)
cubedata.h the data structure used to store the cubes
cubeparse.y the grammar file for the parser (used by cube_in() in cube.c)
cubescan.l scanner rules (used by cube_yyparse() in cubeparse.y)
INSTALLATION
============
To install the type, run
make
make install
The user running "make install" may need root access, depending on how you
configured the PostgreSQL installation paths.
This only installs the type implementation and documentation. To make the
type available in any particular database, as a postgres superuser do:
psql -d databasename < cube.sql
If you install the type in the template1 database, all subsequently created
databases will inherit it.
To test the new type, after "make install" do
make installcheck
If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).
By default the external functions are made executable by anyone.
SYNTAX
======
The following are valid external representations for the CUBE type:
'x' A floating point value representing
a one-dimensional point or one-dimensional
zero length cubement
'(x)' Same as above
'x1,x2,x3,...,xn' A point in n-dimensional space,
represented internally as a zero volume box
'(x1,x2,x3,...,xn)' Same as above
'(x),(y)' 1-D cubement starting at x and ending at y
or vice versa; the order does not matter
'(x1,...,xn),(y1,...,yn)' n-dimensional box represented by
a pair of its opposite corners, no matter which.
Functions take care of swapping to achieve
"lower left -- upper right" representation
before computing any values
Grammar
-------
rule 1 box -> O_BRACKET paren_list COMMA paren_list C_BRACKET
rule 2 box -> paren_list COMMA paren_list
rule 3 box -> paren_list
rule 4 box -> list
rule 5 paren_list -> O_PAREN list C_PAREN
rule 6 list -> FLOAT
rule 7 list -> list COMMA FLOAT
Tokens
------
n [0-9]+
integer [+-]?{n}
real [+-]?({n}\.{n}?|\.{n})
FLOAT ({integer}|{real})([eE]{integer})?
O_BRACKET \[
C_BRACKET \]
O_PAREN \(
C_PAREN \)
COMMA \,
Examples of valid CUBE representations:
--------------------------------------
'x' A floating point value representing
a one-dimensional point (or, zero-length
one-dimensional interval)
'(x)' Same as above
'x1,x2,x3,...,xn' A point in n-dimensional space,
represented internally as a zero volume cube
'(x1,x2,x3,...,xn)' Same as above
'(x),(y)' A 1-D interval starting at x and ending at y
or vice versa; the order does not matter
'[(x),(y)]' Same as above
'(x1,...,xn),(y1,...,yn)' An n-dimensional box represented by
a pair of its diagonally opposite corners,
regardless of order. Swapping is provided
by all comparison routines to ensure the
"lower left -- upper right" representation
before actual comparison takes place.
'[(x1,...,xn),(y1,...,yn)]' Same as above
White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
DEFAULTS
========
I believe this union:
select cube_union('(0,5,2),(2,3,1)','0');
cube_union
-------------------
(0, 0, 0),(2, 5, 2)
(1 row)
does not contradict common sense, nor does the intersection:
select cube_inter('(0,-1),(1,1)','(-2),(2)');
cube_inter
-------------
(0, 0),(1, 0)
(1 row)
In all binary operations on differently sized boxes, I assume the smaller
one to be a Cartesian projection, i.e., having zeroes in place of coordinates
omitted in the string representation. The above examples are equivalent to:
cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define the special Point type
and functions for (box,point) predicates.
select cube_contains('(0,0),(1,1)', '0.5,0.5');
cube_contains
--------------
t
(1 row)
PRECISION
=========
Values are stored internally as 64-bit floating point numbers. This means that
numbers with more than about 16 significant digits will be truncated.
USAGE
=====
The access method for CUBE is a GiST index (gist_cube_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/seg).
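For instance (a sketch; the table and column names are purely illustrative), a GiST
index on a cube column is created and used like this:
create table test (c cube);
create index test_cube_idx on test using gist (c);
select c from test where c && '(0,0),(1,1)';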
The operators supported by the GiST access method include:
a = b Same as
The cubements a and b are identical.
a && b Overlaps
The cubements a and b overlap.
a @> b Contains
The cubement a contains the cubement b.
a <@ b Contained in
The cubement a is contained in b.
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
Other operators:
[a, b] < [c, d] Less than
[a, b] > [c, d] Greater than
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
The following functions are available:
cube_distance(cube, cube) returns double
cube_distance returns the distance between two cubes. If both cubes are
points, this is the normal distance function.
cube(float8) returns cube
This makes a one dimensional cube with both coordinates the same.
If the type of the argument is a numeric type other than float8 an
explicit cast to float8 may be needed.
cube(1) == '(1)'
cube(float8, float8) returns cube
This makes a one dimensional cube.
cube(1,2) == '(1),(2)'
cube(float8[]) returns cube
This makes a zero-volume cube using the coordinates defined by the
array.
cube(ARRAY[1,2]) == '(1,2)'
cube(float8[], float8[]) returns cube
This makes a cube, with upper right and lower left coordinates as
defined by the 2 float arrays. Arrays must be of the same length.
cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
cube(cube, float8) returns cube
This builds a new cube by adding a dimension on to an existing cube with
the same values for both parts of the new coordinate. This is useful for
building cubes piece by piece from calculated values.
cube('(1)',2) == '(1,2),(1,2)'
cube(cube, float8, float8) returns cube
This builds a new cube by adding a dimension on to an existing cube.
This is useful for building cubes piece by piece from calculated values.
cube('(1,2)',3,4) == '(1,3),(2,4)'
cube_dim(cube) returns int
cube_dim returns the number of dimensions stored in the data structure
for a cube. This is useful for constraints on the dimensions of a cube.
cube_ll_coord(cube, int) returns double
cube_ll_coord returns the nth coordinate value for the lower left corner
of a cube. This is useful for doing coordinate transformations.
cube_ur_coord(cube, int) returns double
cube_ur_coord returns the nth coordinate value for the upper right corner
of a cube. This is useful for doing coordinate transformations.
cube_subset(cube, int[]) returns cube
Builds a new cube from an existing cube, using a list of dimension indexes
from an array. Can be used to find both the ll and ur coordinates of a single
dimension, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'
Or can be used to drop dimensions, or reorder them as desired, e.g.:
cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) = '(5, 3, 1, 1),(8, 7, 6, 6)'
cube_is_point(cube) returns bool
cube_is_point returns true if a cube is also a point. This is true when the
two defining corners are the same.
cube_enlarge(cube, double, int) returns cube
cube_enlarge increases the size of a cube by a specified radius in at least
n dimensions. If the radius is negative the box is shrunk instead. This
is useful for creating bounding boxes around a point for searching for
nearby points. All defined dimensions are changed by the radius. If n
is greater than the number of defined dimensions and the cube is being
increased (r >= 0) then 0 is used as the base for the extra coordinates.
LL coordinates are decreased by r and UR coordinates are increased by r. If
a LL coordinate is increased to larger than the corresponding UR coordinate
(this can only happen when r < 0) then both coordinates are set to their
average. To make it harder for people to break things, there is an effective
maximum of 100 on the number of dimensions of cubes. This limit is set in
cubedata.h; change it there if you need something bigger.
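For example (in the inline example style used above; the values follow directly from
the description and are easy to verify by hand):
cube_enlarge('(1,2),(3,4)', 0.5, 3) == '(0.5, 1.5, -0.5),(3.5, 4.5, 0.5)'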
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
For examples of usage, see sql/cube.sql
CREDITS
=======
This code is essentially based on the example written for
Illustra, http://garcia.me.berkeley.edu/~adong/rtree
My thanks are primarily to Prof. Joe Hellerstein
(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
(http://gist.cs.berkeley.edu/), and to his former student, Andy Dong
(http://best.me.berkeley.edu/~adong/), for his exemplar.
I am also grateful to all postgres developers, present and past, for enabling
me to create my own world and live undisturbed in it. And I would like to
acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
for the years of faithful support of my database research.
------------------------------------------------------------------------
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
selkovjr@mcs.anl.gov
------------------------------------------------------------------------
Minor updates to this package were made by Bruno Wolff III <bruno@wolff.to>
in August/September of 2002.
These include changing the precision from single precision to double
precision and adding some new functions.
------------------------------------------------------------------------
Additional updates were made by Joshua Reich <josh@root.net> in July 2006.
These include cube(float8[], float8[]) and cleaning up the code to use
the V1 call protocol instead of the deprecated V0 form.

View File

@@ -1,109 +0,0 @@
/*
* dblink
*
* Functions returning results from a remote database
*
* Joe Conway <mail@joeconway.com>
* And contributors:
* Darko Prenosil <Darko.Prenosil@finteh.hr>
* Shridhar Daithankar <shridhar_daithankar@persistent.co.in>
* Kai Londenberg (K.Londenberg@librics.de)
*
* Copyright (c) 2001-2007, PostgreSQL Global Development Group
* ALL RIGHTS RESERVED;
*
* Permission to use, copy, modify, and distribute this software and its
* documentation for any purpose, without fee, and without a written agreement
* is hereby granted, provided that the above copyright notice and this
* paragraph and the following two paragraphs appear in all copies.
*
* IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
* DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
* LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
* DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*
* THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIMS ANY WARRANTIES,
* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
* AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
* ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
* PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
*
*/
Release Notes:
27 August 2006
- Added async query capability. Original patch by
Kai Londenberg (K.Londenberg@librics.de), modified by Joe Conway
Version 0.7 (as of 25 Feb, 2004)
- Added new version of dblink, dblink_exec, dblink_open, dblink_close,
and, dblink_fetch -- allows ERROR on remote side of connection to
throw NOTICE locally instead of ERROR
Version 0.6
- functions deprecated in 0.5 have been removed
- added ability to create "named" persistent connections
Version 0.5
- dblink now supports use directly as a table function; this is the new
preferred usage going forward
- Use of dblink_tok is now deprecated; original form of dblink is also
deprecated. They _will_ be removed in the next version.
- dblink_last_oid is also deprecated; use dblink_exec() which returns
the command status as a single row, single column result.
- Original dblink, dblink_tok, and dblink_last_oid are commented out in
dblink.sql; remove the comments to use the deprecated functions.
- dblink_strtok() and dblink_replace() functions were removed. Use
split() and replace() respectively (new backend functions in
PostgreSQL 7.3) instead.
- New functions: dblink_exec() for non-SELECT queries; dblink_connect()
opens connection that persists for duration of a backend;
dblink_disconnect() closes a persistent connection; dblink_open()
opens a cursor; dblink_fetch() fetches results from an open cursor.
dblink_close() closes a cursor.
- New test suite: dblink_check.sh, dblink.test.sql,
dblink.test.expected.out. Execute dblink_check.sh from the same
directory as the other two files. Output is dblink.test.out and
dblink.test.diff. Note that dblink.test.sql is a good source
of example usage.
Version 0.4
- removed cursor wrap around input sql to allow for remote
execution of INSERT/UPDATE/DELETE
- dblink now returns a resource id instead of a real pointer
- added several utility functions -- see below
Version 0.3
- fixed dblink invalid pointer causing corrupt elog message
- fixed dblink_tok improper handling of null results
- fixed examples in README.dblink
Version 0.2
- initial release
Installation:
Place these files in a directory called 'dblink' under 'contrib' in the PostgreSQL source tree. Then run:
make
make install
You can use dblink.sql to create the functions in your database of choice, e.g.
psql template1 < dblink.sql
installs dblink functions into database template1
Documentation:
Note: Parameters representing relation names must include double
quotes if the names are mixed-case or contain special characters. They
must also be appropriately qualified with schema name if applicable.
See the following files:
doc/connection
doc/cursor
doc/query
doc/execute
doc/misc
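As a brief usage sketch (the connection name, connection string and remote query are
placeholders; the column definition list must match the remote query):
SELECT dblink_connect('myconn', 'dbname=postgres');
SELECT * FROM dblink('myconn', 'SELECT proname, prosrc FROM pg_proc')
    AS t(proname name, prosrc text);
SELECT dblink_disconnect('myconn');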
==================================================================
-- Joe Conway

View File

@@ -1,127 +0,0 @@
This contrib package contains two different approaches to calculating
great circle distances on the surface of the Earth. The one described
first depends on the contrib/cube package (which MUST be installed before
earthdistance is installed). The second one is based on the point
datatype using latitude and longitude for the coordinates. The install
script makes the defined functions executable by anyone.
Make sure contrib/cube has been installed.
make
make install
make installcheck
To use these functions in a particular database as a postgres superuser do:
psql databasename < earthdistance.sql
-------------------------------------------
contrib/cube based Earth distance functions
Bruno Wolff III
September 2002
A spherical model of the Earth is used.
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the distance from the center of the Earth.
The radius of the Earth is obtained from the earth() function. It is
given in meters. But by changing this one function you can change it
to use some other units, or to use a different value of the radius
that you feel is more appropriate.
This package has applications to astronomical databases as well.
Astronomers will probably want to change earth() to return a radius of
180/pi() so that distances are in degrees.
Functions are provided to allow for input in latitude and longitude (in
degrees), to allow for output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.
The functions are all 'sql' functions. If you want to make these functions
executable by other people you will also have to make the referenced
cube functions executable. cube(text), cube(float8), cube(cube,float8),
cube_distance(cube,cube), cube_ll_coord(cube,int) and
cube_enlarge(cube,float8,int) are used indirectly by the earth distance
functions. is_point(cube) and cube_dim(cube) are used in constraints for data
in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
might be useful for looking at bounding box coordinates in user applications.
A domain of type cube named earth is defined.
There are constraints on it defined to make sure the cube is a point,
that it does not have more than 3 dimensions and that it is very near
the surface of a sphere centered about the origin with the radius of
the Earth.
The following functions are provided:
earth() - Returns the radius of the Earth in meters.
sec_to_gc(float8) - Converts the normal straight line (secant) distance
between two points on the surface of the Earth to the great circle distance
between them.
gc_to_sec(float8) - Converts the great circle distance between two points
on the surface of the Earth to the normal straight line (secant) distance
between them.
ll_to_earth(float8, float8) - Returns the location of a point on the surface
of the Earth given its latitude (argument 1) and longitude (argument 2) in
degrees.
latitude(earth) - Returns the latitude in degrees of a point on the surface
of the Earth.
longitude(earth) - Returns the longitude in degrees of a point on the surface
of the Earth.
earth_distance(earth, earth) - Returns the great circle distance between
two points on the surface of the Earth.
earth_box(earth, float8) - Returns a box suitable for an indexed search using
the cube @> operator for points within a given great circle distance of a
location. Some points in this box are further than the specified great circle
distance from the location so a second check using earth_distance should be
made at the same time.
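As an illustration (the coordinates, the "places" table and its lat/lon columns are
hypothetical), the great circle distance between two latitude/longitude points, and a
radius search that combines earth_box with the recommended earth_distance re-check,
might look like this:
SELECT earth_distance(ll_to_earth(41.9, -87.6), ll_to_earth(40.7, -74.0));
SELECT name FROM places
  WHERE earth_box(ll_to_earth(41.9, -87.6), 10000) @> ll_to_earth(lat, lon)
    AND earth_distance(ll_to_earth(41.9, -87.6), ll_to_earth(lat, lon)) < 10000;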
One advantage of using the cube representation over a point using latitude and
longitude for coordinates is that you don't have to worry about special
conditions at +/- 180 degrees of longitude or near the poles.
Below is the documentation for the Earth distance operator that works
with the point data type.
---------------------------------------------------------------------
I corrected a bug in the geo_distance code where two double constants
were declared as int. I also changed the distance function to use
the haversine formula which is more accurate for small distances.
Bruno Wolff
September 2002
---------------------------------------------------------------------
Date: Wed, 1 Apr 1998 15:19:32 -0600 (CST)
From: Hal Snyder <hal@vailsys.com>
To: vmehr@ctp.com
Subject: [QUESTIONS] Re: Spatial data, R-Trees
> From: Vivek Mehra <vmehr@ctp.com>
> Date: Wed, 1 Apr 1998 10:06:50 -0500
> Am just starting out with PostgreSQL and would like to learn more about
> the spatial data handling ablilities of postgreSQL - in terms of using
> R-tree indexes, user defined types, operators and functions.
>
> Would you be able to suggest where I could find some code and SQL to
> look at to create these?
Here's the setup for adding an operator '<@>' to give distance in
statute miles between two points on the Earth's surface. Coordinates
are in degrees. Points are taken as (longitude, latitude) and not vice
versa as longitude is closer to the intuitive idea of x-axis and
latitude to y-axis.
There's C source, Makefile for FreeBSD, and SQL for installing and
testing the function.
Let me know if anything looks fishy!
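A minimal example of the resulting operator (the coordinates are arbitrary and, as
noted above, are given as (longitude, latitude)):
SELECT '(-87.6,41.8)'::point <@> '(-73.9,40.7)'::point AS miles;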

View File

@@ -1,144 +0,0 @@
/*
* fuzzystrmatch.c
*
* Functions for "fuzzy" comparison of strings
*
* Joe Conway <mail@joeconway.com>
*
* Copyright (c) 2001-2007, PostgreSQL Global Development Group
* ALL RIGHTS RESERVED;
*
* levenshtein()
* -------------
* Written based on a description of the algorithm by Michael Gilleland
* found at http://www.merriampark.com/ld.htm
* Also looked at levenshtein.c in the PHP 4.0.6 distribution for
* inspiration.
*
* metaphone()
* -----------
* Modified for PostgreSQL by Joe Conway.
* Based on CPAN's "Text-Metaphone-1.96" by Michael G Schwern <schwern@pobox.com>
* Code slightly modified for use as PostgreSQL function (palloc, elog, etc).
* Metaphone was originally created by Lawrence Philips and presented in an article
* in the December 1990 issue of "Computer Language".
*
* dmetaphone() and dmetaphone_alt()
* ---------------------------------
* A port of the DoubleMetaphone perl module by Andrew Dunstan. See dmetaphone.c
* for more detail.
*
* soundex()
* -----------
* Folded existing soundex contrib into this one. Renamed text_soundex() (C function)
* to soundex() for consistency.
*
* difference()
* ------------
* Return the difference between two strings' soundex values. Kris Jurka
*
* Permission to use, copy, modify, and distribute this software and its
* documentation for any purpose, without fee, and without a written agreement
* is hereby granted, provided that the above copyright notice and this
* paragraph and the following two paragraphs appear in all copies.
*
* IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
* DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
* LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
* DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*
* THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
* AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
* ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
* PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
*
*/
Version 0.3 (30 June, 2004):
Release Notes:
Version 0.3
- added double metaphone code from Andrew Dunstan
- change metaphone so that an empty input string causes an empty
output string to be returned, instead of throwing an ERROR
- fixed examples in README.soundex
Version 0.2
- folded soundex contrib into this one
Version 0.1
- initial release
Installation:
Place these files in a directory called 'fuzzystrmatch' under 'contrib' in the PostgreSQL source tree. Then run:
make
make install
You can use fuzzystrmatch.sql to create the functions in your database of choice, e.g.
psql -U postgres template1 < fuzzystrmatch.sql
installs the following functions into database template1:
levenshtein() - calculates the levenshtein distance between two strings
metaphone() - calculates the metaphone code of an input string
Documentation
==================================================================
Name
levenshtein -- calculates the levenshtein distance between two strings
Synopsis
levenshtein(text source, text target)
Inputs
source
any text string, 255 characters max, NOT NULL
target
any text string, 255 characters max, NOT NULL
Outputs
Returns int
Example usage
select levenshtein('GUMBO','GAMBOL');
==================================================================
Name
metaphone -- calculates the metaphone code of an input string
Synopsis
metaphone(text source, int max_output_length)
Inputs
source
any text string, 255 characters max, NOT NULL
max_output_length
maximum length of the output metaphone code; if longer, the output
is truncated to this length
Outputs
Returns text
Example usage
select metaphone('GUMBO',4);
==================================================================
-- Joe Conway

View File

@@ -1,66 +0,0 @@
NOTE: Modified August 07, 2001 by Joe Conway. Updated for accuracy
after combining soundex code into the fuzzystrmatch contrib
---------------------------------------------------------------------
The Soundex system is a method of matching similar sounding names
(or any words) to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.
When comparing two soundex values to determine similarity, the
difference function reports how close the match is on a scale
from zero to four, with zero being no match and four being an
exact match.
The following are some usage examples:
SELECT soundex('hello world!');
SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
CREATE TABLE s (nm text);
INSERT INTO s VALUES ('john');
INSERT INTO s VALUES ('joan');
INSERT INTO s VALUES ('wobbly');
INSERT INTO s VALUES ('jack');
SELECT * FROM s WHERE soundex(nm) = soundex('john');
SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid;
CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
'select soundex($1) = soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
'select soundex($1) < soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
'select soundex($1) > soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
'select soundex($1) <= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
'select soundex($1) >= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
'select soundex($1) <> soundex($2)'
LANGUAGE SQL;
DROP OPERATOR #= (text, text);
CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
SELECT * FROM s WHERE text_sx_eq(nm, 'john');
SELECT * FROM s WHERE s.nm #= 'john';
SELECT * FROM s WHERE difference(s.nm, 'john') > 2;

View File

@@ -1,188 +0,0 @@
Hstore - contrib module for storing (key,value) pairs
[Online version] (http://www.sai.msu.su/~megera/oddmuse/index.cgi?Hstore)
Motivation
Many attributes rarely searched, semistructural data, lazy DBA
Authors
* Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
* Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia
LEGAL NOTICES: This module is released under BSD license (as PostgreSQL
itself)
Operations
* hstore -> text - get value, perl analogy $h{key}
select 'a=>q, b=>g'->'a';
?
------
q
* hstore || hstore - concatenation, perl analogy %a=( %b, %c );
regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
?column?
--------------------
"a"=>"b", "c"=>"d"
(1 row)
but, notice
regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
?column?
----------
"a"=>"d"
(1 row)
* text => text - creates hstore type from two text strings
select 'a'=>'b';
?column?
----------
"a"=>"b"
* hstore @> hstore - contains operation, checks whether the left operand contains the right.
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
?column?
----------
f
(1 row)
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
?column?
----------
t
(1 row)
* hstore <@ hstore - contained operation, checks whether the left operand is contained
in the right
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
Functions
* akeys(hstore) - returns all keys from hstore as array
regression=# select akeys('a=>1,b=>2');
akeys
-------
{a,b}
* skeys(hstore) - returns all keys from hstore as strings
regression=# select skeys('a=>1,b=>2');
skeys
-------
a
b
* avals(hstore) - returns all values from hstore as array
regression=# select avals('a=>1,b=>2');
avals
-------
{1,2}
* svals(hstore) - returns all values from hstore as strings
regression=# select svals('a=>1,b=>2');
svals
-------
1
2
* delete (hstore,text) - deletes the (key,value) pair from hstore if the key matches the
argument.
regression=# select delete('a=>1,b=>2','b');
delete
----------
"a"=>"1"
* each(hstore) - returns (key, value) pairs
regression=# select * from each('a=>1,b=>2');
key | value
-----+-------
a | 1
b | 2
* exist (hstore,text)
* hstore ? text
- returns true if the key exists in hstore and false otherwise.
regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
exist | ?column?
-------+----------
t | t
* defined (hstore,text) - returns true if the key exists in hstore and
its value is not NULL.
regression=# select defined('a=>NULL','a');
defined
---------
f
Indices
Module provides index support for '@>' and '?' operations.
create index hidx on testhstore using gist(h);
create index hidx on testhstore using gin(h);
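Example queries that can use such an index (a sketch; the table, key and value are
illustrative):
select * from testhstore where h @> 'pos=>98';
select * from testhstore where h ? 'line';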
Note
Use parentheses in the select below, because the precedence of 'is' is higher than that of '->'
select id from entrants where (info->'education_period') is not null;
Examples
* add key
update tt set h=h||'c=>3';
* delete key
update tt set h=delete(h,'k1');
* Statistics
The hstore type, because of its intrinsic liberality, could contain a lot of
different keys. Checking for valid keys is the task of the application.
The examples below demonstrate several techniques for checking key statistics.
o simple example
select * from each('aaa=>bq, b=>NULL, ""=>1 ');
o using table
select (each(h)).key, (each(h)).value into stat from testhstore ;
o online stat
select key, count(*) from (select (each(h)).key from testhstore) as stat group by key order by count desc, key;
key | count
-----------+-------
line | 883
query | 207
pos | 203
node | 202
space | 197
status | 195
public | 194
title | 190
org | 189
...................

View File

@@ -1,55 +0,0 @@
Integer aggregator/enumerator.
Many database systems have the notion of a one to many table.
A one to many table usually sits between two indexed tables,
for example:
create table one_to_many(left int, right int) ;
And it is used like this:
SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
WHERE one_to_many.left = item;
This will return all the items in the right hand table for an entry
in the left hand table. This is a very common construct in SQL.
Now, this methodology can be cumbersome with a very large number of
entries in the one_to_many table. Depending on the order in which
data was entered, a join like this could result in an index scan
and a fetch for each right hand entry in the table for a particular
left hand entry.
If you have a very dynamic system, there is not much you can do.
However, if you have some data which is fairly static, you can
create a summary table with the aggregator.
CREATE TABLE summary as SELECT left, int_array_aggregate(right)
AS right FROM one_to_many GROUP BY left;
This will create a table with one row per left item, and an array
of right items. Now this is pretty useless without some way of using
the array; that's why there is an array enumerator:
SELECT left, int_array_enum(right) FROM summary WHERE left = item;
The above query, using int_array_enum, produces the same results as:
SELECT left, right FROM one_to_many WHERE left = item;
The difference is that the query against the summary table has to get
only one row from the table, whereas the query against "one_to_many"
must index scan and fetch a row for each entry.
On our system, an EXPLAIN shows a query with a cost of 8488 reduced
to a cost of 329 when the summary table is used, as in the following join:
select right, count(right) from
(
select left, int_array_enum(right) as right from summary join
(select left from left_table where left = item) as lefts
ON (summary.left = lefts.left )
) as list group by right order by count desc ;

View File

@@ -1,185 +0,0 @@
This is an implementation of the RD-tree data structure using the GiST interface
of PostgreSQL. It has built-in lossy compression.
The current implementation provides index support for one-dimensional arrays of
integers: gist__int_ops, suitable for small and medium size arrays (used by
default), and gist__intbig_ops for indexing large arrays (we use a superimposed
signature with a length of 4096 bits to represent sets). There is also a
non-default gin__int_ops for GIN indexes on integer arrays.
All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist
for additional information. Andrey Oktyabrski did great work on
adding new functions and operations.
FUNCTIONS:
int icount(int[]) - the number of elements in intarray
test=# select icount('{1,2,3}'::int[]);
icount
--------
3
(1 row)
int[] sort(int[], 'asc' | 'desc') - sort intarray
test=# select sort('{1,2,3}'::int[],'desc');
sort
---------
{3,2,1}
(1 row)
int[] sort(int[]) - sort in ascending order
int[] sort_asc(int[]),sort_desc(int[]) - shortcuts for sort
int[] uniq(int[]) - returns unique elements
test=# select uniq(sort('{1,2,3,2,1}'::int[]));
uniq
---------
{1,2,3}
(1 row)
int idx(int[], int item) - returns the index of the first array element matching item, or
0 if there is no match.
test=# select idx('{1,2,3,2,1}'::int[],2);
idx
-----
2
(1 row)
int[] subarray(int[],int START [, int LEN]) - returns the part of intarray starting at
element number START (counting from 1) and of length LEN.
test=# select subarray('{1,2,3,2,1}'::int[],2,3);
subarray
----------
{2,3,2}
(1 row)
int[] intset(int4) - casting int4 to int[]
test=# select intset(1);
intset
--------
{1}
(1 row)
OPERATIONS:
int[] && int[] - overlap - returns TRUE if arrays have at least one common element
int[] @> int[] - contains - returns TRUE if left array contains right array
int[] <@ int[] - contained - returns TRUE if left array is contained in right array
# int[] - returns the number of elements in array
int[] + int - push element to array ( add to end of array)
int[] + int[] - merge of arrays (right array added to the end of left one)
int[] - int - remove entries matched by right argument from array
int[] - int[] - remove right array from left
int[] | int - returns intarray - union of arguments
int[] | int[] - returns intarray as a union of two arrays
int[] & int[] - returns intersection of arrays
int[] @@ query_int - returns TRUE if array satisfies query (like '1&(2|3)')
query_int ~~ int[] - returns TRUE if array satisfies query (commutator of @@)
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
CHANGES:
August 6, 2002
1. Reworked patch from Andrey Oktyabrski (ano@spider.ru) with
functions: icount, sort, sort_asc, uniq, idx, subarray
operations: #, +, -, |, &
October 1, 2001
1. Change search method in array to binary
September 28, 2001
1. gist__int_ops is now non-lossy
2. add sort entry in picksplit
September 21, 2001
1. Added support for boolean queries (indexable operator @@, looks like
a @@ '1|(2&3)'; performance is better in any case)
2. Done some small optimizations
March 19, 2001
1. Added support for toastable keys
2. Improved split algorithm for intbig (selection speedup is about 30%)
INSTALLATION:
gmake
gmake install
-- load functions
psql <database> < _int.sql
REGRESSION TEST:
gmake installcheck
EXAMPLE USAGE:
create table message (mid int not null,sections int[]);
create table message_section_map (mid int not null,sid int not null);
-- create indices
CREATE unique index message_key on message ( mid );
CREATE unique index message_section_map_key2 on message_section_map (sid, mid );
CREATE INDEX message_rdtree_idx on message using gist ( sections gist__int_ops);
-- select some messages with section in 1 OR 2 - OVERLAP operator
select message.mid from message where message.sections && '{1,2}';
-- select messages contains in sections 1 AND 2 - CONTAINS operator
select message.mid from message where message.sections @> '{1,2}';
-- the same, CONTAINED operator
select message.mid from message where '{1,2}' <@ message.sections;
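-- a query using the boolean query_int operator (a sketch against the same example table):
-- select messages that are in sections 1 AND 2
select message.mid from message where message.sections @@ '1&2';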
BENCHMARK:
subdirectory bench contains benchmark suite.
cd ./bench
1. createdb TEST
2. psql TEST < ../_int.sql
3. ./create_test.pl | psql TEST
4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
with/without RD-Tree. Run the script without arguments to
see available options.
a)test without RD-Tree (OR)
./bench.pl -d TEST -c -s 1,2 -v
b)test with RD-Tree
./bench.pl -d TEST -c -s 1,2 -v -r
BENCHMARKS:
Size of table <message>: 200000
Size of table <message_section_map>: 269133
Distribution of messages by sections:
section 0: 74377 messages
section 1: 16284 messages
section 50: 1229 messages
section 99: 683 messages
old - without RD-Tree support,
new - with RD-Tree
+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
| +-------+-------+--------+-------+
| | old | new | old | new |
+----------+-------+-------+--------+-------+
| 1| 0.625| 0.101| -| -|
+----------+-------+-------+--------+-------+
| 99| 0.018| 0.017| -| -|
+----------+-------+-------+--------+-------+
| 1,2| 0.766| 0.133| 0.628| 0.045|
+----------+-------+-------+--------+-------+
| 1,2,50,65| 0.794| 0.141| 0.030| 0.006|
+----------+-------+-------+--------+-------+

View File

@@ -1,220 +0,0 @@
-- EAN13 - UPC - ISBN (books) - ISMN (music) - ISSN (serials)
-------------------------------------------------------------
Copyright Germán Méndez Bravo (Kronuz), 2004 - 2006
This module is released under the same BSD license as the rest of PostgreSQL.
The information to implement this module was collected through
several sites, including:
http://www.isbn-international.org/
http://www.issn.org/
http://www.ismn-international.org/
http://www.wikipedia.org/
the prefixes used for hyphenation were also compiled from:
http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
http://www.isbn-international.org/en/identifiers.html
http://www.ismn-international.org/ranges.html
Care was taken during the creation of the algorithms and they
were meticulously verified against the suggested algorithms
in the official ISBN, ISMN, ISSN User Manuals.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
THIS MODULE IS PROVIDED "AS IS" AND WITHOUT ANY WARRANTY
OF ANY KIND, EXPRESS OR IMPLIED.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-- Content of the Module
-------------------------------------------------
This directory contains definitions for a few PostgreSQL
data types, for the following international-standard namespaces:
EAN13, UPC, ISBN (books), ISMN (music), and ISSN (serials). This module
is inspired by Garrett A. Wollman's isbn_issn code.
I wanted the database to fully validate numbers and also to use the
upcoming ISBN-13 and the EAN13 standards, as well as to have it
automatically doing hyphenations for ISBN numbers.
This new module validates, and automatically adds the correct
hyphenations to the numbers. Also, it supports the new ISBN-13
numbers to be used starting in January 2007.
Premises:
1. ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers
2. EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)
3. some ISBN13 numbers can be displayed as ISBN
4. some ISMN13 numbers can be displayed as ISMN
5. some ISSN13 numbers can be displayed as ISSN
6. all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers
Note: All types are internally represented as 64 bit integers,
and internally all are consistently interchangeable.
We have the following data types:
+ EAN13 for European Article Numbers.
This type will always show the EAN13-display format.
The output function for this is -> ean13_out()
+ ISBN13 for International Standard Book Numbers to be displayed in
the new EAN13-display format.
+ ISMN13 for International Standard Music Numbers to be displayed in
the new EAN13-display format.
+ ISSN13 for International Standard Serial Numbers to be displayed
in the new EAN13-display format.
These types will always display the long version of the ISxN (EAN13)
The output function to do this is -> ean13_out()
* The need for these types is just to display the same data in
different ways:
ISBN13 is actually the same as ISBN, ISMN13=ISMN and ISSN13=ISSN.
+ ISBN for International Standard Book Numbers to be displayed in
the current short-display format.
+ ISMN for International Standard Music Numbers to be displayed in
the current short-display format.
+ ISSN for International Standard Serial Numbers to be displayed
in the current short-display format.
These types will display the short version of the ISxN (ISxN 10)
whenever possible, and will show ISxN 13 when it is
impossible to show the short version.
The output function to do this is -> isn_out()
+ UPC for Universal Product Codes.
UPC numbers are a subset of the EAN13 numbers (they are basically
EAN13 without the first '0' digit.)
The output function to do this is also -> isn_out()
We have the following input functions:
+ To take a string and return an EAN13 -> ean13_in()
+ To take a string and return valid ISBN or ISBN13 numbers -> isbn_in()
+ To take a string and return valid ISMN or ISMN13 numbers -> ismn_in()
+ To take a string and return valid ISSN or ISSN13 numbers -> issn_in()
+ To take a string and return a UPC code -> upc_in()
We are able to cast from:
+ ISBN13 -> EAN13
+ ISMN13 -> EAN13
+ ISSN13 -> EAN13
+ ISBN -> EAN13
+ ISMN -> EAN13
+ ISSN -> EAN13
+ UPC -> EAN13
+ ISBN <-> ISBN13
+ ISMN <-> ISMN13
+ ISSN <-> ISSN13
We have two operator classes (for btree and for hash) so each data type
can be indexed for faster access.
The C API is implemented as:
extern Datum isn_out(PG_FUNCTION_ARGS);
extern Datum ean13_out(PG_FUNCTION_ARGS);
extern Datum ean13_in(PG_FUNCTION_ARGS);
extern Datum isbn_in(PG_FUNCTION_ARGS);
extern Datum ismn_in(PG_FUNCTION_ARGS);
extern Datum issn_in(PG_FUNCTION_ARGS);
extern Datum upc_in(PG_FUNCTION_ARGS);
On success:
+ isn_out() takes any of our types and returns a string containing
the shortest possible representation of the number.
+ ean13_out() takes any of our types and returns the
EAN13 (long) representation of the number.
+ ean13_in() takes a string and returns an EAN13 which, as stated in (2),
may or may not be any of our types, but certainly is an EAN13 number.
This works only if the string is a valid EAN13 number; otherwise it fails.
+ isbn_in() takes a string and returns an ISBN/ISBN13, but only if the string
really is an ISBN/ISBN13; otherwise it fails.
+ ismn_in() takes a string and returns an ISMN/ISMN13, but only if the string
really is an ISMN/ISMN13; otherwise it fails.
+ issn_in() takes a string and returns an ISSN/ISSN13, but only if the string
really is an ISSN/ISSN13; otherwise it fails.
+ upc_in() takes a string and returns a UPC, but only if the string is
really a UPC; otherwise it fails.
(on failure, the functions 'ereport' the error)
-- Testing/Playing Functions
-------------------------------------------------
isn_weak(boolean) - Sets the weak input mode.
This function is intended for testing use only!
isn_weak() gets the current status of the weak mode.
"Weak" mode is used to be able to insert "invalid" data to a table.
"Invalid" as in the check digit being wrong, not missing numbers.
Why would you want to use the weak mode? Well, it could be that
you have a huge collection of ISBN numbers, and that there are so many of
them that for weird reasons some have the wrong check digit (perhaps the
numbers were scanned from a printed list and the OCR got the numbers wrong,
perhaps the numbers were manually captured... who knows). Anyway, the thing
is you might want to clean the mess up, but you still want to be able to have
all the numbers in your database and maybe use an external tool to access
the invalid numbers in the database so you can verify the information and
validate it more easily, for example by selecting all the invalid numbers in the table.
When you insert invalid numbers in a table using the weak mode, the number
will be inserted with the corrected check digit, but it will be flagged
with an exclamation mark ('!') at the end (e.g. 0-11-000322-5!)
You can also force the insertion of invalid numbers even when not in the weak mode,
by appending the '!' character at the end of the number.
To work with invalid numbers, you can use two functions:
+ make_valid(), which validates an invalid number (deleting the invalid flag)
+ is_valid(), which checks for the invalid flag presence.
-- Examples of Use
-------------------------------------------------
--Using the types directly:
select isbn('978-0-393-04002-9');
select isbn13('0901690546');
select issn('1436-4522');
--Casting types:
-- note that you can only cast from ean13 to another type when the cast
-- number would be valid in the realm of the target type;
-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
-- but these will:
select upc(ean13('0220356483481'));
select ean13(upc('220356483481'));
--Create a table with a single column to hold ISBN numbers:
create table test ( id isbn );
insert into test values('9780393040029');
--Automatically calculating check digits (observe the '?'):
insert into test values('220500896?');
insert into test values('978055215372?');
select issn('3251231?');
select ismn('979047213542?');
--Using the weak mode:
select isn_weak(true);
insert into test values('978-0-11-000533-4');
insert into test values('9780141219307');
insert into test values('2-205-00876-X');
select isn_weak(false);
select id from test where not is_valid(id);
update test set id=make_valid(id) where id = '2-205-00876-X!';
select * from test;
select isbn13(id) from test;
-- Contact
-------------------------------------------------
Please send suggestions or bug reports to kronuz at users.sourceforge.net
Last reviewed on August 23, 2006 by Kronuz.

View File

@@ -1,88 +0,0 @@
PostgreSQL type extension for managing Large Objects
----------------------------------------------------
Overview
One of the problems with the JDBC driver (and this affects the ODBC driver
also) is that the specification assumes that references to BLOBS (Binary
Large OBjectS) are stored within a table, and if that entry is changed, the
associated BLOB is deleted from the database.
As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.
Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
that are not referenced by anything, and simply occupy disk space.
The Fix
I've fixed this by creating a new data type 'lo', some support functions, and
a Trigger which handles the orphaning problem. The trigger essentially just
does a 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!
The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron), the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.
You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.
Install
Ok, first build the shared library, and install. Typing 'make install' in the
contrib/lo directory should do it.
Then, as the postgres super user, run the lo.sql script in any database that
needs the features. This will install the type, and define the support
functions. You can run the script once in template1, and the objects will be
inherited by subsequently-created databases.
How to Use
The easiest way is by an example:
> create table image (title text, raster lo);
> create trigger t_raster before update or delete on image
> for each row execute procedure lo_manage(raster);
Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.
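To populate such a column, a large object can be imported directly into it (a sketch;
the file path is only an example):
> insert into image (title, raster)
>     values ('beautiful image', lo_import('/etc/motd'));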
Issues
* Dropping a table will still orphan any objects it contains, as the trigger
is not executed.
Avoid this by preceding the 'drop table' with 'delete from {table}'.
If you already have, or suspect you have, orphaned large objects, see
the contrib/vacuumlo module to help you clean them up. It's a good idea
to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
trigger.
* Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
As the ODBC driver needs a permanent lo type (& JDBC could be optimised to
use it if its OID is fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.
I'm releasing this into contrib, just to get it out, and tested.
Peter Mount <peter@retep.org.uk> June 13 1998

View File

@@ -1,512 +0,0 @@
contrib/ltree module
ltree is a PostgreSQL contrib module which contains an implementation of data
types, indexed access methods and queries for data organized as tree-like
structures.
This module works with PostgreSQL version 7.3.
(A version for 7.2 is available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz)
-------------------------------------------------------------------------------
All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist for
additional information. Authors would like to thank Eugeny Rodichev for helpful
discussions. Comments and bug reports are welcome.
-------------------------------------------------------------------------------
LEGAL NOTICES: This module is released under BSD license (as PostgreSQL
itself). This work was done in framework of Russian Scientific Network and
partially supported by Russian Foundation for Basic Research and Stack Group.
-------------------------------------------------------------------------------
MOTIVATION
This is a placeholder for an introduction to the problem. Hopefully, people reading
this document don't need it too much :-)
DEFINITIONS
A label of a node is a sequence of one or more words separated by the blank
character '_' and containing letters and digits (for example, [a-zA-Z0-9] for
the C locale). The length of a label is limited to 256 bytes.
Example: 'Countries', 'Personal_Services'
A label path of a node is a sequence of one or more dot-separated labels
l1.l2...ln, represents path from root to the node. The length of a label path
is limited by 65Kb, but size <= 2Kb is preferrable. We consider it's not a
strict limitation ( maximal size of label path for DMOZ catalogue - http://
www.dmoz.org, is about 240 bytes !)
Example: 'Top.Countries.Europe.Russia'
We introduce several datatypes:
ltree
- is a datatype for label path.
ltree[]
- is a datatype for arrays of ltree.
lquery
- is a path expression with regular-expression-like wildcards in the label
path, used for ltree matching. The star symbol (*) specifies any number of
labels (levels) and can be used at the beginning and the end of an lquery,
for example, '*.Europe.*'.
The following quantifiers are recognized for '*' (like in Perl):
{n} Match exactly n levels
{n,} Match at least n levels
{n,m} Match at least n but not more than m levels
{,m} Match at maximum m levels (eq. to {0,m})
It is possible to use several modifiers at the end of a label:
@ Do case-insensitive label matching
* Do prefix matching for a label
% Don't account for the word separator '_' in label matching; that is,
'Russian%' would match 'Russian_nations', but not 'Russian'
An lquery can contain a logical '!' (NOT) at the beginning of a label and
'|' (OR) to specify possible alternatives for label matching.
Example of lquery:
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a) b) c) d) e)
A label path matching this lquery must:
+ a) begin at a node with label 'Top'
+ b) followed by zero to two labels, until
+ c) a node whose label begins with the case-insensitive prefix 'sport',
+ d) followed by a node whose label does not match 'football' or 'tennis', and
+ e) end at a node whose label begins with 'Russ' or exactly matches
'Spain'.
ltxtquery
- is a datatype for label searching (like the type 'query' for full text
searching, see contrib/tsearch). It's possible to use the modifiers @, %, *
at the end of a word. The meaning of the modifiers is the same as for lquery.
Example: 'Europe & Russia*@ & !Transportation'
This searches for paths containing the words 'Europe' and 'Russia*'
(case-insensitive) and not 'Transportation'. Notice that the order of words
as they appear in the label path is not important!
OPERATIONS
The following operations are defined for type ltree:
<,>,<=,>=,=, <>
- have their usual meanings. Comparison is done in the order of a direct
tree traversal; children of a node are sorted lexicographically.
ltree @> ltree
- returns TRUE if left argument is an ancestor of right argument (or
equal).
ltree <@ ltree
- returns TRUE if left argument is a descendant of right argument (or
equal).
ltree ~ lquery, lquery ~ ltree
- return TRUE if node represented by ltree satisfies lquery.
ltree ? lquery[], lquery ? ltree[]
- return TRUE if node represented by ltree satisfies at least one lquery
from array.
ltree @ ltxtquery, ltxtquery @ ltree
- return TRUE if node represented by ltree satisfies ltxtquery.
ltree || ltree, ltree || text, text || ltree
- return concatenated ltree.
Operations for arrays of ltree (ltree[]):
ltree[] @> ltree, ltree <@ ltree[]
- returns TRUE if array ltree[] contains an ancestor of ltree.
ltree @> ltree[], ltree[] <@ ltree
- returns TRUE if array ltree[] contains a descendant of ltree.
ltree[] ~ lquery, lquery ~ ltree[]
- returns TRUE if array ltree[] contains label paths matching lquery.
ltree[] ? lquery[], lquery[] ? ltree[]
- returns TRUE if array ltree[] contains label paths matching at least one
lquery from the array.
ltree[] @ ltxtquery, ltxtquery @ ltree[]
- returns TRUE if array ltree[] contains label paths matching ltxtquery
(full text search).
ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery
- returns the first element of the array ltree[] that satisfies the
corresponding condition, or NULL if none does.
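For illustration, a couple of simple checks on literal values (the paths here
are only examples):
# select 'Top.Science'::ltree @> 'Top.Science.Astronomy'::ltree;
 ?column?
----------
 t
# select 'Top.Science.Astronomy'::ltree ~ '*.Astronomy'::lquery;
 ?column?
----------
 t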
REMARK
The operations <@, @>, @ and ~ have analogues - ^<@, ^@>, ^@, ^~ - which don't
use indexes!
INDICES
Various indexes can be created to speed up execution of these operations:
* B-tree index over ltree:
<, <=, =, >=, >
* GiST index over ltree:
<, <=, =, >=, >, @>, <@, @, ~, ?
Example:
create index path_gist_idx on test using gist (path);
* GiST index over ltree[]:
ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.
Example:
create index path_gist_idx on test using gist (array_path);
Note: this index is lossy.
FUNCTIONS
ltree subltree
ltree subltree(ltree, start, end)
returns subpath of ltree from start (inclusive) until the end.
# select subltree('Top.Child1.Child2',1,2);
subltree
--------
Child1
ltree subpath
ltree subpath(ltree, OFFSET,LEN)
ltree subpath(ltree, OFFSET)
returns the subpath of ltree starting at OFFSET (inclusive) with length LEN.
If OFFSET is negative, the subpath starts that far from the end
of the path. If LEN is omitted, returns everything to the end
of the path. If LEN is negative, leaves that many labels off
the end of the path.
# select subpath('Top.Child1.Child2',1,2);
subpath
-------
Child1.Child2
# select subpath('Top.Child1.Child2',-2,1);
subpath
---------
Child1
int4 nlevel
int4 nlevel(ltree) - returns level of the node.
# select nlevel('Top.Child1.Child2');
nlevel
--------
3
Note that the arguments start, end, OFFSET and LEN are measured in levels
of the node!
int4 index(ltree,ltree), int4 index(ltree,ltree,OFFSET)
returns the level of the first occurrence of the second argument in the first
one, beginning from OFFSET. If OFFSET is negative, the search begins
|OFFSET| levels from the end of the path.
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
index
-------
6
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
index
-------
9
ltree text2ltree(text), text ltree2text(ltree)
cast functions between ltree and text.
ltree lca(ltree,ltree,...) (up to 8 arguments)
ltree lca(ltree[])
Returns Lowest Common Ancestor (lca)
# select lca('1.2.2.3','1.2.3.4.5.6');
lca
-----
1.2
# select lca('{la.2.3,1.2.3.4.5.6}') is null;
?column?
----------
f
INSTALLATION
cd contrib/ltree
make
make install
make installcheck
EXAMPLE OF USAGE
createdb ltreetest
psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
psql ltreetest < ltreetest.sql
Now we have a database ltreetest populated with data describing the hierarchy
shown below:
                           TOP
                        /   |   \
                 Science Hobbies Collections
                    /       |         \
            Astronomy Amateurs_Astronomy Pictures
               /    \                       |
    Astrophysics  Cosmology             Astronomy
                                        /   |   \
                                 Galaxies Stars Astronauts
Inheritance:
ltreetest=# select path from test where path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
Matching:
ltreetest=# select path from test where path ~ '*.Astronomy.*';
path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
Full text search:
ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies.Amateurs_Astronomy
(4 rows)
ltreetest=# select path from test where path @ 'Astro* & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
Using Functions:
ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
?column?
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
We could create an SQL function:
CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
LANGUAGE SQL IMMUTABLE;
and the previous select could be rewritten as:
ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
Or with different arguments:
CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
LANGUAGE SQL IMMUTABLE;
ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
ADDITIONAL DATA
To get more of a feeling for the ltree module you can download
dmozltree-eng.sql.gz (about a 3Mb tar.gz archive containing 300,274 nodes),
available from http://www.sai.msu.su/~megera/postgres/gist/ltree/
dmozltree-eng.sql.gz, which is the DMOZ catalogue prepared for use with ltree.
Set up your test database (dmoz), load the ltree module and issue the command:
zcat dmozltree-eng.sql.gz | psql dmoz
Data will be loaded into the database dmoz and all indexes will be created.
BENCHMARKS
All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750MHz) using DMOZ
data containing 300,274 nodes (see above for the download link). We used some
basic queries typical of walking through a catalogue.
QUERIES
* Q0: Count all rows (sort of base time for comparison)
select count(*) from dmoz;
count
--------
300274
(1 row)
* Q1: Get direct children (without inheritance)
select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
path
-----------------------------------
Top.Adult.Arts.Animation.Cartoons
Top.Adult.Arts.Animation.Anime
(2 rows)
* Q2: The same as Q1 but with counting of successors
select path as parentpath , (select count(*)-1 from dmoz where path <@
p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
parentpath | count
-----------------------------------+-------
Top.Adult.Arts.Animation.Cartoons | 2
Top.Adult.Arts.Animation.Anime | 61
(2 rows)
* Q3: Get all parents
select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
path asc;
path
--------------------------
Top
Top.Adult
Top.Adult.Arts
Top.Adult.Arts.Animation
(4 rows)
* Q4: Get all parents with counting of children
select path, (select count(*)-1 from dmoz where path <@ p.path) as count
from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
path | count
--------------------------+--------
Top | 300273
Top.Adult | 4913
Top.Adult.Arts | 339
Top.Adult.Arts.Animation | 65
(4 rows)
* Q5: Get all children with levels
select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
path | level
------------------------------------------------+-------
Top.Adult.Arts.Animation.Anime | 1
Top.Adult.Arts.Animation.Anime.Fan_Works | 2
Top.Adult.Arts.Animation.Anime.Games | 2
Top.Adult.Arts.Animation.Anime.Genres | 2
Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
Top.Adult.Arts.Animation.Anime.Multimedia | 2
Top.Adult.Arts.Animation.Anime.Resources | 2
Top.Adult.Arts.Animation.Anime.Titles | 2
Top.Adult.Arts.Animation.Cartoons | 1
Top.Adult.Arts.Animation.Cartoons.AVS | 2
Top.Adult.Arts.Animation.Cartoons.Members | 2
(11 rows)
Timings
+---------------------------------------------+
|Query|Rows|Time (ms) index|Time (ms) no index|
|-----+----+---------------+------------------|
| Q0| 1| NA| 1453.44|
|-----+----+---------------+------------------|
| Q1| 2| 0.49| 1001.54|
|-----+----+---------------+------------------|
| Q2| 2| 1.48| 3009.39|
|-----+----+---------------+------------------|
| Q3| 4| 0.55| 906.98|
|-----+----+---------------+------------------|
| Q4| 4| 24385.07| 4951.91|
|-----+----+---------------+------------------|
| Q5| 11| 0.85| 1003.23|
+---------------------------------------------+
Timings without indexes were obtained using the operations which don't use
indexes (see above).
Remarks
We didn't run full-scale tests, and we haven't (yet) presented data for
operations with arrays of ltree (ltree[]) and full text searching. We'll
appreciate your input. So far, some (rather obvious) results:
* Indexes do help query execution
* Q4 performs badly because it needs to read almost all data from the HDD
CHANGES
Mar 28, 2003
Added functions index(ltree,ltree,offset), text2ltree(text),
ltree2text(text)
Feb 7, 2003
Add ? operation
Fix ~ operation bug: eg '1.1.1' ~ '*.1'
Optimize index storage
Aug 9, 2002
Fixed very stupid but important bug :-)
July 31, 2002
Now works on 64-bit platforms.
Added function lca - lowest common ancestor
Version for 7.2 is distributed as separate package -
http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz
July 13, 2002
Initial release.
TODO
* Testing on 64-bit platforms. There are several known problems with byte
alignment; -- RESOLVED
* Better documentation;
* We plan (probably) to improve regular expressions processing using
non-deterministic automata;
* Some sort of XML support;
* Better full text searching;
SOME BACKGROUNDS
The approach we use for ltree is much like the one we used in our other
GiST-based contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist).
The theoretical background is available in papers referenced from our GiST
development page (http://www.sai.msu.su/~megera/postgres/gist).
A hierarchical data structure (tree) is a set of nodes. Each node has a
signature (LPS) of a fixed size, which is a hashed label path of that node.
Traversing a tree we could *certainly* prune branches if
LQS (bitwise AND) LPS != LQS
where LQS is a signature of lquery or ltxtquery, obtained in the same way as
LPS.
ltree[]:
For an array of ltree, the LPS is the bitwise OR of the signatures of *ALL*
children reachable from that node. Signatures are stored in an RD-tree,
implemented using GiST, which provides indexed access.
ltree:
For ltree we store the LPS in a B-tree, implemented using GiST. Each node entry
is represented by (left_bound, signature, right_bound), so that we can speed up
the operations <, <=, =, >=, > using left_bound and right_bound, and prune
branches of the tree using the signature.
-------------------------------------------------------------------------------
We ask people who find the module useful to send us a postcard to:
Moscow, 119899, Universitetski pr.13, Moscow State University, Sternberg
Astronomical Institute, Russia
For: Bartunov O.S.
and
Moscow, Bratislavskaya str.23, appt. 18, Russia
For: Sigaev F.G.


@ -1,94 +0,0 @@
The functions in this module allow you to inspect the contents of data pages
at a low level, for debugging purposes. All of these functions may be used
only by superusers.
1. Installation
$ make
$ make install
$ psql -e -f /usr/local/pgsql/share/contrib/pageinspect.sql test
2. Functions included:
get_raw_page
------------
get_raw_page reads one block of the named table and returns a copy as a
bytea field. This allows a single time-consistent copy of the block to be
made.
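For example, assuming the default 8kB block size, the copied page image is
simply 8192 bytes long:
test=# SELECT octet_length(get_raw_page('pg_class', 0));
 octet_length
--------------
         8192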
page_header
-----------
page_header shows fields which are common to all PostgreSQL heap and index
pages.
A page image obtained with get_raw_page should be passed as argument:
regression=# SELECT * FROM page_header(get_raw_page('pg_class',0));
lsn | tli | flags | lower | upper | special | pagesize | version | prune_xid
-----------+-----+-------+-------+-------+---------+----------+---------+-----------
0/24A1B50 | 1 | 1 | 232 | 368 | 8192 | 8192 | 4 | 0
(1 row)
The returned columns correspond to the fields in the PageHeaderData struct.
See src/include/storage/bufpage.h for details.
heap_page_items
---------------
heap_page_items shows all line pointers on a heap page. For those line
pointers that are in use, tuple headers are also shown. All tuples are
shown, whether or not the tuples were visible to an MVCC snapshot at the
time the raw page was copied.
A heap page image obtained with get_raw_page should be passed as argument:
test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
See src/include/storage/itemid.h and src/include/access/htup.h for
explanations of the fields returned.
bt_metap
--------
bt_metap() returns information about a btree index's metapage:
test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic | 340322
version | 2
root | 1
level | 0
fastroot | 1
fastlevel | 0
bt_page_stats
-------------
bt_page_stats() shows information about single btree pages:
test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3
bt_page_items
-------------
bt_page_items() returns information about specific items on btree pages:
test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00


@ -1,173 +0,0 @@
Pg_freespacemap - Real time queries on the free space map (FSM).
---------------
This module consists of two C functions: 'pg_freespacemap_relations()' and
'pg_freespacemap_pages()' that return a set of records, plus two views
'pg_freespacemap_relations' and 'pg_freespacemap_pages' for more
user-friendly access to the functions.
The module provides the ability to examine the contents of the free space
map, without having to restart or rebuild the server with additional
debugging code.
By default public access is REVOKED from the functions and views, just in
case there are security issues present in the code.
Installation
------------
Build and install the main PostgreSQL source, then this contrib module:
$ cd contrib/pg_freespacemap
$ gmake
$ gmake install
To register the functions and views:
$ psql -d <database> -f pg_freespacemap.sql
Notes
-----
The definitions for the columns exposed in the views are:
pg_freespacemap_relations
Column | references | Description
------------------+----------------------+----------------------------------
reltablespace | pg_tablespace.oid | Tablespace oid of the relation.
reldatabase | pg_database.oid | Database oid of the relation.
relfilenode | pg_class.relfilenode | Relfilenode of the relation.
avgrequest | | Moving average of free space
| | requests (NULL for indexes)
interestingpages | | Count of pages last reported as
| | containing useful free space.
storedpages | | Count of pages actually stored
| | in free space map.
nextpage | | Page index (from 0) to start next
| | search at.
pg_freespacemap_pages
Column | references | Description
----------------+----------------------+------------------------------------
reltablespace | pg_tablespace.oid | Tablespace oid of the relation.
reldatabase | pg_database.oid | Database oid of the relation.
relfilenode | pg_class.relfilenode | Relfilenode of the relation.
relblocknumber | | Page number in the relation.
bytes | | Free bytes in the page, or NULL
| | for an index page (see below).
For pg_freespacemap_relations, there is one row for each relation in the free
space map. storedpages is the number of pages actually stored in the map,
while interestingpages is the number of pages the last VACUUM thought had
useful amounts of free space.
If storedpages is consistently less than interestingpages then it'd be a
good idea to increase max_fsm_pages. Also, if the number of rows in
pg_freespacemap_relations is close to max_fsm_relations, then you should
consider increasing max_fsm_relations.
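As a rough check, a query along these lines sums the per-relation counts so
they can be compared against the max_fsm_pages and max_fsm_relations settings:
SELECT count(*) AS relations,
       sum(interestingpages) AS interesting,
       sum(storedpages) AS stored
FROM pg_freespacemap_relations;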
For pg_freespacemap_pages, there is one row for each page in the free space
map. The number of rows for a relation will match the storedpages column
in pg_freespacemap_relations.
For indexes, what is tracked is entirely-unused pages, rather than free
space within pages. Therefore, the average request size and free bytes
within a page are not meaningful, and are shown as NULL.
Because the map is shared by all the databases, it will include relations
not belonging to the current database.
When either of the views are accessed, internal free space map locks are
taken, and a copy of the map data is made for them to display.
This ensures that the views produce a consistent set of results, while not
blocking normal activity longer than necessary. Nonetheless there
could be some impact on database performance if they are read often.
Sample output - pg_freespacemap_relations
-------------
regression=# \d pg_freespacemap_relations
View "public.pg_freespacemap_relations"
Column | Type | Modifiers
------------------+---------+-----------
reltablespace | oid |
reldatabase | oid |
relfilenode | oid |
avgrequest | integer |
interestingpages | integer |
storedpages | integer |
nextpage | integer |
View definition:
SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);
regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
FROM pg_freespacemap_relations r INNER JOIN pg_class c
ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
ON r.reldatabase = d.oid AND (d.datname = current_database())
ORDER BY r.storedpages DESC LIMIT 10;
relname | avgrequest | interestingpages | storedpages
---------------------------------+------------+------------------+-------------
onek | 256 | 109 | 109
pg_attribute | 167 | 93 | 93
pg_class | 191 | 49 | 49
pg_attribute_relid_attnam_index | | 48 | 48
onek2 | 256 | 37 | 37
pg_depend | 95 | 26 | 26
pg_type | 199 | 16 | 16
pg_rewrite | 1011 | 13 | 13
pg_class_relname_nsp_index | | 10 | 10
pg_proc | 302 | 8 | 8
(10 rows)
Sample output - pg_freespacemap_pages
-------------
regression=# \d pg_freespacemap_pages
View "public.pg_freespacemap_pages"
Column | Type | Modifiers
----------------+---------+-----------
reltablespace | oid |
reldatabase | oid |
relfilenode | oid |
relblocknumber | bigint |
bytes | integer |
View definition:
SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);
regression=# SELECT c.relname, p.relblocknumber, p.bytes
FROM pg_freespacemap_pages p INNER JOIN pg_class c
ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
ON (p.reldatabase = d.oid AND d.datname = current_database())
ORDER BY c.relname LIMIT 10;
relname | relblocknumber | bytes
--------------+----------------+-------
a_star | 0 | 8040
abstime_tbl | 0 | 7908
aggtest | 0 | 8008
altinhoid | 0 | 8128
altstartwith | 0 | 8128
arrtest | 0 | 7172
b_star | 0 | 7976
box_tbl | 0 | 7912
bt_f8_heap | 54 | 7728
bt_i4_heap | 49 | 8008
(10 rows)
Author
------
* Mark Kirkwood <markir@paradise.net.nz>


@ -1,206 +0,0 @@
pg_standby README 2006/12/08 Simon Riggs
o What is pg_standby?
pg_standby allows the creation of a Warm Standby server.
It is designed to be a production-ready program, as well as a
customisable template should you require specific modifications.
Other configuration is required as well, all of which is
described in the main server manual.
The program is designed to be a wait-for restore_command,
required to turn a normal archive recovery into a Warm Standby.
Within the restore_command of the recovery.conf you could
configure pg_standby in the following way:
restore_command = 'pg_standby archiveDir %f %p %r'
which would be sufficient to define that files will be restored
from archiveDir.
o features of pg_standby
- pg_standby is written in C. So it is very portable
and easy to install.
- supports copy or link from a directory (only)
- source easy to modify, with specifically designated
sections to modify for your own needs, allowing
interfaces to be written for additional Backup Archive Restore
(BAR) systems
- portable: tested on Linux and Windows
o How to install pg_standby
$make
$make install
o How to use pg_standby?
pg_standby should be used within the restore_command of the
recovery.conf file. See the main PostgreSQL manual for details.
The basic usage should be like this:
restore_command = 'pg_standby archiveDir %f %p %r'
with the pg_standby command usage as
pg_standby [OPTION]... ARCHIVELOCATION NEXTWALFILE XLOGFILEPATH [RESTARTWALFILE]
When used within the restore_command the %f and %p macros
will provide the actual file and path required for the restore/recovery.
pg_standby assumes that ARCHIVELOCATION is a directory accessible by the
server-owning user.
If RESTARTWALFILE is specified, typically by using the %r option, then all files
prior to this file will be removed from ARCHIVELOCATION. This then minimises
the number of files that need to be held, whilst at the same time maintaining
restart capability. This capability additionally assumes that the
ARCHIVELOCATION directory is writable.
o options
pg_standby allows the following command line switches
-c
use copy/cp command to restore WAL files from archive
-d
debug/logging option.
-k numfiles
Clean up files in the archive so that we maintain no more
than this many files in the archive. This parameter will
be silently ignored if RESTARTWALFILE is specified, since
that specification method is more accurate in determining
the correct cut-off point in the archive.
You should be wary of setting this number too low,
since this may mean you cannot restart the standby. This
is because the last restartpoint marked in the WAL files
may be many files in the past and can vary considerably.
This should be set to a value exceeding the number of WAL
files that can be recovered in 2*checkpoint_timeout seconds,
according to the value in the warm standby postgresql.conf.
It is wholly unrelated to the setting of checkpoint_segments
on either primary or standby.
Setting numfiles to be zero will disable deletion of files
from ARCHIVELOCATION.
If in doubt, use a large value or do not set a value at all.
If you specify neither RESTARTWALFILE nor -k, then -k 0
will be assumed, i.e. keep all files in archive.
Default=0, Min=0
-l
use ln command to restore WAL files from archive
WAL files will remain in archive
Link is more efficient, but the default is copy to
allow you to maintain the WAL archive for recovery
purposes as well as high-availability.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
This option uses the Windows Vista command mklink
to provide a file-to-file symbolic link. -l will
not work on versions of Windows prior to Vista.
Use the -c option instead.
see http://en.wikipedia.org/wiki/NTFS_symbolic_link
-r maxretries
the maximum number of times to retry the restore command if it
fails. After each failure, we wait for sleeptime * num_retries
so that the wait time increases progressively, so by default
we will wait 5 secs, 10 secs then 15 secs before reporting
the failure back to the database server. This will be
interpreted as an end of recovery and the Standby will come
up fully as a result.
Default=3, Min=0
-s sleeptime
the number of seconds to sleep between testing to see
if the file to be restored is available in the archive yet.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
Default=5, Min=1, Max=60
-t triggerfile
the presence of the triggerfile will cause recovery to end
whether or not the next file is available
It is recommended that you use a structured filename to
avoid confusion as to which server is being triggered
when multiple servers exist on same system.
e.g. /tmp/pgsql.trigger.5432
-w maxwaittime
the maximum number of seconds to wait for the next file,
after which recovery will end and the Standby will come up.
A setting of zero means wait forever.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
Default=0, Min=0
Note: --help is not supported since pg_standby is not intended
for interactive use, except during dev/test
o examples
Linux
archive_command = 'cp %p ../archive/%f'
restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
which will
- use a ln command to restore WAL files from archive
- produce logfile output in standby.log
- keep the last 255 full WAL files, plus the current one
- sleep for 2 seconds between checks for the next WAL file to arrive
- never timeout if file not found
- stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears
Windows
archive_command = 'copy %p ..\\archive\\%f'
Note that backslashes need to be doubled in the archive_command, but
*not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442 ..\archive %f %p 2>> standby.log'
which will
- use a copy command to restore WAL files from archive
- produce logfile output in standby.log
- sleep for 5 seconds between checks for the next WAL file to arrive
- never timeout if file not found
- stop waiting when a trigger file called C:\pgsql.trigger.5442 appears
o supported versions
pg_standby is designed to work with PostgreSQL 8.2 and later. It is
currently compatible across minor changes between the way 8.3 and 8.2
operate.
PostgreSQL 8.3 provides the %r command line substitution, designed to
let pg_standby know the last file it needs to keep. If the last
parameter is omitted, no error is generated, allowing pg_standby to
function correctly with PostgreSQL 8.2 also. With PostgreSQL 8.2,
the -k option must be used if archive cleanup is required. This option
remains available in 8.3.
o reported test success
SUSE Linux 10.2
Windows XP Pro
o additional design notes
The use of a move command seems like it would be a good idea, but
this would prevent recovery from being restartable. Also, the last WAL
file is always requested twice from the archive.


@ -1,144 +0,0 @@
trgm - Trigram matching for PostgreSQL
--------------------------------------
Introduction
This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
The pg_trgm contrib module provides functions and index classes
for determining the similarity of text based on trigram
matching.
Definitions
Trigram (or Trigraph)
A trigram is a set of three consecutive characters taken
from a string. A string is considered to have two spaces
prefixed and one space suffixed when determining the set
of trigrams that comprise the string.
eg. The set of trigrams in the word "cat" is "  c", " ca",
"at " and "cat".
Public Functions
real similarity(text, text)
Returns a number that indicates how closely the two arguments
match. A zero result indicates that the two words
are completely dissimilar, and a result of one indicates that
the two words are identical.
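For instance (purely illustrative; per the definition above, identical
words score 1 and words sharing no trigrams score 0):
SELECT similarity('cat', 'cat');   -- 1
SELECT similarity('cat', 'dog');   -- 0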
real show_limit()
Returns the current similarity threshold used by the '%'
operator. This in effect sets the minimum similarity between
two words in order that they be considered similar enough to
be misspellings of each other, for example.
real set_limit(real)
Sets the current similarity threshold that is used by the '%'
operator, and is returned by the show_limit() function.
text[] show_trgm(text)
Returns an array of all the trigrams of the supplied text
parameter.
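For example, continuing the "cat" example from the Definitions section:
SELECT show_trgm('cat');
-- returns the four trigrams listed above: "  c", " ca", "at " and "cat"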
Public Operators
text % text (returns boolean)
The '%' operator returns TRUE if its two arguments have a similarity
that is greater than the similarity threshold set by set_limit(). It
will return FALSE if the similarity is less than the current
threshold.
Public Index Operator Classes
gist_trgm_ops
The pg_trgm module comes with an index operator class that allows a
developer to create an index over a text column for the purpose
of very fast similarity searches.
To use this index, the '%' operator must be used and an appropriate
similarity threshold for the application must be set.
eg.
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
At this point, you will have an index on the t text column that you
can use for similarity searching.
eg.
SELECT
t,
similarity(t, 'word') AS sml
FROM
test_trgm
WHERE
t % 'word'
ORDER BY
sml DESC, t;
This will return all values in the text column that are sufficiently
similar to 'word', sorted from best match to worst. The index will
be used to make this a fast operation over very large data sets.
Tsearch2 Integration
Trigram matching is a very useful tool when used in conjunction
with a text index created by the Tsearch2 contrib module. (See
contrib/tsearch2)
The first step is to generate an auxiliary table containing all
the unique words in the Tsearch2 index:
CREATE TABLE words AS SELECT word FROM
stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
Where 'documents' is a table that has a text field 'bodytext'
that TSearch2 is used to search. The use of the 'simple' dictionary
with the to_tsvector function, instead of just using the already
existing vector is to avoid creating a list of already stemmed
words. This way, only the original, unstemmed words are added
to the word list.
Next, create a trigram index on the word column:
CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
or
CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
Now, a SELECT query similar to the example above can be used to
suggest spellings for misspelled words in user search terms. A
useful extra clause is to ensure that the similar words are also
of similar length to the misspelled word.
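For instance, a sketch along these lines (the misspelling 'boook' and the
two-character length window are arbitrary choices):
SELECT word, similarity(word, 'boook') AS sml
FROM words
WHERE word % 'boook'
  AND length(word) BETWEEN length('boook') - 2 AND length('boook') + 2
ORDER BY sml DESC, word;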
Note: Since the 'words' table has been generated as a separate,
static table, it will need to be periodically regenerated so that
it remains up to date with the word list in the Tsearch2 index.
Authors
Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia
Contributors
Christopher Kings-Lynne wrote this README file
References
Tsearch2 Development Site
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
GiST Development Site
http://www.sai.msu.su/~megera/postgres/gist/


@ -1,284 +0,0 @@
$PostgreSQL: pgsql/contrib/pgbench/README.pgbench,v 1.20 2007/07/06 20:17:02 wieck Exp $
pgbench README
o What is pgbench?
pgbench is a simple program to run a benchmark test. pgbench is a
client application of PostgreSQL and runs with PostgreSQL only. It
performs lots of small and simple transactions including
SELECT/UPDATE/INSERT operations, then calculates the number of
transactions successfully completed within a second (transactions
per second, tps). The target data includes a table with at least 100k
tuples.
Example outputs from pgbench look like:
number of clients: 4
number of transactions per client: 100
number of processed transactions: 400/400
tps = 19.875015(including connections establishing)
tps = 20.098827(excluding connections establishing)
A similar program called "JDBCBench" already exists, but it requires
Java, which may not be available on every platform. Moreover, some
people are concerned that the overhead of Java might lead to
inaccurate results. So I decided to write it in pure C, and named
it "pgbench."
o features of pgbench
- pgbench is written in C using libpq only. So it is very portable
and easy to install.
- pgbench can simulate concurrent connections using asynchronous
capability of libpq. No threading is required.
o How to install pgbench
$make
$make install
o How to use pgbench?
(1) (optional) Initialize the database by:
pgbench -i <dbname>
where <dbname> is the name of the database. pgbench uses four tables:
accounts, branches, history and tellers. These tables will be
destroyed. Be very careful if you have tables with the same
names. The default test data contains:
table          # of tuples
--------------------------
branches                 1
tellers                 10
accounts            100000
history                  0
You can increase the number of tuples by using the -s option. The branches,
tellers and accounts tables are created with a fillfactor which is
set using the -F option. See below.
(2) Run the benchmark test
pgbench <dbname>
The default configuration is:
number of clients: 1
number of transactions per client: 10
o options
pgbench has a number of options.
-h hostname
hostname where the backend is running. If this option
is omitted, pgbench will connect to the localhost via
a Unix domain socket.
-p port
the port number that the backend is listening on. The default is
libpq's default, usually 5432.
-c number_of_clients
Number of clients simulated. default is 1.
-t number_of_transactions
Number of transactions each client runs. default is 10.
-s scaling_factor
this should be used with -i (initialize) option.
number of tuples generated will be multiple of the
scaling factor. For example, -s 100 will imply 10M
(10,000,000) tuples in the accounts table.
default is 1. NOTE: scaling factor should be at least
as large as the largest number of clients you intend
to test; else you'll mostly be measuring update contention.
Regular (not initializing) runs using one of the
built-in tests will detect scale based on the number of
branches in the database. For custom (-f) runs it can
be manually specified with this parameter.
-D varname=value
Define a variable. It can be referred to by a script
provided using the -f option. Multiple -D options are allowed.
-U login
Specify db user's login name if it is different from
the Unix login name.
-P password
Specify the db password. CAUTION: using this option
might be a security hole since the ps command will
show the password. Use this for TESTING PURPOSES ONLY.
-n
Do not vacuum or clean the history table prior to
running the test.
-v
Do vacuuming before testing. This will take some time.
With neither -n nor -v, pgbench will vacuum tellers and
branches tables only.
-S
Perform select only transactions instead of TPC-B.
-N Do not update "branches" and "tellers". This will
avoid heavy update contention on branches and tellers,
but the transactions are then no longer TPC-B-like.
-f filename
Read transaction script from file. Detailed
explanation will appear later.
-C
Establish connection for each transaction, rather than
doing it just once at beginning of pgbench in the normal
mode. This is useful to measure the connection overhead.
-l
Write the time taken by each transaction to a logfile,
with the name "pgbench_log.xxx", where xxx is the PID
of the pgbench process. The format of the log is:
client_id transaction_no time file_no time-epoch time-us
where time is measured in microseconds, file_no is
which test file was used (useful when multiple were
specified with -f), and time-epoch/time-us are a
UNIX epoch format timestamp followed by an offset
in microseconds (suitable for creating an ISO 8601
timestamp with a fraction of a second) of when
the transaction completed.
Here are example outputs:
0 199 2241 0 1175850568 995598
0 200 2465 0 1175850568 998079
0 201 2513 0 1175850569 608
0 202 2038 0 1175850569 2663
-F fillfactor
Create the tables (accounts, tellers and branches) with the given
fillfactor. Default is 100. This should be used with the -i
(initialize) option.
-d
debug option.
o What is the "transaction" actually performed in pgbench?
(1) begin;
(2) update accounts set abalance = abalance + :delta where aid = :aid;
(3) select abalance from accounts where aid = :aid;
(4) update tellers set tbalance = tbalance + :delta where tid = :tid;
(5) update branches set bbalance = bbalance + :delta where bid = :bid;
(6) insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);
(7) end;
If you specify -N, (4) and (5) aren't included in the transaction.
o -f option
This option supports reading a transaction script from the specified
file. The file should contain one SQL command per line; SQL
commands spanning multiple lines are not supported. Empty lines
and lines beginning with "--" are ignored.
Multiple -f options are allowed. In this case each transaction is
assigned a randomly chosen script.
SQL commands can include "meta commands" which begin with "\" (back
slash). A meta command takes some arguments separated by white
space. Currently the following meta commands are supported:
\set name operand1 [ operator operand2 ]
set the calculated value using "operand1" "operator"
"operand2" to variable "name". If "operator" and "operand2"
are omitted, the value of operand1 is set to variable "name".
example:
\set ntellers 10 * :scale
\setrandom name min max
assign random integer to name between min and max
example:
\setrandom aid 1 100000
Variables can be referred to in SQL commands by adding ":" in front
of the variable name.
example:
SELECT abalance FROM accounts WHERE aid = :aid
Variables can also be defined by using -D option.
\sleep num [us|ms|s]
causes script execution to sleep for the specified duration of
microseconds (us), milliseconds (ms) or the default seconds (s).
example:
\setrandom millisec 1000 2500
\sleep :millisec ms
For example, a TPC-B-like benchmark can be defined as follows (scaling
factor = 1):
\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta 1 10000
BEGIN
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
SELECT abalance FROM accounts WHERE aid = :aid
UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
END
If you want to automatically set the scaling factor from the number of
tuples in branches table, use -s option and shell command like this:
pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
Notice that the -f option does not vacuum or clear the history
table before starting the benchmark.
o License?
Basically it is the same as the BSD license. See pgbench.c for more details.
o History before contributed to PostgreSQL
2000/1/15 pgbench-1.2 contributed to PostgreSQL
* Add -v option
1999/09/29 pgbench-1.1 released
* Apply cygwin patches contributed by Yutaka Tanida
* More robust when backends die
* Add -S option (select only)
1999/09/04 pgbench-1.0 released


@ -1,709 +0,0 @@
pgcrypto - cryptographic functions for PostgreSQL
=================================================
Marko Kreen <markokr@gmail.com>
// Note: this document is in asciidoc format.
1. Installation
-----------------
Run the following commands:
make
make install
make installcheck
The `make installcheck` command is important. It runs regression tests
for the module. They make sure the functions here produce correct
results.
Next, to put the functions into a particular database, run the commands in
file pgcrypto.sql, which has been installed into the shared files directory.
Example using psql:
psql -d DBNAME -f pgcrypto.sql
2. Notes
----------
2.1. Configuration
~~~~~~~~~~~~~~~~~~~~
pgcrypto configures itself according to the findings of the main PostgreSQL
`configure` script. The options that affect it are `--with-zlib` and
`--with-openssl`.
When compiled with zlib, PGP encryption functions are able to
compress data before encrypting.
When compiled with OpenSSL there will be more algorithms available.
Also public-key encryption functions will be faster as OpenSSL
has more optimized BIGNUM functions.
Summary of functionality with and without OpenSSL:
`----------------------------`---------`------------
 Functionality                built-in   OpenSSL
----------------------------------------------------
 MD5                          yes        yes
 SHA1                         yes        yes
 SHA224/256/384/512           yes        yes (3)
 Any other digest algo        no         yes (1)
 Blowfish                     yes        yes
 AES                          yes        yes (2)
 DES/3DES/CAST5               no         yes
 Raw encryption               yes        yes
 PGP Symmetric encryption     yes        yes
 PGP Public-Key encryption    yes        yes
----------------------------------------------------
1. Any digest algorithm OpenSSL supports is automatically picked up.
This is not possible with ciphers, which need to be supported
explicitly.
2. AES is included in OpenSSL since version 0.9.7. If pgcrypto is
compiled against an older version, it will use the built-in AES code,
so AES is always available.
3. SHA2 algorithms were added to OpenSSL in version 0.9.8. For
older versions, pgcrypto will use built-in code.
2.2. NULL handling
~~~~~~~~~~~~~~~~~~~~
As is standard in SQL, all functions return NULL if any of the arguments
are NULL. This may create security risks with careless usage.
2.3. Security
~~~~~~~~~~~~~~~
All the functions here run inside the database server. That means that all
the data and passwords move between pgcrypto and the client application in
clear-text. Thus you must:
1. Connect locally or use SSL connections.
2. Trust both system and database administrator.
If you cannot, then it is better to do the crypto inside the client application.
3. General hashing
--------------------
3.1. digest(data, type)
~~~~~~~~~~~~~~~~~~~~~~~~~
digest(data text, type text) RETURNS bytea
digest(data bytea, type text) RETURNS bytea
The type here is the algorithm to use. Standard algorithms are `md5` and
`sha1`, although there may be more supported, depending on build
options.
Returns binary hash.
If you want hexadecimal string, use `encode()` on result. Example:
CREATE OR REPLACE FUNCTION sha1(bytea) RETURNS text AS $$
SELECT encode(digest($1, 'sha1'), 'hex')
$$ LANGUAGE SQL STRICT IMMUTABLE;
3.2. hmac(data, key, type)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hmac(data text, key text, type text) RETURNS bytea
hmac(data bytea, key text, type text) RETURNS bytea
Calculates Hashed MAC over data. `type` is the same as in `digest()`.
If the key is larger than the hash block size, it will first be hashed and
the result will be used as the key.
It is similar to digest() but the hash can only be recalculated by someone
knowing the key. This prevents the scenario of someone altering the data and
also changing the hash to match.
Returns binary hash.
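For example, to get a hex-encoded HMAC-SHA1 of a message (the message and key
here are only illustrative):
SELECT encode(hmac('message', 'secret key', 'sha1'), 'hex');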
4. Password hashing
---------------------
The functions `crypt()` and `gen_salt()` are specifically designed
for hashing passwords. `crypt()` does the hashing and `gen_salt()`
prepares algorithm parameters for it.
The algorithms in `crypt()` differ from the usual hashing algorithms like
MD5 or SHA1 in the following respects:
1. They are slow. As the amount of data is so small, this is the only
way to make brute-forcing passwords hard.
2. They include a random 'salt' with the result, so that users having the
same password will have different crypted passwords. This is also an
additional defense against reversing the algorithm.
3. They include the algorithm type in the result, so passwords hashed with
different algorithms can co-exist.
4. Some of them are adaptive - that means that as computers get
faster, you can tune the algorithm to be slower, without
introducing incompatibility with existing passwords.
Supported algorithms:
`------`-------------`---------`----------`---------------------------
 Type    Max password  Adaptive  Salt bits  Description
----------------------------------------------------------------------
 `bf`    72            yes       128        Blowfish-based, variant 2a
 `md5`   unlimited     no        48         md5-based crypt()
 `xdes`  8             yes       24         Extended DES
 `des`   8             no        12         Original UNIX crypt
----------------------------------------------------------------------
4.1. crypt(password, salt)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
crypt(password text, salt text) RETURNS text
Calculates a UN*X crypt(3) style hash of the password. When storing a new
password, you need to use the function `gen_salt()` to generate a new salt.
When checking a password you should use the existing hash as the salt.
Example - setting new password:
UPDATE .. SET pswhash = crypt('new password', gen_salt('md5'));
Example - authentication:
SELECT pswhash = crypt('entered password', pswhash) WHERE .. ;
This returns true or false depending on whether the entered password is
correct. It can also return NULL if the `pswhash` field is NULL.
4.2. gen_salt(type)
~~~~~~~~~~~~~~~~~~~~~
gen_salt(type text) RETURNS text
Generates a new random salt for use in `crypt()`. For adaptive
algorithms, it uses the default iteration count.
Accepted types are: `des`, `xdes`, `md5` and `bf`.
4.3. gen_salt(type, rounds)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gen_salt(type text, rounds integer) RETURNS text
Same as above, but lets the user specify the iteration count for some
algorithms. The higher the count, the more time it takes to hash
the password and therefore the more time to break it. Although with
too high a count the time to calculate a hash may be several years
- which is somewhat impractical.
Number is algorithm specific:
`-----'---------'-----'----------
 type     default   min   max
---------------------------------
 `xdes`   725       1     16777215
 `bf`     6         4     31
---------------------------------
In the case of xdes there is an additional limitation that the count must be
an odd number.
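For instance, a Blowfish hash with an explicit iteration count (the count of
8 is just an example):
SELECT crypt('new password', gen_salt('bf', 8));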
Notes:
- Original DES crypt was designed to have the speed of 4 hashes per
second on the hardware of that time.
- Slower than 4 hashes per second would probably dampen usability.
- Faster than 100 hashes per second is probably too fast.
- See next section about possible values for `crypt-bf`.
4.4. Comparison of crypt and regular hashes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is a table that should give an overview of the relative slowness
of different hashing algorithms.
* The goal is to crack an 8-character password, which consists of:
1. Only lowercase letters
2. Numbers, lower- and uppercase letters.
* The table below shows how much time it would take to try all
combinations of characters.
* The `crypt-bf` is featured in several settings - the number
after slash is the `rounds` parameter of `gen_salt()`.
`------------'----------'--------------'--------------------
 Algorithm     Hashes/sec  Chars: [a-z]   Chars: [A-Za-z0-9]
------------------------------------------------------------
 crypt-bf/8    28          246 years      251322 years
 crypt-bf/7    57          121 years      123457 years
 crypt-bf/6    112         62 years       62831 years
 crypt-bf/5    211         33 years       33351 years
 crypt-md5     2681        2.6 years      2625 years
 crypt-des     362837      7 days         19 years
 sha1          590223      4 days         12 years
 md5           2345086     1 day          3 years
------------------------------------------------------------
* The machine used is a 1.5GHz Pentium 4.
* crypt-des and crypt-md5 algorithm numbers are taken from
John the Ripper v1.6.38 `-test` output.
* MD5 numbers are from mdcrack 1.2.
* SHA1 numbers are from lcrack-20031130-beta.
* `crypt-bf` numbers are taken using a simple program that loops
over 1000 8-character passwords. That way I can show the speed with
different numbers of rounds. For reference: `john -test` shows 213
loops/sec for crypt-bf/5. (The small difference in results is in
accordance with the fact that the `crypt-bf` implementation in pgcrypto
is the same one that is used in John the Ripper.)
Note that "try all combinations" is not a realistic exercise.
Usually password cracking is done with the help of dictionaries, which
contain both regular words and various mutations of them. So, even
somewhat word-like passwords could be cracked much faster than the above
numbers suggest, and a 6-character non-word like password may escape
cracking. Or not.
5. PGP encryption
-------------------
The functions here implement the encryption part of the OpenPGP (RFC2440)
standard. Both symmetric-key and public-key encryption are supported.
5.1. Overview
~~~~~~~~~~~~~~~
An encrypted PGP message consists of 2 packets:
- Packet for session key - either symmetric- or public-key encrypted.
- Packet for session-key encrypted data.
When encrypting with password:
1. The given password is hashed using a String2Key (S2K) algorithm. This
is rather similar to a `crypt()` algorithm - purposefully slow
and with a random salt - but it produces a full-length binary key.
2. If a separate session key is requested, a new random key will be
generated. Otherwise the S2K key will be used directly as the session key.
3. If the S2K key is to be used directly, then only the S2K settings will be
put into the session key packet. Otherwise the session key will be encrypted
with the S2K key and put into the session key packet.
When encrypting with public key:
1. A new random session key is generated.
2. It is encrypted using the public key and put into the session key packet.
Now the common part, the session-key encrypted data packet:
1. Optional data manipulation: compression, conversion to UTF-8,
conversion of line-endings.
2. The data is prefixed with a block of random bytes. This is equivalent
to using a random IV.
3. A SHA1 hash of the random prefix and data is appended.
4. All this is encrypted with the session key.
5.2. pgp_sym_encrypt(data, psw)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgp_sym_encrypt(data text, psw text [, options text] ) RETURNS bytea
pgp_sym_encrypt_bytea(data bytea, psw text [, options text] ) RETURNS bytea
Return a symmetric-key encrypted PGP message.
Options are described in section 5.8.
5.3. pgp_sym_decrypt(msg, psw)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgp_sym_decrypt(msg bytea, psw text [, options text] ) RETURNS text
pgp_sym_decrypt_bytea(msg bytea, psw text [, options text] ) RETURNS bytea
Decrypt a symmetric-key encrypted PGP message.
Decrypting bytea data with `pgp_sym_decrypt` is disallowed.
This is to avoid outputting invalid character data. Decrypting
originally textual data with `pgp_sym_decrypt_bytea` is fine.
Options are described in section 5.8.
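A quick round trip for illustration (the message and password are arbitrary):
SELECT pgp_sym_decrypt(pgp_sym_encrypt('Secret message', 'mypass'), 'mypass');
This should return 'Secret message' again.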
5.4. pgp_pub_encrypt(data, pub_key)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgp_pub_encrypt(data text, key bytea [, options text] ) RETURNS bytea
pgp_pub_encrypt_bytea(data bytea, key bytea [, options text] ) RETURNS bytea
Encrypt data with a public key. Giving this function a secret key will
produce an error.
Options are described in section 5.8.
5.5. pgp_pub_decrypt(msg, sec_key [, psw])
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgp_pub_decrypt(msg bytea, key bytea [, psw text [, options text]] ) \
RETURNS text
pgp_pub_decrypt_bytea(msg bytea, key bytea [,psw text [, options text]] ) \
RETURNS bytea
Decrypt a public-key encrypted message with a secret key. If the secret
key is password-protected, you must give the password in `psw`. If
there is no password, but you want to specify options for the function, you
need to give an empty password.
Decrypting bytea data with `pgp_pub_decrypt` is disallowed.
This is to avoid outputting invalid character data. Decrypting
originally textual data with `pgp_pub_decrypt_bytea` is fine.
Options are described in section 5.8.
5.6. pgp_key_id(key / msg)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgp_key_id(key or msg bytea) RETURNS text
If given a PGP public or secret key, it shows you its key ID. If given
an encrypted message, it shows the key ID that was used for
encrypting the data.
It can return 2 special key IDs:
SYMKEY::
The data is encrypted with symmetric key.
ANYKEY::
The data is public-key encrypted, but the key ID is cleared.
That means you need to try all your secret keys on it to see
which one decrypts it. pgcrypto itself does not produce such
messages.
Note that different keys may have the same ID. This is a rare but normal
event. The client application should then try to decrypt with each one,
to see which fits - like handling ANYKEY.
5.7. armor / dearmor
~~~~~~~~~~~~~~~~~~~~~~
armor(data bytea) RETURNS text
dearmor(data text) RETURNS bytea
These wrap/unwrap data into PGP ASCII Armor, which is basically Base64
with a CRC and additional formatting.
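For example, a simple round trip (the input bytes are arbitrary):
SELECT dearmor(armor('test data'));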
5.8. Options for PGP functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Options are named to be similar to GnuPG. Values should be given after
an equal sign; separate options from each other with commas. Example:
pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')
All of the options except `convert-crlf` apply only to encrypt
functions. Decrypt functions get the parameters from PGP data.
The most interesting options are probably `compress-algo` and
`unicode-mode`. The rest should have reasonable defaults.
cipher-algo::
What cipher algorithm to use.
Values: bf, aes128, aes192, aes256 (OpenSSL-only: `3des`, `cast5`)
Default: aes128
Applies: pgp_sym_encrypt, pgp_pub_encrypt
compress-algo::
Which compression algorithm to use. Needs building with zlib.
Values:
0 - no compression
1 - ZIP compression
2 - ZLIB compression [=ZIP plus meta-data and block-CRC's]
Default: 0
Applies: pgp_sym_encrypt, pgp_pub_encrypt
compress-level::
How much to compress. Bigger level compresses smaller but is slower.
0 disables compression.
Values: 0, 1-9
Default: 6
Applies: pgp_sym_encrypt, pgp_pub_encrypt
convert-crlf::
Whether to convert `\n` into `\r\n` when encrypting and `\r\n` to `\n`
when decrypting. RFC2440 specifies that text data should be stored
using `\r\n` line-feeds. Use this to get fully RFC-compliant
behavior.
Values: 0, 1
Default: 0
Applies: pgp_sym_encrypt, pgp_pub_encrypt, pgp_sym_decrypt, pgp_pub_decrypt
disable-mdc::
Do not protect data with SHA-1. The only good reason to use this
option is to achieve compatibility with ancient PGP products, as the
SHA-1 protected packet is from upcoming update to RFC2440. (Currently
at version RFC2440bis-14.) Recent gnupg.org and pgp.com software
supports it fine.
Values: 0, 1
Default: 0
Applies: pgp_sym_encrypt, pgp_pub_encrypt
enable-session-key::
Use a separate session key. Public-key encryption always uses a separate
session key; this option is for symmetric-key encryption, which by default
uses the S2K key directly.
Values: 0, 1
Default: 0
Applies: pgp_sym_encrypt
s2k-mode::
Which S2K algorithm to use.
Values:
0 - Without salt. Dangerous!
1 - With salt but with fixed iteration count.
3 - Variable iteration count.
Default: 3
Applies: pgp_sym_encrypt
s2k-digest-algo::
Which digest algorithm to use in S2K calculation.
Values: md5, sha1
Default: sha1
Applies: pgp_sym_encrypt
s2k-cipher-algo::
Which cipher to use for encrypting separate session key.
Values: bf, aes, aes128, aes192, aes256
Default: use cipher-algo.
Applies: pgp_sym_encrypt
unicode-mode::
Whether to convert textual data from the database internal encoding to
UTF-8 and back. If your database is already UTF-8, no conversion is
done, but the data is tagged as UTF-8. Without this option it will
not be.
Values: 0, 1
Default: 0
Applies: pgp_sym_encrypt, pgp_pub_encrypt
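A minimal sketch of passing options to a decrypt function; note that
`convert-crlf` is honored on both sides of this round trip:

    SELECT pgp_sym_decrypt(
        pgp_sym_encrypt(E'line1\nline2', 'password', 'convert-crlf=1'),
        'password', 'convert-crlf=1');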
5.9. Generating keys with GnuPG
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generate a new key:
gpg --gen-key
The preferred key type is "DSA and Elgamal".
For RSA encryption you must create either a DSA or an RSA sign-only key
as the master and then add an RSA encryption subkey with `gpg --edit-key`.
List keys:
gpg --list-secret-keys
Export ascii-armored public key:
gpg -a --export KEYID > public.key
Export ascii-armored secret key:
gpg -a --export-secret-keys KEYID > secret.key
You need to use `dearmor()` on these keys before giving them to the
pgp_pub_* functions. Alternatively, if you can handle binary data, you
can drop "-a" from the gpg commands.
For more details see `man gpg`, http://www.gnupg.org/gph/en/manual.html[
The GNU Privacy Handbook] and other docs on http://www.gnupg.org[] site.
5.10. Limitations of PGP code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- No support for signing. That also means that it is not checked
whether the encryption subkey belongs to master key.
- No support for encryption key as master key. As such practice
is generally discouraged, it should not be a problem.
- No support for several subkeys. This may seem like a problem, as this
is common practice. On the other hand, you should not use your regular
GPG/PGP keys with pgcrypto, but create new ones, as the usage scenario
is rather different.
6. Raw encryption
-------------------
These functions only run a cipher over data; they don't have any of the
advanced features of PGP encryption. Therefore they have some major problems:
1. They use the user key directly as the cipher key.
2. They don't provide any integrity checking, to see
if the encrypted data was modified.
3. They expect that users manage all encryption parameters
themselves, even the IV.
4. They don't handle text.
So, with the introduction of PGP encryption, usage of raw
encryption functions is discouraged.
encrypt(data bytea, key bytea, type text) RETURNS bytea
decrypt(data bytea, key bytea, type text) RETURNS bytea
encrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea
decrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea
Encrypt/decrypt data with the cipher, padding the data if needed.
The `type` parameter is described in pseudo-notation:
algo ['-' mode] ['/pad:' padding]
Supported algorithms:
* `bf` - Blowfish
* `aes` - AES (Rijndael-128)
Modes:
* `cbc` - next block depends on previous. (default)
* `ecb` - each block is encrypted separately.
(for testing only)
Padding:
* `pkcs` - data may be any length (default)
* `none` - data must be a multiple of the cipher block size.
The IV is the initial value for the mode; it defaults to all zeroes. It is
ignored for ECB. It is clipped or padded with zeroes if not exactly block size.
So, for example:
encrypt(data, 'fooz', 'bf')
is equal to
encrypt(data, 'fooz', 'bf-cbc/pad:pkcs')
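A quick round-trip check and an explicit-IV call (a minimal sketch; the
16-byte IV shown is arbitrary, and the result is the original data as bytea):

    SELECT decrypt(encrypt('secret', 'fooz', 'bf'), 'fooz', 'bf');
    SELECT encrypt_iv('secret', 'fooz', '0123456789abcdef', 'aes-cbc/pad:pkcs');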
7. Random bytes
-----------------
gen_random_bytes(count integer)
Returns `count` cryptographically strong random bytes as a bytea value.
At most 1024 bytes can be extracted at a time. This is to avoid
draining the randomness generator pool.
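For example, to fetch 16 random bytes:

    SELECT gen_random_bytes(16);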
8. Credits
------------
I have used code from the following sources:
`--------------------`-------------------------`-------------------------------
Algorithm Author Source origin
-------------------------------------------------------------------------------
DES crypt() David Burren and others FreeBSD libcrypt
MD5 crypt() Poul-Henning Kamp FreeBSD libcrypt
Blowfish crypt() Solar Designer www.openwall.com
Blowfish cipher Simon Tatham PuTTY
Rijndael cipher Brian Gladman OpenBSD sys/crypto
MD5 and SHA1 WIDE Project KAME kame/sys/crypto
SHA256/384/512 Aaron D. Gifford OpenBSD sys/crypto
BIGNUM math Michael J. Fromberger dartmouth.edu/~sting/sw/imath
-------------------------------------------------------------------------------
9. Legalese
-------------
* I owe a beer to Poul-Henning.
10. References/Links
----------------------
10.1. Useful reading
~~~~~~~~~~~~~~~~~~~~~~
http://www.gnupg.org/gph/en/manual.html[]::
The GNU Privacy Handbook
http://www.openwall.com/crypt/[]::
Describes the crypt-blowfish algorithm.
http://www.stack.nl/~galactus/remailers/passphrase-faq.html[]::
How to choose good password.
http://world.std.com/~reinhold/diceware.html[]::
Interesting idea for picking passwords.
http://www.interhack.net/people/cmcurtin/snake-oil-faq.html[]::
Describes good and bad cryptography.
10.2. Technical references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
http://www.ietf.org/rfc/rfc2440.txt[]::
OpenPGP message format
http://www.imc.org/draft-ietf-openpgp-rfc2440bis[]::
New version of RFC2440.
http://www.ietf.org/rfc/rfc1321.txt[]::
The MD5 Message-Digest Algorithm
http://www.ietf.org/rfc/rfc2104.txt[]::
HMAC: Keyed-Hashing for Message Authentication
http://www.usenix.org/events/usenix99/provos.html[]::
Comparison of crypt-des, crypt-md5 and bcrypt algorithms.
http://csrc.nist.gov/cryptval/des.htm[]::
Standards for DES, 3DES and AES.
http://en.wikipedia.org/wiki/Fortuna_(PRNG)[]::
Description of Fortuna CSPRNG.
http://jlcooke.ca/random/[]::
Jean-Luc Cooke Fortuna-based /dev/random driver for Linux.
http://www.cs.ut.ee/~helger/crypto/[]::
Collection of cryptology pointers.
// $PostgreSQL: pgsql/contrib/pgcrypto/README.pgcrypto,v 1.19 2007/03/28 22:48:58 neilc Exp $

View File

@ -1,88 +0,0 @@
$PostgreSQL: pgsql/contrib/pgrowlocks/README.pgrowlocks,v 1.2 2007/08/27 00:13:51 tgl Exp $
pgrowlocks README Tatsuo Ishii
1. What is pgrowlocks?
pgrowlocks shows row locking information for the specified table.
pgrowlocks returns the following columns:
locked_row TID, -- row TID
lock_type TEXT, -- lock type
locker XID, -- locking XID
multi bool, -- multi XID?
xids xid[], -- multi XIDs
pids INTEGER[] -- locker's process id
Here is a sample execution of pgrowlocks:
test=# SELECT * FROM pgrowlocks('t1');
locked_row | lock_type | locker | multi | xids | pids
------------+-----------+--------+-------+-----------+---------------
(0,1) | Shared | 19 | t | {804,805} | {29066,29068}
(0,2) | Shared | 19 | t | {804,805} | {29066,29068}
(0,3) | Exclusive | 804 | f | {804} | {29066}
(0,4) | Exclusive | 804 | f | {804} | {29066}
(4 rows)
locked_row -- tuple ID (TID) of each locked row
lock_type -- "Shared" for shared lock, "Exclusive" for exclusive lock
locker -- transaction ID of locker (note 1)
multi -- "t" if locker is a multi transaction, otherwise "f"
xids -- XIDs of lockers (note 2)
pids -- process ids of locking backends
note 1: if the locker is a multi-transaction, it represents the multi ID
note 2: if the locker is a multi-transaction, multiple values are shown
2. Installing pgrowlocks
Installing pgrowlocks requires a PostgreSQL 8.0 or later source tree.
$ cd /usr/local/src/postgresql-8.1/contrib
$ tar xfz /tmp/pgrowlocks-1.0.tar.gz
If you are using PostgreSQL 8.0, you need to modify pgrowlocks source code.
Around line 61, you will see:
#undef MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS
change this to:
#define MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS
$ make
$ make install
$ psql -e -f pgrowlocks.sql test
3. How to use pgrowlocks
pgrowlocks grabs AccessShareLock on the target table and reads each
row one by one to get the row locking information. You should
note that:
1) if the table is exclusively locked by someone else, pgrowlocks
will be blocked.
2) pgrowlocks may show incorrect information if a new lock is taken
or a lock is freed during its execution.
pgrowlocks does not show the contents of locked rows. If you want
to take a look at the row contents at the same time, you could do
something like this:
SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid;
4. License
pgrowlocks is distributed under the (modified) BSD license described in
the source file.
5. History
2006/03/21 pgrowlocks version 1.1 released (tested on 8.2 current)
2005/08/22 pgrowlocks version 1.0 released

View File

@ -1,102 +0,0 @@
pgstattuple README 2002/08/29 Tatsuo Ishii
1. Functions supported:
pgstattuple
-----------
pgstattuple() returns the relation length, the percentage of "dead"
tuples in a relation, and other info. This may help users determine
whether vacuum is necessary. Here is an example session:
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
Here are explanations for each column:
table_len -- physical relation length in bytes
tuple_count -- number of live tuples
tuple_len -- total tuples length in bytes
tuple_percent -- live tuples in %
dead_tuple_len -- total dead tuples length in bytes
dead_tuple_percent -- dead tuples in %
free_space -- free space in bytes
free_percent -- free space in %
pg_relpages
-----------
pg_relpages() returns the number of pages in the relation.
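For example:

    test=> SELECT pg_relpages('pg_catalog.pg_proc');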
pgstatindex
-----------
pgstatindex() returns a record showing information about an index:
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0
2. Installing pgstattuple
$ make
$ make install
$ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
3. Using pgstattuple
pgstattuple may be called as a relation function and is
defined as follows:
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;
The argument is the relation name (optionally it may be qualified)
or the OID of the relation. Note that pgstattuple only returns
one row.
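For example, the OID form can be called like this (a minimal sketch
using a regclass cast to obtain the OID):

    SELECT * FROM pgstattuple('pg_catalog.pg_proc'::regclass::oid);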
4. Notes
pgstattuple acquires only a read lock on the relation, so a concurrent
update may affect the result.
pgstattuple judges a tuple to be "dead" if HeapTupleSatisfiesNow()
returns false.
5. History
2007/05/17
Moved page-level functions to contrib/pageinspect.
2006/06/28
Extended to work against indexes.

View File

@ -1,326 +0,0 @@
This directory contains the code for the user-defined type,
SEG, representing laboratory measurements as floating point
intervals.
RATIONALE
=========
The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
that continuum with somewhat fuzzy limits. The measurements come out
as intervals because of uncertainty and randomness, as well as because
the value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.
Using just common sense, it appears more convenient to store such data
as intervals, rather than pairs of numbers. In practice, it even turns
out more efficient in most applications.
Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:
test=> select 6.50 as "pH";
pH
---
6.5
(1 row)
In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
interval contained within a bigger and even fuzzier interval, 6.5,
with their center points being (probably) the only common feature they
share. We definitely do not want such different data items to appear the
same.
Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
a sense that each data element records its own precision.
Check this out:
test=> select '6.25 .. 6.50'::seg as "pH";
pH
------------
6.25 .. 6.50
(1 row)
FILES
=====
Makefile building instructions for the shared library
README.seg the file you are now reading
seg.c the implementation of this data type in c
seg.sql.in SQL code needed to register this type with postgres
(transformed to seg.sql by make)
segdata.h the data structure used to store the segments
segparse.y the grammar file for the parser (used by seg_in() in seg.c)
segscan.l scanner rules (used by seg_yyparse() in segparse.y)
seg-validate.pl a simple input validation script. It is probably a
little stricter than the type itself: for example,
it rejects '22 ' because of the trailing space. Use
as a filter to discard bad values from a single column;
redirect to /dev/null to see the offending input
sort-segments.pl a script to sort the tables having a SEG type column
INSTALLATION
============
To install the type, run
make
make install
The user running "make install" may need root access; depending on how you
configured the PostgreSQL installation paths.
This only installs the type implementation and documentation. To make the
type available in any particular database, do
psql -d databasename < seg.sql
If you install the type in the template1 database, all subsequently created
databases will inherit it.
To test the new type, after "make install" do
make installcheck
If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).
SYNTAX
======
The external representation of an interval is formed using one or two
floating point numbers joined by the range operator ('..' or '...').
Optional certainty indicators (<, > and ~) are ignored by the internal
logic, but are retained in the data.
Grammar
-------
rule 1 seg -> boundary PLUMIN deviation
rule 2 seg -> boundary RANGE boundary
rule 3 seg -> boundary RANGE
rule 4 seg -> RANGE boundary
rule 5 seg -> boundary
rule 6 boundary -> FLOAT
rule 7 boundary -> EXTENSION FLOAT
rule 8 deviation -> FLOAT
Tokens
------
RANGE (\.\.)(\.)?
PLUMIN \'\+\-\'
integer [+-]?[0-9]+
real [+-]?[0-9]+\.[0-9]+
FLOAT ({integer}|{real})([eE]{integer})?
EXTENSION [<>~]
Examples of valid SEG representations:
--------------------------------------
Any number (rules 5,6) -- creates a zero-length segment (a point,
if you will)
~5.0 (rules 5,7) -- creates a zero-length segment AND records
'~' in the data. This notation reads 'approximately 5.0',
but its meaning is not recognized by the code. It is ignored
until you get the value back. View it as a short-hand comment.
<5.0 (rules 5,7) -- creates a point at 5.0; '<' is ignored but
is preserved as a comment
>5.0 (rules 5,7) -- creates a point at 5.0; '>' is ignored but
is preserved as a comment
5(+-)0.3
5'+-'0.3 (rules 1,8) -- creates an interval '4.7..5.3'. As of this
writing (02/09/2000), this mechanism isn't completely accurate
in determining the number of significant digits for the
boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:
postgres=> select '10(+-)1'::seg as seg;
seg
---------
9.0 .. 11 -- should be: 9 .. 11
Also, the (+-) notation is not preserved: 'a(+-)b' will
always be returned as '(a-b) .. (a+b)'. The purpose of this
notation is to allow input from certain data sources without
conversion.
50 .. (rule 3) -- everything that is greater than or equal to 50
.. 0 (rule 4) -- everything that is less than or equal to 0
1.5e-2 .. 2E-2 (rule 2) -- creates an interval (0.015 .. 0.02)
1 ... 2 The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
Because of the widespread use of '...' in the data sources,
I decided to stick to it as a range operator. This, and
also the fact that the white space around the range operator
is ignored, creates a parsing conflict with numeric constants
starting with a decimal point.
Examples of invalid SEG input:
------------------------------
.1e7 should be: 0.1e7
.1 .. .2 should be: 0.1 .. 0.2
2.4 E4 should be: 2.4E4
The following, although it is not a syntax error, is disallowed to improve
the sanity of the data:
5 .. 2 should be: 2 .. 5
PRECISION
=========
The segments are stored internally as pairs of 32-bit floating point
numbers. It means that the numbers with more than 7 significant digits
will be truncated.
The numbers with less than or exactly 7 significant digits retain their
original precision. That is, if your query returns 0.00, you will be
sure that the trailing zeroes are not the artifacts of formatting: they
reflect the precision of the original data. The number of leading
zeroes does not affect precision: the value 0.0067 is considered to
have just 2 significant digits.
USAGE
=====
The access method for SEG is a GiST index (gist_seg_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/cube).
The operators supported by the GiST access method include:
[a, b] << [c, d] Is left of
The left operand, [a, b], occurs entirely to the left of the
right operand, [c, d], on the axis (-inf, inf). It means,
[a, b] << [c, d] is true if b < c and false otherwise
[a, b] >> [c, d] Is right of
[a, b] occurs entirely to the right of [c, d].
[a, b] >> [c, d] is true if a > d and false otherwise
[a, b] &< [c, d] Overlaps or is left of
This might be better read as "does not extend to right of".
It is true when b <= d.
[a, b] &> [c, d] Overlaps or is right of
This might be better read as "does not extend to left of".
It is true when a >= c.
[a, b] = [c, d] Same as
The segments [a, b] and [c, d] are identical, that is, a == c
and b == d
[a, b] && [c, d] Overlaps
The segments [a, b] and [c, d] overlap.
[a, b] @> [c, d] Contains
The segment [a, b] contains the segment [c, d], that is,
a <= c and b >= d
[a, b] <@ [c, d] Contained in
The segment [a, b] is contained in [c, d], that is,
a >= c and b <= d
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
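A minimal sketch of indexed use (the table and column names are
hypothetical):

    CREATE TABLE test_seg (s seg);
    CREATE INDEX test_seg_ix ON test_seg USING gist (s);
    SELECT * FROM test_seg WHERE s && '5.0 .. 6.0'::seg;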
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
Other operators:
[a, b] < [c, d] Less than
[a, b] > [c, d] Greater than
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type
There are a few other potentially useful functions defined in seg.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
For examples of usage, see sql/seg.sql
NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the SEG column (see the script sort-segments.pl for an example)
CREDITS
=======
My thanks are primarily to Prof. Joe Hellerstein
(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
(http://gist.cs.berkeley.edu/). I am also grateful to all postgres
developers, present and past, for enabling myself to create my own
world and live undisturbed in it. And I would like to acknowledge my
gratitude to Argonne Lab and to the U.S. Department of Energy for the
years of faithful support of my database research.
------------------------------------------------------------------------
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
selkovjr@mcs.anl.gov

View File

@ -1,120 +0,0 @@
sslinfo - information about current SSL certificate for PostgreSQL
==================================================================
Author: Victor Wagner <vitus@cryptocom.ru>, Cryptocom LTD
E-Mail of Cryptocom OpenSSL development group: <openssl@cryptocom.ru>
1. Notes
--------
This extension won't build unless your PostgreSQL server is configured
with --with-openssl. The information provided by these functions would
be completely useless if you don't use SSL to connect to the database.
2. Functions Description
------------------------
2.1. ssl_is_used()
~~~~~~~~~~~~~~~~~~
ssl_is_used() RETURNS boolean;
Returns TRUE if the current connection to the server uses SSL, and FALSE
otherwise.
2.2. ssl_client_cert_present()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ssl_client_cert_present() RETURNS boolean
Returns TRUE if the current client has presented a valid SSL client
certificate to the server, and FALSE otherwise (e.g., no SSL, or the
certificate was not requested by the server).
2.3. ssl_client_serial()
~~~~~~~~~~~~~~~~~~~~~~~~
ssl_client_serial() RETURNS numeric
Returns the serial number of the current client certificate. The
combination of certificate serial number and certificate issuer is
guaranteed to uniquely identify a certificate (but not its owner -- the
owner ought to regularly change keys and get new certificates from the
issuer). So, if you run your own CA and allow only certificates from
this CA to be accepted by the server, the serial number is the most
reliable (albeit not very mnemonic) means to identify a user.
2.4. ssl_client_dn()
~~~~~~~~~~~~~~~~~~~~
ssl_client_dn() RETURNS text
Returns the full subject of current client certificate, converting
character data into the current database encoding. It is assumed that
if you use non-Latin characters in the certificate names, your
database is able to represent these characters, too. If your database
uses the SQL_ASCII encoding, non-Latin characters in the name will be
represented as UTF-8 sequences.
The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
2.5. ssl_issuer_dn()
~~~~~~~~~~~~~~~~~~~~
Returns the full issuer name of the client certificate, converting
character data into current database encoding.
The combination of the return value of this function with the
certificate serial number uniquely identifies the certificate.
The result of this function is really useful only if you have more
than one trusted CA certificate in your server's root.crt file, or if
this CA has issued some intermediate certificate authority
certificates.
2.6. ssl_client_dn_field()
~~~~~~~~~~~~~~~~~~~~~~~~~~
ssl_client_dn_field(fieldName text) RETURNS text
This function returns the value of the specified field in the
certificate subject. Field names are string constants that are
converted into ASN.1 object identifiers using the OpenSSL object
database. The following values are acceptable:
commonName (alias CN)
surname (alias SN)
name
givenName (alias GN)
countryName (alias C)
localityName (alias L)
stateOrProvinceName (alias ST)
organizationName (alias O)
organizationUnitName (alias OU)
title
description
initials
postalCode
streetAddress
generationQualifier
description
dnQualifier
x500UniqueIdentifier
pseudonym
role
emailAddress
All of these fields are optional, except commonName. It depends
entirely on your CA policy which of them would be included and which
wouldn't. The meaning of these fields, however, is strictly defined by
the X.500 and X.509 standards, so you cannot just assign arbitrary
meaning to them.
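For example (a minimal sketch):

    SELECT ssl_client_dn_field('commonName');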
2.7. ssl_issuer_field()
~~~~~~~~~~~~~~~~~~~~~~~~
ssl_issuer_field(fieldName text) RETURNS text;
Does the same as ssl_client_dn_field, but for the certificate issuer
rather than the certificate subject.

View File

@ -1,642 +0,0 @@
/*
* tablefunc
*
* Sample to demonstrate C functions which return setof scalar
* and setof composite.
* Joe Conway <mail@joeconway.com>
* And contributors:
* Nabil Sayegh <postgresql@e-trolley.de>
*
* Copyright (c) 2002-2007, PostgreSQL Global Development Group
*
* Permission to use, copy, modify, and distribute this software and its
* documentation for any purpose, without fee, and without a written agreement
* is hereby granted, provided that the above copyright notice and this
* paragraph and the following two paragraphs appear in all copies.
*
* IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
* DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
* LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
* DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*
* THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
* AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
* ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
* PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
*
*/
Version 0.1 (20 July, 2002):
First release
Release Notes:
Version 0.1
- initial release
Installation:
Place these files in a directory called 'tablefunc' under 'contrib' in the
PostgreSQL source tree. Then run:
make
make install
You can use tablefunc.sql to create the functions in your database of choice, e.g.
psql -U postgres template1 < tablefunc.sql
installs the following functions into the database template1:
normal_rand(int numvals, float8 mean, float8 stddev)
- returns a set of normally distributed float8 values
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per the instructions
in the documentation below.
crosstab(text sql)
- returns a set of row_name plus N category value columns
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
crosstab(text sql, N int)
- obsolete version of crosstab()
- the argument N is now ignored, since the number of value columns
is always determined by the calling query
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld], text start_with, int max_depth
[, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
and an optional serial column for ordering siblings
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Documentation
==================================================================
Name
normal_rand(int, float8, float8) - returns a set of normally
distributed float8 values
Synopsis
normal_rand(int numvals, float8 mean, float8 stddev)
Inputs
numvals
the number of random values to be returned from the function
mean
the mean of the normal distribution of values
stddev
the standard deviation of the normal distribution of values
Outputs
Returns setof float8, where the returned set of random values are normally
distributed (Gaussian distribution)
Example usage
test=# SELECT * FROM
test=# normal_rand(1000, 5, 3);
normal_rand
----------------------
1.56556322244898
9.10040991424657
5.36957140345079
-0.369151492880995
0.283600703686639
.
.
.
4.82992125404908
9.71308014517282
2.49639286969028
(1000 rows)
Returns 1000 values with a mean of 5 and a standard deviation of 3.
==================================================================
Name
crosstabN(text) - returns a set of row_name plus N category value columns
Synopsis
crosstabN(text sql)
Inputs
sql
A SQL statement which produces the source set of data. The SQL statement
must return one row_name column, one category column, and one value
column. row_name and value must be of type text.
e.g. provided sql must produce a set something like:
row_name cat value
----------+-------+-------
row1 cat1 val1
row1 cat2 val2
row1 cat3 val3
row1 cat4 val4
row2 cat1 val5
row2 cat2 val6
row2 cat3 val7
row2 cat4 val8
Outputs
Returns setof tablefunc_crosstab_N, which is defined by:
CREATE TYPE tablefunc_crosstab_N AS (
row_name TEXT,
category_1 TEXT,
category_2 TEXT,
.
.
.
category_N TEXT
);
for the default installed functions, where N is 2, 3, or 4.
e.g. the provided crosstab2 function produces a set something like:
<== values columns ==>
row_name category_1 category_2
---------+------------+------------
row1 val1 val2
row2 val5 val6
Notes
1. The sql result must be ordered by 1,2.
2. The number of values columns depends on the tuple description
of the function's declared return type.
3. Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
4. Extra values (i.e. too many adjacent rows of same row_name to fill
the number of result values columns) are skipped.
5. Rows with all nulls in the values columns are skipped.
6. The installed defaults are for illustration purposes. You
can create your own return types and functions based on the
crosstab() function of the installed library. See below for
details.
Example usage
create table ct(id serial, rowclass text, rowid text, attribute text, value text);
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');
select * from crosstab3(
'select rowid, attribute, value
from ct
where rowclass = ''group1''
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;');
row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
test1 | val2 | val3 |
test2 | val6 | val7 |
(2 rows)
==================================================================
Name
crosstab(text) - returns a set of row_names plus category value columns
Synopsis
crosstab(text sql)
crosstab(text sql, int N)
Inputs
sql
A SQL statement which produces the source set of data. The SQL statement
must return one row_name column, one category column, and one value
column.
e.g. provided sql must produce a set something like:
row_name cat value
----------+-------+-------
row1 cat1 val1
row1 cat2 val2
row1 cat3 val3
row1 cat4 val4
row2 cat1 val5
row2 cat2 val6
row2 cat3 val7
row2 cat4 val8
N
Obsolete argument; ignored if supplied (formerly this had to match
the number of category columns determined by the calling query)
Outputs
Returns setof record, which must be defined with a column definition
in the FROM clause of the SELECT statement, e.g.:
SELECT *
FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text);
the example crosstab function produces a set something like:
<== values columns ==>
row_name category_1 category_2
---------+------------+------------
row1 val1 val2
row2 val5 val6
Notes
1. The sql result must be ordered by 1,2.
2. The number of values columns is determined by the column definition
provided in the FROM clause. The FROM clause must define one
row_name column (of the same datatype as the first result column
of the sql query) followed by N category columns (of the same
datatype as the third result column of the sql query). You can
set up as many category columns as you wish.
3. Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
4. Extra values (i.e. too many adjacent rows of same row_name to fill
the number of result values columns) are skipped.
5. Rows with all nulls in the values columns are skipped.
6. You can avoid always having to write out a FROM clause that defines the
output columns by setting up a custom crosstab function that has
the desired output row type wired into its definition.
There are two ways you can set up a custom crosstab function:
A. Create a composite type to define your return type, similar to the
examples in the installation script. Then define a unique function
name accepting one text parameter and returning setof your_type_name.
For example, if your source data produces row_names that are TEXT,
and values that are FLOAT8, and you want 5 category columns:
CREATE TYPE my_crosstab_float8_5_cols AS (
row_name TEXT,
category_1 FLOAT8,
category_2 FLOAT8,
category_3 FLOAT8,
category_4 FLOAT8,
category_5 FLOAT8
);
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
RETURNS setof my_crosstab_float8_5_cols
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
B. Use OUT parameters to define the return type implicitly.
The same example could also be done this way:
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text,
OUT row_name TEXT,
OUT category_1 FLOAT8,
OUT category_2 FLOAT8,
OUT category_3 FLOAT8,
OUT category_4 FLOAT8,
OUT category_5 FLOAT8)
RETURNS setof record
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
Example usage
create table ct(id serial, rowclass text, rowid text, attribute text, value text);
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');
SELECT *
FROM crosstab(
'select rowid, attribute, value
from ct
where rowclass = ''group1''
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3)
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
test1 | val2 | val3 |
test2 | val6 | val7 |
(2 rows)
==================================================================
Name
crosstab(text, text) - returns a set of row_name, extra, and
category value columns
Synopsis
crosstab(text source_sql, text category_sql)
Inputs
source_sql
A SQL statement which produces the source set of data. The SQL statement
must return one row_name column, one category column, and one value
column. It may also have one or more "extra" columns.
The row_name column must be first. The category and value columns
must be the last two columns, in that order. "extra" columns must be
columns 2 through (N - 2), where N is the total number of columns.
The "extra" columns are assumed to be the same for all rows with the
same row_name. The values returned are copied from the first row
with a given row_name and subsequent values of these columns are ignored
until row_name changes.
e.g. source_sql must produce a set something like:
SELECT row_name, extra_col, cat, value FROM foo;
row_name extra_col cat value
----------+------------+-----+---------
row1 extra1 cat1 val1
row1 extra1 cat2 val2
row1 extra1 cat4 val4
row2 extra2 cat1 val5
row2 extra2 cat2 val6
row2 extra2 cat3 val7
row2 extra2 cat4 val8
category_sql
A SQL statement which produces the distinct set of categories. The SQL
statement must return one category column only. category_sql must produce
at least one result row or an error will be generated. category_sql
must not produce duplicate categories or an error will be generated.
e.g. SELECT DISTINCT cat FROM foo;
cat
-------
cat1
cat2
cat3
cat4
Outputs
Returns setof record, which must be defined with a column definition
in the FROM clause of the SELECT statement, e.g.:
SELECT * FROM crosstab(source_sql, cat_sql)
AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
the example crosstab function produces a set something like:
<== values columns ==>
row_name extra cat1 cat2 cat3 cat4
---------+-------+------+------+------+------
row1 extra1 val1 val2 val4
row2 extra2 val5 val6 val7 val8
Notes
1. source_sql must be ordered by row_name (column 1).
2. The number of values columns is determined at run-time. The
column definition provided in the FROM clause must provide for
the correct number of columns of the proper data types.
3. Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
4. Extra values (i.e. source rows with category not found in category_sql
result) are skipped.
5. Rows with a null row_name column are skipped.
6. You can create predefined functions to avoid having to write out
the result column names/types in each query. See the examples
for crosstab(text).
Example usage
create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');
SELECT * FROM crosstab
(
'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
rowid text,
rowdt timestamp,
temperature int4,
test_result text,
test_startdate timestamp,
volts float8
);
rowid | rowdt | temperature | test_result | test_startdate | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)
==================================================================
Name
connectby(text, text, text[, text], text, text, int[, text]) - returns a set
representing a hierarchy (tree structure)
Synopsis
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld], text start_with, int max_depth
[, text branch_delim])
Inputs
relname
Name of the source relation
keyid_fld
Name of the key field
parent_keyid_fld
Name of the key_parent field
orderby_fld
If optional ordering of siblings is desired:
Name of the field to order siblings
start_with
root value of the tree input as a text value regardless of keyid_fld type
max_depth
zero (0) for unlimited depth, otherwise restrict level to this depth
branch_delim
If optional branch value is desired, this string is used as the delimiter.
When not provided, a default value of '~' is used for internal
recursion detection only, and no "branch" field is returned.
Outputs
Returns setof record, which must be defined with a column definition
in the FROM clause of the SELECT statement, e.g.:
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
- or -
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
- or -
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int);
- or -
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int);
Notes
1. keyid and parent_keyid must be the same data type
2. The column definition *must* include a third column of type INT4 for
the level value output
3. If the branch field is not desired, omit both the branch_delim input
parameter *and* the branch field in the query column definition. Note
that when branch_delim is not provided, a default value of '~' is used
for branch_delim for internal recursion detection, even though the branch
field is not returned.
4. If the branch field is desired, it must be the fourth column in the query
column definition, and it must be type TEXT.
5. The parameters representing table and field names must include double
quotes if the names are mixed-case or contain special characters.
6. If sorting of siblings is desired, the orderby_fld input parameter *and*
a name for the resulting serial field (type INT4) in the query column
definition must be given.
Example usage
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);
-- with branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+---------------------
row2 | | 0 | row2
row4 | row2 | 1 | row2~row4
row6 | row4 | 2 | row2~row4~row6
row8 | row6 | 3 | row2~row4~row6~row8
row5 | row2 | 1 | row2~row5
row9 | row5 | 2 | row2~row5~row9
(6 rows)
-- without branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
keyid | parent_keyid | level
-------+--------------+-------
row2 | | 0
row4 | row2 | 1
row6 | row4 | 2
row8 | row6 | 3
row5 | row2 | 1
row9 | row5 | 2
(6 rows)
-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos;
keyid | parent_keyid | level | branch | pos
-------+--------------+-------+---------------------+-----
row2 | | 0 | row2 | 1
row5 | row2 | 1 | row2~row5 | 2
row9 | row5 | 2 | row2~row5~row9 | 3
row4 | row2 | 1 | row2~row4 | 4
row6 | row4 | 2 | row2~row4~row6 | 5
row8 | row6 | 3 | row2~row4~row6~row8 | 6
(6 rows)
-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos;
keyid | parent_keyid | level | pos
-------+--------------+-------+-----
row2 | | 0 | 1
row5 | row2 | 1 | 2
row9 | row5 | 2 | 3
row4 | row2 | 1 | 4
row6 | row4 | 2 | 5
row8 | row6 | 3 | 6
(6 rows)
==================================================================
-- Joe Conway

View File

@ -1,97 +0,0 @@
UUID Generation Functions
=========================
Peter Eisentraut <peter_e@gmx.net>
This module provides functions to generate universally unique
identifiers (UUIDs) using one of the several standard algorithms, as
well as functions to produce certain special UUID constants.
Installation
------------
The extra library required can be found at
<http://www.ossp.org/pkg/lib/uuid/>.
UUID Generation
---------------
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
Each of these algorithms could be suitable for a different set of
applications.
uuid_generate_v1()
~~~~~~~~~~~~~~~~~~
This function generates a version 1 UUID. This involves the MAC
address of the computer and a time stamp. Note that UUIDs of this
kind reveal the identity of the computer that created the identifier
and the time at which it did so, which might make it unsuitable for
certain security-sensitive applications.
uuid_generate_v1mc()
~~~~~~~~~~~~~~~~~~~~
This function generates a version 1 UUID but uses a random multicast
MAC address instead of the real MAC address of the computer.
uuid_generate_v3(namespace uuid, name text)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the uuid_ns_*() functions shown below. (It
could be any UUID in theory.) The name is an identifier in the
selected namespace. For example:
uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
uuid_generate_v4()
~~~~~~~~~~~~~~~~~~
This function generates a version 4 UUID, which is derived entirely
from random numbers.
uuid_generate_v5(namespace uuid, name text)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function generates a version 5 UUID, which works like a version 3
UUID except that SHA-1 is used as a hashing method. Version 5 should
be preferred over version 3 because SHA-1 is thought to be more secure
than MD5.
UUID Constants
--------------
uuid_nil()
A "nil" UUID constant, which does not occur as a real UUID.
uuid_ns_dns()
Constant designating the DNS namespace for UUIDs.
uuid_ns_url()
Constant designating the URL namespace for UUIDs.
uuid_ns_oid()
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in
PostgreSQL.)
uuid_ns_x500()
Constant designating the X.500 distinguished name (DN) namespace for
UUIDs.
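For example, a name-based version 5 UUID in the DNS namespace could be
generated like this (a minimal sketch):

    SELECT uuid_generate_v5(uuid_ns_dns(), 'www.postgresql.org');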

View File

@ -1,58 +0,0 @@
$PostgreSQL: pgsql/contrib/vacuumlo/README.vacuumlo,v 1.5 2005/06/23 00:06:37 tgl Exp $
This is a simple utility that will remove any orphaned large objects out of a
PostgreSQL database. An orphaned LO is considered to be any LO whose OID
does not appear in any OID data column of the database.
If you use this, you may also be interested in the lo_manage trigger in
contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs
in the first place.
Compiling
--------
Simply run make. A single executable "vacuumlo" is created.
Usage
-----
vacuumlo [options] database [database2 ... databasen]
All databases named on the command line are processed. Available options
include:
-v Write a lot of progress messages
-n Don't remove large objects, just show what would be done
-U username Username to connect as
-W Prompt for password
-h hostname Database server host
-p port Database server port
Method
------
First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.
It then scans through all columns in the database that are of type "oid"
or "lo", and removes matching entries from the temporary table.
The remaining entries in the temp table identify orphaned LOs. These are
removed.
Notes
-----
I decided to place this in contrib as it needs further testing, but hopefully,
this (or a variant of it) would make it into the backend as a "vacuum lo"
command in a later release.
Peter Mount <peter@retep.org.uk>
http://www.retep.org.uk
March 21 1999
Committed April 10 1999 Peter

View File

@ -1,278 +0,0 @@
XML-handling functions for PostgreSQL
=====================================
DEPRECATION NOTICE: From PostgreSQL 8.3 on, there is XML-related
functionality based on the SQL/XML standard in the core server.
That functionality covers XML syntax checking and XPath queries,
which is what this module does as well, and more, but the API is
not at all compatible. It is planned that this module will be
removed in PostgreSQL 8.4 in favor of the newer standard API, so
you are encouraged to try converting your applications. If you
find that some of the functionality of this module is not
available in an adequate form with the newer API, please explain
your issue to pgsql-hackers@postgresql.org so that the deficiency
can be addressed.
-- Peter Eisentraut, 2007-05-24
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com)
It has the same BSD licence as PostgreSQL.
This version of the XML functions provides both XPath querying and
XSLT functionality. There is also a new table function which allows
the straightforward return of multiple XML results. Note that the current code
doesn't take any particular care over character sets - this is
something that should be fixed at some point!
Installation
------------
The current build process will only work if the files are in
contrib/xml2 in a PostgreSQL 7.3 or later source tree which has been
configured and built (If you alter the subdir value in the Makefile
you can place it in a different directory in a PostgreSQL tree).
Before you begin, just check the Makefile, and then just 'make' and
'make install'.
By default, this module requires both libxml2 and libxslt to be installed
on your system. If you do not have libxslt or do not want to use XSLT
functions, you must edit the Makefile to not build the XSLT functions,
as directed in its comments; and edit pgxml.sql.in to remove the XSLT
function declarations, as directed in its comments.
Description of functions
------------------------
The first set of functions are straightforward XML parsing and XPath queries:
xml_is_well_formed(document) RETURNS bool
This parses the document text in its parameter and returns true if the
document is well-formed XML. (Note: before PostgreSQL 8.2, this function
was called xml_valid(). That is the wrong name since validity and
well-formedness have different meanings in XML. The old name is still
available, but is deprecated and will be removed in 8.3.)
xpath_string(document,query) RETURNS text
xpath_number(document,query) RETURNS float4
xpath_bool(document,query) RETURNS bool
These functions evaluate the XPath query on the supplied document, and
cast the result to the specified type.
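For example (a minimal sketch with an inline document):

    SELECT xpath_string('<doc><name>Alice</name></doc>', '/doc/name/text()');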
xpath_nodeset(document,query,toptag,itemtag) RETURNS text
This evaluates query on document and wraps the result in XML tags. If
the result is multivalued, the output will look like:
<toptag>
<itemtag>Value 1 which could be an XML fragment</itemtag>
<itemtag>Value 2....</itemtag>
</toptag>
If either toptag or itemtag is an empty string, the relevant tag is omitted.
There are also wrapper functions for this operation:
xpath_nodeset(document,query) RETURNS text omits both tags.
xpath_nodeset(document,query,itemtag) RETURNS text omits toptag.
xpath_list(document,query,separator) RETURNS text
This function returns multiple values separated by the specified
separator, e.g. Value 1,Value 2,Value 3 if separator=','.
xpath_list(document,query) RETURNS text
This is a wrapper for the above function that uses ',' as the separator.
xpath_table
-----------
This is a table function which evaluates a set of XPath queries on
each of a set of documents and returns the results as a table. The
primary key field from the original document table is returned as the
first column of the result so that the resultset from xpath_table can
be readily used in joins.
The function itself takes 5 arguments, all text.
xpath_table(key,document,relation,xpaths,criteria)
key - the name of the "key" field - this is just a field to be used as
the first column of the output table i.e. it identifies the record from
which each output row came (see note below about multiple values).
document - the name of the field containing the XML document
relation - the name of the table or view containing the documents
xpaths - multiple xpath expressions separated by |
criteria - The contents of the where clause. This needs to be specified,
so use "true" or "1=1" here if you want to process all the rows in the
relation.
NB These parameters (except the XPath strings) are just substituted
into a plain SQL SELECT statement, so you have some flexibility - the
statement is
SELECT <key>,<document> FROM <relation> WHERE <criteria>
so those parameters can be *anything* valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
validate any user-supplied values to avoid SQL injection attacks.
Using the function
The function has to be used in a FROM expression. This gives the following
form:
SELECT * FROM
xpath_table('article_id',
'article_xml',
'articles',
'/article/author|/article/pages|/article/title',
'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
The AS clause defines the names and types of the columns in the
virtual table. If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.
Note that I've said in this example that pages is an integer. The
function deals internally with string representations, so when you say
you want an integer in the output, it will take the string
representation of the XPath result and use PostgreSQL input functions
to transform it into an integer (or whatever type the AS clause
requests). An error will result if it can't do this - for example if
the result is empty - so you may wish to just stick to 'text' as the
column type if you think your data has any problems.
The select statement doesn't need to use * alone - it can reference the
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id','article_xml','articles',
'/article/title|/article/author/@id',
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
AS t(article_id integer, title text, author_id integer),
tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
as a more complicated example. Of course, you could wrap all
of this in a view for convenience.
Multivalued results
The xpath_table function assumes that the results of each XPath query
might be multi-valued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.
In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier) - if used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:
CREATE TABLE test
(
id int4 NOT NULL,
xml text,
CONSTRAINT pk PRIMARY KEY (id)
)
WITHOUT OIDS;
INSERT INTO test VALUES (1, '<doc num="C1">
<line num="L1"><a>1</a><b>2</b><c>3</c></line>
<line num="L2"><a>11</a><b>22</b><c>33</c></line>
</doc>');
INSERT INTO test VALUES (2, '<doc num="C2">
<line num="L1"><a>111</a><b>222</b><c>333</c></line>
<line num="L2"><a>111</a><b>222</b><c>333</c></line>
</doc>');
The query:
SELECT * FROM xpath_table('id','xml','test',
'/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num;
Gives the result:
id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
1 | C1 | L1 | 1 | 2 | 3
1 | | L2 | 11 | 22 | 33
To get doc_num on every line, the solution is to use two invocations
of xpath_table and join the results:
SELECT t.*,i.doc_num FROM
xpath_table('id','xml','test',
'/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
xpath_table('id','xml','test','/doc/@num','1=1')
AS i(id int4, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;
which gives the desired result:
id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
1 | L1 | 1 | 2 | 3 | C1
1 | L2 | 11 | 22 | 33 | C1
(2 rows)
XSLT functions
--------------
The following functions are available if libxslt is installed (this is
not currently detected automatically, so you will have to amend the
Makefile)
xslt_process(document,stylesheet,paramlist) RETURNS text
This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
'a=1,b=2'. Note that this is also proof-of-concept code and the
parameter parsing is very simple-minded (e.g. parameter values cannot
contain commas!).
Also note that if either the document or stylesheet values do not
begin with a < then they will be treated as URLs and libxslt will
fetch them. It thus follows that you can use xslt_process as a means
to fetch the contents of URLs - you should be aware of the security
implications of this.
There is also a two-parameter version of xslt_process which does not
pass any parameters to the transformation.
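For example (a minimal sketch - the table docs and its columns doc_xml and
style_xml are hypothetical):
  -- with stylesheet parameters
  SELECT xslt_process(doc_xml, style_xml, 'pagetitle=Report') FROM docs;
  -- without parameters
  SELECT xslt_process(doc_xml, style_xml) FROM docs;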
Feedback
--------
If you have any comments or suggestions, please do contact me at
jgray@azuli.co.uk. Unfortunately, this isn't my main job, so I can't
guarantee a rapid response to your query!

View File

@ -0,0 +1,32 @@
<sect1>
<title>adminpack</title>
<para>
adminpack is a PostgreSQL standard module that implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality if installed on a server.
</para>
<sect2>
<title>Functions implemented</title>
<para>
Functions implemented by adminpack can only be run by a superuser. Here's a
list of these functions:
</para>
<para>
<programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
</programlisting>
</para>
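  <para>
   For example (a minimal sketch; the file name is hypothetical and the
   calls must be made as a superuser):
  </para>
  <para>
   <programlisting>
-- create a small file below the server's data directory
SELECT pg_catalog.pg_file_write('adminpack_test.txt', 'hello', false);
-- and remove it again
SELECT pg_catalog.pg_file_unlink('adminpack_test.txt');
   </programlisting>
  </para>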
</sect2>
</sect1>

View File

@ -0,0 +1,40 @@
<sect1>
<!--
<indexterm zone="btree-gist">
<primary>btree-gist</primary>
</indexterm>
-->
<title>btree-gist</title>
<para>
  btree-gist is a B-Tree implementation using GiST that supports the int2, int4,
  int8, float4, float8, timestamp with/without time zone, time
  with/without time zone, date, interval, oid, money, macaddr, char,
  varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
</para>
<sect2>
<title>Example usage</title>
<programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a < 10;
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
   All work was done by Teodor Sigaev (<email>teodor@stack.net</email>),
Oleg Bartunov (<email>oleg@sai.msu.su</email>), Janko Richter
(<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for additional
information.
</para>
</sect2>
</sect1>

View File

@ -1,37 +1,32 @@
Pg_buffercache - Real time queries on the shared buffer cache.
--------------
This module consists of a C function 'pg_buffercache_pages()' that returns
a set of records, plus a view 'pg_buffercache' to wrapper the function.
The intent is to do for the buffercache what pg_locks does for locks, i.e -
ability to examine what is happening at any given time without having to
restart or rebuild the server with debugging code added.
<sect1 id="buffercache">
<title>pg_buffercache</title>
<indexterm zone="buffercache">
<primary>pg_buffercache</primary>
</indexterm>
<para>
  The <literal>pg_buffercache</literal> module provides the means for examining
what's happening to the buffercache at any given time without having to
restart or rebuild the server with debugging code added. The intent is to
do for the buffercache what pg_locks does for locks.
</para>
<para>
This module consists of a C function <literal>pg_buffercache_pages()</literal>
that returns a set of records, plus a view <literal>pg_buffercache</literal>
  that wraps the function.
</para>
<para>
By default public access is REVOKED from both of these, just in case there
are security issues lurking.
</para>
Installation
------------
Build and install the main Postgresql source, then this contrib module:
$ cd contrib/pg_buffercache
$ gmake
$ gmake install
To register the functions:
$ psql -d <database> -f pg_buffercache.sql
Notes
-----
The definition of the columns exposed in the view is:
<sect2>
<title>Notes</title>
<para>
The definition of the columns exposed in the view is:
</para>
<programlisting>
Column | references | Description
----------------+----------------------+------------------------------------
bufferid | | Id, 1..shared_buffers.
@ -41,23 +36,27 @@ Notes
relblocknumber | | Offset of the page in the relation.
isdirty | | Is the page dirty?
usagecount | | Page LRU count
</programlisting>
<para>
There is one row for each buffer in the shared cache. Unused buffers are
shown with all fields null except bufferid.
</para>
<para>
Because the cache is shared by all the databases, there are pages from
relations not belonging to the current database.
</para>
<para>
When the pg_buffercache view is accessed, internal buffer manager locks are
taken, and a copy of the buffer cache data is made for the view to display.
This ensures that the view produces a consistent set of results, while not
blocking normal buffer activity longer than necessary. Nonetheless there
could be some impact on database performance if this view is read often.
</para>
</sect2>
There is one row for each buffer in the shared cache. Unused buffers are
shown with all fields null except bufferid.
Because the cache is shared by all the databases, there are pages from
relations not belonging to the current database.
When the pg_buffercache view is accessed, internal buffer manager locks are
taken, and a copy of the buffer cache data is made for the view to display.
This ensures that the view produces a consistent set of results, while not
blocking normal buffer activity longer than necessary. Nonetheless there
could be some impact on database performance if this view is read often.
Sample output
-------------
<sect2>
<title>Sample output</title>
<programlisting>
regression=# \d pg_buffercache;
View "public.pg_buffercache"
Column | Type | Modifiers
@ -98,18 +97,25 @@ Sample output
(10 rows)
regression=#
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<itemizedlist>
<listitem>
<para>
Mark Kirkwood <email>markir@paradise.net.nz</email>
</para>
</listitem>
<listitem>
<para>Design suggestions: Neil Conway <email>neilc@samurai.com</email></para>
</listitem>
<listitem>
<para>Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email></para>
</listitem>
</itemizedlist>
</sect2>
Author
------
</sect1>
* Mark Kirkwood <markir@paradise.net.nz>
Help
----
* Design suggestions : Neil Conway <neilc@samurai.com>
* Debugging advice : Tom Lane <tgl@sss.pgh.pa.us>
Thanks guys!

84
doc/src/sgml/chkpass.sgml Normal file
View File

@ -0,0 +1,84 @@
<sect1 id="chkpass">
<title>chkpass</title>
<!--
<indexterm zone="chkpass">
<primary>chkpass</primary>
</indexterm>
-->
<para>
chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
It also returns an error if the code determines that the password is easily
crackable. This is currently a stub that does nothing.
</para>
<para>
Note that the chkpass data type is not indexable.
<!--
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
-->
</para>
<para>
If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.
</para>
<para>
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon then use the raw() function. This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
</para>
<para>
The encryption uses the standard Unix function crypt(), and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
</para>
<para>
Here is some sample usage:
</para>
<programlisting>
test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
INSERT 0 1
test=# select * from test;
p
----------------
:dVGkpXdOrE3ko
(1 row)
test=# select raw(p) from test;
raw
---------------
dVGkpXdOrE3ko
(1 row)
test=# select p = 'hello' from test;
?column?
----------
t
(1 row)
test=# select p = 'goodbye' from test;
?column?
----------
f
(1 row)
</programlisting>
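 <para>
  As noted above, an already-encrypted value can be loaded verbatim by
  prefixing it with a colon (a sketch reusing the table from the sample;
  the hash is the one produced earlier):
 </para>
 <programlisting>
test=# insert into test values (':dVGkpXdOrE3ko');
INSERT 0 1
test=# select p = 'hello' from test;
 ?column?
----------
 t
 t
(2 rows)
 </programlisting>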
<sect2>
<title>Author</title>
<para>
D'Arcy J.M. Cain <email>darcy@druid.net</email>
</para>
</sect2>
</sect1>

56
doc/src/sgml/contrib.sgml Normal file
View File

@ -0,0 +1,56 @@
<chapter id="contrib">
<title>Standard Modules</title>
<para>
This section contains information regarding the standard modules which
can be found in the <literal>contrib</literal> directory of the
PostgreSQL distribution. These are porting tools, analysis utilities,
and plug-in features that are not part of the core PostgreSQL system,
mainly because they address a limited audience or are too experimental
to be part of the main source tree. This does not preclude their
usefulness.
</para>
<para>
Some modules supply new user-defined functions, operators, or types. In
these cases, you will need to run <literal>make</literal> and <literal>make
install</literal> in <literal>contrib/module</literal>. After you have
installed the files you need to register the new entities in the database
system by running the commands in the supplied .sql file. For example,
<programlisting>
$ psql -d dbname -f module.sql
</programlisting>
</para>
&adminpack;
&btree-gist;
&chkpass;
&cube;
&dblink;
&earthdistance;
&fuzzystrmatch;
&hstore;
&intagg;
&intarray;
&isn;
&lo;
&ltree;
&oid2name;
&pageinspect;
&pgbench;
&buffercache;
&pgcrypto;
&freespacemap;
&pgrowlocks;
&standby;
&pgstattuple;
&trgm;
&seg;
&sslinfo;
&tablefunc;
&uuid-ossp;
&vacuumlo;
&xml2;
</chapter>

529
doc/src/sgml/cube.sgml Normal file
View File

@ -0,0 +1,529 @@
<sect1 id="cube">
<title>cube</title>
<indexterm zone="cube">
<primary>cube</primary>
</indexterm>
<para>
This module contains the user-defined type, CUBE, representing
multidimensional cubes.
</para>
<sect2>
<title>Syntax</title>
<para>
The following are valid external representations for the CUBE type:
</para>
<table>
<title>Cube external representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry>'x'</entry>
<entry>A floating point value representing a one-dimensional point or
one-dimensional zero length cubement
</entry>
</row>
<row>
<entry>'(x)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'x1,x2,x3,...,xn'</entry>
<entry>A point in n-dimensional space, represented internally as a zero
volume box
</entry>
</row>
<row>
<entry>'(x1,x2,x3,...,xn)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x),(y)'</entry>
<entry>1-D cubement starting at x and ending at y or vice versa; the
order does not matter
</entry>
</row>
<row>
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
<entry>n-dimensional box represented by a pair of its opposite corners, no
matter which. Functions take care of swapping to achieve "lower left --
upper right" representation before computing any values
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Grammar</title>
<table>
<title>Cube Grammar Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>rule 1</entry>
<entry>box -> O_BRACKET paren_list COMMA paren_list C_BRACKET</entry>
</row>
<row>
<entry>rule 2</entry>
<entry>box -> paren_list COMMA paren_list</entry>
</row>
<row>
<entry>rule 3</entry>
<entry>box -> paren_list</entry>
</row>
<row>
<entry>rule 4</entry>
<entry>box -> list</entry>
</row>
<row>
<entry>rule 5</entry>
<entry>paren_list -> O_PAREN list C_PAREN</entry>
</row>
<row>
<entry>rule 6</entry>
<entry>list -> FLOAT</entry>
</row>
<row>
<entry>rule 7</entry>
<entry>list -> list COMMA FLOAT</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Tokens</title>
<table>
<title>Cube Grammar Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>n</entry>
<entry>[0-9]+</entry>
</row>
<row>
      <entry>integer</entry>
      <entry>[+-]?{n}</entry>
</row>
<row>
<entry>real</entry>
<entry>[+-]?({n}\.{n}?|\.{n})</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>({integer}|{real})([eE]{integer})?</entry>
</row>
<row>
<entry>O_BRACKET</entry>
<entry>\[</entry>
</row>
<row>
<entry>C_BRACKET</entry>
<entry>\]</entry>
</row>
<row>
<entry>O_PAREN</entry>
<entry>\(</entry>
</row>
<row>
<entry>C_PAREN</entry>
<entry>\)</entry>
</row>
<row>
<entry>COMMA</entry>
<entry>\,</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Examples</title>
<table>
<title>Examples</title>
<tgroup cols="2">
<tbody>
<row>
<entry>'x'</entry>
<entry>A floating point value representing a one-dimensional point
(or, zero-length one-dimensional interval)
</entry>
</row>
<row>
<entry>'(x)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'x1,x2,x3,...,xn'</entry>
<entry>A point in n-dimensional space,represented internally as a zero
volume cube
</entry>
</row>
<row>
<entry>'(x1,x2,x3,...,xn)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x),(y)'</entry>
<entry>A 1-D interval starting at x and ending at y or vice versa; the
order does not matter
</entry>
</row>
<row>
<entry>'[(x),(y)]'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
<entry>An n-dimensional box represented by a pair of its diagonally
opposite corners, regardless of order. Swapping is provided
        by all comparison routines to ensure the
        "lower left -- upper right" representation
        before actual comparison takes place.
</entry>
</row>
<row>
<entry>'[(x1,...,xn),(y1,...,yn)]'</entry>
<entry>Same as above</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
</para>
</sect2>
<sect2>
<title>Defaults</title>
<para>
I believe this union:
</para>
<programlisting>
select cube_union('(0,5,2),(2,3,1)','0');
cube_union
-------------------
(0, 0, 0),(2, 5, 2)
(1 row)
</programlisting>
<para>
   does not contradict common sense, nor does the intersection
</para>
<programlisting>
select cube_inter('(0,-1),(1,1)','(-2),(2)');
cube_inter
-------------
(0, 0),(1, 0)
(1 row)
</programlisting>
<para>
   In all binary operations on differently sized boxes, I assume the smaller
   one to be a Cartesian projection, i.e., having zeroes in place of coordinates
   omitted in the string representation. The above examples are equivalent to:
</para>
<programlisting>
cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
</programlisting>
<para>
The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define the special Point type
and functions for (box,point) predicates.
</para>
<programlisting>
select cube_contains('(0,0),(1,1)', '0.5,0.5');
cube_contains
--------------
t
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Precision</title>
<para>
Values are stored internally as 64-bit floating point numbers. This means that
numbers with more than about 16 significant digits will be truncated.
</para>
</sect2>
<sect2>
<title>Usage</title>
<para>
The access method for CUBE is a GiST index (gist_cube_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/seg).
</para>
<para>
The operators supported by the GiST access method include:
</para>
<programlisting>
a = b Same as
</programlisting>
<para>
The cubements a and b are identical.
</para>
<programlisting>
a && b Overlaps
</programlisting>
<para>
The cubements a and b overlap.
</para>
<programlisting>
a @> b Contains
</programlisting>
<para>
The cubement a contains the cubement b.
</para>
<programlisting>
a <@ b Contained in
</programlisting>
<para>
The cubement a is contained in b.
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<para>
   Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
</para>
<para>
Other operators:
</para>
<programlisting>
[a, b] < [c, d] Less than
[a, b] > [c, d] Greater than
</programlisting>
<para>
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
   reasonably good sorting in most cases, which is useful if
   you want to use ORDER BY with this type.
</para>
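  <para>
   As a sketch of how such an index might be declared and used (the table
   and index names here are hypothetical):
  </para>
  <programlisting>
CREATE TABLE boxes (c cube);
CREATE INDEX boxes_c_idx ON boxes USING gist (c);
-- find all stored cubes overlapping the unit square
SELECT * FROM boxes WHERE c && '(0,0),(1,1)'::cube;
  </programlisting>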
<para>
The following functions are available:
</para>
<table>
<title>Functions available</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>cube_distance(cube, cube) returns double</literal></entry>
<entry>cube_distance returns the distance between two cubes. If both
cubes are points, this is the normal distance function.
</entry>
</row>
<row>
<entry><literal>cube(float8) returns cube</literal></entry>
<entry>This makes a one dimensional cube with both coordinates the same.
If the type of the argument is a numeric type other than float8 an
explicit cast to float8 may be needed.
<literal>cube(1) == '(1)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8, float8) returns cube</literal></entry>
<entry>
This makes a one dimensional cube.
<literal>cube(1,2) == '(1),(2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[]) returns cube</literal></entry>
<entry>This makes a zero-volume cube using the coordinates
        defined by the array. <literal>cube(ARRAY[1,2]) == '(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[], float8[]) returns cube</literal></entry>
<entry>This makes a cube, with upper right and lower left
coordinates as defined by the 2 float arrays. Arrays must be of the
same length.
<literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube with the same values for both parts of the new coordinate.
This is useful for building cubes piece by piece from calculated values.
<literal>cube('(1)',2) == '(1,2),(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube. This is useful for building cubes piece by piece from
calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal>
</entry>
</row>
<row>
<entry><literal>cube_dim(cube) returns int</literal></entry>
       <entry>cube_dim returns the number of dimensions stored in the data
        structure for a cube. This is useful for constraints on the dimensions
        of a cube.
</entry>
</row>
<row>
<entry><literal>cube_ll_coord(cube, int) returns double </literal></entry>
<entry>
cube_ll_coord returns the nth coordinate value for the lower left
corner of a cube. This is useful for doing coordinate transformations.
</entry>
</row>
<row>
<entry><literal>cube_ur_coord(cube, int) returns double
</literal></entry>
<entry>cube_ur_coord returns the nth coordinate value for the
upper right corner of a cube. This is useful for doing coordinate
transformations.
</entry>
</row>
<row>
<entry><literal>cube_subset(cube, int[]) returns cube
</literal></entry>
<entry>Builds a new cube from an existing cube, using a list of
dimension indexes
        from an array. Can be used to find both the ll and ur coordinates of a
        single dimension, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'
Or can be used to drop dimensions, or reorder them as desired, e.g.:
cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) =
'(5, 3, 1, 1),(8, 7, 6, 6)'
</entry>
</row>
<row>
<entry><literal>cube_is_point(cube) returns bool</literal></entry>
<entry>cube_is_point returns true if a cube is also a point.
This is true when the two defining corners are the same.</entry>
</row>
<row>
<entry><literal>cube_enlarge(cube, double, int) returns cube</literal></entry>
<entry>
cube_enlarge increases the size of a cube by a specified
radius in at least
n dimensions. If the radius is negative the box is shrunk instead. This
is useful for creating bounding boxes around a point for searching for
nearby points. All defined dimensions are changed by the radius. If n
is greater than the number of defined dimensions and the cube is being
increased (r >= 0) then 0 is used as the base for the extra coordinates.
LL coordinates are decreased by r and UR coordinates are increased by r.
If a LL coordinate is increased to larger than the corresponding UR
coordinate (this can only happen when r < 0) than both coordinates are
set to their average. To make it harder for people to break things there
is an effective maximum on the dimension of cubes of 100. This is set
in cubedata.h if you need something bigger.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
</para>
<para>
For examples of usage, see sql/cube.sql
</para>
</sect2>
<sect2>
<title>Credits</title>
<para>
This code is essentially based on the example written for
Illustra, <ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink>
</para>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>), and
to his former student, Andy Dong
(<ulink url="http://best.me.berkeley.edu/~adong/"></ulink>), for his exemplar.
I am also grateful to all postgres developers, present and past, for enabling
myself to create my own world and live undisturbed in it. And I would like to
acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
for the years of faithful support of my database research.
</para>
<para>
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
<email>selkovjr@mcs.anl.gov</email>
</para>
<para>
Minor updates to this package were made by Bruno Wolff III
<email>bruno@wolff.to</email> in August/September of 2002. These include
changing the precision from single precision to double precision and adding
some new functions.
</para>
<para>
Additional updates were made by Joshua Reich <email>josh@root.net</email> in
July 2006. These include <literal>cube(float8[], float8[])</literal> and
cleaning up the code to use the V1 call protocol instead of the deprecated V0
form.
</para>
</sect2>
</sect1>

1312
doc/src/sgml/dblink.sgml Normal file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,133 @@
<sect1 id="earthdistance">
<title>earthdistance</title>
<indexterm zone="earthdistance">
<primary>earthdistance</primary>
</indexterm>
<para>
This module contains two different approaches to calculating
great circle distances on the surface of the Earth. The one described
first depends on the contrib/cube package (which MUST be installed before
earthdistance is installed). The second one is based on the point
datatype using latitude and longitude for the coordinates. The install
script makes the defined functions executable by anyone.
</para>
<para>
A spherical model of the Earth is used.
</para>
<para>
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the distance from the center of the Earth.
</para>
<para>
The radius of the Earth is obtained from the earth() function. It is
given in meters. But by changing this one function you can change it
to use some other units or to use a different value of the radius
  that you feel is more appropriate.
</para>
<para>
  This package also has applications to astronomical databases.
Astronomers will probably want to change earth() to return a radius of
180/pi() so that distances are in degrees.
</para>
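 <para>
  A sketch of such a change (this simply replaces the SQL function installed
  by the module; the exact body shown is only an illustration):
 </para>
 <programlisting>
CREATE OR REPLACE FUNCTION earth() RETURNS float8
AS 'SELECT (180 / pi())::float8'
LANGUAGE SQL IMMUTABLE;
 </programlisting>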
<para>
Functions are provided to allow for input in latitude and longitude (in
degrees), to allow for output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.
</para>
<para>
The functions are all 'sql' functions. If you want to make these functions
executable by other people you will also have to make the referenced
cube functions executable. cube(text), cube(float8), cube(cube,float8),
cube_distance(cube,cube), cube_ll_coord(cube,int) and
cube_enlarge(cube,float8,int) are used indirectly by the earth distance
functions. is_point(cube) and cube_dim(cube) are used in constraints for data
in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
might be useful for looking at bounding box coordinates in user applications.
</para>
<para>
A domain of type cube named earth is defined.
There are constraints on it defined to make sure the cube is a point,
that it does not have more than 3 dimensions and that it is very near
the surface of a sphere centered about the origin with the radius of
the Earth.
</para>
<para>
The following functions are provided:
</para>
<table id="earthdistance-functions">
<title>EarthDistance functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>earth()</literal></entry>
<entry>returns the radius of the Earth in meters.</entry>
</row>
<row>
<entry><literal>sec_to_gc(float8)</literal></entry>
<entry>converts the normal straight line
      <entry>converts the normal straight line
       (secant) distance between two points on the surface of the Earth
to the great circle distance between them.
</entry>
</row>
<row>
<entry><literal>gc_to_sec(float8)</literal></entry>
<entry>Converts the great circle distance
between two points on the surface of the Earth to the normal straight line
(secant) distance between them.
</entry>
</row>
<row>
<entry><literal>ll_to_earth(float8, float8)</literal></entry>
<entry>Returns the location of a point on the surface of the Earth given
its latitude (argument 1) and longitude (argument 2) in degrees.
</entry>
</row>
<row>
<entry><literal>latitude(earth)</literal></entry>
<entry>Returns the latitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>longitude(earth)</literal></entry>
<entry>Returns the longitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>earth_distance(earth, earth)</literal></entry>
<entry>Returns the great circle distance between two points on the
surface of the Earth.
</entry>
</row>
<row>
<entry><literal>earth_box(earth, float8)</literal></entry>
<entry>Returns a box suitable for an indexed search using the cube @>
operator for points within a given great circle distance of a location.
Some points in this box are further than the specified great circle
distance from the location so a second check using earth_distance
should be made at the same time.
</entry>
</row>
<row>
<entry><literal><@></literal> operator</entry>
<entry>gives the distance in statute miles between
two points on the Earth's surface. Coordinates are in degrees. Points are
taken as (longitude, latitude) and not vice versa as longitude is closer
to the intuitive idea of x-axis and latitude to y-axis.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
One advantage of using cube representation over a point using latitude and
longitude for coordinates, is that you don't have to worry about special
conditions at +/- 180 degrees of longitude or near the poles.
</para>
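 <para>
  Two sketches of typical queries (the <literal>places</literal> table with
  <literal>lat</literal>/<literal>lon</literal> columns is hypothetical):
 </para>
 <programlisting>
-- cube-based approach: places within 5000 metres of a location
SELECT name FROM places
WHERE earth_box(ll_to_earth(52.5, 13.4), 5000) @> ll_to_earth(lat, lon)
  AND earth_distance(ll_to_earth(52.5, 13.4), ll_to_earth(lat, lon)) < 5000;

-- point-based approach: distance in statute miles
SELECT '(-103.5,47.5)'::point <@> '(-90.5,30.0)'::point AS miles;
 </programlisting>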
</sect1>

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.51 2007/11/01 17:00:18 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.52 2007/11/10 23:30:46 momjian Exp $ -->
<!entity history SYSTEM "history.sgml">
<!entity info SYSTEM "info.sgml">
@ -89,6 +89,38 @@
<!entity sources SYSTEM "sources.sgml">
<!entity storage SYSTEM "storage.sgml">
<!-- contrib information -->
<!entity contrib SYSTEM "contrib.sgml">
<!entity adminpack SYSTEM "adminpack.sgml">
<!entity btree-gist SYSTEM "btree-gist.sgml">
<!entity chkpass SYSTEM "chkpass.sgml">
<!entity cube SYSTEM "cube.sgml">
<!entity dblink SYSTEM "dblink.sgml">
<!entity earthdistance SYSTEM "earthdistance.sgml">
<!entity fuzzystrmatch SYSTEM "fuzzystrmatch.sgml">
<!entity hstore SYSTEM "hstore.sgml">
<!entity intagg SYSTEM "intagg.sgml">
<!entity intarray SYSTEM "intarray.sgml">
<!entity isn SYSTEM "isn.sgml">
<!entity lo SYSTEM "lo.sgml">
<!entity ltree SYSTEM "ltree.sgml">
<!entity oid2name SYSTEM "oid2name.sgml">
<!entity pageinspect SYSTEM "pageinspect.sgml">
<!entity pgbench SYSTEM "pgbench.sgml">
<!entity buffercache SYSTEM "buffercache.sgml">
<!entity pgcrypto SYSTEM "pgcrypto.sgml">
<!entity freespacemap SYSTEM "freespacemap.sgml">
<!entity pgrowlocks SYSTEM "pgrowlocks.sgml">
<!entity standby SYSTEM "standby.sgml">
<!entity pgstattuple SYSTEM "pgstattuple.sgml">
<!entity trgm SYSTEM "trgm.sgml">
<!entity seg SYSTEM "seg.sgml">
<!entity sslinfo SYSTEM "sslinfo.sgml">
<!entity tablefunc SYSTEM "tablefunc.sgml">
<!entity uuid-ossp SYSTEM "uuid-ossp.sgml">
<!entity vacuumlo SYSTEM "vacuumlo.sgml">
<!entity xml2 SYSTEM "xml2.sgml">
<!-- appendixes -->
<!entity contacts SYSTEM "contacts.sgml">
<!entity cvs SYSTEM "cvs.sgml">

View File

@ -0,0 +1,243 @@
<sect1 id="pgfreespacemap">
 <title>pg_freespacemap</title>
<indexterm zone="pgfreespacemap">
   <primary>pg_freespacemap</primary>
</indexterm>
<para>
  This module provides the means for examining the free space map (FSM). It
consists of two C functions: <literal>pg_freespacemap_relations()</literal>
and <literal>pg_freespacemap_pages()</literal> that return a set
of records, plus two views <literal>pg_freespacemap_relations</literal> and
<literal>pg_freespacemap_pages</literal> for more user-friendly access to
the functions.
</para>
<para>
The module provides the ability to examine the contents of the free space
map, without having to restart or rebuild the server with additional
debugging code.
</para>
<para>
By default public access is REVOKED from the functions and views, just in
case there are security issues present in the code.
</para>
<sect2>
<title>Notes</title>
<para>
The definitions for the columns exposed in the views are:
</para>
<table>
<title>pg_freespacemap_relations</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
<entry>references</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>reltablespace</entry>
<entry>pg_tablespace.oid</entry>
<entry>Tablespace oid of the relation.</entry>
</row>
<row>
<entry>reldatabase</entry>
<entry>pg_database.oid</entry>
<entry>Database oid of the relation.</entry>
</row>
<row>
<entry>relfilenode</entry>
<entry>pg_class.relfilenode</entry>
<entry>Relfilenode of the relation.</entry>
</row>
<row>
<entry>avgrequest</entry>
<entry></entry>
<entry>Moving average of free space requests (NULL for indexes)</entry>
</row>
<row>
<entry>interestingpages</entry>
<entry></entry>
<entry>Count of pages last reported as containing useful free space.</entry>
</row>
<row>
<entry>storedpages</entry>
<entry></entry>
<entry>Count of pages actually stored in free space map.</entry>
</row>
<row>
<entry>nextpage</entry>
<entry></entry>
<entry>Page index (from 0) to start next search at.</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>pg_freespacemap_pages</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
<entry> references</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>reltablespace</entry>
<entry>pg_tablespace.oid</entry>
<entry>Tablespace oid of the relation.</entry>
</row>
<row>
<entry>reldatabase</entry>
<entry>pg_database.oid</entry>
<entry>Database oid of the relation.</entry>
</row>
<row>
<entry>relfilenode</entry>
<entry>pg_class.relfilenode</entry>
<entry>Relfilenode of the relation.</entry>
</row>
<row>
<entry>relblocknumber</entry>
<entry></entry>
<entry>Page number in the relation.</entry>
</row>
<row>
<entry>bytes</entry>
<entry></entry>
<entry>Free bytes in the page, or NULL for an index page (see below).</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
For <literal>pg_freespacemap_relations</literal>, there is one row for each
relation in the free space map. <literal>storedpages</literal> is the
number of pages actually stored in the map, while
<literal>interestingpages</literal> is the number of pages the last VACUUM
thought had useful amounts of free space.
</para>
<para>
   If <literal>storedpages</literal> is consistently less than
   <literal>interestingpages</literal> then it would be a good idea to increase
   <literal>max_fsm_pages</literal>. Also,
if the number of rows in <literal>pg_freespacemap_relations</literal> is
close to <literal>max_fsm_relations</literal>, then you should consider
increasing <literal>max_fsm_relations</literal>.
</para>
<para>
For <literal>pg_freespacemap_pages</literal>, there is one row for each page
in the free space map. The number of rows for a relation will match the
<literal>storedpages</literal> column in
<literal>pg_freespacemap_relations</literal>.
</para>
<para>
For indexes, what is tracked is entirely-unused pages, rather than free
space within pages. Therefore, the average request size and free bytes
within a page are not meaningful, and are shown as NULL.
</para>
<para>
Because the map is shared by all the databases, it will include relations
not belonging to the current database.
</para>
<para>
When either of the views are accessed, internal free space map locks are
taken, and a copy of the map data is made for them to display.
This ensures that the views produce a consistent set of results, while not
blocking normal activity longer than necessary. Nonetheless there
could be some impact on database performance if they are read often.
</para>
</sect2>
<sect2>
<title>Sample output - pg_freespacemap_relations</title>
<programlisting>
regression=# \d pg_freespacemap_relations
View "public.pg_freespacemap_relations"
Column | Type | Modifiers
------------------+---------+-----------
reltablespace | oid |
reldatabase | oid |
relfilenode | oid |
avgrequest | integer |
interestingpages | integer |
storedpages | integer |
nextpage | integer |
View definition:
SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);
regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
FROM pg_freespacemap_relations r INNER JOIN pg_class c
ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
ON r.reldatabase = d.oid AND (d.datname = current_database())
ORDER BY r.storedpages DESC LIMIT 10;
relname | avgrequest | interestingpages | storedpages
---------------------------------+------------+------------------+-------------
onek | 256 | 109 | 109
pg_attribute | 167 | 93 | 93
pg_class | 191 | 49 | 49
pg_attribute_relid_attnam_index | | 48 | 48
onek2 | 256 | 37 | 37
pg_depend | 95 | 26 | 26
pg_type | 199 | 16 | 16
pg_rewrite | 1011 | 13 | 13
pg_class_relname_nsp_index | | 10 | 10
pg_proc | 302 | 8 | 8
(10 rows)
</programlisting>
</sect2>
<sect2>
<title>Sample output - pg_freespacemap_pages</title>
<programlisting>
regression=# \d pg_freespacemap_pages
View "public.pg_freespacemap_pages"
Column | Type | Modifiers
----------------+---------+-----------
reltablespace | oid |
reldatabase | oid |
relfilenode | oid |
relblocknumber | bigint |
bytes | integer |
View definition:
SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);
regression=# SELECT c.relname, p.relblocknumber, p.bytes
FROM pg_freespacemap_pages p INNER JOIN pg_class c
ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
ON (p.reldatabase = d.oid AND d.datname = current_database())
ORDER BY c.relname LIMIT 10;
relname | relblocknumber | bytes
--------------+----------------+-------
a_star | 0 | 8040
abstime_tbl | 0 | 7908
aggtest | 0 | 8008
altinhoid | 0 | 8128
altstartwith | 0 | 8128
arrtest | 0 | 7172
b_star | 0 | 7976
box_tbl | 0 | 7912
bt_f8_heap | 54 | 7728
bt_i4_heap | 49 | 8008
(10 rows)
</programlisting>
</sect2>
<sect2>
<title>Author</title>
<para>
Mark Kirkwood <email>markir@paradise.net.nz</email>
</para>
</sect2>
</sect1>

View File

@ -0,0 +1,122 @@
<sect1 id="fuzzystrmatch">
<title>fuzzystrmatch</title>
<para>
  This section describes the fuzzystrmatch module, which provides several
  functions to determine similarities and distances between strings.
</para>
<sect2>
<title>Soundex</title>
<para>
The Soundex system is a method of matching similar sounding names
(or any words) to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.
</para>
<para>
When comparing two soundex values to determine similarity, the
difference function reports how close the match is on a scale
from zero to four, with zero being no match and four being an
exact match.
</para>
<para>
The following are some usage examples:
</para>
<programlisting>
SELECT soundex('hello world!');
SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
CREATE TABLE s (nm text);
INSERT INTO s VALUES ('john');
INSERT INTO s VALUES ('joan');
INSERT INTO s VALUES ('wobbly');
INSERT INTO s VALUES ('jack');
SELECT * FROM s WHERE soundex(nm) = soundex('john');
SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid;
CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
'select soundex($1) = soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
'select soundex($1) < soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
'select soundex($1) > soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
'select soundex($1) <= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
'select soundex($1) >= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
'select soundex($1) <> soundex($2)'
LANGUAGE SQL;
DROP OPERATOR #= (text, text);
CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
SELECT * FROM s WHERE text_sx_eq(nm, 'john');
SELECT * FROM s WHERE s.nm #= 'john';
SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
</programlisting>
</sect2>
<sect2>
<title>levenshtein</title>
<para>
This function calculates the levenshtein distance between two strings:
</para>
<programlisting>
int levenshtein(text source, text target)
</programlisting>
<para>
Both <literal>source</literal> and <literal>target</literal> can be any
NOT NULL string with a maximum of 255 characters.
</para>
<para>
Example:
</para>
<programlisting>
SELECT levenshtein('GUMBO','GAMBOL');
</programlisting>
</sect2>
<sect2>
<title>metaphone</title>
<para>
This function calculates and returns the metaphone code of an input string:
</para>
<programlisting>
text metaphone(text source, int max_output_length)
</programlisting>
<para>
<literal>source</literal> has to be a NOT NULL string with a maximum of
255 characters. <literal>max_output_length</literal> fixes the maximum
length of the output metaphone code; if longer, the output is truncated
to this length.
</para>
<para>Example</para>
<programlisting>
SELECT metaphone('GUMBO',4);
</programlisting>
</sect2>
</sect1>

298
doc/src/sgml/hstore.sgml Normal file
View File

@ -0,0 +1,298 @@
<sect1 id="hstore">
<title>hstore</title>
<indexterm zone="hstore">
<primary>hstore</primary>
</indexterm>
<para>
  The <literal>hstore</literal> module is useful for storing (key,value) pairs.
  This module can help in several scenarios: rows with many attributes that
  are rarely searched, semi-structured data, or a lazy DBA.
</para>
<sect2>
<title>Operations</title>
<itemizedlist>
<listitem>
<para>
     <literal>hstore -> text</literal> - get value, Perl analogy $h{key}
</para>
<programlisting>
select 'a=>q, b=>g'->'a';
?
------
q
</programlisting>
<para>
     Note the use of parentheses in the SELECT below, because the precedence
     of IS is higher than that of '->':
</para>
<programlisting>
SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
</programlisting>
</listitem>
<listitem>
<para>
     <literal>hstore || hstore</literal> - concatenation, Perl analogy %a=( %b, %c );
</para>
<programlisting>
regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
?column?
--------------------
"a"=>"b", "c"=>"d"
(1 row)
</programlisting>
<para>
but, notice
</para>
<programlisting>
regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
?column?
----------
"a"=>"d"
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>text => text</literal> - creates hstore type from two text strings
</para>
<programlisting>
select 'a'=>'b';
?column?
----------
"a"=>"b"
</programlisting>
</listitem>
<listitem>
<para>
<literal>hstore @> hstore</literal> - contains operation, check if left operand contains right.
</para>
<programlisting>
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
?column?
----------
f
(1 row)
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
?column?
----------
t
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>hstore &lt;@ hstore</literal> - contained operation, check if
left operand is contained in right
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>akeys(hstore)</literal> - returns all keys from hstore as array
</para>
<programlisting>
regression=# select akeys('a=>1,b=>2');
akeys
-------
{a,b}
</programlisting>
</listitem>
<listitem>
<para>
<literal>skeys(hstore)</literal> - returns all keys from hstore as strings
</para>
<programlisting>
regression=# select skeys('a=>1,b=>2');
skeys
-------
a
b
</programlisting>
</listitem>
<listitem>
<para>
<literal>avals(hstore)</literal> - returns all values from hstore as array
</para>
<programlisting>
regression=# select avals('a=>1,b=>2');
avals
-------
{1,2}
</programlisting>
</listitem>
<listitem>
<para>
<literal>svals(hstore)</literal> - returns all values from hstore as
strings
</para>
<programlisting>
regression=# select svals('a=>1,b=>2');
svals
-------
1
2
</programlisting>
</listitem>
<listitem>
<para>
<literal>delete (hstore,text)</literal> - delete (key,value) from hstore if
key matches argument.
</para>
<programlisting>
regression=# select delete('a=>1,b=>2','b');
delete
----------
"a"=>"1"
</programlisting>
</listitem>
<listitem>
<para>
<literal>each(hstore)</literal> - return (key, value) pairs
</para>
<programlisting>
regression=# select * from each('a=>1,b=>2');
key | value
-----+-------
a | 1
b | 2
</programlisting>
</listitem>
<listitem>
<para>
<literal>exist (hstore,text)</literal>
</para>
<para>
     <literal>hstore ? text</literal> - returns true if the key exists in hstore
     and false otherwise.
</para>
<programlisting>
regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
exist | ?column?
-------+----------
t | t
</programlisting>
</listitem>
<listitem>
<para>
     <literal>defined (hstore,text)</literal> - returns true if the key exists
     in hstore and its value is not NULL.
</para>
<programlisting>
regression=# select defined('a=>NULL','a');
defined
---------
f
</programlisting>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Indices</title>
<para>
   The module provides index support for the '@>' and '?' operations.
</para>
<programlisting>
CREATE INDEX hidx ON testhstore USING GIST(h);
CREATE INDEX hidx ON testhstore USING GIN(h);
</programlisting>
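  <para>
   Queries that can make use of these indexes look, for example, like this
   (a sketch; <literal>testhstore</literal> is a hypothetical table):
  </para>
  <programlisting>
SELECT * FROM testhstore WHERE h @> 'a=>1';
SELECT * FROM testhstore WHERE h ? 'a';
  </programlisting>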
</sect2>
<sect2>
<title>Examples</title>
<para>
Add a key:
</para>
<programlisting>
UPDATE tt SET h=h||'c=>3';
</programlisting>
<para>
Delete a key:
</para>
<programlisting>
UPDATE tt SET h=delete(h,'k1');
</programlisting>
</sect2>
<sect2>
<title>Statistics</title>
<para>
   The hstore type, because of its intrinsic liberality, could contain a lot of
   different keys. Checking for valid keys is the task of the application.
   The examples below demonstrate several techniques for checking key statistics.
</para>
<para>
Simple example
</para>
<programlisting>
SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1 ');
</programlisting>
<para>
   Using a table
</para>
<programlisting>
SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore ;
</programlisting>
  <para>Online statistics</para>
<programlisting>
SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key;
key | count
-----------+-------
line | 883
query | 207
pos | 203
node | 202
space | 197
status | 195
public | 194
title | 190
org | 189
...................
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
</para>
</sect2>
</sect1>

82
doc/src/sgml/intagg.sgml Normal file
View File

@ -0,0 +1,82 @@
<sect1 id="intagg">
<title>intagg</title>
<indexterm zone="intagg">
<primary>intagg</primary>
</indexterm>
<para>
This section describes the <literal>intagg</literal> module which provides an integer aggregator and an enumerator.
</para>
<para>
  Many database systems have the notion of a one-to-many table. Such a table usually sits between two indexed tables, for example:
</para>
<programlisting>
CREATE TABLE one_to_many(left INT, right INT) ;
</programlisting>
<para>
And it is used like this:
</para>
<programlisting>
SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
WHERE one_to_many.left = item;
</programlisting>
<para>
This will return all the items in the right hand table for an entry
in the left hand table. This is a very common construct in SQL.
</para>
<para>
Now, this methodology can be cumbersome with a very large number of
entries in the one_to_many table. Depending on the order in which
data was entered, a join like this could result in an index scan
and a fetch for each right hand entry in the table for a particular
left hand entry. If you have a very dynamic system, there is not much you
can do. However, if you have some data which is fairly static, you can
create a summary table with the aggregator.
</para>
<programlisting>
CREATE TABLE summary as SELECT left, int_array_aggregate(right)
AS right FROM one_to_many GROUP BY left;
</programlisting>
<para>
This will create a table with one row per left item, and an array
  of right items. Now this is pretty useless without some way of using
  the array; that's why there is an array enumerator.
</para>
<programlisting>
SELECT left, int_array_enum(right) FROM summary WHERE left = item;
</programlisting>
<para>
  The above query, using int_array_enum, produces the same results as:
</para>
<programlisting>
SELECT left, right FROM one_to_many WHERE left = item;
</programlisting>
<para>
  The difference is that the query against the summary table has to fetch
  only one row from the table, whereas the query against "one_to_many"
  must index scan and fetch a row for each entry.
</para>
<para>
  On our system, EXPLAIN shows that a query with a cost of 8488 is reduced
  to a cost of 329. The query is a join involving the one_to_many table:
</para>
<programlisting>
SELECT right, count(right) FROM
(
SELECT left, int_array_enum(right) AS right FROM summary JOIN
(SELECT left FROM left_table WHERE left = item) AS lefts
ON (summary.left = lefts.left )
) AS list GROUP BY right ORDER BY count DESC ;
</programlisting>
</sect1>

286
doc/src/sgml/intarray.sgml Normal file
View File

@ -0,0 +1,286 @@
<sect1 id="intarray">
<title>intarray</title>
<indexterm zone="intarray">
<primary>intarray</primary>
</indexterm>
<para>
  This is an implementation of the RD-tree data structure using the GiST
  interface of PostgreSQL. It has built-in lossy compression.
</para>
<para>
  The current implementation provides index support for one-dimensional arrays
  of int4: gist__int_ops, suitable for small and medium-size arrays (used by
  default), and gist__intbig_ops for indexing large arrays (a superimposed
  signature with a length of 4096 bits is used to represent sets).
</para>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>int icount(int[])</literal> - the number of elements in intarray
</para>
<programlisting>
test=# select icount('{1,2,3}'::int[]);
icount
--------
3
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>int[] sort(int[], 'asc' | 'desc')</literal> - sort intarray
</para>
<programlisting>
test=# select sort('{1,2,3}'::int[],'desc');
sort
---------
{3,2,1}
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>int[] sort(int[])</literal> - sort in ascending order
</para>
</listitem>
<listitem>
<para>
<literal>int[] sort_asc(int[]),sort_desc(int[])</literal> - shortcuts for sort
</para>
</listitem>
<listitem>
<para>
<literal>int[] uniq(int[])</literal> - returns unique elements
</para>
<programlisting>
test=# select uniq(sort('{1,2,3,2,1}'::int[]));
uniq
---------
{1,2,3}
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
     <literal>int idx(int[], int item)</literal> - returns the index of the
     first array element matching item, or 0 if there is no match.
</para>
<programlisting>
test=# select idx('{1,2,3,2,1}'::int[],2);
idx
-----
2
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>int[] subarray(int[],int START [, int LEN])</literal> - returns
     the portion of the array starting at element number START (counting from 1) and of length LEN.
</para>
<programlisting>
test=# select subarray('{1,2,3,2,1}'::int[],2,3);
subarray
----------
{2,3,2}
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>int[] intset(int4)</literal> - casting int4 to int[]
</para>
<programlisting>
test=# select intset(1);
intset
--------
{1}
(1 row)
</programlisting>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Operations</title>
<table>
<title>Operations</title>
<tgroup cols="2">
<thead>
<row>
<entry>Operator</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>int[] && int[]</literal></entry>
<entry>overlap - returns TRUE if arrays have at least one common element</entry>
</row>
<row>
<entry><literal>int[] @> int[]</literal></entry>
<entry>contains - returns TRUE if left array contains right array</entry>
</row>
<row>
<entry><literal>int[] <@ int[]</literal></entry>
<entry>contained - returns TRUE if left array is contained in right array</entry>
</row>
<row>
<entry><literal># int[]</literal></entry>
<entry>returns the number of elements in array</entry>
</row>
<row>
<entry><literal>int[] + int</literal></entry>
<entry>push element to array ( add to end of array)</entry>
</row>
<row>
<entry><literal>int[] + int[] </literal></entry>
<entry>merge of arrays (right array added to the end of left one)</entry>
</row>
<row>
<entry><literal>int[] - int</literal></entry>
<entry>remove entries matched by right argument from array</entry>
</row>
<row>
<entry><literal>int[] - int[]</literal></entry>
<entry>remove right array from left</entry>
</row>
<row>
<entry><literal>int[] | int</literal></entry>
<entry>returns intarray - union of arguments</entry>
</row>
<row>
<entry><literal>int[] | int[]</literal></entry>
<entry>returns intarray as a union of two arrays</entry>
</row>
<row>
<entry><literal>int[] & int[]</literal></entry>
<entry>returns intersection of arrays</entry>
</row>
<row>
<entry><literal>int[] @@ query_int</literal></entry>
<entry>
returns TRUE if array satisfies query (like
<literal>'1&amp;(2|3)'</literal>)
</entry>
</row>
<row>
<entry><literal>query_int ~~ int[]</literal></entry>
<entry>returns TRUE if array satisfies query (commutator of @@)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
</sect2>
<sect2>
<title>Example</title>
<programlisting>
CREATE TABLE message (mid INT NOT NULL,sections INT[]);
CREATE TABLE message_section_map (mid INT NOT NULL,sid INT NOT NULL);
-- create indices
CREATE unique index message_key ON message ( mid );
CREATE unique index message_section_map_key2 ON message_section_map (sid, mid );
CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops);
-- select some messages with section in 1 OR 2 - OVERLAP operator
SELECT message.mid FROM message WHERE message.sections && '{1,2}';
-- select messages contains in sections 1 AND 2 - CONTAINS operator
SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
-- the same, CONTAINED operator
SELECT message.mid FROM message WHERE '{1,2}' <@ message.sections;
</programlisting>
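  <para>
   A query_int search over the same hypothetical <literal>message</literal>
   table might look like this (a sketch):
  </para>
  <programlisting>
-- messages in section 1 AND in section 2 OR 3 - QUERY operator
SELECT message.mid FROM message WHERE message.sections @@ '1&amp;(2|3)';
  </programlisting>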
</sect2>
<sect2>
<title>Benchmark</title>
<para>
   The subdirectory bench contains a benchmark suite.
</para>
<programlisting>
cd ./bench
1. createdb TEST
2. psql TEST < ../_int.sql
3. ./create_test.pl | psql TEST
4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
with/without RD-Tree. Run script without arguments to
                  see available options.
a)test without RD-Tree (OR)
./bench.pl -d TEST -c -s 1,2 -v
b)test with RD-Tree
./bench.pl -d TEST -c -s 1,2 -v -r
BENCHMARKS:
Size of table &lt;message>: 200000
Size of table &lt;message_section_map>: 269133
Distribution of messages by sections:
section 0: 74377 messages
section 1: 16284 messages
section 50: 1229 messages
section 99: 683 messages
old - without RD-Tree support,
new - with RD-Tree
+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
| +-------+-------+--------+-------+
| | old | new | old | new |
+----------+-------+-------+--------+-------+
| 1| 0.625| 0.101| -| -|
+----------+-------+-------+--------+-------+
| 99| 0.018| 0.017| -| -|
+----------+-------+-------+--------+-------+
| 1,2| 0.766| 0.133| 0.628| 0.045|
+----------+-------+-------+--------+-------+
| 1,2,50,65| 0.794| 0.141| 0.030| 0.006|
+----------+-------+-------+--------+-------+
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and Oleg
Bartunov (<email>oleg@sai.msu.su</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
additional information. Andrey Oktyabrski did a great work on adding new
functions and operations.
</para>
</sect2>
</sect1>

502
doc/src/sgml/isn.sgml Normal file
View File

@ -0,0 +1,502 @@
<sect1 id="isn">
<title>isn</title>
<indexterm zone="isn">
<primary>isn</primary>
</indexterm>
<para>
The <literal>isn</literal> module adds data types for the following
international-standard namespaces: EAN13, UPC, ISBN (books), ISMN (music),
and ISSN (serials). This module is inspired by Garrett A. Wollman's
isbn_issn code.
</para>
<para>
This module validates the numbers and automatically adds the correct
hyphenation. It also supports the new ISBN-13 numbers that came into
use in January 2007.
</para>
<para>
Premises:
</para>
<orderedlist>
<listitem>
<para>ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers</para>
</listitem>
<listitem>
<para>EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)</para>
</listitem>
<listitem>
<para>some ISBN13 numbers can be displayed as ISBN</para>
</listitem>
<listitem>
<para>some ISMN13 numbers can be displayed as ISMN</para>
</listitem>
<listitem>
<para>some ISSN13 numbers can be displayed as ISSN</para>
</listitem>
<listitem>
<para>all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers</para>
</listitem>
</orderedlist>
<note>
<para>
All types are internally represented as 64-bit integers,
and all are consistently interchangeable internally.
</para>
</note>
<note>
<para>
We have two operator classes (for btree and for hash) so each data type
can be indexed for faster access.
</para>
</note>
<sect2>
<title>Data types</title>
<para>
We have the following data types:
</para>
<table>
<title>Data types</title>
<tgroup cols="2">
<thead>
<row>
<entry><para>Data type</para></entry>
<entry><para>Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>EAN13</literal></para></entry>
<entry>
<para>
European Article Numbers. This type will always show the EAN13-display
format. The output function for this is <literal>ean13_out()</literal>.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISBN13</literal></para></entry>
<entry>
<para>
For International Standard Book Numbers to be displayed in
the new EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISMN13</literal></para></entry>
<entry>
<para>
For International Standard Music Numbers to be displayed in
the new EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISSN13</literal></para></entry>
<entry>
<para>
For International Standard Serial Numbers to be displayed in the new
EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISBN</literal></para></entry>
<entry>
<para>
For International Standard Book Numbers to be displayed in the current
short-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISMN</literal></para></entry>
<entry>
<para>
For International Standard Music Numbers to be displayed in the
current short-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISSN</literal></para></entry>
<entry>
<para>
For International Standard Serial Numbers to be displayed in the
current short-display format. These types will display the short
version of the ISxN (ISxN 10) whenever possible, and will
show ISxN 13 when the short version cannot be shown. The
output function that does this is <literal>isn_out()</literal>.
</para>
</entry>
</row>
<row>
<entry><para><literal>UPC</literal></para></entry>
<entry>
<para>
For Universal Product Codes. UPC numbers are a subset of the EAN13
numbers (they are basically EAN13 without the leading '0' digit).
The output function for this type is also <literal>isn_out()</literal>.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>
<literal>EAN13</literal>, <literal>ISBN13</literal>,
<literal>ISMN13</literal> and <literal>ISSN13</literal> types will always
display the long version of the ISxN (EAN13). The output function to do
this is <literal>ean13_out()</literal>.
</para>
<para>
The need for these types is just for displaying in different ways the same
data: <literal>ISBN13</literal> is actually the same as
<literal>ISBN</literal>, <literal>ISMN13=ISMN</literal> and
<literal>ISSN13=ISSN</literal>.
</para>
</note>
</sect2>
<sect2>
<title>Input functions</title>
<para>
We have the following input functions:
</para>
<table>
<title>Input functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>ean13_in()</literal></para></entry>
<entry>
<para>
To take a string and return an EAN13.
</para>
</entry>
</row>
<row>
<entry><para><literal>isbn_in()</literal></para></entry>
<entry>
<para>
To take a string and return valid ISBN or ISBN13 numbers.
</para>
</entry>
</row>
<row>
<entry><para><literal>ismn_in()</literal></para></entry>
<entry>
<para>
To take a string and return valid ISMN or ISMN13 numbers.
</para>
</entry>
</row>
<row>
<entry><para><literal>issn_in()</literal></para></entry>
<entry>
<para>
To take a string and return valid ISSN or ISSN13 numbers.
</para>
</entry>
</row>
<row>
<entry><para><literal>upc_in()</literal></para></entry>
<entry>
<para>
To take a string and return a UPC code.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Casts</title>
<para>
We are able to cast from:
</para>
<itemizedlist>
<listitem>
<para>
ISBN13 -> EAN13
</para>
</listitem>
<listitem>
<para>
ISMN13 -> EAN13
</para>
</listitem>
<listitem>
<para>
ISSN13 -> EAN13
</para>
</listitem>
<listitem>
<para>
ISBN -> EAN13
</para>
</listitem>
<listitem>
<para>
ISMN -> EAN13
</para>
</listitem>
<listitem>
<para>
ISSN -> EAN13
</para>
</listitem>
<listitem>
<para>
UPC -> EAN13
</para>
</listitem>
<listitem>
<para>
ISBN <-> ISBN13
</para>
</listitem>
<listitem>
<para>
ISMN <-> ISMN13
</para>
</listitem>
<listitem>
<para>
ISSN <-> ISSN13
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>C API</title>
<para>
The C API is implemented as:
</para>
<programlisting>
extern Datum isn_out(PG_FUNCTION_ARGS);
extern Datum ean13_out(PG_FUNCTION_ARGS);
extern Datum ean13_in(PG_FUNCTION_ARGS);
extern Datum isbn_in(PG_FUNCTION_ARGS);
extern Datum ismn_in(PG_FUNCTION_ARGS);
extern Datum issn_in(PG_FUNCTION_ARGS);
extern Datum upc_in(PG_FUNCTION_ARGS);
</programlisting>
<para>
On success:
</para>
<itemizedlist>
<listitem>
<para>
<literal>isn_out()</literal> takes any of our types and returns a string containing
the shortest possible representation of the number.
</para>
</listitem>
<listitem>
<para>
<literal>ean13_out()</literal> takes any of our types and returns the
EAN13 (long) representation of the number.
</para>
</listitem>
<listitem>
<para>
<literal>ean13_in()</literal> takes a string and returns an EAN13, which, as stated in
premise (2), may or may not be one of our other types, but is certainly an EAN13
number. It succeeds only if the string is a valid EAN13 number; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>isbn_in()</literal> takes a string and returns an ISBN/ISBN13, but only if the string
is really an ISBN/ISBN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>ismn_in()</literal> takes a string and returns an ISMN/ISMN13, but only if the string
is really an ISMN/ISMN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>issn_in()</literal> takes a string and returns an ISSN/ISSN13, but only if the string
is really an ISSN/ISSN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>upc_in()</literal> takes a string and returns a UPC, but only if the string is
really a UPC; otherwise it fails.
</para>
</listitem>
</itemizedlist>
<para>
(On failure, the functions report the error via 'ereport'.)
</para>
</sect2>
<sect2>
<title>Testing functions</title>
<table>
<title>Testing functions</title>
<tgroup cols="2">
<thead>
<row>
<entry><para>Function</para></entry>
<entry><para>Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>isn_weak(boolean)</literal></para></entry>
<entry><para>Sets the weak input mode.</para></entry>
</row>
<row>
<entry><para><literal>isn_weak()</literal></para></entry>
<entry><para>Gets the current status of the weak mode.</para></entry>
</row>
<row>
<entry><para><literal>make_valid()</literal></para></entry>
<entry><para>Validates an invalid number (deleting the invalid flag).</para></entry>
</row>
<row>
<entry><para><literal>is_valid()</literal></para></entry>
<entry><para>Checks for the presence of the invalid flag.</para></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
<literal>Weak</literal> mode is used to be able to insert invalid data into
a table. Invalid here means the check digit is wrong, not that digits are missing.
</para>
<para>
Why would you want to use the weak mode? Well, it could be that
you have a huge collection of ISBN numbers, and that there are so many of
them that for weird reasons some have the wrong check digit (perhaps the
numbers were scanned from a printed list and the OCR got the numbers wrong,
perhaps the numbers were manually captured... who knows). Anyway, the point
is you might want to clean the mess up, but you still want to be able to have
all the numbers in your database and maybe use an external tool to access
the invalid numbers in the database so you can verify the information and
validate it more easily, for example by selecting all the invalid numbers in the table.
</para>
<para>
When you insert invalid numbers in a table using the weak mode, the number
will be inserted with the corrected check digit, but it will be flagged
with an exclamation mark ('!') at the end (e.g. 0-11-000322-5!).
</para>
<para>
You can also force the insertion of invalid numbers even when not in the weak mode,
by appending the '!' character at the end of the number.
</para>
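<para>
A minimal sketch of this forced insertion, assuming the same
<literal>test</literal> table used in the examples below (the value
'2-205-00876-X' is the invalid number from the weak-mode example there):
</para>
<programlisting>
-- appending '!' forces insertion of an invalid number even when weak mode is off
INSERT INTO test VALUES('2-205-00876-X!');
</programlisting>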
</sect2>
<sect2>
<title>Examples</title>
<programlisting>
--Using the types directly:
SELECT isbn('978-0-393-04002-9');
SELECT isbn13('0901690546');
SELECT issn('1436-4522');
--Casting types:
-- note that you can only cast from ean13 to other type when the casted
-- number would be valid in the realm of the casted type;
-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
-- but these will:
SELECT upc(ean13('0220356483481'));
SELECT ean13(upc('220356483481'));
--Create a table with a single column to hold ISBN numbers:
CREATE TABLE test ( id isbn );
INSERT INTO test VALUES('9780393040029');
--Automatically calculating check digits (observe the '?'):
INSERT INTO test VALUES('220500896?');
INSERT INTO test VALUES('978055215372?');
SELECT issn('3251231?');
SELECT ismn('979047213542?');
--Using the weak mode:
SELECT isn_weak(true);
INSERT INTO test VALUES('978-0-11-000533-4');
INSERT INTO test VALUES('9780141219307');
INSERT INTO test VALUES('2-205-00876-X');
SELECT isn_weak(false);
SELECT id FROM test WHERE NOT is_valid(id);
UPDATE test SET id=make_valid(id) WHERE id = '2-205-00876-X!';
SELECT * FROM test;
SELECT isbn13(id) FROM test;
</programlisting>
</sect2>
<sect2>
<title>Bibliography</title>
<para>
The information to implement this module was collected through
several sites, including:
</para>
<programlisting>
http://www.isbn-international.org/
http://www.issn.org/
http://www.ismn-international.org/
http://www.wikipedia.org/
</programlisting>
<para>
The prefixes used for hyphenation were also compiled from:
</para>
<programlisting>
http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
http://www.isbn-international.org/en/identifiers.html
http://www.ismn-international.org/ranges.html
</programlisting>
<para>
Care was taken during the creation of the algorithms and they
were meticulously verified against the suggested algorithms
in the official ISBN, ISMN, ISSN User Manuals.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
Germán Méndez Bravo (Kronuz), 2004 - 2006
</para>
</sect2>
</sect1>

118
doc/src/sgml/lo.sgml Normal file
View File

@ -0,0 +1,118 @@
<sect1 id="lo">
<title>lo</title>
<indexterm zone="lo">
<primary>lo</primary>
</indexterm>
<para>
PostgreSQL type extension for managing Large Objects
</para>
<sect2>
<title>Overview</title>
<para>
One of the problems with the JDBC driver (and this affects the ODBC driver
also) is that the specification assumes that references to BLOBS (Binary
Large OBjectS) are stored within a table, and if that entry is changed, the
associated BLOB is deleted from the database.
</para>
<para>
As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.
</para>
<para>
Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
that are not referenced by anything, and simply occupy disk space.
</para>
</sect2>
<sect2>
<title>The Fix</title>
<para>
I've fixed this by creating a new data type 'lo', some support functions, and
a Trigger which handles the orphaning problem. The trigger essentially just
does a 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!
</para>
<para>
The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron), the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.
</para>
<para>
You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.
</para>
</sect2>
<sect2>
<title>How to Use</title>
<para>
The easiest way is by an example:
</para>
<programlisting>
CREATE TABLE image (title TEXT, raster lo);
CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
</programlisting>
<para>
Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.
</para>
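<para>
As a minimal sketch of populating such a table (assuming the server-side
<literal>lo_import()</literal> function can read the given path, and that the
module's cast from oid to lo applies on assignment):
</para>
<programlisting>
-- import a file as a new large object and store its reference in the lo column
INSERT INTO image (title, raster) VALUES ('motd', lo_import('/etc/motd'));
</programlisting>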
</sect2>
<sect2>
<title>Issues</title>
<itemizedlist>
<listitem>
<para>
Dropping a table will still orphan any objects it contains, as the trigger
is not executed.
</para>
<para>
Avoid this by preceding the 'drop table' with 'delete from {table}'
(see the example after this list).
</para>
<para>
If you already have, or suspect you have, orphaned large objects, see
the contrib/vacuumlo module to help you clean them up. It's a good idea
to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
trigger.
</para>
</listitem>
<listitem>
<para>
Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
</para>
</listitem>
</itemizedlist>
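<para>
For example, to make sure the trigger fires for every row before the table
is dropped:
</para>
<programlisting>
-- assuming the image table from the example above
DELETE FROM image;
DROP TABLE image;
</programlisting>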
<para>
As the ODBC driver needs a permanent lo type (and JDBC could be optimised to
use it if its OID were fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email> June 13 1998
</para>
</sect2>
</sect1>

771
doc/src/sgml/ltree.sgml Normal file
View File

@ -0,0 +1,771 @@
<sect1 id="ltree">
<title>ltree</title>
<indexterm zone="ltree">
<primary>ltree</primary>
</indexterm>
<para>
<literal>ltree</literal> is a PostgreSQL module that contains an implementation
of data types, indexed access methods and queries for data organized as a
tree-like structure.
</para>
<sect2>
<title>Definitions</title>
<para>
A <emphasis>label</emphasis> of a node is a sequence of one or more words
separated by the blank character '_' and containing letters and digits (for
example, [a-zA-Z0-9] for the C locale). The length of a label is limited to 256
bytes.
</para>
<para>
Example: 'Countries', 'Personal_Services'
</para>
<para>
A <emphasis>label path</emphasis> of a node is a sequence of one or more
dot-separated labels l1.l2...ln, representing the path from the root to the node.
The length of a label path is limited to 65Kb, but a size of 2Kb or less is
preferable. In practice this is not a strict limitation (the maximal size of a
label path for the DMOZ catalogue - <ulink url="http://www.dmoz.org"></ulink> -
is about 240 bytes!)
</para>
<para>
Example: <literal>'Top.Countries.Europe.Russia'</literal>
</para>
<para>
We introduce several datatypes:
</para>
<itemizedlist>
<listitem>
<para>
<literal>ltree</literal> - is a datatype for label path.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[]</literal> - is a datatype for arrays of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>lquery</literal>
- is a path expression with a regular-expression-like syntax over the labels,
used for ltree matching. The star symbol (*) is used to specify any number of
labels (levels) and can be used at the beginning and the end of an lquery,
for example, '*.Europe.*'.
</para>
<para>
The following quantifiers are recognized for '*' (like in Perl):
</para>
<itemizedlist>
<listitem>
<para>{n} Match exactly n levels</para>
</listitem>
<listitem>
<para>{n,} Match at least n levels</para>
</listitem>
<listitem>
<para>{n,m} Match at least n but not more than m levels</para>
</listitem>
<listitem>
<para>{,m} Match at maximum m levels (eq. to {0,m})</para>
</listitem>
</itemizedlist>
<para>
It is possible to use several modifiers at the end of a label:
</para>
<itemizedlist>
<listitem>
<para>@ Do case-insensitive label matching</para>
</listitem>
<listitem>
<para>* Do prefix matching for a label</para>
</listitem>
<listitem>
<para>% Don't account for the word separator '_' in label matching; that is,
'Russian%' would match 'Russian_nations', but not 'Russian'
</para>
</listitem>
</itemizedlist>
<para>
<literal>lquery</literal> can contain logical '!' (NOT) at the beginning
of the label and '|' (OR) to specify possible alternatives for label
matching.
</para>
<para>
Example of <literal>lquery</literal>:
</para>
<programlisting>
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a)  b)     c)      d)               e)
</programlisting>
<para>
A label path should
</para>
<orderedlist numeration='loweralpha'>
<listitem>
<para>
begin at a node whose label is 'Top'
</para>
</listitem>
<listitem>
<para>
be followed by zero to two labels until
</para>
</listitem>
<listitem>
<para>
a node whose label begins with the case-insensitive prefix 'sport'
</para>
</listitem>
<listitem>
<para>
followed by a node whose label does not match 'football' or 'tennis', and
</para>
</listitem>
<listitem>
<para>
end at a node whose label begins with 'Russ' or exactly matches
'Spain'.
</para>
</listitem>
</orderedlist>
</listitem>
<listitem>
<para><literal>ltxtquery</literal>
- is a datatype for label searching (like the type 'query' for full text
searching, see contrib/tsearch). It's possible to use the modifiers @,%,* at
the end of a word. The meaning of the modifiers is the same as for lquery.
</para>
<para>
Example: <literal>'Europe & Russia*@ & !Transportation'</literal>
</para>
<para>
This searches for paths that contain the words 'Europe' and 'Russia*'
(case-insensitive) and do not contain 'Transportation'. Notice that the order
of the words as they appear in the label path is not important (see the
sketch after this list).
</para>
</listitem>
</itemizedlist>
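<para>
A minimal sketch of lquery and ltxtquery matching (assuming the module is
installed; the operators themselves are described in the next section):
</para>
<programlisting>
-- lquery matching: any path containing 'Europe'
SELECT 'Top.Countries.Europe.Russia'::ltree ~ '*.Europe.*'::lquery;               -- true
-- ltxtquery matching: word search, independent of position in the path
SELECT 'Top.Countries.Europe.Russia'::ltree @ 'Europe &amp; Russia*@'::ltxtquery;  -- true
</programlisting>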
</sect2>
<sect2>
<title>Operations</title>
<para>
The following operations are defined for type ltree:
</para>
<itemizedlist>
<listitem>
<para>
<literal><,>,<=,>=,=, <></literal>
- Have their usual meanings. Comparison is done in the order of a direct
tree traversal; children of a node are sorted lexicographically.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @> ltree</literal>
- returns TRUE if left argument is an ancestor of right argument (or
equal).
</para>
</listitem>
<listitem>
<para>
<literal>ltree <@ ltree </literal>
- returns TRUE if left argument is a descendant of right argument (or
equal).
</para>
</listitem>
<listitem>
<para>
<literal>ltree ~ lquery, lquery ~ ltree</literal>
- return TRUE if node represented by ltree satisfies lquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree ? lquery[], lquery ? ltree[]</literal>
- return TRUE if node represented by ltree satisfies at least one lquery
from array.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @ ltxtquery, ltxtquery @ ltree</literal>
- return TRUE if node represented by ltree satisfies ltxtquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree || ltree, ltree || text, text || ltree</literal>
- return concatenated ltree.
</para>
</listitem>
</itemizedlist>
<para>
Operations for arrays of ltree (<literal>ltree[]</literal>):
</para>
<itemizedlist>
<listitem>
<para>
<literal>ltree[] @> ltree, ltree <@ ltree[]</literal>
- returns TRUE if array ltree[] contains an ancestor of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @> ltree[], ltree[] <@ ltree</literal>
- returns TRUE if array ltree[] contains a descendant of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ~ lquery, lquery ~ ltree[]</literal>
- returns TRUE if array ltree[] contains label paths matching lquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ? lquery[], lquery[] ? ltree[]</literal>
- returns TRUE if array ltree[] contains label paths matching at least one
lquery from the array.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] @ ltxtquery, ltxtquery @ ltree[]</literal>
- returns TRUE if array ltree[] contains label paths matching ltxtquery
(full text search).
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery</literal>
- return the first element of the array ltree[] that satisfies the
corresponding condition, or NULL otherwise (see the sketch after this list).
</para>
</listitem>
</itemizedlist>
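<para>
A minimal sketch of the "first match" operators on ltree arrays:
</para>
<programlisting>
-- returns 'Top.Science', the first array element matching the lquery
SELECT '{Top.Science,Top.Hobbies}'::ltree[] ?~ '*.Science'::lquery;
</programlisting>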
</sect2>
<sect2>
<title>Remark</title>
<para>
Operations <literal>&lt;@</literal>, <literal>@&gt;</literal>, <literal>@</literal> and
<literal>~</literal> have analogues - <literal>^&lt;@, ^@&gt;, ^@, ^~,</literal> which don't use
indices!
</para>
</sect2>
<sect2>
<title>Indices</title>
<para>
Various indices can be created to speed up the execution of operations:
</para>
<itemizedlist>
<listitem>
<para>
B-tree index over ltree: <literal>&lt;, &lt;=, =, &gt;=, &gt;</literal>
</para>
</listitem>
<listitem>
<para>
GiST index over ltree: <literal>&lt;, &lt;=, =, &gt;=, &gt;, @&gt;, &lt;@, @, ~, ?</literal>
</para>
<para>
Example:
</para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (path);
</programlisting>
</listitem>
<listitem>
<para>GiST index over ltree[]:
<literal>ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.</literal>
</para>
<para>
Example:
</para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
</programlisting>
<para>
Note: This index is lossy.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>ltree subltree(ltree, start, end)</literal>
returns subpath of ltree from start (inclusive) until the end.
</para>
<programlisting>
# select subltree('Top.Child1.Child2',1,2);
subltree
--------
Child1
</programlisting>
</listitem>
<listitem>
<para>
<literal>ltree subpath(ltree, OFFSET,LEN)</literal> and
<literal>ltree subpath(ltree, OFFSET)</literal>
returns subpath of ltree from OFFSET (inclusive) with length LEN.
If OFFSET is negative, the subpath starts that far from the end
of the path. If LEN is omitted, returns everything to the end
of the path. If LEN is negative, leaves that many labels off
the end of the path.
</para>
<programlisting>
# select subpath('Top.Child1.Child2',1,2);
subpath
-------
Child1.Child2
# select subpath('Top.Child1.Child2',-2,1);
subpath
---------
Child1
</programlisting>
</listitem>
<listitem>
<para>
<literal>int4 nlevel(ltree)</literal> - returns level of the node.
</para>
<programlisting>
# select nlevel('Top.Child1.Child2');
nlevel
--------
3
</programlisting>
<para>
Note that the arguments start, end, OFFSET and LEN are measured in
node levels!
</para>
</listitem>
<listitem>
<para>
<literal>int4 index(ltree,ltree)</literal> and
<literal>int4 index(ltree,ltree,OFFSET)</literal>
return the level number of the first occurrence of the second argument in the
first one, beginning from OFFSET. If OFFSET is negative, the search begins
|OFFSET| levels from the end of the path.
</para>
<programlisting>
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
index
-------
6
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
index
-------
9
</programlisting>
</listitem>
<listitem>
<para>
<literal>ltree text2ltree(text)</literal> and
<literal>text ltree2text(ltree)</literal> - cast functions between ltree and text.
</para>
</listitem>
<listitem>
<para>
<literal>ltree lca(ltree,ltree,...) (up to 8 arguments)</literal> and
<literal>ltree lca(ltree[])</literal> return the Lowest Common Ancestor (LCA).
</para>
<programlisting>
# select lca('1.2.2.3','1.2.3.4.5.6');
lca
-----
1.2
# select lca('{la.2.3,1.2.3.4.5.6}') is null;
?column?
----------
f
</programlisting>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Installation</title>
<programlisting>
cd contrib/ltree
make
make install
make installcheck
</programlisting>
</sect2>
<sect2>
<title>Example</title>
<programlisting>
createdb ltreetest
psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
psql ltreetest < ltreetest.sql
</programlisting>
<para>
Now we have a database ltreetest populated with data describing the hierarchy
shown below:
</para>
<programlisting>
                            TOP
                         /   |   \
                 Science  Hobbies  Collections
                     /        |          \
            Astronomy  Amateurs_Astronomy  Pictures
               /  \                           |
      Astrophysics  Cosmology             Astronomy
                                          /    |    \
                                   Galaxies  Stars  Astronauts
</programlisting>
<para>
Inheritance:
</para>
<programlisting>
ltreetest=# select path from test where path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
</programlisting>
<para>
Matching:
</para>
<programlisting>
ltreetest=# select path from test where path ~ '*.Astronomy.*';
path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
Full text search:
</para>
<programlisting>
ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies.Amateurs_Astronomy
(4 rows)
ltreetest=# select path from test where path @ 'Astro* & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
Using Functions:
</para>
<programlisting>
ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
?column?
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
We could create an SQL function:
</para>
<programlisting>
CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
LANGUAGE SQL IMMUTABLE;
</programlisting>
<para>
and previous select could be rewritten as:
</para>
<programlisting>
ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
Or with different arguments:
</para>
<programlisting>
CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
LANGUAGE SQL IMMUTABLE;
ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
</programlisting>
</sect2>
<sect2>
<title>Additional data</title>
<para>
To get a better feeling for the ltree module you can download
dmozltree-eng.sql.gz (about a 3Mb gzipped archive containing 300,274 nodes),
available from
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/ltree/"></ulink>.
This is the DMOZ catalogue, prepared for use with ltree.
Set up your test database (dmoz), load the ltree module and issue the command:
</para>
<programlisting>
zcat dmozltree-eng.sql.gz| psql dmoz
</programlisting>
<para>
Data will be loaded into the database dmoz and all indices will be created.
</para>
</sect2>
<sect2>
<title>Benchmarks</title>
<para>
All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750MHz) using DMOZ
data, containing 300,274 nodes (see above for the download link). We used some
basic queries typical for walking through a catalog.
</para>
<sect3>
<title>Queries</title>
<itemizedlist>
<listitem>
<para>
Q0: Count all rows (sort of base time for comparison)
</para>
<programlisting>
select count(*) from dmoz;
count
--------
300274
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
Q1: Get direct children (without inheritance)
</para>
<programlisting>
select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
path
-----------------------------------
Top.Adult.Arts.Animation.Cartoons
Top.Adult.Arts.Animation.Anime
(2 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q2: The same as Q1 but with counting of successors
</para>
<programlisting>
select path as parentpath , (select count(*)-1 from dmoz where path <@
p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
parentpath | count
-----------------------------------+-------
Top.Adult.Arts.Animation.Cartoons | 2
Top.Adult.Arts.Animation.Anime | 61
(2 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q3: Get all parents
</para>
<programlisting>
select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
path asc;
path
--------------------------
Top
Top.Adult
Top.Adult.Arts
Top.Adult.Arts.Animation
(4 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q4: Get all parents with counting of children
</para>
<programlisting>
select path, (select count(*)-1 from dmoz where path <@ p.path) as count
from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
path | count
--------------------------+--------
Top | 300273
Top.Adult | 4913
Top.Adult.Arts | 339
Top.Adult.Arts.Animation | 65
(4 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q5: Get all children with levels
</para>
<programlisting>
select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
path | level
------------------------------------------------+-------
Top.Adult.Arts.Animation.Anime | 1
Top.Adult.Arts.Animation.Anime.Fan_Works | 2
Top.Adult.Arts.Animation.Anime.Games | 2
Top.Adult.Arts.Animation.Anime.Genres | 2
Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
Top.Adult.Arts.Animation.Anime.Multimedia | 2
Top.Adult.Arts.Animation.Anime.Resources | 2
Top.Adult.Arts.Animation.Anime.Titles | 2
Top.Adult.Arts.Animation.Cartoons | 1
Top.Adult.Arts.Animation.Cartoons.AVS | 2
Top.Adult.Arts.Animation.Cartoons.Members | 2
(11 rows)
</programlisting>
</listitem>
</itemizedlist>
</sect3>
<sect3>
<title>Timings</title>
<programlisting>
+---------------------------------------------+
|Query|Rows|Time (ms) index|Time (ms) no index|
|-----+----+---------------+------------------|
| Q0| 1| NA| 1453.44|
|-----+----+---------------+------------------|
| Q1| 2| 0.49| 1001.54|
|-----+----+---------------+------------------|
| Q2| 2| 1.48| 3009.39|
|-----+----+---------------+------------------|
| Q3| 4| 0.55| 906.98|
|-----+----+---------------+------------------|
| Q4| 4| 24385.07| 4951.91|
|-----+----+---------------+------------------|
| Q5| 11| 0.85| 1003.23|
+---------------------------------------------+
</programlisting>
<para>
Timings without indices were obtained using operations which don't use
indices (see above)
</para>
</sect3>
<sect3>
<title>Remarks</title>
<para>
We didn't run full-scale tests, and we haven't presented (yet) data for
operations with arrays of ltree (ltree[]) or full text searching. We'd
appreciate your input. So far, here are some (rather obvious) results:
</para>
<itemizedlist>
<listitem>
<para>
Indices do help query execution
</para>
</listitem>
<listitem>
<para>
Q4 performs badly because it needs to read almost all the data from the HDD
</para>
</listitem>
</itemizedlist>
</sect3>
</sect2>
<sect2>
<title>Some Backgrounds</title>
<para>
The approach we use for ltree is much like the one we used in our other GiST-based
contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). The theoretical
background is available in papers referenced from our GiST development page
(<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>).
</para>
<para>
A hierarchical data structure (tree) is a set of nodes. Each node has a
signature (LPS) of a fixed size, which is a hashed label path of that node.
While traversing a tree, we can *certainly* prune branches if
</para>
<programlisting>
LQS (bitwise AND) LPS != LQS
</programlisting>
<para>
where LQS is a signature of lquery or ltxtquery, obtained in the same way as
LPS.
</para>
<programlisting>
ltree[]:
</programlisting>
<para>
For an array of ltree, the LPS is the bitwise OR of the signatures of *ALL* children
reachable from that node. Signatures are stored in RD-tree, implemented using
GiST, which provides indexed access.
</para>
<programlisting>
ltree:
</programlisting>
<para>
For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is
represented by (left_bound, signature, right_bound), so that we can speed up
the operations <literal><, <=, =, >=, ></literal> using left_bound and right_bound, and prune branches of
a tree using the signature.
</para>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
additional information. Authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.
</para>
</sect2>
</sect1>

View File

@ -1,37 +1,70 @@
This utility allows administrators to examine the file structure used by
PostgreSQL. To make use of it, you need to be familiar with the file
structure, which is described in the "Database File Layout" chapter of
the "Internals" section of the PostgreSQL documentation.
Oid2name connects to the database and extracts OID, filenode, and table
name information. You can also have it show database OIDs and tablespace
OIDs.
When displaying specific tables, you can select which tables to show by
using -o, -f and -t. The first switch takes an OID, the second takes
a filenode, and the third takes a tablename (actually, it's a LIKE
pattern, so you can use things like "foo%"). Note that you can use as many
of these switches as you like, and the listing will include all objects
matched by any of the switches. Also note that these switches can only
show objects in the database given in -d.
If you don't give any of -o, -f or -t it will dump all the tables in the
database given in -d. If you don't give -d, it will show a database
listing. Alternatively you can give -s to get a tablespace listing.
Additional switches:
-i include indexes and sequences in the database listing.
-x display more information about each object shown:
tablespace name, schema name, OID.
-S also show system objects
(those in information_schema, pg_toast and pg_catalog schemas)
-q don't display headers
(useful for scripting)
---------------------------------------------------------------------------
Sample session:
<sect1 id="oid2name">
<title>oid2name</title>
<indexterm zone="oid2name">
<primary>oid2name</primary>
</indexterm>
<para>
This utility allows administrators to examine the file structure used by
PostgreSQL. To make use of it, you need to be familiar with the file
structure, which is described in <xref linkend="storage">.
</para>
<sect2>
<title>Overview</title>
<para>
<literal>oid2name</literal> connects to the database and extracts OID,
filenode, and table name information. You can also have it show database
OIDs and tablespace OIDs.
</para>
<para>
When displaying specific tables, you can select which tables to show by
using -o, -f and -t. The first switch takes an OID, the second takes
a filenode, and the third takes a tablename (actually, it's a LIKE
pattern, so you can use things like "foo%"). Note that you can use as many
of these switches as you like, and the listing will include all objects
matched by any of the switches. Also note that these switches can only
show objects in the database given in -d.
</para>
<para>
If you don't give any of -o, -f or -t it will dump all the tables in the
database given in -d. If you don't give -d, it will show a database
listing. Alternatively you can give -s to get a tablespace listing.
</para>
<table>
<title>Additional switches</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>-i</literal></entry>
<entry>include indexes and sequences in the database listing.</entry>
</row>
<row>
<entry><literal>-x</literal></entry>
<entry>display more information about each object shown: tablespace name,
schema name, OID.
</entry>
</row>
<row>
<entry><literal>-S</literal></entry>
<entry>also show system objects (those in information_schema, pg_toast
and pg_catalog schemas)
</entry>
</row>
<row>
<entry><literal>-q</literal></entry>
<entry>don't display headers (useful for scripting)</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Examples</title>
<programlisting>
$ oid2name
All databases:
Oid Database Name Tablespace
@ -147,19 +180,26 @@ From database "alvherre":
155156 foo
$ # end of sample session.
</programlisting>
---------------------------------------------------------------------------
<para>
You can also get approximate size data for each object using psql. For
example,
</para>
<programlisting>
SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC;
</programlisting>
<para>
Each page is typically 8k. Relpages is updated by VACUUM.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
b. palmer, <email>bpalmer@crimelabs.net</email>
</para>
</sect2>
You can also get approximate size data for each object using psql. For
example,
</sect1>
SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC;
Each page is typically 8k. Relpages is updated by VACUUM.
---------------------------------------------------------------------------
Mail me with any problems or additions you would like to see. Clearing
house for the code will be at: http://www.crimelabs.net
b. palmer, bpalmer@crimelabs.net

View File

@ -0,0 +1,125 @@
<sect1 id="pageinspect">
<title>pageinspect</title>
<indexterm zone="pageinspect">
<primary>pageinspect</primary>
</indexterm>
<para>
The functions in this module allow you to inspect the contents of data pages
at a low level, for debugging purposes.
</para>
<sect2>
<title>Functions included</title>
<itemizedlist>
<listitem>
<para>
<literal>get_raw_page</literal> reads one block of the named table and returns a copy as a
bytea field. This allows a single time-consistent copy of the block to be
made. Use of this function is restricted to superusers.
</para>
</listitem>
<listitem>
<para>
<literal>page_header</literal> shows fields which are common to all PostgreSQL heap and index
pages. Use of this function is restricted to superusers.
</para>
<para>
A page image obtained with <literal>get_raw_page</literal> should be passed as argument:
</para>
<programlisting>
test=# SELECT * FROM page_header(get_raw_page('pg_class',0));
lsn | tli | flags | lower | upper | special | pagesize | version
----------+-----+-------+-------+-------+---------+----------+---------
0/3C5614 | 1 | 1 | 216 | 256 | 8192 | 8192 | 4
(1 row)
</programlisting>
<para>
The returned columns correspond to the fields in the PageHeaderData struct;
see src/include/storage/bufpage.h for more details.
</para>
</listitem>
<listitem>
<para>
<literal>heap_page_items</literal> shows all line pointers on a heap page. For those line
pointers that are in use, tuple headers are also shown. All tuples are
shown, whether or not the tuples were visible to an MVCC snapshot at the
time the raw page was copied. Use of this function is restricted to
superusers.
</para>
<para>
A heap page image obtained with <literal>get_raw_page</literal> should be passed as argument:
</para>
<programlisting>
test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
</programlisting>
<para>
See src/include/storage/itemid.h and src/include/access/htup.h for
explanations of the fields returned.
</para>
</listitem>
<listitem>
<para>
<literal>bt_metap()</literal> returns information about the btree index metapage:
</para>
<programlisting>
test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic | 340322
version | 2
root | 1
level | 0
fastroot | 1
fastlevel | 0
</programlisting>
</listitem>
<listitem>
<para>
<literal>bt_page_stats()</literal> shows information about single btree pages:
</para>
<programlisting>
test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3
</programlisting>
</listitem>
<listitem>
<para>
<literal>bt_page_items()</literal> returns information about specific items on btree pages:
</para>
<programlisting>
test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
</programlisting>
</listitem>
</itemizedlist>
</sect2>
</sect1>

422
doc/src/sgml/pgbench.sgml Normal file
View File

@ -0,0 +1,422 @@
<sect1 id="pgbench">
<title>pgbench</title>
<indexterm zone="pgbench">
<primary>pgbench</primary>
</indexterm>
<para>
<literal>pgbench</literal> is a simple program to run a benchmark test.
<literal>pgbench</literal> is a client application of PostgreSQL and runs
with PostgreSQL only. It performs lots of small and simple transactions
including SELECT/UPDATE/INSERT operations and then calculates the number of
transactions successfully completed within a second (transactions
per second, tps). The target data includes a table with at least 100k
tuples.
</para>
<para>
Example outputs from pgbench look like:
</para>
<programlisting>
number of clients: 4
number of transactions per client: 100
number of processed transactions: 400/400
tps = 19.875015(including connections establishing)
tps = 20.098827(excluding connections establishing)
</programlisting>
<para> A similar program called "JDBCBench" already exists, but it requires
Java, which may not be available on every platform. Moreover, some
people are concerned that the overhead of Java might lead to
inaccurate results. So I decided to write it in pure C, and named
it "pgbench."
</para>
<para>
Features of pgbench:
</para>
<itemizedlist>
<listitem>
<para>
pgbench is written in C using libpq only. So it is very portable
and easy to install.
</para>
</listitem>
<listitem>
<para>
pgbench can simulate concurrent connections using the asynchronous
capability of libpq. No threading is required.
</para>
</listitem>
</itemizedlist>
<sect2>
<title>Overview</title>
<orderedlist>
<listitem>
<para>(Optional) Initialize the database with:</para>
<programlisting>
pgbench -i &lt;dbname&gt;
</programlisting>
<para>
where &lt;dbname&gt; is the name of the database. pgbench uses four tables:
accounts, branches, history and tellers. These tables will be
destroyed. Be very careful if you have tables with the same
names. The default test data contains:
</para>
<programlisting>
table # of tuples
-------------------------
branches 1
tellers 10
accounts 100000
history 0
</programlisting>
<para>
You can increase the number of tuples by using the -s option. The branches,
tellers and accounts tables are created with a fillfactor which is
set using the -F option. See below.
</para>
</listitem>
<listitem>
<para>Run the benchmark test</para>
<programlisting>
pgbench &lt;dbname&gt;
</programlisting>
<para>
The default configuration is:
</para>
<programlisting>
number of clients: 1
number of transactions per client: 10
</programlisting>
</listitem>
</orderedlist>
<table>
<title><literal>pgbench</literal> options</title>
<tgroup cols="2">
<thead>
<row>
<entry>Parameter</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>-h hostname</literal></entry>
<entry>
<para>
hostname where the backend is running. If this option
is omitted, pgbench will connect to the localhost via
Unix domain socket.
</para>
</entry>
</row>
<row>
<entry><literal>-p port</literal></entry>
<entry>
<para>
The port number the backend is listening on. The default is
libpq's default, usually 5432.
</para>
</entry>
</row>
<row>
<entry><literal>-c number_of_clients</literal></entry>
<entry>
<para>
Number of clients simulated. default is 1.
</para>
</entry>
</row>
<row>
<entry><literal>-t number_of_transactions</literal></entry>
<entry>
<para>
Number of transactions each client runs. default is 10.
</para>
</entry>
</row>
<row>
<entry><literal>-s scaling_factor</literal></entry>
<entry>
<para>
This should be used with the -i (initialize) option.
The number of tuples generated will be a multiple of the
scaling factor. For example, -s 100 will imply 10M
(10,000,000) tuples in the accounts table.
The default is 1.
</para>
<para>
NOTE: scaling factor should be at least
as large as the largest number of clients you intend
to test; else you'll mostly be measuring update contention.
Regular (not initializing) runs using one of the
built-in tests will detect scale based on the number of
branches in the database. For custom (-f) runs it can
be manually specified with this parameter.
</para>
</entry>
</row>
<row>
<entry><literal>-D varname=value</literal></entry>
<entry>
<para>
Define a variable. It can be referred to by a script
provided using the -f option. Multiple -D options are allowed.
</para>
</entry>
</row>
<row>
<entry><literal>-U login</literal></entry>
<entry>
<para>
Specify db user's login name if it is different from
the Unix login name.
</para>
</entry>
</row>
<row>
<entry><literal>-P password</literal></entry>
<entry>
<para>
Specify the db password. CAUTION: using this option
might be a security hole since the ps command will
show the password. Use this for TESTING PURPOSES ONLY.
</para>
</entry>
</row>
<row>
<entry><literal>-n</literal></entry>
<entry>
<para>
Do not perform vacuuming or cleaning of the history table prior to
the test.
</para>
</entry>
</row>
<row>
<entry><literal>-v</literal></entry>
<entry>
<para>
Do vacuuming before testing. This will take some time.
With neither -n nor -v, pgbench will vacuum the tellers and
branches tables only.
</para>
</entry>
</row>
<row>
<entry><literal>-S</literal></entry>
<entry>
<para>
Perform select only transactions instead of TPC-B.
</para>
</entry>
</row>
<row>
<entry><literal>-N</literal></entry>
<entry>
<para>
Do not update "branches" and "tellers". This will
avoid heavy update contention on branches and tellers,
while it will not make pgbench supporting TPC-B like
transactions.
</para>
</entry>
</row>
<row>
<entry><literal>-f filename</literal></entry>
<entry>
<para>
Read the transaction script from a file. A detailed
explanation appears later.
</para>
</entry>
</row>
<row>
<entry><literal>-C</literal></entry>
<entry>
<para>
Establish a connection for each transaction, rather than
doing it just once at the beginning of pgbench as in the normal
mode. This is useful to measure the connection overhead.
</para>
</entry>
</row>
<row>
<entry><literal>-l</literal></entry>
<entry>
<para>
Write the time taken by each transaction to a logfile,
with the name "pgbench_log.xxx", where xxx is the PID
of the pgbench process. The format of the log is:
</para>
<programlisting>
client_id transaction_no time file_no time-epoch time-us
</programlisting>
<para>
where time is measured in microseconds, the file_no indicates
which test file was used (useful when multiple were
specified with -f), and time-epoch/time-us are a
UNIX epoch format timestamp followed by an offset
in microseconds (suitable for creating an ISO 8601
timestamp with a fraction of a second) of when
the transaction completed.
</para>
<para>
Here are example outputs:
</para>
<programlisting>
0 199 2241 0 1175850568 995598
0 200 2465 0 1175850568 998079
0 201 2513 0 1175850569 608
0 202 2038 0 1175850569 2663
</programlisting>
</entry>
</row>
<row>
<entry><literal>-F fillfactor</literal></entry>
<entry>
<para>
Create the tables (accounts, tellers and branches) with the given
fillfactor. The default is 100. This should be used with the -i
(initialize) option.
</para>
</entry>
</row>
<row>
<entry><literal>-d</literal></entry>
<entry>
<para>
debug option.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>What is the "transaction" actually performed in pgbench?</title>
<orderedlist>
<listitem><para><literal>begin;</literal></para></listitem>
<listitem><para><literal>update accounts set abalance = abalance + :delta where aid = :aid;</literal></para></listitem>
<listitem><para><literal>select abalance from accounts where aid = :aid;</literal></para></listitem>
<listitem><para><literal>update tellers set tbalance = tbalance + :delta where tid = :tid;</literal></para></listitem>
<listitem><para><literal>update branches set bbalance = bbalance + :delta where bid = :bid;</literal></para></listitem>
<listitem><para><literal>insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);</literal></para></listitem>
<listitem><para><literal>end;</literal></para></listitem>
</orderedlist>
<para>
If you specify -N, (4) and (5) aren't included in the transaction.
</para>
</sect2>
<sect2>
<title>Script file</title>
<para>
<literal>pgbench</literal> has support for reading a transaction script
from a specified file (<literal>-f</literal> option). This file should
include one SQL command per line; SQL commands consisting of multiple lines
are not supported. Empty lines and lines beginning with "--" will be ignored.
</para>
<para>
Multiple <literal>-f</literal> options are allowed. In this case each
transaction is assigned a randomly chosen script.
</para>
<para>
SQL commands can include "meta command" which begins with "\" (back
slash). A meta command takes some arguments separted by white
spaces. Currently following meta command is supported:
</para>
<itemizedlist>
<listitem>
<para>
<literal>\set name operand1 [ operator operand2 ]</literal>
- Sets variable "name" to the value calculated from "operand1"
"operator" "operand2". If "operator" and "operand2"
are omitted, variable "name" is set to the value of operand1.
</para>
<para>
Example:
</para>
<programlisting>
\set ntellers 10 * :scale
</programlisting>
</listitem>
<listitem>
<para>
<literal>\setrandom name min max</literal>
- Assigns a random integer between min and max to name
</para>
<para>
Example:
</para>
<programlisting>
\setrandom aid 1 100000
</programlisting>
</listitem>
<listitem>
<para>
Variables can be referred to in SQL commands by adding ":" in front
of the variable name.
</para>
<para>
Example:
</para>
<programlisting>
SELECT abalance FROM accounts WHERE aid = :aid
</programlisting>
<para>
Variables can also be defined using the -D option.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Examples</title>
<para>
For example, a TPC-B-like benchmark can be defined as follows (scaling
factor = 1):
</para>
<programlisting>
\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta 1 10000
BEGIN
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
SELECT abalance FROM accounts WHERE aid = :aid
UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
END
</programlisting>
<para>
If you want to automatically set the scaling factor from the number of
tuples in the branches table, use the -s option and a shell command like this:
</para>
<programlisting>
pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
</programlisting>
<para>
Notice that the -f option does not execute vacuuming or clear the history
table before starting the benchmark.
</para>
</sect2>
</sect1>

1144
doc/src/sgml/pgcrypto.sgml Normal file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,123 @@
<sect1 id="pgrowlocks">
<title>pgrowlocks</title>
<indexterm zone="pgrowlocks">
<primary>pgrowlocks</primary>
</indexterm>
<para>
The <literal>pgrowlocks</literal> module provides a function to show row
locking information for a specified table.
</para>
<sect2>
<title>Overview</title>
<programlisting>
pgrowlocks(text) RETURNS pgrowlocks_type
</programlisting>
<para>
The parameter is the name of a table, and <literal>pgrowlocks_type</literal> is
defined as:
</para>
<programlisting>
CREATE TYPE pgrowlocks_type AS (
locked_row TID, -- row TID
lock_type TEXT, -- lock type
locker XID, -- locking XID
multi bool, -- multi XID?
xids xid[], -- multi XIDs
pids INTEGER[] -- locker's process id
);
</programlisting>
<table>
<title>pgrowlocks_type</title>
<tgroup cols="2">
<tbody>
<row>
<entry>locked_row</entry>
<entry>tuple ID (TID) of each locked row</entry>
</row>
<row>
<entry>lock_type</entry>
<entry>"Shared" for shared lock, "Exclusive" for exclusive lock</entry>
</row>
<row>
<entry>locker</entry>
<entry>transaction ID of locker (Note 1)</entry>
</row>
<row>
<entry>multi</entry>
<entry>"t" if locker is a multi transaction, otherwise "f"</entry>
</row>
<row>
<entry>xids</entry>
<entry>XIDs of lockers (Note 2)</entry>
</row>
<row>
<entry>pids</entry>
<entry>process ids of locking backends</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Note 1: If the locker is a multitransaction, this field shows the multi ID.
</para>
<para>
Note 2: If the locker is a multitransaction, multiple values are shown.
</para>
<para>
The calling sequence for <literal>pgrowlocks</literal> is as follows:
<literal>pgrowlocks</literal> grabs an AccessShareLock on the target table and
reads each row one by one to get the row locking information. You should
notice that:
</para>
<orderedlist>
<listitem>
<para>
if the table is exclusively locked by someone else,
<literal>pgrowlocks</literal> will be blocked.
</para>
</listitem>
<listitem>
<para>
<literal>pgrowlocks</literal> may show incorrect information if a
new lock is taken or a lock is freed during its execution.
</para>
</listitem>
</orderedlist>
<para>
<literal>pgrowlocks</literal> does not show the contents of locked rows. If
you want to take a look at the row contents at the same time, you could do
something like this:
</para>
<programlisting>
SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid;
</programlisting>
</sect2>
<sect2>
<title>Example</title>
<para>
Here is a sample execution of <literal>pgrowlocks</literal>:
</para>
<programlisting>
test=# SELECT * FROM pgrowlocks('t1');
locked_row | lock_type | locker | multi | xids | pids
------------+-----------+--------+-------+-----------+---------------
(0,1) | Shared | 19 | t | {804,805} | {29066,29068}
(0,2) | Shared | 19 | t | {804,805} | {29066,29068}
(0,3) | Exclusive | 804 | f | {804} | {29066}
(0,4) | Exclusive | 804 | f | {804} | {29066}
(4 rows)
</programlisting>
</sect2>
</sect1>

View File

@ -0,0 +1,158 @@
<sect1 id="pgstattuple">
<title>pgstattuple</title>
<indexterm zone="pgstattuple">
<primary>pgstattuple</primary>
</indexterm>
<para>
The <literal>pgstattuple</literal> module provides various functions to obtain
tuple-level statistics.
</para>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>pgstattuple()</literal> returns the relation length, the percentage
of dead tuples in the relation, and other information. This may help users
determine whether vacuum is necessary. Here is an example session:
</para>
<programlisting>
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
</programlisting>
<para>
Here are explanations for each column:
</para>
<table>
<title><literal>pgstattuple()</literal> column descriptions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Column</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>table_len</entry>
<entry>physical relation length in bytes</entry>
</row>
<row>
<entry>tuple_count</entry>
<entry>number of live tuples</entry>
</row>
<row>
<entry>tuple_len</entry>
<entry>total length of live tuples in bytes</entry>
</row>
<row>
<entry>tuple_percent</entry>
<entry>live tuples in %</entry>
</row>
<row>
<entry>dead_tuple_count</entry>
<entry>number of dead tuples</entry>
</row>
<row>
<entry>dead_tuple_len</entry>
<entry>total dead tuples length in bytes</entry>
</row>
<row>
<entry>dead_tuple_percent</entry>
<entry>dead tuples in %</entry>
</row>
<row>
<entry>free_space</entry>
<entry>free space in bytes</entry>
</row>
<row>
<entry>free_percent</entry>
<entry>free space in %</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
<note>
<para>
<literal>pgstattuple</literal> acquires only a read lock on the relation, so
concurrent updates may affect the result.
</para>
</note>
<note>
<para>
<literal>pgstattuple</literal> considers a tuple to be "dead" if HeapTupleSatisfiesNow()
returns false.
</para>
</note>
</para>
</listitem>
<listitem>
<para>
<literal>pg_relpages()</literal> returns the number of pages in the relation.
</para>
</listitem>
<listitem>
<para>
<literal>pgstatindex()</literal> returns an array showing the information about an index:
</para>
<programlisting>
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0
</programlisting>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Usage</title>
<para>
<literal>pgstattuple</literal> may be called with either a relation name or a
relation OID; the functions are defined as follows:
</para>
<programlisting>
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;
</programlisting>
<para>
The argument is the relation name (optionally schema-qualified) or the OID
of the relation. Note that pgstattuple only returns one row.
</para>
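<para>
As a hedged sketch (run as a superuser; the column list and ordering are
arbitrary assumptions), the OID form can be used to scan every ordinary table
and list them by their percentage of dead tuples:
</para>
<programlisting>
SELECT relname, (pgstattuple(oid)).dead_tuple_percent AS dead_pct
FROM pg_class
WHERE relkind = 'r'
ORDER BY dead_pct DESC;
</programlisting>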
</sect2>
</sect1>

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.83 2007/11/01 17:00:18 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.84 2007/11/10 23:30:46 momjian Exp $ -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
@ -102,6 +102,7 @@
&typeconv;
&indices;
&textsearch;
&contrib;
&mvcc;
&perform;

450
doc/src/sgml/seg.sgml Normal file
View File

@ -0,0 +1,450 @@
<sect1 id="seg">
<title>seg</title>
<indexterm zone="seg">
<primary>seg</primary>
</indexterm>
<para>
The <literal>seg</literal> module contains the code for the user-defined
type, <literal>SEG</literal>, representing laboratory measurements as
floating point intervals.
</para>
<sect2>
<title>Rationale</title>
<para>
The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
that continuum with somewhat fuzzy limits. The measurements come out
as intervals because of uncertainty and randomness, as well as because
the value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.
</para>
<para>
Using just common sense, it appears more convenient to store such data
as intervals, rather than pairs of numbers. In practice, it even turns
out more efficient in most applications.
</para>
<para>
Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:
</para>
<programlisting>
test=> select 6.50 as "pH";
pH
---
6.5
(1 row)
</programlisting>
<para>
In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
interval contained within a bigger and even fuzzier interval, 6.5,
with their center points being (probably) the only common feature they
share. We definitely do not want such different data items to appear the
same.
</para>
<para>
Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
the sense that each data element records its own precision.
</para>
<para>
Check this out:
</para>
<programlisting>
test=> select '6.25 .. 6.50'::seg as "pH";
pH
------------
6.25 .. 6.50
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Syntax</title>
<para>
The external representation of an interval is formed using one or two
floating point numbers joined by the range operator ('..' or '...').
Optional certainty indicators (<, > and ~) are ignored by the internal
logic, but are retained in the data.
</para>
<table>
<title>Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>rule 1</entry>
<entry>seg -> boundary PLUMIN deviation</entry>
</row>
<row>
<entry>rule 2</entry>
<entry>seg -> boundary RANGE boundary</entry>
</row>
<row>
<entry>rule 3</entry>
<entry>seg -> boundary RANGE</entry>
</row>
<row>
<entry>rule 4</entry>
<entry>seg -> RANGE boundary</entry>
</row>
<row>
<entry>rule 5</entry>
<entry>seg -> boundary</entry>
</row>
<row>
<entry>rule 6</entry>
<entry>boundary -> FLOAT</entry>
</row>
<row>
<entry>rule 7</entry>
<entry>boundary -> EXTENSION FLOAT</entry>
</row>
<row>
<entry>rule 8</entry>
<entry>deviation -> FLOAT</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Tokens</title>
<tgroup cols="2">
<tbody>
<row>
<entry>RANGE</entry>
<entry>(\.\.)(\.)?</entry>
</row>
<row>
<entry>PLUMIN</entry>
<entry>\'\+\-\'</entry>
</row>
<row>
<entry>integer</entry>
<entry>[+-]?[0-9]+</entry>
</row>
<row>
<entry>real</entry>
<entry>[+-]?[0-9]+\.[0-9]+</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>({integer}|{real})([eE]{integer})?</entry>
</row>
<row>
<entry>EXTENSION</entry>
<entry>[<>~]</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples of valid <literal>SEG</literal> representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Any number</entry>
<entry>
(rules 5,6) -- creates a zero-length segment (a point,
if you will)
</entry>
</row>
<row>
<entry>~5.0</entry>
<entry>
(rules 5,7) -- creates a zero-length segment AND records
'~' in the data. This notation reads 'approximately 5.0',
but its meaning is not recognized by the code. It is ignored
until you get the value back. View it as a shorthand comment.
</entry>
</row>
<row>
<entry><5.0</entry>
<entry>
(rules 5,7) -- creates a point at 5.0; '<' is ignored but
is preserved as a comment
</entry>
</row>
<row>
<entry>>5.0</entry>
<entry>
(rules 5,7) -- creates a point at 5.0; '>' is ignored but
is preserved as a comment
</entry>
</row>
<row>
<entry><para>5(+-)0.3</para><para>5'+-'0.3</para></entry>
<entry>
<para>
(rules 1,8) -- creates an interval '4.7..5.3'. As of this
writing (02/09/2000), this mechanism isn't completely accurate
in determining the number of significant digits for the
boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:
</para>
<programlisting>
postgres=> select '10(+-)1'::seg as seg;
seg
---------
9.0 .. 11 -- should be: 9 .. 11
</programlisting>
<para>
Also, the (+-) notation is not preserved: 'a(+-)b' will
always be returned as '(a-b) .. (a+b)'. The purpose of this
notation is to allow input from certain data sources without
conversion.
</para>
</entry>
</row>
<row>
<entry>50 .. </entry>
<entry>(rule 3) -- everything that is greater than or equal to 50</entry>
</row>
<row>
<entry>.. 0</entry>
<entry>(rule 4) -- everything that is less than or equal to 0</entry>
</row>
<row>
<entry>1.5e-2 .. 2E-2 </entry>
<entry>(rule 2) -- creates an interval (0.015 .. 0.02)</entry>
</row>
<row>
<entry>1 ... 2</entry>
<entry>
The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
Because of the widespread use of '...' in the data sources,
I decided to stick to it as a range operator. This, and
also the fact that the white space around the range operator
is ignored, creates a parsing conflict with numeric constants
starting with a decimal point.
</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples of invalid input</title>
<tgroup cols="2">
<tbody>
<row>
<entry>.1e7</entry>
<entry>should be: 0.1e7</entry>
</row>
<row>
<entry>.1 .. .2</entry>
<entry>should be: 0.1 .. 0.2</entry>
</row>
<row>
<entry>2.4 E4</entry>
<entry>should be: 2.4E4</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The following, although it is not a syntax error, is disallowed to improve
the sanity of the data:
</para>
<table>
<title></title>
<tgroup cols="2">
<tbody>
<row>
<entry>5 .. 2</entry>
<entry>should be: 2 .. 5</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Precision</title>
<para>
The segments are stored internally as pairs of 32-bit floating point
numbers. This means that numbers with more than 7 significant digits
will be truncated.
</para>
<para>
Numbers with 7 or fewer significant digits retain their
original precision. That is, if your query returns 0.00, you will be
sure that the trailing zeroes are not the artifacts of formatting: they
reflect the precision of the original data. The number of leading
zeroes does not affect precision: the value 0.0067 is considered to
have just 2 significant digits.
</para>
</sect2>
<sect2>
<title>Usage</title>
<para>
The access method for SEG is a GiST index (gist_seg_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/cube).
</para>
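<para>
As a hedged sketch (the table and column names are arbitrary assumptions), a
GiST index on a SEG column is created with the gist_seg_ops operator class
and can then serve queries that use the operators described below:
</para>
<programlisting>
CREATE TABLE measurements (reading seg);
INSERT INTO measurements VALUES ('6.25 .. 6.50'), ('6.7 .. 6.9');
CREATE INDEX measurements_reading_idx
    ON measurements USING gist (reading gist_seg_ops);
SELECT * FROM measurements WHERE reading && '6.4 .. 6.6'::seg;
</programlisting>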
<para>
The operators supported by the GiST access method include:
</para>
<itemizedlist>
<listitem>
<programlisting>
[a, b] << [c, d] Is left of
</programlisting>
<para>
The left operand, [a, b], occurs entirely to the left of the
right operand, [c, d], on the axis (-inf, inf). It means,
[a, b] << [c, d] is true if b < c and false otherwise
</para>
</listitem>
<listitem>
<programlisting>
[a, b] >> [c, d] Is right of
</programlisting>
<para>
[a, b] occurs entirely to the right of [c, d].
[a, b] >> [c, d] is true if a > d and false otherwise
</para>
</listitem>
<listitem>
<programlisting>
[a, b] &< [c, d] Overlaps or is left of
</programlisting>
<para>
This might be better read as "does not extend to right of".
It is true when b <= d.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] &> [c, d] Overlaps or is right of
</programlisting>
<para>
This might be better read as "does not extend to left of".
It is true when a >= c.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] = [c, d] Same as
</programlisting>
<para>
The segments [a, b] and [c, d] are identical, that is, a == c
and b == d
</para>
</listitem>
<listitem>
<programlisting>
[a, b] && [c, d] Overlaps
</programlisting>
<para>
The segments [a, b] and [c, d] overlap.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] @> [c, d] Contains
</programlisting>
<para>
The segment [a, b] contains the segment [c, d], that is,
a <= c and b >= d
</para>
</listitem>
<listitem>
<programlisting>
[a, b] <@ [c, d] Contained in
</programlisting>
<para>
The segment [a, b] is contained in [c, d], that is,
a >= c and b <= d
</para>
</listitem>
</itemizedlist>
<para>
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<para>
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
</para>
<para>
Other operators:
</para>
<programlisting>
[a, b] < [c, d] Less than
[a, b] > [c, d] Greater than
</programlisting>
<para>
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type
</para>
<para>
There are a few other potentially useful functions defined in seg.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
</para>
<para>
For examples of usage, see sql/seg.sql
</para>
<para>
NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the SEG column (see the script sort-segments.pl for an example)
</para>
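<para>
As a hedged sketch with hypothetical table and column names, one way to apply
this advice is to load a pre-sorted copy of the data before building the index:
</para>
<programlisting>
CREATE TABLE sorted_measurements AS
    SELECT * FROM raw_measurements ORDER BY reading;
CREATE INDEX sorted_measurements_idx
    ON sorted_measurements USING gist (reading gist_seg_ops);
</programlisting>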
</sect2>
<sect2>
<title>Credits</title>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
also grateful to all postgres developers, present and past, for enabling
myself to create my own world and live undisturbed in it. And I would like
to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
Energy for the years of faithful support of my database research.
</para>
<programlisting>
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
</programlisting>
<para>
<email>selkovjr@mcs.anl.gov</email>
</para>
</sect2>
</sect1>

164
doc/src/sgml/sslinfo.sgml Normal file
View File

@ -0,0 +1,164 @@
<sect1 id="sslinfo">
<title>sslinfo</title>
<indexterm zone="sslinfo">
<primary>sslinfo</primary>
</indexterm>
<para>
This module provides information about the SSL certificate that the current
client presented when connecting to PostgreSQL.
</para>
<sect2>
<title>Notes</title>
<para>
This extension won't build unless your PostgreSQL server is configured
with --with-openssl. The information provided by these functions is of
no use if you don't use SSL to connect to the database.
</para>
</sect2>
<sect2>
<title>Functions Description</title>
<itemizedlist>
<listitem>
<programlisting>
ssl_is_used() RETURNS boolean;
</programlisting>
<para>
Returns TRUE if the current connection to the server uses SSL, and FALSE
otherwise.
</para>
</listitem>
<listitem>
<programlisting>
ssl_client_cert_present() RETURNS boolean
</programlisting>
<para>
Returns TRUE if the current client has presented a valid SSL client
certificate to the server, and FALSE otherwise (e.g., no SSL is in use,
or the certificate was not requested by the server).
</para>
</listitem>
<listitem>
<programlisting>
ssl_client_serial() RETURNS numeric
</programlisting>
<para>
Returns the serial number of the current client certificate. The
combination of certificate serial number and certificate issuer is
guaranteed to uniquely identify a certificate (but not its owner -- the
owner ought to change his keys regularly, and get new certificates from
the issuer).
</para>
<para>
So, if you run your own CA and allow only certificates from this CA to
be accepted by the server, the serial number is the most reliable (albeit
not very mnemonic) means to identify a user.
</para>
</listitem>
<listitem>
<programlisting>
ssl_client_dn() RETURNS text
</programlisting>
<para>
Returns the full subject of the current client certificate, converting
character data into the current database encoding. It is assumed that
if you use non-Latin characters in the certificate names, your
database is able to represent these characters, too. If your database
uses the SQL_ASCII encoding, non-Latin characters in the name will be
represented as UTF-8 sequences.
</para>
<para>
The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
</para>
</listitem>
<listitem>
<programlisting>
ssl_issuer_dn()
</programlisting>
<para>
Returns the full issuer name of the client certificate, converting
character data into the current database encoding.
</para>
<para>
The combination of the return value of this function with the
certificate serial number uniquely identifies the certificate.
</para>
<para>
The result of this function is really useful only if you have more
than one trusted CA certificate in your server's root.crt file, or if
this CA has issued some intermediate certificate authority
certificates.
</para>
</listitem>
<listitem>
<programlisting>
ssl_client_dn_field(fieldName text) RETURNS text
</programlisting>
<para>
This function returns the value of the specified field in the
certificate subject. Field names are string constants that are
converted into ASN.1 object identifiers using the OpenSSL object
database. The following values are acceptable:
</para>
<programlisting>
commonName (alias CN)
surname (alias SN)
name
givenName (alias GN)
countryName (alias C)
localityName (alias L)
stateOrProvinceName (alias ST)
organizationName (alias O)
organizationUnitName (alias OU)
title
description
initials
postalCode
streetAddress
generationQualifier
description
dnQualifier
x500UniqueIdentifier
pseudonym
role
emailAddress
</programlisting>
<para>
All of these fields are optional, except commonName. It depends
entirely on your CA's policy which of them will be included and which
won't. The meaning of these fields, however, is strictly defined by
the X.500 and X.509 standards, so you cannot just assign arbitrary
meaning to them.
</para>
</listitem>
<listitem>
<programlisting>
ssl_issuer_field(fieldName text) RETURNS text;
</programlisting>
<para>
Does the same as ssl_client_dn_field, but for the certificate issuer
rather than the certificate subject. A combined usage example of these
functions appears after this list.
</para>
</listitem>
</itemizedlist>
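<para>
As a hedged illustration combining several of the functions above (the
results depend entirely on the certificate the client presented, if any):
</para>
<programlisting>
SELECT ssl_is_used(),
       ssl_client_serial(),
       ssl_client_dn(),
       ssl_client_dn_field('commonName');
</programlisting>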
</sect2>
<sect2>
<title>Author</title>
<para>
Victor Wagner <email>vitus@cryptocom.ru</email>, Cryptocom LTD
E-Mail of Cryptocom OpenSSL development group:
<email>openssl@cryptocom.ru</email>
</para>
</sect2>
</sect1>

249
doc/src/sgml/standby.sgml Normal file
View File

@ -0,0 +1,249 @@
<sect1 id="pgstandby">
<title>pg_standby</title>
<indexterm zone="pgstandby">
<primary>pgstandby</primary>
</indexterm>
<para>
<literal>pg_standby</literal> is a production-ready program that can be used
to create a Warm Standby server. Other configuration is required as well,
all of which is described in the main server manual.
</para>
<para>
The program is designed to be a waiting <literal>restore_command</literal>,
which is needed to turn a normal archive recovery into a Warm Standby. As the
<literal>restore_command</literal> in <literal>recovery.conf</literal>
you could configure <literal>pg_standby</literal> in the following way:
</para>
<programlisting>
restore_command = 'pg_standby archiveDir %f %p'
</programlisting>
<para>
which is sufficient to specify that WAL files will be restored from
archiveDir.
</para>
<para>
<literal>pg_standby</literal> features include:
</para>
<itemizedlist>
<listitem>
<para>
It is written in C, so it is very portable
and easy to install.
</para>
</listitem>
<listitem>
<para>
Supports copy or link from a directory (only)
</para>
</listitem>
<listitem>
<para>
Source easy to modify, with specifically designated
sections to modify for your own needs, allowing
interfaces to be written for additional Backup Archive Restore
(BAR) systems
</para>
</listitem>
<listitem>
<para>
Already tested on Linux and Windows
</para>
</listitem>
</itemizedlist>
<sect2>
<title>Usage</title>
<para>
<literal>pg_standby</literal> should be used within the
<literal>restore_command</literal> of the <literal>recovery.conf</literal>
file.
</para>
<para>
The basic usage should be like this:
</para>
<programlisting>
restore_command = 'pg_standby archiveDir %f %p'
</programlisting>
<para>
with the pg_standby command usage as
</para>
<programlisting>
pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH]
</programlisting>
<para>
When used within the <literal>restore_command</literal> the %f and %p macros
will provide the actual file and path required for the restore/recovery.
</para>
<table>
<title>Options</title>
<tgroup cols="2">
<tbody>
<row>
<entry>-c</entry>
<entry> use copy/cp command to restore WAL files from archive</entry>
</row>
<row>
<entry>-d</entry>
<entry>debug/logging option.</entry>
</row>
<row>
<entry>-k numfiles</entry>
<entry>
<para>
Cleanup files in the archive so that we maintain no more
than this many files in the archive.
</para>
<para>
You should be wary of setting this number too low,
since this may mean you cannot restart the standby. This
is because the last restartpoint marked in the WAL files
may be many files in the past and can vary considerably.
This should be set to a value exceeding the number of WAL
files that can be recovered in 2*checkpoint_timeout seconds,
according to the value in the warm standby postgresql.conf.
It is wholly unrelated to the setting of checkpoint_segments
on either primary or standby.
</para>
<para>
If in doubt, use a large value or do not set a value at all.
</para>
</entry>
</row>
<row>
<entry>-l</entry>
<entry>
<para>
use the ln command to restore WAL files from the archive;
the WAL files will remain in the archive.
</para>
<para>
Link is more efficient, but the default is copy to
allow you to maintain the WAL archive for recovery
purposes as well as high-availability.
</para>
<para>
This option uses the Windows Vista command mklink
to provide a file-to-file symbolic link. -l will
not work on versions of Windows prior to Vista.
Use the -c option instead.
see <ulink url="http://en.wikipedia.org/wiki/NTFS_symbolic_link"></ulink>
</para>
</entry>
</row>
<row>
<entry>-r maxretries</entry>
<entry>
<para>
the maximum number of times to retry the restore command if it
fails. After each failure, we wait for sleeptime * num_retries
so that the wait time increases progressively; by default
we will wait 5 secs, 10 secs, then 15 secs before reporting
the failure back to the database server. This will be
interpreted as an end of recovery and the Standby will come
up fully as a result. <literal>Default=3</literal>
</para>
</entry>
</row>
<row>
<entry>-s sleeptime</entry>
<entry>
the number of seconds to sleep between testing to see
if the file to be restored is available in the archive yet.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
<literal>Default=5</literal>
</entry>
</row>
<row>
<entry>-t triggerfile</entry>
<entry>
the presence of the triggerfile will cause recovery to end
whether or not the next file is available.
It is recommended that you use a structured filename to
avoid confusion as to which server is being triggered
when multiple servers exist on the same system,
e.g. /tmp/pgsql.trigger.5432. (A sketch of creating the
trigger file appears after this table.)
</entry>
</row>
<row>
<entry>-w maxwaittime</entry>
<entry>
the maximum number of seconds to wait for the next file,
after which recovery will end and the Standby will come up.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
<literal>Default=0</literal>
</entry>
</row>
</tbody>
</tgroup>
</table>
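<para>
As a sketch (the path is whatever was given to -t; /tmp/pgsql.trigger.5432 is
an assumed example), failover is triggered simply by creating the trigger
file from a shell:
</para>
<programlisting>
touch /tmp/pgsql.trigger.5432
</programlisting>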
<note>
<para>
<literal>--help</literal> is not supported since
<literal>pg_standby</literal> is not intended for interactive use, except
during development and testing.
</para>
</note>
</sect2>
<sect2>
<title>Examples</title>
<itemizedlist>
<listitem>
<para>Example on Linux</para>
<programlisting>
archive_command = 'cp %p ../archive/%f'
restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
</programlisting>
<para>
which will
</para>
<itemizedlist>
<listitem><para>use the ln command to restore WAL files from the archive</para></listitem>
<listitem><para>produce logfile output in standby.log</para></listitem>
<listitem><para>keep the last 255 full WAL files, plus the current one</para></listitem>
<listitem><para>sleep for 2 seconds between checks for the next WAL file to become available</para></listitem>
<listitem><para>never timeout if file not found</para></listitem>
<listitem><para>stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears</para></listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
Example on Windows
</para>
<programlisting>
archive_command = 'copy %p ..\\archive\\%f'
</programlisting>
<para>
Note that backslashes need to be doubled in the archive_command, but
*not* in the restore_command, on Windows in releases 8.2, 8.1, and 8.0.
</para>
<programlisting>
restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442
..\archive %f %p 2>> standby.log'
</programlisting>
<para>
which will
</para>
<itemizedlist>
<listitem><para>use a copy command to restore WAL files from archive</para></listitem>
<listitem><para>produce logfile output in standby.log</para></listitem>
<listitem><para>sleep for 5 seconds between checks for the next WAL file to become available</para></listitem>
<listitem><para>never timeout if file not found</para></listitem>
<listitem><para>stop waiting when a trigger file called C:\pgsql.trigger.5442 appears</para></listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</sect2>
</sect1>

765
doc/src/sgml/tablefunc.sgml Normal file
View File

@ -0,0 +1,765 @@
<sect1 id="tablefunc">
<title>tablefunc</title>
<indexterm zone="tablefunc">
<primary>tablefunc</primary>
</indexterm>
<para>
The <literal>tablefunc</literal> module provides various functions that return
tables (that is, multiple rows), including crosstab functions that pivot
row data into columns.
</para>
<sect2>
<title>Functions</title>
<table>
<title></title>
<tgroup cols="3">
<thead>
<row>
<entry>Function</entry>
<entry>Returns</entry>
<entry>Comments</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<literal>
normal_rand(int numvals, float8 mean, float8 stddev)
</literal>
</entry>
<entry>
returns a set of normally distributed float8 values
</entry>
<entry></entry>
</row>
<row>
<entry><literal>crosstabN(text sql)</literal></entry>
<entry>returns a set of row_name plus N category value columns</entry>
<entry>
crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per the instructions
in the documentation below.
</entry>
</row>
<row>
<entry><literal>crosstab(text sql)</literal></entry>
<entry>returns a set of row_name plus N category value columns</entry>
<entry>
requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
</entry>
</row>
<row>
<entry><literal>crosstab(text sql, N int)</literal></entry>
<entry></entry>
<entry>
<para>obsolete version of crosstab()</para>
<para>
the argument N is now ignored, since the number of value columns
is always determined by the calling query
</para>
</entry>
</row>
<row>
<entry>
<literal>
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld], text start_with, int max_depth
[, text branch_delim])
</literal>
</entry>
<entry>
returns keyid, parent_keyid, level, and an optional branch string
and an optional serial column for ordering siblings
</entry>
<entry>
requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
</entry>
</row>
</tbody>
</tgroup>
</table>
<sect3>
<title><literal>normal_rand</literal></title>
<programlisting>
normal_rand(int numvals, float8 mean, float8 stddev) RETURNS SETOF float8
</programlisting>
<para>
Where <literal>numvals</literal> is the number of values to be returned
from the function. <literal>mean</literal> is the mean of the normal
distribution of values and <literal>stddev</literal> is the standard
deviation of the normal distribution of values.
</para>
<para>
Returns a float8 set of random values normally distributed (Gaussian
distribution).
</para>
<para>
Example:
</para>
<programlisting>
test=# SELECT * FROM
test-# normal_rand(1000, 5, 3);
normal_rand
----------------------
1.56556322244898
9.10040991424657
5.36957140345079
-0.369151492880995
0.283600703686639
.
.
.
4.82992125404908
9.71308014517282
2.49639286969028
(1000 rows)
</programlisting>
<para>
Returns 1000 values with a mean of 5 and a standard deviation of 3.
</para>
</sect3>
<sect3>
<title><literal>crosstabN(text sql)</literal></title>
<programlisting>
crosstabN(text sql)
</programlisting>
<para>
The <literal>sql</literal> parameter is a SQL statement which produces the
source set of data. The SQL statement must return one row_name column, one
category column, and one value column. <literal>row_name</literal> and
value must be of type text. The function returns a set of
<literal>row_name</literal> plus N category value columns.
</para>
<para>
The provided <literal>sql</literal> must produce a set something like:
</para>
<programlisting>
row_name cat value
---------+-------+-------
row1 cat1 val1
row1 cat2 val2
row1 cat3 val3
row1 cat4 val4
row2 cat1 val5
row2 cat2 val6
row2 cat3 val7
row2 cat4 val8
</programlisting>
<para>
The returned value is a <literal>SETOF tablefunc_crosstab_N</literal>, which
is defined by:
</para>
<programlisting>
CREATE TYPE tablefunc_crosstab_N AS (
row_name TEXT,
category_1 TEXT,
category_2 TEXT,
.
.
.
category_N TEXT
);
</programlisting>
<para>
for the default installed functions, where N is 2, 3, or 4.
</para>
<para>
e.g. the provided crosstab2 function produces a set something like:
</para>
<programlisting>
<== values columns ==>
row_name category_1 category_2
---------+------------+------------
row1 val1 val2
row2 val5 val6
</programlisting>
<note>
<orderedlist>
<listitem><para>The sql result must be ordered by 1,2.</para></listitem>
<listitem>
<para>
The number of values columns depends on the tuple description
of the function's declared return type.
</para>
</listitem>
<listitem>
<para>
Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
</para>
</listitem>
<listitem>
<para>
Extra values (i.e. too many adjacent rows of same row_name to fill
the number of result values columns) are skipped.
</para>
</listitem>
<listitem>
<para>
Rows with all nulls in the values columns are skipped.
</para>
</listitem>
<listitem>
<para>
The installed defaults are for illustration purposes. You
can create your own return types and functions based on the
crosstab() function of the installed library. See below for
details.
</para>
</listitem>
</orderedlist>
</note>
<para>
Example:
</para>
<programlisting>
create table ct(id serial, rowclass text, rowid text, attribute text, value text);
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');
select * from crosstab3(
'select rowid, attribute, value
from ct
where rowclass = ''group1''
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;');
row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
test1 | val2 | val3 |
test2 | val6 | val7 |
(2 rows)
</programlisting>
</sect3>
<sect3>
<title><literal>crosstab(text)</literal></title>
<programlisting>
crosstab(text sql)
crosstab(text sql, int N)
</programlisting>
<para>
The <literal>sql</literal> parameter is a SQL statement which produces the
source set of data. The SQL statement must return one
<literal>row_name</literal> column, one <literal>category</literal> column,
and one <literal>value</literal> column. <literal>N</literal> is an
obsolete argument; ignored if supplied (formerly this had to match the
number of category columns determined by the calling query).
</para>
<para>
e.g. the provided sql must produce a set something like:
</para>
<programlisting>
row_name cat value
----------+-------+-------
row1 cat1 val1
row1 cat2 val2
row1 cat3 val3
row1 cat4 val4
row2 cat1 val5
row2 cat2 val6
row2 cat3 val7
row2 cat4 val8
</programlisting>
<para>
Returns a <literal>SETOF RECORD</literal>, which must be defined with a
column definition in the FROM clause of the SELECT statement, e.g.:
</para>
<programlisting>
SELECT *
FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text);
</programlisting>
<para>
the example crosstab function produces a set something like:
</para>
<programlisting>
<== values columns ==>
row_name category_1 category_2
---------+------------+------------
row1 val1 val2
row2 val5 val6
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para>The sql result must be ordered by 1,2.</para></listitem>
<listitem>
<para>
The number of values columns is determined by the column definition
provided in the FROM clause. The FROM clause must define one
row_name column (of the same datatype as the first result column
of the sql query) followed by N category columns (of the same
datatype as the third result column of the sql query). You can
set up as many category columns as you wish.
</para>
</listitem>
<listitem>
<para>
Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
</para>
</listitem>
<listitem>
<para>
Extra values (i.e. too many adjacent rows of same row_name to fill
the number of result values columns) are skipped.
</para>
</listitem>
<listitem>
<para>
Rows with all nulls in the values columns are skipped.
</para>
</listitem>
<listitem>
<para>
You can avoid always having to write out a FROM clause that defines the
output columns by setting up a custom crosstab function that has
the desired output row type wired into its definition.
</para>
</listitem>
</orderedlist>
<para>
There are two ways you can set up a custom crosstab function:
</para>
<itemizedlist>
<listitem>
<para>
Create a composite type to define your return type, similar to the
examples in the installation script. Then define a unique function
name accepting one text parameter and returning setof your_type_name.
For example, if your source data produces row_names that are TEXT,
and values that are FLOAT8, and you want 5 category columns:
</para>
<programlisting>
CREATE TYPE my_crosstab_float8_5_cols AS (
row_name TEXT,
category_1 FLOAT8,
category_2 FLOAT8,
category_3 FLOAT8,
category_4 FLOAT8,
category_5 FLOAT8
);
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
RETURNS setof my_crosstab_float8_5_cols
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
</programlisting>
</listitem>
<listitem>
<para>
Use OUT parameters to define the return type implicitly.
The same example could also be done this way:
</para>
<programlisting>
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text,
OUT row_name TEXT,
OUT category_1 FLOAT8,
OUT category_2 FLOAT8,
OUT category_3 FLOAT8,
OUT category_4 FLOAT8,
OUT category_5 FLOAT8)
RETURNS setof record
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
</programlisting>
</listitem>
</itemizedlist>
<para>
Example:
</para>
<programlisting>
CREATE TABLE ct(id SERIAL, rowclass TEXT, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att1','val1');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att2','val2');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att3','val3');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att4','val4');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att1','val5');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att2','val6');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att3','val7');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att4','val8');
SELECT *
FROM crosstab(
'select rowid, attribute, value
from ct
where rowclass = ''group1''
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3)
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
test1 | val2 | val3 |
test2 | val6 | val7 |
(2 rows)
</programlisting>
</sect3>
<sect3>
<title><literal>crosstab(text, text)</literal></title>
<programlisting>
crosstab(text source_sql, text category_sql)
</programlisting>
<para>
Where <literal>source_sql</literal> is a SQL statement which produces the
source set of data. The SQL statement must return one
<literal>row_name</literal> column, one <literal>category</literal> column,
and one <literal>value</literal> column. It may also have one or more
<emphasis>extra</emphasis> columns.
</para>
<para>
The <literal>row_name</literal> column must be first. The
<literal>category</literal> and <literal>value</literal> columns must be
the last two columns, in that order. <emphasis>extra</emphasis> columns must
be columns 2 through (N - 2), where N is the total number of columns.
</para>
<para>
The <emphasis>extra</emphasis> columns are assumed to be the same for all
rows with the same <literal>row_name</literal>. The values returned are
copied from the first row with a given <literal>row_name</literal> and
subsequent values of these columns are ignored until
<literal>row_name</literal> changes.
</para>
<para>
e.g. <literal>source_sql</literal> must produce a set something like:
</para>
<programlisting>
SELECT row_name, extra_col, cat, value FROM foo;
row_name extra_col cat value
----------+------------+-----+---------
row1 extra1 cat1 val1
row1 extra1 cat2 val2
row1 extra1 cat4 val4
row2 extra2 cat1 val5
row2 extra2 cat2 val6
row2 extra2 cat3 val7
row2 extra2 cat4 val8
</programlisting>
<para>
<literal>category_sql</literal> has to be a SQL statement which produces
the distinct set of categories. The SQL statement must return one category
column only. <literal>category_sql</literal> must produce at least one
result row or an error will be generated. <literal>category_sql</literal>
must not produce duplicate categories or an error will be generated. e.g.:
</para>
<programlisting>
SELECT DISTINCT cat FROM foo;
cat
-------
cat1
cat2
cat3
cat4
</programlisting>
<para>
The function returns <literal>SETOF RECORD</literal>, which must be defined
with a column definition in the FROM clause of the SELECT statement, e.g.:
</para>
<programlisting>
SELECT * FROM crosstab(source_sql, cat_sql)
AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
</programlisting>
<para>
the example crosstab function produces a set something like:
</para>
<programlisting>
                   <== values columns ==>
 row_name   extra    cat1   cat2   cat3   cat4
----------+--------+------+------+------+------
 row1       extra1   val1   val2          val4
 row2       extra2   val5   val6   val7   val8
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para>source_sql must be ordered by row_name (column 1).</para></listitem>
<listitem>
<para>
The number of values columns is determined at run-time. The
column definition provided in the FROM clause must provide for
the correct number of columns of the proper data types.
</para>
</listitem>
<listitem>
<para>
Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
</para>
</listitem>
<listitem>
<para>
Extra values (i.e. source rows with category not found in category_sql
result) are skipped.
</para>
</listitem>
<listitem>
<para>
Rows with a null row_name column are skipped.
</para>
</listitem>
<listitem>
<para>
You can create predefined functions to avoid having to write out
the result column names/types in each query. See the examples
for crosstab(text).
</para>
</listitem>
</orderedlist>
<programlisting>
CREATE TABLE cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','volts','3.1234');
SELECT * FROM crosstab
(
'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
rowid text,
rowdt timestamp,
temperature int4,
test_result text,
test_startdate timestamp,
volts float8
);
rowid | rowdt | temperature | test_result | test_startdate | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)
</programlisting>
</sect3>
<sect3>
<title>
<literal>connectby(text, text, text[, text], text, text, int[, text])</literal>
</title>
<programlisting>
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld], text start_with, int max_depth
[, text branch_delim])
</programlisting>
<table>
<title><literal>connectby</literal> parameters</title>
<tgroup cols="2">
<thead>
<row>
<entry>Parameter</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>relname</literal></entry>
<entry>Name of the source relation</entry>
</row>
<row>
<entry><literal>keyid_fld</literal></entry>
<entry>Name of the key field</entry>
</row>
<row>
<entry><literal>parent_keyid_fld</literal></entry>
<entry>Name of the key_parent field</entry>
</row>
<row>
<entry><literal>orderby_fld</literal></entry>
<entry>
If optional ordering of siblings is desired: Name of the field to
order siblings
</entry>
</row>
<row>
<entry><literal>start_with</literal></entry>
<entry>
Root value of the tree input as a text value regardless of
<literal>keyid_fld</literal>
</entry>
</row>
<row>
<entry><literal>max_depth</literal></entry>
<entry>
Zero (0) for unlimited depth, otherwise restrict level to this depth
</entry>
</row>
<row>
<entry><literal>branch_delim</literal></entry>
<entry>
If optional branch value is desired, this string is used as the delimiter.
When not provided, a default value of '~' is used for internal
recursion detection only, and no "branch" field is returned.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The function returns <literal>SETOF RECORD</literal>, which must be defined
with a column definition in the FROM clause of the SELECT statement, e.g.:
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int);
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para>keyid and parent_keyid must be the same data type</para></listitem>
<listitem>
<para>
The column definition *must* include a third column of type INT4 for
the level value output
</para>
</listitem>
<listitem>
<para>
If the branch field is not desired, omit both the branch_delim input
parameter *and* the branch field in the query column definition. Note
that when branch_delim is not provided, a default value of '~' is used
for branch_delim for internal recursion detection, even though the branch
field is not returned.
</para>
</listitem>
<listitem>
<para>
If the branch field is desired, it must be the fourth column in the query
column definition, and it must be type TEXT.
</para>
</listitem>
<listitem>
<para>
The parameters representing table and field names must include double
quotes if the names are mixed-case or contain special characters.
</para>
</listitem>
<listitem>
<para>
If sorting of siblings is desired, the orderby_fld input parameter *and*
a name for the resulting serial field (type INT4) in the query column
definition must be given.
</para>
</listitem>
</orderedlist>
<para>
Example:
</para>
<programlisting>
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);
-- with branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+---------------------
row2 | | 0 | row2
row4 | row2 | 1 | row2~row4
row6 | row4 | 2 | row2~row4~row6
row8 | row6 | 3 | row2~row4~row6~row8
row5 | row2 | 1 | row2~row5
row9 | row5 | 2 | row2~row5~row9
(6 rows)
-- without branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
keyid | parent_keyid | level
-------+--------------+-------
row2 | | 0
row4 | row2 | 1
row6 | row4 | 2
row8 | row6 | 3
row5 | row2 | 1
row9 | row5 | 2
(6 rows)
-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos;
keyid | parent_keyid | level | branch | pos
-------+--------------+-------+---------------------+-----
row2 | | 0 | row2 | 1
row5 | row2 | 1 | row2~row5 | 2
row9 | row5 | 2 | row2~row5~row9 | 3
row4 | row2 | 1 | row2~row4 | 4
row6 | row4 | 2 | row2~row4~row6 | 5
row8 | row6 | 3 | row2~row4~row6~row8 | 6
(6 rows)
-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos;
keyid | parent_keyid | level | pos
-------+--------------+-------+-----
row2 | | 0 | 1
row5 | row2 | 1 | 2
row9 | row5 | 2 | 3
row4 | row2 | 1 | 4
row6 | row4 | 2 | 5
row8 | row6 | 3 | 6
(6 rows)
</programlisting>
</sect3>
</sect2>
<sect2>
<title>Author</title>
<para>
Joe Conway
</para>
</sect2>
</sect1>

214
doc/src/sgml/trgm.sgml Normal file
View File

@ -0,0 +1,214 @@
<sect1 id="pgtrgm">
<title>pg_trgm</title>
<indexterm zone="pgtrgm">
<primary>pgtrgm</primary>
</indexterm>
<para>
The <literal>pg_trgm</literal> module provides functions and index classes
for determining the similarity of text based on trigram matching.
</para>
<sect2>
<title>Trigram (or Trigraph)</title>
<para>
A trigram is a set of three consecutive characters taken
from a string. A string is considered to have two spaces
prefixed and one space suffixed when determining the set
of trigrams that comprise the string.
</para>
<para>
For example, the set of trigrams in the word "cat" is "  c", " ca",
"cat", and "at ".
</para>
</sect2>
<sect2>
<title>Public Functions</title>
<table>
<title><literal>pg_trgm</literal> functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>real similarity(text, text)</literal></entry>
<entry>
<para>
Returns a number that indicates how closely the two arguments
match. A zero result indicates that the two words
are completely dissimilar, and a result of one indicates that
the two words are identical.
</para>
</entry>
</row>
<row>
<entry><literal>real show_limit()</literal></entry>
<entry>
<para>
Returns the current similarity threshold used by the '%'
operator. This is the minimum similarity between
two words for them to be considered similar enough to
be, for example, misspellings of each other.
</para>
</entry>
</row>
<row>
<entry><literal>real set_limit(real)</literal></entry>
<entry>
<para>
Sets the current similarity threshold that is used by the '%'
operator, and is returned by the show_limit() function.
</para>
</entry>
</row>
<row>
<entry><literal>text[] show_trgm(text)</literal></entry>
<entry>
<para>
Returns an array of all the trigrams of the supplied text
parameter.
</para>
</entry>
</row>
<row>
<entry>Operator: <literal>text % text (returns boolean)</literal></entry>
<entry>
<para>
The '%' operator returns TRUE if its two arguments have a similarity
that is greater than the similarity threshold set by set_limit(). It
will return FALSE if the similarity is less than the current
threshold.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
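<para>
As a hedged illustration of these functions and the '%' operator (the
threshold value 0.4 is an arbitrary assumption):
</para>
<programlisting>
SELECT similarity('word', 'two words');
SELECT show_trgm('word');
SELECT set_limit(0.4);
SELECT 'word' % 'words' AS similar_enough;
</programlisting>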
</sect2>
<sect2>
<title>Public Index Operator Class</title>
<para>
The <literal>pg_trgm</literal> module comes with the
<literal>gist_trgm_ops</literal> index operator class that allows a
developer to create an index over a text column for the purpose
of very fast similarity searches.
</para>
<para>
To use this index, the '%' operator must be used and an appropriate
similarity threshold for the application must be set. Example:
</para>
<programlisting>
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
</programlisting>
<para>
At this point, you will have an index on the t text column that you
can use for similarity searching. Example:
</para>
<programlisting>
SELECT
t,
similarity(t, 'word') AS sml
FROM
test_trgm
WHERE
t % 'word'
ORDER BY
sml DESC, t;
</programlisting>
<para>
This will return all values in the text column that are sufficiently
similar to 'word', sorted from best match to worst. The index will
be used to make this a fast operation over very large data sets.
</para>
</sect2>
<sect2>
<title>Tsearch2 Integration</title>
<para>
Trigram matching is a very useful tool when used in conjunction
with a text index created by the Tsearch2 contrib module. (See
contrib/tsearch2)
</para>
<para>
The first step is to generate an auxiliary table containing all
the unique words in the Tsearch2 index:
</para>
<programlisting>
CREATE TABLE words AS SELECT word FROM
stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
</programlisting>
<para>
Where 'documents' is a table that has a text field 'bodytext'
that we wish to search with TSearch2. The use of the 'simple' dictionary
with the to_tsvector function, instead of just using the already
existing vector is to avoid creating a list of already stemmed
words. This way, only the original, unstemmed words are added
to the word list.
</para>
<para>
Next, create a trigram index on the word column:
</para>
<programlisting>
CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
</programlisting>
<para>
or
</para>
<programlisting>
CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
</programlisting>
<para>
Now, a <literal>SELECT</literal> query similar to the example above can be
used to suggest spellings for misspelled words in user search terms. A
useful extra clause is to ensure that the similar words are also
of similar length to the misspelled word.
</para>
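<para>
As a hedged sketch (the misspelled search term and the length tolerance of 2
are arbitrary assumptions), such a query against the 'words' table could look
like this:
</para>
<programlisting>
SELECT word, similarity(word, 'misspeling') AS sml
FROM words
WHERE word % 'misspeling'
  AND length(word) BETWEEN length('misspeling') - 2
                       AND length('misspeling') + 2
ORDER BY sml DESC, word;
</programlisting>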
<para>
<note>
<para>
Since the 'words' table has been generated as a separate,
static table, it will need to be periodically regenerated so that
it remains up to date with the word list in the Tsearch2 index.
</para>
</note>
</para>
</sect2>
<sect2>
<title>References</title>
<para>
Tsearch2 Development Site
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/"></ulink>
</para>
<para>
GiST Development Site
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink>
</para>
</sect2>
<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
</para>
<para>
Documentation: Christopher Kings-Lynne
</para>
<para>
This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
</para>
</sect2>
</sect1>

163
doc/src/sgml/uuid-ossp.sgml Normal file
View File

@ -0,0 +1,163 @@
<sect1 id="uuid-ossp">
<title>uuid-ossp</title>
<indexterm zone="uuid-ossp">
<primary>uuid-ossp</primary>
</indexterm>
<para>
This module provides functions to generate universally unique
identifiers (UUIDs) using one of several standard algorithms, as
well as functions to produce certain special UUID constants.
</para>
<sect2>
<title>UUID Generation</title>
<para>
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
Each of these algorithms could be suitable for a different set of
applications.
</para>
<table>
<title><literal>uuid-ossp</literal> functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>uuid_generate_v1()</literal></entry>
<entry>
<para>
This function generates a version 1 UUID. This involves the MAC
address of the computer and a time stamp. Note that UUIDs of this
kind reveal the identity of the computer that created the identifier
and the time at which it did so, which might make it unsuitable for
certain security-sensitive applications.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v1mc()</literal></entry>
<entry>
<para>
This function generates a version 1 UUID but uses a random multicast
MAC address instead of the real MAC address of the computer.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v3(namespace uuid, name text)</literal></entry>
<entry>
<para>
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the uuid_ns_*() functions shown below. (It
could be any UUID in theory.) The name is an identifier in the
selected namespace. For example:
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')</literal></entry>
<entry>
<para>
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
</para>
<para>
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v4()</literal></entry>
<entry>
<para>
This function generates a version 4 UUID, which is derived entirely
from random numbers.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v5(namespace uuid, name text)</literal></entry>
<entry>
<para>
This function generates a version 5 UUID, which works like a version 3
UUID except that SHA-1 is used as a hashing method. Version 5 should
be preferred over version 3 because SHA-1 is thought to be more secure
than MD5.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>UUID Constants</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>uuid_nil()</literal></entry>
<entry>
<para>
A "nil" UUID constant, which does not occur as a real UUID.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_dns()</literal></entry>
<entry>
<para>
Constant designating the DNS namespace for UUIDs.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_url()</literal></entry>
<entry>
<para>
Constant designating the URL namespace for UUIDs.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_oid()</literal></entry>
<entry>
<para>
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in
PostgreSQL.)
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_x500()</literal></entry>
<entry>
<para>
Constant designating the X.500 distinguished name (DN) namespace for
UUIDs.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
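<para>
As a brief usage sketch (the generated values will of course differ from
call to call, and the example URL is arbitrary):
</para>
<programlisting>
SELECT uuid_generate_v1();
SELECT uuid_generate_v4();
SELECT uuid_generate_v5(uuid_ns_url(), 'http://www.postgresql.org');
</programlisting>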
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Eisentraut <email>peter_e@gmx.net</email>
</para>
</sect2>
</sect1>

View File

@ -0,0 +1,74 @@
<sect1 id="vacuumlo">
<title>vacuumlo</title>
<indexterm zone="vacuumlo">
<primary>vacuumlo</primary>
</indexterm>
<para>
This is a simple utility that removes any orphaned large objects from a
PostgreSQL database. An orphaned LO is considered to be any LO whose OID
does not appear in any OID data column of the database.
</para>
<para>
If you use this, you may also be interested in the lo_manage trigger in
contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs
in the first place.
</para>
<para>
<note>
<para>
It was decided to place this in contrib as it needs further testing, but
hopefully this (or a variant of it) will make it into the backend as a
"vacuum lo" command in a later release.
</note>
</para>
<sect2>
<title>Usage</title>
<programlisting>
vacuumlo [options] database [database2 ... databasen]
</programlisting>
<para>
All databases named on the command line are processed. Available options
include:
</para>
<programlisting>
-v Write a lot of progress messages
-n Don't remove large objects, just show what would be done
-U username Username to connect as
-W Prompt for password
-h hostname Database server host
-p port Database server port
</programlisting>
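<para>
For example, to see which large objects would be removed from a database,
without actually deleting anything (the database name here is just an
illustration):
</para>
<programlisting>
vacuumlo -v -n testdb
</programlisting>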
</sect2>
<sect2>
<title>Method</title>
<para>
First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.
</para>
<para>
It then scans through all columns in the database that are of type "oid"
or "lo", and removes matching entries from the temporary table.
</para>
<para>
The remaining entries in the temp table identify orphaned LOs. These are
removed.
</para>
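<para>
In SQL terms, the method is roughly equivalent to the following sketch.
The temporary table and the referencing table/column names are
illustrative only, not the identifiers the utility actually uses.
</para>
<programlisting>
-- collect the OIDs of all large objects in the database
CREATE TEMP TABLE lo_candidates AS
    SELECT DISTINCT loid AS lo FROM pg_largeobject;

-- for every column of type "oid" or "lo", discard OIDs that are still
-- referenced ("some_table"/"some_lo_column" are placeholders; this step
-- is repeated once per such column)
DELETE FROM lo_candidates
    WHERE lo IN (SELECT some_lo_column FROM some_table);

-- whatever remains is orphaned, so unlink it
SELECT lo_unlink(lo) FROM lo_candidates;
</programlisting>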
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email>
</para>
<para>
<ulink url="http://www.retep.org.uk"></ulink>
</para>
</sect2>
</sect1>

436
doc/src/sgml/xml2.sgml Normal file
View File

@ -0,0 +1,436 @@
<sect1 id="xml2">
<title>xml2: XML-handling functions</title>
<indexterm zone="xml2">
<primary>xml2</primary>
</indexterm>
<sect2>
<title>Deprecation notice</title>
<para>
From PostgreSQL 8.3 on, there is XML-related
functionality based on the SQL/XML standard in the core server.
That functionality covers XML syntax checking and XPath queries,
which is what this module does as well, and more, but the API is
not at all compatible. It is planned that this module will be
removed in PostgreSQL 8.4 in favor of the newer standard API, so
you are encouraged to try converting your applications. If you
find that some of the functionality of this module is not
available in an adequate form with the newer API, please explain
your issue to pgsql-hackers@postgresql.org so that the deficiency
can be addressed.
</para>
</sect2>
<sect2>
<title>Description of functions</title>
<para>
The first set of functions provides straightforward XML parsing and XPath queries:
</para>
<table>
<title>Functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry>
<programlisting>
xml_is_well_formed(document) RETURNS bool
</programlisting>
</entry>
<entry>
<para>
This parses the document text in its parameter and returns true if the
document is well-formed XML. (Note: before PostgreSQL 8.2, this function
was called xml_valid(). That is the wrong name since validity and
well-formedness have different meanings in XML. The old name is still
available, but is deprecated and will be removed in 8.3.)
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_string(document,query) RETURNS text
xpath_number(document,query) RETURNS float4
xpath_bool(document,query) RETURNS bool
</programlisting>
</entry>
<entry>
<para>
These functions evaluate the XPath query on the supplied document, and
cast the result to the specified type.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_nodeset(document,query,toptag,itemtag) RETURNS text
</programlisting>
</entry>
<entry>
<para>
This evaluates query on document and wraps the result in XML tags. If
the result is multivalued, the output will look like:
</para>
<literal>
&lt;toptag>
&lt;itemtag>Value 1 which could be an XML fragment&lt;/itemtag>
&lt;itemtag>Value 2....&lt;/itemtag>
&lt;/toptag>
</literal>
<para>
If either toptag or itemtag is an empty string, the relevant tag is omitted.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_nodeset(document,query) RETURNS text
</programlisting>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but omits both tags.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_nodeset(document,query,itemtag) RETURNS text
</programlisting>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but omits toptag.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_list(document,query,separator) RETURNS text
</programlisting>
</entry>
<entry>
<para>
This function returns multiple values separated by the specified
separator, e.g. Value 1,Value 2,Value 3 if separator=','.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_list(document,query) RETURNS text
</programlisting>
</entry>
<entry>
This is a wrapper for the above function that uses ',' as the separator.
</entry>
</row>
</tbody>
</tgroup>
</table>
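<para>
As a brief sketch of how the parsing and XPath functions might be used
(the table name <literal>articles_raw</literal> and the column
<literal>xml_data</literal> are hypothetical):
</para>
<programlisting>
-- skip malformed documents and extract a single value from each of the rest
SELECT xpath_string(xml_data, '/article/title/text()') AS title
  FROM articles_raw
 WHERE xml_is_well_formed(xml_data);
</programlisting>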
</sect2>
<sect2>
<title><literal>xpath_table</literal></title>
<para>
This is a table function which evaluates a set of XPath queries on
each of a set of documents and returns the results as a table. The
primary key field from the original document table is returned as the
first column of the result so that the resultset from xpath_table can
be readily used in joins.
</para>
<para>
The function itself takes 5 arguments, all text.
</para>
<programlisting>
xpath_table(key,document,relation,xpaths,criteria)
</programlisting>
<table>
<title>Parameters</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>key</literal></entry>
<entry>
<para>
the name of the "key" field - this is just a field to be used as
the first column of the output table i.e. it identifies the record from
which each output row came (see note below about multiple values).
</para>
</entry>
</row>
<row>
<entry><literal>document</literal></entry>
<entry>
<para>
the name of the field containing the XML document
</para>
</entry>
</row>
<row>
<entry><literal>relation</literal></entry>
<entry>
<para>
the name of the table or view containing the documents
</para>
</entry>
</row>
<row>
<entry><literal>xpaths</literal></entry>
<entry>
<para>
multiple xpath expressions separated by <literal>|</literal>
</para>
</entry>
</row>
<row>
<entry><literal>criteria</literal></entry>
<entry>
<para>
The contents of the where clause. This needs to be specified,
so use "true" or "1=1" here if you want to process all the rows in the
relation.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Note that these parameters (except the XPath strings) are simply substituted
into a plain SQL SELECT statement, so you have some flexibility; the
statement is
</para>
<para>
<literal>
SELECT &lt;key>,&lt;document> FROM &lt;relation> WHERE &lt;criteria>
</literal>
</para>
<para>
so those parameters can be <emphasis>anything</emphasis> valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
validate any user-supplied values to avoid SQL injection attacks.
</para>
<para>
To use the function, it has to appear in a FROM expression. This gives
the following form:
</para>
<programlisting>
SELECT * FROM
xpath_table('article_id',
'article_xml',
'articles',
'/article/author|/article/pages|/article/title',
'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
</programlisting>
<para>
The AS clause defines the names and types of the columns in the
virtual table. If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.
</para>
<para>
Notice that this example declares the page_count result column as an
integer. The function deals internally with string representations, so
when you say you want an integer in the output, it will take the string
representation of the XPath result and use PostgreSQL input functions
to transform it into an integer (or whatever type the AS clause
requests). An error will result if it can't do this, for example if
the result is empty, so you may wish to just stick to 'text' as the
column type if you think your data has any problems.
</para>
<para>
The SELECT statement doesn't need to use * alone; it can reference the
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting, etc.). So we could also have:
</para>
<programlisting>
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id','article_xml','articles',
'/article/title|/article/author/@id',
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
AS t(article_id integer, title text, author_id integer),
tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
</programlisting>
<para>
as a more complicated example. Of course, you could wrap all
of this in a view for convenience.
</para>
<sect3>
<title>Multivalued results</title>
<para>
The xpath_table function assumes that the results of each XPath query
might be multi-valued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.
</para>
<para>
In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier) - if used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:
</para>
<programlisting>
CREATE TABLE test
(
  id int4 NOT NULL,
  xml text,
  CONSTRAINT pk PRIMARY KEY (id)
)
WITHOUT OIDS;

INSERT INTO test VALUES (1, '&lt;doc num="C1">
&lt;line num="L1">&lt;a>1&lt;/a>&lt;b>2&lt;/b>&lt;c>3&lt;/c>&lt;/line>
&lt;line num="L2">&lt;a>11&lt;/a>&lt;b>22&lt;/b>&lt;c>33&lt;/c>&lt;/line>
&lt;/doc>');

INSERT INTO test VALUES (2, '&lt;doc num="C2">
&lt;line num="L1">&lt;a>111&lt;/a>&lt;b>222&lt;/b>&lt;c>333&lt;/c>&lt;/line>
&lt;line num="L2">&lt;a>111&lt;/a>&lt;b>222&lt;/b>&lt;c>333&lt;/c>&lt;/line>
&lt;/doc>');
</programlisting>
</sect3>
<sect3>
<title>The query</title>
<programlisting>
SELECT * FROM xpath_table('id','xml','test',
'/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num;
</programlisting>
<para>
This gives the result:
</para>
<programlisting>
id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
1 | C1 | L1 | 1 | 2 | 3
1 | | L2 | 11 | 22 | 33
</programlisting>
<para>
To get doc_num on every line, the solution is to use two invocations
of xpath_table and join the results:
</para>
<programlisting>
SELECT t.*,i.doc_num FROM
xpath_table('id','xml','test',
'/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
xpath_table('id','xml','test','/doc/@num','1=1')
AS i(id int4, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;
</programlisting>
<para>
which gives the desired result:
</para>
<programlisting>
id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
1 | L1 | 1 | 2 | 3 | C1
1 | L2 | 11 | 22 | 33 | C1
(2 rows)
</programlisting>
</sect3>
</sect2>
<sect2>
<title>XSLT functions</title>
<para>
The following functions are available if libxslt is installed (this is
not currently detected automatically, so you will have to amend the
Makefile).
</para>
<sect3>
<title><literal>xslt_process</literal></title>
<programlisting>
xslt_process(document,stylesheet,paramlist) RETURNS text
</programlisting>
<para>
This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
'a=1,b=2'. Note that this is also proof-of-concept code and the
parameter parsing is very simple-minded (e.g. parameter values cannot
contain commas!).
</para>
<para>
Also note that if either the document or the stylesheet value does not
begin with a &lt; then it will be treated as a URL and libxslt will
fetch it. It thus follows that you can use xslt_process as a means
to fetch the contents of URLs, so you should be aware of the security
implications of this.
</para>
<para>
There is also a two-parameter version of xslt_process which does not
pass any parameters to the transformation.
</para>
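<para>
A minimal sketch of the two-parameter form (the document and stylesheet
literals below are purely illustrative):
</para>
<programlisting>
SELECT xslt_process(
    '&lt;doc>&lt;greeting>hello&lt;/greeting>&lt;/doc>',
    '&lt;xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
       &lt;xsl:template match="/">
         &lt;out>&lt;xsl:value-of select="/doc/greeting"/>&lt;/out>
       &lt;/xsl:template>
     &lt;/xsl:stylesheet>');
</programlisting>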
</sect3>
</sect2>
<sect2>
<title>Credits</title>
<para>
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
It has the same BSD licence as PostgreSQL.
</para>
<para>
This version of the XML functions provides both XPath querying and
XSLT functionality. There is also a new table function which allows
the straightforward return of multiple XML results. Note that the current code
doesn't take any particular care over character sets - this is
something that should be fixed at some point!
</para>
<para>
If you have any comments or suggestions, please do contact me at
<email>jgray@azuli.co.uk</email>. Unfortunately, this isn't my main job, so
I can't guarantee a rapid response to your query!
</para>
</sect2>
</sect1>