Move most /contrib README files into SGML. Some still need conversion
or will never be converted.
This commit is contained in:
parent
6e414a171e
commit
c3c69ab4fd
@@ -1,48 +0,0 @@
PostgreSQL Administration Functions
===================================

This directory is a PostgreSQL 'contrib' module which implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality if installed on a server.

Installation
============

This module is normally distributed as a PostgreSQL 'contrib' module. To
install it from a pre-configured source tree, run the following commands
as a user with appropriate privileges from the adminpack source directory:

    make
    make install

Alternatively, if you have a PostgreSQL 8.2 or higher installation but no
source tree, you can install using PGXS. Simply run the following commands
from the adminpack source directory:

    make USE_PGXS=1
    make USE_PGXS=1 install

pgAdmin will look for the functions in the Maintenance Database (usually
"postgres" for 8.2 servers) specified in the connection dialogue for the server.
To install the functions in the database, either run the adminpack.sql script
using the pgAdmin SQL tool (and then close and reopen the connection to the
freshly instrumented server), or run the script using psql, e.g.:

    psql -U postgres postgres < adminpack.sql

Other administration tools that use this module may have different requirements;
please consult the tool's documentation for further details.

Objects implemented (superuser only)
====================================

int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()

/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
@@ -1,55 +0,0 @@

This is a B-Tree implementation using GiST that supports the int2, int4,
int8, float4, float8, timestamp with/without time zone, time
with/without time zone, date, interval, oid, money, macaddr, char,
varchar/text, bytea, numeric, bit, varbit and inet/cidr types.

All work was done by Teodor Sigaev (teodor@stack.net), Oleg Bartunov
(oleg@sai.msu.su) and Janko Richter (jankorichter@yahoo.de).
See http://www.sai.msu.su/~megera/postgres/gist for additional
information.

NEWS:

 Apr 17, 2004 - Performance optimization
 Jan 21, 2004 - Added support for bytea, numeric, bit, varbit, inet/cidr
 Jan 17, 2004 - Reorganized code and added support for char, varchar/text
 Jan 10, 2004 - btree_gist now supports oid, timestamp with time zone,
                time with and without time zone, date, interval,
                money and macaddr
 Feb  5, 2003 - btree_gist now supports int2, int8, float4 and float8

NOTICE:
 This version will only work with PostgreSQL version 7.4 and above
 because of changes in the system catalogs and the function call
 interface.

 If you want to index varchar attributes, you have to index using
 the function text(<varchar>):
 Example:
    CREATE TABLE test ( a varchar(23) );
    CREATE INDEX testidx ON test USING GIST ( text(a) );

INSTALLATION:

    gmake
    gmake install
    -- load functions
    psql <database> < btree_gist.sql

REGRESSION TEST:

    gmake installcheck

EXAMPLE USAGE:

    create table test (a int4);
    -- create index
    create index testidx on test using gist (a);
    -- query
    select * from test where a < 10;
@@ -1,56 +0,0 @@

$PostgreSQL: pgsql/contrib/chkpass/README.chkpass,v 1.5 2007/10/01 19:06:48 darcy Exp $

Chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
It also returns an error if the code determines that the password is easily
crackable. This is currently a stub that does nothing.

I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.

If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.

On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon, then use the raw() function. This allows you to use the
type with things like Apache's Auth_PostgreSQL module.

The encryption uses the standard Unix function crypt(), and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.

Here is some sample usage:

test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
INSERT 0 1
test=# select * from test;
       p
----------------
 :dVGkpXdOrE3ko
(1 row)

test=# select raw(p) from test;
      raw
---------------
 dVGkpXdOrE3ko
(1 row)

test=# select p = 'hello' from test;
 ?column?
----------
 t
(1 row)

test=# select p = 'goodbye' from test;
 ?column?
----------
 f
(1 row)

D'Arcy J.M. Cain
darcy@druid.net
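The colon convention is easy to model outside the database. Below is a hypothetical Python sketch of the input/output/comparison behavior, with SHA-256 standing in for the crypt() call the real type makes (so the hashes here will not match chkpass output, and the eight-character crypt() limitation is not reproduced):

```python
import hashlib

def chkpass_in(value):
    # A leading colon means "already encrypted": store as-is, skip checks.
    if value.startswith(':'):
        return value[1:]
    # Otherwise encrypt on entry (the real type calls crypt() here).
    return hashlib.sha256(value.encode()).hexdigest()

def chkpass_out(stored):
    # Output prepends a colon so a dump can be reloaded unchanged.
    return ':' + stored

def chkpass_eq(stored, cleartext):
    # Comparison encrypts the clear text before comparing.
    return stored == hashlib.sha256(cleartext.encode()).hexdigest()
```

Feeding chkpass_out() back through chkpass_in() returns the stored value unchanged, which is exactly why dump and reload work without re-encryption.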
@@ -1,355 +0,0 @@

This directory contains the code for the user-defined type,
CUBE, representing multidimensional cubes.

FILES
-----

Makefile     building instructions for the shared library
README.cube  the file you are now reading
cube.c       the implementation of this data type in C
cube.sql.in  SQL code needed to register this type with postgres
             (transformed to cube.sql by make)
cubedata.h   the data structure used to store the cubes
cubeparse.y  the grammar file for the parser (used by cube_in() in cube.c)
cubescan.l   scanner rules (used by cube_yyparse() in cubeparse.y)

INSTALLATION
============

To install the type, run

    make
    make install

The user running "make install" may need root access, depending on how you
configured the PostgreSQL installation paths.

This only installs the type implementation and documentation. To make the
type available in any particular database, as a postgres superuser do:

    psql -d databasename < cube.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

By default the external functions are made executable by anyone.
SYNTAX
======

The following are valid external representations for the CUBE type:

'x'                       A floating point value representing
                          a one-dimensional point or a zero-length
                          one-dimensional cubement

'(x)'                     Same as above

'x1,x2,x3,...,xn'         A point in n-dimensional space,
                          represented internally as a zero-volume box

'(x1,x2,x3,...,xn)'       Same as above

'(x),(y)'                 A 1-D cubement starting at x and ending at y
                          or vice versa; the order does not matter

'(x1,...,xn),(y1,...,yn)' An n-dimensional box represented by
                          a pair of its opposite corners, in either
                          order. Functions take care of swapping to
                          achieve a "lower left -- upper right"
                          representation before computing any values.

Grammar
-------

rule 1  box -> O_BRACKET paren_list COMMA paren_list C_BRACKET
rule 2  box -> paren_list COMMA paren_list
rule 3  box -> paren_list
rule 4  box -> list
rule 5  paren_list -> O_PAREN list C_PAREN
rule 6  list -> FLOAT
rule 7  list -> list COMMA FLOAT

Tokens
------

n          [0-9]+
integer    [+-]?{n}
real       [+-]?({n}\.{n}?|\.{n})
FLOAT      ({integer}|{real})([eE]{integer})?
O_BRACKET  \[
C_BRACKET  \]
O_PAREN    \(
C_PAREN    \)
COMMA      \,
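For readers without flex/bison at hand, the grammar and tokens can be approximated in a few lines. Here is a rough Python sketch (a hypothetical, somewhat looser stand-in for the real cubescan.l/cubeparse.y) that accepts the same external representations:

```python
import re

def parse_cube(s):
    """Parse a CUBE external representation into (ll, ur) corner lists.

    Mirrors grammar rules 1-7: an optional [ ... ] wrapper around one or
    two comma-separated float lists, each optionally parenthesized.
    White space is ignored, as in the real parser.
    """
    s = re.sub(r'\s+', '', s)
    if s.startswith('[') and s.endswith(']'):
        s = s[1:-1]
    # Two parenthesized lists: a box given by two corners (rules 1-2).
    m = re.fullmatch(r'\(([^()]*)\),\(([^()]*)\)', s)
    if m:
        a = [float(x) for x in m.group(1).split(',')]
        b = [float(x) for x in m.group(2).split(',')]
    else:
        # A single (possibly parenthesized) list: a point (rules 3-4).
        m = re.fullmatch(r'\(([^()]*)\)', s)
        body = m.group(1) if m else s
        a = b = [float(x) for x in body.split(',')]
    if len(a) != len(b):
        raise ValueError('corners must have the same dimension')
    return a, b
```

For example, parse_cube('[ ( 0 ), ( 2 ) ]') and parse_cube('(0),(2)') both yield the corners ([0.0], [2.0]), illustrating that white space and brackets are cosmetic.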
Examples of valid CUBE representations:
---------------------------------------

'x'                         A floating point value representing
                            a one-dimensional point (or a zero-length
                            one-dimensional interval)

'(x)'                       Same as above

'x1,x2,x3,...,xn'           A point in n-dimensional space,
                            represented internally as a zero-volume cube

'(x1,x2,x3,...,xn)'         Same as above

'(x),(y)'                   A 1-D interval starting at x and ending at y
                            or vice versa; the order does not matter

'[(x),(y)]'                 Same as above

'(x1,...,xn),(y1,...,yn)'   An n-dimensional box represented by
                            a pair of its diagonally opposite corners,
                            regardless of order. Swapping is performed
                            by all comparison routines to ensure the
                            "lower left -- upper right" representation
                            before the actual comparison takes place.

'[(x1,...,xn),(y1,...,yn)]' Same as above

White space is ignored, so '[(x),(y)]' can be written as '[ ( x ), ( y ) ]'.
DEFAULTS
========

I believe this union:

select cube_union('(0,5,2),(2,3,1)', '0');
     cube_union
---------------------
 (0, 0, 0),(2, 5, 2)
(1 row)

does not contradict common sense, and neither does the intersection:

select cube_inter('(0,-1),(1,1)', '(-2),(2)');
  cube_inter
---------------
 (0, 0),(1, 0)
(1 row)

In all binary operations on differently sized boxes, I assume the smaller
one to be a cartesian projection, i.e., having zeroes in place of coordinates
omitted in the string representation. The above examples are equivalent to:

cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');

The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define a special point type
and functions for (box, point) predicates.

select cube_contains('(0,0),(1,1)', '0.5,0.5');
 cube_contains
---------------
 t
(1 row)
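The padding rule is easy to state in code. Here is a minimal Python sketch (hypothetical helper names, not the cube.c functions themselves) that reproduces both results above:

```python
def pad(corner, n):
    # Treat a lower-dimensional box as a cartesian projection:
    # missing coordinates are taken to be zero.
    return corner + [0.0] * (n - len(corner))

def cube_union(a_ll, a_ur, b_ll, b_ur):
    n = max(len(a_ll), len(b_ll))
    corners = [pad(c, n) for c in (a_ll, a_ur, b_ll, b_ur)]
    # The union is the bounding box of all four corners, which also
    # normalizes to "lower left -- upper right" as a side effect.
    ll = [min(v) for v in zip(*corners)]
    ur = [max(v) for v in zip(*corners)]
    return ll, ur

def cube_inter(a_ll, a_ur, b_ll, b_ur):
    n = max(len(a_ll), len(b_ll))
    a_ll, a_ur, b_ll, b_ur = (pad(c, n) for c in (a_ll, a_ur, b_ll, b_ur))
    # Normalize each box to lower-left / upper-right first.
    a_lo = [min(x, y) for x, y in zip(a_ll, a_ur)]
    a_hi = [max(x, y) for x, y in zip(a_ll, a_ur)]
    b_lo = [min(x, y) for x, y in zip(b_ll, b_ur)]
    b_hi = [max(x, y) for x, y in zip(b_ll, b_ur)]
    ll = [max(x, y) for x, y in zip(a_lo, b_lo)]
    ur = [min(x, y) for x, y in zip(a_hi, b_hi)]
    return ll, ur
```

With these definitions, the union of (0,5,2),(2,3,1) with the 1-D point 0 is (0,0,0),(2,5,2), and the intersection of (0,-1),(1,1) with (-2),(2) is (0,0),(1,0), matching the query outputs.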
PRECISION
=========

Values are stored internally as 64-bit floating point numbers. This means that
numbers with more than about 16 significant digits will be truncated.

USAGE
=====

The access method for CUBE is a GiST index (gist_cube_ops), which is a
generalization of the R-tree. GiSTs allow the postgres implementation of
the R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/seg).

The operators supported by the GiST access method include:

a = b   Same as

        The cubements a and b are identical.

a && b  Overlaps

        The cubements a and b overlap.

a @> b  Contains

        The cubement a contains the cubement b.

a <@ b  Contained in

        The cubement a is contained in b.

(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.

Other operators:

[a, b] < [c, d]   Less than
[a, b] > [c, d]   Greater than

These operators do not make a lot of sense for any practical
purpose but sorting. They first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
The following functions are available:

cube_distance(cube, cube) returns double
  cube_distance returns the distance between two cubes. If both cubes are
  points, this is the normal distance function.

cube(float8) returns cube
  This makes a one-dimensional cube with both coordinates the same.
  If the type of the argument is a numeric type other than float8, an
  explicit cast to float8 may be needed.
  cube(1) == '(1)'

cube(float8, float8) returns cube
  This makes a one-dimensional cube.
  cube(1,2) == '(1),(2)'

cube(float8[]) returns cube
  This makes a zero-volume cube using the coordinates defined by the
  array.
  cube(ARRAY[1,2]) == '(1,2)'

cube(float8[], float8[]) returns cube
  This makes a cube with upper right and lower left coordinates as
  defined by the two float arrays. Arrays must be of the same length.
  cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'

cube(cube, float8) returns cube
  This builds a new cube by adding a dimension onto an existing cube, with
  the same value for both parts of the new coordinate. This is useful for
  building cubes piece by piece from calculated values.
  cube('(1)',2) == '(1,2),(1,2)'

cube(cube, float8, float8) returns cube
  This builds a new cube by adding a dimension onto an existing cube.
  This is useful for building cubes piece by piece from calculated values.
  cube('(1,2)',3,4) == '(1,3),(2,4)'

cube_dim(cube) returns int
  cube_dim returns the number of dimensions stored in the data structure
  for a cube. This is useful for constraints on the dimensions of a cube.

cube_ll_coord(cube, int) returns double
  cube_ll_coord returns the nth coordinate value for the lower left corner
  of a cube. This is useful for doing coordinate transformations.

cube_ur_coord(cube, int) returns double
  cube_ur_coord returns the nth coordinate value for the upper right corner
  of a cube. This is useful for doing coordinate transformations.

cube_subset(cube, int[]) returns cube
  Builds a new cube from an existing cube, using a list of dimension indexes
  from an array. Can be used to find both the ll and ur coordinates of a
  single dimension, e.g.:
  cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'.
  It can also be used to drop dimensions, or reorder them as desired, e.g.:
  cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) = '(5, 3, 1, 1),(8, 7, 6, 6)'.

cube_is_point(cube) returns bool
  cube_is_point returns true if a cube is also a point. This is true when the
  two defining corners are the same.

cube_enlarge(cube, double, int) returns cube
  cube_enlarge increases the size of a cube by a specified radius in at least
  n dimensions. If the radius is negative, the box is shrunk instead. This
  is useful for creating bounding boxes around a point for searching for
  nearby points. All defined dimensions are changed by the radius r. If n
  is greater than the number of defined dimensions and the cube is being
  increased (r >= 0), then 0 is used as the base for the extra coordinates.
  LL coordinates are decreased by r and UR coordinates are increased by r. If
  a LL coordinate is increased to larger than the corresponding UR coordinate
  (this can only happen when r < 0), then both coordinates are set to their
  average. To make it harder for people to break things, there is an effective
  maximum of 100 on the dimension of cubes. This limit is set in cubedata.h;
  change it there if you need something bigger.
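As a cross-check of the cube_subset examples above, the indexing rule it describes is simple enough to model directly. A hypothetical Python sketch (not the cube.c implementation), representing a cube as its two corner lists:

```python
def cube_subset(ll, ur, idx):
    """Build a new cube from dimension indexes.

    idx uses 1-based dimension numbers, as in the int[] argument of the
    SQL function; indexes may repeat or reorder dimensions, which is how
    the function can drop, duplicate, or permute coordinates.
    """
    return [ll[i - 1] for i in idx], [ur[i - 1] for i in idx]
```

Applying it to the cube (1,3,5),(6,7,8) with indexes [2] selects the second dimension of each corner, and [3,2,1,1] reverses the dimensions while duplicating the first, just as in the README's examples.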
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.

For examples of usage, see sql/cube.sql.

CREDITS
=======

This code is essentially based on the example written for
Illustra, http://garcia.me.berkeley.edu/~adong/rtree

My thanks are primarily to Prof. Joe Hellerstein
(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
(http://gist.cs.berkeley.edu/), and to his former student, Andy Dong
(http://best.me.berkeley.edu/~adong/), for his exemplar.
I am also grateful to all postgres developers, present and past, for enabling
me to create my own world and live undisturbed in it. And I would like to
acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
for the years of faithful support of my database research.

------------------------------------------------------------------------

Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844

selkovjr@mcs.anl.gov

------------------------------------------------------------------------

Minor updates to this package were made by Bruno Wolff III <bruno@wolff.to>
in August/September of 2002.

These include changing the precision from single precision to double
precision and adding some new functions.

------------------------------------------------------------------------

Additional updates were made by Joshua Reich <josh@root.net> in July 2006.

These include cube(float8[], float8[]) and cleaning up the code to use
the V1 call protocol instead of the deprecated V0 form.
@@ -1,109 +0,0 @@

/*
 * dblink
 *
 * Functions returning results from a remote database
 *
 * Joe Conway <mail@joeconway.com>
 * And contributors:
 * Darko Prenosil <Darko.Prenosil@finteh.hr>
 * Shridhar Daithankar <shridhar_daithankar@persistent.co.in>
 * Kai Londenberg (K.Londenberg@librics.de)
 *
 * Copyright (c) 2001-2007, PostgreSQL Global Development Group
 * ALL RIGHTS RESERVED;
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose, without fee, and without a written agreement
 * is hereby granted, provided that the above copyright notice and this
 * paragraph and the following two paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
 * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIMS ANY WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
 * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 *
 */

Release Notes:

27 August 2006
 - Added async query capability. Original patch by
   Kai Londenberg (K.Londenberg@librics.de), modified by Joe Conway

Version 0.7 (as of 25 Feb, 2004)
 - Added new versions of dblink, dblink_exec, dblink_open, dblink_close,
   and dblink_fetch -- allows an ERROR on the remote side of the connection
   to throw a NOTICE locally instead of an ERROR

Version 0.6
 - functions deprecated in 0.5 have been removed
 - added ability to create "named" persistent connections

Version 0.5
 - dblink now supports use directly as a table function; this is the new
   preferred usage going forward
 - Use of dblink_tok is now deprecated; the original form of dblink is also
   deprecated. They _will_ be removed in the next version.
 - dblink_last_oid is also deprecated; use dblink_exec(), which returns
   the command status as a single row, single column result.
 - The original dblink, dblink_tok, and dblink_last_oid are commented out in
   dblink.sql; remove the comments to use the deprecated functions.
 - dblink_strtok() and dblink_replace() functions were removed. Use
   split() and replace() respectively (new backend functions in
   PostgreSQL 7.3) instead.
 - New functions: dblink_exec() for non-SELECT queries; dblink_connect()
   opens a connection that persists for the duration of a backend;
   dblink_disconnect() closes a persistent connection; dblink_open()
   opens a cursor; dblink_fetch() fetches results from an open cursor;
   dblink_close() closes a cursor.
 - New test suite: dblink_check.sh, dblink.test.sql,
   dblink.test.expected.out. Execute dblink_check.sh from the same
   directory as the other two files. Output is dblink.test.out and
   dblink.test.diff. Note that dblink.test.sql is a good source
   of example usage.

Version 0.4
 - removed the cursor wrap around input sql to allow for remote
   execution of INSERT/UPDATE/DELETE
 - dblink now returns a resource id instead of a real pointer
 - added several utility functions -- see below

Version 0.3
 - fixed dblink invalid pointer causing corrupt elog message
 - fixed dblink_tok improper handling of null results
 - fixed examples in README.dblink

Version 0.2
 - initial release

Installation:

Place these files in a directory called 'dblink' under 'contrib' in the
PostgreSQL source tree. Then run:

    make
    make install

You can use dblink.sql to create the functions in your database of choice, e.g.

    psql template1 < dblink.sql

installs the dblink functions into database template1.

Documentation:

Note: Parameters representing relation names must include double
quotes if the names are mixed-case or contain special characters. They
must also be appropriately qualified with a schema name if applicable.

See the following files:
    doc/connection
    doc/cursor
    doc/query
    doc/execute
    doc/misc

==================================================================
-- Joe Conway
@@ -1,127 +0,0 @@

This contrib package contains two different approaches to calculating
great circle distances on the surface of the Earth. The one described
first depends on the contrib/cube package (which MUST be installed before
earthdistance is installed). The second one is based on the point
datatype, using latitude and longitude for the coordinates. The install
script makes the defined functions executable by anyone.

Make sure contrib/cube has been installed.
    make
    make install
    make installcheck

To use these functions in a particular database, as a postgres superuser do:
    psql databasename < earthdistance.sql

-------------------------------------------
contrib/cube based Earth distance functions
Bruno Wolff III
September 2002

A spherical model of the Earth is used.

Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the distance from the center of the Earth.

The radius of the Earth is obtained from the earth() function. It is
given in meters. But by changing this one function you can change it
to use some other units, or to use a different value of the radius
that you feel is more appropriate.

This package also has applications to astronomical databases.
Astronomers will probably want to change earth() to return a radius of
180/pi() so that distances are in degrees.

Functions are provided to allow for input in latitude and longitude (in
degrees), to allow for output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.

The functions are all 'sql' functions. If you want to make these functions
executable by other people, you will also have to make the referenced
cube functions executable. cube(text), cube(float8), cube(cube,float8),
cube_distance(cube,cube), cube_ll_coord(cube,int) and
cube_enlarge(cube,float8,int) are used indirectly by the earth distance
functions. is_point(cube) and cube_dim(cube) are used in constraints for data
in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
might be useful for looking at bounding box coordinates in user applications.

A domain of type cube named earth is defined.
There are constraints on it defined to make sure the cube is a point,
that it does not have more than 3 dimensions and that it is very near
the surface of a sphere centered about the origin with the radius of
the Earth.
The following functions are provided:

earth() - Returns the radius of the Earth in meters.

sec_to_gc(float8) - Converts the normal straight line (secant) distance
between two points on the surface of the Earth to the great circle distance
between them.

gc_to_sec(float8) - Converts the great circle distance between two points
on the surface of the Earth to the normal straight line (secant) distance
between them.

ll_to_earth(float8, float8) - Returns the location of a point on the surface
of the Earth given its latitude (argument 1) and longitude (argument 2) in
degrees.

latitude(earth) - Returns the latitude in degrees of a point on the surface
of the Earth.

longitude(earth) - Returns the longitude in degrees of a point on the surface
of the Earth.

earth_distance(earth, earth) - Returns the great circle distance between
two points on the surface of the Earth.

earth_box(earth, float8) - Returns a box suitable for an indexed search using
the cube @> operator for points within a given great circle distance of a
location. Some points in this box are further than the specified great circle
distance from the location, so a second check using earth_distance should be
made at the same time.

One advantage of using the cube representation over a point using latitude and
longitude for coordinates is that you don't have to worry about special
conditions at +/- 180 degrees of longitude or near the poles.
Below is the documentation for the Earth distance operator that works
|
|
||||||
with the point data type.
|
|
||||||
|
|
||||||
---------------------------------------------------------------------
|
|
||||||
|
|
||||||
I corrected a bug in the geo_distance code where two double constants
|
|
||||||
were declared as int. I also changed the distance function to use
|
|
||||||
the haversine formula which is more accurate for small distances.
|
|
||||||
Bruno Wolff
|
|
||||||
September 2002
|
|
||||||
|
|
||||||
---------------------------------------------------------------------
|
|
||||||
|
|
||||||
Date: Wed, 1 Apr 1998 15:19:32 -0600 (CST)
|
|
||||||
From: Hal Snyder <hal@vailsys.com>
|
|
||||||
To: vmehr@ctp.com
|
|
||||||
Subject: [QUESTIONS] Re: Spatial data, R-Trees
|
|
||||||
|
|
||||||
> From: Vivek Mehra <vmehr@ctp.com>
|
|
||||||
> Date: Wed, 1 Apr 1998 10:06:50 -0500
|
|
||||||
|
|
||||||
> Am just starting out with PostgreSQL and would like to learn more about
|
|
||||||
> the spatial data handling ablilities of postgreSQL - in terms of using
|
|
||||||
> R-tree indexes, user defined types, operators and functions.
|
|
||||||
>
|
|
||||||
> Would you be able to suggest where I could find some code and SQL to
|
|
||||||
> look at to create these?
|
|
||||||
|
|
||||||
Here's the setup for adding an operator '<@>' to give distance in
|
|
||||||
statute miles between two points on the Earth's surface. Coordinates
|
|
||||||
are in degrees. Points are taken as (longitude, latitude) and not vice
|
|
||||||
versa as longitude is closer to the intuitive idea of x-axis and
|
|
||||||
latitude to y-axis.
|
|
||||||
|
|
||||||
There's C source, Makefile for FreeBSD, and SQL for installing and
|
|
||||||
testing the function.
|
|
||||||
|
|
||||||
Let me know if anything looks fishy!
|
|
|
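The haversine formula Bruno Wolff mentions above, and the chord-to-arc conversion behind sec_to_gc(), can be sketched in a few lines. The following is a minimal Python illustration of the math, not the module's C code; the 6378168 m radius is the value this sketch assumes for earth().

```python
from math import asin, cos, pi, radians, sin, sqrt

EARTH_RADIUS_M = 6378168.0  # assumed Earth radius in meters for this sketch

def haversine_m(lat1, lon1, lat2, lon2):
    """Great circle distance in meters between two (latitude, longitude)
    points given in degrees, computed with the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def sec_to_gc(sec):
    """Chord (secant) length -> great circle arc length on the sphere,
    modeling the idea behind sec_to_gc(float8)."""
    if sec <= 0:
        return 0.0
    if sec >= 2 * EARTH_RADIUS_M:
        return pi * EARTH_RADIUS_M  # antipodal points
    return 2 * EARTH_RADIUS_M * asin(sec / (2 * EARTH_RADIUS_M))
```

The haversine form is numerically better than the spherical law of cosines for nearby points, which is why small distances come out more accurately.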
@ -1,144 +0,0 @@
/*
 * fuzzystrmatch.c
 *
 * Functions for "fuzzy" comparison of strings
 *
 * Joe Conway <mail@joeconway.com>
 *
 * Copyright (c) 2001-2007, PostgreSQL Global Development Group
 * ALL RIGHTS RESERVED;
 *
 * levenshtein()
 * -------------
 * Written based on a description of the algorithm by Michael Gilleland
 * found at http://www.merriampark.com/ld.htm
 * Also looked at levenshtein.c in the PHP 4.0.6 distribution for
 * inspiration.
 *
 * metaphone()
 * -----------
 * Modified for PostgreSQL by Joe Conway.
 * Based on CPAN's "Text-Metaphone-1.96" by Michael G Schwern <schwern@pobox.com>
 * Code slightly modified for use as PostgreSQL function (palloc, elog, etc).
 * Metaphone was originally created by Lawrence Philips and presented in an
 * article in the "Computer Language" December 1990 issue.
 *
 * dmetaphone() and dmetaphone_alt()
 * ---------------------------------
 * A port of the DoubleMetaphone perl module by Andrew Dunstan. See dmetaphone.c
 * for more detail.
 *
 * soundex()
 * ---------
 * Folded the existing soundex contrib into this one. Renamed text_soundex()
 * (the C function) to soundex() for consistency.
 *
 * difference()
 * ------------
 * Returns the difference between two strings' soundex values. Kris Jurka
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose, without fee, and without a written agreement
 * is hereby granted, provided that the above copyright notice and this
 * paragraph and the following two paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
 * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
 * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 *
 */


Version 0.3 (30 June, 2004):

Release Notes:

Version 0.3
 - added double metaphone code from Andrew Dunstan
 - changed metaphone so that an empty input string causes an empty
   output string to be returned, instead of throwing an ERROR
 - fixed examples in README.soundex

Version 0.2
 - folded the soundex contrib into this one

Version 0.1
 - initial release

Installation:

Place these files in a directory called 'fuzzystrmatch' under 'contrib'
in the PostgreSQL source tree. Then run:

  make
  make install

You can use fuzzystrmatch.sql to create the functions in your database of
choice, e.g.

  psql -U postgres template1 < fuzzystrmatch.sql

installs the following functions into database template1:

  levenshtein() - calculates the Levenshtein distance between two strings
  metaphone()   - calculates the metaphone code of an input string

Documentation
==================================================================
Name

levenshtein -- calculates the Levenshtein distance between two strings

Synopsis

levenshtein(text source, text target)

Inputs

  source
    any text string, 255 characters max, NOT NULL

  target
    any text string, 255 characters max, NOT NULL

Outputs

  Returns int

Example usage

  select levenshtein('GUMBO', 'GAMBOL');

==================================================================
Name

metaphone -- calculates the metaphone code of an input string

Synopsis

metaphone(text source, int max_output_length)

Inputs

  source
    any text string, 255 characters max, NOT NULL

  max_output_length
    maximum length of the output metaphone code; if longer, the output
    is truncated to this length

Outputs

  Returns text

Example usage

  select metaphone('GUMBO', 4);

==================================================================
-- Joe Conway
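For readers who want to see the dynamic-programming idea behind levenshtein() without digging into the C source, here is a minimal Python sketch of the textbook algorithm (an illustration, not the module's implementation):

```python
def levenshtein(source, target):
    """Edit distance between two strings: the minimum number of
    single-character insertions, deletions, and substitutions."""
    # prev holds the previous row of the DP matrix; row 0 is the cost
    # of building each prefix of target from the empty string.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, 1):
        cur = [i]  # cost of deleting the first i characters of source
        for j, t in enumerate(target, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (s != t)))   # substitution (free on match)
        prev = cur
    return prev[-1]
```

With the documentation's example, levenshtein('GUMBO', 'GAMBOL') is 2: one substitution (U to A) plus one insertion (L).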
@ -1,66 +0,0 @@
NOTE: Modified August 07, 2001 by Joe Conway. Updated for accuracy
      after combining the soundex code into the fuzzystrmatch contrib
---------------------------------------------------------------------
The Soundex system is a method of matching similar-sounding names
(or any words) to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.

When comparing two soundex values to determine similarity, the
difference function reports how close the match is on a scale
from zero to four, with zero being no match and four being an
exact match.

The following are some usage examples:

SELECT soundex('hello world!');

SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');

CREATE TABLE s (nm text);

INSERT INTO s VALUES ('john');
INSERT INTO s VALUES ('joan');
INSERT INTO s VALUES ('wobbly');
INSERT INTO s VALUES ('jack');

SELECT * FROM s WHERE soundex(nm) = soundex('john');

SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid;

CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
'select soundex($1) = soundex($2)'
LANGUAGE SQL;

CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
'select soundex($1) < soundex($2)'
LANGUAGE SQL;

CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
'select soundex($1) > soundex($2)'
LANGUAGE SQL;

CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
'select soundex($1) <= soundex($2)'
LANGUAGE SQL;

CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
'select soundex($1) >= soundex($2)'
LANGUAGE SQL;

CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
'select soundex($1) <> soundex($2)'
LANGUAGE SQL;

DROP OPERATOR #= (text, text);

CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);

SELECT * FROM s WHERE text_sx_eq(nm, 'john');

SELECT * FROM s WHERE s.nm #= 'john';

SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
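To make the zero-to-four scoring concrete, here is a simplified Python sketch of soundex encoding and the difference scoring. It is illustrative only: among other simplifications it treats H and W like vowels, which the strict algorithm does not, so its codes can differ from the C function in edge cases.

```python
# Soundex digit table: similar-sounding consonants share a digit;
# vowels (and here H, W, Y) get no digit and reset the duplicate check.
SOUNDEX_CODES = {c: d for letters, d in
                 [('BFPV', '1'), ('CGJKQSXZ', '2'), ('DT', '3'),
                  ('L', '4'), ('MN', '5'), ('R', '6')]
                 for c in letters}

def soundex(name):
    """Simplified 4-character soundex code: first letter + up to 3 digits,
    padded with zeros."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ''
    out = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], '')
    for c in letters[1:]:
        d = SOUNDEX_CODES.get(c, '')
        if d and d != prev:   # skip repeats of the same digit
            out += d
        prev = d
    return (out + '000')[:4]

def difference(a, b):
    """Number of matching positions between the two codes (0..4)."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))
```

With this sketch, 'Anne' and 'Ann' both encode to A500 (difference 4), while 'Anne' against 'Andrew' (A536) matches in only two positions.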
@ -1,188 +0,0 @@
Hstore - contrib module for storing (key,value) pairs

[Online version] (http://www.sai.msu.su/~megera/oddmuse/index.cgi?Hstore)

Motivation

Many attributes rarely searched, semistructured data, lazy DBA.

Authors

 * Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
 * Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd., Russia

LEGAL NOTICES: This module is released under the BSD license (as is
PostgreSQL itself).

Operations

 * hstore -> text - get value, Perl analogy: $h{key}

   select 'a=>q, b=>g'->'a';
    ?
   ------
    q

 * hstore || hstore - concatenation, Perl analogy: %a = ( %b, %c );

   regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
        ?column?
   --------------------
    "a"=>"b", "c"=>"d"
   (1 row)

   but notice:

   regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
    ?column?
   ----------
    "a"=>"d"
   (1 row)

 * text => text - creates hstore type from two text strings

   select 'a'=>'b';
    ?column?
   ----------
    "a"=>"b"

 * hstore @> hstore - contains operation, checks whether the left operand
   contains the right.

   regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
    ?column?
   ----------
    f
   (1 row)

   regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
    ?column?
   ----------
    t
   (1 row)

 * hstore <@ hstore - contained operation, checks whether the left operand
   is contained in the right.

(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)

Functions

 * akeys(hstore) - returns all keys from hstore as an array

   regression=# select akeys('a=>1,b=>2');
    akeys
   -------
    {a,b}

 * skeys(hstore) - returns all keys from hstore as strings

   regression=# select skeys('a=>1,b=>2');
    skeys
   -------
    a
    b

 * avals(hstore) - returns all values from hstore as an array

   regression=# select avals('a=>1,b=>2');
    avals
   -------
    {1,2}

 * svals(hstore) - returns all values from hstore as strings

   regression=# select svals('a=>1,b=>2');
    svals
   -------
    1
    2

 * delete(hstore,text) - deletes the (key,value) pair from hstore whose key
   matches the argument.

   regression=# select delete('a=>1,b=>2','b');
    delete
   ----------
    "a"=>"1"

 * each(hstore) - returns (key, value) pairs

   regression=# select * from each('a=>1,b=>2');
    key | value
   -----+-------
    a   | 1
    b   | 2

 * exist(hstore,text)
 * hstore ? text
   - returns true if the key exists in hstore and false otherwise.

   regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
    exist | ?column?
   -------+----------
    t     | t

 * defined(hstore,text) - returns true if the key exists in hstore and
   its value is not NULL.

   regression=# select defined('a=>NULL','a');
    defined
   ---------
    f

Indices

The module provides index support for the '@>' and '?' operations.

create index hidx on testhstore using gist(h);
create index hidx on testhstore using gin(h);

Note

Use parentheses in the select below, because the precedence of 'is' is
higher than that of '->':

select id from entrants where (info->'education_period') is not null;

Examples

 * add key

   update tt set h=h||'c=>3';

 * delete key

   update tt set h=delete(h,'k1');

 * Statistics

   The hstore type, because of its intrinsic liberality, could contain a lot
   of different keys. Checking for valid keys is the task of the application.
   The examples below demonstrate several techniques for checking key
   statistics.

   o simple example

     select * from each('aaa=>bq, b=>NULL, ""=>1 ');

   o using a table

     select (each(h)).key, (each(h)).value into stat from testhstore;

   o online stat

     select key, count(*) from (select (each(h)).key from testhstore) as stat group by key order by count desc, key;
       key    | count
     ---------+-------
      line    |   883
      query   |   207
      pos     |   203
      node    |   202
      space   |   197
      status  |   195
      public  |   194
      title   |   190
      org     |   189
     ...................
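The || and @> semantics above map directly onto Python dicts; a small illustrative model (None standing in for NULL, not the module's implementation):

```python
def hs_concat(a, b):
    """Model of hstore || hstore: on duplicate keys the right operand wins,
    matching 'a=>b' || 'a=>d' yielding "a"=>"d"."""
    return {**a, **b}

def hs_contains(left, right):
    """Model of hstore @> hstore: every key of the right operand must be
    present in the left with exactly the same value."""
    return all(k in left and left[k] == v for k, v in right.items())
```

With the README's examples: the hstore 'a=>b, b=>1, c=>NULL' does not contain 'a=>c' (the key exists but the value differs), while it does contain 'b=>1'.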
@ -1,55 +0,0 @@
Integer aggregator/enumerator.

Many database systems have the notion of a one-to-many table.

A one-to-many table usually sits between two indexed tables, as:

  create table one_to_many(left int, right int);

And it is used like this:

  SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
    WHERE one_to_many.left = item;

This will return all the items in the right hand table for an entry
in the left hand table. This is a very common construct in SQL.

Now, this methodology can be cumbersome with a very large number of
entries in the one_to_many table. Depending on the order in which
data was entered, a join like this could result in an index scan
and a fetch for each right hand entry in the table for a particular
left hand entry.

If you have a very dynamic system, there is not much you can do.
However, if you have some data which is fairly static, you can
create a summary table with the aggregator.

  CREATE TABLE summary AS SELECT left, int_array_aggregate(right)
    AS right FROM one_to_many GROUP BY left;

This will create a table with one row per left item, and an array
of right items. Now this is pretty useless without some way of using
the array; that's why there is an array enumerator.

  SELECT left, int_array_enum(right) FROM summary WHERE left = item;

The above query using int_array_enum produces the same results as:

  SELECT left, right FROM one_to_many WHERE left = item;

The difference is that the query against the summary table has to get
only one row from the table, whereas the query against "one_to_many"
must index scan and fetch a row for each entry.

On our system, an EXPLAIN shows a query with a cost of 8488 gets reduced
to a cost of 329. The query is a join involving the one_to_many table:

  select right, count(right) from
  (
    select left, int_array_enum(right) as right from summary join
      (select left from left_table where left = item) as lefts
      ON (summary.left = lefts.left)
  ) as list group by right order by count desc;
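The aggregate-then-enumerate round trip described above can be modeled in a few lines of Python. This is an illustration of the idea (one summary row per left key, re-expanded on demand), not of the C aggregate itself:

```python
from collections import defaultdict

def build_summary(pairs):
    """Model of: CREATE TABLE summary AS
       SELECT left, int_array_aggregate(right) FROM one_to_many GROUP BY left.
    Collapses (left, right) rows into one array-valued row per left key."""
    summary = defaultdict(list)
    for left, right in pairs:
        summary[left].append(right)
    return dict(summary)

def enum_rows(summary, item):
    """Model of: SELECT left, int_array_enum(right) FROM summary WHERE left = item.
    Re-expands the stored array back into individual rows."""
    return [(item, right) for right in summary.get(item, [])]
```

Fetching one summary row and expanding it locally is what replaces the per-entry index scans of the original join.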
@ -1,185 +0,0 @@
This is an implementation of the RD-tree data structure using the GiST
interface of PostgreSQL. It has built-in lossy compression.

The current implementation provides index support for one-dimensional arrays
of integers: gist__int_ops, suitable for small- and medium-size arrays (used
by default), and gist__intbig_ops for indexing large arrays (a superimposed
signature with a length of 4096 bits is used to represent sets). There is
also a non-default gin__int_ops for GIN indexes on integer arrays.

All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist
for additional information. Andrey Oktyabrski did great work on
adding new functions and operations.


FUNCTIONS:

int icount(int[]) - the number of elements in the array

  test=# select icount('{1,2,3}'::int[]);
   icount
  --------
        3
  (1 row)

int[] sort(int[], 'asc' | 'desc') - sort the array

  test=# select sort('{1,2,3}'::int[],'desc');
    sort
  ---------
   {3,2,1}
  (1 row)

int[] sort(int[]) - sort in ascending order
int[] sort_asc(int[]), sort_desc(int[]) - shortcuts for sort

int[] uniq(int[]) - returns unique elements

  test=# select uniq(sort('{1,2,3,2,1}'::int[]));
    uniq
  ---------
   {1,2,3}
  (1 row)

int idx(int[], int item) - returns the index of the first array element
matching item, or 0 if there is no match.

  test=# select idx('{1,2,3,2,1}'::int[],2);
   idx
  -----
     2
  (1 row)

int[] subarray(int[], int START [, int LEN]) - returns the part of the array
starting at element number START (counting from 1), with length LEN.

  test=# select subarray('{1,2,3,2,1}'::int[],2,3);
   subarray
  ----------
   {2,3,2}
  (1 row)

int[] intset(int4) - casts int4 to int[]

  test=# select intset(1);
   intset
  --------
   {1}
  (1 row)

OPERATIONS:

  int[] && int[]     - overlap - returns TRUE if the arrays have at least one common element
  int[] @> int[]     - contains - returns TRUE if the left array contains the right array
  int[] <@ int[]     - contained - returns TRUE if the left array is contained in the right array
  # int[]            - returns the number of elements in the array
  int[] + int        - push an element onto the array (add to the end of the array)
  int[] + int[]      - merge arrays (the right array is appended to the left one)
  int[] - int        - remove entries matching the right argument from the array
  int[] - int[]      - remove the right array from the left
  int[] | int        - returns an array - union of the arguments
  int[] | int[]      - returns an array as the union of the two arrays
  int[] & int[]      - returns the intersection of the arrays
  int[] @@ query_int - returns TRUE if the array satisfies the query (like '1&(2|3)')
  query_int ~~ int[] - returns TRUE if the array satisfies the query (commutator of @@)

(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)

CHANGES:

August 6, 2002
  1. Reworked patch from Andrey Oktyabrski (ano@spider.ru) with
     functions: icount, sort, sort_asc, uniq, idx, subarray
     operations: #, +, -, |, &
October 1, 2001
  1. Changed the search method in array to binary search
September 28, 2001
  1. gist__int_ops is now non-lossy
  2. added a sort entry in picksplit
September 21, 2001
  1. Added support for boolean queries (indexable operator @@, looks like
     a @@ '1|(2&3)'; performance is better in any case)
  2. Done some small optimizations
March 19, 2001
  1. Added support for toastable keys
  2. Improved the split algorithm for intbig (selection speedup is about 30%)

INSTALLATION:

  gmake
  gmake install
  -- load functions
  psql <database> < _int.sql

REGRESSION TEST:

  gmake installcheck

EXAMPLE USAGE:

  create table message (mid int not null, sections int[]);
  create table message_section_map (mid int not null, sid int not null);

  -- create indices
  CREATE unique index message_key on message ( mid );
  CREATE unique index message_section_map_key2 on message_section_map (sid, mid );
  CREATE INDEX message_rdtree_idx on message using gist ( sections gist__int_ops);

  -- select some messages with section 1 OR 2 - OVERLAP operator
  select message.mid from message where message.sections && '{1,2}';

  -- select messages contained in sections 1 AND 2 - CONTAINS operator
  select message.mid from message where message.sections @> '{1,2}';
  -- the same, CONTAINED operator
  select message.mid from message where '{1,2}' <@ message.sections;

BENCHMARK:

The subdirectory bench contains a benchmark suite.

  cd ./bench
  1. createdb TEST
  2. psql TEST < ../_int.sql
  3. ./create_test.pl | psql TEST
  4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
     with/without RD-Tree. Run the script without arguments to
     see the available options.

     a) test without RD-Tree (OR)
        ./bench.pl -d TEST -c -s 1,2 -v
     b) test with RD-Tree
        ./bench.pl -d TEST -c -s 1,2 -v -r

BENCHMARKS:

Size of table <message>: 200000
Size of table <message_section_map>: 269133

Distribution of messages by sections:

  section  0: 74377 messages
  section  1: 16284 messages
  section 50:  1229 messages
  section 99:   683 messages

old - without RD-Tree support,
new - with RD-Tree

  +----------+---------------+----------------+
  |Search set|OR, time in sec|AND, time in sec|
  |          +-------+-------+--------+-------+
  |          |  old  |  new  |  old   |  new  |
  +----------+-------+-------+--------+-------+
  |         1|  0.625|  0.101|       -|      -|
  +----------+-------+-------+--------+-------+
  |        99|  0.018|  0.017|       -|      -|
  +----------+-------+-------+--------+-------+
  |       1,2|  0.766|  0.133|   0.628|  0.045|
  +----------+-------+-------+--------+-------+
  | 1,2,50,65|  0.794|  0.141|   0.030|  0.006|
  +----------+-------+-------+--------+-------+
@ -1,220 +0,0 @@
|
||||||
|
|
||||||
-- EAN13 - UPC - ISBN (books) - ISMN (music) - ISSN (serials)
|
|
||||||
-------------------------------------------------------------
|
|
||||||
|
|
||||||
Copyright Germán Méndez Bravo (Kronuz), 2004 - 2006
|
|
||||||
This module is released under the same BSD license as the rest of PostgreSQL.
|
|
||||||
|
|
||||||
The information to implement this module was collected through
|
|
||||||
several sites, including:
|
|
||||||
http://www.isbn-international.org/
|
|
||||||
http://www.issn.org/
|
|
||||||
http://www.ismn-international.org/
|
|
||||||
http://www.wikipedia.org/
|
|
||||||
the prefixes used for hyphenation where also compiled from:
|
|
||||||
http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
|
|
||||||
http://www.isbn-international.org/en/identifiers.html
|
|
||||||
http://www.ismn-international.org/ranges.html
|
|
||||||
Care was taken during the creation of the algorithms and they
|
|
||||||
were meticulously verified against the suggested algorithms
|
|
||||||
in the official ISBN, ISMN, ISSN User Manuals.
|
|
||||||
|
|
||||||
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
|
|
||||||
THIS MODULE IS PROVIDED "AS IS" AND WITHOUT ANY WARRANTY
|
|
||||||
OF ANY KIND, EXPRESS OR IMPLIED.
|
|
||||||
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
|
|
||||||
|
|
||||||
-- Content of the Module
|
|
||||||
-------------------------------------------------
|
|
||||||
|
|
||||||
This directory contains definitions for a few PostgreSQL
|
|
||||||
data types, for the following international-standard namespaces:
|
|
||||||
EAN13, UPC, ISBN (books), ISMN (music), and ISSN (serials). This module
|
|
||||||
is inspired by Garrett A. Wollman's isbn_issn code.
|
|
||||||
|
|
||||||
I wanted the database to fully validate numbers and also to use the
|
|
||||||
upcoming ISBN-13 and the EAN13 standards, as well as to have it
|
|
||||||
automatically doing hyphenations for ISBN numbers.
|
|
||||||
|
|
||||||
This new module validates, and automatically adds the correct
|
|
||||||
hyphenations to the numbers. Also, it supports the new ISBN-13
|
|
||||||
numbers to be used starting in January 2007.
|
|
||||||
|
|
||||||
Premises:
|
|
||||||
1. ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers
|
|
||||||
2. EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)
|
|
||||||
3. some ISBN13 numbers can be displayed as ISBN
|
|
||||||
4. some ISMN13 numbers can be displayed as ISMN
|
|
||||||
5. some ISSN13 numbers can be displayed as ISSN
|
|
||||||
6. all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers
|
|
||||||
|
|
||||||
Note: All types are internally represented as 64 bit integers,
|
|
||||||
and internally all are consistently interchangeable.
|
|
||||||
|
|
||||||
We have the following data types:
|
|
||||||
|
|
||||||
+ EAN13 for European Article Numbers.
|
|
||||||
This type will always show the EAN13-display format.
|
|
||||||
Te output function for this is -> ean13_out()
|
|
||||||
|
|
||||||
+ ISBN13 for International Standard Book Numbers to be displayed in
|
|
||||||
the new EAN13-display format.
|
|
||||||
+ ISMN13 for International Standard Music Numbers to be displayed in
|
|
||||||
the new EAN13-display format.
|
|
||||||
+ ISSN13 for International Standard Serial Numbers to be displayed
|
|
||||||
in the new EAN13-display format.
|
|
||||||
These types will always display the long version of the ISxN (EAN13)
|
|
||||||
The output function to do this is -> ean13_out()
|
|
||||||
* The need for these types is just for displaying in different
|
|
||||||
ways the same data:
|
|
||||||
ISBN13 is actually the same as ISBN, ISMN13=ISMN and ISSN13=ISSN.
|
|
||||||
|
|
||||||
+ ISBN for International Standard Book Numbers to be displayed in
|
|
||||||
the current short-display format.
|
|
||||||
+ ISMN for International Standard Music Numbers to be displayed in
|
|
||||||
the current short-display format.
|
|
||||||
+ ISSN for International Standard Serial Numbers to be displayed
|
|
||||||
in the current short-display format.
|
|
||||||
These types will display the short version of the ISxN (ISxN 10)
|
|
||||||
whenever it's possible, and it will show ISxN 13 when it's
|
|
||||||
impossible to show the short version.
|
|
||||||
The output function to do this is -> isn_out()
|
|
||||||
|
|
||||||
+ UPC for Universal Product Codes.
|
|
||||||
UPC numbers are a subset of the EAN13 numbers (they are basically
|
|
||||||
EAN13 without the first '0' digit.)
|
|
||||||
The output function to do this is also -> isn_out()
|
|
||||||

We have the following input functions:

 + To take a string and return an EAN13 -> ean13_in()
 + To take a string and return a valid ISBN or ISBN13 number -> isbn_in()
 + To take a string and return a valid ISMN or ISMN13 number -> ismn_in()
 + To take a string and return a valid ISSN or ISSN13 number -> issn_in()
 + To take a string and return a UPC code -> upc_in()

We are able to cast from:

 + ISBN13 -> EAN13
 + ISMN13 -> EAN13
 + ISSN13 -> EAN13

 + ISBN -> EAN13
 + ISMN -> EAN13
 + ISSN -> EAN13
 + UPC  -> EAN13

 + ISBN <-> ISBN13
 + ISMN <-> ISMN13
 + ISSN <-> ISSN13

We have two operator classes (for btree and for hash) so each data type
can be indexed for faster access.

The C API is implemented as:

extern Datum isn_out(PG_FUNCTION_ARGS);
extern Datum ean13_out(PG_FUNCTION_ARGS);
extern Datum ean13_in(PG_FUNCTION_ARGS);
extern Datum isbn_in(PG_FUNCTION_ARGS);
extern Datum ismn_in(PG_FUNCTION_ARGS);
extern Datum issn_in(PG_FUNCTION_ARGS);
extern Datum upc_in(PG_FUNCTION_ARGS);

On success:

 + isn_out() takes any of our types and returns a string containing
   the shortest possible representation of the number.

 + ean13_out() takes any of our types and returns the EAN13 (long)
   representation of the number.

 + ean13_in() takes a string and returns an EAN13 which, as stated in (2),
   may or may not be any of our types, but certainly is an EAN13 number.
   It succeeds only if the string is a valid EAN13 number; otherwise it
   fails.

 + isbn_in() takes a string and returns an ISBN/ISBN13, but only if the
   string really is an ISBN/ISBN13; otherwise it fails.

 + ismn_in() takes a string and returns an ISMN/ISMN13, but only if the
   string really is an ISMN/ISMN13; otherwise it fails.

 + issn_in() takes a string and returns an ISSN/ISSN13, but only if the
   string really is an ISSN/ISSN13; otherwise it fails.

 + upc_in() takes a string and returns a UPC, but only if the string
   really is a UPC; otherwise it fails.

(On failure, the functions 'ereport' the error.)

-- Testing/Playing Functions
-------------------------------------------------
isn_weak(boolean) - sets the weak input mode.
This function is intended for testing use only!
isn_weak() gets the current status of the weak mode.

"Weak" mode is used to be able to insert "invalid" data into a table.
"Invalid" as in the check digit being wrong, not in digits being missing.

Why would you want to use the weak mode? Well, it could be that you have a
huge collection of ISBN numbers, and so many of them that, for weird reasons,
some have the wrong check digit (perhaps the numbers were scanned from a
printed list and the OCR got them wrong, perhaps the numbers were captured
manually... who knows). Anyway, the point is that you might want to clean up
the mess, but you still want to be able to keep all the numbers in your
database, and maybe use an external tool to access the invalid numbers so
you can verify the information and validate it more easily, e.g. by
selecting all the invalid numbers in the table.

When you insert invalid numbers into a table using the weak mode, the number
will be inserted with the corrected check digit, but it will be flagged with
an exclamation mark ('!') at the end (e.g. 0-11-000322-5!)

You can also force the insertion of invalid numbers even outside the weak
mode, by appending the '!' character at the end of the number.

To work with invalid numbers, you can use two functions:

 + make_valid(), which validates an invalid number (deleting the invalid
   flag), and
 + is_valid(), which checks for the presence of the invalid flag.

-- Examples of Use
-------------------------------------------------
--Using the types directly:
select isbn('978-0-393-04002-9');
select isbn13('0901690546');
select issn('1436-4522');

--Casting types:
-- Note that you can only cast from ean13 to another type when the casted
-- number would be valid in the realm of the target type;
-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
-- but these will:
select upc(ean13('0220356483481'));
select ean13(upc('220356483481'));

--Create a table with a single column to hold ISBN numbers:
create table test ( id isbn );
insert into test values('9780393040029');

--Automatically calculating check digits (observe the '?'):
insert into test values('220500896?');
insert into test values('978055215372?');

select issn('3251231?');
select ismn('979047213542?');
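The '?' placeholder works because the thirteenth EAN digit is fully determined by the first twelve. A sketch of that standard computation (illustrative only; the module implements this in C):

```python
def ean13_check_digit(digits12):
    # weights alternate 1,3,1,3,... over the first twelve digits;
    # the check digit brings the weighted sum up to a multiple of 10
    s = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits12))
    return (10 - s % 10) % 10
```

For example, completing '978039304002?' yields check digit 9, matching the ISBN13 '9780393040029' inserted above.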

--Using the weak mode:
select isn_weak(true);
insert into test values('978-0-11-000533-4');
insert into test values('9780141219307');
insert into test values('2-205-00876-X');
select isn_weak(false);

select id from test where not is_valid(id);
update test set id = make_valid(id) where id = '2-205-00876-X!';

select * from test;

select isbn13(id) from test;

-- Contact
-------------------------------------------------
Please send suggestions or bug reports to kronuz at users.sourceforge.net

Last reviewed on August 23, 2006 by Kronuz.
@ -1,88 +0,0 @@
PostgreSQL type extension for managing Large Objects
----------------------------------------------------

Overview

One of the problems with the JDBC driver (and this affects the ODBC driver
also) is that the specification assumes that references to BLOBs (Binary
Large OBjectS) are stored within a table, and that if such an entry is
changed, the associated BLOB is deleted from the database.

As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.

Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
that are not referenced by anything, and simply occupy disk space.


The Fix

I've fixed this by creating a new data type 'lo', some support functions, and
a trigger which handles the orphaning problem. The trigger essentially just
does a 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!

The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron) the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.

You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.


Install

Ok, first build the shared library and install it. Typing 'make install' in
the contrib/lo directory should do it.

Then, as the postgres superuser, run the lo.sql script in any database that
needs the features. This will install the type and define the support
functions. You can run the script once in template1, and the objects will be
inherited by subsequently-created databases.


How to Use

The easiest way is by an example:

> create table image (title text, raster lo);
> create trigger t_raster before update or delete on image
>     for each row execute procedure lo_manage(raster);

Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.


Issues

* Dropping a table will still orphan any objects it contains, as the trigger
  is not executed.

  Avoid this by preceding the 'drop table' with 'delete from {table}'.

  If you already have, or suspect you have, orphaned large objects, see
  the contrib/vacuumlo module to help you clean them up. It's a good idea
  to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
  trigger.

* Some frontends may create their own tables, and will not create the
  associated trigger(s). Also, users may not remember (or know) to create
  the triggers.

As the ODBC driver needs a permanent lo type (and JDBC could be optimised to
use it if its OID were fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.

I'm releasing this into contrib, just to get it out and tested.

Peter Mount <peter@retep.org.uk> June 13 1998
@ -1,512 +0,0 @@
contrib/ltree module

ltree is a PostgreSQL contrib module which contains an implementation of data
types, indexed access methods and queries for data organized as tree-like
structures.

This module works with PostgreSQL version 7.3.
(A version for 7.2 is available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz)
-------------------------------------------------------------------------------
All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist for
additional information. The authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.
-------------------------------------------------------------------------------

LEGAL NOTICES: This module is released under the BSD license (as is PostgreSQL
itself). This work was done in the framework of the Russian Scientific Network
and partially supported by the Russian Foundation for Basic Research and the
Stack Group.
-------------------------------------------------------------------------------

MOTIVATION

This is a placeholder for an introduction to the problem. We hope people
reading this document don't need it too much :-)

DEFINITIONS

A label of a node is a sequence of one or more words separated by the blank
character '_' and containing letters and digits (for example, [a-zA-Z0-9] for
the C locale). The length of a label is limited to 256 bytes.

Example: 'Countries', 'Personal_Services'

A label path of a node is a sequence of one or more dot-separated labels
l1.l2...ln, representing the path from the root to the node. The length of a
label path is limited to 65Kb, but sizes <= 2Kb are preferable. We consider
this not to be a strict limitation (the maximal size of a label path in the
DMOZ catalogue - http://www.dmoz.org - is about 240 bytes!)

Example: 'Top.Countries.Europe.Russia'

We introduce several datatypes:

ltree
    - a datatype for label paths.

ltree[]
    - a datatype for arrays of ltree.

lquery
    - a path expression with regular-expression-like syntax over the label
    path, used for ltree matching. The star symbol (*) is used to specify
    any number of labels (levels) and can be used at the beginning and the
    end of an lquery, for example, '*.Europe.*'.

    The following quantifiers are recognized for '*' (like in Perl):

    {n}    Match exactly n levels
    {n,}   Match at least n levels
    {n,m}  Match at least n but not more than m levels
    {,m}   Match at most m levels (equivalent to {0,m})

    It is possible to use several modifiers at the end of a label:

    @  Do case-insensitive label matching
    *  Do prefix matching for a label
    %  Don't account for the word separator '_' in label matching, so that
       'Russian%' would match 'Russian_nations', but not 'Russian'

    An lquery can contain the logical '!' (NOT) at the beginning of a label
    and '|' (OR) to specify possible alternatives for label matching.

    Example of an lquery:

        Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
        a)  b)     c)      d)               e)

    A label path should
    + a) begin at a node with label 'Top'
    + b) followed by zero to 2 labels until
    + c) a node whose label begins with the case-insensitive prefix 'sport'
    + d) followed by a node whose label does not match 'football' or
         'tennis', and
    + e) end at a node whose label begins with 'Russ' or strictly matches
         'Spain'.
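For intuition, a bare '*' (without quantifiers or the @ * % ! | modifiers) behaves like a regular expression over whole labels. A deliberately simplified sketch of that idea, not the module's actual matcher:

```python
import re

def lquery_match(lquery, path):
    # Simplified: supports only literal labels and a bare '*'
    # (any number of levels); no quantifiers or modifiers.
    parts = []
    for label in lquery.split('.'):
        if label == '*':
            parts.append(r'(?:[^.]+\.)*')   # zero or more whole labels
        else:
            parts.append(re.escape(label) + r'\.')
    # match against the path with a trailing dot so every label ends in '.'
    return re.match('^' + ''.join(parts) + '$', path + '.') is not None
```

So lquery_match('*.Europe.*', 'Top.Countries.Europe.Russia') holds, while a path without a 'Europe' label does not match.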

ltxtquery
    - a datatype for label searching (like the type 'query' for full text
    searching; see contrib/tsearch). It's possible to use the modifiers
    @, %, * at the end of a word. The meaning of the modifiers is the same
    as for lquery.

    Example: 'Europe & Russia*@ & !Transportation'

    Search for paths containing the words 'Europe' and 'Russia*'
    (case-insensitively) but not 'Transportation'. Notice that the order of
    the words as they appear in the label path is not important!

OPERATIONS

The following operations are defined for type ltree:

<, >, <=, >=, =, <>
    - have their usual meanings. Comparison is done in the order of a direct
    tree traversal; children of a node are sorted lexicographically.
ltree @> ltree
    - returns TRUE if the left argument is an ancestor of the right argument
    (or equal).
ltree <@ ltree
    - returns TRUE if the left argument is a descendant of the right argument
    (or equal).
ltree ~ lquery, lquery ~ ltree
    - returns TRUE if the node represented by ltree satisfies lquery.
ltree ? lquery[], lquery ? ltree[]
    - returns TRUE if the node represented by ltree satisfies at least one
    lquery from the array.
ltree @ ltxtquery, ltxtquery @ ltree
    - returns TRUE if the node represented by ltree satisfies ltxtquery.
ltree || ltree, ltree || text, text || ltree
    - returns the concatenated ltree.

Operations for arrays of ltree (ltree[]):

ltree[] @> ltree, ltree <@ ltree[]
    - returns TRUE if the array ltree[] contains an ancestor of ltree.
ltree @> ltree[], ltree[] <@ ltree
    - returns TRUE if the array ltree[] contains a descendant of ltree.
ltree[] ~ lquery, lquery ~ ltree[]
    - returns TRUE if the array ltree[] contains label paths matching lquery.
ltree[] ? lquery[], lquery[] ? ltree[]
    - returns TRUE if the array ltree[] contains label paths matching at
    least one lquery from the array.
ltree[] @ ltxtquery, ltxtquery @ ltree[]
    - returns TRUE if the array ltree[] contains label paths matching
    ltxtquery (full text search).
ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery
    - returns the first element of the array ltree[] that satisfies the
    corresponding condition, or NULL if there is none.

REMARK

The operations <@, @>, @ and ~ have analogues - ^<@, ^@>, ^@, ^~ - which
don't use indices!

INDICES

Various indices can be created to speed up execution of operations:

* B-tree index over ltree:
    <, <=, =, >=, >
* GiST index over ltree:
    <, <=, =, >=, >, @>, <@, @, ~, ?
    Example:
        create index path_gist_idx on test using gist (path);
* GiST index over ltree[]:
    ltree[] <@ ltree, ltree @> ltree[], @, ~, ?
    Example:
        create index path_gist_idx on test using gist (array_path);
    Notice: This index is lossy.

FUNCTIONS

ltree subltree
    ltree subltree(ltree, start, end)
        returns the subpath of ltree from position start to position end-1
        (counting from 0).

    # select subltree('Top.Child1.Child2',1,2);
    subltree
    --------
    Child1

ltree subpath
    ltree subpath(ltree, OFFSET, LEN)
    ltree subpath(ltree, OFFSET)
        returns the subpath of ltree starting at OFFSET (inclusive) with
        length LEN. If OFFSET is negative, the subpath starts that far from
        the end of the path. If LEN is omitted, returns everything to the
        end of the path. If LEN is negative, leaves that many labels off
        the end of the path.

    # select subpath('Top.Child1.Child2',1,2);
    subpath
    -------
    Child1.Child2

    # select subpath('Top.Child1.Child2',-2,1);
    subpath
    ---------
    Child1
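The OFFSET/LEN rules above mirror ordinary list slicing. A Python sketch of the same semantics (illustrative only; the real function is implemented in C):

```python
def subpath(path, offset, length=None):
    # mirrors ltree's subpath(ltree, OFFSET [, LEN]) semantics
    labels = path.split('.')
    start = offset if offset >= 0 else len(labels) + offset
    if length is None:
        end = len(labels)              # omitted LEN: to the end of the path
    elif length < 0:
        end = len(labels) + length     # negative LEN: drop labels off the end
    else:
        end = start + length
    return '.'.join(labels[start:end])
```

This reproduces both examples above: subpath('Top.Child1.Child2',1,2) gives 'Child1.Child2' and subpath('Top.Child1.Child2',-2,1) gives 'Child1'.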
int4 nlevel

    int4 nlevel(ltree) - returns the level of the node.

    # select nlevel('Top.Child1.Child2');
    nlevel
    --------
    3

    Note that the arguments start, end, OFFSET and LEN refer to levels of
    the node!

int4 index(ltree,ltree), int4 index(ltree,ltree,OFFSET)
    returns the level of the first occurrence of the second argument in the
    first one, beginning from OFFSET. If OFFSET is negative, the search
    begins |OFFSET| levels from the end of the path.

    SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
    index
    -------
    6
    SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
    index
    -------
    9

ltree text2ltree(text), text ltree2text(ltree)
    cast functions between ltree and text.

ltree lca(ltree,ltree,...) (up to 8 arguments)
ltree lca(ltree[])
    Returns the Lowest Common Ancestor (lca).

    # select lca('1.2.2.3','1.2.3.4.5.6');
    lca
    -----
    1.2
    # select lca('{la.2.3,1.2.3.4.5.6}') is null;
    ?column?
    ----------
    f
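lca can be thought of as the longest common label prefix that is a proper ancestor of every argument. A sketch over dotted strings (illustrative only, ignoring the array form and argument-count limit):

```python
def lca(*paths):
    # longest common prefix that is a proper ancestor of every argument
    split = [p.split('.') for p in paths]
    prefix = []
    for labels in zip(*split):
        if len(set(labels)) != 1:
            break
        prefix.append(labels[0])
    if any(len(prefix) == len(s) for s in split):
        prefix = prefix[:-1]           # must be a *proper* ancestor
    return '.'.join(prefix)
```

Note the "proper ancestor" step: lca of a path with itself is the path minus its last label, not the path itself.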

INSTALLATION

    cd contrib/ltree
    make
    make install
    make installcheck

EXAMPLE OF USAGE

    createdb ltreetest
    psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
    psql ltreetest < ltreetest.sql

Now we have a database ltreetest populated with data describing the hierarchy
shown below:

                            TOP
                         /   |   \
                 Science Hobbies Collections
                    /        |         \
            Astronomy  Amateurs_Astronomy Pictures
              /  \                           |
    Astrophysics  Cosmology              Astronomy
                                         /   |   \
                                  Galaxies Stars Astronauts

Inheritance:

ltreetest=# select path from test where path <@ 'Top.Science';
                path
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(4 rows)

Matching:

ltreetest=# select path from test where path ~ '*.Astronomy.*';
                     path
-----------------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
 Top.Collections.Pictures.Astronomy
 Top.Collections.Pictures.Astronomy.Stars
 Top.Collections.Pictures.Astronomy.Galaxies
 Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)

ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
                path
------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(3 rows)

Full text search:

ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
                path
------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
 Top.Hobbies.Amateurs_Astronomy
(4 rows)

ltreetest=# select path from test where path @ 'Astro* & !pictures@';
                path
------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(3 rows)

Using Functions:

ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
                 ?column?
------------------------------------------
 Top.Science.Space.Astronomy
 Top.Science.Space.Astronomy.Astrophysics
 Top.Science.Space.Astronomy.Cosmology
(3 rows)

We could create an SQL function:

CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
    AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
    LANGUAGE SQL IMMUTABLE;

and the previous select could be rewritten as:

ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
                ins_label
------------------------------------------
 Top.Science.Space.Astronomy
 Top.Science.Space.Astronomy.Astrophysics
 Top.Science.Space.Astronomy.Cosmology
(3 rows)

Or with other arguments:

CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
    AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
    LANGUAGE SQL IMMUTABLE;

ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
                ins_label
------------------------------------------
 Top.Science.Space.Astronomy
 Top.Science.Space.Astronomy.Astrophysics
 Top.Science.Space.Astronomy.Cosmology
(3 rows)

ADDITIONAL DATA

To get more of a feel for the ltree module you can download
dmozltree-eng.sql.gz (about a 3Mb tar.gz archive containing 300,274 nodes),
available from http://www.sai.msu.su/~megera/postgres/gist/ltree/
dmozltree-eng.sql.gz, which is the DMOZ catalogue prepared for use with
ltree. Set up your test database (dmoz), load the ltree module and issue
the command:

    zcat dmozltree-eng.sql.gz | psql dmoz

The data will be loaded into the database dmoz and all indices will be
created.

BENCHMARKS

All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using the
DMOZ data, containing 300,274 nodes (see above for the download link). We used
some basic queries typical for walking through a catalog.

QUERIES

* Q0: Count all rows (a sort of base time for comparison)
    select count(*) from dmoz;
     count
    --------
     300274
    (1 row)
* Q1: Get direct children (without inheritance)
    select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
                  path
    -----------------------------------
     Top.Adult.Arts.Animation.Cartoons
     Top.Adult.Arts.Animation.Anime
    (2 rows)
* Q2: The same as Q1, but also counting successors
    select path as parentpath, (select count(*)-1 from dmoz where path <@
    p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
                parentpath             | count
    -----------------------------------+-------
     Top.Adult.Arts.Animation.Cartoons |     2
     Top.Adult.Arts.Animation.Anime    |    61
    (2 rows)
* Q3: Get all parents
    select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
    path asc;
               path
    --------------------------
     Top
     Top.Adult
     Top.Adult.Arts
     Top.Adult.Arts.Animation
    (4 rows)
* Q4: Get all parents, counting children
    select path, (select count(*)-1 from dmoz where path <@ p.path) as count
    from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
               path           | count
    --------------------------+--------
     Top                      | 300273
     Top.Adult                |   4913
     Top.Adult.Arts           |    339
     Top.Adult.Arts.Animation |     65
    (4 rows)
* Q5: Get all children with levels
    select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
    from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
                         path                       | level
    ------------------------------------------------+-------
     Top.Adult.Arts.Animation.Anime                 |     1
     Top.Adult.Arts.Animation.Anime.Fan_Works       |     2
     Top.Adult.Arts.Animation.Anime.Games           |     2
     Top.Adult.Arts.Animation.Anime.Genres          |     2
     Top.Adult.Arts.Animation.Anime.Image_Galleries |     2
     Top.Adult.Arts.Animation.Anime.Multimedia      |     2
     Top.Adult.Arts.Animation.Anime.Resources       |     2
     Top.Adult.Arts.Animation.Anime.Titles          |     2
     Top.Adult.Arts.Animation.Cartoons              |     1
     Top.Adult.Arts.Animation.Cartoons.AVS          |     2
     Top.Adult.Arts.Animation.Cartoons.Members      |     2
    (11 rows)

Timings

    +---------------------------------------------+
    |Query|Rows|Time (ms) index|Time (ms) no index|
    |-----+----+---------------+------------------|
    |   Q0|   1|             NA|           1453.44|
    |   Q1|   2|           0.49|           1001.54|
    |   Q2|   2|           1.48|           3009.39|
    |   Q3|   4|           0.55|            906.98|
    |   Q4|   4|       24385.07|           4951.91|
    |   Q5|  11|           0.85|           1003.23|
    +---------------------------------------------+

Timings without indices were obtained using the operations which don't use
indices (see above).

Remarks

We didn't run full-scale tests; also, we haven't (yet) presented data for
operations with arrays of ltree (ltree[]) and full text searching. We'd
appreciate your input. So far, some (rather obvious) results:

* Indices do help execution of queries.
* Q4 performs badly because one needs to read almost all data from the HDD.

CHANGES

Mar 28, 2003
    Added functions index(ltree,ltree,offset), text2ltree(text),
    ltree2text(ltree)
Feb 7, 2003
    Added the ? operation
    Fixed a ~ operation bug: e.g. '1.1.1' ~ '*.1'
    Optimized index storage
Aug 9, 2002
    Fixed a very stupid but important bug :-)
July 31, 2002
    Now works on 64-bit platforms.
    Added the function lca - lowest common ancestor
    The version for 7.2 is distributed as a separate package -
    http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz
July 13, 2002
    Initial release.

TODO

* Testing on 64-bit platforms. There are several known problems with byte
  alignment; -- RESOLVED
* Better documentation;
* We plan (probably) to improve regular expression processing using
  non-deterministic automata;
* Some sort of XML support;
* Better full text searching;

SOME BACKGROUND

The approach we use for ltree is much like the one we used in our other
GiST-based contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist).
The theoretical background is available in the papers referenced from our
GiST development page (http://www.sai.msu.su/~megera/postgres/gist).

A hierarchical data structure (tree) is a set of nodes. Each node has a
signature (LPS) of a fixed size, which is a hashed label path of that node.
Traversing a tree, we can *certainly* prune branches if

    LQS (bitwise AND) LPS != LQS

where LQS is the signature of the lquery or ltxtquery, obtained in the same
way as the LPS.
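The pruning test can be sketched with a toy byte-sum hash standing in for the module's real hash function (SIG_BITS and all names here are illustrative, not the actual implementation):

```python
SIG_BITS = 64   # illustrative signature width, not the module's actual size

def label_bit(label):
    # toy deterministic hash mapping a label to one bit of the signature
    return 1 << (sum(label.encode()) % SIG_BITS)

def signature(labels):
    # LPS: the OR of the bits of all labels in the path
    s = 0
    for label in labels:
        s |= label_bit(label)
    return s

def may_match(lqs, lps):
    # a branch can be pruned with certainty when LQS & LPS != LQS
    return (lqs & lps) == lqs
```

A query label whose bit is absent from a branch's LPS cannot occur anywhere in that branch, so the branch is skipped; the converse is not guaranteed (hash collisions), which is why the index is lossy and matches are rechecked.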
ltree[]:
For an array of ltree, the LPS is the bitwise OR of the signatures of *ALL*
children reachable from that node. Signatures are stored in an RD-tree,
implemented using GiST, which provides indexed access.

ltree:
For ltree we store the LPS in a B-tree, implemented using GiST. Each node
entry is represented by (left_bound, signature, right_bound), so that we can
speed up the operations <, <=, =, >=, > using left_bound and right_bound, and
prune branches of the tree using the signature.
-------------------------------------------------------------------------------
We ask people who find the module useful to send us postcards to:
Moscow, 119899, Universitetski pr.13, Moscow State University, Sternberg
Astronomical Institute, Russia
For: Bartunov O.S.
and
Moscow, Bratislavskaya str.23, appt. 18, Russia
For: Sigaev F.G.
@ -1,94 +0,0 @@

The functions in this module allow you to inspect the contents of data pages
at a low level, for debugging purposes. All of these functions may be used
only by superusers.

1. Installation

    $ make
    $ make install
    $ psql -e -f /usr/local/pgsql/share/contrib/pageinspect.sql test

2. Functions included:

get_raw_page
------------
get_raw_page reads one block of the named table and returns a copy as a
bytea field. This allows a single time-consistent copy of the block to be
made.

page_header
-----------
page_header shows fields which are common to all PostgreSQL heap and index
pages.

A page image obtained with get_raw_page should be passed as argument:

regression=# SELECT * FROM page_header(get_raw_page('pg_class',0));
    lsn    | tli | flags | lower | upper | special | pagesize | version | prune_xid
-----------+-----+-------+-------+-------+---------+----------+---------+-----------
 0/24A1B50 |   1 |     1 |   232 |   368 |    8192 |     8192 |       4 |         0
(1 row)

The returned columns correspond to the fields in the PageHeaderData struct.
See src/include/storage/bufpage.h for details.
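For illustration, the fixed-size header can also be unpacked from a raw page image outside the server. This is only a sketch, assuming a little-endian machine and the 8.3-era field layout (two uint32 halves of the LSN, six uint16 fields, then a uint32 prune_xid); it is not a substitute for page_header():

```python
import struct

def parse_page_header(page: bytes):
    """Unpack the fixed-size PageHeaderData fields from a raw page image."""
    xlogid, xrecoff, tli, flags, lower, upper, special, psv, prune_xid = \
        struct.unpack_from("<IIHHHHHHI", page, 0)
    return {
        "lsn": f"{xlogid:X}/{xrecoff:X}",
        "tli": tli, "flags": flags,
        "lower": lower, "upper": upper, "special": special,
        # page size lives in the high bits, layout version in the low byte
        "pagesize": psv & 0xFF00, "version": psv & 0x00FF,
        "prune_xid": prune_xid,
    }

# Synthetic header matching the sample output above
hdr = struct.pack("<IIHHHHHHI", 0, 0x24A1B50, 1, 1, 232, 368, 8192, 8192 | 4, 0)
print(parse_page_header(hdr))
```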

heap_page_items
---------------
heap_page_items shows all line pointers on a heap page. For those line
pointers that are in use, tuple headers are also shown. All tuples are
shown, whether or not the tuples were visible to an MVCC snapshot at the
time the raw page was copied.

A heap page image obtained with get_raw_page should be passed as argument:

test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));

See src/include/storage/itemid.h and src/include/access/htup.h for
explanations of the fields returned.

bt_metap
--------
bt_metap() returns information about a btree index's metapage:

test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic     | 340322
version   | 2
root      | 1
level     | 0
fastroot  | 1
fastlevel | 0

bt_page_stats
-------------
bt_page_stats() shows information about single btree pages:

test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno         | 1
type          | l
live_items    | 256
dead_items    | 0
avg_item_size | 12
page_size     | 8192
free_size     | 4056
btpo_prev     | 0
btpo_next     | 0
btpo          | 0
btpo_flags    | 3

bt_page_items
-------------
bt_page_items() returns information about specific items on btree pages:

test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
 itemoffset | ctid  | itemlen | nulls | vars |    data
------------+-------+---------+-------+------+-------------
          1 | (0,1) |      12 | f     | f    | 23 27 00 00
          2 | (0,2) |      12 | f     | f    | 24 27 00 00
          3 | (0,3) |      12 | f     | f    | 25 27 00 00
          4 | (0,4) |      12 | f     | f    | 26 27 00 00
          5 | (0,5) |      12 | f     | f    | 27 27 00 00
          6 | (0,6) |      12 | f     | f    | 28 27 00 00
          7 | (0,7) |      12 | f     | f    | 29 27 00 00
          8 | (0,8) |      12 | f     | f    | 2a 27 00 00
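The 'data' column is the raw index key, shown byte by byte; for a single-column OID index such as pg_cast_oid_index it is a 4-byte OID. A sketch of decoding it, assuming a little-endian machine (the helper name is illustrative):

```python
import struct

def decode_oid(data_hex):
    """Interpret bt_page_items' hex 'data' column as a little-endian uint32."""
    return struct.unpack("<I", bytes.fromhex(data_hex.replace(" ", "")))[0]

print(decode_oid("23 27 00 00"))  # 10019; the rows above hold consecutive OIDs
```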
@ -1,173 +0,0 @@

Pg_freespacemap - Real time queries on the free space map (FSM)
---------------------------------------------------------------

This module consists of two C functions, 'pg_freespacemap_relations()' and
'pg_freespacemap_pages()', that return a set of records, plus two views,
'pg_freespacemap_relations' and 'pg_freespacemap_pages', for more
user-friendly access to the functions.

The module provides the ability to examine the contents of the free space
map, without having to restart or rebuild the server with additional
debugging code.

By default public access is REVOKED from the functions and views, just in
case there are security issues present in the code.


Installation
------------

Build and install the main PostgreSQL source, then this contrib module:

  $ cd contrib/pg_freespacemap
  $ gmake
  $ gmake install


To register the functions and views:

  $ psql -d <database> -f pg_freespacemap.sql


Notes
-----

The definitions of the columns exposed in the views are:

pg_freespacemap_relations

       Column      |      references      |           Description
  -----------------+----------------------+----------------------------------
  reltablespace    | pg_tablespace.oid    | Tablespace oid of the relation.
  reldatabase      | pg_database.oid      | Database oid of the relation.
  relfilenode      | pg_class.relfilenode | Relfilenode of the relation.
  avgrequest       |                      | Moving average of free space
                   |                      | requests (NULL for indexes).
  interestingpages |                      | Count of pages last reported as
                   |                      | containing useful free space.
  storedpages      |                      | Count of pages actually stored
                   |                      | in the free space map.
  nextpage         |                      | Page index (from 0) to start the
                   |                      | next search at.


pg_freespacemap_pages

      Column     |      references      |            Description
  ---------------+----------------------+------------------------------------
  reltablespace  | pg_tablespace.oid    | Tablespace oid of the relation.
  reldatabase    | pg_database.oid      | Database oid of the relation.
  relfilenode    | pg_class.relfilenode | Relfilenode of the relation.
  relblocknumber |                      | Page number in the relation.
  bytes          |                      | Free bytes in the page, or NULL
                 |                      | for an index page (see below).

For pg_freespacemap_relations, there is one row for each relation in the free
space map. storedpages is the number of pages actually stored in the map,
while interestingpages is the number of pages the last VACUUM thought had
useful amounts of free space.

If storedpages is consistently less than interestingpages, then it would be a
good idea to increase max_fsm_pages. Also, if the number of rows in
pg_freespacemap_relations is close to max_fsm_relations, then you should
consider increasing max_fsm_relations.

For pg_freespacemap_pages, there is one row for each page in the free space
map. The number of rows for a relation will match the storedpages column
in pg_freespacemap_relations.

For indexes, what is tracked is entirely-unused pages, rather than free
space within pages. Therefore, the average request size and free bytes
within a page are not meaningful, and are shown as NULL.

Because the map is shared by all the databases, it will include relations
not belonging to the current database.

When either of the views is accessed, internal free space map locks are
taken, and a copy of the map data is made for them to display.
This ensures that the views produce a consistent set of results, while not
blocking normal activity longer than necessary. Nonetheless there
could be some impact on database performance if they are read often.


Sample output - pg_freespacemap_relations
-----------------------------------------

regression=# \d pg_freespacemap_relations
View "public.pg_freespacemap_relations"
      Column      |  Type   | Modifiers
------------------+---------+-----------
 reltablespace    | oid     |
 reldatabase      | oid     |
 relfilenode      | oid     |
 avgrequest       | integer |
 interestingpages | integer |
 storedpages      | integer |
 nextpage         | integer |
View definition:
 SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
   FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);

regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
             FROM pg_freespacemap_relations r INNER JOIN pg_class c
             ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
             ON r.reldatabase = d.oid AND (d.datname = current_database())
             ORDER BY r.storedpages DESC LIMIT 10;
             relname             | avgrequest | interestingpages | storedpages
---------------------------------+------------+------------------+-------------
 onek                            |        256 |              109 |         109
 pg_attribute                    |        167 |               93 |          93
 pg_class                        |        191 |               49 |          49
 pg_attribute_relid_attnam_index |            |               48 |          48
 onek2                           |        256 |               37 |          37
 pg_depend                       |         95 |               26 |          26
 pg_type                         |        199 |               16 |          16
 pg_rewrite                      |       1011 |               13 |          13
 pg_class_relname_nsp_index      |            |               10 |          10
 pg_proc                         |        302 |                8 |           8
(10 rows)


Sample output - pg_freespacemap_pages
-------------------------------------

regression=# \d pg_freespacemap_pages
View "public.pg_freespacemap_pages"
     Column     |  Type   | Modifiers
----------------+---------+-----------
 reltablespace  | oid     |
 reldatabase    | oid     |
 relfilenode    | oid     |
 relblocknumber | bigint  |
 bytes          | integer |
View definition:
 SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
   FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);

regression=# SELECT c.relname, p.relblocknumber, p.bytes
             FROM pg_freespacemap_pages p INNER JOIN pg_class c
             ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
             ON (p.reldatabase = d.oid AND d.datname = current_database())
             ORDER BY c.relname LIMIT 10;
   relname    | relblocknumber | bytes
--------------+----------------+-------
 a_star       |              0 |  8040
 abstime_tbl  |              0 |  7908
 aggtest      |              0 |  8008
 altinhoid    |              0 |  8128
 altstartwith |              0 |  8128
 arrtest      |              0 |  7172
 b_star       |              0 |  7976
 box_tbl      |              0 |  7912
 bt_f8_heap   |             54 |  7728
 bt_i4_heap   |             49 |  8008
(10 rows)


Author
------

 * Mark Kirkwood <markir@paradise.net.nz>

@ -1,206 +0,0 @@

pg_standby README			2006/12/08 Simon Riggs

o What is pg_standby?

  pg_standby allows the creation of a Warm Standby server.
  It is designed to be a production-ready program, as well as a
  customisable template should you require specific modifications.
  Other configuration is required as well, all of which is
  described in the main server manual.

  The program is designed to be a waiting restore_command,
  which is what is required to turn a normal archive recovery into
  a Warm Standby. Within the restore_command of the recovery.conf
  file you could configure pg_standby in the following way:

	restore_command = 'pg_standby archiveDir %f %p %r'

  which would be sufficient to define that files will be restored
  from archiveDir.

o features of pg_standby

  - pg_standby is written in C, so it is very portable
    and easy to install.

  - supports copy or link from a directory (only)

  - the source is easy to modify, with specifically designated
    sections to modify for your own needs, allowing
    interfaces to be written for additional Backup Archive Restore
    (BAR) systems

  - portable: tested on Linux and Windows

o How to install pg_standby

  $ make
  $ make install

o How to use pg_standby?

  pg_standby should be used within the restore_command of the
  recovery.conf file. See the main PostgreSQL manual for details.

  The basic usage should be like this:

	restore_command = 'pg_standby archiveDir %f %p %r'

  with the pg_standby command usage as

	pg_standby [OPTION]... ARCHIVELOCATION NEXTWALFILE XLOGFILEPATH [RESTARTWALFILE]

  When used within the restore_command, the %f and %p macros
  will provide the actual file and path required for the restore/recovery.

  pg_standby assumes that ARCHIVELOCATION is a directory accessible by the
  server-owning user.

  If RESTARTWALFILE is specified, typically by using the %r macro, then all
  files prior to this file will be removed from ARCHIVELOCATION. This
  minimises the number of files that need to be held, whilst at the same
  time maintaining restart capability. This capability additionally assumes
  that the ARCHIVELOCATION directory is writable.

o options

  pg_standby allows the following command line switches

  -c
	use copy/cp command to restore WAL files from archive

  -d
	debug/logging option

  -k numfiles
	Clean up files in the archive so that we maintain no more
	than this many files in the archive. This parameter will
	be silently ignored if RESTARTWALFILE is specified, since
	that specification method is more accurate in determining
	the correct cut-off point in the archive.

	You should be wary of setting this number too low,
	since this may mean you cannot restart the standby. This
	is because the last restartpoint marked in the WAL files
	may be many files in the past and can vary considerably.
	This should be set to a value exceeding the number of WAL
	files that can be recovered in 2*checkpoint_timeout seconds,
	according to the value in the warm standby postgresql.conf.
	It is wholly unrelated to the setting of checkpoint_segments
	on either primary or standby.

	Setting numfiles to zero will disable deletion of files
	from ARCHIVELOCATION.

	If in doubt, use a large value or do not set a value at all.

	If you specify neither RESTARTWALFILE nor -k, then -k 0
	will be assumed, i.e. keep all files in the archive.
	Default=0, Min=0

  -l
	use ln command to restore WAL files from archive;
	WAL files will remain in the archive.

	Link is more efficient, but the default is copy, to
	allow you to maintain the WAL archive for recovery
	purposes as well as high-availability.
	The default setting is not necessarily recommended;
	consult the main database server manual for discussion.

	This option uses the Windows Vista command mklink
	to provide a file-to-file symbolic link. -l will
	not work on versions of Windows prior to Vista.
	Use the -c option instead.
	See http://en.wikipedia.org/wiki/NTFS_symbolic_link

  -r maxretries
	the maximum number of times to retry the restore command if it
	fails. After each failure, we wait for sleeptime * num_retries,
	so that the wait time increases progressively; by default
	we will wait 5 secs, 10 secs, then 15 secs before reporting
	the failure back to the database server. This will be
	interpreted as an end of recovery and the Standby will come
	up fully as a result.
	Default=3, Min=0
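The progressive waits can be worked out directly; a small sketch of the arithmetic (the helper name is illustrative, not part of pg_standby):

```python
def retry_waits(sleeptime=5, maxretries=3):
    """Seconds waited before each retry: sleeptime * retry number."""
    return [sleeptime * n for n in range(1, maxretries + 1)]

print(retry_waits())  # [5, 10, 15] with the defaults, 30 seconds in total
```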

  -s sleeptime
	the number of seconds to sleep between tests to see
	whether the file to be restored is available in the archive yet.
	The default setting is not necessarily recommended;
	consult the main database server manual for discussion.
	Default=5, Min=1, Max=60

  -t triggerfile
	the presence of the triggerfile will cause recovery to end,
	whether or not the next file is available.
	It is recommended that you use a structured filename to
	avoid confusion as to which server is being triggered
	when multiple servers exist on the same system,
	e.g. /tmp/pgsql.trigger.5432

  -w maxwaittime
	the maximum number of seconds to wait for the next file,
	after which recovery will end and the Standby will come up.
	A setting of zero means wait forever.
	The default setting is not necessarily recommended;
	consult the main database server manual for discussion.
	Default=0, Min=0

  Note: --help is not supported, since pg_standby is not intended
  for interactive use except during dev/test.

o examples

  Linux

	archive_command = 'cp %p ../archive/%f'

	restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'

  which will
	- use an ln command to restore WAL files from archive
	- produce logfile output in standby.log
	- keep the last 255 full WAL files, plus the current one
	- sleep for 2 seconds between checks for the next WAL file
	- never timeout if the file is not found
	- stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears

  Windows

	archive_command = 'copy %p ..\\archive\\%f'

  Note that backslashes need to be doubled in the archive_command, but
  *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.

	restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442 ..\archive %f %p 2>> standby.log'

  which will
	- use a copy command to restore WAL files from archive
	- produce logfile output in standby.log
	- sleep for 5 seconds between checks for the next WAL file
	- never timeout if the file is not found
	- stop waiting when a trigger file called C:\pgsql.trigger.5442 appears

o supported versions

  pg_standby is designed to work with PostgreSQL 8.2 and later. It is
  currently compatible across the minor differences between the way 8.3
  and 8.2 operate.

  PostgreSQL 8.3 provides the %r command line substitution, designed to
  let pg_standby know the last file it needs to keep. If the last
  parameter is omitted, no error is generated, allowing pg_standby to
  function correctly with PostgreSQL 8.2 as well. With PostgreSQL 8.2,
  the -k option must be used if archive cleanup is required. This option
  remains available in 8.3.

o reported test success

  SUSE Linux 10.2
  Windows XP Pro

o additional design notes

  The use of a move command seems like it would be a good idea, but
  this would prevent recovery from being restartable. Also, the last WAL
  file is always requested twice from the archive.

@ -1,144 +0,0 @@

trgm - Trigram matching for PostgreSQL
--------------------------------------

Introduction

This module is sponsored by Delta-Soft Ltd., Moscow, Russia.

The pg_trgm contrib module provides functions and index classes
for determining the similarity of text based on trigram
matching.

Definitions

Trigram (or Trigraph)

	A trigram is a group of three consecutive characters taken
	from a string. A string is considered to have two spaces
	prefixed and one space suffixed when determining the set
	of trigrams that comprise the string.

	eg. The set of trigrams in the word "cat" is "  c", " ca",
	"at " and "cat".

Public Functions

real similarity(text, text)

	Returns a number that indicates how closely the two
	arguments match. A result of zero indicates that the two
	words are completely dissimilar, and a result of one indicates
	that the two words are identical.
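As a sketch of the idea (not pg_trgm's actual C implementation, which also handles case folding and word splitting in its own way), trigram extraction with the padding rule above, plus one natural set-overlap similarity measure, can be written as:

```python
def trigrams(word):
    """Trigrams of a single word, padded with two leading spaces and one
    trailing space, as described above."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Shared trigrams over the union of trigrams, in [0, 1].
    (Illustrative; pg_trgm's exact formula may differ.)"""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(sorted(trigrams("cat")))     # ['  c', ' ca', 'at ', 'cat']
print(similarity("word", "word"))  # 1.0
```

Identical words share every trigram (similarity 1), while words with no trigrams in common score 0, matching the endpoints described above.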

real show_limit()

	Returns the current similarity threshold used by the '%'
	operator. This in effect sets the minimum similarity between
	two words in order for them to be considered similar enough to
	be misspellings of each other, for example.

real set_limit(real)

	Sets the current similarity threshold that is used by the '%'
	operator, and is returned by the show_limit() function.

text[] show_trgm(text)

	Returns an array of all the trigrams of the supplied text
	parameter.

Public Operators

text % text (returns boolean)

	The '%' operator returns TRUE if its two arguments have a similarity
	that is greater than the similarity threshold set by set_limit(). It
	will return FALSE if the similarity is less than the current
	threshold.

Public Index Operator Classes

gist_trgm_ops

	The pg_trgm module comes with an index operator class that allows a
	developer to create an index over a text column for the purpose
	of very fast similarity searches.

	To use this index, the '%' operator must be used and an appropriate
	similarity threshold for the application must be set.

	eg.

	CREATE TABLE test_trgm (t text);
	CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);

	At this point, you will have an index on the t text column that you
	can use for similarity searching.

	eg.

	SELECT
		t,
		similarity(t, 'word') AS sml
	FROM
		test_trgm
	WHERE
		t % 'word'
	ORDER BY
		sml DESC, t;

	This will return all values in the text column that are sufficiently
	similar to 'word', sorted from best match to worst. The index will
	be used to make this a fast operation over very large data sets.

Tsearch2 Integration

	Trigram matching is a very useful tool when used in conjunction
	with a text index created by the Tsearch2 contrib module. (See
	contrib/tsearch2)

	The first step is to generate an auxiliary table containing all
	the unique words in the Tsearch2 index:

	CREATE TABLE words AS SELECT word FROM
		stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');

	where 'documents' is a table that has a text field 'bodytext'
	that Tsearch2 is used to search. The 'simple' dictionary is used
	with the to_tsvector function, instead of just using the already
	existing vector, to avoid creating a list of already-stemmed
	words. This way, only the original, unstemmed words are added
	to the word list.

	Next, create a trigram index on the word column:

	CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
	or
	CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);

	Now, a SELECT query similar to the example above can be used to
	suggest spellings for misspelled words in user search terms. A
	useful extra clause is to ensure that the similar words are also
	of similar length to the misspelled word.

	Note: Since the 'words' table has been generated as a separate,
	static table, it will need to be periodically regenerated so that
	it remains up to date with the word list in the Tsearch2 index.

Authors

	Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
	Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd., Russia

Contributors

	Christopher Kings-Lynne wrote this README file

References

	Tsearch2 Development Site
	http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

	GiST Development Site
	http://www.sai.msu.su/~megera/postgres/gist/

@ -1,284 +0,0 @@

$PostgreSQL: pgsql/contrib/pgbench/README.pgbench,v 1.20 2007/07/06 20:17:02 wieck Exp $

pgbench README

o What is pgbench?

  pgbench is a simple program to run a benchmark test. pgbench is a
  client application of PostgreSQL and runs with PostgreSQL only. It
  performs lots of small and simple transactions including
  SELECT/UPDATE/INSERT operations, then calculates the number of
  transactions successfully completed within a second (transactions
  per second, tps). The test data includes a table with at least 100k
  tuples.

  Example output from pgbench looks like:

	number of clients: 4
	number of transactions per client: 100
	number of processed transactions: 400/400
	tps = 19.875015 (including connections establishing)
	tps = 20.098827 (excluding connections establishing)

  A similar program called "JDBCBench" already exists, but it requires
  Java, which may not be available on every platform. Moreover, some
  people are concerned that the overhead of Java might lead to
  inaccurate results. So I decided to write it in pure C, and named
  it "pgbench."

o features of pgbench

  - pgbench is written in C using libpq only, so it is very portable
    and easy to install.

  - pgbench can simulate concurrent connections using the asynchronous
    capability of libpq. No threading is required.

o How to install pgbench

  $ make
  $ make install

o How to use pgbench?

  (1) (optional) Initialize the database by:

	pgbench -i <dbname>

      where <dbname> is the name of the database. pgbench uses four
      tables: accounts, branches, history and tellers. These tables will
      be destroyed. Be very careful if you have tables with the same
      names. The default test data contains:

	table		# of tuples
	-------------------------
	branches	1
	tellers		10
	accounts	100000
	history		0

      You can increase the number of tuples by using the -s option. The
      branches, tellers and accounts tables are created with a fillfactor
      which is set using the -F option. See below.
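The row counts scale linearly with -s; a quick sketch of what -i creates at a given scaling factor (the function name is illustrative, not part of pgbench):

```python
def rows_at_scale(scale=1):
    """Rows created by pgbench -i at a given -s scaling factor."""
    return {"branches": 1 * scale, "tellers": 10 * scale,
            "accounts": 100000 * scale, "history": 0}

print(rows_at_scale(100)["accounts"])  # 10000000 rows, i.e. 10M
```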

  (2) Run the benchmark test

	pgbench <dbname>

      The default configuration is:

	number of clients: 1
	number of transactions per client: 10

o options

  pgbench has a number of options.

  -h hostname
	hostname where the backend is running. If this option
	is omitted, pgbench will connect to the localhost via
	a Unix domain socket.

  -p port
	the port number that the backend is listening on. The default
	is libpq's default, usually 5432.

  -c number_of_clients
	Number of clients simulated. The default is 1.

  -t number_of_transactions
	Number of transactions each client runs. The default is 10.

  -s scaling_factor
	this should be used with the -i (initialize) option.
	The number of tuples generated will be a multiple of the
	scaling factor. For example, -s 100 will imply 10M
	(10,000,000) tuples in the accounts table.
	The default is 1. NOTE: the scaling factor should be at least
	as large as the largest number of clients you intend
	to test; else you'll mostly be measuring update contention.
	Regular (not initializing) runs using one of the
	built-in tests will detect the scale based on the number of
	branches in the database. For custom (-f) runs it can
	be manually specified with this parameter.

  -D varname=value
	Define a variable. It can be referred to by a script
	provided using the -f option. Multiple -D options are allowed.

  -U login
	Specify the db user's login name if it is different from
	the Unix login name.

  -P password
	Specify the db password. CAUTION: using this option
	might be a security hole, since the ps command will
	show the password. Use this for TESTING PURPOSES ONLY.

  -n
	No vacuuming and cleaning of the history table is performed
	prior to the test.

  -v
	Do vacuuming before testing. This will take some time.
	With neither -n nor -v, pgbench will vacuum the tellers and
	branches tables only.

  -S
	Perform select-only transactions instead of TPC-B.

  -N
	Do not update "branches" and "tellers". This will
	avoid heavy update contention on branches and tellers,
	though pgbench will then no longer be performing TPC-B-like
	transactions.

  -f filename
	Read the transaction script from a file. A detailed
	explanation appears later.

  -C
	Establish a connection for each transaction, rather than
	doing it just once at the beginning of pgbench in the normal
	mode. This is useful to measure the connection overhead.

  -l
	Write the time taken by each transaction to a logfile,
	with the name "pgbench_log.xxx", where xxx is the PID
	of the pgbench process. The format of the log is:

	client_id transaction_no time file_no time-epoch time-us

	where time is measured in microseconds, file_no is
	which test file was used (useful when multiple were
	specified with -f), and time-epoch/time-us are a
	UNIX epoch format timestamp followed by an offset
	in microseconds (suitable for creating an ISO 8601
	timestamp with a fraction of a second) of when
	the transaction completed.

	Here are example outputs:

	0 199 2241 0 1175850568 995598
	0 200 2465 0 1175850568 998079
	0 201 2513 0 1175850569 608
	0 202 2038 0 1175850569 2663
|
|
||||||
|
|
||||||
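The per-transaction log lines are easy to post-process. A minimal sketch (not part of pgbench) that averages the third column, the per-transaction time in microseconds, over the example lines above:

```python
# Average the "time" column (3rd field, microseconds) of a pgbench -l log.
# The sample lines are the example outputs shown above.
sample_log = """\
0 199 2241 0 1175850568 995598
0 200 2465 0 1175850568 998079
0 201 2513 0 1175850569 608
0 202 2038 0 1175850569 2663
"""

times = [int(line.split()[2]) for line in sample_log.splitlines()]
mean_us = sum(times) / len(times)
print(mean_us)  # 2314.25
```

The same one-liner works on a real "pgbench_log.xxx" file read from disk.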
-F fillfactor

    Create the tables (accounts, tellers and branches) with the given
    fillfactor.  Default is 100.  This should be used with the -i
    (initialize) option.

-d

    Debug option.

o What is the "transaction" actually performed in pgbench?

(1) begin;

(2) update accounts set abalance = abalance + :delta where aid = :aid;

(3) select abalance from accounts where aid = :aid;

(4) update tellers set tbalance = tbalance + :delta where tid = :tid;

(5) update branches set bbalance = bbalance + :delta where bid = :bid;

(6) insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);

(7) end;

If you specify -N, (4) and (5) are not included in the transaction.

o -f option

This option supports reading a transaction script from a specified
file.  The file should contain one SQL command per line; SQL commands
spanning multiple lines are not supported.  Empty lines and lines
beginning with "--" are ignored.

Multiple -f options are allowed.  In this case each transaction is
assigned a randomly chosen script.

SQL commands can include "meta commands", which begin with "\"
(backslash).  A meta command takes arguments separated by white
space.  Currently the following meta commands are supported:

\set name operand1 [ operator operand2 ]

    Sets the value calculated from "operand1" "operator"
    "operand2" into variable "name".  If "operator" and "operand2"
    are omitted, the value of operand1 is assigned to variable "name".

    example:

        \set ntellers 10 * :scale

\setrandom name min max

    Assigns a random integer between min and max to name.

    example:

        \setrandom aid 1 100000

Variables can be referred to in SQL commands by adding ":" in front
of the variable name.

    example:

        SELECT abalance FROM accounts WHERE aid = :aid

Variables can also be defined by using the -D option.

\sleep num [us|ms|s]

    Causes script execution to sleep for the specified duration of
    microseconds (us), milliseconds (ms) or the default seconds (s).

    example:

        \setrandom millisec 1000 2500
        \sleep :millisec ms

For example, a TPC-B-like benchmark can be defined as follows (scaling
factor = 1):

\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta 1 10000
BEGIN
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
SELECT abalance FROM accounts WHERE aid = :aid
UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
END

If you want to automatically set the scaling factor from the number of
tuples in the branches table, use the -s option with a shell command
like this:

    pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql

Notice that the -f option does not vacuum or clear the history
table before starting the benchmark.

o License?

Basically it is the same as the BSD license.  See pgbench.c for more
details.

o History before being contributed to PostgreSQL

2000/1/15 pgbench-1.2 contributed to PostgreSQL
 * Add -v option

1999/09/29 pgbench-1.1 released
 * Apply cygwin patches contributed by Yutaka Tanida
 * More robust when backends die
 * Add -S option (select only)

1999/09/04 pgbench-1.0 released

@ -1,709 +0,0 @@
pgcrypto - cryptographic functions for PostgreSQL
=================================================
Marko Kreen <markokr@gmail.com>

// Note: this document is in asciidoc format.


1. Installation
-----------------

Run the following commands:

    make
    make install
    make installcheck

The `make installcheck` command is important.  It runs regression tests
for the module.  They make sure the functions here produce correct
results.

Next, to put the functions into a particular database, run the commands
in the file pgcrypto.sql, which has been installed into the shared
files directory.

Example using psql:

    psql -d DBNAME -f pgcrypto.sql

2. Notes
----------

2.1. Configuration
~~~~~~~~~~~~~~~~~~~~

pgcrypto configures itself according to the findings of the main
PostgreSQL `configure` script.  The options that affect it are
`--with-zlib` and `--with-openssl`.

When compiled with zlib, the PGP encryption functions are able to
compress data before encrypting.

When compiled with OpenSSL, more algorithms will be available.
Also, public-key encryption functions will be faster, as OpenSSL
has more optimized BIGNUM functions.

Summary of functionality with and without OpenSSL:

`----------------------------`---------`------------
Functionality                 built-in  OpenSSL
----------------------------------------------------
MD5                           yes       yes
SHA1                          yes       yes
SHA224/256/384/512            yes       yes (3)
Any other digest algo         no        yes (1)
Blowfish                      yes       yes
AES                           yes       yes (2)
DES/3DES/CAST5                no        yes
Raw encryption                yes       yes
PGP Symmetric encryption      yes       yes
PGP Public-Key encryption     yes       yes
----------------------------------------------------

1. Any digest algorithm OpenSSL supports is automatically picked up.
   This is not possible with ciphers, which need to be supported
   explicitly.

2. AES is included in OpenSSL since version 0.9.7.  If pgcrypto is
   compiled against an older version, it will use the built-in AES
   code, so AES is always available.

3. SHA2 algorithms were added to OpenSSL in version 0.9.8.  For
   older versions, pgcrypto will use built-in code.

2.2. NULL handling
~~~~~~~~~~~~~~~~~~~~

As is standard in SQL, all functions return NULL if any of the
arguments is NULL.  This may create security risks on careless usage.


2.3. Security
~~~~~~~~~~~~~~~

All the functions here run inside the database server.  That means
that all the data and passwords move between pgcrypto and the client
application in clear text.  Thus you must:

1. Connect locally or use SSL connections.
2. Trust both the system and the database administrator.

If you cannot, then it is better to do crypto inside the client
application.


3. General hashing
--------------------

3.1. digest(data, type)
~~~~~~~~~~~~~~~~~~~~~~~~~

    digest(data text, type text) RETURNS bytea
    digest(data bytea, type text) RETURNS bytea

`type` is the algorithm to use.  Standard algorithms are `md5` and
`sha1`, although there may be more supported, depending on build
options.

Returns a binary hash.

If you want a hexadecimal string, use `encode()` on the result.
Example:

    CREATE OR REPLACE FUNCTION sha1(bytea) RETURNS text AS $$
        SELECT encode(digest($1, 'sha1'), 'hex')
    $$ LANGUAGE SQL STRICT IMMUTABLE;

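The digest-then-encode result can be cross-checked against any other implementation of the same algorithm. A Python stdlib equivalent (this mimics, not calls, pgcrypto), shown on the well-known SHA-1 test vector for "abc":

```python
import hashlib

def digest_hex(data: bytes, algo: str) -> str:
    # Equivalent of encode(digest(data, algo), 'hex') in pgcrypto.
    return hashlib.new(algo, data).hexdigest()

print(digest_hex(b"abc", "sha1"))
# a9993e364706816aba3e25717850c26c9cd0d89d
```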
3.2. hmac(data, key, type)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    hmac(data text, key text, type text) RETURNS bytea
    hmac(data bytea, key text, type text) RETURNS bytea

Calculates a hashed MAC over the data.  `type` is the same as in
`digest()`.  If the key is larger than the hash block size, it will
first be hashed and the hash will be used as the key.

It is similar to digest(), but the hash can be recalculated only by
someone knowing the key.  This avoids the scenario of someone altering
the data and also changing the hash to match.

Returns a binary hash.

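The key-hashing rule above (a key longer than the hash block size is hashed first, and the hash used as the key) comes from the HMAC construction itself (RFC 2104). A small Python stdlib sketch, independent of pgcrypto, illustrates it for SHA-1, whose block size is 64 bytes:

```python
import hashlib
import hmac

data = b"important message"
long_key = b"x" * 100  # longer than SHA-1's 64-byte block size

# HMAC with the oversized key...
mac1 = hmac.new(long_key, data, hashlib.sha1).digest()
# ...equals HMAC with the key replaced by its SHA-1 hash.
mac2 = hmac.new(hashlib.sha1(long_key).digest(), data, hashlib.sha1).digest()

print(mac1 == mac2)  # True
```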
4. Password hashing
---------------------

The functions `crypt()` and `gen_salt()` are specifically designed
for hashing passwords.  `crypt()` does the hashing and `gen_salt()`
prepares algorithm parameters for it.

The algorithms in `crypt()` differ from usual hashing algorithms like
MD5 or SHA1 in the following respects:

1. They are slow.  As the amount of data is so small, this is the
   only way to make brute-forcing passwords hard.
2. They include a random 'salt' with the result, so that users having
   the same password will have different crypted passwords.  This is
   also an additional defense against reversing the algorithm.
3. They include the algorithm type in the result, so passwords hashed
   with different algorithms can co-exist.
4. Some of them are adaptive - that means that as computers get
   faster, you can tune the algorithm to be slower, without
   introducing incompatibility with existing passwords.

Supported algorithms:

`------`-------------`---------`----------`---------------------------
Type    Max password  Adaptive  Salt bits  Description
----------------------------------------------------------------------
`bf`    72            yes       128        Blowfish-based, variant 2a
`md5`   unlimited     no        48         md5-based crypt()
`xdes`  8             yes       24         Extended DES
`des`   8             no        12         Original UNIX crypt
----------------------------------------------------------------------

4.1. crypt(password, salt)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    crypt(password text, salt text) RETURNS text

Calculates a UN*X crypt(3)-style hash of the password.  When storing a
new password, you need to use the function `gen_salt()` to generate a
new salt.  When checking a password, you should use the existing hash
as the salt.

Example - setting a new password:

    UPDATE .. SET pswhash = crypt('new password', gen_salt('md5'));

Example - authentication:

    SELECT pswhash = crypt('entered password', pswhash) WHERE .. ;

This returns true or false depending on whether the entered password
is correct.  It can also return NULL if the `pswhash` field is NULL.

4.2. gen_salt(type)
~~~~~~~~~~~~~~~~~~~~~

    gen_salt(type text) RETURNS text

Generates a new random salt for use in `crypt()`.  For adaptive
algorithms, it uses the default iteration count.

Accepted types are: `des`, `xdes`, `md5` and `bf`.

4.3. gen_salt(type, rounds)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    gen_salt(type text, rounds integer) RETURNS text

Same as above, but lets the user specify the iteration count for some
algorithms.  The higher the count, the more time it takes to hash
the password and therefore the more time to break it.  Although with
too high a count the time to calculate a hash may be several years
- which is somewhat impractical.

The number is algorithm specific:

`-----'---------'-----'----------
type    default  min   max
---------------------------------
`xdes`  725      1     16777215
`bf`    6        4     31
---------------------------------

In the case of xdes there is an additional limitation that the count
must be an odd number.

Notes:

- The original DES crypt was designed to have the speed of 4 hashes
  per second on the hardware of that time.
- Slower than 4 hashes per second would probably dampen usability.
- Faster than 100 hashes per second is probably too fast.
- See the next section about possible values for `crypt-bf`.

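For `bf`, the rounds setting is a base-2 logarithm of the actual iteration count (a well-known property of the Blowfish-based crypt, not spelled out in this README), so each increment doubles the work. A quick arithmetic sketch:

```python
# Relative work factor of crypt-bf for different "rounds" settings.
# bf interprets rounds as log2(iterations): the work doubles per step.
for rounds in (4, 5, 6, 7, 8):
    print(rounds, 2 ** rounds)  # rounds value, relative iteration count

# Going from the default 6 to 8 therefore makes hashing 4x slower.
print((2 ** 8) // (2 ** 6))  # 4
```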
4.4. Comparison of crypt and regular hashes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is a table that should give an overview of the relative slowness
of different hashing algorithms.

* The goal is to crack an 8-character password, which consists of:
  1. only lowercase letters;
  2. numbers, lower- and uppercase letters.
* The table below shows how much time it would take to try all
  combinations of characters.
* `crypt-bf` is featured in several settings - the number
  after the slash is the `rounds` parameter of `gen_salt()`.

`------------'----------'--------------'--------------------
Algorithm     Hashes/sec  Chars: [a-z]   Chars: [A-Za-z0-9]
------------------------------------------------------------
crypt-bf/8    28          246 years      251322 years
crypt-bf/7    57          121 years      123457 years
crypt-bf/6    112         62 years       62831 years
crypt-bf/5    211         33 years       33351 years
crypt-md5     2681        2.6 years      2625 years
crypt-des     362837      7 days         19 years
sha1          590223      4 days         12 years
md5           2345086     1 day          3 years
------------------------------------------------------------

* The machine used is a 1.5GHz Pentium 4.
* crypt-des and crypt-md5 algorithm numbers are taken from
  John the Ripper v1.6.38 `-test` output.
* MD5 numbers are from mdcrack 1.2.
* SHA1 numbers are from lcrack-20031130-beta.
* `crypt-bf` numbers are taken using a simple program that loops
  over 1000 8-character passwords.  That way I can show the speed with
  different numbers of rounds.  For reference: `john -test` shows 213
  loops/sec for crypt-bf/5.  (The small difference in results is in
  accordance with the fact that the `crypt-bf` implementation in
  pgcrypto is the same one used in John the Ripper.)

Note that "try all combinations" is not a realistic exercise.
Usually password cracking is done with the help of dictionaries, which
contain both regular words and various mutations of them.  So, even
somewhat word-like passwords could be cracked much faster than the
above numbers suggest, and a 6-character non-word-like password may
escape cracking.  Or not.

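The "years to try all combinations" figures above follow from simple arithmetic: alphabet_size ** length candidate passwords divided by the hashes-per-second rate. A sketch that recomputes the order of magnitude for crypt-bf/8 (the exact table values differ slightly, presumably due to rounding and the measured rates):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def crack_years(alphabet: int, length: int, hashes_per_sec: float) -> float:
    """Years needed to try every password of the given length."""
    return alphabet ** length / hashes_per_sec / SECONDS_PER_YEAR

# crypt-bf/8 at 28 hashes/sec, 8-char lowercase [a-z] password
# (the table above says 246 years):
print(f"{crack_years(26, 8, 28):.0f} years")
# same, but [A-Za-z0-9] (the table says 251322 years):
print(f"{crack_years(62, 8, 28):.0f} years")
```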
5. PGP encryption
-------------------

The functions here implement the encryption part of the OpenPGP
(RFC 2440) standard.  Both symmetric-key and public-key encryption
are supported.


5.1. Overview
~~~~~~~~~~~~~~~

An encrypted PGP message consists of 2 packets:

- a packet for the session key - either symmetric- or public-key
  encrypted;
- a packet for the session-key encrypted data.

When encrypting with a password:

1. The given password is hashed using a String2Key (S2K) algorithm.
   This is rather similar to a `crypt()` algorithm - purposefully
   slow and with a random salt - but it produces a full-length binary
   key.
2. If a separate session key is requested, a new random key will be
   generated.  Otherwise the S2K key will be used directly as the
   session key.
3. If the S2K key is to be used directly, then only the S2K settings
   will be put into the session key packet.  Otherwise the session
   key will be encrypted with the S2K key and put into the session
   key packet.

When encrypting with a public key:

1. A new random session key is generated.
2. It is encrypted using the public key and put into the session key
   packet.

Now the common part, the session-key encrypted data packet:

1. Optional data manipulation: compression, conversion to UTF-8,
   conversion of line endings.
2. The data is prefixed with a block of random bytes.  This is equal
   to using a random IV.
3. A SHA1 hash of the random prefix and data is appended.
4. All this is encrypted with the session key.

5.2. pgp_sym_encrypt(data, psw)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    pgp_sym_encrypt(data text, psw text [, options text] ) RETURNS bytea
    pgp_sym_encrypt_bytea(data bytea, psw text [, options text] ) RETURNS bytea

Return a symmetric-key encrypted PGP message.

Options are described in section 5.8.


5.3. pgp_sym_decrypt(msg, psw)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    pgp_sym_decrypt(msg bytea, psw text [, options text] ) RETURNS text
    pgp_sym_decrypt_bytea(msg bytea, psw text [, options text] ) RETURNS bytea

Decrypt a symmetric-key encrypted PGP message.

Decrypting bytea data with `pgp_sym_decrypt` is disallowed.
This is to avoid outputting invalid character data.  Decrypting
originally textual data with `pgp_sym_decrypt_bytea` is fine.

Options are described in section 5.8.

5.4. pgp_pub_encrypt(data, pub_key)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    pgp_pub_encrypt(data text, key bytea [, options text] ) RETURNS bytea
    pgp_pub_encrypt_bytea(data bytea, key bytea [, options text] ) RETURNS bytea

Encrypt data with a public key.  Giving this function a secret key
will produce an error.

Options are described in section 5.8.


5.5. pgp_pub_decrypt(msg, sec_key [, psw])
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    pgp_pub_decrypt(msg bytea, key bytea [, psw text [, options text]] ) \
        RETURNS text
    pgp_pub_decrypt_bytea(msg bytea, key bytea [,psw text [, options text]] ) \
        RETURNS bytea

Decrypt a public-key encrypted message with a secret key.  If the
secret key is password-protected, you must give the password in
`psw`.  If there is no password, but you want to specify options for
the function, you need to give an empty password.

Decrypting bytea data with `pgp_pub_decrypt` is disallowed.
This is to avoid outputting invalid character data.  Decrypting
originally textual data with `pgp_pub_decrypt_bytea` is fine.

Options are described in section 5.8.

5.6. pgp_key_id(key / msg)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    pgp_key_id(key or msg bytea) RETURNS text

Given a PGP public or secret key, it shows you the key ID.  Given an
encrypted message, it shows the key ID that was used for encrypting
the data.

It can return 2 special key IDs:

SYMKEY::
  The data is encrypted with a symmetric key.

ANYKEY::
  The data is public-key encrypted, but the key ID is cleared.
  That means you need to try all your secret keys on it to see
  which one decrypts it.  pgcrypto itself does not produce such
  messages.

Note that different keys may have the same ID.  This is a rare but
normal event.  The client application should then try to decrypt with
each one, to see which fits - like handling ANYKEY.

5.7. armor / dearmor
~~~~~~~~~~~~~~~~~~~~~~

    armor(data bytea) RETURNS text
    dearmor(data text) RETURNS bytea

These wrap/unwrap data in PGP ASCII Armor, which is basically Base64
with a CRC and additional formatting.

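The "Base64 with a CRC" can be made concrete: OpenPGP armor wraps base64 data and appends a CRC-24 checksum (initial value 0xB704CE, polynomial 0x1864CFB, per RFC 2440 section 6.1). A Python sketch of just the checksum-plus-base64 core, with the armor headers and line wrapping omitted (this mimics, not calls, pgcrypto):

```python
import base64

CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB

def crc24(data: bytes) -> int:
    """CRC-24 as defined in RFC 2440, section 6.1."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armor_body(data: bytes) -> str:
    # base64 payload, then "=" plus the base64-encoded 3-byte CRC
    crc_bytes = crc24(data).to_bytes(3, "big")
    return (base64.b64encode(data).decode()
            + "\n=" + base64.b64encode(crc_bytes).decode())

print(armor_body(b"hello"))
```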
5.8. Options for PGP functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Options are named to be similar to GnuPG.  Values should be given
after an equal sign; separate options from each other with commas.
Example:

    pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')

All of the options except `convert-crlf` apply only to encrypt
functions.  Decrypt functions get the parameters from the PGP data.

The most interesting options are probably `compress-algo` and
`unicode-mode`.  The rest should have reasonable defaults.


cipher-algo::
  Which cipher algorithm to use.

  Values: bf, aes128, aes192, aes256 (OpenSSL-only: `3des`, `cast5`)
  Default: aes128
  Applies: pgp_sym_encrypt, pgp_pub_encrypt

compress-algo::
  Which compression algorithm to use.  Requires building with zlib.

  Values:
    0 - no compression
    1 - ZIP compression
    2 - ZLIB compression [=ZIP plus meta-data and block CRCs]
  Default: 0
  Applies: pgp_sym_encrypt, pgp_pub_encrypt

compress-level::
  How much to compress.  A bigger level compresses smaller but is
  slower.  0 disables compression.

  Values: 0, 1-9
  Default: 6
  Applies: pgp_sym_encrypt, pgp_pub_encrypt

convert-crlf::
  Whether to convert `\n` into `\r\n` when encrypting and `\r\n` to
  `\n` when decrypting.  RFC 2440 specifies that text data should be
  stored using `\r\n` line feeds.  Use this to get fully
  RFC-compliant behavior.

  Values: 0, 1
  Default: 0
  Applies: pgp_sym_encrypt, pgp_pub_encrypt, pgp_sym_decrypt, pgp_pub_decrypt

disable-mdc::
  Do not protect data with SHA-1.  The only good reason to use this
  option is to achieve compatibility with ancient PGP products, as
  the SHA-1 protected packet is from an upcoming update to RFC 2440.
  (Currently at version RFC2440bis-14.)  Recent gnupg.org and pgp.com
  software supports it fine.

  Values: 0, 1
  Default: 0
  Applies: pgp_sym_encrypt, pgp_pub_encrypt

enable-session-key::
  Use a separate session key.  Public-key encryption always uses a
  separate session key; this option is for symmetric-key encryption,
  which by default uses the S2K key directly.

  Values: 0, 1
  Default: 0
  Applies: pgp_sym_encrypt

s2k-mode::
  Which S2K algorithm to use.

  Values:
    0 - Without salt.  Dangerous!
    1 - With salt but with a fixed iteration count.
    3 - Variable iteration count.
  Default: 3
  Applies: pgp_sym_encrypt

s2k-digest-algo::
  Which digest algorithm to use in the S2K calculation.

  Values: md5, sha1
  Default: sha1
  Applies: pgp_sym_encrypt

s2k-cipher-algo::
  Which cipher to use for encrypting the separate session key.

  Values: bf, aes, aes128, aes192, aes256
  Default: use cipher-algo
  Applies: pgp_sym_encrypt

unicode-mode::
  Whether to convert textual data from the database internal encoding
  to UTF-8 and back.  If your database is already UTF-8, no
  conversion will be done, but the data will be tagged as UTF-8.
  Without this option it will not be.

  Values: 0, 1
  Default: 0
  Applies: pgp_sym_encrypt, pgp_pub_encrypt


5.9. Generating keys with GnuPG
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generate a new key:

    gpg --gen-key

The preferred key type is "DSA and Elgamal".

For RSA encryption you must create either a DSA or an RSA sign-only
key as the master and then add an RSA encryption subkey with
`gpg --edit-key`.

List keys:

    gpg --list-secret-keys

Export an ascii-armored public key:

    gpg -a --export KEYID > public.key

Export an ascii-armored secret key:

    gpg -a --export-secret-keys KEYID > secret.key

You need to use `dearmor()` on them before giving them to the
pgp_pub_* functions.  Or, if you can handle binary data, you can drop
"-a" from gpg.

For more details see `man gpg`, http://www.gnupg.org/gph/en/manual.html[
The GNU Privacy Handbook] and other docs on the http://www.gnupg.org[]
site.

5.10. Limitations of PGP code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- No support for signing.  That also means that it is not checked
  whether the encryption subkey belongs to the master key.

- No support for an encryption key as the master key.  As such
  practice is generally discouraged, this should not be a problem.

- No support for several subkeys.  This may seem like a problem, as
  this is common practice.  On the other hand, you should not use
  your regular GPG/PGP keys with pgcrypto, but create new ones, as
  the usage scenario is rather different.

6. Raw encryption
-------------------

These functions only run a cipher over data; they don't have any of
the advanced features of PGP encryption.  Therefore they have some
major problems:

1. They use the user key directly as the cipher key.
2. They don't provide any integrity checking, to see
   whether the encrypted data was modified.
3. They expect that users manage all encryption parameters
   themselves, even the IV.
4. They don't handle text.

So, with the introduction of PGP encryption, use of the raw
encryption functions is discouraged.


    encrypt(data bytea, key bytea, type text) RETURNS bytea
    decrypt(data bytea, key bytea, type text) RETURNS bytea

    encrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea
    decrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea

Encrypt/decrypt data with a cipher, padding the data if needed.

The `type` parameter, in pseudo-notation:

    algo ['-' mode] ['/pad:' padding]

Supported algorithms:

* `bf`  - Blowfish
* `aes` - AES (Rijndael-128)

Modes:

* `cbc` - next block depends on previous (default)
* `ecb` - each block is encrypted separately (for testing only)

Padding:

* `pkcs` - data may be any length (default)
* `none` - data must be a multiple of the cipher block size

The IV is the initial value for the mode; it defaults to all zeroes.
It is ignored for ECB.  It is clipped or padded with zeroes if not
exactly the block size.

So, for example:

    encrypt(data, 'fooz', 'bf')

is equal to

    encrypt(data, 'fooz', 'bf-cbc/pad:pkcs')

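The `pad:pkcs` behavior can be illustrated outside the database: PKCS-style padding appends N copies of the byte value N so the data becomes a whole number of blocks. A Python sketch for an 8-byte, Blowfish-sized block (this mimics, not calls, pgcrypto):

```python
BLOCK = 8  # Blowfish block size in bytes

def pkcs_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append N bytes of value N so the length is a multiple of block."""
    n = block - len(data) % block
    return data + bytes([n]) * n

def pkcs_unpad(padded: bytes) -> bytes:
    # The last byte says how many padding bytes to strip.
    return padded[:-padded[-1]]

padded = pkcs_pad(b"fooz")
print(padded)              # b'fooz\x04\x04\x04\x04'
print(pkcs_unpad(padded))  # b'fooz'
```

Note that input already a multiple of the block size still gains one full block of padding, which is why padded output is always longer than the input.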
7. Random bytes
-----------------

    gen_random_bytes(count integer)

Returns `count` cryptographically strong random bytes as a bytea
value.  At most 1024 bytes can be extracted at a time.  This is to
avoid draining the randomness generator pool.

8. Credits
------------

I have used code from the following sources:

`--------------------`-------------------------`-------------------------------
Algorithm             Author                    Source origin
-------------------------------------------------------------------------------
DES crypt()           David Burren and others   FreeBSD libcrypt
MD5 crypt()           Poul-Henning Kamp         FreeBSD libcrypt
Blowfish crypt()      Solar Designer            www.openwall.com
Blowfish cipher       Simon Tatham              PuTTY
Rijndael cipher       Brian Gladman             OpenBSD sys/crypto
MD5 and SHA1          WIDE Project              KAME kame/sys/crypto
SHA256/384/512        Aaron D. Gifford          OpenBSD sys/crypto
BIGNUM math           Michael J. Fromberger     dartmouth.edu/~sting/sw/imath
-------------------------------------------------------------------------------

9. Legalese
-------------

* I owe a beer to Poul-Henning.

10. References/Links
----------------------

10.1. Useful reading
~~~~~~~~~~~~~~~~~~~~~~

http://www.gnupg.org/gph/en/manual.html[]::
  The GNU Privacy Handbook

http://www.openwall.com/crypt/[]::
  Describes the crypt-blowfish algorithm.

http://www.stack.nl/~galactus/remailers/passphrase-faq.html[]::
  How to choose a good password.

http://world.std.com/~reinhold/diceware.html[]::
  An interesting idea for picking passwords.

http://www.interhack.net/people/cmcurtin/snake-oil-faq.html[]::
  Describes good and bad cryptography.


10.2. Technical references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

http://www.ietf.org/rfc/rfc2440.txt[]::
  OpenPGP message format

http://www.imc.org/draft-ietf-openpgp-rfc2440bis[]::
  New version of RFC 2440.

http://www.ietf.org/rfc/rfc1321.txt[]::
  The MD5 Message-Digest Algorithm

http://www.ietf.org/rfc/rfc2104.txt[]::
  HMAC: Keyed-Hashing for Message Authentication

http://www.usenix.org/events/usenix99/provos.html[]::
  Comparison of the crypt-des, crypt-md5 and bcrypt algorithms.

http://csrc.nist.gov/cryptval/des.htm[]::
  Standards for DES, 3DES and AES.

http://en.wikipedia.org/wiki/Fortuna_(PRNG)[]::
  Description of the Fortuna CSPRNG.

http://jlcooke.ca/random/[]::
  Jean-Luc Cooke's Fortuna-based /dev/random driver for Linux.

http://www.cs.ut.ee/~helger/crypto/[]::
  Collection of cryptology pointers.


// $PostgreSQL: pgsql/contrib/pgcrypto/README.pgcrypto,v 1.19 2007/03/28 22:48:58 neilc Exp $

@ -1,88 +0,0 @@

$PostgreSQL: pgsql/contrib/pgrowlocks/README.pgrowlocks,v 1.2 2007/08/27 00:13:51 tgl Exp $

pgrowlocks README		Tatsuo Ishii

1. What is pgrowlocks?

pgrowlocks shows row-locking information for a specified table.

pgrowlocks returns the following columns:

	locked_row TID,		-- row TID
	lock_type TEXT,		-- lock type
	locker XID,		-- locking XID
	multi bool,		-- multi XID?
	xids xid[],		-- multi XIDs
	pids INTEGER[]		-- locker's process ids

Here is a sample execution of pgrowlocks:

test=# SELECT * FROM pgrowlocks('t1');
 locked_row | lock_type | locker | multi |   xids    |     pids
------------+-----------+--------+-------+-----------+---------------
 (0,1)      | Shared    |     19 | t     | {804,805} | {29066,29068}
 (0,2)      | Shared    |     19 | t     | {804,805} | {29066,29068}
 (0,3)      | Exclusive |    804 | f     | {804}     | {29066}
 (0,4)      | Exclusive |    804 | f     | {804}     | {29066}
(4 rows)

locked_row	-- tuple ID (TID) of each locked row
lock_type	-- "Shared" for a shared lock, "Exclusive" for an exclusive lock
locker		-- transaction ID of the locker (note 1)
multi		-- "t" if the locker is a multi-transaction, otherwise "f"
xids		-- XIDs of the lockers (note 2)
pids		-- process ids of the locking backends

note 1: if the locker is a multi-transaction, this column shows the multi ID

note 2: if the locker is a multi-transaction, multiple XIDs are shown
2. Installing pgrowlocks

Installing pgrowlocks requires a PostgreSQL 8.0 or later source tree.

	$ cd /usr/local/src/postgresql-8.1/contrib
	$ tar xfz /tmp/pgrowlocks-1.0.tar.gz

If you are using PostgreSQL 8.0, you need to modify the pgrowlocks
source code. Around line 61 you will see:

	#undef MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS

Change this to:

	#define MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS

	$ make
	$ make install

	$ psql -e -f pgrowlocks.sql test

3. How to use pgrowlocks

pgrowlocks grabs an AccessShareLock on the target table and reads each
row one by one to collect the row-locking information. Note that:

1) if the table is exclusively locked by someone else, pgrowlocks
   will be blocked.

2) pgrowlocks may show incorrect information if a new lock is taken
   or a lock is released during its execution.

pgrowlocks does not show the contents of locked rows. If you want to
look at the row contents at the same time, you could do something
like this:

	SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p
	 WHERE p.locked_row = a.ctid;


4. License

pgrowlocks is distributed under the (modified) BSD license described
in the source file.

5. History

2006/03/21	pgrowlocks version 1.1 released (tested on 8.2 current)
2005/08/22	pgrowlocks version 1.0 released

@ -1,102 +0,0 @@

pgstattuple README		2002/08/29	Tatsuo Ishii

1. Functions supported:

pgstattuple
-----------
pgstattuple() returns the physical length of a relation, the
percentage of "dead" tuples, and other information. This may help
users determine whether vacuum is necessary. Here is an example
session:

test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len          | 458752
tuple_count        | 1470
tuple_len          | 438896
tuple_percent      | 95.67
dead_tuple_count   | 11
dead_tuple_len     | 3157
dead_tuple_percent | 0.69
free_space         | 8932
free_percent       | 1.95

Here are explanations for each column:

table_len          -- physical relation length in bytes
tuple_count        -- number of live tuples
tuple_len          -- total length of live tuples in bytes
tuple_percent      -- live tuples in %
dead_tuple_count   -- number of dead tuples
dead_tuple_len     -- total length of dead tuples in bytes
dead_tuple_percent -- dead tuples in %
free_space         -- free space in bytes
free_percent       -- free space in %
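The percentage columns are just the corresponding byte counts as a share of table_len (they need not sum to 100%, since page headers and other overhead also take space). A quick Python check against the numbers from the sample session above:

```python
def pct(part: int, table_len: int) -> float:
    # each *_percent column is the byte count as a share of table_len
    return round(100.0 * part / table_len, 2)

table_len = 458752                        # from the sample session
assert pct(438896, table_len) == 95.67    # tuple_percent
assert pct(3157, table_len) == 0.69       # dead_tuple_percent
assert pct(8932, table_len) == 1.95       # free_percent
```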

pg_relpages
-----------
pg_relpages() returns the number of pages in the relation.

pgstatindex
-----------
pgstatindex() returns a record showing information about an index:

test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version            | 2
tree_level         | 0
index_size         | 8192
root_block_no      | 1
internal_pages     | 0
leaf_pages         | 1
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 50.27
leaf_fragmentation | 0

2. Installing pgstattuple

	$ make
	$ make install
	$ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test


3. Using pgstattuple

pgstattuple may be called as a relation function; it is defined as
follows:

	CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
	  AS 'MODULE_PATHNAME', 'pgstattuple'
	  LANGUAGE C STRICT;

	CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
	  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
	  LANGUAGE C STRICT;

The argument is the relation name (optionally it may be qualified)
or the OID of the relation. Note that pgstattuple returns only one
row.


4. Notes

pgstattuple acquires only a read lock on the relation, so a
concurrent update may affect the result.

pgstattuple judges a tuple "dead" if HeapTupleSatisfiesNow()
returns false.


5. History

2007/05/17

	Moved page-level functions to contrib/pageinspect.

2006/06/28

	Extended to work against indexes.

@ -1,326 +0,0 @@
|
||||||
This directory contains the code for the user-defined type,
|
|
||||||
SEG, representing laboratory measurements as floating point
|
|
||||||
intervals.

RATIONALE
=========

The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
that continuum with somewhat fuzzy limits. Measurements come out as
intervals because of uncertainty and randomness, and also because the
value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.

Using just common sense, it appears more convenient to store such data
as intervals rather than pairs of numbers. In practice, it even turns
out more efficient in most applications.

Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:

test=> select 6.50 as "pH";
 pH
---
6.5
(1 row)

In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
interval contained within a bigger and even fuzzier interval, 6.5,
with their center points being (probably) the only common feature they
share. We definitely do not want such different data items to appear
the same.

Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
the sense that each data element records its own precision.

Check this out:

test=> select '6.25 .. 6.50'::seg as "pH";
 pH
------------
6.25 .. 6.50
(1 row)

FILES
=====

Makefile	build instructions for the shared library

README.seg	the file you are now reading

seg.c		the implementation of this data type in C

seg.sql.in	SQL code needed to register this type with postgres
		(transformed to seg.sql by make)

segdata.h	the data structure used to store the segments

segparse.y	the grammar file for the parser (used by seg_in() in seg.c)

segscan.l	scanner rules (used by seg_yyparse() in segparse.y)

seg-validate.pl	a simple input validation script. It is probably a
		little stricter than the type itself: for example, it
		rejects '22 ' because of the trailing space. Use it as
		a filter to discard bad values from a single column;
		redirect to /dev/null to see the offending input.

sort-segments.pl a script to sort tables having a SEG column

INSTALLATION
============

To install the type, run

	make
	make install

The user running "make install" may need root access, depending on how
you configured the PostgreSQL installation paths.

This only installs the type implementation and documentation. To make
the type available in any particular database, do

	psql -d databasename < seg.sql

If you install the type in the template1 database, all subsequently
created databases will inherit it.

To test the new type, after "make install" do

	make installcheck

If it fails, examine the file regression.diffs to find out the reason
(the test code is a direct adaptation of the regression tests from the
main source tree).


SYNTAX
======

The external representation of an interval is formed using one or two
floating-point numbers joined by the range operator ('..' or '...').
Optional certainty indicators (<, > and ~) are ignored by the internal
logic, but are retained in the data.

Grammar
-------

rule 1	seg -> boundary PLUMIN deviation
rule 2	seg -> boundary RANGE boundary
rule 3	seg -> boundary RANGE
rule 4	seg -> RANGE boundary
rule 5	seg -> boundary
rule 6	boundary -> FLOAT
rule 7	boundary -> EXTENSION FLOAT
rule 8	deviation -> FLOAT

Tokens
------

RANGE		(\.\.)(\.)?
PLUMIN		\'\+\-\'
integer		[+-]?[0-9]+
real		[+-]?[0-9]+\.[0-9]+
FLOAT		({integer}|{real})([eE]{integer})?
EXTENSION	[<>~]
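The grammar above (minus the PLUMIN rule, omitted here for brevity, so '5(+-)0.3' style input is not covered) can be transcribed into a small Python regex as a sketch of a syntax checker; note that it only checks syntax, so a semantically disallowed value such as '5 .. 2' still passes:

```python
import re

# token definitions transcribed from the table above
FLOAT = r'[+-]?(?:[0-9]+\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?'
BOUNDARY = r'[<>~]?' + FLOAT          # rules 6-7: optional EXTENSION
RANGE = r'\s*\.\.\.?\s*'              # '..' or '...', white space ignored

# rules 2-5: boundary RANGE boundary | boundary RANGE | RANGE boundary | boundary
SEG = re.compile('^(?:{b}{r}{b}|{b}{r}|{r}{b}|{b})$'.format(b=BOUNDARY, r=RANGE))

def is_valid_seg(s: str) -> bool:
    return SEG.match(s.strip()) is not None
```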


Examples of valid SEG representations:
--------------------------------------

Any number	(rules 5,6) -- creates a zero-length segment (a point,
		if you will)

~5.0		(rules 5,7) -- creates a zero-length segment AND records
		'~' in the data. This notation reads 'approximately 5.0',
		but its meaning is not recognized by the code. It is
		ignored until you get the value back. View it as a
		short-hand comment.

<5.0		(rules 5,7) -- creates a point at 5.0; '<' is ignored
		but is preserved as a comment

>5.0		(rules 5,7) -- creates a point at 5.0; '>' is ignored
		but is preserved as a comment

5(+-)0.3
5'+-'0.3	(rules 1,8) -- creates an interval '4.7 .. 5.3'. As of
		this writing (02/09/2000), this mechanism isn't
		completely accurate in determining the number of
		significant digits for the boundaries. For example, it
		adds an extra digit to the lower boundary if the
		resulting interval includes a power of ten:

		postgres=> select '10(+-)1'::seg as seg;
		seg
		---------
		9.0 .. 11	-- should be: 9 .. 11

		Also, the (+-) notation is not preserved: 'a(+-)b' will
		always be returned as '(a-b) .. (a+b)'. The purpose of
		this notation is to allow input from certain data
		sources without conversion.

50 ..		(rule 3) -- everything that is greater than or equal to 50

.. 0		(rule 4) -- everything that is less than or equal to 0

1.5e-2 .. 2E-2	(rule 2) -- creates an interval (0.015 .. 0.02)

1 ... 2		the same as 1...2, or 1 .. 2, or 1..2 (white space is
		ignored). Because of the widespread use of '...' in data
		sources, I decided to keep it as a range operator. This,
		and the fact that the white space around the range
		operator is ignored, creates a parsing conflict with
		numeric constants starting with a decimal point.

Examples of invalid SEG input:
------------------------------

.1e7		should be: 0.1e7
.1 .. .2	should be: 0.1 .. 0.2
2.4 E4		should be: 2.4E4

The following, although not a syntax error, is disallowed to improve
the sanity of the data:

5 .. 2		should be: 2 .. 5


PRECISION
=========

The segments are stored internally as pairs of 32-bit floating-point
numbers. This means that numbers with more than 7 significant digits
will be truncated.

Numbers with 7 or fewer significant digits retain their original
precision. That is, if your query returns 0.00, you can be sure that
the trailing zeroes are not artifacts of formatting: they reflect the
precision of the original data. The number of leading zeroes does not
affect precision: the value 0.0067 is considered to have just 2
significant digits.
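The 7-digit limit follows from the IEEE 754 single-precision format used for the boundaries; a quick Python sketch of the truncation, using a round-trip through a 4-byte float as a stand-in for SEG's internal storage:

```python
import struct

def to_float32(x: float) -> float:
    # round-trip through a 4-byte IEEE 754 float, as SEG stores boundaries
    return struct.unpack('f', struct.pack('f', x))[0]

# 7 significant digits survive the round-trip closely...
assert abs(to_float32(1.234567) - 1.234567) < 1e-6
# ...but an 8th significant digit is beyond float32 resolution
assert to_float32(1.2345678) != 1.2345678
```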


USAGE
=====

The access method for SEG is a GiST index (gist_seg_ops), which is a
generalization of the R-tree. GiSTs allow the postgres implementation
of the R-tree, originally encoded to support 2-D geometric types such
as boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/cube).

The operators supported by the GiST access method include:

[a, b] << [c, d]	Is left of

	The left operand, [a, b], occurs entirely to the left of the
	right operand, [c, d], on the axis (-inf, inf). That is,
	[a, b] << [c, d] is true if b < c and false otherwise.

[a, b] >> [c, d]	Is right of

	[a, b] occurs entirely to the right of [c, d].
	[a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]	Overlaps or is left of

	This might be better read as "does not extend to the right of".
	It is true when b <= d.

[a, b] &> [c, d]	Overlaps or is right of

	This might be better read as "does not extend to the left of".
	It is true when a >= c.

[a, b] = [c, d]		Same as

	The segments [a, b] and [c, d] are identical, that is, a == c
	and b == d.

[a, b] && [c, d]	Overlaps

	The segments [a, b] and [c, d] overlap.

[a, b] @> [c, d]	Contains

	The segment [a, b] contains the segment [c, d], that is,
	a <= c and b >= d.

[a, b] <@ [c, d]	Contained in

	The segment [a, b] is contained in [c, d], that is,
	a >= c and b <= d.

(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core
geometric datatypes!)

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with the other geometric
data types defined in Postgres.

Other operators:

[a, b] < [c, d]		Less than
[a, b] > [c, d]		Greater than

	These operators do not make a lot of sense for any practical
	purpose but sorting. They first compare (a) to (c), and if
	these are equal, compare (b) to (d). That results in reasonably
	good sorting in most cases, which is useful if you want to use
	ORDER BY with this type.
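The operator definitions above can be mirrored as plain predicates over (lower, upper) pairs. This Python sketch is purely illustrative (the function names are invented here, not part of seg):

```python
# Segments are (lower, upper) pairs; each predicate mirrors one
# operator definition from the list above.
def left_of(s, t):       return s[1] < t[0]                     # s << t
def right_of(s, t):      return s[0] > t[1]                     # s >> t
def over_left(s, t):     return s[1] <= t[1]                    # s &< t
def over_right(s, t):    return s[0] >= t[0]                    # s &> t
def same(s, t):          return s == t                          # s =  t
def contains(s, t):      return s[0] <= t[0] and s[1] >= t[1]   # s @> t
def contained_in(s, t):  return contains(t, s)                  # s <@ t
def overlaps(s, t):      return s[0] <= t[1] and t[0] <= s[1]   # s && t

a, b = (1.0, 3.0), (2.0, 5.0)
assert overlaps(a, b) and not left_of(a, b)
assert contains((0.0, 10.0), a) and contained_in(a, (0.0, 10.0))
assert left_of((1.0, 2.0), (3.0, 4.0))
```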

There are a few other potentially useful functions defined in seg.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.

For examples of usage, see sql/seg.sql

NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the SEG column (see the script sort-segments.pl for an example).


CREDITS
=======

My thanks are primarily to Prof. Joe Hellerstein
(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
(http://gist.cs.berkeley.edu/). I am also grateful to all postgres
developers, present and past, for enabling me to create my own world
and live undisturbed in it. And I would like to acknowledge my
gratitude to Argonne Lab and to the U.S. Department of Energy for the
years of faithful support of my database research.


------------------------------------------------------------------------
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844

selkovjr@mcs.anl.gov

@ -1,120 +0,0 @@
|
||||||
sslinfo - information about current SSL certificate for PostgreSQL
|
|
||||||
==================================================================
|
|
||||||
Author: Victor Wagner <vitus@cryptocom.ru>, Cryptocom LTD
|
|
||||||
E-Mail of Cryptocom OpenSSL development group: <openssl@cryptocom.ru>
|
|
||||||
|
|
||||||
|
|
||||||
1. Notes
|
|
||||||
--------
|
|
||||||
This extension won't build unless your PostgreSQL server is configured
|
|
||||||
with --with-openssl. Information provided with these functions would
|
|
||||||
be completely useless if you don't use SSL to connect to database.
|
|
||||||
|
|
||||||
|
|
||||||
2. Functions Description
------------------------

2.1. ssl_is_used()
~~~~~~~~~~~~~~~~~~

	ssl_is_used() RETURNS boolean;

Returns TRUE if the current connection to the server uses SSL, and
FALSE otherwise.

2.2. ssl_client_cert_present()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	ssl_client_cert_present() RETURNS boolean

Returns TRUE if the current client has presented a valid SSL client
certificate to the server, and FALSE otherwise (e.g., no SSL, or no
certificate was requested by the server).

2.3. ssl_client_serial()
~~~~~~~~~~~~~~~~~~~~~~~~

	ssl_client_serial() RETURNS numeric

Returns the serial number of the current client certificate. The
combination of certificate serial number and certificate issuer is
guaranteed to uniquely identify a certificate (but not its owner: the
owner ought to change his keys regularly and get new certificates from
the issuer).

So, if you run your own CA and allow only certificates from this CA to
be accepted by the server, the serial number is the most reliable
(albeit not very mnemonic) means to identify a user.

2.4. ssl_client_dn()
~~~~~~~~~~~~~~~~~~~~

	ssl_client_dn() RETURNS text

Returns the full subject of the current client certificate, converting
character data into the current database encoding. It is assumed that
if you use non-Latin characters in the certificate names, your
database is able to represent these characters, too. If your database
uses the SQL_ASCII encoding, non-Latin characters in the name will be
represented as UTF-8 sequences.

The result looks like '/CN=Somebody /C=Some country/O=Some organization'.

2.5. ssl_issuer_dn()
~~~~~~~~~~~~~~~~~~~~

Returns the full issuer name of the client certificate, converting
character data into the current database encoding.

The combination of the return value of this function with the
certificate serial number uniquely identifies the certificate.

The result of this function is really useful only if you have more
than one trusted CA certificate in your server's root.crt file, or if
this CA has issued some intermediate certificate authority
certificates.

2.6. ssl_client_dn_field()
~~~~~~~~~~~~~~~~~~~~~~~~~~

	ssl_client_dn_field(fieldName text) RETURNS text

This function returns the value of the specified field in the
certificate subject. Field names are string constants that are
converted into ASN.1 object identifiers using the OpenSSL object
database. The following values are acceptable:

	commonName (alias CN)
	surname (alias SN)
	name
	givenName (alias GN)
	countryName (alias C)
	localityName (alias L)
	stateOrProvinceName (alias ST)
	organizationName (alias O)
	organizationUnitName (alias OU)
	title
	description
	initials
	postalCode
	streetAddress
	generationQualifier
	dnQualifier
	x500UniqueIdentifier
	pseudonym
	role
	emailAddress

All of these fields are optional, except commonName. Which of them are
included depends entirely on your CA's policy. The meaning of these
fields, however, is strictly defined by the X.500 and X.509 standards,
so you cannot just assign arbitrary meaning to them.
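The relationship between the DN string returned by ssl_client_dn() and the per-field lookup can be illustrated with a toy parser. This is only a sketch with an invented helper name: the real function reads the X.509 subject via OpenSSL rather than parsing text, and this naive split breaks if a field value itself contains '/':

```python
def dn_field(dn: str, field: str):
    """Toy lookup of a field in a '/KEY=value' DN string."""
    for part in dn.lstrip('/').split('/'):
        key, _, value = part.partition('=')
        if key == field:
            return value
    return None  # the SQL function returns NULL for an absent field

dn = '/CN=Somebody/C=Some country/O=Some organization'
assert dn_field(dn, 'CN') == 'Somebody'
assert dn_field(dn, 'OU') is None
```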

2.7. ssl_issuer_field()
~~~~~~~~~~~~~~~~~~~~~~~

	ssl_issuer_field(fieldName text) RETURNS text;

Does the same as ssl_client_dn_field, but for the certificate issuer
rather than the certificate subject.

@ -1,642 +0,0 @@
|
||||||
/*
|
|
||||||
* tablefunc
|
|
||||||
*
|
|
||||||
* Sample to demonstrate C functions which return setof scalar
|
|
||||||
* and setof composite.
|
|
||||||
* Joe Conway <mail@joeconway.com>
|
|
||||||
* And contributors:
|
|
||||||
* Nabil Sayegh <postgresql@e-trolley.de>
|
|
||||||
*
|
|
||||||
* Copyright (c) 2002-2007, PostgreSQL Global Development Group
|
|
||||||
*
|
|
||||||
* Permission to use, copy, modify, and distribute this software and its
|
|
||||||
* documentation for any purpose, without fee, and without a written agreement
|
|
||||||
* is hereby granted, provided that the above copyright notice and this
|
|
||||||
* paragraph and the following two paragraphs appear in all copies.
|
|
||||||
*
|
|
||||||
* IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
|
|
||||||
* DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
|
|
||||||
* LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
|
|
||||||
* DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
|
|
||||||
* POSSIBILITY OF SUCH DAMAGE.
|
|
||||||
*
|
|
||||||
* THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
|
|
||||||
* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
|
|
||||||
* AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
|
|
||||||
* ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
|
|
||||||
* PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
Version 0.1 (20 July, 2002):
  First release

Release Notes:

  Version 0.1
    - initial release

Installation:
  Place these files in a directory called 'tablefunc' under 'contrib' in
  the PostgreSQL source tree. Then run:

    make
    make install

  You can use tablefunc.sql to create the functions in your database of
  choice, e.g.

    psql -U postgres template1 < tablefunc.sql

  installs the following functions into database template1:

    normal_rand(int numvals, float8 mean, float8 stddev)
      - returns a set of normally distributed float8 values

    crosstabN(text sql)
      - returns a set of row_name plus N category value columns
      - crosstab2(), crosstab3(), and crosstab4() are defined for you,
        but you can create additional crosstab functions per the
        instructions in the documentation below.

    crosstab(text sql)
      - returns a set of row_name plus N category value columns
      - requires anonymous composite type syntax in the FROM clause. See
        the instructions in the documentation below.

    crosstab(text sql, N int)
      - obsolete version of crosstab()
      - the argument N is now ignored, since the number of value columns
        is always determined by the calling query

    connectby(text relname, text keyid_fld, text parent_keyid_fld
      [, text orderby_fld], text start_with, int max_depth
      [, text branch_delim])
      - returns keyid, parent_keyid, level, an optional branch string,
        and an optional serial column for ordering siblings
      - requires anonymous composite type syntax in the FROM clause. See
        the instructions in the documentation below.

Documentation
==================================================================
Name

normal_rand(int, float8, float8) - returns a set of normally
distributed float8 values

Synopsis

normal_rand(int numvals, float8 mean, float8 stddev)

Inputs

  numvals
    the number of random values to be returned from the function

  mean
    the mean of the normal distribution of values

  stddev
    the standard deviation of the normal distribution of values

Outputs

  Returns setof float8, where the returned set of random values is
  normally distributed (Gaussian distribution)

Example usage
|
|
||||||
|
|
||||||
test=# SELECT * FROM
|
|
||||||
test=# normal_rand(1000, 5, 3);
|
|
||||||
normal_rand
|
|
||||||
----------------------
|
|
||||||
1.56556322244898
|
|
||||||
9.10040991424657
|
|
||||||
5.36957140345079
|
|
||||||
-0.369151492880995
|
|
||||||
0.283600703686639
|
|
||||||
.
|
|
||||||
.
|
|
||||||
.
|
|
||||||
4.82992125404908
|
|
||||||
9.71308014517282
|
|
||||||
2.49639286969028
|
|
||||||
(1000 rows)
|
|
||||||
|
|
||||||
Returns 1000 values with a mean of 5 and a standard deviation of 3.
|
|
||||||
|
|
||||||
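The semantics above can be sketched outside SQL. The following is a minimal Python illustration (not part of the module, which is implemented in C) of drawing numvals normally distributed values; the `seed` parameter is an addition for reproducibility:

```python
import random

def normal_rand(numvals, mean, stddev, seed=None):
    """Illustrative stand-in for tablefunc's normal_rand():
    return `numvals` values drawn from N(mean, stddev)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stddev) for _ in range(numvals)]

# 1000 values with a mean of 5 and a standard deviation of 3,
# mirroring the psql example above.
values = normal_rand(1000, 5, 3, seed=42)
```

As with the SQL function, each call with no fixed seed yields a fresh random sample.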
==================================================================
Name

crosstabN(text) - returns a set of row_name plus N category value columns

Synopsis

crosstabN(text sql)

Inputs

  sql

    A SQL statement which produces the source set of data. The SQL
    statement must return one row_name column, one category column,
    and one value column. row_name and value must be of type text.

    e.g. provided sql must produce a set something like:

     row_name    cat    value
    ----------+-------+-------
      row1      cat1    val1
      row1      cat2    val2
      row1      cat3    val3
      row1      cat4    val4
      row2      cat1    val5
      row2      cat2    val6
      row2      cat3    val7
      row2      cat4    val8

Outputs

  Returns setof tablefunc_crosstab_N, which is defined by:

    CREATE TYPE tablefunc_crosstab_N AS (
      row_name TEXT,
      category_1 TEXT,
      category_2 TEXT,
        .
        .
        .
      category_N TEXT
    );

  for the default installed functions, where N is 2, 3, or 4.

  e.g. the provided crosstab2 function produces a set something like:

               <== values columns ==>
     row_name   category_1   category_2
    ----------+------------+------------
      row1       val1         val2
      row2       val5         val6

Notes

  1. The sql result must be ordered by 1,2.

  2. The number of values columns depends on the tuple description
     of the function's declared return type.

  3. Missing values (i.e. not enough adjacent rows of same row_name to
     fill the number of result values columns) are filled in with nulls.

  4. Extra values (i.e. too many adjacent rows of same row_name to fill
     the number of result values columns) are skipped.

  5. Rows with all nulls in the values columns are skipped.

  6. The installed defaults are for illustration purposes. You
     can create your own return types and functions based on the
     crosstab() function of the installed library. See below for
     details.


Example usage

create table ct(id serial, rowclass text, rowid text, attribute text, value text);
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');

select * from crosstab3(
  'select rowid, attribute, value
   from ct
   where rowclass = ''group1''
   and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;');

 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)

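The pivot rules from the notes above (ordered input, positional matching of adjacent rows, null fill, extras and all-null rows skipped) can be sketched in Python. This is an illustration of the semantics only, not the module's C implementation:

```python
from itertools import groupby

def crosstab_n(rows, n):
    """Pivot ordered (row_name, category, value) tuples into
    (row_name, value_1 .. value_n) tuples, mimicking crosstabN:
    missing values become None, surplus values are skipped."""
    out = []
    for row_name, grp in groupby(rows, key=lambda r: r[0]):
        vals = [value for _, _, value in grp][:n]   # extra values skipped
        vals += [None] * (n - len(vals))            # missing values -> nulls
        if any(v is not None for v in vals):        # all-null rows skipped
            out.append((row_name,) + tuple(vals))
    return out

# The same source set as the crosstab3 example above:
rows = [('test1', 'att2', 'val2'), ('test1', 'att3', 'val3'),
        ('test2', 'att2', 'val6'), ('test2', 'att3', 'val7')]
```

With these rows, crosstab_n(rows, 3) produces one output tuple per row_name, with the third value column null, matching the (2 rows) result shown above.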
==================================================================
Name

crosstab(text) - returns a set of row_names plus category value columns

Synopsis

crosstab(text sql)

crosstab(text sql, int N)

Inputs

  sql

    A SQL statement which produces the source set of data. The SQL
    statement must return one row_name column, one category column,
    and one value column.

    e.g. provided sql must produce a set something like:

     row_name    cat    value
    ----------+-------+-------
      row1      cat1    val1
      row1      cat2    val2
      row1      cat3    val3
      row1      cat4    val4
      row2      cat1    val5
      row2      cat2    val6
      row2      cat3    val7
      row2      cat4    val8

  N

    Obsolete argument; ignored if supplied (formerly this had to match
    the number of category columns determined by the calling query)

Outputs

  Returns setof record, which must be defined with a column definition
  in the FROM clause of the SELECT statement, e.g.:

    SELECT *
    FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text);

  the example crosstab function produces a set something like:

               <== values columns ==>
     row_name   category_1   category_2
    ----------+------------+------------
      row1       val1         val2
      row2       val5         val6

Notes

  1. The sql result must be ordered by 1,2.

  2. The number of values columns is determined by the column definition
     provided in the FROM clause. The FROM clause must define one
     row_name column (of the same datatype as the first result column
     of the sql query) followed by N category columns (of the same
     datatype as the third result column of the sql query). You can
     set up as many category columns as you wish.

  3. Missing values (i.e. not enough adjacent rows of same row_name to
     fill the number of result values columns) are filled in with nulls.

  4. Extra values (i.e. too many adjacent rows of same row_name to fill
     the number of result values columns) are skipped.

  5. Rows with all nulls in the values columns are skipped.

  6. You can avoid always having to write out a FROM clause that defines the
     output columns by setting up a custom crosstab function that has
     the desired output row type wired into its definition.

     There are two ways you can set up a custom crosstab function:

     A. Create a composite type to define your return type, similar to the
        examples in the installation script. Then define a unique function
        name accepting one text parameter and returning setof your_type_name.
        For example, if your source data produces row_names that are TEXT,
        and values that are FLOAT8, and you want 5 category columns:

          CREATE TYPE my_crosstab_float8_5_cols AS (
            row_name TEXT,
            category_1 FLOAT8,
            category_2 FLOAT8,
            category_3 FLOAT8,
            category_4 FLOAT8,
            category_5 FLOAT8
          );

          CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
          RETURNS setof my_crosstab_float8_5_cols
          AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;

     B. Use OUT parameters to define the return type implicitly.
        The same example could also be done this way:

          CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text,
            OUT row_name TEXT,
            OUT category_1 FLOAT8,
            OUT category_2 FLOAT8,
            OUT category_3 FLOAT8,
            OUT category_4 FLOAT8,
            OUT category_5 FLOAT8)
          RETURNS setof record
          AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;


Example usage

create table ct(id serial, rowclass text, rowid text, attribute text, value text);
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');

SELECT *
FROM crosstab(
  'select rowid, attribute, value
   from ct
   where rowclass = ''group1''
   and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3)
AS ct(row_name text, category_1 text, category_2 text, category_3 text);

 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)

==================================================================
Name

crosstab(text, text) - returns a set of row_name, extra, and
category value columns

Synopsis

crosstab(text source_sql, text category_sql)

Inputs

  source_sql

    A SQL statement which produces the source set of data. The SQL
    statement must return one row_name column, one category column,
    and one value column. It may also have one or more "extra" columns.

    The row_name column must be first. The category and value columns
    must be the last two columns, in that order. "extra" columns must be
    columns 2 through (N - 2), where N is the total number of columns.

    The "extra" columns are assumed to be the same for all rows with the
    same row_name. The values returned are copied from the first row
    with a given row_name and subsequent values of these columns are ignored
    until row_name changes.

    e.g. source_sql must produce a set something like:

      SELECT row_name, extra_col, cat, value FROM foo;

     row_name   extra_col    cat    value
    ----------+------------+-----+---------
      row1       extra1      cat1   val1
      row1       extra1      cat2   val2
      row1       extra1      cat4   val4
      row2       extra2      cat1   val5
      row2       extra2      cat2   val6
      row2       extra2      cat3   val7
      row2       extra2      cat4   val8

  category_sql

    A SQL statement which produces the distinct set of categories. The SQL
    statement must return one category column only. category_sql must produce
    at least one result row or an error will be generated. category_sql
    must not produce duplicate categories or an error will be generated.

    e.g. SELECT DISTINCT cat FROM foo;

      cat
    -------
      cat1
      cat2
      cat3
      cat4

Outputs

  Returns setof record, which must be defined with a column definition
  in the FROM clause of the SELECT statement, e.g.:

    SELECT * FROM crosstab(source_sql, cat_sql)
    AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);

  the example crosstab function produces a set something like:

                        <==     values columns      ==>
     row_name   extra    cat1    cat2    cat3    cat4
    ---------+--------+------+-------+------+---------
      row1     extra1   val1    val2            val4
      row2     extra2   val5    val6    val7    val8

Notes

  1. source_sql must be ordered by row_name (column 1).

  2. The number of values columns is determined at run-time. The
     column definition provided in the FROM clause must provide for
     the correct number of columns of the proper data types.

  3. Missing values (i.e. not enough adjacent rows of same row_name to
     fill the number of result values columns) are filled in with nulls.

  4. Extra values (i.e. source rows with category not found in category_sql
     result) are skipped.

  5. Rows with a null row_name column are skipped.

  6. You can create predefined functions to avoid having to write out
     the result column names/types in each query. See the examples
     for crosstab(text).


Example usage

create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');

SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
  rowid text,
  rowdt timestamp,
  temperature int4,
  test_result text,
  test_startdate timestamp,
  volts float8
);
 rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
 test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
 test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)

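The difference from the one-query form is that here the category list is fixed by category_sql, so each source row is matched to its column by category name rather than by position. A minimal Python sketch of that semantics (illustrative only, not the C implementation):

```python
from itertools import groupby

def crosstab_cat(source_rows, categories):
    """Mimic crosstab(source_sql, category_sql): pivot ordered
    (row_name, extra..., category, value) tuples against a fixed
    category list. Extras are copied from the first row per
    row_name; categories not in the list are skipped."""
    out = []
    for row_name, grp in groupby(source_rows, key=lambda r: r[0]):
        grp = list(grp)
        extras = tuple(grp[0][1:-2])                 # from first row only
        vals = {cat: val for *_, cat, val in grp}
        out.append((row_name,) + extras +
                   tuple(vals.get(c) for c in categories))
    return out

# Data shaped like the cth example above ('d1', 'd2' stand in for the dates):
src = [('test1', 'd1', 'temperature', '42'),
       ('test1', 'd1', 'test_result', 'PASS'),
       ('test1', 'd1', 'volts', '2.6987'),
       ('test2', 'd2', 'temperature', '53'),
       ('test2', 'd2', 'test_result', 'FAIL'),
       ('test2', 'd2', 'test_startdate', 'd1'),
       ('test2', 'd2', 'volts', '3.1234')]
cats = ['temperature', 'test_result', 'test_startdate', 'volts']
```

crosstab_cat(src, cats) yields one tuple per rowid with test_startdate null for test1, matching the (2 rows) result above.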
==================================================================
Name

connectby(text, text, text[, text], text, text, int[, text]) - returns a set
representing a hierarchy (tree structure)

Synopsis

connectby(text relname, text keyid_fld, text parent_keyid_fld
            [, text orderby_fld], text start_with, int max_depth
            [, text branch_delim])

Inputs

  relname

    Name of the source relation

  keyid_fld

    Name of the key field

  parent_keyid_fld

    Name of the parent-key field

  orderby_fld

    If optional ordering of siblings is desired:
    Name of the field to order siblings

  start_with

    root value of the tree, input as a text value regardless of keyid_fld type

  max_depth

    zero (0) for unlimited depth, otherwise restrict level to this depth

  branch_delim

    If an optional branch value is desired, this string is used as the
    delimiter. When not provided, a default value of '~' is used for
    internal recursion detection only, and no "branch" field is returned.

Outputs

  Returns setof record, which must be defined with a column definition
  in the FROM clause of the SELECT statement, e.g.:

    SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
      AS t(keyid text, parent_keyid text, level int, branch text);

    - or -

    SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
      AS t(keyid text, parent_keyid text, level int);

    - or -

    SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
      AS t(keyid text, parent_keyid text, level int, branch text, pos int);

    - or -

    SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
      AS t(keyid text, parent_keyid text, level int, pos int);

Notes

  1. keyid and parent_keyid must be the same data type.

  2. The column definition *must* include a third column of type INT4 for
     the level value output.

  3. If the branch field is not desired, omit both the branch_delim input
     parameter *and* the branch field in the query column definition. Note
     that when branch_delim is not provided, a default value of '~' is used
     for branch_delim for internal recursion detection, even though the branch
     field is not returned.

  4. If the branch field is desired, it must be the fourth column in the query
     column definition, and it must be of type TEXT.

  5. The parameters representing table and field names must include double
     quotes if the names are mixed-case or contain special characters.

  6. If sorting of siblings is desired, both the orderby_fld input parameter
     *and* a name for the resulting serial field (type INT4) in the query
     column definition must be given.

Example usage

CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);

INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);

-- with branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
  AS t(keyid text, parent_keyid text, level int, branch text);
 keyid | parent_keyid | level |       branch
-------+--------------+-------+---------------------
 row2  |              |     0 | row2
 row4  | row2         |     1 | row2~row4
 row6  | row4         |     2 | row2~row4~row6
 row8  | row6         |     3 | row2~row4~row6~row8
 row5  | row2         |     1 | row2~row5
 row9  | row5         |     2 | row2~row5~row9
(6 rows)

-- without branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
  AS t(keyid text, parent_keyid text, level int);
 keyid | parent_keyid | level
-------+--------------+-------
 row2  |              |     0
 row4  | row2         |     1
 row6  | row4         |     2
 row8  | row6         |     3
 row5  | row2         |     1
 row9  | row5         |     2
(6 rows)

-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
  AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos;
 keyid | parent_keyid | level |       branch        | pos
-------+--------------+-------+---------------------+-----
 row2  |              |     0 | row2                |   1
 row5  | row2         |     1 | row2~row5           |   2
 row9  | row5         |     2 | row2~row5~row9      |   3
 row4  | row2         |     1 | row2~row4           |   4
 row6  | row4         |     2 | row2~row4~row6      |   5
 row8  | row6         |     3 | row2~row4~row6~row8 |   6
(6 rows)

-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
  AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos;
 keyid | parent_keyid | level | pos
-------+--------------+-------+-----
 row2  |              |     0 |   1
 row5  | row2         |     1 |   2
 row9  | row5         |     2 |   3
 row4  | row2         |     1 |   4
 row6  | row4         |     2 |   5
 row8  | row6         |     3 |   6
(6 rows)

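The traversal connectby performs is an ordinary depth-first walk that accumulates the branch string as it descends. A minimal Python sketch of that semantics, using the same tree as the example above (illustrative only; sibling ordering here is simply insertion order, standing in for orderby_fld):

```python
def connectby(rows, start_with, max_depth=0, branch_delim='~'):
    """Illustrative depth-first walk mimicking connectby():
    rows is a list of (keyid, parent_keyid) pairs; returns
    (keyid, parent_keyid, level, branch) tuples."""
    children = {}
    for key, parent in rows:
        children.setdefault(parent, []).append(key)

    def walk(key, parent, level, branch):
        yield (key, parent, level, branch)
        if max_depth and level + 1 > max_depth:   # 0 means unlimited depth
            return
        for child in children.get(key, []):
            yield from walk(child, key, level + 1,
                            branch + branch_delim + child)

    return list(walk(start_with, None, 0, start_with))

# The connectby_tree example data, as (keyid, parent_keyid) pairs:
rows = [('row2', 'row1'), ('row3', 'row1'), ('row4', 'row2'),
        ('row5', 'row2'), ('row6', 'row4'), ('row7', 'row3'),
        ('row8', 'row6'), ('row9', 'row5')]
```

connectby(rows, 'row2') reproduces the six keyid/level/branch rows of the first query above, rooted at row2.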
==================================================================
-- Joe Conway
UUID Generation Functions
=========================

Peter Eisentraut <peter_e@gmx.net>

This module provides functions to generate universally unique
identifiers (UUIDs) using one of several standard algorithms, as
well as functions to produce certain special UUID constants.


Installation
------------

The extra library required can be found at
<http://www.ossp.org/pkg/lib/uuid/>.


UUID Generation
---------------

The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
Each of these algorithms could be suitable for a different set of
applications.

uuid_generate_v1()
~~~~~~~~~~~~~~~~~~

This function generates a version 1 UUID. This involves the MAC
address of the computer and a time stamp. Note that UUIDs of this
kind reveal the identity of the computer that created the identifier
and the time at which it did so, which might make it unsuitable for
certain security-sensitive applications.

uuid_generate_v1mc()
~~~~~~~~~~~~~~~~~~~~

This function generates a version 1 UUID but uses a random multicast
MAC address instead of the real MAC address of the computer.

uuid_generate_v3(namespace uuid, name text)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the uuid_ns_*() functions shown below. (It
could be any UUID in theory.) The name is an identifier in the
selected namespace. For example:

    uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')

The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.

The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.

uuid_generate_v4()
~~~~~~~~~~~~~~~~~~

This function generates a version 4 UUID, which is derived entirely
from random numbers.

uuid_generate_v5(namespace uuid, name text)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This function generates a version 5 UUID, which works like a version 3
UUID except that SHA-1 is used as a hashing method. Version 5 should
be preferred over version 3 because SHA-1 is thought to be more secure
than MD5.
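These algorithms map directly onto Python's standard uuid module, which can illustrate their key properties: versions 3 and 5 are pure functions of (namespace, name) and hence reproducible, while version 4 is drawn from random numbers. This illustrates the semantics only; the module itself uses the OSSP uuid library:

```python
import uuid

# Versions 3 (MD5) and 5 (SHA-1) are deterministic in (namespace, name),
# like uuid_generate_v3(uuid_ns_url(), ...) and uuid_generate_v5(...).
u3 = uuid.uuid3(uuid.NAMESPACE_URL, 'http://www.postgresql.org')
u5 = uuid.uuid5(uuid.NAMESPACE_URL, 'http://www.postgresql.org')

# Version 4 is random, like uuid_generate_v4(): two calls differ.
r1, r2 = uuid.uuid4(), uuid.uuid4()
```

Repeating the uuid3/uuid5 calls always yields the same identifiers, which is the reproducibility property noted above.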


UUID Constants
--------------

uuid_nil()

A "nil" UUID constant, which does not occur as a real UUID.

uuid_ns_dns()

Constant designating the DNS namespace for UUIDs.

uuid_ns_url()

Constant designating the URL namespace for UUIDs.

uuid_ns_oid()

Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in
PostgreSQL.)

uuid_ns_x500()

Constant designating the X.500 distinguished name (DN) namespace for
UUIDs.
$PostgreSQL: pgsql/contrib/vacuumlo/README.vacuumlo,v 1.5 2005/06/23 00:06:37 tgl Exp $

This is a simple utility that will remove any orphaned large objects from a
PostgreSQL database. An orphaned LO is considered to be any LO whose OID
does not appear in any OID data column of the database.

If you use this, you may also be interested in the lo_manage trigger in
contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs
in the first place.


Compiling
---------

Simply run make. A single executable "vacuumlo" is created.


Usage
-----

vacuumlo [options] database [database2 ... databasen]

All databases named on the command line are processed. Available options
include:

  -v          Write a lot of progress messages
  -n          Don't remove large objects, just show what would be done
  -U username Username to connect as
  -W          Prompt for password
  -h hostname Database server host
  -p port     Database server port


Method
------

First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.

It then scans through all columns in the database that are of type "oid"
or "lo", and removes matching entries from the temporary table.

The remaining entries in the temp table identify orphaned LOs. These are
removed.
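The three steps of the method reduce to a set difference. A minimal Python sketch (the OID values are hypothetical, purely for illustration):

```python
def find_orphans(all_lo_oids, referenced_oids):
    """vacuumlo's core logic as a set difference: every large-object
    OID not referenced by any oid/lo column is an orphan."""
    return set(all_lo_oids) - set(referenced_oids)

# Hypothetical data: five LOs exist, three are referenced somewhere.
orphans = find_orphans([101, 102, 103, 104, 105], [101, 103, 105])
```

Here 102 and 104 are the orphaned LOs that vacuumlo would remove.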


Notes
-----

I decided to place this in contrib as it needs further testing, but hopefully,
this (or a variant of it) would make it into the backend as a "vacuum lo"
command in a later release.

Peter Mount <peter@retep.org.uk>
http://www.retep.org.uk
March 21 1999

Committed April 10 1999 Peter
XML-handling functions for PostgreSQL
|
|
||||||
=====================================
|
|
||||||
|
|
||||||
DEPRECATION NOTICE: From PostgreSQL 8.3 on, there is XML-related
|
|
||||||
functionality based on the SQL/XML standard in the core server.
|
|
||||||
That functionality covers XML syntax checking and XPath queries,
|
|
||||||
which is what this module does as well, and more, but the API is
|
|
||||||
not at all compatible. It is planned that this module will be
|
|
||||||
removed in PostgreSQL 8.4 in favor of the newer standard API, so
|
|
||||||
you are encouraged to try converting your applications. If you
|
|
||||||
find that some of the functionality of this module is not
|
|
||||||
available in an adequate form with the newer API, please explain
|
|
||||||
your issue to pgsql-hackers@postgresql.org so that the deficiency
|
|
||||||
can be addressed.
|
|
||||||
-- Peter Eisentraut, 2007-05-24
|
|
||||||
|
|
||||||
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com)
|
|
||||||
It has the same BSD licence as PostgreSQL.
|
|
||||||
|
|
||||||
This version of the XML functions provides both XPath querying and
|
|
||||||
XSLT functionality. There is also a new table function which allows
|
|
||||||
the straightforward return of multiple XML results. Note that the current code
|
|
||||||
doesn't take any particular care over character sets - this is
|
|
||||||
something that should be fixed at some point!
|
|
||||||
|
|
||||||
Installation
|
|
||||||
------------
|
|
||||||
|
|
||||||
The current build process will only work if the files are in
|
|
||||||
contrib/xml2 in a PostgreSQL 7.3 or later source tree which has been
|
|
||||||
configured and built (If you alter the subdir value in the Makefile
|
|
||||||
you can place it in a different directory in a PostgreSQL tree).
|
|
||||||
|
|
||||||
Before you begin, just check the Makefile, and then just 'make' and
|
|
||||||
'make install'.
|
|
||||||
|
|
||||||
By default, this module requires both libxml2 and libxslt to be installed
|
|
||||||
on your system. If you do not have libxslt or do not want to use XSLT
|
|
||||||
functions, you must edit the Makefile to not build the XSLT functions,
|
|
||||||
as directed in its comments; and edit pgxml.sql.in to remove the XSLT
|
|
||||||
function declarations, as directed in its comments.
|
|
||||||
|
|
||||||
Description of functions
|
|
||||||
------------------------
|
|
||||||
|
|
||||||
The first set of functions are straightforward XML parsing and XPath queries:
|
|
||||||
|
|
||||||
xml_is_well_formed(document) RETURNS bool
|
|
||||||
|
|
||||||
This parses the document text in its parameter and returns true if the
|
|
||||||
document is well-formed XML. (Note: before PostgreSQL 8.2, this function
|
|
||||||
was called xml_valid(). That is the wrong name since validity and
|
|
||||||
well-formedness have different meanings in XML. The old name is still
|
|
||||||
available, but is deprecated and will be removed in 8.3.)
|
|
||||||
|
|
||||||
xpath_string(document,query) RETURNS text
|
|
||||||
xpath_number(document,query) RETURNS float4
|
|
||||||
xpath_bool(document,query) RETURNS bool
|
|
||||||
|
|
||||||
These functions evaluate the XPath query on the supplied document, and
|
|
||||||
cast the result to the specified type.

xpath_nodeset(document,query,toptag,itemtag) RETURNS text

This evaluates the query on the document and wraps the result in XML tags. If
the result is multivalued, the output will look like:

<toptag>
<itemtag>Value 1 which could be an XML fragment</itemtag>
<itemtag>Value 2....</itemtag>
</toptag>

If either toptag or itemtag is an empty string, the relevant tag is omitted.
There are also wrapper functions for this operation:

xpath_nodeset(document,query) RETURNS text omits both tags.
xpath_nodeset(document,query,itemtag) RETURNS text omits toptag.

xpath_list(document,query,separator) RETURNS text

This function returns multiple values separated by the specified
separator, e.g. Value 1,Value 2,Value 3 if separator=','.

xpath_list(document,query) RETURNS text

This is a wrapper for the above function that uses ',' as the separator.
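A hypothetical pair of calls (literal document, illustrative only; the exact
output depends on the node values matched):

```sql
-- Wrap each match of /doc/k/text() in <item> tags inside a <set> wrapper:
SELECT xpath_nodeset('<doc><k>a</k><k>b</k></doc>', '/doc/k/text()', 'set', 'item');

-- Return the same matches as a single delimited string, using ';' :
SELECT xpath_list('<doc><k>a</k><k>b</k></doc>', '/doc/k/text()', ';');
```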

xpath_table
-----------

This is a table function which evaluates a set of XPath queries on
each of a set of documents and returns the results as a table. The
primary key field from the original document table is returned as the
first column of the result so that the resultset from xpath_table can
be readily used in joins.

The function itself takes 5 arguments, all text.

xpath_table(key,document,relation,xpaths,criteria)

key - the name of the "key" field - this is just a field to be used as
the first column of the output table, i.e. it identifies the record from
which each output row came (see note below about multiple values).

document - the name of the field containing the XML document

relation - the name of the table or view containing the documents

xpaths - multiple xpath expressions separated by |

criteria - the contents of the where clause. This needs to be specified,
so use "true" or "1=1" here if you want to process all the rows in the
relation.

NB These parameters (except the XPath strings) are just substituted
into a plain SQL SELECT statement, so you have some flexibility - the
statement is

SELECT <key>,<document> FROM <relation> WHERE <criteria>

so those parameters can be *anything* valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
validate any user-supplied values to avoid SQL injection attacks.

Using the function

The function has to be used in a FROM expression. This gives the following
form:

SELECT * FROM
xpath_table('article_id',
            'article_xml',
            'articles',
            '/article/author|/article/pages|/article/title',
            'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);

The AS clause defines the names and types of the columns in the
virtual table. If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.

Note that I've said in this example that pages is an integer. The
function deals internally with string representations, so when you say
you want an integer in the output, it will take the string
representation of the XPath result and use PostgreSQL input functions
to transform it into an integer (or whatever type the AS clause
requests). An error will result if it can't do this - for example if
the result is empty - so you may wish to just stick to 'text' as the
column type if you think your data has any problems.

The select statement doesn't need to use * alone - it can reference the
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:

SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id','article_xml','articles',
                 '/article/title|/article/author/@id',
                 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
     AS t(article_id integer, title text, author_id integer),
     tblPeopleInfo AS p
WHERE t.author_id = p.person_id;

as a more complicated example. Of course, you could wrap all
of this in a view for convenience.

Multivalued results

The xpath_table function assumes that the results of each XPath query
might be multi-valued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.

In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier) - if used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:

CREATE TABLE test
(
  id int4 NOT NULL,
  xml text,
  CONSTRAINT pk PRIMARY KEY (id)
)
WITHOUT OIDS;

INSERT INTO test VALUES (1, '<doc num="C1">
<line num="L1"><a>1</a><b>2</b><c>3</c></line>
<line num="L2"><a>11</a><b>22</b><c>33</c></line>
</doc>');

INSERT INTO test VALUES (2, '<doc num="C2">
<line num="L1"><a>111</a><b>222</b><c>333</c></line>
<line num="L2"><a>111</a><b>222</b><c>333</c></line>
</doc>');

The query:

SELECT * FROM xpath_table('id','xml','test',
'/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num

Gives the result:

 id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
  1 | C1      | L1       |    1 |    2 |    3
  1 |         | L2       |   11 |   22 |   33

To get doc_num on every line, the solution is to use two invocations
of xpath_table and join the results:

SELECT t.*,i.doc_num FROM
xpath_table('id','xml','test',
'/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
xpath_table('id','xml','test','/doc/@num','1=1')
AS i(id int4, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;

which gives the desired result:

 id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
  1 | L1       |    1 |    2 |    3 | C1
  1 | L2       |   11 |   22 |   33 | C1
(2 rows)

XSLT functions
--------------

The following functions are available if libxslt is installed (this is
not currently detected automatically, so you will have to amend the
Makefile):

xslt_process(document,stylesheet,paramlist) RETURNS text

This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
'a=1,b=2'. Note that this is also proof-of-concept code and the
parameter parsing is very simple-minded (e.g. parameter values cannot
contain commas!)

Also note that if either the document or stylesheet values do not
begin with a < then they will be treated as URLs and libxslt will
fetch them. It thus follows that you can use xslt_process as a means
to fetch the contents of URLs - you should be aware of the security
implications of this.

There is also a two-parameter version of xslt_process which does not
pass any parameters to the transformation.
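A minimal, hypothetical invocation (inline literal document and stylesheet;
note the doubled single quotes required inside SQL string literals):

```sql
SELECT xslt_process(
  '<doc><name>cat</name></doc>',
  '<xsl:stylesheet version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="text"/>
     <xsl:template match="/">Hello, <xsl:value-of select="/doc/name"/>!</xsl:template>
   </xsl:stylesheet>');
```

Because both arguments start with '<', they are treated as literal XML
rather than URLs to fetch.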

Feedback
--------

If you have any comments or suggestions, please do contact me at
jgray@azuli.co.uk. Unfortunately, this isn't my main job, so I can't
guarantee a rapid response to your query!
@ -0,0 +1,32 @@

<sect1>
 <title>adminpack</title>
 <para>
  adminpack is a PostgreSQL standard module that implements a number of
  support functions which pgAdmin and other administration and management tools
  can use to provide additional functionality if installed on a server.
 </para>

 <sect2>
  <title>Functions implemented</title>
  <para>
   Functions implemented by adminpack can only be run by a superuser. Here's a
   list of these functions:
  </para>
  <para>
   <programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()

/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
   </programlisting>
  </para>
 </sect2>

</sect1>
@ -0,0 +1,40 @@

<sect1>
 <!--
 <indexterm zone="btree-gist">
  <primary>btree-gist</primary>
 </indexterm>
 -->

 <title>btree-gist</title>

 <para>
  btree-gist is a B-Tree implementation using GiST that supports the int2,
  int4, int8, float4, float8, timestamp with/without time zone, time
  with/without time zone, date, interval, oid, money, macaddr, char,
  varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
 </para>

 <sect2>
  <title>Example usage</title>
  <programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a < 10;
  </programlisting>
 </sect2>

 <sect2>
  <title>Authors</title>
  <para>
   All work was done by Teodor Sigaev (<email>teodor@stack.net</email>),
   Oleg Bartunov (<email>oleg@sai.msu.su</email>) and Janko Richter
   (<email>jankorichter@yahoo.de</email>). See
   <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
   additional information.
  </para>
 </sect2>

</sect1>
@ -1,37 +1,32 @@
-Pg_buffercache - Real time queries on the shared buffer cache.
---------------
-
-This module consists of a C function 'pg_buffercache_pages()' that returns
-a set of records, plus a view 'pg_buffercache' to wrapper the function.
-
-The intent is to do for the buffercache what pg_locks does for locks, i.e -
-ability to examine what is happening at any given time without having to
-restart or rebuild the server with debugging code added.
-
+<sect1 id="buffercache">
+<title>pg_buffercache</title>
+
+<indexterm zone="buffercache">
+<primary>pg_buffercache</primary>
+</indexterm>
+
+<para>
+<literal>pg_buffercache</literal> module provides the means for examining
+what's happening to the buffercache at any given time without having to
+restart or rebuild the server with debugging code added. The intent is to
+do for the buffercache what pg_locks does for locks.
+</para>
+<para>
+This module consists of a C function <literal>pg_buffercache_pages()</literal>
+that returns a set of records, plus a view <literal>pg_buffercache</literal>
+to wrapper the function.
+</para>
+<para>
 By default public access is REVOKED from both of these, just in case there
 are security issues lurking.
+</para>
 
-Installation
-------------
-
-Build and install the main Postgresql source, then this contrib module:
-
-$ cd contrib/pg_buffercache
-$ gmake
-$ gmake install
-
-To register the functions:
-
-$ psql -d <database> -f pg_buffercache.sql
-
-Notes
------
-
+<sect2>
+<title>Notes</title>
+<para>
 The definition of the columns exposed in the view is:
+</para>
+<programlisting>
 
 Column         | references           | Description
----------------+----------------------+------------------------------------
 bufferid       |                      | Id, 1..shared_buffers.

@ -41,23 +36,27 @@ Notes
 relblocknumber |                      | Offset of the page in the relation.
 isdirty        |                      | Is the page dirty?
 usagecount     |                      | Page LRU count
+</programlisting>
+<para>
+There is one row for each buffer in the shared cache. Unused buffers are
+shown with all fields null except bufferid.
+</para>
+<para>
+Because the cache is shared by all the databases, there are pages from
+relations not belonging to the current database.
+</para>
+<para>
+When the pg_buffercache view is accessed, internal buffer manager locks are
+taken, and a copy of the buffer cache data is made for the view to display.
+This ensures that the view produces a consistent set of results, while not
+blocking normal buffer activity longer than necessary. Nonetheless there
+could be some impact on database performance if this view is read often.
+</para>
+</sect2>
 
-There is one row for each buffer in the shared cache. Unused buffers are
-shown with all fields null except bufferid.
-
-Because the cache is shared by all the databases, there are pages from
-relations not belonging to the current database.
-
-When the pg_buffercache view is accessed, internal buffer manager locks are
-taken, and a copy of the buffer cache data is made for the view to display.
-This ensures that the view produces a consistent set of results, while not
-blocking normal buffer activity longer than necessary. Nonetheless there
-could be some impact on database performance if this view is read often.
-
-Sample output
--------------
-
+<sect2>
+<title>Sample output</title>
+<programlisting>
 regression=# \d pg_buffercache;
          View "public.pg_buffercache"
      Column     |  Type   | Modifiers

@ -98,18 +97,25 @@ Sample output
 (10 rows)
 
 regression=#
+</programlisting>
+</sect2>
+
+<sect2>
+<title>Authors</title>
+<itemizedlist>
+ <listitem>
+  <para>
+   Mark Kirkwood <email>markir@paradise.net.nz</email>
+  </para>
+ </listitem>
+ <listitem>
+  <para>Design suggestions: Neil Conway <email>neilc@samurai.com</email></para>
+ </listitem>
+ <listitem>
+  <para>Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email></para>
+ </listitem>
+</itemizedlist>
+</sect2>
 
-Author
-------
-
-* Mark Kirkwood <markir@paradise.net.nz>
-
-Help
-----
-
-* Design suggestions : Neil Conway <neilc@samurai.com>
-* Debugging advice : Tom Lane <tgl@sss.pgh.pa.us>
-
-Thanks guys!
+</sect1>
@ -0,0 +1,84 @@

<sect1 id="chkpass">
 <title>chkpass</title>

 <!--
 <indexterm zone="chkpass">
  <primary>chkpass</primary>
 </indexterm>
 -->
 <para>
  chkpass is a password type that is automatically checked and converted upon
  entry. It is stored encrypted. To compare, simply compare against a clear
  text password and the comparison function will encrypt it before comparing.
  It also returns an error if the code determines that the password is easily
  crackable. This is currently a stub that does nothing.
 </para>

 <para>
  Note that the chkpass data type is not indexable.
  <!--
  I haven't worried about making this type indexable. I doubt that anyone
  would ever need to sort a file in order of encrypted password.
  -->
 </para>

 <para>
  If you precede the string with a colon, the encryption and checking are
  skipped so that you can enter existing passwords into the field.
 </para>

 <para>
  On output, a colon is prepended. This makes it possible to dump and reload
  passwords without re-encrypting them. If you want the password (encrypted)
  without the colon then use the raw() function. This allows you to use the
  type with things like Apache's Auth_PostgreSQL module.
 </para>

 <para>
  The encryption uses the standard Unix function crypt(), and so it suffers
  from all the usual limitations of that function; notably that only the
  first eight characters of a password are considered.
 </para>

 <para>
  Here is some sample usage:
 </para>

 <programlisting>
test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
INSERT 0 1
test=# select * from test;
       p
----------------
 :dVGkpXdOrE3ko
(1 row)

test=# select raw(p) from test;
      raw
---------------
 dVGkpXdOrE3ko
(1 row)

test=# select p = 'hello' from test;
 ?column?
----------
 t
(1 row)

test=# select p = 'goodbye' from test;
 ?column?
----------
 f
(1 row)
 </programlisting>

 <sect2>
  <title>Author</title>
  <para>
   D'Arcy J.M. Cain <email>darcy@druid.net</email>
  </para>
 </sect2>
</sect1>
@ -0,0 +1,56 @@

<chapter id="contrib">
 <title>Standard Modules</title>

 <para>
  This section contains information regarding the standard modules which
  can be found in the <literal>contrib</literal> directory of the
  PostgreSQL distribution. These are porting tools, analysis utilities,
  and plug-in features that are not part of the core PostgreSQL system,
  mainly because they address a limited audience or are too experimental
  to be part of the main source tree. This does not preclude their
  usefulness.
 </para>

 <para>
  Some modules supply new user-defined functions, operators, or types. In
  these cases, you will need to run <literal>make</literal> and <literal>make
  install</literal> in <literal>contrib/module</literal>. After you have
  installed the files, you need to register the new entities in the database
  system by running the commands in the supplied .sql file. For example,

  <programlisting>
$ psql -d dbname -f module.sql
  </programlisting>
 </para>

 &adminpack;
 &btree-gist;
 &chkpass;
 &cube;
 &dblink;
 &earthdistance;
 &fuzzystrmatch;
 &hstore;
 &intagg;
 &intarray;
 &isn;
 &lo;
 &ltree;
 &oid2name;
 &pageinspect;
 &pgbench;
 &buffercache;
 &pgcrypto;
 &freespacemap;
 &pgrowlocks;
 &standby;
 &pgstattuple;
 &trgm;
 &seg;
 &sslinfo;
 &tablefunc;
 &uuid-ossp;
 &vacuumlo;
 &xml2;
</chapter>
@ -0,0 +1,529 @@
|
||||||
|
|
||||||
|
<sect1 id="cube">
|
||||||
|
<title>cube</title>
|
||||||
|
|
||||||
|
<indexterm zone="cube">
|
||||||
|
<primary>cube</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
This module contains the user-defined type, CUBE, representing
|
||||||
|
multidimensional cubes.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Syntax</title>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The following are valid external representations for the CUBE type:
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title>Cube external representations</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>'x'</entry>
|
||||||
|
<entry>A floating point value representing a one-dimensional point or
|
||||||
|
one-dimensional zero length cubement
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x)'</entry>
|
||||||
|
<entry>Same as above</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'x1,x2,x3,...,xn'</entry>
|
||||||
|
<entry>A point in n-dimensional space, represented internally as a zero
|
||||||
|
volume box
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x1,x2,x3,...,xn)'</entry>
|
||||||
|
<entry>Same as above</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x),(y)'</entry>
|
||||||
|
<entry>1-D cubement starting at x and ending at y or vice versa; the
|
||||||
|
order does not matter
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
|
||||||
|
<entry>n-dimensional box represented by a pair of its opposite corners, no
|
||||||
|
matter which. Functions take care of swapping to achieve "lower left --
|
||||||
|
upper right" representation before computing any values
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Grammar</title>
|
||||||
|
<table>
|
||||||
|
<title>Cube Grammar Rules</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>rule 1</entry>
|
||||||
|
<entry>box -> O_BRACKET paren_list COMMA paren_list C_BRACKET</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 2</entry>
|
||||||
|
<entry>box -> paren_list COMMA paren_list</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 3</entry>
|
||||||
|
<entry>box -> paren_list</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 4</entry>
|
||||||
|
<entry>box -> list</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 5</entry>
|
||||||
|
<entry>paren_list -> O_PAREN list C_PAREN</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 6</entry>
|
||||||
|
<entry>list -> FLOAT</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 7</entry>
|
||||||
|
<entry>list -> list COMMA FLOAT</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Tokens</title>
|
||||||
|
<table>
|
||||||
|
<title>Cube Grammar Rules</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>n</entry>
|
||||||
|
<entry>[0-9]+</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>i</entry>
|
||||||
|
<entry>nteger [+-]?{n}</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>real</entry>
|
||||||
|
<entry>[+-]?({n}\.{n}?|\.{n})</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>FLOAT</entry>
|
||||||
|
<entry>({integer}|{real})([eE]{integer})?</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>O_BRACKET</entry>
|
||||||
|
<entry>\[</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>C_BRACKET</entry>
|
||||||
|
<entry>\]</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>O_PAREN</entry>
|
||||||
|
<entry>\(</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>C_PAREN</entry>
|
||||||
|
<entry>\)</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>COMMA</entry>
|
||||||
|
<entry>\,</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Examples</title>
|
||||||
|
<table>
|
||||||
|
<title>Examples</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>'x'</entry>
|
||||||
|
<entry>A floating point value representing a one-dimensional point
|
||||||
|
(or, zero-length one-dimensional interval)
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x)'</entry>
|
||||||
|
<entry>Same as above</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'x1,x2,x3,...,xn'</entry>
|
||||||
|
<entry>A point in n-dimensional space,represented internally as a zero
|
||||||
|
volume cube
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x1,x2,x3,...,xn)'</entry>
|
||||||
|
<entry>Same as above</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x),(y)'</entry>
|
||||||
|
<entry>A 1-D interval starting at x and ending at y or vice versa; the
|
||||||
|
order does not matter
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'[(x),(y)]'</entry>
|
||||||
|
<entry>Same as above</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
|
||||||
|
<entry>An n-dimensional box represented by a pair of its diagonally
|
||||||
|
opposite corners, regardless of order. Swapping is provided
|
||||||
|
by all comarison routines to ensure the
|
||||||
|
"lower left -- upper right" representation
|
||||||
|
before actaul comparison takes place.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>'[(x1,...,xn),(y1,...,yn)]'</entry>
|
||||||
|
<entry>Same as above</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
<para>
|
||||||
|
White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
<sect2>
|
||||||
|
<title>Defaults</title>
|
||||||
|
<para>
|
||||||
|
I believe this union:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
select cube_union('(0,5,2),(2,3,1)','0');
|
||||||
|
cube_union
|
||||||
|
-------------------
|
||||||
|
(0, 0, 0),(2, 5, 2)
|
||||||
|
(1 row)
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
does not contradict to the common sense, neither does the intersection
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<programlisting>
|
||||||
|
select cube_inter('(0,-1),(1,1)','(-2),(2)');
|
||||||
|
cube_inter
|
||||||
|
-------------
|
||||||
|
(0, 0),(1, 0)
|
||||||
|
(1 row)
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
In all binary operations on differently sized boxes, I assume the smaller
|
||||||
|
one to be a cartesian projection, i. e., having zeroes in place of coordinates
|
||||||
|
omitted in the string representation. The above examples are equivalent to:
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<programlisting>
|
||||||
|
cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
|
||||||
|
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The following containment predicate uses the point syntax,
|
||||||
|
while in fact the second argument is internally represented by a box.
|
||||||
|
This syntax makes it unnecessary to define the special Point type
|
||||||
|
and functions for (box,point) predicates.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<programlisting>
|
||||||
|
select cube_contains('(0,0),(1,1)', '0.5,0.5');
|
||||||
|
cube_contains
|
||||||
|
--------------
|
||||||
|
t
|
||||||
|
(1 row)
|
||||||
|
</programlisting>
|
||||||
|
</sect2>
|
||||||
|
<sect2>
|
||||||
|
<title>Precision</title>
|
||||||
|
<para>
|
||||||
|
Values are stored internally as 64-bit floating point numbers. This means that
|
||||||
|
numbers with more than about 16 significant digits will be truncated.
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Usage</title>
|
||||||
|
<para>
|
||||||
|
The access method for CUBE is a GiST index (gist_cube_ops), which is a
|
||||||
|
generalization of R-tree. GiSTs allow the postgres implementation of
|
||||||
|
R-tree, originally encoded to support 2-D geometric types such as
|
||||||
|
boxes and polygons, to be used with any data type whose data domain
|
||||||
|
can be partitioned using the concepts of containment, intersection and
|
||||||
|
equality. In other words, everything that can intersect or contain
|
||||||
|
its own kind can be indexed with a GiST. That includes, among other
|
||||||
|
things, all geometric data types, regardless of their dimensionality
|
||||||
|
(see also contrib/seg).
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The operators supported by the GiST access method include:
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<programlisting>
a = b          Same as
</programlisting>
<para>
The cubes a and b are identical.
</para>

<programlisting>
a &amp;&amp; b         Overlaps
</programlisting>
<para>
The cubes a and b overlap.
</para>

<programlisting>
a @&gt; b         Contains
</programlisting>
<para>
The cube a contains the cube b.
</para>

<programlisting>
a &lt;@ b         Contained in
</programlisting>
<para>
The cube a is contained in the cube b.
</para>

<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>

<para>
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
</para>

<para>
Other operators:
</para>

<programlisting>
[a, b] &lt; [c, d]    Less than
[a, b] &gt; [c, d]    Greater than
</programlisting>

<para>
These operators do not make a lot of sense for any practical
purpose but sorting. They first compare (a) to (c),
and if these are equal, compare (b) to (d). This gives
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
</para>
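<para>
For example (using a hypothetical table <literal>boxes</literal> with a
cube column <literal>c</literal>):
</para>

<programlisting>
SELECT c FROM boxes ORDER BY c;
</programlisting>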

<para>
The following functions are available:
</para>

<table>
<title>Functions available</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>cube_distance(cube, cube) returns double</literal></entry>
<entry>cube_distance returns the distance between two cubes. If both
cubes are points, this is the normal distance function.
</entry>
</row>
<row>
<entry><literal>cube(float8) returns cube</literal></entry>
<entry>This makes a one dimensional cube with both coordinates the same.
If the type of the argument is a numeric type other than float8 an
explicit cast to float8 may be needed.
<literal>cube(1) == '(1)'</literal>
</entry>
</row>

<row>
<entry><literal>cube(float8, float8) returns cube</literal></entry>
<entry>
This makes a one dimensional cube.
<literal>cube(1,2) == '(1),(2)'</literal>
</entry>
</row>

<row>
<entry><literal>cube(float8[]) returns cube</literal></entry>
<entry>This makes a zero-volume cube using the coordinates
defined by the array. <literal>cube(ARRAY[1,2]) == '(1,2)'</literal>
</entry>
</row>

<row>
<entry><literal>cube(float8[], float8[]) returns cube</literal></entry>
<entry>This makes a cube with upper right and lower left
coordinates as defined by the two float arrays. The arrays must be of the
same length.
<literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
</literal>
</entry>
</row>

<row>
<entry><literal>cube(cube, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube, with the same value for both parts of the new coordinate.
This is useful for building cubes piece by piece from calculated values.
<literal>cube('(1)',2) == '(1,2),(1,2)'</literal>
</entry>
</row>

<row>
<entry><literal>cube(cube, float8, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube. This is useful for building cubes piece by piece from
calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal>
</entry>
</row>

<row>
<entry><literal>cube_dim(cube) returns int</literal></entry>
<entry>cube_dim returns the number of dimensions stored in the
data structure for a cube. This is useful for constraints on the
dimensions of a cube.
</entry>
</row>

<row>
<entry><literal>cube_ll_coord(cube, int) returns double</literal></entry>
<entry>
cube_ll_coord returns the n-th coordinate value for the lower left
corner of a cube. This is useful for doing coordinate transformations.
</entry>
</row>

<row>
<entry><literal>cube_ur_coord(cube, int) returns double</literal></entry>
<entry>cube_ur_coord returns the n-th coordinate value for the
upper right corner of a cube. This is useful for doing coordinate
transformations.
</entry>
</row>

<row>
<entry><literal>cube_subset(cube, int[]) returns cube</literal></entry>
<entry>Builds a new cube from an existing cube, using a list of
dimension indexes from an array. Can be used to find both the ll and ur
coordinates of a single dimension, e.g.:
cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'.
Can also be used to drop dimensions, or reorder them as desired, e.g.:
cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) =
'(5, 3, 1, 1),(8, 7, 6, 6)'
</entry>
</row>

<row>
<entry><literal>cube_is_point(cube) returns bool</literal></entry>
<entry>cube_is_point returns true if a cube is also a point.
This is true when the two defining corners are the same.</entry>
</row>

<row>
<entry><literal>cube_enlarge(cube, double, int) returns cube</literal></entry>
<entry>
cube_enlarge increases the size of a cube by a specified
radius in at least n dimensions. If the radius is negative the box is
shrunk instead. This is useful for creating bounding boxes around a point
for searching for nearby points. All defined dimensions are changed by
the radius r. If n is greater than the number of defined dimensions and
the cube is being increased (r &gt;= 0) then 0 is used as the base for
the extra coordinates. LL coordinates are decreased by r and UR
coordinates are increased by r. If a LL coordinate is increased to
larger than the corresponding UR coordinate (this can only happen when
r &lt; 0) then both coordinates are set to their average. To make it
harder for people to break things, there is an effective maximum of 100
on the dimension of cubes; change it in cubedata.h if you need something
bigger.
</entry>
</row>
</tbody>
</tgroup>
</table>
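<para>
As an illustrative sketch, cube_enlarge can grow a 2-D unit square by a
radius of 0.5 in both dimensions (the result follows the LL/UR rules
described above):
</para>

<programlisting>
SELECT cube_enlarge('(0,0),(1,1)'::cube, 0.5, 2);
</programlisting>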

<para>
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
</para>

<para>
For examples of usage, see sql/cube.sql.
</para>
</sect2>

<sect2>
<title>Credits</title>
<para>
This code is essentially based on the example written for
Illustra, <ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink>.
</para>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>), and
to his former student, Andy Dong
(<ulink url="http://best.me.berkeley.edu/~adong/"></ulink>), for his exemplar.
I am also grateful to all postgres developers, present and past, for enabling
me to create my own world and live undisturbed in it. And I would like to
acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
for the years of faithful support of my database research.
</para>

<para>
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
<email>selkovjr@mcs.anl.gov</email>
</para>

<para>
Minor updates to this package were made by Bruno Wolff III
<email>bruno@wolff.to</email> in August/September of 2002. These include
changing the precision from single precision to double precision and adding
some new functions.
</para>

<para>
Additional updates were made by Joshua Reich <email>josh@root.net</email> in
July 2006. These include <literal>cube(float8[], float8[])</literal> and
cleaning up the code to use the V1 call protocol instead of the deprecated V0
form.
</para>
</sect2>
</sect1>

File diff suppressed because it is too large.

@ -0,0 +1,133 @@
<sect1 id="earthdistance">
<title>earthdistance</title>

<indexterm zone="earthdistance">
<primary>earthdistance</primary>
</indexterm>

<para>
This module contains two different approaches to calculating
great circle distances on the surface of the Earth. The one described
first depends on the contrib/cube package (which MUST be installed before
earthdistance is installed). The second one is based on the point
datatype, using latitude and longitude for the coordinates. The install
script makes the defined functions executable by anyone.
</para>
<para>
A spherical model of the Earth is used.
</para>
<para>
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the distance from the center of the Earth.
</para>
<para>
The radius of the Earth is obtained from the earth() function. It is
given in meters. But by changing this one function you can change the
module to use some other units, or to use a different value of the radius
that you feel is more appropriate.
</para>
<para>
This package also has applications to astronomical databases.
Astronomers will probably want to change earth() to return a radius of
180/pi() so that distances are in degrees.
</para>

<para>
Functions are provided to allow for input in latitude and longitude (in
degrees), to allow for output of latitude and longitude, to calculate
the great circle distance between two points, and to easily specify a
bounding box usable for index searches.
</para>
<para>
The functions are all 'sql' functions. If you want to make these functions
executable by other people you will also have to make the referenced
cube functions executable. cube(text), cube(float8), cube(cube,float8),
cube_distance(cube,cube), cube_ll_coord(cube,int) and
cube_enlarge(cube,float8,int) are used indirectly by the earth distance
functions. is_point(cube) and cube_dim(cube) are used in constraints for data
in the earth domain. cube_ur_coord(cube,int) is used in the regression tests
and might be useful for looking at bounding box coordinates in user
applications.
</para>
<para>
A domain of type cube named earth is defined.
It has constraints defined on it to make sure the cube is a point,
that it does not have more than 3 dimensions, and that it is very near
the surface of a sphere centered about the origin with the radius of
the Earth.
</para>
<para>
The following functions are provided:
</para>

<table id="earthdistance-functions">
<title>EarthDistance functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>earth()</literal></entry>
<entry>Returns the radius of the Earth in meters.</entry>
</row>
<row>
<entry><literal>sec_to_gc(float8)</literal></entry>
<entry>Converts the normal straight line
(secant) distance between two points on the surface of the Earth
to the great circle distance between them.
</entry>
</row>
<row>
<entry><literal>gc_to_sec(float8)</literal></entry>
<entry>Converts the great circle distance
between two points on the surface of the Earth to the normal straight line
(secant) distance between them.
</entry>
</row>
<row>
<entry><literal>ll_to_earth(float8, float8)</literal></entry>
<entry>Returns the location of a point on the surface of the Earth given
its latitude (argument 1) and longitude (argument 2) in degrees.
</entry>
</row>
<row>
<entry><literal>latitude(earth)</literal></entry>
<entry>Returns the latitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>longitude(earth)</literal></entry>
<entry>Returns the longitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>earth_distance(earth, earth)</literal></entry>
<entry>Returns the great circle distance between two points on the
surface of the Earth.
</entry>
</row>
<row>
<entry><literal>earth_box(earth, float8)</literal></entry>
<entry>Returns a box suitable for an indexed search using the cube @&gt;
operator for points within a given great circle distance of a location.
Some points in this box are further than the specified great circle
distance from the location, so a second check using earth_distance
should be made at the same time.
</entry>
</row>
<row>
<entry><literal>&lt;@&gt;</literal> operator</entry>
<entry>Gives the distance in statute miles between
two points on the Earth's surface. Coordinates are in degrees. Points are
taken as (longitude, latitude) and not vice versa, as longitude is closer
to the intuitive idea of x-axis and latitude to y-axis.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
One advantage of using the cube representation over a point using latitude and
longitude for coordinates is that you don't have to worry about special
conditions at +/- 180 degrees of longitude or near the poles.
</para>
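<para>
For example, the cube-based functions compose to give the great circle
distance in meters between two points specified in degrees (the
coordinates below are arbitrary sample values):
</para>

<programlisting>
SELECT earth_distance(ll_to_earth(41.8, -87.6), ll_to_earth(40.7, -74.0));
</programlisting>

<para>
An indexed radius search would use earth_box as the coarse filter and
earth_distance to discard the false positives it can return.
</para>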
</sect1>

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.51 2007/11/01 17:00:18 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.52 2007/11/10 23:30:46 momjian Exp $ -->

<!entity history SYSTEM "history.sgml">
<!entity info SYSTEM "info.sgml">

@ -89,6 +89,38 @@
<!entity sources SYSTEM "sources.sgml">
<!entity storage SYSTEM "storage.sgml">

<!-- contrib information -->
<!entity contrib SYSTEM "contrib.sgml">
<!entity adminpack SYSTEM "adminpack.sgml">
<!entity btree-gist SYSTEM "btree-gist.sgml">
<!entity chkpass SYSTEM "chkpass.sgml">
<!entity cube SYSTEM "cube.sgml">
<!entity dblink SYSTEM "dblink.sgml">
<!entity earthdistance SYSTEM "earthdistance.sgml">
<!entity fuzzystrmatch SYSTEM "fuzzystrmatch.sgml">
<!entity hstore SYSTEM "hstore.sgml">
<!entity intagg SYSTEM "intagg.sgml">
<!entity intarray SYSTEM "intarray.sgml">
<!entity isn SYSTEM "isn.sgml">
<!entity lo SYSTEM "lo.sgml">
<!entity ltree SYSTEM "ltree.sgml">
<!entity oid2name SYSTEM "oid2name.sgml">
<!entity pageinspect SYSTEM "pageinspect.sgml">
<!entity pgbench SYSTEM "pgbench.sgml">
<!entity buffercache SYSTEM "buffercache.sgml">
<!entity pgcrypto SYSTEM "pgcrypto.sgml">
<!entity freespacemap SYSTEM "freespacemap.sgml">
<!entity pgrowlocks SYSTEM "pgrowlocks.sgml">
<!entity standby SYSTEM "standby.sgml">
<!entity pgstattuple SYSTEM "pgstattuple.sgml">
<!entity trgm SYSTEM "trgm.sgml">
<!entity seg SYSTEM "seg.sgml">
<!entity sslinfo SYSTEM "sslinfo.sgml">
<!entity tablefunc SYSTEM "tablefunc.sgml">
<!entity uuid-ossp SYSTEM "uuid-ossp.sgml">
<!entity vacuumlo SYSTEM "vacuumlo.sgml">
<!entity xml2 SYSTEM "xml2.sgml">

<!-- appendixes -->
<!entity contacts SYSTEM "contacts.sgml">
<!entity cvs SYSTEM "cvs.sgml">

@ -0,0 +1,243 @@

<sect1 id="pgfreespacemap">
<title>pgfreespacemap</title>

<indexterm zone="pgfreespacemap">
<primary>pgfreespacemap</primary>
</indexterm>

<para>
This module provides the means for examining the free space map (FSM). It
consists of two C functions, <literal>pg_freespacemap_relations()</literal>
and <literal>pg_freespacemap_pages()</literal>, that return a set
of records, plus two views, <literal>pg_freespacemap_relations</literal> and
<literal>pg_freespacemap_pages</literal>, for more user-friendly access to
the functions.
</para>
<para>
The module provides the ability to examine the contents of the free space
map without having to restart or rebuild the server with additional
debugging code.
</para>
<para>
By default, public access is REVOKED from the functions and views, just in
case there are security issues present in the code.
</para>

<sect2>
<title>Notes</title>
<para>
The definitions for the columns exposed in the views are:
</para>

<table>
<title>pg_freespacemap_relations</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
<entry>References</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>reltablespace</entry>
<entry>pg_tablespace.oid</entry>
<entry>Tablespace oid of the relation.</entry>
</row>
<row>
<entry>reldatabase</entry>
<entry>pg_database.oid</entry>
<entry>Database oid of the relation.</entry>
</row>
<row>
<entry>relfilenode</entry>
<entry>pg_class.relfilenode</entry>
<entry>Relfilenode of the relation.</entry>
</row>
<row>
<entry>avgrequest</entry>
<entry></entry>
<entry>Moving average of free space requests (NULL for indexes).</entry>
</row>
<row>
<entry>interestingpages</entry>
<entry></entry>
<entry>Count of pages last reported as containing useful free space.</entry>
</row>
<row>
<entry>storedpages</entry>
<entry></entry>
<entry>Count of pages actually stored in the free space map.</entry>
</row>
<row>
<entry>nextpage</entry>
<entry></entry>
<entry>Page index (from 0) to start next search at.</entry>
</row>
</tbody>
</tgroup>
</table>

<table>
<title>pg_freespacemap_pages</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
<entry>References</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>reltablespace</entry>
<entry>pg_tablespace.oid</entry>
<entry>Tablespace oid of the relation.</entry>
</row>
<row>
<entry>reldatabase</entry>
<entry>pg_database.oid</entry>
<entry>Database oid of the relation.</entry>
</row>
<row>
<entry>relfilenode</entry>
<entry>pg_class.relfilenode</entry>
<entry>Relfilenode of the relation.</entry>
</row>
<row>
<entry>relblocknumber</entry>
<entry></entry>
<entry>Page number in the relation.</entry>
</row>
<row>
<entry>bytes</entry>
<entry></entry>
<entry>Free bytes in the page, or NULL for an index page (see below).</entry>
</row>
</tbody>
</tgroup>
</table>

<para>
For <literal>pg_freespacemap_relations</literal>, there is one row for each
relation in the free space map. <literal>storedpages</literal> is the
number of pages actually stored in the map, while
<literal>interestingpages</literal> is the number of pages the last VACUUM
thought had useful amounts of free space.
</para>
<para>
If <literal>storedpages</literal> is consistently less than
<literal>interestingpages</literal>, it would be a good idea to increase
<literal>max_fsm_pages</literal>. Also, if the number of rows in
<literal>pg_freespacemap_relations</literal> is close to
<literal>max_fsm_relations</literal>, you should consider increasing
<literal>max_fsm_relations</literal>.
</para>
<para>
For <literal>pg_freespacemap_pages</literal>, there is one row for each page
in the free space map. The number of rows for a relation will match the
<literal>storedpages</literal> column in
<literal>pg_freespacemap_relations</literal>.
</para>
<para>
For indexes, what is tracked is entirely-unused pages, rather than free
space within pages. Therefore, the average request size and free bytes
within a page are not meaningful, and are shown as NULL.
</para>
<para>
Because the map is shared by all the databases, it will include relations
not belonging to the current database.
</para>
<para>
When either of the views is accessed, internal free space map locks are
taken and a copy of the map data is made for the view to display.
This ensures that the views produce a consistent set of results while not
blocking normal activity longer than necessary. Nonetheless, there
could be some impact on database performance if they are read often.
</para>
</sect2>

<sect2>
<title>Sample output - pg_freespacemap_relations</title>
<programlisting>
regression=# \d pg_freespacemap_relations
View "public.pg_freespacemap_relations"
      Column      |  Type   | Modifiers
------------------+---------+-----------
 reltablespace    | oid     |
 reldatabase      | oid     |
 relfilenode      | oid     |
 avgrequest       | integer |
 interestingpages | integer |
 storedpages      | integer |
 nextpage         | integer |
View definition:
 SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
   FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);

regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
             FROM pg_freespacemap_relations r INNER JOIN pg_class c
             ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
             ON r.reldatabase = d.oid AND (d.datname = current_database())
             ORDER BY r.storedpages DESC LIMIT 10;
             relname             | avgrequest | interestingpages | storedpages
---------------------------------+------------+------------------+-------------
 onek                            |        256 |              109 |         109
 pg_attribute                    |        167 |               93 |          93
 pg_class                        |        191 |               49 |          49
 pg_attribute_relid_attnam_index |            |               48 |          48
 onek2                           |        256 |               37 |          37
 pg_depend                       |         95 |               26 |          26
 pg_type                         |        199 |               16 |          16
 pg_rewrite                      |       1011 |               13 |          13
 pg_class_relname_nsp_index      |            |               10 |          10
 pg_proc                         |        302 |                8 |           8
(10 rows)
</programlisting>
</sect2>

<sect2>
<title>Sample output - pg_freespacemap_pages</title>
<programlisting>
regression=# \d pg_freespacemap_pages
View "public.pg_freespacemap_pages"
     Column     |  Type   | Modifiers
----------------+---------+-----------
 reltablespace  | oid     |
 reldatabase    | oid     |
 relfilenode    | oid     |
 relblocknumber | bigint  |
 bytes          | integer |
View definition:
 SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
   FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);

regression=# SELECT c.relname, p.relblocknumber, p.bytes
             FROM pg_freespacemap_pages p INNER JOIN pg_class c
             ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
             ON (p.reldatabase = d.oid AND d.datname = current_database())
             ORDER BY c.relname LIMIT 10;
   relname    | relblocknumber | bytes
--------------+----------------+-------
 a_star       |              0 |  8040
 abstime_tbl  |              0 |  7908
 aggtest      |              0 |  8008
 altinhoid    |              0 |  8128
 altstartwith |              0 |  8128
 arrtest      |              0 |  7172
 b_star       |              0 |  7976
 box_tbl      |              0 |  7912
 bt_f8_heap   |             54 |  7728
 bt_i4_heap   |             49 |  8008
(10 rows)
</programlisting>
</sect2>

<sect2>
<title>Author</title>
<para>
Mark Kirkwood <email>markir@paradise.net.nz</email>
</para>
</sect2>
</sect1>

@ -0,0 +1,122 @@

<sect1 id="fuzzystrmatch">
<title>fuzzystrmatch</title>

<para>
This section describes the fuzzystrmatch module, which provides several
functions to determine similarities and distances between strings.
</para>

<sect2>
<title>Soundex</title>
<para>
The Soundex system is a method of matching similar-sounding names
(or any words) to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.
</para>
<para>
When comparing two soundex values to determine similarity, the
difference function reports how close the match is on a scale
from zero to four, with zero being no match and four being an
exact match.
</para>
<para>
The following are some usage examples:
</para>
|
||||||
|
<programlisting>
|
||||||
|
SELECT soundex('hello world!');
|
||||||
|
|
||||||
|
SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
|
||||||
|
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
|
||||||
|
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
|
||||||
|
|
||||||
|
CREATE TABLE s (nm text);
|
||||||
|
|
||||||
|
INSERT INTO s VALUES ('john');
|
||||||
|
INSERT INTO s VALUES ('joan');
|
||||||
|
INSERT INTO s VALUES ('wobbly');
|
||||||
|
INSERT INTO s VALUES ('jack');
|
||||||
|
|
||||||
|
SELECT * FROM s WHERE soundex(nm) = soundex('john');
|
||||||
|
|
||||||
|
SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid;
|
||||||
|
|
||||||
|
CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
|
||||||
|
'select soundex($1) = soundex($2)'
|
||||||
|
LANGUAGE SQL;
|
||||||
|
|
||||||
|
CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
|
||||||
|
'select soundex($1) < soundex($2)'
|
||||||
|
LANGUAGE SQL;
|
||||||
|
|
||||||
|
CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
|
||||||
|
'select soundex($1) > soundex($2)'
|
||||||
|
LANGUAGE SQL;
|
||||||
|
|
||||||
|
CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
|
||||||
|
'select soundex($1) <= soundex($2)'
|
||||||
|
LANGUAGE SQL;
|
||||||
|
|
||||||
|
CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
|
||||||
|
'select soundex($1) >= soundex($2)'
|
||||||
|
LANGUAGE SQL;
|
||||||
|
|
||||||
|
CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
|
||||||
|
'select soundex($1) <> soundex($2)'
|
||||||
|
LANGUAGE SQL;
|
||||||
|
|
||||||
|
DROP OPERATOR #= (text, text);
|
||||||
|
|
||||||
|
CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
|
||||||
|
|
||||||
|
SELECT * FROM s WHERE text_sx_eq(nm, 'john');
|
||||||
|
|
||||||
|
SELECT * FROM s WHERE s.nm #= 'john';
|
||||||
|
|
||||||
|
SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
|
||||||
|
</programlisting>
|
||||||
|
</sect2>
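As a rough illustration of what <literal>soundex()</literal> and <literal>difference()</literal> compute, here is a minimal Python sketch of the classic Soundex algorithm. It is a simplified version (the special H/W rule is omitted); the module's C implementation is authoritative.

```python
def soundex(name: str) -> str:
    """Classic 4-character Soundex code (simplified: H/W rule omitted)."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    out, prev = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:   # collapse runs of the same code
            out += digit
        prev = digit                  # vowels reset the previous code
    return (out + "000")[:4]          # pad with zeros to 4 characters

def difference(a: str, b: str) -> int:
    """Number of matching Soundex code positions (0..4)."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))

print(soundex("Anne"), soundex("Ann"), difference("Anne", "Ann"))  # A500 A500 4
print(soundex("Andrew"), difference("Anne", "Andrew"))             # A536 2
```

This matches the SQL examples above: 'Anne' and 'Ann' share the full code A500 (difference 4), while 'Anne' and 'Andrew' only agree in the first two positions.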

<sect2>
<title>levenshtein</title>
<para>
This function calculates the Levenshtein distance between two strings:
</para>
<programlisting>
int levenshtein(text source, text target)
</programlisting>
<para>
Both <literal>source</literal> and <literal>target</literal> can be any
non-NULL string, with a maximum of 255 characters.
</para>
<para>
Example:
</para>
<programlisting>
SELECT levenshtein('GUMBO', 'GAMBOL');
</programlisting>
</sect2>
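The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A compact Python sketch of the standard dynamic-programming recurrence (the module computes the same quantity in C):

```python
def levenshtein(source: str, target: str) -> int:
    """Edit distance via the classic DP, keeping only one previous row."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        cur = [i]
        for j, t in enumerate(target, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (s != t)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]

print(levenshtein("GUMBO", "GAMBOL"))  # 2: substitute U->A, append L
```

So the SQL example above, <literal>SELECT levenshtein('GUMBO','GAMBOL')</literal>, returns 2.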

<sect2>
<title>metaphone</title>
<para>
This function calculates and returns the metaphone code of an input string:
</para>
<programlisting>
text metaphone(text source, int max_output_length)
</programlisting>
<para>
<literal>source</literal> has to be a non-NULL string with a maximum of
255 characters.  <literal>max_output_length</literal> sets the maximum
length of the output metaphone code; if the code would be longer, it is
truncated to this length.
</para>
<para>
Example:
</para>
<programlisting>
SELECT metaphone('GUMBO', 4);
</programlisting>
</sect2>

</sect1>


@ -0,0 +1,298 @@
<sect1 id="hstore">
<title>hstore</title>

<indexterm zone="hstore">
<primary>hstore</primary>
</indexterm>

<para>
The <literal>hstore</literal> module implements a data type for storing
(key, value) pairs.  It can be useful in several scenarios: rows with many
attributes that are rarely searched, semi-structured data, or a lazy DBA.
</para>

<sect2>
<title>Operations</title>
<itemizedlist>
<listitem>
<para>
<literal>hstore -> text</literal> - get the value for a key (Perl analogy: $h{key})
</para>
<programlisting>
select 'a=>q, b=>g'->'a';
 ?
------
 q
</programlisting>
<para>
Note the use of parentheses in the select below, because the precedence
of <literal>IS</literal> is higher than that of <literal>-></literal>:
</para>
<programlisting>
SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
</programlisting>
</listitem>

<listitem>
<para>
<literal>hstore || hstore</literal> - concatenation (Perl analogy: %a = (%b, %c))
</para>
<programlisting>
regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
      ?column?
--------------------
 "a"=>"b", "c"=>"d"
(1 row)
</programlisting>

<para>
but notice that when the same key appears on both sides, the right
operand's value wins:
</para>

<programlisting>
regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
 ?column?
----------
 "a"=>"d"
(1 row)
</programlisting>
</listitem>

<listitem>
<para>
<literal>text => text</literal> - creates an hstore value from two text strings
</para>
<programlisting>
select 'a'=>'b';
 ?column?
----------
 "a"=>"b"
</programlisting>
</listitem>

<listitem>
<para>
<literal>hstore @> hstore</literal> - contains operator, checks whether the left operand contains the right one.
</para>
<programlisting>
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
 ?column?
----------
 f
(1 row)

regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
 ?column?
----------
 t
(1 row)
</programlisting>
</listitem>

<listitem>
<para>
<literal>hstore <@ hstore</literal> - contained-in operator, checks whether the
left operand is contained in the right one
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~.  These names are still available, but are
deprecated and will eventually be retired.  Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
</listitem>
</itemizedlist>
</sect2>
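The concatenation semantics shown above (both operands' pairs are kept; on a key collision the right operand wins) mirror how Python merges dictionaries. A minimal sketch of that behavior:

```python
def hstore_concat(left: dict, right: dict) -> dict:
    """Model hstore || hstore: union of the pairs, with the right
    operand's value winning on duplicate keys ('a=>b' || 'a=>d' -> 'a=>d')."""
    merged = dict(left)   # copy so neither operand is mutated
    merged.update(right)  # right-hand pairs overwrite colliding keys
    return merged

print(hstore_concat({"a": "b"}, {"c": "d"}))  # {'a': 'b', 'c': 'd'}
print(hstore_concat({"a": "b"}, {"a": "d"}))  # {'a': 'd'}
```

The dict model is only an analogy: unlike a Python dict, hstore keys and values are always text, and the examples in this section rely on that textual form.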

<sect2>
<title>Functions</title>

<itemizedlist>
<listitem>
<para>
<literal>akeys(hstore)</literal> - returns all keys from hstore as an array
</para>
<programlisting>
regression=# select akeys('a=>1,b=>2');
 akeys
-------
 {a,b}
</programlisting>
</listitem>

<listitem>
<para>
<literal>skeys(hstore)</literal> - returns all keys from hstore as strings, one per row
</para>
<programlisting>
regression=# select skeys('a=>1,b=>2');
 skeys
-------
 a
 b
</programlisting>
</listitem>

<listitem>
<para>
<literal>avals(hstore)</literal> - returns all values from hstore as an array
</para>
<programlisting>
regression=# select avals('a=>1,b=>2');
 avals
-------
 {1,2}
</programlisting>
</listitem>

<listitem>
<para>
<literal>svals(hstore)</literal> - returns all values from hstore as strings, one per row
</para>
<programlisting>
regression=# select svals('a=>1,b=>2');
 svals
-------
 1
 2
</programlisting>
</listitem>

<listitem>
<para>
<literal>delete(hstore,text)</literal> - deletes the (key,value) pair from hstore
whose key matches the argument.
</para>
<programlisting>
regression=# select delete('a=>1,b=>2','b');
  delete
----------
 "a"=>"1"
</programlisting>
</listitem>

<listitem>
<para>
<literal>each(hstore)</literal> - returns the (key, value) pairs
</para>
<programlisting>
regression=# select * from each('a=>1,b=>2');
 key | value
-----+-------
 a   | 1
 b   | 2
</programlisting>
</listitem>

<listitem>
<para>
<literal>exist(hstore,text)</literal>
</para>
<para>
<literal>hstore ? text</literal> - returns true if the key exists in hstore,
false otherwise.
</para>
<programlisting>
regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
 exist | ?column?
-------+----------
 t     | t
</programlisting>
</listitem>

<listitem>
<para>
<literal>defined(hstore,text)</literal> - returns true if the key exists in
hstore and its value is not NULL.
</para>
<programlisting>
regression=# select defined('a=>NULL','a');
 defined
---------
 f
</programlisting>
</listitem>
</itemizedlist>
</sect2>

<sect2>
<title>Indices</title>
<para>
The module provides index support for the '@>' and '?' operators.
</para>
<programlisting>
CREATE INDEX hidx ON testhstore USING GIST(h);
CREATE INDEX hidx ON testhstore USING GIN(h);
</programlisting>
</sect2>

<sect2>
<title>Examples</title>

<para>
Add a key:
</para>
<programlisting>
UPDATE tt SET h=h||'c=>3';
</programlisting>
<para>
Delete a key:
</para>
<programlisting>
UPDATE tt SET h=delete(h,'k1');
</programlisting>
</sect2>

<sect2>
<title>Statistics</title>
<para>
The hstore type, because of its intrinsic liberality, can contain many
different keys; checking for valid keys is the task of the application.
The examples below demonstrate several techniques for examining key
statistics.
</para>

<para>
Simple example:
</para>
<programlisting>
SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1 ');
</programlisting>

<para>
Using a table:
</para>
<programlisting>
SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore;
</programlisting>

<para>Online statistics:</para>
<programlisting>
SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key;
  key   | count
--------+-------
 line   |   883
 query  |   207
 pos    |   203
 node   |   202
 space  |   197
 status |   195
 public |   194
 title  |   190
 org    |   189
...................
</programlisting>
</sect2>

<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd., Russia
</para>
</sect2>
</sect1>

@ -0,0 +1,82 @@

<sect1 id="intagg">
<title>intagg</title>

<indexterm zone="intagg">
<primary>intagg</primary>
</indexterm>

<para>
This section describes the <literal>intagg</literal> module, which provides
an integer aggregator and an enumerator.
</para>
<para>
Many database systems have the notion of a one-to-many table.  Such a table
usually sits between two indexed tables, as in:
</para>
<programlisting>
CREATE TABLE one_to_many(left INT, right INT);
</programlisting>

<para>
And it is used like this:
</para>

<programlisting>
SELECT right.* FROM right JOIN one_to_many ON (right.id = one_to_many.right)
WHERE one_to_many.left = item;
</programlisting>

<para>
This will return all the items in the right hand table for an entry
in the left hand table.  This is a very common construct in SQL.
</para>

<para>
Now, this methodology can be cumbersome with a very large number of
entries in the one_to_many table.  Depending on the order in which
data was entered, a join like this could result in an index scan
and a fetch for each right hand entry in the table for a particular
left hand entry.  If you have a very dynamic system, there is not much you
can do.  However, if you have some data which is fairly static, you can
create a summary table with the aggregator.
</para>

<programlisting>
CREATE TABLE summary AS SELECT left, int_array_aggregate(right)
AS right FROM one_to_many GROUP BY left;
</programlisting>

<para>
This will create a table with one row per left item, and an array
of right items.  Now this is pretty useless without some way of using
the array; that's why there is an array enumerator.
</para>
<programlisting>
SELECT left, int_array_enum(right) FROM summary WHERE left = item;
</programlisting>

<para>
The above query using int_array_enum produces the same results as:
</para>
<programlisting>
SELECT left, right FROM one_to_many WHERE left = item;
</programlisting>

<para>
The difference is that the query against the summary table has to fetch
only one row from the table, whereas the query against "one_to_many"
must index scan and fetch a row for each entry.
</para>
<para>
On our system, EXPLAIN showed a query with a cost of 8488 reduced
to a cost of 329.  The query is a join against the one_to_many table:
</para>
<programlisting>
SELECT right, count(right) FROM
(
  SELECT left, int_array_enum(right) AS right FROM summary JOIN
    (SELECT left FROM left_table WHERE left = item) AS lefts
  ON (summary.left = lefts.left)
) AS list GROUP BY right ORDER BY count DESC;
</programlisting>
</sect1>
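The aggregate-then-enumerate pattern above can be modeled outside SQL. This is a hypothetical Python sketch of what the summary table holds and what the enumerator reads back; the table and column names mirror the SQL example:

```python
from collections import defaultdict

def build_summary(one_to_many):
    """Model: CREATE TABLE summary AS SELECT left, int_array_aggregate(right)
    ... GROUP BY left.  Collapses (left, right) pairs into one array per key."""
    summary = defaultdict(list)
    for left, right in one_to_many:
        summary[left].append(right)
    return dict(summary)

def enum_rights(summary, item):
    """Model: SELECT int_array_enum(right) FROM summary WHERE left = item.
    One lookup replaces an index scan plus a fetch per matching pair."""
    return summary.get(item, [])

pairs = [(1, 10), (1, 11), (2, 10), (1, 12)]
summary = build_summary(pairs)
print(enum_rights(summary, 1))  # [10, 11, 12]
print(enum_rights(summary, 2))  # [10]
```

The point of the SQL version is the same as the dict lookup here: the per-pair work is paid once, when the summary is built, instead of on every query.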

@ -0,0 +1,286 @@

<sect1 id="intarray">
<title>intarray</title>

<indexterm zone="intarray">
<primary>intarray</primary>
</indexterm>

<para>
This module is an implementation of the RD-tree data structure using the
GiST interface of PostgreSQL.  It has built-in lossy compression.
</para>

<para>
The current implementation provides index support for one-dimensional
arrays of int4: gist__int_ops (used by default), suitable for small- and
medium-size arrays, and gist__intbig_ops for indexing large arrays (it uses
a superimposed signature with a length of 4096 bits to represent sets).
</para>

<sect2>
<title>Functions</title>

<itemizedlist>

<listitem>
<para>
<literal>int icount(int[])</literal> - the number of elements in the array
</para>
<programlisting>
test=# select icount('{1,2,3}'::int[]);
 icount
--------
 3
(1 row)
</programlisting>
</listitem>

<listitem>
<para>
<literal>int[] sort(int[], 'asc' | 'desc')</literal> - sort an array
</para>
<programlisting>
test=# select sort('{1,2,3}'::int[],'desc');
 sort
---------
 {3,2,1}
(1 row)
</programlisting>
</listitem>

<listitem>
<para>
<literal>int[] sort(int[])</literal> - sort in ascending order
</para>
</listitem>

<listitem>
<para>
<literal>int[] sort_asc(int[]), sort_desc(int[])</literal> - shortcuts for sort
</para>
</listitem>

<listitem>
<para>
<literal>int[] uniq(int[])</literal> - returns unique elements
</para>
<programlisting>
test=# select uniq(sort('{1,2,3,2,1}'::int[]));
 uniq
---------
 {1,2,3}
(1 row)
</programlisting>
</listitem>

<listitem>
<para>
<literal>int idx(int[], int item)</literal> - returns the index of the first
array element matching item, or 0 if there is no match.
</para>
<programlisting>
test=# select idx('{1,2,3,2,1}'::int[],2);
 idx
-----
 2
(1 row)
</programlisting>
</listitem>

<listitem>
<para>
<literal>int[] subarray(int[], int START [, int LEN])</literal> - returns the
part of the array starting from element number START (counting from 1)
with length LEN.
</para>
<programlisting>
test=# select subarray('{1,2,3,2,1}'::int[],2,3);
 subarray
----------
 {2,3,2}
(1 row)
</programlisting>
</listitem>

<listitem>
<para>
<literal>int[] intset(int4)</literal> - casts int4 to int[]
</para>
<programlisting>
test=# select intset(1);
 intset
--------
 {1}
(1 row)
</programlisting>
</listitem>

</itemizedlist>
</sect2>
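The behavior of some of these helpers is easy to pin down with a small Python model. Note in particular that the SQL example feeds <literal>uniq()</literal> a sorted array; this sketch assumes uniq only collapses adjacent duplicates, which is why sorting first matters:

```python
def uniq(xs):
    """Collapse runs of adjacent equal elements (assumed uniq() semantics)."""
    out = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return out

def idx(xs, item):
    """1-based index of the first element equal to item, 0 if absent."""
    for i, x in enumerate(xs, start=1):
        if x == item:
            return i
    return 0

def subarray(xs, start, length=None):
    """Slice with a 1-based START and optional LEN, as in subarray()."""
    if length is None:
        return xs[start - 1:]
    return xs[start - 1:start - 1 + length]

print(uniq(sorted([1, 2, 3, 2, 1])))    # [1, 2, 3]
print(idx([1, 2, 3, 2, 1], 2))          # 2
print(subarray([1, 2, 3, 2, 1], 2, 3))  # [2, 3, 2]
```

Each print line reproduces the corresponding psql example above.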

<sect2>
<title>Operations</title>
<table>
<title>Operations</title>
<tgroup cols="2">
<thead>
<row>
<entry>Operator</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>int[] && int[]</literal></entry>
<entry>overlap - returns TRUE if the arrays have at least one common element</entry>
</row>
<row>
<entry><literal>int[] @> int[]</literal></entry>
<entry>contains - returns TRUE if the left array contains the right array</entry>
</row>
<row>
<entry><literal>int[] <@ int[]</literal></entry>
<entry>contained - returns TRUE if the left array is contained in the right array</entry>
</row>
<row>
<entry><literal># int[]</literal></entry>
<entry>returns the number of elements in the array</entry>
</row>
<row>
<entry><literal>int[] + int</literal></entry>
<entry>push element onto array (add to end of array)</entry>
</row>
<row>
<entry><literal>int[] + int[]</literal></entry>
<entry>merge of arrays (right array added to the end of the left one)</entry>
</row>
<row>
<entry><literal>int[] - int</literal></entry>
<entry>remove entries matching the right argument from the array</entry>
</row>
<row>
<entry><literal>int[] - int[]</literal></entry>
<entry>remove elements of the right array from the left one</entry>
</row>
<row>
<entry><literal>int[] | int</literal></entry>
<entry>returns the union of the arguments</entry>
</row>
<row>
<entry><literal>int[] | int[]</literal></entry>
<entry>returns the union of the two arrays</entry>
</row>

<row>
<entry><literal>int[] & int[]</literal></entry>
<entry>returns the intersection of the arrays</entry>
</row>

<row>
<entry><literal>int[] @@ query_int</literal></entry>
<entry>
returns TRUE if the array satisfies the query (e.g.
<literal>'1&(2|3)'</literal>)
</entry>
</row>

<row>
<entry><literal>query_int ~~ int[]</literal></entry>
<entry>returns TRUE if the array satisfies the query (commutator of @@)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~.  These names are still available, but are
deprecated and will eventually be retired.  Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
</sect2>

<sect2>
<title>Example</title>

<programlisting>
CREATE TABLE message (mid INT NOT NULL, sections INT[]);
CREATE TABLE message_section_map (mid INT NOT NULL, sid INT NOT NULL);

-- create indices
CREATE UNIQUE INDEX message_key ON message ( mid );
CREATE UNIQUE INDEX message_section_map_key2 ON message_section_map (sid, mid);
CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops );

-- select messages with section 1 OR 2 - OVERLAP operator
SELECT message.mid FROM message WHERE message.sections && '{1,2}';

-- select messages with sections 1 AND 2 - CONTAINS operator
SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
-- the same, CONTAINED operator
SELECT message.mid FROM message WHERE '{1,2}' <@ message.sections;
</programlisting>
</sect2>

<sect2>
<title>Benchmark</title>
<para>
The subdirectory bench contains a benchmark suite.
</para>
<programlisting>
cd ./bench
1. createdb TEST
2. psql TEST < ../_int.sql
3. ./create_test.pl | psql TEST
4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
   with/without RD-Tree.  Run the script without arguments to
   see the available options.

   a) test without RD-Tree (OR)
      ./bench.pl -d TEST -c -s 1,2 -v
   b) test with RD-Tree
      ./bench.pl -d TEST -c -s 1,2 -v -r

BENCHMARKS:

Size of table <message>: 200000
Size of table <message_section_map>: 269133

Distribution of messages by sections:

section 0: 74377 messages
section 1: 16284 messages
section 50: 1229 messages
section 99: 683 messages

old - without RD-Tree support,
new - with RD-Tree

+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
|          +-------+-------+--------+-------+
|          |  old  |  new  |  old   |  new  |
+----------+-------+-------+--------+-------+
|         1|  0.625|  0.101|       -|      -|
+----------+-------+-------+--------+-------+
|        99|  0.018|  0.017|       -|      -|
+----------+-------+-------+--------+-------+
|       1,2|  0.766|  0.133|   0.628|  0.045|
+----------+-------+-------+--------+-------+
| 1,2,50,65|  0.794|  0.141|   0.030|  0.006|
+----------+-------+-------+--------+-------+
</programlisting>
</sect2>

<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and Oleg
Bartunov (<email>oleg@sai.msu.su</email>).  See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
additional information.  Andrey Oktyabrski did great work on adding new
functions and operations.
</para>
</sect2>

</sect1>

@ -0,0 +1,502 @@

<sect1 id="isn">
<title>isn</title>

<indexterm zone="isn">
<primary>isn</primary>
</indexterm>

<para>
The <literal>isn</literal> module adds data types for the following
international-standard namespaces: EAN13, UPC, ISBN (books), ISMN (music),
and ISSN (serials).  This module is inspired by Garrett A. Wollman's
isbn_issn code.
</para>
<para>
This module validates the numbers and automatically adds the correct
hyphenation to them.  Also, it supports the new ISBN-13 numbers to be
used starting in January 2007.
</para>

<para>
Premises:
</para>

<orderedlist>
<listitem>
<para>ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers</para>
</listitem>
<listitem>
<para>EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)</para>
</listitem>
<listitem>
<para>some ISBN13 numbers can be displayed as ISBN</para>
</listitem>
<listitem>
<para>some ISMN13 numbers can be displayed as ISMN</para>
</listitem>
<listitem>
<para>some ISSN13 numbers can be displayed as ISSN</para>
</listitem>
<listitem>
<para>all UPC, ISBN, ISMN and ISSN numbers can be represented as EAN13 numbers</para>
</listitem>
</orderedlist>

<note>
<para>
All types are internally represented as 64-bit integers,
and internally all are consistently interchangeable.
</para>
</note>
<note>
<para>
We have two operator classes (for btree and for hash) so each data type
can be indexed for faster access.
</para>
</note>
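All of these namespaces share the EAN-13 check-digit scheme: the first twelve digits are weighted 1 and 3 alternately from the left, and the final digit brings the weighted sum to a multiple of 10. A minimal Python sketch of that part of the validation (hyphenation handling and namespace rules omitted):

```python
def ean13_check_digit(digits12: str) -> int:
    """Check digit for the first 12 digits of an EAN-13 number;
    weights alternate 1,3,1,3,... from the left."""
    total = sum((3 if i % 2 else 1) * int(d) for i, d in enumerate(digits12))
    return (10 - total % 10) % 10

def is_valid_ean13(number: str) -> bool:
    """Strip hyphens, then verify length, digits, and the check digit."""
    digits = number.replace("-", "")
    return (len(digits) == 13 and digits.isdigit()
            and int(digits[-1]) == ean13_check_digit(digits[:12]))

print(is_valid_ean13("978-0-306-40615-7"))  # True (an ISBN-13 in EAN13 form)
print(is_valid_ean13("9780306406158"))      # False (wrong check digit)
```

The module's C implementation additionally knows which EAN13 prefixes correspond to the ISBN, ISMN and ISSN namespaces, which is what makes the display conversions below possible.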

<sect2>
<title>Data types</title>

<para>
We have the following data types:
</para>

<table>
<title>Data types</title>
<tgroup cols="2">
<thead>
<row>
<entry><para>Data type</para></entry>
<entry><para>Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>EAN13</literal></para></entry>
<entry>
<para>
European Article Numbers.  This type will always show the EAN13-display
format.  The output function for this is <literal>ean13_out()</literal>.
</para>
</entry>
</row>

<row>
<entry><para><literal>ISBN13</literal></para></entry>
<entry>
<para>
For International Standard Book Numbers to be displayed in
the new EAN13-display format.
</para>
</entry>
</row>

<row>
<entry><para><literal>ISMN13</literal></para></entry>
<entry>
<para>
For International Standard Music Numbers to be displayed in
the new EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISSN13</literal></para></entry>
<entry>
<para>
For International Standard Serial Numbers to be displayed in the new
EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISBN</literal></para></entry>
<entry>
<para>
For International Standard Book Numbers to be displayed in the current
short-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISMN</literal></para></entry>
<entry>
<para>
For International Standard Music Numbers to be displayed in the
current short-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISSN</literal></para></entry>
<entry>
<para>
For International Standard Serial Numbers to be displayed in the
current short-display format.  These types will display the short
version of the ISxN (ISxN 10) whenever possible, and will show the
ISxN 13 form when it is impossible to show the short version.  The
output function that does this is <literal>isn_out()</literal>.
</para>
</entry>
</row>
<row>
<entry><para><literal>UPC</literal></para></entry>
<entry>
<para>
For Universal Product Codes.  UPC numbers are a subset of the EAN13
numbers (they are basically EAN13 numbers without the first '0' digit).
The output function for this is also <literal>isn_out()</literal>.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>

<note>
<para>
The <literal>EAN13</literal>, <literal>ISBN13</literal>,
<literal>ISMN13</literal> and <literal>ISSN13</literal> types will always
display the long version of the ISxN (EAN13).  The output function that
does this is <literal>ean13_out()</literal>.
</para>
<para>
The need for these types is just for displaying the same data in
different ways: <literal>ISBN13</literal> is actually the same as
<literal>ISBN</literal>, <literal>ISMN13=ISMN</literal> and
<literal>ISSN13=ISSN</literal>.
</para>
</note>
</sect2>

<sect2>
<title>Input functions</title>

<para>
We have the following input functions:
</para>

<table>
<title>Input functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>ean13_in()</literal></para></entry>
<entry>
<para>
Takes a string and returns an EAN13.
</para>
</entry>
</row>

<row>
<entry><para><literal>isbn_in()</literal></para></entry>
<entry>
<para>
Takes a string and returns valid ISBN or ISBN13 numbers.
</para>
</entry>
</row>

<row>
<entry><para><literal>ismn_in()</literal></para></entry>
<entry>
<para>
Takes a string and returns valid ISMN or ISMN13 numbers.
</para>
</entry>
</row>

<row>
<entry><para><literal>issn_in()</literal></para></entry>
<entry>
<para>
Takes a string and returns valid ISSN or ISSN13 numbers.
</para>
</entry>
</row>
<row>
<entry><para><literal>upc_in()</literal></para></entry>
<entry>
<para>
Takes a string and returns a UPC code.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Casts</title>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
We are able to cast from:
|
||||||
|
</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISBN13 -> EAN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISMN13 -> EAN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISSN13 -> EAN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISBN -> EAN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISMN -> EAN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISSN -> EAN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
UPC -> EAN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISBN <-> ISBN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISMN <-> ISMN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
ISSN <-> ISSN13
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</sect2>

<sect2>
<title>C API</title>
<para>
The C API is implemented as:
</para>
<programlisting>
extern Datum isn_out(PG_FUNCTION_ARGS);
extern Datum ean13_out(PG_FUNCTION_ARGS);
extern Datum ean13_in(PG_FUNCTION_ARGS);
extern Datum isbn_in(PG_FUNCTION_ARGS);
extern Datum ismn_in(PG_FUNCTION_ARGS);
extern Datum issn_in(PG_FUNCTION_ARGS);
extern Datum upc_in(PG_FUNCTION_ARGS);
</programlisting>

<para>
On success:
</para>
<itemizedlist>
<listitem>
<para>
<literal>isn_out()</literal> takes any of our types and returns a string containing
the shortest possible representation of the number.
</para>
</listitem>
<listitem>
<para>
<literal>ean13_out()</literal> takes any of our types and returns the
EAN13 (long) representation of the number.
</para>
</listitem>
<listitem>
<para>
<literal>ean13_in()</literal> takes a string and returns an EAN13, which
may or may not correspond to one of our other types, but is certainly an EAN13
number. It succeeds only if the string is a valid EAN13 number; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>isbn_in()</literal> takes a string and returns an ISBN/ISBN13, but only
if the string really is an ISBN/ISBN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>ismn_in()</literal> takes a string and returns an ISMN/ISMN13, but only
if the string really is an ISMN/ISMN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>issn_in()</literal> takes a string and returns an ISSN/ISSN13, but only
if the string really is an ISSN/ISSN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
<literal>upc_in()</literal> takes a string and returns a UPC, but only if the
string really is a UPC; otherwise it fails.
</para>
</listitem>
</itemizedlist>

<para>
(On failure, the functions report the error via <literal>ereport</literal>.)
</para>
</sect2>

<sect2>
<title>Testing functions</title>
<table>
<title>Testing functions</title>
<tgroup cols="2">
<thead>
<row>
<entry><para>Function</para></entry>
<entry><para>Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>isn_weak(boolean)</literal></para></entry>
<entry><para>Sets the weak input mode.</para></entry>
</row>
<row>
<entry><para><literal>isn_weak()</literal></para></entry>
<entry><para>Gets the current status of the weak mode.</para></entry>
</row>
<row>
<entry><para><literal>make_valid()</literal></para></entry>
<entry><para>Validates an invalid number (clearing the invalid flag).</para></entry>
</row>
<row>
<entry><para><literal>is_valid()</literal></para></entry>
<entry><para>Checks for the presence of the invalid flag.</para></entry>
</row>
</tbody>
</tgroup>
</table>

<para>
<literal>Weak</literal> mode is used to be able to insert invalid data into
a table. Invalid as in the check digit being wrong, not missing numbers.
</para>
<para>
Why would you want to use the weak mode? Well, it could be that
you have a huge collection of ISBN numbers, and so many of
them that for some strange reason some have the wrong check digit (perhaps the
numbers were scanned from a printed list and the OCR got the numbers wrong,
perhaps the numbers were captured manually... who knows). Anyway, the point
is you might want to clean the mess up, but you still want to be able to keep
all the numbers in your database, and maybe use an external tool to locate
the invalid numbers in the database so you can verify the information and
validate it more easily; for example, by selecting all the invalid numbers in the table.
</para>
<para>
When you insert invalid numbers into a table using the weak mode, the number
will be inserted with the corrected check digit, but it will be flagged
with an exclamation mark ('!') at the end (e.g. 0-11-000322-5!).
</para>
<para>
You can also force the insertion of invalid numbers even when not in the weak
mode, by appending the '!' character at the end of the number.
</para>
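<para>
For instance (a sketch using the <literal>test</literal> table from the
examples below; the check digit here is deliberately the wrong one):
</para>
<programlisting>
-- accepted even outside weak mode because of the trailing '!'
INSERT INTO test VALUES('0-11-000322-5!');
</programlisting>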
</sect2>

<sect2>
<title>Examples</title>
<programlisting>
--Using the types directly:
SELECT isbn('978-0-393-04002-9');
SELECT isbn13('0901690546');
SELECT issn('1436-4522');

--Casting types:
-- note that you can only cast from ean13 to another type when the cast
-- number would be valid in the realm of the target type;
-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
-- but these will:
SELECT upc(ean13('0220356483481'));
SELECT ean13(upc('220356483481'));

--Create a table with a single column to hold ISBN numbers:
CREATE TABLE test ( id isbn );
INSERT INTO test VALUES('9780393040029');

--Automatically calculating check digits (observe the '?'):
INSERT INTO test VALUES('220500896?');
INSERT INTO test VALUES('978055215372?');

SELECT issn('3251231?');
SELECT ismn('979047213542?');

--Using the weak mode:
SELECT isn_weak(true);
INSERT INTO test VALUES('978-0-11-000533-4');
INSERT INTO test VALUES('9780141219307');
INSERT INTO test VALUES('2-205-00876-X');
SELECT isn_weak(false);

SELECT id FROM test WHERE NOT is_valid(id);
UPDATE test SET id=make_valid(id) WHERE id = '2-205-00876-X!';

SELECT * FROM test;

SELECT isbn13(id) FROM test;
</programlisting>
</sect2>

<sect2>
<title>Bibliography</title>
<para>
The information to implement this module was collected from
several sites, including:
</para>
<programlisting>
http://www.isbn-international.org/
http://www.issn.org/
http://www.ismn-international.org/
http://www.wikipedia.org/
</programlisting>
<para>
The prefixes used for hyphenation were also compiled from:
</para>
<programlisting>
http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
http://www.isbn-international.org/en/identifiers.html
http://www.ismn-international.org/ranges.html
</programlisting>
<para>
Care was taken during the creation of the algorithms, and they
were meticulously verified against the suggested algorithms
in the official ISBN, ISMN and ISSN User Manuals.
</para>
</sect2>

<sect2>
<title>Author</title>
<para>
Germán Méndez Bravo (Kronuz), 2004 - 2006
</para>
</sect2>
</sect1>
<sect1 id="lo">
<title>lo</title>

<indexterm zone="lo">
<primary>lo</primary>
</indexterm>

<para>
PostgreSQL type extension for managing Large Objects
</para>

<sect2>
<title>Overview</title>
<para>
One of the problems with the JDBC driver (and this affects the ODBC driver
as well) is that the specification assumes that references to BLOBs (Binary
Large OBjects) are stored within a table, and that if such an entry is changed,
the associated BLOB is deleted from the database.
</para>
<para>
As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.
</para>
<para>
Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning: objects
that are not referenced by anything and simply occupy disk space.
</para>
</sect2>

<sect2>
<title>The Fix</title>
<para>
I've fixed this by creating a new data type 'lo', some support functions, and
a trigger which handles the orphaning problem. The trigger essentially just
does an 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!
</para>
<para>
The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron) the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.
</para>
<para>
You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.
</para>
</sect2>

<sect2>
<title>How to Use</title>
<para>
The easiest way is by an example:
</para>
<programlisting>
CREATE TABLE image (title TEXT, raster lo);
CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
    FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
</programlisting>
<para>
Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.
</para>
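<para>
As a minimal sketch (the file path here is only an illustration), a large
object can then be stored through the server-side <literal>lo_import()</literal>
function, and the trigger will unlink the old object automatically when the
row is updated or deleted:
</para>
<programlisting>
INSERT INTO image (title, raster) VALUES ('menu', lo_import('/etc/motd'));
UPDATE image SET raster = lo_import('/etc/motd') WHERE title = 'menu';
DELETE FROM image WHERE title = 'menu';
</programlisting>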
</sect2>

<sect2>
<title>Issues</title>

<itemizedlist>
<listitem>
<para>
Dropping a table will still orphan any objects it contains, as the trigger
is not executed.
</para>
<para>
Avoid this by preceding the 'drop table' with 'delete from {table}'.
</para>
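<para>
For example, with the <literal>image</literal> table from the example above:
</para>
<programlisting>
DELETE FROM image;   -- fires the trigger, unlinking the large objects
DROP TABLE image;
</programlisting>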
<para>
If you already have, or suspect you have, orphaned large objects, see
the contrib/vacuumlo module to help you clean them up. It's a good idea
to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
trigger.
</para>
</listitem>
<listitem>
<para>
Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
</para>
</listitem>
</itemizedlist>

<para>
As the ODBC driver needs a permanent lo type (and JDBC could be optimised to
use it if its OID is fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.
</para>
</sect2>

<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email> June 13 1998
</para>
</sect2>
</sect1>
@ -0,0 +1,771 @@
|
||||||
|
|
||||||
|
<sect1 id="ltree">
<title>ltree</title>

<indexterm zone="ltree">
<primary>ltree</primary>
</indexterm>

<para>
<literal>ltree</literal> is a PostgreSQL module that implements
data types, indexed access methods and queries for data organized as
tree-like structures.
</para>

<sect2>
<title>Definitions</title>
<para>
A <emphasis>label</emphasis> of a node is a sequence of one or more words
separated by the character '_' and containing letters and digits (for
example, [a-zA-Z0-9] for the C locale). The length of a label is limited to 256
bytes.
</para>
<para>
Example: 'Countries', 'Personal_Services'
</para>
<para>
A <emphasis>label path</emphasis> of a node is a sequence of one or more
dot-separated labels l1.l2...ln that represents the path from the root to the
node. The length of a label path is limited to 65kB, but keeping it under 2kB
is preferable. This is not a serious limitation in practice; for example, the
maximal size of a label path in the DMOZ catalogue
(<ulink url="http://www.dmoz.org"></ulink>) is about 240 bytes.
</para>
<para>
Example: <literal>'Top.Countries.Europe.Russia'</literal>
</para>
<para>
We introduce several datatypes:
</para>
<itemizedlist>
<listitem>
<para>
<literal>ltree</literal> - a datatype for label paths.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[]</literal> - a datatype for arrays of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>lquery</literal>
- a path expression with regular-expression-like syntax over label paths,
used for matching ltree values. The star symbol (*) specifies any number of
labels (levels) and can be used at the beginning and the end of an lquery,
for example, '*.Europe.*'.
</para>
<para>
The following quantifiers are recognized for '*' (as in Perl):
</para>
<itemizedlist>
<listitem>
<para>{n} Match exactly n levels</para>
</listitem>
<listitem>
<para>{n,} Match at least n levels</para>
</listitem>
<listitem>
<para>{n,m} Match at least n but not more than m levels</para>
</listitem>
<listitem>
<para>{,m} Match at most m levels (equivalent to {0,m})</para>
</listitem>
</itemizedlist>
<para>
It is possible to use several modifiers at the end of a label:
</para>
<itemizedlist>
<listitem>
<para>@ Do case-insensitive label matching</para>
</listitem>
<listitem>
<para>* Do prefix matching for a label</para>
</listitem>
<listitem>
<para>% Don't take the word separator '_' into account in label matching;
that is, 'Russian%' would match 'Russian_nations', but not 'Russian'
</para>
</listitem>
</itemizedlist>

<para>
<literal>lquery</literal> can contain the logical operators '!' (NOT) at the
beginning of a label and '|' (OR) to specify possible alternatives for label
matching.
</para>
<para>
Example of an <literal>lquery</literal>:
</para>
<programlisting>
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a)  b)     c)      d)               e)
</programlisting>
<para>
A label path should:
</para>
<orderedlist numeration='loweralpha'>
<listitem>
<para>
begin at a node with label 'Top',
</para>
</listitem>
<listitem>
<para>
followed by zero to two labels, until
</para>
</listitem>
<listitem>
<para>
a node whose label begins with the case-insensitive prefix 'sport',
</para>
</listitem>
<listitem>
<para>
followed by a node whose label does not match 'football' or 'tennis', and
</para>
</listitem>
<listitem>
<para>
end at a node whose label begins with 'Russ' or exactly matches
'Spain'.
</para>
</listitem>
</orderedlist>

</listitem>

<listitem>
<para><literal>ltxtquery</literal>
- a datatype for label searching (like the type 'query' for full text
searching; see contrib/tsearch). It is possible to use the modifiers @, %, *
at the end of a word. The meaning of the modifiers is the same as for lquery.
</para>
<para>
Example: <literal>'Europe & Russia*@ & !Transportation'</literal>
</para>
<para>
This searches for paths that contain the words 'Europe' and 'Russia*'
(case-insensitively for the latter) and not 'Transportation'. Note that the
order in which the words appear in the label path is not significant.
</para>
</listitem>

</itemizedlist>
</sect2>

<sect2>
<title>Operations</title>
<para>
The following operations are defined for type ltree:
</para>

<itemizedlist>
<listitem>
<para>
<literal><,>,<=,>=,=, <></literal>
- have their usual meanings. Comparison is done in the order of a direct
tree traversal; the children of a node are sorted lexicographically.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @> ltree</literal>
- returns TRUE if the left argument is an ancestor of the right argument (or
equal).
</para>
</listitem>
<listitem>
<para>
<literal>ltree <@ ltree</literal>
- returns TRUE if the left argument is a descendant of the right argument (or
equal).
</para>
</listitem>
<listitem>
<para>
<literal>ltree ~ lquery, lquery ~ ltree</literal>
- return TRUE if the node represented by ltree satisfies lquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree ? lquery[], lquery ? ltree[]</literal>
- return TRUE if the node represented by ltree satisfies at least one lquery
from the array.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @ ltxtquery, ltxtquery @ ltree</literal>
- return TRUE if the node represented by ltree satisfies ltxtquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree || ltree, ltree || text, text || ltree</literal>
- return the concatenated ltree.
</para>
</listitem>
</itemizedlist>
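<para>
A minimal sketch of these operators in use (assuming the ltree module is
installed):
</para>
<programlisting>
SELECT 'Top.Science'::ltree @> 'Top.Science.Astronomy'::ltree;  -- true
SELECT 'Top.Science.Astronomy'::ltree ~ '*.Astronomy'::lquery;  -- true
SELECT 'Top.Science'::ltree || 'Astronomy';  -- Top.Science.Astronomy
</programlisting>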

<para>
Operations for arrays of ltree (<literal>ltree[]</literal>):
</para>
<itemizedlist>
<listitem>
<para>
<literal>ltree[] @> ltree, ltree <@ ltree[]</literal>
- returns TRUE if the array ltree[] contains an ancestor of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @> ltree[], ltree[] <@ ltree</literal>
- returns TRUE if the array ltree[] contains a descendant of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ~ lquery, lquery ~ ltree[]</literal>
- returns TRUE if the array ltree[] contains label paths matching lquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ? lquery[], lquery[] ? ltree[]</literal>
- returns TRUE if the array ltree[] contains label paths matching at least
one lquery from the array.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] @ ltxtquery, ltxtquery @ ltree[]</literal>
- returns TRUE if the array ltree[] contains label paths matching ltxtquery
(full text search).
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery</literal>
- return the first element of the array ltree[] that satisfies the
corresponding condition, or NULL if none does.
</para>
</listitem>
</itemizedlist>
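<para>
For instance, the first-match operators can be used to pull the matching
path out of an array (a sketch, assuming the module is installed):
</para>
<programlisting>
SELECT '{Top.Science,Top.Hobbies}'::ltree[] ?~ '*.Hobbies'::lquery;
-- returns Top.Hobbies; NULL if no element matches
</programlisting>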
</sect2>

<sect2>
<title>Remark</title>

<para>
The operations <literal><@</literal>, <literal>@></literal>, <literal>@</literal> and
<literal>~</literal> have analogues <literal>^<@, ^@>, ^@, ^~</literal>, which do not use
indexes!
</para>
</sect2>

<sect2>
<title>Indices</title>
<para>
Various indexes can be created to speed up the execution of operations:
</para>

<itemizedlist>
<listitem>
<para>
B-tree index over ltree: <literal><, <=, =, >=, ></literal>
</para>
</listitem>
<listitem>
<para>
GiST index over ltree: <literal><, <=, =, >=, >, @>, <@, @, ~, ?</literal>
</para>
<para>
Example:
</para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (path);
</programlisting>
</listitem>
<listitem>
<para>GiST index over ltree[]:
<literal>ltree[] <@ ltree, ltree @> ltree[], @, ~, ?</literal>
</para>
<para>
Example:
</para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
</programlisting>
<para>
Note: this index type is lossy.
</para>
</listitem>
</itemizedlist>
</sect2>

<sect2>
<title>Functions</title>

<itemizedlist>
<listitem>
<para>
<literal>ltree subltree(ltree, start, end)</literal>
returns the subpath of the ltree from position start (inclusive) to
position end (exclusive), counting from 0.
</para>
<programlisting>
# select subltree('Top.Child1.Child2',1,2);
 subltree
----------
 Child1
</programlisting>
</listitem>
<listitem>
<para>
<literal>ltree subpath(ltree, OFFSET, LEN)</literal> and
<literal>ltree subpath(ltree, OFFSET)</literal>
return the subpath of the ltree starting at OFFSET (inclusive) with length
LEN. If OFFSET is negative, the subpath starts that far from the end
of the path. If LEN is omitted, everything to the end
of the path is returned. If LEN is negative, that many labels are left off
the end of the path.
</para>
<programlisting>
# select subpath('Top.Child1.Child2',1,2);
    subpath
---------------
 Child1.Child2

# select subpath('Top.Child1.Child2',-2,1);
 subpath
---------
 Child1
</programlisting>
</listitem>
<listitem>
<para>
<literal>int4 nlevel(ltree)</literal> - returns the level of the node.
</para>
<programlisting>
# select nlevel('Top.Child1.Child2');
 nlevel
--------
      3
</programlisting>
<para>
Note that the arguments start, end, OFFSET and LEN are interpreted as node
levels!
</para>
</listitem>
<listitem>
<para>
<literal>int4 index(ltree,ltree)</literal> and
<literal>int4 index(ltree,ltree,OFFSET)</literal>
return the level number of the first occurrence of the second argument in the
first one, beginning at OFFSET. If OFFSET is negative, the search begins
|OFFSET| levels from the end of the path.
</para>
<programlisting>
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
 index
-------
     6
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
 index
-------
     9
</programlisting>
</listitem>
<listitem>
<para>
<literal>ltree text2ltree(text)</literal> and
<literal>text ltree2text(ltree)</literal> - cast functions between ltree and text.
</para>
</listitem>
<listitem>
<para>
<literal>ltree lca(ltree,ltree,...) (up to 8 arguments)</literal> and
<literal>ltree lca(ltree[])</literal> return the Lowest Common Ancestor (lca).
</para>
<programlisting>
# select lca('1.2.2.3','1.2.3.4.5.6');
 lca
-----
 1.2
# select lca('{la.2.3,1.2.3.4.5.6}') is null;
 ?column?
----------
 f
</programlisting>
</listitem>
</itemizedlist>
</sect2>

<sect2>
<title>Installation</title>
<programlisting>
cd contrib/ltree
make
make install
make installcheck
</programlisting>
</sect2>

<sect2>
<title>Example</title>
<programlisting>
createdb ltreetest
psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
psql ltreetest < ltreetest.sql
</programlisting>

<para>
Now we have a database ltreetest populated with data describing the hierarchy
shown below:
</para>

<programlisting>
                            TOP
                         /   |   \
                 Science  Hobbies  Collections
                 /           |          \
        Astronomy  Amateurs_Astronomy  Pictures
           /   \                           |
 Astrophysics  Cosmology               Astronomy
                                      /    |    \
                               Galaxies  Stars  Astronauts
</programlisting>
<para>
Inheritance:
</para>

<programlisting>
ltreetest=# select path from test where path <@ 'Top.Science';
                path
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(4 rows)
</programlisting>
<para>
Matching:
</para>
<programlisting>
ltreetest=# select path from test where path ~ '*.Astronomy.*';
                     path
-----------------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
 Top.Collections.Pictures.Astronomy
 Top.Collections.Pictures.Astronomy.Stars
 Top.Collections.Pictures.Astronomy.Galaxies
 Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)

ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
                path
------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(3 rows)
</programlisting>
|
||||||
|
<para>
|
||||||
|
Full text search:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
|
||||||
|
path
|
||||||
|
------------------------------------
|
||||||
|
Top.Science.Astronomy
|
||||||
|
Top.Science.Astronomy.Astrophysics
|
||||||
|
Top.Science.Astronomy.Cosmology
|
||||||
|
Top.Hobbies.Amateurs_Astronomy
|
||||||
|
(4 rows)
|
||||||
|
|
||||||
|
ltreetest=# select path from test where path @ 'Astro* & !pictures@';
|
||||||
|
path
|
||||||
|
------------------------------------
|
||||||
|
Top.Science.Astronomy
|
||||||
|
Top.Science.Astronomy.Astrophysics
|
||||||
|
Top.Science.Astronomy.Cosmology
|
||||||
|
(3 rows)
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Using Functions:
|
||||||
|
</para>
|
||||||
|
<programlisting>
ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
?column?
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)

We could create an SQL function:

CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
LANGUAGE SQL IMMUTABLE;
</programlisting>
<para>
and the previous select could be rewritten as:
</para>
<programlisting>
ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
Or with different arguments:
</para>
<programlisting>
CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
LANGUAGE SQL IMMUTABLE;

ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
</programlisting>
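The label-insertion trick used by both ins_label variants can be sketched outside the database. The following minimal Python model (my own illustration, not part of the ltree module) treats an ltree value as a dot-separated string and mimics subpath and nlevel:

```python
def nlevel(path):
    """Number of labels in a dot-separated label path, like ltree's nlevel()."""
    return len(path.split("."))

def subpath(path, offset, length=None):
    """Mimic ltree's subpath(): labels [offset, offset+length), or to the end."""
    labels = path.split(".")
    end = len(labels) if length is None else offset + length
    return ".".join(labels[offset:end])

def ins_label(path, pos, label):
    """Insert `label` at position `pos`, like the SQL ins_label above."""
    # filter(None, ...) drops empty fragments when pos is 0 or past the end
    return ".".join(filter(None, [subpath(path, 0, pos), label, subpath(path, pos)]))

print(ins_label("Top.Science.Astronomy", 2, "Space"))
# Top.Science.Space.Astronomy
```

The second SQL variant simply computes the insertion position as nlevel of the ancestor path instead of taking an integer.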
</sect2>

<sect2>
<title>Additional data</title>
<para>
To get a better feel for the ltree module, you can download
dmozltree-eng.sql.gz (a roughly 3MB gzipped archive containing 300,274
nodes), available from
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/ltree/"></ulink>.
It contains the DMOZ catalogue, prepared for use with ltree.
Set up your test database (dmoz), load the ltree module, and issue the command:
</para>
<programlisting>
zcat dmozltree-eng.sql.gz | psql dmoz
</programlisting>
<para>
The data will be loaded into the dmoz database and all indexes will be created.
</para>
</sect2>
<sect2>
<title>Benchmarks</title>
<para>
All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using DMOZ
data, containing 300,274 nodes (see above for the download link). We used some
basic queries typical for walking through a catalog.
</para>

<sect3>
<title>Queries</title>
<itemizedlist>
<listitem>
<para>
Q0: Count all rows (a sort of base time for comparison)
</para>
<programlisting>
select count(*) from dmoz;
count
--------
300274
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
Q1: Get direct children (without inheritance)
</para>
<programlisting>
select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
path
-----------------------------------
Top.Adult.Arts.Animation.Cartoons
Top.Adult.Arts.Animation.Anime
(2 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q2: The same as Q1, but also counting successors
</para>
<programlisting>
select path as parentpath, (select count(*)-1 from dmoz where path <@
p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
parentpath | count
-----------------------------------+-------
Top.Adult.Arts.Animation.Cartoons | 2
Top.Adult.Arts.Animation.Anime | 61
(2 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q3: Get all parents
</para>
<programlisting>
select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
path asc;
path
--------------------------
Top
Top.Adult
Top.Adult.Arts
Top.Adult.Arts.Animation
(4 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q4: Get all parents with counts of children
</para>
<programlisting>
select path, (select count(*)-1 from dmoz where path <@ p.path) as count
from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
path | count
--------------------------+--------
Top | 300273
Top.Adult | 4913
Top.Adult.Arts | 339
Top.Adult.Arts.Animation | 65
(4 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q5: Get all children with levels
</para>
<programlisting>
select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
path | level
------------------------------------------------+-------
Top.Adult.Arts.Animation.Anime | 1
Top.Adult.Arts.Animation.Anime.Fan_Works | 2
Top.Adult.Arts.Animation.Anime.Games | 2
Top.Adult.Arts.Animation.Anime.Genres | 2
Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
Top.Adult.Arts.Animation.Anime.Multimedia | 2
Top.Adult.Arts.Animation.Anime.Resources | 2
Top.Adult.Arts.Animation.Anime.Titles | 2
Top.Adult.Arts.Animation.Cartoons | 1
Top.Adult.Arts.Animation.Cartoons.AVS | 2
Top.Adult.Arts.Animation.Cartoons.Members | 2
(11 rows)
</programlisting>
</listitem>
</itemizedlist>
</sect3>
<sect3>
<title>Timings</title>
<programlisting>
+---------------------------------------------+
|Query|Rows|Time (ms) index|Time (ms) no index|
|-----+----+---------------+------------------|
|   Q0|   1|             NA|           1453.44|
|-----+----+---------------+------------------|
|   Q1|   2|           0.49|           1001.54|
|-----+----+---------------+------------------|
|   Q2|   2|           1.48|           3009.39|
|-----+----+---------------+------------------|
|   Q3|   4|           0.55|            906.98|
|-----+----+---------------+------------------|
|   Q4|   4|       24385.07|           4951.91|
|-----+----+---------------+------------------|
|   Q5|  11|           0.85|           1003.23|
+---------------------------------------------+
</programlisting>
<para>
Timings without indexes were obtained using operations which do not use
indexes (see above).
</para>
</sect3>
<sect3>
<title>Remarks</title>
<para>
We didn't run full-scale tests, and we haven't yet presented data for
operations with arrays of ltree (ltree[]) or full text searching. We'd
appreciate your input. So far, here are some (rather obvious) results:
</para>
<itemizedlist>
<listitem>
<para>
Indexes do help query execution
</para>
</listitem>
<listitem>
<para>
Q4 performs badly because it needs to read almost all of the data from disk
</para>
</listitem>
</itemizedlist>
</sect3>
</sect2>
<sect2>
<title>Some Background</title>
<para>
The approach we use for ltree is much like the one we used in our other
GiST-based contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist).
The theoretical background is available in papers referenced from our GiST
development page
(<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>).
</para>
<para>
A hierarchical data structure (tree) is a set of nodes. Each node has a
signature (LPS) of a fixed size, which is a hashed label path of that node.
While traversing a tree, we can safely prune branches if
</para>
<programlisting>
LQS (bitwise AND) LPS != LQS
</programlisting>
<para>
where LQS is a signature of lquery or ltxtquery, obtained in the same way as
LPS.
</para>
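The pruning test can be illustrated with a toy superimposed-coding signature. The hash function and signature width below are invented for illustration; they are not ltree's actual parameters:

```python
import hashlib

SIG_BITS = 64  # illustrative signature width, not ltree's real setting

def signature(labels):
    """OR together one hash-derived bit per label (superimposed coding)."""
    sig = 0
    for label in labels:
        h = int(hashlib.md5(label.encode()).hexdigest(), 16)
        sig |= 1 << (h % SIG_BITS)
    return sig

def may_match(lqs, lps):
    """False means the branch can certainly be pruned: LQS & LPS != LQS."""
    return (lqs & lps) == lqs

lps = signature(["Top", "Science", "Astronomy"])   # a node's label path
lqs = signature(["Science"])                       # query signature
assert may_match(lqs, lps)  # all query bits are present: cannot prune here
```

Because distinct labels can hash to the same bit, a passing test is only a "maybe"; the signature gives certainty only in the pruning direction, which is exactly what an index filter needs.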
<programlisting>
ltree[]:
</programlisting>
<para>
For an array of ltree values, the LPS is the bitwise OR of the signatures of
*all* children reachable from that node. Signatures are stored in an RD-tree,
implemented using GiST, which provides indexed access.
</para>
<programlisting>
ltree:
</programlisting>
<para>
For ltree we store the LPS in a B-tree, implemented using GiST. Each node
entry is represented by (left_bound, signature, right_bound), so that we can
speed up the operations <literal><, <=, =, >=, ></literal> using left_bound
and right_bound, and prune branches of the tree using the signature.
</para>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
additional information. The authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.
</para>
</sect2>

</sect1>
<sect1 id="oid2name">
<title>oid2name</title>

<indexterm zone="oid2name">
<primary>oid2name</primary>
</indexterm>

<para>
This utility allows administrators to examine the file structure used by
PostgreSQL. To make use of it, you need to be familiar with the file
structure, which is described in <xref linkend="storage">.
</para>

<sect2>
<title>Overview</title>
<para>
<literal>oid2name</literal> connects to the database and extracts OID,
filenode, and table name information. You can also have it show database
OIDs and tablespace OIDs.
</para>
<para>
When displaying specific tables, you can select which tables to show by
using -o, -f and -t. The first switch takes an OID, the second takes
a filenode, and the third takes a table name (actually, it's a LIKE
pattern, so you can use things like "foo%"). Note that you can use as many
of these switches as you like, and the listing will include all objects
matched by any of the switches. Also note that these switches can only
show objects in the database given in -d.
</para>
<para>
If you don't give any of -o, -f or -t, it will dump all the tables in the
database given in -d. If you don't give -d, it will show a database
listing. Alternatively, you can give -s to get a tablespace listing.
</para>
<table>
<title>Additional switches</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>-i</literal></entry>
<entry>include indexes and sequences in the database listing.</entry>
</row>
<row>
<entry><literal>-x</literal></entry>
<entry>display more information about each object shown: tablespace name,
schema name, OID.
</entry>
</row>
<row>
<entry><literal>-S</literal></entry>
<entry>also show system objects (those in the information_schema, pg_toast
and pg_catalog schemas)
</entry>
</row>
<row>
<entry><literal>-q</literal></entry>
<entry>don't display headers (useful for scripting)</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>

<sect2>
<title>Examples</title>
<programlisting>
$ oid2name
All databases:
Oid  Database Name  Tablespace
...
From database "alvherre":
155156 foo

$ # end of sample session.
</programlisting>
<para>
You can also get approximate size data for each object using psql. For
example,
</para>
<programlisting>
SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC;
</programlisting>
<para>
Each page is typically 8k. Relpages is updated by VACUUM.
</para>
</sect2>
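The size arithmetic implied by the query above is simple enough to sketch; this tiny helper (my own illustration) assumes the typical 8kB page size just mentioned:

```python
PAGE_SIZE = 8192  # typical PostgreSQL page size in bytes

def approx_table_bytes(relpages):
    """Approximate on-disk size from pg_class.relpages (maintained by VACUUM)."""
    return relpages * PAGE_SIZE

# e.g. a table reported at 128 pages occupies roughly 1 MB on disk
print(approx_table_bytes(128))  # 1048576
```

Since relpages is only refreshed by VACUUM (and ANALYZE), treat the result as an estimate, not an exact figure.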
<sect2>
<title>Author</title>
<para>
b. palmer, <email>bpalmer@crimelabs.net</email>
</para>
</sect2>

</sect1>
<sect1 id="pageinspect">
<title>pageinspect</title>

<indexterm zone="pageinspect">
<primary>pageinspect</primary>
</indexterm>

<para>
The functions in this module allow you to inspect the contents of data pages
at a low level, for debugging purposes.
</para>

<sect2>
<title>Functions included</title>

<itemizedlist>
<listitem>
<para>
<literal>get_raw_page</literal> reads one block of the named table and returns a copy as a
bytea field. This allows a single time-consistent copy of the block to be
made. Use of this function is restricted to superusers.
</para>
</listitem>

<listitem>
<para>
<literal>page_header</literal> shows fields which are common to all PostgreSQL heap and index
pages. Use of this function is restricted to superusers.
</para>
<para>
A page image obtained with <literal>get_raw_page</literal> should be passed as argument:
</para>
<programlisting>
test=# SELECT * FROM page_header(get_raw_page('pg_class',0));
lsn | tli | flags | lower | upper | special | pagesize | version
----------+-----+-------+-------+-------+---------+----------+---------
0/3C5614 | 1 | 1 | 216 | 256 | 8192 | 8192 | 4
(1 row)
</programlisting>
<para>
The returned columns correspond to the fields in the PageHeaderData struct;
see src/include/storage/bufpage.h for more details.
</para>
</listitem>

<listitem>
<para>
<literal>heap_page_items</literal> shows all line pointers on a heap page. For those line
pointers that are in use, tuple headers are also shown. All tuples are
shown, whether or not the tuples were visible to an MVCC snapshot at the
time the raw page was copied. Use of this function is restricted to
superusers.
</para>
<para>
A heap page image obtained with <literal>get_raw_page</literal> should be passed as argument:
</para>
<programlisting>
test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
</programlisting>
<para>
See src/include/storage/itemid.h and src/include/access/htup.h for
explanations of the fields returned.
</para>
</listitem>

<listitem>
<para>
<literal>bt_metap()</literal> returns information about the btree index metapage:
</para>
<programlisting>
test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic | 340322
version | 2
root | 1
level | 0
fastroot | 1
fastlevel | 0
</programlisting>
</listitem>

<listitem>
<para>
<literal>bt_page_stats()</literal> shows information about single btree pages:
</para>
<programlisting>
test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3
</programlisting>
</listitem>

<listitem>
<para>
<literal>bt_page_items()</literal> returns information about specific items on btree pages:
</para>
<programlisting>
test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
</programlisting>
</listitem>
</itemizedlist>
</sect2>
</sect1>
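As a rough illustration of the fields page_header reports, the leading bytes of a raw page can be decoded by hand. The layout below is a sketch of the version-4 PageHeaderData described in src/include/storage/bufpage.h; little-endian byte order is assumed, and the synthetic header built at the end merely mirrors the documented example output:

```python
import struct

def parse_page_header(page):
    """Decode the leading PageHeaderData fields of a raw page image.

    Sketch of the version-4 layout: pd_lsn (two uint32 halves), pd_tli,
    pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version.
    """
    xlogid, xrecoff, tli, flags, lower, upper, special, psv = \
        struct.unpack_from("<IIHHHHHH", page)
    return {
        "lsn": "%X/%X" % (xlogid, xrecoff),
        "tli": tli,
        "flags": flags,
        "lower": lower,
        "upper": upper,
        "special": special,
        "pagesize": psv & 0xFF00,  # page size is a multiple of 256
        "version": psv & 0x00FF,   # so the low byte carries the version
    }

# Synthetic 20-byte header echoing the documented page_header example
hdr = struct.pack("<IIHHHHHH", 0, 0x3C5614, 1, 1, 216, 256, 8192, 8192 | 4)
print(parse_page_header(hdr)["lsn"])  # 0/3C5614
```

In practice you would feed this the bytea returned by get_raw_page rather than a hand-packed buffer, but the field-splitting logic is the same.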
<sect1 id="pgbench">
<title>pgbench</title>

<indexterm zone="pgbench">
<primary>pgbench</primary>
</indexterm>

<para>
<literal>pgbench</literal> is a simple program for running a benchmark test.
<literal>pgbench</literal> is a client application of PostgreSQL and runs
with PostgreSQL only. It performs lots of small and simple transactions
including SELECT/UPDATE/INSERT operations, then calculates the number of
transactions successfully completed within a second (transactions
per second, tps). The target data includes a table with at least 100k
tuples.
</para>
<para>
Example output from pgbench looks like:
</para>
<programlisting>
number of clients: 4
number of transactions per client: 100
number of processed transactions: 400/400
tps = 19.875015 (including connections establishing)
tps = 20.098827 (excluding connections establishing)
</programlisting>
<para>
A similar program called "JDBCBench" already exists, but it requires
Java, which may not be available on every platform. Moreover, some
people are concerned that the overhead of Java might lead to
inaccurate results. So I decided to write it in pure C, and named
it "pgbench."
</para>

<para>
Features of pgbench:
</para>
<itemizedlist>
<listitem>
<para>
pgbench is written in C using libpq only, so it is very portable
and easy to install.
</para>
</listitem>
<listitem>
<para>
pgbench can simulate concurrent connections using the asynchronous
capabilities of libpq. No threading is required.
</para>
</listitem>
</itemizedlist>
<sect2>
<title>Overview</title>
<orderedlist>
<listitem>
<para>(Optional) Initialize the database with:</para>
<programlisting>
pgbench -i &lt;dbname&gt;
</programlisting>
<para>
where &lt;dbname&gt; is the name of the database. pgbench uses four tables:
accounts, branches, history and tellers. These tables will be
destroyed. Be very careful if you have tables with the same
names. The default test data contains:
</para>
<programlisting>
table        # of tuples
-------------------------
branches               1
tellers               10
accounts          100000
history                0
</programlisting>
<para>
You can increase the number of tuples by using the -s option. The branches,
tellers and accounts tables are created with a fillfactor, which is
set using the -F option. See below.
</para>
</listitem>
<listitem>
<para>Run the benchmark test:</para>
<programlisting>
pgbench &lt;dbname&gt;
</programlisting>
<para>
The default configuration is:
</para>
<programlisting>
number of clients: 1
number of transactions per client: 10
</programlisting>
</listitem>
</orderedlist>
<table>
<title><literal>pgbench</literal> options</title>
<tgroup cols="2">
<thead>
<row>
<entry>Parameter</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>-h hostname</literal></entry>
<entry>
<para>
The hostname where the backend is running. If this option
is omitted, pgbench will connect to localhost via a
Unix domain socket.
</para>
</entry>
</row>
<row>
<entry><literal>-p port</literal></entry>
<entry>
<para>
The port number on which the backend is listening. The default is
libpq's default, usually 5432.
</para>
</entry>
</row>
<row>
<entry><literal>-c number_of_clients</literal></entry>
<entry>
<para>
The number of clients simulated. The default is 1.
</para>
</entry>
</row>
<row>
<entry><literal>-t number_of_transactions</literal></entry>
<entry>
<para>
The number of transactions each client runs. The default is 10.
</para>
</entry>
</row>
<row>
<entry><literal>-s scaling_factor</literal></entry>
<entry>
<para>
This should be used with the -i (initialize) option. The
number of tuples generated will be a multiple of the
scaling factor. For example, -s 100 will imply 10M
(10,000,000) tuples in the accounts table.
The default is 1.
</para>
<para>
NOTE: the scaling factor should be at least
as large as the largest number of clients you intend
to test; else you'll mostly be measuring update contention.
Regular (not initializing) runs using one of the
built-in tests will detect the scale based on the number of
branches in the database. For custom (-f) runs it can
be specified manually with this parameter.
</para>
</entry>
</row>
<row>
<entry><literal>-D varname=value</literal></entry>
<entry>
<para>
Define a variable. It can be referred to by a script
provided using the -f option. Multiple -D options are allowed.
</para>
</entry>
</row>
<row>
<entry><literal>-U login</literal></entry>
<entry>
<para>
Specify the db user's login name if it is different from
the Unix login name.
</para>
</entry>
</row>
<row>
<entry><literal>-P password</literal></entry>
<entry>
<para>
Specify the db password. CAUTION: using this option
can be a security hole, since the ps command will
show the password. Use this for TESTING PURPOSES ONLY.
</para>
</entry>
</row>
<row>
<entry><literal>-n</literal></entry>
<entry>
<para>
Do not vacuum or clean the history table before the test.
</para>
</entry>
</row>
<row>
<entry><literal>-v</literal></entry>
<entry>
<para>
Do vacuuming before testing. This will take some time.
With neither -n nor -v, pgbench will vacuum the tellers and
branches tables only.
</para>
</entry>
</row>
<row>
<entry><literal>-S</literal></entry>
<entry>
<para>
Perform select-only transactions instead of TPC-B-like transactions.
</para>
</entry>
</row>
<row>
<entry><literal>-N</literal></entry>
<entry>
<para>
Do not update "branches" and "tellers". This
avoids heavy update contention on branches and tellers,
but the transactions are then no longer TPC-B-like.
</para>
</entry>
</row>
<row>
<entry><literal>-f filename</literal></entry>
<entry>
<para>
Read the transaction script from a file. A detailed
explanation appears below.
</para>
</entry>
</row>
<row>
<entry><literal>-C</literal></entry>
<entry>
<para>
Establish a new connection for each transaction, rather than
doing it just once at the beginning as in the normal
mode. This is useful for measuring connection overhead.
</para>
</entry>
</row>
<row>
<entry><literal>-l</literal></entry>
<entry>
<para>
Write the time taken by each transaction to a logfile
named "pgbench_log.xxx", where xxx is the PID
of the pgbench process. The format of the log is:
</para>
<programlisting>
client_id transaction_no time file_no time-epoch time-us
</programlisting>
<para>
where time is measured in microseconds, file_no indicates
which test file was used (useful when multiple files were
specified with -f), and time-epoch/time-us are a
UNIX epoch timestamp followed by an offset
in microseconds (suitable for creating an ISO 8601
timestamp with fractional seconds) giving when
the transaction completed.
</para>
<para>
Here is some example output:
</para>
<programlisting>
0 199 2241 0 1175850568 995598
0 200 2465 0 1175850568 998079
0 201 2513 0 1175850569 608
0 202 2038 0 1175850569 2663
</programlisting>
</entry>
</row>
<row>
<entry><literal>-F fillfactor</literal></entry>
<entry>
<para>
Create the tables (accounts, tellers and branches) with the given
fillfactor. The default is 100. This should be used with the -i
(initialize) option.
</para>
</entry>
</row>
<row>
<entry><literal>-d</literal></entry>
<entry>
<para>
Debug option.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
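The six-field -l log format described above is easy to post-process. A small sketch of a parser for such lines (the field names are my own labels for the documented columns):

```python
def parse_pgbench_log_line(line):
    """Split one pgbench -l log line into its six documented fields."""
    client_id, txn_no, time_us, file_no, epoch_sec, epoch_us = line.split()
    return {
        "client_id": int(client_id),
        "transaction_no": int(txn_no),
        "time_us": int(time_us),    # transaction duration, microseconds
        "file_no": int(file_no),    # which -f script file was used
        # completion time as a fractional UNIX timestamp
        "completed_at": int(epoch_sec) + int(epoch_us) / 1_000_000,
    }

rec = parse_pgbench_log_line("0 199 2241 0 1175850568 995598")
print(rec["time_us"])  # 2241
```

Summaries such as per-client latency percentiles then reduce to grouping these records by client_id.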
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>What is the "transaction" actually performed in pgbench?</title>
|
||||||
|
<orderedlist>
|
||||||
|
<listitem><para><literal>begin;</literal></para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>update accounts set abalance = abalance + :delta where aid = :aid;</literal></para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>select abalance from accounts where aid = :aid;</literal></para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>update tellers set tbalance = tbalance + :delta where tid = :tid;</literal></para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>update branches set bbalance = bbalance + :delta where bid = :bid;</literal></para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);</literal></para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>end;</literal></para></listitem>
|
||||||
|
</orderedlist>
|
||||||
|
<para>
|
||||||
|
If you specify -N, steps (4) and (5) are not included in the transaction.
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Script file</title>
|
||||||
|
<para>
|
||||||
|
<literal>pgbench</literal> has support for reading a transaction script
|
||||||
|
from a specified file (<literal>-f</literal> option). Each line of this
|
||||||
|
file should contain exactly one SQL command; SQL commands spanning multiple
|
||||||
|
lines are not supported. Empty lines and lines beginning with "--" are ignored.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Multiple <literal>-f</literal> options are allowed. In this case each
|
||||||
|
transaction is assigned a randomly chosen script.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Script files can also contain "meta commands", which begin with "\" (back
|
||||||
|
slash). A meta command takes arguments separated by white
|
||||||
|
space. The following meta commands are currently supported:
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<literal>\set name operand1 [ operator operand2 ]</literal>
|
||||||
|
- Sets variable "name" to the value calculated from "operand1"
|
||||||
|
"operator" "operand2". If "operator" and "operand2" are omitted,
|
||||||
|
variable "name" is set to the value of "operand1".
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Example:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
\set ntellers 10 * :scale
|
||||||
|
</programlisting>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<literal>\setrandom name min max</literal>
|
||||||
|
- Assigns a random integer between min and max to variable "name"
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Example:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
\setrandom aid 1 100000
|
||||||
|
</programlisting>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Variables can be referenced in SQL commands by prefixing the
|
||||||
|
variable name with ":".
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Example:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
SELECT abalance FROM accounts WHERE aid = :aid
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Variables can also be defined using the -D option.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
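<para>
The variable mechanism described above can be sketched in a few lines.
The following is an illustrative Python model of <literal>\set</literal>,
<literal>\setrandom</literal>, and <literal>:variable</literal> substitution;
the function names and parsing details are assumptions for illustration,
not pgbench's actual C implementation:
</para>

```python
import random
import re

def run_meta(command, env, rng=random):
    # Interpret a pgbench-style meta command (illustrative sketch only).
    parts = command.split()
    if parts[0] == "\\set":
        # \set name operand1 [ operator operand2 ]
        resolve = lambda tok: env[tok[1:]] if tok.startswith(":") else int(tok)
        value = resolve(parts[2])
        if len(parts) == 5:
            rhs = resolve(parts[4])
            value = {"+": value + rhs, "-": value - rhs,
                     "*": value * rhs, "/": value // rhs}[parts[3]]
        env[parts[1]] = value
    elif parts[0] == "\\setrandom":
        # \setrandom name min max
        env[parts[1]] = rng.randint(int(parts[2]), int(parts[3]))

def substitute(sql, env):
    # Replace :name references with the variable's current value.
    return re.sub(r":(\w+)", lambda m: str(env[m.group(1)]), sql)

env = {"scale": 2}
run_meta("\\set ntellers 10 * :scale", env)
run_meta("\\setrandom aid 1 100000", env)
print(env["ntellers"])  # 20
print(substitute("SELECT abalance FROM accounts WHERE aid = :aid", env))
```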
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Examples</title>
|
||||||
|
<para>
|
||||||
|
For example, a TPC-B-like benchmark can be defined as follows (scaling
|
||||||
|
factor = 1):
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
\set nbranches :scale
|
||||||
|
\set ntellers 10 * :scale
|
||||||
|
\set naccounts 100000 * :scale
|
||||||
|
\setrandom aid 1 :naccounts
|
||||||
|
\setrandom bid 1 :nbranches
|
||||||
|
\setrandom tid 1 :ntellers
|
||||||
|
\setrandom delta 1 10000
|
||||||
|
BEGIN
|
||||||
|
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
|
||||||
|
SELECT abalance FROM accounts WHERE aid = :aid
|
||||||
|
UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
|
||||||
|
UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
|
||||||
|
INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
|
||||||
|
END
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
If you want to automatically set the scaling factor from the number of
|
||||||
|
tuples in the branches table, use the -s option and a shell command like this:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Notice that the -f option does not vacuum or clear the history
|
||||||
|
table before starting the benchmark.
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
File diff suppressed because it is too large
Load Diff
|
@ -0,0 +1,123 @@
|
||||||
|
|
||||||
|
<sect1 id="pgrowlocks">
|
||||||
|
<title>pgrowlocks</title>
|
||||||
|
|
||||||
|
<indexterm zone="pgrowlocks">
|
||||||
|
<primary>pgrowlocks</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The <literal>pgrowlocks</literal> module provides a function to show row
|
||||||
|
locking information for a specified table.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Overview</title>
|
||||||
|
<programlisting>
|
||||||
|
pgrowlocks(text) RETURNS pgrowlocks_type
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The parameter is the name of a table, and <literal>pgrowlocks_type</literal> is
|
||||||
|
defined as:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
CREATE TYPE pgrowlocks_type AS (
|
||||||
|
locked_row TID, -- row TID
|
||||||
|
lock_type TEXT, -- lock type
|
||||||
|
locker XID, -- locking XID
|
||||||
|
multi bool, -- multi XID?
|
||||||
|
xids xid[], -- multi XIDs
|
||||||
|
pids INTEGER[] -- locker's process id
|
||||||
|
);
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title>pgrowlocks_type</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>locked_row</entry>
|
||||||
|
<entry>tuple ID (TID) of each locked row</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>lock_type</entry>
|
||||||
|
<entry>"Shared" for shared lock, "Exclusive" for exclusive lock</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>locker</entry>
|
||||||
|
<entry>transaction ID of locker (Note 1)</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>multi</entry>
|
||||||
|
<entry>"t" if locker is a multi transaction, otherwise "f"</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>xids</entry>
|
||||||
|
<entry>XIDs of lockers (Note 2)</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>pids</entry>
|
||||||
|
<entry>process ids of locking backends</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
<para>
|
||||||
|
Note 1: If the locker is a multitransaction, this column shows the multi ID.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Note 2: If the locker is a multitransaction, multiple XIDs are shown.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
<literal>pgrowlocks</literal> works as follows: it grabs an
|
||||||
|
AccessShareLock on the target table and reads each row one by one to
|
||||||
|
collect the row locking information. Note that:
|
||||||
|
</para>
|
||||||
|
<orderedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
if the table is exclusively locked by another session,
|
||||||
|
<literal>pgrowlocks</literal> will be blocked.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<literal>pgrowlocks</literal> may show incorrect information if a new
|
||||||
|
lock is taken or a lock is freed during its execution.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</orderedlist>
|
||||||
|
<para>
|
||||||
|
<literal>pgrowlocks</literal> does not show the contents of locked rows. If
|
||||||
|
you want to take a look at the row contents at the same time, you could do
|
||||||
|
something like this:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid;
|
||||||
|
</programlisting>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Example</title>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Here is a sample execution of pgrowlocks:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
test=# SELECT * FROM pgrowlocks('t1');
|
||||||
|
locked_row | lock_type | locker | multi | xids | pids
|
||||||
|
------------+-----------+--------+-------+-----------+---------------
|
||||||
|
(0,1) | Shared | 19 | t | {804,805} | {29066,29068}
|
||||||
|
(0,2) | Shared | 19 | t | {804,805} | {29066,29068}
|
||||||
|
(0,3) | Exclusive | 804 | f | {804} | {29066}
|
||||||
|
(0,4) | Exclusive | 804 | f | {804} | {29066}
|
||||||
|
(4 rows)
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
</sect2>
|
||||||
|
</sect1>
|
||||||
|
|
|
@ -0,0 +1,158 @@
|
||||||
|
|
||||||
|
<sect1 id="pgstattuple">
|
||||||
|
<title>pgstattuple</title>
|
||||||
|
|
||||||
|
<indexterm zone="pgstattuple">
|
||||||
|
<primary>pgstattuple</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The <literal>pgstattuple</literal> module provides various functions to obtain
|
||||||
|
tuple statistics.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Functions</title>
|
||||||
|
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<literal>pgstattuple()</literal> returns the relation length, the
|
||||||
|
percentage of "dead" tuples, and other information. This may help users
|
||||||
|
determine whether vacuum is necessary. Here is an example session:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
test=> \x
|
||||||
|
Expanded display is on.
|
||||||
|
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
|
||||||
|
-[ RECORD 1 ]------+-------
|
||||||
|
table_len | 458752
|
||||||
|
tuple_count | 1470
|
||||||
|
tuple_len | 438896
|
||||||
|
tuple_percent | 95.67
|
||||||
|
dead_tuple_count | 11
|
||||||
|
dead_tuple_len | 3157
|
||||||
|
dead_tuple_percent | 0.69
|
||||||
|
free_space | 8932
|
||||||
|
free_percent | 1.95
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Here are explanations for each column:
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title><literal>pgstattuple()</literal> column descriptions</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<thead>
|
||||||
|
<row>
|
||||||
|
<entry>Column</entry>
|
||||||
|
<entry>Description</entry>
|
||||||
|
</row>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>table_len</entry>
|
||||||
|
<entry>physical relation length in bytes</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>tuple_count</entry>
|
||||||
|
<entry>number of live tuples</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>tuple_len</entry>
|
||||||
|
<entry>total tuples length in bytes</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>tuple_percent</entry>
|
||||||
|
<entry>live tuples in %</entry>
|
||||||
|
</row>
<row>
<entry>dead_tuple_count</entry>
<entry>number of dead tuples</entry>
</row>
|
||||||
|
<row>
|
||||||
|
<entry>dead_tuple_len</entry>
|
||||||
|
<entry>total dead tuples length in bytes</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>dead_tuple_percent</entry>
|
||||||
|
<entry>dead tuples in %</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>free_space</entry>
|
||||||
|
<entry>free space in bytes</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>free_percent</entry>
|
||||||
|
<entry>free space in %</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
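<para>
The percentage columns are simple ratios of the byte counts against
table_len. The following is a quick check of that arithmetic against the
example output above (an illustrative sketch, not the server-side C code):
</para>

```python
# Recompute the derived percentage columns from the raw byte counts
# shown in the pg_proc example above.
table_len = 458752       # physical relation length in bytes
tuple_len = 438896       # total live tuple length in bytes
dead_tuple_len = 3157    # total dead tuple length in bytes
free_space = 8932        # free space in bytes

tuple_percent = round(100.0 * tuple_len / table_len, 2)
dead_tuple_percent = round(100.0 * dead_tuple_len / table_len, 2)
free_percent = round(100.0 * free_space / table_len, 2)

print(tuple_percent, dead_tuple_percent, free_percent)  # 95.67 0.69 1.95
```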
|
||||||
|
<para>
|
||||||
|
<note>
|
||||||
|
<para>
|
||||||
|
<literal>pgstattuple</literal> acquires only a read lock on the relation. So
|
||||||
|
concurrent updates may affect the result.
|
||||||
|
</para>
|
||||||
|
</note>
|
||||||
|
<note>
|
||||||
|
<para>
|
||||||
|
<literal>pgstattuple</literal> considers a tuple "dead" if HeapTupleSatisfiesNow()
|
||||||
|
returns false.
|
||||||
|
</para>
|
||||||
|
</note>
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<literal>pg_relpages()</literal> returns the number of pages in the relation.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<literal>pgstatindex()</literal> returns an array showing the information about an index:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
test=> \x
|
||||||
|
Expanded display is on.
|
||||||
|
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
|
||||||
|
-[ RECORD 1 ]------+------
|
||||||
|
version | 2
|
||||||
|
tree_level | 0
|
||||||
|
index_size | 8192
|
||||||
|
root_block_no | 1
|
||||||
|
internal_pages | 0
|
||||||
|
leaf_pages | 1
|
||||||
|
empty_pages | 0
|
||||||
|
deleted_pages | 0
|
||||||
|
avg_leaf_density | 50.27
|
||||||
|
leaf_fragmentation | 0
|
||||||
|
</programlisting>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Usage</title>
|
||||||
|
<para>
|
||||||
|
<literal>pgstattuple</literal> may be called as a relation function and is
|
||||||
|
defined as follows:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
|
||||||
|
AS 'MODULE_PATHNAME', 'pgstattuple'
|
||||||
|
LANGUAGE C STRICT;
|
||||||
|
|
||||||
|
CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
|
||||||
|
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
|
||||||
|
LANGUAGE C STRICT;
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The argument is the relation name (optionally schema-qualified)
|
||||||
|
or the OID of the relation. Note that pgstattuple only returns
|
||||||
|
one row.
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.83 2007/11/01 17:00:18 momjian Exp $ -->
|
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.84 2007/11/10 23:30:46 momjian Exp $ -->
|
||||||
|
|
||||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
|
||||||
|
|
||||||
|
@ -102,6 +102,7 @@
|
||||||
&typeconv;
|
&typeconv;
|
||||||
&indices;
|
&indices;
|
||||||
&textsearch;
|
&textsearch;
|
||||||
|
&contrib;
|
||||||
&mvcc;
|
&mvcc;
|
||||||
&perform;
|
&perform;
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,450 @@
|
||||||
|
|
||||||
|
<sect1 id="seg">
|
||||||
|
<title>seg</title>
|
||||||
|
|
||||||
|
<indexterm zone="seg">
|
||||||
|
<primary>seg</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The <literal>seg</literal> module contains the code for the user-defined
|
||||||
|
type, <literal>SEG</literal>, representing laboratory measurements as
|
||||||
|
floating point intervals.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Rationale</title>
|
||||||
|
<para>
|
||||||
|
The geometry of measurements is usually more complex than that of a
|
||||||
|
point in a numeric continuum. A measurement is usually a segment of
|
||||||
|
that continuum with somewhat fuzzy limits. The measurements come out
|
||||||
|
as intervals because of uncertainty and randomness, as well as because
|
||||||
|
the value being measured may naturally be an interval indicating some
|
||||||
|
condition, such as the temperature range of stability of a protein.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Using just common sense, it appears more convenient to store such data
|
||||||
|
as intervals, rather than pairs of numbers. In practice, it even turns
|
||||||
|
out to be more efficient in most applications.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Further along the line of common sense, the fuzziness of the limits
|
||||||
|
suggests that the use of traditional numeric data types leads to a
|
||||||
|
certain loss of information. Consider this: your instrument reads
|
||||||
|
6.50, and you input this reading into the database. What do you get
|
||||||
|
when you fetch it? Watch:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
test=> select 6.50 as "pH";
|
||||||
|
pH
|
||||||
|
---
|
||||||
|
6.5
|
||||||
|
(1 row)
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
In the world of measurements, 6.50 is not the same as 6.5. It may
|
||||||
|
sometimes be critically different. The experimenters usually write
|
||||||
|
down (and publish) the digits they trust. 6.50 is actually a fuzzy
|
||||||
|
interval contained within a bigger and even fuzzier interval, 6.5,
|
||||||
|
with their center points being (probably) the only common feature they
|
||||||
|
share. We definitely do not want such different data items to appear the
|
||||||
|
same.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Conclusion? It is nice to have a special data type that can record the
|
||||||
|
limits of an interval with arbitrarily variable precision. Variable in
|
||||||
|
the sense that each data element records its own precision.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Check this out:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
test=> select '6.25 .. 6.50'::seg as "pH";
|
||||||
|
pH
|
||||||
|
------------
|
||||||
|
6.25 .. 6.50
|
||||||
|
(1 row)
|
||||||
|
</programlisting>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Syntax</title>
|
||||||
|
<para>
|
||||||
|
The external representation of an interval is formed using one or two
|
||||||
|
floating point numbers joined by the range operator ('..' or '...').
|
||||||
|
Optional certainty indicators (<, > and ~) are ignored by the internal
|
||||||
|
logic, but are retained in the data.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title>Rules</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>rule 1</entry>
|
||||||
|
<entry>seg -> boundary PLUMIN deviation</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 2</entry>
|
||||||
|
<entry>seg -> boundary RANGE boundary</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 3</entry>
|
||||||
|
<entry>seg -> boundary RANGE</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 4</entry>
|
||||||
|
<entry>seg -> RANGE boundary</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 5</entry>
|
||||||
|
<entry>seg -> boundary</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 6</entry>
|
||||||
|
<entry>boundary -> FLOAT</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 7</entry>
|
||||||
|
<entry>boundary -> EXTENSION FLOAT</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>rule 8</entry>
|
||||||
|
<entry>deviation -> FLOAT</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title>Tokens</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>RANGE</entry>
|
||||||
|
<entry>(\.\.)(\.)?</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>PLUMIN</entry>
|
||||||
|
<entry>\'\+\-\'</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>integer</entry>
|
||||||
|
<entry>[+-]?[0-9]+</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>real</entry>
|
||||||
|
<entry>[+-]?[0-9]+\.[0-9]+</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>FLOAT</entry>
|
||||||
|
<entry>({integer}|{real})([eE]{integer})?</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>EXTENSION</entry>
|
||||||
|
<entry>[<>~]</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
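<para>
The token rules above translate directly into regular expressions. The
following Python rendering is an assumption for illustration (seg's actual
scanner is generated from flex rules, not shown here):
</para>

```python
import re

# Regexes mirroring the Tokens table above.
INTEGER = r"[+-]?[0-9]+"
REAL = rf"{INTEGER}\.[0-9]+"
FLOAT = rf"(?:{INTEGER}|{REAL})(?:[eE]{INTEGER})?"
RANGE = r"\.\.\.?"
EXTENSION = r"[<>~]"

print(re.fullmatch(FLOAT, "1.5e-2") is not None)  # True
print(re.fullmatch(FLOAT, "2.4E4") is not None)   # True
print(re.fullmatch(FLOAT, ".1e7") is not None)    # False: needs a leading digit
print(re.fullmatch(RANGE, "...") is not None)     # True
```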
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title>Examples of valid <literal>SEG</literal> representations</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>Any number</entry>
|
||||||
|
<entry>
|
||||||
|
(rules 5,6) -- creates a zero-length segment (a point,
|
||||||
|
if you will)
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>~5.0</entry>
|
||||||
|
<entry>
|
||||||
|
(rules 5,7) -- creates a zero-length segment AND records
|
||||||
|
'~' in the data. This notation reads 'approximately 5.0',
|
||||||
|
but its meaning is not recognized by the code. It is ignored
|
||||||
|
until you get the value back. View it as a short-hand comment.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><5.0</entry>
|
||||||
|
<entry>
|
||||||
|
(rules 5,7) -- creates a point at 5.0; '<' is ignored but
|
||||||
|
is preserved as a comment
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>>5.0</entry>
|
||||||
|
<entry>
|
||||||
|
(rules 5,7) -- creates a point at 5.0; '>' is ignored but
|
||||||
|
is preserved as a comment
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><para>5(+-)0.3</para><para>5'+-'0.3</para></entry>
|
||||||
|
<entry>
|
||||||
|
<para>
|
||||||
|
(rules 1,8) -- creates an interval '4.7..5.3'. As of this
|
||||||
|
writing (02/09/2000), this mechanism isn't completely accurate
|
||||||
|
in determining the number of significant digits for the
|
||||||
|
boundaries. For example, it adds an extra digit to the lower
|
||||||
|
boundary if the resulting interval includes a power of ten:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
postgres=> select '10(+-)1'::seg as seg;
|
||||||
|
seg
|
||||||
|
---------
|
||||||
|
9.0 .. 11 -- should be: 9 .. 11
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Also, the (+-) notation is not preserved: 'a(+-)b' will
|
||||||
|
always be returned as '(a-b) .. (a+b)'. The purpose of this
|
||||||
|
notation is to allow input from certain data sources without
|
||||||
|
conversion.
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>50 .. </entry>
|
||||||
|
<entry>(rule 3) -- everything that is greater than or equal to 50</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>.. 0</entry>
|
||||||
|
<entry>(rule 4) -- everything that is less than or equal to 0</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>1.5e-2 .. 2E-2 </entry>
|
||||||
|
<entry>(rule 2) -- creates an interval (0.015 .. 0.02)</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>1 ... 2</entry>
|
||||||
|
<entry>
|
||||||
|
The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
|
||||||
|
Because of the widespread use of '...' in the data sources,
|
||||||
|
I decided to stick to it as the range operator. This, and
|
||||||
|
also the fact that the white space around the range operator
|
||||||
|
is ignored, creates a parsing conflict with numeric constants
|
||||||
|
starting with a decimal point.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title>Examples of invalid input</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>.1e7</entry>
|
||||||
|
<entry>should be: 0.1e7</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>.1 .. .2</entry>
|
||||||
|
<entry>should be: 0.1 .. 0.2</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>2.4 E4</entry>
|
||||||
|
<entry>should be: 2.4E4</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
<para>
|
||||||
|
The following, although it is not a syntax error, is disallowed to improve
|
||||||
|
the sanity of the data:
|
||||||
|
</para>
|
||||||
|
<table>
|
||||||
|
<title></title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>5 .. 2</entry>
|
||||||
|
<entry>should be: 2 .. 5</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Precision</title>
|
||||||
|
<para>
|
||||||
|
The segments are stored internally as pairs of 32-bit floating point
|
||||||
|
numbers. This means that numbers with more than 7 significant digits
|
||||||
|
will be truncated.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Numbers with 7 or fewer significant digits retain their
|
||||||
|
original precision. That is, if your query returns 0.00, you will be
|
||||||
|
sure that the trailing zeroes are not the artifacts of formatting: they
|
||||||
|
reflect the precision of the original data. The number of leading
|
||||||
|
zeroes does not affect precision: the value 0.0067 is considered to
|
||||||
|
have just 2 significant digits.
|
||||||
|
</para>
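<para>
The significant-digit rule above can be illustrated with a small counting
function. This is one plausible reading of the rule for illustration, not
seg's actual C implementation:
</para>

```python
def significant_digits(s):
    # Count significant digits in a decimal literal: leading zeroes do
    # not count, trailing zeroes do (one plausible reading of the text).
    digits = s.lstrip("+-").replace(".", "")
    stripped = digits.lstrip("0")
    # All-zero values: count the digits after the decimal point.
    return len(stripped) if stripped else len(s.split(".")[-1])

print(significant_digits("0.0067"))  # 2
print(significant_digits("6.50"))    # 3
print(significant_digits("6.5"))     # 2
```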
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Usage</title>
|
||||||
|
<para>
|
||||||
|
The access method for SEG is a GiST index (gist_seg_ops), which is a
|
||||||
|
generalization of R-tree. GiSTs allow the postgres implementation of
|
||||||
|
R-tree, originally encoded to support 2-D geometric types such as
|
||||||
|
boxes and polygons, to be used with any data type whose data domain
|
||||||
|
can be partitioned using the concepts of containment, intersection and
|
||||||
|
equality. In other words, everything that can intersect or contain
|
||||||
|
its own kind can be indexed with a GiST. That includes, among other
|
||||||
|
things, all geometric data types, regardless of their dimensionality
|
||||||
|
(see also contrib/cube).
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The operators supported by the GiST access method include:
|
||||||
|
</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] << [c, d] Is left of
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The left operand, [a, b], occurs entirely to the left of the
|
||||||
|
right operand, [c, d], on the axis (-inf, inf). It means,
|
||||||
|
[a, b] << [c, d] is true if b < c and false otherwise
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] >> [c, d] Is right of
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
[a, b] occurs entirely to the right of [c, d].
|
||||||
|
[a, b] >> [c, d] is true if a > d and false otherwise
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] &< [c, d] Overlaps or is left of
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
This might be better read as "does not extend to right of".
|
||||||
|
It is true when b <= d.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] &> [c, d] Overlaps or is right of
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
This might be better read as "does not extend to left of".
|
||||||
|
It is true when a >= c.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] = [c, d] Same as
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The segments [a, b] and [c, d] are identical, that is, a == c
|
||||||
|
and b == d
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] && [c, d] Overlaps
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The segments [a, b] and [c, d] overlap.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] @> [c, d] Contains
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The segment [a, b] contains the segment [c, d], that is,
|
||||||
|
a <= c and b >= d
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
[a, b] <@ [c, d] Contained in
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The segment [a, b] is contained in [c, d], that is,
|
||||||
|
a >= c and b <= d
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
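<para>
The relationships above can be restated as predicates on closed intervals
represented as (a, b) pairs. The following Python sketch is for
illustration only; the real operators act on the SEG type inside the
server:
</para>

```python
# Each predicate mirrors the definition given in the list above.
def left_of(s, t):       return s[1] < t[0]                     # s << t
def right_of(s, t):      return s[0] > t[1]                     # s >> t
def over_left(s, t):     return s[1] <= t[1]                    # s &< t
def over_right(s, t):    return s[0] >= t[0]                    # s &> t
def same_as(s, t):       return s[0] == t[0] and s[1] == t[1]   # s = t
def overlaps(s, t):      return s[0] <= t[1] and t[0] <= s[1]   # s && t
def contains(s, t):      return s[0] <= t[0] and s[1] >= t[1]   # s @> t
def contained_in(s, t):  return contains(t, s)                  # s <@ t

print(left_of((1, 2), (3, 4)))        # True
print(contains((0, 10), (2, 3)))      # True
print(overlaps((1, 5), (4, 8)))       # True
print(contained_in((2, 3), (0, 10)))  # True
```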
|
||||||
|
<para>
|
||||||
|
(Before PostgreSQL 8.2, the containment operators @> and <@ were
|
||||||
|
respectively called @ and ~. These names are still available, but are
|
||||||
|
deprecated and will eventually be retired. Notice that the old names
|
||||||
|
are reversed from the convention formerly followed by the core geometric
|
||||||
|
datatypes!)
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Although the mnemonics of the following operators is questionable, I
|
||||||
|
preserved them to maintain visual consistency with other geometric
|
||||||
|
data types defined in Postgres.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Other operators:
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<programlisting>
|
||||||
|
[a, b] < [c, d] Less than
|
||||||
|
[a, b] > [c, d] Greater than
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
These operators do not make a lot of sense for any practical
|
||||||
|
purpose other than sorting. They first compare (a) to (c),
|
||||||
|
and if these are equal, compare (b) to (d). That results in
|
||||||
|
reasonably good sorting in most cases, which is useful if
|
||||||
|
you want to use ORDER BY with this type.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
There are a few other potentially useful functions defined in seg.c
|
||||||
|
that vanished from the schema because I stopped using them. Some of
|
||||||
|
these were meant to support type casting. Let me know if I was wrong:
|
||||||
|
I will then add them back to the schema. I would also appreciate
|
||||||
|
other ideas that would enhance the type and make it more useful.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
For examples of usage, see sql/seg.sql
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
NOTE: The performance of an R-tree index can largely depend on the
|
||||||
|
order of input values. It may be very helpful to sort the input table
|
||||||
|
on the SEG column (see the script sort-segments.pl for an example).
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Credits</title>
|
||||||
|
<para>
|
||||||
|
My thanks are primarily to Prof. Joe Hellerstein
|
||||||
|
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
|
||||||
|
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
|
||||||
|
also grateful to all postgres developers, present and past, for enabling
|
||||||
|
myself to create my own world and live undisturbed in it. And I would like
|
||||||
|
to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
|
||||||
|
Energy for the years of faithful support of my database research.
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
Gene Selkov, Jr.
|
||||||
|
Computational Scientist
|
||||||
|
Mathematics and Computer Science Division
|
||||||
|
Argonne National Laboratory
|
||||||
|
9700 S Cass Ave.
|
||||||
|
Building 221
|
||||||
|
Argonne, IL 60439-4844
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
<email>selkovjr@mcs.anl.gov</email>
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
|
@ -0,0 +1,164 @@
|
||||||
|
|
||||||
|
<sect1 id="sslinfo">
|
||||||
|
<title>sslinfo</title>
|
||||||
|
|
||||||
|
<indexterm zone="sslinfo">
|
||||||
|
<primary>sslinfo</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
This module provides information about the SSL certificate used by the current PostgreSQL connection.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Notes</title>
|
||||||
|
<para>
|
||||||
|
This extension won't build unless your PostgreSQL server is configured
|
||||||
|
with --with-openssl. The information provided by these functions would
|
||||||
|
be completely useless if you don't use SSL to connect to the database.
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Functions Description</title>
|
||||||
|
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
ssl_is_used() RETURNS boolean;
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Returns TRUE if the current connection to the server uses SSL, and FALSE
|
||||||
|
otherwise.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
ssl_client_cert_present() RETURNS boolean
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Returns TRUE if the current client has presented a valid SSL client
|
||||||
|
certificate to the server, and FALSE otherwise (e.g., the connection
|
||||||
|
does not use SSL, or the certificate was not requested by the server).
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
ssl_client_serial() RETURNS numeric
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Returns the serial number of the current client certificate. The combination
|
||||||
|
of certificate serial number and certificate issuer is guaranteed to
|
||||||
|
uniquely identify a certificate (but not its owner -- the owner ought to
|
||||||
|
regularly change his keys and get new certificates from the issuer).
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
So, if you run your own CA and allow only certificates from this CA to
|
||||||
|
be accepted by the server, the serial number is the most reliable (albeit
|
||||||
|
not very mnemonic) means to identify a user.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
ssl_client_dn() RETURNS text
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Returns the full subject of current client certificate, converting
|
||||||
|
character data into the current database encoding. It is assumed that
|
||||||
|
if you use non-Latin characters in the certificate names, your
|
||||||
|
database is able to represent these characters, too. If your database
|
||||||
|
uses the SQL_ASCII encoding, non-Latin characters in the name will be
|
||||||
|
represented as UTF-8 sequences.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
ssl_issuer_dn() RETURNS text
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Returns the full issuer name of the client certificate, converting
|
||||||
|
character data into the current database encoding.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The combination of the return value of this function with the
|
||||||
|
certificate serial number uniquely identifies the certificate.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The result of this function is really useful only if you have more
|
||||||
|
than one trusted CA certificate in your server's root.crt file, or if
|
||||||
|
this CA has issued some intermediate certificate authority
|
||||||
|
certificates.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
ssl_client_dn_field(fieldName text) RETURNS text
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
This function returns the value of the specified field in the
|
||||||
|
certificate subject. Field names are string constants that are
|
||||||
|
converted into ASN1 object identifiers using the OpenSSL object
|
||||||
|
database. The following values are acceptable:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
commonName (alias CN)
|
||||||
|
surname (alias SN)
|
||||||
|
name
|
||||||
|
givenName (alias GN)
|
||||||
|
countryName (alias C)
|
||||||
|
localityName (alias L)
|
||||||
|
stateOrProvinceName (alias ST)
|
||||||
|
organizationName (alias O)
|
||||||
|
organizationalUnitName (alias OU)
|
||||||
|
title
|
||||||
|
description
|
||||||
|
initials
|
||||||
|
postalCode
|
||||||
|
streetAddress
|
||||||
|
generationQualifier
|
||||||
|
description
|
||||||
|
dnQualifier
|
||||||
|
x500UniqueIdentifier
|
||||||
|
pseudonym
|
||||||
|
role
|
||||||
|
emailAddress
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
All of these fields are optional, except commonName. It depends
|
||||||
|
entirely on your CA policy which of them would be included and which
|
||||||
|
wouldn't. The meaning of these fields, however, is strictly defined by
|
||||||
|
the X.500 and X.509 standards, so you cannot just assign arbitrary
|
||||||
|
meaning to them.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<programlisting>
|
||||||
|
ssl_issuer_field(fieldName text) RETURNS text;
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Does the same as ssl_client_dn_field, but for the certificate issuer
|
||||||
|
rather than the certificate subject.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
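<para>
As a quick illustration (a hypothetical session; the actual results
depend entirely on how your connection was established), these functions
can be combined in a single query:
</para>
<programlisting>
SELECT ssl_is_used() AS ssl_in_use,
       ssl_client_cert_present() AS have_client_cert,
       ssl_client_dn_field('commonName') AS client_cn;
</programlisting>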
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Author</title>
|
||||||
|
<para>
|
||||||
|
Victor Wagner <email>vitus@cryptocom.ru</email>, Cryptocom LTD
|
||||||
|
E-Mail of Cryptocom OpenSSL development group:
|
||||||
|
<email>openssl@cryptocom.ru</email>
|
||||||
|
</para>
|
||||||
|
</sect2>
|
||||||
|
</sect1>
|
||||||
|
|
|
|
||||||
|
|
||||||
|
<sect1 id="pgstandby">
|
||||||
|
<title>pg_standby</title>
|
||||||
|
|
||||||
|
<indexterm zone="pgstandby">
|
||||||
|
<primary>pgstandby</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
<literal>pg_standby</literal> is a production-ready program that can be used
|
||||||
|
to create a Warm Standby server. Other configuration is required as well,
|
||||||
|
all of which is described in the main server manual.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The program is designed to be a wait-for <literal>restore_command</literal>,
|
||||||
|
required to turn a normal archive recovery into a Warm Standby. Within the
|
||||||
|
<literal>restore_command</literal> of the <literal>recovery.conf</literal>
|
||||||
|
you could configure <literal>pg_standby</literal> in the following way:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
restore_command = 'pg_standby archiveDir %f %p'
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
which is sufficient to specify that files will be restored from
|
||||||
|
archiveDir.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
<literal>pg_standby</literal> features include:
|
||||||
|
</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
It is written in C, so it is very portable
|
||||||
|
and easy to install.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Supports copy or link from a directory (only)
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
The source is easy to modify, with specifically designated
|
||||||
|
sections to modify for your own needs, allowing
|
||||||
|
interfaces to be written for additional Backup Archive Restore
|
||||||
|
(BAR) systems
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Already tested on Linux and Windows
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Usage</title>
|
||||||
|
<para>
|
||||||
|
<literal>pg_standby</literal> should be used within the
|
||||||
|
<literal>restore_command</literal> of the <literal>recovery.conf</literal>
|
||||||
|
file.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The basic usage should be like this:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
restore_command = 'pg_standby archiveDir %f %p'
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
with the pg_standby command usage as
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH]
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
When used within the <literal>restore_command</literal> the %f and %p macros
|
||||||
|
will provide the actual file and path required for the restore/recovery.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<title>Options</title>
|
||||||
|
<tgroup cols="2">
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>-c</entry>
|
||||||
|
<entry>use the copy/cp command to restore WAL files from the archive</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>-d</entry>
|
||||||
|
<entry>debug/logging option.</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>-k numfiles</entry>
|
||||||
|
<entry>
|
||||||
|
<para>
|
||||||
|
Clean up files in the archive so that we keep no more
|
||||||
|
than this many files in the archive.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
You should be wary of setting this number too low,
|
||||||
|
since this may mean you cannot restart the standby. This
|
||||||
|
is because the last restartpoint marked in the WAL files
|
||||||
|
may be many files in the past and can vary considerably.
|
||||||
|
This should be set to a value exceeding the number of WAL
|
||||||
|
files that can be recovered in 2*checkpoint_timeout seconds,
|
||||||
|
according to the value in the warm standby postgresql.conf.
|
||||||
|
It is wholly unrelated to the setting of checkpoint_segments
|
||||||
|
on either primary or standby.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
If in doubt, use a large value or do not set a value at all.
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>-l</entry>
|
||||||
|
<entry>
|
||||||
|
<para>
|
||||||
|
use the ln command to restore WAL files from the archive;
|
||||||
|
the WAL files will remain in the archive
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Linking is more efficient, but the default is copying, to
|
||||||
|
allow you to maintain the WAL archive for recovery
|
||||||
|
purposes as well as for high availability.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
This option uses the Windows Vista command mklink
|
||||||
|
to provide a file-to-file symbolic link. -l will
|
||||||
|
not work on versions of Windows prior to Vista.
|
||||||
|
Use the -c option instead.
|
||||||
|
See <ulink url="http://en.wikipedia.org/wiki/NTFS_symbolic_link"></ulink>.
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>-r maxretries</entry>
|
||||||
|
<entry>
|
||||||
|
<para>
|
||||||
|
the maximum number of times to retry the restore command if it
|
||||||
|
fails. After each failure, we wait for sleeptime * num_retries
|
||||||
|
so that the wait time increases progressively; by default
|
||||||
|
we will wait 5 secs, 10 secs then 15 secs before reporting
|
||||||
|
the failure back to the database server. This will be
|
||||||
|
interpreted as the end of recovery and the Standby will come
|
||||||
|
up fully as a result. <literal>Default=3</literal>
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>-s sleeptime</entry>
|
||||||
|
<entry>
|
||||||
|
the number of seconds to sleep between tests to see
|
||||||
|
if the file to be restored is available in the archive yet.
|
||||||
|
The default setting is not necessarily recommended,
|
||||||
|
consult the main database server manual for discussion.
|
||||||
|
<literal>Default=5</literal>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>-t triggerfile</entry>
|
||||||
|
<entry>
|
||||||
|
the presence of the triggerfile will cause recovery to end
|
||||||
|
whether or not the next file is available.
|
||||||
|
It is recommended that you use a structured filename to
|
||||||
|
avoid confusion as to which server is being triggered
|
||||||
|
when multiple servers exist on the same system,
|
||||||
|
e.g. /tmp/pgsql.trigger.5432
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>-w maxwaittime</entry>
|
||||||
|
<entry>
|
||||||
|
the maximum number of seconds to wait for the next file,
|
||||||
|
after which recovery will end and the Standby will come up.
|
||||||
|
The default setting is not necessarily recommended,
|
||||||
|
consult the main database server manual for discussion.
|
||||||
|
<literal>Default=0</literal>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
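<para>
For example, assuming <literal>pg_standby</literal> was started with
<literal>-t /tmp/pgsql.trigger.5432</literal> (a hypothetical file name
following the recommended convention), failover can be requested from a
shell on the standby simply by creating that file:
</para>
<programlisting>
touch /tmp/pgsql.trigger.5432
</programlisting>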
|
||||||
|
<note>
|
||||||
|
<para>
|
||||||
|
<literal>--help</literal> is not supported since
|
||||||
|
<literal>pg_standby</literal> is not intended for interactive use, except
|
||||||
|
during development and testing.
|
||||||
|
</para>
|
||||||
|
</note>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Examples</title>
|
||||||
|
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>Example on Linux</para>
|
||||||
|
<programlisting>
|
||||||
|
archive_command = 'cp %p ../archive/%f'
|
||||||
|
|
||||||
|
restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
which will
|
||||||
|
</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem><para>use the ln command to restore WAL files from the archive</para></listitem>
|
||||||
|
<listitem><para>produce logfile output in standby.log</para></listitem>
|
||||||
|
<listitem><para>keep the last 255 full WAL files, plus the current one</para></listitem>
|
||||||
|
<listitem><para>sleep for 2 seconds between checks for the next WAL file to arrive</para></listitem>
|
||||||
|
<listitem><para>never time out if the file is not found</para></listitem>
|
||||||
|
<listitem><para>stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears</para></listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Example on Windows
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
archive_command = 'copy %p ..\\archive\\%f'
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Note that backslashes need to be doubled in the archive_command, but
|
||||||
|
<emphasis>not</emphasis> in the restore_command, in 8.2, 8.1, and 8.0 on Windows.
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442
|
||||||
|
..\archive %f %p 2>> standby.log'
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
which will
|
||||||
|
</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem><para>use the copy command to restore WAL files from the archive</para></listitem>
|
||||||
|
<listitem><para>produce logfile output in standby.log</para></listitem>
|
||||||
|
<listitem><para>sleep for 5 seconds between checks for the next WAL file to arrive</para></listitem>
|
||||||
|
<listitem><para>never time out if the file is not found</para></listitem>
|
||||||
|
<listitem><para>stop waiting when a trigger file called C:\pgsql.trigger.5442 appears</para></listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
|
|
||||||
|
|
||||||
|
<sect1 id="tablefunc">
|
||||||
|
<title>tablefunc</title>
|
||||||
|
|
||||||
|
<indexterm zone="tablefunc">
|
||||||
|
<primary>tablefunc</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
<literal>tablefunc</literal> provides various functions that return tables (multiple rows), including functions to pivot query rows into columns.
|
||||||
|
</para>
|
||||||
|
<sect2>
|
||||||
|
<title>Functions</title>
|
||||||
|
<table>
|
||||||
|
<title>tablefunc functions</title>
|
||||||
|
<tgroup cols="3">
|
||||||
|
<thead>
|
||||||
|
<row>
|
||||||
|
<entry>Function</entry>
|
||||||
|
<entry>Returns</entry>
|
||||||
|
<entry>Comments</entry>
|
||||||
|
</row>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry>
|
||||||
|
<literal>
|
||||||
|
normal_rand(int numvals, float8 mean, float8 stddev)
|
||||||
|
</literal>
|
||||||
|
</entry>
|
||||||
|
<entry>
|
||||||
|
returns a set of normally distributed float8 values
|
||||||
|
</entry>
|
||||||
|
<entry></entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><literal>crosstabN(text sql)</literal></entry>
|
||||||
|
<entry>returns a set of row_name plus N category value columns</entry>
|
||||||
|
<entry>
|
||||||
|
crosstab2(), crosstab3(), and crosstab4() are defined for you,
|
||||||
|
but you can create additional crosstab functions per the instructions
|
||||||
|
in the documentation below.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><literal>crosstab(text sql)</literal></entry>
|
||||||
|
<entry>returns a set of row_name plus N category value columns</entry>
|
||||||
|
<entry>
|
||||||
|
requires anonymous composite type syntax in the FROM clause. See
|
||||||
|
the instructions in the documentation below.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><literal>crosstab(text sql, N int)</literal></entry>
|
||||||
|
<entry></entry>
|
||||||
|
<entry>
|
||||||
|
<para>obsolete version of crosstab()</para>
|
||||||
|
<para>
|
||||||
|
the argument N is now ignored, since the number of value columns
|
||||||
|
is always determined by the calling query
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>
|
||||||
|
<literal>
|
||||||
|
connectby(text relname, text keyid_fld, text parent_keyid_fld
|
||||||
|
[, text orderby_fld], text start_with, int max_depth
|
||||||
|
[, text branch_delim])
|
||||||
|
</literal>
|
||||||
|
</entry>
|
||||||
|
<entry>
|
||||||
|
returns keyid, parent_keyid, level, and an optional branch string
|
||||||
|
and an optional serial column for ordering siblings
|
||||||
|
</entry>
|
||||||
|
<entry>
|
||||||
|
requires anonymous composite type syntax in the FROM clause. See
|
||||||
|
the instructions in the documentation below.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<sect3>
|
||||||
|
<title><literal>normal_rand</literal></title>
|
||||||
|
<programlisting>
|
||||||
|
normal_rand(int numvals, float8 mean, float8 stddev) RETURNS SETOF float8
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Where <literal>numvals</literal> is the number of values to be returned
|
||||||
|
from the function. <literal>mean</literal> is the mean of the normal
|
||||||
|
distribution of values and <literal>stddev</literal> is the standard
|
||||||
|
deviation of the normal distribution of values.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Returns a set of float8 random values drawn from a normal (Gaussian
|
||||||
|
distribution).
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Example:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
test=# SELECT * FROM
|
||||||
|
test-# normal_rand(1000, 5, 3);
|
||||||
|
normal_rand
|
||||||
|
----------------------
|
||||||
|
1.56556322244898
|
||||||
|
9.10040991424657
|
||||||
|
5.36957140345079
|
||||||
|
-0.369151492880995
|
||||||
|
0.283600703686639
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
4.82992125404908
|
||||||
|
9.71308014517282
|
||||||
|
2.49639286969028
|
||||||
|
(1000 rows)
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Returns 1000 values with a mean of 5 and a standard deviation of 3.
|
||||||
|
</para>
|
||||||
|
</sect3>
|
||||||
|
|
||||||
|
|
||||||
|
<sect3>
|
||||||
|
<title><literal>crosstabN(text sql)</literal></title>
|
||||||
|
<programlisting>
|
||||||
|
crosstabN(text sql)
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The <literal>sql</literal> parameter is a SQL statement which produces the
|
||||||
|
source set of data. The SQL statement must return one row_name column, one
|
||||||
|
category column, and one value column. <literal>row_name</literal> and
|
||||||
|
value must be of type text. The function returns a set of
|
||||||
|
<literal>row_name</literal> plus N category value columns.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The provided <literal>sql</literal> must produce a set something like:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
row_name cat value
|
||||||
|
---------+-------+-------
|
||||||
|
row1 cat1 val1
|
||||||
|
row1 cat2 val2
|
||||||
|
row1 cat3 val3
|
||||||
|
row1 cat4 val4
|
||||||
|
row2 cat1 val5
|
||||||
|
row2 cat2 val6
|
||||||
|
row2 cat3 val7
|
||||||
|
row2 cat4 val8
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The returned value is a <literal>SETOF tablefunc_crosstab_N</literal>, which
|
||||||
|
is defined by:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
CREATE TYPE tablefunc_crosstab_N AS (
|
||||||
|
row_name TEXT,
|
||||||
|
category_1 TEXT,
|
||||||
|
category_2 TEXT,
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
category_N TEXT
|
||||||
|
);
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
for the default installed functions, where N is 2, 3, or 4.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
e.g. the provided crosstab2 function produces a set something like:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
<== values columns ==>
|
||||||
|
row_name category_1 category_2
|
||||||
|
---------+------------+------------
|
||||||
|
row1 val1 val2
|
||||||
|
row2 val5 val6
|
||||||
|
</programlisting>
|
||||||
|
<note>
|
||||||
|
<orderedlist>
|
||||||
|
<listitem><para>The sql result must be ordered by 1,2.</para></listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
The number of values columns depends on the tuple description
|
||||||
|
of the function's declared return type.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Missing values (i.e. not enough adjacent rows of same row_name to
|
||||||
|
fill the number of result values columns) are filled in with nulls.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Extra values (i.e. too many adjacent rows of same row_name to fill
|
||||||
|
the number of result values columns) are skipped.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Rows with all nulls in the values columns are skipped.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
The installed defaults are for illustration purposes. You
|
||||||
|
can create your own return types and functions based on the
|
||||||
|
crosstab() function of the installed library. See below for
|
||||||
|
details.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</orderedlist>
|
||||||
|
</note>
|
||||||
|
<para>
|
||||||
|
Example:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
create table ct(id serial, rowclass text, rowid text, attribute text, value text);
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
|
||||||
|
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');
|
||||||
|
|
||||||
|
select * from crosstab3(
|
||||||
|
'select rowid, attribute, value
|
||||||
|
from ct
|
||||||
|
where rowclass = ''group1''
|
||||||
|
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;');
|
||||||
|
|
||||||
|
row_name | category_1 | category_2 | category_3
|
||||||
|
----------+------------+------------+------------
|
||||||
|
test1 | val2 | val3 |
|
||||||
|
test2 | val6 | val7 |
|
||||||
|
(2 rows)
|
||||||
|
</programlisting>
|
||||||
|
</sect3>
|
||||||
|
|
||||||
|
<sect3>
|
||||||
|
<title><literal>crosstab(text)</literal></title>
|
||||||
|
<programlisting>
|
||||||
|
crosstab(text sql)
|
||||||
|
crosstab(text sql, int N)
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The <literal>sql</literal> parameter is a SQL statement which produces the
|
||||||
|
source set of data. The SQL statement must return one
|
||||||
|
<literal>row_name</literal> column, one <literal>category</literal> column,
|
||||||
|
and one <literal>value</literal> column. <literal>N</literal> is an
|
||||||
|
obsolete argument; ignored if supplied (formerly this had to match the
|
||||||
|
number of category columns determined by the calling query).
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
e.g. the provided sql must produce a set something like:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
row_name cat value
|
||||||
|
----------+-------+-------
|
||||||
|
row1 cat1 val1
|
||||||
|
row1 cat2 val2
|
||||||
|
row1 cat3 val3
|
||||||
|
row1 cat4 val4
|
||||||
|
row2 cat1 val5
|
||||||
|
row2 cat2 val6
|
||||||
|
row2 cat3 val7
|
||||||
|
row2 cat4 val8
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Returns a <literal>SETOF RECORD</literal>, which must be defined with a
|
||||||
|
column definition in the FROM clause of the SELECT statement, e.g.:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
SELECT *
|
||||||
|
FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text);
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
The example crosstab function produces a set something like:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
<== values columns ==>
|
||||||
|
row_name category_1 category_2
|
||||||
|
---------+------------+------------
|
||||||
|
row1 val1 val2
|
||||||
|
row2 val5 val6
|
||||||
|
</programlisting>
|
||||||
|
<para>
|
||||||
|
Note that it follows these rules:
|
||||||
|
</para>
|
||||||
|
<orderedlist>
|
||||||
|
<listitem><para>The sql result must be ordered by 1,2.</para></listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
The number of values columns is determined by the column definition
|
||||||
|
provided in the FROM clause. The FROM clause must define one
|
||||||
|
row_name column (of the same datatype as the first result column
|
||||||
|
of the sql query) followed by N category columns (of the same
|
||||||
|
datatype as the third result column of the sql query). You can
|
||||||
|
set up as many category columns as you wish.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Missing values (i.e. not enough adjacent rows of same row_name to
|
||||||
|
fill the number of result values columns) are filled in with nulls.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Extra values (i.e. too many adjacent rows of same row_name to fill
|
||||||
|
the number of result values columns) are skipped.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Rows with all nulls in the values columns are skipped.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
You can avoid always having to write out a FROM clause that defines the
|
||||||
|
output columns by setting up a custom crosstab function that has
|
||||||
|
the desired output row type wired into its definition.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</orderedlist>
|
||||||
|
<para>
|
||||||
|
There are two ways you can set up a custom crosstab function:
|
||||||
|
</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Create a composite type to define your return type, similar to the
|
||||||
|
examples in the installation script. Then define a unique function
|
||||||
|
name accepting one text parameter and returning setof your_type_name.
|
||||||
|
For example, if your source data produces row_names that are TEXT,
|
||||||
|
and values that are FLOAT8, and you want 5 category columns:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
CREATE TYPE my_crosstab_float8_5_cols AS (
|
||||||
|
row_name TEXT,
|
||||||
|
category_1 FLOAT8,
|
||||||
|
category_2 FLOAT8,
|
||||||
|
category_3 FLOAT8,
|
||||||
|
category_4 FLOAT8,
|
||||||
|
category_5 FLOAT8
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
|
||||||
|
RETURNS setof my_crosstab_float8_5_cols
|
||||||
|
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
|
||||||
|
</programlisting>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Use OUT parameters to define the return type implicitly.
|
||||||
|
The same example could also be done this way:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text,
|
||||||
|
OUT row_name TEXT,
|
||||||
|
OUT category_1 FLOAT8,
|
||||||
|
OUT category_2 FLOAT8,
|
||||||
|
OUT category_3 FLOAT8,
|
||||||
|
OUT category_4 FLOAT8,
|
||||||
|
OUT category_5 FLOAT8)
|
||||||
|
RETURNS setof record
|
||||||
|
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
|
||||||
|
</programlisting>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
<para>
|
||||||
|
Example:
|
||||||
|
</para>
|
||||||
|
<programlisting>
|
||||||
|
CREATE TABLE ct(id SERIAL, rowclass TEXT, rowid TEXT, attribute TEXT, value TEXT);
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att1','val1');
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att2','val2');
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att3','val3');
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att4','val4');
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att1','val5');
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att2','val6');
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att3','val7');
|
||||||
|
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att4','val8');
|
||||||
|
|
||||||
|
SELECT *
|
||||||
|
FROM crosstab(
|
||||||
|
'select rowid, attribute, value
|
||||||
|
from ct
|
||||||
|
where rowclass = ''group1''
|
||||||
|
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3)
|
||||||
|
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
|
||||||
|
|
||||||
|
row_name | category_1 | category_2 | category_3
|
||||||
|
----------+------------+------------+------------
|
||||||
|
test1 | val2 | val3 |
|
||||||
|
test2 | val6 | val7 |
|
||||||
|
(2 rows)
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
</sect3>
|
||||||
|
|
||||||
|
<sect3>
|
||||||
|
<title><literal>crosstab(text, text)</literal></title>
|
||||||
|
<programlisting>
|
||||||
|
crosstab(text source_sql, text category_sql)
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Where <literal>source_sql</literal> is a SQL statement which produces the
|
||||||
|
source set of data. The SQL statement must return one
|
||||||
|
<literal>row_name</literal> column, one <literal>category</literal> column,
|
||||||
|
and one <literal>value</literal> column. It may also have one or more
|
||||||
|
<emphasis>extra</emphasis> columns.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The <literal>row_name</literal> column must be first. The
|
||||||
|
<literal>category</literal> and <literal>value</literal> columns must be
|
||||||
|
the last two columns, in that order. <emphasis>extra</emphasis> columns must
|
||||||
|
be columns 2 through (N - 2), where N is the total number of columns.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The <emphasis>extra</emphasis> columns are assumed to be the same for all
|
||||||
|
rows with the same <literal>row_name</literal>. The values returned are
|
||||||
|
copied from the first row with a given <literal>row_name</literal> and
|
||||||
|
subsequent values of these columns are ignored until
|
||||||
|
<literal>row_name</literal> changes.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
e.g. <literal>source_sql</literal> must produce a set something like:
|
||||||
|
</para>
<programlisting>
SELECT row_name, extra_col, cat, value FROM foo;

 row_name | extra_col | cat  | value
----------+-----------+------+-------
 row1     | extra1    | cat1 | val1
 row1     | extra1    | cat2 | val2
 row1     | extra1    | cat4 | val4
 row2     | extra2    | cat1 | val5
 row2     | extra2    | cat2 | val6
 row2     | extra2    | cat3 | val7
 row2     | extra2    | cat4 | val8
</programlisting>

<para>
<literal>category_sql</literal> must be a SQL statement that produces
the distinct set of categories. The SQL statement must return one category
column only. <literal>category_sql</literal> must produce at least one
result row or an error will be generated, and it
must not produce duplicate categories or an error will be generated. For example:
</para>
<programlisting>
SELECT DISTINCT cat FROM foo;

 cat
-------
 cat1
 cat2
 cat3
 cat4
</programlisting>
<para>
The function returns <literal>SETOF RECORD</literal>, which must be defined
with a column definition in the FROM clause of the SELECT statement, e.g.:
</para>
<programlisting>
SELECT * FROM crosstab(source_sql, cat_sql)
AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
</programlisting>
<para>
The example crosstab function produces a set something like:
</para>
<programlisting>
                     &lt;== values columns ==&gt;
 row_name | extra  | cat1 | cat2 | cat3 | cat4
----------+--------+------+------+------+------
 row1     | extra1 | val1 | val2 |      | val4
 row2     | extra2 | val5 | val6 | val7 | val8
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para><literal>source_sql</literal> must be ordered by row_name (column 1).</para></listitem>
<listitem>
<para>
The number of values columns is determined at run-time. The
column definition provided in the FROM clause must provide for
the correct number of columns of the proper data types.
</para>
</listitem>
<listitem>
<para>
Missing values (i.e., not enough adjacent rows with the same row_name to
fill the number of result values columns) are filled in with nulls.
</para>
</listitem>
<listitem>
<para>
Extra values (i.e., source rows whose category does not appear in the
category_sql result) are skipped.
</para>
</listitem>
<listitem>
<para>
Rows with a null row_name column are skipped.
</para>
</listitem>
<listitem>
<para>
You can create predefined functions to avoid having to write out
the result column names and types in each query. See the examples
for crosstab(text).
</para>
</listitem>
</orderedlist>
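<para>
To illustrate the last rule, a wrapper function for the example column set
above might look like this (the type and function names here are illustrative
only, not part of the module):
</para>
<programlisting>
CREATE TYPE example_crosstab_row AS
  (row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);

CREATE FUNCTION crosstab_example(text, text)
RETURNS SETOF example_crosstab_row
AS 'SELECT * FROM crosstab($1, $2)
    AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text)'
LANGUAGE sql;
</programlisting>
<para>
With such a wrapper in place, <literal>SELECT * FROM crosstab_example(source_sql, cat_sql)</literal>
needs no column definition list.
</para>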

<programlisting>
CREATE TABLE cth(id serial, rowid text, rowdt timestamp, attribute text, val text);

INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','volts','3.1234');

SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
  rowid text,
  rowdt timestamp,
  temperature int4,
  test_result text,
  test_startdate timestamp,
  volts float8
);
 rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
 test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
 test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)
</programlisting>

</sect3>

<sect3>
<title>
<literal>connectby(text, text, text[, text], text, text, int[, text])</literal>
</title>
<programlisting>
connectby(text relname, text keyid_fld, text parent_keyid_fld
          [, text orderby_fld], text start_with, int max_depth
          [, text branch_delim])
</programlisting>
<table>
<title><literal>connectby</literal> parameters</title>
<tgroup cols="2">
<thead>
<row>
<entry>Parameter</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>relname</literal></entry>
<entry>Name of the source relation</entry>
</row>
<row>
<entry><literal>keyid_fld</literal></entry>
<entry>Name of the key field</entry>
</row>
<row>
<entry><literal>parent_keyid_fld</literal></entry>
<entry>Name of the parent-key field</entry>
</row>
<row>
<entry><literal>orderby_fld</literal></entry>
<entry>
Name of the field by which to order siblings, if such ordering is
desired
</entry>
</row>
<row>
<entry><literal>start_with</literal></entry>
<entry>
Root value of the tree, supplied as a text value regardless of the type of
<literal>keyid_fld</literal>
</entry>
</row>
<row>
<entry><literal>max_depth</literal></entry>
<entry>
Zero (0) for unlimited depth, otherwise restrict the result to this depth
</entry>
</row>
<row>
<entry><literal>branch_delim</literal></entry>
<entry>
If an optional branch value is desired, this string is used as the delimiter.
When not provided, a default value of '~' is used for internal
recursion detection only, and no "branch" field is returned.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The function returns <literal>SETOF RECORD</literal>, which must be defined
with a column definition in the FROM clause of the SELECT statement, e.g.:
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int);
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para>keyid and parent_keyid must be the same data type.</para></listitem>
<listitem>
<para>
The column definition <emphasis>must</emphasis> include a third column of type INT4 for
the level value output.
</para>
</listitem>
<listitem>
<para>
If the branch field is not desired, omit both the branch_delim input
parameter <emphasis>and</emphasis> the branch field in the query column definition. Note
that when branch_delim is not provided, a default value of '~' is used
for branch_delim for internal recursion detection, even though the branch
field is not returned.
</para>
</listitem>
<listitem>
<para>
If the branch field is desired, it must be the fourth column in the query
column definition, and it must be of type TEXT.
</para>
</listitem>
<listitem>
<para>
The parameters representing table and field names must include double
quotes if the names are mixed-case or contain special characters.
</para>
</listitem>
<listitem>
<para>
If sorting of siblings is desired, both the orderby_fld input parameter
<emphasis>and</emphasis> a name for the resulting serial field (type INT4) in the query
column definition must be given.
</para>
</listitem>
</orderedlist>

<para>
Example:
</para>
<programlisting>
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);

INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);

-- with branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
 keyid | parent_keyid | level |       branch
-------+--------------+-------+---------------------
 row2  |              |     0 | row2
 row4  | row2         |     1 | row2~row4
 row6  | row4         |     2 | row2~row4~row6
 row8  | row6         |     3 | row2~row4~row6~row8
 row5  | row2         |     1 | row2~row5
 row9  | row5         |     2 | row2~row5~row9
(6 rows)

-- without branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
 keyid | parent_keyid | level
-------+--------------+-------
 row2  |              |     0
 row4  | row2         |     1
 row6  | row4         |     2
 row8  | row6         |     3
 row5  | row2         |     1
 row9  | row5         |     2
(6 rows)

-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos;
 keyid | parent_keyid | level |       branch        | pos
-------+--------------+-------+---------------------+-----
 row2  |              |     0 | row2                |   1
 row5  | row2         |     1 | row2~row5           |   2
 row9  | row5         |     2 | row2~row5~row9      |   3
 row4  | row2         |     1 | row2~row4           |   4
 row6  | row4         |     2 | row2~row4~row6      |   5
 row8  | row6         |     3 | row2~row4~row6~row8 |   6
(6 rows)

-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos;
 keyid | parent_keyid | level | pos
-------+--------------+-------+-----
 row2  |              |     0 |   1
 row5  | row2         |     1 |   2
 row9  | row5         |     2 |   3
 row4  | row2         |     1 |   4
 row6  | row4         |     2 |   5
 row8  | row6         |     3 |   6
(6 rows)
</programlisting>
</sect3>
</sect2>

<sect2>
<title>Author</title>
<para>
Joe Conway
</para>
</sect2>

</sect1>
<sect1 id="pgtrgm">
<title>pg_trgm</title>

<indexterm zone="pgtrgm">
<primary>pgtrgm</primary>
</indexterm>

<para>
The <literal>pg_trgm</literal> module provides functions and index classes
for determining the similarity of text based on trigram matching.
</para>

<sect2>
<title>Trigram (or Trigraph)</title>
<para>
A trigram is a group of three consecutive characters taken
from a string. A string is considered to have two spaces
prefixed and one space suffixed when determining the set
of trigrams that comprise the string.
</para>
<para>
For example, the set of trigrams in the word "cat" is " c", " ca",
"at " and "cat".
</para>
</sect2>

<sect2>
<title>Public Functions</title>
<table>
<title><literal>pg_trgm</literal> functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>real similarity(text, text)</literal></entry>
<entry>
<para>
Returns a number that indicates how similar the two
arguments are. A result of zero indicates that the two words
are completely dissimilar, and a result of one indicates that
the two words are identical.
</para>
</entry>
</row>
<row>
<entry><literal>real show_limit()</literal></entry>
<entry>
<para>
Returns the current similarity threshold used by the '%'
operator. This in effect sets the minimum similarity between
two words for them to be considered similar enough to
be misspellings of each other, for example.
</para>
</entry>
</row>
<row>
<entry><literal>real set_limit(real)</literal></entry>
<entry>
<para>
Sets the current similarity threshold that is used by the '%'
operator, and is returned by the show_limit() function.
</para>
</entry>
</row>
<row>
<entry><literal>text[] show_trgm(text)</literal></entry>
<entry>
<para>
Returns an array of all the trigrams of the supplied text
parameter.
</para>
</entry>
</row>
<row>
<entry>Operator: <literal>text % text (returns boolean)</literal></entry>
<entry>
<para>
The '%' operator returns TRUE if its two arguments have a similarity
that is greater than the similarity threshold set by set_limit(), and
FALSE otherwise.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
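<para>
A quick way to get a feel for these functions is to call them directly;
the following calls are illustrative (exact results depend on the inputs
and the current threshold):
</para>
<programlisting>
SELECT similarity('word', 'words');
SELECT show_trgm('cat');
SELECT set_limit(0.5);
SELECT 'word' % 'words';
</programlisting>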

</sect2>

<sect2>
<title>Public Index Operator Class</title>
<para>
The <literal>pg_trgm</literal> module comes with the
<literal>gist_trgm_ops</literal> index operator class that allows a
developer to create an index over a text column for the purpose
of very fast similarity searches.
</para>
<para>
To use this index, the '%' operator must be used and an appropriate
similarity threshold for the application must be set. Example:
</para>
<programlisting>
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
</programlisting>
<para>
At this point, you will have an index on the t text column that you
can use for similarity searching. Example:
</para>
<programlisting>
SELECT
  t,
  similarity(t, 'word') AS sml
FROM
  test_trgm
WHERE
  t % 'word'
ORDER BY
  sml DESC, t;
</programlisting>
<para>
This will return all values in the text column that are sufficiently
similar to 'word', sorted from best match to worst. The index will
be used to make this a fast operation over very large data sets.
</para>

</sect2>

<sect2>
<title>Tsearch2 Integration</title>
<para>
Trigram matching is a very useful tool when used in conjunction
with a text index created by the Tsearch2 contrib module. (See
contrib/tsearch2.)
</para>
<para>
The first step is to generate an auxiliary table containing all
the unique words in the Tsearch2 index:
</para>
<programlisting>
CREATE TABLE words AS SELECT word FROM
        stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
</programlisting>
<para>
Here 'documents' is a table that has a text field 'bodytext'
that Tsearch2 is used to search. The reason for using the 'simple'
dictionary with the to_tsvector function, instead of just using the
existing vector, is to avoid creating a list of already-stemmed
words. This way, only the original, unstemmed words are added
to the word list.
</para>
<para>
Next, create a trigram index on the word column:
</para>
<programlisting>
CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
</programlisting>
<para>
or
</para>
<programlisting>
CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
</programlisting>
<para>
Now, a <literal>SELECT</literal> query similar to the example above can be
used to suggest spellings for misspelled words in user search terms. A
useful extra clause is to ensure that the similar words are also
of similar length to the misspelled word.
</para>
<para>
<note>
<para>
Since the 'words' table has been generated as a separate,
static table, it will need to be periodically regenerated so that
it remains up to date with the word list in the Tsearch2 index.
</para>
</note>
</para>
</sect2>

<sect2>
<title>References</title>
<para>
Tsearch2 Development Site
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/"></ulink>
</para>
<para>
GiST Development Site
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink>
</para>
</sect2>

<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd., Russia
</para>
<para>
Documentation: Christopher Kings-Lynne
</para>
<para>
This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
</para>
</sect2>

</sect1>
<sect1 id="uuid-ossp">
<title>uuid-ossp</title>

<indexterm zone="uuid-ossp">
<primary>uuid-ossp</primary>
</indexterm>

<para>
This module provides functions to generate universally unique
identifiers (UUIDs) using one of several standard algorithms, as
well as functions to produce certain special UUID constants.
</para>

<sect2>
<title>UUID Generation</title>
<para>
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
Each of these algorithms can be suitable for a different set of
applications.
</para>

<table>
<title><literal>uuid-ossp</literal> functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>uuid_generate_v1()</literal></entry>
<entry>
<para>
This function generates a version 1 UUID. This involves the MAC
address of the computer and a time stamp. Note that UUIDs of this
kind reveal the identity of the computer that created the identifier
and the time at which it did so, which might make it unsuitable for
certain security-sensitive applications.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v1mc()</literal></entry>
<entry>
<para>
This function generates a version 1 UUID but uses a random multicast
MAC address instead of the real MAC address of the computer.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v3(namespace uuid, name text)</literal></entry>
<entry>
<para>
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the uuid_ns_*() functions shown below. (It
could be any UUID in theory.) The name is an identifier in the
selected namespace. For example:
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')</literal></entry>
<entry>
<para>
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
</para>
<para>
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v4()</literal></entry>
<entry>
<para>
This function generates a version 4 UUID, which is derived entirely
from random numbers.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v5(namespace uuid, name text)</literal></entry>
<entry>
<para>
This function generates a version 5 UUID, which works like a version 3
UUID except that SHA-1 is used as a hashing method. Version 5 should
be preferred over version 3 because SHA-1 is thought to be more secure
than MD5.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>

<table>
<title>UUID Constants</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>uuid_nil()</literal></entry>
<entry>
<para>
A "nil" UUID constant, which does not occur as a real UUID.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_dns()</literal></entry>
<entry>
<para>
Constant designating the DNS namespace for UUIDs.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_url()</literal></entry>
<entry>
<para>
Constant designating the URL namespace for UUIDs.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_oid()</literal></entry>
<entry>
<para>
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, which are unrelated to the OIDs used in
PostgreSQL.)
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_x500()</literal></entry>
<entry>
<para>
Constant designating the X.500 distinguished name (DN) namespace for
UUIDs.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
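<para>
For example, the namespace constants are meant to be combined with the
name-based generators; the following calls illustrate typical usage
(the first two produce a different value on every call, the third is
deterministic):
</para>
<programlisting>
SELECT uuid_generate_v1();
SELECT uuid_generate_v4();
SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
</programlisting>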
</sect2>

<sect2>
<title>Author</title>
<para>
Peter Eisentraut <email>peter_e@gmx.net</email>
</para>
</sect2>

</sect1>
<sect1 id="vacuumlo">
<title>vacuumlo</title>

<indexterm zone="vacuumlo">
<primary>vacuumlo</primary>
</indexterm>

<para>
This is a simple utility that removes any orphaned large objects from a
PostgreSQL database. An orphaned LO is considered to be any LO whose OID
does not appear in any OID data column of the database.
</para>
<para>
If you use this, you may also be interested in the lo_manage trigger in
contrib/lo. lo_manage is useful for avoiding the creation of orphaned LOs
in the first place.
</para>
<para>
<note>
<para>
It was decided to place this in contrib as it needs further testing, but
hopefully this (or a variant of it) will make it into the backend as a
"vacuum lo" command in a later release.
</para>
</note>
</para>

<sect2>
<title>Usage</title>
<programlisting>
vacuumlo [options] database [database2 ... databasen]
</programlisting>
<para>
All databases named on the command line are processed. Available options
include:
</para>
<programlisting>
-v           Write a lot of progress messages
-n           Don't remove large objects, just show what would be done
-U username  Username to connect as
-W           Prompt for password
-h hostname  Database server host
-p port      Database server port
</programlisting>
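<para>
For example, a verbose dry run against a database named 'mydb' (the
database and user names here are illustrative) might look like:
</para>
<programlisting>
vacuumlo -v -n -U postgres mydb
</programlisting>
<para>
Dropping the -n switch then actually removes the orphaned large objects
reported by the dry run.
</para>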

</sect2>

<sect2>
<title>Method</title>
<para>
First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.
</para>
<para>
It then scans through all columns in the database that are of type "oid"
or "lo", and removes matching entries from the temporary table.
</para>
<para>
The remaining entries in the temp table identify orphaned LOs. These are
removed.
</para>
</sect2>

<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email>
</para>
<para>
<ulink url="http://www.retep.org.uk"></ulink>
</para>
</sect2>

</sect1>
<sect1 id="xml2">
<title>xml2: XML-handling functions</title>

<indexterm zone="xml2">
<primary>xml2</primary>
</indexterm>

<sect2>
<title>Deprecation notice</title>
<para>
From PostgreSQL 8.3 on, there is XML-related
functionality based on the SQL/XML standard in the core server.
That functionality covers XML syntax checking and XPath queries,
which is what this module does as well, and more, but the API is
not at all compatible. It is planned that this module will be
removed in PostgreSQL 8.4 in favor of the newer standard API, so
you are encouraged to try converting your applications. If you
find that some of the functionality of this module is not
available in an adequate form with the newer API, please explain
your issue to pgsql-hackers@postgresql.org so that the deficiency
can be addressed.
</para>
</sect2>

<sect2>
<title>Description of functions</title>
<para>
The first set of functions are straightforward XML parsing and XPath queries:
</para>

<table>
<title>Functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry>
<programlisting>
xml_is_well_formed(document) RETURNS bool
</programlisting>
</entry>
<entry>
<para>
This parses the document text in its parameter and returns true if the
document is well-formed XML. (Note: before PostgreSQL 8.2, this function
was called xml_valid(). That is the wrong name, since validity and
well-formedness have different meanings in XML. The old name is still
available, but is deprecated and will be removed in 8.3.)
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_string(document,query) RETURNS text
xpath_number(document,query) RETURNS float4
xpath_bool(document,query) RETURNS bool
</programlisting>
</entry>
<entry>
<para>
These functions evaluate the XPath query on the supplied document, and
cast the result to the specified type.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_nodeset(document,query,toptag,itemtag) RETURNS text
</programlisting>
</entry>
<entry>
<para>
This evaluates the query on the document and wraps the result in XML
tags. If the result is multivalued, the output will look like:
</para>
<literal>
&lt;toptag&gt;
&lt;itemtag&gt;Value 1 which could be an XML fragment&lt;/itemtag&gt;
&lt;itemtag&gt;Value 2....&lt;/itemtag&gt;
&lt;/toptag&gt;
</literal>
<para>
If either toptag or itemtag is an empty string, the relevant tag is omitted.
</para>
</entry>
</row>
|
||||||
|
<row>
|
||||||
|
<entry>
|
||||||
|
<programlisting>
|
||||||
|
xpath_nodeset(document,query) RETURNS
|
||||||
|
</programlisting>
|
||||||
|
</entry>
|
||||||
|
<entry>
|
||||||
|
<para>
|
||||||
|
Like xpath_nodeset(document,query,toptag,itemtag) but text omits both tags.
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>
|
||||||
|
<programlisting>
|
||||||
|
xpath_nodeset(document,query,itemtag) RETURNS
|
||||||
|
</programlisting>
|
||||||
|
</entry>
|
||||||
|
<entry>
|
||||||
|
<para>
|
||||||
|
Like xpath_nodeset(document,query,toptag,itemtag) but text omits toptag.
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>
|
||||||
|
<programlisting>
|
||||||
|
xpath_list(document,query,seperator) RETURNS text
|
||||||
|
</programlisting>
|
||||||
|
</entry>
|
||||||
|
<entry>
|
||||||
|
<para>
|
||||||
|
This function returns multiple values seperated by the specified
|
||||||
|
seperator, e.g. Value 1,Value 2,Value 3 if seperator=','.
|
||||||
|
</para>
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry>
|
||||||
|
<programlisting>
|
||||||
|
xpath_list(document,query) RETURNS text
|
||||||
|
</programlisting>
|
||||||
|
</entry>
|
||||||
|
<entry>
|
||||||
|
This is a wrapper for the above function that uses ',' as the seperator.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
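<para>
As a quick illustration of these functions, a hypothetical session might
look like the following (the inline document literals are assumptions made
up for this example, not anything shipped with the module):
</para>

<programlisting>
SELECT xml_is_well_formed('&lt;doc&gt;&lt;n&gt;12&lt;/n&gt;&lt;/doc&gt;');
SELECT xpath_string('&lt;doc&gt;&lt;n&gt;12&lt;/n&gt;&lt;/doc&gt;', '/doc/n');
SELECT xpath_number('&lt;doc&gt;&lt;n&gt;12&lt;/n&gt;&lt;/doc&gt;', '/doc/n');
SELECT xpath_list('&lt;doc&gt;&lt;n&gt;1&lt;/n&gt;&lt;n&gt;2&lt;/n&gt;&lt;/doc&gt;', '/doc/n', ';');
</programlisting>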
</sect2>

<sect2>
<title><literal>xpath_table</literal></title>
<para>
This is a table function which evaluates a set of XPath queries on
each of a set of documents and returns the results as a table. The
primary key field from the original document table is returned as the
first column of the result, so that the result set from xpath_table can
be readily used in joins.
</para>
<para>
The function itself takes five arguments, all of type text:
</para>
<programlisting>
xpath_table(key,document,relation,xpaths,criteria)
</programlisting>
<table>
<title>Parameters</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>key</literal></entry>
<entry>
<para>
the name of the "key" field. This is simply a field to be used as
the first column of the output table, i.e. it identifies the record from
which each output row came (see the note below about multiple values).
</para>
</entry>
</row>
<row>
<entry><literal>document</literal></entry>
<entry>
<para>
the name of the field containing the XML document
</para>
</entry>
</row>
<row>
<entry><literal>relation</literal></entry>
<entry>
<para>
the name of the table or view containing the documents
</para>
</entry>
</row>
<row>
<entry><literal>xpaths</literal></entry>
<entry>
<para>
multiple XPath expressions separated by <literal>|</literal>
</para>
</entry>
</row>
<row>
<entry><literal>criteria</literal></entry>
<entry>
<para>
the contents of the WHERE clause. This must be specified,
so use "true" or "1=1" here if you want to process all the rows in the
relation.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>

<para>
Note that these parameters (except the XPath strings) are simply substituted
into a plain SQL SELECT statement, so you have some flexibility; the
statement is
</para>

<para>
<literal>
SELECT &lt;key&gt;,&lt;document&gt; FROM &lt;relation&gt; WHERE &lt;criteria&gt;
</literal>
</para>

<para>
so those parameters can be <emphasis>anything</emphasis> valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will, unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
validate any user-supplied values to avoid SQL injection attacks.
</para>

<para>
The function has to be used in a FROM expression. This gives the following
form:
</para>

<programlisting>
SELECT * FROM
xpath_table('article_id',
            'article_xml',
            'articles',
            '/article/author|/article/pages|/article/title',
            'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
</programlisting>

<para>
The AS clause defines the names and types of the columns in the
virtual table. If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.
</para>

<para>
Notice that this example requests the pages value as an integer (the
page_count column). The function deals internally with string
representations, so when you say you want an integer in the output, it will
take the string representation of the XPath result and use PostgreSQL input
functions to transform it into an integer (or whatever type the AS clause
requests). An error will result if it can't do this, for example if the
result is empty, so you may wish to just stick to 'text' as the
column type if you think your data has any problems.
</para>
<para>
The SELECT statement doesn't need to use * alone; it can reference the
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:
</para>

<programlisting>
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id','article_xml','articles',
                 '/article/title|/article/author/@id',
                 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
       AS t(article_id integer, title text, author_id integer),
     tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
</programlisting>

<para>
as a more complicated example. Of course, you could wrap all
of this in a view for convenience.
</para>
<sect3>
<title>Multivalued results</title>
<para>
The xpath_table function assumes that the results of each XPath query
might be multivalued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.
</para>
<para>
In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier). If used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:
</para>

<programlisting>
CREATE TABLE test
(
  id int4 NOT NULL,
  xml text,
  CONSTRAINT pk PRIMARY KEY (id)
)
WITHOUT OIDS;

INSERT INTO test VALUES (1, '&lt;doc num="C1"&gt;
&lt;line num="L1"&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;c&gt;3&lt;/c&gt;&lt;/line&gt;
&lt;line num="L2"&gt;&lt;a&gt;11&lt;/a&gt;&lt;b&gt;22&lt;/b&gt;&lt;c&gt;33&lt;/c&gt;&lt;/line&gt;
&lt;/doc&gt;');

INSERT INTO test VALUES (2, '&lt;doc num="C2"&gt;
&lt;line num="L1"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
&lt;line num="L2"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
&lt;/doc&gt;');
</programlisting>
</sect3>

<sect3>
<title>The query</title>

<programlisting>
SELECT * FROM xpath_table('id','xml','test',
  '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
     val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num;
</programlisting>

<para>
This gives the result:
</para>

<programlisting>
 id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
  1 | C1      | L1       |    1 |    2 |    3
  1 |         | L2       |   11 |   22 |   33
</programlisting>

<para>
To get doc_num on every line, the solution is to use two invocations
of xpath_table and join the results:
</para>

<programlisting>
SELECT t.*, i.doc_num FROM
  xpath_table('id','xml','test',
    '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
    AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
  xpath_table('id','xml','test','/doc/@num','1=1')
    AS i(id int4, doc_num varchar(10))
WHERE i.id = t.id AND i.id = 1
ORDER BY doc_num, line_num;
</programlisting>

<para>
which gives the desired result:
</para>

<programlisting>
 id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
  1 | L1       |    1 |    2 |    3 | C1
  1 | L2       |   11 |   22 |   33 | C1
(2 rows)
</programlisting>
</sect3>
</sect2>
<sect2>
<title>XSLT functions</title>
<para>
The following functions are available if libxslt is installed. (This is
not currently detected automatically, so you will have to amend the
Makefile.)
</para>

<sect3>
<title><literal>xslt_process</literal></title>
<programlisting>
xslt_process(document,stylesheet,paramlist) RETURNS text
</programlisting>

<para>
This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
'a=1,b=2'. Note that this is also proof-of-concept code and the
parameter parsing is very simple-minded (e.g. parameter values cannot
contain commas!).
</para>
<para>
Also note that if either the document or stylesheet values do not
begin with a &lt; then they will be treated as URLs and libxslt will
fetch them. It thus follows that you can use xslt_process as a means
to fetch the contents of URLs; you should be aware of the security
implications of this.
</para>
<para>
There is also a two-parameter version of xslt_process which does not
pass any parameters to the transformation.
</para>
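<para>
For example, a call might look like the following (the articles table, its
xml_data column, and the stylesheet URL are assumptions made up for this
example):
</para>

<programlisting>
SELECT xslt_process(xml_data,
                    'http://example.com/style/article.xsl',
                    'heading=Summary,lang=en')
FROM articles;
</programlisting>

<para>
Here the stylesheet value does not begin with &lt;, so libxslt would fetch
it as a URL, as described above.
</para>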
</sect3>
</sect2>

<sect2>
<title>Credits</title>
<para>
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
It has the same BSD licence as PostgreSQL.
</para>
<para>
This version of the XML functions provides both XPath querying and
XSLT functionality. There is also a new table function which allows
the straightforward return of multiple XML results. Note that the current
code doesn't take any particular care over character sets; this is
something that should be fixed at some point!
</para>
<para>
If you have any comments or suggestions, please do contact me at
<email>jgray@azuli.co.uk</email>. Unfortunately, this isn't my main job, so
I can't guarantee a rapid response to your query!
</para>
</sect2>
</sect1>