Go to file
Bruce Momjian 64d0b8b05f Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:

Data:
-------------------------------------------------------------------
select * from cth;
  id | rowid |        rowdt        |   attribute    |      val
----+-------+---------------------+----------------+---------------
   1 | test1 | 2003-03-01 00:00:00 | temperature    | 42
   2 | test1 | 2003-03-01 00:00:00 | test_result    | PASS
   3 | test1 | 2003-03-01 00:00:00 | volts          | 2.6987
   4 | test2 | 2003-03-02 00:00:00 | temperature    | 53
   5 | test2 | 2003-03-02 00:00:00 | test_result    | FAIL
   6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
   7 | test2 | 2003-03-02 00:00:00 | volts          | 3.1234
(7 rows)

Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
   'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
  rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
  test1 | 42          | PASS        | 2.6987         |
  test2 | 53          | FAIL        | 01 March 2003  | 3.1234
(2 rows)

Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
   'SELECT rowid, attribute, val FROM cth ORDER BY 1',
   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
  rowid | temperature | test_result |   test_startdate    | volts
-------+-------------+-------------+---------------------+--------
  test1 |          42 | PASS        |                     | 2.6987
  test2 |          53 | FAIL        | 2003-03-01 00:00:00 | 3.1234
(2 rows)

Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.

The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).

In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).

I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.

README and regression test adjustments included. If there are no
objections, please apply.

Joe Conway
2003-03-20 06:46:30 +00:00
config Factor out the code that detects the long long int snprintf format into a 2003-01-28 21:57:12 +00:00
contrib Attached is an update to contrib/tablefunc. It implements a new hashed 2003-03-20 06:46:30 +00:00
doc PGRES_POLLING_ACTIVE is unused, keep for backward compatibility. 2003-03-20 06:23:30 +00:00
src I'm continuing to work on cleaning up code in psql. As things appear 2003-03-20 06:43:35 +00:00
aclocal.m4 Remove leftovers from subproject removals. Fixes for Python and Kerberos 2002-09-04 22:54:18 +00:00
configure Use poll(2) in preference to select(2), if available. This solves 2003-03-06 03:16:55 +00:00
configure.in Use poll(2) in preference to select(2), if available. This solves 2003-03-06 03:16:55 +00:00
COPYRIGHT Update copyright to 2002. 2002-06-20 20:29:54 +00:00
GNUmakefile.in First step to removing /contrib/retep, with Peter Mount's approval. 2002-10-21 00:12:46 +00:00
HISTORY Allow PAM to work on MAC OS X, report from Aaron Hillegass. 2003-02-14 14:13:56 +00:00
INSTALL Regenerate 2002-11-21 23:33:22 +00:00
Makefile Restructure the key include files per recent pghackers discussion: there 2001-02-10 02:31:31 +00:00
README Improve wording. 2002-11-11 20:03:40 +00:00
register.txt Stamp for 7.3 beta3. 2002-10-24 03:03:37 +00:00

PostgreSQL Database Management System
=====================================
  
This directory contains the source code distribution of the PostgreSQL
database management system.

PostgreSQL is an advanced object-relational database management system
that supports an extended subset of the SQL standard, including
transactions, foreign keys, subqueries, triggers, user-defined types
and functions.  This distribution also contains several language
bindings, including C, Perl, Python, and Tcl, as well as a JDBC
driver.

The ODBC and C++ interfaces have been moved to the PostgreSQL Projects
Web Site at http://gborg.postgresql.org for separate maintenance.

See the file INSTALL for instructions on how to build and install
PostgreSQL.  That file also lists supported operating systems and
hardware platforms and contains information regarding any other
software packages that are required to build or run the PostgreSQL
system.  Changes between all PostgreSQL releases are recorded in the
file HISTORY.  Copyright and license information can be found in the
file COPYRIGHT.  A comprehensive documentation set is included in this
distribution; it can be read as described in the installation
instructions.

The latest version of this software may be obtained at
ftp://ftp.postgresql.org/pub/.  For more information look at our web
site located at http://www.postgresql.org/.