<!-- doc/src/sgml/datatype.sgml -->
<chapter id="datatype">
<title>Data Types</title>
<indexterm zone="datatype">
<primary>data type</primary>
</indexterm>
<indexterm>
<primary>type</primary>
<see>data type</see>
</indexterm>
<para>
<productname>PostgreSQL</productname> has a rich set of native data
types available to users. Users can add new types to
<productname>PostgreSQL</productname> using the <xref
linkend="sql-createtype"/> command.
</para>
<para>
<xref linkend="datatype-table"/> shows all the built-in general-purpose data
types. Most of the alternative names listed in the
<quote>Aliases</quote> column are the names used internally by
<productname>PostgreSQL</productname> for historical reasons. In
addition, some internally used or deprecated types are available,
but are not listed here.
</para>
<table id="datatype-table">
<title>Data Types</title>
<tgroup cols="3">
<colspec colname="col1" colwidth="2*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="2*"/>
<thead>
<row>
<entry>Name</entry>
<entry>Aliases</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>bigint</type></entry>
<entry><type>int8</type></entry>
<entry>signed eight-byte integer</entry>
</row>
<row>
<entry><type>bigserial</type></entry>
<entry><type>serial8</type></entry>
<entry>autoincrementing eight-byte integer</entry>
</row>
<row>
<entry><type>bit [ (<replaceable>n</replaceable>) ]</type></entry>
<entry></entry>
<entry>fixed-length bit string</entry>
</row>
<row>
<entry><type>bit varying [ (<replaceable>n</replaceable>) ]</type></entry>
<entry><type>varbit [ (<replaceable>n</replaceable>) ]</type></entry>
<entry>variable-length bit string</entry>
</row>
<row>
<entry><type>boolean</type></entry>
<entry><type>bool</type></entry>
<entry>logical Boolean (true/false)</entry>
</row>
<row>
<entry><type>box</type></entry>
<entry></entry>
<entry>rectangular box on a plane</entry>
</row>
<row>
<entry><type>bytea</type></entry>
<entry></entry>
<entry>binary data (<quote>byte array</quote>)</entry>
</row>
<row>
<entry><type>character [ (<replaceable>n</replaceable>) ]</type></entry>
<entry><type>char [ (<replaceable>n</replaceable>) ]</type></entry>
<entry>fixed-length character string</entry>
</row>
<row>
<entry><type>character varying [ (<replaceable>n</replaceable>) ]</type></entry>
<entry><type>varchar [ (<replaceable>n</replaceable>) ]</type></entry>
<entry>variable-length character string</entry>
</row>
<row>
<entry><type>cidr</type></entry>
<entry></entry>
<entry>IPv4 or IPv6 network address</entry>
</row>
<row>
<entry><type>circle</type></entry>
<entry></entry>
<entry>circle on a plane</entry>
</row>
<row>
<entry><type>date</type></entry>
<entry></entry>
<entry>calendar date (year, month, day)</entry>
</row>
<row>
<entry><type>double precision</type></entry>
<entry><type>float8</type></entry>
<entry>double precision floating-point number (8 bytes)</entry>
</row>
<row>
<entry><type>inet</type></entry>
<entry></entry>
<entry>IPv4 or IPv6 host address</entry>
</row>
<row>
<entry><type>integer</type></entry>
<entry><type>int</type>, <type>int4</type></entry>
<entry>signed four-byte integer</entry>
</row>
<row>
<entry><type>interval [ <replaceable>fields</replaceable> ] [ (<replaceable>p</replaceable>) ]</type></entry>
<entry></entry>
<entry>time span</entry>
</row>
<row>
<entry><type>json</type></entry>
<entry></entry>
<entry>textual JSON data</entry>
</row>
<row>
<entry><type>jsonb</type></entry>
<entry></entry>
<entry>binary JSON data, decomposed</entry>
</row>
<row>
<entry><type>line</type></entry>
<entry></entry>
<entry>infinite line on a plane</entry>
</row>
<row>
<entry><type>lseg</type></entry>
<entry></entry>
<entry>line segment on a plane</entry>
</row>
<row>
<entry><type>macaddr</type></entry>
<entry></entry>
<entry>MAC (Media Access Control) address</entry>
</row>
<row>
<entry><type>macaddr8</type></entry>
<entry></entry>
<entry>MAC (Media Access Control) address (EUI-64 format)</entry>
</row>
<row>
<entry><type>money</type></entry>
<entry></entry>
<entry>currency amount</entry>
</row>
<row>
<entry><type>numeric [ (<replaceable>p</replaceable>,
<replaceable>s</replaceable>) ]</type></entry>
<entry><type>decimal [ (<replaceable>p</replaceable>,
<replaceable>s</replaceable>) ]</type></entry>
<entry>exact numeric of selectable precision</entry>
</row>
<row>
<entry><type>path</type></entry>
<entry></entry>
<entry>geometric path on a plane</entry>
</row>
<row>
<entry><type>pg_lsn</type></entry>
<entry></entry>
<entry><productname>PostgreSQL</productname> Log Sequence Number</entry>
</row>
<row>
<entry><type>pg_snapshot</type></entry>
<entry></entry>
<entry>user-level transaction ID snapshot</entry>
</row>
<row>
<entry><type>point</type></entry>
<entry></entry>
<entry>geometric point on a plane</entry>
</row>
<row>
<entry><type>polygon</type></entry>
<entry></entry>
<entry>closed geometric path on a plane</entry>
</row>
<row>
<entry><type>real</type></entry>
<entry><type>float4</type></entry>
<entry>single precision floating-point number (4 bytes)</entry>
</row>
<row>
<entry><type>smallint</type></entry>
<entry><type>int2</type></entry>
<entry>signed two-byte integer</entry>
</row>
<row>
<entry><type>smallserial</type></entry>
<entry><type>serial2</type></entry>
<entry>autoincrementing two-byte integer</entry>
</row>
<row>
<entry><type>serial</type></entry>
<entry><type>serial4</type></entry>
<entry>autoincrementing four-byte integer</entry>
</row>
<row>
<entry><type>text</type></entry>
<entry></entry>
<entry>variable-length character string</entry>
</row>
<row>
<entry><type>time [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry></entry>
<entry>time of day (no time zone)</entry>
</row>
<row>
<entry><type>time [ (<replaceable>p</replaceable>) ] with time zone</type></entry>
<entry><type>timetz</type></entry>
<entry>time of day, including time zone</entry>
</row>
<row>
<entry><type>timestamp [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry></entry>
<entry>date and time (no time zone)</entry>
</row>
<row>
<entry><type>timestamp [ (<replaceable>p</replaceable>) ] with time zone</type></entry>
<entry><type>timestamptz</type></entry>
<entry>date and time, including time zone</entry>
</row>
<row>
<entry><type>tsquery</type></entry>
<entry></entry>
<entry>text search query</entry>
</row>
<row>
<entry><type>tsvector</type></entry>
<entry></entry>
<entry>text search document</entry>
</row>
<row>
<entry><type>txid_snapshot</type></entry>
<entry></entry>
<entry>user-level transaction ID snapshot (deprecated; see <type>pg_snapshot</type>)</entry>
</row>
<row>
<entry><type>uuid</type></entry>
<entry></entry>
<entry>universally unique identifier</entry>
</row>
<row>
<entry><type>xml</type></entry>
<entry></entry>
<entry>XML data</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<title>Compatibility</title>
<para>
The following types (or spellings thereof) are specified by
<acronym>SQL</acronym>: <type>bigint</type>, <type>bit</type>, <type>bit
varying</type>, <type>boolean</type>, <type>char</type>,
<type>character varying</type>, <type>character</type>,
<type>varchar</type>, <type>date</type>, <type>double
precision</type>, <type>integer</type>, <type>interval</type>,
<type>numeric</type>, <type>decimal</type>, <type>real</type>,
<type>smallint</type>, <type>time</type> (with or without time zone),
<type>timestamp</type> (with or without time zone), and
<type>xml</type>.
</para>
</note>
<para>
Each data type has an external representation determined by its input
and output functions. Many of the built-in types have
obvious external formats. However, several types are either unique
to <productname>PostgreSQL</productname>, such as geometric
paths, or have several possible formats, such as the date
and time types.
Some of the input and output functions are not invertible, i.e.,
the result of an output function might lose accuracy when compared to
the original input.
</para>
<sect1 id="datatype-numeric">
<title>Numeric Types</title>
<indexterm zone="datatype-numeric">
<primary>data type</primary>
<secondary>numeric</secondary>
</indexterm>
<para>
Numeric types consist of two-, four-, and eight-byte integers,
four- and eight-byte floating-point numbers, and selectable-precision
decimals. <xref linkend="datatype-numeric-table"/> lists the
available types.
</para>
<table id="datatype-numeric-table">
<title>Numeric Types</title>
<tgroup cols="4">
<colspec colname="col1" colwidth="2*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="2*"/>
<colspec colname="col4" colwidth="2*"/>
<thead>
<row>
<entry>Name</entry>
<entry>Storage Size</entry>
<entry>Description</entry>
<entry>Range</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>smallint</type></entry>
<entry>2 bytes</entry>
<entry>small-range integer</entry>
<entry>-32768 to +32767</entry>
</row>
<row>
<entry><type>integer</type></entry>
<entry>4 bytes</entry>
<entry>typical choice for integer</entry>
<entry>-2147483648 to +2147483647</entry>
</row>
<row>
<entry><type>bigint</type></entry>
<entry>8 bytes</entry>
<entry>large-range integer</entry>
<entry>-9223372036854775808 to +9223372036854775807</entry>
</row>
<row>
<entry><type>decimal</type></entry>
<entry>variable</entry>
<entry>user-specified precision, exact</entry>
<entry>up to 131072 digits before the decimal point; up to 16383 digits after the decimal point</entry>
</row>
<row>
<entry><type>numeric</type></entry>
<entry>variable</entry>
<entry>user-specified precision, exact</entry>
<entry>up to 131072 digits before the decimal point; up to 16383 digits after the decimal point</entry>
</row>
<row>
<entry><type>real</type></entry>
<entry>4 bytes</entry>
<entry>variable-precision, inexact</entry>
<entry>6 decimal digits precision</entry>
</row>
<row>
<entry><type>double precision</type></entry>
<entry>8 bytes</entry>
<entry>variable-precision, inexact</entry>
<entry>15 decimal digits precision</entry>
</row>
<row>
<entry><type>smallserial</type></entry>
<entry>2 bytes</entry>
<entry>small autoincrementing integer</entry>
<entry>1 to 32767</entry>
</row>
<row>
<entry><type>serial</type></entry>
<entry>4 bytes</entry>
<entry>autoincrementing integer</entry>
<entry>1 to 2147483647</entry>
</row>
<row>
<entry><type>bigserial</type></entry>
<entry>8 bytes</entry>
<entry>large autoincrementing integer</entry>
<entry>1 to 9223372036854775807</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The syntax of constants for the numeric types is described in
<xref linkend="sql-syntax-constants"/>. The numeric types have a
full set of corresponding arithmetic operators and
functions. Refer to <xref linkend="functions"/> for more
information. The following sections describe the types in detail.
</para>
<sect2 id="datatype-int">
<title>Integer Types</title>
<indexterm zone="datatype-int">
<primary>integer</primary>
</indexterm>
<indexterm zone="datatype-int">
<primary>smallint</primary>
</indexterm>
<indexterm zone="datatype-int">
<primary>bigint</primary>
</indexterm>
<indexterm>
<primary>int4</primary>
<see>integer</see>
</indexterm>
<indexterm>
<primary>int2</primary>
<see>smallint</see>
</indexterm>
<indexterm>
<primary>int8</primary>
<see>bigint</see>
</indexterm>
<para>
The types <type>smallint</type>, <type>integer</type>, and
<type>bigint</type> store whole numbers, that is, numbers without
fractional components, of various ranges. Attempts to store
values outside of the allowed range will result in an error.
</para>
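<para>
As a minimal sketch of the range check, the following casts (values
chosen here purely for illustration) show the largest
<type>smallint</type> value being accepted and the next larger value
being rejected:
<programlisting>
SELECT 32767::smallint;   -- accepted: the largest smallint value
SELECT 32768::smallint;   -- rejected: one past the upper bound
ERROR:  smallint out of range
</programlisting>
The same pattern applies to <type>integer</type> and
<type>bigint</type> at their respective bounds.
</para>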
<para>
The type <type>integer</type> is the common choice, as it offers
the best balance between range, storage size, and performance.
The <type>smallint</type> type is generally only used if disk
space is at a premium. The <type>bigint</type> type is designed to be
used when the range of the <type>integer</type> type is insufficient.
</para>
<para>
<acronym>SQL</acronym> only specifies the integer types
<type>integer</type> (or <type>int</type>),
<type>smallint</type>, and <type>bigint</type>. The
type names <type>int2</type>, <type>int4</type>, and
<type>int8</type> are extensions, which are also used by some
other <acronym>SQL</acronym> database systems.
</para>
</sect2>
<sect2 id="datatype-numeric-decimal">
<title>Arbitrary Precision Numbers</title>
<indexterm>
<primary>numeric (data type)</primary>
</indexterm>
<indexterm>
<primary>arbitrary precision numbers</primary>
</indexterm>
<indexterm>
<primary>decimal</primary>
<see>numeric</see>
</indexterm>
<para>
The type <type>numeric</type> can store numbers with a
very large number of digits. It is especially recommended for
storing monetary amounts and other quantities where exactness is
required. Calculations with <type>numeric</type> values yield exact
results where possible, e.g., addition, subtraction, multiplication.
However, calculations on <type>numeric</type> values are very slow
compared to the integer types, or to the floating-point types
described in the next section.
</para>
<para>
We use the following terms below: The
<firstterm>precision</firstterm> of a <type>numeric</type>
is the total count of significant digits in the whole number,
that is, the number of digits to both sides of the decimal point.
The <firstterm>scale</firstterm> of a <type>numeric</type> is the
count of decimal digits in the fractional part, to the right of the
decimal point. So the number 23.5141 has a precision of 6 and a
scale of 4. Integers can be considered to have a scale of zero.
</para>
<para>
Both the maximum precision and the maximum scale of a
<type>numeric</type> column can be
configured. To declare a column of type <type>numeric</type> use
the syntax:
<programlisting>
NUMERIC(<replaceable>precision</replaceable>, <replaceable>scale</replaceable>)
</programlisting>
The precision must be positive, while the scale may be positive or
negative (see below). Alternatively:
<programlisting>
NUMERIC(<replaceable>precision</replaceable>)
</programlisting>
selects a scale of 0. Specifying:
<programlisting>
NUMERIC
</programlisting>
without any precision or scale creates an <quote>unconstrained
numeric</quote> column in which numeric values of any length can be
stored, up to the implementation limits. A column of this kind will
not coerce input values to any particular scale, whereas
<type>numeric</type> columns with a declared scale will coerce
input values to that scale. (The <acronym>SQL</acronym> standard
requires a default scale of 0, i.e., coercion to integer
precision. We find this a bit useless. If you're concerned
about portability, always specify the precision and scale
explicitly.)
</para>
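<para>
The three declaration forms can be compared with a few literal casts.
This is only a sketch; the value 1.50 is an arbitrary choice:
<programlisting>
SELECT 1.50::numeric;         -- unconstrained: the input scale is kept, giving 1.50
SELECT 1.50::numeric(5, 1);   -- declared scale 1: coerced to 1.5
SELECT 1.50::numeric(5);      -- scale defaults to 0: rounded to 2
</programlisting>
</para>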
<note>
<para>
The maximum precision that can be explicitly specified in
a <type>numeric</type> type declaration is 1000. An
unconstrained <type>numeric</type> column is subject to the limits
described in <xref linkend="datatype-numeric-table"/>.
</para>
</note>
<para>
If the scale of a value to be stored is greater than the declared
scale of the column, the system will round the value to the specified
number of fractional digits. Then, if the number of digits to the
left of the decimal point exceeds the declared precision minus the
declared scale, an error is raised.
For example, a column declared as
<programlisting>
NUMERIC(3, 1)
</programlisting>
will round values to 1 decimal place and can store values between
-99.9 and 99.9, inclusive.
</para>
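<para>
The round-then-check sequence can be sketched with a throwaway table.
The table and column names here are hypothetical and serve only as an
illustration:
<programlisting>
CREATE TABLE numeric_demo (val NUMERIC(3, 1));

INSERT INTO numeric_demo VALUES (12.34);   -- rounded and stored as 12.3
INSERT INTO numeric_demo VALUES (99.94);   -- rounded and stored as 99.9
INSERT INTO numeric_demo VALUES (99.95);   -- rounds to 100.0, which needs three integer digits
ERROR:  numeric field overflow
</programlisting>
Note that the last value is rejected only after rounding: its input
form has two digits to the left of the decimal point, but the rounded
result does not fit.
</para>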
<para>
Beginning in <productname>PostgreSQL</productname> 15, it is allowed
to declare a <type>numeric</type> column with a negative scale. Then
values will be rounded to the left of the decimal point. The
precision still represents the maximum number of non-rounded digits.
Thus, a column declared as
<programlisting>
NUMERIC(2, -3)
</programlisting>
will round values to the nearest thousand and can store values
between -99000 and 99000, inclusive.
It is also allowed to declare a scale larger than the declared
precision. Such a column can only hold fractional values, and it
requires the number of zero digits just to the right of the decimal
point to be at least the declared scale minus the declared precision.
For example, a column declared as
<programlisting>
NUMERIC(3, 5)
</programlisting>
will round values to 5 decimal places and can store values between
-0.00999 and 0.00999, inclusive.
</para>
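<para>
Both cases can be tried together in a small sketch, assuming
<productname>PostgreSQL</productname> 15 or later. The table and
column names are hypothetical:
<programlisting>
CREATE TABLE scale_demo (
    thousands NUMERIC(2, -3),   -- negative scale: rounds to the nearest thousand
    tiny      NUMERIC(3, 5)     -- scale larger than precision: fractional values only
);

INSERT INTO scale_demo VALUES (12345, 0.001234);
SELECT * FROM scale_demo;       -- thousands is 12000, tiny is 0.00123

INSERT INTO scale_demo (thousands) VALUES (99500);   -- rounds to 100000, out of range
ERROR:  numeric field overflow
</programlisting>
</para>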
<note>
<para>
<productname>PostgreSQL</productname> permits the scale in a
<type>numeric</type> type declaration to be any value in the range
-1000 to 1000. However, the <acronym>SQL</acronym> standard requires
the scale to be in the range 0 to <replaceable>precision</replaceable>.
Using scales outside that range might not be portable to other database
systems.
</para>
</note>
<para>
Numeric values are physically stored without any extra leading or
trailing zeroes. Thus, the declared precision and scale of a column
are maximums, not fixed allocations. (In this sense the <type>numeric</type>
type is more akin to <type>varchar(<replaceable>n</replaceable>)</type>
than to <type>char(<replaceable>n</replaceable>)</type>.) The actual storage
requirement is two bytes for each group of four decimal digits,
plus three to eight bytes overhead.
</para>
<indexterm>
<primary>infinity</primary>
<secondary>numeric (data type)</secondary>
</indexterm>
<indexterm>
<primary>NaN</primary>
<see>not a number</see>
</indexterm>
<indexterm>
<primary>not a number</primary>
<secondary>numeric (data type)</secondary>
</indexterm>
<para>
In addition to ordinary numeric values, the <type>numeric</type> type
has several special values:
<literallayout>
<literal>Infinity</literal>
<literal>-Infinity</literal>
<literal>NaN</literal>
</literallayout>
These are adapted from the IEEE 754 standard, and represent
<quote>infinity</quote>, <quote>negative infinity</quote>, and
<quote>not-a-number</quote>, respectively. When writing these values
as constants in an SQL command, you must put quotes around them,
for example <literal>UPDATE table SET x = '-Infinity'</literal>.
On input, these strings are recognized in a case-insensitive manner.
The infinity values can alternatively be spelled <literal>inf</literal>
and <literal>-inf</literal>.
</para>
<para>
The infinity values behave as per mathematical expectations. For
example, <literal>Infinity</literal> plus any finite value equals
<literal>Infinity</literal>, as does <literal>Infinity</literal>
plus <literal>Infinity</literal>; but <literal>Infinity</literal>
minus <literal>Infinity</literal> yields <literal>NaN</literal> (not a
number), because it has no well-defined interpretation. Note that an
infinity can only be stored in an unconstrained <type>numeric</type>
column, because it notionally exceeds any finite precision limit.
</para>
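<para>
The arithmetic rules for infinities can be checked directly. The
following sketch assumes a server version that supports infinite
<type>numeric</type> values; the column aliases are arbitrary:
<programlisting>
SELECT 'Infinity'::numeric + 1                   AS inf_plus_one,   -- Infinity
       'inf'::numeric + 'Infinity'::numeric      AS inf_plus_inf,   -- Infinity
       'Infinity'::numeric - 'Infinity'::numeric AS inf_minus_inf;  -- NaN
</programlisting>
</para>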
<para>
The <literal>NaN</literal> (not a number) value is used to represent
undefined calculational results. In general, any operation with
a <literal>NaN</literal> input yields another <literal>NaN</literal>.
The only exception is when the operation's other inputs are such that
the same output would be obtained if the <literal>NaN</literal> were to
be replaced by any finite or infinite numeric value; then, that output
value is used for <literal>NaN</literal> too. (An example of this
principle is that <literal>NaN</literal> raised to the zero power
yields one.)
</para>
<note>
<para>
In most implementations of the <quote>not-a-number</quote> concept,
<literal>NaN</literal> is not considered equal to any other numeric
value (including <literal>NaN</literal>). In order to allow
<type>numeric</type> values to be sorted and used in tree-based
indexes, <productname>PostgreSQL</productname> treats <literal>NaN</literal>
values as equal, and greater than all non-<literal>NaN</literal>
values.
</para>
</note>
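<para>
A short sketch of how <literal>NaN</literal> compares and behaves for
<type>numeric</type> values, again assuming a server version with
infinite <type>numeric</type> values; the column aliases are arbitrary:
<programlisting>
SELECT 'NaN'::numeric = 'NaN'::numeric         AS nan_equals_nan,  -- true
       'NaN'::numeric &gt; 'Infinity'::numeric AS nan_above_inf,   -- true
       'NaN'::numeric ^ 0                      AS nan_to_zero;     -- 1
</programlisting>
</para>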
<para>
The types <type>decimal</type> and <type>numeric</type> are
equivalent. Both types are part of the <acronym>SQL</acronym>
standard.
</para>
<para>
When rounding values, the <type>numeric</type> type rounds ties away
from zero, while (on most machines) the <type>real</type>
and <type>double precision</type> types round ties to the nearest even
number. For example:
<programlisting>
SELECT x,
  round(x::numeric) AS num_round,
  round(x::double precision) AS dbl_round
FROM generate_series(-3.5, 3.5, 1) as x;
  x   | num_round | dbl_round
------+-----------+-----------
 -3.5 |        -4 |        -4
 -2.5 |        -3 |        -2
 -1.5 |        -2 |        -2
 -0.5 |        -1 |        -0
  0.5 |         1 |         0
  1.5 |         2 |         2
  2.5 |         3 |         2
  3.5 |         4 |         4
(8 rows)
</programlisting>
</para>
</sect2>
<sect2 id="datatype-float">
<title>Floating-Point Types</title>
<indexterm zone="datatype-float">
<primary>real</primary>
</indexterm>
<indexterm zone="datatype-float">
<primary>double precision</primary>
</indexterm>
<indexterm>
<primary>float4</primary>
<see>real</see>
</indexterm>
<indexterm>
<primary>float8</primary>
<see>double precision</see>
</indexterm>
<indexterm zone="datatype-float">
<primary>floating point</primary>
</indexterm>
<para>
The data types <type>real</type> and <type>double precision</type> are
inexact, variable-precision numeric types. On all currently supported
platforms, these types are implementations of <acronym>IEEE</acronym>
Standard 754 for Binary Floating-Point Arithmetic (single and double
precision, respectively), to the extent that the underlying processor,
operating system, and compiler support it.
</para>
<para>
Inexact means that some values cannot be converted exactly to the
internal format and are stored as approximations, so that storing
and retrieving a value might show slight discrepancies.
Managing these errors and how they propagate through calculations
is the subject of an entire branch of mathematics and computer
science and will not be discussed here, except for the
following points:
<itemizedlist>
<listitem>
<para>
If you require exact storage and calculations (such as for
monetary amounts), use the <type>numeric</type> type instead.
</para>
</listitem>
<listitem>
<para>
If you want to do complicated calculations with these types
for anything important, especially if you rely on certain
behavior in boundary cases (infinity, underflow), you should
evaluate the implementation carefully.
</para>
</listitem>
<listitem>
<para>
Comparing two floating-point values for equality might not
always work as expected.
</para>
</listitem>
</itemizedlist>
</para>
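<para>
The points about exactness and equality comparison can be seen with a
well-known pair of values. This is only an illustrative sketch; the
column aliases are arbitrary:
<programlisting>
SELECT 0.1::float8 + 0.2::float8 = 0.3::float8    AS float_equal,    -- false
       0.1::numeric + 0.2::numeric = 0.3::numeric AS numeric_equal;  -- true
</programlisting>
The floating-point sum is stored as an approximation that differs
slightly from the stored approximation of 0.3, whereas the
<type>numeric</type> calculation is exact.
</para>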
<para>
On all currently supported platforms, the <type>real</type> type has a
range of around 1E-37 to 1E+37 with a precision of at least 6 decimal
digits. The <type>double precision</type> type has a range of around
1E-307 to 1E+308 with a precision of at least 15 digits. Values that are
too large or too small will cause an error. Rounding might take place if
the precision of an input number is too high. Numbers too close to zero
that are not representable as distinct from zero will cause an underflow
error.
</para>
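<para>
As a sketch of the range limits, each of the following statements is
expected to be rejected with an error rather than silently producing
infinity or zero; the literal values are arbitrary choices that lie well
outside the stated ranges:
<programlisting>
SELECT '1e400'::float8;    -- too large for double precision
SELECT '1e-400'::float8;   -- too close to zero for double precision
SELECT '1e39'::float4;     -- too large for real
</programlisting>
</para>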
<para>
By default, floating point values are output in text form in their
shortest precise decimal representation; the decimal value produced is
closer to the true stored binary value than to any other value
representable in the same binary precision. (However, the output value is
currently never <emphasis>exactly</emphasis> midway between two
representable values, in order to avoid a widespread bug where input
routines do not properly respect the round-to-nearest-even rule.) This value will
use at most 17 significant decimal digits for <type>float8</type>
values, and at most 9 digits for <type>float4</type> values.
</para>
<note>
<para>
This shortest-precise output format is much faster to generate than the
historical rounded format.
</para>
</note>
<para>
For compatibility with output generated by older versions
of <productname>PostgreSQL</productname>, and to allow the output
precision to be reduced, the <xref linkend="guc-extra-float-digits"/>
parameter can be used to select rounded decimal output instead. Setting a
value of 0 restores the previous default of rounding the value to 6
(for <type>float4</type>) or 15 (for <type>float8</type>)
significant decimal digits. Setting a negative value reduces the number
of digits further; for example -2 would round output to 4 or 13 digits
respectively.
</para>
<para>
Any value of <xref linkend="guc-extra-float-digits"/> greater than 0
selects the shortest-precise format.
</para>
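<para>
The difference between the two output formats can be observed within a
session. The following is a minimal sketch that assumes a server on
which the shortest-precise format is the default:
<programlisting>
SET extra_float_digits = 1;         -- shortest-precise output
SELECT 0.1::float8 + 0.2::float8;   -- 0.30000000000000004

SET extra_float_digits = 0;         -- rounded to 15 significant digits
SELECT 0.1::float8 + 0.2::float8;   -- 0.3

RESET extra_float_digits;
</programlisting>
</para>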
<note>
<para>
Applications that wanted precise values have historically had to set
<xref linkend="guc-extra-float-digits"/> to 3 to obtain them. For
maximum compatibility between versions, they should continue to do so.
</para>
</note>
<indexterm>
<primary>infinity</primary>
<secondary>floating point</secondary>
</indexterm>
<indexterm>
<primary>not a number</primary>
<secondary>floating point</secondary>
</indexterm>
<para>
In addition to ordinary numeric values, the floating-point types
have several special values:
<literallayout>
<literal>Infinity</literal>
<literal>-Infinity</literal>
<literal>NaN</literal>
</literallayout>
These represent the IEEE 754 special values
<quote>infinity</quote>, <quote>negative infinity</quote>, and
<quote>not-a-number</quote>, respectively. When writing these values
as constants in an SQL command, you must put quotes around them,
for example <literal>UPDATE table SET x = '-Infinity'</literal>. On input,
these strings are recognized in a case-insensitive manner.
The infinity values can alternatively be spelled <literal>inf</literal>
and <literal>-inf</literal>.
</para>
<note>
<para>
IEEE 754 specifies that <literal>NaN</literal> should not compare equal
to any other floating-point value (including <literal>NaN</literal>).
In order to allow floating-point values to be sorted and used
in tree-based indexes, <productname>PostgreSQL</productname> treats
<literal>NaN</literal> values as equal, and greater than all
non-<literal>NaN</literal> values.
</para>
</note>
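<para>
The comparison and sorting behavior of floating-point
<literal>NaN</literal> can be sketched as follows; the input array is an
arbitrary example:
<programlisting>
SELECT 'NaN'::float8 = 'NaN'::float8         AS nan_equals_nan,  -- true
       'NaN'::float8 &gt; 'Infinity'::float8 AS nan_above_inf;   -- true

-- NaN sorts after every other value, including Infinity
SELECT x
FROM unnest(ARRAY['NaN', '-inf', '1', 'inf']::float8[]) AS x
ORDER BY x;
</programlisting>
</para>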
<para>
<productname>PostgreSQL</productname> also supports the SQL-standard
notations <type>float</type> and
<type>float(<replaceable>p</replaceable>)</type> for specifying
inexact numeric types. Here, <replaceable>p</replaceable> specifies
the minimum acceptable precision in <emphasis>binary</emphasis> digits.
<productname>PostgreSQL</productname> accepts
<type>float(1)</type> to <type>float(24)</type> as selecting the
<type>real</type> type, while
<type>float(25)</type> to <type>float(53)</type> select
<type>double precision</type>. Values of <replaceable>p</replaceable>
outside the allowed range draw an error.
<type>float</type> with no precision specified is taken to mean
<type>double precision</type>.
</para>
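<para>
One way to confirm which underlying type a given
<type>float(<replaceable>p</replaceable>)</type> notation selects is to
inspect the result with <function>pg_typeof</function>. This is only an
illustrative sketch; the column aliases are arbitrary:
<programlisting>
SELECT pg_typeof(1::float(24)) AS p24,   -- real
       pg_typeof(1::float(25)) AS p25,   -- double precision
       pg_typeof(1::float)     AS no_p;  -- double precision
</programlisting>
</para>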
</sect2>
<sect2 id="datatype-serial">
<title>Serial Types</title>
<indexterm zone="datatype-serial">
<primary>smallserial</primary>
</indexterm>
<indexterm zone="datatype-serial">
<primary>serial</primary>
</indexterm>
<indexterm zone="datatype-serial">
<primary>bigserial</primary>
</indexterm>
<indexterm zone="datatype-serial">
<primary>serial2</primary>
</indexterm>
<indexterm zone="datatype-serial">
<primary>serial4</primary>
</indexterm>
<indexterm zone="datatype-serial">
<primary>serial8</primary>
</indexterm>
<indexterm>
<primary>auto-increment</primary>
<see>serial</see>
</indexterm>
<indexterm>
<primary>sequence</primary>
<secondary>and serial type</secondary>
</indexterm>
<note>
<para>
This section describes a PostgreSQL-specific way to create an
autoincrementing column. Another way is to use the SQL-standard
identity column feature, described at <xref linkend="ddl-identity-columns"/>.
</para>
</note>
1999-08-06 15:43:42 +02:00
<para>
2011-08-07 15:11:55 +02:00
The data types <type>smallserial</type>, <type>serial</type> and
2011-06-22 04:52:52 +02:00
<type>bigserial</type> are not true types, but merely
2009-04-27 18:27:36 +02:00
a notational convenience for creating unique identifier columns
2002-12-06 06:17:42 +01:00
(similar to the <literal>AUTO_INCREMENT</literal> property
supported by some other databases). In the current
2007-02-01 01:28:19 +01:00
implementation, specifying:
1999-08-06 15:43:42 +02:00
2001-08-24 22:03:45 +02:00
<programlisting>
2001-10-09 20:46:00 +02:00
CREATE TABLE <replaceable class="parameter">tablename</replaceable> (
<replaceable class="parameter">colname</replaceable> SERIAL
);
2001-08-24 22:03:45 +02:00
</programlisting>
1998-10-14 18:26:31 +02:00
1999-08-06 15:43:42 +02:00
is equivalent to specifying:
1998-10-14 18:26:31 +02:00
2001-08-24 22:03:45 +02:00
<programlisting>
CREATE SEQUENCE <replaceable class="parameter">tablename</replaceable>_<replaceable class="parameter">colname</replaceable>_seq AS integer;
CREATE TABLE <replaceable class="parameter">tablename</replaceable> (
    <replaceable class="parameter">colname</replaceable> integer NOT NULL DEFAULT nextval('<replaceable class="parameter">tablename</replaceable>_<replaceable class="parameter">colname</replaceable>_seq')
);
ALTER SEQUENCE <replaceable class="parameter">tablename</replaceable>_<replaceable class="parameter">colname</replaceable>_seq OWNED BY <replaceable class="parameter">tablename</replaceable>.<replaceable class="parameter">colname</replaceable>;
</programlisting>
1999-08-06 15:43:42 +02:00
2001-08-16 22:38:56 +02:00
Thus, we have created an integer column and arranged for its default
2017-10-09 03:44:17 +02:00
values to be assigned from a sequence generator. A <literal>NOT NULL</literal>
2009-06-17 23:58:49 +02:00
constraint is applied to ensure that a null value cannot be
2009-04-27 18:27:36 +02:00
inserted. (In most cases you would also want to attach a
2017-10-09 03:44:17 +02:00
<literal>UNIQUE</literal> or <literal>PRIMARY KEY</literal> constraint to prevent
2002-08-19 21:33:36 +02:00
duplicate values from being inserted by accident, but this is
2017-10-09 03:44:17 +02:00
not automatic.) Lastly, the sequence is marked as <quote>owned by</quote>
2006-08-21 02:57:26 +02:00
the column, so that it will be dropped if the column or table is dropped.
2001-10-09 20:46:00 +02:00
</para>
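<para>
As a minimal sketch of the point about uniqueness, the following
declares the key constraint explicitly and then looks up the sequence
that was created for the column. The table and column names
(<literal>orders</literal>, <literal>id</literal>, <literal>note</literal>)
are hypothetical and chosen only for illustration:
<programlisting>
-- Hypothetical table; serial alone does not imply uniqueness,
-- so PRIMARY KEY is spelled out.
CREATE TABLE orders (
    id   serial PRIMARY KEY,
    note text
);

-- Report the name of the sequence owned by the column.
SELECT pg_get_serial_sequence('orders', 'id');
</programlisting>
Following the naming pattern shown above, the sequence would be expected
to be named <literal>orders_id_seq</literal>, qualified by the schema in
which the table was created, provided that name was not already in use.
</para>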
2001-08-16 22:38:56 +02:00
2012-08-06 21:18:00 +02:00
<note>
<para>
Because <type>smallserial</type>, <type>serial</type> and
<type>bigserial</type> are implemented using sequences, there might
be <quote>holes</quote> or gaps in the sequence of values that appear in the
column, even if no rows are ever deleted. A value allocated
from the sequence is still <quote>used up</quote> even if a row containing that
value is never successfully inserted into the table column. This
can happen, for example, if the inserting transaction rolls back.
See <literal>nextval()</literal> in <xref linkend="functions-sequence"/>
for details.
</para>
</note>
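<para>
The rollback case can be reproduced with a short experiment. This is
only a sketch: the table <literal>gap_demo</literal> is hypothetical, and
the result described assumes a freshly created table whose sequence is
not used by any other session:
<programlisting>
CREATE TABLE gap_demo (id serial, v text);

BEGIN;
INSERT INTO gap_demo (v) VALUES ('lost');   -- consumes a sequence value
ROLLBACK;                                   -- the row is discarded, the value is not returned

INSERT INTO gap_demo (v) VALUES ('kept');
SELECT id, v FROM gap_demo;
</programlisting>
Under those assumptions the surviving row should carry
<literal>id</literal> 2 rather than 1, because the value allocated to the
rolled-back insert is not reused.
</para>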
2002-12-06 06:17:42 +01:00
<para>
2003-10-16 06:52:21 +02:00
To insert the next value of the sequence into the <type>serial</type>
column, specify that the <type>serial</type>
column should be assigned its default value. This can be done
either by omitting the column from the list of columns in
the <command>INSERT</command> statement, or by using
the <literal>DEFAULT</literal> key word.
2002-12-06 06:17:42 +01:00
</para>
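<para>
Both approaches can be sketched as follows, using a hypothetical table
<literal>items</literal> that is not part of any real schema:
<programlisting>
CREATE TABLE items (id serial, name text);

-- Omit the serial column from the column list.
INSERT INTO items (name) VALUES ('first');

-- Name the column, but request its default explicitly.
INSERT INTO items (id, name) VALUES (DEFAULT, 'second');

-- The same key word works without a column list.
INSERT INTO items VALUES (DEFAULT, 'third');
</programlisting>
In each statement the column's default expression is evaluated, so the
next value of the sequence is assigned.
</para>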
2001-08-16 22:38:56 +02:00
<para>
The type names <type>serial</type> and <type>serial4</type> are
equivalent: both create <type>integer</type> columns. The type
2009-04-27 18:27:36 +02:00
names <type>bigserial</type> and <type>serial8</type> work
2001-10-30 21:13:44 +01:00
the same way, except that they create a <type>bigint</type>
column. <type>bigserial</type> should be used if you anticipate
2017-10-09 03:44:17 +02:00
the use of more than 2<superscript>31</superscript> identifiers over the
2011-06-22 04:52:52 +02:00
lifetime of the table. The type names <type>smallserial</type> and
2012-04-12 16:43:39 +02:00
<type>serial2</type> also work the same way, except that they
2011-06-22 04:52:52 +02:00
create a <type>smallint</type> column.
2001-08-16 22:38:56 +02:00
</para>
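<para>
One way to confirm the underlying column types is to query the
information schema. The table name <literal>serial_sizes</literal> below
is a hypothetical example:
<programlisting>
CREATE TABLE serial_sizes (
    a smallserial,
    b serial,
    c bigserial
);

-- The columns should be reported as smallint, integer and bigint.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'serial_sizes'
ORDER BY ordinal_position;
</programlisting>
</para>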
<para>
2003-10-16 06:52:21 +02:00
The sequence created for a <type>serial</type> column is
2006-08-21 02:57:26 +02:00
automatically dropped when the owning column is dropped.
You can drop the sequence without dropping the column, but this
will force removal of the column default expression.
1999-08-06 15:43:42 +02:00
</para>
</sect2>
</sect1>
1998-10-14 18:26:31 +02:00
2001-01-13 19:34:51 +01:00
<sect1 id="datatype-money">
2003-03-13 02:30:29 +01:00
<title>Monetary Types</title>
1999-08-06 15:43:42 +02:00
<para>
2003-03-13 02:30:29 +01:00
The <type>money</type> type stores a currency amount with a fixed
fractional precision; see <xref
2017-11-23 15:39:47 +01:00
linkend="datatype-money-table"/>. The fractional precision is
determined by the database's <xref linkend="guc-lc-monetary"/> setting.
2010-07-16 04:15:56 +02:00
The range shown in the table assumes there are two fractional digits.
2001-01-26 23:04:22 +01:00
Input is accepted in a variety of formats, including integer and
2009-04-27 18:27:36 +02:00
floating-point literals, as well as typical
2001-01-26 23:04:22 +01:00
currency formatting, such as <literal>'$1,000.00'</literal>.
2003-03-13 02:30:29 +01:00
Output is generally in the latter form but depends on the locale.
2007-11-05 13:02:20 +01:00
</para>
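<para>
The accepted input forms can be tried as sketched below. The example
assumes an <varname>lc_monetary</varname> setting that uses
<literal>$</literal> as the currency symbol and a comma as the thousands
separator; under other settings the last literal might be rejected, and
the displayed results will differ:
<programlisting>
SHOW lc_monetary;

SELECT 1000::money        AS from_integer,
       1000.00::money     AS from_numeric_literal,
       '$1,000.00'::money AS from_formatted_string;
</programlisting>
</para>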
2002-11-11 21:14:04 +01:00
<table id="datatype-money-table">
2001-02-14 20:37:26 +01:00
<title>Monetary Types</title>
1999-08-06 15:43:42 +02:00
<tgroup cols="4">
2020-05-06 18:23:43 +02:00
<colspec colname="col1" colwidth="2*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="2*"/>
<colspec colname="col4" colwidth="2*"/>
1999-08-06 15:43:42 +02:00
<thead>
<row>
2003-11-01 02:56:29 +01:00
<entry>Name</entry>
<entry>Storage Size</entry>
<entry>Description</entry>
<entry>Range</entry>
1999-08-06 15:43:42 +02:00
</row>
</thead>
<tbody>
<row>
2017-05-02 20:33:19 +02:00
<entry><type>money</type></entry>
2007-04-06 21:22:38 +02:00
<entry>8 bytes</entry>
2003-11-01 02:56:29 +01:00
<entry>currency amount</entry>
2007-04-06 21:22:38 +02:00
<entry>-92233720368547758.08 to +92233720368547758.07</entry>
1999-08-06 15:43:42 +02:00
</row>
</tbody>
</tgroup>
</table>
2010-07-16 04:15:56 +02:00
<para>
Since the output of this data type is locale-sensitive, it might not
2017-10-09 03:44:17 +02:00
work to load <type>money</type> data into a database that has a different
setting of <varname>lc_monetary</varname>. To avoid problems, before
restoring a dump into a new database, make sure <varname>lc_monetary</varname> has
the same or equivalent value as in the database that was dumped.
</para>
<para>
2011-04-05 15:35:43 +02:00
Values of the <type>numeric</type>, <type>int</type>, and
<type>bigint</type> data types can be cast to <type>money</type>.
Conversion from the <type>real</type> and <type>double precision</type>
data types can be done by casting to <type>numeric</type> first, for
example:
2010-07-16 04:15:56 +02:00
<programlisting>
SELECT '12.34'::float8::numeric::money;
</programlisting>
2011-04-05 15:35:43 +02:00
However, this is not recommended. Floating-point numbers should not be
used to handle money due to the potential for rounding errors.
</para>
<para>
2010-07-16 04:15:56 +02:00
A <type>money</type> value can be cast to <type>numeric</type> without
loss of precision. Conversion to other types could potentially lose
2011-04-05 15:35:43 +02:00
precision, and must also be done in two stages:
2010-07-16 04:15:56 +02:00
<programlisting>
SELECT '52093.89'::money::numeric::float8;
</programlisting>
</para>
<para>
2017-05-21 19:05:16 +02:00
Division of a <type>money</type> value by an integer value is performed
with truncation of the fractional part towards zero. To get a rounded
result, divide by a floating-point value, or cast the <type>money</type>
2017-10-09 03:44:17 +02:00
value to <type>numeric</type> before dividing and back to <type>money</type>
2017-05-21 19:05:16 +02:00
afterwards. (The latter is preferable to avoid risking precision loss.)
2010-07-16 04:15:56 +02:00
When a <type>money</type> value is divided by another <type>money</type>
value, the result is <type>double precision</type> (i.e., a pure number,
not money); the currency units cancel each other out in the division.
</para>
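<para>
The difference between the division variants can be illustrated with a
small query. This is a sketch that assumes a locale with two fractional
digits; the column aliases are arbitrary:
<programlisting>
SELECT '5.00'::money / 3                   AS truncated,  -- integer divisor: truncates toward zero
       ('5.00'::money::numeric / 3)::money AS rounded,    -- numeric division, rounded on the cast back
       '5.00'::money / '2.00'::money       AS ratio;      -- double precision, not money
</programlisting>
Under that assumption the first column would be expected to show an
amount of 1.66 and the second 1.67, each in the locale's currency
format, while the third is a plain number.
</para>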
1999-08-06 15:43:42 +02:00
</sect1>
1998-03-01 09:16:16 +01:00
2001-01-26 23:04:22 +01:00
2001-01-13 19:34:51 +01:00
<sect1 id="datatype-character">
1999-08-06 15:43:42 +02:00
<title>Character Types</title>
2001-05-13 00:51:36 +02:00
<indexterm zone="datatype-character">
2003-08-31 19:32:24 +02:00
<primary>character string</primary>
2001-05-13 00:51:36 +02:00
<secondary>data types</secondary>
</indexterm>
<indexterm>
2003-08-31 19:32:24 +02:00
<primary>string</primary>
<see>character string</see>
2001-05-13 00:51:36 +02:00
</indexterm>
2003-08-31 19:32:24 +02:00
<indexterm zone="datatype-character">
<primary>character</primary>
</indexterm>
<indexterm zone="datatype-character">
<primary>character varying</primary>
</indexterm>
<indexterm zone="datatype-character">
2001-05-13 00:51:36 +02:00
<primary>text</primary>
2003-08-31 19:32:24 +02:00
</indexterm>
<indexterm zone="datatype-character">
<primary>char</primary>
</indexterm>
<indexterm zone="datatype-character">
<primary>varchar</primary>
2001-05-13 00:51:36 +02:00
</indexterm>
2022-09-28 18:31:36 +02:00
<indexterm zone="datatype-character">
<primary>bpchar</primary>
</indexterm>
2002-11-11 21:14:04 +01:00
<table id="datatype-character-table">
2001-02-14 20:37:26 +01:00
<title>Character Types</title>
2001-08-08 00:41:49 +02:00
<tgroup cols="2">
1999-08-06 15:43:42 +02:00
<thead>
<row>
2003-11-01 02:56:29 +01:00
<entry>Name</entry>
<entry>Description</entry>
1999-08-06 15:43:42 +02:00
</row>
</thead>
<tbody>
<row>
2017-10-09 03:44:17 +02:00
<entry><type>character varying(<replaceable>n</replaceable>)</type>, <type>varchar(<replaceable>n</replaceable>)</type></entry>
2003-11-01 02:56:29 +01:00
<entry>variable-length with limit</entry>
1999-08-06 15:43:42 +02:00
</row>
2003-01-15 19:01:05 +01:00
<row>
2022-09-28 18:31:36 +02:00
<entry><type>character(<replaceable>n</replaceable>)</type>, <type>char(<replaceable>n</replaceable>)</type>, <type>bpchar(<replaceable>n</replaceable>)</type></entry>
2023-10-31 15:13:11 +01:00
<entry>fixed-length, blank-padded</entry>
</row>
<row>
<entry><type>bpchar</type></entry>
<entry>variable unlimited length, blank-trimmed</entry>
2003-01-15 19:01:05 +01:00
</row>
1999-08-06 15:43:42 +02:00
<row>
2003-11-01 02:56:29 +01:00
<entry><type>text</type></entry>
<entry>variable unlimited length</entry>
1999-08-06 15:43:42 +02:00
</row>
2001-09-04 05:17:54 +02:00
</tbody>
1999-08-06 15:43:42 +02:00
</tgroup>
</table>
2001-01-13 19:34:51 +01:00
2002-11-11 21:14:04 +01:00
<para>
2017-11-23 15:39:47 +01:00
<xref linkend="datatype-character-table"/> shows the
2002-11-15 04:11:18 +01:00
general-purpose character types available in
<productname>PostgreSQL</productname>.
2002-11-11 21:14:04 +01:00
</para>
2001-05-21 18:54:46 +02:00
<para>
<acronym>SQL</acronym> defines two primary character types:
2017-10-09 03:44:17 +02:00
<type>character varying(<replaceable>n</replaceable>)</type> and
<type>character(<replaceable>n</replaceable>)</type>, where <replaceable>n</replaceable>
2003-01-15 19:01:05 +01:00
is a positive integer. Both of these types can store strings up to
2017-10-09 03:44:17 +02:00
<replaceable>n</replaceable> characters (not bytes) in length. An attempt to store a
2001-05-21 18:54:46 +02:00
longer string into a column of these types will result in an
error, unless the excess characters are all spaces, in which case
2003-01-15 19:01:05 +01:00
the string will be truncated to the maximum length. (This somewhat
bizarre exception is required by the <acronym>SQL</acronym>
2022-09-28 18:31:36 +02:00
standard.)
However, if one explicitly casts a value to <type>character
varying(<replaceable>n</replaceable>)</type> or
<type>character(<replaceable>n</replaceable>)</type>, then an over-length
value will be truncated to <replaceable>n</replaceable> characters without
raising an error. (This too is required by the
<acronym>SQL</acronym> standard.)
If the string to be stored is shorter than the declared
2003-01-15 19:01:05 +01:00
length, values of type <type>character</type> will be space-padded;
values of type <type>character varying</type> will simply store the
shorter string.
1999-08-06 15:43:42 +02:00
</para>
1998-12-18 17:11:12 +01:00
2006-04-23 05:39:52 +02:00
<para>
2022-09-28 18:31:36 +02:00
In addition, <productname>PostgreSQL</productname> provides the
<type>text</type> type, which stores strings of any length.
Although the <type>text</type> type is not in the
<acronym>SQL</acronym> standard, several other SQL database
management systems have it as well.
<type>text</type> is <productname>PostgreSQL</productname>'s native
string data type, in that most built-in functions operating on strings
are declared to take or return <type>text</type>, not <type>character
varying</type>. For many purposes, <type>character varying</type>
acts as though it were a <link linkend="domains">domain</link>
over <type>text</type>.
2006-04-23 05:39:52 +02:00
</para>
2001-05-21 18:54:46 +02:00
<para>
2022-09-28 18:31:36 +02:00
The type name <type>varchar</type> is an alias for <type>character
2023-10-31 15:13:11 +01:00
varying</type>, while <type>bpchar</type> (with length specifier) and
<type>char</type> are aliases for <type>character</type>. The
<type>varchar</type> and <type>char</type> aliases are defined in the
<acronym>SQL</acronym> standard; <type>bpchar</type> is a
<productname>PostgreSQL</productname> extension.
2001-05-21 18:54:46 +02:00
</para>
<para>
2022-09-28 18:31:36 +02:00
If specified, the length <replaceable>n</replaceable> must be greater
2023-10-31 15:13:11 +01:00
than zero and cannot exceed 10,485,760. If <type>character
varying</type> (or <type>varchar</type>) is used without
length specifier, the type accepts strings of any length. If
<type>bpchar</type> lacks a length specifier, it also accepts strings
of any length, but trailing spaces are semantically insignificant.
If <type>character</type> (or <type>char</type>) lacks a specifier,
it is equivalent to <type>character(1)</type>.
2001-05-21 18:54:46 +02:00
</para>
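<para>
The effect of omitting the length specifier can be sketched with explicit
casts; the column aliases are arbitrary:
<programlisting>
SELECT 'abc'::character         AS one_char,         -- treated as character(1)
       'abc'::varchar           AS unlimited,        -- no length limit applies
       'abc   '::bpchar = 'abc' AS trailing_ignored; -- trailing spaces are insignificant
</programlisting>
Because an explicit cast truncates silently, the first column would be
expected to contain only <literal>a</literal>, and the comparison in the
last column should yield true.
</para>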
2004-02-01 07:27:48 +01:00
<para>
Values of type <type>character</type> are physically padded
2017-10-09 03:44:17 +02:00
with spaces to the specified width <replaceable>n</replaceable>, and are
2014-02-24 18:09:23 +01:00
stored and displayed that way. However, trailing spaces are treated as
semantically insignificant and disregarded when comparing two values
of type <type>character</type>. In collations where whitespace
2015-11-11 04:11:39 +01:00
is significant, this behavior can produce unexpected results;
for example, <command>SELECT 'a '::CHAR(2) collate "C" &lt;
E'a\n'::CHAR(2)</command> returns true, even though <literal>C</literal>
2015-11-11 04:11:39 +01:00
locale would consider a space to be greater than a newline.
2014-02-24 18:09:23 +01:00
Trailing spaces are removed when converting a <type>character</type> value
2004-02-01 07:27:48 +01:00
to one of the other string types. Note that trailing spaces
2017-10-09 03:44:17 +02:00
<emphasis>are</emphasis> semantically significant in
2011-03-08 17:03:02 +01:00
<type>character varying</type> and <type>text</type> values, and
2017-10-09 03:44:17 +02:00
when using pattern matching, that is <literal>LIKE</literal> and
2011-03-08 17:03:02 +01:00
regular expressions.
2004-02-01 07:27:48 +01:00
</para>
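<para>
A short query, offered as an illustrative sketch, shows these rules side
by side:
<programlisting>
SELECT 'a '::character(2) = 'a'::character(2) AS char_equal,       -- trailing spaces disregarded
       'a '::varchar      = 'a'::varchar      AS varchar_equal,    -- trailing space is significant
       'a '::text LIKE 'a'                    AS like_match,       -- significant in pattern matching
       octet_length('a'::character(2))        AS stored_bytes,     -- padding is physically present
       char_length('a'::character(2)::text)   AS after_conversion; -- padding removed on conversion
</programlisting>
The expected results are true, false, false, 2 and 1, respectively.
</para>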
2020-12-08 18:06:19 +01:00
<para>
The characters that can be stored in any of these data types are
determined by the database character set, which is selected when
the database is created. Regardless of the specific character set,
the character with code zero (sometimes called NUL) cannot be stored.
For more information refer to <xref linkend="multibyte"/>.
</para>
2001-05-21 18:54:46 +02:00
<para>
2007-04-06 21:22:38 +02:00
The storage requirement for a short string (up to 126 bytes) is 1 byte
plus the actual string, which includes the space padding in the case of
2009-04-27 18:27:36 +02:00
<type>character</type>. Longer strings have 4 bytes of overhead instead
2007-04-06 21:22:38 +02:00
of 1. Long strings are compressed by the system automatically, so
the physical requirement on disk might be less. Very long values are also
stored in background tables so that they do not interfere with rapid
access to shorter column values. In any case, the longest
2016-08-05 20:35:09 +02:00
possible character string that can be stored is about 1 GB. (The
2017-10-09 03:44:17 +02:00
maximum value that will be allowed for <replaceable>n</replaceable> in the data
2009-04-27 18:27:36 +02:00
type declaration is less than that. It wouldn't be useful to
2002-07-16 06:45:59 +02:00
change this because with multibyte character encodings the number of
2009-04-27 18:27:36 +02:00
characters and bytes can be quite different. If you want to
store long strings with no specific upper limit, use
<type>text</type> or <type>character varying</type> without a length
specifier, rather than making up an arbitrary length limit.)
2001-05-21 18:54:46 +02:00
</para>
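<para>
The overhead described above can be observed, at least approximately,
by comparing character count, byte count and stored size. The table
<literal>size_demo</literal> is a hypothetical example, and the exact
figures depend on the database encoding and on whether a value is
compressed:
<programlisting>
CREATE TABLE size_demo (s text);
INSERT INTO size_demo VALUES ('abc'), (repeat('x', 200));

SELECT char_length(s)    AS characters,
       octet_length(s)   AS bytes,
       pg_column_size(s) AS stored_size
FROM size_demo;
</programlisting>
For the short value, <literal>stored_size</literal> should exceed
<literal>bytes</literal> by the 1-byte overhead; for the longer one, by
the 4-byte overhead.
</para>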
<tip>
<para>
2009-06-17 23:58:49 +02:00
There is no performance difference among these three types,
2009-04-27 18:27:36 +02:00
apart from increased storage space when using the blank-padded
type, and a few extra CPU cycles to check the length when storing into
2007-04-06 21:22:38 +02:00
a length-constrained column. While
2017-10-09 03:44:17 +02:00
<type>character(<replaceable>n</replaceable>)</type> has performance
2009-04-27 18:27:36 +02:00
advantages in some other database systems, there is no such advantage in
2009-06-17 23:58:49 +02:00
<productname>PostgreSQL</productname>; in fact
2017-10-09 03:44:17 +02:00
<type>character(<replaceable>n</replaceable>)</type> is usually the slowest of
2016-02-03 20:17:35 +01:00
the three because of its additional storage costs. In most situations
2004-02-01 07:27:48 +01:00
<type>text</type> or <type>character varying</type> should be used
instead.
2001-05-21 18:54:46 +02:00
</para>
</tip>
2001-08-08 00:41:49 +02:00
<para>
2017-11-23 15:39:47 +01:00
Refer to <xref linkend="sql-syntax-strings"/> for information about
the syntax of string literals, and to <xref linkend="functions"/>
2020-12-08 18:06:19 +01:00
for information about available operators and functions.
2001-08-08 00:41:49 +02:00
</para>
2001-05-21 18:54:46 +02:00
<example>
2011-01-29 19:00:18 +01:00
<title>Using the Character Types</title>
2001-05-21 18:54:46 +02:00
<programlisting>
CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- <co id="co.datatype-char"/>
<computeroutput>
  a   | char_length
------+-------------
 ok   |           2
</computeroutput>
CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good      '); -- excess trailing spaces are truncated
INSERT INTO test2 VALUES ('too long');
<computeroutput>ERROR: value too long for type character varying(5)</computeroutput>
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;
<computeroutput>
   b   | char_length
-------+-------------
 ok    |           2
 good  |           5
 too l |           5
</computeroutput>
</programlisting>
<calloutlist>
<callout arearefs="co.datatype-char">
<para>
The <function>char_length</function> function is discussed in
2017-11-23 15:39:47 +01:00
<xref linkend="functions-string"/>.
2001-05-21 18:54:46 +02:00
</para>
</callout>
</calloutlist>
</example>
1999-05-27 17:47:28 +02:00
<para>
2001-01-13 19:34:51 +01:00
There are two other fixed-length character types in
2003-01-15 19:01:05 +01:00
<productname>PostgreSQL</productname>, shown in <xref
2022-08-02 16:29:35 +02:00
linkend="datatype-character-special-table"/>.
These are not intended for general-purpose use, only for use
in the internal system catalogs.
The <type>name</type> type is used to store identifiers. Its
2003-01-15 19:01:05 +01:00
length is currently defined as 64 bytes (63 usable characters plus
terminator) but should be referenced using the constant
2017-10-09 03:44:17 +02:00
<symbol>NAMEDATALEN</symbol> in <literal>C</literal> source code.
2009-04-27 18:27:36 +02:00
The length is set at compile time (and
2003-01-15 19:01:05 +01:00
is therefore adjustable for special uses); the default maximum
length might change in a future release. The type <type>"char"</type>
2003-01-15 19:01:05 +01:00
(note the quotes) is different from <type>char(1)</type> in that it
2022-08-02 16:29:35 +02:00
only uses one byte of storage, and therefore can store only a single
ASCII character. It is used in the system
2009-04-27 18:27:36 +02:00
catalogs as a simplistic enumeration type.
1999-05-27 17:47:28 +02:00
</para>
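<para>
Both types can be seen at work in the system catalogs. The following
query is a sketch that inspects <structname>pg_class</structname>, where
<structfield>relname</structfield> has type <type>name</type> and
<structfield>relkind</structfield> has type <type>"char"</type>:
<programlisting>
SELECT relname,
       pg_typeof(relname) AS relname_type,
       relkind,
       pg_typeof(relkind) AS relkind_type
FROM pg_class
WHERE relname = 'pg_class';
</programlisting>
The single-character <structfield>relkind</structfield> value acts as the
simple enumeration mentioned above.
</para>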
1998-03-01 09:16:16 +01:00
2002-11-11 21:14:04 +01:00
<table id="datatype-character-special-table">
2003-03-13 02:30:29 +01:00
<title>Special Character Types</title>
1999-08-06 15:43:42 +02:00
<tgroup cols="3">
<thead>
<row>
2003-11-01 02:56:29 +01:00
<entry>Name</entry>
<entry>Storage Size</entry>
<entry>Description</entry>
1999-08-06 15:43:42 +02:00
</row>
</thead>
<tbody>
2001-01-13 19:34:51 +01:00
<row>
2003-11-01 02:56:29 +01:00
<entry><type>"char"</type></entry>
<entry>1 byte</entry>
2007-04-06 21:22:38 +02:00
<entry>single-byte internal type</entry>
2001-01-13 19:34:51 +01:00
</row>
1999-08-06 15:43:42 +02:00
<row>
2003-11-01 02:56:29 +01:00
<entry><type>name</type></entry>
<entry>64 bytes</entry>
<entry>internal type for object names</entry>
1999-08-06 15:43:42 +02:00
</row>
</tbody>
</tgroup>
</table>
</sect1>
1998-12-18 17:11:12 +01:00
2001-11-20 16:42:44 +01:00
<sect1 id="datatype-binary">
2003-03-13 02:30:29 +01:00
<title>Binary Data Types</title>
2003-08-31 19:32:24 +02:00
<indexterm zone="datatype-binary">
<primary>binary data</primary>
</indexterm>
<indexterm zone="datatype-binary">
<primary>bytea</primary>
</indexterm>
2001-11-20 16:42:44 +01:00
<para>
2002-11-11 21:14:04 +01:00
The <type>bytea</type> data type allows storage of binary strings;
2017-11-23 15:39:47 +01:00
see <xref linkend="datatype-binary-table"/>.
2001-11-20 16:42:44 +01:00
</para>
2002-11-11 21:14:04 +01:00
<table id="datatype-binary-table">
2003-03-13 02:30:29 +01:00
<title>Binary Data Types</title>
2001-11-20 16:42:44 +01:00
<tgroup cols="3">
2020-05-06 18:23:43 +02:00
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="3*"/>
<colspec colname="col3" colwidth="2*"/>
2001-11-20 16:42:44 +01:00
<thead>
<row>
2003-03-13 02:30:29 +01:00
<entry>Name</entry>
<entry>Storage Size</entry>
2001-11-20 16:42:44 +01:00
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
2002-01-20 23:19:57 +01:00
<entry><type>bytea</type></entry>
2007-04-06 21:22:38 +02:00
<entry>1 or 4 bytes plus the actual binary string</entry>
2003-03-13 02:30:29 +01:00
<entry>variable-length binary string</entry>
2001-11-20 16:42:44 +01:00
</row>
</tbody>
</tgroup>
</table>
<para>
2002-11-11 21:14:04 +01:00
A binary string is a sequence of octets (or bytes). Binary
2009-04-27 18:27:36 +02:00
strings are distinguished from character strings in two
2009-08-04 18:08:37 +02:00
ways. First, binary strings specifically allow storing
2003-03-13 02:30:29 +01:00
octets of value zero and other <quote>non-printable</quote>
2018-09-22 01:55:07 +02:00
octets (usually, octets outside the decimal range 32 to 126).
2005-01-08 06:19:18 +01:00
Character strings disallow zero octets, and also disallow any
other octet values and sequences of octet values that are invalid
according to the database's selected character set encoding.
2003-11-30 21:55:09 +01:00
Second, operations on binary strings process the actual bytes,
2005-01-08 06:19:18 +01:00
whereas the processing of character strings depends on locale settings.
In short, binary strings are appropriate for storing data that the
2017-10-09 03:44:17 +02:00
programmer thinks of as <quote>raw bytes</quote>, whereas character
2005-01-08 06:19:18 +01:00
strings are appropriate for storing text.
2001-11-20 16:42:44 +01:00
</para>
2001-09-09 19:21:59 +02:00
<para>
2018-09-22 01:55:07 +02:00
The <type>bytea</type> type supports two
formats for input and output: <quote>hex</quote> format
and <productname>PostgreSQL</productname>'s historical
<quote>escape</quote> format. Both
2009-08-04 18:08:37 +02:00
of these are always accepted on input. The output format depends
2017-11-23 15:39:47 +01:00
on the configuration parameter <xref linkend="guc-bytea-output"/>;
2009-08-04 18:08:37 +02:00
the default is hex. (Note that the hex format was introduced in
2010-02-17 05:19:41 +01:00
<productname>PostgreSQL</productname> 9.0; earlier versions and some
2009-08-04 18:08:37 +02:00
tools don't understand it.)
</para>
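<para>
As an illustrative sketch, the same value can be displayed in both output
formats by changing <varname>bytea_output</varname> for the session. The
example assumes that <varname>standard_conforming_strings</varname> is
on, so that the backslash in the literal needs no doubling:
<programlisting>
SET bytea_output = 'escape';
SELECT '\xDEADBEEF'::bytea;   -- expected display: \336\255\276\357

SET bytea_output = 'hex';
SELECT '\xDEADBEEF'::bytea;   -- expected display: \xdeadbeef

RESET bytea_output;
</programlisting>
The input literal is identical in both statements; only the output
representation changes.
</para>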
<para>
The <acronym>SQL</acronym> standard defines a different binary
string type, called <type>BLOB</type> or <type>BINARY LARGE
OBJECT</type>. The input format is different from
<type>bytea</type>, but the provided functions and operators are
mostly the same.
</para>
2023-01-09 21:08:24 +01:00
<sect2 id="datatype-binary-bytea-hex-format">
2017-10-09 03:44:17 +02:00
<title><type>bytea</type> Hex Format</title>
2009-08-04 18:08:37 +02:00
<para>
2017-10-09 03:44:17 +02:00
The <quote>hex</quote> format encodes binary data as 2 hexadecimal digits
2009-08-04 18:08:37 +02:00
per byte, most significant nibble first. The entire string is
preceded by the sequence <literal>\x</literal> (to distinguish it
from the escape format). In some contexts, the initial backslash might
need to be escaped by doubling it
(see <xref linkend="sql-syntax-strings"/>).
For input, the hexadecimal digits can
2009-08-04 18:08:37 +02:00
be either upper or lower case, and whitespace is permitted between
digit pairs (but not within a digit pair nor in the starting
<literal>\x</literal> sequence).
The hex format is compatible with a wide
range of external applications and protocols, and it tends to be
faster to convert than the escape format, so its use is preferred.
</para>
<para>
Example:
<programlisting>
SET bytea_output = 'hex';

SELECT '\xDEADBEEF'::bytea;
   bytea
------------
 \xdeadbeef
</programlisting>
</para>
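<para>
The input rules for case and whitespace can be checked with a comparison
such as the following sketch, which again assumes that
<varname>standard_conforming_strings</varname> is on:
<programlisting>
-- Mixed case and spaces between digit pairs are accepted on input.
SELECT '\xDE AD be EF'::bytea = '\xdeadbeef'::bytea AS same_value;
</programlisting>
The comparison should yield true, since both literals denote the same
four bytes.
</para>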
</sect2>
2023-01-09 21:08:24 +01:00
<sect2 id="datatype-binary-bytea-escape-format">
2017-10-09 03:44:17 +02:00
<title><type>bytea</type> Escape Format</title>
2009-08-04 18:08:37 +02:00
<para>
The <quote>escape</quote> format is the traditional
<productname>PostgreSQL</productname> format for the <type>bytea</type>
type. It
takes the approach of representing a binary string as a sequence
of ASCII characters, while converting those bytes that cannot be
represented as an ASCII character into special escape sequences.
If, from the point of view of the application, representing bytes
as characters makes sense, then this representation can be
2010-07-27 21:01:16 +02:00
convenient. But in practice it is usually confusing because it
blurs the distinction between binary strings and character
strings, and the particular escape mechanism that was chosen is
somewhat unwieldy. Therefore, this format should probably be avoided
for most new applications.
</para>
<para>
When entering <type>bytea</type> values in escape format,
octets of certain
values <emphasis>must</emphasis> be escaped, while all octet
values <emphasis>can</emphasis> be escaped. In
2009-04-27 18:27:36 +02:00
general, to escape an octet, convert it into its three-digit
2019-02-08 18:49:36 +01:00
octal value and precede it by a backslash.
2018-09-22 01:55:07 +02:00
Backslash itself (octet decimal value 92) can alternatively be represented by
2009-08-04 18:08:37 +02:00
double backslashes.
2017-11-23 15:39:47 +01:00
<xref linkend="datatype-binary-sqlesc"/>
2007-11-07 13:24:24 +01:00
shows the characters that must be escaped, and gives the alternative
2007-01-30 23:29:23 +01:00
escape sequences where applicable.
2001-09-09 19:21:59 +02:00
</para>
2001-11-20 16:42:44 +01:00
<table id="datatype-binary-sqlesc">
2017-10-09 03:44:17 +02:00
<title><type>bytea</type> Literal Escaped Octets</title>
2001-11-20 16:42:44 +01:00
<tgroup cols="5">
2020-05-06 18:23:43 +02:00
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="1*"/>
<colspec colname="col4" colwidth="1.25*"/>
<colspec colname="col5" colwidth="1*"/>
<thead>
<row>
<entry>Decimal Octet Value</entry>
<entry>Description</entry>
<entry>Escaped Input Representation</entry>
<entry>Example</entry>
<entry>Hex Representation</entry>
</row>
</thead>
<tbody>
<row>
<entry>0</entry>
<entry>zero octet</entry>
<entry><literal>'\000'</literal></entry>
<entry><literal>'\000'::bytea</literal></entry>
<entry><literal>\x00</literal></entry>
</row>
<row>
<entry>39</entry>
<entry>single quote</entry>
<entry><literal>''''</literal> or <literal>'\047'</literal></entry>
<entry><literal>''''::bytea</literal></entry>
<entry><literal>\x27</literal></entry>
</row>
<row>
<entry>92</entry>
<entry>backslash</entry>
<entry><literal>'\\'</literal> or <literal>'\134'</literal></entry>
<entry><literal>'\\'::bytea</literal></entry>
<entry><literal>\x5c</literal></entry>
</row>
<row>
<entry>0 to 31 and 127 to 255</entry>
<entry><quote>non-printable</quote> octets</entry>
<entry><literal>'\<replaceable>xxx</replaceable>'</literal> (octal value)</entry>
<entry><literal>'\001'::bytea</literal></entry>
<entry><literal>\x01</literal></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The requirement to escape <emphasis>non-printable</emphasis> octets
varies depending on locale settings. In some instances they can be
left unescaped.
</para>
<para>
The reason that single quotes must be doubled, as shown
in <xref linkend="datatype-binary-sqlesc"/>, is that this
is true for any string literal in an SQL command. The generic
string-literal parser consumes the outermost single quotes
and reduces any pair of single quotes to one data character.
What the <type>bytea</type> input function sees is just one
single quote, which it treats as a plain data character.
However, the <type>bytea</type> input function treats
backslashes as special, and the other behaviors shown in
<xref linkend="datatype-binary-sqlesc"/> are implemented by
that function.
</para>
<para>
In some contexts, backslashes must be doubled compared to what is
shown above, because the generic string-literal parser will also
reduce pairs of backslashes to one data character;
see <xref linkend="sql-syntax-strings"/>.
</para>
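<para>
As a brief illustration of this doubling rule (a sketch, assuming the
default <literal>hex</literal> setting of <varname>bytea_output</varname>
and <varname>standard_conforming_strings</varname> turned on): an ordinary
string constant passes backslashes through to the <type>bytea</type> input
function unchanged, whereas an escape string constant
(<literal>E'...'</literal>) needs each backslash written twice:
<programlisting>
-- Ordinary string constant: the bytea input function sees \000
SELECT '\000'::bytea;      -- \x00

-- Escape string constant: the parser reduces \\ to \ first
SELECT E'\\000'::bytea;    -- \x00

-- A single backslash octet, written both ways
SELECT '\\'::bytea;        -- \x5c
SELECT E'\\\\'::bytea;     -- \x5c
</programlisting>
</para>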
<para>
By default, <type>bytea</type> octets are output in <literal>hex</literal>
format. If you change <xref linkend="guc-bytea-output"/>
to <literal>escape</literal>,
<quote>non-printable</quote> octets are converted to their
equivalent three-digit octal value and preceded by one backslash.
Most <quote>printable</quote> octets are output by their standard
representation in the client character set, e.g.:
<programlisting>
SET bytea_output = 'escape';
SELECT 'abc \153\154\155 \052\251\124'::bytea;
bytea
----------------
abc klm *\251T
</programlisting>
The octet with decimal value 92 (backslash) is doubled in the output.
Details are in <xref linkend="datatype-binary-resesc"/>.
</para>
<table id="datatype-binary-resesc">
<title><type>bytea</type> Output Escaped Octets</title>
<tgroup cols="5">
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="1*"/>
<colspec colname="col4" colwidth="1.25*"/>
<colspec colname="col5" colwidth="1*"/>
<thead>
<row>
<entry>Decimal Octet Value</entry>
<entry>Description</entry>
<entry>Escaped Output Representation</entry>
<entry>Example</entry>
<entry>Output Result</entry>
</row>
</thead>
<tbody>
<row>
<entry>92</entry>
<entry>backslash</entry>
<entry><literal>\\</literal></entry>
<entry><literal>'\134'::bytea</literal></entry>
<entry><literal>\\</literal></entry>
</row>
<row>
<entry>0 to 31 and 127 to 255</entry>
<entry><quote>non-printable</quote> octets</entry>
<entry><literal>\<replaceable>xxx</replaceable></literal> (octal value)</entry>
<entry><literal>'\001'::bytea</literal></entry>
<entry><literal>\001</literal></entry>
</row>
<row>
<entry>32 to 126</entry>
<entry><quote>printable</quote> octets</entry>
<entry>client character set representation</entry>
<entry><literal>'\176'::bytea</literal></entry>
<entry><literal>~</literal></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Depending on the front end to <productname>PostgreSQL</productname> you use,
you might have additional work to do in terms of escaping and
unescaping <type>bytea</type> strings. For example, you might also
have to escape line feeds and carriage returns if your interface
automatically translates these.
</para>
</sect2>
</sect1>
<sect1 id="datatype-datetime">
<title>Date/Time Types</title>
<indexterm zone="datatype-datetime">
<primary>date</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>time</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>time without time zone</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>time with time zone</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>timestamp</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>timestamptz</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>timestamp with time zone</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>timestamp without time zone</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>interval</primary>
</indexterm>
<indexterm zone="datatype-datetime">
<primary>time span</primary>
</indexterm>
<para>
<productname>PostgreSQL</productname> supports the full set of
<acronym>SQL</acronym> date and time types, shown in <xref
linkend="datatype-datetime-table"/>. The operations available
on these data types are described in
<xref linkend="functions-datetime"/>.
Dates are counted according to the Gregorian calendar, even in
years before that calendar was introduced (see <xref
linkend="datetime-units-history"/> for more information).
</para>
<table id="datatype-datetime-table">
<title>Date/Time Types</title>
<tgroup cols="6">
<thead>
<row>
<entry>Name</entry>
<entry>Storage Size</entry>
<entry>Description</entry>
<entry>Low Value</entry>
<entry>High Value</entry>
<entry>Resolution</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>timestamp [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry>8 bytes</entry>
<entry>both date and time (no time zone)</entry>
<entry>4713 BC</entry>
<entry>294276 AD</entry>
<entry>1 microsecond</entry>
</row>
<row>
<entry><type>timestamp [ (<replaceable>p</replaceable>) ] with time zone</type></entry>
<entry>8 bytes</entry>
<entry>both date and time, with time zone</entry>
<entry>4713 BC</entry>
<entry>294276 AD</entry>
<entry>1 microsecond</entry>
</row>
<row>
<entry><type>date</type></entry>
<entry>4 bytes</entry>
<entry>date (no time of day)</entry>
<entry>4713 BC</entry>
<entry>5874897 AD</entry>
<entry>1 day</entry>
</row>
<row>
<entry><type>time [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry>8 bytes</entry>
<entry>time of day (no date)</entry>
<entry>00:00:00</entry>
<entry>24:00:00</entry>
<entry>1 microsecond</entry>
</row>
<row>
<entry><type>time [ (<replaceable>p</replaceable>) ] with time zone</type></entry>
<entry>12 bytes</entry>
<entry>time of day (no date), with time zone</entry>
<!-- see MAX_TZDISP_HOUR in datatype/timestamp.h -->
<entry>00:00:00+1559</entry>
<entry>24:00:00-1559</entry>
<entry>1 microsecond</entry>
</row>
<row>
<entry><type>interval [ <replaceable>fields</replaceable> ] [ (<replaceable>p</replaceable>) ]</type></entry>
<entry>16 bytes</entry>
<entry>time interval</entry>
<entry>-178000000 years</entry>
<entry>178000000 years</entry>
<entry>1 microsecond</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>
The SQL standard requires that writing just <type>timestamp</type>
be equivalent to <type>timestamp without time
zone</type>, and <productname>PostgreSQL</productname> honors that
behavior. <type>timestamptz</type> is accepted as an
abbreviation for <type>timestamp with time zone</type>; this is a
<productname>PostgreSQL</productname> extension.
</para>
</note>
<para>
<type>time</type>, <type>timestamp</type>, and
<type>interval</type> accept an optional precision value
<replaceable>p</replaceable> which specifies the number of
fractional digits retained in the seconds field. By default, there
is no explicit bound on precision. The allowed range of
<replaceable>p</replaceable> is from 0 to 6.
</para>
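<para>
For example, the following casts (a sketch, assuming the default
<literal>ISO</literal> output style) suggest how the precision is applied:
a value is rounded, not truncated, to the requested number of fractional
digits, and is left as entered when no precision is given:
<programlisting>
SELECT '2004-10-19 10:23:54.789'::timestamp(2);  -- 2004-10-19 10:23:54.79
SELECT '04:05:06.789'::time(0);                  -- 04:05:07
SELECT '04:05:06.789'::time;                     -- 04:05:06.789
</programlisting>
</para>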
<para>
The <type>interval</type> type has an additional option, which is
to restrict the set of stored fields by writing one of these phrases:
<literallayout class="monospaced">
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND
</literallayout>
Note that if both <replaceable>fields</replaceable> and
<replaceable>p</replaceable> are specified, the
<replaceable>fields</replaceable> must include <literal>SECOND</literal>,
since the precision applies only to the seconds.
</para>
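<para>
As an illustration, a <replaceable>fields</replaceable> restriction can be
applied with a cast. In the sketch below, the seconds are dropped because
<literal>DAY TO MINUTE</literal> stores nothing finer than minutes (the
result is shown for the default <literal>postgres</literal> interval output
style):
<programlisting>
SELECT CAST('1 day 2 hours 3 minutes 4 seconds' AS interval DAY TO MINUTE);
-- 1 day 02:03:00

-- fields and precision combined; the fields must include SECOND
SELECT CAST('1 day 2 hours 3 minutes 4.567 seconds' AS interval DAY TO SECOND(2));
</programlisting>
</para>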
<para>
The type <type>time with time zone</type> is defined by the SQL
standard, but the definition has properties that make its
usefulness questionable. In most cases, a combination of
<type>date</type>, <type>time</type>, <type>timestamp without time
zone</type>, and <type>timestamp with time zone</type> should
provide the complete range of date/time functionality required by
any application.
</para>
<sect2 id="datatype-datetime-input">
<title>Date/Time Input</title>
<para>
Date and time input is accepted in almost any reasonable format, including
ISO 8601, <acronym>SQL</acronym>-compatible,
traditional <productname>POSTGRES</productname>, and others.
For some formats, ordering of day, month, and year in date input is
ambiguous and there is support for specifying the expected
ordering of these fields. Set the <xref linkend="guc-datestyle"/> parameter
to <literal>MDY</literal> to select month-day-year interpretation,
<literal>DMY</literal> to select day-month-year interpretation, or
<literal>YMD</literal> to select year-month-day interpretation.
</para>
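<para>
A short sketch of how the field-order setting changes the reading of an
ambiguous date (the <literal>ISO</literal> part of the setting only selects
the output style, and is an assumption of this example):
<programlisting>
SET datestyle = 'ISO, MDY';
SELECT '1/8/1999'::date;    -- 1999-01-08 (January 8)

SET datestyle = 'ISO, DMY';
SELECT '1/8/1999'::date;    -- 1999-08-01 (August 1)
</programlisting>
</para>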
<para>
<productname>PostgreSQL</productname> is more flexible in
handling date/time input than the
<acronym>SQL</acronym> standard requires.
See <xref linkend="datetime-appendix"/>
for the exact parsing rules of date/time input and for the
recognized text fields including months, days of the week, and
time zones.
</para>
<para>
Remember that any date or time literal input needs to be enclosed
in single quotes, like text strings. Refer to
<xref linkend="sql-syntax-constants-generic"/> for more
information.
<acronym>SQL</acronym> requires the following syntax:
<synopsis>
<replaceable>type</replaceable> [ (<replaceable>p</replaceable>) ] '<replaceable>value</replaceable>'
</synopsis>
where <replaceable>p</replaceable> is an optional precision
specification giving the number of
fractional digits in the seconds field. Precision can be
specified for <type>time</type>, <type>timestamp</type>, and
<type>interval</type> types, and can range from 0 to 6.
If no precision is specified in a constant specification,
it defaults to the precision of the literal value (but not
more than 6 digits).
</para>
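<para>
The typed-constant syntax can be sketched as follows (results assume the
default <literal>ISO</literal> date style and <literal>postgres</literal>
interval style):
<programlisting>
SELECT DATE '1999-01-08';                     -- 1999-01-08
SELECT TIME '04:05:06.789';                   -- 04:05:06.789
SELECT TIMESTAMP(0) '2004-10-19 10:23:54.6';  -- 2004-10-19 10:23:55
SELECT INTERVAL '1 day 02:03:04';             -- 1 day 02:03:04
</programlisting>
In the third query the explicit precision of 0 rounds the seconds to a
whole number; in the second, no precision is given, so all three
fractional digits of the literal are kept.
</para>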
<sect3 id="datatype-datetime-input-dates">
<title>Dates</title>
<indexterm>
<primary>date</primary>
</indexterm>
<para>
<xref linkend="datatype-datetime-date-table"/> shows some possible
inputs for the <type>date</type> type.
</para>
<table id="datatype-datetime-date-table">
<title>Date Input</title>
<tgroup cols="2">
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="2*"/>
<thead>
<row>
<entry>Example</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>1999-01-08</entry>
<entry>ISO 8601; January 8 in any mode
(recommended format)</entry>
</row>
<row>
<entry>January 8, 1999</entry>
<entry>unambiguous in any <varname>datestyle</varname> input mode</entry>
</row>
<row>
<entry>1/8/1999</entry>
<entry>January 8 in <literal>MDY</literal> mode;
August 1 in <literal>DMY</literal> mode</entry>
</row>
<row>
<entry>1/18/1999</entry>
<entry>January 18 in <literal>MDY</literal> mode;
rejected in other modes</entry>
</row>
<row>
<entry>01/02/03</entry>
<entry>January 2, 2003 in <literal>MDY</literal> mode;
February 1, 2003 in <literal>DMY</literal> mode;
February 3, 2001 in <literal>YMD</literal> mode
</entry>
</row>
<row>
<entry>1999-Jan-08</entry>
<entry>January 8 in any mode</entry>
</row>
<row>
<entry>Jan-08-1999</entry>
<entry>January 8 in any mode</entry>
</row>
<row>
<entry>08-Jan-1999</entry>
<entry>January 8 in any mode</entry>
</row>
<row>
<entry>99-Jan-08</entry>
<entry>January 8 in <literal>YMD</literal> mode, else error</entry>
</row>
<row>
<entry>08-Jan-99</entry>
<entry>January 8, except error in <literal>YMD</literal> mode</entry>
</row>
<row>
<entry>Jan-08-99</entry>
<entry>January 8, except error in <literal>YMD</literal> mode</entry>
</row>
<row>
<entry>19990108</entry>
<entry>ISO 8601; January 8, 1999 in any mode</entry>
</row>
<row>
<entry>990108</entry>
<entry>ISO 8601; January 8, 1999 in any mode</entry>
</row>
<row>
<entry>1999.008</entry>
<entry>year and day of year</entry>
</row>
<row>
<entry>J2451187</entry>
<entry>Julian date</entry>
</row>
<row>
<entry>January 8, 99 BC</entry>
<entry>year 99 BC</entry>
</row>
</tbody>
</tgroup>
</table>
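<para>
Several of the inputs listed in the table can be tried directly. The
following sketch assumes <literal>SET datestyle = 'ISO, MDY'</literal>;
each of the first four queries yields the same date:
<programlisting>
SET datestyle = 'ISO, MDY';
SELECT '1999-01-08'::date;        -- 1999-01-08
SELECT 'January 8, 1999'::date;   -- 1999-01-08
SELECT '1999.008'::date;          -- 1999-01-08 (day 8 of 1999)
SELECT 'J2451187'::date;          -- 1999-01-08 (Julian date)
SELECT 'January 8, 99 BC'::date;  -- 0099-01-08 BC
</programlisting>
</para>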
</sect3>
<sect3 id="datatype-datetime-input-times">
<title>Times</title>
<indexterm>
<primary>time</primary>
</indexterm>
<indexterm>
<primary>time without time zone</primary>
</indexterm>
<indexterm>
<primary>time with time zone</primary>
</indexterm>
<para>
The time-of-day types are <type>time [
(<replaceable>p</replaceable>) ] without time zone</type> and
<type>time [ (<replaceable>p</replaceable>) ] with time
zone</type>. <type>time</type> alone is equivalent to
<type>time without time zone</type>.
</para>
<para>
Valid input for these types consists of a time of day followed
by an optional time zone. (See <xref
linkend="datatype-datetime-time-table"/>
and <xref linkend="datatype-timezone-table"/>.) If a time zone is
specified in the input for <type>time without time zone</type>,
it is silently ignored. You can also specify a date but it will
be ignored, except when you use a time zone name that involves a
daylight-savings rule, such as
<literal>America/New_York</literal>. In this case specifying the date
is required in order to determine whether standard or daylight-savings
time applies. The appropriate time zone offset is recorded in the
<type>time with time zone</type> value and is output as stored;
it is not adjusted to the active time zone.
</para>
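<para>
These rules can be sketched as follows (assuming the default set of time
zone abbreviations, in which <literal>PST</literal> means UTC-8):
<programlisting>
-- A zone given for time without time zone is silently ignored
SELECT TIME '04:05:06 PST';                 -- 04:05:06

-- time with time zone records the offset and outputs it as stored
SELECT TIME WITH TIME ZONE '04:05:06 PST';  -- 04:05:06-08

-- A zone name with a daylight-savings rule needs a date to pick the offset;
-- on 2003-04-12 New York was on daylight-savings time (UTC-4)
SELECT TIME WITH TIME ZONE '2003-04-12 04:05:06 America/New_York';
-- 04:05:06-04
</programlisting>
</para>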
<table id="datatype-datetime-time-table">
<title>Time Input</title>
<tgroup cols="2">
<colspec colname="col1" colwidth="3*"/>
<colspec colname="col2" colwidth="2*"/>
<thead>
<row>
<entry>Example</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>04:05:06.789</literal></entry>
<entry>ISO 8601</entry>
</row>
<row>
<entry><literal>04:05:06</literal></entry>
<entry>ISO 8601</entry>
</row>
<row>
<entry><literal>04:05</literal></entry>
<entry>ISO 8601</entry>
</row>
<row>
<entry><literal>040506</literal></entry>
<entry>ISO 8601</entry>
</row>
<row>
<entry><literal>04:05 AM</literal></entry>
<entry>same as 04:05; AM does not affect value</entry>
</row>
<row>
<entry><literal>04:05 PM</literal></entry>
<entry>same as 16:05; input hour must be &lt;= 12</entry>
</row>
<row>
<entry><literal>04:05:06.789-8</literal></entry>
<entry>ISO 8601, with time zone as UTC offset</entry>
</row>
<row>
<entry><literal>04:05:06-08:00</literal></entry>
<entry>ISO 8601, with time zone as UTC offset</entry>
</row>
<row>
<entry><literal>04:05-08:00</literal></entry>
<entry>ISO 8601, with time zone as UTC offset</entry>
</row>
<row>
<entry><literal>040506-08</literal></entry>
<entry>ISO 8601, with time zone as UTC offset</entry>
</row>
<row>
<entry><literal>040506+0730</literal></entry>
<entry>ISO 8601, with fractional-hour time zone as UTC offset</entry>
</row>
<row>
<entry><literal>040506+07:30:00</literal></entry>
<entry>UTC offset specified to seconds (not allowed in ISO 8601)</entry>
</row>
<row>
<entry><literal>04:05:06 PST</literal></entry>
<entry>time zone specified by abbreviation</entry>
</row>
<row>
<entry><literal>2003-04-12 04:05:06 America/New_York</literal></entry>
<entry>time zone specified by full name</entry>
</row>
</tbody>
</tgroup>
</table>
<table tocentry="1" id="datatype-timezone-table">
<title>Time Zone Input</title>
<tgroup cols="2">
<thead>
<row>
<entry>Example</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>PST</literal></entry>
<entry>Abbreviation (for Pacific Standard Time)</entry>
</row>
<row>
<entry><literal>America/New_York</literal></entry>
<entry>Full time zone name</entry>
</row>
<row>
<entry><literal>PST8PDT</literal></entry>
<entry>POSIX-style time zone specification</entry>
</row>
<row>
<entry><literal>-8:00:00</literal></entry>
<entry>UTC offset for PST</entry>
</row>
<row>
<entry><literal>-8:00</literal></entry>
<entry>UTC offset for PST (ISO 8601 extended format)</entry>
</row>
<row>
<entry><literal>-800</literal></entry>
<entry>UTC offset for PST (ISO 8601 basic format)</entry>
</row>
<row>
<entry><literal>-8</literal></entry>
<entry>UTC offset for PST (ISO 8601 basic format)</entry>
</row>
<row>
<entry><literal>zulu</literal></entry>
<entry>Military abbreviation for UTC</entry>
</row>
<row>
<entry><literal>z</literal></entry>
<entry>Short form of <literal>zulu</literal> (also in ISO 8601)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Refer to <xref linkend="datatype-timezones"/> for more information on how
to specify time zones.
</para>
</sect3>
<sect3 id="datatype-datetime-input-time-stamps">
<title>Time Stamps</title>
<indexterm>
<primary>timestamp</primary>
</indexterm>
<indexterm>
<primary>timestamp with time zone</primary>
</indexterm>
<indexterm>
<primary>timestamp without time zone</primary>
</indexterm>
<para>
Valid input for the time stamp types consists of the concatenation
of a date and a time, followed by an optional time zone,
followed by an optional <literal>AD</literal> or <literal>BC</literal>.
(Alternatively, <literal>AD</literal>/<literal>BC</literal> can appear
before the time zone, but this is not the preferred ordering.)
Thus:
<programlisting>
1999-01-08 04:05:06
</programlisting>
and:
<programlisting>
1999-01-08 04:05:06 -8:00
</programlisting>
are valid values, which follow the <acronym>ISO</acronym> 8601
standard. In addition, the common format:
<programlisting>
January 8 04:05:06 1999 PST
</programlisting>
is supported.
</para>
<para>
The <acronym>SQL</acronym> standard differentiates
<type>timestamp without time zone</type>
and <type>timestamp with time zone</type> literals by the presence of a
<quote>+</quote> or <quote>-</quote> symbol and time zone offset after
the time. Hence, according to the standard,
<programlisting>
TIMESTAMP '2004-10-19 10:23:54'
</programlisting>
is a <type>timestamp without time zone</type>, while
<programlisting>
TIMESTAMP '2004-10-19 10:23:54+02'
</programlisting>
is a <type>timestamp with time zone</type>.
<productname>PostgreSQL</productname> never examines the content of a
literal string before determining its type, and therefore will treat
both of the above as <type>timestamp without time zone</type>. To
ensure that a literal is treated as <type>timestamp with time
zone</type>, give it the correct explicit type:
<programlisting>
TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
</programlisting>
In a literal that has been determined to be <type>timestamp without time
zone</type>, <productname>PostgreSQL</productname> will silently ignore
any time zone indication.
That is, the resulting value is derived from the date/time
fields in the input value, and is not adjusted for time zone.
</para>
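<para>
The difference can be observed directly. In this sketch the session time
zone is set to <literal>UTC</literal> and the default <literal>ISO</literal>
output style is assumed, so that the results are reproducible:
<programlisting>
SET timezone = 'UTC';

-- The +02 is silently ignored: the type is timestamp without time zone
SELECT TIMESTAMP '2004-10-19 10:23:54+02';
-- 2004-10-19 10:23:54

-- With the explicit type, the offset is applied
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';
-- 2004-10-19 08:23:54+00
</programlisting>
</para>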
<para>
For <type>timestamp with time zone</type>, the internally stored
value is always in UTC (Coordinated Universal
Time, traditionally known as Greenwich Mean Time,
<acronym>GMT</acronym>). An input value that has an explicit
time zone specified is converted to UTC using the appropriate offset
for that time zone. If no time zone is stated in the input string,
then it is assumed to be in the time zone indicated by the system's
<xref linkend="guc-timezone"/> parameter, and is converted to UTC using the
offset for the <varname>timezone</varname> zone.
</para>
<para>
When a <type>timestamp with time
zone</type> value is output, it is always converted from UTC to the
current <varname>timezone</varname> zone, and displayed as local time in that
zone. To see the time in another time zone, either change
<varname>timezone</varname> or use the <literal>AT TIME ZONE</literal> construct
(see <xref linkend="functions-datetime-zoneconvert"/>).
</para>
<para>
Conversions between <type>timestamp without time zone</type> and
<type>timestamp with time zone</type> normally assume that the
<type>timestamp without time zone</type> value should be taken or given
as <varname>timezone</varname> local time. A different time zone can
be specified for the conversion using <literal>AT TIME ZONE</literal>.
</para>
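<para>
A sketch of both conversion directions with <literal>AT TIME ZONE</literal>,
again assuming the session time zone is <literal>UTC</literal>; on the date
used here, <literal>America/New_York</literal> is four hours behind UTC:
<programlisting>
SET timezone = 'UTC';

-- timestamp with time zone to timestamp without time zone:
-- shows the instant as New York local time
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
       AT TIME ZONE 'America/New_York';
-- 2004-10-19 04:23:54

-- timestamp without time zone to timestamp with time zone:
-- takes the value as New York local time
SELECT TIMESTAMP '2004-10-19 10:23:54' AT TIME ZONE 'America/New_York';
-- 2004-10-19 14:23:54+00
</programlisting>
</para>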
</sect3>
<sect3 id="datatype-datetime-special-values">
<title>Special Values</title>
<indexterm>
<primary>time</primary>
<secondary>constants</secondary>
</indexterm>
<indexterm>
<primary>date</primary>
<secondary>constants</secondary>
</indexterm>
<para>
<productname>PostgreSQL</productname> supports several
special date/time input values for convenience, as shown in <xref
2017-11-23 15:39:47 +01:00
linkend="datatype-datetime-special-table"/>. The values
2002-11-22 00:31:20 +01:00
<literal>infinity</literal> and <literal>-infinity</literal>
are specially represented inside the system and will be displayed
2009-04-27 18:27:36 +02:00
unchanged; but the others are simply notational shorthands
2002-11-22 00:31:20 +01:00
that will be converted to ordinary date/time values when read.
2017-10-09 03:44:17 +02:00
(In particular, <literal>now</literal> and related strings are converted
2005-01-08 06:19:18 +01:00
to a specific time value as soon as they are read.)
2009-04-27 18:27:36 +02:00
All of these values need to be enclosed in single quotes when used
2005-01-08 06:19:18 +01:00
as constants in SQL commands.
2002-11-11 21:14:04 +01:00
</para>
<table id="datatype-datetime-special-table">
<title>Special Date/Time Inputs</title>
<tgroup cols="3">
<thead>
<row>
<entry>Input String</entry>
<entry>Valid Types</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>epoch</literal></entry>
<entry><type>date</type>, <type>timestamp</type></entry>
<entry>1970-01-01 00:00:00+00 (Unix system time zero)</entry>
</row>
<row>
<entry><literal>infinity</literal></entry>
<entry><type>date</type>, <type>timestamp</type>, <type>interval</type></entry>
<entry>later than all other time stamps</entry>
</row>
<row>
<entry><literal>-infinity</literal></entry>
<entry><type>date</type>, <type>timestamp</type>, <type>interval</type></entry>
<entry>earlier than all other time stamps</entry>
</row>
<row>
<entry><literal>now</literal></entry>
<entry><type>date</type>, <type>time</type>, <type>timestamp</type></entry>
<entry>current transaction's start time</entry>
</row>
<row>
<entry><literal>today</literal></entry>
<entry><type>date</type>, <type>timestamp</type></entry>
<entry>midnight (<literal>00:00</literal>) today</entry>
</row>
<row>
<entry><literal>tomorrow</literal></entry>
<entry><type>date</type>, <type>timestamp</type></entry>
<entry>midnight (<literal>00:00</literal>) tomorrow</entry>
</row>
<row>
<entry><literal>yesterday</literal></entry>
<entry><type>date</type>, <type>timestamp</type></entry>
<entry>midnight (<literal>00:00</literal>) yesterday</entry>
</row>
<row>
<entry><literal>allballs</literal></entry>
<entry><type>time</type></entry>
<entry>00:00:00.00 UTC</entry>
</row>
</tbody>
</tgroup>
</table>
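<para>
A few of these inputs in use, as a minimal sketch. The results shown
assume the default <literal>ISO</literal> output style.
<programlisting>
-- Notational shorthands are converted to ordinary values when read.
SELECT 'epoch'::timestamp;
<lineannotation>Result: </lineannotation><computeroutput>1970-01-01 00:00:00</computeroutput>
SELECT 'allballs'::time;
<lineannotation>Result: </lineannotation><computeroutput>00:00:00</computeroutput>

-- infinity is stored specially and is displayed unchanged.
SELECT 'infinity'::date;
<lineannotation>Result: </lineannotation><computeroutput>infinity</computeroutput>

-- It compares as later than any ordinary value.
SELECT 'infinity'::timestamp > 'now'::timestamp;
<lineannotation>Result: </lineannotation><computeroutput>t</computeroutput>
</programlisting>
</para>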
<para>
The following <acronym>SQL</acronym>-compatible functions can also
be used to obtain the current time value for the corresponding data
type:
<literal>CURRENT_DATE</literal>, <literal>CURRENT_TIME</literal>,
<literal>CURRENT_TIMESTAMP</literal>, <literal>LOCALTIME</literal>,
<literal>LOCALTIMESTAMP</literal>. (See <xref
linkend="functions-datetime-current"/>.) Note that these are
SQL functions and are <emphasis>not</emphasis> recognized in data input strings.
</para>
<caution>
<para>
While the input strings <literal>now</literal>,
<literal>today</literal>, <literal>tomorrow</literal>,
and <literal>yesterday</literal> are fine to use in interactive SQL
commands, they can have surprising behavior when the command is
saved to be executed later, for example in prepared statements,
views, and function definitions. The string can be converted to a
specific time value that continues to be used long after it becomes
stale. Use one of the SQL functions instead in such contexts.
For example, <literal>CURRENT_DATE + 1</literal> is safer than
<literal>'tomorrow'::date</literal>.
</para>
</caution>
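<para>
The hazard can be sketched with two views. The view names
<literal>due_tomorrow_stale</literal> and <literal>due_tomorrow</literal>
are hypothetical and used only for this illustration.
<programlisting>
-- The string is converted to a date when the view is defined, so the
-- view can keep returning that same date on later days.
CREATE VIEW due_tomorrow_stale AS
    SELECT 'tomorrow'::date AS due;

-- The SQL function is evaluated each time the view is queried.
CREATE VIEW due_tomorrow AS
    SELECT CURRENT_DATE + 1 AS due;
</programlisting>
On the day they are created, both views return the same date. They can
differ on every later day.
</para>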
</sect3>
</sect2>
<sect2 id="datatype-datetime-output">
<title>Date/Time Output</title>
<indexterm>
<primary>date</primary>
<secondary>output format</secondary>
<seealso>formatting</seealso>
</indexterm>
<indexterm>
<primary>time</primary>
<secondary>output format</secondary>
<seealso>formatting</seealso>
</indexterm>
<para>
The output format of the date/time types can be set to one of the four
styles ISO 8601,
<acronym>SQL</acronym> (Ingres), traditional <productname>POSTGRES</productname>
(Unix <application>date</application> format), or
German. The default
is the <acronym>ISO</acronym> format. (The
<acronym>SQL</acronym> standard requires the use of the ISO 8601
format. The name of the <quote>SQL</quote> output format is a
historical accident.) <xref
linkend="datatype-datetime-output-table"/> shows examples of each
output style. The output of the <type>date</type> and
<type>time</type> types is generally only the date or time part
in accordance with the given examples. However, the
<productname>POSTGRES</productname> style outputs date-only values in
<acronym>ISO</acronym> format.
</para>
<table id="datatype-datetime-output-table">
<title>Date/Time Output Styles</title>
<tgroup cols="3">
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="2*"/>
<thead>
<row>
<entry>Style Specification</entry>
<entry>Description</entry>
<entry>Example</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>ISO</literal></entry>
<entry>ISO 8601, SQL standard</entry>
<entry><literal>1997-12-17 07:37:16-08</literal></entry>
</row>
<row>
<entry><literal>SQL</literal></entry>
<entry>traditional style</entry>
<entry><literal>12/17/1997 07:37:16.00 PST</literal></entry>
</row>
<row>
<entry><literal>Postgres</literal></entry>
<entry>original style</entry>
<entry><literal>Wed Dec 17 07:37:16 1997 PST</literal></entry>
</row>
<row>
<entry><literal>German</literal></entry>
<entry>regional style</entry>
<entry><literal>17.12.1997 07:37:16.00 PST</literal></entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>
ISO 8601 specifies the use of uppercase letter <literal>T</literal> to separate
the date and time. <productname>PostgreSQL</productname> accepts that format on
input, but on output it uses a space rather than <literal>T</literal>, as shown
above. This is for readability and for consistency with
<ulink url="https://datatracker.ietf.org/doc/html/rfc3339">RFC 3339</ulink> as
well as some other database systems.
</para>
</note>
<para>
In the <acronym>SQL</acronym> and POSTGRES styles, day appears before
month if DMY field ordering has been specified, otherwise month appears
before day.
(See <xref linkend="datatype-datetime-input"/>
for how this setting also affects interpretation of input values.)
<xref linkend="datatype-datetime-output2-table"/> shows examples.
</para>
<table id="datatype-datetime-output2-table">
<title>Date Order Conventions</title>
<tgroup cols="3">
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="2*"/>
<thead>
<row>
<entry><varname>datestyle</varname> Setting</entry>
<entry>Input Ordering</entry>
<entry>Example Output</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>SQL, DMY</literal></entry>
<entry><replaceable>day</replaceable>/<replaceable>month</replaceable>/<replaceable>year</replaceable></entry>
<entry><literal>17/12/1997 15:37:16.00 CET</literal></entry>
</row>
<row>
<entry><literal>SQL, MDY</literal></entry>
<entry><replaceable>month</replaceable>/<replaceable>day</replaceable>/<replaceable>year</replaceable></entry>
<entry><literal>12/17/1997 07:37:16.00 PST</literal></entry>
</row>
<row>
<entry><literal>Postgres, DMY</literal></entry>
<entry><replaceable>day</replaceable>/<replaceable>month</replaceable>/<replaceable>year</replaceable></entry>
<entry><literal>Wed 17 Dec 07:37:16 1997 PST</literal></entry>
</row>
</tbody>
</tgroup>
</table>
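<para>
The field-order part of <varname>datestyle</varname> also decides how an
ambiguous input string is read. The following sketch uses the
<literal>ISO</literal> output style so that the interpretation is
visible in the result; the date literal is only an example.
<programlisting>
SET datestyle TO ISO, DMY;
SELECT DATE '1/2/2003';      -- read as day/month/year
<lineannotation>Result: </lineannotation><computeroutput>2003-02-01</computeroutput>

SET datestyle TO ISO, MDY;
SELECT DATE '1/2/2003';      -- read as month/day/year
<lineannotation>Result: </lineannotation><computeroutput>2003-01-02</computeroutput>
</programlisting>
</para>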
<para>
In the <acronym>ISO</acronym> style, the time zone is always shown as
a signed numeric offset from UTC, with positive sign used for zones
east of Greenwich. The offset will be shown
as <replaceable>hh</replaceable> (hours only) if it is an integral
number of hours, else
as <replaceable>hh</replaceable>:<replaceable>mm</replaceable> if it
is an integral number of minutes, else as
<replaceable>hh</replaceable>:<replaceable>mm</replaceable>:<replaceable>ss</replaceable>.
(The third case is not possible with any modern time zone standard,
but it can appear when working with timestamps that predate the
adoption of standardized time zones.)
In the other date styles, the time zone is shown as an alphabetic
abbreviation if one is in common use in the current zone. Otherwise
it appears as a signed numeric offset in ISO 8601 basic format
(<replaceable>hh</replaceable> or <replaceable>hhmm</replaceable>).
</para>
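<para>
The first two offset forms in the <acronym>ISO</acronym> style can be
seen by displaying the same instant in a whole-hour zone and in a zone
with a half-hour offset. The zone names below are chosen for
illustration only.
<programlisting>
SET datestyle TO ISO;

SET timezone = 'Europe/Paris';
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00:00+00';
<lineannotation>Result: </lineannotation><computeroutput>2014-06-04 14:00:00+02</computeroutput>

SET timezone = 'Asia/Kolkata';
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00:00+00';
<lineannotation>Result: </lineannotation><computeroutput>2014-06-04 17:30:00+05:30</computeroutput>
</programlisting>
</para>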
<para>
The date/time style can be selected by the user using the
<command>SET datestyle</command> command, the <xref
linkend="guc-datestyle"/> parameter in the
<filename>postgresql.conf</filename> configuration file, or the
<envar>PGDATESTYLE</envar> environment variable on the server or
client.
</para>
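<para>
For instance, a session might switch styles as sketched below. The
particular combinations are only examples, and the output each one
produces follows the tables above.
<programlisting>
-- Select an output style, optionally together with a field order.
SET datestyle TO ISO;
SET datestyle TO German;
SET datestyle TO SQL, DMY;

-- Inspect the current setting, then return to the configured default.
SHOW datestyle;
RESET datestyle;
</programlisting>
</para>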
<para>
The formatting function <function>to_char</function>
(see <xref linkend="functions-formatting"/>) is also available as
a more flexible way to format date/time output.
</para>
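<para>
As a short sketch, <function>to_char</function> can produce the ISO 8601
<literal>T</literal> separator that the built-in styles do not emit, or
any other layout. The template strings here are illustrative.
<programlisting>
-- Text in double quotes inside the template is copied literally.
SELECT to_char(TIMESTAMP '1997-12-17 07:37:16', 'YYYY-MM-DD"T"HH24:MI:SS');
<lineannotation>Result: </lineannotation><computeroutput>1997-12-17T07:37:16</computeroutput>

SELECT to_char(TIMESTAMP '1997-12-17 07:37:16', 'DD Mon YYYY, HH12:MI AM');
<lineannotation>Result: </lineannotation><computeroutput>17 Dec 1997, 07:37 AM</computeroutput>
</programlisting>
</para>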
</sect2>
<sect2 id="datatype-timezones">
<title>Time Zones</title>
<indexterm zone="datatype-timezones">
<primary>time zone</primary>
</indexterm>
<para>
Time zones, and time-zone conventions, are influenced by
political decisions, not just earth geometry. Time zones around the
world became somewhat standardized during the 1900s,
but continue to be prone to arbitrary changes, particularly with
respect to daylight-savings rules.
<productname>PostgreSQL</productname> relies on the widely used
IANA (Olson) time zone database for information about
historical time zone rules. For times in the future, the assumption
is that the latest known rules for a given time zone will
continue to be observed indefinitely far into the future.
</para>
<para>
<productname>PostgreSQL</productname> endeavors to be compatible with
the <acronym>SQL</acronym> standard definitions for typical usage.
However, the <acronym>SQL</acronym> standard has an odd mix of date and
time types and capabilities. Two obvious problems are:
<itemizedlist>
<listitem>
<para>
Although the <type>date</type> type
cannot have an associated time zone, the
<type>time</type> type can.
Time zones in the real world have little meaning unless
associated with a date as well as a time,
since the offset can vary through the year with daylight-saving
time boundaries.
</para>
</listitem>
<listitem>
<para>
The default time zone is specified as a constant numeric offset
from <acronym>UTC</acronym>. It is therefore impossible to adapt to
daylight-saving time when doing date/time arithmetic across
<acronym>DST</acronym> boundaries.
</para>
</listitem>
</itemizedlist>
</para>
<para>
To address these difficulties, we recommend using date/time types
that contain both date and time when using time zones. We
do <emphasis>not</emphasis> recommend using the type <type>time with
time zone</type> (though it is supported by
<productname>PostgreSQL</productname> for legacy applications and
for compliance with the <acronym>SQL</acronym> standard).
<productname>PostgreSQL</productname> assumes
your local time zone for any type containing only date or time.
</para>
<para>
All timezone-aware dates and times are stored internally in
<acronym>UTC</acronym>. They are converted to local time
in the zone specified by the <xref linkend="guc-timezone"/> configuration
parameter before being displayed to the client.
</para>
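<para>
Because only the UTC instant is stored, the same value is displayed
differently depending on the session's <varname>timezone</varname>
setting. A minimal sketch, assuming the <literal>ISO</literal> output
style and zone names picked for illustration:
<programlisting>
SET timezone = 'UTC';
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 America/New_York';
<lineannotation>Result: </lineannotation><computeroutput>2014-06-04 16:00:00+00</computeroutput>

SET timezone = 'Asia/Tokyo';
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 America/New_York';
<lineannotation>Result: </lineannotation><computeroutput>2014-06-05 01:00:00+09</computeroutput>
</programlisting>
</para>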
<para>
<productname>PostgreSQL</productname> allows you to specify time zones in
three different forms:
<itemizedlist>
<listitem>
<para>
A full time zone name, for example <literal>America/New_York</literal>.
The recognized time zone names are listed in the
<literal>pg_timezone_names</literal> view (see <xref
linkend="view-pg-timezone-names"/>).
<productname>PostgreSQL</productname> uses the widely-used IANA
time zone data for this purpose, so the same time zone
names are also recognized by other software.
</para>
</listitem>
<listitem>
<para>
A time zone abbreviation, for example <literal>PST</literal>. Such a
specification merely defines a particular offset from UTC, in
contrast to full time zone names which can imply a set of daylight
savings transition rules as well. The recognized abbreviations
are listed in the <literal>pg_timezone_abbrevs</literal> view (see <xref
linkend="view-pg-timezone-abbrevs"/>). You cannot set the
configuration parameters <xref linkend="guc-timezone"/> or
<xref linkend="guc-log-timezone"/> to a time
zone abbreviation, but you can use abbreviations in
date/time input values and with the <literal>AT TIME ZONE</literal>
operator.
</para>
</listitem>
<listitem>
<para>
In addition to the timezone names and abbreviations,
<productname>PostgreSQL</productname> will accept POSIX-style time zone
specifications, as described in
<xref linkend="datetime-posix-timezone-specs"/>. This option is not
normally preferable to using a named time zone, but it may be
necessary if no suitable IANA time zone entry is available.
</para>
</listitem>
</itemizedlist>
In short, this is the difference between abbreviations
and full names: abbreviations represent a specific offset from UTC,
whereas many of the full names imply a local daylight-savings time
rule, and so have two possible UTC offsets. As an example,
<literal>2014-06-04 12:00 America/New_York</literal> represents noon local
time in New York, which for this particular date was Eastern Daylight
2017-10-09 03:44:17 +02:00
Time (UTC-4). So <literal>2014-06-04 12:00 EDT</literal> specifies that
same time instant. But <literal>2014-06-04 12:00 EST</literal> specifies
Support timezone abbreviations that sometimes change.
Up to now, PG has assumed that any given timezone abbreviation (such as
"EDT") represents a constant GMT offset in the usage of any particular
region; we had a way to configure what that offset was, but not for it
to be changeable over time. But, as with most things horological, this
view of the world is too simplistic: there are numerous regions that have
at one time or another switched to a different GMT offset but kept using
the same timezone abbreviation. Almost the entire Russian Federation did
that a few years ago, and later this month they're going to do it again.
And there are similar examples all over the world.
To cope with this, invent the notion of a "dynamic timezone abbreviation",
which is one that is referenced to a particular underlying timezone
(as defined in the IANA timezone database) and means whatever it currently
means in that zone. For zones that use or have used daylight-savings time,
the standard and DST abbreviations continue to have the property that you
can specify standard or DST time and get that time offset whether or not
DST was theoretically in effect at the time. However, the abbreviations
mean what they meant at the time in question (or most recently before that
time) rather than being absolutely fixed.
The standard abbreviation-list files have been changed to use this behavior
for abbreviations that have actually varied in meaning since 1970. The
old simple-numeric definitions are kept for abbreviations that have not
changed, since they are a bit faster to resolve.
While this is clearly a new feature, it seems necessary to back-patch it
into all active branches, because otherwise use of Russian zone
abbreviations is going to become even more problematic than it already was.
This change supersedes the changes in commit 513d06ded et al to modify the
fixed meanings of the Russian abbreviations; since we've not shipped that
yet, this will avoid an undesirably incompatible (not to mention incorrect)
change in behavior for timestamps between 2011 and 2014.
This patch makes some cosmetic changes in ecpglib to keep its usage of
datetime lookup tables as similar as possible to the backend code, but
doesn't do anything about the increasingly obsolete set of timezone
abbreviation definitions that are hard-wired into ecpglib. Whatever we
do about that will likely not be appropriate material for back-patching.
Also, a potential free() of a garbage pointer after an out-of-memory
failure in ecpglib has been fixed.
This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that
caused it to produce unexpected results near a timezone transition, if
both the "before" and "after" states are marked as standard time. We'd
only ever thought about or tested transitions between standard and DST
time, but that's not what's happening when a zone simply redefines their
base GMT offset.
In passing, update the SGML documentation to refer to the Olson/zoneinfo/
zic timezone database as the "IANA" database, since it's now being
maintained under the auspices of IANA.
2014-10-16 21:22:10 +02:00
noon Eastern Standard Time (UTC-5), regardless of whether daylight
savings was nominally in effect on that date.
</para>
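<para>
As a minimal sketch of this distinction, the following queries compare the
three spellings used in the example. They assume the default timezone
abbreviation set is in use; the comments state the results implied by the
offsets given above rather than captured server output.
<programlisting>
-- The full zone name and the matching abbreviation denote the same instant.
SELECT '2014-06-04 12:00 America/New_York'::timestamptz
     = '2014-06-04 12:00 EDT'::timestamptz AS same_instant;   -- true

-- EST always means UTC-5, so noon EST falls one hour after noon EDT.
SELECT '2014-06-04 12:00 EST'::timestamptz
     - '2014-06-04 12:00 EDT'::timestamptz AS est_minus_edt;  -- 01:00:00
</programlisting>
</para>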
<para>
To complicate matters, some jurisdictions have used the same timezone
abbreviation to mean different UTC offsets at different times; for
example, in Moscow <literal>MSK</literal> has meant UTC+3 in some years and
UTC+4 in others. <productname>PostgreSQL</productname> interprets such
abbreviations according to whatever they meant (or had most recently
meant) on the specified date; but, as with the <literal>EST</literal> example
above, this is not necessarily the same as local civil time on that date.
</para>
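<para>
The date dependence can be observed with <literal>MSK</literal> itself. This
sketch assumes the default abbreviation set, in which <literal>MSK</literal>
is expected to follow the <literal>Europe/Moscow</literal> zone, and it
assumes that zone was at UTC+4 in mid-2014 and at UTC+3 in mid-2015:
<programlisting>
SELECT '2014-06-04 12:00 MSK'::timestamptz AT TIME ZONE 'UTC' AS utc_2014,
       '2015-06-04 12:00 MSK'::timestamptz AT TIME ZONE 'UTC' AS utc_2015;
-- Under the assumptions above, utc_2014 should show 08:00 (MSK read as
-- UTC+4) and utc_2015 should show 09:00 (MSK read as UTC+3).
</programlisting>
</para>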
<para>
In all cases, timezone names and abbreviations are recognized
case-insensitively. (This is a change from <productname>PostgreSQL</productname>
versions prior to 8.2, which were case-sensitive in some contexts but
not others.)
</para>
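<para>
A short sketch of case-insensitive recognition, again assuming the default
abbreviation set; both comparisons are expected to yield true:
<programlisting>
SELECT '2014-06-04 12:00 america/new_york'::timestamptz
     = '2014-06-04 12:00 AMERICA/NEW_YORK'::timestamptz AS names_match,
       '2014-06-04 12:00 edt'::timestamptz
     = '2014-06-04 12:00 EDT'::timestamptz AS abbrevs_match;
</programlisting>
</para>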
<para>
Neither timezone names nor abbreviations are hard-wired into the server;
they are obtained from configuration files stored under
<filename>.../share/timezone/</filename> and <filename>.../share/timezonesets/</filename>
of the installation directory
(see <xref linkend="datetime-config-files"/>).
</para>
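<para>
One way to inspect what the server has loaded from those files is to query
the system views <literal>pg_timezone_names</literal> and
<literal>pg_timezone_abbrevs</literal>. The zone name and abbreviations
filtered on here are only illustrative choices, and the rows returned depend
on the installation:
<programlisting>
-- Full zone names known to the server.
SELECT name, abbrev, utc_offset, is_dst
  FROM pg_timezone_names
 WHERE name = 'America/New_York';

-- Abbreviations currently recognized in datetime input.
SELECT abbrev, utc_offset, is_dst
  FROM pg_timezone_abbrevs
 WHERE abbrev IN ('EST', 'EDT', 'MSK');
</programlisting>
</para>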
<para>
The <xref linkend="guc-timezone"/> configuration parameter can
be set in the file <filename>postgresql.conf</filename>, or in any of the
other standard ways described in <xref linkend="runtime-config"/>.
There are also some special ways to set it:

<itemizedlist>
<listitem>
<para>
The <acronym>SQL</acronym> command <command>SET TIME ZONE</command>
sets the time zone for the session. This is an alternative spelling
of <command>SET TIMEZONE TO</command> with a more SQL-spec-compatible syntax.
</para>
</listitem>
<listitem>
<para>
The <envar>PGTZ</envar> environment variable is used by
<application>libpq</application> clients
to send a <command>SET TIME ZONE</command>
command to the server upon connection.
</para>
</listitem>
</itemizedlist>
</para>
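<para>
For illustration, the two spellings mentioned above can be used
interchangeably within a session. The zone name is an arbitrary example:
<programlisting>
SET TIME ZONE 'America/New_York';    -- SQL-standard spelling
SET TIMEZONE TO 'America/New_York';  -- equivalent PostgreSQL spelling
SHOW TIMEZONE;                       -- reports the session's current setting
</programlisting>
</para>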
</sect2>
<sect2 id="datatype-interval-input">
<title>Interval Input</title>
<indexterm>
<primary>interval</primary>
</indexterm>
<para>
<type>interval</type> values can be written using the following
verbose syntax:
<synopsis>
<optional>@</optional> <replaceable>quantity</replaceable> <replaceable>unit</replaceable> <optional><replaceable>quantity</replaceable> <replaceable>unit</replaceable>...</optional> <optional><replaceable>direction</replaceable></optional>
</synopsis>

where <replaceable>quantity</replaceable> is a number (possibly signed);
<replaceable>unit</replaceable> is <literal>microsecond</literal>,
<literal>millisecond</literal>, <literal>second</literal>,
<literal>minute</literal>, <literal>hour</literal>, <literal>day</literal>,
<literal>week</literal>, <literal>month</literal>, <literal>year</literal>,
<literal>decade</literal>, <literal>century</literal>, <literal>millennium</literal>,
or abbreviations or plurals of these units;
<replaceable>direction</replaceable> can be <literal>ago</literal> or
empty. The at sign (<literal>@</literal>) is optional noise. The amounts
of the different units are implicitly added with appropriate
sign accounting. <literal>ago</literal> negates all the fields.
This syntax is also used for interval output, if
<xref linkend="guc-intervalstyle"/> is set to
<literal>postgres_verbose</literal>.
</para>
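<para>
A brief sketch of the verbose syntax. Each query compares two spellings of
the same interval, so the expected result is true regardless of the
<varname>IntervalStyle</varname> used for output:
<programlisting>
-- The leading @ is ignored, and "ago" negates every field.
SELECT '@ 1 day 2 hours ago'::interval
     = '-1 day -2 hours'::interval AS ago_negates;

-- Plural and abbreviated unit names are accepted.
SELECT '3 mins 4 secs'::interval
     = '3 minutes 4 seconds'::interval AS same_units;
</programlisting>
</para>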
<para>
Quantities of days, hours, minutes, and seconds can be specified without
explicit unit markings. For example, <literal>'1 12:59:10'</literal> is read
the same as <literal>'1 day 12 hours 59 min 10 sec'</literal>. Also,
a combination of years and months can be specified with a dash;
for example, <literal>'200-10'</literal> is read the same as <literal>'200 years
10 months'</literal>. (These shorter forms are in fact the only ones allowed
by the <acronym>SQL</acronym> standard, and are used for output when
<varname>IntervalStyle</varname> is set to <literal>sql_standard</literal>.)
</para>
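<para>
The equivalences stated above can be checked directly; both comparisons in
this sketch are expected to yield true:
<programlisting>
SELECT '1 12:59:10'::interval
     = '1 day 12 hours 59 min 10 sec'::interval AS unmarked_day_time,
       '200-10'::interval
     = '200 years 10 months'::interval AS year_month_dash;
</programlisting>
</para>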
<para>
Interval values can also be written as ISO 8601 time intervals, using
either the <quote>format with designators</quote> of the standard's section
4.4.3.2 or the <quote>alternative format</quote> of section 4.4.3.3. The
format with designators looks like this:
<synopsis>
P <replaceable>quantity</replaceable> <replaceable>unit</replaceable> <optional> <replaceable>quantity</replaceable> <replaceable>unit</replaceable> ...</optional> <optional> T <optional> <replaceable>quantity</replaceable> <replaceable>unit</replaceable> ...</optional></optional>
</synopsis>
The string must start with a <literal>P</literal>, and may include a
<literal>T</literal> that introduces the time-of-day units. The
available unit abbreviations are given in <xref
linkend="datatype-interval-iso8601-units"/>. Units may be
omitted, and may be specified in any order, but units smaller than
a day must appear after <literal>T</literal>. In particular, the meaning of
<literal>M</literal> depends on whether it is before or after
<literal>T</literal>.
</para>
<table id="datatype-interval-iso8601-units">
<title>ISO 8601 Interval Unit Abbreviations</title>
<tgroup cols="2">
<thead>
<row>
<entry>Abbreviation</entry>
<entry>Meaning</entry>
</row>
</thead>
<tbody>
<row>
<entry>Y</entry>
<entry>Years</entry>
</row>
<row>
<entry>M</entry>
<entry>Months (in the date part)</entry>
</row>
<row>
<entry>W</entry>
<entry>Weeks</entry>
</row>
<row>
<entry>D</entry>
<entry>Days</entry>
</row>
<row>
<entry>H</entry>
<entry>Hours</entry>
</row>
<row>
<entry>M</entry>
<entry>Minutes (in the time part)</entry>
</row>
<row>
<entry>S</entry>
<entry>Seconds</entry>
</row>
</tbody>
</tgroup>
</table>
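<para>
As a sketch of the format with designators, the following comparisons are
expected to yield true. The second and third show how the meaning of
<literal>M</literal> changes with its position relative to
<literal>T</literal>:
<programlisting>
SELECT 'P1Y2M3DT4H5M6S'::interval
     = '1 year 2 months 3 days 4 hours 5 minutes 6 seconds'::interval
         AS full_form,
       'P1M'::interval  = '1 month'::interval  AS m_before_t,
       'PT1M'::interval = '1 minute'::interval AS m_after_t;
</programlisting>
</para>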
<para>
In the alternative format:
<synopsis>
P <optional> <replaceable>years</replaceable>-<replaceable>months</replaceable>-<replaceable>days</replaceable> </optional> <optional> T <replaceable>hours</replaceable>:<replaceable>minutes</replaceable>:<replaceable>seconds</replaceable> </optional>
</synopsis>
the string must begin with <literal>P</literal>, and a
<literal>T</literal> separates the date and time parts of the interval.
The values are given as numbers similar to ISO 8601 dates.
</para>
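<para>
A corresponding sketch for the alternative format; the comparison is
expected to yield true:
<programlisting>
SELECT 'P0001-02-03T04:05:06'::interval
     = '1 year 2 months 3 days 4 hours 5 minutes 6 seconds'::interval
         AS alternative_form;
</programlisting>
</para>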
<para>
When writing an interval constant with a <replaceable>fields</replaceable>
specification, or when assigning a string to an interval column that was
defined with a <replaceable>fields</replaceable> specification, the interpretation of
unmarked quantities depends on the <replaceable>fields</replaceable>. For
example, <literal>INTERVAL '1' YEAR</literal> is read as 1 year, whereas
<literal>INTERVAL '1'</literal> means 1 second. Also, field values
<quote>to the right</quote> of the least significant field allowed by the
<replaceable>fields</replaceable> specification are silently discarded. For
example, writing <literal>INTERVAL '1 day 2:03:04' HOUR TO MINUTE</literal>
results in dropping the seconds field, but not the day field.
</para>
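<para>
The effect of a <replaceable>fields</replaceable> specification can be
sketched as follows; each comparison is expected to yield true:
<programlisting>
SELECT INTERVAL '1' YEAR = INTERVAL '1 year'   AS unmarked_as_year,
       INTERVAL '1'      = INTERVAL '1 second' AS unmarked_as_second,
       -- Seconds are discarded, but the day field is kept.
       INTERVAL '1 day 2:03:04' HOUR TO MINUTE
         = INTERVAL '1 day 2 hours 3 minutes' AS seconds_dropped;
</programlisting>
</para>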
<para>
According to the <acronym>SQL</acronym> standard, all fields of an interval
value must have the same sign, so a leading negative sign applies to all
fields; for example, the negative sign in the interval literal
<literal>'-1 2:03:04'</literal> applies to both the days and hour/minute/second
parts. <productname>PostgreSQL</productname> allows the fields to have different
signs, and traditionally treats each field in the textual representation
as independently signed, so that the hour/minute/second part is
considered positive in this example. If <varname>IntervalStyle</varname> is
set to <literal>sql_standard</literal>, then a leading sign is considered
to apply to all fields (but only if no additional signs appear).
Otherwise, the traditional <productname>PostgreSQL</productname> interpretation is
used. To avoid ambiguity, it's recommended to attach an explicit sign
to each field if any field is negative.
</para>
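<para>
The two interpretations can be compared in one session, as in this sketch.
The right-hand side of each comparison signs every field explicitly, as
recommended above, so both comparisons are expected to yield true:
<programlisting>
SET intervalstyle = 'postgres';
-- Traditional reading: only the day field is negative.
SELECT '-1 2:03:04'::interval
     = '-1 day +2:03:04'::interval AS traditional_reading;

SET intervalstyle = 'sql_standard';
-- SQL-standard reading: the leading sign applies to all fields.
SELECT '-1 2:03:04'::interval
     = '-1 day -2:03:04'::interval AS sql_standard_reading;

RESET intervalstyle;
</programlisting>
</para>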
<para>
Internally, <type>interval</type> values are stored as three integral
fields: months, days, and microseconds. These fields are kept
separate because the number of days in a month varies, while a day
can have 23 or 25 hours if a daylight saving time transition is
involved. An interval input string that uses other units is
normalized into this format, and then reconstructed in a standardized
way for output, for example:
<programlisting>
SELECT '2 years 15 months 100 weeks 99 hours 123456789 milliseconds'::interval;
               interval
---------------------------------------
 3 years 3 mons 700 days 133:17:36.789
</programlisting>
Here weeks, which are understood as <quote>7 days</quote>, have been
kept separate, while the smaller and larger time units were
combined and normalized.
</para>
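<para>
The separation of the three fields can be observed directly. The
following sketch (assuming the default <literal>postgres</literal>
output style) shows that hours are not folded into days, and days are
not folded into months:
<programlisting>
SET intervalstyle = 'postgres';
SELECT '24 hours'::interval AS hours_only,
       '1 day'::interval    AS days_only,
       '30 days'::interval  AS thirty_days,
       '1 month'::interval  AS one_month;
 hours_only | days_only | thirty_days | one_month
------------+-----------+-------------+-----------
 24:00:00   | 1 day     | 30 days     | 1 mon
</programlisting>
</para>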
<para>
Input field values can have fractional parts, for example <literal>'1.5
weeks'</literal> or <literal>'01:02:03.45'</literal>. However,
because <type>interval</type> internally stores only integral fields,
fractional values must be converted into smaller
units. Fractional parts of units greater than months are rounded to
be an integer number of months, e.g. <literal>'1.5 years'</literal>
becomes <literal>'1 year 6 mons'</literal>. Fractional parts of
months, weeks, and days are converted to an integer number of days and
microseconds, assuming 30 days per month and 24 hours per day, e.g.,
<literal>'1.75 months'</literal> becomes <literal>1 mon 22 days
12:00:00</literal>. Only seconds will ever be shown as fractional
on output.
</para>
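<para>
The conversions described above can be checked with a short query.
This is a sketch that assumes the <literal>postgres</literal> output
style; the column aliases are arbitrary:
<programlisting>
SET intervalstyle = 'postgres';
SELECT '1.5 years'::interval   AS years,
       '1.75 months'::interval AS months,
       '1.5 weeks'::interval   AS weeks;
     years     |         months         |      weeks
---------------+------------------------+------------------
 1 year 6 mons | 1 mon 22 days 12:00:00 | 10 days 12:00:00
</programlisting>
</para>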
<para>
<xref linkend="datatype-interval-input-examples"/> shows some examples
of valid <type>interval</type> input.
</para>
<table id="datatype-interval-input-examples">
<title>Interval Input</title>
<tgroup cols="2">
<thead>
<row>
<entry>Example</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>1-2</literal></entry>
<entry>SQL standard format: 1 year 2 months</entry>
</row>
<row>
<entry><literal>3 4:05:06</literal></entry>
<entry>SQL standard format: 3 days 4 hours 5 minutes 6 seconds</entry>
</row>
<row>
<entry><literal>1 year 2 months 3 days 4 hours 5 minutes 6 seconds</literal></entry>
<entry>Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds</entry>
</row>
<row>
<entry><literal>P1Y2M3DT4H5M6S</literal></entry>
<entry>ISO 8601 <quote>format with designators</quote>: same meaning as above</entry>
</row>
<row>
<entry><literal>P0001-02-03T04:05:06</literal></entry>
<entry>ISO 8601 <quote>alternative format</quote>: same meaning as above</entry>
</row>
</tbody>
</tgroup>
</table>
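<para>
One way to confirm that the input formats in the table are
interchangeable is to compare the parsed values. In this sketch each
comparison is expected to return true; the column aliases are
arbitrary:
<programlisting>
SELECT '1-2'::interval = '1 year 2 months'::interval AS sql_year_month,
       '3 4:05:06'::interval
         = '3 days 4 hours 5 minutes 6 seconds'::interval AS sql_day_time,
       'P1Y2M3DT4H5M6S'::interval
         = 'P0001-02-03T04:05:06'::interval AS iso_forms;
</programlisting>
</para>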
</sect2>
<sect2 id="datatype-interval-output">
<title>Interval Output</title>
<indexterm>
<primary>interval</primary>
<secondary>output format</secondary>
<seealso>formatting</seealso>
</indexterm>
<para>
As previously explained, <productname>PostgreSQL</productname>
stores <type>interval</type> values as months, days, and
microseconds. For output, the months field is converted to years and
months by dividing by 12. The days field is shown as-is. The
microseconds field is converted to hours, minutes, seconds, and
fractional seconds. Thus months, minutes, and seconds will never be
shown as exceeding the ranges 0–11, 0–59, and 0–59
respectively, while the displayed years, days, and hours fields can
be quite large. (The <link
linkend="function-justify-days"><function>justify_days</function></link>
and <link
linkend="function-justify-hours"><function>justify_hours</function></link>
functions can be used if it is desirable to transpose large days or
hours values into the next higher field.)
</para>
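<para>
For instance, the two functions can be applied as follows. This is a
sketch assuming the <literal>postgres</literal> output style; the
column aliases are arbitrary:
<programlisting>
SELECT justify_days(interval '1 year 65 days')       AS days_folded,
       justify_hours(interval '50 hours 10 minutes') AS hours_folded;
     days_folded      |  hours_folded
----------------------+-----------------
 1 year 2 mons 5 days | 2 days 02:10:00
</programlisting>
</para>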
<para>
The output format of the interval type can be set to one of the
four styles <literal>sql_standard</literal>, <literal>postgres</literal>,
<literal>postgres_verbose</literal>, or <literal>iso_8601</literal>,
using the command <literal>SET intervalstyle</literal>.
The default is the <literal>postgres</literal> format.
<xref linkend="interval-style-output-table"/> shows examples of each
output style.
</para>
<para>
The <literal>sql_standard</literal> style produces output that conforms to
the SQL standard's specification for interval literal strings, if
the interval value meets the standard's restrictions (either year-month
only or day-time only, with no mixing of positive
and negative components). Otherwise the output looks like a standard
year-month literal string followed by a day-time literal string,
with explicit signs added to disambiguate mixed-sign intervals.
</para>
<para>
The output of the <literal>postgres</literal> style matches the output of
<productname>PostgreSQL</productname> releases prior to 8.4 when the
<xref linkend="guc-datestyle"/> parameter was set to <literal>ISO</literal>.
</para>
<para>
The output of the <literal>postgres_verbose</literal> style matches the output of
<productname>PostgreSQL</productname> releases prior to 8.4 when the
<varname>DateStyle</varname> parameter was set to non-<literal>ISO</literal> output.
</para>
<para>
The output of the <literal>iso_8601</literal> style matches the <quote>format
with designators</quote> described in section 4.4.3.2 of the
ISO 8601 standard.
</para>
<table id="interval-style-output-table">
<title>Interval Output Style Examples</title>
<tgroup cols="4">
<thead>
<row>
<entry>Style Specification</entry>
<entry>Year-Month Interval</entry>
<entry>Day-Time Interval</entry>
<entry>Mixed Interval</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>sql_standard</literal></entry>
<entry>1-2</entry>
<entry>3 4:05:06</entry>
<entry>-1-2 +3 -4:05:06</entry>
</row>
<row>
<entry><literal>postgres</literal></entry>
<entry>1 year 2 mons</entry>
<entry>3 days 04:05:06</entry>
<entry>-1 year -2 mons +3 days -04:05:06</entry>
</row>
<row>
<entry><literal>postgres_verbose</literal></entry>
<entry>@ 1 year 2 mons</entry>
<entry>@ 3 days 4 hours 5 mins 6 secs</entry>
<entry>@ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago</entry>
</row>
<row>
<entry><literal>iso_8601</literal></entry>
<entry>P1Y2M</entry>
<entry>P3DT4H5M6S</entry>
<entry>P-1Y-2M3D&zwsp;T-4H-5M-6S</entry>
</row>
</tbody>
</tgroup>
</table>
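<para>
A session might switch between the styles as sketched below. The
results noted in the comments are the ones listed in the table above:
<programlisting>
SET intervalstyle = 'sql_standard';
SELECT '1 year 2 months'::interval;                     -- 1-2
SET intervalstyle = 'postgres_verbose';
SELECT '1 year 2 months'::interval;                     -- @ 1 year 2 mons
SET intervalstyle = 'iso_8601';
SELECT '3 days 4 hours 5 minutes 6 seconds'::interval;  -- P3DT4H5M6S
RESET intervalstyle;
</programlisting>
</para>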
</sect2>
</sect1>
<sect1 id="datatype-boolean">
<title>Boolean Type</title>
<indexterm zone="datatype-boolean">
<primary>Boolean</primary>
<secondary>data type</secondary>
</indexterm>
<indexterm zone="datatype-boolean">
<primary>true</primary>
</indexterm>
<indexterm zone="datatype-boolean">
<primary>false</primary>
</indexterm>
<para>
<productname>PostgreSQL</productname> provides the
standard <acronym>SQL</acronym> type <type>boolean</type>;
see <xref linkend="datatype-boolean-table"/>.
The <type>boolean</type> type can have several states:
<quote>true</quote>, <quote>false</quote>, and a third state,
<quote>unknown</quote>, which is represented by the
<acronym>SQL</acronym> null value.
</para>
<table id="datatype-boolean-table">
<title>Boolean Data Type</title>
<tgroup cols="3">
<thead>
<row>
<entry>Name</entry>
<entry>Storage Size</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>boolean</type></entry>
<entry>1 byte</entry>
<entry>state of true or false</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Boolean constants can be represented in SQL queries by the SQL
key words <literal>TRUE</literal>, <literal>FALSE</literal>,
and <literal>NULL</literal>.
</para>
<para>
The datatype input function for type <type>boolean</type> accepts these
string representations for the <quote>true</quote> state:
<simplelist>
<member><literal>true</literal></member>
<member><literal>yes</literal></member>
<member><literal>on</literal></member>
<member><literal>1</literal></member>
</simplelist>
and these representations for the <quote>false</quote> state:
<simplelist>
<member><literal>false</literal></member>
<member><literal>no</literal></member>
<member><literal>off</literal></member>
<member><literal>0</literal></member>
</simplelist>
Unique prefixes of these strings are also accepted, for
example <literal>t</literal> or <literal>n</literal>.
Leading or trailing whitespace is ignored, and case does not matter.
</para>
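<para>
A few of these input forms can be tried as sketched below; the column
aliases are arbitrary. The third literal exercises the whitespace and
case rules, and the fourth uses a unique prefix of
<literal>false</literal>:
<programlisting>
SELECT 'yes'::boolean  AS a,
       'off'::boolean  AS b,
       '  T '::boolean AS c,
       'fal'::boolean  AS d;
 a | b | c | d
---+---+---+---
 t | f | t | f
</programlisting>
</para>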
<para>
The datatype output function for type <type>boolean</type> always emits
either <literal>t</literal> or <literal>f</literal>, as shown in
<xref linkend="datatype-boolean-example"/>.
</para>
<example id="datatype-boolean-example">
<title>Using the <type>boolean</type> Type</title>
<programlisting>
CREATE TABLE test1 (a boolean, b text);
INSERT INTO test1 VALUES (TRUE, 'sic est');
INSERT INTO test1 VALUES (FALSE, 'non est');
SELECT * FROM test1;
 a |    b
---+---------
 t | sic est
 f | non est

SELECT * FROM test1 WHERE a;
 a |    b
---+---------
 t | sic est
</programlisting>
</example>
<para>
The key words <literal>TRUE</literal> and <literal>FALSE</literal> are
the preferred (<acronym>SQL</acronym>-compliant) method for writing
Boolean constants in SQL queries. But you can also use the string
representations by following the generic string-literal constant syntax
described in <xref linkend="sql-syntax-constants-generic"/>, for
example <literal>'yes'::boolean</literal>.
</para>
<para>
Note that the parser automatically understands
that <literal>TRUE</literal> and <literal>FALSE</literal> are of
type <type>boolean</type>, but this is not so
for <literal>NULL</literal> because that can have any type.
So in some contexts you might have to cast <literal>NULL</literal>
to <type>boolean</type> explicitly, for
example <literal>NULL::boolean</literal>. Conversely, the cast can be
omitted from a string-literal Boolean value in contexts where the parser
can deduce that the literal must be of type <type>boolean</type>.
</para>
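<para>
The difference can be seen with <function>pg_typeof</function>. The
sketch below also reuses the <structname>test1</structname> table from
<xref linkend="datatype-boolean-example"/> to show a string literal
whose type is deduced from the column it is compared with:
<programlisting>
SELECT pg_typeof(TRUE)          AS kw,
       pg_typeof(NULL)          AS bare_null,
       pg_typeof(NULL::boolean) AS cast_null;
   kw    | bare_null | cast_null
---------+-----------+-----------
 boolean | unknown   | boolean

-- No cast is needed: comparison with column a makes the literal boolean.
SELECT * FROM test1 WHERE a = 'yes';
</programlisting>
</para>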
</sect1>
<sect1 id="datatype-enum">
<title>Enumerated Types</title>
<indexterm zone="datatype-enum">
<primary>data type</primary>
<secondary>enumerated (enum)</secondary>
</indexterm>
<indexterm zone="datatype-enum">
<primary>enumerated types</primary>
</indexterm>
<para>
Enumerated (enum) types are data types that
comprise a static, ordered set of values.
They are equivalent to the <type>enum</type>
types supported in a number of programming languages. An example of an enum
type might be the days of the week, or a set of status values for
a piece of data.
</para>
<sect2 id="datatype-enum-declaration">
<title>Declaration of Enumerated Types</title>
<para>
Enum types are created using the <xref
linkend="sql-createtype"/> command,
for example:
<programlisting>
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
</programlisting>
Once created, the enum type can be used in table and function
definitions much like any other type:
<programlisting>
CREATE TABLE person (
    name text,
    current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';
 name | current_mood
------+--------------
 Moe  | happy
(1 row)
</programlisting>
</para>
</sect2>
<sect2 id="datatype-enum-ordering">
<title>Ordering</title>
<para>
The ordering of the values in an enum type is the
order in which the values were listed when the type was created.
All standard comparison operators and related
aggregate functions are supported for enums. For example:
<programlisting>
INSERT INTO person VALUES ('Larry', 'sad');
INSERT INTO person VALUES ('Curly', 'ok');
SELECT * FROM person WHERE current_mood > 'sad';
 name  | current_mood
-------+--------------
 Moe   | happy
 Curly | ok
(2 rows)

SELECT * FROM person WHERE current_mood > 'sad' ORDER BY current_mood;
 name  | current_mood
-------+--------------
 Curly | ok
 Moe   | happy
(2 rows)

SELECT name
FROM person
WHERE current_mood = (SELECT MIN(current_mood) FROM person);
 name
-------
 Larry
(1 row)
</programlisting>
</para>
</sect2>
<sect2 id="datatype-enum-type-safety">
<title>Type Safety</title>
<para>
Each enumerated data type is separate and cannot
be compared with other enumerated types. See this example:
<programlisting>
CREATE TYPE happiness AS ENUM ('happy', 'very happy', 'ecstatic');
CREATE TABLE holidays (
    num_weeks integer,
    happiness happiness
);
INSERT INTO holidays(num_weeks,happiness) VALUES (4, 'happy');
INSERT INTO holidays(num_weeks,happiness) VALUES (6, 'very happy');
INSERT INTO holidays(num_weeks,happiness) VALUES (8, 'ecstatic');
INSERT INTO holidays(num_weeks,happiness) VALUES (2, 'sad');
ERROR: invalid input value for enum happiness: "sad"
SELECT person.name, holidays.num_weeks FROM person, holidays
WHERE person.current_mood = holidays.happiness;
ERROR: operator does not exist: mood = happiness
</programlisting>
</para>
<para>
If you really need to do something like that, you can either
write a custom operator or add explicit casts to your query:
<programlisting>
SELECT person.name, holidays.num_weeks FROM person, holidays
  WHERE person.current_mood::text = holidays.happiness::text;
 name | num_weeks
------+-----------
 Moe  |         4
(1 row)
</programlisting>
</para>
</sect2>
<sect2 id="datatype-enum-implementation-details">
<title>Implementation Details</title>
<para>
Enum labels are case sensitive, so
<type>'happy'</type> is not the same as <type>'HAPPY'</type>.
White space in the labels is significant too.
</para>
<para>
Although enum types are primarily intended for static sets of values,
there is support for adding new values to an existing enum type, and for
renaming values (see <xref linkend="sql-altertype"/>). Existing values
cannot be removed from an enum type, nor can the sort ordering of such
values be changed, short of dropping and re-creating the enum type.
</para>
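<para>
As a sketch of the supported changes, using the
<type>mood</type> type created earlier (the labels
<literal>content</literal> and <literal>fine</literal> are purely
illustrative):
<programlisting>
-- Add a label between existing ones; its position fixes its sort order.
ALTER TYPE mood ADD VALUE 'content' BEFORE 'happy';
-- Rename an existing label; rows that store it now display the new text.
ALTER TYPE mood RENAME VALUE 'ok' TO 'fine';
</programlisting>
</para>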
<para>
An enum value occupies four bytes on disk. The length of an enum
value's textual label is limited by the <symbol>NAMEDATALEN</symbol>
setting compiled into <productname>PostgreSQL</productname>; in standard
builds this means at most 63 bytes.
</para>
<para>
The translations from internal enum values to textual labels are
kept in the system catalog
<link linkend="catalog-pg-enum"><structname>pg_enum</structname></link>.
Querying this catalog directly can be useful, for example to list the
labels of an enum type in their sort order.
</para>
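<para>
A minimal sketch of such a catalog query, assuming the
<type>mood</type> type from the earlier examples exists:
<programlisting>
-- One row per label; enumsortorder determines the comparison order.
SELECT enumlabel, enumsortorder
FROM pg_enum
WHERE enumtypid = 'mood'::regtype
ORDER BY enumsortorder;
</programlisting>
</para>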
</sect2>
</sect1>
<sect1 id="datatype-geometric">
<title>Geometric Types</title>
<para>
Geometric data types represent two-dimensional spatial
objects. <xref linkend="datatype-geo-table"/> shows the geometric
types available in <productname>PostgreSQL</productname>.
</para>
<table id="datatype-geo-table">
<title>Geometric Types</title>
<tgroup cols="4">
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="2*"/>
<colspec colname="col4" colwidth="1*"/>
<thead>
<row>
<entry>Name</entry>
<entry>Storage Size</entry>
<entry>Description</entry>
<entry>Representation</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>point</type></entry>
<entry>16 bytes</entry>
<entry>Point on a plane</entry>
<entry>(x,y)</entry>
</row>
<row>
<entry><type>line</type></entry>
<entry>24 bytes</entry>
<entry>Infinite line</entry>
<entry>{A,B,C}</entry>
</row>
<row>
<entry><type>lseg</type></entry>
<entry>32 bytes</entry>
<entry>Finite line segment</entry>
<entry>[(x1,y1),(x2,y2)]</entry>
</row>
<row>
<entry><type>box</type></entry>
<entry>32 bytes</entry>
<entry>Rectangular box</entry>
<entry>(x1,y1),(x2,y2)</entry>
</row>
<row>
<entry><type>path</type></entry>
<entry>16+16n bytes</entry>
<entry>Closed path (similar to polygon)</entry>
<entry>((x1,y1),...)</entry>
</row>
<row>
<entry><type>path</type></entry>
<entry>16+16n bytes</entry>
<entry>Open path</entry>
<entry>[(x1,y1),...]</entry>
</row>
<row>
<entry><type>polygon</type></entry>
<entry>40+16n bytes</entry>
<entry>Polygon (similar to closed path)</entry>
<entry>((x1,y1),...)</entry>
</row>
<row>
<entry><type>circle</type></entry>
<entry>24 bytes</entry>
<entry>Circle</entry>
<entry>&lt;(x,y),r&gt; (center point and radius)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
A rich set of functions and operators is available to perform various geometric
operations such as scaling, translation, rotation, and determining
intersections. They are explained in <xref linkend="functions-geometry"/>.
</para>
<sect2 id="datatype-geometric-points">
<title>Points</title>
<indexterm>
<primary>point</primary>
</indexterm>
<para>
Points are the fundamental two-dimensional building block for geometric
types. Values of type <type>point</type> are specified using either of
the following syntaxes:
<synopsis>
( <replaceable>x</replaceable> , <replaceable>y</replaceable> )
  <replaceable>x</replaceable> , <replaceable>y</replaceable>
</synopsis>
where <replaceable>x</replaceable> and <replaceable>y</replaceable> are the respective
coordinates, as floating-point numbers.
</para>
<para>
Points are output using the first syntax.
</para>
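<para>
For example, both input syntaxes yield the same value, printed in the
parenthesized form (the column aliases are arbitrary):
<programlisting>
SELECT '(1.5,-2)'::point AS parenthesized,
       '1.5,-2'::point   AS bare;
 parenthesized |   bare
---------------+----------
 (1.5,-2)      | (1.5,-2)
</programlisting>
</para>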
</sect2>
<sect2 id="datatype-line">
<title>Lines</title>
<indexterm>
<primary>line</primary>
</indexterm>
<para>
Lines are represented by the linear
equation <replaceable>A</replaceable>x + <replaceable>B</replaceable>y + <replaceable>C</replaceable> = 0,
where <replaceable>A</replaceable> and <replaceable>B</replaceable> are not both zero. Values
of type <type>line</type> are input and output in the following form:
<synopsis>
{ <replaceable>A</replaceable>, <replaceable>B</replaceable>, <replaceable>C</replaceable> }
</synopsis>
Alternatively, any of the following forms can be used for input:
<synopsis>
[ ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> ) ]
( ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> ) )
( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> )
<replaceable>x1</replaceable> , <replaceable>y1</replaceable> , <replaceable>x2</replaceable> , <replaceable>y2</replaceable>
</synopsis>
where
<literal>(<replaceable>x1</replaceable>,<replaceable>y1</replaceable>)</literal>
and
<literal>(<replaceable>x2</replaceable>,<replaceable>y2</replaceable>)</literal>
are two different points on the line.
</para>
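<para>
As a sketch, the line x - y = 0 can be written either with its
coefficients or with two points that lie on it. Both values are output
in the <literal>{A,B,C}</literal> form:
<programlisting>
SELECT '{1,-1,0}'::line      AS by_coefficients,
       '[(0,0),(1,1)]'::line AS by_points;
</programlisting>
</para>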
</sect2>
<sect2 id="datatype-lseg">
<title>Line Segments</title>
<indexterm>
<primary>lseg</primary>
</indexterm>
<indexterm>
<primary>line segment</primary>
</indexterm>
<para>
Line segments are represented by pairs of points that are the endpoints
of the segment. Values of type <type>lseg</type> are specified using any
of the following syntaxes:
<synopsis>
[ ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> ) ]
( ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> ) )
  ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> )
    <replaceable>x1</replaceable> , <replaceable>y1</replaceable>   ,   <replaceable>x2</replaceable> , <replaceable>y2</replaceable>
</synopsis>
where
<literal>(<replaceable>x1</replaceable>,<replaceable>y1</replaceable>)</literal>
and
<literal>(<replaceable>x2</replaceable>,<replaceable>y2</replaceable>)</literal>
are the endpoints of the line segment.
1999-08-06 15:43:42 +02:00
</para>
<para>
Line segments are output using the first syntax.
</para>
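<para>
For example, the bracketed and the bare syntaxes describe the same
segment. The sketch below also calls <function>length</function> on it;
the column aliases are arbitrary:
<programlisting>
SELECT '[(0,0),(3,4)]'::lseg         AS bracketed,
       '0,0,3,4'::lseg               AS bare,
       length('[(0,0),(3,4)]'::lseg) AS len;
   bracketed   |     bare      | len
---------------+---------------+-----
 [(0,0),(3,4)] | [(0,0),(3,4)] |   5
</programlisting>
</para>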
</sect2>
<sect2 id="datatype-geometric-boxes">
<title>Boxes</title>
<indexterm>
<primary>box (data type)</primary>
</indexterm>
<indexterm>
<primary>rectangle</primary>
</indexterm>
<para>
Boxes are represented by pairs of points that are opposite
corners of the box.
Values of type <type>box</type> are specified using any of the following
syntaxes:
<synopsis>
( ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> ) )
  ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ( <replaceable>x2</replaceable> , <replaceable>y2</replaceable> )
    <replaceable>x1</replaceable> , <replaceable>y1</replaceable>   ,   <replaceable>x2</replaceable> , <replaceable>y2</replaceable>
</synopsis>
where
<literal>(<replaceable>x1</replaceable>,<replaceable>y1</replaceable>)</literal>
and
<literal>(<replaceable>x2</replaceable>,<replaceable>y2</replaceable>)</literal>
are any two opposite corners of the box.
</para>
<para>
Boxes are output using the second syntax.
</para>
<para>
Any two opposite corners can be supplied on input, but the values
will be reordered as needed to store the
upper right and lower left corners, in that order.
</para>
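<para>
The reordering can be seen in the following sketch, where the same box
is entered through each of its two diagonals (the column aliases are
arbitrary):
<programlisting>
SELECT '(0,0),(1,1)'::box   AS lower_left_first,
       '((1,0),(0,1))'::box AS other_diagonal;
 lower_left_first | other_diagonal
------------------+----------------
 (1,1),(0,0)      | (1,1),(0,0)
</programlisting>
</para>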
</sect2>
<sect2 id="datatype-geometric-paths">
<title>Paths</title>
<indexterm>
<primary>path (data type)</primary>
</indexterm>
<para>
Paths are represented by lists of connected points. Paths can be
<firstterm>open</firstterm>, where
the first and last points in the list are considered not connected, or
<firstterm>closed</firstterm>,
where the first and last points are considered connected.
</para>
<para>
Values of type <type>path</type> are specified using any of the following
syntaxes:
<synopsis>
[ ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ... , ( <replaceable>xn</replaceable> , <replaceable>yn</replaceable> ) ]
( ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ... , ( <replaceable>xn</replaceable> , <replaceable>yn</replaceable> ) )
  ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ... , ( <replaceable>xn</replaceable> , <replaceable>yn</replaceable> )
  ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable>   , ... ,   <replaceable>xn</replaceable> , <replaceable>yn</replaceable> )
    <replaceable>x1</replaceable> , <replaceable>y1</replaceable>   , ... ,   <replaceable>xn</replaceable> , <replaceable>yn</replaceable>
</synopsis>
where the points are the end points of the line segments
comprising the path. Square brackets (<literal>[]</literal>) indicate
an open path, while parentheses (<literal>()</literal>) indicate a
closed path. When the outermost parentheses are omitted, as
in the third through fifth syntaxes, a closed path is assumed.
</para>
<para>
Paths are output using the first or second syntax, as appropriate.
</para>
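<para>
As a sketch, the same three points can be entered as an open path with
square brackets or as a closed path with the bare syntax. The
<function>isopen</function> and <function>isclosed</function> functions
report which kind was produced:
<programlisting>
SELECT '[(0,0),(1,1),(2,0)]'::path         AS open_path,         -- [(0,0),(1,1),(2,0)]
       '0,0,1,1,2,0'::path                 AS closed_path,       -- ((0,0),(1,1),(2,0))
       isopen('[(0,0),(1,1),(2,0)]'::path) AS first_is_open,     -- t
       isclosed('0,0,1,1,2,0'::path)       AS second_is_closed;  -- t
</programlisting>
</para>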
</sect2>
<sect2 id="datatype-polygon">
<title>Polygons</title>
<indexterm>
<primary>polygon</primary>
</indexterm>
<para>
Polygons are represented by lists of points (the vertices of the
polygon). Polygons are very similar to closed paths; the essential
difference is that a polygon is considered to include the area
within it, while a path is not.
</para>
<para>
Values of type <type>polygon</type> are specified using any of the
following syntaxes:
<synopsis>
( ( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ... , ( <replaceable>xn</replaceable> , <replaceable>yn</replaceable> ) )
( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> ) , ... , ( <replaceable>xn</replaceable> , <replaceable>yn</replaceable> )
( <replaceable>x1</replaceable> , <replaceable>y1</replaceable> , ... , <replaceable>xn</replaceable> , <replaceable>yn</replaceable> )
<replaceable>x1</replaceable> , <replaceable>y1</replaceable> , ... , <replaceable>xn</replaceable> , <replaceable>yn</replaceable>
</synopsis>
where the points are the end points of the line segments
comprising the boundary of the polygon.
</para>
<para>
Polygons are output using the first syntax.
</para>
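<para>
The following sketch, using an arbitrary triangle, shows two of the
input syntaxes producing the same value, and uses the
<literal>@&gt;</literal> containment operator to illustrate that a
polygon includes its interior:
<programlisting>
SELECT polygon '((0,0),(4,0),(4,3))' AS with_parens,
       polygon '0,0,4,0,4,3'         AS bare_list,
       -- (3,1) lies strictly inside the triangle, not on its boundary
       polygon '((0,0),(4,0),(4,3))' @&gt; point '(3,1)' AS contains_interior_point;
</programlisting>
Both polygon columns should display as
<literal>((0,0),(4,0),(4,3))</literal>, and the containment test is
expected to be true.
</para>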
</sect2>
<sect2 id="datatype-circle">
<title>Circles</title>
<indexterm>
<primary>circle</primary>
</indexterm>
<para>
Circles are represented by a center point and radius.
Values of type <type>circle</type> are specified using any of the
following syntaxes:
<synopsis>
&lt; ( <replaceable>x</replaceable> , <replaceable>y</replaceable> ) , <replaceable>r</replaceable> &gt;
( ( <replaceable>x</replaceable> , <replaceable>y</replaceable> ) , <replaceable>r</replaceable> )
( <replaceable>x</replaceable> , <replaceable>y</replaceable> ) , <replaceable>r</replaceable>
<replaceable>x</replaceable> , <replaceable>y</replaceable> , <replaceable>r</replaceable>
</synopsis>
where
<literal>(<replaceable>x</replaceable>,<replaceable>y</replaceable>)</literal>
is the center point and <replaceable>r</replaceable> is the radius of the
circle.
</para>
<para>
Circles are output using the first syntax.
</para>
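<para>
As a brief sketch with arbitrary values, three of the input syntaxes
can be compared side by side, together with the built-in
<function>center</function> and <function>radius</function> functions:
<programlisting>
SELECT circle '&lt;(1,2),3&gt;'      AS angle_form,
       circle '((1,2),3)'      AS paren_form,
       circle '1,2,3'          AS bare_form,
       center(circle '1,2,3')  AS center_point,
       radius(circle '1,2,3')  AS r;
</programlisting>
The three circle columns should all display as
<literal>&lt;(1,2),3&gt;</literal>.
</para>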
</sect2>
</sect1>
<sect1 id="datatype-net-types">
<title>Network Address Types</title>
<indexterm zone="datatype-net-types">
<primary>network</primary>
<secondary>data types</secondary>
</indexterm>
<para>
<productname>PostgreSQL</productname> offers data types to store IPv4, IPv6, and MAC
addresses, as shown in <xref linkend="datatype-net-types-table"/>. These
types are preferable to plain text types for storing
network addresses, because
they offer input error checking and specialized
operators and functions (see <xref linkend="functions-net"/>).
</para>
<table tocentry="1" id="datatype-net-types-table">
<title>Network Address Types</title>
<tgroup cols="3">
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="1*"/>
<colspec colname="col3" colwidth="2*"/>
<thead>
<row>
<entry>Name</entry>
<entry>Storage Size</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>cidr</type></entry>
<entry>7 or 19 bytes</entry>
<entry>IPv4 and IPv6 networks</entry>
</row>
<row>
<entry><type>inet</type></entry>
<entry>7 or 19 bytes</entry>
<entry>IPv4 and IPv6 hosts and networks</entry>
</row>
<row>
<entry><type>macaddr</type></entry>
<entry>6 bytes</entry>
<entry>MAC addresses</entry>
</row>
<row>
<entry><type>macaddr8</type></entry>
<entry>8 bytes</entry>
<entry>MAC addresses (EUI-64 format)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
When values of type <type>inet</type> or <type>cidr</type> are sorted,
IPv4 addresses always sort before IPv6 addresses, including
IPv6 addresses that encapsulate or map an IPv4 address, such as
<literal>::10.2.3.4</literal> or <literal>::ffff:10.4.3.2</literal>.
</para>
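<para>
A minimal sketch of this ordering, using arbitrary sample addresses in
an inline <literal>VALUES</literal> list rather than a real table:
<programlisting>
SELECT addr
FROM (VALUES (inet '2001:db8::1'),
             (inet '::ffff:10.4.3.2'),
             (inet '192.168.1.5'),
             (inet '10.0.0.1')) AS t(addr)
ORDER BY addr;
</programlisting>
The two IPv4 values should be listed first, followed by the IPv4-mapped
address and then the remaining IPv6 address.
</para>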
<sect2 id="datatype-inet">
<title><type>inet</type></title>
<indexterm>
<primary>inet (data type)</primary>
</indexterm>
<para>
The <type>inet</type> type holds an IPv4 or IPv6 host address, and
optionally its subnet, all in one field.
The subnet is represented by the number of network address bits
present in the host address (the
<quote>netmask</quote>). If the netmask is 32 and the address is IPv4,
then the value does not indicate a subnet, only a single host.
In IPv6, the address length is 128 bits, so 128 bits specify a
unique host address. Note that if you
want to accept only networks, you should use the
<type>cidr</type> type rather than <type>inet</type>.
</para>
<para>
The input format for this type is
<replaceable class="parameter">address/y</replaceable>
where
<replaceable class="parameter">address</replaceable>
is an IPv4 or IPv6 address and
<replaceable class="parameter">y</replaceable>
is the number of bits in the netmask. If the
<replaceable class="parameter">/y</replaceable>
portion is omitted, the
netmask is taken to be 32 for IPv4 or 128 for IPv6,
so the value represents
just a single host. On display, the
<replaceable class="parameter">/y</replaceable>
portion is suppressed if the netmask specifies a single host.
</para>
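<para>
The input and display rules can be sketched as follows; the address is
an arbitrary example, and <function>masklen</function> is the built-in
function that reports the netmask length:
<programlisting>
SELECT inet '192.168.0.1/24'        AS host_with_subnet,
       inet '192.168.0.1'           AS host_only,
       inet '192.168.0.1/32'        AS explicit_single_host,
       masklen(inet '192.168.0.1')  AS default_masklen;
</programlisting>
The first column should keep its <literal>/24</literal> suffix, the
second and third should both display as <literal>192.168.0.1</literal>,
and the netmask length of the value written without a suffix should be 32.
</para>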
</sect2>
<sect2 id="datatype-cidr">
<title><type>cidr</type></title>
<indexterm>
<primary>cidr</primary>
</indexterm>
<para>
The <type>cidr</type> type holds an IPv4 or IPv6 network specification.
Input and output formats follow Classless Inter-Domain Routing
conventions.
The format for specifying networks is <replaceable
class="parameter">address/y</replaceable> where <replaceable
class="parameter">address</replaceable> is the network's lowest
address represented as an
IPv4 or IPv6 address, and <replaceable
class="parameter">y</replaceable> is the number of bits in the netmask. If
<replaceable class="parameter">y</replaceable> is omitted, it is calculated
using assumptions from the older classful network numbering system, except
it will be at least large enough to include all of the octets
written in the input. It is an error to specify a network address
that has bits set to the right of the specified netmask.
</para>
<para>
<xref linkend="datatype-net-cidr-table"/> shows some examples.
</para>
<table id="datatype-net-cidr-table">
<title><type>cidr</type> Type Input Examples</title>
<tgroup cols="3">
<thead>
<row>
<entry><type>cidr</type> Input</entry>
<entry><type>cidr</type> Output</entry>
<entry><literal><function>abbrev(<type>cidr</type>)</function></literal></entry>
</row>
</thead>
<tbody>
<row>
<entry>192.168.100.128/25</entry>
<entry>192.168.100.128/25</entry>
<entry>192.168.100.128/25</entry>
</row>
<row>
<entry>192.168/24</entry>
<entry>192.168.0.0/24</entry>
<entry>192.168.0/24</entry>
</row>
<row>
<entry>192.168/25</entry>
<entry>192.168.0.0/25</entry>
<entry>192.168.0.0/25</entry>
</row>
<row>
<entry>192.168.1</entry>
<entry>192.168.1.0/24</entry>
<entry>192.168.1/24</entry>
</row>
<row>
<entry>192.168</entry>
<entry>192.168.0.0/24</entry>
<entry>192.168.0/24</entry>
</row>
<row>
<entry>128.1</entry>
<entry>128.1.0.0/16</entry>
<entry>128.1/16</entry>
</row>
<row>
<entry>128</entry>
<entry>128.0.0.0/16</entry>
<entry>128.0/16</entry>
</row>
<row>
<entry>128.1.2</entry>
<entry>128.1.2.0/24</entry>
<entry>128.1.2/24</entry>
</row>
<row>
<entry>10.1.2</entry>
<entry>10.1.2.0/24</entry>
<entry>10.1.2/24</entry>
</row>
<row>
<entry>10.1</entry>
<entry>10.1.0.0/16</entry>
<entry>10.1/16</entry>
</row>
<row>
<entry>10</entry>
<entry>10.0.0.0/8</entry>
<entry>10/8</entry>
</row>
<row>
<entry>10.1.2.3/32</entry>
<entry>10.1.2.3/32</entry>
<entry>10.1.2.3/32</entry>
</row>
<row>
<entry>2001:4f8:3:ba::/64</entry>
<entry>2001:4f8:3:ba::/64</entry>
<entry>2001:4f8:3:ba/64</entry>
</row>
<row>
<entry>2001:4f8:3:ba:&zwsp;2e0:81ff:fe22:d1f1/128</entry>
<entry>2001:4f8:3:ba:&zwsp;2e0:81ff:fe22:d1f1/128</entry>
<entry>2001:4f8:3:ba:&zwsp;2e0:81ff:fe22:d1f1/128</entry>
</row>
<row>
<entry>::ffff:1.2.3.0/120</entry>
<entry>::ffff:1.2.3.0/120</entry>
<entry>::ffff:1.2.3/120</entry>
</row>
<row>
<entry>::ffff:1.2.3.0/128</entry>
<entry>::ffff:1.2.3.0/128</entry>
<entry>::ffff:1.2.3.0/128</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2 id="datatype-inet-vs-cidr">
<title><type>inet</type> vs. <type>cidr</type></title>
<para>
The essential difference between the <type>inet</type> and <type>cidr</type>
data types is that <type>inet</type> accepts values with nonzero bits to
the right of the netmask, whereas <type>cidr</type> does not. For
example, <literal>192.168.0.1/24</literal> is valid for <type>inet</type>
but not for <type>cidr</type>.
</para>
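<para>
A short sketch of this difference, run as three separate statements.
The built-in <function>network</function> function is used here only to
show one way of obtaining a valid <type>cidr</type> value from such an
<type>inet</type> value:
<programlisting>
-- Accepted: inet keeps the host bits to the right of the netmask.
SELECT inet '192.168.0.1/24';

-- The enclosing network of that host, returned as a cidr value.
SELECT network(inet '192.168.0.1/24');

-- Expected to be rejected, because bits are set to the right of the netmask.
SELECT cidr '192.168.0.1/24';
</programlisting>
The second statement should return <literal>192.168.0.0/24</literal>.
</para>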
<tip>
<para>
If you do not like the output format for <type>inet</type> or
<type>cidr</type> values, try the functions <function>host</function>,
<function>text</function>, and <function>abbrev</function>.
</para>
</tip>
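<para>
As a rough illustration of how these three functions differ, using
arbitrary sample values:
<programlisting>
SELECT host(inet '192.168.1.5/24')  AS host_part,   -- address without the netmask
       text(inet '192.168.1.5')     AS full_text,   -- address with the netmask spelled out
       abbrev(cidr '10.1.0.0/16')   AS short_form;  -- trailing zero octets dropped
</programlisting>
The expected results are <literal>192.168.1.5</literal>,
<literal>192.168.1.5/32</literal>, and <literal>10.1/16</literal>,
respectively.
</para>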
</sect2>
<sect2 id="datatype-macaddr">
<title><type>macaddr</type></title>
<indexterm>
<primary>macaddr (data type)</primary>
</indexterm>
<indexterm>
<primary>MAC address</primary>
<see>macaddr</see>
</indexterm>
<para>
The <type>macaddr</type> type stores MAC addresses, known for example
from Ethernet card hardware addresses (although MAC addresses are
used for other purposes as well). Input is accepted in the
following formats:
<simplelist>
<member><literal>'08:00:2b:01:02:03'</literal></member>
<member><literal>'08-00-2b-01-02-03'</literal></member>
<member><literal>'08002b:010203'</literal></member>
<member><literal>'08002b-010203'</literal></member>
<member><literal>'0800.2b01.0203'</literal></member>
<member><literal>'0800-2b01-0203'</literal></member>
<member><literal>'08002b010203'</literal></member>
</simplelist>
These examples all specify the same address. Both upper and
lower case are accepted for the digits
<literal>a</literal> through <literal>f</literal>. Output is always in the
first of the forms shown.
</para>
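<para>
A brief sketch showing that different input forms yield the same stored
value (the address is the one used in the list above):
<programlisting>
SELECT macaddr '08002b:010203'   AS from_two_groups,
       macaddr '0800.2B01.0203'  AS from_dotted_upper_case,
       macaddr '08-00-2b-01-02-03' = macaddr '08002b010203' AS same_address;
</programlisting>
The first two columns should both display as
<literal>08:00:2b:01:02:03</literal>, and the comparison is expected to
be true.
</para>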
<para>
IEEE Standard 802-2001 specifies the second form shown (with hyphens)
as the canonical form for MAC addresses, and specifies the first
form (with colons) as used with bit-reversed, MSB-first notation, so that
08-00-2b-01-02-03 = 10:00:D4:80:40:C0. This convention is widely
ignored nowadays, and it is relevant only for obsolete network
protocols (such as Token Ring). <productname>PostgreSQL</productname> makes no provision
for bit reversal; all accepted formats use the canonical LSB
order.
</para>
<para>
The remaining five input formats are not part of any standard.
</para>
</sect2>
<sect2 id="datatype-macaddr8">
<title><type>macaddr8</type></title>
<indexterm>
<primary>macaddr8 (data type)</primary>
</indexterm>
<indexterm>
<primary>MAC address (EUI-64 format)</primary>
<see>macaddr</see>
</indexterm>
<para>
The <type>macaddr8</type> type stores MAC addresses in EUI-64
format, known for example from Ethernet card hardware addresses
(although MAC addresses are used for other purposes as well).
This type accepts both 6-byte and 8-byte MAC addresses
and stores them in 8-byte format. MAC addresses given
in 6-byte format are stored in 8-byte format with the
4th and 5th bytes set to FF and FE, respectively.
Note that IPv6 uses a modified EUI-64 format where the 7th bit
should be set to one after the conversion from EUI-48. The
function <function>macaddr8_set7bit</function> is provided to make this
change.
Generally speaking, any input that consists of pairs of hex
digits (on byte boundaries), optionally separated consistently by
one of <literal>':'</literal>, <literal>'-'</literal> or <literal>'.'</literal>, is
accepted. The number of hex digits must be either 16 (8 bytes) or
12 (6 bytes). Leading and trailing whitespace is ignored.
The following are examples of input formats that are accepted:
<simplelist>
<member><literal>'08:00:2b:01:02:03:04:05'</literal></member>
<member><literal>'08-00-2b-01-02-03-04-05'</literal></member>
<member><literal>'08002b:0102030405'</literal></member>
<member><literal>'08002b-0102030405'</literal></member>
<member><literal>'0800.2b01.0203.0405'</literal></member>
<member><literal>'0800-2b01-0203-0405'</literal></member>
<member><literal>'08002b01:02030405'</literal></member>
<member><literal>'08002b0102030405'</literal></member>
</simplelist>
These examples all specify the same address. Both upper and
lower case are accepted for the digits
<literal>a</literal> through <literal>f</literal>. Output is always in the
first of the forms shown.
</para>
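<para>
The widening of 6-byte input can be sketched as follows, reusing the
sample address from the list above:
<programlisting>
SELECT macaddr8 '08:00:2b:01:02:03'  AS six_byte_input,    -- FF and FE inserted as 4th and 5th bytes
       macaddr8 '08002b0102030405'   AS eight_byte_input;  -- stored as given
</programlisting>
The first column should display as
<literal>08:00:2b:ff:fe:01:02:03</literal> and the second as
<literal>08:00:2b:01:02:03:04:05</literal>.
</para>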
<para>
The last six input formats shown above are not part of any standard.
</para>
<para>
To convert a traditional 48-bit MAC address in EUI-48 format to
modified EUI-64 format to be included as the host portion of an
IPv6 address, use <function>macaddr8_set7bit</function> as shown:
<programlisting>
SELECT macaddr8_set7bit('08:00:2b:01:02:03');
<computeroutput>
macaddr8_set7bit
-------------------------
0a:00:2b:ff:fe:01:02:03
(1 row)
</computeroutput>
</programlisting>
</para>
</sect2>
</sect1>
<sect1 id="datatype-bit">
<title>Bit String Types</title>
<indexterm zone="datatype-bit">
<primary>bit string</primary>
<secondary>data type</secondary>
</indexterm>
<para>
Bit strings are strings of 1's and 0's. They can be used to store
or visualize bit masks. There are two SQL bit types:
<type>bit(<replaceable>n</replaceable>)</type> and <type>bit
varying(<replaceable>n</replaceable>)</type>, where
<replaceable>n</replaceable> is a positive integer.
</para>
<para>
<type>bit</type> type data must match the length
<replaceable>n</replaceable> exactly; it is an error to attempt to
store shorter or longer bit strings. <type>bit varying</type> data is
of variable length up to the maximum length
<replaceable>n</replaceable>; longer strings will be rejected.
Writing <type>bit</type> without a length is equivalent to
<literal>bit(1)</literal>, while <type>bit varying</type> without a length
specification means unlimited length.
</para>
<note>
<para>
If one explicitly casts a bit-string value to
<type>bit(<replaceable>n</replaceable>)</type>, it will be truncated or
zero-padded on the right to be exactly <replaceable>n</replaceable> bits,
without raising an error. Similarly,
if one explicitly casts a bit-string value to
<type>bit varying(<replaceable>n</replaceable>)</type>, it will be truncated
on the right if it is more than <replaceable>n</replaceable> bits.
</para>
</note>
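<para>
The explicit-cast behavior described in the note can be sketched with a
few arbitrary bit-string constants:
<programlisting>
SELECT B'10101'::bit(3)          AS truncated,          -- expected: 101
       B'10'::bit(5)             AS zero_padded,        -- expected: 10000
       B'10101'::bit varying(3)  AS varying_truncated;  -- expected: 101
</programlisting>
None of these casts should raise an error, in contrast to storing a
wrong-length value into a column without an explicit cast.
</para>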
<para>
Refer to <xref
linkend="sql-syntax-bit-strings"/> for information about the syntax
of bit string constants. Bit-logical operators and string
manipulation functions are available; see <xref
linkend="functions-bitstring"/>.
</para>
<example>
<title>Using the Bit String Types</title>
<programlisting>
CREATE TABLE test (a BIT(3), b BIT VARYING(5));
INSERT INTO test VALUES (B'101', B'00');
INSERT INTO test VALUES (B'10', B'101');
<computeroutput>
ERROR: bit string length 2 does not match type bit(3)
</computeroutput>
INSERT INTO test VALUES (B'10'::bit(3), B'101');
SELECT * FROM test;
<computeroutput>
a | b
-----+-----
101 | 00
100 | 101
</computeroutput>
</programlisting>
</example>
<para>
A bit string value requires 1 byte for each group of 8 bits, plus
5 or 8 bytes overhead depending on the length of the string
(but long values may be compressed or moved out-of-line, as explained
in <xref linkend="datatype-character"/> for character strings).
</para>
</sect1>
<sect1 id="datatype-textsearch">
<title>Text Search Types</title>
<indexterm zone="datatype-textsearch">
<primary>full text search</primary>
<secondary>data types</secondary>
</indexterm>
<indexterm zone="datatype-textsearch">
<primary>text search</primary>
<secondary>data types</secondary>
</indexterm>
<para>
<productname>PostgreSQL</productname> provides two data types that
are designed to support full text search, which is the activity of
searching through a collection of natural-language <firstterm>documents</firstterm>
to locate those that best match a <firstterm>query</firstterm>.
The <type>tsvector</type> type represents a document in a form optimized
for text search; the <type>tsquery</type> type similarly represents
a text query.
<xref linkend="textsearch"/> provides a detailed explanation of this
facility, and <xref linkend="functions-textsearch"/> summarizes the
related functions and operators.
</para>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
<sect2 id="datatype-tsvector">
<title><type>tsvector</type></title>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
<indexterm>
<primary>tsvector (data type)</primary>
</indexterm>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
<para>
A <type>tsvector</type> value is a sorted list of distinct
2017-10-09 03:44:17 +02:00
<firstterm>lexemes</firstterm>, which are words that have been
<firstterm>normalized</firstterm> to merge different variants of the same word
2017-11-23 15:39:47 +01:00
(see <xref linkend="textsearch"/> for details). Sorting and
2007-10-21 22:04:37 +02:00
duplicate-elimination are done automatically during input, as shown in
this example:
2007-08-29 22:37:14 +02:00
<programlisting>
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
tsvector
----------------------------------------------------
2008-05-16 18:31:02 +02:00
'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
2007-08-29 22:37:14 +02:00
</programlisting>
2008-05-16 18:31:02 +02:00
To represent
2007-11-21 05:01:37 +01:00
lexemes containing whitespace or punctuation, surround them with quotes:
2007-10-21 22:04:37 +02:00
<programlisting>
SELECT $$the lexeme ' ' contains spaces$$::tsvector;
2022-04-20 17:04:28 +02:00
tsvector
2007-10-21 22:04:37 +02:00
-------------------------------------------
2008-05-16 18:31:02 +02:00
' ' 'contains' 'lexeme' 'spaces' 'the'
2007-10-21 22:04:37 +02:00
</programlisting>
2009-04-27 18:27:36 +02:00
(We use dollar-quoted string literals in this example and the next one
to avoid the confusion of having to double quote marks within the
2007-11-21 05:01:37 +01:00
literals.) Embedded quotes and backslashes must be doubled:
2007-08-29 22:37:14 +02:00
<programlisting>
2007-10-21 22:04:37 +02:00
SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
2022-04-20 17:04:28 +02:00
tsvector
2007-10-21 22:04:37 +02:00
------------------------------------------------
2008-05-16 18:31:02 +02:00
'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'
2007-08-29 22:37:14 +02:00
</programlisting>
2017-10-09 03:44:17 +02:00
Optionally, integer <firstterm>positions</firstterm>
2009-04-27 18:27:36 +02:00
can be attached to lexemes:
2007-08-29 22:37:14 +02:00
<programlisting>
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
tsvector
2020-05-15 00:13:08 +02:00
-------------------------------------------------------------------&zwsp;------------
2008-05-16 18:31:02 +02:00
'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4
2007-08-29 22:37:14 +02:00
</programlisting>
2007-10-21 22:04:37 +02:00
A position normally indicates the source word's location in the
document. Positional information can be used for
<firstterm>proximity ranking</firstterm>. Position values can
2009-04-27 18:27:36 +02:00
range from 1 to 16383; larger numbers are silently set to 16383.
2008-01-12 22:51:36 +01:00
Duplicate positions for the same lexeme are discarded.
2007-10-21 22:04:37 +02:00
</para>
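<para>
As a minimal sketch of the position rules just described (the literal
values are chosen only for illustration), the following input repeats a
position and supplies one beyond the maximum. The result should show the
duplicate collapsed and the oversized position clamped to 16383:
<programlisting>
-- 'fat' is given position 2 twice and position 20000 once
SELECT 'fat:2 fat:2 fat:20000 cat:3'::tsvector;
</programlisting>
</para>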
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
<para>
Lexemes that have positions can further be labeled with a
2017-10-09 03:44:17 +02:00
<firstterm>weight</firstterm>, which can be <literal>A</literal>,
2007-10-21 22:04:37 +02:00
<literal>B</literal>, <literal>C</literal>, or <literal>D</literal>.
<literal>D</literal> is the default and hence is not shown on output:
2007-08-29 22:37:14 +02:00
<programlisting>
2007-10-21 22:04:37 +02:00
SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
2022-04-20 17:04:28 +02:00
tsvector
2007-10-21 22:04:37 +02:00
----------------------------
'a':1A 'cat':5 'fat':2B,4C
</programlisting>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
Weights are typically used to reflect document structure, for example
by marking title words differently from body words. Text search
ranking functions can assign different priorities to the different
weight markers.
</para>
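<para>
One way to attach such weights, sketched here with hypothetical title and
body strings, is to normalize each part of the document separately, label
it with <function>setweight</function>, and concatenate the results with
the <literal>||</literal> operator:
<programlisting>
-- title words get weight A, body words keep the default weight D
SELECT setweight(to_tsvector('english', 'The Fat Rats'), 'A') ||
       setweight(to_tsvector('english', 'A story about rats that ate too much'), 'D');
</programlisting>
A ranking function such as <function>ts_rank</function> can then give
matches on the <literal>A</literal>-labeled title words a higher priority.
</para>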
<para>
It is important to understand that the
2016-06-29 21:00:25 +02:00
<type>tsvector</type> type itself does not perform any word
normalization; it assumes the words it is given are normalized
appropriately for the application. For example,
2007-10-21 22:04:37 +02:00
<programlisting>
2016-06-29 21:00:25 +02:00
SELECT 'The Fat Rats'::tsvector;
2022-04-20 17:04:28 +02:00
tsvector
2007-10-21 22:04:37 +02:00
--------------------
2008-05-16 18:31:02 +02:00
'Fat' 'Rats' 'The'
2007-08-29 22:37:14 +02:00
</programlisting>
2007-10-21 22:04:37 +02:00
For most English-text-searching applications the above words would
be considered non-normalized, but <type>tsvector</type> doesn't care.
Raw document text should usually be passed through
2017-10-09 03:44:17 +02:00
<function>to_tsvector</function> to normalize the words appropriately
2007-10-21 22:04:37 +02:00
for searching:
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
<programlisting>
2010-11-23 21:27:50 +01:00
SELECT to_tsvector('english', 'The Fat Rats');
2022-04-20 17:04:28 +02:00
to_tsvector
2007-10-21 22:04:37 +02:00
-----------------
'fat':2 'rat':3
</programlisting>
2007-08-29 22:37:14 +02:00
2017-11-23 15:39:47 +01:00
Again, see <xref linkend="textsearch"/> for more detail.
2007-10-21 22:04:37 +02:00
</para>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
</sect2>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
<sect2 id="datatype-tsquery">
<title><type>tsquery</type></title>
<indexterm>
<primary>tsquery (data type)</primary>
</indexterm>
<para>
A <type>tsquery</type> value stores lexemes that are to be
2016-06-09 06:30:59 +02:00
searched for, and can combine them using the Boolean operators
<literal>&amp;</literal> (AND), <literal>|</literal> (OR), and
2017-10-09 03:44:17 +02:00
<literal>!</literal> (NOT), as well as the phrase search operator
<literal>&lt;-&gt;</literal> (FOLLOWED BY). There is also a variant
<literal>&lt;<replaceable>N</replaceable>&gt;</literal> of the FOLLOWED BY
operator, where <replaceable>N</replaceable> is an integer constant that
2016-06-29 21:00:25 +02:00
specifies the distance between the two lexemes being searched
2017-10-09 03:44:17 +02:00
for. <literal>&lt;-&gt;</literal> is equivalent to <literal>&lt;1&gt;</literal>.
2016-06-09 06:30:59 +02:00
</para>
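<para>
The following sketch illustrates the difference between adjacency and a
larger distance. It assumes the <literal>english</literal> configuration,
and the sample phrases are hypothetical:
<programlisting><![CDATA[
-- adjacent words: expected to match
SELECT to_tsvector('english', 'fatal error') @@ to_tsquery('english', 'fatal <-> error');

-- one word in between: expected to match <2> but not <->
SELECT to_tsvector('english', 'fatal disk error') @@ to_tsquery('english', 'fatal <2> error');
SELECT to_tsvector('english', 'fatal disk error') @@ to_tsquery('english', 'fatal <-> error');
]]></programlisting>
</para>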
<para>
2016-06-29 21:00:25 +02:00
Parentheses can be used to enforce grouping of these operators.
2017-10-09 03:44:17 +02:00
In the absence of parentheses, <literal>!</literal> (NOT) binds most tightly,
2016-06-29 21:00:25 +02:00
<literal>&lt;-&gt;</literal> (FOLLOWED BY) next most tightly, then
<literal>&amp;</literal> (AND), with <literal>|</literal> (OR) binding
the least tightly.
</para>
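<para>
To see how these precedence rules group an expression, one can compare an
unparenthesized query with explicitly parenthesized variants (a sketch;
the lexemes are arbitrary):
<programlisting><![CDATA[
-- & binds more tightly than |, so these two should be displayed identically
SELECT 'fat | rat & cat'::tsquery;
SELECT 'fat | (rat & cat)'::tsquery;

-- parentheses force the other grouping
SELECT '(fat | rat) & cat'::tsquery;
]]></programlisting>
</para>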
<para>
Here are some examples:
2007-08-29 22:37:14 +02:00
<programlisting>
2010-07-29 21:34:41 +02:00
SELECT 'fat &amp; rat'::tsquery;
2022-04-20 17:04:28 +02:00
tsquery
2007-08-29 22:37:14 +02:00
---------------
2007-10-21 22:04:37 +02:00
'fat' &amp; 'rat'
SELECT 'fat &amp; (rat | cat)'::tsquery;
2022-04-20 17:04:28 +02:00
tsquery
2007-10-21 22:04:37 +02:00
---------------------------
'fat' &amp; ( 'rat' | 'cat' )
SELECT 'fat &amp; rat &amp; ! cat'::tsquery;
2022-04-20 17:04:28 +02:00
tsquery
2007-10-21 22:04:37 +02:00
------------------------
'fat' &amp; 'rat' &amp; !'cat'
</programlisting>
</para>
<para>
Optionally, lexemes in a <type>tsquery</type> can be labeled with
one or more weight letters, which restricts them to match only
2017-10-09 03:44:17 +02:00
<type>tsvector</type> lexemes with one of those weights:
2007-10-21 22:04:37 +02:00
<programlisting>
2007-08-29 22:37:14 +02:00
SELECT 'fat:ab &amp; cat'::tsquery;
tsquery
------------------
'fat':AB &amp; 'cat'
</programlisting>
2007-10-21 22:04:37 +02:00
</para>
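<para>
The effect of a weight restriction can be sketched by matching the same
query against two hand-written <type>tsvector</type> values that differ
only in the weight attached to <literal>fat</literal> (both values are
hypothetical):
<programlisting><![CDATA[
-- 'fat' carries weight A here, so the restricted query should match
SELECT 'fat:1A cat:2'::tsvector @@ 'fat:A & cat'::tsquery;

-- 'fat' carries only weight B here, so the same query should not match
SELECT 'fat:1B cat:2'::tsvector @@ 'fat:A & cat'::tsquery;
]]></programlisting>
</para>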
2007-08-29 22:37:14 +02:00
2008-05-16 18:31:02 +02:00
<para>
2017-10-09 03:44:17 +02:00
Also, lexemes in a <type>tsquery</type> can be labeled with <literal>*</literal>
2008-05-16 18:31:02 +02:00
to specify prefix matching:
<programlisting>
SELECT 'super:*'::tsquery;
2022-04-20 17:04:28 +02:00
tsquery
2008-05-16 18:31:02 +02:00
-----------
'super':*
</programlisting>
2017-10-09 03:44:17 +02:00
This query will match any word in a <type>tsvector</type> that begins
with <quote>super</quote>.
2008-05-16 18:31:02 +02:00
</para>
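<para>
A short sketch of prefix matching against hand-written
<type>tsvector</type> values (the lexemes are arbitrary):
<programlisting>
-- 'supernova' begins with 'super', so this is expected to match
SELECT 'supernova'::tsvector @@ 'super:*'::tsquery;

-- 'star' does not begin with 'super', so this is expected not to match
SELECT 'star'::tsvector @@ 'super:*'::tsquery;
</programlisting>
</para>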
2007-10-21 22:04:37 +02:00
<para>
2009-04-27 18:27:36 +02:00
Quoting rules for lexemes are the same as described previously for
2017-10-09 03:44:17 +02:00
lexemes in <type>tsvector</type>; and, as with <type>tsvector</type>,
2009-04-27 18:27:36 +02:00
any required normalization of words must be done before converting
2017-10-09 03:44:17 +02:00
to the <type>tsquery</type> type. The <function>to_tsquery</function>
2007-10-21 22:04:37 +02:00
function is convenient for performing such normalization:
2007-08-29 22:37:14 +02:00
<programlisting>
2007-12-04 00:49:51 +01:00
SELECT to_tsquery('Fat:ab &amp; Cats');
2022-04-20 17:04:28 +02:00
to_tsquery
2007-10-21 22:04:37 +02:00
------------------
2007-12-04 00:49:51 +01:00
'fat':AB &amp; 'cat'
2007-08-29 22:37:14 +02:00
</programlisting>
2016-06-29 21:00:25 +02:00
2017-10-09 03:44:17 +02:00
Note that <function>to_tsquery</function> will process prefixes in the same way
2016-06-29 21:00:25 +02:00
as other words, which means this comparison returns true:
<programlisting>
SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
?column?
----------
t
</programlisting>
2017-10-09 03:44:17 +02:00
because <literal>postgres</literal> gets stemmed to <literal>postgr</literal>:
2016-06-29 21:00:25 +02:00
<programlisting>
SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
to_tsvector | to_tsquery
---------------+------------
'postgradu':1 | 'postgr':*
</programlisting>
2017-10-09 03:44:17 +02:00
which will match the stemmed form of <literal>postgraduate</literal>.
2007-10-21 22:04:37 +02:00
</para>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
</sect2>
</sect1>
<sect1 id="datatype-uuid">
<title><acronym>UUID</acronym> Type</title>
2007-08-29 22:37:14 +02:00
2007-10-21 22:04:37 +02:00
<indexterm zone="datatype-uuid">
<primary>UUID</primary>
</indexterm>
<para>
The data type <type>uuid</type> stores Universally Unique Identifiers
2024-04-10 13:53:25 +02:00
(UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
2020-12-01 13:36:30 +01:00
ISO/IEC 9834-8:2005, and related standards.
2009-04-27 18:27:36 +02:00
(Some systems refer to this data type as a globally unique identifier, or
GUID,<indexterm><primary>GUID</primary></indexterm> instead.) This
2007-10-21 22:04:37 +02:00
identifier is a 128-bit quantity that is generated by an algorithm chosen
to make it very unlikely that the same identifier will be generated by
anyone else in the known universe using the same algorithm. Therefore,
for distributed systems, these identifiers provide a better uniqueness
2009-04-27 18:27:36 +02:00
guarantee than sequence generators, which
2007-10-21 22:04:37 +02:00
are only unique within a single database.
</para>
<para>
A UUID is written as a sequence of lower-case hexadecimal digits,
in several groups separated by hyphens, specifically a group of 8
digits followed by three groups of 4 digits followed by a group of
12 digits, for a total of 32 digits representing the 128 bits. An
example of a UUID in this standard form is:
<programlisting>
a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
</programlisting>
<productname>PostgreSQL</productname> also accepts the following
alternative forms for input:
use of upper-case digits, the standard format surrounded by
2008-11-03 23:14:40 +01:00
braces, omitting some or all hyphens, adding a hyphen after any
group of four digits. Examples are:
2007-10-21 22:04:37 +02:00
<programlisting>
A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
2008-11-03 23:14:40 +01:00
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
2007-10-21 22:04:37 +02:00
</programlisting>
Output is always in the standard form.
</para>
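<para>
The input rules above can be checked with a small sketch that reuses the
sample value from this section. The first query should display the value
in the standard form, and the second compares two alternative spellings
of the same identifier:
<programlisting>
-- braces, upper case, and irregular hyphens on input
SELECT '{A0EEBC99-9C0B4EF8-BB6D6BB9-BD380A11}'::uuid;

-- different input forms of the same 128-bit value compare as equal
SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a11'::uuid
     = 'A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11'::uuid;
</programlisting>
</para>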
<para>
2019-07-14 14:30:27 +02:00
See <xref linkend="functions-uuid"/> for how to generate a UUID in
<productname>PostgreSQL</productname>.
2007-10-21 22:04:37 +02:00
</para>
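<para>
As an illustration only, a <type>uuid</type> column is often used as a
key with a generated default. This sketch assumes a server on which
<function>gen_random_uuid()</function> is available as a built-in
function (PostgreSQL 13 or later); the table and column names are
hypothetical:
<programlisting>
CREATE TABLE events (
    id      uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    payload text
);

-- the generated identifier is returned in the standard output form
INSERT INTO events (payload) VALUES ('example') RETURNING id;
</programlisting>
</para>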
2007-08-29 22:37:14 +02:00
</sect1>
2007-04-02 17:27:02 +02:00
<sect1 id="datatype-xml">
2017-10-09 03:44:17 +02:00
<title><acronym>XML</acronym> Type</title>
2007-04-02 17:27:02 +02:00
<indexterm zone="datatype-xml">
<primary>XML</primary>
</indexterm>
<para>
2009-04-27 18:27:36 +02:00
The <type>xml</type> data type can be used to store XML data. Its
2007-04-02 17:27:02 +02:00
advantage over storing XML data in a <type>text</type> field is that it
2009-06-17 23:58:49 +02:00
checks the input values for well-formedness, and there are support
functions to perform type-safe operations on it; see <xref
2017-11-23 15:39:47 +01:00
linkend="functions-xml"/>. Use of this data type requires the
2010-11-23 21:27:50 +01:00
installation to have been built with <command>configure
2017-10-09 03:44:17 +02:00
--with-libxml</command>.
2007-04-02 17:27:02 +02:00
</para>
<para>
2007-04-05 03:46:27 +02:00
The <type>xml</type> type can store well-formed
2007-04-02 17:27:02 +02:00
<quote>documents</quote>, as defined by the XML standard, as well
2019-03-23 21:24:30 +01:00
as <quote>content</quote> fragments, which are defined by reference
to the more permissive
<ulink url="https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode"><quote>document node</quote></ulink>
of the XQuery and XPath data model.
Roughly, this means that content fragments can have
2007-04-02 17:27:02 +02:00
more than one top-level element or character node. The expression
<literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
can be used to evaluate whether a particular <type>xml</type>
value is a full document or only a content fragment.
</para>
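<para>
A brief sketch of the <literal>IS DOCUMENT</literal> test, assuming the
default <literal>CONTENT</literal> setting of the XML option so that both
literals are accepted on input:
<programlisting><![CDATA[
-- a single root element: expected to be a document
SELECT xml '<book><title>Manual</title></book>' IS DOCUMENT;

-- text plus two top-level elements: expected to be content only
SELECT xml 'abc<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
]]></programlisting>
</para>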
2019-04-01 22:20:22 +02:00
<para>
Limits and compatibility notes for the <type>xml</type> data type
can be found in <xref linkend="xml-limits-conformance"/>.
</para>
2023-01-09 21:08:24 +01:00
<sect2 id="datatype-xml-creating">
2007-05-21 19:10:29 +02:00
<title>Creating XML Values</title>
2007-04-02 17:27:02 +02:00
<para>
To produce a value of type <type>xml</type> from character data,
use the function
<function>xmlparse</function>:<indexterm><primary>xmlparse</primary></indexterm>
<synopsis>
XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
</synopsis>
Examples:
<programlisting><![CDATA[
2008-02-13 23:46:55 +01:00
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
2007-05-21 19:10:29 +02:00
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
2007-04-02 17:27:02 +02:00
]]></programlisting>
While this is the only way to convert character strings into XML
values according to the SQL standard, the PostgreSQL-specific
syntaxes:
<programlisting><![CDATA[
xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml
]]></programlisting>
can also be used.
</para>
<para>
2009-04-27 18:27:36 +02:00
The <type>xml</type> type does not validate input values
2009-06-17 23:58:49 +02:00
against a document type declaration
(DTD),<indexterm><primary>DTD</primary></indexterm>
even when the input value specifies a DTD.
2010-03-30 00:01:08 +02:00
There is also currently no built-in support for validating against
other XML schema languages such as XML Schema.
2007-04-02 17:27:02 +02:00
</para>
<para>
2009-04-27 18:27:36 +02:00
The inverse operation, producing a character string value from
2007-04-02 17:27:02 +02:00
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
2023-03-15 21:58:59 +01:00
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [ NO ] INDENT ] )
2007-04-02 17:27:02 +02:00
</synopsis>
2009-04-27 18:27:36 +02:00
<replaceable>type</replaceable> can be
2007-04-02 17:27:02 +02:00
<type>character</type>, <type>character varying</type>, or
2009-06-17 23:58:49 +02:00
<type>text</type> (or an alias for one of those). Again, according
2007-04-02 17:27:02 +02:00
to the SQL standard, this is the only way to convert between type
<type>xml</type> and character types, but PostgreSQL also allows
you to simply cast the value.
</para>
2023-03-15 21:58:59 +01:00
<para>
The <literal>INDENT</literal> option causes the result to be
pretty-printed, while <literal>NO INDENT</literal> (which is the
default) just emits the original input string. Casting to a character
type likewise produces the original string.
</para>
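<para>
The three conversions can be compared side by side in a sketch such as
the following. It assumes a server version that supports the
<literal>INDENT</literal> option (PostgreSQL 16 or later), and the sample
document is hypothetical:
<programlisting><![CDATA[
-- pretty-printed result
SELECT XMLSERIALIZE(DOCUMENT '<a><b>1</b><c>2</c></a>'::xml AS text INDENT);

-- original string, stated explicitly and via a plain cast
SELECT XMLSERIALIZE(DOCUMENT '<a><b>1</b><c>2</c></a>'::xml AS text NO INDENT);
SELECT '<a><b>1</b><c>2</c></a>'::xml::text;
]]></programlisting>
</para>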
2007-04-02 17:27:02 +02:00
<para>
2009-04-27 18:27:36 +02:00
When a character string value is cast to or from type
2007-04-02 17:27:02 +02:00
<type>xml</type> without going through <function>XMLPARSE</function> or
<function>XMLSERIALIZE</function>, respectively, the choice of
<literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
determined by the <quote>XML option</quote>
<indexterm><primary>XML option</primary></indexterm>
session configuration parameter, which can be set using the
2009-04-27 18:27:36 +02:00
standard command:
2007-04-02 17:27:02 +02:00
<synopsis>
SET XML OPTION { DOCUMENT | CONTENT };
</synopsis>
or the more PostgreSQL-like syntax
<synopsis>
SET xmloption TO { DOCUMENT | CONTENT };
</synopsis>
The default is <literal>CONTENT</literal>, so all forms of XML
data are allowed.
</para>
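<para>
The effect of this setting on a plain cast can be sketched as follows.
The input string is a content fragment, not a document, so the first cast
is expected to be rejected:
<programlisting><![CDATA[
SET xmloption TO DOCUMENT;
SELECT 'abc<foo>bar</foo>'::xml;   -- expected to fail: not a well-formed document

SET xmloption TO CONTENT;
SELECT 'abc<foo>bar</foo>'::xml;   -- accepted as a content fragment
]]></programlisting>
</para>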
2010-06-29 02:03:39 +02:00
2007-05-21 19:10:29 +02:00
</sect2>
2007-04-02 17:27:02 +02:00
2023-01-09 21:08:24 +01:00
<sect2 id="datatype-xml-encoding-handling">
2007-05-21 19:10:29 +02:00
<title>Encoding Handling</title>
2007-04-02 17:27:02 +02:00
<para>
Care must be taken when dealing with multiple character encodings
on the client, server, and in the XML data passed through them.
When using the text mode to pass queries to the server and query
results to the client (which is the normal mode), PostgreSQL
converts all character data passed between the client and the
server and vice versa to the character encoding of the respective
2017-11-23 15:39:47 +01:00
end; see <xref linkend="multibyte"/>. This includes string
2007-04-02 17:27:02 +02:00
representations of XML values, such as in the above examples.
This would ordinarily mean that encoding declarations contained in
2009-04-27 18:27:36 +02:00
XML data can become invalid as the character data is converted
2013-05-21 03:13:13 +02:00
to other encodings while traveling between client and server,
2009-04-27 18:27:36 +02:00
because the embedded encoding declaration is not changed. To cope
with this behavior, encoding declarations contained in
character strings presented for input to the <type>xml</type> type
are <emphasis>ignored</emphasis>, and content is assumed
2007-04-02 17:27:02 +02:00
to be in the current server encoding. Consequently, for correct
2009-04-27 18:27:36 +02:00
processing, character strings of XML data must be sent
2007-04-02 17:27:02 +02:00
from the client in the current client encoding. It is the
2009-04-27 18:27:36 +02:00
responsibility of the client to either convert documents to the
2009-06-17 23:58:49 +02:00
current client encoding before sending them to the server, or to
2007-04-02 17:27:02 +02:00
adjust the client encoding appropriately. On output, values of
type <type>xml</type> will not have an encoding declaration, and
2009-04-27 18:27:36 +02:00
clients should assume all data is in the current client
2007-04-02 17:27:02 +02:00
encoding.
</para>
<para>
2009-04-27 18:27:36 +02:00
When using binary mode to pass query parameters to the server
2016-06-28 20:21:43 +02:00
and query results back to the client, no encoding conversion
2007-04-02 17:27:02 +02:00
is performed, so the situation is different. In this case, an
encoding declaration in the XML data will be observed, and if it
is absent, the data will be assumed to be in UTF-8 (as required by
2009-04-27 18:27:36 +02:00
the XML standard; note that PostgreSQL does not support UTF-16).
On output, data will have an encoding declaration
2007-04-02 17:27:02 +02:00
specifying the client encoding, unless the client encoding is
UTF-8, in which case it will be omitted.
</para>
<para>
Processing XML data with PostgreSQL will be less
2009-04-27 18:27:36 +02:00
error-prone and more efficient if the XML data encoding, client encoding,
2007-04-02 17:27:02 +02:00
and server encoding are the same. Since XML data is internally
processed in UTF-8, computations will be most efficient if the
server encoding is also UTF-8.
</para>
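<para>
Before loading XML data it can be useful to check which encodings are in
effect for the session. This is a minimal sketch; whether changing the
client encoding is appropriate depends on the encoding the client
actually sends:
<programlisting>
SHOW server_encoding;
SHOW client_encoding;

-- align the client with a UTF-8 server, if the client really sends UTF-8
SET client_encoding TO 'UTF8';
</programlisting>
</para>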
2009-06-10 22:25:41 +02:00
<caution>
<para>
Some XML-related functions might not work at all on non-ASCII data
when the server encoding is not UTF-8. This is known to be an
2017-10-09 03:44:17 +02:00
issue for <function>xmltable()</function> and <function>xpath()</function> in particular.
2009-06-10 22:25:41 +02:00
</para>
</caution>
2007-05-21 19:10:29 +02:00
</sect2>
2023-01-09 21:08:24 +01:00
<sect2 id="datatype-xml-accessing-xml-values">
2007-05-21 19:10:29 +02:00
<title>Accessing XML Values</title>
<para>
The <type>xml</type> data type is unusual in that it does not
provide any comparison operators. This is because there is no
well-defined and universally useful comparison algorithm for XML
data. One consequence of this is that you cannot retrieve rows by
comparing an <type>xml</type> column against a search value. XML
values should therefore typically be accompanied by a separate key
field such as an ID. An alternative solution for comparing XML
values is to convert them to character strings first, but note
that character string comparison has little to do with a useful
XML comparison method.
</para>
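<para>
Both approaches can be sketched with a hypothetical table
<literal>xml_docs</literal>. The two stored values differ only in
whitespace, which is enough to make the string comparison treat them as
different:
<programlisting><![CDATA[
CREATE TABLE xml_docs (id integer PRIMARY KEY, body xml);
INSERT INTO xml_docs VALUES (1, '<a>1</a>'), (2, '<a> 1 </a>');

-- retrieve by the separate key field
SELECT body FROM xml_docs WHERE id = 1;

-- compare as character strings: only an exact textual match is found
SELECT id FROM xml_docs WHERE body::text = '<a>1</a>';
]]></programlisting>
</para>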
<para>
Since there are no comparison operators for the <type>xml</type>
data type, it is not possible to create an index directly on a
column of this type. If speedy searches in XML data are desired,
2009-04-27 18:27:36 +02:00
possible workarounds include casting the expression to a
2007-05-21 19:10:29 +02:00
character string type and indexing that, or indexing an XPath
2009-04-27 18:27:36 +02:00
expression. Of course, the actual query would have to be adjusted
2007-05-21 19:10:29 +02:00
to search by the indexed expression.
</para>
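<para>
A sketch of the XPath workaround, using a hypothetical table
<literal>books</literal> and a hypothetical document layout with a
<literal>/book/title</literal> element. It assumes that the expression is
accepted in an index definition on the server in use. The query must
repeat the indexed expression exactly for the index to be considered:
<programlisting>
CREATE TABLE books (id integer PRIMARY KEY, body xml);

-- index the text of the first title node found by the XPath expression
CREATE INDEX books_title_idx
    ON books (((xpath('/book/title/text()', body))[1]::text));

-- search by the same expression
SELECT id FROM books
 WHERE (xpath('/book/title/text()', body))[1]::text = 'Manual';
</programlisting>
</para>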
<para>
2009-04-27 18:27:36 +02:00
The text-search functionality in PostgreSQL can also be used to speed
up full-document searches of XML data. The necessary
preprocessing support is, however, not yet available in the PostgreSQL
distribution.
2007-05-21 19:10:29 +02:00
</para>
</sect2>
2007-04-02 17:27:02 +02:00
</sect1>
Introduce jsonb, a structured format for storing json.
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the original text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.
The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.
This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.
Authors: Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
2014-03-23 21:40:19 +01:00
&json;
2012-01-31 17:48:23 +01:00
2003-03-13 02:30:29 +01:00
&array;
2004-06-07 06:04:47 +02:00
&rowtypes;
2011-11-03 12:16:28 +01:00
&rangetypes;
Documentation improvements around domain types.
I was a bit surprised to find that domains were almost completely
unmentioned in the main SGML documentation, outside of the reference
pages for CREATE/ALTER/DROP DOMAIN. In particular, noplace was it
mentioned that we don't support domains over composite, making it
hard to document the planned fix for that.
Hence, add a section about domains to chapter 8 (Data Types).
Also, modernize the type system overview in section 37.2; it had never
heard of range types, and insisted on calling arrays base types, which
seems a bit odd from a user's perspective; furthermore it didn't fit well
with the fact that we now support arrays over types other than base types.
It seems appropriate to use the term "container types" to describe all of
arrays, composites, and ranges, so let's do that.
Also a few other minor improvements, notably improve an example query
in rowtypes.sgml by using a LATERAL function instead of an ad-hoc
OFFSET 0 clause.
In part this is mop-up for commit c12d570fa, which missed updating 37.2
to reflect the fact that it added arrays of domains. We could possibly
back-patch this without that claim, but I don't feel a strong need to.
2017-10-24 20:08:40 +02:00
<sect1 id="domains">
<title>Domain Types</title>
<indexterm zone="domains">
<primary>domain</primary>
</indexterm>
<indexterm zone="domains">
<primary>data type</primary>
<secondary>domain</secondary>
</indexterm>
<para>
A <firstterm>domain</firstterm> is a user-defined data type that is
based on another <firstterm>underlying type</firstterm>. Optionally,
it can have constraints that restrict its valid values to a subset of
what the underlying type would allow. Otherwise it behaves like the
underlying type &mdash; for example, any operator or function that
can be applied to the underlying type will work on the domain type.
The underlying type can be any built-in or user-defined base type,
2017-10-26 19:47:45 +02:00
enum type, array type, composite type, range type, or another domain.
2017-10-24 20:08:40 +02:00
</para>
<para>
For example, we could create a domain over integers that accepts only
positive integers:
<programlisting>
CREATE DOMAIN posint AS integer CHECK (VALUE > 0);
CREATE TABLE mytable (id posint);
INSERT INTO mytable VALUES(1); -- works
INSERT INTO mytable VALUES(-1); -- fails
</programlisting>
</para>
<para>
When an operator or function of the underlying type is applied to a
domain value, the domain is automatically down-cast to the underlying
type. Thus, for example, the result of <literal>mytable.id - 1</literal>
is considered to be of type <type>integer</type>, not <type>posint</type>.
We could write <literal>(mytable.id - 1)::posint</literal> to cast the
result back to <type>posint</type>, causing the domain's constraints
to be rechecked. In this case, that would result in an error if the
expression had been applied to an <structfield>id</structfield> value of
1. Assigning a value of the underlying type to a field or variable of
the domain type is allowed without writing an explicit cast, but the
domain's constraints will be checked.
</para>
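<para>
These rules can be observed with a short sketch. It assumes the
<type>posint</type> domain and the table <literal>mytable</literal> from
the earlier example, holding the single row with
<structfield>id</structfield> equal to 1:
<programlisting>
-- the column has the domain type, but the arithmetic result does not
SELECT pg_typeof(id), pg_typeof(id - 1) FROM mytable;

-- casting back rechecks the constraint: 0 is not a valid posint
SELECT (id - 1)::posint FROM mytable;   -- expected to fail

-- a result that satisfies the constraint can be cast back
SELECT (id + 1)::posint FROM mytable;
</programlisting>
</para>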
<para>
2017-11-23 15:39:47 +01:00
For additional information see <xref linkend="sql-createdomain"/>.
2017-10-24 20:08:40 +02:00
</para>
</sect1>
2002-04-25 04:56:56 +02:00
<sect1 id="datatype-oid">
<title>Object Identifier Types</title>
<indexterm zone="datatype-oid">
<primary>object identifier</primary>
<secondary>data type</secondary>
</indexterm>
<indexterm zone="datatype-oid">
<primary>oid</primary>
</indexterm>
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regclass</primary>
2002-04-25 04:56:56 +02:00
</indexterm>
2020-03-18 21:20:01 +01:00
<indexterm zone="datatype-oid">
<primary>regcollation</primary>
</indexterm>
2002-04-25 04:56:56 +02:00
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regconfig</primary>
2002-04-25 04:56:56 +02:00
</indexterm>
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regdictionary</primary>
2002-04-25 04:56:56 +02:00
</indexterm>
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regnamespace</primary>
2002-04-25 04:56:56 +02:00
</indexterm>
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regoper</primary>
2002-04-25 04:56:56 +02:00
</indexterm>
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regoperator</primary>
2002-04-25 04:56:56 +02:00
</indexterm>
2007-08-21 03:11:32 +02:00
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regproc</primary>
2007-08-21 03:11:32 +02:00
</indexterm>
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regprocedure</primary>
2007-08-21 03:11:32 +02:00
</indexterm>
2002-04-25 22:14:43 +02:00
<indexterm zone="datatype-oid">
2020-03-18 14:51:37 +01:00
<primary>regrole</primary>
</indexterm>
<indexterm zone="datatype-oid">
<primary>regtype</primary>
2002-04-25 22:14:43 +02:00
</indexterm>
2020-04-07 01:08:14 +02:00
<indexterm zone="datatype-oid">
<primary>xid8</primary>
</indexterm>
2002-04-25 22:14:43 +02:00
<indexterm zone="datatype-oid">
<primary>cid</primary>
</indexterm>
<indexterm zone="datatype-oid">
<primary>tid</primary>
</indexterm>
2020-03-18 14:51:37 +01:00
<indexterm zone="datatype-oid">
<primary>xid</primary>
</indexterm>
2002-04-25 04:56:56 +02:00
<para>
Object identifiers (OIDs) are used internally by
<productname>PostgreSQL</productname> as primary keys for various
system tables.
Type <type>oid</type> represents an object identifier. There are also
several alias types for <type>oid</type>, each
named <type>reg<replaceable>something</replaceable></type>.
<xref linkend="datatype-oid-table"/> shows an
overview.
</para>
<para>
The <type>oid</type> type is currently implemented as an unsigned
four-byte integer. Therefore, it is not large enough to provide
database-wide uniqueness in large databases, or even in large
individual tables.
</para>
<para>
The <type>oid</type> type itself has few operations beyond comparison.
It can be cast to integer, however, and then manipulated using the
standard integer operators. (Beware of possible
signed-versus-unsigned confusion if you do this.)
</para>
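<para>
As a minimal sketch of the caution above (the literal values are chosen
only for illustration): casting to <type>bigint</type> rather than
<type>integer</type> leaves room for the whole unsigned range, so
arithmetic on the result is not affected by the sign bit.
<programlisting>
-- Look up an OID and widen it to bigint before doing arithmetic.
SELECT 'pg_class'::regclass::oid::bigint + 1;

-- The largest possible OID still fits in bigint.
SELECT '4294967295'::oid::bigint;
</programlisting>
</para>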
<para>
The OID alias types have no operations of their own except
for specialized input and output routines. These routines are able
to accept and display symbolic names for system objects, rather than
the raw numeric value that type <type>oid</type> would use. The alias
types allow simplified lookup of OID values for objects. For example,
to examine the <structname>pg_attribute</structname> rows related to a table
<literal>mytable</literal>, one could write:
<programlisting>
SELECT * FROM pg_attribute WHERE attrelid = 'mytable'::regclass;
</programlisting>
rather than:
<programlisting>
SELECT * FROM pg_attribute
WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'mytable');
</programlisting>
While that doesn't look all that bad by itself, it's still oversimplified.
A far more complicated sub-select would be needed to
select the right OID if there are multiple tables named
<literal>mytable</literal> in different schemas.
The <type>regclass</type> input converter handles the table lookup according
to the schema path setting, and so it does the <quote>right thing</quote>
automatically. Similarly, casting a table's OID to
<type>regclass</type> is handy for symbolic display of a numeric OID.
</para>
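<para>
The reverse direction, symbolic display of a stored OID, might be
sketched as follows; the query assumes nothing beyond the standard
system catalogs:
<programlisting>
-- Show the owning relation of each attribute by name instead of by number.
SELECT attrelid::regclass AS relation, attname
FROM pg_attribute
WHERE attrelid = 'pg_class'::regclass;
</programlisting>
</para>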
<table id="datatype-oid-table">
<title>Object Identifier Types</title>
<tgroup cols="4">
<thead>
<row>
<entry>Name</entry>
<entry>References</entry>
<entry>Description</entry>
<entry>Value Example</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>oid</type></entry>
<entry>any</entry>
<entry>numeric object identifier</entry>
<entry><literal>564182</literal></entry>
</row>
<row>
<entry><type>regclass</type></entry>
<entry><structname>pg_class</structname></entry>
<entry>relation name</entry>
<entry><literal>pg_type</literal></entry>
</row>
<row>
<entry><type>regcollation</type></entry>
<entry><structname>pg_collation</structname></entry>
<entry>collation name</entry>
<entry><literal>"POSIX"</literal></entry>
</row>
<row>
<entry><type>regconfig</type></entry>
<entry><structname>pg_ts_config</structname></entry>
<entry>text search configuration</entry>
<entry><literal>english</literal></entry>
</row>
<row>
<entry><type>regdictionary</type></entry>
<entry><structname>pg_ts_dict</structname></entry>
<entry>text search dictionary</entry>
<entry><literal>simple</literal></entry>
</row>
<row>
<entry><type>regnamespace</type></entry>
<entry><structname>pg_namespace</structname></entry>
<entry>namespace name</entry>
<entry><literal>pg_catalog</literal></entry>
</row>
<row>
<entry><type>regoper</type></entry>
<entry><structname>pg_operator</structname></entry>
<entry>operator name</entry>
<entry><literal>+</literal></entry>
</row>
<row>
<entry><type>regoperator</type></entry>
<entry><structname>pg_operator</structname></entry>
<entry>operator with argument types</entry>
<entry><literal>*(integer,&zwsp;integer)</literal>
or <literal>-(NONE,&zwsp;integer)</literal></entry>
</row>
<row>
<entry><type>regproc</type></entry>
<entry><structname>pg_proc</structname></entry>
<entry>function name</entry>
<entry><literal>sum</literal></entry>
</row>
<row>
<entry><type>regprocedure</type></entry>
<entry><structname>pg_proc</structname></entry>
<entry>function with argument types</entry>
<entry><literal>sum(int4)</literal></entry>
</row>
<row>
<entry><type>regrole</type></entry>
<entry><structname>pg_authid</structname></entry>
<entry>role name</entry>
<entry><literal>smithee</literal></entry>
</row>
<row>
<entry><type>regtype</type></entry>
<entry><structname>pg_type</structname></entry>
<entry>data type name</entry>
<entry><literal>integer</literal></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
All of the OID alias types for objects that are grouped by namespace
accept schema-qualified names, and will
display schema-qualified names on output if the object would not
be found in the current search path without being qualified.
For example, <literal>myschema.mytable</literal> is acceptable input
for <type>regclass</type> (if there is such a table). That value
might be output as <literal>myschema.mytable</literal>, or
just <literal>mytable</literal>, depending on the current search path.
The <type>regproc</type> and <type>regoper</type> alias types will only
accept input names that are unique (not overloaded), so they are
of limited use; for most uses <type>regprocedure</type> or
<type>regoperator</type> are more appropriate. For <type>regoperator</type>,
unary operators are identified by writing <literal>NONE</literal> for the unused
operand.
</para>
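<para>
For illustration, the following casts use only built-in objects and
follow the input rules just described:
<programlisting>
SELECT 'pg_catalog.pg_type'::regclass;     -- schema-qualified input
SELECT 'sum(int4)'::regprocedure;          -- argument types select one overloaded function
SELECT '*(integer,integer)'::regoperator;  -- binary operator
SELECT '-(NONE,integer)'::regoperator;     -- prefix operator; NONE marks the unused operand
</programlisting>
</para>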
<para>
The input functions for these types allow whitespace between tokens,
and will fold upper-case letters to lower case, except within double
quotes; this is done to make the syntax rules similar to the way
object names are written in SQL. Conversely, the output functions
will use double quotes if needed to make the output be a valid SQL
identifier. For example, the OID of a function
named <literal>Foo</literal> (with upper case <literal>F</literal>)
taking two integer arguments could be entered as
<literal>' "Foo" ( int, integer ) '::regprocedure</literal>. The
output would look like <literal>"Foo"(integer,integer)</literal>.
Both the function name and the argument type names could be
schema-qualified, too.
</para>
<para>
Many built-in <productname>PostgreSQL</productname> functions accept
the OID of a table, or another kind of database object, and for
convenience are declared as taking <type>regclass</type> (or the
appropriate OID alias type). This means you do not have to look up
the object's OID by hand, but can just enter its name as a string
literal. For example, the <function>nextval(regclass)</function> function
takes a sequence relation's OID, so you could call it like this:
<programlisting>
nextval('foo') <lineannotation>operates on sequence <literal>foo</literal></lineannotation>
nextval('FOO') <lineannotation>same as above</lineannotation>
nextval('"Foo"') <lineannotation>operates on sequence <literal>Foo</literal></lineannotation>
nextval('myschema.foo') <lineannotation>operates on <literal>myschema.foo</literal></lineannotation>
nextval('"myschema".foo') <lineannotation>same as above</lineannotation>
nextval('foo') <lineannotation>searches search path for <literal>foo</literal></lineannotation>
</programlisting>
</para>
<note>
<para>
When you write the argument of such a function as an unadorned
literal string, it becomes a constant of type <type>regclass</type>
(or the appropriate type).
Since this is really just an OID, it will track the originally
identified object despite later renaming, schema reassignment,
etc. This <quote>early binding</quote> behavior is usually desirable for
object references in column defaults and views. But sometimes you might
want <quote>late binding</quote> where the object reference is resolved
at run time. To get late-binding behavior, force the constant to be
stored as a <type>text</type> constant instead of <type>regclass</type>:
<programlisting>
nextval('foo'::text) <lineannotation><literal>foo</literal> is looked up at runtime</lineannotation>
</programlisting>
The <function>to_regclass()</function> function and its siblings
can also be used to perform run-time lookups. See
<xref linkend="functions-info-catalog-table"/>.
</para>
</note>
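<para>
A short sketch of such a run-time lookup, assuming a hypothetical
relation name <literal>foo</literal>:
<programlisting>
-- to_regclass() returns null instead of raising an error when no
-- relation of that name is found.
SELECT to_regclass('foo') IS NOT NULL AS foo_exists;
</programlisting>
</para>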
<para>
Another practical example of use of <type>regclass</type>
is to look up the OID of a table listed in
the <literal>information_schema</literal> views, which don't supply
such OIDs directly. One might for example wish to call
the <function>pg_relation_size()</function> function, which requires
the table OID. Taking the above rules into account, the correct way
to do that is
<programlisting>
SELECT table_schema, table_name,
pg_relation_size((quote_ident(table_schema) || '.' ||
quote_ident(table_name))::regclass)
FROM information_schema.tables
WHERE ...
</programlisting>
The <function>quote_ident()</function> function will take care of
double-quoting the identifiers where needed. The seemingly easier
<programlisting>
SELECT pg_relation_size(table_name)
FROM information_schema.tables
WHERE ...
</programlisting>
is <emphasis>not recommended</emphasis>, because it will fail for
tables that are outside your search path or have names that require
quoting.
</para>
<para>
An additional property of most of the OID alias types is the creation of
dependencies. If a
constant of one of these types appears in a stored expression
(such as a column default expression or view), it creates a dependency
on the referenced object. For example, if a column has a default
expression <literal>nextval('my_seq'::regclass)</literal>,
<productname>PostgreSQL</productname>
understands that the default expression depends on the sequence
<literal>my_seq</literal>, so the system will not let the sequence
be dropped without first removing the default expression. The
alternative of <literal>nextval('my_seq'::text)</literal> does not
create a dependency.
(<type>regrole</type> is an exception to this property. Constants of this
type are not allowed in stored expressions.)
</para>
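<para>
The dependency can be observed with a sketch like the following, in
which the table name <literal>my_table</literal> is hypothetical:
<programlisting>
CREATE SEQUENCE my_seq;
CREATE TABLE my_table (id bigint DEFAULT nextval('my_seq'::regclass));

DROP SEQUENCE my_seq;   -- rejected: the column default depends on my_seq

ALTER TABLE my_table ALTER COLUMN id DROP DEFAULT;
DROP SEQUENCE my_seq;   -- now succeeds
</programlisting>
</para>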
<para>
Another identifier type used by the system is <type>xid</type>, or transaction
(abbreviated <abbrev>xact</abbrev>) identifier. This is the data type of the system columns
<structfield>xmin</structfield> and <structfield>xmax</structfield>. Transaction identifiers are 32-bit quantities.
In some contexts, a 64-bit variant <type>xid8</type> is used. Unlike
<type>xid</type> values, <type>xid8</type> values increase strictly
monotonically and cannot be reused in the lifetime of a database
cluster. See <xref linkend="transaction-id"/> for more details.
</para>
<para>
A third identifier type used by the system is <type>cid</type>, or
command identifier. This is the data type of the system columns
<structfield>cmin</structfield> and <structfield>cmax</structfield>. Command identifiers are also 32-bit quantities.
</para>
<para>
A final identifier type used by the system is <type>tid</type>, or tuple
identifier (row identifier). This is the data type of the system column
<structfield>ctid</structfield>. A tuple ID is a pair
(block number, tuple index within block) that identifies the
physical location of the row within its table.
</para>
<para>
(The system columns are further explained in <xref
linkend="ddl-system-columns"/>.)
</para>
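<para>
As a sketch, the system columns of these types can be selected
explicitly from an ordinary table (<literal>mytable</literal> is a
placeholder name); they are not included in the expansion of
<literal>SELECT *</literal>:
<programlisting>
SELECT ctid, xmin, xmax, cmin, cmax FROM mytable;
</programlisting>
</para>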
</sect1>
<sect1 id="datatype-pg-lsn">
<title><type>pg_lsn</type> Type</title>
<indexterm zone="datatype-pg-lsn">
<primary>pg_lsn</primary>
</indexterm>
<para>
The <type>pg_lsn</type> data type can be used to store LSN (Log Sequence
Number) data, which is a pointer to a location in the WAL. This type is a
representation of <type>XLogRecPtr</type> and an internal system type of
<productname>PostgreSQL</productname>.
</para>
<para>
Internally, an LSN is a 64-bit integer, representing a byte position in
the write-ahead log stream. It is printed as two hexadecimal numbers of
up to 8 digits each, separated by a slash; for example,
<literal>16/B374D848</literal>. The <type>pg_lsn</type> type supports the
standard comparison operators, like <literal>=</literal> and
<literal>&gt;</literal>. Two LSNs can be subtracted using the
<literal>-</literal> operator; the result is the number of bytes separating
those write-ahead log locations. A number of bytes can also be
added to or subtracted from an LSN using the
<literal>+(pg_lsn,numeric)</literal> and
<literal>-(pg_lsn,numeric)</literal> operators, respectively. Note that
the calculated LSN must be in the range of the <type>pg_lsn</type> type,
i.e., between <literal>0/0</literal> and
<literal>FFFFFFFF/FFFFFFFF</literal>.
</para>
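<para>
A brief sketch of these operators with arbitrary literal values (the
operators taking <type>numeric</type> are assumed to be available in
the server version in use):
<programlisting>
-- Distance in bytes between two WAL locations.
SELECT '16/B374D848'::pg_lsn - '16/B3000000'::pg_lsn;

-- Move a location forward and backward by a number of bytes.
SELECT '16/B374D848'::pg_lsn + 16::numeric;
SELECT '16/B374D848'::pg_lsn - 16::numeric;
</programlisting>
</para>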
</sect1>
<sect1 id="datatype-pseudo">
<title>Pseudo-Types</title>
<indexterm zone="datatype-pseudo">
<primary>record</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>any</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anyelement</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anyarray</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anynonarray</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anyenum</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anyrange</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anymultirange</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anycompatible</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anycompatiblearray</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anycompatiblenonarray</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anycompatiblerange</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>anycompatiblemultirange</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>void</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>trigger</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>event_trigger</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>pg_ddl_command</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>language_handler</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>fdw_handler</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>table_am_handler</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>index_am_handler</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>tsm_handler</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>cstring</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>internal</primary>
</indexterm>
<indexterm zone="datatype-pseudo">
<primary>unknown</primary>
</indexterm>
<para>
The <productname>PostgreSQL</productname> type system contains a
number of special-purpose entries that are collectively called
<firstterm>pseudo-types</firstterm>. A pseudo-type cannot be used as a
column data type, but it can be used to declare a function's
argument or result type. Each of the available pseudo-types is
useful in situations where a function's behavior does not
correspond to simply taking or returning a value of a specific
<acronym>SQL</acronym> data type. <xref
linkend="datatype-pseudotypes-table"/> lists the existing
pseudo-types.
</para>
<table id="datatype-pseudotypes-table">
<title>Pseudo-Types</title>
<tgroup cols="2">
<colspec colname="col1" colwidth="2*"/>
<colspec colname="col2" colwidth="3*"/>
<thead>
<row>
<entry>Name</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>any</type></entry>
<entry>Indicates that a function accepts any input data type.</entry>
</row>
<row>
<entry><type>anyelement</type></entry>
<entry>Indicates that a function accepts any data type
(see <xref linkend="extend-types-polymorphic"/>).</entry>
</row>
<row>
<entry><type>anyarray</type></entry>
<entry>Indicates that a function accepts any array data type
(see <xref linkend="extend-types-polymorphic"/>).</entry>
</row>
<row>
<entry><type>anynonarray</type></entry>
<entry>Indicates that a function accepts any non-array data type
(see <xref linkend="extend-types-polymorphic"/>).</entry>
</row>
<row>
<entry><type>anyenum</type></entry>
<entry>Indicates that a function accepts any enum data type
(see <xref linkend="extend-types-polymorphic"/> and
<xref linkend="datatype-enum"/>).</entry>
</row>
<row>
<entry><type>anyrange</type></entry>
<entry>Indicates that a function accepts any range data type
(see <xref linkend="extend-types-polymorphic"/> and
<xref linkend="rangetypes"/>).</entry>
</row>
<row>
<entry><type>anymultirange</type></entry>
<entry>Indicates that a function accepts any multirange data type
(see <xref linkend="extend-types-polymorphic"/> and
<xref linkend="rangetypes"/>).</entry>
</row>
<row>
<entry><type>anycompatible</type></entry>
<entry>Indicates that a function accepts any data type,
with automatic promotion of multiple arguments to a common data type
(see <xref linkend="extend-types-polymorphic"/>).</entry>
</row>
<row>
<entry><type>anycompatiblearray</type></entry>
<entry>Indicates that a function accepts any array data type,
with automatic promotion of multiple arguments to a common data type
(see <xref linkend="extend-types-polymorphic"/>).</entry>
</row>
<row>
<entry><type>anycompatiblenonarray</type></entry>
<entry>Indicates that a function accepts any non-array data type,
with automatic promotion of multiple arguments to a common data type
(see <xref linkend="extend-types-polymorphic"/>).</entry>
</row>
<row>
<entry><type>anycompatiblerange</type></entry>
<entry>Indicates that a function accepts any range data type,
with automatic promotion of multiple arguments to a common data type
(see <xref linkend="extend-types-polymorphic"/> and
<xref linkend="rangetypes"/>).</entry>
</row>
<row>
<entry><type>anycompatiblemultirange</type></entry>
<entry>Indicates that a function accepts any multirange data type,
with automatic promotion of multiple arguments to a common data type
(see <xref linkend="extend-types-polymorphic"/> and
<xref linkend="rangetypes"/>).</entry>
</row>
<row>
<entry><type>cstring</type></entry>
<entry>Indicates that a function accepts or returns a null-terminated C string.</entry>
</row>
<row>
<entry><type>internal</type></entry>
<entry>Indicates that a function accepts or returns a server-internal
data type.</entry>
</row>
<row>
<entry><type>language_handler</type></entry>
<entry>A procedural language call handler is declared to return <type>language_handler</type>.</entry>
</row>
<row>
<entry><type>fdw_handler</type></entry>
<entry>A foreign-data wrapper handler is declared to return <type>fdw_handler</type>.</entry>
</row>
<row>
<entry><type>table_am_handler</type></entry>
<entry>A table access method handler is declared to return <type>table_am_handler</type>.</entry>
</row>
<row>
<entry><type>index_am_handler</type></entry>
<entry>An index access method handler is declared to return <type>index_am_handler</type>.</entry>
</row>
<row>
<entry><type>tsm_handler</type></entry>
<entry>A tablesample method handler is declared to return <type>tsm_handler</type>.</entry>
</row>
<row>
<entry><type>record</type></entry>
<entry>Identifies a function taking or returning an unspecified row type.</entry>
</row>
<row>
<entry><type>trigger</type></entry>
<entry>A trigger function is declared to return <type>trigger</type>.</entry>
</row>
<row>
<entry><type>event_trigger</type></entry>
<entry>An event trigger function is declared to return <type>event_trigger</type>.</entry>
</row>
<row>
<entry><type>pg_ddl_command</type></entry>
<entry>Identifies a representation of DDL commands that is available to event triggers.</entry>
</row>
<row>
<entry><type>void</type></entry>
<entry>Indicates that a function returns no value.</entry>
</row>
<row>
<entry><type>unknown</type></entry>
<entry>Identifies a not-yet-resolved type, e.g., of an undecorated
string literal.</entry>
</row>
</tbody>
</tgroup>
</table>
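<para>
As an illustrative sketch, the pseudo-types known to a particular server
can be listed from the <structname>pg_type</structname> system catalog,
where pseudo-types are marked with <structfield>typtype</structfield>
equal to <literal>'p'</literal>. The exact set of rows returned depends
on the server version and might include entries not shown in the table.
</para>
<programlisting>
-- List the names of all pseudo-types registered in this database.
SELECT typname
FROM pg_type
WHERE typtype = 'p'
ORDER BY typname;
</programlisting>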
<para>
Functions coded in C (whether built-in or dynamically loaded) can be
declared to accept or return any of these pseudo-types. It is up to
the function author to ensure that the function will behave safely
when a pseudo-type is used as an argument type.
</para>
<para>
Functions coded in procedural languages can use pseudo-types only as
allowed by their implementation languages. At present most procedural
languages forbid use of a pseudo-type as an argument type, and allow
only <type>void</type> and <type>record</type> as a result type (plus
<type>trigger</type> or <type>event_trigger</type> when the function is used
as a trigger or event trigger). Some also support polymorphic functions
using the polymorphic pseudo-types, which are shown above and discussed
in detail in <xref linkend="extend-types-polymorphic"/>.
</para>
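<para>
The following sketch illustrates these rules in
<application>PL/pgSQL</application>. The function names
<function>log_note</function> and <function>first_of</function> are
hypothetical, and the second function assumes a server version that
provides the <type>anycompatible</type> family of polymorphic types.
</para>
<programlisting>
-- A procedural-language function can declare void as its result type.
CREATE FUNCTION log_note(msg text) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    RAISE NOTICE 'note: %', msg;
END;
$$;

-- A polymorphic function: both arguments are promoted to a common type,
-- and the result has that same type.
CREATE FUNCTION first_of(a anycompatible, b anycompatible)
RETURNS anycompatible
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN a;
END;
$$;

-- The integer argument is promoted to bigint, so the result is bigint.
SELECT pg_typeof(first_of(1, 2::bigint));
</programlisting>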
<para>
The <type>internal</type> pseudo-type is used to declare functions
that are meant only to be called internally by the database
system, and not by direct invocation in an <acronym>SQL</acronym>
query. If a function has at least one <type>internal</type>-type
argument then it cannot be called from <acronym>SQL</acronym>. To
preserve the type safety of this restriction it is important to
follow this coding rule: do not create any function that is
declared to return <type>internal</type> unless it has at least one
<type>internal</type> argument.
</para>
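<para>
As a sketch, adherence to this coding rule can be checked against the
<structname>pg_proc</structname> system catalog with a query along the
following lines, which looks for functions that return
<type>internal</type> without taking any <type>internal</type> argument.
It is expected to return no rows when every function follows the rule.
</para>
<programlisting>
-- Find functions returning internal that have no internal argument.
SELECT p.oid, p.proname
FROM pg_proc AS p
WHERE p.prorettype = 'internal'::regtype
  AND NOT ('internal'::regtype = ANY (p.proargtypes));
</programlisting>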
</sect1>
</chapter>