postgresql/doc/src/sgml/rowtypes.sgml

<!-- doc/src/sgml/rowtypes.sgml -->

<sect1 id="rowtypes">
 <title>Composite Types</title>

 <indexterm>
  <primary>composite type</primary>
 </indexterm>

 <indexterm>
  <primary>row type</primary>
 </indexterm>

 <para>
  A <firstterm>composite type</firstterm> represents the structure of a row or record;
  it is essentially just a list of field names and their data types.
  <productname>PostgreSQL</productname> allows  composite types to be
  used in many of the same ways that simple types can be used.  For example, a
  column of a table can be declared to be of a composite type.
 </para>

 <sect2 id="rowtypes-declaring">
  <title>Declaration of Composite Types</title>

 <para>
  Here are two simple examples of defining composite types:
<programlisting>
CREATE TYPE complex AS (
    r       double precision,
    i       double precision
);

CREATE TYPE inventory_item AS (
    name            text,
    supplier_id     integer,
    price           numeric
);
</programlisting>
  The syntax is comparable to <command>CREATE TABLE</command>, except that only
  field names and types can be specified; no constraints (such as <literal>NOT
  NULL</literal>) can presently be included.  Note that the <literal>AS</literal> keyword
  is essential; without it, the system will think a different kind
  of <command>CREATE TYPE</command> command is meant, and you will get odd syntax
  errors.
 </para>

 <para>
  Having defined the types, we can use them to create tables:

<programlisting>
CREATE TABLE on_hand (
    item      inventory_item,
    count     integer
);

INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);
</programlisting>

  or functions:

<programlisting>
CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
AS 'SELECT $1.price * $2' LANGUAGE SQL;

SELECT price_extension(item, 10) FROM on_hand;
</programlisting>

 </para>

 <para>
  Whenever you create a table, a composite type is also automatically
  created, with the same name as the table, to represent the table's
  row type.  For example, had we said:
<programlisting>
CREATE TABLE inventory_item (
    name            text,
    supplier_id     integer REFERENCES suppliers,
    price           numeric CHECK (price &gt; 0)
);
</programlisting>
  then the same <literal>inventory_item</literal> composite type shown above would
  come into being as a
  byproduct, and could be used just as above.  Note however an important
  restriction of the current implementation: since no constraints are
  associated with a composite type, the constraints shown in the table
  definition <emphasis>do not apply</emphasis> to values of the composite type
  outside the table.  (To work around this, create a
  <glossterm linkend="glossary-domain">domain</glossterm> over the composite
  type, and apply the desired constraints as <literal>CHECK</literal>
  constraints of the domain.)
 </para>
 </sect2>

 <sect2 id="rowtypes-constructing">
  <title>Constructing Composite Values</title>

  <indexterm>
   <primary>composite type</primary>
   <secondary>constant</secondary>
  </indexterm>

  <para>
   To write a composite value as a literal constant, enclose the field
   values within parentheses and separate them by commas.  You can put double
   quotes around any field value, and must do so if it contains commas or
   parentheses.  (More details appear <link
   linkend="rowtypes-io-syntax">below</link>.)  Thus, the general format of
   a composite constant is the following:
<synopsis>
'( <replaceable>val1</replaceable> , <replaceable>val2</replaceable> , ... )'
</synopsis>
   An example is:
<programlisting>
'("fuzzy dice",42,1.99)'
</programlisting>
   which would be a valid value of the <literal>inventory_item</literal> type
   defined above.  To make a field be NULL, write no characters at all
   in its position in the list.  For example, this constant specifies
   a NULL third field:
<programlisting>
'("fuzzy dice",42,)'
</programlisting>
   If you want an empty string rather than NULL, write double quotes:
<programlisting>
'("",42,)'
</programlisting>
   Here the first field is a non-NULL empty string, the third is NULL.
  </para>

  <para>
   (These constants are actually only a special case of
   the generic type constants discussed in <xref
   linkend="sql-syntax-constants-generic"/>.  The constant is initially
   treated as a string and passed to the composite-type input conversion
   routine.  An explicit type specification might be necessary to tell
   which type to convert the constant to.)
  </para>

 <para>
  The <literal>ROW</literal> expression syntax can also be used to
  construct composite values.  In most cases this is considerably
  simpler to use than the string-literal syntax since you don't have
  to worry about multiple layers of quoting.  We already used this
  method above:
<programlisting>
ROW('fuzzy dice', 42, 1.99)
ROW('', 42, NULL)
</programlisting>
  The ROW keyword is actually optional as long as you have more than one
  field in the expression, so these can be simplified to:
<programlisting>
('fuzzy dice', 42, 1.99)
('', 42, NULL)
</programlisting>
  The <literal>ROW</literal> expression syntax is discussed in more detail in <xref
  linkend="sql-syntax-row-constructors"/>.
 </para>
 </sect2>

 <sect2 id="rowtypes-accessing">
  <title>Accessing Composite Types</title>

 <para>
  To access a field of a composite column, one writes a dot and the field
  name, much like selecting a field from a table name.  In fact, it's so
  much like selecting from a table name that you often have to use parentheses
  to keep from confusing the parser.  For example, you might try to select
  some subfields from our <literal>on_hand</literal> example table with something
  like:

<programlisting>
SELECT item.name FROM on_hand WHERE item.price &gt; 9.99;
</programlisting>

  This will not work since the name <literal>item</literal> is taken to be a table
  name, not a column name of <literal>on_hand</literal>, per SQL syntax rules.
  You must write it like this:

<programlisting>
SELECT (item).name FROM on_hand WHERE (item).price &gt; 9.99;
</programlisting>

  or if you need to use the table name as well (for instance in a multitable
  query), like this:

<programlisting>
SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price &gt; 9.99;
</programlisting>

  Now the parenthesized object is correctly interpreted as a reference to
  the <literal>item</literal> column, and then the subfield can be selected from it.
 </para>

 <para>
  Similar syntactic issues apply whenever you select a field from a composite
  value.  For instance, to select just one field from the result of a function
  that returns a composite value, you'd need to write something like:

<programlisting>
SELECT (my_func(...)).field FROM ...
</programlisting>

  Without the extra parentheses, this will generate a syntax error.
 </para>

 <para>
  The special field name <literal>*</literal> means <quote>all fields</quote>, as
  further explained in <xref linkend="rowtypes-usage"/>.
 </para>
 </sect2>

 <sect2 id="rowtypes-modifying">
  <title>Modifying Composite Types</title>

 <para>
  Here are some examples of the proper syntax for inserting and updating
  composite columns.
  First, inserting or updating a whole column:

<programlisting>
INSERT INTO mytab (complex_col) VALUES((1.1,2.2));

UPDATE mytab SET complex_col = ROW(1.1,2.2) WHERE ...;
</programlisting>

  The first example omits <literal>ROW</literal>, the second uses it; we
  could have done it either way.
 </para>

 <para>
  We can update an individual subfield of a composite column:

<programlisting>
UPDATE mytab SET complex_col.r = (complex_col).r + 1 WHERE ...;
</programlisting>

  Notice here that we don't need to (and indeed cannot)
  put parentheses around the column name appearing just after
  <literal>SET</literal>, but we do need parentheses when referencing the same
  column in the expression to the right of the equal sign.
 </para>

 <para>
  And we can specify subfields as targets for <command>INSERT</command>, too:

<programlisting>
INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2);
</programlisting>

  Had we not supplied values for all the subfields of the column, the
  remaining subfields would have been filled with null values.
 </para>
 </sect2>

 <sect2 id="rowtypes-usage">
  <title>Using Composite Types in Queries</title>

  <para>
   There are various special syntax rules and behaviors associated with
   composite types in queries.  These rules provide useful shortcuts,
   but can be confusing if you don't know the logic behind them.
  </para>

  <para>
   In <productname>PostgreSQL</productname>, a reference to a table name (or alias)
   in a query is effectively a reference to the composite value of the
   table's current row.  For example, if we had a table
   <structname>inventory_item</structname> as shown
   <link linkend="rowtypes-declaring">above</link>, we could write:
<programlisting>
SELECT c FROM inventory_item c;
</programlisting>
   This query produces a single composite-valued column, so we might get
   output like:
<programlisting>
           c
------------------------
 ("fuzzy dice",42,1.99)
(1 row)
</programlisting>
   Note however that simple names are matched to column names before table
   names, so this example works only because there is no column
   named <structfield>c</structfield> in the query's tables.
  </para>

  <para>
   The ordinary qualified-column-name
   syntax <replaceable>table_name</replaceable><literal>.</literal><replaceable>column_name</replaceable>
   can be understood as applying <link linkend="field-selection">field
   selection</link> to the composite value of the table's current row.
   (For efficiency reasons, it's not actually implemented that way.)
  </para>

  <para>
   When we write
<programlisting>
SELECT c.* FROM inventory_item c;
</programlisting>
   then, according to the SQL standard, we should get the contents of the
   table expanded into separate columns:
<programlisting>
    name    | supplier_id | price
------------+-------------+-------
 fuzzy dice |          42 |  1.99
(1 row)
</programlisting>
   as if the query were
<programlisting>
SELECT c.name, c.supplier_id, c.price FROM inventory_item c;
</programlisting>
   <productname>PostgreSQL</productname> will apply this expansion behavior to
   any composite-valued expression, although as shown <link
   linkend="rowtypes-accessing">above</link>, you need to write parentheses
   around the value that <literal>.*</literal> is applied to whenever it's not a
   simple table name.  For example, if <function>myfunc()</function> is a function
   returning a composite type with columns <structfield>a</structfield>,
   <structfield>b</structfield>, and <structfield>c</structfield>, then these two queries have the
   same result:
<programlisting>
SELECT (myfunc(x)).* FROM some_table;
SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table;
</programlisting>
  </para>

  <tip>
   <para>
    <productname>PostgreSQL</productname> handles column expansion by
    actually transforming the first form into the second.  So, in this
    example, <function>myfunc()</function> would get invoked three times per row
    with either syntax.  If it's an expensive function you may wish to
    avoid that, which you can do with a query like:
<programlisting>
SELECT m.* FROM some_table, LATERAL myfunc(x) AS m;
</programlisting>
    Placing the function in
    a <literal>LATERAL</literal> <literal>FROM</literal> item keeps it from
    being invoked more than once per row.  <literal>m.*</literal> is still
    expanded into <literal>m.a, m.b, m.c</literal>, but now those variables
    are just references to the output of the <literal>FROM</literal> item.
    (The <literal>LATERAL</literal> keyword is optional here, but we show it
    to clarify that the function is getting <structfield>x</structfield>
    from <structname>some_table</structname>.)
   </para>
  </tip>

  <para>
   The <replaceable>composite_value</replaceable><literal>.*</literal> syntax results in
   column expansion of this kind when it appears at the top level of
   a <link linkend="queries-select-lists"><command>SELECT</command> output
   list</link>, a <link linkend="dml-returning"><literal>RETURNING</literal>
   list</link> in <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>/<command>MERGE</command>,
   a <link linkend="queries-values"><literal>VALUES</literal> clause</link>, or
   a <link linkend="sql-syntax-row-constructors">row constructor</link>.
   In all other contexts (including when nested inside one of those
   constructs), attaching <literal>.*</literal> to a composite value does not
   change the value, since it means <quote>all columns</quote> and so the
   same composite value is produced again.  For example,
   if <function>somefunc()</function> accepts a composite-valued argument,
   these queries are the same:

<programlisting>
SELECT somefunc(c.*) FROM inventory_item c;
SELECT somefunc(c) FROM inventory_item c;
</programlisting>

   In both cases, the current row of <structname>inventory_item</structname> is
   passed to the function as a single composite-valued argument.
   Even though <literal>.*</literal> does nothing in such cases, using it is good
   style, since it makes clear that a composite value is intended.  In
   particular, the parser will consider <literal>c</literal> in <literal>c.*</literal> to
   refer to a table name or alias, not to a column name, so that there is
   no ambiguity; whereas without <literal>.*</literal>, it is not clear
   whether <literal>c</literal> means a table name or a column name, and in fact
   the column-name interpretation will be preferred if there is a column
   named <literal>c</literal>.
  </para>

  <para>
   Another example demonstrating these concepts is that all these queries
   mean the same thing:
<programlisting>
SELECT * FROM inventory_item c ORDER BY c;
SELECT * FROM inventory_item c ORDER BY c.*;
SELECT * FROM inventory_item c ORDER BY ROW(c.*);
</programlisting>
   All of these <literal>ORDER BY</literal> clauses specify the row's composite
   value, resulting in sorting the rows according to the rules described
   in <xref linkend="composite-type-comparison"/>.  However,
   if <structname>inventory_item</structname> contained a column
   named <structfield>c</structfield>, the first case would be different from the
   others, as it would mean to sort by that column only.  Given the column
   names previously shown, these queries are also equivalent to those above:
<programlisting>
SELECT * FROM inventory_item c ORDER BY ROW(c.name, c.supplier_id, c.price);
SELECT * FROM inventory_item c ORDER BY (c.name, c.supplier_id, c.price);
</programlisting>
   (The last case uses a row constructor with the key word <literal>ROW</literal>
   omitted.)
  </para>

  <para>
   Another special syntactical behavior associated with composite values is
   that we can use <firstterm>functional notation</firstterm> for extracting a field
   of a composite value.  The simple way to explain this is that
   the notations <literal><replaceable>field</replaceable>(<replaceable>table</replaceable>)</literal>
   and <literal><replaceable>table</replaceable>.<replaceable>field</replaceable></literal>
   are interchangeable.  For example, these queries are equivalent:

<programlisting>
SELECT c.name FROM inventory_item c WHERE c.price &gt; 1000;
SELECT name(c) FROM inventory_item c WHERE price(c) &gt; 1000;
</programlisting>

   Moreover, if we have a function that accepts a single argument of a
   composite type, we can call it with either notation.  These queries are
   all equivalent:

<programlisting>
SELECT somefunc(c) FROM inventory_item c;
SELECT somefunc(c.*) FROM inventory_item c;
SELECT c.somefunc FROM inventory_item c;
</programlisting>
  </para>

  <para>
   This equivalence between functional notation and field notation
   makes it possible to use functions on composite types to implement
   <quote>computed fields</quote>.
   <indexterm>
    <primary>computed field</primary>
   </indexterm>
   <indexterm>
    <primary>field</primary>
    <secondary>computed</secondary>
   </indexterm>
   An application using the last query above wouldn't need to be directly
   aware that <literal>somefunc</literal> isn't a real column of the table.
  </para>

  <tip>
   <para>
    Because of this behavior, it's unwise to give a function that takes a
    single composite-type argument the same name as any of the fields of
    that composite type.  If there is ambiguity, the field-name
    interpretation will be chosen if field-name syntax is used, while the
    function will be chosen if function-call syntax is used.  However,
    <productname>PostgreSQL</productname> versions before 11 always chose the
    field-name interpretation, unless the syntax of the call required it to
    be a function call.  One way to force the function interpretation in
    older versions is to schema-qualify the function name, that is, write
    <literal><replaceable>schema</replaceable>.<replaceable>func</replaceable>(<replaceable>compositevalue</replaceable>)</literal>.
   </para>
  </tip>
 </sect2>

 <sect2 id="rowtypes-io-syntax">
  <title>Composite Type Input and Output Syntax</title>

  <para>
   The external text representation of a composite value consists of items that
   are interpreted according to the I/O conversion rules for the individual
   field types, plus decoration that indicates the composite structure.
   The decoration consists of parentheses (<literal>(</literal> and <literal>)</literal>)
   around the whole value, plus commas (<literal>,</literal>) between adjacent
   items.  Whitespace outside the parentheses is ignored, but within the
   parentheses it is considered part of the field value, and might or might not be
   significant depending on the input conversion rules for the field data type.
   For example, in:
<programlisting>
'(  42)'
</programlisting>
   the whitespace will be ignored if the field type is integer, but not if
   it is text.
  </para>

  <para>
   As shown previously, when writing a composite value you can write double
   quotes around any individual field value.
   You <emphasis>must</emphasis> do so if the field value would otherwise
   confuse the composite-value parser.  In particular, fields containing
   parentheses, commas, double quotes, or backslashes must be double-quoted.
   To put a double quote or backslash in a quoted composite field value,
   precede it with a backslash.  (Also, a pair of double quotes within a
   double-quoted field value is taken to represent a double quote character,
   analogously to the rules for single quotes in SQL literal strings.)
   Alternatively, you can avoid quoting and use backslash-escaping to
   protect all data characters
   that would otherwise be taken as composite syntax.
  </para>

  <para>
   A completely empty field value (no characters at all between the commas
   or parentheses) represents a NULL.  To write a value that is an empty
   string rather than NULL, write <literal>""</literal>.
  </para>

  <para>
   The composite output routine will put double quotes around field values
   if they are empty strings or contain parentheses, commas,
   double quotes, backslashes, or white space.  (Doing so for white space
   is not essential, but aids legibility.)  Double quotes and backslashes
   embedded in field values will be doubled.
  </para>

 <note>
  <para>
   Remember that what you write in an SQL command will first be interpreted
   as a string literal, and then as a composite.  This doubles the number of
   backslashes you need (assuming escape string syntax is used).
   For example, to insert a <type>text</type> field
   containing a double quote and a backslash in a composite
   value, you'd need to write:
<programlisting>
INSERT ... VALUES ('("\"\\")');
</programlisting>
   The string-literal processor removes one level of backslashes, so that
   what arrives at the composite-value parser looks like
   <literal>("\"\\")</literal>.  In turn, the string
   fed to the <type>text</type> data type's input routine
   becomes <literal>"\</literal>.  (If we were working
   with a data type whose input routine also treated backslashes specially,
   <type>bytea</type> for example, we might need as many as eight backslashes
   in the command to get one backslash into the stored composite field.)
   Dollar quoting (see <xref linkend="sql-syntax-dollar-quoting"/>) can be
   used to avoid the need to double backslashes.
  </para>
 </note>

 <tip>
  <para>
   The <literal>ROW</literal> constructor syntax is usually easier to work with
   than the composite-literal syntax when writing composite values in SQL
   commands.
   In <literal>ROW</literal>, individual field values are written the same way
   they would be written when not members of a composite.
  </para>
 </tip>
 </sect2>

</sect1>