postgresql/doc/src/sgml/ltree.sgml

<!-- doc/src/sgml/ltree.sgml -->

<sect1 id="ltree" xreflabel="ltree">
 <title>ltree</title>

 <indexterm zone="ltree">
  <primary>ltree</primary>
 </indexterm>

 <para>
  This module implements a data type <type>ltree</> for representing
  labels of data stored in a hierarchical tree-like structure.
  Extensive facilities for searching through label trees are provided.
 </para>

 <sect2>
  <title>Definitions</title>

  <para>
   A <firstterm>label</firstterm> is a sequence of alphanumeric characters
   and underscores (for example, in C locale the characters
   <literal>A-Za-z0-9_</> are allowed).  Labels must be less than 256 bytes
   long.
  </para>

  <para>
   Examples: <literal>42</>, <literal>Personal_Services</>
  </para>

  <para>
   A <firstterm>label path</firstterm> is a sequence of zero or more
   labels separated by dots, for example <literal>L1.L2.L3</>, representing
   a path from the root of a hierarchical tree to a particular node.  The
   length of a label path must be less than 65kB, but keeping it under 2kB is
   preferable.  In practice this is not a major limitation; for example,
   the longest label path in the DMOZ catalog (<ulink
   url="http://www.dmoz.org"></ulink>) is about 240 bytes.
  </para>

  <para>
   Example: <literal>Top.Countries.Europe.Russia</literal>
  </para>

  <para>
   The <filename>ltree</> module provides several data types:
  </para>

  <itemizedlist>
   <listitem>
    <para>
     <type>ltree</type> stores a label path.
    </para>
   </listitem>

   <listitem>
    <para>
     <type>lquery</type> represents a regular-expression-like pattern
     for matching <type>ltree</> values.  A simple word matches that
     label within a path.  A star symbol (<literal>*</>) matches zero
     or more labels.  For example:
<synopsis>
foo         <lineannotation>Match the exact label path <literal>foo</></lineannotation>
*.foo.*     <lineannotation>Match any label path containing the label <literal>foo</></lineannotation>
*.foo       <lineannotation>Match any label path whose last label is <literal>foo</></lineannotation>
</synopsis>
    </para>

    <para>
     Star symbols can also be quantified to restrict how many labels
     they can match:
<synopsis>
*{<replaceable>n</>}        <lineannotation>Match exactly <replaceable>n</> labels</lineannotation>
*{<replaceable>n</>,}       <lineannotation>Match at least <replaceable>n</> labels</lineannotation>
*{<replaceable>n</>,<replaceable>m</>}      <lineannotation>Match at least <replaceable>n</> but not more than <replaceable>m</> labels</lineannotation>
*{,<replaceable>m</>}       <lineannotation>Match at most <replaceable>m</> labels &mdash; same as </lineannotation> *{0,<replaceable>m</>}
</synopsis>
    </para>

    <para>
     There are several modifiers that can be put at the end of a non-star
     label in <type>lquery</> to make it match more than just the exact match:
<synopsis>
@           <lineannotation>Match case-insensitively, for example <literal>a@</> matches <literal>A</></lineannotation>
*           <lineannotation>Match any label with this prefix, for example <literal>foo*</> matches <literal>foobar</></lineannotation>
%           <lineannotation>Match initial underscore-separated words</lineannotation>
</synopsis>
     The behavior of <literal>%</> is a bit complicated.  It tries to match
     words rather than the entire label.  For example
     <literal>foo_bar%</> matches <literal>foo_bar_baz</> but not
     <literal>foo_barbaz</>.  If combined with <literal>*</>, prefix
     matching applies to each word separately, for example
     <literal>foo_bar%*</> matches <literal>foo1_bar2_baz</> but
     not <literal>foo1_br2_baz</>.
    </para>

    <para>
     Also, you can write several possibly-modified labels separated with
     <literal>|</> (OR) to match any of those labels, and you can put
     <literal>!</> (NOT) at the start to match any label that doesn't
     match any of the alternatives.
    </para>

    <para>
     Here's an annotated example of <type>lquery</type>:
<programlisting>
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a.  b.     c.      d.               e.
</programlisting>
     This query will match any label path that:
    </para>
    <orderedlist numeration="loweralpha">
     <listitem>
      <para>
       begins with the label <literal>Top</literal>
      </para>
     </listitem>
     <listitem>
      <para>
       and next has zero to two labels before
      </para>
     </listitem>
     <listitem>
      <para>
       a label beginning with the case-insensitive prefix <literal>sport</literal>
      </para>
     </listitem>
     <listitem>
      <para>
       then a label not matching <literal>football</literal> nor
       <literal>tennis</literal>
      </para>
     </listitem>
     <listitem>
      <para>
       and then ends with a label beginning with <literal>Russ</literal> or
       exactly matching <literal>Spain</literal>.
      </para>
     </listitem>
    </orderedlist>
   </listitem>

   <listitem>
    <para><type>ltxtquery</type> represents a full-text-search-like
    pattern for matching <type>ltree</> values.  An
    <type>ltxtquery</type> value contains words, possibly with the
    modifiers <literal>@</>, <literal>*</>, <literal>%</> at the end;
    the modifiers have the same meanings as in <type>lquery</>.
    Words can be combined with <literal>&amp;</> (AND),
    <literal>|</> (OR), <literal>!</> (NOT), and parentheses.
    The key difference from
    <type>lquery</> is that <type>ltxtquery</type> matches words without
    regard to their position in the label path.
    </para>

    <para>
     Here's an example <type>ltxtquery</type>:
<programlisting>
Europe &amp; Russia*@ &amp; !Transportation
</programlisting>
     This will match paths that contain the label <literal>Europe</literal> and
     any label beginning with <literal>Russia</literal> (case-insensitive),
     but not paths containing the label <literal>Transportation</literal>.
     The location of these words within the path is not important.
     Also, when <literal>%</> is used, the word can be matched to any
     underscore-separated word within a label, regardless of position.
    </para>
   </listitem>

  </itemizedlist>

  <para>
   Note: <type>ltxtquery</> allows whitespace between symbols, but
   <type>ltree</> and <type>lquery</> do not.
  </para>
 </sect2>

 <sect2>
  <title>Operators and Functions</title>

  <para>
   Type <type>ltree</> has the usual comparison operators
   <literal>=</>, <literal>&lt;&gt;</literal>,
   <literal>&lt;</>, <literal>&gt;</>, <literal>&lt;=</>, <literal>&gt;=</>.
   Comparison sorts in the order of a tree traversal, with the children
   of a node sorted by label text.  In addition, the specialized
   operators shown in <xref linkend="ltree-op-table"> are available.
  </para>

  <table id="ltree-op-table">
   <title><type>ltree</> Operators</title>

   <tgroup cols="3">
    <thead>
     <row>
      <entry>Operator</entry>
      <entry>Returns</entry>
      <entry>Description</entry>
     </row>
    </thead>

    <tbody>
     <row>
      <entry><type>ltree</> <literal>@&gt;</> <type>ltree</></entry>
      <entry><type>boolean</type></entry>
      <entry>is left argument an ancestor of right (or equal)?</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>&lt;@</> <type>ltree</></entry>
      <entry><type>boolean</type></entry>
      <entry>is left argument a descendant of right (or equal)?</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>~</> <type>lquery</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> match <type>lquery</>?</entry>
     </row>

     <row>
      <entry><type>lquery</> <literal>~</> <type>ltree</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> match <type>lquery</>?</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>?</> <type>lquery[]</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> match any <type>lquery</> in array?</entry>
     </row>

     <row>
      <entry><type>lquery[]</> <literal>?</> <type>ltree</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> match any <type>lquery</> in array?</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>@</> <type>ltxtquery</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> match <type>ltxtquery</>?</entry>
     </row>

     <row>
      <entry><type>ltxtquery</> <literal>@</> <type>ltree</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> match <type>ltxtquery</>?</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>||</> <type>ltree</></entry>
      <entry><type>ltree</type></entry>
      <entry>concatenate <type>ltree</> paths</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>||</> <type>text</></entry>
      <entry><type>ltree</type></entry>
      <entry>convert text to <type>ltree</> and concatenate</entry>
     </row>

     <row>
      <entry><type>text</> <literal>||</> <type>ltree</></entry>
      <entry><type>ltree</type></entry>
      <entry>convert text to <type>ltree</> and concatenate</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>@&gt;</> <type>ltree</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain an ancestor of <type>ltree</>?</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>&lt;@</> <type>ltree[]</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain an ancestor of <type>ltree</>?</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>&lt;@</> <type>ltree</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain a descendant of <type>ltree</>?</entry>
     </row>

     <row>
      <entry><type>ltree</> <literal>@&gt;</> <type>ltree[]</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain a descendant of <type>ltree</>?</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>~</> <type>lquery</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain any path matching <type>lquery</>?</entry>
     </row>

     <row>
      <entry><type>lquery</> <literal>~</> <type>ltree[]</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain any path matching <type>lquery</>?</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>?</> <type>lquery[]</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> array contain any path matching any <type>lquery</>?</entry>
     </row>

     <row>
      <entry><type>lquery[]</> <literal>?</> <type>ltree[]</></entry>
      <entry><type>boolean</type></entry>
      <entry>does <type>ltree</> array contain any path matching any <type>lquery</>?</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>@</> <type>ltxtquery</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain any path matching <type>ltxtquery</>?</entry>
     </row>

     <row>
      <entry><type>ltxtquery</> <literal>@</> <type>ltree[]</></entry>
      <entry><type>boolean</type></entry>
      <entry>does array contain any path matching <type>ltxtquery</>?</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>?@&gt;</> <type>ltree</></entry>
      <entry><type>ltree</type></entry>
      <entry>first array entry that is an ancestor of <type>ltree</>; NULL if none</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>?&lt;@</> <type>ltree</></entry>
      <entry><type>ltree</type></entry>
      <entry>first array entry that is a descendant of <type>ltree</>; NULL if none</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>?~</> <type>lquery</></entry>
      <entry><type>ltree</type></entry>
      <entry>first array entry that matches <type>lquery</>; NULL if none</entry>
     </row>

     <row>
      <entry><type>ltree[]</> <literal>?@</> <type>ltxtquery</></entry>
      <entry><type>ltree</type></entry>
      <entry>first array entry that matches <type>ltxtquery</>; NULL if none</entry>
     </row>

    </tbody>
   </tgroup>
  </table>

  <para>
   The operators <literal>&lt;@</literal>, <literal>@&gt;</literal>,
   <literal>@</literal> and <literal>~</literal> have analogues
   <literal>^&lt;@</>, <literal>^@&gt;</>, <literal>^@</>,
   <literal>^~</literal>, which are the same except they do not use
   indexes.  These are useful only for testing purposes.
  </para>

  <para>
   The available functions are shown in <xref linkend="ltree-func-table">.
  </para>

  <table id="ltree-func-table">
   <title><type>ltree</> Functions</title>

   <tgroup cols="5">
    <thead>
     <row>
      <entry>Function</entry>
      <entry>Return Type</entry>
      <entry>Description</entry>
      <entry>Example</entry>
      <entry>Result</entry>
     </row>
    </thead>

    <tbody>
     <row>
      <entry><function>subltree(ltree, int start, int end)</function><indexterm><primary>subltree</primary></indexterm></entry>
      <entry><type>ltree</type></entry>
      <entry>subpath of <type>ltree</> from position <parameter>start</> to
       position <parameter>end</>-1 (counting from 0)</entry>
      <entry><literal>subltree('Top.Child1.Child2',1,2)</literal></entry>
      <entry><literal>Child1</literal></entry>
     </row>

     <row>
      <entry><function>subpath(ltree, int offset, int len)</function><indexterm><primary>subpath</primary></indexterm></entry>
      <entry><type>ltree</type></entry>
      <entry>subpath of <type>ltree</> starting at position
       <parameter>offset</>, length <parameter>len</>.
       If <parameter>offset</> is negative, subpath starts that far from the
       end of the path.  If <parameter>len</> is negative, leaves that many
       labels off the end of the path.</entry>
      <entry><literal>subpath('Top.Child1.Child2',0,2)</literal></entry>
      <entry><literal>Top.Child1</literal></entry>
     </row>

     <row>
      <entry><function>subpath(ltree, int offset)</function></entry>
      <entry><type>ltree</type></entry>
      <entry>subpath of <type>ltree</> starting at position
       <parameter>offset</>, extending to end of path.
       If <parameter>offset</> is negative, subpath starts that far from the
       end of the path.</entry>
      <entry><literal>subpath('Top.Child1.Child2',1)</literal></entry>
      <entry><literal>Child1.Child2</literal></entry>
     </row>

     <row>
      <entry><function>nlevel(ltree)</function><indexterm><primary>nlevel</primary></indexterm></entry>
      <entry><type>integer</type></entry>
      <entry>number of labels in path</entry>
      <entry><literal>nlevel('Top.Child1.Child2')</literal></entry>
      <entry><literal>3</literal></entry>
     </row>

     <row>
      <entry><function>index(ltree a, ltree b)</function><indexterm><primary>index</primary></indexterm></entry>
      <entry><type>integer</type></entry>
      <entry>position of first occurrence of <parameter>b</> in
       <parameter>a</>; -1 if not found</entry>
      <entry><literal>index('0.1.2.3.5.4.5.6.8.5.6.8','5.6')</literal></entry>
      <entry><literal>6</literal></entry>
     </row>

     <row>
      <entry><function>index(ltree a, ltree b, int offset)</function></entry>
      <entry><type>integer</type></entry>
      <entry>position of first occurrence of <parameter>b</> in
       <parameter>a</>, searching starting at <parameter>offset</>;
       negative <parameter>offset</> means start <parameter>-offset</>
       labels from the end of the path</entry>
      <entry><literal>index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4)</literal></entry>
      <entry><literal>9</literal></entry>
     </row>

     <row>
      <entry><function>text2ltree(text)</function><indexterm><primary>text2ltree</primary></indexterm></entry>
      <entry><type>ltree</type></entry>
      <entry>cast <type>text</> to <type>ltree</></entry>
      <entry><literal></literal></entry>
      <entry><literal></literal></entry>
     </row>

     <row>
      <entry><function>ltree2text(ltree)</function><indexterm><primary>ltree2text</primary></indexterm></entry>
      <entry><type>text</type></entry>
      <entry>cast <type>ltree</> to <type>text</></entry>
      <entry><literal></literal></entry>
      <entry><literal></literal></entry>
     </row>

     <row>
      <entry><function>lca(ltree, ltree, ...)</function><indexterm><primary>lca</primary></indexterm></entry>
      <entry><type>ltree</type></entry>
      <entry>lowest common ancestor, i.e., longest common prefix of paths
       (up to 8 arguments supported)</entry>
      <entry><literal>lca('1.2.2.3','1.2.3.4.5.6')</literal></entry>
      <entry><literal>1.2</literal></entry>
     </row>

     <row>
      <entry><function>lca(ltree[])</function></entry>
      <entry><type>ltree</type></entry>
      <entry>lowest common ancestor, i.e., longest common prefix of paths</entry>
      <entry><literal>lca(array['1.2.2.3'::ltree,'1.2.3'])</literal></entry>
      <entry><literal>1.2</literal></entry>
     </row>

    </tbody>
   </tgroup>
  </table>
 </sect2>

 <sect2>
  <title>Indexes</title>
  <para>
   <filename>ltree</> supports several types of indexes that can speed
   up the indicated operators:
  </para>

  <itemizedlist>
   <listitem>
    <para>
     B-tree index over <type>ltree</>:
     <literal>&lt;</>, <literal>&lt;=</>, <literal>=</>,
     <literal>&gt;=</>, <literal>&gt;</literal>
    </para>
   </listitem>
   <listitem>
    <para>
     GiST index over <type>ltree</>:
     <literal>&lt;</>, <literal>&lt;=</>, <literal>=</>,
     <literal>&gt;=</>, <literal>&gt;</>,
     <literal>@&gt;</>, <literal>&lt;@</>,
     <literal>@</>, <literal>~</>, <literal>?</literal>
    </para>
    <para>
     Example of creating such an index:
    </para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (path);
</programlisting>
   </listitem>
   <listitem>
    <para>
     GiST index over <type>ltree[]</>:
     <literal>ltree[] &lt;@ ltree</>, <literal>ltree @&gt; ltree[]</>,
     <literal>@</>, <literal>~</>, <literal>?</literal>
    </para>
    <para>
     Example of creating such an index:
    </para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
</programlisting>
    <para>
     Note: This index type is lossy.
    </para>
   </listitem>
  </itemizedlist>
 </sect2>

 <sect2>
  <title>Example</title>

  <para>
   This example uses the following data (also available in file
   <filename>contrib/ltree/ltreetest.sql</> in the source distribution):
  </para>

<programlisting>
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);
</programlisting>

  <para>
   Now, we have a table <structname>test</> populated with data describing
   the hierarchy shown below:
  </para>

<literallayout class="monospaced">
                        Top
                     /   |  \
             Science Hobbies Collections
                 /       |              \
        Astronomy   Amateurs_Astronomy Pictures
           /  \                            |
Astrophysics  Cosmology                Astronomy
                                        /  |    \
                                 Galaxies Stars Astronauts
</literallayout>

  <para>
   We can do inheritance:
<screen>
ltreetest=&gt; SELECT path FROM test WHERE path &lt;@ 'Top.Science';
                path
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(4 rows)
</screen>
  </para>

  <para>
   Here are some examples of path matching:
<screen>
ltreetest=&gt; SELECT path FROM test WHERE path ~ '*.Astronomy.*';
                     path
-----------------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
 Top.Collections.Pictures.Astronomy
 Top.Collections.Pictures.Astronomy.Stars
 Top.Collections.Pictures.Astronomy.Galaxies
 Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)

ltreetest=&gt; SELECT path FROM test WHERE path ~ '*.!pictures@.*.Astronomy.*';
                path
------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(3 rows)
</screen>
  </para>

  <para>
   Here are some examples of full text search:
<screen>
ltreetest=&gt; SELECT path FROM test WHERE path @ 'Astro*% &amp; !pictures@';
                path
------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
 Top.Hobbies.Amateurs_Astronomy
(4 rows)

ltreetest=&gt; SELECT path FROM test WHERE path @ 'Astro* &amp; !pictures@';
                path
------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(3 rows)
</screen>
  </para>

  <para>
   Path construction using functions:
<screen>
ltreetest=&gt; SELECT subpath(path,0,2)||'Space'||subpath(path,2) FROM test WHERE path &lt;@ 'Top.Science.Astronomy';
                 ?column?
------------------------------------------
 Top.Science.Space.Astronomy
 Top.Science.Space.Astronomy.Astrophysics
 Top.Science.Space.Astronomy.Cosmology
(3 rows)
</screen>
  </para>

  <para>
   We could simplify this by creating a SQL function that inserts a label
   at a specified position in a path:
<screen>
CREATE FUNCTION ins_label(ltree, int, text) RETURNS ltree
    AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
    LANGUAGE SQL IMMUTABLE;

ltreetest=&gt; SELECT ins_label(path,2,'Space') FROM test WHERE path &lt;@ 'Top.Science.Astronomy';
                ins_label
------------------------------------------
 Top.Science.Space.Astronomy
 Top.Science.Space.Astronomy.Astrophysics
 Top.Science.Space.Astronomy.Cosmology
(3 rows)
</screen>
  </para>
 </sect2>

 <sect2>
  <title>Transforms</title>

  <para>
   Additional extensions are available that implement transforms for
   the <type>ltree</type> type for PL/Python.  The extensions are
   called <literal>ltree_plpythonu</literal>, <literal>ltree_plpython2u</literal>,
   and <literal>ltree_plpython3u</literal>
   (see <xref linkend="plpython-python23"> for the PL/Python naming
   convention).  If you install these transforms and specify them when
   creating a function, <type>ltree</type> values are mapped to Python lists.
   (The reverse is currently not supported, however.)
  </para>
 </sect2>

 <sect2>
  <title>Authors</title>

  <para>
   All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
   Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
   <ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink> for
   additional information. Authors would like to thank Eugeny Rodichev for
   helpful discussions. Comments and bug reports are welcome.
  </para>
 </sect2>

</sect1>