<!-- doc/src/sgml/xoper.sgml -->

 <sect1 id="xoper">
  <title>User-defined Operators</title>

  <indexterm zone="xoper">
   <primary>operator</primary>
   <secondary>user-defined</secondary>
  </indexterm>

  <para>
   Every operator is <quote>syntactic sugar</quote> for a call to an
   underlying function that does the real work; so you must
   first create the underlying function before you can create
   the operator.  However, an operator is <emphasis>not merely</emphasis>
   syntactic sugar, because it carries additional information
   that helps the query planner optimize queries that use the
   operator.  The next section explains that additional information.
  </para>

  <para>
   <productname>PostgreSQL</productname> supports left unary, right
   unary, and binary operators.  Operators can be
   overloaded;<indexterm><primary>overloading</primary><secondary>operators</secondary></indexterm>
   that is, the same operator name can be used for different operators
   that have different numbers and types of operands.  When a query is
   executed, the system determines the operator to call from the
   number and types of the provided operands.
  </para>

  <para>
   Here is an example of creating an operator for adding two complex
   numbers.  We assume we've already created the definition of type
   <type>complex</type> (see <xref linkend="xtypes">).  First we need a
   function that does the work, then we can define the operator:

<programlisting>
CREATE FUNCTION complex_add(complex, complex)
    RETURNS complex
    AS '<replaceable>filename</replaceable>', 'complex_add'
    LANGUAGE C IMMUTABLE STRICT;

CREATE OPERATOR + (
    leftarg = complex,
    rightarg = complex,
    procedure = complex_add,
    commutator = +
);
</programlisting>
  </para>

  <para>
   Now we could execute a query like this:

<screen>
SELECT (a + b) AS c FROM test_complex;

        c
-----------------
 (5.2,6.05)
 (133.42,144.95)
</screen>
  </para>

  <para>
   We've shown how to create a binary operator here.  To create unary
   operators, just omit one of <literal>leftarg</> (for left unary) or
   <literal>rightarg</> (for right unary).  The <literal>procedure</>
   clause and the argument clauses are the only required items in
   <command>CREATE OPERATOR</command>.  The <literal>commutator</>
   clause shown in the example is an optional hint to the query
   optimizer.  Further details about <literal>commutator</> and other
   optimizer hints appear in the next section.
  </para>
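
  <para>
   As a hedged sketch of the unary case, a left unary (prefix) negation
   operator for <type>complex</type> could be declared by giving only
   <literal>rightarg</>.  The function name <function>complex_neg</function>
   and its C symbol are hypothetical, chosen here only for illustration:

<programlisting>
-- Hypothetical support function; it must exist before the operator does.
CREATE FUNCTION complex_neg(complex)
    RETURNS complex
    AS '<replaceable>filename</replaceable>', 'complex_neg'
    LANGUAGE C IMMUTABLE STRICT;

-- No leftarg clause is given, so this is a left unary operator: - x
CREATE OPERATOR - (
    rightarg = complex,
    procedure = complex_neg
);
</programlisting>

   Because operators can be overloaded, such a definition could coexist
   with a binary <literal>-</> that takes <type>complex</type> operands.
  </para>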
 </sect1>

 <sect1 id="xoper-optimization">
  <title>Operator Optimization Information</title>

  <para>
   A <productname>PostgreSQL</productname> operator definition can include
   several optional clauses that tell the system useful things about how
   the operator behaves.  These clauses should be provided whenever
   appropriate, because they can make for considerable speedups in execution
   of queries that use the operator.  But if you provide them, you must be
   sure that they are right!  Incorrect use of an optimization clause can
   result in slow queries, subtly wrong output, or other Bad Things.
   You can always leave out an optimization clause if you are not sure
   about it; the only consequence is that queries might run slower than
   they need to.
  </para>

  <para>
   Additional optimization clauses might be added in future versions of
   <productname>PostgreSQL</productname>.  The ones described here are all
   the clauses that release &version; understands.
  </para>

  <sect2>
   <title><literal>COMMUTATOR</></title>

   <para>
    The <literal>COMMUTATOR</> clause, if provided, names an operator that is the
    commutator of the operator being defined.  We say that operator A is the
    commutator of operator B if (x A y) equals (y B x) for all possible input
    values x, y.  Notice that B is also the commutator of A.  For example,
    operators <literal>&lt;</> and <literal>&gt;</> for a particular data type are usually each other's
    commutators, and operator <literal>+</> is usually commutative with itself.
    But operator <literal>-</> is usually not commutative with anything.
   </para>

   <para>
    The left operand type of a commutable operator is the same as the
    right operand type of its commutator, and vice versa.  So the name of
    the commutator operator is all that <productname>PostgreSQL</productname>
    needs to be given to look up the commutator, and that's all that needs to
    be provided in the <literal>COMMUTATOR</> clause.
   </para>

   <para>
    It's critical to provide commutator information for operators that
    will be used in indexes and join clauses, because this allows the
    query optimizer to <quote>flip around</> such a clause to the forms
    needed for different plan types.  For example, consider a query with
    a <literal>WHERE</> clause like <literal>tab1.x = tab2.y</>, where <literal>tab1.x</>
    and <literal>tab2.y</> are of a user-defined type, and suppose that
    <literal>tab2.y</> is indexed.  The optimizer cannot generate an
    index scan unless it can determine how to flip the clause around to
    <literal>tab2.y = tab1.x</>, because the index-scan machinery expects
    to see the indexed column on the left of the operator it is given.
    <productname>PostgreSQL</productname> will <emphasis>not</> simply
    assume that this is a valid transformation; the creator of the
    <literal>=</> operator must specify that it is valid, by marking the
    operator with commutator information.
   </para>

   <para>
    When you are defining a self-commutative operator, you just do it.
    When you are defining a pair of commutative operators, things are
    a little trickier: how can the first one to be defined refer to the
    other one, which you haven't defined yet?  There are two solutions
    to this problem:

    <itemizedlist>
     <listitem>
      <para>
       One way is to omit the <literal>COMMUTATOR</> clause in the first operator that
       you define, and then provide one in the second operator's definition.
       Since <productname>PostgreSQL</productname> knows that commutative
       operators come in pairs, when it sees the second definition it will
       automatically go back and fill in the missing <literal>COMMUTATOR</> clause in
       the first definition.
      </para>
     </listitem>

     <listitem>
      <para>
       The other, more straightforward way is just to include <literal>COMMUTATOR</> clauses
       in both definitions.  When <productname>PostgreSQL</productname> processes
       the first definition and realizes that <literal>COMMUTATOR</> refers to a nonexistent
       operator, the system will make a dummy entry for that operator in the
       system catalog.  This dummy entry will have valid data only
       for the operator name, left and right operand types, and result type,
       since that's all that <productname>PostgreSQL</productname> can deduce
       at this point.  The first operator's catalog entry will link to this
       dummy entry.  Later, when you define the second operator, the system
       updates the dummy entry with the additional information from the second
       definition.  If you try to use the dummy operator before it's been filled
       in, you'll just get an error message.
      </para>
     </listitem>
    </itemizedlist>
   </para>
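
   <para>
    The second approach might look like the following sketch.  The type
    <type>mytype</type> and the functions <function>mytype_lt</function>
    and <function>mytype_gt</function> are hypothetical and are assumed to
    exist already; only the <literal>COMMUTATOR</> handling is the point here.

<programlisting>
-- Defined first: the commutator > does not exist yet, so a dummy
-- catalog entry is made for it.
CREATE OPERATOR &lt; (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_lt,
    commutator = &gt;
);

-- Defined second: this supplies the remaining details of the dummy entry.
CREATE OPERATOR &gt; (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_gt,
    commutator = &lt;
);
</programlisting>

    With the first approach, the only difference would be that the
    <literal>commutator</> line is left out of the first definition.
   </para>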
  </sect2>

  <sect2>
   <title><literal>NEGATOR</></title>

   <para>
    The <literal>NEGATOR</> clause, if provided, names an operator that is the
    negator of the operator being defined.  We say that operator A
    is the negator of operator B if both return Boolean results and
    (x A y) equals NOT (x B y) for all possible inputs x, y.
    Notice that B is also the negator of A.
    For example, <literal>&lt;</> and <literal>&gt;=</> are a negator pair for most data types.
    An operator can never validly be its own negator.
   </para>

   <para>
    Unlike commutators, a pair of unary operators could validly be marked
    as each other's negators; that would mean (A x) equals NOT (B x)
    for all x, or the equivalent for right unary operators.
   </para>

   <para>
    An operator's negator must have the same left and/or right operand types
    as the operator to be defined, so just as with <literal>COMMUTATOR</>, only the operator
    name need be given in the <literal>NEGATOR</> clause.
   </para>

   <para>
    Providing a negator is very helpful to the query optimizer since
    it allows expressions like <literal>NOT (x = y)</> to be simplified into
    <literal>x &lt;&gt; y</>.  This comes up more often than you might think, because
    <literal>NOT</> operations can be inserted as a consequence of other rearrangements.
   </para>

   <para>
    Pairs of negator operators can be defined using the same methods
    explained above for commutator pairs.
   </para>
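
   <para>
    As an illustrative sketch, an equality operator and its negator could be
    declared as a pair.  The type <type>mytype</type> and the functions
    <function>mytype_eq</function> and <function>mytype_ne</function> are
    hypothetical, assumed to exist and to return <type>boolean</type>:

<programlisting>
-- The negator <> does not exist yet; a dummy entry stands in for it.
CREATE OPERATOR = (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_eq,
    commutator = =,
    negator = &lt;&gt;
);

-- Completes the pair; each operator is now the negator of the other.
CREATE OPERATOR &lt;&gt; (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_ne,
    commutator = &lt;&gt;,
    negator = =
);
</programlisting>

    Both operators are self-commutative here, which is the usual situation
    for equality and inequality on a single data type.
   </para>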
  </sect2>

  <sect2>
   <title><literal>RESTRICT</></title>

   <para>
    The <literal>RESTRICT</> clause, if provided, names a restriction selectivity
    estimation function for the operator.  (Note that this is a function
    name, not an operator name.)  <literal>RESTRICT</> clauses only make sense for
    binary operators that return <type>boolean</>.  The idea behind a restriction
    selectivity estimator is to guess what fraction of the rows in a
    table will satisfy a <literal>WHERE</literal>-clause condition of the form:
<programlisting>
column OP constant
</programlisting>
    for the current operator and a particular constant value.
    This assists the optimizer by
    giving it some idea of how many rows will be eliminated by <literal>WHERE</>
    clauses that have this form.  (What happens if the constant is on
    the left, you might be wondering?  Well, that's one of the things that
    <literal>COMMUTATOR</> is for...)
   </para>

   <para>
    Writing new restriction selectivity estimation functions is far beyond
    the scope of this chapter, but fortunately you can usually just use
    one of the system's standard estimators for many of your own operators.
    These are the standard restriction estimators:
    <simplelist>
     <member><function>eqsel</> for <literal>=</></member>
     <member><function>neqsel</> for <literal>&lt;&gt;</></member>
     <member><function>scalarltsel</> for <literal>&lt;</> or <literal>&lt;=</></member>
     <member><function>scalargtsel</> for <literal>&gt;</> or <literal>&gt;=</></member>
    </simplelist>
    It might seem a little odd that these are the categories, but they
    make sense if you think about it.  <literal>=</> will typically accept only
    a small fraction of the rows in a table; <literal>&lt;&gt;</> will typically reject
    only a small fraction.  <literal>&lt;</> will accept a fraction that depends on
    where the given constant falls in the range of values for that table
    column (which, it just so happens, is information collected by
    <command>ANALYZE</command> and made available to the selectivity estimator).
    <literal>&lt;=</> will accept a slightly larger fraction than <literal>&lt;</> for the same
    comparison constant, but they're close enough to not be worth
    distinguishing, especially since we're not likely to do better than a
    rough guess anyhow.  Similar remarks apply to <literal>&gt;</> and <literal>&gt;=</>.
   </para>

   <para>
    You can frequently get away with using either <function>eqsel</function> or <function>neqsel</function> for
    operators that have very high or very low selectivity, even if they
    aren't really equality or inequality.  For example, the
    approximate-equality geometric operators use <function>eqsel</function> on the assumption that
    they'll usually only match a small fraction of the entries in a table.
   </para>
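
   <para>
    A minimal sketch of borrowing a standard estimator in this way is shown
    below.  The type <type>mytype</type>, the function
    <function>mytype_approx_eq</function>, and the choice of the operator
    name <literal>~=</> are all assumptions made for illustration; the
    operator is taken to match only a small fraction of rows.

<programlisting>
-- Hypothetical approximate-equality operator that reuses the standard
-- equality estimator, since it is expected to be highly selective.
CREATE OPERATOR ~= (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_approx_eq,
    commutator = ~=,
    restrict = eqsel
);
</programlisting>

    If the operator turned out to match a large share of the rows instead,
    <function>eqsel</function> would be a poor fit and the clause would be
    better left out.
   </para>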

   <para>
    You can use <function>scalarltsel</> and <function>scalargtsel</> for comparisons on data types that
    have some sensible means of being converted into numeric scalars for
    range comparisons.  If possible, add the data type to those understood
    by the function <function>convert_to_scalar()</function> in <filename>src/backend/utils/adt/selfuncs.c</filename>.
    (Eventually, this function should be replaced by per-data-type functions
    identified through a column of the <classname>pg_type</> system catalog; but that hasn't happened
    yet.)  If you do not do this, things will still work, but the optimizer's
    estimates won't be as good as they could be.
   </para>

   <para>
    There are additional selectivity estimation functions designed for geometric
    operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>, <function>positionsel</function>,
    and <function>contsel</function>.  At this writing these are just stubs, but you might want
    to use them (or even better, improve them) anyway.
   </para>
  </sect2>

  <sect2>
   <title><literal>JOIN</></title>

   <para>
    The <literal>JOIN</> clause, if provided, names a join selectivity
    estimation function for the operator.  (Note that this is a function
    name, not an operator name.)  <literal>JOIN</> clauses only make sense for
    binary operators that return <type>boolean</type>.  The idea behind a join
    selectivity estimator is to guess what fraction of the rows in a
    pair of tables will satisfy a <literal>WHERE</>-clause condition of the form:
<programlisting>
table1.column1 OP table2.column2
</programlisting>
    for the current operator.  As with the <literal>RESTRICT</literal> clause, this helps
    the optimizer very substantially by letting it figure out which
    of several possible join sequences is likely to take the least work.
   </para>

   <para>
    As before, this chapter will make no attempt to explain how to write
    a join selectivity estimator function, but will just suggest that
    you use one of the standard estimators if one is applicable:
    <simplelist>
     <member><function>eqjoinsel</> for <literal>=</></member>
     <member><function>neqjoinsel</> for <literal>&lt;&gt;</></member>
     <member><function>scalarltjoinsel</> for <literal>&lt;</> or <literal>&lt;=</></member>
     <member><function>scalargtjoinsel</> for <literal>&gt;</> or <literal>&gt;=</></member>
     <member><function>areajoinsel</> for 2D area-based comparisons</member>
     <member><function>positionjoinsel</> for 2D position-based comparisons</member>
     <member><function>contjoinsel</> for 2D containment-based comparisons</member>
    </simplelist>
   </para>
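
   <para>
    Restriction and join estimators are usually supplied together.  The
    following sketch assumes a hypothetical ordered type
    <type>mytype</type> and a hypothetical function
    <function>mytype_le</function>; the estimator names are the standard
    ones listed above.

<programlisting>
-- Hypothetical less-than-or-equal operator using the standard
-- scalar estimators for both restriction and join clauses.
CREATE OPERATOR &lt;= (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_le,
    commutator = &gt;=,
    negator = &gt;,
    restrict = scalarltsel,
    join = scalarltjoinsel
);
</programlisting>
   </para>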
  </sect2>

  <sect2>
   <title><literal>HASHES</></title>

   <para>
    The <literal>HASHES</literal> clause, if present, tells the system that
    it is permissible to use the hash join method for a join based on this
    operator.  <literal>HASHES</> only makes sense for a binary operator that
    returns <literal>boolean</>, and in practice the operator must represent
    equality for some data type or pair of data types.
   </para>

   <para>
    The assumption underlying hash join is that the join operator can
    only return true for pairs of left and right values that hash to the
    same hash code.  If two values get put in different hash buckets, the
    join will never compare them at all, implicitly assuming that the
    result of the join operator must be false.  So it never makes sense
    to specify <literal>HASHES</literal> for operators that do not represent
    some form of equality.  In most cases it is only practical to support
    hashing for operators that take the same data type on both sides.
    However, sometimes it is possible to design compatible hash functions
    for two or more data types; that is, functions that will generate the
    same hash codes for <quote>equal</> values, even though the values
    have different representations.  For example, it's fairly simple
    to arrange this property when hashing integers of different widths.
   </para>

   <para>
    To be marked <literal>HASHES</literal>, the join operator must appear
    in a hash index operator family.  This is not enforced when you create
    the operator, since of course the referencing operator family couldn't
    exist yet.  But attempts to use the operator in hash joins will fail
    at run time if no such operator family exists.  The system needs the
    operator family to find the data-type-specific hash function(s) for the
    operator's input data type(s).  Of course, you must also create suitable
    hash functions before you can create the operator family.
   </para>
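
   <para>
    The order of steps described above could be sketched as follows.  This
    is a standalone illustration: the type <type>mytype</type>, the
    functions <function>mytype_eq</function> and
    <function>mytype_hash</function>, and the operator class name
    <literal>mytype_hash_ops</literal> are all hypothetical, and the two
    functions are assumed to exist already.

<programlisting>
-- Step 1: the equality operator, marked as hash-joinable.
CREATE OPERATOR = (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_eq,
    commutator = =,
    restrict = eqsel,
    join = eqjoinsel,
    hashes
);

-- Step 2: a hash operator class (and with it an operator family) that
-- ties the operator to the hash function for the type.
CREATE OPERATOR CLASS mytype_hash_ops
    DEFAULT FOR TYPE mytype USING hash AS
        OPERATOR 1 = ,
        FUNCTION 1 mytype_hash(mytype);
</programlisting>

    Until the second step has been carried out, the
    <literal>HASHES</literal> marking on the operator has nothing to refer to.
   </para>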

   <para>
    Care should be exercised when preparing a hash function, because there
    are machine-dependent ways in which it might fail to do the right thing.
    For example, if your data type is a structure in which there might be
    uninteresting pad bits, you cannot simply pass the whole structure to
    <function>hash_any</>.  (Unless you write your other operators and
    functions to ensure that the unused bits are always zero, which is the
    recommended strategy.)
    Another example is that on machines that meet the <acronym>IEEE</>
    floating-point standard, negative zero and positive zero are different
    values (different bit patterns) but they are defined to compare equal.
    If a float value might contain negative zero then extra steps are needed
    to ensure it generates the same hash value as positive zero.
   </para>
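
   <para>
    The negative-zero requirement can be checked from SQL.  The query below
    is only a sketch; it assumes that the built-in hash support function
    for <type>float8</type> is callable as <function>hashfloat8</function>.
    A hash function that meets the requirement should agree with the
    equality operator, so both columns are expected to come out the same.

<programlisting>
-- Compare the two zeros with the equality operator and with the hash
-- function; a mismatch between the columns would indicate a hash
-- function that is unsafe for hash joins.
SELECT float8 '-0' = float8 '0' AS compare_equal,
       hashfloat8(float8 '-0') = hashfloat8(float8 '0') AS same_hash;
</programlisting>

    The same kind of check can be run against a hash function written for a
    user-defined type, using pairs of values that compare equal but have
    different internal representations.
   </para>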

   <para>
    A hash-joinable operator must have a commutator (itself if the two
    operand data types are the same, or a related equality operator
    if they are different) that appears in the same operator family.
    If this is not the case, planner errors might occur when the operator
    is used.  Also, it is a good idea (but not strictly required) for
    a hash operator family that supports multiple data types to provide
    equality operators for every combination of the data types; this
    allows better optimization.
   </para>

   <note>
    <para>
     The function underlying a hash-joinable operator must be marked
     immutable or stable.  If it is volatile, the system will never
     attempt to use the operator for a hash join.
    </para>
   </note>

   <note>
    <para>
     If a hash-joinable operator has an underlying function that is marked
     strict, the
     function must also be complete: that is, it should return true or
     false, never null, for any two nonnull inputs.  If this rule is
     not followed, hash-optimization of <literal>IN</> operations might
     generate wrong results.  (Specifically, <literal>IN</> might return
     false where the correct answer according to the standard would be null;
     or it might yield an error complaining that it wasn't prepared for a
     null result.)
    </para>
   </note>

  </sect2>

  <sect2>
   <title><literal>MERGES</></title>

   <para>
    The <literal>MERGES</literal> clause, if present, tells the system that
    it is permissible to use the merge-join method for a join based on this
    operator.  <literal>MERGES</> only makes sense for a binary operator that
    returns <literal>boolean</>, and in practice the operator must represent
    equality for some data type or pair of data types.
   </para>

   <para>
    Merge join is based on the idea of sorting the left- and right-hand tables
    into order and then scanning them in parallel.  So, both data types must
    be capable of being fully ordered, and the join operator must be one
    that can only succeed for pairs of values that fall at the
    <quote>same place</>
    in the sort order.  In practice this means that the join operator must
    behave like equality.  But it is possible to merge-join two
    distinct data types so long as they are logically compatible.  For
    example, the <type>smallint</type>-versus-<type>integer</type>
    equality operator is merge-joinable.
    We only need sorting operators that will bring both data types into a
    logically compatible sequence.
   </para>

   <para>
    To be marked <literal>MERGES</literal>, the join operator must appear
    as an equality member of a <literal>btree</> index operator family.
    This is not enforced when you create
    the operator, since of course the referencing operator family couldn't
    exist yet.  But the operator will not actually be used for merge joins
    unless a matching operator family can be found.  The
    <literal>MERGES</literal> flag thus acts as a hint to the planner that
    it's worth looking for a matching operator family.
   </para>
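
   <para>
    A matching operator family might be set up along the following lines.
    This is a sketch under stated assumptions: the type
    <type>mytype</type>, its five comparison operators, the comparison
    function <function>mytype_cmp</function>, and the operator class name
    <literal>mytype_btree_ops</literal> are hypothetical, and the
    <literal>=</> operator is assumed to have been created with the
    <literal>MERGES</literal> clause.

<programlisting>
-- Hypothetical btree operator class; the = operator appears as the
-- equality member (strategy 3), which is what a merge join looks for.
CREATE OPERATOR CLASS mytype_btree_ops
    DEFAULT FOR TYPE mytype USING btree AS
        OPERATOR 1 &lt; ,
        OPERATOR 2 &lt;= ,
        OPERATOR 3 = ,
        OPERATOR 4 &gt;= ,
        OPERATOR 5 &gt; ,
        FUNCTION 1 mytype_cmp(mytype, mytype);
</programlisting>
   </para>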

   <para>
    A merge-joinable operator must have a commutator (itself if the two
    operand data types are the same, or a related equality operator
    if they are different) that appears in the same operator family.
    If this is not the case, planner errors might occur when the operator
    is used.  Also, it is a good idea (but not strictly required) for
    a <literal>btree</> operator family that supports multiple data types to provide
    equality operators for every combination of the data types; this
    allows better optimization.
   </para>

   <note>
    <para>
     The function underlying a merge-joinable operator must be marked
     immutable or stable.  If it is volatile, the system will never
     attempt to use the operator for a merge join.
    </para>
   </note>
  </sect2>
 </sect1>