Type Conversion
SQL queries can, intentionally or not, require
mixing of different data types in the same expression.
Postgres has extensive facilities for
evaluating mixed-type expressions.
In many cases a user will not need
to understand the details of the type conversion mechanism.
However, the implicit conversions done by Postgres
can affect the apparent results of a query, and these results
can be tailored by a user or programmer
using explicit type coercion.
This chapter introduces the Postgres
type conversion mechanisms and conventions.
Refer to the relevant sections in the User's Guide and Programmer's Guide
for more information on specific data types and allowed functions and operators.
The Programmer's Guide has more details on the exact algorithms used for
implicit type conversion and coercion.
Overview
SQL is a strongly typed language. That is, every data item
has an associated data type which determines its behavior and allowed usage.
Postgres has an extensible type system which is
much more general and flexible than other RDBMS implementations.
Hence, most type conversion behavior in Postgres
should be governed by general rules rather than by ad-hoc heuristics to allow
mixed-type expressions to be meaningful, even with user-defined types.
The Postgres scanner/parser decodes lexical elements
into only five fundamental categories: integers, floats, strings, names, and keywords.
Most extended types are first tokenized into strings. The SQL
language definition allows specifying type names with strings, and this mechanism
is used by Postgres
to start the parser down the correct path. For example, the query
tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
Label | Value
--------+-------
Origin | (0,0)
(1 row)
has two strings, of type text and point.
If a type is not specified, then the placeholder type unknown
is assigned initially, to be resolved in later stages as described below.
There are four fundamental SQL constructs requiring
distinct type conversion rules in the Postgres
parser:
Operators
Postgres allows expressions with
left- and right-unary (one argument) operators,
as well as binary (two argument) operators.
Function calls
Much of the Postgres type system is built around a rich set of
functions. Function calls have one or more arguments which, for any specific query,
must be matched to the functions available in the system catalog.
Query targets
SQL INSERT statements place the results of query into a table. The expressions
in the query must be matched up with, and perhaps converted to, the target columns of the insert.
UNION queries
Since all select results from a UNION SELECT statement must appear in a single set of columns, the types
of each SELECT clause must be matched up and converted to a uniform set.
Many of the general type conversion rules use simple conventions built on
the Postgres function and operator system tables.
There are some heuristics included in the conversion rules to better support
conventions for the SQL92 standard native types such as
smallint, integer, and float.
The Postgres parser uses the convention that all
type conversion functions take a single argument of the source type and are
named with the same name as the target type. Any function meeting this
criteria is considered to be a valid conversion function, and may be used
by the parser as such. This simple assumption gives the parser the power
to explore type conversion possibilities without hardcoding, allowing
extended user-defined types to use these same features transparently.
An additional heuristic is provided in the parser to allow better guesses
at proper behavior for SQL standard types. There are
five categories of types defined: boolean, string, numeric, geometric,
and user-defined. Each category, with the exception of user-defined, has
a "preferred type" which is used to resolve ambiguities in candidates.
Each "user-defined" type is its own "preferred type", so ambiguous
expressions (those with multiple candidate parsing solutions)
with only one user-defined type can resolve to a single best choice, while those with
multiple user-defined types will remain ambiguous and throw an error.
Ambiguous expressions which have candidate solutions within only one type category are
likely to resolve, while ambiguous expressions with candidates spanning multiple
categories are likely to throw an error and ask for clarification from the user.
Guidelines
All type conversion rules are designed with several principles in mind:
Implicit conversions should never have surprising or unpredictable outcomes.
User-defined types, of which the parser has no a-priori knowledge, should be
"higher" in the type hierarchy. In mixed-type expressions, native types shall always
be converted to a user-defined type (of course, only if conversion is necessary).
User-defined types are not related. Currently, Postgres
does not have information available to it on relationships between types, other than
hardcoded heuristics for built-in types and implicit relationships based on available functions
in the catalog.
There should be no extra overhead from the parser or executor
if a query does not need implicit type conversion.
That is, if a query is well formulated and the types already match up, then the query should proceed
without spending extra time in the parser and without introducing unnecessary implicit conversion
functions into the query.
Additionally, if a query usually requires an implicit conversion for a function, and
if then the user defines an explicit function with the correct argument types, the parser
should use this new function and will no longer do the implicit conversion using the old function.
Operators
Conversion Procedure
Operator Evaluation
Check for an exact match in the pg_operator system catalog.
If one argument of a binary operator is unknown,
then assume it is the same type as the other argument.
Reverse the arguments, and look for an exact match with an operator which
points to itself as being commutative.
If found, then reverse the arguments in the parse tree and use this operator.
Look for the best match.
Make a list of all operators of the same name.
If only one operator is in the list, use it if the input type can be coerced,
and throw an error if the type cannot be coerced.
Keep all operators with the most explicit matches for types. Keep all if there
are no explicit matches and move to the next step.
If only one candidate remains, use it if the type can be coerced.
If any input arguments are "unknown", categorize the input candidates as
boolean, numeric, string, geometric, or user-defined. If there is a mix of
categories, or more than one user-defined type, throw an error because
the correct choice cannot be deduced without more clues.
If only one category is present, then assign the "preferred type"
to the input column which had been previously "unknown".
Choose the candidate with the most exact type matches, and which matches
the "preferred type" for each column category from the previous step.
If there is still more than one candidate, or if there are none,
then throw an error.
Examples
Exponentiation Operator
There is only one exponentiation
operator defined in the catalog, and it takes float8 arguments.
The scanner assigns an initial type of int4 to both arguments
of this query expression:
tgl=> select 2 ^ 3 AS "Exp";
Exp
-----
8
(1 row)
So the parser does a type conversion on both operands and the query
is equivalent to
tgl=> select float8(2) ^ float8(3) AS "Exp";
Exp
-----
8
(1 row)
or
tgl=> select 2.0 ^ 3.0 AS "Exp";
Exp
-----
8
(1 row)
This last form has the least overhead, since no functions are called to do
implicit type conversion. This is not an issue for small queries, but may
have an impact on the performance of queries involving large tables.
String Concatenation
A string-like syntax is used for working with string types as well as for
working with complex extended types.
Strings with unspecified type are matched with likely operator candidates.
One unspecified argument:
tgl=> SELECT text 'abc' || 'def' AS "Text and Unknown";
Text and Unknown
------------------
abcdef
(1 row)
In this case the parser looks to see if there is an operator taking text
for both arguments. Since there is, it assumes that the second argument should
be interpreted as of type text.
Concatenation on unspecified types:
tgl=> SELECT 'abc' || 'def' AS "Unspecified";
Unspecified
-------------
abcdef
(1 row)
In this case there is no initial hint for which type to use, since no types
are specified in the query. So, the parser looks for all candidate operators
and finds that all arguments for all the candidates are string types. It chooses
the "preferred type" for strings, text, for this query.
If a user defines a new type and defines an operator ||
to work
with it, then this query would no longer succeed as written. The parser would
now have candidate types from two categories, and could not decide which to use.
Factorial
This example illustrates an interesting result. Traditionally, the
factorial operator is defined for integers only. The Postgres
operator catalog has only one entry for factorial, taking an integer operand.
If given a non-integer numeric argument, Postgres
will try to convert that argument to an integer for evaluation of the
factorial.
tgl=> select (4.3 !);
?column?
----------
24
(1 row)
Of course, this leads to a mathematically suspect result,
since in principle the factorial of a non-integer is not defined.
However, the role of a database is not to teach mathematics, but
to be a tool for data manipulation. If a user chooses to take the
factorial of a floating point number, Postgres
will try to oblige.
Functions
Function Evaluation
Check for an exact match in the pg_proc system catalog.
Look for the best match.
Make a list of all functions of the same name with the same number of arguments.
If only one function is in the list, use it if the input types can be coerced,
and throw an error if the types cannot be coerced.
Keep all functions with the most explicit matches for types. Keep all if there
are no explicit matches and move to the next step.
If only one candidate remains, use it if the type can be coerced.
If any input arguments are "unknown", categorize the input candidate arguments as
boolean, numeric, string, geometric, or user-defined. If there is a mix of
categories, or more than one user-defined type, throw an error because
the correct choice cannot be deduced without more clues.
If only one category is present, then assign the "preferred type"
to the input column which had been previously "unknown".
Choose the candidate with the most exact type matches, and which matches
the "preferred type" for each column category from the previous step.
If there is still more than one candidate, or if there are none,
then throw an error.
Examples
Factorial Function
There is only one factorial function defined in the pg_proc catalog.
So the following query automatically converts the int2 argument
to int4:
tgl=> select int4fac(int2 '4');
int4fac
---------
24
(1 row)
and is actually transformed by the parser to
tgl=> select int4fac(int4(int2 '4'));
int4fac
---------
24
(1 row)
Substring Function
There are two substr functions declared in pg_proc. However,
only one takes two arguments, of types text and int4.
If called with a string constant of unspecified type, the type is matched up
directly with the only candidate function type:
tgl=> select substr('1234', 3);
substr
--------
34
(1 row)
If the string is declared to be of type varchar, as might be the case
if it comes from a table, then the parser will try to coerce it to become text:
tgl=> select substr(varchar '1234', 3);
substr
--------
34
(1 row)
which is transformed by the parser to become
tgl=> select substr(text(varchar '1234'), 3);
substr
--------
34
(1 row)
There are some heuristics in the parser to optimize the relationship between the
char, varchar, and text types.
For this case, substr is called directly with the varchar string
rather than inserting an explicit conversion call.
And, if the function is called with an int4, the parser will
try to convert that to text:
tgl=> select substr(1234, 3);
substr
--------
34
(1 row)
actually executes as
tgl=> select substr(text(1234), 3);
substr
--------
34
(1 row)
Query Targets
Target Evaluation
Check for an exact match with the target.
Try to coerce the expression directly to the target type if necessary.
If the target is a fixed-length type (e.g. char or varchar
declared with a length) then try to find a sizing function of the same name
as the type taking two arguments, the first the type name and the second an
integer length.
Examples
varchar Storage
For a target column declared as varchar(4) the following query
ensures that the target is sized correctly:
tgl=> CREATE TABLE vv (v varchar(4));
CREATE
tgl=> INSERT INTO vv SELECT 'abc' || 'def';
INSERT 392905 1
tgl=> SELECT * FROM vv;
v
------
abcd
(1 row)
UNION Queries
The UNION construct is somewhat different in that it must match up
possibly dissimilar types to become a single result set.
UNION Evaluation
Check for identical types for all results.
Coerce each result from the UNION clauses to match the type of the
first SELECT clause or the target column.
Examples
Underspecified Types
tgl=> SELECT text 'a' AS "Text" UNION SELECT 'b';
Text
------
a
b
(2 rows)
Simple UNION
tgl=> SELECT 1.2 AS "Float8" UNION SELECT 1;
Float8
--------
1
1.2
(2 rows)
Transposed UNION
The types of the union are forced to match the types of
the first/top clause in the union:
tgl=> SELECT 1 AS "All integers"
tgl-> UNION SELECT '2.2'::float4
tgl-> UNION SELECT 3.3;
All integers
--------------
1
2
3
(3 rows)
An alternate parser strategy could be to choose the "best" type of the bunch, but
this is more difficult because of the nice recursion technique used in the
parser. However, the "best" type is used when selecting into
a table:
tgl=> CREATE TABLE ff (f float);
CREATE
tgl=> INSERT INTO ff
tgl-> SELECT 1
tgl-> UNION SELECT '2.2'::float4
tgl-> UNION SELECT 3.3;
INSERT 0 3
tgl=> SELECT f AS "Floating point" from ff;
Floating point
------------------
1
2.20000004768372
3.3
(3 rows)