From 728c49779ba885d72f298d549fceb89789078b55 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 12 Sep 2019 16:52:03 -0300 Subject: [PATCH] Improve documentation about our XML functionality Add a section explaining how our XML features depart from current versions of the SQL standard. Update and clarify the descriptions of some XML functions. This is a backpatch for branches 10 and 11, taken from Tom's commit 12d46ac392d0 for 12, then edited to correctly describe behaviors that are fixed in 12 but still broken in 10 and 11. Chapman Flack, reviewed by Ryan Lambert Discussion: https://postgr.es/m/5BD1284C.1010305@anastigmatix.net Discussion: https://postgr.es/m/5C81F8C0.6090901@anastigmatix.net Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com --- doc/src/sgml/datatype.sgml | 5 + doc/src/sgml/features.sgml | 381 ++++++++++++++++++++++++++- doc/src/sgml/func.sgml | 184 +++++++------ src/backend/catalog/sql_features.txt | 6 +- 4 files changed, 490 insertions(+), 86 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 401a2f07ae..fa505e0c47 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4228,6 +4228,11 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11 value is a full document or only a content fragment. + + Limits and compatibility notes for the xml data type + can be found in . + + Creating XML Values diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml index 6c22d69867..253ec879f3 100644 --- a/doc/src/sgml/features.sgml +++ b/doc/src/sgml/features.sgml @@ -16,7 +16,8 @@ Language SQL. A revised version of the standard is released from time to time; the most recent update appearing in 2011. The 2011 version is referred to as ISO/IEC 9075:2011, or simply as SQL:2011. - The versions prior to that were SQL:2008, SQL:2003, SQL:1999, and SQL-92. Each version + The versions prior to that were SQL:2008, SQL:2006, SQL:2003, SQL:1999, + and SQL-92. Each version replaces the previous one, so claims of conformance to earlier versions have no official merit. PostgreSQL development aims for @@ -155,4 +156,382 @@ + + XML Limits and Conformance to SQL/XML + + + SQL/XML + limits and conformance + + + + Significant revisions to the XML-related specifications in ISO/IEC 9075-14 + (SQL/XML) were introduced with SQL:2006. + PostgreSQL's implementation of the XML data + type and related functions largely follows the earlier 2003 edition, + with some borrowing from later editions. In particular: + + + + Where the current standard provides a family of XML data types + to hold document or content in + untyped or XML Schema-typed variants, and a type + XML(SEQUENCE) to hold arbitrary pieces of XML content, + PostgreSQL provides the single + xml type, which can hold document or + content. There is no equivalent of the + standard's sequence type. + + + + + + PostgreSQL provides two functions + introduced in SQL:2006, but in variants that use the XPath 1.0 + language, rather than XML Query as specified for them in the + standard. + + + + + + + This section presents some of the resulting differences you may encounter. + + + + Queries are restricted to XPath 1.0 + + + The PostgreSQL-specific functions + xpath() and xpath_exists() + query XML documents using the XPath language. + PostgreSQL also provides XPath-only variants + of the standard functions XMLEXISTS and + XMLTABLE, which officially use + the XQuery language. For all of these functions, + PostgreSQL relies on the + libxml2 library, which provides only XPath 1.0. + + + + There is a strong connection between the XQuery language and XPath + versions 2.0 and later: any expression that is syntactically valid and + executes successfully in both produces the same result (with a minor + exception for expressions containing numeric character references or + predefined entity references, which XQuery replaces with the + corresponding character while XPath leaves them alone). But there is + no such connection between these languages and XPath 1.0; it was an + earlier language and differs in many respects. + + + + There are two categories of limitation to keep in mind: the restriction + from XQuery to XPath for the functions specified in the SQL standard, and + the restriction of XPath to version 1.0 for both the standard and the + PostgreSQL-specific functions. + + + + Restriction of XQuery to XPath + + + Features of XQuery beyond those of XPath include: + + + + + XQuery expressions can construct and return new XML nodes, in + addition to all possible XPath values. XPath can create and return + values of the atomic types (numbers, strings, and so on) but can + only return XML nodes that were already present in documents + supplied as input to the expression. + + + + + + XQuery has control constructs for iteration, sorting, and grouping. + + + + + + XQuery allows declaration and use of local functions. + + + + + + + Recent XPath versions begin to offer capabilities overlapping with + these (such as functional-style for-each and + sort, anonymous functions, and + parse-xml to create a node from a string), + but such features were not available before XPath 3.0. + + + + + Restriction of XPath to 1.0 + + + For developers familiar with XQuery and XPath 2.0 or later, XPath 1.0 + presents a number of differences to contend with: + + + + + The fundamental type of an XQuery/XPath expression, the + sequence, which can contain XML nodes, atomic values, + or both, does not exist in XPath 1.0. A 1.0 expression can only + produce a node-set (containing zero or more XML nodes), or a single + atomic value. + + + + + + Unlike an XQuery/XPath sequence, which can contain any desired + items in any desired order, an XPath 1.0 node-set has no + guaranteed order and, like any set, does not allow multiple + appearances of the same item. + + + The libxml2 library does seem to + always return node-sets to PostgreSQL + with their members in the same relative order they had in the + input document. Its documentation does not commit to this + behavior, and an XPath 1.0 expression cannot control it. + + + + + + + + While XQuery/XPath provides all of the types defined in XML Schema + and many operators and functions over those types, XPath 1.0 has only + node-sets and the three atomic types boolean, + double, and string. + + + + + + XPath 1.0 has no conditional operator. An XQuery/XPath expression + such as if ( hat ) then hat/@size else "no hat" + has no XPath 1.0 equivalent. + + + + + + XPath 1.0 has no ordering comparison operator for strings. Both + "cat" < "dog" and + "cat" > "dog" are false, because each is a + numeric comparison of two NaNs. In contrast, + = and != do compare the strings + as strings. + + + + + + XPath 1.0 blurs the distinction between + value comparisons and + general comparisons as XQuery/XPath define + them. Both sale/@hatsize = 7 and + sale/@customer = "alice" are existentially + quantified comparisons, true if there is + any sale with the given value for the + attribute, but sale/@taxable = false() is a + value comparison to the + effective boolean value of a whole node-set. + It is true only if no sale has + a taxable attribute at all. + + + + + + In the XQuery/XPath data model, a document + node can have either document form (i.e., exactly one + top-level element, with only comments and processing instructions + outside of it) or content form (with those constraints + relaxed). Its equivalent in XPath 1.0, the + root node, can only be in document form. + This is part of the reason an xml value passed as the + context item to any PostgreSQL + XPath-based function must be in document form. + + + + + + + The differences highlighted here are not all of them. In XQuery and + the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility + mode, and the W3C lists of + function library changes + and + language changes + applied in that mode offer a more complete (but still not exhaustive) + account of the differences. The compatibility mode cannot make the + later languages exactly equivalent to XPath 1.0. + + + + + Mappings between SQL and XML data types and values + + + In SQL:2006 and later, both directions of conversion between standard SQL + data types and the XML Schema types are specified precisely. However, the + rules are expressed using the types and semantics of XQuery/XPath, and + have no direct application to the different data model of XPath 1.0. + + + + When PostgreSQL maps SQL data values to XML + (as in xmlelement), or XML to SQL (as in the output + columns of xmltable), except for a few cases + treated specially, PostgreSQL simply assumes + that the XML data type's XPath 1.0 string form will be valid as the + text-input form of the SQL datatype, and conversely. This rule has the + virtue of simplicity while producing, for many data types, results similar + to the mappings specified in the standard. In this release, + an explicit cast is needed if an xmltable column + expression produces a boolean or double value; see + . + + + + Where interoperability with other systems is a concern, for some data + types, it may be necessary to use data type formatting functions (such + as those in ) explicitly to + produce the standard mappings. + + + + + + + Incidental limits of the implementation + + + + This section concerns limits that are not inherent in the + libxml2 library, but apply to the current + implementation in PostgreSQL. + + + + + Cast needed for <function>xmltable</function> column + of boolean or double type + + + + An xmltable column expression evaluating to an XPath + boolean or number result will produce an unexpected XPath object + type error. The workaround is to rewrite the column expression to + be inside the XPath string function; + PostgreSQL will then assign the string value + successfully to an SQL output column of boolean or double type. + + + + + + Column path result or SQL result column of XML type + + + + In this release, a xmltable column expression + that evaluates to an XML node-set can be assigned to an SQL result + column of XML type, producing a concatenation of: for most types of + node in the node-set, a text node containing the XPath 1.0 + string-value of the node, but for an element node, + a copy of the node itself. Such a node-set may be assigned to an SQL + column of non-XML type only if the node-set has a single node, with the + string-value of most node types replaced with an empty string, the + string-value of an element node replaced with a concatenation of only its + direct text-node children (excluding those of descendants), and the + string-value of a text or attribute node being as defined in XPath 1.0. + An XPath string value assigned to a result column of XML type must be + parsable as XML. + + + + It is best not to develop code that relies on these behaviors, which have + little resemblance to the spec, and are changed in + PostgreSQL 12. + + + + + Only <literal>BY VALUE</literal> passing mechanism is supported + + + The SQL standard defines two passing mechanisms + that apply when passing an XML argument from SQL to an XML function or + receiving a result: BY REF, in which a particular XML + value retains its node identity, and BY VALUE, in which + the content of the XML is passed but node identity is not preserved. A + mechanism can be specified before a list of parameters, as the default + mechanism for all of them, or after any parameter, to override the + default. + + + + To illustrate the difference, if + x is an XML value, these two queries in + an SQL:2006 environment would produce true and false, respectively: + + +SELECT XMLQUERY('$a is $b' PASSING BY REF x AS a, x AS b NULL ON EMPTY); +SELECT XMLQUERY('$a is $b' PASSING BY VALUE x AS a, x AS b NULL ON EMPTY); + + + + + In this release, PostgreSQL will accept + BY REF in an + XMLEXISTS or XMLTABLE + construct, but will ignore it. The xml data type holds + a character-string serialized representation, so there is no node + identity to preserve, and passing is always effectively BY + VALUE. + + + + + Cannot pass named parameters to queries + + + The XPath-based functions support passing one parameter to serve as the + XPath expression's context item, but do not support passing additional + values to be available to the expression as named parameters. + + + + + No <type>XML(SEQUENCE)</type> type + + + The PostgreSQL xml data type + can only hold a value in DOCUMENT + or CONTENT form. An XQuery/XPath expression + context item must be a single XML node or atomic value, but XPath 1.0 + further restricts it to be only an XML node, and has no node type + allowing CONTENT. The upshot is that a + well-formed DOCUMENT is the only form of XML value + that PostgreSQL can supply as an XPath + context item. + + + + + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index f81b163bf8..5b1a2b4190 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10045,16 +10045,25 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + XML Functions + + XML Functions + + The functions and function-like expressions described in this - section operate on values of type xml. Check xml. See for information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from - type xml are not repeated here. Use of most of these - functions requires the installation to have been built + type xml are documented there, not in this section. + + + + Use of most of these functions + requires PostgreSQL to have been built with configure --with-libxml. @@ -10249,8 +10258,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar), encoding, depending on the setting of the configuration parameter . The particular behavior for individual data types is expected to evolve in order to align the - SQL and PostgreSQL data types with the XML Schema specification, - at which point a more precise description will appear. + PostgreSQL mappings with those specified in SQL:2006 and later, + as discussed in . @@ -10492,10 +10501,13 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab; - The function xmlexists returns true if the - XPath expression in the first argument returns any nodes, and - false otherwise. (If either argument is null, the result is - null.) + The function xmlexists evaluates an XPath 1.0 + expression (the first argument), with the passed XML value as its context + item. The function returns false if the result of that evaluation + yields an empty node-set, true if it yields any other value. The + function returns null if any argument is null. A nonnull value + passed as the context item must be an XML document, not a content + fragment or any non-XML value. @@ -10511,14 +10523,14 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF 'Tor - The BY REF clauses have no effect in - PostgreSQL, but are allowed for SQL conformance and compatibility - with other implementations. Per SQL standard, the - first BY REF is required, the second is - optional. Also note that the SQL standard specifies - the xmlexists construct to take an XQuery - expression as first argument, but PostgreSQL currently only - supports XPath, which is a subset of XQuery. + The BY REF clauses + are accepted in PostgreSQL, but are ignored, + as discussed in . + In the SQL standard, the xmlexists function + evaluates an expression in the XML Query language, + but PostgreSQL allows only an XPath 1.0 + expression, as discussed in + . @@ -10624,12 +10636,12 @@ SELECT xml_is_well_formed_document('test The function xpath_exists is a specialized form of the xpath function. Instead of returning the - individual XML values that satisfy the XPath, this function returns a - Boolean indicating whether the query was satisfied or not. This - function is equivalent to the standard XMLEXISTS predicate, + individual XML values that satisfy the XPath 1.0 expression, this function + returns a Boolean indicating whether the query was satisfied or not + (specifically, whether it produced any value other than an empty node-set). + This function is equivalent to the XMLEXISTS predicate, except that it also offers support for a namespace mapping argument. @@ -10734,8 +10747,8 @@ SELECT xpath_exists('/my:a/text()', 'test The xmltable function produces a table based - on the given XML value, an XPath filter to extract rows, and an - optional set of column definitions. + on the given XML value, an XPath filter to extract rows, and a + set of column definitions. @@ -10746,30 +10759,34 @@ SELECT xpath_exists('/my:a/text()', 'test - The required row_expression argument is an XPath - expression that is evaluated against the supplied XML document to - obtain an ordered sequence of XML nodes. This sequence is what - xmltable transforms into output rows. + The required row_expression argument is + an XPath 1.0 expression that is evaluated, passing the + document_expression as its context item, to + obtain a set of XML nodes. These nodes are what + xmltable transforms into output rows. No rows + will be produced if the document_expression + is null, nor if the row_expression produces + an empty node-set or any value other than a node-set. - document_expression provides the XML document to - operate on. - The BY REF clauses have no effect in PostgreSQL, - but are allowed for SQL conformance and compatibility with other - implementations. - The argument must be a well-formed XML document; fragments/forests - are not accepted. + document_expression provides the context + item for the row_expression. It must be a + well-formed XML document; fragments/forests are not accepted. + The BY REF clause + is accepted but ignored, as discussed in + . + In the SQL standard, the xmltable function + evaluates expressions in the XML Query language, + but PostgreSQL allows only XPath 1.0 + expressions, as discussed in + . The mandatory COLUMNS clause specifies the list of columns in the output table. - If the COLUMNS clause is omitted, the rows in the result - set contain a single column of type xml containing the - data matched by row_expression. - If COLUMNS is specified, each entry describes a - single column. + Each entry describes a single column. See the syntax summary above for the format. The column name and type are required; the path, default and nullability clauses are optional. @@ -10777,48 +10794,57 @@ SELECT xpath_exists('/my:a/text()', 'test A column marked FOR ORDINALITY will be populated - with row numbers matching the order in which the - output rows appeared in the original input XML document. + with row numbers, starting with 1, in the order of nodes retrieved from + the row_expression's result node-set. At most one column may be marked FOR ORDINALITY. + + + XPath 1.0 does not specify an order for nodes in a node-set, so code + that relies on a particular order of the results will be + implementation-dependent. Details can be found in + . + + + - The column_expression for a column is an XPath expression - that is evaluated for each row, relative to the result of the - row_expression, to find the value of the column. - If no column_expression is given, then the column name - is used as an implicit path. + The column_expression for a column is an + XPath 1.0 expression that is evaluated for each row, with the current + node from the row_expression result as its + context item, to find the value of the column. If + no column_expression is given, then the + column name is used as an implicit path. - If a column's XPath expression returns multiple elements, an error - is raised. - If the expression matches an empty tag, the result is an - empty string (not NULL). - Any xsi:nil attributes are ignored. + If a column's XPath expression returns a non-XML value (limited to + string, boolean, or double in XPath 1.0) and the column has a + PostgreSQL type other than xml, the column will be set + as if by assigning the value's string representation to the PostgreSQL + type. In this release, an XPath boolean or double result must be explicitly + cast to string (that is, the XPath 1.0 string function + wrapped around the original column expression); + PostgreSQL can then successfully assign the + string to an SQL result column of boolean or double type. + These conversion rules differ from those of the SQL + standard, as discussed in . - The text body of the XML matched by the column_expression - is used as the column value. Multiple text() nodes - within an element are concatenated in order. Any child elements, - processing instructions, and comments are ignored, but the text contents - of child elements are concatenated to the result. - Note that the whitespace-only text() node between two non-text - elements is preserved, and that leading whitespace on a text() - node is not flattened. + In this release, SQL result columns of xml type, or + column XPath expressions evaluating to an XML type, regardless of the + output column SQL type, are handled as described in + ; the behavior + changes significantly in PostgreSQL 12. - If the path expression does not match for a given row but - default_expression is specified, the value resulting - from evaluating that expression is used. - If no DEFAULT clause is given for the column, - the field will be set to NULL. - It is possible for a default_expression to reference - the value of output columns that appear prior to it in the column list, - so the default of one column may be based on the value of another - column. + If the path expression returns an empty node-set + (typically, when it does not match) + for a given row, the column will be set to NULL, unless + a default_expression is specified; then the + value resulting from evaluating that expression is used. @@ -10830,20 +10856,14 @@ SELECT xpath_exists('/my:a/text()', 'test - Unlike regular PostgreSQL functions, column_expression - and default_expression are not evaluated to a simple - value before calling the function. - column_expression is normally evaluated - exactly once per input row, and default_expression - is evaluated each time a default is needed for a field. - If the expression qualifies as stable or immutable the repeat + A default_expression, rather than being + evaluated immediately when xmltable is called, + is evaluated each time a default is needed for the column. + If the expression qualifies as stable or immutable, the repeat evaluation may be skipped. - Effectively xmltable behaves more like a subquery than a - function call. This means that you can usefully use volatile functions like - nextval in default_expression, and - column_expression may depend on other parts of the - XML document. + nextval in + default_expression. diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index aeb262a5b0..915696ee3f 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -593,7 +593,7 @@ X085 Predefined namespace prefixes NO X086 XML namespace declarations in XMLTable NO X090 XML document predicate YES X091 XML content predicate NO -X096 XMLExists NO XPath only +X096 XMLExists NO XPath 1.0 only X100 Host language support for XML: CONTENT option NO X101 Host language support for XML: DOCUMENT option NO X110 Host language support for XML: VARCHAR mapping NO @@ -661,11 +661,11 @@ X282 XMLValidate with CONTENT option NO X283 XMLValidate with SEQUENCE option NO X284 XMLValidate: NAMESPACE without ELEMENT clause NO X286 XMLValidate: NO NAMESPACE with ELEMENT clause NO -X300 XMLTable NO XPath only +X300 XMLTable NO XPath 1.0 only X301 XMLTable: derived column list option YES X302 XMLTable: ordinality column option YES X303 XMLTable: column default option YES -X304 XMLTable: passing a context item YES +X304 XMLTable: passing a context item YES must be XML DOCUMENT X305 XMLTable: initializing an XQuery variable NO X400 Name and identifier mapping YES X410 Alter column data type: XML type YES