36 Commits

Author SHA1 Message Date
Tom Lane a9d199f6d3 Fix json_to_record() bug with nested objects.
A thinko concerning nesting depth caused json_to_record() to produce bogus
output if a field of its input object contained a sub-object with a field
name matching one of the requested output column names.  Per bug #13996
from Johann Visagie.

I added a regression test case based on his example, plus parallel tests
for json_to_recordset, jsonb_to_record, jsonb_to_recordset.  The latter
three do not exhibit the same bug (which suggests that we may be missing
some opportunities to share code...) but testing seems like a good idea
in any case.

Back-patch to 9.4 where these functions were introduced.
2016-03-02 23:31:39 -05:00
Andrew Dunstan 94c745eb18 Fix two-argument jsonb_object when called with empty arrays
Some over-eager copy-and-pasting on my part resulted in a nonsense
result being returned in this case. I have adopted the same pattern for
handling this case as is used in the one argument form of the function,
i.e. we just skip over the code that adds values to the object.

Diagnosis and patch from Michael Paquier, although not quite his
solution.

Fixes bug #13936.

Backpatch to 9.5 where jsonb_object was introduced.
2016-02-21 10:30:49 -05:00
Tom Lane d435542583 Fix incorrect translation of minus-infinity datetimes for json/jsonb.
Commit bda76c1c8c caused both plus and
minus infinity to be rendered as "infinity", which is not only wrong
but inconsistent with the pre-9.4 behavior of to_json().  Fix that by
duplicating the coding in date_out/timestamp_out/timestamptz_out more
closely.  Per bug #13687 from Stepan Perlov.  Back-patch to 9.4, like
the previous commit.

In passing, also re-pgindent json.c, since it had gotten a bit messed up by
recent patches (and I was already annoyed by indentation-related problems
in back-patching this fix ...)
2015-10-20 11:07:04 -07:00
Andrew Dunstan b6363772fd Factor out encoding specific tests for json
This lets us remove the large alternative results files for the main
json and jsonb tests, which makes modifying those tests simpler for
committers and patch submitters.

Backpatch to 9.4 for jsonb and 9.3 for json.
2015-10-07 22:18:27 -04:00
Tom Lane 9e36c91b46 Fix insufficiently-portable regression test case.
Some of the buildfarm members are evidently miserly enough of stack space
to pass the originally-committed form of this test.  Increase the
requirement 10X to hopefully ensure that it fails as-expected everywhere.

Security: CVE-2015-5289
2015-10-05 12:19:14 -04:00
Noah Misch 08fa47c485 Prevent stack overflow in json-related functions.
Sufficiently-deep recursion heretofore elicited a SIGSEGV.  If an
application constructs PostgreSQL json or jsonb values from arbitrary
user input, application users could have exploited this to terminate all
active database connections.  That applies to 9.3, where the json parser
adopted recursive descent, and later versions.  Only row_to_json() and
array_to_json() were at risk in 9.2, both in a non-security capacity.
Back-patch to 9.2, where the json type was introduced.

Oskari Saarenmaa, reviewed by Michael Paquier.

Security: CVE-2015-5289
2015-10-05 10:06:29 -04:00
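
The idea behind 08fa47c485, turning unbounded recursion into a reportable error, can be sketched as below. This is only an illustration in Python: the `MAX_DEPTH` constant, the `NestingTooDeep` exception and the `checked_depth` helper are hypothetical names, and a fixed nesting limit is an assumption standing in for whatever stack check the server performs.

```python
# Hypothetical fixed limit (an assumption, not the server's mechanism).
MAX_DEPTH = 100


class NestingTooDeep(Exception):
    """Raised instead of letting recursion run until the stack is exhausted."""


def checked_depth(value, level=0, limit=MAX_DEPTH):
    """Return the nesting depth of a parsed JSON value (scalars count as 0),
    raising NestingTooDeep as soon as `limit` is exceeded."""
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return level  # scalar: no further recursion
    level += 1
    if level > limit:
        raise NestingTooDeep(f"nesting exceeds {limit} levels")
    # Recurse into every child, carrying the current level along.
    return max((checked_depth(c, level, limit) for c in children), default=level)
```

With a check like this, hostile input produces an ordinary error for one query rather than a crash that takes down every connection.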
Andrew Dunstan d9a356ff2e Fix treatment of nulls in jsonb_agg and jsonb_object_agg
The wrong is_null flag was being passed to datum_to_json. Also, null
object key values are not permitted, and this was not being checked
for. Add regression tests covering these cases, and also add those tests
to the json set, even though it was doing the right thing.

Fixes bug #13514, initially diagnosed by Tom Lane.
2015-07-24 09:40:46 -04:00
Andrew Dunstan e02d44b8a7 Support JSON negative array subscripts everywhere
Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting.  Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator).  This inconsistency seemed likely to be
counter-intuitive to users.  To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators.  This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.

For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count.  From there on, it's as
if a positive-wise value was originally provided.

Note that there is still a minor inconsistency here across jsonb
deletion operators.  Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized as an integer
literal) is being used on an object, which is wrong-headed.  The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements.  However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).

In passing, make SetArrayPath() stricter about not accepting cases where
there are trailing non-numeric garbage bytes rather than a clean NUL
byte.  This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator).  Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.

Peter Geoghegan and Andrew Dunstan
2015-07-17 21:13:47 -04:00
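
The "conversion" of a negative subscript described in e02d44b8a7 can be sketched as follows. This is a minimal Python illustration, not the C code from the commit: `normalize_subscript` and `array_element` are hypothetical names, and `None` stands in for SQL NULL.

```python
def normalize_subscript(index, count):
    """Map a possibly negative subscript to a zero-based position,
    or return None when it falls outside an array of `count` elements."""
    if index < 0:
        index += count  # -1 becomes the last element, -count the first
    if 0 <= index < count:
        return index
    return None


def array_element(elements, index):
    """Fetch an element by subscript; None stands in for SQL NULL."""
    pos = normalize_subscript(index, len(elements))
    return None if pos is None else elements[pos]
```

The element count has to be known before the conversion can happen, which is why the commit message notes that the json type needs an extra lex-and-parse pass to count elements first.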
Andrew Dunstan bda76c1c8c Render infinite date/timestamps as 'infinity' for json/jsonb
Commit ab14a73a6c raised an error in these cases and later the
behaviour was copied to jsonb. This is what the XML code, which we
then adopted, does, as the XSD types don't accept infinite values.
However, json dates and timestamps are just strings as far as json is
concerned, so there is no reason not to render these values as
'infinity'.

The json portion of this is backpatched to 9.4 where the behaviour was
introduced. The jsonb portion only affects the development branch.

Per gripe on pgsql-general.
2015-02-26 12:25:21 -05:00
Tom Lane 451d280815 Fix jsonb Unicode escape processing, and in consequence disallow \u0000.
We've been trying to support \u0000 in JSON values since commit
78ed8e03c6, and have introduced increasingly worse hacks to try to
make it work, such as commit 0ad1a81632.  However, it fundamentally
can't work in the way envisioned, because the stored representation looks
the same as for \\u0000 which is not the same thing at all.  It's also
entirely bogus to output \u0000 when de-escaped output is called for.

The right way to do this would be to store an actual 0x00 byte, and then
throw error only if asked to produce de-escaped textual output.  However,
getting to that point seems likely to take considerable work and may well
never be practical in the 9.4.x series.

To preserve our options for better behavior while getting rid of the nasty
side-effects of 0ad1a81632, revert that commit in toto and instead
throw error if \u0000 is used in a context where it needs to be de-escaped.
(These are the same contexts where non-ASCII Unicode escapes throw error
if the database encoding isn't UTF8, so this behavior is by no means
without precedent.)

In passing, make both the \u0000 case and the non-ASCII Unicode case report
ERRCODE_UNTRANSLATABLE_CHARACTER / "unsupported Unicode escape sequence"
rather than claiming there's something wrong with the input syntax.

Back-patch to 9.4, where we have to do something because 0ad1a81632
broke things for many cases having nothing to do with \u0000.  9.3 also has
bogus behavior, but only for that specific escape value, so given the lack
of field complaints it seems better to leave 9.3 alone.
2015-01-30 14:44:56 -05:00
Andrew Dunstan 237a882443 Add json_strip_nulls and jsonb_strip_nulls functions.
The functions remove object fields, including in nested objects, that
have null as a value. In certain cases this can lead to considerably
smaller datums, with no loss of semantic information.

Andrew Dunstan, reviewed by Pavel Stehule.
2014-12-12 09:00:43 -05:00
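
The behaviour described for json_strip_nulls in 237a882443 can be approximated in a few lines. This is a Python sketch over already-parsed values, with `strip_nulls` as a hypothetical name; leaving nulls inside arrays untouched is an assumption that follows from the message speaking only of object fields.

```python
def strip_nulls(value):
    """Recursively drop object fields whose value is null (None)."""
    if isinstance(value, dict):
        # Skip null-valued fields; recurse into everything that remains.
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Array elements are not object fields, so nulls here are kept.
        return [strip_nulls(v) for v in value]
    return value
```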
Stephen Frost c8a026e4f1 Revert 95d737ff to add 'ignore_nulls'
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json.  This capability would be better added as an independent
function rather than being bolted on to row_to_json.  Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.

Pointed out by Tom and discussed with Andrew and Robert.
2014-09-29 13:32:22 -04:00
Stephen Frost 95d737ff45 Add 'ignore_nulls' option to row_to_json
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json.  This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.

This also makes row_to_json into a single function with default values,
rather than having multiple functions.  In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet; it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).

Pavel Stehule
2014-09-11 21:23:51 -04:00
Tom Lane 41dd50e84d Fix corner-case behaviors in JSON/JSONB field extraction operators.
Cause the path extraction operators to return their lefthand input,
not NULL, if the path array has no elements.  This seems more consistent
since the case ought to correspond to applying the simple extraction
operator (->) zero times.

Cause other corner cases in field/element/path extraction to return NULL
rather than failing.  This behavior is arguably more useful than throwing
an error, since it allows an expression index using these operators to be
built even when not all values in the column are suitable for the
extraction being indexed.  Moreover, we already had multiple
inconsistencies between the path extraction operators and the simple
extraction operators, as well as inconsistencies between the JSON and
JSONB code paths.  Adopt a uniform rule of returning NULL rather than
throwing an error when the JSON input does not have a structure that
permits the request to be satisfied.

Back-patch to 9.4.  Update the release notes to list this as a behavior
change since 9.3.
2014-08-22 13:17:58 -04:00
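
The uniform rule adopted in 41dd50e84d (an empty path returns the input, and a structural mismatch returns NULL rather than an error) can be sketched as below. This is a Python illustration only: `extract_path` is a hypothetical name, `None` stands in for SQL NULL, and only non-negative array subscripts are handled, as an assumption for this sketch.

```python
def extract_path(value, path):
    """Follow `path` (a list of strings) through a parsed JSON value.
    Returns None wherever the structure does not permit the next step."""
    for step in path:
        if isinstance(value, dict):
            if step not in value:
                return None  # no such field
            value = value[step]
        elif isinstance(value, list):
            try:
                idx = int(step)
            except ValueError:
                return None  # not an integer subscript
            if not 0 <= idx < len(value):
                return None  # out of range
            value = value[idx]
        else:
            return None  # cannot descend into a scalar
    # With an empty path the loop never runs and the input comes back as-is.
    return value
```

Returning NULL rather than raising is what lets an expression index be built over a column whose rows do not all share the same shape.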
Tom Lane fa069822f5 More regression test cases for json/jsonb extraction operators.
Cover some cases I omitted before, such as null and empty-string
elements in the path array.  This exposes another inconsistency:
json_extract_path complains about empty path elements but
jsonb_extract_path does not.
2014-08-20 19:05:05 -04:00
Tom Lane 9bac66020d Fix core dump in jsonb #> operator, and add regression test cases.
jsonb's #> operator segfaulted (dereferencing a null pointer) if the RHS
was a zero-length array, as reported in bug #11207 from Justin Van Winkle.
json's #> operator returns NULL in such cases, so for the moment let's
make jsonb act likewise.

Also add a bunch of regression test queries memorializing the -> and #>
operators' behavior for this and other corner cases.

There is a good argument for changing some of these behaviors, as they
are not very consistent with each other, and throwing an error isn't
necessarily a desirable behavior for operators that are likely to be
used in indexes.  However, everybody can agree that a core dump is the
Wrong Thing, and we need test cases even if we decide to change their
expected output later.
2014-08-20 16:48:53 -04:00
Andrew Dunstan 4ebe3519e1 Allow empty string object keys in json_object().
This makes the behaviour consistent with the json parser, other
json-generating functions, and the JSON standards.
2014-07-22 11:27:31 -04:00
Tom Lane a749a23d7a Remove use_json_as_text options from json_to_record/json_populate_record.
The "false" case was really quite useless since all it did was to throw
an error; a definition not helped in the least by making it the default.
Instead let's just have the "true" case, which emits nested objects and
arrays in JSON syntax.  We might later want to provide the ability to
emit sub-objects in Postgres record or array syntax, but we'd be best off
to drive that off a check of the target field datatype, not a separate
argument.

For the functions newly added in 9.4, we can just remove the flag arguments
outright.  We can't do that for json_populate_record[set], which already
existed in 9.3, but we can ignore the argument and always behave as if it
were "true".  It helps that the flag arguments were optional and not
documented in any useful fashion anyway.
2014-06-29 13:50:58 -04:00
Tom Lane 57d8c1270e Fix handling of nested JSON objects in json_populate_recordset and friends.
populate_recordset_object_start() improperly created a new hash table
(overwriting the link to the existing one) if called at nest levels
greater than one.  This resulted in previous fields not appearing in
the final output, as reported by Matti Hameister in bug #10728.
In 9.4 the problem also affects json_to_recordset.

This perhaps missed detection earlier because the default behavior is to
throw an error for nested objects: you have to pass use_json_as_text = true
to see the problem.

In addition, fix query-lifespan leakage of the hashtable created by
json_populate_record().  This is pretty much the same problem recently
fixed in dblink: creating an intended-to-be-temporary context underneath
the executor's per-tuple context isn't enough to make it go away at the
end of the tuple cycle, because MemoryContextReset is not
MemoryContextResetAndDeleteChildren.

Michael Paquier and Tom Lane
2014-06-24 21:22:40 -07:00
Andrew Dunstan 0ad1a81632 Do not escape a unicode sequence when escaping JSON text.
Previously, any backslash in text being escaped for JSON was doubled so
that the result was still valid JSON. However, this led to some perverse
results in the case of Unicode sequences. These are now detected and the
initial backslash is no longer escaped. All other backslashes are
still escaped. No validity check is performed; all that is looked for is
\uXXXX where X is a hexadecimal digit.

This is a change from the 9.2 and 9.3 behaviour as noted in the Release
notes.

Per complaint from Teodor Sigaev.
2014-06-03 16:11:31 -04:00
Andrew Dunstan f30015b6d7 Output timestamps in ISO 8601 format when rendering JSON.
Many JSON processors require timestamp strings in ISO 8601 format in
order to convert the strings. When converting a timestamp, with or
without timezone, to a JSON datum we therefore now use such a format
rather than the type's default text output, in functions such as
to_json().

This is a change in behaviour from 9.2 and 9.3, as noted in the release
notes.
2014-06-03 13:56:53 -04:00
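
The difference f30015b6d7 refers to, between a type's default text form and ISO 8601, can be shown with Python's own datetime type. This is an analogy rather than PostgreSQL output: `json_timestamp` is a hypothetical helper, and Python's `str()` form is used only as a stand-in for a "default text output".

```python
import json
from datetime import datetime, timedelta, timezone


def json_timestamp(ts):
    """Render a datetime as a JSON string in ISO 8601 format."""
    return json.dumps(ts.isoformat())


# A timestamp with a -04:00 offset, matching this commit's own date line.
ts = datetime(2014, 6, 3, 13, 56, 53, tzinfo=timezone(timedelta(hours=-4)))

default_text = str(ts)        # space-separated date and time
iso_json = json_timestamp(ts)  # "T"-separated, quoted for JSON
```

Here `default_text` separates the date and time with a space, whereas `iso_json` uses the "T" separator that many JSON consumers require.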
Andrew Dunstan d9134d0a35 Introduce jsonb, a structured format for storing json.
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the original text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.

The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.

This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.

Authors: Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
2014-03-23 16:40:19 -04:00
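
The normalisation that d9134d0a35 describes for jsonb (whitespace discarded, key order not preserved, last duplicate key wins) can be imitated with a parse-and-reserialise round trip. This is a Python sketch with a hypothetical `normalize` helper; sorting the keys merely stands in for "order not preserved" and is not claimed to match jsonb's actual storage order.

```python
import json


def normalize(text):
    """Parse JSON text and re-emit it compactly. The parser keeps only the
    last value for a duplicated key, and insignificant whitespace is lost."""
    parsed = json.loads(text)
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))
```

For example, `normalize('{ "b": 1,  "a": 2, "b": 3 }')` yields `{"a":2,"b":3}`: the first value for `b` is gone and the original spacing and key order are not recoverable, whereas the json type stores the input text as given.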
Andrew Dunstan 5264d91541 Add json_array_elements_text function.
This was a notable omission from the json functions added in 9.3 and
there have been numerous complaints about its absence.

Laurence Rowe.
2014-01-29 15:39:01 -05:00
Andrew Dunstan 105639900b New json functions.
json_build_array() and json_build_object() allow for the construction of
arbitrarily complex json trees. json_object() turns a one or two
dimensional array, or two separate arrays, into a json object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name/value pairs.

Catalog version bumped.

Andrew Dunstan, reviewed by Marko Tiikkaja.
2014-01-28 17:48:21 -05:00
Peter Eisentraut 001e114b8d Fix whitespace issues found by git diff --check, add gitattributes
Set per file type attributes in .gitattributes to fine-tune whitespace
checks.  With the associated cleanups, the tree is now clean for git
2013-11-10 14:48:29 -05:00
Andrew Dunstan 4d212bac17 json_typeof function.
Andrew Tipton.
2013-10-10 12:21:59 -04:00
Andrew Dunstan 78ed8e03c6 Fix unescaping of JSON Unicode escapes, especially for non-UTF8.
Per discussion on -hackers. We treat Unicode escapes when unescaping
them similarly to the way we treat them in PostgreSQL string literals.
Escapes in the ASCII range are always accepted, no matter what the
database encoding. Escapes for higher code points are only processed in
UTF8 databases, and attempts to process them in other databases will
result in an error. \u0000 is never unescaped, since it would result in
an impermissible null byte.
2013-06-12 13:35:24 -04:00
Andrew Dunstan 94e3311b97 Handle Unicode surrogate pairs correctly when processing JSON.
In 9.2, Unicode escape sequences are not analysed at all other than
to make sure that they are in the form \uXXXX. But in 9.3 many of the
new operators and functions try to turn JSON text values into text in
the server encoding, and this includes de-escaping Unicode escape
sequences. This processing had not taken into account the possibility
that this might contain a surrogate pair to designate a character
outside the BMP. That is now handled correctly.

This also enforces correct use of surrogate pairs, something that is not
done by the type's input routines. This fact is noted in the docs.
2013-06-08 09:12:48 -04:00
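
The surrogate-pair handling that 94e3311b97 adds amounts to the standard UTF-16 arithmetic, sketched here in Python. `combine_surrogates` is a hypothetical name, and raising `ValueError` is an assumption standing in for the error the server would report.

```python
def combine_surrogates(high, low):
    """Combine the two halves of a \\uXXXX\\uYYYY surrogate pair into the
    code point they designate, rejecting halves that are out of place."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("expected a high surrogate first")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("a high surrogate must be followed by a low surrogate")
    # Each half contributes 10 bits; the result lies above the BMP.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For instance, the escape sequence `\uD83D\uDE00` combines to U+1F600. The range checks correspond to the "correct use of surrogate pairs" that the commit says is enforced during de-escaping but not by the type's input routines.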
Peter Eisentraut 8b5a3998a1 Remove whitespace from end of lines
2013-05-30 21:05:07 -04:00
Andrew Dunstan a570c98d7f Add new JSON processing functions and parser API.
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser events such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.

Catalog version bumped.

Andrew Dunstan, with some documentation assistance from Merlin Moncure.
2013-03-29 14:12:13 -04:00
Andrew Dunstan 38fb4d978c JSON generation improvements.
This adds the following:

    json_agg(anyrecord) -> json
    to_json(any) -> json
    hstore_to_json(hstore) -> json (also used as a cast)
    hstore_to_json_loose(hstore) -> json

The last provides heuristic treatment of numbers and booleans.

Also, in json generation, if any non-builtin type has a cast to json,
that function is used instead of the type's output function.

Andrew Dunstan, reviewed by Steve Singer.

Catalog version bumped.
2013-03-10 17:35:36 -04:00
Peter Eisentraut f1f6737e15 Fix incorrect logic in JSON number lexer
Detectable by gcc -Wlogical-op.

Add two regression test cases that would previously allow incorrect
values to pass.
2012-05-20 02:24:46 +03:00
Peter Eisentraut c8e086795a Remove whitespace from end of lines
pgindent and perltidy should clean up the rest.
2012-05-15 22:19:41 +03:00
Andrew Dunstan 83fcaffea2 Fix a couple of cases of JSON output.
First, as noted by Itagaki Takahiro, a datum of type JSON doesn't
need to be escaped. Second, ensure that numeric output not in
the form of a legal JSON number is quoted and escaped.
2012-02-20 15:01:03 -05:00
Andrew Dunstan 39909d1d39 Add array_to_json and row_to_json functions.
Also move the escape_json function from explain.c to json.c where it
seems to belong.

Andrew Dunstan, reviewed by Abhijit Menon-Sen.
2012-02-03 12:11:16 -05:00
Robert Haas 5384a73f98 Built-in JSON data type.
Like the XML data type, we simply store JSON data as text, after checking
that it is valid.  More complex operations such as canonicalization and
comparison may come later, but this is enough for now.

There are a few open issues here, such as whether we should attempt to
detect UTF-16 surrogate pairs represented as \uXXXX\uYYYY, but this gets
the basic framework in place.
2012-01-31 11:48:23 -05:00