Commit Graph

73 Commits

Author SHA1 Message Date
Nathan Bossart 260a1f18da Add to_bin() and to_oct().
This commit introduces functions for converting numbers to their
equivalent binary and octal representations.  Also, the base
conversion code for these functions and to_hex() has been moved to
a common helper function.

Co-authored-by: Eric Radman
Reviewed-by: Ian Barwick, Dag Lem, Vignesh C, Tom Lane, Peter Eisentraut, Kirk Wolak, Vik Fearing, John Naylor, Dean Rasheed
Discussion: https://postgr.es/m/Y6IyTQQ/TsD5wnsH%40vm3.eradman.com
2023-08-23 07:49:03 -07:00
Tom Lane d7056bc1c7 Avoid fetching one past the end of translate()'s "to" parameter.
This is usually harmless, but if you were very unlucky it could
provoke a segfault due to the "to" string being right up against
the end of memory.  Found via valgrind testing (so we might've
found it earlier, except that our regression tests lacked any
exercise of translate()'s deletion feature).

Fix by switching the order of the test-for-end-of-string and
advance-pointer steps.  While here, compute "to_ptr + tolen"
just once.  (Smarter compilers might figure that out for
themselves, but let's just make sure.)

Report and fix by Daniil Anisimov, in bug #17816.

Discussion: https://postgr.es/m/17816-70f3d2764e88a108@postgresql.org
2023-03-01 11:30:31 -05:00
Michael Paquier b8da37b3ad Rework pg_input_error_message(), now renamed pg_input_error_info()
pg_input_error_info() is now a SQL function able to return a row with
more than just the error message generated for incorrect data type
inputs when these are able to handle soft failures, returning more
contents of ErrorData, as of:
- The error message (same as before).
- The error detail, if set.
- The error hint, if set.
- SQL error code.

All the regression tests that relied on pg_input_error_message() are
updated to reflect the effects of the rename.

Per discussion with Tom Lane and Andrew Dunstan.

Author: Nathan Bossart
Discussion: https://postgr.es/m/139a68e1-bd1f-a9a7-b5fe-0be9845c6311@dunslane.net
2023-02-28 08:04:13 +09:00
Tom Lane 3b9d2deb67 Convert a few more datatype input functions to report errors softly.
Convert the remaining string-category input functions
(bpcharin, varcharin, byteain) to the new style.

Discussion: https://postgr.es/m/3038346.1671060258@sss.pgh.pa.us
2022-12-14 19:42:05 -05:00
Peter Eisentraut 9786b89bd1 Put tests of md5() function into separate test file
In FIPS mode, these calls will fail.  By having them in a separate
file, it would make it easier to have an alternative output file or
selectively disable these tests.  This isn't done here; this is just
some preparation.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/647f6cc1-473d-f788-ade0-c09201e5ab6a@enterprisedb.com
2022-10-13 12:02:31 +02:00
Tom Lane 18bac60ede Let regexp_replace() make use of REG_NOSUB when feasible.
If the replacement string doesn't contain \1...\9, then we don't
need sub-match locations, so we can use the REG_NOSUB optimization
here too.  There's already a pre-scan of the replacement string
to look for backslashes, so extend that to check for digits, and
refactor to allow that to happen before we compile the regexp.

While at it, try to speed up the pre-scan by using memchr() instead
of a handwritten loop.  It's likely that this is lost in the noise
compared to the regexp processing proper, but maybe not.  In any
case, this coding is shorter.

Also, add some test cases to improve the poor coverage of
appendStringInfoRegexpSubstr().

Discussion: https://postgr.es/m/3534632.1628536485@sss.pgh.pa.us
2021-08-09 20:53:25 -04:00
Tom Lane 6424337073 Add assorted new regexp_xxx SQL functions.
This patch adds new functions regexp_count(), regexp_instr(),
regexp_like(), and regexp_substr(), and extends regexp_replace()
with some new optional arguments.  All these functions follow
the definitions used in Oracle, although there are small differences
in the regexp language due to using our own regexp engine -- most
notably, that the default newline-matching behavior is different.
Similar functions appear in DB2 and elsewhere, too.  Aside from
easing portability, these functions are easier to use for certain
tasks than our existing regexp_match[es] functions.

Gilles Darold, heavily revised by me

Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net
2021-08-03 13:08:49 -04:00
Peter Eisentraut f37fec837c Add unistr function
This allows decoding a string with Unicode escape sequences.  It is
similar to Unicode escape strings, but offers some more flexibility.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com
2021-03-29 11:56:53 +02:00
Peter Eisentraut ebedd0c78f Reset standard_conforming_strings in strings test
After some tests relating to standard_conforming_strings behavior, the
value was not reset to the default value.  Therefore, the rest of the
tests in that file ran with the nondefault setting, which affected the
results of some tests.  For clarity, reset the value and run the rest
of the tests with the default setting again.
2021-03-29 08:40:39 +02:00
Peter Eisentraut a6715af1e7 Add bit_count SQL function
This function for bit and bytea counts the set bits in the bit or byte
string.  Internally, we use the existing popcount functionality.

For the name, after some discussion, we settled on bit_count, which
also exists with this meaning in MySQL, Java, and Python.

Author: David Fetter <david@fetter.org>
Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
2021-03-23 10:13:58 +01:00
Peter Eisentraut eb42110d95 Add tests for bytea LIKE operator
Add test coverage for the following operations, which were previously
not tested at all:

bytea LIKE bytea (bytealike)
bytea NOT LIKE bytea (byteanlike)
ESCAPE clause for the above (like_escape_bytea)

also

name NOT ILIKE text (nameicnlike)

Discussion: https://www.postgresql.org/message-id/flat/4d13563a-2c8d-fd91-20d5-e71b7a4eaa87%40enterprisedb.com
2021-02-18 08:42:04 +01:00
Tom Lane a6cf3df4eb Add bytea equivalents of ltrim() and rtrim().
We had bytea btrim() already, but for some reason not the other two.

Joel Jacobson

Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
2021-01-18 15:11:32 -05:00
Tom Lane 4bd3fad80e Fix integer-overflow corner cases in substring() functions.
If the substring start index and length overflow when added together,
substring() misbehaved, either throwing a bogus "negative substring
length" error on a case that should succeed, or failing to complain that
a negative length is negative (and instead returning the whole string,
in most cases).  Unsurprisingly, the text, bytea, and bit variants of
the function all had this issue.  Rearrange the logic to ensure that
negative lengths are always rejected, and add an overflow check to
handle the other case.

Also install similar guards into detoast_attr_slice() (nee
heap_tuple_untoast_attr_slice()), since it's far from clear that
no other code paths leading to that function could pass it values
that would overflow.

Patch by myself and Pavel Stehule, per bug #16804 from Rafi Shamim.

Back-patch to v11.  While these bugs are old, the common/int.h
infrastructure for overflow-detecting arithmetic didn't exist before
commit 4d6ad3125, and it doesn't seem like these misbehaviors are bad
enough to justify developing a standalone fix for the older branches.

Discussion: https://postgr.es/m/16804-f4eeeb6c11ba71d4@postgresql.org
2021-01-04 18:32:44 -05:00
Tom Lane ec0294fb2c Support negative indexes in split_part().
This provides a handy way to get, say, the last field of the string.
Use of a negative index in this way has precedent in the nearby
left() and right() functions.

The implementation scans the string twice when N < -1, but it seems
likely that N = -1 will be the huge majority of actual use cases,
so I'm not really excited about adding complexity to avoid that.

Nikhil Benesch, reviewed by Jacob Champion; cosmetic tweakage by me

Discussion: https://postgr.es/m/cbb7f861-6162-3a51-9823-97bc3aa0b638@gmail.com
2020-11-13 13:49:48 -05:00
Peter Eisentraut 78c887679d Add current substring regular expression syntax
SQL:1999 had syntax

    SUBSTRING(text FROM pattern FOR escapechar)

but this was replaced in SQL:2003 by the more clear

    SUBSTRING(text SIMILAR pattern ESCAPE escapechar)

but this was never implemented in PostgreSQL.  This patch adds that
new syntax as an alternative in the parser, and updates documentation
and tests to indicate that this is the preferred alternative now.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/a15db31c-d0f8-8ce0-9039-578a31758adb%402ndquadrant.com
2020-06-29 11:05:00 +02:00
Tom Lane 26a944cf29 Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering.
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value.  Lift that restriction by redeclaring the
bit number arguments as int8 (which requires a catversion bump,
hence is not back-patchable).

The similarly-named functions for bit/varbit don't really have a
problem because we restrict those types to at most VARBITMAXLEN bits;
hence leave them alone.

While here, extend the encode/decode functions in utils/adt/encode.c
to allow dealing with values wider than 1GB.  This is not a live bug
or restriction in current usage, because no input could be more than
1GB, and since none of the encoders can expand a string more than 4X,
the result size couldn't overflow uint32.  But it might be desirable
to support more in future, so make the input length values size_t
and the potential-output-length values uint64.

Also add some test cases to improve the miserable code coverage
of these functions.

Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat

Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
2020-04-07 15:57:58 -04:00
Tom Lane a6525588b7 Allow Unicode escapes in any server encoding, not only UTF-8.
SQL includes provisions for numeric Unicode escapes in string
literals and identifiers.  Previously we only accepted those
if they represented ASCII characters or the server encoding
was UTF-8, making the conversion to internal form trivial.
This patch adjusts things so that we'll call the appropriate
encoding conversion function in less-trivial cases, allowing
the escape sequence to be accepted so long as it corresponds
to some character available in the server encoding.

This also applies to processing of Unicode escapes in JSONB.
However, the old restriction still applies to client-side
JSON processing, since that hasn't got access to the server's
encoding conversion infrastructure.

This patch includes some lexer infrastructure that simplifies
throwing errors with error cursors pointing into the middle of
a string (or other complex token).  For the moment I only used
it for errors relating to Unicode escapes, but we might later
expand the usage to some other cases.

Patch by me, reviewed by John Naylor.

Discussion: https://postgr.es/m/2393.1578958316@sss.pgh.pa.us
2020-03-06 14:17:43 -05:00
Tom Lane 7f380c59f8 Reduce size of backend scanner's tables.
Previously, the core scanner's yy_transition[] array had 37045 elements.
Since that number is larger than INT16_MAX, Flex generated the array to
contain 32-bit integers.  By reimplementing some of the bulkier scanner
rules, this patch reduces the array to 20495 elements.  The much smaller
total length, combined with the consequent use of 16-bit integers for
the array elements reduces the binary size by over 200kB.  This was
accomplished in two ways:

1. Consolidate handling of quote continuations into a new start condition,
rather than duplicating that logic for five different string types.

2. Treat Unicode strings and identifiers followed by a UESCAPE sequence
as three separate tokens, rather than one.  The logic to de-escape
Unicode strings is moved to the filter code in parser.c, which already
had the ability to provide special processing for token sequences.
While we could have implemented the conversion in the grammar, that
approach was rejected for performance and maintainability reasons.

Performance in microbenchmarks of raw parsing seems equal or slightly
faster in most cases, and it's reasonable to expect that in real-world
usage (with more competition for the CPU cache) there will be a larger
win.  The exception is UESCAPE sequences; lexing those is about 10%
slower, primarily because the scanner now has to be called three times
rather than one.  This seems acceptable since that feature is very
rarely used.

The psql and epcg lexers are likewise modified, primarily because we
want to keep them all in sync.  Since those lexers don't use the
space-hogging -CF option, the space savings is much less, but it's
still good for perhaps 10kB apiece.

While at it, merge the ecpg lexer's handling of C-style comments used
in SQL and in C.  Those have different rules regarding nested comments,
but since we already have the ability to keep track of the previous
start condition, we can use that to handle both cases within a single
start condition.  This matches the core scanner more closely.

John Naylor

Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
2020-01-13 15:04:31 -05:00
Tom Lane bd1ef5799b Handle empty-string edge cases correctly in strpos().
Commit 9556aa01c rearranged the innards of text_position() in a way
that would make it not work for empty search strings.  Which is fine,
because all callers of that code special-case an empty pattern in
some way.  However, the primary use-case (text_position itself) got
special-cased incorrectly: historically it's returned 1 not 0 for
an empty search string.  Restore the historical behavior.

Per complaint from Austin Drenski (via Shay Rojansky).
Back-patch to v12 where it got broken.

Discussion: https://postgr.es/m/CADT4RqAz7oN4vkPir86Kg1_mQBmBxCp-L_=9vRpgSNPJf0KRkw@mail.gmail.com
2019-10-28 12:21:13 -04:00
Tom Lane ca70bdaefe Fix issues around strictness of SIMILAR TO.
As a result of some long-ago quick hacks, the SIMILAR TO operator
and the corresponding flavor of substring() interpreted "ESCAPE NULL"
as selecting the default escape character '\'.  This is both
surprising and not per spec: the standard is clear that these
functions should return NULL for NULL input.

Additionally, because of inconsistency of the strictness markings
of 3-argument substring() and similar_escape(), the planner could not
inline the SQL definition of substring(), resulting in a substantial
performance penalty compared to the underlying POSIX substring()
function.

The simplest fix for this would be to change the strictness marking
of similar_escape(), but if we do that we risk breaking existing views
that depend on that function.  Hence, leave similar_escape() as-is
as a compatibility function, and instead invent a new function
similar_to_escape() that comes in two strict variants.

There are a couple of other behaviors in this area that are also
not per spec, but they are documented and seem generally at least
as sane as the spec's definition, so leave them alone.  But improve
the documentation to describe them fully.

Patch by me; thanks to Álvaro Herrera and Andrew Gierth for review
and discussion.

Discussion: https://postgr.es/m/14047.1557708214@sss.pgh.pa.us
2019-09-07 14:21:59 -04:00
Michael Paquier 6ba500cae6 Fix regression test outputs
75445c1 has caused various failures in tests across the tree after
updating some error messages, so fix the newly-expected output.

Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8332.1558048838@sss.pgh.pa.us
2019-05-17 09:40:02 +09:00
Tom Lane 7c850320d8 Fix SQL-style substring() to have spec-compliant greediness behavior.
SQL's regular-expression substring() function is defined to have a
pattern argument that's separated into three subpatterns by escape-
double-quote markers; the function result is the part of the input
matching the second subpattern.  The standard makes it clear that
if there is ambiguity about how to match the input to the subpatterns,
the first and third subpatterns should be taken to match the smallest
possible amount of text (i.e., they're "non greedy", in the terms of
our regex code).  We were not doing it that way: the first subpattern
would eat the largest possible amount of text, causing the function
result to be shorter than what the spec requires.

Fix that by attaching explicit greediness quantifiers to the
subpatterns.  (This depends on the regex fix in commit 8a29ed053;
before that, this didn't reliably change the regex engine's behavior.)

Also, by adding parentheses around each subpattern, we ensure that
"|" (OR) in the subpatterns behave sanely.  Previously, "|" in the
first or third subpatterns didn't work.

This patch also makes the function throw error if you write more than
two escape-double-quote markers, and do something sane if you write
just one, and document that behavior.  Previously, an odd number of
markers led to a confusing complaint about unbalanced parentheses,
while extra pairs of markers were just ignored.  (Note that the spec
requires exactly two markers, but we've historically allowed there
to be none, and this patch preserves the old behavior for that case.)

In passing, adjust some substring() test cases that didn't really
prove what they said they were testing for: they used patterns
that didn't match the data string, so that the output would be
NULL whether or not the function was really strict.

Although this is certainly a bug fix, changing the behavior in back
branches seems undesirable: applications could perhaps be depending on
the old behavior, since it's not obviously wrong unless you read the
spec very closely.  Hence, no back-patch.

Discussion: https://postgr.es/m/5bb27a41-350d-37bf-901e-9d26f5592dd0@charter.net
2019-05-14 11:27:31 -04:00
Peter Eisentraut abb9c63b2c Unbreak index optimization for LIKE on bytea
The same code is used to handle both text and bytea, but bytea is not
collation-aware, so we shouldn't call get_collation_isdeterministic()
in that case, since that will error out with an invalid collation.

Reported-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAM2%2B6%3DWaf3qJ1%3DyVTUH8_yG-SC0xcBMY%2BSFLhvKKNnWNXSUDBw%40mail.gmail.com
2019-04-15 09:29:17 +02:00
Michael Paquier 92c76021ae Improve readability of some tests in strings.sql
c251336 has added some tests to check if a toast relation should be
empty or not, hardcoding the toast relation name when calling
pg_relation_size().  pg_class.reltoastrelid offers the same information,
so simplify the tests to use that.

Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20190403065949.GH3298@paquier.xyz
2019-04-04 10:24:56 +09:00
Andrew Gierth b7f6bcbffc Repair bug in regexp split performance improvements.
Commit c8ea87e4b introduced a temporary conversion buffer for
substrings extracted during regexp splits. Unfortunately the code that
sized it was failing to ignore the effects of ignored degenerate
regexp matches, so for regexp_split_* calls it could under-size the
buffer in such cases.

Fix, and add some regression test cases (though those will only catch
the bug if run in a multibyte encoding).

Backpatch to 9.3 as the faulty code was.

Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the
report (via IRC) and assistance in analysis. Patch by me.
2018-09-12 19:31:06 +01:00
Peter Eisentraut 10cfce34c0 Add user-callable SHA-2 functions
Add the user-callable functions sha224, sha256, sha384, sha512.  We
already had these in the C code to support SCRAM, but there was no test
coverage outside of the SCRAM tests.  Adding these as user-callable
functions allows writing some tests.  Also, we have a user-callable md5
function but no more modern alternative, which led to wide use of md5 as
a general-purpose hash function, which leads to occasional complaints
about using md5.

Also mark the existing md5 functions as leak-proof.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-02-22 11:34:53 -05:00
Simon Riggs 56f3468622 Reduce test variability for toast_tuple_target test 2017-11-20 12:09:40 +11:00
Simon Riggs c2513365a0 Parameter toast_tuple_target controls TOAST for new rows
Specifies the point at which we try to move long column values
into TOAST tables.

No effect on existing rows.

Discussion: https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2Ya00OyQQ@mail.gmail.com

Author: Simon Riggs <simon@2ndQudrant.com>
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndQuadrant.com>
2017-11-20 09:50:10 +11:00
Tom Lane cf9b0fea5f Implement regexp_match(), a simplified alternative to regexp_matches().
regexp_match() is like regexp_matches(), but it disallows the 'g' flag
and in consequence does not need to return a set.  Instead, it returns
a simple text array value, or NULL if there's no match.  Previously people
usually got that behavior with a sub-select, but this way is considerably
more efficient.

Documentation adjusted so that regexp_match() is presented first and then
regexp_matches() is introduced as a more complicated version.  This is
a bit historically revisionist but seems pedagogically better.

Still TODO: extend contrib/citext to support this function.

Emre Hasegeli, reviewed by David Johnston

Discussion: <CAE2gYzy42sna2ME_e3y1KLQ-4UBrB-eVF0SWn8QG39sQSeVhEw@mail.gmail.com>
2016-08-17 18:33:01 -04:00
Tom Lane d074b4e50d Fix regexp_matches() handling of zero-length matches.
We'd find the same match twice if it was of zero length and not immediately
adjacent to the previous match.  replace_text_regexp() got similar cases
right, so adjust this search logic to match that.  Note that even though
the regexp_split_to_xxx() functions share this code, they did not display
equivalent misbehavior, because the second match would be considered
degenerate and ignored.

Jeevan Chalke, with some cosmetic changes by me.
2013-07-31 11:31:22 -04:00
Tom Lane e7bfc7e42c Fix some uses of "the quick brown fox".
If we're going to quote a well-known pangram, we should quote it
accurately.  Per gripe from Thom Brown.
2013-05-16 12:30:41 -04:00
Peter Eisentraut cc26ea9fe2 Clean up references to SQL92
In most cases, these were just references to the SQL standard in
general.  In a few cases, a contrast was made between SQL92 and later
standards -- those have been kept unchanged.
2013-04-20 11:04:41 -04:00
Tom Lane dbde97cdde Rewrite LIKE's %-followed-by-_ optimization so it really works (this time
for sure ;-)).  It now also optimizes more cases, such as %_%_.  Improve
comments too.  Per bug #5478.

In passing, also rename the TCHAR macro to GETCHAR, because pgindent is
messing with the formatting of the former (apparently it now thinks TCHAR
is a typedef name).

Back-patch to 8.3, where the bug was introduced.
2010-05-28 17:35:23 +00:00
Tom Lane 9507c8a1db Add get_bit/set_bit functions for bit strings, paralleling those for bytea,
and implement OVERLAY() for bit strings and bytea.

In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.

Leonardo F, reviewed by Kevin Grittner
2010-01-25 20:55:32 +00:00
Tom Lane a2a8c7a662 Support hex-string input and output for type BYTEA.
Both hex format and the traditional "escape" format are automatically
handled on input.  The output format is selected by the new GUC variable
bytea_output.

As committed, bytea_output defaults to HEX, which is an *incompatible
change*.  We will keep it this way for awhile for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.

Peter Eisentraut
2009-08-04 16:08:37 +00:00
Tom Lane fc2660fc25 Fix LIKE's special-case code for % followed by _. I'm not entirely sure that
this case is worth a special code path, but a special code path that gets
the boundary condition wrong is definitely no good.  Per bug #4821 from
Andrew Gierth.

In passing, clean up some minor code formatting issues (excess parentheses
and blank lines in odd places).

Back-patch to 8.3, where the bug was introduced.
2009-05-24 18:10:38 +00:00
Tom Lane 1bbbcb04f0 Make new complaint about unsafe Unicode literals include an error location.
Every other ereport in scan.l has one, this should too.
2009-05-05 21:09:23 +00:00
Peter Eisentraut 40bc4c2605 Disable the use of Unicode escapes in string constants (U&'') when
standard_conforming_strings is not on, for security reasons.
2009-05-05 18:32:17 +00:00
Peter Eisentraut 06735e3256 Unicode escapes in strings and identifiers 2008-10-29 08:04:54 +00:00
Peter Eisentraut 607b2be7bb Additional string function tests for coverage of oracle_compat.c 2008-10-04 13:55:45 +00:00
Tom Lane 1b70619311 Code review for regexp_matches/regexp_split patch. Refactor to avoid assuming
that cached compiled patterns will still be there when the function is next
called.  Clean up looping logic, thereby fixing bug identified by Pavel
Stehule.  Share setup code between the two functions, add some comments, and
avoid risky mixing of int and size_t variables.  Clean up the documentation a
tad, and accept all the flag characters mentioned in table 9-19 rather than
just a subset.
2007-08-11 03:56:24 +00:00
Tom Lane 31edbadf4a Downgrade implicit casts to text to be assignment-only, except for the ones
from the other string-category types; this eliminates a lot of surprising
interpretations that the parser could formerly make when there was no directly
applicable operator.

Create a general mechanism that supports casts to and from the standard string
types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
I/O functions.  These new casts are assignment-only in the to-string direction,
explicit-only in the other, and therefore should create no surprising behavior.
Remove a bunch of thereby-obsoleted datatype-specific casting functions.

The "general mechanism" is a new expression node type CoerceViaIO that can
actually convert between *any* two datatypes if their external text
representations are compatible.  This is more general than needed for the
immediate feature, but might be useful in plpgsql or other places in future.

This commit does nothing about the issue that applying the concatenation
operator || to non-text types will now fail, often with strange error messages
due to misinterpreting the operator as array concatenation.  Since it often
(not always) worked before, we should either make it succeed or at least give
a more user-friendly error; but details are still under debate.

Peter Eisentraut and Tom Lane
2007-06-05 21:31:09 +00:00
Tom Lane 3e23b68dac Support varlena fields with single-byte headers and unaligned storage.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
2007-04-06 04:21:44 +00:00
Neil Conway 9eb78beeae Add three new regexp functions: regexp_matches, regexp_split_to_array,
and regexp_split_to_table. These functions provide access to the
capture groups resulting from a POSIX regular expression match,
and provide the ability to split a string on a POSIX regular
expression, respectively. Patch from Jeremy Drake; code review by
Neil Conway, additional comments and suggestions from Tom and
Peter E.

This patch bumps the catversion, adds some regression tests,
and updates the docs.
2007-03-20 05:45:00 +00:00
Tom Lane 637028afe1 Code review for standard_conforming_strings patch. Fix it so it does not
throw warnings for 100%-SQL-standard constructs, clean up some minor
infelicities, try to un-break ecpg to the best of my ability.  (It's not clear
how ecpg is going to find out the setting of standard_conforming_strings,
though.)  I think pg_dump still needs work, too.
2006-05-11 19:15:36 +00:00
Tom Lane 20ab467d76 Improve parser so that we can show an error cursor position for errors
during parse analysis, not only errors detected in the flex/bison stages.
This is per my earlier proposal.  This commit includes all the basic
infrastructure, but locations are only tracked and reported for errors
involving column references, function calls, and operators.  More could
be done later but this seems like a good set to start with.  I've also
moved the ReportSyntaxErrorPosition logic out of psql and into libpq,
which should make it available to more people --- even within psql this
is an improvement because warnings weren't handled by ReportSyntaxErrorPosition.
2006-03-14 22:48:25 +00:00
Bruce Momjian 19c21d115d Enable standard_conforming_strings to be turned on.
Kevin Grittner
2006-03-06 19:49:20 +00:00
Bruce Momjian 75a64eeb4b I made the patch that implements regexp_replace again.
The specification of this function is as follows.

regexp_replace(source text, pattern text, replacement text, [flags
text])
returns text

Replace string that matches to regular expression in source text to
replacement text.

 - pattern is regular expression pattern.
 - replacement is replace string that can use '\1'-'\9', and '\&'.
    '\1'-'\9': back reference to the n'th subexpression.
    '\&'     : entire matched string.
 - flags can use the following values:
    g: global (replace all)
    i: ignore case
    When the flags is not specified, case sensitive, replace the first
    instance only.

Atsushi Ogawa
2005-07-10 04:54:33 +00:00
Neil Conway f3567eeaf2 Implement md5(bytea), update regression tests and documentation. Patch
from Abhijit Menon-Sen, minor editorialization from Neil Conway. Also,
improve md5(text) to allocate a constant-sized buffer on the stack
rather than via palloc.

Catalog version bumped.
2005-05-20 01:29:56 +00:00
Tom Lane 7665d1bc16 Teach psql to show the location of syntax errors visually, per recent
discussions.  Patch by Fabien Coelho and Tom Lane.  Still needs to be
taught about multi-screen-column kanji characters; Tatsuo has promised
to provide the needed infrastructure for that.
2004-03-14 04:25:18 +00:00