Commit Graph

32 Commits

Author SHA1 Message Date
Tom Lane d074b4e50d Fix regexp_matches() handling of zero-length matches.
We'd find the same match twice if it was of zero length and not immediately
adjacent to the previous match.  replace_text_regexp() got similar cases
right, so adjust this search logic to match that.  Note that even though
the regexp_split_to_xxx() functions share this code, they did not display
equivalent misbehavior, because the second match would be considered
degenerate and ignored.

Jeevan Chalke, with some cosmetic changes by me.
2013-07-31 11:31:22 -04:00
Tom Lane e7bfc7e42c Fix some uses of "the quick brown fox".
If we're going to quote a well-known pangram, we should quote it
accurately.  Per gripe from Thom Brown.
2013-05-16 12:30:41 -04:00
Peter Eisentraut cc26ea9fe2 Clean up references to SQL92
In most cases, these were just references to the SQL standard in
general.  In a few cases, a contrast was made between SQL92 and later
standards -- those have been kept unchanged.
2013-04-20 11:04:41 -04:00
Tom Lane dbde97cdde Rewrite LIKE's %-followed-by-_ optimization so it really works (this time
for sure ;-)).  It now also optimizes more cases, such as %_%_.  Improve
comments too.  Per bug #5478.

In passing, also rename the TCHAR macro to GETCHAR, because pgindent is
messing with the formatting of the former (apparently it now thinks TCHAR
is a typedef name).

Back-patch to 8.3, where the bug was introduced.
2010-05-28 17:35:23 +00:00
Tom Lane 9507c8a1db Add get_bit/set_bit functions for bit strings, paralleling those for bytea,
and implement OVERLAY() for bit strings and bytea.

In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.

Leonardo F, reviewed by Kevin Grittner
2010-01-25 20:55:32 +00:00
Tom Lane a2a8c7a662 Support hex-string input and output for type BYTEA.
Both hex format and the traditional "escape" format are automatically
handled on input.  The output format is selected by the new GUC variable
bytea_output.

As committed, bytea_output defaults to HEX, which is an *incompatible
change*.  We will keep it this way for awhile for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.

Peter Eisentraut
2009-08-04 16:08:37 +00:00
Tom Lane fc2660fc25 Fix LIKE's special-case code for % followed by _. I'm not entirely sure that
this case is worth a special code path, but a special code path that gets
the boundary condition wrong is definitely no good.  Per bug #4821 from
Andrew Gierth.

In passing, clean up some minor code formatting issues (excess parentheses
and blank lines in odd places).

Back-patch to 8.3, where the bug was introduced.
2009-05-24 18:10:38 +00:00
Peter Eisentraut 40bc4c2605 Disable the use of Unicode escapes in string constants (U&'') when
standard_conforming_strings is not on, for security reasons.
2009-05-05 18:32:17 +00:00
Peter Eisentraut 06735e3256 Unicode escapes in strings and identifiers 2008-10-29 08:04:54 +00:00
Peter Eisentraut 607b2be7bb Additional string function tests for coverage of oracle_compat.c 2008-10-04 13:55:45 +00:00
Tom Lane 1b70619311 Code review for regexp_matches/regexp_split patch. Refactor to avoid assuming
that cached compiled patterns will still be there when the function is next
called.  Clean up looping logic, thereby fixing bug identified by Pavel
Stehule.  Share setup code between the two functions, add some comments, and
avoid risky mixing of int and size_t variables.  Clean up the documentation a
tad, and accept all the flag characters mentioned in table 9-19 rather than
just a subset.
2007-08-11 03:56:24 +00:00
Tom Lane 31edbadf4a Downgrade implicit casts to text to be assignment-only, except for the ones
from the other string-category types; this eliminates a lot of surprising
interpretations that the parser could formerly make when there was no directly
applicable operator.

Create a general mechanism that supports casts to and from the standard string
types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
I/O functions.  These new casts are assignment-only in the to-string direction,
explicit-only in the other, and therefore should create no surprising behavior.
Remove a bunch of thereby-obsoleted datatype-specific casting functions.

The "general mechanism" is a new expression node type CoerceViaIO that can
actually convert between *any* two datatypes if their external text
representations are compatible.  This is more general than needed for the
immediate feature, but might be useful in plpgsql or other places in future.

This commit does nothing about the issue that applying the concatenation
operator || to non-text types will now fail, often with strange error messages
due to misinterpreting the operator as array concatenation.  Since it often
(not always) worked before, we should either make it succeed or at least give
a more user-friendly error; but details are still under debate.

Peter Eisentraut and Tom Lane
2007-06-05 21:31:09 +00:00
Tom Lane 3e23b68dac Support varlena fields with single-byte headers and unaligned storage.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
2007-04-06 04:21:44 +00:00
Neil Conway 9eb78beeae Add three new regexp functions: regexp_matches, regexp_split_to_array,
and regexp_split_to_table. These functions provide access to the
capture groups resulting from a POSIX regular expression match,
and provide the ability to split a string on a POSIX regular
expression, respectively. Patch from Jeremy Drake; code review by
Neil Conway, additional comments and suggestions from Tom and
Peter E.

This patch bumps the catversion, adds some regression tests,
and updates the docs.
2007-03-20 05:45:00 +00:00
Bruce Momjian 19c21d115d Enable standard_conforming_strings to be turned on.
Kevin Grittner
2006-03-06 19:49:20 +00:00
Bruce Momjian 75a64eeb4b I made the patch that implements regexp_replace again.
The specification of this function is as follows.

regexp_replace(source text, pattern text, replacement text, [flags
text])
returns text

Replace string that matches to regular expression in source text to
replacement text.

 - pattern is regular expression pattern.
 - replacement is replace string that can use '\1'-'\9', and '\&'.
    '\1'-'\9': back reference to the n'th subexpression.
    '\&'     : entire matched string.
 - flags can use the following values:
    g: global (replace all)
    i: ignore case
    When the flags is not specified, case sensitive, replace the first
    instance only.

Atsushi Ogawa
2005-07-10 04:54:33 +00:00
Neil Conway f3567eeaf2 Implement md5(bytea), update regression tests and documentation. Patch
from Abhijit Menon-Sen, minor editorialization from Neil Conway. Also,
improve md5(text) to allocate a constant-sized buffer on the stack
rather than via palloc.

Catalog version bumped.
2005-05-20 01:29:56 +00:00
Tom Lane f45df8c014 Cause CHAR(n) to TEXT or VARCHAR conversion to automatically strip trailing
blanks, in hopes of reducing the surprise factor for newbies.  Remove
redundant operators for VARCHAR (it depends wholly on TEXT operations now).
Clean up resolution of ambiguous operators/functions to avoid surprising
choices for domains: domains are treated as equivalent to their base types
and binary-coercibility is no longer considered a preference item when
choosing among multiple operators/functions.  IsBinaryCoercible now correctly
reflects the notion that you need *only* relabel the type to get from type
A to type B: that is, a domain is binary-coercible to its base type, but
not vice versa.  Various marginal cleanup, including merging the essentially
duplicate resolution code in parse_func.c and parse_oper.c.  Improve opr_sanity
regression test to understand about binary compatibility (using pg_cast),
and fix a couple of small errors in the catalogs revealed thereby.
Restructure "special operator" handling to fetch operators via index opclasses
rather than hardwiring assumptions about names (cleans up the pattern_ops
stuff a little).
2003-05-26 00:11:29 +00:00
Bruce Momjian e87e82d2b7 Attached are two small patches to expose md5 as a user function -- including
documentation and regression test mods. It seemed small and unobtrusive enough
to not require a specific proposal on the hackers list -- but if not, let me
know and I'll make a pitch. Otherwise, if there are no objections please apply.

Joe Conway
2002-12-06 05:20:28 +00:00
Tom Lane 9946b83ded Bring SIMILAR TO and SUBSTRING into some semblance of conformance with
the SQL99 standard.  (I'm not sure that the character-class features are
quite right, but that can be fixed later.)  Document SQL99 and POSIX
regexps as being different features; provide variants of SUBSTRING for
each.
2002-09-22 17:27:25 +00:00
Tom Lane b26dfb9522 Extend pg_cast castimplicit column to a three-way value; this allows us
to be flexible about assignment casts without introducing ambiguity in
operator/function resolution.  Introduce a well-defined promotion hierarchy
for numeric datatypes (int2->int4->int8->numeric->float4->float8).
Change make_const to initially label numeric literals as int4, int8, or
numeric (never float8 anymore).
Explicitly mark Func and RelabelType nodes to indicate whether they came
from a function call, explicit cast, or implicit cast; use this to do
reverse-listing more accurately and without so many heuristics.
Explicit casts to char, varchar, bit, varbit will truncate or pad without
raising an error (the pre-7.2 behavior), while assigning to a column without
any explicit cast will still raise an error for wrong-length data like 7.3.
This more nearly follows the SQL spec than 7.2 behavior (we should be
reporting a 'completion condition' in the explicit-cast cases, but we have
no mechanism for that, so just do silent truncation).
Fix some problems with enforcement of typmod for array elements;
it didn't work at all in 'UPDATE ... SET array[n] = foo', for example.
Provide a generalized array_length_coerce() function to replace the
specialized per-array-type functions that used to be needed (and were
missing for NUMERIC as well as all the datetime types).
Add missing conversions int8<->float4, text<->numeric, oid<->int8.
initdb forced.
2002-09-18 21:35:25 +00:00
Bruce Momjian 81186865fe Joe Conway wrote:
> Hannu Krosing wrote:
 >
 >> It seems that my last mail on this did not get through to the list
 >> ;(
 >>
 >> Please consider renaming the new builtin function
 >> split(text,text,int)
 >>
 >> to something else, perhaps
 >>
 >> split_part(text,text,int)
 >>
 >> (like date_part)
 >>
 >> The reason for this request is that 3 most popular scripting
 >> languages (perl, python, php) all have also a function with similar
 >> signature, but returning an array instead of single element and the
 >> (optional) third argument is limit (maximum number of splits to
 >> perform)
 >>
 >> I think that it would be good to have similar function in (some
 >> future release of) postgres, but if we now let in a function with
 >> same name and arguments but returning a single string instead an
 >> array of them, then we will need to invent a new and not so easy to
 >> recognise name for the "real" split function.
 >>
 >
 > This is a good point, and I'm not opposed to changing the name, but
 > it is too bad your original email didn't get through before beta1 was
 >  rolled. The change would now require an initdb, which I know we were
 >  trying to avoid once beta started (although we could change it
 > without *requiring* an initdb I suppose).
 >
 > I guess if we do end up needing an initdb for other reasons, we
 > should make this change too. Any other opinions? Is split_part an
 > acceptable name?
 >
 > Also, if we add a todo to produce a "real" split function that
 > returns an array, similar to those languages, I'll take it for 7.4.

No one commented on the choice of name, so the attached patch changes
the name of split(text,text,int) to split_part(text,text,int) per
Hannu's recommendation above. This can be applied without an initdb if
current beta testers are advised to run:

   update pg_proc set proname = 'split_part' where proname = 'split';

in the case they want to use this function. Regression and doc fix is
also included in the patch.

Joe Conway
2002-09-12 00:21:25 +00:00
Bruce Momjian b60acaf568 The following small patch provides a couple of minor updates (against
CVS HEAD):

Amended "strings" regression test. TOAST tests now insert two values
with storage set to "external", to exercise properly the TOAST slice
routines which fetch only a subset of the chunks.

Changed now-misleading comment on AlterTableCreateToastTable in
tablecmds.c, because both columns of the index on a toast table are now
used.

John Gray
2002-08-28 20:18:29 +00:00
Bruce Momjian 89260124db Add:
replace(string, from, to)
   -- replaces all occurrences of "from" in "string" to "to"
split(string, fldsep, column)
   -- splits "string" on "fldsep" and returns "column" number piece
to_hex(int32_num) & to_hex(int64_num)
   -- takes integer number and returns as hex string

Joe Conway
2002-08-22 03:24:01 +00:00
Thomas G. Lockhart ea01a451cc Implement SQL99 OVERLAY(). Allows substitution of a substring in a string.
Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".
Implement SQL99 regular expression SUBSTRING(string FROM pat FOR escape).
 Extend the definition to make the FOR clause optional.
 Define textregexsubstr() to actually implement this feature.
Update the regression test to include these new string features.
 All tests pass.
Rename the regular expression support routines from "pg95_xxx" to "pg_xxx".
Define CREATE CHARACTER SET in the parser per SQL99. No implementation yet.
2002-06-11 15:44:38 +00:00
Tom Lane 7c0c9b3cce New improved version of bpcharin() may have got the truncation case
right, but it failed to get the padding case right.

This was obscured by subsequent application of bpchar() in all but one
regression test case, and that one didn't fail in an obvious way ---
trailing blanks are hard to see.  Add another test case to make it
more obvious if it breaks again.
2001-06-01 17:49:17 +00:00
Peter Eisentraut 5546ec289b Make char(n) and varchar(n) types raise an error if the inserted string is
too long.  While I was adjusting the regression tests I moved the array
things all into array.sql, to make things more manageable.
2001-05-21 16:54:46 +00:00
Thomas G. Lockhart 259489bab7 Implement LIKE/ESCAPE. Change parser to use like()/notlike()
rather than the "~~" operator; this made it easy to add ESCAPE features.
Implement ILIKE, NOT ILIKE, and the ESCAPE clause for them.
 afaict this is not MultiByte clean, but lots of other stuff isn't either.
Fix up underlying support code for LIKE/NOT LIKE.
 Things should be faster and does not require internal string copying.
Update regression test to add explicit checks for
 LIKE/NOT LIKE/ILIKE/NOT ILIKE.
Remove colon and semi-colon operators as threatened in 7.0.
Implement SQL99 COMMIT/AND NO CHAIN.
 Throw elog(ERROR) on COMMIT/AND CHAIN per spec
 since we don't yet support it.
Implement SQL99 CREATE/DROP SCHEMA as equivalent to CREATE DATABASE.
 This is only a stopgap or demo since schemas will have another
 implementation soon.
Remove a few unused production rules to get rid of warnings
 which crept in on the last commit.
Fix up tabbing in some places by removing embedded spaces.
2000-08-06 18:13:42 +00:00
Tom Lane 2d4a05d7df Update strings test to reflect the fact that casting to char() will
now truncate or pad to the specified length.
2000-01-17 00:16:41 +00:00
Thomas G. Lockhart 4c4e68dccc Clean up format of tests.
Remove older "::" type coersion syntax in favor of extended SQL92 style.
Include a few new tests for datetime/timespan arithmetic.
2000-01-05 06:07:58 +00:00
Thomas G. Lockhart 5812d51270 Add test for UNION.
Add additional tests in strings for conversions of the "name" data type.
Test SQL92 string functions such as SUBSTRING() and POSITION().
1998-05-29 13:23:02 +00:00
Thomas G. Lockhart 7a86a2a9e5 Add tests for varchar() and combinations of string types. 1997-12-01 02:48:47 +00:00