Commit Graph

296 Commits

Author SHA1 Message Date
Tom Lane
0231f83856 Re-pgindent varlena.c.
Just to make sure previous commit worked ...
2016-02-08 15:17:40 -05:00
Tom Lane
58e797216f Rename typedef "string" to "VarString".
Since pgindent treats typedef names as global, the original coding of
b47b4dbf68 would have had rather nasty effects on the formatting
of other files in which "string" is used as a variable or field name.
Use a less generic name for this typedef, and rename some other
identifiers to match.

Peter Geoghegan, per gripe from me
2016-02-08 15:15:56 -05:00
Robert Haas
63f39b9148 Fix small goof in comment.
Peter Geoghegan
2016-02-05 08:04:48 -05:00
Robert Haas
b47b4dbf68 Extend sortsupport for text to more opclasses.
Have varlena.c expose an interface that allows the char(n), bytea, and
bpchar types to piggyback on a now-generalized SortSupport for text.
This pushes a little more knowledge of the bpchar/char(n) type into
varlena.c than might be preferred, but that seems like the approach
that creates least friction.  Also speed things up for index builds
that use text_pattern_ops or varchar_pattern_ops.

This patch does quite a bit of renaming, but it seems likely to be
worth it, so as to avoid future confusion about the fact that this code
is now more generally used than the old names might have suggested.

Peter Geoghegan, reviewed by Álvaro Herrera and Andreas Karlsson,
with small tweaks by me.
2016-02-03 14:29:53 -05:00
Bruce Momjian
ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Peter Eisentraut
30c0c4bf12 Remove unnecessary escaping in C character literals
'\"' is more commonly written simply as '"'.
2015-12-22 22:43:46 -05:00
Robert Haas
0279f62fdc Correct tiny inaccuracy in strxfrm cache comment.
Peter Geoghegan
2015-11-03 08:32:22 -05:00
Robert Haas
5be94a9eb1 Be a bit more rigorous about how we cache strcoll and strxfrm results.
Commit 0e57b4d8bd contained some clever
logic that attempted to make sure that we couldn't get confused about
whether the last thing we cached was a strcoll() result or a strxfrm()
result, but it wasn't quite clever enough, because we can perform
further abbreviations after having already performed some comparisons.
Introduce an explicit flag in the hopes of making this watertight.

Peter Geoghegan, reviewed by me.
2015-10-20 09:27:50 -04:00
Robert Haas
d53f808e7e Remove obsolete comment.
Peter Geoghegan
2015-10-20 09:15:13 -04:00
Robert Haas
0e57b4d8bd Speed up text sorts where the same strings occur multiple times.
Cache strxfrm() blobs across calls made to the text SortSupport
abbreviation routine.  This can speed up sorting if the same string
needs to be abbreviated many times in a row.

Also, cache the result of the previous strcoll() comparison, so that
if we're asked to compare the same strings again, we don't need to call
strcoll() again.

Perhaps surprisingly, these optimizations don't seem to hurt even when
they don't help.  memcmp() is really cheap compared to strcoll() or
strxfrm().

Peter Geoghegan, reviewed by me.
2015-10-09 19:03:44 -04:00
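
The comparison-caching idea in the commit above can be sketched roughly as follows. This is an illustrative Python model, not the varlena.c code: the class name `CachedComparator`, its attributes, and the `expensive_cmp` callback (standing in for strcoll()) are all hypothetical, and the strxfrm() blob cache is not modeled.

```python
class CachedComparator:
    """Remember the last pair compared and reuse its result (hypothetical sketch)."""

    def __init__(self, expensive_cmp):
        self._cmp = expensive_cmp      # stand-in for a strcoll()-style comparator
        self._last_pair = None         # explicit "is anything cached?" state
        self._last_result = 0
        self.expensive_calls = 0       # counts how often the slow path ran

    def compare(self, a: bytes, b: bytes) -> int:
        # Cheap bytewise check (the memcmp() analogue) against the cached pair.
        if self._last_pair is not None and self._last_pair == (a, b):
            return self._last_result
        self.expensive_calls += 1
        result = self._cmp(a, b)
        self._last_pair = (a, b)
        self._last_result = result
        return result


def bytewise_cmp(a: bytes, b: bytes) -> int:
    """Placeholder comparator returning -1, 0, or 1."""
    return (a > b) - (a < b)
```

Calling `compare()` twice in a row with the same two strings runs the expensive comparator only once, which is the case the commit message describes.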
Robert Haas
bfb54ff15a Make abbreviated key comparisons for text a bit cheaper.
If we do some byte-swapping while abbreviating, we can do comparisons
using integer arithmetic rather than memcmp.

Peter Geoghegan, reviewed and slightly revised by me.
2015-10-09 15:06:06 -04:00
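
A minimal sketch of why byte order matters here, assuming an 8-byte abbreviated key (the width and the function names are assumptions for illustration). If the key bytes are read as a big-endian unsigned integer, integer comparison gives the same ordering as a bytewise memcmp() of the padded prefixes; on a little-endian machine that corresponds to swapping the bytes once at abbreviation time.

```python
def abbreviate(data: bytes, width: int = 8) -> int:
    """Pack the first `width` bytes, zero-padded, into a big-endian unsigned integer."""
    prefix = data[:width].ljust(width, b"\x00")
    return int.from_bytes(prefix, "big")


def compare_abbreviated(a: bytes, b: bytes) -> int:
    """Three-way comparison of abbreviated keys using plain integer arithmetic."""
    ka, kb = abbreviate(a), abbreviate(b)
    return (ka > kb) - (ka < kb)
```

A result of 0 only means the abbreviated keys tie; the full comparator still has to decide in that case.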
Robert Haas
b48ecf862b In bttext_abbrev_convert, move pfree to the right place.
Without this, we might access memory that's already been freed, or
leak memory if in the C locale.

Peter Geoghegan
2015-06-29 23:53:05 -04:00
Bruce Momjian
807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas
4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Robert Haas
aea652abd3 Make trace_sort control abbreviation debug output for the text opclass.
This is consistent with what the new numeric support for abbreviated keys
now does, and seems much more convenient than having a separate compiler
define to control this debug output.

Peter Geoghegan
2015-04-07 22:45:17 -04:00
Robert Haas
f85155e18c Change the way we decide whether to give up on abbreviated text keys.
Be more aggressive about aborting early on if it looks like it's not
helping, but be less aggressive about aborting later on, since it's
more expensive at that point, and also since we're currently aborting
in some cases where abbreviation can still deliver a substantial win.

Peter Geoghegan. Extensive testing by Tomas Vondra.
2015-04-03 08:32:05 -04:00
Robert Haas
c02ef232c1 Add missing calls to DatumGetUInt32.
These were inadvertently omitted from the commit that introduced
abbreviated keys, commit 4ea51cdfe8.

Peter Geoghegan
2015-04-02 11:57:35 -04:00
Robert Haas
168a809d4b Re-enable abbreviated keys on Windows.
Commit 1be4eb1b2d disabled this, but I
think the real problem here was fixed by commit
b181a91981 and commit
d060e07fa9.  So let's try re-enabling
it now and see what happens.
2015-01-26 14:28:14 -05:00
Tom Lane
586dd5d6a5 Replace a bunch more uses of strncpy() with safer coding.
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about.  So just use memcpy() in
these cases.

In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).

I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution).  There are also a
few such calls in ecpg that could possibly do with more analysis.

AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior.  These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
2015-01-24 13:05:42 -05:00
Robert Haas
d1747571b6 Fix typos, update README.
Peter Geoghegan
2015-01-23 15:06:53 -05:00
Robert Haas
d060e07fa9 Repair brain fade in commit b181a91981.
The split between which things need to happen in the C-locale case and
which needed to happen in the locale-aware case was a few bricks short
of a load.  Try to fix that.
2015-01-22 12:51:20 -05:00
Robert Haas
b181a91981 More fixes for abbreviated keys infrastructure.
First, when LC_COLLATE = C, bttext_abbrev_convert should use memcpy()
rather than strxfrm() to construct the abbreviated key, because the
authoritative comparator uses memcmp().  If we do anything else here,
we might get inconsistent answers, and the buildfarm says this risk
is not theoretical.  It should be faster this way, too.

Second, while I'm looking at bttext_abbrev_convert, replace a needless
use of goto with the actual loop it's trying to implement.

Both of the above problems date to the original commit of abbreviated
keys, commit 4ea51cdfe8.

Third, fix a bogus assignment to tss->locale before tss is set up.
That's a new goof in commit b529b65d1b.
2015-01-22 11:58:58 -05:00
Robert Haas
b529b65d1b Heavily refactor btsortsupport_worker.
Prior to commit 4ea51cdfe8, this function
only had one job, which was to decide whether we could avoid trampolining
through the fmgr layer when performing sort comparisons.  As of that
commit, it has a second job, which is to decide whether we can use
abbreviated keys.  Unfortunately, those two tasks are somewhat intertwined
in the existing coding, which is likely why neither Peter Geoghegan nor
I noticed prior to commit that this calls pg_newlocale_from_collation() in
cases where it didn't previously.  The buildfarm noticed, though.

To fix, rewrite the logic so that the decision as to which comparator to
use is more cleanly separated from the decision about abbreviation.
2015-01-22 10:54:16 -05:00
Robert Haas
1be4eb1b2d Disable abbreviated keys on Windows.
Most of the Windows buildfarm members (bowerbird, hamerkop, currawong,
jacana, brolga) are unhappy with yesterday's abbreviated keys patch,
although there are some (narwhal, frogmouth) that seem OK with it.
Since there's no obvious pattern to explain why some are working and
others are failing, just disable this across-the-board on Windows for
now.  This is a bit unfortunate since the optimization will be a big
win in some cases, but we can't leave the buildfarm broken.
2015-01-20 20:32:21 -05:00
Robert Haas
4ea51cdfe8 Use abbreviated keys for faster sorting of text datums.
This commit extends the SortSupport infrastructure to allow operator
classes the option to provide abbreviated representations of Datums;
in the case of text, we abbreviate by taking the first few characters
of the strxfrm() blob.  If the abbreviated comparison is insufficient
to resolve the comparison, we fall back on the normal comparator.
This can be much faster than the old way of doing sorting if the
first few bytes of the string are usually sufficient to resolve the
comparison.

There is the potential for a performance regression if all of the
strings to be sorted are identical for the first 8+ characters and
differ only in later positions; therefore, the SortSupport machinery
now provides an infrastructure to abort the use of abbreviation if
it appears that abbreviation is producing comparatively few distinct
keys.  HyperLogLog, a streaming cardinality estimator, is included in
this commit and used to make that determination for text.

Peter Geoghegan, reviewed by me.
2015-01-19 15:28:27 -05:00
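
The scheme described above can be modeled in a few lines. This is only a sketch under stated assumptions: `utf8_bytes` stands in for strxfrm(), the 8-byte width and the 5% threshold are made-up values, and an exact set of distinct prefixes replaces the HyperLogLog estimator the commit actually uses. All names are hypothetical.

```python
from functools import cmp_to_key


def utf8_bytes(s: str) -> bytes:
    """Stand-in for strxfrm(): any transform whose byte order matches the sort order."""
    return s.encode("utf-8")


def three_way(x, y) -> int:
    return (x > y) - (x < y)


def make_abbrev_comparator(transform, width: int = 8):
    """Compare short prefixes first; fall back to the full comparison on a tie."""
    def compare(a, b) -> int:
        result = three_way(transform(a)[:width], transform(b)[:width])
        if result != 0:
            return result                              # abbreviated keys resolved it
        return three_way(transform(a), transform(b))   # authoritative fallback
    return compare


def sort_with_abbreviation(values, transform=utf8_bytes):
    return sorted(values, key=cmp_to_key(make_abbrev_comparator(transform)))


def abbreviation_looks_useful(values, transform=utf8_bytes,
                              width: int = 8, min_ratio: float = 0.05) -> bool:
    """Exact distinct-prefix count; the real code estimates this with HyperLogLog."""
    if not values:
        return True
    distinct = len({transform(v)[:width] for v in values})
    return distinct / len(values) >= min_ratio
```

When every input shares its first eight bytes, `abbreviation_looks_useful` returns False, which models the regression case the message warns about.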
Bruce Momjian
4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Robert Haas
c0828b78e9 Move the guts of our Levenshtein implementation into core.
The hope is that we can use this to produce better diagnostics in
some cases.

Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
2014-11-13 12:33:26 -05:00
Robert Haas
e246b3d6ea Add a fast pre-check for equality of equal-length strings.
Testing reveals that doing a memcmp() before the strcoll() costs
practically nothing, at least on the systems we tested, and it speeds
up sorts containing many equal strings significantly.

Peter Geoghegan.  Review by myself and Heikki Linnakangas.  Comments
rewritten by me.
2014-09-19 12:39:00 -04:00
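
As a rough illustration of the pre-check, assuming a hypothetical `collate_cmp` callback in place of strcoll(): equal-length strings are compared bytewise first, and the collation-aware comparator runs only when they differ. The `CountingCollator` helper exists just to make the saved calls visible.

```python
class CountingCollator:
    """Hypothetical slow comparator that records how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a: bytes, b: bytes) -> int:
        self.calls += 1
        return (a > b) - (a < b)


def compare_text(a: bytes, b: bytes, collate_cmp) -> int:
    # Fast path: identical bytes of identical length are equal (the memcmp() analogue).
    if len(a) == len(b) and a == b:
        return 0
    # Slow path: defer to the collation-aware comparator.
    return collate_cmp(a, b)
```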
Robert Haas
9522ec3e70 Fix typo in b34e37bfef.
Spotted by Peter Geoghegan.
2014-08-26 15:58:50 -04:00
Robert Haas
b34e37bfef Add sortsupport routines for text.
This provides a small but worthwhile speedup when sorting text, at least
in cases to which the sortsupport machinery applies.

Robert Haas and Peter Geoghegan
2014-08-14 12:09:52 -04:00
Bruce Momjian
0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not affect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane
741364bf5c Code review for commit d26888bc4d.
Mostly, copy-edit the comments; but also fix it to not reject domains over
arrays.
2014-04-03 16:57:45 -04:00
Tom Lane
9662143f0c Allow regex operations to be terminated early by query cancel requests.
The regex code didn't have any provision for query cancel; which is
unsurprising given its non-Postgres origin, but still problematic since
some operations can take a long time.  Introduce a callback function to
check for a pending query cancel or session termination request, and
call it in a couple of strategic spots where we can make the regex code
exit with an error indicator.

If we ever actually split out the regex code as a standalone library,
some additional work will be needed to let the cancel callback function
be specified externally to the library.  But that's straightforward
(certainly so by comparison to putting the locale-dependent character
classification logic on a similar arms-length basis), and there seems
no need to do it right now.

A bigger issue is that there may be more places than these two where
we need to check for cancels.  We can always add more checks later,
now that the infrastructure is in place.

Since there are known examples of not-terribly-long regexes that can
lock up a backend for a long time, back-patch to all supported branches.
I have hopes of fixing the known performance problems later, but adding
query cancel ability seems like a good idea even if they were all fixed.
2014-03-01 15:20:56 -05:00
Bruce Momjian
7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane
d074b4e50d Fix regexp_matches() handling of zero-length matches.
We'd find the same match twice if it was of zero length and not immediately
adjacent to the previous match.  replace_text_regexp() got similar cases
right, so adjust this search logic to match that.  Note that even though
the regexp_split_to_xxx() functions share this code, they did not display
equivalent misbehavior, because the second match would be considered
degenerate and ignored.

Jeevan Chalke, with some cosmetic changes by me.
2013-07-31 11:31:22 -04:00
Andrew Dunstan
d26888bc4d Move checking an explicit VARIADIC "any" argument into the parser.
This is more efficient and simpler. It does mean that an untyped NULL
can no longer be used in such cases, which should be mentioned in
Release Notes, but doesn't seem a terrible loss. The workaround is to
cast the NULL to some array type.

Pavel Stehule, reviewed by Jeevan Chalke.
2013-07-18 11:52:12 -04:00
Bruce Momjian
9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Peter Eisentraut
cc26ea9fe2 Clean up references to SQL92
In most cases, these were just references to the SQL standard in
general.  In a few cases, a contrast was made between SQL92 and later
standards -- those have been kept unchanged.
2013-04-20 11:04:41 -04:00
Tom Lane
73e7025bd8 Extend format() to handle field width and left/right alignment.
This change adds some more standard sprintf() functionality to format().

Pavel Stehule, reviewed by Dean Rasheed and Kyotaro Horiguchi
2013-03-14 22:56:56 -04:00
Tom Lane
760f3c043a Fix concat() and format() to handle VARIADIC-labeled arguments correctly.
Previously, the VARIADIC labeling was effectively ignored, but now these
functions act as though the array elements had all been given as separate
arguments.

Pavel Stehule
2013-01-25 00:19:56 -05:00
Bruce Momjian
bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Tom Lane
d2286a98ef Allow embedded spaces without quoting in unix_socket_directories entries.
This fix removes an unnecessary incompatibility with the old behavior of
the unix_socket_directory parameter.  Since pathnames with embedded spaces
are fairly popular on some platforms, the incompatibility could be
significant in practice.  We'll still strip unquoted leading/trailing
spaces, however.

No docs update since the documentation already implied that it worked
like this.

Per bug #7514 from Murray Cumming.
2012-09-06 11:43:51 -04:00
Tom Lane
c9b0cbe98b Support having multiple Unix-domain sockets per postmaster.
Replace unix_socket_directory with unix_socket_directories, which is a list
of socket directories, and adjust postmaster's code to allow zero or more
Unix-domain sockets to be created.

This is mostly a straightforward change, but since the Unix sockets ought
to be created after the TCP/IP sockets for safety reasons (better chance
of detecting a port number conflict), AddToDataDirLockFile needs to be
fixed to support out-of-order updates of data directory lockfile lines.
That's a change that had been foreseen to be necessary someday anyway.

Honza Horak, reviewed and revised by Tom Lane
2012-08-10 17:27:15 -04:00
Bruce Momjian
927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Tom Lane
d3b97d1488 Fix string truncation to be multibyte-aware in text_name and bpchar_name.
Previously, casts to name could generate invalidly-encoded results.

Also, make these functions match namein() more exactly, by consistently
using palloc0() instead of ad-hoc zeroing code.

Back-patch to all supported branches.

Karl Schnaitter and Tom Lane
2012-05-25 17:34:51 -04:00
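
One way to picture multibyte-aware truncation, assuming UTF-8 input (the real code handles any server encoding; `truncate_utf8` and `max_bytes` are names invented for this sketch): cut at the byte limit, then back up until the cut no longer falls inside a character.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Truncate to at most max_bytes without splitting a UTF-8 character."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. If it is a continuation
    # byte (10xxxxxx), the cut is mid-character, so move the cut point left.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

A plain byte-count cut of `b"h\xc3\xa9llo"` at 2 bytes would leave a dangling `\xc3`; the sketch returns `b"h"` instead.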
Peter Eisentraut
c0cc526e8b Rename bytea_agg to string_agg and add delimiter argument
Per mailing list discussion, we would like to keep the bytea functions
parallel to the text functions, so rename bytea_agg to string_agg,
which already exists for text.

Also, to satisfy the rule that we don't want aggregate functions of
the same name with a different number of arguments, add a delimiter
argument, just like string_agg for text already has.
2012-04-13 21:36:59 +03:00
Bruce Momjian
e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Robert Haas
d5448c7d31 Add bytea_agg, parallel to string_agg.
Pavel Stehule
2011-12-23 08:40:25 -05:00
Robert Haas
7f0e4bb82e Shave a few cycles in string_agg().
Pavel Stehule
2011-12-21 08:53:50 -05:00
Andrew Dunstan
0f44335122 Miscellaneous cleanup to silence compiler warnings seen on Mingw.
Remove some dead code, conditionally declare some items or call
some code, and fix one or two declarations.
2011-12-10 18:15:15 -05:00
Tom Lane
a5b7640ba0 Fix concat_ws() to not insert a separator after leading NULL argument(s).
Per bug #6181 from Itagaki Takahiro.  Also do some marginal code cleanup
and improve error handling.
2011-08-29 15:20:57 -04:00
Peter Eisentraut
1af55e2751 Use consistent format for reporting GetLastError()
Use something like "error code %lu" for reporting GetLastError()
values on Windows.  Previously, a mix of different wordings and
formats were in use.
2011-08-23 22:00:52 +03:00
Peter Eisentraut
f05c65090a Message style improvements 2011-07-08 07:37:04 +03:00
Peter Eisentraut
27af66162b Message style tweaks 2011-07-05 00:01:35 +03:00
Bruce Momjian
6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Heikki Linnakangas
34be83b7e1 Fix integer overflow in text_format function, reported by Dean Rasheed.
In passing, clarify the comment on why text_format_nv wrapper is needed.
2011-05-23 22:24:44 +03:00
Bruce Momjian
bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Peter Eisentraut
11745364d0 Add collation support on Windows (MSVC build)
There is not yet support in initdb to populate the pg_collation
catalog, but if that is done manually, the rest should work.
2011-04-10 00:15:41 +03:00
Tom Lane
6e197cb2e5 Improve reporting of run-time-detected indeterminate-collation errors.
pg_newlocale_from_collation does not have enough context to give an error
message that's even a little bit useful, so move the responsibility for
complaining up to its callers.  Also, reword ERRCODE_INDETERMINATE_COLLATION
error messages in a less jargony, more message-style-guide-compliant
fashion.
2011-03-22 16:55:32 -04:00
Peter Eisentraut
414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Tom Lane
1b393f4e5d Avoid detoast in texteq/textne/byteaeq/byteane for unequal-length strings.
We can get the length of a compressed or out-of-line datum without actually
detoasting it.  If the lengths of two strings are unequal, we can then
conclude they are unequal without detoasting.  That saves considerable work
in an admittedly less-common case, without costing anything much when the
optimization doesn't apply.

Noah Misch
2011-01-18 14:11:54 -05:00
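
The optimization above can be modeled with a toy "toasted" value that knows its uncompressed length without decompressing. This is a sketch only: `CompressedDatum`, `raw_size`, and `datum_eq` are hypothetical names, and zlib stands in for PostgreSQL's own compression.

```python
import zlib


class CompressedDatum:
    """Toy stand-in for a compressed varlena: the raw length is stored up front."""

    def __init__(self, payload: bytes):
        self.raw_size = len(payload)
        self._stored = zlib.compress(payload)
        self.detoast_count = 0

    def detoast(self) -> bytes:
        self.detoast_count += 1
        return zlib.decompress(self._stored)


def datum_eq(a: CompressedDatum, b: CompressedDatum) -> bool:
    # Different lengths imply different contents; no decompression needed.
    if a.raw_size != b.raw_size:
        return False
    return a.detoast() == b.detoast()
```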
Bruce Momjian
5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Robert Haas
32ba2b5160 Use memcmp() rather than strncmp() when shorter string length is known.
It appears that this will be faster for all but the shortest strings;
at least on some platforms, memcmp() can use word-at-a-time comparisons.

Noah Misch, somewhat pared down.
2010-12-21 22:11:40 -05:00
Peter Eisentraut
fc946c39ae Remove useless whitespace at end of lines 2010-11-23 22:34:55 +02:00
Robert Haas
7504870778 Add new SQL function, format(text).
Currently, three conversion format specifiers are supported: %s for a
string, %L for an SQL literal, and %I for an SQL identifier.  The latter
two are deliberately designed not to overlap with what sprintf() already
supports, in case we want to add more of sprintf()'s functionality here
later.

Patch by Pavel Stehule, heavily revised by me.  Reviewed by Jeff Janes
and, in earlier versions, by Itagaki Takahiro and Tom Lane.
2010-11-20 22:33:27 -05:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Itagaki Takahiro
49b27ab551 Add string functions: concat(), concat_ws(), left(), right(), and reverse().
Pavel Stehule, reviewed by me.
2010-08-24 06:30:44 +00:00
Tom Lane
33f43725fb Add three-parameter forms of array_to_string and string_to_array, to allow
better handling of NULL elements within the arrays.  The third parameter
is a string that should be used to represent a NULL element, or should
be translated into a NULL element, respectively.  If the third parameter
is NULL it behaves the same as the two-parameter form.

There are two incompatible changes in the behavior of the two-parameter form
of string_to_array.  First, it will return an empty (zero-element) array
rather than NULL when the input string is of zero length.  Second, if the
field separator is NULL, the function splits the string into individual
characters, rather than returning NULL as before.  These two changes make
this form fully compatible with the behavior of the new three-parameter form.

Pavel Stehule, reviewed by Brendan Jurd
2010-08-10 21:51:00 +00:00
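
The rules described above for string_to_array can be approximated in Python as follows. This is a behavioral sketch, not the C implementation; Python's None plays the role of SQL NULL, and the handling of an empty (non-NULL) separator as "no split" is an assumption not stated in the commit message.

```python
def string_to_array(text, sep, null_string=None):
    """Approximate the two- and three-parameter forms described in the commit."""
    if text is None:
        return None
    if text == "":
        return []                      # empty input gives a zero-element array
    if sep is None:
        parts = list(text)             # NULL separator: split into characters
    elif sep == "":
        parts = [text]                 # assumption: empty separator means no split
    else:
        parts = text.split(sep)
    # Elements equal to null_string become NULL; a NULL third parameter
    # behaves like the two-parameter form.
    return [None if null_string is not None and p == null_string else p
            for p in parts]
```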
Tom Lane
b0c451e145 Remove the single-argument form of string_agg(). It added nothing much in
functionality, while creating an ambiguity in usage with ORDER BY that at
least two people have already gotten seriously confused by.  Also, add an
opr_sanity test to check that we don't in future violate the newly minted
policy of not having built-in aggregates with the same name and different
numbers of parameters.  Per discussion of a complaint from Thom Brown.
2010-08-05 18:21:19 +00:00
Bruce Momjian
65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane
d5768dce10 Create an official API function for C functions to use to check if they are
being called as aggregates, and to get the aggregate transition state memory
context if needed.  Use it instead of poking directly into AggState and
WindowAggState in places that shouldn't know so much.

We should have done this in 8.4, probably, but better late than never.

Revised version of a patch by Hitoshi Harada.
2010-02-08 20:39:52 +00:00
Itagaki Takahiro
9ea9918e37 Add string_agg aggregate functions. The one argument version concatenates
the input values into a string. The two argument version also does the same
thing, but inserts delimiters between elements.

Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.
2010-02-01 03:14:45 +00:00
Tom Lane
9507c8a1db Add get_bit/set_bit functions for bit strings, paralleling those for bytea,
and implement OVERLAY() for bit strings and bytea.

In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.

Leonardo F, reviewed by Kevin Grittner
2010-01-25 20:55:32 +00:00
Bruce Momjian
0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane
a2a8c7a662 Support hex-string input and output for type BYTEA.
Both hex format and the traditional "escape" format are automatically
handled on input.  The output format is selected by the new GUC variable
bytea_output.

As committed, bytea_output defaults to HEX, which is an *incompatible
change*.  We will keep it this way for a while for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.

Peter Eisentraut
2009-08-04 16:08:37 +00:00
Bruce Momjian
d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Heikki Linnakangas
283939a321 varstr_cmp and any comparison function that piggybacks on it can return
any negative or positive number, not just -1 or 1. Fix comment on
varstr_cmp and citext test case accordingly.

As pointed out by Zdenek Kotala, and buildfarm member gothic moth.
2009-04-23 07:19:09 +00:00
Bruce Momjian
511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane
e6a310b281 Reimplement text_position and related functions to use Boyer-Moore-Horspool
searching instead of naive matching.  In the worst case this has the same
O(M*N) complexity as the naive method, but the worst case is hard to hit,
and the average case is very fast, especially with longer patterns.

David Rowley
2008-09-07 04:20:00 +00:00
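
For readers unfamiliar with the algorithm named above, here is a compact Boyer-Moore-Horspool search over bytes. It is a generic textbook sketch, not the text_position code; the function name and the dictionary-based skip table are choices made for this illustration.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match, or -1 if there is none."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Skip table: for each byte of the needle except the last, how far the
    # window may shift when that byte is aligned with the window's last position.
    shift = {}
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Shift by the table entry for the last byte in the window; bytes not
        # in the needle allow a full-length shift.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

With longer patterns most windows are rejected after a single full-length shift, which is where the fast average case comes from; a highly repetitive needle and haystack can still force the O(M*N) worst case.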
Tom Lane
7b8a63c3e9 Alter the xxx_pattern_ops opclasses to use the regular equality operator of
the associated datatype as their equality member.  This means that these
opclasses can now support plain equality comparisons along with LIKE tests,
thus avoiding the need for an extra index in some applications.  This
optimization was not possible when the pattern opclasses were first introduced,
because we didn't insist that text equality meant bitwise equality; but we
do now, so there is no semantic difference between regular and pattern
equality operators.

I removed the name_pattern_ops opclass altogether, since it's really useless:
name's regular comparisons are just strcmp() and are unlikely to become
something different.  Instead teach indxpath.c that btree name_ops can be
used for LIKE whether or not the locale is C.  This might lead to a useful
speedup in LIKE queries on the system catalogs in non-C locales.

The ~=~ and ~<>~ operators are gone altogether.  (It would have been nice to
keep them for backward compatibility's sake, but since the pg_amop structure
doesn't allow multiple equality operators per opclass, there's no way.)

A not-immediately-obvious incompatibility is that the sort order within
bpchar_pattern_ops indexes changes --- it had been identical to plain
strcmp, but is now trailing-blank-insensitive.  This will impact
in-place upgrades, if those ever happen.

Per discussions a couple months ago.
2008-05-27 00:13:09 +00:00
Alvaro Herrera
f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Tom Lane
ba1c463096 Clean up a few places where Datums were being treated as pointers without
going through DatumGetPointer or some other "official" conversion macro.
Not actually a bug, since Datum being the same size as a pointer is the only
supported case at the moment, but good cleanup for the future.

Gavin Sherry
2008-04-12 23:21:04 +00:00
Tom Lane
220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
Tom Lane
5e00913daf Fix varstr_cmp's special case for UTF8 encoding on Windows so that strings
that are reported as "equal" by wcscoll() are checked to see if they really
are bitwise equal, and are sorted per strcmp() if not.  We made this happen
a couple of years ago in the regular code path, but it unaccountably got
left out of the Windows/UTF8 case (probably brain fade on my part at the
time).  As in the prior set of changes, affected users may need to reindex
indexes on textual columns.

Backpatch as far as 8.2, which is the oldest release we are still supporting
on Windows.
2008-03-13 18:31:56 +00:00
Bruce Momjian
9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian
f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Bruce Momjian
fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane
5e87ebb0c3 Although I'd misdiagnosed the reason for the recent failures on
buildfarm member grebe, I see no reason to revert the 1-byte-header-friendly
changes I made in varlena.c.  Instead, tweak the code a little bit to
get more advantage out of that.
2007-09-22 04:40:03 +00:00
Tom Lane
b5d1608b0a Fix varlena.c routines to allow 1-byte-header text values. This is now
demonstrably necessary for text_substring() since regexp_split functions
may pass it such a value; and we might as well convert the whole file
at once.  Per buildfarm results (though I wonder why most machines aren't
showing a failure).
2007-09-22 00:36:38 +00:00
Tom Lane
4ca7a2dacb Make replace(), split_part(), and string_to_array() behave somewhat sanely
when handed an invalidly-encoded pattern.  The previous coding could get
into an infinite loop if pg_mb2wchar_with_len() returned a zero-length
string after we'd tested for nonempty pattern; which is exactly what it
will do if the string consists only of an incomplete multibyte character.
This led to either an out-of-memory error or a backend crash depending
on platform.  Per report from Wiktor Wodecki.
2007-07-19 20:34:20 +00:00
Tom Lane
3e23b68dac Support varlena fields with single-byte headers and unaligned storage.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
2007-04-06 04:21:44 +00:00
Tom Lane
234a02b2a8 Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names.  Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught.  In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
2007-02-27 23:48:10 +00:00
Bruce Momjian
29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane
a5cf12e2ef Fix performance issues in replace_text(), replace_text_regexp(), and
text_to_array(): they all had O(N^2) behavior on long input strings in
multibyte encodings, because of repeated rescanning of the input text to
identify substrings whose positions/lengths were computed in characters
instead of bytes.  Fix by tracking the current source position as a char
pointer as well as a character-count.  Also avoid some unnecessary palloc
operations.  text_to_array() also leaked memory intracall due to failure
to pfree temporary strings.  Per gripe from Tatsuo Ishii.
2006-11-08 19:22:25 +00:00
Tom Lane
452fa214e5 Fix string_to_array() to correctly handle the case where there are
overlapping possible matches for the separator string, such as
string_to_array('123xx456xxx789', 'xx').
Also, revise the logic of replace(), split_part(), and string_to_array()
to avoid O(N^2) work from redundant searches and conversions to pg_wchar
format when there are N matches to the separator string.
Backpatched the full patch as far as 8.0.  7.4 also has the bug, but the
code has diverged a lot, so I just went for a quick-and-dirty fix of the
bug itself in that branch.
2006-10-07 00:11:53 +00:00
Bruce Momjian
f99a569a2e pgindent run for 8.2.
2006-10-04 00:30:14 +00:00
Bruce Momjian
e0522505bd Remove 576 references of include files that were not needed.
2006-07-14 14:52:27 +00:00
Bruce Momjian
a22d76d96a Allow include files to compile on their own.
Strip unused include files out of include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Tom Lane
47a37aeebd Split definitions for md5.c out of crypt.h and into their own header
libpq/md5.h, so that there's a clear separation between backend-only
definitions and shared frontend/backend definitions.  (Turns out this
is reversing a bad decision from some years ago...)  Fix up references
to crypt.h as needed.  I looked into moving the code into src/port, but
the headers in src/include/libpq are sufficiently intertwined that it
seems more work than it's worth to do that.
2006-06-20 19:56:52 +00:00
Tom Lane
c61a2f5841 Change the backend to reject strings containing invalidly-encoded multibyte
characters in all cases.  Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release).  Embedded
zero (null) bytes will be rejected as well.  The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated.  Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.

Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.

Patches by Tatsuo Ishii and Tom Lane.  Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.
2006-05-21 20:05:21 +00:00