case where there is a match to the pattern overall but the user has specified
a parenthesized subexpression and that subexpression hasn't got a match.
An example is substring('foo' from 'foo(bar)?'). This should return NULL,
since (bar) isn't matched, but it was mistakenly returning the whole-pattern
match instead (ie, 'foo'). Per bug #4044 from Rui Martins.
This has been broken since the beginning; patch in all supported versions.
The old behavior was sufficiently inconsistent that it's impossible to believe
anyone is depending on it.
that are reported as "equal" by wcscoll() are checked to see if they really
are bitwise equal, and are sorted per strcmp() if not. We made this happen
a couple of years ago in the regular code path, but it unaccountably got
left out of the Windows/UTF8 case (probably brain fade on my part at the
time). As in the prior set of changes, affected users may need to reindex
indexes on textual columns.
Backpatch as far as 8.2, which is the oldest release we are still supporting
on Windows.
pattern-examination heuristic method to purely histogram-driven selectivity at
histogram size 100, we compute both estimates and use a weighted average.
The weight put on the heuristic estimate decreases linearly with histogram
size, dropping to zero for 100 or more histogram entries.
Likewise in ltreeparentsel(). After a patch by Greg Stark, though I
reorganized the logic a bit to give the caller of histogram_selectivity()
more control.
of the generated range condition var >= 'foo' AND var < 'fop' as being less
than what eqsel() would estimate for var = 'foo'. This is intuitively
reasonable and it gets rid of the need for some entirely ad-hoc coding we
formerly used to reject bogus estimates. The basic problem here is that
if the prefix is more than a few characters long, the two boundary values
are too close together to be distinguishable by comparison to the column
histogram, resulting in a selectivity estimate of zero, which is often
not very sane. Change motivated by an example from Peter Eisentraut.
Arguably this is a bug fix, but I'll refrain from back-patching it
for the moment.
available output buffer when presented with corrupt input. Some testing
suggests that this slows the decompression loop about 1%, which seems an
acceptable price to pay for more robustness. (Curiously, the penalty
seems to be *less* on not-very-compressible data, which I didn't expect
since the overhead per output byte ought to be more in the literal-bytes
path.)
Patch from Zdenek Kotala. I fixed a corner case and did some renaming
of variables to make the routine more readable.
were discussed last year, but we felt it was too late in the 8.3 cycle to
change the code immediately. Specifically, the patch:
* Reduces the minimum datum size to be considered for compression from
256 to 32 bytes, as suggested by Greg Stark.
* Increases the required compression rate for compressed storage from
20% to 25%, again per Greg's suggestion.
* Replaces force_input_size (size above which compression is forced)
with a maximum size to be considered for compression. It was agreed
that allowing large inputs to escape the minimum-compression-rate
requirement was not bright, and that indeed we'd rather have a knob
that acted in the other direction. I set this value to 1MB for the
moment, but it could use some performance studies to tune it.
* Adds an early-failure path to the compressor as suggested by Jan:
if it's been unable to find even one compressible substring in the
first 1KB (parameterizable), assume we're looking at incompressible
input and give up. (Possibly this logic can be improved, but I'll
commit it as-is for now.)
* Improves the toasting heuristics so that when we have very large
fields with attstorage 'x' or 'e', we will push those out to toast
storage before considering inline compression of shorter fields.
This also responds to a suggestion of Greg's, though my original
proposal for a solution was a bit off base because it didn't fix
the problem for large 'e' fields.
There was some discussion in the earlier threads of exposing some
of the compression knobs to users, perhaps even on a per-column
basis. I have not done anything about that here. It seems to me
that if we are changing around the parameters, we'd better get some
experience and be sure we are happy with the design before we set
things in stone by providing user-visible knobs.
to explicitly cast the output back to char before comparing it to a char
value, else we get the wrong result for high-bit-set characters. Found by
Rolf Jentsch. Also, fix several places where <ctype.h> functions were being
called without casting the argument to unsigned char; this is likewise
unportable, but we keep making that mistake :-(. These found by buildfarm
member salamander, which I will desperately miss if it ever goes belly-up.
left in the code though it was not meant to be provided. It represents a
security hole because unprivileged users could use it to look at (at least the
first line of) any file readable by the backend. Fortunately, this is only
possible if the backend was built with XML support, so the damage is at least
mitigated; and 8.3 probably hasn't propagated into any security-critical uses
yet anyway. Per report from Sergey Burladyan.
values into \nnn octal escape sequences. When the database encoding is
multibyte this is *necessary* to avoid generating invalidly encoded text.
Even in a single-byte encoding, the old behavior seems very hazardous ---
consider for example what happens if the text is transferred to another
database with a different encoding. Decoding would then yield some other
bytea value than what was encoded, which is surely undesirable. Per gripe
from Hernan Gonzalez.
Backpatch to 8.3, but not further. This is a bit of a judgment call, but I
make it on these grounds: pre-8.3 we don't really have much encoding safety
anyway because of the convert() function family, and we would also have much
higher risk of breaking existing apps that may not be expecting this behavior.
8.3 is still new enough that we can probably get away with making this change
in the function's behavior.
(then it means 2000 AD). Formerly we silently interpreted this as 1 BC,
which at best is unwarranted familiarity with the implementation.
It's barely possible that some app somewhere expects the old behavior,
though, so we won't back-patch this into existing release branches.
Formerly, DecodeDate attempted to verify the day-of-the-month exactly, but
it was under the misapprehension that it would know whether we were looking
at a BC year or not. In reality this check can't be made until the calling
function (eg DecodeDateTime) has processed all the fields. So, split the
BC adjustment and validity checks out into a new function ValidateDate that
is called only after processing all the fields. In passing, this patch
makes DecodeTimeOnly work for BC inputs, which it never did before.
(The historical veracity of all this is nonexistent, of course, but if
we're going to say we support proleptic Gregorian calendar then we should
do it correctly. In any case the unpatched code is broken because it could
emit dates that it would then reject on re-inputting.)
Per report from Bernd Helmle. Back-patch as far as 8.0; in 7.x we were
not using our own calendar support and so this seems a bit too risky
to put into 7.4.
and RI_FKey_keyequal_upd_fk, as well as no-longer-needed calls of
ri_BuildQueryKeyFull. Aside from saving a few cycles, this avoids needless
deadlock risks when an update is not changing the columns that participate
in an RI constraint. Per a gripe from Alexey Nalbat.
Back-patch to 8.3. Earlier releases did have a need to open the other
relation due to the way in which they retrieved information about the RI
constraint, so this problem unfortunately can't easily be improved pre-8.3.
Tom Lane and Stephan Szabo
data structures and backend internal APIs. This solves problems we've seen
recently with inconsistent layout of pg_control between machines that have
32-bit time_t and those that have already migrated to 64-bit time_t. Also,
we can get out from under the problem that Windows' Unix-API emulation is not
consistent about the width of time_t.
There are a few remaining places where local time_t variables are used to hold
the current or recent result of time(NULL). I didn't bother changing these
since they do not affect any cross-module APIs and surely all platforms will
have 64-bit time_t before overflow becomes an actual risk. time_t should
be avoided for anything visible to extension modules, however.
the parser supplies a default typmod that can result in data loss (ie,
truncation). Currently that appears to be only CHARACTER and BIT.
We can avoid the problem by specifying the type's internal name instead
of using SQL-spec syntax. Since the queries generated here are only used
internally, there's no need to worry about portability. This problem is
new in 8.3; before we just let the parser do whatever it wanted to resolve
the operator, but 8.3 is trying to be sure that the semantics of FK checks
are consistent. Per report from Harald Fuchs.
ri_FetchConstraintInfo, to avoid a query-duration memory leak when that
routine is called by RI_FKey_keyequal_upd_fk (which isn't executed in a
short-lived context). This problem was latent when the routine was added
in February, but it didn't become serious until the varvarlena patch made
it quite likely that the fields being examined would be "toasted" (ie, have
short headers). Per report from Stephen Denne.
in whichever context happens to be current during a call of an xml.c function,
use a dedicated context that will not go away until we explicitly delete it
(which we do at transaction end or subtransaction abort). This makes recovery
after an error much simpler --- we don't have to individually delete the data
structures created by libxml. Also, we need to initialize and cleanup libxml
only once per transaction (if there's no error) instead of once per function
call, so it should be a bit faster. We'll need to keep an eye out for
intra-transaction memory leaks, though. Alvaro and Tom.
Therefore we must xmlCleanupParser(), or we risk leaving behind
dangling pointers to whatever memory context is current when xml_init()
is called. This seems to fix bug #3860, though we might still want
the more invasive solution being worked on by Alvaro.
constant ORDER/GROUP BY entries properly:
http://archives.postgresql.org/pgsql-hackers/2001-04/msg00457.php
The original solution to that was in fact no good, as demonstrated by
today's report from Martin Pitt:
http://archives.postgresql.org/pgsql-bugs/2008-01/msg00027.php
We can't use the column-number-reference format for a constant that is
a resjunk targetlist entry, a case that was unfortunately not thought of
in the original discussion. What we can do instead (which did not work
at the time, but does work in 7.3 and up) is to emit the constant with
explicit ::typename decoration, even if it otherwise wouldn't need it.
This is sufficient to keep the parser from thinking it's a column number
reference, and indeed is probably what the user must have done to get
such a thing into the querytree in the first place.
and CLUSTER) execute as the table owner rather than the calling user, using
the same privilege-switching mechanism already used for SECURITY DEFINER
functions. The purpose of this change is to ensure that user-defined
functions used in index definitions cannot acquire the privileges of a
superuser account that is performing routine maintenance. While a function
used in an index is supposed to be IMMUTABLE and thus not able to do anything
very interesting, there are several easy ways around that restriction; and
even if we could plug them all, there would remain a risk of reading sensitive
information and broadcasting it through a covert channel such as CPU usage.
To prevent bypassing this security measure, execution of SET SESSION
AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context.
Thanks to Itagaki Takahiro for reporting this vulnerability.
Security: CVE-2007-6600
print the index key variable or expression for that column. It was mistakenly
printing ASC/DESC/NULLS FIRST/NULLS LAST decoration too --- and not only for
the target column, but all columns. Someday we should have an option to
extract that info (and the opclass decoration as well) for a single index
column ... but today is not that day. Per bug #3829 and subsequent
discussion.
The zero-point case is sensible so far as the data structure is concerned,
so maybe we ought to allow it sometime; but right now the textual input
routines for these types don't allow it, and it seems that not all the
functions for the types are prepared to cope.
Report and patch by Merlin Moncure.
the two join variables at both ends: not only trailing rows that need not be
scanned because there cannot be a match on the other side, but initial rows
that will be scanned without possibly having a match. This allows a more
realistic estimate of startup cost to be made, per recent pgsql-performance
discussion. In passing, fix a couple of bugs that had crept into
mergejoinscansel: it was not quite up to speed for the task of estimating
descending-order scans, which is a new requirement in 8.3.
constraint status of copied indexes (bug #3774), as well as various other
small bugs such as failure to pstrdup when needed. Allow INCLUDING INDEXES
indexes to be merged with identical declared indexes (perhaps not real useful,
but the code is there and having it not apply to LIKE indexes seems pretty
unorthogonal). Avoid useless work in generateClonedIndexStmt(). Undo some
poorly chosen API changes, and put a couple of routines in modules that seem
to be better places for them.
inappropriately generic-sounding names. This is more or less free since
we already forced initdb for the next beta, and it may prevent confusion or
name conflicts (particularly at the C-global-symbol level) down the road.
Per my proposal yesterday.
if the locale has the thousands separator as "". This now matches the
to_char and psql numericlocale behavior. (Previously this data type was
basically useless for such setups.)