Commit Graph

22853 Commits

Author SHA1 Message Date
Tom Lane 5223f96d92 Fix regex back-references that are directly quantified with *.
The syntax "\n*", that is a backref with a * quantifier directly applied
to it, has never worked correctly in Spencer's library.  This has been an
open bug in the Tcl bug tracker since 2005:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894

The core of the problem is in parseqatom(), which first changes "\n*" to
"\n+|" and then applies repeat() to the NFA representing the backref atom.
repeat() thinks that any arc leading into its "rp" argument is part of the
sub-NFA to be repeated.  Unfortunately, since parseqatom() already created
the arc that was intended to represent the empty bypass around "\n+", this
arc gets moved too, so that it now leads into the state loop created by
repeat().  Thus, what was supposed to be an "empty" bypass gets turned into
something that represents zero or more repetitions of the NFA representing
the backref atom.  In the original example, in place of
	^([bc])\1*$
we now have something that acts like
	^([bc])(\1+|[bc]*)$
At runtime, the branch involving the actual backref fails, as it's supposed
to, but then the other branch succeeds anyway.

We could no doubt fix this by some rearrangement of the operations in
parseqatom(), but that code is plenty ugly already, and what's more the
whole business of converting "x*" to "x+|" probably needs to go away to fix
another problem I'll mention in a moment.  Instead, this patch suppresses
the *-conversion when the target is a simple backref atom, leaving the case
of m == 0 to be handled at runtime.  This makes the patch in regcomp.c a
one-liner, at the cost of having to tweak cbrdissect() a little.  In the
event I went a bit further than that and rewrote cbrdissect() to check all
the string-length-related conditions before it starts comparing characters.
It seems a bit stupid to possibly iterate through many copies of an
n-character backreference, only to fail at the end because the target
string's length isn't a multiple of n --- we could have found that out
before starting.  The existing coding could only be a win if integer
division is hugely expensive compared to character comparison, but I don't
know of any modern machine where that might be true.

This does not fix all the problems with quantified back-references.  In
particular, the code is still broken for back-references that appear within
a larger expression that is quantified (so that direct insertion of the
quantification limits into the BACKREF node doesn't apply).  I think fixing
that will take some major surgery on the NFA code, specifically introducing
an explicit iteration node type instead of trying to transform iteration
into concatenation of modified regexps.

Back-patch to all supported branches.  In HEAD, also add a regression test
case for this.  (It may seem a bit silly to create a regression test file
for just one test case; but I'm expecting that we will soon import a whole
bunch of regex regression tests from Tcl, so might as well create the
infrastructure now.)
2012-02-20 00:52:33 -05:00
Tom Lane e00f68e49c Add caching of ctype.h/wctype.h results in regc_locale.c.
While this doesn't save a huge amount of runtime, it still seems worth
doing, especially since I realized that the data copying I did in my first
draft was quite unnecessary.  In this version, once we have the results
cached, getting them back for re-use is really very cheap.

Also, remove the hard-wired limitation to not consider wctype.h results for
character codes above 255.  It turns out that we can't push the limit as
far up as I'd originally hoped, because the regex colormap code is not
efficient enough to cope very well with character classes containing many
thousand letters, which a Unicode locale is entirely capable of producing.
Still, we can push it up to U+7FF (which I chose as the limit of 2-byte
UTF8 characters), which will at least make Eastern Europeans happy pending
a better solution.  Thus, this commit resolves the specific complaint in
bug #6457, but not the more general issue that letters of non-western
alphabets are mostly not recognized as matching [[:alpha:]].
2012-02-19 21:01:13 -05:00
Tom Lane 27af91438b Create the beginnings of internals documentation for the regex code.
Create src/backend/regex/README to hold an implementation overview of
the regex package, and fill it in with some preliminary notes about
the code's DFA/NFA processing and colormap management.  Much more to
do there of course.

Also, improve some code comments around the colormap and cvec code.
No functional changes except to add one missing assert.
2012-02-19 18:58:23 -05:00
Andrew Dunstan 2f582f76b1 Improve pretty printing of viewdefs.
Some line feeds are added to target lists and from lists to make
them more readable. By default they wrap at 80 columns if possible,
but the wrap column is also selectable - if 0 it wraps after every
item.

Andrew Dunstan, reviewed by Hitoshi Harada.
2012-02-19 11:43:46 -05:00
Michael Meskes 84ff5b5db5 In ecpglib rewrote code that used strtok_r to not use library functions
anymore. This way we don't have to worry which compiler on which OS offers
which version of strtok.
2012-02-19 14:50:14 +01:00
Tom Lane 759c95c45b Update expected/collate.linux.utf8.out for recent plpgsql changes.
This file was missed in commit 4c6cedd1b0.
2012-02-18 18:08:02 -05:00
Michael Meskes 45b7ab6b59 gcc on Windows does not know about strtok_s. 2012-02-18 17:20:53 +01:00
Michael Meskes e3155c97b0 Windows doesn't have strtok_r, so let's use strtok_s instead. 2012-02-18 15:56:39 +01:00
Michael Meskes 5e7710e725 Make sure all connection paramters are used in call to PQconnectdbParams. 2012-02-18 14:18:16 +01:00
Tom Lane 08fd6ff37f Sync regex code with Tcl 8.5.11.
Sync our regex code with upstream changes since last time we did this,
which was Tcl 8.5.0 (see commit df1e965e12).

There are no functional changes here; the main point is just to lay down
a commit-log marker that somebody has looked at this recently, and to do
what we can to keep the two codebases comparable.
2012-02-17 19:44:26 -05:00
Tom Lane 4767bc8ff2 Improve statistics estimation to make some use of DISTINCT in sub-queries.
Formerly, we just punted when trying to estimate stats for variables coming
out of sub-queries using DISTINCT, on the grounds that whatever stats we
might have for underlying table columns would be inapplicable.  But if the
sub-query has only one DISTINCT column, we can consider its output variable
as being unique, which is useful information all by itself.  The scope of
this improvement is pretty narrow, but it costs nearly nothing, so we might
as well do it.  Per discussion with Andres Freund.

This patch differs from the draft I submitted yesterday in updating various
comments about vardata.isunique (to reflect its extended meaning) and in
tweaking the interaction with security_barrier views.  There does not seem
to be a reason why we can't use this sort of knowledge even when the
sub-query is such a view.
2012-02-16 17:34:00 -05:00
Robert Haas 1cc1b91d1b pg_dump: Miscellaneous tightening based on recent refactorings.
Use exit_horribly() and ExecuteSqlQueryForSingleRow() in various
places where it's equivalent, or nearly equivalent, to the prior
coding. Apart from being more compact, this also makes the error
messages for the wrong-number-of-tuples case more consistent.
2012-02-16 13:24:19 -05:00
Robert Haas 689d0eb7db pg_dump: Remove global connection pointer.
Parallel pg_dump wants to have multiple ArchiveHandle objects, and
therefore multiple PGconns, in play at the same time.  This should
be just about the end of the refactoring that we need in order to
make that workable.
2012-02-16 13:00:24 -05:00
Robert Haas 549e93c990 Refactor pg_dump.c to avoid duplicating returns-one-row check.
Any patches apt to get broken have probably already been broken by the
error-handling cleanups I just did, so we might as well clean this up
at the same time.
2012-02-16 12:07:06 -05:00
Robert Haas e9a22259c4 Invent on_exit_nicely for pg_dump.
Per recent discussions on pgsql-hackers regarding parallel pg_dump.
2012-02-16 11:49:20 -05:00
Tom Lane 4bfe68dfab Run a portal's cleanup hook immediately when pushing it to FAILED state.
This extends the changes of commit 6252c4f9e2
so that we run the cleanup hook earlier for failure cases as well as
success cases.  As before, the point is to avoid an assertion failure from
an Assert I added in commit a874fe7b4c, which
was meant to check that no user-written code can be called during portal
cleanup.  This fixes a case reported by Pavan Deolasee in which the Assert
could be triggered during backend exit (see the new regression test case),
and also prevents the possibility that the cleanup hook is run after
portions of the portal's state have already been recycled.  That doesn't
really matter in current usage, but it foreseeably could matter in the
future.

Back-patch to 9.1 where the Assert in question was added.
2012-02-15 16:19:01 -05:00
Robert Haas edec8c8e00 Fix VPATH builds, broken by my recent commit to speed up tuplesorting.
The relevant commit is 337b6f5ecf.
2012-02-15 15:53:53 -05:00
Robert Haas 337b6f5ecf Speed up in-memory tuplesorting.
Per recent work by Peter Geoghegan, it's significantly faster to
tuplesort on a single sortkey if ApplySortComparator is inlined into
quicksort rather reached via a function pointer.  It's also faster
in general to have a version of quicksort which is specialized for
sorting SortTuple objects rather than objects of arbitrary size and
type.  This requires a couple of additional copies of the quicksort
logic, which in this patch are generate using a Perl script.  There
might be some benefit in adding further specializations here too,
but thus far it's not clear that those gains are worth their weight
in code footprint.
2012-02-15 12:13:32 -05:00
Robert Haas ac9100f8cf More regression tests for LEAKPROOF/NOT LEAKPROOF stuff.
Along the way, move create_function_3 into a parallel schedule.

KaiGai Kohei
2012-02-15 10:56:26 -05:00
Robert Haas 73a4b994a6 Make CREATE/ALTER FUNCTION support NOT LEAKPROOF.
Because it isn't good to be able to turn things on, and not off again.
2012-02-15 10:45:08 -05:00
Tom Lane 398f70ec07 Preserve column names in the execution-time tupledesc for a RowExpr.
The hstore and json datatypes both have record-conversion functions that
pay attention to column names in the composite values they're handed.
We used to not worry about inserting correct field names into tuple
descriptors generated at runtime, but given these examples it seems
useful to do so.  Observe the nicer-looking results in the regression
tests whose results changed.

catversion bump because there is a subtle change in requirements for stored
rule parsetrees: RowExprs from ROW() constructs now have to include field
names.

Andrew Dunstan and Tom Lane
2012-02-14 17:34:56 -05:00
Robert Haas dc66f1c5f2 Remove new, intermittently failing regression test.
Per buildfarm.
2012-02-13 23:43:24 -05:00
Robert Haas e37e448650 Fix new create_function_3 regression tests not to rely on tuple order.
Per buildfarm.
2012-02-13 22:49:07 -05:00
Robert Haas cd30728fb2 Allow LEAKPROOF functions for better performance of security views.
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects
are an exception: they're OK.  This allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).

There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag.  But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.

KaiGai Kohei, with some wordsmithing by me.
2012-02-13 22:21:14 -05:00
Michael Meskes 9a4880a0dd Do not use the variable name when defining a varchar structure in ecpg.
With a unique counter being added anyway, there is no need anymore to have the variable name listed, too.
2012-02-13 15:49:50 +01:00
Heikki Linnakangas 21b1634275 Fix heap_multi_insert to set t_self field in the caller's tuples.
If tuples were toasted, heap_multi_insert didn't update the ctid on the
original tuples. This caused a failure if there was an after trigger
(including a foreign key), on the table, and a tuple got toasted.

Per off-list report and test case from Ted Phelps
2012-02-13 10:20:50 +02:00
Heikki Linnakangas b4e3633ac4 Silence warning about deprecated assignment to $[ in check_keywords.pl
Alex Hunsaker
2012-02-13 09:15:08 +02:00
Tom Lane 58a9596ed4 Fix I/O-conversion-related memory leaks in plpgsql.
Datatype I/O functions are allowed to leak memory in CurrentMemoryContext,
since they are generally called in short-lived contexts.  However, plpgsql
calls such functions for purposes of type conversion, and was calling them
in its procedure context.  Therefore, any leaked memory would not be
recovered until the end of the plpgsql function.  If such a conversion
was done within a loop, quite a bit of memory could get consumed.  Fix by
calling such functions in the transient "eval_econtext", and adjust other
logic to match.  Back-patch to all supported versions.

Andres Freund, Jan Urbański, Tom Lane
2012-02-11 18:06:24 -05:00
Tom Lane 59de132f9a Fix oversight in pg_dump's handling of extension configuration tables.
If an extension has not been selected to be dumped (perhaps because of
a --schema or --table switch), the contents of its configuration tables
surely should not get dumped either.  Per gripe from
Hubert Depesz Lubaczewski.
2012-02-10 15:22:14 -05:00
Tom Lane 97dc3c8a14 Fix brain fade in previous pg_dump patch.
In pre-7.3 databases, pg_attribute.attislocal doesn't exist.  The easiest
way to make sure the new inheritance logic behaves sanely is to assume it's
TRUE, not FALSE.  This will result in printing child columns even when
they're not really needed.  We could work harder at trying to reconstruct a
value for attislocal, but there is little evidence that anyone still cares
about dumping from such old versions, so just do the minimum necessary to
have a valid dump.

I had this correct in the original draft of the patch, but for some
unaccountable reason decided it wasn't necessary to change the value.
Testing against an old server shows otherwise...
2012-02-10 14:09:21 -05:00
Tom Lane 00bc96bd2b Fix pg_dump for better handling of inherited columns.
Revise pg_dump's handling of inherited columns, which was last looked at
seriously in 2001, to eliminate several misbehaviors associated with
inherited default expressions and NOT NULL flags.  In particular make sure
that a column is printed in a child table's CREATE TABLE command if and
only if it has attislocal = true; the former behavior would sometimes cause
a column to become marked attislocal when it was not so marked in the
source database.  Also, stop relying on textual comparison of default
expressions to decide if they're inherited; instead, don't use
default-expression inheritance at all, but just install the default
explicitly at each level of the hierarchy.  This fixes the
search-path-related misbehavior recently exhibited by Chester Young, and
also removes some dubious assumptions about the order in which ALTER TABLE
SET DEFAULT commands would be executed.

Back-patch to all supported branches.
2012-02-10 13:28:05 -05:00
Tom Lane d06e2d2005 Add ORDER BY to a query to prevent occasional regression test failures.
Per buildfarm, we sometimes get row-ordering variations in the output.
This also makes this query look more like numerous other ones in the same
test file.
2012-02-10 02:33:00 -05:00
Peter Eisentraut 169c8a9112 psql: Support zero byte field and record separators
Add new psql settings and command-line options to support setting the
field and record separators for unaligned output to a zero byte, for
easier interfacing with other shell tools.

reviewed by Abhijit Menon-Sen
2012-02-09 20:20:15 +02:00
Robert Haas dd7c84185c Attempt to fix MSVC builds and other fls-related breakage.
Thanks to Andrew Dunstan for bringing this to my attention.
2012-02-09 12:39:33 -05:00
Robert Haas d429ebe347 Add a comment to AdjustIntervalForTypmod to reduce chance of future bugs.
It's not entirely evident how the logic here relates to the
interval_transform function, so let's clue people in that they need to
check that if the rules change.
2012-02-09 12:24:36 -05:00
Robert Haas 6656588575 Improve interval_transform function to detect a few more cases.
Noah Misch, per a review comment from me.
2012-02-09 12:24:22 -05:00
Magnus Hagander d7ea9193d1 Have pg_receivexlog always send an invalid log position in status messages
This prevents pg_basebackup and pg_receivexlog from becoming a synchronous
standby in case 'write' is used for synchronous_commit.

Fujii Masao
2012-02-09 14:12:49 +01:00
Heikki Linnakangas 82e73ba0d1 Add new keywords SNAPSHOT and TYPES to the keyword list in gram.y
These were added to kwlist.h as unreserved keywords in separate patches,
but authors forgot to add them to the corresponding list in gram.y.
Because of that, even though they were supposed to be unreserved keywords,
they could not be used as identifiers. src/tools/check_keywords.pl is your
friend.
2012-02-09 11:37:54 +02:00
Tom Lane 331bf6712c Throw error sooner for unlogged GiST indexes.
Throwing an error only after we've built the main index fork is pretty
unfriendly when the table already contains data.  Per gripe from Jay
Levitt.
2012-02-08 16:19:27 -05:00
Tom Lane d77354eaec Fix up dumping conditions for extension configuration tables.
Various filters that were meant to prevent dumping of table data were not
being applied to extension config tables, notably --exclude-table-data and
--no-unlogged-table-data.  We also would bogusly try to dump data from
views, sequences, or foreign tables, should an extension try to claim they
were config tables.  Fix all that, and refactor/redocument to try to make
this a bit less fragile.  This reverts the implementation, though not the
feature, of commit 7b070e896c, which had
broken config-table dumping altogether :-(.

It is still the case that the code will dump config-table data even if
--schema is specified.  That behavior was intentional, as per the comments
in getExtensionMembership, so I think it requires some more discussion
before we change it.
2012-02-08 15:23:00 -05:00
Tom Lane cb7c84fae8 Check misplaced window functions before checking aggregate/group by sanity.
If somebody puts a window function in WHERE, we should complain about that
in so many words.  The previous coding tended to complain about the window
function's arguments instead, which is likely to be misleading to users who
are unclear on the semantics of window functions; as seen for example in
bug #6440 from Matyas Novak.

Just another example of how "add new code at the end" is frequently a bad
heuristic.
2012-02-08 13:15:02 -05:00
Tom Lane cbba55d6d7 Support min/max index optimizations on boolean columns.
Since bool_and() is equivalent to min(), and bool_or() to max(), we might
as well let them be index-optimized in the same way.  The practical value
of this is debatable at best, but it seems nearly cost-free to enable it.
Code-wise, we need only adjust the entries in pg_aggregate.  There is a
measurable planning speed penalty for a query involving one of these
aggregates, but it is only a few percent in simple cases, so that seems
acceptable.

Marti Raudsepp, reviewed by Abhijit Menon-Sen
2012-02-08 12:41:48 -05:00
Tom Lane 3db6524fe6 Mark some more I/O-conversion-invoking functions as stable not volatile.
When written, textanycat, anytextcat, quote_literal, and quote_nullable
were marked volatile, because they could invoke arbitrary type-specific
output functions as part of casting their anyelement arguments to text.
Since then, we have defined a project policy that I/O functions must not
be volatile, as per commit aab353a60b.
So these functions can safely be downgraded to stable.  Most of the time
this makes no difference since they'll get inlined anyway, but as noted
by Andrew Dunstan, there are cases where the volatile marking prevents
optimizations that the planner does before function inlining.  (I think
I might have overlooked these functions in the earlier commit on the
grounds that inlining would make it moot, but not so --- tgl)

This change results in a change in the expected output of the json
regression tests, because the planner can now flatten a sub-select
that it failed to before.  The old output is preferable, but getting
that back will require some as-yet-unfinished work on RowExpr handling.

Marti Raudsepp
2012-02-08 11:29:29 -05:00
Robert Haas c13897983a Add transform functions for various temporal typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds in some cases.

Noah Misch, with trivial changes by me.
2012-02-08 09:33:37 -05:00
Heikki Linnakangas 1a01560cbb Rename LWLockWaitUntilFree to LWLockAcquireOrWait.
LWLockAcquireOrWait makes it more clear that the lock is acquired if it's
free.
2012-02-08 09:17:13 +02:00
Robert Haas af7dd696b0 Fix typos pointed out by Noah Misch. 2012-02-07 21:40:49 -05:00
Peter Eisentraut e09509bd33 pg_dump: Add some const qualifiers 2012-02-07 23:20:29 +02:00
Peter Eisentraut d66b31c94f pg_regress: Use target-specific variable instead of overriding make rule
Use a target-specific variable to add to CPPFLAGS instead of writing a
custom .c -> .o rule.  This will ensure that dependency tracking is
used when enabled.
2012-02-07 22:42:19 +02:00
Heikki Linnakangas 5ece8ecae8 Fix typo in comment. 2012-02-07 21:21:50 +02:00
Robert Haas 4f658dc851 Support fls().
The immediate impetus for this is that Noah Misch's patch to elide
unnecessary table and index rebuilds when changing typmod for temporal
types uses it; and this is extracted from that patch, with some
further commentary by me.  But it seems logically separate from the
remainder of the patch, so I'm committing it separately; this is not
the first time someone has wanted fls() in the backend and probably
won't be the last.

If we end up using this in more performance-critical spots it may be
worthwhile to add some architecture-specific optimizations to our
src/port version of fls() - e.g. any x86 platform can implement this
using the assembly instruction BSRL.  But performance won't matter
a bit for assessing typmod changes, so I'm not worried about that
right now.
2012-02-07 13:45:46 -05:00