Commit Graph

27435 Commits

Author SHA1 Message Date
Robert Haas 179c97bf58 Remove accidentally-committed debugging code.
Amit Kapila
2015-11-15 18:07:57 -05:00
Tom Lane 7745bc352a Fix ruleutils.c's dumping of whole-row Vars in ROW() and VALUES() contexts.
Normally ruleutils prints a whole-row Var as "foo.*".  We already knew that
that doesn't work at top level of a SELECT list, because the parser would
treat the "*" as a directive to expand the reference into separate columns,
not a whole-row Var.  However, Joshua Yanovski points out in bug #13776
that the same thing happens at top level of a ROW() construct; and some
nosing around in the parser shows that the same is true in VALUES().
Hence, apply the same workaround already devised for the SELECT-list case,
namely to add a forced cast to the appropriate rowtype in these cases.
(The alternative of just printing "foo" was rejected because it is
difficult to avoid ambiguity against plain columns named "foo".)

Back-patch to all supported branches.
2015-11-15 14:41:09 -05:00
Tom Lane 7d9a4737c2 Improve type numeric's calculations for ln(), log(), exp(), pow().
Set the "rscales" for intermediate-result calculations to ensure that
suitable numbers of significant digits are maintained throughout.  The
previous coding hadn't thought this through in any detail, and as a result
could deliver results with many inaccurate digits, or in the worst cases
even fail with divide-by-zero errors as a result of losing all nonzero
digits of intermediate results.

In exp_var(), get rid entirely of the logic that separated the calculation
into integer and fractional parts: that was neither accurate nor
particularly fast.  The existing range-reduction method of dividing by 2^n
can be applied across the full input range instead of only 0..1, as long as
we are careful to set an appropriate rscale for each step.

Also fix the logic in mul_var() for shortening the calculation when the
caller asks for fewer output digits than an exact calculation would
require.  This bug doesn't affect simple multiplications since that code
path asks for an exact result, but it does contribute to accuracy issues
in the transcendental math functions.

In passing, improve performance of mul_var() a bit by forcing the shorter
input to be on the left, thus reducing the number of iterations of the
outer loop and probably also reducing the number of carry-propagation
steps needed.

This is arguably a bug fix, but in view of the lack of field complaints,
it does not seem worth the risk of back-patching.

Dean Rasheed
2015-11-14 14:55:46 -05:00
Bruce Momjian e57646e962 Fix spelling error in postgresql.conf
Report by Greg Clough
2015-11-14 14:00:17 -05:00
Bruce Momjian 025106e314 pg_upgrade: properly detect file copy failure on Windows
Previously, file copy failures were ignored on Windows due to an
incorrect return value check.

Report by Manu Joye

Backpatch through 9.1
2015-11-14 11:47:12 -05:00
Alvaro Herrera 83dec5a712 vacuumdb: don't prompt for passwords over and over
Having the script prompt for passwords over and over was a preexisting
problem when it processed multiple databases or when it processed
multiple analyze stages, but the parallel mode introduced in commit
a179232047 made it worse.

Fix the annoyance by keeping a copy of the password used by the first
connection that requires one.  Since users can (currently) only have a
single password, there's no need for more complex arrangements (such as
remembering one password per database).

Per bug #13741 reported by Eric Brown.  Patch authored and
cross-reviewed by Haribabu Kommi and Michael Paquier, slightly tweaked
by Álvaro Herrera.

Discussion: http://www.postgresql.org/message-id/20151027193919.931.54948@wrigleys.postgresql.org
Backpatch to 9.5, where parallel vacuumdb was introduced.
2015-11-12 18:05:23 -03:00
Robert Haas fe702a7b3f Move each SLRU's lwlocks to a separate tranche.
This makes it significantly easier to identify these lwlocks in
LWLOCK_STATS or Trace_lwlocks output.  It's also arguably better
from a modularity standpoint, since lwlock.c no longer needs to
know anything about the LWLock needs of the higher-level SLRU
facility.

Ildus Kurbangaliev, reviewd by Álvaro Herrera and by me.
2015-11-12 14:59:09 -05:00
Tom Lane c405918858 Fix unwanted flushing of libpq's input buffer when socket EOF is seen.
In commit 210eb9b743 I centralized libpq's logic for closing down
the backend communication socket, and made the new pqDropConnection
routine always reset the I/O buffers to empty.  Many of the call sites
previously had not had such code, and while that amounted to an oversight
in some cases, there was one place where it was intentional and necessary
*not* to flush the input buffer: pqReadData should never cause that to
happen, since we probably still want to process whatever data we read.

This is the true cause of the problem Robert was attempting to fix in
c3e7c24a1d, namely that libpq no longer reported the backend's final
ERROR message before reporting "server closed the connection unexpectedly".
But that only accidentally fixed it, by invoking parseInput before the
input buffer got flushed; and very likely there are timing scenarios
where we'd still lose the message before processing it.

To fix, pass a flag to pqDropConnection to tell it whether to flush the
input buffer or not.  On review I think flushing is actually correct for
every other call site.

Back-patch to 9.3 where the problem was introduced.  In HEAD, also improve
the comments added by c3e7c24a1d.
2015-11-12 13:03:52 -05:00
Robert Haas c3e7c24a1d libpq: Notice errors a backend may have sent just before dying.
At least since the introduction of Hot Standby, the backend has
sometimes sent fatal errors even when no client query was in
progress, assuming that the client would receive it.  However,
pqHandleSendFailure was not in sync with this assumption, and
only tries to catch notices and notifies.  Add a parseInput call
to the loop there to fix.

Andres Freund suggested the fix.  Comments are by me.
Reviewed by Michael Paquier.
2015-11-12 09:12:18 -05:00
Robert Haas ac1d7945f8 Make idle backends exit if the postmaster dies.
Letting backends continue to run if the postmaster has exited prevents
PostgreSQL from being restarted, which in many environments is
catastrophic.  Worse, if some other backend crashes, we no longer have
any protection against shared memory corruption.  So, arrange for them
to exit instead.  We don't want to expend many cycles on this, but
including postmaster death in the set of things that we wait for when
a backend is idle seems cheap enough.

Rajeev Rastogi and Robert Haas
2015-11-12 09:00:33 -05:00
Robert Haas a05dc4d7fd Provide readfuncs support for custom scans.
Commit a0d9f6e434 added this support for
all other plan node types; this fills in the gap.

Since TextOutCustomScan complicates this and is pretty well useless,
remove it.

KaiGai Kohei, with some modifications by me.
2015-11-12 07:40:31 -05:00
Tom Lane da3751c8ea Be more noisy about "wrong number of nailed relations" initfile problems.
In commit 5d1ff6bd55 I added some logic to
relcache.c to try to ensure that the regression tests would fail if we
made a mistake about which relations belong in the relcache init files.
I'm quite sure I tested that, but I must have done so only for the
non-shared-catalog case, because a report from Adam Brightwell showed that
the regression tests still pass just fine if we bollix the shared-catalog
init file in the way this code was supposed to catch.  The reason is that
that file gets loaded before we do client authentication, so the WARNING
is not sent to the client, only to the postmaster log, where it's far too
easily missed.

The least Rube Goldbergian answer to this is to put an Assert(false)
after the elog(WARNING).  That will certainly get developers' attention,
while not breaking production builds' ability to recover from corner
cases with similar symptoms.

Since this is only of interest to developers, there seems no need for
a back-patch, even though the previous commit went into all branches.
2015-11-11 13:39:21 -05:00
Robert Haas 80558c1f5a Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.

Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.

Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.

Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.

Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 09:02:52 -05:00
Robert Haas f0661c4e8c Make sequential scans parallel-aware.
In addition, this path fills in a number of missing bits and pieces in
the parallel infrastructure.  Paths and plans now have a parallel_aware
flag indicating whether whatever parallel-aware logic they have should
be engaged.  It is believed that we will need this flag for a number of
path/plan types, not just sequential scans, which is why the flag is
generic rather than part of the SeqScan structures specifically.
Also, execParallel.c now gives parallel nodes a chance to initialize
their PlanState nodes from the DSM during parallel worker startup.

Amit Kapila, with a fair amount of adjustment by me.  Review of previous
patch versions by Haribabu Kommi and others.
2015-11-11 08:57:52 -05:00
Robert Haas f764ecd81b Add outfuncs.c support for GatherPath.
I dunno how commit 3bd909b220 missed
this, but it evidently did.
2015-11-11 06:29:03 -05:00
Tom Lane b05ae27e9a Add missing "static" qualifier.
Per buildfarm member pademelon.
2015-11-10 18:24:18 -05:00
Robert Haas 5c90a2ffdd Comment update.
Adjust to account for 5fc4c26db5.

Etsuro Fujita
2015-11-09 13:48:50 -05:00
Robert Haas bf3d015631 Fix rebasing mistake in nodeGather.c
The patches committed as 6e71dd7ce9
and 3a1f8611f2 were developed in
parallel but dependent on each other in a way that I failed to
notice.

This patch to fix the problem was prepared by Amit Kapila.
2015-11-09 10:49:24 -05:00
Robert Haas 89ff5c7f75 Add a dummy return statement to TupleQueueRemap.
This is unreachable for multiple reasons, but per Amit Kapila the
Windows compiler he is using still thinks we can get there.
2015-11-09 10:45:32 -05:00
Andres Freund f3a764b0da Set replication origin when decoding commit records.
By accident the replication origin was not set properly in
DecodeCommit(). That's bad because the origin is passed to the output
plugins origin filter, and accessible from the output plugin via
ReorderBufferTXN->origin_id.  Accessing the origin of individual changes
worked before the fix, which is why this wasn't notices earlier.

Reported-By: Craig Ringer
Author: Craig Ringer
Discussion: CAMsr+YFhBJLp=qfSz3-J+0P1zLkE8zNXM2otycn20QRMx380gw@mail.gmail.com
Backpatch: 9.5, where replication origins where introduced
2015-11-09 00:03:35 +01:00
Noah Misch fed19f312c Don't connect() to a wildcard address in test_postmaster_connection().
At least OpenBSD, NetBSD, and Windows don't support it.  This repairs
pg_ctl for listen_addresses='0.0.0.0' and listen_addresses='::'.  Since
pg_ctl prefers to test a Unix-domain socket, Windows users are most
likely to need this change.  Back-patch to 9.1 (all supported versions).
This could change pg_ctl interaction with loopback-interface firewall
rules.  Therefore, in 9.4 and earlier (released branches), activate the
change only on known-affected platforms.

Reported (bug #13611) and designed by Kondo Yuta.
2015-11-08 17:28:53 -05:00
Robert Haas fba60e573e Remove set-but-not-used variables.
Reported by both Peter Eisentraunt and Kevin Grittner.
2015-11-07 20:25:32 -05:00
Tom Lane c5e86ea932 Add "xid <> xid" and "xid <> int4" operators.
The corresponding "=" operators have been there a long time, and not
having their negators is a bit of a nuisance.

Michael Paquier
2015-11-07 16:40:15 -05:00
Tom Lane 9042f58342 Rename PQsslAttributes() to PQsslAttributeNames(), and const-ify fully.
Per discussion, the original name was a bit misleading, and
PQsslAttributeNames() seems more apropos.  It's not quite too late to
change this in 9.5, so let's change it while we can.

Also, make sure that the pointer array is const, not only the pointed-to
strings.

Minor documentation wordsmithing while at it.

Lars Kanis, slight adjustments by me
2015-11-07 16:13:49 -05:00
Tom Lane a43b4ab111 Fix enforcement of restrictions inside regexp lookaround constraints.
Lookahead and lookbehind constraints aren't allowed to contain backrefs,
and parentheses within them are always considered non-capturing.  Or so
says the manual.  But the regexp parser forgot about these rules once
inside a parenthesized subexpression, so that constructs like (\w)(?=(\1))
were accepted (but then not correctly executed --- a case like this acted
like (\w)(?=\w), without any enforcement that the two \w's match the same
text).  And in (?=((foo))) the innermost parentheses would be counted as
capturing parentheses, though no text would ever be captured for them.

To fix, properly pass down the "type" argument to the recursive invocation
of parse().

Back-patch to all supported branches; it was agreed that silent
misexecution of such patterns is worse than throwing an error, even though
new errors in minor releases are generally not desirable.
2015-11-07 12:43:24 -05:00
Robert Haas 8d7396e509 Try to convince gcc that TupleQueueRemap never falls off the end.
Without this, MacOS gcc version 4.2.1 isn't convinced.
2015-11-06 23:04:21 -05:00
Robert Haas af9773cf4c When completing ALTER INDEX .. SET, add an equals sign also.
Jeff Janes
2015-11-06 22:59:47 -05:00
Robert Haas 6e71dd7ce9 Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
Robert Haas cbb82e370d Remove unnecessary cast in previous commit.
Noted by Kyotaro Horiguchi, who also reviewed the previous patch, but
I failed to notice his review before committing.
2015-11-06 12:17:31 -05:00
Robert Haas a76ef15d9f Add sort support routine for the UUID data type.
This introduces a simple encoding scheme to produce abbreviated keys:
pack as many bytes of each UUID as will fit into a Datum.  On
little-endian machines, a byteswap is also performed; the abbreviated
comparator can therefore just consist of a simple 3-way unsigned integer
comparison.

The purpose of this change is to speed up sorting data on a column
of type UUID.

Peter Geoghegan
2015-11-06 12:14:35 -05:00
Stephen Frost 5644419b3d Set include_realm=1 default in parse_hba_line
With include_realm=1 being set down in parse_hba_auth_opt, if multiple
options are passed on the pg_hba line, such as:

host all     all    0.0.0.0/0    gss include_realm=0 krb_realm=XYZ.COM

We would mistakenly reset include_realm back to 1.  Instead, we need to
set include_realm=1 up in parse_hba_line, prior to parsing any of the
additional options.

Discovered by Jeff McCormick during testing.

Bug introduced by 9a08841.

Back-patch to 9.5
2015-11-06 11:18:27 -05:00
Robert Haas 8a1fab36ab pg_size_pretty: Format negative values similar to positive ones.
Previously, negative values were always displayed in bytes, regardless
of how large they were.

Adrian Vondendriesch, reviewed by Julien Rouhaud and myself
2015-11-06 11:03:02 -05:00
Tom Lane b23af45875 Fix erroneous hash calculations in gin_extract_jsonb_path().
The jsonb_path_ops code calculated hash values inconsistently in some cases
involving nested arrays and objects.  This would result in queries possibly
not finding entries that they should find, when using a jsonb_path_ops GIN
index for the search.  The problem cases involve JSONB values that contain
both scalars and sub-objects at the same nesting level, for example an
array containing both scalars and sub-arrays.  To fix, reset the current
stack->hash after processing each value or sub-object, not before; and
don't try to be cute about the outermost level's initial hash.

Correcting this means that existing jsonb_path_ops indexes may now be
inconsistent with the new hash calculation code.  The symptom is the same
--- searches not finding entries they should find --- but the specific
rows affected are likely to be different.  Users will need to REINDEX
jsonb_path_ops indexes to make sure that all searches work as expected.

Per bug #13756 from Daniel Cheng.  Back-patch to 9.4 where the faulty
logic was introduced.
2015-11-05 18:15:48 -05:00
Tom Lane 8c75ad436f Fix memory leaks in PL/Python.
Previously, plpython was in the habit of allocating a lot of stuff in
TopMemoryContext, and it was very slipshod about making sure that stuff
got cleaned up; in particular, use of TopMemoryContext as fn_mcxt for
function calls represents an unfixable leak, since we generally don't
know what the called function might have allocated in fn_mcxt.  This
results in session-lifespan leakage in certain usage scenarios, as for
example in a case reported by Ed Behn back in July.

To fix, get rid of all the retail allocations in TopMemoryContext.
All long-lived allocations are now made in sub-contexts that are
associated with specific objects (either pl/python procedures, or
Python-visible objects such as cursors and plans).  We can clean these
up when the associated object is deleted.

I went so far as to get rid of PLy_malloc completely.  There were a
couple of places where it could still have been used safely, but on
the whole it was just an invitation to bad coding.

Haribabu Kommi, based on a draft patch by Heikki Linnakangas;
some further work by me
2015-11-05 13:52:40 -05:00
Robert Haas 64b2e7ad91 Pass extra data to bgworkers, and use this to fix parallel contexts.
Up until now, the total amount of data that could be passed to a
background worker at startup was one datum, which can be a small as
4 bytes on some systems.  That's enough to pass a dsm_handle or an
array index, but not much else.  Add a bgw_extra flag to the
BackgroundWorker struct, allowing up to 128 bytes to be passed to
a new worker on any platform.

Use this to fix a problem I recently discovered with the parallel
context machinery added in 9.5: the master assigns each worker an
array index, and each worker subsequently assigns itself an array
index, and there's nothing to guarantee that the two sets of indexes
match, leading to chaos.

Normally, I would not back-patch the change to add bgw_extra, since it
is basically a feature addition.  However, since 9.5 is still in beta
and there seems to be no other sensible way to repair the broken
parallel context machinery, back-patch to 9.5.  Existing background
worker code can ignore the bgw_extra field without a problem, but
might need to be recompiled since the structure size has changed.

Report and patch by me.  Review by Amit Kapila.
2015-11-05 12:13:56 -05:00
Tom Lane 59464bd6f9 Improve implementation of GEQO's init_tour() function.
Rather than filling a temporary array and then copying values to the
output array, we can generate the required random permutation in-place
using the Fisher-Yates shuffle algorithm.  This is shorter as well as
more efficient than before.  It's pretty unlikely that anyone would
notice a speed improvement, but shorter code is better.

Nathan Wagner, edited a bit by me
2015-11-05 10:46:14 -05:00
Peter Eisentraut 7bd099d511 Update spelling of COPY options
The preferred spelling was changed from FORCE QUOTE to FORCE_QUOTE and
the like, but some code was still referring to the old spellings.
2015-11-04 21:01:26 -05:00
Tom Lane d894941663 Allow postgres_fdw to ship extension funcs/operators for remote execution.
The user can whitelist specified extension(s) in the foreign server's
options, whereupon we will treat immutable functions and operators of those
extensions as candidates to be sent for remote execution.

Whitelisting an extension in this way basically promises that the extension
exists on the remote server and behaves compatibly with the local instance.
We have no way to prove that formally, so we have to rely on the user to
get it right.  But this seems like something that people can usually get
right in practice.

We might in future allow functions and operators to be whitelisted
individually, but extension granularity is a very convenient special case,
so it got done first.

The patch as-committed lacks any regression tests, which is unfortunate,
but introducing dependencies on other extensions for testing purposes
would break "make installcheck" scenarios, which is worse.  I have some
ideas about klugy ways around that, but it seems like material for a
separate patch.  For the moment, leave the problem open.

Paul Ramsey, hacked up a bit more by me
2015-11-03 18:42:18 -05:00
Robert Haas ee44cb7566 Improve comments about abbreviation abort.
Peter Geoghegan
2015-11-03 14:11:49 -05:00
Tom Lane a69b0b2c14 Code + docs review for unicode linestyle patch.
Fix some brain fade in commit a2dabf0e1dda93c8: erroneous variable names
in docs, rearrangements that made sentences less clear not more so,
undocumented and poorly-chosen-anyway API behaviors of subroutines,
bad grammar in error messages, copy-and-paste faults.

Albe Laurenz and Tom Lane
2015-11-03 11:49:21 -05:00
Robert Haas 4efe26cbd3 shm_mq: Third attempt at fixing nowait behavior in shm_mq_receive.
Commit a1480ec1d3 purported to fix the
problems with commit b2ccb5f4e6, but it
didn't completely fix them.  The problem is that the checks were
performed in the wrong order, leading to a race condition.  If the
sender attached, sent a message, and detached after the receiver
called shm_mq_get_sender and before the receiver called
shm_mq_counterparty_gone, we'd incorrectly return SHM_MQ_DETACHED
before all messages were read.  Repair by reversing the order of
operations, and add a long comment explaining why this new logic is
(hopefully) correct.
2015-11-03 09:12:52 -05:00
Robert Haas 0279f62fdc Correct tiny inaccuracy in strxfrm cache comment.
Peter Geoghegan
2015-11-03 08:32:22 -05:00
Tom Lane 620ac88d6f Remove some more dead Alpha-specific code. 2015-11-02 19:37:51 -05:00
Robert Haas 1efc7e5382 Fix problems with ParamListInfo serialization mechanism.
Commit d1b7c1ffe7 introduced a mechanism
for serializing a ParamListInfo structure to be passed to a parallel
worker.  However, this mechanism failed to handle external expanded
values, as pointed out by Noah Misch.  Repair.

Moreover, plpgsql_param_fetch requires adjustment because the
serialization mechanism needs it to skip evaluating unused parameters
just as we would do when it is called from copyParamList, but params
== estate->paramLI in that case.  To fix, make the bms_is_member test
in that function unconditional.

Finally, have setup_param_list set a new ParamListInfo field,
paramMask, to the parameters actually used in the expression, so that
we don't try to fetch those that are not needed when serializing a
parameter list.  This isn't necessary for correctness, but it makes
the performance of the parallel executor code comparable to what we
do for cases involving cursors.

Design suggestions and extensive review by Noah Misch.  Patch by me.
2015-11-02 18:11:29 -05:00
Kevin Grittner 585e2a3b1a Fix serialization anomalies due to race conditions on INSERT.
On insert the CheckForSerializableConflictIn() test was performed
before the page(s) which were going to be modified had been locked
(with an exclusive buffer content lock).  If another process
acquired a relation SIReadLock on the heap and scanned to a page on
which an insert was going to occur before the page was so locked,
a rw-conflict would be missed, which could allow a serialization
anomaly to be missed.  The window between the check and the page
lock was small, so the bug was generally not noticed unless there
was high concurrency with multiple processes inserting into the
same table.

This was reported by Peter Bailis as bug #11732, by Sean Chittenden
as bug #13667, and by others.

The race condition was eliminated in heap_insert() by moving the
check down below the acquisition of the buffer lock, which had been
the very next statement.  Because of the loop locking and unlocking
multiple buffers in heap_multi_insert() a check was added after all
inserts were completed.  The check before the start of the inserts
was left because it might avoid a large amount of work to detect a
serialization anomaly before performing the all of the inserts and
the related WAL logging.

While investigating this bug, other SSI bugs which were even harder
to hit in practice were noticed and fixed, an unnecessary check
(covered by another check, so redundant) was removed from
heap_update(), and comments were improved.

Back-patch to all supported branches.

Kevin Grittner and Thomas Munro
2015-10-31 14:43:34 -05:00
Tom Lane 12c9a04008 Implement lookbehind constraints in our regular-expression engine.
A lookbehind constraint is like a lookahead constraint in that it consumes
no text; but it checks for existence (or nonexistence) of a match *ending*
at the current point in the string, rather than one *starting* at the
current point.  This is a long-requested feature since it exists in many
other regex libraries, but Henry Spencer had never got around to
implementing it in the code we use.

Just making it work is actually pretty trivial; but naive copying of the
logic for lookahead constraints leads to code that often spends O(N^2) time
to scan an N-character string, because we have to run the match engine
from string start to the current probe point each time the constraint is
checked.  In typical use-cases a lookbehind constraint will be written at
the start of the regex and hence will need to be checked at every character
--- so O(N^2) work overall.  To fix that, I introduced a third copy of the
core DFA matching loop, paralleling the existing longest() and shortest()
loops.  This version, matchuntil(), can suspend and resume matching given
a couple of pointers' worth of storage space.  So we need only run it
across the string once, stopping at each interesting probe point and then
resuming to advance to the next one.

I also put in an optimization that simplifies one-character lookahead and
lookbehind constraints, such as "(?=x)" or "(?<!\w)", into AHEAD and BEHIND
constraints, which already existed in the engine.  This avoids the overhead
of the LACON machinery entirely for these rather common cases.

The net result is that lookbehind constraints run a factor of three or so
slower than Perl's for multi-character constraints, but faster than Perl's
for one-character constraints ... and they work fine for variable-length
constraints, which Perl gives up on entirely.  So that's not bad from a
competitive perspective, and there's room for further optimization if
anyone cares.  (In reality, raw scan rate across a large input string is
probably not that big a deal for Postgres usage anyway; so I'm happy if
it's linear.)
2015-10-30 19:14:19 -04:00
Robert Haas 3a1f8611f2 Update parallel executor support to reuse the same DSM.
Commit b0b0d84b3d purported to make it
possible to relaunch workers using the same parallel context, but it had
an unpleasant race condition: we might reinitialize after the workers
have sent their last control message but before they have dettached the
DSM, leaving to crashes.  Repair by introducing a new ParallelContext
operation, ReinitializeParallelDSM.

Adjust execParallel.c to use this new support, so that we can rescan a
Gather node by relaunching workers but without needing to recreate the
DSM.

Amit Kapila, with some adjustments by me.  Extracted from latest parallel
sequential scan patch.
2015-10-30 10:44:54 +01:00
Robert Haas c6baec92fc Fix typo in bgworker.c 2015-10-30 10:35:33 +01:00
Peter Eisentraut c5130e8ee0 Remove some remains from Alpha support removal 2015-10-29 16:40:14 -04:00
Peter Eisentraut a8d585c091 Message style improvements
Message style, plurals, quoting, spelling, consistency with similar
messages
2015-10-28 20:38:36 -04:00