postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	c3e7c24a1d	libpq: Notice errors a backend may have sent just before dying. At least since the introduction of Hot Standby, the backend has sometimes sent fatal errors even when no client query was in progress, assuming that the client would receive it. However, pqHandleSendFailure was not in sync with this assumption, and only tries to catch notices and notifies. Add a parseInput call to the loop there to fix. Andres Freund suggested the fix. Comments are by me. Reviewed by Michael Paquier.	2015-11-12 09:12:18 -05:00
Robert Haas	ac1d7945f8	Make idle backends exit if the postmaster dies. Letting backends continue to run if the postmaster has exited prevents PostgreSQL from being restarted, which in many environments is catastrophic. Worse, if some other backend crashes, we no longer have any protection against shared memory corruption. So, arrange for them to exit instead. We don't want to expend many cycles on this, but including postmaster death in the set of things that we wait for when a backend is idle seems cheap enough. Rajeev Rastogi and Robert Haas	2015-11-12 09:00:33 -05:00
Robert Haas	a05dc4d7fd	Provide readfuncs support for custom scans. Commit `a0d9f6e434` added this support for all other plan node types; this fills in the gap. Since TextOutCustomScan complicates this and is pretty well useless, remove it. KaiGai Kohei, with some modifications by me.	2015-11-12 07:40:31 -05:00
Tom Lane	39b9978d9c	Do a round of copy-editing on the 9.5 release notes. Also fill in the previously empty "major enhancements" list. YMMV as to which items should make the cut, but it's past time we had something more than a placeholder here. (I meant to get this done before beta2 was wrapped, but got distracted by PDF build problems. Better late than never.)	2015-11-11 19:19:14 -05:00
Tom Lane	6404751ce9	Improve documentation around autovacuum-related storage parameters. These were discussed in three different sections of the manual, which unsurprisingly had diverged over time; and the descriptions of individual variables lacked stylistic consistency even within each section (and frequently weren't in very good English anyway). Clean up the mess, and remove some of the redundant information in hopes that future additions will be less likely to re-introduce inconsistency. For instance I see no need for maintenance.sgml to include its very own list of all the autovacuum storage parameters, especially since that list was already incomplete.	2015-11-11 17:13:38 -05:00
Tom Lane	da3751c8ea	Be more noisy about "wrong number of nailed relations" initfile problems. In commit `5d1ff6bd55` I added some logic to relcache.c to try to ensure that the regression tests would fail if we made a mistake about which relations belong in the relcache init files. I'm quite sure I tested that, but I must have done so only for the non-shared-catalog case, because a report from Adam Brightwell showed that the regression tests still pass just fine if we bollix the shared-catalog init file in the way this code was supposed to catch. The reason is that that file gets loaded before we do client authentication, so the WARNING is not sent to the client, only to the postmaster log, where it's far too easily missed. The least Rube Goldbergian answer to this is to put an Assert(false) after the elog(WARNING). That will certainly get developers' attention, while not breaking production builds' ability to recover from corner cases with similar symptoms. Since this is only of interest to developers, there seems no need for a back-patch, even though the previous commit went into all branches.	2015-11-11 13:39:21 -05:00
Robert Haas	80558c1f5a	Generate parallel sequential scan plans in simple cases. Add a new flag, consider_parallel, to each RelOptInfo, indicating whether a plan for that relation could conceivably be run inside of a parallel worker. Right now, we're pretty conservative: for example, it might be possible to defer applying a parallel-restricted qual in a worker, and later do it in the leader, but right now we just don't try to parallelize access to that relation. That's probably the right decision in most cases, anyway. Using the new flag, generate parallel sequential scan plans for plain baserels, meaning that we now have parallel sequential scan in PostgreSQL. The logic here is pretty unsophisticated right now: the costing model probably isn't right in detail, and we can't push joins beneath Gather nodes, so the number of plans that can actually benefit from this is pretty limited right now. Lots more work is needed. Nevertheless, it seems time to enable this functionality so that all this code can actually be tested easily by users and developers. Note that, if you wish to test this functionality, it will be necessary to set max_parallel_degree to a value greater than the default of 0. Once a few more loose ends have been tidied up here, we might want to consider changing the default value of this GUC, but I'm leaving it alone for now. Along the way, fix a bug in cost_gather: the previous coding thought that a Gather node's transfer overhead should be costed on the basis of the relation size rather than the number of tuples that actually need to be passed off to the leader. Patch by me, reviewed in earlier versions by Amit Kapila.	2015-11-11 09:02:52 -05:00
Robert Haas	f0661c4e8c	Make sequential scans parallel-aware. In addition, this path fills in a number of missing bits and pieces in the parallel infrastructure. Paths and plans now have a parallel_aware flag indicating whether whatever parallel-aware logic they have should be engaged. It is believed that we will need this flag for a number of path/plan types, not just sequential scans, which is why the flag is generic rather than part of the SeqScan structures specifically. Also, execParallel.c now gives parallel nodes a chance to initialize their PlanState nodes from the DSM during parallel worker startup. Amit Kapila, with a fair amount of adjustment by me. Review of previous patch versions by Haribabu Kommi and others.	2015-11-11 08:57:52 -05:00
Robert Haas	f764ecd81b	Add outfuncs.c support for GatherPath. I dunno how commit `3bd909b220` missed this, but it evidently did.	2015-11-11 06:29:03 -05:00
Tom Lane	7b6fb76349	Docs: fix misleading example. Commit `8457d0beca` introduced an example which, while not incorrect, failed to exhibit the behavior it meant to describe, as a result of omitting an E'' prefix that needed to be there. Noticed and fixed by Peter Geoghegan. I (tgl) failed to resist the temptation to wordsmith nearby text a bit while at it.	2015-11-10 22:11:39 -05:00
Tom Lane	b05ae27e9a	Add missing "static" qualifier. Per buildfarm member pademelon.	2015-11-10 18:24:18 -05:00
Tom Lane	944b41fc00	Improve our workaround for 'TeX capacity exceeded' in building PDF files. In commit `a5ec86a7c7` I wrote a quick hack that reduced the number of TeX string pool entries created while converting our documentation to PDF form. That held the fort for awhile, but as of HEAD we're back up against the same limitation. It turns out that the original coding of \FlowObjectSetup actually results in three string pool entries being generated for every "flow object" (that is, potential cross-reference target) in the documentation, and my previous hack only got rid of one of them. With a little more care, we can reduce the string count to one per flow object plus one per actually-cross-referenced flow object (about 115000 + 5000 as of current HEAD); that should work until the documentation volume roughly doubles from where it is today. As a not-incidental side benefit, this change also causes pdfjadetex to stop emitting unreferenced hyperlink anchors (bookmarks) into the PDF file. It had been making one willy-nilly for every flow object; now it's just one per actually-cross-referenced object. This results in close to a 2X savings in PDF file size. We will still want to run the output through "jpdftweak" to get it to be compressed; but we no longer need removal of unreferenced bookmarks, so we might be able to find a quicker tool for that step. Although the failure only affects HEAD and US-format output at the moment, 9.5 cannot be more than a few pages short of failing likewise, so it will inevitably fail after a few rounds of minor-version release notes. I don't have a lot of faith that we'll never hit the limit in the older branches; and anyway it would be nice to get rid of jpdftweak across the board. Therefore, back-patch to all supported branches.	2015-11-10 15:59:59 -05:00
Robert Haas	5c90a2ffdd	Comment update. Adjust to account for `5fc4c26db5`. Etsuro Fujita	2015-11-09 13:48:50 -05:00
Robert Haas	bf3d015631	Fix rebasing mistake in nodeGather.c The patches committed as `6e71dd7ce9` and `3a1f8611f2` were developed in parallel but dependent on each other in a way that I failed to notice. This patch to fix the problem was prepared by Amit Kapila.	2015-11-09 10:49:24 -05:00
Robert Haas	89ff5c7f75	Add a dummy return statement to TupleQueueRemap. This is unreachable for multiple reasons, but per Amit Kapila the Windows compiler he is using still thinks we can get there.	2015-11-09 10:45:32 -05:00
Andres Freund	c31f1dc559	Add paragraph about ON CONFLICT interaction with partitioning. Author: Peter Geoghegan and Andres Freund Discussion: CAM3SWZScpWzQ-7EJC77vwqzZ1GO8GNmURQ1QqDQ3wRn7AbW1Cg@mail.gmail.com, CAHGQGwFUCWwSU7dtc2aRdRk73ztyr_jY5cPOyts+K8xKJ92X4Q@mail.gmail.com Backpatch: 9.5, where UPSERT was introduced	2015-11-09 06:57:21 +01:00
Andres Freund	f3a764b0da	Set replication origin when decoding commit records. By accident the replication origin was not set properly in DecodeCommit(). That's bad because the origin is passed to the output plugins origin filter, and accessible from the output plugin via ReorderBufferTXN->origin_id. Accessing the origin of individual changes worked before the fix, which is why this wasn't notices earlier. Reported-By: Craig Ringer Author: Craig Ringer Discussion: CAMsr+YFhBJLp=qfSz3-J+0P1zLkE8zNXM2otycn20QRMx380gw@mail.gmail.com Backpatch: 9.5, where replication origins where introduced	2015-11-09 00:03:35 +01:00
Noah Misch	fed19f312c	Don't connect() to a wildcard address in test_postmaster_connection(). At least OpenBSD, NetBSD, and Windows don't support it. This repairs pg_ctl for listen_addresses='0.0.0.0' and listen_addresses='::'. Since pg_ctl prefers to test a Unix-domain socket, Windows users are most likely to need this change. Back-patch to 9.1 (all supported versions). This could change pg_ctl interaction with loopback-interface firewall rules. Therefore, in 9.4 and earlier (released branches), activate the change only on known-affected platforms. Reported (bug #13611) and designed by Kondo Yuta.	2015-11-08 17:28:53 -05:00
Robert Haas	fba60e573e	Remove set-but-not-used variables. Reported by both Peter Eisentraunt and Kevin Grittner.	2015-11-07 20:25:32 -05:00
Tom Lane	ad9fad7b68	Update 9.5 release notes through today.	2015-11-07 17:09:04 -05:00
Tom Lane	c5e86ea932	Add "xid <> xid" and "xid <> int4" operators. The corresponding "=" operators have been there a long time, and not having their negators is a bit of a nuisance. Michael Paquier	2015-11-07 16:40:15 -05:00
Tom Lane	9042f58342	Rename PQsslAttributes() to PQsslAttributeNames(), and const-ify fully. Per discussion, the original name was a bit misleading, and PQsslAttributeNames() seems more apropos. It's not quite too late to change this in 9.5, so let's change it while we can. Also, make sure that the pointer array is const, not only the pointed-to strings. Minor documentation wordsmithing while at it. Lars Kanis, slight adjustments by me	2015-11-07 16:13:49 -05:00
Tom Lane	a43b4ab111	Fix enforcement of restrictions inside regexp lookaround constraints. Lookahead and lookbehind constraints aren't allowed to contain backrefs, and parentheses within them are always considered non-capturing. Or so says the manual. But the regexp parser forgot about these rules once inside a parenthesized subexpression, so that constructs like (\w)(?=(\1)) were accepted (but then not correctly executed --- a case like this acted like (\w)(?=\w), without any enforcement that the two \w's match the same text). And in (?=((foo))) the innermost parentheses would be counted as capturing parentheses, though no text would ever be captured for them. To fix, properly pass down the "type" argument to the recursive invocation of parse(). Back-patch to all supported branches; it was agreed that silent misexecution of such patterns is worse than throwing an error, even though new errors in minor releases are generally not desirable.	2015-11-07 12:43:24 -05:00
Robert Haas	8d7396e509	Try to convince gcc that TupleQueueRemap never falls off the end. Without this, MacOS gcc version 4.2.1 isn't convinced.	2015-11-06 23:04:21 -05:00
Robert Haas	af9773cf4c	When completing ALTER INDEX .. SET, add an equals sign also. Jeff Janes	2015-11-06 22:59:47 -05:00
Robert Haas	6e71dd7ce9	Modify tqueue infrastructure to support transient record types. Commit `4a4e6893aa`, which introduced this mechanism, failed to account for the fact that the RECORD pseudo-type uses transient typmods that are only meaningful within a single backend. Transferring such tuples without modification between two cooperating backends does not work. This commit installs a system for passing the tuple descriptors over the same shm_mq being used to send the tuples themselves. The two sides might not assign the same transient typmod to any given tuple descriptor, so we must also substitute the appropriate receiver-side typmod for the one used by the sender. That adds some CPU overhead, but still seems better than being unable to pass records between cooperating parallel processes. Along the way, move the logic for handling multiple tuple queues from tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader, which reads from a single queue, rather than a TupleQueueFunnel, which potentially reads from multiple queues. This change was suggested previously as a way to make sure that nodeGather.c rather than tqueue.c had policy control over the order in which to read from queues, but it wasn't clear to me until now how good an idea it was. typmod mapping needs to be performed separately for each queue, and it is much simpler if the tqueue.c code handles that and leaves multiplexing multiple queues to higher layers of the stack.	2015-11-06 16:58:45 -05:00
Robert Haas	cbb82e370d	Remove unnecessary cast in previous commit. Noted by Kyotaro Horiguchi, who also reviewed the previous patch, but I failed to notice his review before committing.	2015-11-06 12:17:31 -05:00
Robert Haas	a76ef15d9f	Add sort support routine for the UUID data type. This introduces a simple encoding scheme to produce abbreviated keys: pack as many bytes of each UUID as will fit into a Datum. On little-endian machines, a byteswap is also performed; the abbreviated comparator can therefore just consist of a simple 3-way unsigned integer comparison. The purpose of this change is to speed up sorting data on a column of type UUID. Peter Geoghegan	2015-11-06 12:14:35 -05:00
Stephen Frost	5644419b3d	Set include_realm=1 default in parse_hba_line With include_realm=1 being set down in parse_hba_auth_opt, if multiple options are passed on the pg_hba line, such as: host all all 0.0.0.0/0 gss include_realm=0 krb_realm=XYZ.COM We would mistakenly reset include_realm back to 1. Instead, we need to set include_realm=1 up in parse_hba_line, prior to parsing any of the additional options. Discovered by Jeff McCormick during testing. Bug introduced by `9a08841`. Back-patch to 9.5	2015-11-06 11:18:27 -05:00
Robert Haas	8a1fab36ab	pg_size_pretty: Format negative values similar to positive ones. Previously, negative values were always displayed in bytes, regardless of how large they were. Adrian Vondendriesch, reviewed by Julien Rouhaud and myself	2015-11-06 11:03:02 -05:00
Robert Haas	dde5f09fad	Document interaction of bgworkers with LISTEN/NOTIFY. Thomas Munro and Robert Haas, reviewed by Haribabu Kommi	2015-11-06 00:31:46 -05:00
Tom Lane	b23af45875	Fix erroneous hash calculations in gin_extract_jsonb_path(). The jsonb_path_ops code calculated hash values inconsistently in some cases involving nested arrays and objects. This would result in queries possibly not finding entries that they should find, when using a jsonb_path_ops GIN index for the search. The problem cases involve JSONB values that contain both scalars and sub-objects at the same nesting level, for example an array containing both scalars and sub-arrays. To fix, reset the current stack->hash after processing each value or sub-object, not before; and don't try to be cute about the outermost level's initial hash. Correcting this means that existing jsonb_path_ops indexes may now be inconsistent with the new hash calculation code. The symptom is the same --- searches not finding entries they should find --- but the specific rows affected are likely to be different. Users will need to REINDEX jsonb_path_ops indexes to make sure that all searches work as expected. Per bug #13756 from Daniel Cheng. Back-patch to 9.4 where the faulty logic was introduced.	2015-11-05 18:15:48 -05:00
Tom Lane	8c75ad436f	Fix memory leaks in PL/Python. Previously, plpython was in the habit of allocating a lot of stuff in TopMemoryContext, and it was very slipshod about making sure that stuff got cleaned up; in particular, use of TopMemoryContext as fn_mcxt for function calls represents an unfixable leak, since we generally don't know what the called function might have allocated in fn_mcxt. This results in session-lifespan leakage in certain usage scenarios, as for example in a case reported by Ed Behn back in July. To fix, get rid of all the retail allocations in TopMemoryContext. All long-lived allocations are now made in sub-contexts that are associated with specific objects (either pl/python procedures, or Python-visible objects such as cursors and plans). We can clean these up when the associated object is deleted. I went so far as to get rid of PLy_malloc completely. There were a couple of places where it could still have been used safely, but on the whole it was just an invitation to bad coding. Haribabu Kommi, based on a draft patch by Heikki Linnakangas; some further work by me	2015-11-05 13:52:40 -05:00
Robert Haas	64b2e7ad91	Pass extra data to bgworkers, and use this to fix parallel contexts. Up until now, the total amount of data that could be passed to a background worker at startup was one datum, which can be a small as 4 bytes on some systems. That's enough to pass a dsm_handle or an array index, but not much else. Add a bgw_extra flag to the BackgroundWorker struct, allowing up to 128 bytes to be passed to a new worker on any platform. Use this to fix a problem I recently discovered with the parallel context machinery added in 9.5: the master assigns each worker an array index, and each worker subsequently assigns itself an array index, and there's nothing to guarantee that the two sets of indexes match, leading to chaos. Normally, I would not back-patch the change to add bgw_extra, since it is basically a feature addition. However, since 9.5 is still in beta and there seems to be no other sensible way to repair the broken parallel context machinery, back-patch to 9.5. Existing background worker code can ignore the bgw_extra field without a problem, but might need to be recompiled since the structure size has changed. Report and patch by me. Review by Amit Kapila.	2015-11-05 12:13:56 -05:00
Tom Lane	59464bd6f9	Improve implementation of GEQO's init_tour() function. Rather than filling a temporary array and then copying values to the output array, we can generate the required random permutation in-place using the Fisher-Yates shuffle algorithm. This is shorter as well as more efficient than before. It's pretty unlikely that anyone would notice a speed improvement, but shorter code is better. Nathan Wagner, edited a bit by me	2015-11-05 10:46:14 -05:00
Peter Eisentraut	7bd099d511	Update spelling of COPY options The preferred spelling was changed from FORCE QUOTE to FORCE_QUOTE and the like, but some code was still referring to the old spellings.	2015-11-04 21:01:26 -05:00
Tom Lane	b9f117d6cd	Add regression tests for remote execution of extension operators/functions. Rather than relying on other extensions to be available for installation, let's just add some test objects to the postgres_fdw extension itself within the regression script.	2015-11-04 12:03:30 -05:00
Tom Lane	d894941663	Allow postgres_fdw to ship extension funcs/operators for remote execution. The user can whitelist specified extension(s) in the foreign server's options, whereupon we will treat immutable functions and operators of those extensions as candidates to be sent for remote execution. Whitelisting an extension in this way basically promises that the extension exists on the remote server and behaves compatibly with the local instance. We have no way to prove that formally, so we have to rely on the user to get it right. But this seems like something that people can usually get right in practice. We might in future allow functions and operators to be whitelisted individually, but extension granularity is a very convenient special case, so it got done first. The patch as-committed lacks any regression tests, which is unfortunate, but introducing dependencies on other extensions for testing purposes would break "make installcheck" scenarios, which is worse. I have some ideas about klugy ways around that, but it seems like material for a separate patch. For the moment, leave the problem open. Paul Ramsey, hacked up a bit more by me	2015-11-03 18:42:18 -05:00
Robert Haas	ee44cb7566	Improve comments about abbreviation abort. Peter Geoghegan	2015-11-03 14:11:49 -05:00
Robert Haas	f18c944b61	postgres_fdw: Add ORDER BY to some remote SQL queries. If the join problem's entire ORDER BY clause can be pushed to the remote server, consider a path that adds this ORDER BY clause. If use_remote_estimate is on, we cost this path using an additional remote EXPLAIN. If not, we just estimate that the path costs 20% more, which is intended to be large enough that we won't request a remote sort when it's not helpful, but small enough that we'll have the remote side do the sort when in doubt. In some cases, the remote sort might actually be free, because the remote query plan might happen to produce output that is ordered the way we need, but without remote estimates we have no way of knowing that. It might also be useful to request sorted output from the remote side if it enables an efficient merge join, but this patch doesn't attempt to handle that case. Ashutosh Bapat with revisions by me. Also reviewed by Fabrízio de Royes Mello and Jeevan Chalke.	2015-11-03 13:04:42 -05:00
Tom Lane	fc0b893521	Remove obsolete advice about doubling backslashes in regex escapes. Standard-conforming literals have been the default for long enough that it no longer seems necessary to go out of our way to tell people to write regex escapes illegibly.	2015-11-03 11:57:56 -05:00
Tom Lane	a69b0b2c14	Code + docs review for unicode linestyle patch. Fix some brain fade in commit a2dabf0e1dda93c8: erroneous variable names in docs, rearrangements that made sentences less clear not more so, undocumented and poorly-chosen-anyway API behaviors of subroutines, bad grammar in error messages, copy-and-paste faults. Albe Laurenz and Tom Lane	2015-11-03 11:49:21 -05:00
Robert Haas	4efe26cbd3	shm_mq: Third attempt at fixing nowait behavior in shm_mq_receive. Commit `a1480ec1d3` purported to fix the problems with commit `b2ccb5f4e6`, but it didn't completely fix them. The problem is that the checks were performed in the wrong order, leading to a race condition. If the sender attached, sent a message, and detached after the receiver called shm_mq_get_sender and before the receiver called shm_mq_counterparty_gone, we'd incorrectly return SHM_MQ_DETACHED before all messages were read. Repair by reversing the order of operations, and add a long comment explaining why this new logic is (hopefully) correct.	2015-11-03 09:12:52 -05:00
Robert Haas	0279f62fdc	Correct tiny inaccuracy in strxfrm cache comment. Peter Geoghegan	2015-11-03 08:32:22 -05:00
Tom Lane	620ac88d6f	Remove some more dead Alpha-specific code.	2015-11-02 19:37:51 -05:00
Robert Haas	1efc7e5382	Fix problems with ParamListInfo serialization mechanism. Commit `d1b7c1ffe7` introduced a mechanism for serializing a ParamListInfo structure to be passed to a parallel worker. However, this mechanism failed to handle external expanded values, as pointed out by Noah Misch. Repair. Moreover, plpgsql_param_fetch requires adjustment because the serialization mechanism needs it to skip evaluating unused parameters just as we would do when it is called from copyParamList, but params == estate->paramLI in that case. To fix, make the bms_is_member test in that function unconditional. Finally, have setup_param_list set a new ParamListInfo field, paramMask, to the parameters actually used in the expression, so that we don't try to fetch those that are not needed when serializing a parameter list. This isn't necessary for correctness, but it makes the performance of the parallel executor code comparable to what we do for cases involving cursors. Design suggestions and extensive review by Noah Misch. Patch by me.	2015-11-02 18:11:29 -05:00
Kevin Grittner	bf25fb2f93	Add RMV to list of commands taking AE lock. Backpatch to 9.3, where it was initially omitted. Craig Ringer, with minor adjustment by Kevin Grittner	2015-11-02 06:23:10 -06:00
Kevin Grittner	585e2a3b1a	Fix serialization anomalies due to race conditions on INSERT. On insert the CheckForSerializableConflictIn() test was performed before the page(s) which were going to be modified had been locked (with an exclusive buffer content lock). If another process acquired a relation SIReadLock on the heap and scanned to a page on which an insert was going to occur before the page was so locked, a rw-conflict would be missed, which could allow a serialization anomaly to be missed. The window between the check and the page lock was small, so the bug was generally not noticed unless there was high concurrency with multiple processes inserting into the same table. This was reported by Peter Bailis as bug #11732, by Sean Chittenden as bug #13667, and by others. The race condition was eliminated in heap_insert() by moving the check down below the acquisition of the buffer lock, which had been the very next statement. Because of the loop locking and unlocking multiple buffers in heap_multi_insert() a check was added after all inserts were completed. The check before the start of the inserts was left because it might avoid a large amount of work to detect a serialization anomaly before performing the all of the inserts and the related WAL logging. While investigating this bug, other SSI bugs which were even harder to hit in practice were noticed and fixed, an unnecessary check (covered by another check, so redundant) was removed from heap_update(), and comments were improved. Back-patch to all supported branches. Kevin Grittner and Thomas Munro	2015-10-31 14:43:34 -05:00
Tom Lane	12c9a04008	Implement lookbehind constraints in our regular-expression engine. A lookbehind constraint is like a lookahead constraint in that it consumes no text; but it checks for existence (or nonexistence) of a match ending at the current point in the string, rather than one starting at the current point. This is a long-requested feature since it exists in many other regex libraries, but Henry Spencer had never got around to implementing it in the code we use. Just making it work is actually pretty trivial; but naive copying of the logic for lookahead constraints leads to code that often spends O(N^2) time to scan an N-character string, because we have to run the match engine from string start to the current probe point each time the constraint is checked. In typical use-cases a lookbehind constraint will be written at the start of the regex and hence will need to be checked at every character --- so O(N^2) work overall. To fix that, I introduced a third copy of the core DFA matching loop, paralleling the existing longest() and shortest() loops. This version, matchuntil(), can suspend and resume matching given a couple of pointers' worth of storage space. So we need only run it across the string once, stopping at each interesting probe point and then resuming to advance to the next one. I also put in an optimization that simplifies one-character lookahead and lookbehind constraints, such as "(?=x)" or "(?<!\w)", into AHEAD and BEHIND constraints, which already existed in the engine. This avoids the overhead of the LACON machinery entirely for these rather common cases. The net result is that lookbehind constraints run a factor of three or so slower than Perl's for multi-character constraints, but faster than Perl's for one-character constraints ... and they work fine for variable-length constraints, which Perl gives up on entirely. So that's not bad from a competitive perspective, and there's room for further optimization if anyone cares. (In reality, raw scan rate across a large input string is probably not that big a deal for Postgres usage anyway; so I'm happy if it's linear.)	2015-10-30 19:14:19 -04:00
Robert Haas	c5057b2b34	doc: security_barrier option is a Boolean, not a string. Mistake introduced by commit `5bd91e3a83`. Hari Babu	2015-10-30 12:19:28 +01:00

1 2 3 4 5 ...

39306 Commits All Branches Search

39306 Commits

All Branches