postgresql/src/backend
Tom Lane b4c6d31c0b Fix serious performance problems in json(b) to_tsvector().
In an off-list followup to bug #14745, Bob Jones complained that
to_tsvector() on a 2MB jsonb value took an unreasonable amount of
time and space --- enough to draw the wrath of the OOM killer on
his machine.  On my machine, his example proved to require upwards
of 18 seconds and 4GB, which seemed pretty bogus considering that
to_tsvector() on the same data treated as text took just a couple
hundred msec and 10 or so MB.

On investigation, the problem is that the implementation scans each
string element of the json(b) and converts it to tsvector separately,
then applies tsvector_concat() to join those separate tsvectors.
The unreasonable memory usage came from leaking every single one of
the transient tsvectors --- but even without that mistake, this is an
O(N^2) or worse algorithm, because tsvector_concat() has to repeatedly
process the words coming from earlier elements.

We can fix it by accumulating all the lexeme data and applying
make_tsvector() just once.  As a side benefit, that also makes the
desired adjustment of lexeme positions far cheaper, because we can
just tweak the running "pos" counter between JSON elements.
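
The accumulate-once approach described above can be sketched roughly as follows. This is a simplified illustration, not the code from the commit: the Lexeme and LexemeBuf types and the buf_* functions are hypothetical stand-ins for PostgreSQL's parsed-text structures and make_tsvector(), and the one-position gap between elements is an assumed value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lexeme entry: a word plus its position in the document. */
typedef struct
{
    const char *word;
    int         pos;
} Lexeme;

/* Hypothetical accumulator shared by all string elements of one JSON value. */
typedef struct
{
    Lexeme     *items;
    size_t      len;
    size_t      cap;
    int         pos;            /* running position counter */
} LexemeBuf;

/* Append one word, growing the array geometrically (amortized O(1)). */
static void
buf_add(LexemeBuf *b, const char *word)
{
    if (b->len == b->cap)
    {
        b->cap = b->cap ? b->cap * 2 : 16;
        b->items = realloc(b->items, b->cap * sizeof(Lexeme));
        assert(b->items != NULL);
    }
    b->items[b->len].word = word;
    b->items[b->len].pos = ++b->pos;
    b->len++;
}

/*
 * Append every word of one JSON string element, then bump the running
 * counter so that adjacent elements are not positionally adjacent.
 * The gap of 1 is an assumption made for this sketch.
 */
static void
buf_add_element(LexemeBuf *b, const char **words, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf_add(b, words[i]);
    if (n > 0)
        b->pos += 1;
}

/* Order by word, then by position. */
static int
cmp_lexeme(const void *a, const void *b)
{
    const Lexeme *x = a;
    const Lexeme *y = b;
    int         c = strcmp(x->word, y->word);

    return c ? c : (x->pos - y->pos);
}

/*
 * The single final pass: one O(N log N) sort over all accumulated
 * lexemes, instead of re-merging earlier words for every element.
 */
static void
buf_finish(LexemeBuf *b)
{
    if (b->len > 1)
        qsort(b->items, b->len, sizeof(Lexeme), cmp_lexeme);
}
```

Each element costs time proportional to its own word count, and the sort happens once at the end, so earlier words are never reprocessed the way they are when per-element results are concatenated one by one.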

In passing, try to make the explanation of that tweak more intelligible.
(I didn't think that a barely-readable comment far removed from the
actual code was helpful.)  And do some minor other code beautification.
2017-07-18 12:45:51 -04:00
access hash: Fix write-ahead logging bugs related to init forks. 2017-07-17 12:03:35 -04:00
bootstrap Phase 3 of pgindent updates. 2017-06-21 15:35:54 -04:00
catalog Code review for NextValueExpr expression node type. 2017-07-14 15:25:43 -04:00
commands Use a real RT index when setting up partition tuple routing. 2017-07-17 21:29:45 -04:00
executor Reverse-convert row types in ExecWithCheckOptions. 2017-07-17 21:56:31 -04:00
foreign Abstract logic to allow for multiple kinds of child rels. 2017-04-03 22:41:31 -04:00
lib Phase 3 of pgindent updates. 2017-06-21 15:35:54 -04:00
libpq Treat clean shutdown of an SSL connection same as the non-SSL case. 2017-07-03 14:51:51 +03:00
main Change pg_ctl to detect server-ready by watching status in postmaster.pid. 2017-06-28 17:31:32 -04:00
nodes Code review for NextValueExpr expression node type. 2017-07-14 15:25:43 -04:00
optimizer Code review for NextValueExpr expression node type. 2017-07-14 15:25:43 -04:00
parser Re-allow SRFs and window functions within sub-selects within aggregates. 2017-06-27 17:51:11 -04:00
po Translation updates 2017-07-10 11:53:55 -04:00
port Change pg_ctl to detect server-ready by watching status in postmaster.pid. 2017-06-28 17:31:32 -04:00
postmaster On Windows, retry process creation if we fail to reserve shared memory. 2017-07-10 11:00:09 -04:00
regex Phase 2 of pgindent updates. 2017-06-21 15:19:25 -04:00
replication Fix ordering of operations in SyncRepWakeQueue to avoid assertion failure. 2017-07-12 15:30:52 +03:00
rewrite Fix multiple assignments to a column of a domain type. 2017-07-11 16:48:59 -04:00
snowball Initial pgindent run with pg_bsd_indent version 2.0. 2017-06-21 14:39:04 -04:00
statistics Fix typos in README.dependencies 2017-06-22 17:12:27 -04:00
storage Fix race between GetNewTransactionId and GetOldestActiveTransactionId. 2017-07-13 15:47:02 +03:00
tcop Phase 3 of pgindent updates. 2017-06-21 15:35:54 -04:00
tsearch Fix serious performance problems in json(b) to_tsvector(). 2017-07-18 12:45:51 -04:00
utils Code review for NextValueExpr expression node type. 2017-07-14 15:25:43 -04:00
.gitignore Add .gitignore entries for AIX-specific intermediate build artifacts. 2015-07-08 20:44:22 -04:00
common.mk Add ICU_CFLAGS to global CPPFLAGS 2017-06-12 15:57:22 -04:00
Makefile Implement multivariate n-distinct coefficients 2017-03-24 14:06:10 -03:00
nls.mk Translation updates 2017-05-15 12:19:54 -04:00