Commit Graph

27 Commits

Author SHA1 Message Date
Robert Haas acf555bc53 Shut down Gather's children before shutting down Gather itself.
It turns out that the original shutdown order here does not work well.
Multiple people attempting to develop further parallel query patches
have discovered that they need to do cleanup before the DSM goes away,
and you can't do that if the parent node gets cleaned up first.

Patch by me, reviewed by KaiGai Kohei and Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoY6bOc1YnhcAQnMfCBDbsJzROQ3sYxSAL-SYB5tMJcTKg@mail.gmail.com
Discussion: http://postgr.es/m/9A28C8860F777E439AA12E8AEA7694F8012AEB82@BPXM15GP.gisp.nec.co.jp
Discussion: http://postgr.es/m/CA+TgmoYuPOc=+xrG1v0fCsoLbKAab9F1ddOeaaiLMzKOiBar1Q@mail.gmail.com
2017-02-22 08:08:07 +05:30
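
A minimal sketch of the ordering constraint described in the commit above, assuming a generic shutdown callback (the function and field names here are illustrative, not the actual nodeGather.c code): recurse into the children while the DSM segment still exists, and only then release the node's own shared state.

    /* Illustrative sketch only; names are hypothetical, not PostgreSQL's. */
    static void
    ShutdownNodeTree(PlanStateLike *node)
    {
        /* Let the children clean up first, while the DSM is still mapped. */
        if (node->lefttree != NULL)
            ShutdownNodeTree(node->lefttree);
        if (node->righttree != NULL)
            ShutdownNodeTree(node->righttree);

        /* Only now tear down this node's own DSM-backed state. */
        ReleaseNodeSharedState(node);   /* hypothetical helper */
    }
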
Tom Lane 0a8b9d3b2c Remove no-longer-needed loop in ExecGather().
Coverity complained quite properly that commit ea15e1867 had introduced
unreachable code into ExecGather(); to wit, it was no longer possible to
iterate the final for-loop more or less than once.  So remove the for().

In passing, clean up a couple of comments, and make better use of a local
variable.
2017-01-22 11:47:38 -05:00
Andres Freund ea15e18677 Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.

This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.

Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-19 14:40:41 -08:00
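
For code being adjusted, the change is mostly mechanical: the set-returning (isDone/ExprDoneCond) plumbing disappears from the call sites. Roughly, with the post-change signatures paraphrased from memory rather than quoted from the headers:

    /* Approximate post-change call sites; treat the exact signatures as an
     * assumption, not a quotation. */
    bool        isnull;
    Datum       value;
    TupleTableSlot *slot;

    value = ExecEvalExpr(exprstate, econtext, &isnull);   /* no isDone argument */
    slot = ExecProject(projinfo);                         /* no isDone argument */
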
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas 53c7cff720 Ensure gatherstate->nextreader is properly initialized.
The previous code worked OK as long as a Gather node was never
rescanned, or if it was rescanned, as long as it got at least as
many workers on rescan as it had originally.  But if the number
of workers ever decreased on a rescan, then it could crash.

Andreas Seltenreich
2016-12-05 15:54:28 -05:00
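
The invariant behind the fix: whenever the reader array is rebuilt for a rescan, the nextreader cursor has to be reset as well, because it may still point past the end of a now-smaller array. An illustrative sketch, not the committed hunk:

    /* Illustrative sketch only; not the committed hunk. */
    gatherstate->nreaders = 0;
    gatherstate->nextreader = 0;    /* must be reset along with the array */
    for (i = 0; i < nworkers_launched; i++)
        gatherstate->reader[gatherstate->nreaders++] =
            CreateTupleQueueReader(tqueue[i], tupDesc);
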
Robert Haas 6f3bd98ebf Extend framework from commit 53be0b1ad to report latch waits.
WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now take an
additional wait_event_info parameter; legal values are defined in
pgstat.h.  This makes it possible to uniquely identify every point in
the core code where we are waiting for a latch; extensions can pass
WAIT_EXTENSION.

Because latches were the major wait primitive not previously covered,
this patch makes it possible to see information in
pg_stat_activity on a large number of important wait events not
previously addressed, such as ClientRead, ClientWrite, and SyncRep.

Unfortunately, many of the wait events added by this patch will fail
to appear in pg_stat_activity because they're only used in background
processes which don't currently appear in pg_stat_activity.  We should
fix this either by creating a separate view for such information, or
else by deciding to include them in pg_stat_activity after all.

Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and
Thomas Munro.
2016-10-04 11:01:42 -04:00
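
For an extension, a call with the new parameter looks roughly like this (a sketch; the wait-class constant for extensions is assumed here to be spelled PG_WAIT_EXTENSION in pgstat.h):

    #include "postgres.h"
    #include "miscadmin.h"
    #include "pgstat.h"
    #include "storage/latch.h"

    /* Sketch: wait up to 10 seconds for our process latch, tagging the wait
     * so it can be reported as a wait event in pg_stat_activity. */
    static void
    wait_for_my_latch(void)
    {
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT,
                         10000L,
                         PG_WAIT_EXTENSION);    /* assumed constant name */
        ResetLatch(MyLatch);
    }
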
Tom Lane 887feefe87 Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch.
This coding pattern creates a race condition, because if an interesting
interrupt happens after we've checked InterruptPending but before we reset
our latch, the latch-setting done by the signal handler would get lost,
and then we might block at WaitLatch in the next iteration without ever
noticing the interrupt condition.  You can put the CHECK_FOR_INTERRUPTS
before WaitLatch or after ResetLatch, but not between them.

Aside from fixing the bugs, add some explanatory comments to latch.h
to perhaps forestall the next person from making the same mistake.

In HEAD, also replace gather_readnext's direct call of
HandleParallelMessages with CHECK_FOR_INTERRUPTS.  It does not seem clean
or useful for this one caller to bypass ProcessInterrupts and go straight
to HandleParallelMessages; not least because that fails to consider the
InterruptPending flag, resulting in useless work both here
(if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call
(if it is).

This thinko seems to have been introduced in the initial coding of
storage/ipc/shm_mq.c (commit ec9037df2), and then blindly copied into all
the subsequent parallel-query support logic.  Back-patch relevant hunks
to 9.4 to extirpate the error everywhere.

Discussion: <1661.1469996911@sss.pgh.pa.us>
2016-08-01 15:13:53 -04:00
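
The safe shape of such a loop, per the rule above (CHECK_FOR_INTERRUPTS before WaitLatch or after ResetLatch, never in between), looks roughly like this; the exit condition is hypothetical and the WaitLatch arguments are simplified:

    for (;;)
    {
        CHECK_FOR_INTERRUPTS();         /* before WaitLatch: safe */

        if (work_available())           /* hypothetical exit condition */
            break;

        (void) WaitLatch(MyLatch, WL_LATCH_SET, -1L);
        ResetLatch(MyLatch);

        /*
         * A CHECK_FOR_INTERRUPTS() here, after ResetLatch, would also be
         * safe.  Between WaitLatch and ResetLatch it is not: a latch set by
         * a signal handler in that window is wiped out by ResetLatch, and
         * the next WaitLatch can then sleep through the pending interrupt.
         */
    }
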
Tom Lane af33039317 Fix worst memory leaks in tqueue.c.
TupleQueueReaderNext() leaks like a sieve if it has to do any tuple
disassembly/reconstruction.  While we could try to clean up its allocations
piecemeal, it seems like a better idea just to insist that it should be run
in a short-lived memory context, so that any transient space goes away
automatically.  I chose to have nodeGather.c switch into its existing
per-tuple context before the call, rather than inventing a separate
context inside tqueue.c.

This is sufficient to stop all leakage in the simple case I exhibited
earlier today (see link below), but it does not deal with leaks induced
in more complex cases by tqueue.c's insistence on using TopMemoryContext
for data that it's not actually trying hard to keep track of.  That issue
is intertwined with another major source of inefficiency, namely failure
to cache lookup results across calls, so it seems best to deal with it
separately.

In passing, improve some comments, and modify gather_readnext's method for
deciding when it's visited all the readers so that it's more obviously
correct.  (I'm not actually convinced that the previous code *is*
correct in the case of a reader deletion; it certainly seems fragile.)

Discussion: <32763.1469821037@sss.pgh.pa.us>
2016-07-29 19:31:06 -04:00
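
The shape of the nodeGather.c side of the fix, as a sketch rather than the committed hunk: reset the node's existing per-tuple context and switch into it around the read, so any transient tuple disassembly/reconstruction falls into memory that is reclaimed on the next reset.

    /* Sketch only; variable names approximate nodeGather.c's. */
    ExprContext *econtext = gatherstate->ps.ps_ExprContext;
    MemoryContext oldContext;
    HeapTuple   tup;
    bool        readerdone = false;

    ResetExprContext(econtext);         /* free the previous tuple's debris */

    oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
    tup = TupleQueueReaderNext(reader, nowait, &readerdone);
    MemoryContextSwitchTo(oldContext);
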
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas 5702277ca9 Tweak EXPLAIN for parallel query to show workers launched.
The previous display was sort of confusing, because it didn't
distinguish between the number of workers that we planned to launch
and the number that actually got launched.  This has already confused
several people, so display both numbers and label them clearly.

Julien Rouhaud, reviewed by me.
2016-04-15 11:52:18 -04:00
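
In EXPLAIN ANALYZE output the two counts appear as separately labeled lines under the Gather node, along these lines (illustrative output, numbers invented):

    Gather  (cost=1000.00..21703.43 rows=10000 width=4) (actual time=0.53..107.57 rows=10000 loops=1)
      Workers Planned: 4
      Workers Launched: 3
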
Robert Haas df4685fb0c Minor optimizations based on ParallelContext having nworkers_launched.
Originally, we didn't have nworkers_launched, so code that used parallel
contexts had to be prepared for the possibility that not all of the
workers requested actually got launched.  But now we can count on knowing
the number of workers that were successfully launched, which can shave
off a few cycles and simplify some code slightly.

Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter
Geoghegan.
2016-03-04 12:59:10 -05:00
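
The simplification amounts to bounding loops by the known launch count instead of checking each requested worker for whether it actually started. A hedged sketch (helper name hypothetical):

    /* Every index below nworkers_launched corresponds to a worker that
     * really started, so no per-slot "did it launch?" test is needed. */
    for (i = 0; i < pcxt->nworkers_launched; i++)
        process_worker_results(pcxt, i);    /* hypothetical helper */
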
Robert Haas 23c2dd03d5 Fix spelling mistakes.
Same patch submitted independently by David Rowley and Peter Geoghegan.
2016-01-14 23:16:40 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Robert Haas bc7fcab5e3 Read from the same worker repeatedly until it returns no tuple.
The original coding read tuples from workers in round-robin fashion,
but performance testing shows that it works much better to read enough
to empty one queue before moving on to the next.  I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time.  With this approach, by the time the process gets scheduled,
it has a decent chance of being able to fill the entire buffer in
one go.

Patch by me.  Dilip Kumar helped with performance testing.
2015-12-23 14:06:52 -05:00
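
A rough sketch of the drain-then-advance policy this commit introduces (not the committed gather_readnext; waiting on the process latch when every queue is momentarily empty is omitted):

    /* Rough sketch only; not the committed gather_readnext. */
    for (;;)
    {
        TupleQueueReader *reader = gatherstate->reader[gatherstate->nextreader];
        bool        readerdone = false;
        HeapTuple   tup = TupleQueueReaderNext(reader, true, &readerdone);

        if (tup != NULL)
            return tup;             /* stay on this worker until it runs dry */

        if (readerdone)
        {
            /* This worker is finished; drop its queue (compaction elided). */
            if (--gatherstate->nreaders == 0)
                return NULL;        /* every queue is exhausted */
        }

        /* Nothing here right now: move on to the next queue. */
        gatherstate->nextreader =
            (gatherstate->nextreader + 1) % gatherstate->nreaders;
    }
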
Robert Haas 3690dc6b03 Fix obsolete comment.
It's amazing how fast things become obsolete these days.

Amit Langote
2015-11-30 12:54:46 -05:00
Robert Haas 6c878a7553 Avoid server crash when worker registration fails at execution time.
The previous coding attempts to destroy the DSM in this case, but
child nodes might have stored data there and still be holding onto
pointers to it.  So don't do that.

Also, free the reader array instead of leaking it.

Extracted from two different patch versions both by Amit Kapila.
2015-11-20 13:03:39 -05:00
Robert Haas 166b61a88e Avoid aggregating worker instrumentation multiple times.
Amit Kapila, per design ideas from me.
2015-11-18 12:35:25 -05:00
Tom Lane b05ae27e9a Add missing "static" qualifier.
Per buildfarm member pademelon.
2015-11-10 18:24:18 -05:00
Robert Haas bf3d015631 Fix rebasing mistake in nodeGather.c
The patches committed as 6e71dd7ce9
and 3a1f8611f2 were developed in
parallel but dependent on each other in a way that I failed to
notice.

This patch to fix the problem was prepared by Amit Kapila.
2015-11-09 10:49:24 -05:00
Robert Haas 6e71dd7ce9 Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
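
In outline, the receiver side of the mechanism: when a tuple descriptor arrives over the queue, register it locally and remember the sender's typmod alongside the local one, so later RECORD datums from that queue can be rewritten. A purely illustrative sketch; the helper names are invented, and only assign_record_type_typmod is a real backend function.

    /* Purely illustrative; helper names invented. */
    TupleDesc   tupdesc = tupledesc_from_queue_message(msg);    /* hypothetical */

    /* Register the descriptor in this backend; the typmod assigned here is
     * generally different from the one the worker used. */
    assign_record_type_typmod(tupdesc);

    /* Remember sender typmod -> receiver typmod for this queue, so RECORD
     * datums can be rewritten as they arrive. */
    typmod_map_insert(reader, msg_remote_typmod, tupdesc->tdtypmod);   /* hypothetical */
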
Robert Haas 3a1f8611f2 Update parallel executor support to reuse the same DSM.
Commit b0b0d84b3d purported to make it
possible to relaunch workers using the same parallel context, but it had
an unpleasant race condition: we might reinitialize after the workers
have sent their last control message but before they have detached the
DSM, leading to crashes.  Repair by introducing a new ParallelContext
operation, ReinitializeParallelDSM.

Adjust execParallel.c to use this new support, so that we can rescan a
Gather node by relaunching workers but without needing to recreate the
DSM.

Amit Kapila, with some adjustments by me.  Extracted from latest parallel
sequential scan patch.
2015-10-30 10:44:54 +01:00
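
The rescan sequence this enables, in outline (a sketch of the call flow, not the committed code; error handling and the execParallel.c plumbing are omitted):

    /* Sketch of a Gather rescan against an already-created DSM segment. */
    if (gatherstate->pei != NULL)
    {
        /* Reset the per-scan state in the existing segment ... */
        ReinitializeParallelDSM(gatherstate->pei->pcxt);
        /* ... and relaunch workers against that same segment. */
        LaunchParallelWorkers(gatherstate->pei->pcxt);
    }
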
Robert Haas 8538a63070 Make Gather node projection-capable.
The original Gather code failed to mark a Gather node as not able to
do projection, but it couldn't, even though it did initialize its
projection info via ExecAssignProjectionInfo.  There doesn't seem to
be any good reason for this node not to have projection capability,
so clean things up so that it does.  Without this, plans using Gather
nodes might need to carry extra Result nodes to do projection.
2015-10-28 00:27:58 +01:00
Robert Haas 31ba62ce32 Fix typos in comments.
CharSyam
2015-10-22 14:52:23 -04:00
Robert Haas 1a219fa15b Add header comments to execParallel.c and nodeGather.c.
Patch by me, per a note from Simon Riggs.  Reviewed by Amit Kapila
and Amit Langote.
2015-10-22 10:37:24 -04:00
Robert Haas bfc78d7196 Rewrite interaction of parallel mode with parallel executor support.
In the previous coding, before returning from ExecutorRun, we'd shut
down all parallel workers.  This was dead wrong if ExecutorRun was
called with a non-zero tuple count; it had the effect of truncating
the query output.  To fix, give ExecutePlan control over whether to
enter parallel mode, and have it refuse to do so if the tuple count
is non-zero.  Rewrite the Gather logic so that it can cope with being
called outside parallel mode.

Commit 7aea8e4f2d is largely to blame
for this problem, though this patch modifies some subsequently-committed
code which relied on the guarantees it purported to make.
2015-10-16 11:56:02 -04:00
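
The guard described here boils down to a decision made once in ExecutePlan, roughly of this shape (a paraphrase, not the exact committed condition):

    /* Paraphrase of the guard; not the exact committed test. */
    bool    use_parallel_mode = plannedstmt->parallelModeNeeded &&
                                numberTuples == 0;

    if (use_parallel_mode)
        EnterParallelMode();    /* Gather may now launch workers */

    /* ... run the plan, stopping after numberTuples if it is non-zero ... */

    if (use_parallel_mode)
        ExitParallelMode();
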
Tom Lane bf686796a0 Add missing "static" specifier.
Per buildfarm (pademelon, at least, doesn't like this).
2015-10-03 10:59:42 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some design suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
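
At a high level, each ExecGather call tries the workers' tuple queues first and falls back to executing the plan in the leader when that is allowed. A simplified conceptual sketch, not the committed ExecGather (the queue-reading helper is hypothetical):

    /* Simplified conceptual sketch; not the committed ExecGather. */
    if (gatherstate->nreaders > 0)
    {
        slot = next_tuple_from_workers(gatherstate);    /* hypothetical helper */
        if (!TupIsNull(slot))
            return slot;
    }

    /* No worker tuple available: run the plan locally if permitted (in
     * single_copy mode the one worker does all the execution instead). */
    if (gatherstate->need_to_scan_locally)
        return ExecProcNode(outerPlanState(gatherstate));

    return NULL;
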