Commit Graph

27 Commits

Author SHA1 Message Date
Robert Haas acf555bc53 Shut down Gather's children before shutting down Gather itself.
It turns out that the original shutdown order here does not work well.
Multiple people attempting to develop further parallel query patches
have discovered that they need to do cleanup before the DSM goes away,
and you can't do that if the parent node gets cleaned up first.

Patch by me, reviewed by KaiGai Kohei and Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoY6bOc1YnhcAQnMfCBDbsJzROQ3sYxSAL-SYB5tMJcTKg@mail.gmail.com
Discussion: http://postgr.es/m/9A28C8860F777E439AA12E8AEA7694F8012AEB82@BPXM15GP.gisp.nec.co.jp
Discussion: http://postgr.es/m/CA+TgmoYuPOc=+xrG1v0fCsoLbKAab9F1ddOeaaiLMzKOiBar1Q@mail.gmail.com
2017-02-22 08:08:07 +05:30
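
A minimal sketch of the ordering constraint described in the commit above, assuming a generic shutdown callback (the function and field names here are illustrative, not the actual nodeGather.c code): recurse into the children while the DSM segment still exists, and only then release the node's own shared state.

    /* Illustrative sketch only; names are hypothetical, not PostgreSQL's. */
    static void
    ShutdownNodeTree(PlanStateLike *node)
    {
        /* Let the children clean up first, while the DSM is still mapped. */
        if (node->lefttree != NULL)
            ShutdownNodeTree(node->lefttree);
        if (node->righttree != NULL)
            ShutdownNodeTree(node->righttree);

        /* Only now tear down this node's own DSM-backed state. */
        ReleaseNodeSharedState(node);   /* hypothetical helper */
    }
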
Tom Lane 0a8b9d3b2c Remove no-longer-needed loop in ExecGather().
Coverity complained quite properly that commit ea15e1867 had introduced
unreachable code into ExecGather(); to wit, it was no longer possible to
iterate the final for-loop more or less than once.  So remove the for().

In passing, clean up a couple of comments, and make better use of a local
variable.
2017-01-22 11:47:38 -05:00
Andres Freund ea15e18677 Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.

This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.

Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-19 14:40:41 -08:00
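
For code being adjusted, the change is mostly mechanical: the set-returning (isDone/ExprDoneCond) plumbing disappears from the call sites. Roughly, with the post-change signatures paraphrased from memory rather than quoted from the headers:

    /* Approximate post-change call sites; treat the exact signatures as an
     * assumption, not a quotation. */
    bool        isnull;
    Datum       value;
    TupleTableSlot *slot;

    value = ExecEvalExpr(exprstate, econtext, &isnull);   /* no isDone argument */
    slot = ExecProject(projinfo);                         /* no isDone argument */
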
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas 53c7cff720 Ensure gatherstate->nextreader is properly initialized.
The previous code worked OK as long as a Gather node was never
rescanned, or if it was rescanned, as long as it got at least as
many workers on rescan as it had originally.  But if the number
of workers ever decreased on a rescan, then it could crash.

Andreas Seltenreich
2016-12-05 15:54:28 -05:00
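
The invariant behind the fix: whenever the reader array is rebuilt for a rescan, the nextreader cursor has to be reset as well, because it may still point past the end of a now-smaller array. An illustrative sketch, not the committed hunk:

    /* Illustrative sketch only; not the committed hunk. */
    gatherstate->nreaders = 0;
    gatherstate->nextreader = 0;    /* must be reset along with the array */
    for (i = 0; i < nworkers_launched; i++)
        gatherstate->reader[gatherstate->nreaders++] =
            CreateTupleQueueReader(tqueue[i], tupDesc);
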
Robert Haas 6f3bd98ebf Extend framework from commit 53be0b1ad to report latch waits.
WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now take an
additional wait_event_info parameter; legal values are defined in
pgstat.h.  This makes it possible to uniquely identify every point in
the core code where we are waiting for a latch; extensions can pass
WAIT_EXTENSION.

Because latches were the major wait primitive not previously covered,
this patch makes it possible to see information in
pg_stat_activity on a large number of important wait events not
previously addressed, such as ClientRead, ClientWrite, and SyncRep.

Unfortunately, many of the wait events added by this patch will fail
to appear in pg_stat_activity because they're only used in background
processes which don't currently appear in pg_stat_activity.  We should
fix this either by creating a separate view for such information, or
else by deciding to include them in pg_stat_activity after all.

Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and
Thomas Munro.
2016-10-04 11:01:42 -04:00
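
For an extension, a call with the new parameter looks roughly like this (a sketch; the wait-class constant for extensions is assumed here to be spelled PG_WAIT_EXTENSION in pgstat.h):

    #include "postgres.h"
    #include "miscadmin.h"
    #include "pgstat.h"
    #include "storage/latch.h"

    /* Sketch: wait up to 10 seconds for our process latch, tagging the wait
     * so it can be reported as a wait event in pg_stat_activity. */
    static void
    wait_for_my_latch(void)
    {
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT,
                         10000L,
                         PG_WAIT_EXTENSION);    /* assumed constant name */
        ResetLatch(MyLatch);
    }
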
Tom Lane 887feefe87 Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch.
This coding pattern creates a race condition, because if an interesting
interrupt happens after we've checked InterruptPending but before we reset
our latch, the latch-setting done by the signal handler would get lost,
and then we might block at WaitLatch in the next iteration without ever
noticing the interrupt condition.  You can put the CHECK_FOR_INTERRUPTS
before WaitLatch or after ResetLatch, but not between them.

Aside from fixing the bugs, add some explanatory comments to latch.h
to perhaps forestall the next person from making the same mistake.

In HEAD, also replace gather_readnext's direct call of
HandleParallelMessages with CHECK_FOR_INTERRUPTS.  It does not seem clean
or useful for this one caller to bypass ProcessInterrupts and go straight
to HandleParallelMessages; not least because that fails to consider the
InterruptPending flag, resulting in useless work both here
(if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call
(if it is).

This thinko seems to have been introduced in the initial coding of
storage/ipc/shm_mq.c (commit ec9037df2), and then blindly copied into all
the subsequent parallel-query support logic.  Back-patch relevant hunks
to 9.4 to extirpate the error everywhere.

Discussion: <1661.1469996911@sss.pgh.pa.us>
2016-08-01 15:13:53 -04:00
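
The safe shape of such a loop, per the rule above (CHECK_FOR_INTERRUPTS before WaitLatch or after ResetLatch, never in between), looks roughly like this; the exit condition is hypothetical and the WaitLatch arguments are simplified:

    for (;;)
    {
        CHECK_FOR_INTERRUPTS();         /* before WaitLatch: safe */

        if (work_available())           /* hypothetical exit condition */
            break;

        (void) WaitLatch(MyLatch, WL_LATCH_SET, -1L);
        ResetLatch(MyLatch);

        /*
         * A CHECK_FOR_INTERRUPTS() here, after ResetLatch, would also be
         * safe.  Between WaitLatch and ResetLatch it is not: a latch set by
         * a signal handler in that window is wiped out by ResetLatch, and
         * the next WaitLatch can then sleep through the pending interrupt.
         */
    }
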
Tom Lane af33039317 Fix worst memory leaks in tqueue.c.
TupleQueueReaderNext() leaks like a sieve if it has to do any tuple
disassembly/reconstruction.  While we could try to clean up its allocations
piecemeal, it seems like a better idea just to insist that it should be run
in a short-lived memory context, so that any transient space goes away
automatically.  I chose to have nodeGather.c switch into its existing
per-tuple context before the call, rather than inventing a separate
context inside tqueue.c.

This is sufficient to stop all leakage in the simple case I exhibited
earlier today (see link below), but it does not deal with leaks induced
in more complex cases by tqueue.c's insistence on using TopMemoryContext
for data that it's not actually trying hard to keep track of.  That issue
is intertwined with another major source of inefficiency, namely failure
to cache lookup results across calls, so it seems best to deal with it
separately.

In passing, improve some comments, and modify gather_readnext's method for
deciding when it's visited all the readers so that it's more obviously
correct.  (I'm not actually convinced that the previous code *is*
correct in the case of a reader deletion; it certainly seems fragile.)

Discussion: <32763.1469821037@sss.pgh.pa.us>
2016-07-29 19:31:06 -04:00
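
The shape of the nodeGather.c side of the fix, as a sketch rather than the committed hunk: reset the node's existing per-tuple context and switch into it around the read, so any transient tuple disassembly/reconstruction falls into memory that is reclaimed on the next reset.

    /* Sketch only; variable names approximate nodeGather.c's. */
    ExprContext *econtext = gatherstate->ps.ps_ExprContext;
    MemoryContext oldContext;
    HeapTuple   tup;
    bool        readerdone = false;

    ResetExprContext(econtext);         /* free the previous tuple's debris */

    oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
    tup = TupleQueueReaderNext(reader, nowait, &readerdone);
    MemoryContextSwitchTo(oldContext);
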
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas 5702277ca9 Tweak EXPLAIN for parallel query to show workers launched.
The previous display was sort of confusing, because it didn't
distinguish between the number of workers that we planned to launch
and the number that actually got launched.  This has already confused
several people, so display both numbers and label them clearly.

Julien Rouhaud, reviewed by me.
2016-04-15 11:52:18 -04:00
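
In EXPLAIN ANALYZE output the two counts appear as separately labeled lines under the Gather node, along these lines (illustrative output, numbers invented):

    Gather  (cost=1000.00..21703.43 rows=10000 width=4) (actual time=0.53..107.57 rows=10000 loops=1)
      Workers Planned: 4
      Workers Launched: 3
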
Robert Haas df4685fb0c Minor optimizations based on ParallelContext having nworkers_launched.
Originally, we didn't have nworkers_launched, so code that used parallel
contexts had to be prepared for the possibility that not all of the
workers requested actually got launched.  But now we can count on knowing
the number of workers that were successfully launched, which can shave
off a few cycles and simplify some code slightly.

Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter
Geoghegan.
2016-03-04 12:59:10 -05:00
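
The simplification amounts to bounding loops by the known launch count instead of checking each requested worker for whether it actually started. A hedged sketch (helper name hypothetical):

    /* Every index below nworkers_launched corresponds to a worker that
     * really started, so no per-slot "did it launch?" test is needed. */
    for (i = 0; i < pcxt->nworkers_launched; i++)
        process_worker_results(pcxt, i);    /* hypothetical helper */
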
Robert Haas 23c2dd03d5 Fix spelling mistakes.
Same patch submitted independently by David Rowley and Peter Geoghegan.
2016-01-14 23:16:40 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Robert Haas bc7fcab5e3 Read from the same worker repeatedly until it returns no tuple.
The original coding read tuples from workers in round-robin fashion,
but performance testing shows that it works much better to read enough
to empty one queue before moving on to the next.  I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time.  With this approach, by the time the process gets scheduled,
it has a decent chance of being able to fill the entire buffer in
one go.

Patch by me.  Dilip Kumar helped with performance testing.
2015-12-23 14:06:52 -05:00
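
A rough sketch of the drain-then-advance policy this commit introduces (not the committed gather_readnext; waiting on the process latch when every queue is momentarily empty is omitted):

    /* Rough sketch only; not the committed gather_readnext. */
    for (;;)
    {
        TupleQueueReader *reader = gatherstate->reader[gatherstate->nextreader];
        bool        readerdone = false;
        HeapTuple   tup = TupleQueueReaderNext(reader, true, &readerdone);

        if (tup != NULL)
            return tup;             /* stay on this worker until it runs dry */

        if (readerdone)
        {
            /* This worker is finished; drop its queue (compaction elided). */
            if (--gatherstate->nreaders == 0)
                return NULL;        /* every queue is exhausted */
        }

        /* Nothing here right now: move on to the next queue. */
        gatherstate->nextreader =
            (gatherstate->nextreader + 1) % gatherstate->nreaders;
    }
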
Robert Haas 3690dc6b03 Fix obsolete comment.
It's amazing how fast things become obsolete these days.

Amit Langote
2015-11-30 12:54:46 -05:00
Robert Haas 6c878a7553 Avoid server crash when worker registration fails at execution time.
The previous coding attempts to destroy the DSM in this case, but
child nodes might have stored data there and still be holding onto
pointers to it.  So don't do that.

Also, free the reader array instead of leaking it.

Extracted from two different patch versions both by Amit Kapila.
2015-11-20 13:03:39 -05:00
Robert Haas 166b61a88e Avoid aggregating worker instrumentation multiple times.
Amit Kapila, per design ideas from me.
2015-11-18 12:35:25 -05:00
Tom Lane b05ae27e9a Add missing "static" qualifier.
Per buildfarm member pademelon.
2015-11-10 18:24:18 -05:00
Robert Haas bf3d015631 Fix rebasing mistake in nodeGather.c
The patches committed as 6e71dd7ce9
and 3a1f8611f2 were developed in
parallel but dependent on each other in a way that I failed to
notice.

This patch to fix the problem was prepared by Amit Kapila.
2015-11-09 10:49:24 -05:00
Robert Haas 6e71dd7ce9 Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
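
In outline, the receiver side of the mechanism: when a tuple descriptor arrives over the queue, register it locally and remember the sender's typmod alongside the local one, so later RECORD datums from that queue can be rewritten. A purely illustrative sketch; the helper names are invented, and only assign_record_type_typmod is a real backend function.

    /* Purely illustrative; helper names invented. */
    TupleDesc   tupdesc = tupledesc_from_queue_message(msg);    /* hypothetical */

    /* Register the descriptor in this backend; the typmod assigned here is
     * generally different from the one the worker used. */
    assign_record_type_typmod(tupdesc);

    /* Remember sender typmod -> receiver typmod for this queue, so RECORD
     * datums can be rewritten as they arrive. */
    typmod_map_insert(reader, msg_remote_typmod, tupdesc->tdtypmod);   /* hypothetical */
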
Robert Haas 3a1f8611f2 Update parallel executor support to reuse the same DSM.
Commit b0b0d84b3d purported to make it
possible to relaunch workers using the same parallel context, but it had
an unpleasant race condition: we might reinitialize after the workers
have sent their last control message but before they have detached the
DSM, leading to crashes.  Repair by introducing a new ParallelContext
operation, ReinitializeParallelDSM.

Adjust execParallel.c to use this new support, so that we can rescan a
Gather node by relaunching workers but without needing to recreate the
DSM.

Amit Kapila, with some adjustments by me.  Extracted from latest parallel
sequential scan patch.
2015-10-30 10:44:54 +01:00
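
The rescan sequence this enables, in outline (a sketch of the call flow, not the committed code; error handling and the execParallel.c plumbing are omitted):

    /* Sketch of a Gather rescan against an already-created DSM segment. */
    if (gatherstate->pei != NULL)
    {
        /* Reset the per-scan state in the existing segment ... */
        ReinitializeParallelDSM(gatherstate->pei->pcxt);
        /* ... and relaunch workers against that same segment. */
        LaunchParallelWorkers(gatherstate->pei->pcxt);
    }
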
Robert Haas 8538a63070 Make Gather node projection-capable.
The original Gather code failed to mark a Gather node as not able to
do projection, but it couldn't, even though it did initialize its
projection info via ExecAssignProjectionInfo.  There doesn't seem to
be any good reason for this node not to have projection capability,
so clean things up so that it does.  Without this, plans using Gather
nodes might need to carry extra Result nodes to do projection.
2015-10-28 00:27:58 +01:00
Robert Haas 31ba62ce32 Fix typos in comments.
CharSyam
2015-10-22 14:52:23 -04:00
Robert Haas 1a219fa15b Add header comments to execParallel.c and nodeGather.c.
Patch by me, per a note from Simon Riggs.  Reviewed by Amit Kapila
and Amit Langote.
2015-10-22 10:37:24 -04:00
Robert Haas bfc78d7196 Rewrite interaction of parallel mode with parallel executor support.
In the previous coding, before returning from ExecutorRun, we'd shut
down all parallel workers.  This was dead wrong if ExecutorRun was
called with a non-zero tuple count; it had the effect of truncating
the query output.  To fix, give ExecutePlan control over whether to
enter parallel mode, and have it refuse to do so if the tuple count
is non-zero.  Rewrite the Gather logic so that it can cope with being
called outside parallel mode.

Commit 7aea8e4f2d is largely to blame
for this problem, though this patch modifies some subsequently-committed
code which relied on the guarantees it purported to make.
2015-10-16 11:56:02 -04:00
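
The guard described here boils down to a decision made once in ExecutePlan, roughly of this shape (a paraphrase, not the exact committed condition):

    /* Paraphrase of the guard; not the exact committed test. */
    bool    use_parallel_mode = plannedstmt->parallelModeNeeded &&
                                numberTuples == 0;

    if (use_parallel_mode)
        EnterParallelMode();    /* Gather may now launch workers */

    /* ... run the plan, stopping after numberTuples if it is non-zero ... */

    if (use_parallel_mode)
        ExitParallelMode();
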
Tom Lane bf686796a0 Add missing "static" specifier.
Per buildfarm (pademelon, at least, doesn't like this).
2015-10-03 10:59:42 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some design suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
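
At a high level, each ExecGather call tries the workers' tuple queues first and falls back to executing the plan in the leader when that is allowed. A simplified conceptual sketch, not the committed ExecGather (the queue-reading helper is hypothetical):

    /* Simplified conceptual sketch; not the committed ExecGather. */
    if (gatherstate->nreaders > 0)
    {
        slot = next_tuple_from_workers(gatherstate);    /* hypothetical helper */
        if (!TupIsNull(slot))
            return slot;
    }

    /* No worker tuple available: run the plan locally if permitted (in
     * single_copy mode the one worker does all the execution instead). */
    if (gatherstate->need_to_scan_locally)
        return ExecProcNode(outerPlanState(gatherstate));

    return NULL;
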