postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	cbdb8b4c01	Fix float-to-integer coercions to handle edge cases correctly. ftoi4 and its sibling coercion functions did their overflow checks in a way that looked superficially plausible, but actually depended on an assumption that the MIN and MAX comparison constants can be represented exactly in the float4 or float8 domain. That fails in ftoi4, ftoi8, and dtoi8, resulting in a possibility that values near the MAX limit will be wrongly converted (to negative values) when they need to be rejected. Also, because we compared before rounding off the fractional part, the other three functions threw errors for values that really ought to get rounded to the min or max integer value. Fix by doing rint() first (requiring an assumption that it handles NaN and Inf correctly; but dtoi8 and ftoi8 were assuming that already), and by comparing to values that should coerce to float exactly, namely INTxx_MIN and -INTxx_MIN. Also remove some random cosmetic discrepancies between these six functions. Per bug #15519 from Victor Petrovykh. This should get back-patched, but first let's see what the buildfarm thinks of it --- I'm not too sure about portability of some of the regression test cases. Patch by me; thanks to Andrew Gierth for analysis and discussion. Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org	2018-11-23 20:57:11 -05:00
Tom Lane	a314c34079	Clamp semijoin selectivity to be not more than inner-join selectivity. We should never estimate the output of a semijoin to be more rows than we estimate for an inner join with the same input rels and join condition; it's obviously impossible for that to happen. However, given the relatively poor quality of our semijoin selectivity estimates --- particularly, but not only, in cases where we punt and return a default estimate --- we did often deliver such estimates. To improve matters, calculate both estimates inside eqjoinsel() and take the smaller one. The bulk of this patch is just mechanical refactoring to avoid repetitive information lookup when we call both eqjoinsel_semi and eqjoinsel_inner. The actual new behavior is just selec = Min(selec, inner_rel->rows * selec_inner); which looks a bit odd but is correct because of our different definitions for inner and semi join selectivity. There is one ensuing plan change in the regression tests, but it looks reasonable enough (and checking the actual row counts shows that the estimate moved closer to reality, not further away). Per bug #15160 from Alexey Ermakov. Although this is arguably a bug fix, I won't risk destabilizing plan choices in stable branches by back-patching. Tom Lane, reviewed by Melanie Plageman Discussion: https://postgr.es/m/152395805004.19366.3107109716821067806@wrigleys.postgresql.org	2018-11-23 12:48:49 -05:00
Alvaro Herrera	3be5fe2b10	Silence compiler warnings Commit `cfdf4dc4fc` left a few unnecessary assignments, one of which caused compiler warnings, as reported by Erik Rijkers. Remove them all. Discussion: https://postgr.es/m/df0dcca2025b3d90d946ecc508ca9678@xs4all.nl	2018-11-23 13:01:05 -03:00
Alvaro Herrera	de38ce1b83	Don't allow partitioned indexes in pg_global tablespace Missing in `dfa6081419`. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f-M3NMTCpv=vDfkoqHbMPFf=3-Z1ud=+1DHH00tC+zLaQ@mail.gmail.com	2018-11-23 08:48:20 -03:00
Thomas Munro	cfdf4dc4fc	Add WL_EXIT_ON_PM_DEATH pseudo-event. Users of the WaitEventSet and WaitLatch() APIs can now choose between asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death. This reduces code duplication, since almost all callers want the latter. Repair all code that was previously ignoring postmaster death completely, or requesting the event but ignoring it, or requesting the event but then doing an unconditional PostmasterIsAlive() call every time through its event loop (which is an expensive syscall on platforms for which we don't have USE_POSTMASTER_DEATH_SIGNAL support). Assert that callers of WaitLatchXXX() under the postmaster remember to ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent future bugs. The only process that doesn't handle postmaster death is syslogger. It waits until all backends holding the write end of the syslog pipe (including the postmaster) have closed it by exiting, to be sure to capture any parting messages. By using the WaitEventSet API directly it avoids the new assertion, and as a by-product it may be slightly more efficient on platforms that have epoll(). Author: Thomas Munro Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com, https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com	2018-11-23 20:46:34 +13:00
Tom Lane	eba2ce1712	Fix another crash in json{b}_populate_recordset and json{b}_to_recordset. populate_recordset_worker() failed to consider the possibility that the supplied JSON data contains no rows, so that update_cached_tupdesc never got called. This led to a null-pointer dereference since commit 9a5e8ed28; before that it led to a bogus "set-valued function called in context that cannot accept a set" error. Fix by forcing the update to happen. Per bug #15514. Back-patch to v11 as `9a5e8ed28` was. (If we were excited about the bogus error, we could perhaps go back further, but it'd take more work to figure out how to fix it in older branches. Given the lack of field complaints about that aspect, I'm not excited.) Discussion: https://postgr.es/m/15514-59d5b4c4065b178b@postgresql.org	2018-11-22 15:14:01 -05:00
Michael Paquier	25c026c284	Fix typo in description of ExecFindPartition Author: Amit Langote Discussion: https://postgr.es/m/CA+HiwqHg0=UL+Dhh3gpiwYNA=ufk9Lb7GQ2c=5rs=ZmVTP7xAw@mail.gmail.com	2018-11-22 13:23:54 +09:00
Alvaro Herrera	03e10b962f	Fix typo in commit `6f7d02aa60` Per pink buildfarm.	2018-11-21 15:35:40 -03:00
Alvaro Herrera	ee07e38c14	Fix PartitionDispatchData vertical whitespace Per David Rowley Discussion: https://postgr.es/m/CAKJS1f-MstvBWdkOzACsOHyBgj2oXcBM8kfv+NhVe-Ux-wq9Sg@mail.gmail.com	2018-11-21 15:04:25 -03:00
Alvaro Herrera	6f7d02aa60	instr_time.h: add INSTR_TIME_SET_CURRENT_LAZY Sets the timestamp to current if not already set. Will acquire more callers momentarily. Author: Fabien Coelho Discussion: https://postgr.es/m/alpine.DEB.2.21.1808111104320.1705@lancre	2018-11-21 15:04:25 -03:00
Andres Freund	578b229718	Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de	2018-11-20 16:00:17 -08:00
Peter Eisentraut	ea8bc349bd	Make detection of SSL_CTX_set_min_proto_version more portable As already explained in configure.in, using the OpenSSL version number to detect presence of functions doesn't work, because LibreSSL reports incompatible version numbers. Fortunately, the functions we need here are actually macros, so we can just test for them directly.	2018-11-20 22:59:36 +01:00
Peter Eisentraut	e73e67c719	Add settings to control SSL/TLS protocol version For example: ssl_min_protocol_version = 'TLSv1.1' ssl_max_protocol_version = 'TLSv1.2' Reviewed-by: Steve Singer <steve@ssinger.info> Discussion: https://www.postgresql.org/message-id/flat/1822da87-b862-041a-9fc2-d0310c3da173@2ndquadrant.com	2018-11-20 22:12:10 +01:00
Peter Eisentraut	2d9140ed26	Make WAL description output more consistent The output for record types XLOG_DBASE_CREATE and XLOG_DBASE_DROP used the order dbid/tablespaceid, whereas elsewhere the order is tablespaceid/dbid[/relfilenodeid]. Flip the order for those two types to make it consistent. Author: Jean-Christophe Arnu <jcarnu@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHZmTm18Ln62KW-G8NYvO1wbBL3QU1E76Zep=DuHmg-zS2XFAg@mail.gmail.com/	2018-11-20 13:30:01 +01:00
Peter Eisentraut	a568cadaff	Refine some guc.c help texts These settings apply to communication with the sending server, which is not necessarily a primary. Author: Sergei Kornilov <sk@zsrv.org>	2018-11-20 06:38:34 +01:00
Tom Lane	cb09903fe6	Add needed #include. Per POSIX, WIFSIGNALED and related macros are provided by <sys/wait.h>. Apparently on Linux they're also pulled in by some other inclusions, but BSD-ish systems are pickier. Fixes portability issue in `ffa4cbd62`. Per buildfarm.	2018-11-19 17:28:04 -05:00
Tom Lane	ffa4cbd623	Handle EPIPE more sanely when we close a pipe reading from a program. Previously, any program launched by COPY TO/FROM PROGRAM inherited the server's setting of SIGPIPE handling, i.e. SIG_IGN. Hence, if we were doing COPY FROM PROGRAM and closed the pipe early, the child process would see EPIPE on its output file and typically would treat that as a fatal error, in turn causing the COPY to report error. Similarly, one could get a failure report from a query that didn't read all of the output from a contrib/file_fdw foreign table that uses file_fdw's PROGRAM option. To fix, ensure that child programs inherit SIG_DFL not SIG_IGN processing of SIGPIPE. This seems like an all-around better situation since if the called program wants some non-default treatment of SIGPIPE, it would expect to have to set that up for itself. Then in COPY, if it's COPY FROM PROGRAM and we stop reading short of detecting EOF, treat a SIGPIPE exit from the called program as a non-error condition. This still allows us to report an error for any case where the called program gets SIGPIPE on some other file descriptor. As coded, we won't report a SIGPIPE if we stop reading as a result of seeing an in-band EOF marker (e.g. COPY BINARY EOF marker). It's somewhat debatable whether we should complain if the called program continues to transmit data after an EOF marker. However, it seems like we should avoid throwing error in any questionable cases, especially in a back-patched fix, and anyway it would take additional code to make such an error get reported consistently. Back-patch to v10. We could go further back, since COPY FROM PROGRAM has been around awhile, but AFAICS the only way to reach this situation using core or contrib is via file_fdw, which has only supported PROGRAM sources since v10. The COPY statement per se has no feature whereby it'd stop reading without having hit EOF or an error already. Therefore, I don't see any upside to back-patching further that'd outweigh the risk of complaints about behavioral change. Per bug #15449 from Eric Cyr. Patch by me, review by Etsuro Fujita and Kyotaro Horiguchi Discussion: https://postgr.es/m/15449-1cf737dd5929450e@postgresql.org	2018-11-19 17:02:39 -05:00
Robert Haas	7ee5f88e65	Reduce unnecessary list construction in RelationBuildPartitionDesc. The 'partoids' list which was constructed by the previous version of this code was necessarily identical to 'inhoids'. There's no point to duplicating the list, so avoid that. Instead, construct the array representation directly from the original 'inhoids' list. Also, use an array rather than a list for 'boundspecs'. We know exactly how many items we need to store, so there's really no reason to use a list. Using an array instead reduces the number of memory allocations we perform. Patch by me, reviewed by Michael Paquier and Amit Langote, the latter of whom also helped with rebasing.	2018-11-19 12:10:41 -05:00
Alvaro Herrera	5c9a5513a3	Disallow COPY FREEZE on partitioned tables This didn't actually work: COPY would fail to flush the right files, and instead would try to flush a non-existing file, causing the whole transaction to fail. Cope by raising an error as soon as the command is sent instead, to avoid a nasty later surprise. Of course, it would be much better to make it work, but we don't have a patch for that yet, and we don't know if we'll want to backpatch one when we do. Reported-by: Tomas Vondra Author: David Rowley Reviewed-by: Amit Langote, Steve Singer, Tomas Vondra	2018-11-19 11:16:28 -03:00
Thomas Munro	9ccdd7f66e	PANIC on fsync() failure. On some operating systems, it doesn't make sense to retry fsync(), because dirty data cached by the kernel may have been dropped on write-back failure. In that case the only remaining copy of the data is in the WAL. A subsequent fsync() could appear to succeed, but not have flushed the data. That means that a future checkpoint could apparently complete successfully but have lost data. Therefore, violently prevent any future checkpoint attempts by panicking on the first fsync() failure. Note that we already did the same for WAL data; this change extends that behavior to non-temporary data files. Provide a GUC data_sync_retry to control this new behavior, for users of operating systems that don't eject dirty data, and possibly forensic/testing uses. If it is set to on and the write-back error was transient, a later checkpoint might genuinely succeed (on a system that does not throw away buffers on failure); if the error is permanent, later checkpoints will continue to fail. The GUC defaults to off, meaning that we panic. Back-patch to all supported releases. There is still a narrow window for error-loss on some operating systems: if the file is closed and later reopened and a write-back error occurs in the intervening time, but the inode has the bad luck to be evicted due to memory pressure before we reopen, we could miss the error. A later patch will address that with a scheme for keeping files with dirty data open at all times, but we judge that to be too complicated to back-patch. Author: Craig Ringer, with some adjustments by Thomas Munro Reported-by: Craig Ringer Reviewed-by: Robert Haas, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de	2018-11-19 17:41:26 +13:00
Thomas Munro	1556cb2fc5	Don't forget about failed fsync() requests. If fsync() fails, md.c must keep the request in its bitmap, so that future attempts will try again. Back-patch to all supported releases. Author: Thomas Munro Reviewed-by: Amit Kapila Reported-by: Andrew Gierth Discussion: https://postgr.es/m/87y3i1ia4w.fsf%40news-spur.riddles.org.uk	2018-11-19 17:41:26 +13:00
Michael Paquier	285bd0ac4a	Remove unnecessary memcpy when reading WAL record fitting on page When reading a WAL record, its contents are copied into an intermediate buffer. However, doing so is not necessary if the record fits fully into the current page, saving one memcpy for each such record. The allocation handling of the intermediate buffer is also now done only when a record crosses a page boundary, shaving some extra cycles when reading a WAL record. Author: Andrey Lepikhov Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas Discussion: https://postgr.es/m/c2ea54dd-a1d3-80eb-ddbf-7e6f258e615e@postgrespro.ru	2018-11-19 10:25:48 +09:00
Tom Lane	37afc079ab	Avoid defining SIGTTIN/SIGTTOU on Windows. Setting them to SIG_IGN seems unlikely to have any beneficial effect on that platform, and given the signal numbering collision with SIGABRT, it could easily have bad effects. Given the lack of field complaints that can be traced to this, I don't presently feel a need to back-patch. Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us	2018-11-17 16:31:16 -05:00
Tom Lane	125f551c8b	Leave SIGTTIN/SIGTTOU signal handling alone in postmaster child processes. For reasons lost in the mists of time, most postmaster child processes reset SIGTTIN/SIGTTOU signal handling to SIG_DFL, with the major exception that backend sessions do not. It seems like a pretty bad idea for any postmaster children to do that: if stderr is connected to the terminal, and the user has put the postmaster in background, any log output would result in the child process freezing up. Hence, switch them all to doing what backends do, ie, nothing. This allows them to inherit the postmaster's SIG_IGN setting. On the other hand, manually-launched processes such as standalone backends will have default processing, which seems fine. In passing, also remove useless resets of SIGCONT and SIGWINCH signal processing. Perhaps the postmaster once changed those to something besides SIG_DFL, but it doesn't now, so these are just wasted (and confusing) syscalls. Basically, this propagates the changes made in commit `8e2998d8a` from backends to other postmaster children. Probably the only reason these calls now exist elsewhere is that I missed changing pgstat.c along with postgres.c at the time. Given the lack of field complaints that can be traced to this, I don't presently feel a need to back-patch. Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us	2018-11-17 16:31:16 -05:00
Andres Freund	73616126b4	Fix some spurious new compiler warnings in MSVC. Per buildfarm animal bowerbird. Discussion: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2018-11-17%2002%3A30%3A20	2018-11-17 11:41:14 -08:00
Andres Freund	4da597edf1	Make TupleTableSlots extensible, finish split of existing slot type. This commit completes the work prepared in `1a0586de36`, splitting the old TupleTableSlot implementation (which could store buffer, heap, minimal and virtual slots) into four different slot types. As described in the aforementioned commit, this is done with the goal of making tuple table slots extensible, to allow for pluggable table access methods. To achieve runtime extensibility for TupleTableSlots, operations on slots that can differ between types of slots are performed using the TupleTableSlotOps struct provided at slot creation time. That includes information from the size of TupleTableSlot struct to be allocated, initialization, deforming etc. See the struct's definition for more detailed information about callbacks TupleTableSlotOps. I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more consistent with other naming introduced in recent patches. There's plenty optimization potential in the slot implementation, but according to benchmarking the state after this commit has similar performance characteristics to before this set of changes, which seems sufficient. There's a few changes in execReplication.c that currently need to poke through the slot abstraction, that'll be repaired once the pluggable storage patchset provides the necessary infrastructure. Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-16 16:35:15 -08:00
Alvaro Herrera	0201d79a55	Avoid re-typedef'ing PartitionTupleRouting Apparently, gcc on macOS (?) doesn't like it. Per buildfarm.	2018-11-16 16:55:53 -03:00
Andres Freund	a7aa608e0f	Inline hot path of slot_getsomeattrs(). This yields a minor speedup, which roughly balances the loss from the upcoming introduction of callbacks to do some operations on slots. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-16 10:29:01 -08:00
Alvaro Herrera	3f2393edef	Redesign initialization of partition routing structures This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables. This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through it. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly. Redesign The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo. The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts. In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table. Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera Testing help from Jesper Pedersen and Kato Sho. Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com	2018-11-16 15:01:05 -03:00
Andres Freund	a387a3dff9	Fix slot type assumptions for nodeGather[Merge]. The assumption made in `1a0586de36` was wrong, as evidenced by buildfarm failure on locust, which runs with force_parallel_mode=regress. The tuples accessed in either nodes are in the outer slot, and we can't trivially rely on the slot type being known because the leader might execute the subsidiary node directly, or via the tuple queue on a worker. In the latter case the tuple will always be a heaptuple slot, but in the former, it'll be whatever the subsidiary node returns.	2018-11-15 23:16:41 -08:00
Andres Freund	7ef04e4d2c	Don't generate tuple deforming functions for virtual slots. Virtual tuple table slots never need tuple deforming. Therefore, if we know at expression compilation time, that a certain slot will always be virtual, there's no need to create a tuple deforming routine for it. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 22:00:30 -08:00
Andres Freund	15d8f83128	Verify that expected slot types match returned slot types. This is important so JIT compilation knows what kind of tuple slot the deforming routine can expect. There's also optimization potential for expression initialization without JIT compilation. It e.g. seems plausible to elide EEOP_*_FETCHSOME ops entirely when dealing with virtual slots. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 22:00:30 -08:00
Andres Freund	675af5c01e	Compute information about EEOP_*_FETCHSOME at expression init time. Previously this information was computed when JIT compiling an expression. But the information is useful for assertions in the non-JIT case too (for assertions), therefore it makes sense to move it. This will, in a followup commit, allow to treat different slot types differently. E.g. for virtual slots there's no need to generate a JIT function to deform the slot. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 22:00:30 -08:00
Andres Freund	1a0586de36	Introduce notion of different types of slots (without implementing them). Upcoming work intends to allow pluggable ways to introduce new ways of storing table data. Accessing those table access methods from the executor requires TupleTableSlots to be carry tuples in the native format of such storage methods; otherwise there'll be a significant conversion overhead. Different access methods will require different data to store tuples efficiently (just like virtual, minimal, heap already require fields in TupleTableSlot). To allow that without requiring additional pointer indirections, we want to have different structs (embedding TupleTableSlot) for different types of slots. Thus different types of slots are needed, which requires adapting creators of slots. The slot that most efficiently can represent a type of tuple in an executor node will often depend on the type of slot a child node uses. Therefore we need to track the type of slot is returned by nodes, so parent slots can create slots based on that. Relatedly, JIT compilation of tuple deforming needs to know which type of slot a certain expression refers to, so it can create an appropriate deforming function for the type of tuple in the slot. But not all nodes will only return one type of slot, e.g. an append node will potentially return different types of slots for each of its subplans. Therefore add function that allows to query the type of a node's result slot, and whether it'll always be the same type (whether it's fixed). This can be queried using ExecGetResultSlotOps(). The scan, result, inner, outer type of slots are automatically inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), left/right subtrees respectively. If that's not correct for a node, that can be overwritten using new fields in PlanState. This commit does not introduce the actually abstracted implementation of different kind of TupleTableSlots, that will be left for a followup commit. The different types of slots introduced will, for now, still use the same backing implementation. While this already partially invalidates the big comment in tuptable.h, it seems to make more sense to update it later, when the different TupleTableSlot implementations actually exist. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 22:00:30 -08:00
Andres Freund	763f2edd92	Rejigger materializing and fetching a HeapTuple from a slot. Previously materializing a slot always returned a HeapTuple. As current work aims to reduce the reliance on HeapTuples (so other storage systems can work efficiently), that needs to change. Thus split the tasks of materializing a slot (i.e. making it independent from the underlying storage / other memory contexts) from fetching a HeapTuple from the slot. For brevity, allow to fetch a HeapTuple from a slot and materializing the slot at the same time, controlled by a parameter. For now some callers of ExecFetchSlotHeapTuple, with materialize = true, expect that changes to the heap tuple will be reflected in the underlying slot. Those places will be adapted in due course, so while not pretty, that's OK for now. Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTupleDatum and ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely that future storage methods will need similar methods. There already is ExecFetchSlotMinimalTuple, so the new names make the naming scheme more coherent. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 14:31:12 -08:00
Peter Eisentraut	86a4819f69	Update executor documentation for run-time partition pruning With run-time partition pruning, there is no longer necessarily an executor node for each corresponding plan node. Author: David Rowley <david.rowley@2ndquadrant.com>	2018-11-15 23:02:21 +01:00
Andres Freund	c058fc2a2b	Rationalize expression context reset in ExecModifyTable(). The current pattern of reseting expressions both in ExecProcessReturning() and ExecOnConflictUpdate() makes it harder than necessary to reason about memory lifetimes. It also requires materializing slots unnecessarily, although this patch doesn't take advantage of the fact that that's not necessary anymore. Instead reset the expression context once for each input tuple. Author: Ashutosh Bapat Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 11:40:50 -08:00
Tom Lane	34c9e455d0	Improve performance of partition pruning remapping a little. ExecFindInitialMatchingSubPlans has to update the PartitionPruneState's subplan mapping data to account for the removal of subplans it prunes. However, that's only necessary if run-time pruning will also occur, so we can skip it when that won't happen, which should result in not needing to do the remapping in many cases. (We now need it only when some partitions are potentially startup-time prunable and others are potentially run-time prunable, which seems like an unusual case.) Also make some marginal performance improvements in the remapping itself. These will mainly win if most partitions got pruned by the startup-time pruning, which is perhaps a debatable assumption in this context. Also fix some bogus comments, and rearrange code to marginally reduce space consumption in the executor's query-lifespan context. David Rowley, reviewed by Yoshikazu Imai Discussion: https://postgr.es/m/CAKJS1f9+m6-di-zyy4B4AGn0y1B9F8UKDRigtBbNviXOkuyOpw@mail.gmail.com	2018-11-15 13:34:16 -05:00
Alvaro Herrera	74514bd4a5	geo_ops.c: Clarify comments and function arguments These functions were not crystal clear about what their respective APIs are. Make an effort to improve that. Emre's patch was correct AFAICT, but I (Álvaro) felt the need to improve a few comments a bit more. Any resulting errors are my own. Per complaint from Coverity, Ning Yu, and Tom Lane. Author: Emre Hasegeli, Álvaro Herrera Reviewed-by: Tomas Vondra, Álvaro Herrera Discussion: https://postgr.es/m/26769.1533090136@sss.pgh.pa.us	2018-11-15 12:08:03 -03:00
Thomas Munro	ab8984f52d	Further adjustment to random() seed initialization. Per complaint from Tom Lane, don't chomp the timestamp at 32 bits, so we can shift in some of its higher bits. Discussion: https://postgr.es/m/14712.1542253115%40sss.pgh.pa.us	2018-11-15 17:38:55 +13:00
Thomas Munro	5b0ce3ec33	Increase the number of possible random seeds per time period. Commit `197e4af9` refactored the initialization of the libc random() seed, but reduced the number of possible seeds values that could be chosen in a given time period. This negation of the effects of commit `98c50656c` was unintentional. Replace with code that shifts the fast moving timestamp bits left, similar to the original algorithm (though not the previous float-tolerating coding, which is no longer necessary). Author: Thomas Munro Reported-by: Noah Misch Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20181112083358.GA1073796%40rfd.leadboat.com	2018-11-15 16:25:30 +13:00
Thomas Munro	aa55183042	Use 64 bit type for BufFileSize(). BufFileSize() can't use off_t, because it's only 32 bits wide on some systems. BufFile objects can have many 1GB segments so the total size can exceed 2^31. The only known client of the function is parallel CREATE INDEX, which was reported to fail when building large indexes on Windows. Though this is technically an ABI break on platforms with a 32 bit off_t and we might normally avoid back-patching it, the function is brand new and thus unlikely to have been discovered by extension authors yet, and it's fairly thoroughly broken on those platforms anyway, so just fix it. Defect in `9da0cc35`. Bug #15460. Back-patch to 11, where this function landed. Author: Thomas Munro Reported-by: Paul van der Linden, Pavel Oskin Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/15460-b6db80de822fa0ad%40postgresql.org Discussion: https://postgr.es/m/CAHDGBJP_GsESbTt4P3FZA8kMUKuYxjg57XHF7NRBoKnR%3DCAR-g%40mail.gmail.com	2018-11-15 13:13:57 +13:00
Tom Lane	600b04d6b5	Add a timezone-specific variant of date_trunc(). date_trunc(field, timestamptz, zone_name) performs truncation using the named time zone as reference, rather than working in the session time zone as is the default behavior. It's equivalent to date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name but it's faster, easier to type, and arguably easier to understand. Vik Fearing and Tom Lane Discussion: https://postgr.es/m/6249ffc4-2b22-4c1b-4e7d-7af84fedd7c6@2ndquadrant.com	2018-11-14 15:41:07 -05:00
Peter Eisentraut	1b5d797cd4	Lower lock level for renaming indexes Change lock level for renaming index (either ALTER INDEX or implicitly via some other commands) from AccessExclusiveLock to ShareUpdateExclusiveLock. One reason we need a strong lock for relation renaming is that the name change causes a rebuild of the relcache entry. Concurrent sessions that have the relation open might not be able to handle the relcache entry changing underneath them. Therefore, we need to lock the relation in a way that no one can have the relation open concurrently. But for indexes, the relcache handles reloads specially in RelationReloadIndexInfo() in a way that keeps changes in the relcache entry to a minimum. As long as no one keeps pointers to rd_amcache and rd_options around across possible relcache flushes, which is the case, this ought to be safe. We also want to use a self-exclusive lock for correctness, so that concurrent DDL doesn't overwrite the rename if they start updating while still seeing the old version. Therefore, we use ShareUpdateExclusiveLock, which is already used by other DDL commands that want to operate in a concurrent manner. The reason this is interesting at all is that renaming an index is a typical part of a concurrent reindexing workflow (CREATE INDEX CONCURRENTLY new + DROP INDEX CONCURRENTLY old + rename back). And indeed a future built-in REINDEX CONCURRENTLY might rely on the ability to do concurrent renames as well. Reviewed-by: Andrey Klychkov <aaklychkov@mail.ru> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/1531767486.432607658@f357.i.mail.ru	2018-11-14 17:09:54 +01:00
Michael Paquier	b4721f3950	Initialize TransactionState and user ID consistently at transaction start If a failure happens when a transaction is starting between the moment the transaction status is changed from TRANS_DEFAULT to TRANS_START and the moment the current user ID and security context flags are fetched via GetUserIdAndSecContext(), or before initializing its basic fields, then those may get reset to incorrect values when the transaction aborts, leaving the session in an inconsistent state. One problem reported is that failing a starting transaction at the first query of a session could cause several kinds of system crashes on the follow-up queries. In order to solve that, move the initialization of the transaction state fields and the call of GetUserIdAndSecContext() in charge of fetching the current user ID close to the point where the transaction status is switched to TRANS_START, where there cannot be any error triggered in-between, per an idea of Tom Lane. This properly ensures that the current user ID, the security context flags and that the basic fields of TransactionState remain consistent even if the transaction fails while starting. Reported-by: Richard Guo Diagnosed-By: Richard Guo Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAN_9JTxECSb=pEPcb0a8d+6J+bDcOZ4=DgRo_B7Y5gRHJUM=Rw@mail.gmail.com Backpatch-through: 9.4	2018-11-14 16:46:53 +09:00
Michael Paquier	3be97b97ed	Add flag values in WAL description to all heap records Hexadecimal is consistently used as format to not bloat too much the output but keep it readable. This information is useful mainly for debugging purposes with for example pg_waldump. Author: Michael Paquier Reviewed-by: Nathan Bossart, Dmitry Dolgov, Andres Freund, Álvaro Herrera Discussion: https://postgr.es/m/20180413034734.GE1552@paquier.xyz	2018-11-14 10:33:10 +09:00
Michael Paquier	b52b7dc250	Refactor code creating PartitionBoundInfo The code building PartitionBoundInfo based on the constituent partition data read from catalogs has been located in partcache.c, with a specific set of routines dedicated to bound types, like sorting or bound data creation. All this logic is moved to partbounds.c and relocates all the bound-specific logistic into it, with partition_bounds_create() as principal entry point. Author: Amit Langote Reviewed-by: Michael Paquier, Álvaro Herrera Discussion: https://postgr.es/m/3f289da8-6d10-75fe-814a-635e8b191d43@lab.ntt.co.jp	2018-11-14 10:01:49 +09:00
Tom Lane	965a3d6be0	Fix realfailN lexer rules to not make assumptions about input format. The realfail1 and realfail2 backup-prevention rules always returned token type FCONST, ignoring the possibility that what we've scanned is more appropriately described as ICONST. I think that at the time that code was added, it might actually have been safe to not distinguish; but since we started allowing AS-less aliases in SELECT target lists, it's definitely legal to have a number immediately followed by an identifier. In the SELECT case, it seems there's no visible consequence because make_const() will change the type back to integer anyway. But I'm worried that there are other contexts, or will be in future, where it's more important to get the constant's type right. Hence, use process_integer_literal to correctly determine which token type to return. Arguably this is a bug fix, but given the lack of evidence of user-visible problems, I'll refrain from back-patching. Discussion: https://postgr.es/m/21364.1542136808@sss.pgh.pa.us	2018-11-13 14:54:41 -05:00
Tom Lane	ec937d0805	Align ECPG lexer more closely with the core and psql lexers. Make a bunch of basically-cosmetic changes to reduce the diffs between the flex rules in scan.l, psqlscan.l, and pgc.l. Reorder some code, adjust a lot of whitespace, sync some comments, make use of flex start condition scopes to do that. There are a few non-cosmetic changes in the ECPG lexer: * Bring over the decimalfail rule (and support function process_integer_literal) so that ECPG will lex "1..10" into the same tokens as the backend would. I'm not sure this makes any visible difference to users, but I'm not sure it doesn't, either. * <xdc><<EOF>> gets its own rule so as to produce a more on-point error message. * Remove duplicate <SQL>{xdstart} rule. John Naylor, with a few additional changes by me Discussion: https://postgr.es/m/CAJVSVGWGqY9YBs2EwtRUkbNv=hXkN8yRPOoD1wxE6COgvvrz5g@mail.gmail.com	2018-11-13 12:57:52 -05:00
Thomas Munro	b59d4d6c36	Fix const correctness warning. Per buildfarm.	2018-11-13 19:03:02 +13:00
Amit Kapila	a53bc135fb	Fix the initialization of atomic variables introduced by the group clearing mechanism. Commits `0e141c0fbb` and `baaf272ac9` introduced initialization of atomic variables in InitProcess which means that it's not safe to look at those for backends that aren't currently in use. Fix that by initializing them during postmaster startup. Reported-by: Andres Freund Author: Amit Kapila Backpatch-through: 9.6 Discussion: https://postgr.es/m/20181027104138.qmbbelopvy7cw2qv@alap3.anarazel.de	2018-11-13 10:52:40 +05:30
Thomas Munro	257ef3cd4f	Fix handling of HBA ldapserver with multiple hostnames. Commit `35c0754f` failed to handle space-separated lists of alternative hostnames in ldapserver, when building a URI for ldap_initialize() (OpenLDAP). Such lists need to be expanded to space-separated URIs. Repair. Back-patch to 11, to fix bug report #15495. Author: Thomas Munro Reported-by: Renaud Navarro Discussion: https://postgr.es/m/15495-2c39fc196c95cd72%40postgresql.org	2018-11-13 17:46:28 +13:00
Thomas Munro	6a3dcd2856	Fix possible buffer overrun in hba.c. Coverty reports a possible buffer overrun in the code that populates the pg_hba_file_rules view. It may not be a live bug due to restrictions on options that can be used together, but let's increase MAX_HBA_OPTIONS and correct a nearby misleading comment. Back-patch to 10 where this code arrived. Reported-by: Julian Hsiao Discussion: https://postgr.es/m/CADnGQpzbkWdKS2YHNifwAvX5VEsJ5gW49U4o-7UL5pzyTv4vTg%40mail.gmail.com	2018-11-13 16:27:13 +13:00
Michael Paquier	52b70b1c7d	Remove CommandCounterIncrement() after processing ON COMMIT DELETE This comes from `f9b5b41`, which is part of one the original commits that implemented ON COMMIT actions. By looking at the truncation code, any CCI needed happens locally when rebuilding indexes, so it looks safe to just remove this final incrementation. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20181109024731.GF2652@paquier.xyz	2018-11-13 08:59:41 +09:00
Tom Lane	3de491482b	Simplify null-element handling in extension_config_remove(). There's no point in asking deconstruct_array() for a null-flags array when we already checked the array has no nulls, and aren't going to examine the output anyhow. Not asking for this output should make the code marginally faster, and it's also more robust since if there somehow were nulls, deconstruct_array() would throw an error. Daniel Gustafsson Discussion: https://postgr.es/m/289FFB8B-7AAB-48B5-A497-6E0D41D7BA47@yesql.se	2018-11-12 11:50:28 -05:00
Tom Lane	e3f005d974	Limit the number of index clauses considered in choose_bitmap_and(). classify_index_clause_usage() is O(N^2) in the number of distinct index qual clauses it considers, because of its use of a simple search list to store them. For nearly all queries, that's fine because only a few clauses will be considered. But Alexander Kuzmenkov reported a machine-generated query with 80000 (!) index qual clauses, which caused this code to take forever. Somewhat remarkably, this is the only O(N^2) behavior we now have for such a query, so let's fix it. We can get rid of the O(N^2) runtime for cases like this without much damage to the functionality of choose_bitmap_and() by separating out paths with "too many" qual or pred clauses, and deeming them to always be nonredundant with other paths. Then their clauses needn't go into the search list, so it doesn't get too long, but we don't lose the ability to consider bitmap AND plans altogether. I set the threshold for "too many" to be 100 clauses per path, which should be plenty to ensure no change in planning behavior for normal queries. There are other things we could do to make this go faster, but it's not clear that it's worth any additional effort. 80000 qual clauses require a whole lot of work in many other places, too. The code's been like this for a long time, so back-patch to all supported branches. The troublesome query only works back to 9.5 (in 9.4 it fails with stack overflow in the parser); so I'm not sure that fixing this in 9.4 has any real-world benefit, but perhaps it does. Discussion: https://postgr.es/m/90c5bdfa-d633-dabe-9889-3cf3e1acd443@postgrespro.ru	2018-11-12 11:19:04 -05:00
Andres Freund	450c7defa6	Remove volatiles from {procarray,volatile}.c and fix memory ordering issue. The use of volatiles in procarray.c largely originated from the time when postgres did not have reliable compiler and memory barriers. That's not the case anymore, so we can do better. Several of the functions in procarray.c can be bottlenecks, and removal of volatile yields mildly better code. The new state, with explicit memory barriers, is also more correct. The previous use of volatile did not actually deliver sufficient guarantees on weakly ordered machines, in particular the logic in GetNewTransactionId() does not look safe. It seems unlikely to be a problem in practice, but worth fixing. Thomas and I independently wrote a patch for this. Reported-By: Andres Freund and Thomas Munro Author: Andres Freund, with cherrypicked changes from a patch by Thomas Munro Discussion: https://postgr.es/m/20181005172955.wyjb4fzcdzqtaxjq@alap3.anarazel.de https://postgr.es/m/CAEepm=1nff0x=7i3YQO16jLA2qw-F9O39YmUew4oq-xcBQBs0g@mail.gmail.com	2018-11-10 16:11:57 -08:00
Peter Eisentraut	69ee2ff930	Apply RI trigger skipping tests also for DELETE The tests added in `cfa0f4255b` to skip firing an RI trigger if any old key value is NULL can also be applied for DELETE. This should give a performance gain in those cases, and it also saves a lot of duplicate code in the actual RI triggers. (That code was already dead code for the UPDATE cases.) Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-11-10 16:14:51 +01:00
Peter Eisentraut	34479d9a36	Remove dead foreign key optimization code The ri_KeysEqual() calls in the foreign-key trigger functions to optimize away some updates are useless because since `adfeef55cb` those triggers are not enqueued at all. (It's also not useful to keep these checks as some kind of backstop, since it's also semantically correct to just run the full check even with equal keys.) Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-11-10 16:14:51 +01:00
Andres Freund	5fde047f2b	Combine two flag tests in GetSnapshotData(). Previously the code checked PROC_IN_LOGICAL_DECODING and PROC_IN_VACUUM separately. As the relevant variable is marked as volatile, the compiler cannot combine the two tests. As GetSnapshotData() is pretty hot in a number of workloads, it's worthwhile to fix that. It'd also be a good idea to get rid of the volatiles altogether. But for one that's a larger patch, and for another, the code after this change still seems at least as easy to read as before. Author: Andres Freund Discussion: https://postgr.es/m/20181005172955.wyjb4fzcdzqtaxjq@alap3.anarazel.de	2018-11-09 20:43:56 -08:00
Tom Lane	fa2952d8eb	Fix missing role dependencies for some schema and type ACLs. This patch fixes several related cases in which pg_shdepend entries were never made, or were lost, for references to roles appearing in the ACLs of schemas and/or types. While that did no immediate harm, if a referenced role were later dropped, the drop would be allowed and would leave a dangling reference in the object's ACL. That still wasn't a big problem for normal database usage, but it would cause obscure failures in subsequent dump/reload or pg_upgrade attempts, taking the form of attempts to grant privileges to all-numeric role names. (I think I've seen field reports matching that symptom, but can't find any right now.) Several cases are fixed here: 1. ALTER DOMAIN SET/DROP DEFAULT would lose the dependencies for any existing ACL entries for the domain. This case is ancient, dating back as far as we've had pg_shdepend tracking at all. 2. If a default type privilege applies, CREATE TYPE recorded the ACL properly but forgot to install dependency entries for it. This dates to the addition of default privileges for types in 9.2. 3. If a default schema privilege applies, CREATE SCHEMA recorded the ACL properly but forgot to install dependency entries for it. This dates to the addition of default privileges for schemas in v10 (commit `ab89e465c`). Another somewhat-related problem is that when creating a relation rowtype or implicit array type, TypeCreate would apply any available default type privileges to that type, which we don't really want since such an object isn't supposed to have privileges of its own. (You can't, for example, drop such privileges once they've been added to an array type.) `ab89e465c` is also to blame for a race condition in the regression tests: privileges.sql transiently installed globally-applicable default privileges on schemas, which sometimes got absorbed into the ACLs of schemas created by concurrent test scripts. This should have resulted in failures when privileges.sql tried to drop the role holding such privileges; but thanks to the bug fixed here, it instead led to dangling ACLs in the final state of the regression database. We'd managed not to notice that, but it became obvious in the wake of commit `da906766c`, which allowed the race condition to occur in pg_upgrade tests. To fix, add a function recordDependencyOnNewAcl to encapsulate what callers of get_user_default_acl need to do; while the original call sites got that right via ad-hoc code, none of the later-added ones have. Also change GenerateTypeDependencies to generate these dependencies, which requires adding the typacl to its parameter list. (That might be annoying if there are any extensions calling that function directly; but if there are, they're most likely buggy in the same way as the core callers were, so they need work anyway.) While I was at it, I changed GenerateTypeDependencies to accept most of its parameters in the form of a Form_pg_type pointer, making its parameter list a bit less unwieldy and mistake-prone. The test race condition is fixed just by wrapping the addition and removal of default privileges into a single transaction, so that that state is never visible externally. We might eventually prefer to separate out tests of default privileges into a script that runs by itself, but that would be a bigger change and would make the tests run slower overall. Back-patch relevant parts to all supported branches. Discussion: https://postgr.es/m/15719.1541725287@sss.pgh.pa.us	2018-11-09 20:42:14 -05:00
Andres Freund	c670d0faac	Remove ineffective check against dropped columns from slot_getattr(). Before this commit slot_getattr() checked for dropped columns (returning NULL in that case), but only after checking for previously deformed columns. As slot_deform_tuple() does not contain such a check, the check in slot_getattr() would often not have been reached, depending on previous use of the slot. These days locking and plan invalidation ought to ensure that dropped columns are not accessed in query plans. Therefore this commit just drops the insufficient check in slot_getattr(). It's possible that we'll find some holes againt use of dropped columns, but if so, those need to be addressed independent of slot_getattr(), as most accesses don't go through that function anyway. Author: Andres Freund Discussion: https://postgr.es/m/20181107174403.zai7fedgcjoqx44p@alap3.anarazel.de	2018-11-09 17:40:40 -08:00
Andres Freund	1ef6bd2954	Don't require return slots for nodes without projection. In a lot of nodes the return slot is not required. That can either be because the node doesn't do any projection (say an Append node), or because the node does perform projections but the projection is optimized away because the projection would yield an identical row. Slots aren't that small, especially for wide rows, so it's worthwhile to avoid creating them. It's not possible to just skip creating the slot - it's currently used to determine the tuple descriptor returned by ExecGetResultType(). So separate the determination of the result type from the slot creation. The work previously done internally ExecInitResultTupleSlotTL() can now also be done separately with ExecInitResultTypeTL() and ExecInitResultSlot(). That way nodes that aren't guaranteed to need a result slot, can use ExecInitResultTypeTL() to determine the result type of the node, and ExecAssignScanProjectionInfo() (via ExecConditionalAssignProjectionInfo()) determines that a result slot is needed, it is created with ExecInitResultSlot(). Besides the advantage of avoiding to create slots that then are unused, this is necessary preparation for later patches around tuple table slot abstraction. In particular separating the return descriptor and slot is a prerequisite to allow JITing of tuple deforming with knowledge of the underlying tuple format, and to avoid unnecessarily creating JITed tuple deforming for virtual slots. This commit removes a redundant argument from ExecInitResultTupleSlotTL(). While this commit touches a lot of the relevant lines anyway, it'd normally still not worthwhile to cause breakage, except that aforementioned later commits will touch all ExecInitResultTupleSlotTL() callers anyway (but fits worse thematically). Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-09 17:19:39 -08:00
Michael Paquier	319a810180	Fix dependency handling of partitions and inheritance for ON COMMIT This commit fixes a set of issues with ON COMMIT actions when used on partitioned tables and tables with inheritance children: - Applying ON COMMIT DROP on a partitioned table with partitions or on a table with inheritance children caused a failure at commit time, with complains about the children being already dropped as all relations are dropped one at the same time. - Applying ON COMMIT DELETE on a partition relying on a partitioned table which uses ON COMMIT DROP would cause the partition truncation to fail as the parent is removed first. The solution to the first problem is to handle the removal of all the dependencies in one go instead of dropping relations one-by-one, based on a suggestion from Álvaro Herrera. So instead all the relation OIDs to remove are gathered and then processed in one round of multiple deletions. The solution to the second problem is to reorder the actions, with truncation happening first and relation drop done after. Even if it means that a partition could be first truncated, then immediately dropped if its partitioned table is dropped, this has the merit to keep the code simple as there is no need to do existence checks on the relations to drop. Contrary to a manual TRUNCATE on a partitioned table, ON COMMIT DELETE does not cascade to its partitions. The ON COMMIT action defined on each partition gets the priority. Author: Michael Paquier Reviewed-by: Amit Langote, Álvaro Herrera, Robert Haas Discussion: https://postgr.es/m/68f17907-ec98-1192-f99f-8011400517f5@lab.ntt.co.jp Backpatch-through: 10	2018-11-09 10:03:22 +09:00
Tom Lane	3d360e20c9	Disallow setting client_min_messages higher than ERROR. Previously it was possible to set client_min_messages to FATAL or PANIC, which had the effect of suppressing transmission of regular ERROR messages to the client. Perhaps that seemed like a useful option in the past, but the trouble with it is that it breaks guarantees that are explicitly made in our FE/BE protocol spec about how a query cycle can end. While libpq and psql manage to cope with the omission, that's mostly because they are not very bright; client libraries that have more semantic knowledge are likely to get confused. Notably, pgODBC doesn't behave very sanely. Let's fix this by getting rid of the ability to set client_min_messages above ERROR. In HEAD, just remove the FATAL and PANIC options from the set of allowed enum values for client_min_messages. (This change also affects trace_recovery_messages, but that's OK since these aren't useful values for that variable either.) In the back branches, there was concern that rejecting these values might break applications that are explicitly setting things that way. I'm pretty skeptical of that argument, but accommodate it by accepting these values and then internally setting the variable to ERROR anyway. In all branches, this allows a couple of tiny simplifications in the logic in elog.c, so do that. Also respond to the point that was made that client_min_messages has exactly nothing to do with the server's logging behavior, and therefore does not belong in the "When To Log" subsection of the documentation. The "Statement Behavior" subsection is a better match, so move it there. Jonah Harris and Tom Lane Discussion: https://postgr.es/m/7809.1541521180@sss.pgh.pa.us Discussion: https://postgr.es/m/15479-ef0f4cc2fd995ca2@postgresql.org	2018-11-08 17:33:43 -05:00
Alvaro Herrera	705d433fd5	Revise attribute handling code on partition creation The original code to propagate NOT NULL and default expressions specified when creating a partition was mostly copy-pasted from typed-tables creation, but not being a great match it contained some duplicity, inefficiency and bugs. This commit fixes the bug that NOT NULL constraints declared in the parent table would not be honored in the partition. One reported issue that is not fixed is that a DEFAULT declared in the child is not used when inserting through the parent. That would amount to a behavioral change that's better not back-patched. This rewrite makes the code simpler: 1. instead of checking for duplicate column names in its own block, reuse the original one that already did that; 2. instead of concatenating the list of columns from parent and the one declared in the partition and scanning the result to (incorrectly) propagate defaults and not-null constraints, just scan the latter searching the former for a match, and merging sensibly. This works because we know the list in the parent is already correct and there can only be one parent. This rewrite makes ColumnDef->is_from_parent unused, so it's removed on branch master; on released branches, it's kept as an unused field in order not to cause ABI incompatibilities. This commit also adds a test case for creating partitions with collations mismatching that on the parent table, something that is closely related to the code being patched. No code change is introduced though, since that'd be a behavior change that could break some (broken) working applications. Amit Langote wrote a less invasive fix for the original NOT NULL/defaults bug, but while I kept the tests he added, I ended up not using his original code. Ashutosh Bapat reviewed Amit's fix. Amit reviewed mine. Author: Álvaro Herrera, Amit Langote Reviewed-by: Ashutosh Bapat, Amit Langote Reported-by: Jürgen Strobel (bug #15212) Discussion: https://postgr.es/m/152746742177.1291.9847032632907407358@wrigleys.postgresql.org	2018-11-08 16:22:09 -03:00
Michael Paquier	170dccc69d	Fix incorrect routine name reference in partprune.c Author: Yuzuko Hosoya Discussion: https://postgr.es/m/00ac01d4774c$7feac860$7fc05920$@lab.ntt.co.jp	2018-11-08 20:14:16 +09:00
Andres Freund	c14a8ff27e	Fixup for `b84a6dafbf` triggering assert failure in LLVM debug builds. Author: Andres Freund	2018-11-07 14:00:14 -08:00
Andres Freund	b84a6dafbf	Move EEOP_*_SYSVAR evaluation out of line. This mainly de-duplicates code. As evaluating a system variable isn't the hottest path and the current inline implementation ends up calling out to an external function anyway, this is OK from a performance POV. The main motivation for de-duplicating is the upcoming slot abstraction work, after which there's not guaranteed to be a HeapTuple backing the slot. Author: Andres Freund, Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-07 11:08:45 -08:00
Andres Freund	5f32b29c18	Build HashState's hashkeys expression with the correct parent. Previously the expressions were built with the HashJoinState as a parent. That's incorrect. Currently this does not appear to be harmful, but for the upcoming 'slot abstraction' work this proves to be problematic, as the underlying slot types can differ between Hash and HashJoin. It's possible that this already causes a problem, but I've not been able to come up with a scenario. Therefore don't backpatch at this point. Author: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-11-07 09:25:54 -08:00
Tom Lane	c6e4133fae	Postpone calculating total_table_pages until after pruning/exclusion. The planner doesn't make any use of root->total_table_pages until it estimates costs of indexscans, so we don't need to compute it as early as that's currently done. By doing the calculation between set_base_rel_sizes and set_base_rel_pathlists, we can omit relations that get removed from the query by partition pruning or constraint exclusion, which seems like a more accurate basis for costing. (Historical note: I think at the time this code was written, there was not a separation between the "set sizes" and "set pathlists" steps, so that this approach would have been impossible at the time. But now that we have that separation, this is clearly the better way to do things.) David Rowley, reviewed by Edmund Horner Discussion: https://postgr.es/m/CAKJS1f-NG1mRM0VOtkAG7=ZLQWihoqees9R4ki3CKBB0-fRfCA@mail.gmail.com	2018-11-07 12:12:56 -05:00
Tom Lane	5d28c9bd73	Disable recheck_on_update optimization to avoid crashes. The code added by commit `c203d6cf8` causes a crash in at least one case, where a potentially-optimizable expression index has a storage type different from the input data type. A cursory code review turned up numerous other problems that seem impractical to fix on short notice. Andres argued for revert of that patch some time ago, and if additional senior committers had been paying attention, that's likely what would have happened, but we were not :-( At this point we can't just revert, at least not in v11, because that would mean an ABI break for code touching relcache entries. And we should not remove the (also buggy) support for the recheck_on_update index reloption, since it might already be used in some databases in the field. So this patch just does the as-little-invasive-as-possible measure of disabling the feature as though recheck_on_update were forced off for all indexes. I also removed the related regression tests (which would otherwise fail) and the user-facing documentation of the reloption. We should undertake a more thorough code cleanup if the patch can't be fixed, but not under the extreme time pressure of being already overdue for 11.1 release. Per report from Ondřej Bouda and subsequent private discussion among pgsql-release. Discussion: https://postgr.es/m/20181106185255.776mstcyehnc63ty@alvherre.pgsql	2018-11-06 18:33:28 -05:00
Thomas Munro	c4f0876fb8	Remove set-but-unused variable. Clean-up for commit `c24dcd0c`. Reported-by: Andrew Dunstan Discussion: https://postgr.es/m/2d52ff4a-5440-f6f1-7806-423b0e6370cb%402ndQuadrant.com	2018-11-07 12:06:43 +13:00
Andrew Gierth	5613da4cc7	Optimize nested ConvertRowtypeExpr nodes. A ConvertRowtypeExpr is used to translate a whole-row reference of a child to that of a parent. The planner produces nested ConvertRowtypeExpr while translating whole-row reference of a leaf partition in a multi-level partition hierarchy. Executor then translates the whole-row reference from the leaf partition into all the intermediate parent's whole-row references before arriving at the final whole-row reference. It could instead translate the whole-row reference from the leaf partition directly to the top-most parent's whole-row reference skipping any intermediate translations. Ashutosh Bapat, with tests by Kyotaro Horiguchi and some editorialization by me. Reviewed by Andres Freund, Pavel Stehule, Kyotaro Horiguchi, Dmitry Dolgov, Tom Lane.	2018-11-06 21:10:10 +00:00
Thomas Munro	c24dcd0cfd	Use pg_pread() and pg_pwrite() for data files and WAL. Cut down on system calls by doing random I/O using offset-based OS routines where available. Remove the code for tracking the 'virtual' seek position. The only reason left to call FileSeek() was to get the file's size, so provide a new function FileSize() instead. Author: Oskari Saarenmaa, Thomas Munro Reviewed-by: Thomas Munro, Jesper Pedersen, Tom Lane, Alvaro Herrera Discussion: https://postgr.es/m/CAEepm=02rapCpPR3ZGF2vW=SBHSdFYO_bz_f-wwWJonmA3APgw@mail.gmail.com Discussion: https://postgr.es/m/b8748d39-0b19-0514-a1b9-4e5a28e6a208%40gmail.com Discussion: https://postgr.es/m/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi	2018-11-07 09:51:50 +13:00
Bruce Momjian	b43df566b3	GUC: adjust effective_cache_size SQL descriptions Follow on patch for commit `3e0f1a4741`. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/369ec766-b947-51bd-4dad-6fb9e026439f@2ndquadrant.com Backpatch-through: 9.4	2018-11-06 13:40:03 -05:00
Tom Lane	003c68a3b4	Rename rbtree.c functions to use "rbt" prefix not "rb" prefix. The "rb" prefix is used by Ruby, so that our existing code results in name collisions that break plruby. We discussed ways to prevent that by adjusting dynamic linker options, but it seems that at best we'd move the pain to other cases. Renaming to avoid the collision is the only portable fix anyway. Fortunately, our rbtree code is not (yet?) widely used --- in core, there's only a single usage in GIN --- so it seems likely that we can get away with a rename. I chose to do this basically as s/rb/rbt/g, except for places where there already was a "t" after "rb". The patch could have been made smaller by only touching linker-visible symbols, but it would have resulted in oddly inconsistent-looking code. Better to make it look like "rbt" was the plan all along. Back-patch to v10. The rbtree.c code exists back to 9.5, but rb_iterate() which is the actual immediate source of pain was added in v10, so it seems like changing the names before that would have more risk than benefit. Per report from Pavel Raiskup. Discussion: https://postgr.es/m/4738198.8KVIIDhgEB@nb.usersys.redhat.com	2018-11-06 13:25:24 -05:00
Thomas Munro	9e12fb02b7	Remove some remaining traces of dsm_resize(). A couple of obsolete comments and unreachable blocks remained after commit `3c60d0fa`. Discussion: https://postgr.es/m/CAA4eK1%2B%3DyAFUvpFoHXFi_gm8YqmXN-TtkFH%2BVYjvDLS6-SFq-Q%40mail.gmail.com	2018-11-06 21:40:08 +13:00
Michael Paquier	8f045e242b	Switch pg_promote to be parallel-safe pg_promote uses nothing relying on a global state, so it is fine to mark it as parallel-safe, conclusion based on a detailed analysis from Robert Haas. This also fixes an inconsistency where pg_proc.dat missed to mark the function with its previous value for proparallel, update which does not matter now as the default is used. Based on a discussion between multiple folks: Laurenz Albe, Robert Haas, Amit Kapila, Tom Lane and myself. Discussion: https://postgr.es/m/20181029082530.GL14242@paquier.xyz	2018-11-06 14:11:21 +09:00
Thomas Munro	3c60d0fa23	Remove dsm_resize() and dsm_remap(). These interfaces were never used in core, didn't handle failure of posix_fallocate() correctly and weren't supported on all platforms. We agreed to remove them in 12. Author: Thomas Munro Reported-by: Andres Freund Discussion: https://postgr.es/m/CAA4eK1%2B%3DyAFUvpFoHXFi_gm8YqmXN-TtkFH%2BVYjvDLS6-SFq-Q%40mail.gmail.com	2018-11-06 16:11:12 +13:00
Andres Freund	a3fb382e9c	Fix copy-paste error in errhint() introduced in `691d79a079`. Reported-By: Petr Jelinek Discussion: https://postgr.es/m/c95a620b-34f0-7930-aeb5-f7ab804f26cb@2ndquadrant.com Backpatch: 9.4-, like the previous commit	2018-11-05 12:05:38 -08:00
Michael Paquier	dc3e436b19	Block creation of partitions with open references to its parent When a partition is created as part of a trigger processing, it is possible that the partition which just gets created changes the properties of the table the executor of the ongoing command relies on, causing a subsequent crash. This has been found possible when for example using a BEFORE INSERT which creates a new partition for a partitioned table being inserted to. Any attempt to do so is blocked when working on a partition, with regression tests added for both CREATE TABLE PARTITION OF and ALTER TABLE ATTACH PARTITION. Reported-by: Dmitry Shalashov Author: Amit Langote Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/15437-3fe01ee66bd1bae1@postgresql.org Backpatch-through: 10	2018-11-05 11:04:02 +09:00
Michael Paquier	4bc772e2af	Ignore partitioned tables when processing ON COMMIT DELETE ROWS Those tables have no physical storage, making this option unusable with partition trees as at commit time an actual truncation was attempted. There are still issues with the way ON COMMIT actions are done when mixing several action types, however this impacts as well inheritance trees, so this issue will be dealt with later. Reported-by: Rajkumar Raghuwanshi Author: Amit Langote Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/CAKcux6mhgcjSiB_egqEAEFgX462QZtncU8QCAJ2HZwM-wWGVew@mail.gmail.com	2018-11-05 09:14:33 +09:00
Tom Lane	9b6fb9fbb4	Fix ExecuteCallStmt to not scribble on the passed-in parse tree. Modifying the parse tree at execution time is, or at least ought to be, verboten. It seems quite difficult to actually cause a crash this way in v11 (although you can exhibit it pretty easily in HEAD by messing with plan_cache_mode). Nonetheless, it's risky, so fix and back-patch. Discussion: https://postgr.es/m/13789.1541359611@sss.pgh.pa.us	2018-11-04 14:50:55 -05:00
Tom Lane	3e0b05a756	Fix unused-variable warning. Discussion: https://postgr.es/m/CAMkU=1xTHkS6d0iptCWykHc1Xrh3LBic_gZDo3JzDYru815fLQ@mail.gmail.com	2018-11-04 11:20:59 -05:00
Andres Freund	793beab37e	Prevent generating EEOP_AGG_STRICT_INPUT_CHECK operations when nargs == 0. This only became a problem with `4c640f4f38`, which didn't synchronize the value agg_strict_input_check.nargs is set to, with the guard condition for emitting the operation. Besides such instructions being unnecessary overhead, currently the LLVM JIT provider doesn't support them. It seems more sensible to avoid generating such instruction than supporting them. Add assertions to make it easier to debug a potential further occurance. Discussion: https://postgr.es/m/2a505161-2727-2473-7c46-591ed108ac52@email.cz Backpatch: 11-, like `4c640f4f38`.	2018-11-03 15:55:23 -07:00
Andres Freund	4c640f4f38	Fix STRICT check for strict aggregates with NULL ORDER BY columns. I (Andres) broke this unintentionally in `69c3936a14`, by checking strictness for all input expressions computed for an aggregate, rather than just the input for the aggregate transition function. Reported-By: Ondřej Bouda Bisected-By: Tom Lane Diagnosed-By: Andrew Gierth Discussion: https://postgr.es/m/2a505161-2727-2473-7c46-591ed108ac52@email.cz Backpatch: 11-, like `69c3936a14`	2018-11-03 14:48:42 -07:00
Tom Lane	981dc2baa8	Make ts_locale.c's character-type functions cope with UTF-16. On Windows, in UTF8 database encoding, what char2wchar() produces is UTF16 not UTF32, ie, characters above U+FFFF will be represented by surrogate pairs. t_isdigit() and siblings did not account for this and failed to provide a large enough result buffer. That in turn led to bogus "invalid multibyte character for locale" errors, because contrary to what you might think from char2wchar()'s documentation, its Windows code path doesn't cope sanely with buffer overflow. The solution for t_isdigit() and siblings is pretty clear: provide a 3-wchar_t result buffer not 2. char2wchar() also needs some work to provide more consistent, and more accurately documented, buffer overrun behavior. But that's a bigger job and it doesn't actually have any immediate payoff, so leave it for later. Per bug #15476 from Kenji Uno, who deserves credit for identifying the cause of the problem. Back-patch to all active branches. Discussion: https://postgr.es/m/15476-4314f480acf0f114@postgresql.org	2018-11-03 13:56:10 -04:00
Alvaro Herrera	dfa6081419	Fix tablespace handling for partitioned indexes When creating partitioned indexes, the tablespace was not being saved for the parent index. This meant that subsequently created partitions would not use the right tablespace for their indexes. ALTER INDEX SET TABLESPACE and ALTER INDEX ALL IN TABLESPACE raised errors when tried; fix them too. This requires bespoke code for ATExecCmd() that applies to the special case when the tablespace move is just a catalog change. Discussion: https://postgr.es/m/20181102003138.uxpaca6qfxzskepi@alvherre.pgsql	2018-11-03 13:25:19 -03:00
Thomas Munro	1ce4a807e2	Fix NULL handling in multi-batch Parallel Hash Left Join. NULL keys in left joins were skipped when building batch files. Repair, by making the keep_nulls argument to ExecHashGetHashValue() depend on whether this is a left outer join, as we do in other paths. Bug #15475. Thinko in `1804284042`. Back-patch to 11. Reported-by: Paul Schaap Diagnosed-by: Andrew Gierth Dicussion: https://postgr.es/m/15475-11a7a783fed72a36%40postgresql.org	2018-11-03 11:05:35 +13:00
Bruce Momjian	3e0f1a4741	GUC: adjust effective_cache_size docs and SQL description Clarify that effective_cache_size is both kernel buffers and shared buffers. Reported-by: nat@makarevitch.org Discussion: https://postgr.es/m/153685164808.22334.15432535018443165207@wrigleys.postgresql.org Backpatch-through: 9.3	2018-11-02 09:11:00 -04:00
Magnus Hagander	fbec7459aa	Fix spelling errors and typos in comments Author: Daniel Gustafsson <daniel@yesql.se>	2018-11-02 13:56:52 +01:00
Michael Paquier	6286efb524	Lower error level from PANIC to FATAL when restoring slots at startup When restoring slot information from disk at startup and filling in shared memory information, the startup process would issue a PANIC message if more slots are found than what max_replication_slots allows, and then Postgres generates a core dump, recommending to increase max_replication_slots. This gives users a switch to crash Postgres at will by creating slots, lower the configuration to not support it, and then restart it. Making Postgres crash hard in this case is overdoing it just to give a recommendation to users. So instead use a FATAL, which makes Postgres fail to start without crashing, still giving the recommendation. This is more consistent with what happens for prepared transactions for example. Author: Michael Paquier Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20181030025109.GD1644@paquier.xyz	2018-11-02 07:59:24 +09:00
Peter Eisentraut	96b00c433c	Remove obsolete pg_constraint.consrc column This has been deprecated and effectively unused for a long time. Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-11-01 20:36:05 +01:00
Peter Eisentraut	fe5038236c	Remove obsolete pg_attrdef.adsrc column This has been deprecated and effectively unused for a long time. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-11-01 20:35:42 +01:00
Andres Freund	8a99f8a827	Fix error message typo introduced `691d79a079`. Reported-By: Michael Paquier Discussion: https://postgr.es/m/20181101003405.GB1727@paquier.xyz Backpatch: 9.4-, like the previous commit	2018-11-01 10:44:29 -07:00
Peter Geoghegan	cb6f8a9a72	Adjust trace_sort log messages. The project message style guide dictates: "When citing the name of an object, state what kind of object it is". The parallel CREATE INDEX patch added a worker number to most of the trace_sort messages within tuplesort.c without specifying the object type. Bring these messages into compliance with the style guide. We're still treating a leader or serial Tuplesortstate as having worker number -1. trace_sort is a developer option, and these two cases are highly comparable, so this seems appropriate. Per complaint from Tom Lane. Discussion: https://postgr.es/m/8330.1540831863@sss.pgh.pa.us Backpatch: 11-, where parallel CREATE INDEX was introduced.	2018-11-01 09:18:57 -07:00
Andres Freund	691d79a079	Disallow starting server with insufficient wal_level for existing slot. Previously it was possible to create a slot, change wal_level, and restart, even if the new wal_level was insufficient for the slot. That's a problem for both logical and physical slots, because the necessary WAL records are not generated. This removes a few tests in newer versions that, somewhat inexplicably, whether restarting with a too low wal_level worked (a buggy behaviour!). Reported-By: Joshua D. Drake Author: Andres Freund Discussion: https://postgr.es/m/20181029191304.lbsmhshkyymhw22w@alap3.anarazel.de Backpatch: 9.4-, where replication slots where introduced	2018-10-31 15:46:39 -07:00
Tom Lane	696b0c5fd0	Fix memory leak in repeated SPGIST index scans. spgendscan neglected to pfree all the memory allocated by spgbeginscan. It's possible to get away with that in most normal queries, since the memory is allocated in the executor's per-query context which is about to get deleted anyway; but it causes severe memory leakage during creation or filling of large exclusion-constraint indexes. Also, document that amendscan is supposed to free what ambeginscan allocates. The docs' lack of clarity on that point probably caused this bug to begin with. (There is discussion of changing that API spec going forward, but I don't think it'd be appropriate for the back branches.) Per report from Bruno Wolff. It's been like this since the beginning, so back-patch to all active branches. In HEAD, also fix an independent leak caused by commit `2a6368343` (allocating memory during spgrescan instead of spgbeginscan, which might be all right if it got cleaned up, but it didn't). And do a bit of code beautification on that commit, too. Discussion: https://postgr.es/m/20181024012314.GA27428@wolff.to	2018-10-31 17:05:03 -04:00
Andres Freund	c4ab62f9ac	Fix typo in xlog.c. Author: Daniel Gustafsson Discussion: https://postgr.es/m/A6817958-949E-4A5B-895D-FA421B6640C2@yesql.se	2018-10-31 07:50:32 -07:00
Tom Lane	14a158f9bf	Fix interaction of CASE and ArrayCoerceExpr. An array-type coercion appearing within a CASE that has a constant (after const-folding) test expression was mangled by the planner, causing all the elements of the resulting array to be equal to the coerced value of the CASE's test expression. This is my oversight in commit c12d570fa: that changed ArrayCoerceExpr to use a subexpression involving a CaseTestExpr, and I didn't notice that eval_const_expressions needed an adjustment to keep from folding such a CaseTestExpr to a constant when it's inside a suitable CASE. This is another in what's getting to be a depressingly long line of bugs associated with misidentification of the referent of a CaseTestExpr. We're overdue to redesign that mechanism; but any such fix is unlikely to be back-patchable into v11. As a stopgap, fix eval_const_expressions to do what it must here. Also add a bunch of comments pointing out the restrictions and assumptions that are needed to make this work at all. Also fix a related oversight: contain_context_dependent_node() was not aware of the relationship of ArrayCoerceExpr to CaseTestExpr. That was somewhat fail-soft, in that the outcome of a wrong answer would be to prevent optimizations that could have been made, but let's fix it while we're at it. Per bug #15471 from Matt Williams. Back-patch to v11 where the faulty logic came in. Discussion: https://postgr.es/m/15471-1117f49271989bad@postgresql.org	2018-10-30 15:26:11 -04:00
Michael Paquier	d5eec4eefd	Add pg_partition_tree to display information about partitions This new function is useful to display a full tree of partitions with a partitioned table given in output, and avoids the need of any complex WITH RECURSIVE query when looking at partition trees which are deep multiple levels. It returns a set of records, one for each partition, containing the partition's name, its immediate parent's name, a boolean value telling if the relation is a leaf in the tree and an integer telling its level in the partition tree with given table considered as root, beginning at zero for the root, and incrementing by one each time the scan goes one level down. Author: Amit Langote Reviewed-by: Jesper Pedersen, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/8d00e51a-9a51-ad02-d53e-ba6bf50b2e52@lab.ntt.co.jp	2018-10-30 10:25:06 +09:00
Thomas Munro	051a1494bd	Remove incorrect comment in dshash.c. Back-patch to 11. Author: Antonin Houska Discussion: https://postgr.es/m/8726.1540553521%40localhost	2018-10-29 12:57:55 +13:00
Michael Paquier	10074651e3	Add pg_promote function This function is able to promote a standby with this new SQL-callable function. Execution access can be granted to non-superusers so that failover tools can observe the principle of least privilege. Catalog version is bumped. Author: Laurenz Albe Reviewed-by: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/6e7c79b3ec916cf49742fb8849ed17cd87aed620.camel@cybertec.at	2018-10-25 09:46:00 +09:00
Peter Eisentraut	0a8590b2a0	Apply unconstify() in more places Discussion: https://www.postgresql.org/message-id/08adbe4e-38f8-2c73-55f0-591392371687%402ndquadrant.com	2018-10-25 01:02:46 +01:00
Andrew Dunstan	040a1df614	Correctly set t_self for heap tuples in expand_tuple Commit `16828d5c0` incorrectly set an invalid pointer for t_self for heap tuples. This patch correctly copies it from the source tuple, and includes a regression test that relies on it being set correctly. Backpatch to release 11. Fixes bug #15448 reported by Tillmann Schulz Diagnosis and test case by Amit Langote	2018-10-24 10:56:27 -04:00
Michael Paquier	5ef037cf0b	List wait events in alphabetical order This changes the documentation, and the related structures so as everything is consistent. Some wait events were not listed alphabetically since their introduction, others have been added rather randomly. Keeping all those entries in order helps in maintenance, and helps the user looking at the documentation. Author: Michael Paquier, Kuntal Ghosh Discussion: https://postgr.es/m/20181024002539.GI1658@paquier.xyz Backpatch-through: 10, only for the documentation part to avoid an ABI breakage.	2018-10-24 17:02:37 +09:00
Peter Eisentraut	5d7c703a44	Remove get_attidentity() All existing uses can get this information more easily from the relation descriptor, so the detour through the syscache is not necessary. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-10-23 14:47:14 +02:00
Peter Eisentraut	c903bb7b1c	Remove get_atttypmod() This has been unused since 2004. get_atttypetypmodcoll() is often a better alternative. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-10-23 14:47:14 +02:00
Peter Eisentraut	e6f5d1accd	Drop const cast from dlsym() calls This workaround might be obsolete. We'll see if those "older platforms" mentioned in the comment are still around. Discussion: https://www.postgresql.org/message-id/08adbe4e-38f8-2c73-55f0-591392371687%402ndquadrant.com	2018-10-23 14:35:59 +02:00
Peter Eisentraut	807e4bc828	Sprinkle some const decorations These mainly help understanding the function signatures better.	2018-10-23 12:25:17 +02:00
Michael Paquier	17f206fbc8	Set pg_class.relhassubclass for partitioned indexes Like for relations, switching this parameter is optimistic by turning it on each time a partitioned index gains a partition. So seeing this parameter set to true means that the partitioned index has or has had partitions. The flag cannot be reset yet for partitioned indexes, which is something not obvious anyway as partitioned relations exist to have partitions. This allows to track more conveniently partition trees for indexes, which will come in use with an upcoming patch helping in listing partition trees with an SQL-callable function. Author: Amit Langote Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/80306490-b5fc-ea34-4427-f29c52156052@lab.ntt.co.jp	2018-10-22 11:04:48 +09:00
Tom Lane	2ddb9149d1	Server-side fix for delayed NOTIFY and SIGTERM processing. Commit `4f85fde8e` introduced some code that was meant to ensure that we'd process cancel, die, sinval catchup, and notify interrupts while waiting for client input. But there was a flaw: it supposed that the process latch would be set upon arrival at secure_read() if any such interrupt was pending. In reality, we might well have cleared the process latch at some earlier point while those flags remained set -- particularly notifyInterruptPending, which can't be handled as long as we're within a transaction. To fix the NOTIFY case, also attempt to process signals (except ProcDiePending) before trying to read. Also, if we see that ProcDiePending is set before we read, forcibly set the process latch to ensure that we will handle that signal promptly if no data is available. I also made it set the process latch on the way out, in case there is similar logic elsewhere. (It remains true that we won't service ProcDiePending here unless we need to wait for input.) The code for handling ProcDiePending during a write needs those changes, too. Also be a little more careful about when to reset whereToSendOutput, and improve related comments. Back-patch to 9.5 where this code was added. I'm not entirely convinced that older branches don't have similar issues, but the complaint at hand is just about the >= 9.5 code. Jeff Janes and Tom Lane Discussion: https://postgr.es/m/CAOYf6ec-TmRYjKBXLLaGaB-jrd=mjG1Hzn1a1wufUAR39PQYhw@mail.gmail.com	2018-10-19 21:39:21 -04:00
Tom Lane	350410be45	Add missing quote_identifier calls for CREATE TRIGGER ... REFERENCING. Mixed-case names for transition tables weren't dumped correctly. Oversight in commit `8c48375e5`, per bug #15440 from Karl Czajkowski. In passing, I couldn't resist a bit of code beautification. Back-patch to v10 where this was introduced. Discussion: https://postgr.es/m/15440-02d1468e94d63d76@postgresql.org	2018-10-19 00:50:16 -04:00
Thomas Munro	197e4af9d5	Refactor pid, random seed and start time initialization. Background workers, including parallel workers, were generating the same sequence of numbers in random(). This showed up as DSM handle collisions when Parallel Hash created multiple segments, but any code that calls random() in background workers could be affected if it cares about different backends generating different numbers. Repair by making sure that all new processes initialize the seed at the same time as they set MyProcPid and MyStartTime in a new function InitProcessGlobals(), called by the postmaster, its children and also standalone processes. Also add a new high resolution MyStartTimestamp as a potentially useful by-product, and remove SessionStartTime from struct Port as it is now redundant. No back-patch for now, as the known consequences so far are just a bunch of harmless shm_open(O_EXCL) collisions. Author: Thomas Munro Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEepm%3D2eJj_6%3DB%2B2tEpGu2nf1BjthCf9nXXUouYvJJ4C5WSwhg%40mail.gmail.com	2018-10-19 13:59:28 +13:00
Tom Lane	26cb82030f	Improve some comments related to executor result relations. es_leaf_result_relations doesn't exist; perhaps this was an old name for es_tuple_routing_result_relations, or maybe this comment has gone unmaintained through multiple rounds of whacking the code around. Related comment in execnodes.h was both obsolete and ungrammatical.	2018-10-17 16:41:00 -04:00
Tom Lane	48d818ede1	Const-ify a few more large static tables. Per research by Andres. Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-17 15:32:47 -04:00
Peter Eisentraut	a7a1b44516	Fix crash in multi-insert COPY A bug introduced in `0d5f05cde0` considered the previous partition's triggers when deciding whether multi-insert can be used. Rearrange the code so that the current partition is considered. Author: Ashutosh Sharma <ashu.coek88@gmail.com>	2018-10-17 20:31:20 +02:00
Andres Freund	28d750c0cd	Reorder FmgrBuiltin members, saving 25% in size. That's worth it, as fmgr_builtins is frequently accessed, and as fmgr_builtins is one of the biggest constant variables in a backend. On most 64bit systems this will change the size of the struct from 32byte to 24bytes. While that could make indexing into the array marginally more expensive, the higher cache hit ratio is worth more, especially because these days fmgr_builtins isn't searched with a binary search anymore (c.f. `212e6f34d5`). Discussion: https://postgr.es/m/20181016201145.aa2dfeq54rhqzron@alap3.anarazel.de	2018-10-16 14:51:18 -07:00
Andres Freund	93ca02e005	Mark constantly allocated dest receiver as const. This allows the compiler / linker to mark affected pages as read-only. Doing so requires casting constness away, as CreateDestReceiver() returns both constant and non-constant dest receivers. That's fine though, as any modification of the statically allocated receivers would already have been a bug (and would now be caught on some platforms). Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-16 12:05:50 -07:00
Tom Lane	2c300c6807	Be smarter about age-counter overflow in formatting.c caches. The previous code here simply threw away whatever it knew about cache entry ages whenever a counter overflow occurred. Since the counter is int width and will be bumped once per format function execution, overflows are not really so rare as to not be worth thinking about. Instead, let's deal with the situation by halving all the age values, essentially rescaling the age metric. In that way, we retain a pretty accurate (if not quite perfect) idea of which entries are oldest.	2018-10-16 14:57:14 -04:00
Tom Lane	fd85e9f78d	Avoid statically allocating formatting.c's format string caches. This eliminates circa 120KB of static data from Postgres' memory footprint. In some usage patterns that space will get allocated anyway, but in many processes it never will be allocated. We can improve matters further by allocating only as many cache entries as we actually use, rather than allocating the whole array on first use. However, to avoid wasting lots of space due to palloc's habit of rounding requests up to power-of-2 sizes, tweak the maximum cacheable format string length to make the struct sizes be powers of 2 or just less. The sizes I chose make the maximums a little bit less than they were before, but I doubt it matters much. While at it, rearrange struct FormatNode to avoid wasting quite so much padding space. This change actually halves the size of that struct on 64-bit machines. Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-16 13:11:09 -04:00
Andres Freund	02a30a09f9	Correct constness of system attributes in heap.c & prerequisites. This allows the compiler / linker to mark affected pages as read-only. There's a fair number of pre-requisite changes, to allow the const properly be propagated. Most of consts were already required for correctness anyway, just not represented on the type-level. Arguably we could be more aggressive in using consts in related code, but.. This requires using a few of the types underlying typedefs that removes pointers (e.g. const NameData *) as declaring the typedefed type constant doesn't have the same meaning (it makes the variable const, not what it points to). Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-16 09:44:43 -07:00
Tom Lane	3dfef0c519	Avoid statically allocating gmtsub()'s timezone workspace. localtime.c's "struct state" is a rather large object, ~23KB. We were statically allocating one for gmtsub() to use to represent the GMT timezone, even though that function is not at all heavily used and is never reached in most backends. Let's malloc it on-demand, instead. This does pose the question of how to handle a malloc failure, but there's already a well-defined error report convention here, ie set errno and return NULL. We have but one caller of pg_gmtime in HEAD, and two in back branches, neither of which were troubling to check for error. Make them do so. The possible errors are sufficiently unlikely (out-of-range timestamp, and now malloc failure) that I think elog() is adequate. Back-patch to all supported branches to keep our copies of the IANA timezone code in sync. This particular change is in a stanza that already differs from upstream, so it's a wash for maintenance purposes --- but only as long as we keep the branches the same. Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-16 11:50:18 -04:00
Andres Freund	62649bad83	Correct constness of a few variables. This allows the compiler / linker to mark affected pages as read-only. There's other cases, but they're a bit more invasive, and should go through some review. These are easy. They were found with objdump -j .data -t src/backend/postgres\|awk '{print $4, $5, $6}'\|sort -r\|less Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-15 21:01:14 -07:00
Andres Freund	c5257345ef	Move TupleTableSlots boolean member into one flag variable. There's several reasons for this change: 1) It reduces the total size of TupleTableSlot / reduces alignment padding, making the commonly accessed members fit into a single cacheline (but we currently do not force proper alignment, so that's not yet guaranteed to be helpful) 2) Combining the booleans into a flag allows to combine read/writes from memory. 3) With the upcoming slot abstraction changes, it allows to have core and extended flags, in a memory efficient way. Author: Ashutosh Bapat and Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-10-15 18:23:25 -07:00
Andres Freund	9d906f1119	Move generic slot support functions from heaptuple.c into execTuples.c. heaptuple.c was never a particular good fit for slot_getattr(), slot_getsomeattrs() and slot_getmissingattrs(), but in upcoming changes slots will be made more abstract (allowing slots that contain different types of tuples), making it clearly the wrong place. Note that slot_deform_tuple() remains in it's current place, as it clearly deals with a HeapTuple. getmissingattrs() also remains, but it's less clear that that's correct - but execTuples.c wouldn't be the right place. Author: Ashutosh Bapat. Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-10-15 15:17:04 -07:00
Thomas Munro	e73ca79fc7	Move the replication lag tracker into heap memory. Andres Freund complained about the 128KB of .bss occupied by LagTracker. It's only needed in the walsender process, so allocate it in heap memory there. Author: Thomas Munro Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya%40alap3.anarazel.de	2018-10-16 11:04:41 +13:00
Tom Lane	d48da369ab	Check for stack overrun in standard_ProcessUtility(). ProcessUtility can recurse, and indeed can be driven to infinite recursion, so it ought to have a check_stack_depth() call. This covers the reported bug (portal trying to execute itself) and a bunch of other cases that could perhaps arise somewhere. Per bug #15428 from Malthe Borch. Back-patch to all supported branches. Discussion: https://postgr.es/m/15428-b3c2915ec470b033@postgresql.org	2018-10-15 14:01:38 -04:00
Peter Eisentraut	35584fd05f	Make spelling of "acknowledgment" consistent I used the preferred U.S. spelling, as we do in other cases.	2018-10-15 10:06:45 +02:00
Tom Lane	7d4a10e260	Use PlaceHolderVars within the quals of a FULL JOIN. This prevents failures in cases where we pull up a constant or var-free expression from a subquery and put it into a full join's qual. That can result in not recognizing the qual as containing a mergejoin-able or hashjoin-able condition. A PHV prevents the problem because it is still recognized as belonging to the side of the join the subquery is in. I'm not very sure about the net effect of this change on plan quality. In "typical" cases where the join keys are Vars, nothing changes. In an affected case, the PHV-wrapped expression is less likely to be seen as equal to PHV-less instances below the join, but more likely to be seen as equal to similar expressions above the join, so it may end up being a wash. In the one existing case where there's any visible change in a regression-test plan, it amounts to referencing a lower computation of a COALESCE result instead of recomputing it, which seems like a win. Given my uncertainty about that and the lack of field complaints, no back-patch, even though this is a very ancient problem. Discussion: https://postgr.es/m/32090.1539378124@sss.pgh.pa.us	2018-10-14 13:07:29 -04:00
Michael Paquier	1df21ddb19	Avoid duplicate XIDs at recovery when building initial snapshot On a primary, sets of XLOG_RUNNING_XACTS records are generated on a periodic basis to allow recovery to build the initial state of transactions for a hot standby. The set of transaction IDs is created by scanning all the entries in ProcArray. However it happens that its logic never counted on the fact that two-phase transactions finishing to prepare can put ProcArray in a state where there are two entries with the same transaction ID, one for the initial transaction which gets cleared when prepare finishes, and a second, dummy, entry to track that the transaction is still running after prepare finishes. This way ensures a continuous presence of the transaction so as callers of for example TransactionIdIsInProgress() are always able to see it as alive. So, if a XLOG_RUNNING_XACTS takes a standby snapshot while a two-phase transaction finishes to prepare, the record can finish with duplicated XIDs, which is a state expected by design. If this record gets applied on a standby to initial its recovery state, then it would simply fail, so the odds of facing this failure are very low in practice. It would be tempting to change the generation of XLOG_RUNNING_XACTS so as duplicates are removed on the source, but this requires to hold on ProcArrayLock for longer and this would impact all workloads, particularly those using heavily two-phase transactions. XLOG_RUNNING_XACTS is also actually used only to initialize the standby state at recovery, so instead the solution is taken to discard duplicates when applying the initial snapshot. Diagnosed-by: Konstantin Knizhnik Author: Michael Paquier Discussion: https://postgr.es/m/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru Backpatch-through: 9.3	2018-10-14 22:23:21 +09:00
Tom Lane	13cd7209f7	Simplify use of AllocSetContextCreate() wrapper macro. We can allow this macro to accept either abbreviated or non-abbreviated allocation parameters by making use of __VA_ARGS__. As noted by Andres Freund, it's unlikely that any compiler would have __builtin_constant_p but not __VA_ARGS__, so this gives up little or no error checking, and it avoids a minor but annoying API break for extensions. With this change, there is no reason for anybody to call AllocSetContextCreateExtended directly, so in HEAD I renamed it to AllocSetContextCreateInternal. It's probably too late for an ABI break like that in 11, though. Discussion: https://postgr.es/m/20181012170355.bhxi273skjt6sag4@alap3.anarazel.de	2018-10-12 14:26:56 -04:00
Alvaro Herrera	c7d43c4d8a	Correct attach/detach logic for FKs in partitions There was no code to handle foreign key constraints on partitioned tables in the case of ALTER TABLE DETACH; and if you happened to ATTACH a partition that already had an equivalent constraint, that one was ignored and a new constraint was created. Adding this to the fact that foreign key cloning reuses the constraint name on the partition instead of generating a new name (as it probably should, to cater to SQL standard rules about constraint naming within schemas), the result was a pretty poor user experience -- the most visible failure was that just detaching a partition and re-attaching it failed with an error such as ERROR: duplicate key value violates unique constraint "pg_constraint_conrelid_contypid_conname_index" DETAIL: Key (conrelid, contypid, conname)=(26702, 0, test_result_asset_id_fkey) already exists. because it would try to create an identically-named constraint in the partition. To make matters worse, if you tried to drop the constraint in the now-independent partition, that would fail because the constraint was still seen as dependent on the constraint in its former parent partitioned table: ERROR: cannot drop inherited constraint "test_result_asset_id_fkey" of relation "test_result_cbsystem_0001_0050_monthly_2018_09" This fix attacks the problem from two angles: first, when the partition is detached, the constraint is also marked as independent, so the drop now works. Second, when the partition is re-attached, we scan existing constraints searching for one matching the FK in the parent, and if one exists, we link that one to the parent constraint. So we don't end up with a duplicate -- and better yet, we don't need to scan the referenced table to verify that the constraint holds. To implement this I made a small change to previously planner-only struct ForeignKeyCacheInfo to contain the constraint OID; also relcache now maintains the list of FKs for partitioned tables too. Backpatch to 11. Reported-by: Michael Vitale (bug #15425) Discussion: https://postgr.es/m/15425-2dbc9d2aa999f816@postgresql.org	2018-10-12 12:37:37 -03:00
Andres Freund	cda6a8d01d	Remove deprecated abstime, reltime, tinterval datatypes. These types have been deprecated for a long time. Catversion bump, for obvious reasons. Author: Andres Freund Discussion: https://postgr.es/m/20181009192237.34wjp3nmw7oynmmr@alap3.anarazel.de https://postgr.es/m/20171213080506.cwjkpcz3bkk6yz2u@alap3.anarazel.de https://postgr.es/m/25615.1513115237@sss.pgh.pa.us	2018-10-11 11:59:15 -07:00
Andres Freund	86896be60e	Move timeofday() implementation out of nabstime.c. nabstime.c is about to be removed, but timeofday() isn't related to the rest of the functionality therein, and some find it useful. Move to timestamp.c. Discussion: https://postgr.es/m/20181009192237.34wjp3nmw7oynmmr@alap3.anarazel.de https://postgr.es/m/20180928223240.kgwc4czzzekrpsid%40alap3.anarazel.de	2018-10-11 11:30:59 -07:00
Andres Freund	e9edc1ba0b	Fix logical decoding error when system table w/ toast is repeatedly rewritten. Repeatedly rewriting a mapped catalog table with VACUUM FULL or CLUSTER could cause logical decoding to fail with: ERROR, "could not map filenode \"%s\" to relation OID" To trigger the problem the rewritten catalog had to have live tuples with toasted columns. The problem was triggered as during catalog table rewrites the heap_insert() check that prevents logical decoding information to be emitted for system catalogs, failed to treat the new heap's toast table as a system catalog (because the new heap is not recognized as a catalog table via RelationIsLogicallyLogged()). The relmapper, in contrast to the normal catalog contents, does not contain historical information. After a single rewrite of a mapped table the new relation is known to the relmapper, but if the table is rewritten twice before logical decoding occurs, the relfilenode cannot be mapped to a relation anymore. Which then leads us to error out. This only happens for toast tables, because the main table contents aren't re-inserted with heap_insert(). The fix is simple, add a new heap_insert() flag that prevents logical decoding information from being emitted, and accept during decoding that there might not be tuple data for toast tables. Unfortunately that does not fix pre-existing logical decoding errors. Doing so would require not throwing an error when a filenode cannot be mapped to a relation during decoding, and that seems too likely to hide bugs. If it's crucial to fix decoding for an existing slot, temporarily changing the ERROR in ReorderBufferCommit() to a WARNING appears to be the best fix. Author: Andres Freund Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de Backpatch: 9.4-, where logical decoding was introduced	2018-10-10 13:53:02 -07:00
Peter Eisentraut	f82d4d666f	Slightly correct context check for event triggers The previous check for a "complete query" omitted the new PROCESS_UTILITY_QUERY_NONATOMIC value. This didn't actually make a difference in practice, because only CALL and SET from PL/pgSQL run in this state, but it's more correct to include it anyway. Discussion: https://www.postgresql.org/message-id/4566041d-2567-74d2-d135-19ff6a20fe51%402ndquadrant.com	2018-10-10 22:41:12 +02:00
Peter Eisentraut	f8c10f616f	Turn transaction_isolation into GUC enum It was previously a string setting that was converted into an enum by custom code, but using the GUC enum facility seems much simpler and doesn't change any functionality, except that set transaction_isolation='default'; no longer works, but that was never documented and doesn't work with any other transaction characteristics. (Note that this is not the same as RESET or SET TO DEFAULT, which still work.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/457db615-e84c-4838-310e-43841eb806e5@iki.fi	2018-10-09 21:26:00 +02:00
Michael Paquier	c481016201	Add pg_ls_archive_statusdir function This function lists the contents of the WAL archive status directory, and is intended to be used by monitoring tools. Unlike pg_ls_dir(), access to it can be granted to non-superusers so that those monitoring tools can observe the principle of least privilege. Access is also given by default to members of pg_monitor. Author: Christoph Moench-Tegeder Reviewed-by: Aya Iwata Discussion: https://postgr.es/m/20180930205920.GA64534@elch.exwg.net	2018-10-09 22:29:09 +09:00
Thomas Munro	212fab9926	Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux). Originally committed as `15bc038f` (plus some follow-ups), this was reverted in `28e07270` due to a problem discovered in parallel workers. This new version corrects that problem by sending the list of uncommitted enum values to parallel workers. Here follows the original commit message describing the change: To prevent possibly breaking indexes on enum columns, we must keep uncommitted enum values from getting stored in tables, unless we can be sure that any such column is new in the current transaction. Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE from being executed at all in a transaction block, unless the target enum type had been created in the current transaction. This patch removes that restriction, and instead insists that an uncommitted enum value can't be referenced unless it belongs to an enum type created in the same transaction as the value. Per discussion, this should be a bit less onerous. It does require each function that could possibly return a new enum value to SQL operations to check this restriction, but there aren't so many of those that this seems unmaintainable. Author: Andrew Dunstan and Tom Lane, with parallel query fix by Thomas Munro Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEepm%3D0Ei7g6PaNTbcmAh9tCRahQrk%3Dr5ZWLD-jr7hXweYX3yg%40mail.gmail.com Discussion: https://postgr.es/m/4075.1459088427%40sss.pgh.pa.us	2018-10-09 12:51:01 +13:00
Tom Lane	82ff0cc91d	Advance transaction timestamp for intra-procedure transactions. Per discussion, this behavior seems less astonishing than not doing so. Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/20180920234040.GC29981@momjian.us	2018-10-08 16:16:36 -04:00
Tom Lane	6eb3eb577d	Improve snprintf.c's handling of NaN, Infinity, and minus zero. Up to now, float4out/float8out handled NaN and Infinity cases explicitly, and invoked psprintf only for ordinary float values. This was done because platform implementations of snprintf produce varying representations of these special cases. But now that we use snprintf.c always, it's better to give it the responsibility to produce a uniform representation of these cases, so that we have uniformity across the board not only in float4out/float8out. Hence, move that work into fmtfloat(). Also, teach fmtfloat() to recognize IEEE minus zero and handle it correctly. The previous coding worked only accidentally, and would fail for e.g. "%+f" format (it'd print "+-0.00000"). Now that we're using snprintf.c everywhere, it's not acceptable for it to do weird things in corner cases. (This incidentally avoids a portability problem we've seen on some really ancient platforms, that native sprintf does the wrong thing with minus zero.) Also, introduce a new entry point in snprintf.c to allow float[48]out to bypass the work of interpreting a well-known format spec, as well as bypassing the overhead of the psprintf layer. I modeled this API loosely on strfromd(). In my testing, this brings float[48]out back to approximately the same speed they had when using native snprintf, fixing one of the main performance issues caused by using snprintf.c. (There is some talk of more aggressive work to improve the speed of floating-point output conversion, but these changes seem to provide a better starting point for such work anyway.) Getting rid of the previous ad-hoc hack for Infinity/NaN in fmtfloat() allows removing <ctype.h> from snprintf.c's #includes. I also removed a few other #includes that I think are historical, though the buildfarm may expose that as wrong. Discussion: https://postgr.es/m/13178.1538794717@sss.pgh.pa.us	2018-10-08 12:19:20 -04:00
Tom Lane	f9eb7c14b0	Avoid O(N^2) cost in ExecFindRowMark(). If there are many ExecRowMark structs, we spent O(N^2) time in ExecFindRowMark during executor startup. Once upon a time this was not of great concern, but the addition of native partitioning has squeezed out enough other costs that this can become the dominant overhead in some use-cases for tables with many partitions. To fix, simply replace that List data structure with an array. This adds a little bit of cost to execCurrentOf(), but not much, and anyway that code path is neither of large importance nor very efficient now. If we ever decide it is a bottleneck, constructing a hash table for lookup-by-tableoid would likely be the thing to do. Per complaint from Amit Langote, though this is different from his fix proposal. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-08 10:41:34 -04:00
Alvaro Herrera	eee01d606e	Silence compiler warning in Assert() gcc 6.3 does not whine about this mistake I made in `39808e8868` but evidently lots of other compilers do, according to Michael Paquier, Peter Eisentraut, Arthur Zakirov, Tomas Vondra. Discussion: too many to list	2018-10-08 10:37:21 -03:00
Peter Eisentraut	634b4b79cb	Track procedure calls in pg_stat_user_functions This was forgotten when procedures were implemented. Reported-by: Lukas Fittl <lukas@fittl.com>	2018-10-08 11:22:53 +02:00
Michael Paquier	9c2a970d1f	Improve two error messages related to foreign keys on partitioned tables Error messages for creating a foreign key on a partitioned table using ONLY or NOT VALID were wrong in mentioning the objects they worked on. This commit adds on the way some regression tests missing for those cases. Author: Laurenz Albe Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/c11c05810a9ed65e9b2c817a9ef442275a32fe80.camel@cybertec.at	2018-10-08 17:56:13 +09:00
Tom Lane	52ed730d51	Remove some unnecessary fields from Plan trees. In the wake of commit `f2343653f`, we no longer need some fields that were used before to control executor lock acquisitions: * PlannedStmt.nonleafResultRelations can go away entirely. * partitioned_rels can go away from Append, MergeAppend, and ModifyTable. However, ModifyTable still needs to know the RT index of the partition root table if any, which was formerly kept in the first entry of that list. Add a new field "rootRelation" to remember that. rootRelation is partly redundant with nominalRelation, in that if it's set it will have the same value as nominalRelation. However, the latter field has a different purpose so it seems best to keep them distinct. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-07 14:33:17 -04:00
Alvaro Herrera	39808e8868	Fix catalog insertion order for ATTACH PARTITION Commit `2fbdf1b38b` changed the order in which we inserted catalog rows when creating partitions, so that we could remove an unsightly hack required for untimely relcache invalidations. However, that commit only changed the ordering for CREATE TABLE PARTITION OF, and left ALTER TABLE ATTACH PARTITION unchanged, so the latter can be affected when catalog invalidations occur, for instance when the partition key involves an SQL function. Reported-by: Rajkumar Raghuwanshi Author: Amit Langote Reviewed-by: Michaël Paquier Discussion: https://postgr.es/m/CAKcux6=nTz9KSfTr_6Z2mpzLJ_09JN-rK6=dWic6gGyTSWueyQ@mail.gmail.com	2018-10-06 22:13:19 -03:00
Alvaro Herrera	ad08006ba0	Fix event triggers for partitioned tables Index DDL cascading on partitioned tables introduced a way for ALTER TABLE to be called reentrantly. This caused an an important deficiency in event trigger support to be exposed: on exiting the reentrant call, the alter table state object was clobbered, causing a crash when the outer alter table tries to finalize its processing. Fix the crash by creating a stack of event trigger state objects. There are still ways to cause things to misbehave (and probably other crashers) with more elaborate tricks, but at least it now doesn't crash in the obvious scenario. Backpatch to 9.5, where DDL deparsing of event triggers was introduced. Reported-by: Marco Slot Authors: Michaël Paquier, Álvaro Herrera Discussion: https://postgr.es/m/CANNhMLCpi+HQ7M36uPfGbJZEQLyTy7XvX=5EFkpR-b1bo0uJew@mail.gmail.com	2018-10-06 19:17:46 -03:00
Tom Lane	29ef2b310d	Restore sane locking behavior during parallel query. Commit `9a3cebeaa` changed things so that parallel workers didn't obtain any lock of their own on tables they access. That was clearly a bad idea, but I'd mistakenly supposed that it was the intended end result of the series of patches for simplifying the executor's lock management. Undo that change in relation_open(), and adjust ExecOpenScanRelation() so that it gets the correct lock if inside a parallel worker. In passing, clean up some more obsolete comments about when locks are acquired. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-06 15:49:37 -04:00
Tom Lane	f2343653f5	Remove more redundant relation locking during executor startup. We already have appropriate locks on every relation listed in the query's rangetable before we reach the executor. Take the next step in exploiting that knowledge by removing code that worries about taking locks on non-leaf result relations in a partitioned table. In particular, get rid of ExecLockNonLeafAppendTables and a stanza in InitPlan that asserts we already have locks on certain such tables. In passing, clean up some now-obsolete comments in InitPlan. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-06 15:12:51 -04:00
Tom Lane	0209f0285d	Don't use is_infinite() where isinf() will do. Places that aren't testing for sign should not use the more expensive function; it's just wasteful, not to mention being a cognitive load for readers who may know what isinf() is but not is_infinite(). As things stand, we actually don't need is_infinite() anyplace except float4out/float8out, which means it could potentially go away altogether after the changes I proposed in <13178.1538794717@sss.pgh.pa.us>.	2018-10-06 13:18:38 -04:00
Tom Lane	07ee62ce9e	Propagate xactStartTimestamp and stmtStartTimestamp to parallel workers. Previously, a worker process would establish values for these based on its own start time. In v10 and up, this can trivially be shown to cause misbehavior of transaction_timestamp(), timestamp_in(), and related functions which are (perhaps unwisely?) marked parallel-safe. It seems likely that other behaviors might diverge from what happens in the parent as well. It's not as trivial to demonstrate problems in 9.6 or 9.5, but I'm sure it's still possible, so back-patch to all branches containing parallel worker infrastructure. In HEAD only, mark now() and statement_timestamp() as parallel-safe (other affected functions already were). While in theory we could still squeeze that change into v11, it doesn't seem important enough to force a last-minute catversion bump. Konstantin Knizhnik, whacked around a bit by me Discussion: https://postgr.es/m/6406dbd2-5d37-4cb6-6eb2-9c44172c7e7c@postgrespro.ru	2018-10-06 12:00:09 -04:00
Dean Rasheed	e954a727f0	Improve the accuracy of floating point statistical aggregates. When computing statistical aggregates like variance, the common schoolbook algorithm which computes the sum of the squares of the values and subtracts the square of the mean can lead to a large loss of precision when using floating point arithmetic, because the difference between the two terms is often very small relative to the terms themselves. To avoid this, re-work these aggregates to use the Youngs-Cramer algorithm, which is a proven, numerically stable algorithm that directly aggregates the sum of the squares of the differences of the values from the mean in a single pass over the data. While at it, improve the test coverage to test the aggregate combine functions used during parallel aggregation. Per report and suggested algorithm from Erich Schubert. Patch by me, reviewed by Madeleine Thompson. Discussion: https://postgr.es/m/153313051300.1397.9594490737341194671@wrigleys.postgresql.org	2018-10-06 11:20:09 +01:00
Michael Paquier	38921d1416	Assign constraint name when cloning FK definition for partitions This is for example used when attaching a partition to a partitioned table which includes foreign keys, and in this case the constraint name has been missing in the data cloned. This could lead to hard crashes, as when validating the foreign key constraint, the constraint name is always expected. Particularly, when using log_min_messages >= DEBUG1, a log message would be generated with this unassigned constraint name, leading to an assertion failure on HEAD. While on it, rename a variable in ATExecAttachPartition which was declared twice with the same name. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20181005042236.GG1629@paquier.xyz Backpatch-through: 11	2018-10-06 14:59:36 +09:00
Tom Lane	c87cb5f7a6	Allow btree comparison functions to return INT_MIN. Historically we forbade datatype-specific comparison functions from returning INT_MIN, so that it would be safe to invert the sort order just by negating the comparison result. However, this was never really safe for comparison functions that directly return the result of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction on those library functions. Buildfarm results show that at least on recent Linux on s390x, memcmp() actually does return INT_MIN sometimes, causing sort failures. The agreed-on answer is to remove this restriction and fix relevant call sites to not make such an assumption; code such as "res = -res" should be replaced by "INVERT_COMPARE_RESULT(res)". The same is needed in a few places that just directly negated the result of memcmp or strcmp. To help find places having this problem, I've also added a compile option to nbtcompare.c that causes some of the commonly used comparators to return INT_MIN/INT_MAX instead of their usual -1/+1. It'd likely be a good idea to have at least one buildfarm member running with "-DSTRESS_SORT_INT_MIN". That's far from a complete test of course, but it should help to prevent fresh introductions of such bugs. This is a longstanding portability hazard, so back-patch to all supported branches. Discussion: https://postgr.es/m/20180928185215.ffoq2xrq5d3pafna@alap3.anarazel.de	2018-10-05 16:01:29 -04:00
Michael Paquier	9cd92d1a33	Add pg_ls_tmpdir function This lists the contents of a temporary directory associated to a given tablespace, useful to get information about on-disk consumption caused by temporary files used by a session query. By default, pg_default is scanned, and a tablespace can be specified as argument. This function is intended to be used by monitoring tools, and, unlike pg_ls_dir(), access to them can be granted to non-superusers so that those monitoring tools can observe the principle of least privilege. Access is also given by default to members of pg_monitor. Author: Nathan Bossart Reviewed-by: Laurenz Albe Discussion: https://postgr.es/m/92F458A2-6459-44B8-A7F2-2ADD3225046A@amazon.com	2018-10-05 09:21:48 +09:00
Tom Lane	d73f4c74dd	In the executor, use an array of pointers to access the rangetable. Instead of doing a lot of list_nth() accesses to es_range_table, create a flattened pointer array during executor startup and index into that to get at individual RangeTblEntrys. This eliminates one source of O(N^2) behavior with lots of partitions. (I'm not exactly convinced that it's the most important source, but it's an easy one to fix.) Amit Langote and David Rowley Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 15:48:17 -04:00
Tom Lane	9ddef36278	Centralize executor's opening/closing of Relations for rangetable entries. Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan. This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.) To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID. Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 14:03:42 -04:00
Alvaro Herrera	fb9e93a2c5	Fix duplicate primary keys in partitions When using the CREATE TABLE .. PARTITION OF syntax, it's possible to cause a partition to get two primary keys if the parent already has one. Tighten the check to disallow that. Reported-by: Rajkumar Raghuwanshi Author: Amul Sul Discussion: https://postgr.es/m/CAKcux6=OnSV3-qd8Gb6W=KPPwcCz6Fe_O_MQYjTa24__Xn8XxA@mail.gmail.com	2018-10-04 11:40:36 -03:00
Michael Paquier	09921f397b	Refactor user-facing SQL functions signalling backends This moves the system administration functions for signalling backends from backend/utils/adt/misc.c into a separate file dedicated to backend signalling. No new functionality is introduced in this commit. Author: Daniel Gustafsson Reviewed-by: Michael Paquier, Álvaro Herrera Discussion: https://postgr.es/m/C2C7C3EC-CC5F-44B6-9C78-637C88BD7D14@yesql.se	2018-10-04 18:27:25 +09:00
Michael Paquier	803b1301e8	Add option SKIP_LOCKED to VACUUM and ANALYZE When specified, this option allows VACUUM to skip the work on a relation if there is a conflicting lock on it when trying to open it at the beginning of its processing. Similarly to autovacuum, this comes with a couple of limitations while the relation is processed which can cause the process to still block: - when opening the relation indexes. - when acquiring row samples for table inheritance trees, partition trees or certain types of foreign tables, and that a lock is taken on some leaves of such trees. Author: Nathan Bossart Reviewed-by: Michael Paquier, Andres Freund, Masahiko Sawada Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com Discussion: https://postgr.es/m/20171201160907.27110.74730@wrigleys.postgresql.org	2018-10-04 09:00:33 +09:00
Tom Lane	9a3cebeaa7	Change executor to just Assert that table locks were already obtained. Instead of locking tables during executor startup, just Assert that suitable locks were obtained already during the parse/plan pipeline (or re-obtained by the plan cache). This must be so, else we have a hazard that concurrent DDL has invalidated the plan. This is pretty inefficient as well as undercommented, but it's all going to go away shortly, so I didn't try hard. This commit is just another attempt to use the buildfarm to see if we've missed anything in the plan to simplify the executor's table management. Note that the change needed here in relation_open() exposes that parallel workers now really are accessing tables without holding any lock of their own, whereas they were not doing that before this commit. This does not give me a warm fuzzy feeling about that aspect of parallel query; it does not seem like a good design, and we now know that it's had exactly no actual testing. I think that we should modify parallel query so that that change can be reverted. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-03 16:05:12 -04:00
Andres Freund	c03c1449c0	Fix issues around EXPLAIN with JIT. I (Andres) was more than a bit hasty in committing `33001fd7a7` after last minute changes, leading to a number of problems (jit output was only shown for JIT in parallel workers, and just EXPLAIN without ANALYZE didn't work). Lukas luckily found these issues quickly. Instead of combining instrumentation in in standard_ExecutorEnd(), do so on demand in the new ExplainPrintJITSummary(). Also update a documentation example of the JIT output, changed in `52050ad8eb`. Author: Lukas Fittl, with minor changes by me Discussion: https://postgr.es/m/CAP53PkxmgJht69pabxBXJBM+0oc6kf3KHMborLP7H2ouJ0CCtQ@mail.gmail.com Backpatch: 11, where JIT compilation was introduced	2018-10-03 12:48:37 -07:00
Amit Kapila	9bc9f72b28	MAXALIGN the target address where we store flattened value. The API (EOH_flatten_into) that flattens the expanded value representation expects the target address to be maxaligned. All it's usage adhere to that principle except when serializing datums for parallel query. Fix that usage. Diagnosed-by: Tom Lane Author: Tom Lane and Amit Kapila Backpatch-through: 9.6 Discussion: https://postgr.es/m/11629.1536550032@sss.pgh.pa.us	2018-10-03 09:15:03 +05:30
Tom Lane	6e35939feb	Change rewriter/planner/executor/plancache to depend on RTE rellockmode. Instead of recomputing the required lock levels in all these places, just use what commit `fdba460a2` made the parser store in the RTE fields. This already simplifies the code measurably in these places, and follow-on changes will remove a bunch of no-longer-needed infrastructure. In a few cases, this change causes us to acquire a higher lock level than we did before. This is OK primarily because said higher lock level should've been acquired already at query parse time; thus, we're saving a useless extra trip through the shared lock manager to acquire a lesser lock alongside the original lock. The only known exception to this is that re-execution of a previously planned SELECT FOR UPDATE/SHARE query, for a table that uses ROW_MARK_REFERENCE or ROW_MARK_COPY methods, might have gotten only AccessShareLock before. Now it will get RowShareLock like the first execution did, which seems fine. While there's more to do, push it in this state anyway, to let the buildfarm help verify that nothing bad happened. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-02 14:43:09 -04:00
Andres Freund	cc2905e963	Use slots more widely in tuple mapping code and make naming more consistent. It's inefficient to use a single slot for mapping between tuple descriptors for multiple tuples, as previously done when using ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors change for every tuple. Previously we also, via ConvertPartitionTupleSlot(), built new tuples after the mapping even in cases where we, immediately afterwards, access individual columns again. Refactor the code so one slot, on demand, is used for each partition. That avoids having to change the descriptor (and allows to use the more efficient "fixed" tuple slots). Then use slot->slot mapping, to avoid unnecessarily forming a tuple. As the naming between the tuple and slot mapping functions wasn't consistent, rename them to execute_attr_map_{tuple,slot}. It's likely that we'll also rename convert_tuples_by_* to denote that these functions "only" build a map, but that's left for later. Author: Amit Khandekar and Amit Langote, editorialized by me Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund Discussion: https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp Backpatch: -	2018-10-02 11:14:26 -07:00
Tom Lane	3d0f68dd30	Fix corner-case failures in has_foo_privilege() family of functions. The variants of these functions that take numeric inputs (OIDs or column numbers) are supposed to return NULL rather than failing on bad input; this rule reduces problems with snapshot skew when queries apply the functions to all rows of a catalog. has_column_privilege() had careless handling of the case where the table OID didn't exist. You might get something like this: select has_column_privilege(9999,'nosuchcol','select'); ERROR: column "nosuchcol" of relation "(null)" does not exist or you might get a crash, depending on the platform's printf's response to a null string pointer. In addition, while applying the column-number variant to a dropped column returned NULL as desired, applying the column-name variant did not: select has_column_privilege('mytable','........pg.dropped.2........','select'); ERROR: column "........pg.dropped.2........" of relation "mytable" does not exist It seems better to make this case return NULL as well. Also, the OID-accepting variants of has_foreign_data_wrapper_privilege, has_server_privilege, and has_tablespace_privilege didn't follow the principle of returning NULL for nonexistent OIDs. Superusers got TRUE, everybody else got an error. Per investigation of Jaime Casanova's report of a new crash in HEAD. These behaviors have been like this for a long time, so back-patch to all supported branches. Patch by me; thanks to Stephen Frost for discussion and review Discussion: https://postgr.es/m/CAJGNTeP=-6Gyqq5TN9OvYEydi7Fv1oGyYj650LGTnW44oAzYCg@mail.gmail.com	2018-10-02 11:54:12 -04:00
Michael Paquier	e3a25ab9ea	Refactor relation opening for VACUUM and ANALYZE VACUUM and ANALYZE share similar logic when it comes to opening a relation to work on in terms of how the relation is opened, in which order locks are tried and how logs should be generated when something does not work as expected. This commit refactors things so as both use the same code path to handle the way a relation is opened, so as the integration of new options becomes easier. Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/20180927075152.GT1659@paquier.xyz	2018-10-02 08:53:38 +09:00
Peter Eisentraut	cf3dfea45b	Change PROCEDURE to FUNCTION in CREATE EVENT TRIGGER syntax This was claimed to have been done in `0a63f996e0`, but that actually only changed the documentation and not the grammar. (That commit did fully change it for CREATE TRIGGER.)	2018-10-01 23:02:55 +02:00
Tom Lane	b04aeb0a05	Add assertions that we hold some relevant lock during relation open. Opening a relation with no lock at all is unsafe; there's no guarantee that we'll see a consistent state of the relevant catalog entries. While use of MVCC scans to read the catalogs partially addresses that complaint, it's still possible to switch to a new catalog snapshot partway through loading the relcache entry. Moreover, whether or not you trust the reasoning behind sometimes using less than AccessExclusiveLock for ALTER TABLE, that reasoning is certainly not valid if concurrent users of the table don't hold a lock corresponding to the operation they want to perform. Hence, add some assertion-build-only checks that require any caller of relation_open(x, NoLock) to hold at least AccessShareLock. This isn't a full solution, since we can't verify that the lock level is semantically appropriate for the action --- but it's definitely of some use, because it's already caught two bugs. We can also assert that callers of addRangeTableEntryForRelation() hold at least the lock level specified for the new RTE. Amit Langote and Tom Lane Discussion: https://postgr.es/m/16565.1538327894@sss.pgh.pa.us	2018-10-01 12:43:21 -04:00
Tom Lane	e27453bd83	Fix ALTER COLUMN TYPE to not open a relation without any lock. If the column being modified is referenced by a foreign key constraint of another table, ALTER TABLE would open the other table (to re-parse the constraint's definition) without having first obtained a lock on it. This was evidently intentional, but that doesn't mean it's really safe. It's especially not safe in 9.3, which pre-dates use of MVCC scans for catalog reads, but even in current releases it doesn't seem like a good idea. We know we'll need AccessExclusiveLock shortly to drop the obsoleted constraint, so just get that a little sooner to close the hole. Per testing with a patch that complains if we open a relation without holding any lock on it. I don't plan to back-patch that patch, but we should close the holes it identifies in all supported branches. Discussion: https://postgr.es/m/2038.1538335244@sss.pgh.pa.us	2018-10-01 11:39:13 -04:00
Tom Lane	fdba460a26	Create an RTE field to record the query's lock mode for each relation. Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-09-30 13:55:51 -04:00
Stephen Frost	8bddc86400	Add application_name to connection authorized msg The connection authorized message has quite a bit of useful information in it, but didn't include the application_name (when provided), so let's add that as it can be very useful. Note that at the point where we're emitting the connection authorized message, we haven't processed GUCs, so it's not possible to get this by using log_line_prefix (which pulls from the GUC). There's also something to be said for having this included in the connection authorized message and then not needing to repeat it for every line, as having it in log_line_prefix would do. The GUC cleans the application name to pure-ascii, so do that here too, but pull out the logic for cleaning up a string into its own function in common and re-use it from those places, and check_cluster_name which was doing the same thing. Author: Don Seiler <don@seiler.us> Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com	2018-09-28 19:04:50 -04:00
Tom Lane	2b04dfc472	Improve error reporting for unsupported effective_io_concurrency setting. Give a specific error complaining about lack of posix_fadvise() when someone tries to set effective_io_concurrency > 0 on platforms without that. This probably isn't worth extensive back-patching, but I (tgl) felt cramming it into v11 was reasonable. James Robinson Discussion: https://postgr.es/m/153771876450.14994.560017943128223619@wrigleys.postgresql.org Discussion: https://postgr.es/m/A3942987-5BC7-4F05-B54D-2A0EC2914B33@jlr-photo.com	2018-09-28 16:12:13 -04:00
Amit Kapila	a86bf6057e	Fix assertion failure when updating full_page_writes for checkpointer. When the checkpointer receives a SIGHUP signal to update its configuration, it may need to update the shared memory for full_page_writes and need to write a WAL record for it. Now, it is quite possible that the XLOG machinery has not been initialized by that time and it will lead to assertion failure while doing that. Fix is to allow the initialization of the XLOG machinery outside critical section. This bug has been introduced by the commit `2c03216d83` which added the XLOG machinery initialization in RecoveryInProgress code path. Reported-by: Dilip Kumar Author: Dilip Kumar Reviewed-by: Michael Paquier and Amit Kapila Backpatch-through: 9.5 Discussion: https://postgr.es/m/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC=C2whnzBM8TAcv=stckYUw@mail.gmail.com	2018-09-28 16:40:04 +05:30
Michael Paquier	78ea8b5daa	Fix WAL recycling on standbys depending on archive_mode A restart point or a checkpoint recycling WAL segments treats segments marked with neither ".done" (archiving is done) or ".ready" (segment is ready to be archived) in archive_status the same way for archive_mode being "on" or "always". While for a primary this is fine, a standby running a restart point with archive_mode = on would try to mark such a segment as ready for archiving, which is something that will never happen except after the standby is promoted. Note that this problem applies only to WAL segments coming from the local pg_wal the first time archive recovery is run. Segments part of a self-contained base backup are the most common case where this could happen, however even in this case normally the .done markers would be most likely part of the backup. Segments recovered from an archive are marked as .ready or .done by the startup process, and segments finished streaming are marked as such by the WAL receiver, so they are handled already. Reported-by: Haruka Takatsuka Author: Michael Paquier Discussion: https://postgr.es/m/15402-a453c90ed4cf88b2@postgresql.org Backpatch-through: 9.5, where archive_mode = always has been added.	2018-09-28 11:54:38 +09:00
Tom Lane	aaf10f32a3	Fix assorted bugs in pg_get_partition_constraintdef(). It failed if passed a nonexistent relation OID, or one that was a non-heap relation, because of blindly applying heap_open to a user-supplied OID. This is not OK behavior for a SQL-exposed function; we have a project policy that we should return NULL in such cases. Moreover, since pg_get_partition_constraintdef ought now to work on indexes, restricting it to heaps is flat wrong anyway. The underlying function generate_partition_qual() wasn't on board with indexes having partition quals either, nor for that matter with rels having relispartition set but yet null relpartbound. (One wonders whether the person who wrote the function comment blocks claiming that these functions allow a missing relpartbound had ever tested it.) Fix by testing relispartition before opening the rel, and by using relation_open not heap_open. (If any other relkinds ever grow the ability to have relispartition set, the code will work with them automatically.) Also, don't reject null relpartbound in generate_partition_qual. Back-patch to v11, and all but the null-relpartbound change to v10. (It's not really necessary to change generate_partition_qual at all in v10, but I thought s/heap_open/relation_open/ would be a good idea anyway just to keep the code in sync with later branches.) Per report from Justin Pryzby. Discussion: https://postgr.es/m/20180927200020.GJ776@telsasoft.com	2018-09-27 18:15:17 -04:00
Alexander Korotkov	4ec90f53f1	Minor formatting cleanup for `2a6368343f`	2018-09-27 23:29:50 +03:00
Alexander Korotkov	0f64595894	Remove extra usage of BoxPGetDatum() macro Author: Mark Dilger Discussion: https://postgr.es/m/B2AEFCD0-836D-4654-9D59-3DF616E0A6F3%40gmail.com	2018-09-27 23:25:22 +03:00
Andres Freund	27e082b0c6	Clean up in the wake of TupleDescGetSlot() removal / `10763358c3`. The previous commit wasn't careful enough to remove all traces of TupleDescGetSlot(). Besides fixing the oversight of not removing TupleDescGetSlot()'s declaration, this also removes FuncCallContext->slot. That was documented to be for use in combination with TupleDescGetSlot(), a cursory search over extensions finds no users, and there doesn't seem to be convincing reasons to keep it around. If we later in the v12 release cycle find users, we can re-consider this part of the commit. Reported-By: Michael Paquier Discussion: https://postgr.es/m/20180926000413.GC1659@paquier.xyz	2018-09-27 11:38:11 -07:00
Michael Paquier	ba16aade33	Switch flags tracking pending interrupts to sig_atomic_t Those previously used bool, which should be safe on any modern platforms, however the C standard is clear that it is better to use sig_atomic_t for variables manipulated in signal handlers. This commit adds at the same time PGDLLIMPORT to ClientConnectionLost. Author: Michael Paquier Reviewed-by: Tom Lane, Chris Travers, Andres Freund Discussion: https://postgr.es/m/20180925011311.GD1354@paquier.xyz	2018-09-27 07:47:20 +09:00
Peter Eisentraut	0320ddaf3c	Recurse to sequences on ownership change for all relkinds When a table ownership is changed, we must apply that also to any owned sequences. (Otherwise, it would result in a situation that cannot be restored, because linked sequences must have the same owner as the table.) But this was previously only applied to regular tables and materialized views. But it should also apply to at least foreign tables. This patch removes the relkind check altogether, because it doesn't save very much and just introduces the possibility of similar omissions. Bug: #15238 Reported-by: Christoph Berg <christoph.berg@credativ.de>	2018-09-26 20:19:15 +02:00
Tom Lane	d6c55de1f9	Implement %m in src/port/snprintf.c, and teach elog.c to rely on that. I started out with the idea that we needed to detect use of %m format specs in contexts other than elog/ereport calls, because we couldn't rely on that working in *printf calls. But a better answer is to fix things so that it does work. Now that we're using snprintf.c all the time, we can implement %m in that and we've fixed the problem. This requires also adjusting our various printf-wrapping functions so that they ensure "errno" is preserved when they call snprintf.c. Remove elog.c's handmade implementation of %m, and let it rely on snprintf to support the feature. That should provide some performance gain, though I've not attempted to measure it. There are a lot of places where we could now simplify 'printf("%s", strerror(errno))' into 'printf("%m")', but I'm not in any big hurry to make that happen. Patch by me, reviewed by Michael Paquier Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us	2018-09-26 13:31:56 -04:00
Tom Lane	26e9d4d4ef	Convert elog.c's useful_strerror() into a globally-used strerror wrapper. elog.c has long had a private strerror wrapper that handles assorted possible failures or deficiencies of the platform's strerror. On Windows, it also knows how to translate Winsock error codes, which the native strerror does not. Move all this code into src/port/strerror.c and define strerror() as a macro that invokes it, so that both our frontend and backend code will have all of this behavior. I believe this constitutes an actual bug fix on Windows, since AFAICS our frontend code did not report Winsock error codes properly before this. However, the main point is to lay the groundwork for implementing %m in src/port/snprintf.c: the behavior we want %m to have is this one, not the native strerror's. Note that this throws away the prior use of src/port/strerror.c, which was to implement strerror() on platforms lacking it. That's been dead code for nigh twenty years now, since strerror() was already required by C89. We should likewise cause strerror_r to use this behavior, but I'll tackle that separately. Patch by me, reviewed by Michael Paquier Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us	2018-09-26 11:06:42 -04:00
Peter Eisentraut	a49ceda6a0	Update dummy CREATE ASSERTION grammar While we are probably still far away from fully implementing assertions, all patch proposals appear to take issue with the existing dummy grammar CREATE/DROP ASSERTION productions, so update those a little bit. Rename the rule, use any_name instead of name, and remove some unused code. Also remove the production for DROP ASSERTION, since that would most likely be handled via the generic DROP support. extracted from a patch by Joe Wildish	2018-09-26 13:26:24 +02:00
Tomas Vondra	2e2a392de3	Fix problems in handling the line data type According to the source history, the internal format of line data type has changed, but various functions working with it did were not updated and thus were producing wrong results. This patch addresses various such issues, in particular: * Reject invalid specification A=B=0 on receive * Reject same points on line_construct_pp() * Fix perpendicular operator when negative values are involved * Avoid division by zero on perpendicular operator * Fix intersection and distance operators when neither A nor B are 1 * Return NULL for closest point when objects are parallel * Check whether closest point of line segments is the intersection point * Fix closest point of line segments being on the wrong segment Aside from handling those issues, the patch also aims to make operators more symmetric and less sen to precision loss. The EPSILON interferes with even minor changes, but the least we can do is applying it to both sides of the operators equally. Author: Emre Hasegeli Reviewed-by: Tomas Vondra Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com	2018-09-26 10:25:24 +02:00
Michael Paquier	8d28bf500f	Rework activation of commit timestamps during recovery The activation and deactivation of commit timestamp tracking has not been handled consistently for a primary or standbys at recovery. The facility can be activated at three different moments of recovery: - The beginning, where a primary would use the GUC value for the decision-making, and where a standby relies on the contents of the control file. - When replaying a XLOG_PARAMETER_CHANGE record at redo. - The end, where both primary and standby rely on the GUC value. Using the GUC value for a primary at the beginning of recovery causes problems with commit timestamp access when doing crash recovery. Particularly, when replaying transaction commits, it could be possible that an attempt to read commit timestamps is done for a transaction which committed at a moment when track_commit_timestamp was disabled. A test case is added to reproduce the failure. The test works down to v11 as it takes advantage of transaction commits within procedures. Reported-by: Hailong Li Author: Masahiko Sawasa, Michael Paquier Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/11224478-a782-203b-1f17-e4797b39bdf0@qunar.com Backpatch-through: 9.5, where commit timestamps have been introduced.	2018-09-26 10:25:54 +09:00
Andres Freund	10763358c3	Remove absolete function TupleDescGetSlot(). TupleDescGetSlot() was kept around for backward compatibility for user-written SRFs. With the TupleTableSlot abstraction work, that code will need to be version specific anyway, so there's no point in keeping the function around any longer. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:28:57 -07:00
Andres Freund	29c94e03c7	Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple. Upcoming changes introduce further types of tuple table slots, in preparation of making table storage pluggable. New storage methods will have different representation of tuples, therefore the slot accessor should refer explicitly to heap tuples. Instead of just renaming the functions, split it into one function that accepts heap tuples not residing in buffers, and one accepting ones in buffers. Previously one function was used for both, but that was a bit awkward already, and splitting will allow us to represent slot types for tuples in buffers and normal memory separately. This is split out from the patch introducing abstract slots, as this largely consists out of mechanical changes. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:27:48 -07:00
Andres Freund	bbdfbb9154	Remove function list from prologue of execTuples.c. That section is never in sync with the actual routines available and their functionality. Author: Ashutosh Bapat Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:27:48 -07:00
Andres Freund	a598708ffa	Change TupleTableSlot->tts_nvalid to type AttrNumber. Previously it was an int / 4 bytes. The maximum number of attributes in a tuple is restricted by the maximum value Var->varattno, which is an AttrNumber/int16. Hence use the same data type for TupleTableSlot->tts_nvalid. Author: Ashutosh Bapat Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 15:59:46 -07:00
Alvaro Herrera	5913b9bbf3	Remove obsolete comment The documented shortcoming was actually fixed in `4c728f3829` so the comment is not true anymore.	2018-09-25 17:55:22 -03:00
Andres Freund	33001fd7a7	Collect JIT instrumentation from workers. Previously, when using parallel query, EXPLAIN (ANALYZE)'s JIT compilation timings did not include the overhead from doing so on the workers. Fix that. We do so by simply aggregating the cost of doing JIT compilation on workers and the leader together. Arguably that's not quite accurate, because the total time spend doing so is spent in parallel - but it's hard to do much better. For additional detail, when VERBOSE is specified, the stats for workers are displayed separately. Author: Amit Khandekar and Andres Freund Discussion: https://postgr.es/m/CAJ3gD9eLrz51RK_gTkod+71iDcjpB_N8eC6vU2AW-VicsAERpQ@mail.gmail.com Backpatch: 11-	2018-09-25 13:12:44 -07:00
Thomas Munro	64171b3206	Constify dsa_size_class_map and use a better type. Author: Mark G Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEeOP_Zy_FvVwcAU0UX9nkOhnoR5KN%3D0B6LWX_kv0ZuSc4wbGw%40mail.gmail.com	2018-09-25 14:58:41 +12:00
Tom Lane	fd582317e1	Sync our Snowball stemmer dictionaries with current upstream. We haven't touched these since text search functionality landed in core in 2007 :-(. While the upstream project isn't a beehive of activity, they do make additions and bug fixes from time to time. Update our copies of these files. Also update our documentation about how to keep things in sync, since they're not making distribution tarballs these days. Fortunately, their source code turns out to be a breeze to build. Notable changes: * The non-UTF8 version of the hungarian stemmer now works in LATIN2 not LATIN1. * New stemmers have appeared for arabic, indonesian, irish, lithuanian, nepali, and tamil. These all work in UTF8, and the indonesian and irish ones also work in LATIN1. (There are some new stemmers that I did not incorporate, mainly because their names don't match the underlying languages, suggesting that they're not to be considered mainstream.) Worth noting: the upstream Nepali dictionary was contributed by Arthur Zakirov. initdb forced because the contents of snowball_create.sql have changed. Still TODO: see about updating the stopword lists. Arthur Zakirov, minor mods and doc work by me Discussion: https://postgr.es/m/20180626122025.GA12647@zakirov.localdomain Discussion: https://postgr.es/m/20180219140849.GA9050@zakirov.localdomain	2018-09-24 17:29:38 -04:00
Andres Freund	52050ad8eb	Make EXPLAIN output for JIT compilation more dense. A discussion about also reporting JIT compilation overhead on workers brought unhappiness with the verbosity of the current explain format to light. Make the text format more dense, and restructure the structured output to mirror that more closely. As we're re-jiggering the output format anyway: The denser format allows us to report all flags for JIT compilation (now also reporting PGJIT_EXPR and PGJIT_DEFORM), and report the total time in addition to the individual times. Per complaint from Tom Lane. Author: Andres Freund Discussion: https://postgr.es/m/27812.1537221015@sss.pgh.pa.us Backpatch: 11-, where JIT compilation was introduced	2018-09-24 13:35:45 -07:00
Andrew Dunstan	7636e5c60f	Fast default trigger and expand_tuple fixes Ensure that triggers get properly filled in tuples for the OLD value. Also fix the logic of detecting missing null values. The previous logic failed to detect a missing null column before the first missing column with a default. Fixing this has simplified the logic a bit. Regression tests are added to test changes. This should ensure better coverage of expand_tuple(). Original bug reports, and some code and test scripts from Tomas Vondra Backpatch to release 11.	2018-09-24 16:11:24 -04:00
Tom Lane	87d9bbca13	Fix over-allocation of space for array_out()'s result string. array_out overestimated the space needed for its output, possibly by a very substantial amount if the array is multi-dimensional, because of wrong order of operations in the loop that counts the number of curly-brace pairs needed. While the output string is normally short-lived, this could still cause problems in extreme cases. An additional minor error was that it counted one more delimiter than is actually needed. Repair those errors, add an Assert that the space is now correctly calculated, and make some minor improvements in the comments. I also failed to resist the temptation to get rid of an integer modulus operation per array element; a simple comparison is sufficient. This bug dates clear back to Berkeley days, so back-patch to all supported versions. Keiichi Hirobe, minor additional work by me Discussion: https://postgr.es/m/CAH=EFxE9W0tRvQkixR2XJRRCToUYUEDkJZk6tnADXugPBRdcdg@mail.gmail.com	2018-09-24 11:30:59 -04:00
Joe Conway	c62dd80cdf	Document aclitem functions and operators aclitem functions and operators have been heretofore undocumented. Fix that. While at it, ensure the non-operator aclitem functions have pg_description strings. Does not seem worthwhile to back-patch. Author: Fabien Coelho, with pg_description from John Naylor, and significant refactoring and editorialization by me. Reviewed by: Tom Lane Discussion: https://postgr.es/m/flat/alpine.DEB.2.21.1808010825490.18204%40lancre	2018-09-24 10:14:57 -04:00
Noah Misch	d18f6674bd	Initialize random() in bootstrap/stand-alone postgres and in initdb. This removes a difference between the standard IsUnderPostmaster execution environment and that of --boot and --single. In a stand-alone backend, "SELECT random()" always started at the same seed. On a system capable of using posix shared memory, initdb could still conclude "selecting dynamic shared memory implementation ... sysv". Crashed --boot or --single postgres processes orphaned shared memory objects having names that collided with the not-actually-random names that initdb probed. The sysv fallback appeared after ten crashes of --boot or --single postgres. Since --boot and --single are rare in production use, systems used for PostgreSQL development are the principal candidate to notice this symptom. Back-patch to 9.3 (all supported versions). PostgreSQL 9.4 introduced dynamic shared memory, but 9.3 does share the "SELECT random()" problem. Reviewed by Tom Lane and Kyotaro HORIGUCHI. Discussion: https://postgr.es/m/20180915221546.GA3159382@rfd.leadboat.com	2018-09-23 22:56:39 -07:00
Tom Lane	89b280e139	Fix failure in WHERE CURRENT OF after rewinding the referenced cursor. In a case where we have multiple relation-scan nodes in a cursor plan, such as a scan of an inheritance tree, it's possible to fetch from a given scan node, then rewind the cursor and fetch some row from an earlier scan node. In such a case, execCurrent.c mistakenly thought that the later scan node was still active, because ExecReScan hadn't done anything to make it look not-active. We'd get some sort of failure in the case of a SeqScan node, because the node's scan tuple slot would be pointing at a HeapTuple whose t_self gets reset to invalid by heapam.c. But it seems possible that for other relation scan node types we'd actually return a valid tuple TID to the caller, resulting in updating or deleting a tuple that shouldn't have been considered current. To fix, forcibly clear the ScanTupleSlot in ExecScanReScan. Another issue here, which seems only latent at the moment but could easily become a live bug in future, is that rewinding a cursor does not necessarily lead to immediately applying ExecReScan to every scan-level node in the plan tree. Upper-level nodes will think that they can postpone that call if their child node is already marked with chgParam flags. I don't see a way for that to happen today in a plan tree that's simple enough for execCurrent.c's search_plan_tree to understand, but that's one heck of a fragile assumption. So, add some logic in search_plan_tree to detect chgParam flags being set on nodes that it descended to/through, and assume that that means we should consider lower scan nodes to be logically reset even if their ReScan call hasn't actually happened yet. Per bug #15395 from Matvey Arye. This has been broken for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/153764171023.14986.280404050547008575@wrigleys.postgresql.org	2018-09-23 16:05:45 -04:00
Alexander Korotkov	2f39106a20	Replace CAS loop with single TAS in ProcArrayGroupClearXid() Single pg_atomic_exchange_u32() is expected to be faster than loop of pg_atomic_compare_exchange_u32(). Also, it would be consistent with clog group update code. Discussion: https://postgr.es/m/CAPpHfdtxLsC-bqfxFcHswZ91OxXcZVNDBBVfg9tAWU0jvn1tQA%40mail.gmail.com Reviewed-by: Amit Kapila	2018-09-22 16:22:30 +03:00
Michael Paquier	db361db2fc	Make GUC wal_sender_timeout user-settable Being able to use a value that can be changed on a connection basis is useful with clusters distributed geographically, and makes failure detection more flexible. A note is added in the documentation about the use of "options" in primary_conninfo, which can be hard to grasp for newcomers with the need of two single quotes when listing a set of parameters. Author: Tsunakawa Takayuki Reviewed-by: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1FAAD3AE@G01JPEXMBYT05	2018-09-22 15:23:59 +09:00
Thomas Munro	f025bd2ddd	Use size_t consistently in dsa.{ch}. Takeshi Ideriha complained that there is a mixture of Size and size_t in dsa.c and corresponding header. Let's use size_t. Back-patch to 10 where dsa.c landed, to make future back-patching easy. Discussion: https://postgr.es/m/4E72940DA2BF16479384A86D54D0988A6F19ABD9%40G01JPEXMBKW04	2018-09-22 00:40:13 +12:00
Tom Lane	1dba1b61c2	Add a "return" statement to pacify perlcritic. Per buildfarm member crake.	2018-09-20 16:10:23 -04:00
Tom Lane	3dc820c43e	Teach genbki.pl to auto-generate pg_type entries for array types. This eliminates some more tedium in adding new catalog entries, specifically the need to set up an array type when adding a new built-in data type. Now it's sufficient to assign an OID for the array type and write it in an "array_type_oid" metadata field. You don't have to fill the base type's typarray link explicitly, either. No catversion bump since the contents of pg_type aren't changed. (Well, their order might be different, but that doesn't matter.) John Naylor, reviewed and whacked around a bit by Dagfinn Ilmari Mannsåker, and some more by me. Discussion: https://postgr.es/m/CAJVSVGVTb6m9pJF49b3SuA8J+T-THO9c0hxOmoyv-yGKh-FbNg@mail.gmail.com	2018-09-20 15:14:46 -04:00
Alexander Korotkov	09e99ce86e	Fix handling of format string text characters in to_timestamp()/to_date() `cf984672` introduced improvement of handling of spaces and separators in to_timestamp()/to_date() functions. In particular, now we're skipping spaces both before and after fields. That may cause format string text character to consume part of field in the situations, when it didn't happen before `cf984672`. This commit cause format string text character consume input string characters only when since previous field (or string beginning) number of skipped input string characters is not greater than number of corresponding format string characters (that is we didn't skip any extra characters in input string).	2018-09-20 15:48:04 +03:00
Thomas Munro	38763d6778	Fix segment_bins corruption in dsa.c. If a segment has been freed by dsa.c because it is entirely empty, other backends must make sure to unmap it before following links to new segments that might happen to have the same index number, or they could finish up looking at a defunct segment and then corrupt the segment_bins lists. The correct protocol requires checking freed_segment_counter after acquiring the area lock and before resolving any index number to a segment. Add the missing checks and an assertion. Back-patch to 10, where dsa.c first arrived. Author: Thomas Munro Reported-by: Tomas Vondra Discussion: https://postgr.es/m/CAEepm%3D0thg%2Bja5zGVa7jBy-uqyHrTqTm8HGhEOtMmigGrAqTbw%40mail.gmail.com	2018-09-20 15:52:39 +12:00
Thomas Munro	6c3c9d4189	Defer restoration of libraries in parallel workers. Several users of extensions complained of crashes in parallel workers that turned out to be due to syscache access from their _PG_init() functions. Reorder the initialization of parallel workers so that libraries are restored after the caches are initialized, and inside a transaction. This was reported in bug #15350 and elsewhere. We don't consider it to be a bug: extensions shouldn't do that, because then they can't be used in shared_preload_libraries. However, it's a fairly obscure hazard and these extensions worked in practice before parallel query came along. So let's make it work. Later commits might add a warning message and eventually an error. Back-patch to 9.6, where parallel query landed. Author: Thomas Munro Reviewed-by: Amit Kapila Reported-by: Kieran McCusker, Jimmy Discussion: https://postgr.es/m/153512195228.1489.8545997741965926448%40wrigleys.postgresql.org	2018-09-20 14:21:18 +12:00
Tom Lane	0d38e4ebb7	Fix minor error message style guide violation. No periods at the ends of primary error messages, please. Daniel Gustafsson Discussion: https://postgr.es/m/43E004C0-18C6-42B4-A313-003B43EB0571@yesql.se	2018-09-19 17:06:40 -04:00
Tom Lane	8f0de712c3	Don't ignore locktable-full failures in StandbyAcquireAccessExclusiveLock. Commit `37c54863c` removed the code in StandbyAcquireAccessExclusiveLock that checked the return value of LockAcquireExtended. That created a bug, because it's still passing reportMemoryError = false to LockAcquireExtended, meaning that LOCKACQUIRE_NOT_AVAIL will be returned if we're out of shared memory for the lock table. In such a situation, the startup process would believe it had acquired an exclusive lock even though it hadn't, with potentially dire consequences. To fix, just drop the use of reportMemoryError = false, which allows us to simplify the call into a plain LockAcquire(). It's unclear that the locktable-full situation arises often enough that it's worth having a better recovery method than crash-and-restart. (I strongly suspect that the only reason the code path existed at all was that it was relatively simple to do in the pre-37c54863c implementation. But now it's not.) LockAcquireExtended's reportMemoryError parameter is now dead code and could be removed. I refrained from doing so, however, because there was some interest in resurrecting the behavior if we do get reports of locktable-full failures in the field. Also, it seems unwise to remove the parameter concurrently with shipping commit `f868a8143`, which added a parameter; if there are any third-party callers of LockAcquireExtended, we want them to get a wrong-number-of-parameters compile error rather than a possibly-silent misinterpretation of its last parameter. Back-patch to 9.6 where the bug was introduced. Discussion: https://postgr.es/m/6202.1536359835@sss.pgh.pa.us	2018-09-19 12:43:51 -04:00
Alexander Korotkov	2a6368343f	Add support for nearest-neighbor (KNN) searches to SP-GiST Currently, KNN searches were supported only by GiST. SP-GiST also capable to support them. This commit implements that support. SP-GiST scan stack is replaced with queue, which serves as stack if no ordering is specified. KNN support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops and poly_ops (catversion is bumped). Some common parts between GiST and SP-GiST KNNs are extracted into separate functions. Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov Review: Andrey Borodin, Alexander Korotkov	2018-09-19 01:54:10 +03:00
Tom Lane	d0cfc3d6a4	Add a debugging option to stress-test outfuncs.c and readfuncs.c. In the normal course of operation, query trees will be serialized only if they are stored as views or rules; and plan trees will be serialized only if they get passed to parallel-query workers. This leaves an awful lot of opportunity for bugs/oversights to not get detected, as indeed we've just been reminded of the hard way. To improve matters, this patch adds a new compile option WRITE_READ_PARSE_PLAN_TREES, which is modeled on the longstanding option COPY_PARSE_PLAN_TREES; but instead of passing all parse and plan trees through copyObject, it passes them through nodeToString + stringToNode. Enabling this option in a buildfarm animal or two will catch problems at least for cases that are exercised by the regression tests. A small problem with this idea is that readfuncs.c historically has discarded location fields, on the reasonable grounds that parse locations in a retrieved view are not relevant to the current query. But doing that in WRITE_READ_PARSE_PLAN_TREES breaks pg_stat_statements, and it could cause problems for future improvements that might try to report error locations at runtime. To fix that, provide a variant behavior in readfuncs.c that makes it restore location fields when told to. In passing, const-ify the string arguments of stringToNode and its subsidiary functions, just because it annoyed me that they weren't const already. Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us	2018-09-18 17:11:54 -04:00
Tom Lane	db1071d4ee	Fix some minor issues exposed by outfuncs/readfuncs testing. A test patch to pass parse and plan trees through outfuncs + readfuncs exposed several issues that need to be fixed to get clean matches: Query.withCheckOptions failed to get copied; it's intentionally ignored by outfuncs/readfuncs on the grounds that it'd always be NIL anyway in stored rules. This seems less than future-proof, and it's not even saving very much, so just undo the decision and treat the field like all others. Several places that convert a view RTE into a subquery RTE, or similar manipulations, failed to clear out fields that were specific to the original RTE type and should be zero in a subquery RTE. Since readfuncs.c will leave such fields as zero, equalfuncs.c thinks the nodes are different leading to a reported mismatch. It seems like a good idea to clear out the no-longer-needed fields, even though in principle nothing should look at them; the node ought to be indistinguishable from how it would look if we'd built a new node instead of scribbling on the old one. BuildOnConflictExcludedTargetlist randomly set the resname of some TargetEntries to "" not NULL. outfuncs/readfuncs don't distinguish those cases, and so the string will read back in as NULL ... but equalfuncs.c does distinguish. Perhaps we ought to try to make things more consistent in this area --- but it's just useless extra code space for BuildOnConflictExcludedTargetlist to not use NULL here, so I fixed it for now by making it do that. catversion bumped because the change in handling of Query.withCheckOptions affects stored rules. Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us	2018-09-18 15:08:28 -04:00
Tom Lane	09991e5a47	Fix some probably-minor oversights in readfuncs.c. The system expects TABLEFUNC RTEs to have coltypes, coltypmods, and colcollations lists, but outfuncs doesn't dump them and readfuncs doesn't restore them. This doesn't cause obvious failures, because the only things that look at those fields are expandRTE() and get_rte_attribute_type(), which are mostly used during parse analysis, before anything would've passed the parsetree through outfuncs/readfuncs. But expandRTE() is used in build_physical_tlist(), which means that that function will return a wrong answer for a TABLEFUNC RTE that came from a view. Very accidentally, this doesn't cause serious problems, because what it will return is NIL which callers will interpret as "couldn't build a physical tlist because of dropped columns". So you still get a plan that works, though it's marginally less efficient than it could be. There are also some other expandRTE() calls associated with transformation of whole-row Vars in the planner. I have been unable to exhibit misbehavior from that, and it may be unreachable in any case that anyone would care about ... but I'm not entirely convinced, so this seems like something we should back- patch a fix for. Fortunately, we can fix it without forcing a change of stored rules and a catversion bump, because we can just copy these lists from the subsidiary TableFunc object. readfuncs.c was also missing support for NamedTuplestoreScan plan nodes. This accidentally fails to break parallel query because a query using a named tuplestore would never be considered parallel-safe anyway. However, project policy since parallel query came in is that all plan node types should have outfuncs/readfuncs support, so this is clearly an oversight that should be repaired. Noted while fooling around with a patch to test outfuncs/readfuncs more thoroughly. That exposed some other issues too, but these are the only ones that seem worth back-patching. Back-patch to v10 where both of these features came in. Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us	2018-09-18 13:02:27 -04:00
Thomas Munro	422952ee78	Allow DSM allocation to be interrupted. Chris Travers reported that the startup process can repeatedly try to cancel a backend that is in a posix_fallocate()/EINTR loop and cause it to loop forever. Teach the retry loop to give up if an interrupt is pending. Don't actually check for interrupts in that loop though, because a non-local exit would skip some clean-up code in the caller. Back-patch to 9.4 where DSM was added (and posix_fallocate() was later back-patched). Author: Chris Travers Reviewed-by: Ildar Musin, Murat Kabilov, Oleksii Kliukin Tested-by: Oleksii Kliukin Discussion: https://postgr.es/m/CAN-RpxB-oeZve_J3SM_6%3DHXPmvEG%3DHX%2B9V9pi8g2YR7YW0rBBg%40mail.gmail.com	2018-09-18 22:56:36 +12:00
Michael Paquier	1d6fbc38d9	Refactor routines for subscription and publication lookups Those routines gain a missing_ok argument, allowing a caller to get a NULL result instead of an error if set to true. This is part of a larger refactoring effort for objectaddress.c where trying to check for non-existing objects does not result in cache lookup failures. Author: Michael Paquier Reviewed-by: Aleksander Alekseev, Álvaro Herrera Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com	2018-09-18 12:00:18 +09:00
Tom Lane	07a3af0ff8	Fix parsetree representation of XMLTABLE(XMLNAMESPACES(DEFAULT ...)). The original coding for XMLTABLE thought it could represent a default namespace by a T_String Value node with a null string pointer. That's not okay, though; in particular outfuncs.c/readfuncs.c are not on board with such a representation, meaning you'll get a null pointer crash if you try to store a view or rule containing this construct. To fix, change the parsetree representation so that we have a NULL list element, instead of a bogus Value node. This isn't really a functional limitation since default XML namespaces aren't yet implemented in the executor; you'd just get "DEFAULT namespace is not supported" anyway. But crashes are not nice, so back-patch to v10 where this syntax was added. Ordinarily we'd consider a parsetree representation change to be un-backpatchable; but since existing releases would crash on the way to storing such constructs, there can't be any existing views/rules to be incompatible with. Per report from Andrey Lepikhov. Discussion: https://postgr.es/m/3690074f-abd2-56a9-144a-aa5545d7a291@postgrespro.ru	2018-09-17 13:16:32 -04:00
Tom Lane	3844adbf3c	Add outfuncs.c support for RawStmt nodes. I noticed while poking at a report from Andrey Lepikhov that the recent addition of RawStmt nodes at the top of raw parse trees makes it impossible to print any raw parse trees whatsoever, because outfuncs.c doesn't know RawStmt and hence fails to descend into it. While we generally lack outfuncs.c support for utility statements, there is reasonably complete support for what you can find in a raw SELECT statement. It was not my intention to make that all dead code ... so let's add support for RawStmt. Back-patch to v10 where RawStmt appeared.	2018-09-16 13:02:47 -04:00
Tom Lane	8f32bacc00	In v11, disable JIT by default (it's still enabled by default in HEAD). Per discussion, JIT isn't quite mature enough to ship enabled-by-default. I failed to resist the temptation to do a bunch of copy-editing on the related documentation. Also, clean up some inconsistencies in which section of config.sgml the JIT GUCs are documented in vs. what guc.c and postgresql.config.sample had. Discussion: https://postgr.es/m/20180914222657.mw25esrzbcnu6qlu@alap3.anarazel.de	2018-09-15 17:24:35 -04:00
Tom Lane	1f4a920b73	Fix failure with initplans used conditionally during EvalPlanQual rechecks. The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated. This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit `9f2ee8f28`, but was not detected until Kyle Samson reported a case. To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches. It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that. Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits `71404af2a` and `342a1ffa2` into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same. Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me. Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com	2018-09-15 13:42:33 -04:00
Alvaro Herrera	6b78231d91	Move PartitionDispatchData struct definition to execPartition.c There's no reason to expose the struct definition, so don't. Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Discussion: https://postgr.es/m/d3fa24c1-bc65-7133-81df-6474387ccc4f@lab.ntt.co.jp	2018-09-14 19:06:57 -03:00
Alvaro Herrera	20bef2c311	Fix ALTER/TYPE on columns referenced by FKs in partitioned tables When ALTER TABLE ... SET DATA TYPE affects a column referenced by constraints and indexes, it drop those constraints and indexes and recreates them afterwards, so that the definitions match the new data type. The original code did this by dropping one object at a time (commit `077db40fa1` of May 2004), which worked fine because the dependencies between the objects were pretty straightforward, and ordering the objects in a specific way was enough to make this work. However, when there are foreign key constraints in partitioned tables, the dependencies are no longer so straightforward, and we were getting errors when attempted: ERROR: cache lookup failed for constraint 16398 This can be fixed by doing all the drops in one pass instead, using performMultipleDeletions (introduced by `df18c51f29` of Aug 2006). With this change we can also remove the code to carefully order the list of objects to be deleted. Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAKcux6nWS_m+s=1Udk_U9B+QY7pA-Ac58qR5BdUfOyrwnWHDew@mail.gmail.com	2018-09-14 13:41:20 -03:00
Andrew Gierth	728202b63c	Order active window clauses for greater reuse of Sort nodes. By sorting the active window list lexicographically by the sort clause list but putting longer clauses before shorter prefixes, we generate more chances to elide Sort nodes when building the path. Author: Daniel Gustafsson (with some editorialization by me) Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se	2018-09-14 17:35:42 +01:00
Amit Kapila	75f9c4ca5a	Don't allow LIMIT/OFFSET clause within sub-selects to be pushed to workers. Allowing sub-select containing LIMIT/OFFSET in workers can lead to inconsistent results at the top-level as there is no guarantee that the row order will be fully deterministic. The fix is to prohibit pushing LIMIT/OFFSET within sub-selects to workers. Reported-by: Andrew Fletcher Bug: 15324 Author: Amit Kapila Reviewed-by: Dilip Kumar Backpatch-through: 9.6 Discussion: https://postgr.es/m/153417684333.10284.11356259990921828616@wrigleys.postgresql.org	2018-09-14 09:36:30 +05:30
Michael Paquier	28a8fa984c	Improve autovacuum logging for aggressive and anti-wraparound runs A log message was being generated when log_min_duration is reached for autovacuum on a given relation to indicate if it was an aggressive run, and missed the point of mentioning if it is doing an anti-wrapround run. The log message generated is improved so as one, both or no extra details are added depending on the option set. Author: Sergei Kornilov Reviewed-by: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/11587951532155118@sas1-19a94364928d.qloud-c.yandex.net	2018-09-14 07:35:39 +09:00
Amit Kapila	bc153c941d	Attach FPI to the first record after full_page_writes is turned on. XLogInsert fails to attach a required FPI to the first record after full_page_writes is turned on by the last checkpoint. This bug got introduced in 9.5 due to code rearrangement in commits `2c03216d83` and `2076db2aea`. Fix it by ensuring that XLogInsertRecord performs a recomputation when the given record is generated with FPW as off but found that the flag has been turned on while actually inserting the record. Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi Reviewed-by: Amit Kapila Backpatch-through: 9.5 where this problem was introduced Discussion: https://postgr.es/m/20180420.151043.74298611.horiguchi.kyotaro@lab.ntt.co.jp	2018-09-13 15:32:50 +05:30
Michael Paquier	514a731ddc	Simplify static function in extension.c An extra argument for the filename defining the extension script location was present, aimed at being used for error reporting, but has never been used. This was around since extensions have been added in `d9572c4`. Author: Yugo Nagata Reviewed-by: Tatsuo Ishii Discussion: https://postgr.es/m/20180907180504.1ff19e1675bb44a67e9c7ab1@sraoss.co.jp	2018-09-13 16:56:57 +09:00
Peter Eisentraut	e5f1bb92cf	Simplify index tuple descriptor initialization We have two code paths for initializing the tuple descriptor for a new index: For a normal index, we copy the tuple descriptor from the table and reset a number of fields that are not applicable to indexes. For an expression index, we make a blank tuple descriptor and fill in the needed fields based on the provided expressions. As pg_attribute has grown over time, the number of fields that we need to reset in the first case is now bigger than the number of fields we actually want to copy, so it's sensible to do it the other way around: Make a blank descriptor and copy just the fields we need. This also allows more code sharing between the two branches, and it avoids having to touch this code for almost every unrelated change to the pg_attribute structure. Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>	2018-09-13 08:22:03 +02:00
Andrew Gierth	b7f6bcbffc	Repair bug in regexp split performance improvements. Commit `c8ea87e4b` introduced a temporary conversion buffer for substrings extracted during regexp splits. Unfortunately the code that sized it was failing to ignore the effects of ignored degenerate regexp matches, so for regexp_split_* calls it could under-size the buffer in such cases. Fix, and add some regression test cases (though those will only catch the bug if run in a multibyte encoding). Backpatch to 9.3 as the faulty code was. Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the report (via IRC) and assistance in analysis. Patch by me.	2018-09-12 19:31:06 +01:00
Tom Lane	fedc97cdfd	Remove ruleutils.c's special case for BIT [VARYING] literals. Up to now, get_const_expr() insisted on prefixing BIT and VARBIT literals with 'B'. That's not really necessary, because we always append explicit-cast syntax to identify the constant's type. Moreover, it's subtly wrong for VARBIT, because the parser will interpret B'...' as '...'::"bit"; see make_const() which explicitly assigns type BITOID for a T_BitString literal. So what had been a simple VARBIT literal is reconstructed as ('...'::"bit")::varbit, which is not the same thing, at least not before constant folding. This results in odd differences after dump/restore, as complained of by the patch submitter, and it could result in actual failures in partitioning or inheritance DDL operations (see commit `542320c2b`, which repaired similar misbehaviors for some other data types). Fixing it is pretty easy: just remove the special case and let the default code path handle these types. We could have kept the special case for BIT only, but there seems little point in that. Like the previous patch, I judge that back-patching this into stable branches wouldn't be a good idea. However, it seems not quite too late for v11, so let's fix it there. Paul Guo, reviewed by Davy Machado and John Naylor, minor adjustments by me Discussion: https://postgr.es/m/CABQrizdTra=2JEqA6+Ms1D1k1Kqw+aiBBhC9TreuZRX2JzxLAA@mail.gmail.com	2018-09-11 16:32:25 -04:00
Andrew Gierth	500d49794f	Repair double-free in SP-GIST rescan (bug #15378 ) spgrescan would first reset traversalCxt, and then traverse a potentially non-empty stack containing pointers to traversalValues which had been allocated in those contexts, freeing them a second time. This bug originates in commit `ccd6eb49a` where traversalValue was introduced. Repair by traversing the stack before the context reset; this isn't ideal, since it means doing retail pfree in a context that's about to be reset, but the freeing of a stack entry is also done in other places in the code during the scan so it's not worth trying to refactor it further. Regression test added. Backpatch to 9.6 where the problem was introduced. Per bug #15378; analysis and patch by me, originally from a report on IRC by user velix; see also PostGIS ticket #4174; review by Alexander Korotkov. Discussion: https://postgr.es/m/153663176628.23136.11901365223750051490@wrigleys.postgresql.org	2018-09-11 18:14:19 +01:00
Alexander Korotkov	cf98467242	Improve behavior of to_timestamp()/to_date() functions to_timestamp()/to_date() functions were introduced mainly for Oracle compatibility, and became very popular among PostgreSQL users. However, some behavior of to_timestamp()/to_date() functions are both incompatible with Oracle and confusing for our users. This behavior is related to handling of spaces and separators in non FX (fixed format) mode. This commit reworks this behavior making less confusing, better documented and more compatible with Oracle. Nevertheless, there are still following incompatibilities with Oracle. 1) We don't insist that there are no format string patterns unmatched to input string. 2) In FX mode we don't insist space and separators in format string to exactly match input string. 3) When format string patterns are divided by mix of spaces and separators, we don't distinguish them, while Oracle takes into account only last group of spaces/separators. Discussion: https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com Author: Artur Zakirov, Alexander Korotkov, Liudmila Mantrova Review: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston	2018-09-09 21:19:51 +03:00
Alexander Korotkov	5f08accdad	Fix past pd_upper write in ginRedoRecompress() ginRedoRecompress() replays actions over compressed segments of posting list in-place. However, it might lead to write past pg_upper, because intermediate state during playing the changes can take more space than both original state and final state. This commit fixes that by refuse from in-place modification. Instead page tail is copied once modification is started, and then it's used as the source of original segments. Backpatch to 9.4 where posting list compression was introduced. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian Review: Sivasubramanian Ramasubramanian Backpatch-through: 9.4	2018-09-09 21:19:29 +03:00
Tom Lane	ff47d4bf1f	Work around stdbool problem in dfmgr.c. Commit `842cb9fa6` refactored things so that dfmgr.c includes <dlfcn.h>, which before that had only been directly included in platform-specific stub files. It turns out that on macOS, <dlfcn.h> includes <stdbool.h>, and that causes problems on platforms where _Bool is not char-sized ... which happens to include the PPC versions of macOS. Work around it much as we have in plperl.h, by #undef'ing bool after including the problematic file, but only if we're not using stdbool-style booleans. Discussion: https://postgr.es/m/E1fxqjl-0003YS-NS@gemulon.postgresql.org	2018-09-09 12:41:27 -04:00
Tom Lane	ed0cdf0e05	Install a check for mis-linking of src/port and src/common functions. On ELF-based platforms (and maybe others?) it's possible for a shared library, when dynamically loaded into the backend, to call the backend versions of src/port and src/common functions rather than the frontend versions that are actually linked into the shlib. This is definitely not what we want, because the frontend versions often behave slightly differently. Up to now it's been "slight" enough that nobody noticed; but with the addition of SCRAM support functions in src/common, we're observing crashes due to the difference between palloc and malloc memory allocation rules, as reported in bug #15367 from Jeremy Evans. The purpose of this patch is to create a direct test for this type of mis-linking, so that we know whether any given platform requires extra measures to prevent using the wrong functions. If the test fails, it will lead to connection failures in the contrib/postgres_fdw regression test. At the moment, *BSD platforms using ELF format are known to have the problem and can be expected to fail; but we need to know whether anything else does, and we need a reliable ongoing check for future platforms. Actually fixing the problem will be the subject of later commit(s). Discussion: https://postgr.es/m/153626613985.23143.4743626885618266803@wrigleys.postgresql.org	2018-09-09 12:23:23 -04:00
Tom Lane	f47f314801	Minor cleanup/future-proofing for pg_saslprep(). Ensure that pg_saslprep() initializes its output argument to NULL in all failure paths, and then remove the redundant initialization that some (not all) of its callers did. This does not fix any live bug, but it reduces the odds of future bugs of omission. Also add a comment about why the existing failure-path coding is adequate. Back-patch so as to keep the function's API consistent across branches, again to forestall future bug introduction. Patch by me, reviewed by Michael Paquier Discussion: https://postgr.es/m/16558.1536407783@sss.pgh.pa.us	2018-09-08 18:20:36 -04:00
Michael Paquier	9226a3b89b	Remove duplicated words split across lines in comments This has been detected using some interesting tricks with sed, and the method used is mentioned in details in the discussion below. Author: Justin Pryzby Discussion: https://postgr.es/m/20180908013109.GB15350@telsasoft.com	2018-09-08 12:24:19 -07:00
Tom Lane	361844fe56	Save/restore SPI's global variables in SPI_connect() and SPI_finish(). This patch removes two sources of interference between nominally independent functions when one SPI-using function calls another, perhaps without knowing that it does so. Chapman Flack pointed out that xml.c's query_to_xml_internal() expects SPI_tuptable and SPI_processed to stay valid across datatype output function calls; but it's possible that such a call could involve re-entrant use of SPI. It seems likely that there are similar hazards elsewhere, if not in the core code then in third-party SPI users. Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which would typically make for a crash in such a situation. Restoring them to the values they had at SPI_connect() seems like a considerably more useful behavior, and it still meets the design goal of not leaving any dangling pointers to tuple tables of the function being exited. Also, cause SPI_connect() to reset these variables to zeroes/nulls after saving them. This prevents interference in the opposite direction: it's possible that a SPI-using function that's only ever been tested standalone contains assumptions that these variables start out as zeroes. That was the case as long as you were the outermost SPI user, but not so much for an inner user. Now it's consistent. Report and fix suggestion by Chapman Flack, actual patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net	2018-09-07 20:09:57 -04:00
Tom Lane	f510412df3	Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY. It's somewhat surprising that we got away with this before. (Actually, since nobody tests this routinely AFAIK, it might've been broken for awhile. But it's definitely broken in the wake of commit f868a8143.) It seems sufficient to limit the forced recursion to a small number of levels. Back-patch to all supported branches, like the preceding patch. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us	2018-09-07 18:13:29 -04:00
Tom Lane	f868a8143a	Fix longstanding recursion hazard in sinval message processing. LockRelationOid and sibling routines supposed that, if our session already holds the lock they were asked to acquire, they could skip calling AcceptInvalidationMessages on the grounds that we must have already read any remote sinval messages issued against the relation being locked. This is normally true, but there's a critical special case where it's not: processing inside AcceptInvalidationMessages might attempt to access system relations, resulting in a recursive call to acquire a relation lock. Hence, if the outer call had acquired that same system catalog lock, we'd fall through, despite the possibility that there's an as-yet-unread sinval message for that system catalog. This could, for example, result in failure to access a system catalog or index that had just been processed by VACUUM FULL. This is the explanation for buildfarm failures we've been seeing intermittently for the past three months. The bug is far older than that, but commits `a54e1f158` et al added a new recursion case within AcceptInvalidationMessages that is apparently easier to hit than any previous case. To fix this, we must not skip calling AcceptInvalidationMessages until we have finished a call to it since acquiring a relation lock, not merely acquired the lock. (There's already adequate logic inside AcceptInvalidationMessages to deal with being called recursively.) Fortunately, we can implement that at trivial cost, by adding a flag to LOCALLOCK hashtable entries that tracks whether we know we have completed such a call. There is an API hazard added by this patch for external callers of LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD, it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR into thinking the lock wasn't already held. This should be a fail-soft condition, though, unless something very bizarre is being done in response to the test. Also, I added an additional output argument to LockAcquireExtended, assuming that that probably isn't called by any outside code given the very limited usefulness of its additional functionality. Back-patch to all supported branches. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us	2018-09-07 18:04:54 -04:00
Michael Paquier	8582b4d044	Improve handling of corrupted two-phase state files at recovery When a corrupted two-phase state file is found by WAL replay, be it for crash recovery or archive recovery, then the file is simply skipped and a WARNING is logged to the user, causing the transaction to be silently lost. Facing an on-disk WAL file which is corrupted is as likely to happen as what is stored in WAL records, but WAL records are already able to fail hard if there is a CRC mismatch. On-disk two-phase state files, on the contrary, are simply ignored if corrupted. Note that when restoring the initial two-phase data state at recovery, files newer than the horizon XID are discarded hence no files present in pg_twophase/ should be torned and have been made durable by a previous checkpoint, so recovery should never see any corrupted two-phase state file by design. The situation got better since `978b2f6` which has added two-phase state information directly in WAL instead of using on-disk files, so the risk is limited to two-phase transactions which live across at least one checkpoint for long periods. Backups having legit two-phase state files on-disk could also lose silently transactions when restored if things get corrupted. This behavior exists since two-phase commit has been introduced, no back-patch is done for now per the lack of complaints about this problem. Author: Michael Paquier Discussion: https://postgr.es/m/20180709050309.GM1467@paquier.xyz	2018-09-07 11:00:16 -07:00
Peter Eisentraut	98afa68d93	Use C99 designated initializers for some structs These are just a few particularly egregious cases that were hard to read and write, and error prone because of many similar adjacent types. Discussion: https://www.postgresql.org/message-id/flat/4c9f01be-9245-2148-b569-61a8562ef190%402ndquadrant.com	2018-09-07 11:40:03 +02:00
Peter Eisentraut	842cb9fa62	Refactor dlopen() support Nowadays, all platforms except Windows and older HP-UX have standard dlopen() support. So having a separate implementation per platform under src/backend/port/dynloader/ is a bit excessive. Instead, treat dlopen() like other library functions that happen to be missing sometimes and put a replacement implementation under src/port/. Discussion: https://www.postgresql.org/message-id/flat/e11a49cb-570a-60b7-707d-7084c8de0e61%402ndquadrant.com#54e735ae37476a121abb4e33c2549b03	2018-09-06 11:33:04 +02:00
Tom Lane	54b01b9293	Remove no-longer-used variable. Oversight in `2fbdf1b38`. Per buildfarm.	2018-09-05 14:29:58 -04:00
Alvaro Herrera	2fbdf1b38b	Simplify partitioned table creation vs. relcache In the original code, we were storing the pg_inherits row for a partitioned table too early: enough that we had a hack for relcache to avoid falling flat on its face while reading such a partial entry. If we finish the pg_class creation first and then store the pg_inherits entry, we don't need that hack. Also recognize that pg_class.relpartbound is not marked NOT NULL and therefore it's entirely possible to read null values, so having only Assert() protection isn't enough. Change those to if/elog tests instead. This qualifies as a robustness fix, so backpatch to pg11. In passing, remove one access that wasn't actually needed, and reword one message to be like all the others that check for the same thing. Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql	2018-09-05 14:36:13 -03:00
Michael Paquier	d6e98ebe37	Improve some error message strings and errcodes This makes a bit less work for translators, by unifying error strings a bit more with what the rest of the code does, this time for three error strings in autoprewarm and one in base backup code. After some code review of slot.c, some file-access errcodes are reported but lead to an incorrect internal error, while corrupted data makes the most sense, similarly to the previous work done in `e41d0a1`. Also, after calling rmtree(), a WARNING gets reported, which is a duplicate of what the internal call report, so make the code more consistent with all other code paths calling this function. Author: Michael Paquier Discussion: https://postgr.es/m/20180902200747.GC1343@paquier.xyz	2018-09-04 11:06:04 -07:00
Tom Lane	17b7c302b5	Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de	2018-09-04 13:45:35 -04:00
Amit Kapila	14e9b2a752	Prohibit pushing subqueries containing window function calculation to workers. Allowing window function calculation in workers leads to inconsistent results because if the input row ordering is not fully deterministic, the output of window functions might vary across workers. The fix is to treat them as parallel-restricted. In the passing, improve the coding pattern in max_parallel_hazard_walker so that it has a chain of mutually-exclusive if ... else if ... else if ... else if ... IsA tests. Reported-by: Marko Tiikkaja Bug: 15324 Author: Amit Kapila Reviewed-by: Tom Lane Backpatch-through: 9.6 Discussion: https://postgr.es/m/CAL9smLAnfPJCDUUG4ckX2iznj53V7VSMsYefzZieN93YxTNOcw@mail.gmail.com	2018-09-04 10:28:08 +05:30
Amit Kapila	7c9e19ca9a	During the split, set checksum on an empty hash index page. On a split, we allocate a new splitpoint's worth of bucket pages wherein we initialize the last page with zeros which is fine, but we forgot to set the checksum for that last page. We decided to back-patch this fix till 10 because we don't have an easy way to test it in prior versions. Another reason is that the hash-index code is changed heavily in 10, so it is not advisable to push the fix without testing it in prior versions. Author: Amit Kapila Reviewed-by: Yugo Nagata Backpatch-through: 10 Discussion: https://postgr.es/m/5d03686d-727c-dbf8-0064-bf8b97ffe850@2ndquadrant.com	2018-09-04 08:35:42 +05:30
Alvaro Herrera	c076f3d74a	Remove pg_constraint.conincluding This column was added in commit `8224de4f42` ("Indexes with INCLUDE columns and their support in B-tree") to ease writing the ruleutils.c supporting code for that feature, but it turns out to be unnecessary -- we can do the same thing with just one more syscache lookup. Even the documentation for the new column being removed in this commit is awkward. Discussion: https://postgr.es/m/20180902165018.33otxftp3olgtu4t@alvherre.pgsql	2018-09-03 12:59:26 -03:00
Tomas Vondra	4ddd8f5f55	Fix memory leak in TRUNCATE decoding When decoding a TRUNCATE record, the relids array was being allocated in the main ReorderBuffer memory context, but not released with the change resulting in a memory leak. The array was also ignored when serializing/deserializing the change, assuming all the information is stored in the change itself. So when spilling the change to disk, we've only we have serialized only the pointer to the relids array. Thanks to never releasing the array, the pointer however remained valid even after loading the change back to memory, preventing an actual crash. This fixes both the memory leak and (de)serialization. The relids array is still allocated in the main ReorderBuffer memory context (none of the existing ones seems like a good match, and adding an extra context seems like an overkill). The allocation is wrapped in a new ReorderBuffer API functions, to keep the details within reorderbuffer.c, just like the other ReorderBufferGet methods do. Author: Tomas Vondra Discussion: https://www.postgresql.org/message-id/flat/66175a41-9342-2845-652f-1bd4c3ee50aa%402ndquadrant.com Backpatch: 11, where decoding of TRUNCATE was introduced	2018-09-03 02:10:24 +02:00
Michael Paquier	caa0c6ceba	Fix initial sync of slot parent directory when restoring status At the beginning of recovery, information from replication slots is recovered from disk to memory. In order to ensure the durability of the information, the status file as well as its parent directory are synced. It happens that the sync on the parent directory was done directly using the status file path, which is logically incorrect, and the current code has been doing a sync on the same object twice in a row. Reported-by: Konstantin Knizhnik Diagnosed-by: Konstantin Knizhnik Author: Michael Paquier Discussion: https://postgr.es/m/9eb1a6d5-b66f-2640-598d-c5ea46b8f68a@postgrespro.ru Backpatch-through: 9.4-	2018-09-02 12:40:30 -07:00
Tom Lane	44cac93464	Avoid using potentially-under-aligned page buffers. There's a project policy against using plain "char buf[BLCKSZ]" local or static variables as page buffers; preferred style is to palloc or malloc each buffer to ensure it is MAXALIGN'd. However, that policy's been ignored in an increasing number of places. We've apparently got away with it so far, probably because (a) relatively few people use platforms on which misalignment causes core dumps and/or (b) the variables chance to be sufficiently aligned anyway. But this is not something to rely on. Moreover, even if we don't get a core dump, we might be paying a lot of cycles for misaligned accesses. To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock that the compiler must allocate with sufficient alignment, and use those in place of plain char arrays. I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead. Since this seems to be a live portability hazard (despite the lack of field reports), back-patch to all supported versions. Patch by me; thanks to Michael Paquier for review. Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de	2018-09-01 15:27:17 -04:00
Alexander Korotkov	ec74369931	Implement "pg_ctl logrotate" command Currently there are two ways to trigger log rotation in logging collector process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly to logging collector process. However, it's nice to have more suitable way for external tools to do that, which wouldn't require SQL connection or knowledge of logging collector pid. This commit implements triggering log rotation by "pg_ctl logrotate" command. Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov	2018-09-01 19:46:49 +03:00
Noah Misch	ab0ed6153a	Ignore server-side delays when enforcing wal_sender_timeout. Healthy clients of servers having poor I/O performance, such as buildfarm members hamster and tern, saw unexpected timeouts. That disagreed with documentation. This fix adds one gettimeofday() call whenever ProcessRepliesIfAny() finds no client reply messages. Back-patch to 9.4; the bug's symptom is rare and mild, and the code all moved between 9.3 and 9.4. Discussion: https://postgr.es/m/20180826034600.GA1105084@rfd.leadboat.com	2018-08-31 22:59:58 -07:00
Michael Paquier	c186ba135e	Ensure correct minimum consistent point on standbys Startup process has improved its calculation of incorrect minimum consistent point in `8d68ee6`, which ensures that all WAL available gets replayed when doing crash recovery, and has introduced an incorrect calculation of the minimum recovery point for non-startup processes, which can cause incorrect page references on a standby when for example the background writer flushed a couple of pages on-disk but was not updating the control file to let a subsequent crash recovery replay to where it should have. The only case where this has been reported to be a problem is when a standby needs to calculate the latest removed xid when replaying a btree deletion record, so one would need connections on a standby that happen just after recovery has thought it reached a consistent point. Using a background worker which is started after the consistent point is reached would be the easiest way to get into problems if it connects to a database. Having clients which attempt to connect periodically could also be a problem, but the odds of seeing this problem are much lower. The fix used is pretty simple, as the idea is to give access to the minimum recovery point written in the control file to non-startup processes so as they use a reference, while the startup process still initializes its own references of the minimum consistent point so as the original problem with incorrect page references happening post-promotion with a crash do not show up. Reported-by: Alexander Kukushkin Diagnosed-by: Alexander Kukushkin Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Alexander Kukushkin Discussion: https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org Backpatch-through: 9.3	2018-08-31 11:03:40 -07:00
Etsuro Fujita	7cfdc77023	Disable support for partitionwise joins in problematic cases. Commit `f49842d`, which added support for partitionwise joins, built the child's tlist by applying adjust_appendrel_attrs() to the parent's. So in the case where the parent's included a whole-row Var for the parent, the child's contained a ConvertRowtypeExpr. To cope with that, that commit added code to the planner, such as setrefs.c, but some code paths still assumed that the tlist for a scan (or join) rel would only include Vars and PlaceHolderVars, which was true before that commit, causing errors: * When creating an explicit sort node for an input path for a mergejoin path for a child join, prepare_sort_from_pathkeys() threw the 'could not find pathkey item to sort' error. * When deparsing a relation participating in a pushed down child join as a subquery in contrib/postgres_fdw, get_relation_column_alias_ids() threw the 'unexpected expression in subquery output' error. * When performing set_plan_references() on a local join plan generated by contrib/postgres_fdw for EvalPlanQual support for a pushed down child join, fix_join_expr() threw the 'variable not found in subplan target lists' error. To fix these, two approaches have been proposed: one by Ashutosh Bapat and one by me. While the former keeps building the child's tlist with a ConvertRowtypeExpr, the latter builds it with a whole-row Var for the child not to violate the planner assumption, and tries to fix it up later, But both approaches need more work, so refuse to generate partitionwise join paths when whole-row Vars are involved, instead. We don't need to handle ConvertRowtypeExprs in the child's tlists for now, so this commit also removes the changes to the planner. Previously, partitionwise join computed attr_needed data for each child separately, and built the child join's tlist using that data, which also required an extra step for adding PlaceHolderVars to that tlist, but it would be more efficient to build it from the parent join's tlist through the adjust_appendrel_attrs() transformation. So this commit builds that list that way, and simplifies build_joinrel_tlist() and placeholder.c as well as part of set_append_rel_size() to basically what they were before partitionwise join went in. Back-patch to PG11 where partitionwise join was introduced. Report by Rajkumar Raghuwanshi. Analysis by Ashutosh Bapat, who also provided some of regression tests. Patch by me, reviewed by Robert Haas. Discussion: https://postgr.es/m/CAKcux6ktu-8tefLWtQuuZBYFaZA83vUzuRd7c1YHC-yEWyYFpg@mail.gmail.com	2018-08-31 20:34:06 +09:00
Etsuro Fujita	2e39f69b66	Remove extra word from src/backend/optimizer/README	2018-08-31 16:40:17 +09:00
Peter Eisentraut	1e5e4efd02	Error position support for partition specifications Add support for error position reporting for partition specifications. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2018-08-30 08:20:23 +02:00
Peter Eisentraut	a4a232b1e7	Error position support for defaults and check constraints Add support for error position reporting for the expressions contained in defaults and check constraint definitions. This currently works only for CREATE TABLE, not ALTER TABLE, because the latter is not set up to pass around the original query string. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2018-08-30 08:20:23 +02:00
Michael Paquier	55875b6d2a	Stop bgworkers during fast shutdown with postmaster in startup phase When a postmaster gets into its phase PM_STARTUP, it would start background workers using BgWorkerStart_PostmasterStart mode immediately, which would cause problems for a fast shutdown as the postmaster forgets to send SIGTERM to already-started background workers. With smart and immediate shutdowns, this correctly happened, and fast shutdown is the only mode missing the shot. Author: Alexander Kukushkin Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAFh8B=mvnD8+DZUfzpi50DoaDfZRDfd7S=gwj5vU9GYn8UvHkA@mail.gmail.com Backpatch-through: 9.5	2018-08-29 17:10:02 -07:00
Andrew Gierth	c8ea87e4bd	Avoid quadratic slowdown in regexp match/split functions. regexp_matches, regexp_split_to_table and regexp_split_to_array all work by compiling a list of match positions as character offsets (NOT byte positions) in the source string. Formerly, they then used text_substr to extract the matched text; but in a multi-byte encoding, that counts the characters in the string, and the characters needed to reach the starting byte position, on every call. Accordingly, the performance degraded as the product of the input string length and the number of match positions, such that splitting a string of a few hundred kbytes could take many minutes. Repair by keeping the wide-character copy of the input string available (only in the case where encoding_max_length is not 1) after performing the match operation, and extracting substrings from that instead. This reduces the complexity to being linear in the number of result bytes, discounting the actual regexp match itself (which is not affected by this patch). In passing, remove cleanup using retail pfree() which was obsoleted by commit `ff428cded` (Feb 2008) which made cleanup of SRF multi-call contexts automatic. Also increase (to ~134 million) the maximum number of matches and provide an error message when it is reached. Backpatch all the way because this has been wrong forever. Analysis and patch by me; review by Kaiting Chen. Discussion: https://postgr.es/m/87pnyn55qh.fsf@news-spur.riddles.org.uk see also https://postgr.es/m/87lg996g4r.fsf@news-spur.riddles.org.uk	2018-08-28 12:17:33 +01:00
Peter Eisentraut	7a3b7bbfde	Fix snapshot leak warning for some procedures The problem arises with the combination of CALL with output parameters and doing a COMMIT inside the procedure. When a CALL has output parameters, the portal uses the strategy PORTAL_UTIL_SELECT instead of PORTAL_MULTI_QUERY. Using PORTAL_UTIL_SELECT causes the portal's snapshot to be registered with the current resource owner (portal->holdSnapshot); see `9ee1cf04ab` for the reason. Normally, PortalDrop() unregisters the snapshot. If not, then ResourceOwnerRelease() will print a warning about a snapshot leak on transaction commit. A transaction commit normally drops all portals (PreCommit_Portals()), except the active portal. So in case of the active portal, we need to manually release the snapshot to avoid the warning. Reported-by: Prabhat Sahu <prabhat.sahu@enterprisedb.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>	2018-08-27 22:16:15 +02:00
Michael Paquier	a556549d7e	Improve VACUUM and ANALYZE by avoiding early lock queue A caller of VACUUM can perform early lookup obtention which can cause other sessions to block on the request done, causing potentially DOS attacks as even a non-privileged user can attempt a vacuum fill of a critical catalog table to block even all incoming connection attempts. Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after building the list of relations to VACUUM, which can cause vacuum_rel() or analyze_rel() to try to lock the relation but the operation would just block. When the client specifies a list of relations and the relation needs to be skipped, ownership checks are done when building the list of relations to work on, preventing a later lock attempt. vacuum_rel() already had the sanity checks needed, except that those were applied too late. This commit refactors the code so as relation skips are checked beforehand, making it safer to avoid too early locks, for both manual VACUUM with and without a list of relations specified. An isolation test is added emulating the fact that early locks do not happen anymore, issuing a WARNING message earlier if the user calling VACUUM is not a relation owner. When a partitioned table is listed in a manual VACUUM or ANALYZE command, its full list of partitions is fetched, all partitions get added to the list to work on, and then each one of them is processed one by one, with ownership checks happening at the later phase of vacuum_rel() or analyze_rel(). Trying to do early ownership checks for each partition is proving to be tedious as this would result in deadlock risks with lock upgrades, and skipping all partitions if the listed partitioned table is not owned would result in a behavior change compared to how Postgres 10 has implemented vacuum for partitioned tables. The original problem reported related to early lock queue for critical relations is fixed anyway, so priority is given to avoiding a backward-incompatible behavior. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180812222142.GA6097@paquier.xyz	2018-08-27 09:11:12 +09:00
Thomas Munro	18e586741b	Fix typos. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f8du35u5DprpykWvgNEScxapbWYJdHq%2Bz06Wj3Y2KFPbw%40mail.gmail.com	2018-08-27 09:32:59 +12:00
Tom Lane	bff84a547d	Make syslogger more robust against failures in opening CSV log files. The previous coding figured it'd be good enough to postpone opening the first CSV log file until we got a message we needed to write there. This is unsafe, though, because if the open fails we end up in infinite recursion trying to report the failure. Instead make the CSV log file management code look as nearly as possible like the longstanding logic for the stderr log file. In particular, open it immediately at postmaster startup (if enabled), or when we get a SIGHUP in which we find that log_destination has been changed to enable CSV logging. It seems OK to fail if a postmaster-start-time open attempt fails, as we've long done for the stderr log file. But we can't die if we fail to open a CSV log file during SIGHUP, so we're still left with a problem. In that case, write any output meant for the CSV log file to the stderr log file. (This will also cover race-condition cases in which backends send CSV log data before or after we have the CSV log file open.) This patch also fixes an ancient oversight that, if CSV logging was turned off during a SIGHUP, we never actually closed the last CSV log file. In passing, remember to reset whereToSendOutput = DestNone during syslogger start, since (unlike all other postmaster children) it's forked before the postmaster has done that. This made for a platform-dependent difference in error reporting behavior between the syslogger and other children: except on Windows, it'd report problems to the original postmaster stderr as well as the normal error log file(s). It's barely possible that that was intentional at some point; but it doesn't seem likely to be desirable in production, and the platform dependency definitely isn't desirable. Per report from Alexander Kukushkin. It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAFh8B==iLUD_gqC-dAENS0V+kVrCeGiKujtKqSQ7++S-caaChw@mail.gmail.com	2018-08-26 14:21:55 -04:00
Jeff Davis	ba9d35b8eb	Reconsider new file extension in commit `91f26d5f`. Andres and Tom objected to the choice of the ".tmp" extension. Changing to Andres's suggestion of ".spill". Discussion: https://postgr.es/m/88092095-3348-49D8-8746-EB574B1D30EA%40anarazel.de	2018-08-25 22:52:46 -07:00
Jeff Davis	91f26d5fe4	Change extension of spilled ReorderBufferChange data to ".tmp". The previous extension, ".snap", was chosen for historical reasons and became confusing. Discussion: https://postgr.es/m/CAMp0ubd_P8vBGx8=MfDXQJZxHA5D_Zarw5cCkDxJ_63+pWRJ9w@mail.gmail.com	2018-08-25 09:36:09 -07:00
Andres Freund	cb92520563	LLVMJIT: LLVMGetHostCPUFeatures now is upstream, use LLMV version if available. Noticed thanks to buildfarm animal seawasp. Author: Andres Freund Backpatch: v11-, where LLVM based JIT compliation was introduced.	2018-08-24 10:21:38 -07:00
Tom Lane	b0c5da615e	Suppress uninitialized-variable warning in new SCRAM code. While we generally don't sweat too much about "may be used uninitialized" warnings from older compilers, I noticed that there's a fair number of buildfarm animals that are producing such a warning only for this variable. So it seems worth silencing.	2018-08-24 10:51:10 -04:00
Andres Freund	143290efd0	Introduce minimal C99 usage to verify compiler support. This just converts a few for loops in postgres.c to declare variables in the loop initializer, and uses designated initializers in smgr.c's definition of smgr callbacks. Author: Andres Freund Discussion: https://postgr.es/m/97d4b165-192d-3605-749c-f614a0c4e783@2ndquadrant.com	2018-08-23 18:36:07 -07:00
Andres Freund	88ebd62fcc	Deduplicate code between slot_getallattrs() and slot_getsomeattrs(). Code in slot_getallattrs() is the same as if slot_getsomeattrs() is called with number of attributes specified in the tuple descriptor. Implement it that way instead of duplicating the code between those two functions. This is part of a patchseries abstracting TupleTableSlots so they can store arbitrary forms of tuples, but is a nice enough cleanup on its own. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-08-23 16:58:53 -07:00
Andrew Gierth	a40631a920	Fix lexing of standard multi-character operators in edge cases. Commits `c6b3c939b` (which fixed the precedence of >=, <=, <> operators) and `865f14a2d` (which added support for the standard => notation for named arguments) created a class of lexer tokens which look like multi-character operators but which have their own token IDs distinct from Op. However, longest-match rules meant that following any of these tokens with another operator character, as in (1<>-1), would cause them to be incorrectly returned as Op. The error here isn't immediately obvious, because the parser would usually still find the correct operator via the Op token, but there were more subtle problems: 1. If immediately followed by a comment or +-, >= <= <> would be given the old precedence of Op rather than the correct new precedence; 2. If followed by a comment, != would be returned as Op rather than as NOT_EQUAL, causing it not to be found at all; 3. If followed by a comment or +-, the => token for named arguments would be lexed as Op, causing the argument to be mis-parsed as a simple expression, usually causing an error. Fix by explicitly checking for the operators in the {operator} code block in addition to all the existing special cases there. Backpatch to 9.5 where the problem was introduced. Analysis and patch by me; review by Tom Lane. Discussion: https://postgr.es/m/87va851ppl.fsf@news-spur.riddles.org.uk	2018-08-23 21:42:40 +01:00
Andrew Gierth	d4a63f8297	Reduce an unnecessary O(N^3) loop in lexer. The lexer's handling of operators contained an O(N^3) hazard when dealing with long strings of + or - characters; it seems hard to prevent this case from being O(N^2), but the additional N multiplier was not needed. Backpatch all the way since this has been there since 7.x, and it presents at least a mild hazard in that trying to do Bind, PREPARE or EXPLAIN on a hostile query could take excessive time (without honouring cancels or timeouts) even if the query was never executed.	2018-08-23 21:42:40 +01:00
Peter Eisentraut	0a63f996e0	Change PROCEDURE to FUNCTION in CREATE TRIGGER syntax Since procedures are now a different thing from functions, change the CREATE TRIGGER and CREATE EVENT TRIGGER syntax to use FUNCTION in the clause that specifies the function. PROCEDURE is still accepted for compatibility. pg_dump and ruleutils.c output is not changed yet, because that would require a change in information_schema.sql and thus a catversion change. Reported-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>	2018-08-22 14:44:49 +02:00
Peter Eisentraut	d12782898e	Change PROCEDURE to FUNCTION in CREATE OPERATOR syntax Since procedures are now a different thing from functions, change the CREATE OPERATOR syntax to use FUNCTION in the clause that specifies the function. PROCEDURE is still accepted for compatibility. Reported-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>	2018-08-22 14:44:49 +02:00
Peter Eisentraut	b19495772e	doc: Update uses of the word "procedure" Historically, the term procedure was used as a synonym for function in Postgres/PostgreSQL. Now we have procedures as separate objects from functions, so we need to clean up the documentation to not mix those terms. In particular, mentions of "trigger procedures" are changed to "trigger functions", and access method "support procedures" are changed to "support functions". (The latter already used FUNCTION in the SQL syntax anyway.) Also, the terminology in the SPI chapter has been cleaned up. A few tests, examples, and code comments are also adjusted to be consistent with documentation changes, but not everything. Reported-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>	2018-08-22 14:44:49 +02:00
Thomas Munro	af63926cf5	Wrap long line in postgresql.conf.sample. Per complaint from Michael Paquier.	2018-08-22 21:34:50 +12:00
Thomas Munro	f9fe269ca2	Provide plan_cache_mode options in postgresql.conf.sample. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f8YkwojSTSg8YjNYCLCXzx0fR7wBR3Gf%2BrA9_52eoPZKg%40mail.gmail.com	2018-08-22 18:19:39 +12:00
Alvaro Herrera	55a0154efd	Fix typo	2018-08-21 17:16:10 -03:00
Alvaro Herrera	083651da17	fix typo	2018-08-21 17:03:35 -03:00
Michael Paquier	72be8c29a1	Fix set of NLS translation issues While monitoring the code, a couple of issues related to string translation has showed up: - Some routines for auto-updatable views return an error string, which sometimes missed the shot. A comment regarding string translation is added for each routine to help with future features. - GSSAPI authentication missed two translations. - vacuumdb handles non-translated strings. - GetConfigOptionByNum should translate strings. This part is not back-patched as after a minor upgrade this could be surprising for users. Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp Backpatch-through: 9.3	2018-08-21 15:17:13 +09:00
Michael Paquier	d8c83800c3	Fix typo in description of enable_parallel_hash Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20180821.115841.93250330.horiguchi.kyotaro@lab.ntt.co.jp	2018-08-21 12:13:16 +09:00
Michael Paquier	1339fcc896	Clarify comment about assignment and reset of temp namespace ID in MyProc The new wording comes from Álvaro, which I modified a bit. Reported-by: Andres Freund, Álvaro Herrera Author: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20180809165047.GK13638@paquier.xyz Backpatch-through: 11	2018-08-21 08:32:18 +09:00
Peter Eisentraut	d83423db86	Improve error messages for CREATE OR REPLACE PROCEDURE Change the hint to recommend DROP PROCEDURE instead of FUNCTION. Also make the error message when changing the return type more specific to the case of procedures. Reported-by: Jeremy Evans <code@jeremyevans.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2018-08-18 22:05:43 +02:00
Peter Eisentraut	e4597ee65d	InsertPgAttributeTuple() to set attcacheoff InsertPgAttributeTuple() is the interface between in-memory tuple descriptors and on-disk pg_attribute, so it makes sense to give it the job of resetting attcacheoff. This avoids having all the callers having to do so. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2018-08-17 22:08:21 +02:00
Andrew Gierth	520acab171	Set scan direction appropriately for SubPlans (bug #15336 ) When executing a SubPlan in an expression, the EState's direction field was left alone, resulting in an attempt to execute the subplan backwards if it was encountered during a backwards scan of a cursor. Also, though much less likely, it was possible to reach the execution of an InitPlan while in backwards-scan state. Repair by saving/restoring estate->es_direction and forcing forward scan mode in the relevant places. Backpatch all the way, since this has been broken since 8.3 (prior to commit `c7ff7663e`, SubPlans had their own EStates rather than sharing the parent plan's, so there was no confusion over scan direction). Per bug #15336 reported by Vladimir Baranoff; analysis and patch by me, review by Tom Lane. Discussion: https://postgr.es/m/153449812167.1304.1741624125628126322@wrigleys.postgresql.org	2018-08-17 15:44:13 +01:00
Tomas Vondra	c4c3400885	Use the built-in float datatypes to implement geometric types This patch makes the geometric operators and functions use the exported function of the float4/float8 datatypes. The main reason of doing so is to check for underflow and overflow, and to handle NaNs consciously. The float datatypes consider NaNs values to be equal and greater than all non-NaN values. This change considers NaNs equal only for equality operators. The placement operators, contains, overlaps, left/right of etc. continue to return false when NaNs are involved. We don't need to worry about them being considered greater than any-NaN because there aren't any basic comparison operators like less/greater than for the geometric datatypes. The changes may be summarised as: * Check for underflow, overflow and division by zero * Consider NaN values to be equal * Return NULL when the distance is NaN for all closest point operators * Favour not-NaN over NaN where it makes sense The patch also replaces all occurrences of "double" as "float8". They are the same, but were used inconsistently in the same file. Author: Emre Hasegeli Reviewed-by: Kyotaro Horiguchi, Tomas Vondra Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com	2018-08-16 19:56:11 +02:00
Tomas Vondra	a082aed072	Remove remaining GEODEBUG references from geo_ops.c Commit `a7dc63d904` removed most of the GEODEBUG occurrences, but there were a couple remaining. So remove them too, to get rid of the macro entirely. Author: Emre Hasegeli Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com	2018-08-16 19:55:43 +02:00
Tom Lane	e1d19c902e	Require a C99-compliant snprintf(), and remove related workarounds. Since our substitute snprintf now returns a C99-compliant result, there's no need anymore to have complicated code to cope with pre-C99 behavior. We can just make configure substitute snprintf.c if it finds that the system snprintf() is pre-C99. (Note: I do not believe that there are any platforms where this test will trigger that weren't already being rejected due to our other C99-ish feature requirements for snprintf. But let's add the check for paranoia's sake.) Then, simplify the call sites that had logic to cope with the pre-C99 definition. I also dropped some stuff that was being paranoid about the possibility of snprintf overrunning the given buffer. The only reports we've ever heard of that being a problem were for Solaris 7, which is long dead, and we've sure not heard any reports of these assertions triggering in a long time. So let's drop that complexity too. Likewise, drop some code that wasn't trusting snprintf to set errno when it returns -1. That would be not-per-spec, and again there's no real reason to believe it is a live issue, especially not for snprintfs that pass all of configure's feature checks. Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us	2018-08-16 13:01:09 -04:00
Alvaro Herrera	1eb9221585	Fix executor prune failure when plan already pruned In a multi-layer partitioning setup, if at plan time all the sub-partitions are pruned but the intermediate one remains, the executor later throws a spurious error that there's nothing to prune. That is correct, but there's no reason to throw an error. Therefore, don't. Reported-by: Andreas Seltenreich <seltenreich@gmx.de> Author: David Rowley <david.rowley@2ndquadrant.com> Discussion: https://postgr.es/m/87in4h98i0.fsf@ansel.ydns.eu	2018-08-16 12:53:43 -03:00
Tomas Vondra	fa73b377ee	Close the file descriptor in ApplyLogicalMappingFile The function was forgetting to close the file descriptor, resulting in failures like this: ERROR: 53000: exceeded maxAllocatedDescs (492) while trying to open file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b" LOCATION: OpenTransientFile, fd.c:2161 Simply close the file at the end, and backpatch to 9.4 (where logical decoding was introduced). While at it, fix a nearby typo. Discussion: https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com	2018-08-16 16:49:57 +02:00
Michael Paquier	3593579bcd	Update comment in header of errcodes.txt This file mentions all the files generated from it, but missed that errcodes-list.sgml is no more, while errcodes-table.sgml is. Author: Noriyoshi Shinoda Discussion: https://postgr.es/m/TU4PR8401MB0430855D6B971E49EB55F328EE3E0@TU4PR8401MB0430.NAMPRD84.PROD.OUTLOOK.COM	2018-08-16 09:47:59 +02:00
Thomas Munro	ca1e64feba	Improve comment in GetNewObjectId(). The previous comment gave the impression that skipping OIDs before FirstNormalObjectId was merely an optimization to avoid likely collisions. In fact other parts of the system have been relying on this threshold to detect system-created objects since commit `8e18d04d4d`, so adjust the wording. Author: Thomas Munro Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEepm%3D33JASACeOayr_W3%3DCSjy2jiPxM-k89axu0akFbHdjnjA%40mail.gmail.com	2018-08-16 17:17:30 +12:00
Alvaro Herrera	ab7dbd681c	Update FSM on WAL replay of page all-visible/frozen We aren't very strict about keeping FSM up to date on WAL replay, because per-page freespace values aren't critical in replicas (can't write to heap in a replica; and if the replica is promoted, the values would be updated by VACUUM anyway). However, VACUUM since 9.6 can skip processing pages marked all-visible or all-frozen, and if such pages are recorded in FSM with wrong values, those values are blindly propagated to FSM's upper layers by VACUUM's FreeSpaceMapVacuum. (This rationale assumes that crashes are not very frequent, because those would cause outdated FSM to occur in the primary.) Even when the FSM is outdated in standby, things are not too bad normally, because, most per-page FSM values will be zero (other than those propagated with the base-backup that created the standby); only once the remaining free space is less than 0.2*BLCKSZ the per-page value is maintained by WAL replay of heap ins/upd/del. However, if wal_log_hints=on causes complete FSM pages to be propagated to a standby via full-page images, many too-optimistic per-page values can end up being registered in the standby. Incorrect per-page values aren't critical in most cases, since an inserter that is given a page that doesn't actually contain the claimed free space will update FSM with the correct value, and retry until it finds a usable page. However, if there are many such updates to do, an inserter can spend a long time doing them before a usable page is found; in a heavily trafficked insert-only table with many concurrent inserters this has been observed to cause several second stalls, causing visible application malfunction. To fix this problem, it seems sufficient to have heap_xlog_visible (replay of setting all-visible and all-frozen VM bits for a heap page) update the FSM value for the page being processed. This fixes the per-page counters together with making the page skippable to vacuum, so when vacuum does FreeSpaceMapVacuum, the values propagated to FSM upper layers are the correct ones, avoiding the problem. While at it, apply the same fix to heap_xlog_clean (replay of tuple removal by HOT pruning and vacuum). This makes any space freed by the cleaning available earlier than the next vacuum in the promoted replica. Backpatch to 9.6, where this problem was diagnosed on an insert-only table with all-frozen pages, which were introduced as a concept in that release. Theoretically it could apply with all-visible pages to older branches, but there's been no report of that and it doesn't backpatch cleanly anyway. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20180802172857.5skoexsilnjvgruk@alvherre.pgsql	2018-08-15 18:09:29 -03:00
Tom Lane	cc4f6b7786	Clean up assorted misuses of snprintf()'s result value. Fix a small number of places that were testing the result of snprintf() but doing so incorrectly. The right test for buffer overrun, per C99, is "result >= bufsize" not "result > bufsize". Some places were also checking for failure with "result == -1", but the standard only says that a negative value is delivered on failure. (Note that this only makes these places correct if snprintf() delivers C99-compliant results. But at least now these places are consistent with all the other places where we assume that.) Also, make psql_start_test() and isolation_start_test() check for buffer overrun while constructing their shell commands. There seems like a higher risk of overrun, with more severe consequences, here than there is for the individual file paths that are made elsewhere in the same functions, so this seemed like a worthwhile change. Also fix guc.c's do_serialize() to initialize errno = 0 before calling vsnprintf. In principle, this should be unnecessary because vsnprintf should have set errno if it returns a failure indication ... but the other two places this coding pattern is cribbed from don't assume that, so let's be consistent. These errors are all very old, so back-patch as appropriate. I think that only the shell command overrun cases are even theoretically reachable in practice, but there's not much point in erroneous error checks. Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us	2018-08-15 16:29:31 -04:00
Peter Eisentraut	b68ff3ea67	Remove obsolete netbsd dynloader code dlopen() has been documented since NetBSD 1.1 (1995).	2018-08-13 23:21:01 +02:00
Peter Eisentraut	29351a06af	Remove obsolete openbsd dynloader code dlopen() has been documented since OpenBSD 2.0 (1996).	2018-08-13 23:21:01 +02:00
Peter Eisentraut	2db1905fdd	Remove obsolete freebsd dynloader code dlopen() has been documented since FreeBSD 3.0 (1989).	2018-08-13 23:21:01 +02:00
Peter Eisentraut	351855fc4e	Remove obsolete linux dynloader code This has been obsolete probably since the late 1990s.	2018-08-13 23:21:01 +02:00
Peter Eisentraut	b5d29299cc	Remove obsolete darwin dynloader code not needed since macOS 10.3 (2003)	2018-08-13 23:21:01 +02:00
Peter Eisentraut	3ebdd21b79	Remove obsolete comment The sequence name is no longer stored in the sequence relation, since `1753b1b027`.	2018-08-13 21:07:31 +02:00
Michael Paquier	246a6c8f7b	Make autovacuum more aggressive to remove orphaned temp tables Commit `dafa084`, added in 10, made the removal of temporary orphaned tables more aggressive. This commit makes an extra step into the aggressiveness by adding a flag in each backend's MyProc which tracks down any temporary namespace currently in use. The flag is set when the namespace gets created and can be reset if the temporary namespace has been created in a transaction or sub-transaction which is aborted. The flag value assignment is assumed to be atomic, so this can be done in a lock-less fashion like other flags already present in PGPROC like databaseId or backendId, still the fact that the temporary namespace and table created are still locked until the transaction creating those commits acts as a barrier for other backends. This new flag gets used by autovacuum to discard more aggressively orphaned tables by additionally checking for the database a backend is connected to as well as its temporary namespace in-use, removing orphaned temporary relations even if a backend reuses the same slot as one which created temporary relations in a past session. The base idea of this patch comes from Robert Haas, has been written in its first version by Tsunakawa Takayuki, then heavily reviewed by me. Author: Tsunakawa Takayuki Reviewed-by: Michael Paquier, Kyotaro Horiguchi, Andres Freund Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05 Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI breakages on already released versions.	2018-08-13 11:49:04 +02:00
Amit Kapila	4f9a97e417	Adjust comment atop ExecShutdownNode. After commits `a315b967cc` and `b805b63ac2`, part of the comment atop ExecShutdownNode is redundant. Adjust it. Author: Amit Kapila Backpatch-through: 10 where both the mentioned commits are present. Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-13 10:04:39 +05:30
Amit Kapila	2cd0acfdad	Prohibit shutting down resources if there is a possibility of back up. Currently, we release the asynchronous resources as soon as it is evident that no more rows will be needed e.g. when a Limit is filled. This can be problematic especially for custom and foreign scans where we can scan backward. Fix that by disallowing the shutting down of resources in such cases. Reported-by: Robert Haas Analysed-by: Robert Haas and Amit Kapila Author: Amit Kapila Reviewed-by: Robert Haas Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-13 08:22:18 +05:30
Andrew Gierth	07172d5aff	Avoid query-lifetime memory leaks in XMLTABLE (bug #15321 ) Multiple calls to XMLTABLE in a query (e.g. laterally applying it to a table with an xml column, an important use-case) were leaking large amounts of memory into the per-query context, blowing up memory usage. Repair by reorganizing memory context usage in nodeTableFuncscan; use the usual per-tuple context for row-by-row evaluations instead of perValueCxt, and use the explicitly created context -- renamed from perValueCxt to perTableCxt -- for arguments and state for each individual table-generation operation. Backpatch to PG10 where this code was introduced. Original report by IRC user begriffs; analysis and patch by me. Reviewed by Tom Lane and Pavel Stehule. Discussion: https://postgr.es/m/153394403528.10284.7530399040974170549@wrigleys.postgresql.org	2018-08-13 01:59:45 +01:00
Tom Lane	4a2994f055	Fix wrong order of operations in inheritance_planner. When considering a partitioning parent rel, we should stop processing that subroot as soon as we've done adjust_appendrel_attrs and any securityQuals updates. The rest of this is unnecessary, and indeed adding duplicate subquery RTEs to the subroot is wrong. As the code stood, the children of that partition ended up with two sets of copied subquery RTEs, confusing matters greatly. Even more hilarity ensued if all of the children got excluded by constraint exclusion, so that the extra RTEs didn't make it back into the parent rtable. Per fuzz testing by Andreas Seltenreich. Back-patch to v11 where this got broken (by commit `0a480502b`, it looks like). Discussion: https://postgr.es/m/87va8g7vq0.fsf@ansel.ydns.eu	2018-08-11 15:53:20 -04:00
Andrew Dunstan	5c047fd709	Revert changes in execMain.c from commit `16828d5c02` These changes were put in at some stage of the development process, but are unnecessary and should not have made it into the final patch. Mea culpa. Per gripe from Andreas Freund Backpatch to REL_11_STABLE	2018-08-10 16:09:25 -04:00
Peter Geoghegan	4974d7f87e	Handle parallel index builds on mapped relations. Commit `9da0cc3528`, which introduced parallel CREATE INDEX, failed to propagate relmapper.c backend local cache state to parallel worker processes. This could result in parallel index builds against mapped catalog relations where the leader process (participating as a worker) scans the new, pristine relfilenode, while worker processes scan the obsolescent relfilenode. When this happened, the final index structure was typically not consistent with the owning table's structure. The final index structure could contain entries formed from both heap relfilenodes. Only rebuilds on mapped catalog relations that occur as part of a VACUUM FULL or CLUSTER could become corrupt in practice, since their mapped relation relfilenode swap is what allows the inconsistency to arise. On master, fix the problem by propagating the required relmapper.c backend state as part of standard parallel initialization (Cf. commit `29d58fd3`). On v11, simply disallow builds against mapped catalog relations by deeming them parallel unsafe. Author: Peter Geoghegan Reported-By: "death lock" Reviewed-By: Tom Lane, Amit Kapila Bug: #15309 Discussion: https://postgr.es/m/153329671686.1405.18298309097348420351@wrigleys.postgresql.org Backpatch: 11-, where parallel CREATE INDEX was introduced.	2018-08-10 13:01:34 -07:00
Michael Paquier	f841ceb26d	Improve TRUNCATE by avoiding early lock queue A caller of TRUNCATE could previously queue for an access exclusive lock on a relation it may not have permission to truncate, potentially interfering with users authorized to work on it. This can be very intrusive depending on the lock attempted to be taken. For example, pg_authid could be blocked, preventing any authentication attempt to happen on a PostgreSQL instance. This commit fixes the case of TRUNCATE so as RangeVarGetRelidExtended is used with a callback doing the necessary ACL checks at an earlier stage, avoiding lock queuing issues, so as an immediate failure happens for unprivileged users instead of waiting on a lock that would not be taken. This is rather similar to the type of work done in `cbe24a6` for CLUSTER, and the code of TRUNCATE is this time refactored so as there is no user-facing changes. As the commit for CLUSTER, no back-patch is done. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180806165816.GA19883@paquier.xyz	2018-08-10 18:26:59 +02:00
Alexander Korotkov	2b13702d5c	Fix typo in SP-GiST error message Error message didn't match the actual check. Fix that. Compression of leaf SP-GiST values was introduced in 11. So, backpatch. Discussion: https://postgr.es/m/20180810.100742.15469435.horiguchi.kyotaro%40lab.ntt.co.jp Author: Kyotaro Horiguchi Backpatch-through: 11	2018-08-10 17:28:48 +03:00
Heikki Linnakangas	31380bc7c2	Spell "partitionwise" consistently. I'm not sure which spelling is better, "partitionwise" or "partition-wise", but everywhere else we spell it "partitionwise", so be consistent. Tatsuro Yamada reported the one in README, I found the other one with grep. Discussion: https://www.postgresql.org/message-id/d25ebf36-5a6d-8b2c-1ff3-d6f022a56000@lab.ntt.co.jp	2018-08-09 10:43:18 +03:00
Michael Paquier	661dd23950	Restrict access to reindex of shared catalogs for non-privileged users A database owner running a database-level REINDEX has the possibility to also do the operation on shared system catalogs without being an owner of them, which allows him to block resources it should not have access to. The same goes for a schema owner. For example, PostgreSQL would go unresponsive and even block authentication if a lock is waited for pg_authid. This commit makes sure that a user running a REINDEX SYSTEM, DATABASE or SCHEMA only works on the following relations: - The user is a superuser - The user is the table owner - The user is the database/schema owner, only if the relation worked on is not shared. Robert has worded most the documentation changes, and I have coded the core part. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier, Robert Haas Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180805211059.GA2185@paquier.xyz Backpatch-through: 11- as the current behavior has been around for a very long time and could be disruptive for already released branches.	2018-08-09 09:40:15 +02:00
Tom Lane	59ef49d26d	Remove bogus Assert in make_partitionedrel_pruneinfo(). This Assert thought that a given rel couldn't be both leaf and non-leaf, but it turns out that in some unusual plan trees that's wrong, so remove it. The lack of testing for cases like that is quite concerning --- there is little reason for confidence that there aren't other bugs in the area. But developing a stable test case seems rather difficult, and in any case we don't need this Assert. David Rowley Discussion: https://postgr.es/m/CAJGNTeOkdk=UVuMugmKL7M=owgt4nNr1wjxMg1F+mHsXyLCzFA@mail.gmail.com	2018-08-08 20:02:32 -04:00
Heikki Linnakangas	8e19a82640	Don't run atexit callbacks in quickdie signal handlers. exit() is not async-signal safe. Even if the libc implementation is, 3rd party libraries might have installed unsafe atexit() callbacks. After receiving SIGQUIT, we really just want to exit as quickly as possible, so we don't really want to run the atexit() callbacks anyway. The original report by Jimmy Yih was a self-deadlock in startup_die(). However, this patch doesn't address that scenario; the signal handling while waiting for the startup packet is more complicated. But at least this alleviates similar problems in the SIGQUIT handlers, like that reported by Asim R P later in the same thread. Backpatch to 9.3 (all supported versions). Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com	2018-08-08 19:10:32 +03:00
Tom Lane	11e22e486d	Match RelOptInfos by relids not pointer equality. Commit `1c2cb2744` added some code that tried to detect whether two RelOptInfos were the "same" rel by pointer comparison; but it turns out that inheritance_planner breaks that, through its shenanigans with copying some relations forward into new subproblems. Compare relid sets instead. Add a regression test case to exercise this area. Problem reported by Rushabh Lathia; diagnosis and fix by Amit Langote, modified a bit by me. Discussion: https://postgr.es/m/CAGPqQf3anJGj65bqAQ9edDr8gF7qig6_avRgwMT9MsZ19COUPw@mail.gmail.com	2018-08-08 11:44:50 -04:00
Tom Lane	9b7c56d6cb	Don't record FDW user mappings as members of extensions. CreateUserMapping has a recordDependencyOnCurrentExtension call that's been there since extensions were introduced (very possibly my fault). However, there's no support anywhere else for user mappings as members of extensions, nor are they listed as a possible member object type in the documentation. Nor does it really seem like a good idea for user mappings to belong to extensions when roles don't. Hence, remove the bogus call. (As we saw in bug #15310, the lack of any pg_dump support for this case ensures that any such membership record would silently disappear during pg_upgrade. So there's probably no need for us to do anything else about cleaning up after this mistake.) Discussion: https://postgr.es/m/27952.1533667213@sss.pgh.pa.us	2018-08-07 16:32:50 -04:00
Tom Lane	41db97399d	Fix incorrect initialization of BackendActivityBuffer. Since commit `c8e8b5a6e`, this has been zeroed out using the wrong length. In practice the length would always be too small, leading to not zeroing the whole buffer rather than clobbering additional memory; and that's pretty harmless, both because shmem would likely start out as zeroes and because we'd reinitialize any given entry before use. Still, it's bogus, so fix it. Reported by Petru-Florin Mihancea (bug #15312) Discussion: https://postgr.es/m/153363913073.1303.6518849192351268091@wrigleys.postgresql.org	2018-08-07 16:00:44 -04:00
Heikki Linnakangas	77291139c7	Remove support for tls-unique channel binding. There are some problems with the tls-unique channel binding type. It's not supported by all SSL libraries, and strictly speaking it's not defined for TLS 1.3 at all, even though at least in OpenSSL, the functions used for it still seem to work with TLS 1.3 connections. And since we had no mechanism to negotiate what channel binding type to use, there would be awkward interoperability issues if a server only supported some channel binding types. tls-server-end-point seems feasible to support with any SSL library, so let's just stick to that. This removes the scram_channel_binding libpq option altogether, since there is now only one supported channel binding type. This also removes all the channel binding tests from the SSL test suite. They were really just testing the scram_channel_binding option, which is now gone. Channel binding is used if both client and server support it, so it is used in the existing tests. It would be good to have some tests specifically for channel binding, to make sure it really is used, and the different combinations of a client and a server that support or doesn't support it. The current set of settings we have make it hard to write such tests, but I did test those things manually, by disabling HAVE_BE_TLS_GET_CERTIFICATE_HASH and/or HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH. I also removed the SCRAM_CHANNEL_BINDING_TLS_END_POINT constant. This is a matter of taste, but IMO it's more readable to just use the "tls-server-end-point" string. Refactor the checks on whether the SSL library supports the functions needed for tls-server-end-point channel binding. Now the server won't advertise, and the client won't choose, the SCRAM-SHA-256-PLUS variant, if compiled with an OpenSSL version too old to support it. In the passing, add some sanity checks to check that the chosen SASL mechanism, SCRAM-SHA-256 or SCRAM-SHA-256-PLUS, matches whether the SCRAM exchange used channel binding or not. For example, if the client selects the non-channel-binding variant SCRAM-SHA-256, but in the SCRAM message uses channel binding anyway. It's harmless from a security point of view, I believe, and I'm not sure if there are some other conditions that would cause the connection to fail, but it seems better to be strict about these things and check explicitly. Discussion: https://www.postgresql.org/message-id/ec787074-2305-c6f4-86aa-6902f98485a4%40iki.fi	2018-08-05 13:44:21 +03:00
Tom Lane	b8a1247a34	Fix INSERT ON CONFLICT UPDATE through a view that isn't just SELECT . When expanding an updatable view that is an INSERT's target, the rewriter failed to rewrite Vars in the ON CONFLICT UPDATE clause. This accidentally worked if the view was just "SELECT FROM ...", as the transformation would be a no-op in that case. With more complicated view targetlists, this omission would often lead to "attribute ... has the wrong type" errors or even crashes, as reported by Mario De Frutos Dieguez. Fix by adding code to rewriteTargetView to fix up the data structure correctly. The easiest way to update the exclRelTlist list is to rebuild it from scratch looking at the new target relation, so factor the code for that out of transformOnConflictClause to make it sharable. In passing, avoid duplicate permissions checks against the EXCLUDED pseudo-relation, and prevent useless view expansion of that relation's dummy RTE. The latter is only known to happen (after this patch) in cases where the query would fail later due to not having any INSTEAD OF triggers for the view. But by exactly that token, it would create an unintended and very poorly tested state of the query data structure, so it seems like a good idea to prevent it from happening at all. This has been broken since ON CONFLICT was introduced, so back-patch to 9.5. Dean Rasheed, based on an earlier patch by Amit Langote; comment-kibitzing and back-patching by me Discussion: https://postgr.es/m/CAFYwGJ0xfzy8jaK80hVN2eUWr6huce0RU8AgU04MGD00igqkTg@mail.gmail.com	2018-08-04 19:38:58 -04:00
Michael Paquier	5a23c74b63	Reset properly errno before calling write() `6cb3372` enforces errno to ENOSPC when less bytes than what is expected have been written when it is unset, though it forgot to properly reset errno before doing a system call to write(), causing errno to potentially come from a previous system call. Reported-by: Tom Lane Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/31797.1533326676@sss.pgh.pa.us	2018-08-05 05:31:18 +09:00
Peter Geoghegan	b3f919da07	Add table relcache invalidation to index builds. It's necessary to make sure that owning tables have a relcache invalidation prior to advancing the command counter to make newly-entered catalog tuples for the index visible. inval.c must be able to maintain the consistency of the local caches in the event of transaction abort. There is usually only a problem when CREATE INDEX transactions abort, since there is a generic invalidation once we reach index_update_stats(). This bug is of long standing. Problems were made much more likely by the addition of parallel CREATE INDEX (commit `9da0cc3528`), but it is strongly suspected that similar problems can be triggered without involving plan_create_index_workers(). (plan_create_index_workers() triggers a relcache build or rebuild, which previously only happened in rare edge cases.) Author: Peter Geoghegan Reported-By: Luca Ferrari Diagnosed-By: Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAKoxK+5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf=HEQ@mail.gmail.com Backpatch: 9.3-	2018-08-03 15:11:31 -07:00
Amit Kapila	85c9d3475e	Fix buffer usage stats for parallel nodes. The buffer usage stats is accounted only for the execution phase of the node. For Gather and Gather Merge nodes, such stats are accumulated at the time of shutdown of workers which is done after execution of node due to which we missed to account them for such nodes. Fix it by treating nodes as running while we shut down them. We can also miss accounting for a Limit node when Gather or Gather Merge is beneath it, because it can finish the execution before shutting down such nodes. So we allow a Limit node to shut down the resources before it completes the execution. In the passing fix the gather node code to allow workers to shut down as soon as we find that all the tuples from the workers have been retrieved. The original code use to do that, but is accidently removed by commit `01edb5c7fc`. Reported-by: Adrien Nayrat Author: Amit Kapila and Robert Haas Reviewed-by: Robert Haas and Andres Freund Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-03 11:02:02 +05:30
Amit Kapila	ccc84a956b	Match the buffer usage tracking for leader and worker backends. In the leader backend, we don't track the buffer usage for ExecutorStart phase whereas in worker backend we track it for ExecutorStart phase as well. This leads to different value for buffer usage stats for the parallel and non-parallel query. Change the code so that worker backend also starts tracking buffer usage after ExecutorStart. Author: Amit Kapila and Robert Haas Reviewed-by: Robert Haas and Andres Freund Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info	2018-08-03 09:11:37 +05:30
Tom Lane	1c2cb2744b	Fix run-time partition pruning for appends with multiple source rels. The previous coding here supposed that if run-time partitioning applied to a particular Append/MergeAppend plan, then all child plans of that node must be members of a single partitioning hierarchy. This is totally wrong, since an Append could be formed from a UNION ALL: we could have multiple hierarchies sharing the same Append, or child plans that aren't part of any hierarchy. To fix, restructure the related plan-time and execution-time data structures so that we can have a separate list or array for each partitioning hierarchy. Also track subplans that are not part of any hierarchy, and make sure they don't get pruned. Per reports from Phil Florent and others. Back-patch to v11, since the bug originated there. David Rowley, with a lot of cosmetic adjustments by me; thanks also to Amit Langote for review. Discussion: https://postgr.es/m/HE1PR03MB17068BB27404C90B5B788BCABA7B0@HE1PR03MB1706.eurprd03.prod.outlook.com	2018-08-01 19:42:52 -04:00
Alvaro Herrera	c40489e449	Fix logical replication slot initialization This was broken in commit `9c7d06d606`, which inadvertently gave the wrong value to fast_forward in one StartupDecodingContext call. Fix by flipping the value. Add a test for the obvious error, namely trying to initialize a replication slot with an nonexistent output plugin. While at it, move the CreateDecodingContext call earlier, so that any errors are reported before sending the CopyBoth message. Author: Dave Cramer <davecramer@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CADK3HHLVkeRe1v4P02-5hj55H3_yJg3AEtpXyEY5T3wuzO2jSg@mail.gmail.com	2018-08-01 17:47:15 -04:00
Alvaro Herrera	91bc213d90	Fix unnoticed variable shadowing in previous commit Per buildfarm.	2018-08-01 17:04:57 -04:00
Alvaro Herrera	1c9bb02d8e	Fix per-tuple memory leak in partition tuple routing Some operations were being done in a longer-lived memory context, causing intra-query leaks. It's not noticeable unless you're doing a large COPY, but if you are, it eats enough memory to cause a problem. Co-authored-by: Kohei KaiGai <kaigai@heterodb.com> Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAOP8fzYtVFWZADq4c=KoTAqgDrHWfng+AnEPEZccyxqxPVbbWQ@mail.gmail.com	2018-08-01 16:29:15 -04:00
Peter Eisentraut	0d5f05cde0	Allow multi-inserts during COPY into a partitioned table CopyFrom allows multi-inserts to be used for non-partitioned tables, but this was disabled for partitioned tables. The reason for this appeared to be that the tuple may not belong to the same partition as the previous tuple did. Not allowing multi-inserts here greatly slowed down imports into partitioned tables. These could take twice as long as a copy to an equivalent non-partitioned table. It seems wise to do something about this, so this change allows the multi-inserts by flushing the so-far inserted tuples to the partition when the next tuple does not belong to the same partition, or when the buffer fills. This improves performance when the next tuple in the stream commonly belongs to the same partition as the previous tuple. In cases where the target partition changes on every tuple, using multi-inserts slightly slows the performance. To get around this we track the average size of the batches that have been inserted and adaptively enable or disable multi-inserts based on the size of the batch. Some testing was done and the regression only seems to exist when the average size of the insert batch is close to 1, so let's just enable multi-inserts when the average size is at least 1.3. More performance testing might reveal a better number for, this, but since the slowdown was only 1-2% it does not seem critical enough to spend too much time calculating it. In any case it may depend on other factors rather than just the size of the batch. Allowing multi-inserts for partitions required a bit of work around the per-tuple memory contexts as we must flush the tuples when the next tuple does not belong the same partition. In which case there is no good time to reset the per-tuple context, as we've already built the new tuple by this time. In order to work around this we maintain two per-tuple contexts and just switch between them every time the partition changes and reset the old one. This does mean that the first of each batch of tuples is not allocated in the same memory context as the others, but that does not matter since we only reset the context once the previous batch has been inserted. Author: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2018-08-01 10:23:09 +02:00
Tom Lane	f3eb76b399	Further fixes for quoted-list GUC values in pg_dump and ruleutils.c. Commits `742869946` et al turn out to be a couple bricks shy of a load. We were dumping the stored values of GUC_LIST_QUOTE variables as they appear in proconfig or setconfig catalog columns. However, although that quoting rule looks a lot like SQL-identifier double quotes, there are two critical differences: empty strings ("") are legal, and depending on which variable you're considering, values longer than NAMEDATALEN might be valid too. So the current technique fails altogether on empty-string list entries (as reported by Steven Winfield in bug #15248) and it also risks truncating file pathnames during dump/reload of GUC values that are lists of pathnames. To fix, split the stored value without any downcasing or truncation, and then emit each element as a SQL string literal. This is a tad annoying, because we now have three copies of the comma-separated-string splitting logic in varlena.c as well as a fourth one in dumputils.c. (Not to mention the randomly-different-from-those splitting logic in libpq...) I looked at unifying these, but it would be rather a mess unless we're willing to tweak the API definitions of SplitIdentifierString, SplitDirectoriesString, or both. That might be worth doing in future; but it seems pretty unsafe for a back-patched bug fix, so for now accept the duplication. Back-patch to all supported branches, as the previous fix was. Discussion: https://postgr.es/m/7585.1529435872@sss.pgh.pa.us	2018-07-31 13:00:14 -04:00
Tom Lane	6574f19127	Remove dead code left behind by `1b6801051`.	2018-07-30 19:11:02 -04:00
Alvaro Herrera	d25d45e4d9	Verify range bounds to bms_add_range when necessary Now that the bms_add_range boundary protections are gone, some alternative ones are needed in a few places. Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Discussion: https://postgr.es/m/3437ccf8-a144-55ff-1e2f-fc16b437823b@lab.ntt.co.jp	2018-07-30 18:45:39 -04:00
Alvaro Herrera	1b68010518	Change bms_add_range to be a no-op for empty ranges In commit `84940644de`, bms_add_range was added with an API to fail with an error if an empty range was specified. This seems arbitrary and unhelpful, so turn that case into a no-op instead. Callers that require further verification on the arguments or result can apply them by themselves. This fixes the bug that partition pruning throws an API error for a case involving the default partition of a default partition, as in the included test case. Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/16590.1532622503@sss.pgh.pa.us	2018-07-30 18:44:33 -04:00
Alvaro Herrera	4f10e7ea7b	Set ActiveSnapshot when logically replaying inserts Input functions for the inserted tuples may require a snapshot, when they are replayed by native logical replication. An example is a domain with a constraint using a SQL-language function, which prior to this commit failed to apply on the subscriber side. Reported-by: Mai Peng <maily.peng@webedia-group.com> Co-authored-by: Minh-Quan TRAN <qtran@itscaro.me> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/4EB4BD78-BFC3-4D04-B8DA-D53DF7160354@webedia-group.com Discussion: https://postgr.es/m/153211336163.1404.11721804383024050689@wrigleys.postgresql.org	2018-07-30 16:30:07 -04:00
Peter Eisentraut	98efa76fe3	Add ssl_library preset parameter This allows querying the SSL implementation used on the server side. It's analogous to using PQsslAttribute(conn, "library") in libpq. Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-07-30 13:46:27 +02:00
Tomas Vondra	ab87b8fedc	Mark variable used only in assertion with PG_USED_FOR_ASSERTS_ONLY Perpendicular lines always intersect, so the line_interpt_line() return value in line_closept_point() was used only in an assertion, triggering compiler warnings in non-assert builds.	2018-07-29 23:08:00 +02:00
Tomas Vondra	74294c7301	Restore handling of -0 in the C field of lines in line_construct(). Commit `a7dc63d904` inadvertedly removed this bit originally introduced by `43fe90f66a`, causing regression test failures on some platforms, due to producing {1,-1,-0} instead of {1,-1,0}.	2018-07-29 21:11:05 +02:00
Michael Paquier	9f7ba88aa4	Fix two oversights from `9ebe0572` which refactored cluster_rel The recheck option became a no-op as ClusterOption failed to set proper values for each element. There was a second code path where local options got overwritten. Both issues have been spotted by Coverity.	2018-07-29 22:00:42 +09:00
Noah Misch	e09144e6ce	Document security implications of qualified names. Commit `5770172cb0` documented secure schema usage, and that advice suffices for using unqualified names securely. Document, in typeconv-func primarily, the additional issues that arise with qualified names. Back-patch to 9.3 (all supported versions). Reviewed by Jonathan S. Katz. Discussion: https://postgr.es/m/20180721012446.GA1840594@rfd.leadboat.com	2018-07-28 20:08:01 -07:00
Tomas Vondra	6bf0bc842b	Provide separate header file for built-in float types Some data types under adt/ have separate header files, but most simple ones do not, and their public functions are defined in builtins.h. As the patches improving geometric types will require making additional functions public, this seems like a good opportunity to create a header for floats types. Commit `1acf757255` made _cmp functions public to solve NaN issues locally for GiST indexes. This patch reworks it in favour of a more widely applicable API. The API uses inline functions, as they are easier to use compared to macros, and avoid double-evaluation hazards. Author: Emre Hasegeli Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com	2018-07-29 03:30:48 +02:00
Tomas Vondra	a7dc63d904	Refactor geometric functions and operators The primary goal of this patch is to eliminate duplicate code and share code between different geometric data types more often, to prepare the ground for additional patches. Until now the code reuse was limited, probably because the simpler types (line and point) were implemented after the more complex ones. The changes are quite extensive and can be summarised as: * Eliminate SQL-level function calls. * Re-use more functions to implement others. * Unify internal function names and signatures. * Remove private functions from geo_decls.h. * Replace should-not-happen checks with assertions. * Add comments describe for various functions. * Remove some unreachable code. * Define delimiter symbols of line datatype like the other ones. * Remove the GEODEBUG macro and printf() calls. * Unify code style of a few oddly formatted lines. While the goal was to cause minimal user-visible changes, it was not possible to keep the original behavior in all cases - for example when handling NaN values, or when reusing code makes the functions return consistent results. Author: Emre Hasegeli Reviewed-by: Kyotaro Horiguchi, me Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com	2018-07-29 02:36:29 +02:00
Alexander Korotkov	d2086b08b0	Reduce path length for locking leaf B-tree pages during insertion In our B-tree implementation appropriate leaf page for new tuple insertion is acquired using _bt_search() function. This function always returns leaf page locked in shared mode. In order to obtain exclusive lock, caller have to relock the page. This commit makes _bt_search() function lock leaf page immediately in exclusive mode when needed. That removes unnecessary relock and, in turn reduces lock contention for B-tree leaf pages. Our experiments on multi-core systems showed acceleration up to 4.5 times in corner case. Discussion: https://postgr.es/m/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Yoshikazu Imai, Simon Riggs, Peter Geoghegan	2018-07-28 00:31:40 +03:00
Alvaro Herrera	8a9b72c3ea	Fix grammar in README.tuplock Author: Brad DeJong Discussion: https://postgr.es/m/CAJnrtnxrA4FqZi0Z6kGPQKMiZkWv2xxgSDQ+hv1jDrf8WCKjjw@mail.gmail.com	2018-07-27 10:56:30 -04:00
Robert Haas	3e32109049	Use key and partdesc from PartitionDispatch where possible. Instead of repeatedly fishing the data out of the relcache entry, let's use the version that we cached in the PartitionDispatch. We could alternatively rip out the PartitionDispatch fields altogether, but it doesn't make much sense to have them and not use them; before this patch, partdesc was set but altogether unused. Amit Langote and I both thought using them was a litle better than removing them, so this patch takes that approach. Discussion: http://postgr.es/m/CA+TgmobFnxcaW-Co-XO8=yhJ5pJXoNkCj6Z7jm9Mwj9FGv-D7w@mail.gmail.com	2018-07-27 09:40:52 -04:00
Amit Kapila	8ce29bb4f0	Fix the buffer release order for parallel index scans. During parallel index scans, if the current page to be read is deleted, we skip it and try to get the next page for a scan without releasing the buffer lock on the current page. To get the next page, sometimes it needs to wait for another process to complete its scan and advance it to the next page. Now, it is quite possible that the master backend has errored out before advancing the scan and issued a termination signal for all workers. The workers failed to notice the termination request during wait because the interrupts are held due to buffer lock on the previous page. This lead to all workers being stuck. The fix is to release the buffer lock on current page before trying to get the next page. We are already doing same in backward scans, but missed it for forward scans. Reported-by: Victor Yegorov Bug: 15290 Diagnosed-by: Thomas Munro and Amit Kapila Author: Amit Kapila Reviewed-by: Thomas Munro Tested-By: Thomas Munro and Victor Yegorov Backpatch-through: 10 where parallel index scans were introduced Discussion: https://postgr.es/m/153228422922.1395.1746424054206154747@wrigleys.postgresql.org	2018-07-27 10:53:00 +05:30
Tom Lane	662d12aea1	Avoid crash in eval_const_expressions if a Param's type changes. Since commit `6719b238e` it's been possible for the values of plpgsql record field variables to be exposed to the planner as Params. (Before that, plpgsql never supplied values for such variables during planning, so that the problematic code wasn't reached.) Other places that touch potentially-type-mutable Params either cope gracefully or do runtime-test-and-ereport checks that the type is what they expect. But eval_const_expressions() just had an Assert, meaning that it either failed the assertion or risked crashes due to using an incompatible value. In this case, rather than throwing an ereport immediately, we can just not perform a const-substitution in case of a mismatch. This seems important for the same reason that the Param fetch was speculative: we might not actually reach this part of the expression at runtime. Test case will follow in a separate commit. Patch by me, pursuant to bug report from Andrew Gierth. Back-patch to v11 where the previous commit appeared. Discussion: https://postgr.es/m/87wotkfju1.fsf@news-spur.riddles.org.uk	2018-07-26 16:08:45 -04:00
Andres Freund	3acc4acd9b	LLVMJIT: Release JIT context after running ExprContext shutdown callbacks. Due to inlining it previously was possible that an ExprContext's shutdown callback pointed to a JITed function. As the JIT context previously was shut down before the shutdown callbacks were called, that could lead to segfaults. Fix the ordering. Reported-By: Dmitry Dolgov Author: Andres Freund Discussion: https://postgr.es/m/CA+q6zcWO7CeAJtHBxgcHn_hj+PenM=tvG0RJ93X1uEJ86+76Ug@mail.gmail.com Backpatch: 11-, where JIT compilation was added	2018-07-25 16:31:49 -07:00
Andres Freund	bcafa263ec	LLVMJIT: Check for 'noinline' attribute in recursively inlined functions. Previously the attribute was only checked for external functions inlined, not "static" functions that had to be inlined as dependencies. This isn't really a bug, but makes debugging a bit harder. The new behaviour also makes more sense. Therefore backpatch. Author: Andres Freund Backpatch: 11-, where JIT compilation was added	2018-07-25 16:23:59 -07:00
Thomas Munro	2d30675952	Pad semaphores to avoid false sharing. In a USE_UNNAMED_SEMAPHORES build, the default on Linux and FreeBSD since commit `ecb0d20a`, we have an array of sem_t objects. This turned out to reduce performance compared to the previous default USE_SYSV_SEMAPHORES on an 8 socket system. Testing showed that the lost performance could be regained by padding the array elements so that they have their own cache lines. This matches what we do for similar hot arrays (see LWLockPadded, WALInsertLockPadded). Back-patch to 10, where unnamed semaphores were adopted as the default semaphore interface on those operating systems. Author: Thomas Munro Reviewed-by: Andres Freund Reported-by: Mithun Cy Tested-by: Mithun Cy, Tom Lane, Thomas Munro Discussion: https://postgr.es/m/CAD__OugYDM3O%2BdyZnnZSbJprSfsGFJcQ1R%3De59T3hcLmDug4_w%40mail.gmail.com	2018-07-25 11:00:29 +12:00
Andres Freund	b2bb3dc0e0	Defend against some potential spurious compiler warnings in `86eaf208e`. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f-AbCFeFU92GZZYqNOVRnPtUwczSYmR2NHCyf9uHUnNiw@mail.gmail.com	2018-07-24 10:10:22 -07:00
Michael Paquier	9ebe0572ce	Refactor cluster_rel() to handle more options This extends cluster_rel() in such a way that more options can be added in the future, which will reduce the amount of chunk code for an upcoming SKIP_LOCKED aimed for VACUUM. As VACUUM FULL is a different flavor of CLUSTER, we want to make that extensible to ease integration. This only reworks the API and its callers, without providing anything user-facing. Two options are present now: verbose mode and relation recheck when doing the cluster command work across multiple transactions. This could be used as well as a base to extend the grammar of CLUSTER later on. Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz	2018-07-24 11:37:32 +09:00
Michael Paquier	d9fadbf131	Fix calculation for WAL segment recycling and removal Commit `4b0d28de06` has removed the prior checkpoint and related facilities but has left WAL recycling based on the LSN of the prior checkpoint, which causes incorrect calculations for WAL removal and recycling for max_wal_size and min_wal_size. This commit changes things so as the base calculation point is the last checkpoint generated. Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20180723.135748.42558387.horiguchi.kyotaro@lab.ntt.co.jp Backpatch: 11-, where the prior checkpoint has been removed.	2018-07-24 10:32:56 +09:00
Thomas Munro	1bc180cd2a	Use setproctitle_fast() to update the ps status, if available. FreeBSD has introduced a faster variant of setproctitle(). Use it, where available. Author: Thomas Munro Discussion: https://postgr.es/m/CAEepm=1wKMTi81uodJ=1KbJAz5WedOg=cr8ewEXrUFeaxWEgww@mail.gmail.com	2018-07-24 13:09:22 +12:00
Andres Freund	e9a9843e13	LLVMJIT: Adapt to API changes in gdb and perf support. During the work of upstreaming my previous patches for gdb and perf support the API changed. Adapt. Normally this wouldn't necessarily be something to backpatch, but the previous API wasn't upstream, and at least the gdb support is quite useful for debugging. Author: Andres Freund Backpatch: 11, where LLVM based JIT support was added.	2018-07-22 21:13:34 -07:00
Andres Freund	a38b833a7c	LLVMJIT: Fix LLVM build for LLVM > 7. The location of LLVMAddPromoteMemoryToRegisterPass moved. Author: Andres Freund Backpatch: 11, where LLVM based JIT support was added.	2018-07-22 21:13:02 -07:00
Andres Freund	1307bc3d45	Reset context at the tail end of JITed EEOP_AGG_PLAIN_TRANS. While no negative consequences are currently known, it's clearly wrong to not reset the context in one of the branches. Reported-By: Dmitry Dolgov Author: Dmitry Dolgov Discussion: https://postgr.es/m/CAGPqQf165-=+Drw3Voim7M5EjHT1zwPF9BQRjLFQzCzYnNZEiQ@mail.gmail.com Backpatch: 11-, where JIT compilation support was added	2018-07-22 20:31:22 -07:00
Michael Paquier	e41d0a1090	Add proper errcodes to new error messages for read() failures Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the most sense here. While on the way, fix one errcode_for_file_access missing in origin.c since the code has been created, and remove one assignment of errno to 0 before calling read(), as this was around to fit with what was present before `811b6e36` where errno would not be set when not enough bytes are read. I have noticed the first one, and Tom has pinged me about the second one. Author: Michael Paquier Reported-by: Tom Lane Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us	2018-07-23 09:37:36 +09:00
Michael Paquier	56df07bb9e	Make more consistent some error messages for file-related operations Some error messages which report something about a file operation use as well context which is already provided within the path being worked on, making things rather duplicated. This creates more work for translators, and does not actually bring clarity. More could be done, however in a lot of cases the context used is actually useful, still that patch gets down things with a good cut. Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Tom Lane Discussion: https://postgr.es/m/20180718044711.GA8565@paquier.xyz	2018-07-23 09:19:12 +09:00
Andres Freund	6b4d860311	Fix JITed EEOP_AGG_INIT_TRANS, which missed some state. The JIT compiled implementation missed maintaining AggState->{current_set,curaggcontext}. That could lead to trouble because the transition value could be allocated in the wrong context. Reported-By: Rushabh Lathia Diagnosed-By: Dmitry Dolgov Author: Dmitry Dolgov, with minor changes by me Discussion: https://postgr.es/m/CAGPqQf165-=+Drw3Voim7M5EjHT1zwPF9BQRjLFQzCzYnNZEiQ@mail.gmail.com Backpatch: 11-, where JIT compilation support was added	2018-07-22 17:00:41 -07:00
Andres Freund	86eaf208ea	Hand code string to integer conversion for performance. As benchmarks show, using libc's string-to-integer conversion is pretty slow. At least part of the reason for that is that strtol[l] have to be more generic than what largely is required inside pg. This patch considerably speeds up int2/int4 input (int8 already was already using hand-rolled code). Most of the existing pg_atoi callers have been converted. But as one requires pg_atoi's custom delimiter functionality, and as it seems likely that there's external pg_atoi users, it seems sensible to just keep pg_atoi around. Author: Andres Freund Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hmjpk@alap3.anarazel.de	2018-07-22 14:58:23 -07:00
Andres Freund	3522d0eaba	Deduplicate "invalid input syntax" messages for various types. Previously a lot of the error messages referenced the type in the error message itself. That requires that the message is translated separately for each type. Note that currently a few smallint cases continue to reference the integer, rather than smallint, type. A later patch will create a separate routine for 16bit input. Author: Andres Freund Discussion: https://postgr.es/m/20180707200158.wpqkd7rjr4jxq5g7@alap3.anarazel.de	2018-07-22 14:58:01 -07:00
Michael Paquier	96cdeae07f	Add toast tables to most system catalogs It has been project policy to create toast tables only for those catalogs that might reasonably need one. Since this judgment call can change over time, just create one for every catalog, as this can be useful when creating rather-long entries in catalogs, with recent examples being in the shape of policy expressions or customly-formatted SCRAM verifiers. To prevent circular dependencies and to avoid adding complexity to VACUUM FULL logic, exclude pg_class, pg_attribute, and pg_index. Also, to prevent pg_upgrade from seeing a non-empty new cluster, exclude pg_largeobject and pg_largeobject_metadata from the set as large object data is handled as user data. Those relations have no reason to use a toast table anyway. Author: Joe Conway, John Naylor Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/84ddff04-f122-784b-b6c5-3536804495f8@joeconway.com	2018-07-20 07:43:41 +09:00
Tom Lane	2409716755	Remove undocumented restriction against duplicate partition key columns. transformPartitionSpec rejected duplicate simple partition columns (e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression columns, resulting in inconsistent behavior. Worse, cases like "PARTITION BY RANGE (x,(x))") were accepted but would then result in dump/reload failures, since the expression (x) would get simplified to a plain column later. There seems no better reason for this restriction than there was for the one against duplicate included index columns (cf commit `701fd0bbc`), so let's just remove it. Back-patch to v10 where this code was added. Report and patch by Yugo Nagata. Discussion: https://postgr.es/m/20180712165939.36b12aff.nagata@sraoss.co.jp	2018-07-19 15:41:46 -04:00
Tom Lane	028e3da294	Fix pg_get_indexdef()'s behavior for included index columns. The multi-argument form of pg_get_indexdef() failed to print anything when asked to print a single index column that is an included column rather than a key column. This seems an unintentional result of someone having tried to take a short-cut and use the attrsOnly flag for two different purposes. To fix, split said flag into two flags, attrsOnly which suppresses non-attribute info, and keysOnly which suppresses included columns. Add a test case using psql's \d command, which relies on that function. (It's mighty tempting at this point to replace pg_get_indexdef_worker's mess of boolean flag arguments with a single bitmask-of-flags argument, which would allow making the call sites much more self-documenting. But I refrained for the moment.) Discussion: https://postgr.es/m/21724.1531943735@sss.pgh.pa.us	2018-07-19 14:53:48 -04:00
Alvaro Herrera	1573995f55	Rewrite comments in replication slot advance implementation The code added by `9c7d06d606` was a bit obscure; clarify that by rewriting the comments. Lack of clarity has already caused bugs, so it's a worthy goal. Co-authored-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Michaël Paquier <michael@paquier.xyz> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com> Discussion: https://postgr.es/m/87y3fgoyrn.fsf@ars-thinkpad	2018-07-19 14:15:44 -04:00
Alexander Korotkov	309765fa1e	Fix handling of empty uncompressed posting list pages in GIN PostgreSQL 9.4 introduces posting list compression in GIN. This feature supports online upgrade, so that after pg_upgrade uncompressed posting lists are compressed on-the-fly. Underlying code appears to always expect at least one item on uncompressed posting list page. But there could be completely empty pages, because VACUUM never deletes leftmost and rightmost pages from posting trees. This commit fixes that. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com Author: Sivasubramanian Ramasubramanian, Alexander Korotkov Backpatch-through: 9.4	2018-07-19 21:04:17 +03:00
Heikki Linnakangas	99fdebaf2d	Rephrase a few comments for clarity. I was confused by what "intended to be parallel serially" meant, until Robert Haas and David G. Johnston explained it. Rephrase the comment to make it more clear, using David's suggested wording. Discussion: https://www.postgresql.org/message-id/1fec9022-41e8-e484-70ce-2179b08c2092%40iki.fi	2018-07-19 16:08:09 +03:00
Heikki Linnakangas	405cb356d6	Fix comment. This comment was copy-pasted from nodeAppend.c to nodeMergeAppend.c, but while committing `5220bb7533`, I modified wrong copy of it. Spotted by David Rowley	2018-07-19 15:39:06 +03:00
Heikki Linnakangas	5220bb7533	Expand run-time partition pruning to work with MergeAppend This expands the support for the run-time partition pruning which was added for Append in `499be013de` to also allow unneeded subnodes of a MergeAppend to be removed. Author: David Rowley Discussion: https://www.postgresql.org/message-id/CAKJS1f_F_V8D7Wu-HVdnH7zCUxhoGK8XhLLtd%3DCu85qDZzXrgg%40mail.gmail.com	2018-07-19 13:49:43 +03:00
Michael Paquier	b33ef397a1	Fix print of Path nodes when using OPTIMIZER_DEBUG GatherMergePath (introduced in 10) and CustomPath (introduced in 9.5) have gone missing. The order of the Path nodes was inconsistent with what is listed in nodes.h, so make the order consistent at the same time to ease future checks and additions. Author: Sawada Masahiko Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAD21AoBQMLoc=ohH-oocuAPsELrmk8_EsRJjOyR8FQLZkbE0wA@mail.gmail.com	2018-07-19 09:54:39 +09:00
Michael Paquier	c6598b8b05	Fix re-parameterize of MergeAppendPath Instead of MergeAppendPath, MergeAppend nodes were considered. This code is not covered by any tests now, which should be addressed at some point. This is an oversight from `f49842d`, which introduced partition-wise joins in v11, so back-patch down to that. Author: Michael Paquier Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/20180718062202.GC8565@paquier.xyz	2018-07-19 09:01:57 +09:00
Tom Lane	701fd0bbc9	Drop the rule against included index columns duplicating key columns. The initial version of the included-index-column feature stated that included columns couldn't be the same as any key column of the index. While it'd be pretty silly to do that, since the included column would be entirely redundant, we've never prohibited redundant index columns before so it's not very consistent to do so here. Moreover, the prohibition was itself badly implemented, so that it failed to reject columns that were effectively identical but not spelled quite alike, as reported by Aditya Toshniwal. (Moreover, it's not hard to imagine that for some non-btree index types, such cases would be non-silly anyhow: the index might use a lossy representation for key columns but be able to support retrieval of the original form of included columns.) Hence, let's just drop the prohibition. In passing, do some copy-editing on the documentation for the included-column feature. Yugo Nagata; documentation and test corrections by me Discussion: https://postgr.es/m/CAM9w-_mhBCys4fQNfaiQKTRrVWtoFrZ-wXmDuE9Nj5y-Y7aDKQ@mail.gmail.com	2018-07-18 14:43:03 -04:00
Tom Lane	3cb646264e	Use a ResourceOwner to track buffer pins in all cases. Historically, we've allowed auxiliary processes to take buffer pins without tracking them in a ResourceOwner. However, that creates problems for error recovery. In particular, we've seen multiple reports of assertion crashes in the startup process when it gets an error while holding a buffer pin, as for example if it gets ENOSPC during a write. In a non-assert build, the process would simply exit without releasing the pin at all. We've gotten away with that so far just because a failure exit of the startup process translates to a database crash anyhow; but any similar behavior in other aux processes could result in stuck pins and subsequent problems in vacuum. To improve this, institute a policy that we must always have a resowner backing any attempt to pin a buffer, which we can enforce just by removing the previous special-case code in resowner.c. Add infrastructure to make it easy to create a process-lifespan AuxProcessResourceOwner and clear out its contents at appropriate times. Replace existing ad-hoc resowner management in bgwriter.c and other aux processes with that. (Thus, while the startup process gains a resowner where it had none at all before, some other aux process types are replacing an ad-hoc resowner with this code.) Also use the AuxProcessResourceOwner to manage buffer pins taken during StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap process or a standalone backend rather than a true auxiliary process. In passing, remove some other ad-hoc resource owner creations that had gotten cargo-culted into various other places. As far as I can tell that was all unnecessary, and if it had been necessary it was incomplete, due to lacking any provision for clearing those resowners later. (Also worth noting in this connection is that a process that hasn't called InitBufferPoolBackend has no business accessing buffers; so there's more to do than just add the resowner if we want to touch buffers in processes not covered by this patch.) Although this fixes a very old bug, no back-patch, because there's no evidence of any significant problem in non-assert builds. Patch by me, pursuant to a report from Justin Pryzby. Thanks to Robert Haas and Kyotaro Horiguchi for reviews. Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com	2018-07-18 12:15:16 -04:00
Heikki Linnakangas	6b387179ba	Fix misc typos, mostly in comments. A collection of typos I happened to spot while reading code, as well as grepping for common mistakes. Backpatch to all supported versions, as applicable, to avoid conflicts when backporting other commits in the future.	2018-07-18 16:17:32 +03:00
Michael Paquier	94019c879a	Fix more portability issues with casts to Size when using off_t This should tame the beast, as there are no other places where off_t is used in the new error messages. Reported again by longfin, which complained about walsender.c while I spotted the other two ones while double-checking.	2018-07-18 09:51:53 +09:00
Michael Paquier	8bd064f0c7	Fix casting in error message for two-phase file This error from from `811b6e3`, which causes compilation warnings with OSX 10.3 and clang. Reported by Tom Lane, via buildfarm member longfin.	2018-07-18 09:11:34 +09:00
Michael Paquier	811b6e36a9	Rework error messages around file handling Some error messages related to file handling are using the code path context to define their state. For example, 2PC-related errors are referring to "two-phase status files", or "relation mapping file" is used for catalog-to-filenode mapping, however those prove to be difficult to translate, and are not more helpful than just referring to the path of the file being worked on. So simplify all those error messages by just referring to files with their path used. In some cases, like the manipulation of WAL segments, the context is actually helpful so those are kept. Calls to the system function read() have also been rather inconsistent with their error handling sometimes not reporting the number of bytes read, and some other code paths trying to use an errno which has not been set. The in-core functions are using a more consistent pattern with this patch, which checks for both errno if set or if an inconsistent read is happening. So as to care about pluralization when reading an unexpected number of byte(s), "could not read: read %d of %zu" is used as error message, with %d field being the output result of read() and %zu the expected size. This simplifies the work of translators with less variations of the same message. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz	2018-07-18 08:01:23 +09:00
Alvaro Herrera	1c04d4beea	Revise BuildIndexValueDescription to simplify it Getting a pg_index tuple from syscache when the open index relation is available is pointless -- just use the one from relcache. Noticed while reviewing code for `cb9db2ab06`. No backpatch.	2018-07-16 20:20:15 -04:00
Alvaro Herrera	cb9db2ab06	Fix ALTER TABLE...SET STATS error message for included columns The existing error message was complaining that the column is not an expression, which is not correct. Introduce a suitable wording variation and a test. Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp> Discussion: https://postgr.es/m/20180628182803.e4632d5a.nagata@sraoss.co.jp Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>	2018-07-16 20:00:39 -04:00
Alvaro Herrera	e353389d24	Fix partition pruning with IS [NOT] NULL clauses The original code was unable to prune partitions that could not possibly contain NULL values, when the query specified less than all columns in a multicolumn partition key. Reorder the if-tests so that it is, and add more commentary and regression tests. Reported-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com> Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: amul sul <sulamul@gmail.com> Discussion: https://postgr.es/m/CAFjFpRc7qjLUfXLVBBC_HAnx644sjTYM=qVoT3TJ840HPbsTXw@mail.gmail.com	2018-07-16 18:38:59 -04:00
Robert Haas	32df1c9afa	Add subtransaction handling for table synchronization workers. Since the old logic was completely unaware of subtransactions, a change made in a subsequently-aborted subtransaction would still cause workers to be stopped at toplevel transaction commit. Fix that by managing a stack of worker lists rather than just one. Amit Khandekar and Robert Haas Discussion: http://postgr.es/m/CAJ3gD9eaG_mWqiOTA2LfAug-VRNn1hrhf50Xi1YroxL37QkZNg@mail.gmail.com	2018-07-16 17:33:22 -04:00
Peter Eisentraut	f7cb2842bf	Add plan_cache_mode setting This allows overriding the choice of custom or generic plan. Author: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com	2018-07-16 13:35:41 +02:00
Peter Eisentraut	a06e56b247	doc: Update redirecting links Update links that resulted in redirects. Most are changes from http to https, but there are also some other minor edits. (There are still some redirects where the target URL looks less elegant than the one we currently have. I have left those as is.)	2018-07-16 10:48:05 +02:00
Tom Lane	1007b0a126	Fix hashjoin costing mistake introduced with inner_unique optimization. In final_cost_hashjoin(), commit `9c7f5229a` allowed inner_unique cases to follow a code path previously used only for SEMI/ANTI joins; but it neglected to fix an if-test within that path that assumed SEMI and ANTI were the only possible cases. This resulted in a wrong value for hashjointuples, and an ensuing bad cost estimate, for inner_unique normal joins. Fortunately, for inner_unique normal joins we can assume the number of joined tuples is the same as for a SEMI join; so there's no need for more code, we just have to invert the test to check for ANTI not SEMI. It turns out that in two contrib tests in which commit `9c7f5229a` changed the plan expected for a query, the change was actually wrong and induced by this estimation error, not by any real improvement. Hence this patch also reverts those changes. Per report from RK Korlapati. Backpatch to v10 where the error was introduced. David Rowley Discussion: https://postgr.es/m/CA+SNy03bhq0fodsfOkeWDCreNjJVjsdHwUsb7AG=jpe0PtZc_g@mail.gmail.com	2018-07-14 11:59:12 -04:00
Tom Lane	4984784f83	Fix crash in json{b}_populate_recordset() and json{b}_to_recordset(). As of commit `37a795a60`, populate_recordset_worker() tried to pass back (as rsi.setDesc) a tupdesc that it also had cached in its fn_extra. But the core executor would free the passed-back tupdesc, risking a crash if the function were called again in the same query. The safest and least invasive way to fix that is to make an extra tupdesc copy to pass back. While at it, I failed to resist the temptation to get rid of unnecessary get_fn_expr_argtype() calls here and in populate_record_worker(). Per report from Dmitry Dolgov; thanks to Michael Paquier and Andrew Gierth for investigation and discussion. Discussion: https://postgr.es/m/CA+q6zcWzN9ztCfR47ZwgTr1KLnuO6BAY6FurxXhovP4hxr+yOQ@mail.gmail.com	2018-07-13 14:16:54 -04:00
Heikki Linnakangas	42f70cd9c3	Improve performance of tuple conversion map generation Previously convert_tuples_by_name_map naively performed a search of each outdesc column starting at the first column in indesc and searched each indesc column until a match was found. When partitioned tables had many columns this could result in slow generation of the tuple conversion maps. For INSERT and UPDATE statements that touched few rows, this could mean a very large overhead indeed. We can do a bit better with this loop. It's quite likely that the columns in partitioned tables and their partitions are in the same order, so it makes sense to start searching for each column outer column at the inner column position 1 after where the previous match was found (per idea from Alexander Kuzmenkov). This makes the best case search O(N) instead of O(N^2). The worst case is still O(N^2), but it seems unlikely that would happen. Likewise, in the planner, make_inh_translation_list's search for the matching column could often end up falling back on an O(N^2) type search. This commit also improves that by first checking the column that follows the previous match, instead of the column with the same attnum. If we fail to match here we fallback on the syscache's hashtable lookup. Author: David Rowley Reviewed-by: Alexander Kuzmenkov Discussion: https://www.postgresql.org/message-id/CAKJS1f9-wijVgMdRp6_qDMEQDJJ%2BA_n%3DxzZuTmLx5Fz6cwf%2B8A%40mail.gmail.com	2018-07-13 19:54:05 +03:00
Tom Lane	130beba36d	Fix inadequate buffer locking in FSM and VM page re-initialization. When reading an existing FSM or VM page that was found to be corrupt by the buffer manager, the code applied PageInit() to reinitialize the page, but did so without any locking. There is thus a hazard that two backends might concurrently do PageInit, which in itself would still be OK, but the slower one might then zero over subsequent data changes applied by the faster one. Even that is unlikely to be fatal; but it's not desirable, so add locking to prevent it. This does not add any locking overhead in the normal code path where the page is OK. It's not immediately obvious that that's safe, but I believe it is, for reasons explained in the added comments. Problem noted by R P Asim. It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/CANXE4Te4G0TGq6cr0-TvwP0H4BNiK_-hB5gHe8mF+nz0mcYfMQ@mail.gmail.com	2018-07-13 11:53:10 -04:00
Peter Eisentraut	3884072329	Prohibit transaction commands in security definer procedures Starting and aborting transactions in security definer procedures doesn't work. StartTransaction() insists that the security context stack is empty, so this would currently cause a crash, and AbortTransaction() resets it. This could be made to work by reorganizing the code, but right now we just prohibit it. Reported-by: amul sul <sulamul@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b96Gupt_LFL7uNyy3c50-wbhA68NUjiK5%3DrF6_w%3Dpq_T%3DQ%40mail.gmail.com	2018-07-13 10:41:32 +02:00
Thomas Munro	e8d9caa436	Accept invalidation messages in InitializeSessionUserId(). If the authentication method modified the system catalogs through a separate database connection (say, to create a new role on the fly), make sure syscache sees the changes before we try to find the user. Author: Thomas Munro Reviewed-by: Tom Lane, Andres Freund Discussion: https://postgr.es/m/CAEepm%3D3_h0_cgmz5PMyab4xk_OFrg6G5VCN%3DnF4chFXM9iFOqA%40mail.gmail.com	2018-07-13 16:19:15 +12:00
Michael Paquier	ce89ad0fa0	Fix argument of pg_create_logical_replication_slot for slot name All attributes and arguments using a slot name map to the data type "name", but this function has been using "text". This is cosmetic, as even if text is used then the slot name would be truncated to 64 characters anyway and stored as such. The documentation already said so and the function already assumed that the argument was of this type when fetching its value. Bump catalog version. Author: Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoADYz_-eAqH5AVFaCaojcRgwpo9PW=u8kgTMys63oB8Cw@mail.gmail.com	2018-07-13 09:32:12 +09:00
Michael Paquier	5fc1008e8a	Clean up temporary WAL segments after an instance crash Temporary WAL segments are created in pg_wal and named as xlogtemp.pid before being renamed to the real deal when creating a new segment. If an instance crashes after the temporary segment is created and before the rename is done, then the server would finish with unremovable data. After an instance crash, scan pg_wal and remove any such segments. With repetitive unlucky crashes this would contribute to disk bloat and presents risks of ENOSPC especially with max_wal_size close to the maximum allowed. Author: Michael Paquier Reviewed-by: Yugo Nagata, Heikki Linnakangas Discussion: https://postgr.es/m/20180514054955.GF1528@paquier.xyz	2018-07-13 06:43:20 +09:00
Peter Eisentraut	5e6e2c8773	Reset shmem_exit_inprogress after shmem_exit() In `ad9a274778`, shmem_exit_inprogress was introduced. But we need to reset it after shmem_exit(), because unlike the similar proc_exit(), shmem_exit() can also be called for cleanup when the process will not exit. Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>	2018-07-12 20:22:17 +02:00
Alvaro Herrera	cd073d8f70	Fix FK checks of TRUNCATE involving partitioned tables When truncating a table that is referenced by foreign keys in partitioned tables, the check to ensure the referencing table are also truncated spuriously failed. This is because it was relying on relhastriggers as a proxy for the table having FKs, and that's wrong for partitioned tables. Fix it to consider such tables separately. There may be a better way ... but this code is pretty inefficient already. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Michael Paquiër <michael@paquier.xyz> Discussion: https://postgr.es/m/20180711000624.zmeizicibxeehhsg@alvherre.pgsql	2018-07-12 12:09:08 -04:00
Peter Eisentraut	8e599897ca	Improve two error messages	2018-07-12 12:35:59 +02:00
Amit Kapila	40ca70ebcc	Allow using the updated tuple while moving it to a different partition. An update that causes the tuple to be moved to a different partition was missing out on re-constructing the to-be-updated tuple, based on the latest tuple in the update chain. Instead, it's simply deleting the latest tuple and inserting a new tuple in the new partition based on the old tuple. Commit `2f17844104` didn't consider this case, so some of the updates were getting lost. In passing, change the argument order for output parameter in ExecDelete and add some commentary about it. Reported-by: Pavan Deolasee Author: Amit Khandekar, with minor changes by me Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera Backpatch-through: 11 Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com	2018-07-12 12:51:39 +05:30
Michael Paquier	edc6b41bd4	Rename VACOPT_NOWAIT to VACOPT_SKIP_LOCKED When it comes to SELECT ... FOR or LOCK, NOWAIT means to not wait for something to happen, and issue an error. SKIP LOCKED means to not wait for something to happen but to move on without issuing an error. The internal option of autovacuum and autoanalyze mentioned above, used only when wraparound is not involved was named NOWAIT, but behaves like SKIP LOCKED which is confusing. Author: Nathan Bossart Discussion: https://postgr.es/m/20180307050345.GA3095@paquier.xyz	2018-07-12 14:28:28 +09:00
Michael Paquier	6551f3daa2	Add assertion in expand_vacuum_rel() for non-autovacuum path The code path where the assertion is added helps to check that autovacuum always includes a relation OID when doing a vacuum on it. Extracted from a larger patch set to add support for SKIP LOCKED with manual VACUUM commands. Author: Nathan Bossart Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com	2018-07-12 13:50:17 +09:00

... 6 7 8 9 10 ...

19215 Commits