postgresql

Commit Graph

Author	SHA1	Message	Date
Andres Freund	9d906f1119	Move generic slot support functions from heaptuple.c into execTuples.c. heaptuple.c was never a particular good fit for slot_getattr(), slot_getsomeattrs() and slot_getmissingattrs(), but in upcoming changes slots will be made more abstract (allowing slots that contain different types of tuples), making it clearly the wrong place. Note that slot_deform_tuple() remains in it's current place, as it clearly deals with a HeapTuple. getmissingattrs() also remains, but it's less clear that that's correct - but execTuples.c wouldn't be the right place. Author: Ashutosh Bapat. Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-10-15 15:17:04 -07:00
Thomas Munro	e73ca79fc7	Move the replication lag tracker into heap memory. Andres Freund complained about the 128KB of .bss occupied by LagTracker. It's only needed in the walsender process, so allocate it in heap memory there. Author: Thomas Munro Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya%40alap3.anarazel.de	2018-10-16 11:04:41 +13:00
Tom Lane	d48da369ab	Check for stack overrun in standard_ProcessUtility(). ProcessUtility can recurse, and indeed can be driven to infinite recursion, so it ought to have a check_stack_depth() call. This covers the reported bug (portal trying to execute itself) and a bunch of other cases that could perhaps arise somewhere. Per bug #15428 from Malthe Borch. Back-patch to all supported branches. Discussion: https://postgr.es/m/15428-b3c2915ec470b033@postgresql.org	2018-10-15 14:01:38 -04:00
Peter Eisentraut	35584fd05f	Make spelling of "acknowledgment" consistent I used the preferred U.S. spelling, as we do in other cases.	2018-10-15 10:06:45 +02:00
Tom Lane	7d4a10e260	Use PlaceHolderVars within the quals of a FULL JOIN. This prevents failures in cases where we pull up a constant or var-free expression from a subquery and put it into a full join's qual. That can result in not recognizing the qual as containing a mergejoin-able or hashjoin-able condition. A PHV prevents the problem because it is still recognized as belonging to the side of the join the subquery is in. I'm not very sure about the net effect of this change on plan quality. In "typical" cases where the join keys are Vars, nothing changes. In an affected case, the PHV-wrapped expression is less likely to be seen as equal to PHV-less instances below the join, but more likely to be seen as equal to similar expressions above the join, so it may end up being a wash. In the one existing case where there's any visible change in a regression-test plan, it amounts to referencing a lower computation of a COALESCE result instead of recomputing it, which seems like a win. Given my uncertainty about that and the lack of field complaints, no back-patch, even though this is a very ancient problem. Discussion: https://postgr.es/m/32090.1539378124@sss.pgh.pa.us	2018-10-14 13:07:29 -04:00
Michael Paquier	1df21ddb19	Avoid duplicate XIDs at recovery when building initial snapshot On a primary, sets of XLOG_RUNNING_XACTS records are generated on a periodic basis to allow recovery to build the initial state of transactions for a hot standby. The set of transaction IDs is created by scanning all the entries in ProcArray. However it happens that its logic never counted on the fact that two-phase transactions finishing to prepare can put ProcArray in a state where there are two entries with the same transaction ID, one for the initial transaction which gets cleared when prepare finishes, and a second, dummy, entry to track that the transaction is still running after prepare finishes. This way ensures a continuous presence of the transaction so as callers of for example TransactionIdIsInProgress() are always able to see it as alive. So, if a XLOG_RUNNING_XACTS takes a standby snapshot while a two-phase transaction finishes to prepare, the record can finish with duplicated XIDs, which is a state expected by design. If this record gets applied on a standby to initial its recovery state, then it would simply fail, so the odds of facing this failure are very low in practice. It would be tempting to change the generation of XLOG_RUNNING_XACTS so as duplicates are removed on the source, but this requires to hold on ProcArrayLock for longer and this would impact all workloads, particularly those using heavily two-phase transactions. XLOG_RUNNING_XACTS is also actually used only to initialize the standby state at recovery, so instead the solution is taken to discard duplicates when applying the initial snapshot. Diagnosed-by: Konstantin Knizhnik Author: Michael Paquier Discussion: https://postgr.es/m/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru Backpatch-through: 9.3	2018-10-14 22:23:21 +09:00
Tom Lane	13cd7209f7	Simplify use of AllocSetContextCreate() wrapper macro. We can allow this macro to accept either abbreviated or non-abbreviated allocation parameters by making use of __VA_ARGS__. As noted by Andres Freund, it's unlikely that any compiler would have __builtin_constant_p but not __VA_ARGS__, so this gives up little or no error checking, and it avoids a minor but annoying API break for extensions. With this change, there is no reason for anybody to call AllocSetContextCreateExtended directly, so in HEAD I renamed it to AllocSetContextCreateInternal. It's probably too late for an ABI break like that in 11, though. Discussion: https://postgr.es/m/20181012170355.bhxi273skjt6sag4@alap3.anarazel.de	2018-10-12 14:26:56 -04:00
Alvaro Herrera	c7d43c4d8a	Correct attach/detach logic for FKs in partitions There was no code to handle foreign key constraints on partitioned tables in the case of ALTER TABLE DETACH; and if you happened to ATTACH a partition that already had an equivalent constraint, that one was ignored and a new constraint was created. Adding this to the fact that foreign key cloning reuses the constraint name on the partition instead of generating a new name (as it probably should, to cater to SQL standard rules about constraint naming within schemas), the result was a pretty poor user experience -- the most visible failure was that just detaching a partition and re-attaching it failed with an error such as ERROR: duplicate key value violates unique constraint "pg_constraint_conrelid_contypid_conname_index" DETAIL: Key (conrelid, contypid, conname)=(26702, 0, test_result_asset_id_fkey) already exists. because it would try to create an identically-named constraint in the partition. To make matters worse, if you tried to drop the constraint in the now-independent partition, that would fail because the constraint was still seen as dependent on the constraint in its former parent partitioned table: ERROR: cannot drop inherited constraint "test_result_asset_id_fkey" of relation "test_result_cbsystem_0001_0050_monthly_2018_09" This fix attacks the problem from two angles: first, when the partition is detached, the constraint is also marked as independent, so the drop now works. Second, when the partition is re-attached, we scan existing constraints searching for one matching the FK in the parent, and if one exists, we link that one to the parent constraint. So we don't end up with a duplicate -- and better yet, we don't need to scan the referenced table to verify that the constraint holds. To implement this I made a small change to previously planner-only struct ForeignKeyCacheInfo to contain the constraint OID; also relcache now maintains the list of FKs for partitioned tables too. Backpatch to 11. Reported-by: Michael Vitale (bug #15425) Discussion: https://postgr.es/m/15425-2dbc9d2aa999f816@postgresql.org	2018-10-12 12:37:37 -03:00
Andres Freund	cda6a8d01d	Remove deprecated abstime, reltime, tinterval datatypes. These types have been deprecated for a long time. Catversion bump, for obvious reasons. Author: Andres Freund Discussion: https://postgr.es/m/20181009192237.34wjp3nmw7oynmmr@alap3.anarazel.de https://postgr.es/m/20171213080506.cwjkpcz3bkk6yz2u@alap3.anarazel.de https://postgr.es/m/25615.1513115237@sss.pgh.pa.us	2018-10-11 11:59:15 -07:00
Andres Freund	86896be60e	Move timeofday() implementation out of nabstime.c. nabstime.c is about to be removed, but timeofday() isn't related to the rest of the functionality therein, and some find it useful. Move to timestamp.c. Discussion: https://postgr.es/m/20181009192237.34wjp3nmw7oynmmr@alap3.anarazel.de https://postgr.es/m/20180928223240.kgwc4czzzekrpsid%40alap3.anarazel.de	2018-10-11 11:30:59 -07:00
Andres Freund	e9edc1ba0b	Fix logical decoding error when system table w/ toast is repeatedly rewritten. Repeatedly rewriting a mapped catalog table with VACUUM FULL or CLUSTER could cause logical decoding to fail with: ERROR, "could not map filenode \"%s\" to relation OID" To trigger the problem the rewritten catalog had to have live tuples with toasted columns. The problem was triggered as during catalog table rewrites the heap_insert() check that prevents logical decoding information to be emitted for system catalogs, failed to treat the new heap's toast table as a system catalog (because the new heap is not recognized as a catalog table via RelationIsLogicallyLogged()). The relmapper, in contrast to the normal catalog contents, does not contain historical information. After a single rewrite of a mapped table the new relation is known to the relmapper, but if the table is rewritten twice before logical decoding occurs, the relfilenode cannot be mapped to a relation anymore. Which then leads us to error out. This only happens for toast tables, because the main table contents aren't re-inserted with heap_insert(). The fix is simple, add a new heap_insert() flag that prevents logical decoding information from being emitted, and accept during decoding that there might not be tuple data for toast tables. Unfortunately that does not fix pre-existing logical decoding errors. Doing so would require not throwing an error when a filenode cannot be mapped to a relation during decoding, and that seems too likely to hide bugs. If it's crucial to fix decoding for an existing slot, temporarily changing the ERROR in ReorderBufferCommit() to a WARNING appears to be the best fix. Author: Andres Freund Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de Backpatch: 9.4-, where logical decoding was introduced	2018-10-10 13:53:02 -07:00
Peter Eisentraut	f82d4d666f	Slightly correct context check for event triggers The previous check for a "complete query" omitted the new PROCESS_UTILITY_QUERY_NONATOMIC value. This didn't actually make a difference in practice, because only CALL and SET from PL/pgSQL run in this state, but it's more correct to include it anyway. Discussion: https://www.postgresql.org/message-id/4566041d-2567-74d2-d135-19ff6a20fe51%402ndquadrant.com	2018-10-10 22:41:12 +02:00
Peter Eisentraut	f8c10f616f	Turn transaction_isolation into GUC enum It was previously a string setting that was converted into an enum by custom code, but using the GUC enum facility seems much simpler and doesn't change any functionality, except that set transaction_isolation='default'; no longer works, but that was never documented and doesn't work with any other transaction characteristics. (Note that this is not the same as RESET or SET TO DEFAULT, which still work.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/457db615-e84c-4838-310e-43841eb806e5@iki.fi	2018-10-09 21:26:00 +02:00
Michael Paquier	c481016201	Add pg_ls_archive_statusdir function This function lists the contents of the WAL archive status directory, and is intended to be used by monitoring tools. Unlike pg_ls_dir(), access to it can be granted to non-superusers so that those monitoring tools can observe the principle of least privilege. Access is also given by default to members of pg_monitor. Author: Christoph Moench-Tegeder Reviewed-by: Aya Iwata Discussion: https://postgr.es/m/20180930205920.GA64534@elch.exwg.net	2018-10-09 22:29:09 +09:00
Thomas Munro	212fab9926	Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux). Originally committed as `15bc038f` (plus some follow-ups), this was reverted in `28e07270` due to a problem discovered in parallel workers. This new version corrects that problem by sending the list of uncommitted enum values to parallel workers. Here follows the original commit message describing the change: To prevent possibly breaking indexes on enum columns, we must keep uncommitted enum values from getting stored in tables, unless we can be sure that any such column is new in the current transaction. Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE from being executed at all in a transaction block, unless the target enum type had been created in the current transaction. This patch removes that restriction, and instead insists that an uncommitted enum value can't be referenced unless it belongs to an enum type created in the same transaction as the value. Per discussion, this should be a bit less onerous. It does require each function that could possibly return a new enum value to SQL operations to check this restriction, but there aren't so many of those that this seems unmaintainable. Author: Andrew Dunstan and Tom Lane, with parallel query fix by Thomas Munro Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEepm%3D0Ei7g6PaNTbcmAh9tCRahQrk%3Dr5ZWLD-jr7hXweYX3yg%40mail.gmail.com Discussion: https://postgr.es/m/4075.1459088427%40sss.pgh.pa.us	2018-10-09 12:51:01 +13:00
Tom Lane	82ff0cc91d	Advance transaction timestamp for intra-procedure transactions. Per discussion, this behavior seems less astonishing than not doing so. Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/20180920234040.GC29981@momjian.us	2018-10-08 16:16:36 -04:00
Tom Lane	6eb3eb577d	Improve snprintf.c's handling of NaN, Infinity, and minus zero. Up to now, float4out/float8out handled NaN and Infinity cases explicitly, and invoked psprintf only for ordinary float values. This was done because platform implementations of snprintf produce varying representations of these special cases. But now that we use snprintf.c always, it's better to give it the responsibility to produce a uniform representation of these cases, so that we have uniformity across the board not only in float4out/float8out. Hence, move that work into fmtfloat(). Also, teach fmtfloat() to recognize IEEE minus zero and handle it correctly. The previous coding worked only accidentally, and would fail for e.g. "%+f" format (it'd print "+-0.00000"). Now that we're using snprintf.c everywhere, it's not acceptable for it to do weird things in corner cases. (This incidentally avoids a portability problem we've seen on some really ancient platforms, that native sprintf does the wrong thing with minus zero.) Also, introduce a new entry point in snprintf.c to allow float[48]out to bypass the work of interpreting a well-known format spec, as well as bypassing the overhead of the psprintf layer. I modeled this API loosely on strfromd(). In my testing, this brings float[48]out back to approximately the same speed they had when using native snprintf, fixing one of the main performance issues caused by using snprintf.c. (There is some talk of more aggressive work to improve the speed of floating-point output conversion, but these changes seem to provide a better starting point for such work anyway.) Getting rid of the previous ad-hoc hack for Infinity/NaN in fmtfloat() allows removing <ctype.h> from snprintf.c's #includes. I also removed a few other #includes that I think are historical, though the buildfarm may expose that as wrong. Discussion: https://postgr.es/m/13178.1538794717@sss.pgh.pa.us	2018-10-08 12:19:20 -04:00
Tom Lane	f9eb7c14b0	Avoid O(N^2) cost in ExecFindRowMark(). If there are many ExecRowMark structs, we spent O(N^2) time in ExecFindRowMark during executor startup. Once upon a time this was not of great concern, but the addition of native partitioning has squeezed out enough other costs that this can become the dominant overhead in some use-cases for tables with many partitions. To fix, simply replace that List data structure with an array. This adds a little bit of cost to execCurrentOf(), but not much, and anyway that code path is neither of large importance nor very efficient now. If we ever decide it is a bottleneck, constructing a hash table for lookup-by-tableoid would likely be the thing to do. Per complaint from Amit Langote, though this is different from his fix proposal. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-08 10:41:34 -04:00
Alvaro Herrera	eee01d606e	Silence compiler warning in Assert() gcc 6.3 does not whine about this mistake I made in `39808e8868` but evidently lots of other compilers do, according to Michael Paquier, Peter Eisentraut, Arthur Zakirov, Tomas Vondra. Discussion: too many to list	2018-10-08 10:37:21 -03:00
Peter Eisentraut	634b4b79cb	Track procedure calls in pg_stat_user_functions This was forgotten when procedures were implemented. Reported-by: Lukas Fittl <lukas@fittl.com>	2018-10-08 11:22:53 +02:00
Michael Paquier	9c2a970d1f	Improve two error messages related to foreign keys on partitioned tables Error messages for creating a foreign key on a partitioned table using ONLY or NOT VALID were wrong in mentioning the objects they worked on. This commit adds on the way some regression tests missing for those cases. Author: Laurenz Albe Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/c11c05810a9ed65e9b2c817a9ef442275a32fe80.camel@cybertec.at	2018-10-08 17:56:13 +09:00
Tom Lane	52ed730d51	Remove some unnecessary fields from Plan trees. In the wake of commit `f2343653f`, we no longer need some fields that were used before to control executor lock acquisitions: * PlannedStmt.nonleafResultRelations can go away entirely. * partitioned_rels can go away from Append, MergeAppend, and ModifyTable. However, ModifyTable still needs to know the RT index of the partition root table if any, which was formerly kept in the first entry of that list. Add a new field "rootRelation" to remember that. rootRelation is partly redundant with nominalRelation, in that if it's set it will have the same value as nominalRelation. However, the latter field has a different purpose so it seems best to keep them distinct. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-07 14:33:17 -04:00
Alvaro Herrera	39808e8868	Fix catalog insertion order for ATTACH PARTITION Commit `2fbdf1b38b` changed the order in which we inserted catalog rows when creating partitions, so that we could remove an unsightly hack required for untimely relcache invalidations. However, that commit only changed the ordering for CREATE TABLE PARTITION OF, and left ALTER TABLE ATTACH PARTITION unchanged, so the latter can be affected when catalog invalidations occur, for instance when the partition key involves an SQL function. Reported-by: Rajkumar Raghuwanshi Author: Amit Langote Reviewed-by: Michaël Paquier Discussion: https://postgr.es/m/CAKcux6=nTz9KSfTr_6Z2mpzLJ_09JN-rK6=dWic6gGyTSWueyQ@mail.gmail.com	2018-10-06 22:13:19 -03:00
Alvaro Herrera	ad08006ba0	Fix event triggers for partitioned tables Index DDL cascading on partitioned tables introduced a way for ALTER TABLE to be called reentrantly. This caused an an important deficiency in event trigger support to be exposed: on exiting the reentrant call, the alter table state object was clobbered, causing a crash when the outer alter table tries to finalize its processing. Fix the crash by creating a stack of event trigger state objects. There are still ways to cause things to misbehave (and probably other crashers) with more elaborate tricks, but at least it now doesn't crash in the obvious scenario. Backpatch to 9.5, where DDL deparsing of event triggers was introduced. Reported-by: Marco Slot Authors: Michaël Paquier, Álvaro Herrera Discussion: https://postgr.es/m/CANNhMLCpi+HQ7M36uPfGbJZEQLyTy7XvX=5EFkpR-b1bo0uJew@mail.gmail.com	2018-10-06 19:17:46 -03:00
Tom Lane	29ef2b310d	Restore sane locking behavior during parallel query. Commit `9a3cebeaa` changed things so that parallel workers didn't obtain any lock of their own on tables they access. That was clearly a bad idea, but I'd mistakenly supposed that it was the intended end result of the series of patches for simplifying the executor's lock management. Undo that change in relation_open(), and adjust ExecOpenScanRelation() so that it gets the correct lock if inside a parallel worker. In passing, clean up some more obsolete comments about when locks are acquired. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-06 15:49:37 -04:00
Tom Lane	f2343653f5	Remove more redundant relation locking during executor startup. We already have appropriate locks on every relation listed in the query's rangetable before we reach the executor. Take the next step in exploiting that knowledge by removing code that worries about taking locks on non-leaf result relations in a partitioned table. In particular, get rid of ExecLockNonLeafAppendTables and a stanza in InitPlan that asserts we already have locks on certain such tables. In passing, clean up some now-obsolete comments in InitPlan. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-06 15:12:51 -04:00
Tom Lane	0209f0285d	Don't use is_infinite() where isinf() will do. Places that aren't testing for sign should not use the more expensive function; it's just wasteful, not to mention being a cognitive load for readers who may know what isinf() is but not is_infinite(). As things stand, we actually don't need is_infinite() anyplace except float4out/float8out, which means it could potentially go away altogether after the changes I proposed in <13178.1538794717@sss.pgh.pa.us>.	2018-10-06 13:18:38 -04:00
Tom Lane	07ee62ce9e	Propagate xactStartTimestamp and stmtStartTimestamp to parallel workers. Previously, a worker process would establish values for these based on its own start time. In v10 and up, this can trivially be shown to cause misbehavior of transaction_timestamp(), timestamp_in(), and related functions which are (perhaps unwisely?) marked parallel-safe. It seems likely that other behaviors might diverge from what happens in the parent as well. It's not as trivial to demonstrate problems in 9.6 or 9.5, but I'm sure it's still possible, so back-patch to all branches containing parallel worker infrastructure. In HEAD only, mark now() and statement_timestamp() as parallel-safe (other affected functions already were). While in theory we could still squeeze that change into v11, it doesn't seem important enough to force a last-minute catversion bump. Konstantin Knizhnik, whacked around a bit by me Discussion: https://postgr.es/m/6406dbd2-5d37-4cb6-6eb2-9c44172c7e7c@postgrespro.ru	2018-10-06 12:00:09 -04:00
Dean Rasheed	e954a727f0	Improve the accuracy of floating point statistical aggregates. When computing statistical aggregates like variance, the common schoolbook algorithm which computes the sum of the squares of the values and subtracts the square of the mean can lead to a large loss of precision when using floating point arithmetic, because the difference between the two terms is often very small relative to the terms themselves. To avoid this, re-work these aggregates to use the Youngs-Cramer algorithm, which is a proven, numerically stable algorithm that directly aggregates the sum of the squares of the differences of the values from the mean in a single pass over the data. While at it, improve the test coverage to test the aggregate combine functions used during parallel aggregation. Per report and suggested algorithm from Erich Schubert. Patch by me, reviewed by Madeleine Thompson. Discussion: https://postgr.es/m/153313051300.1397.9594490737341194671@wrigleys.postgresql.org	2018-10-06 11:20:09 +01:00
Michael Paquier	38921d1416	Assign constraint name when cloning FK definition for partitions This is for example used when attaching a partition to a partitioned table which includes foreign keys, and in this case the constraint name has been missing in the data cloned. This could lead to hard crashes, as when validating the foreign key constraint, the constraint name is always expected. Particularly, when using log_min_messages >= DEBUG1, a log message would be generated with this unassigned constraint name, leading to an assertion failure on HEAD. While on it, rename a variable in ATExecAttachPartition which was declared twice with the same name. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20181005042236.GG1629@paquier.xyz Backpatch-through: 11	2018-10-06 14:59:36 +09:00
Tom Lane	c87cb5f7a6	Allow btree comparison functions to return INT_MIN. Historically we forbade datatype-specific comparison functions from returning INT_MIN, so that it would be safe to invert the sort order just by negating the comparison result. However, this was never really safe for comparison functions that directly return the result of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction on those library functions. Buildfarm results show that at least on recent Linux on s390x, memcmp() actually does return INT_MIN sometimes, causing sort failures. The agreed-on answer is to remove this restriction and fix relevant call sites to not make such an assumption; code such as "res = -res" should be replaced by "INVERT_COMPARE_RESULT(res)". The same is needed in a few places that just directly negated the result of memcmp or strcmp. To help find places having this problem, I've also added a compile option to nbtcompare.c that causes some of the commonly used comparators to return INT_MIN/INT_MAX instead of their usual -1/+1. It'd likely be a good idea to have at least one buildfarm member running with "-DSTRESS_SORT_INT_MIN". That's far from a complete test of course, but it should help to prevent fresh introductions of such bugs. This is a longstanding portability hazard, so back-patch to all supported branches. Discussion: https://postgr.es/m/20180928185215.ffoq2xrq5d3pafna@alap3.anarazel.de	2018-10-05 16:01:29 -04:00
Michael Paquier	9cd92d1a33	Add pg_ls_tmpdir function This lists the contents of a temporary directory associated to a given tablespace, useful to get information about on-disk consumption caused by temporary files used by a session query. By default, pg_default is scanned, and a tablespace can be specified as argument. This function is intended to be used by monitoring tools, and, unlike pg_ls_dir(), access to them can be granted to non-superusers so that those monitoring tools can observe the principle of least privilege. Access is also given by default to members of pg_monitor. Author: Nathan Bossart Reviewed-by: Laurenz Albe Discussion: https://postgr.es/m/92F458A2-6459-44B8-A7F2-2ADD3225046A@amazon.com	2018-10-05 09:21:48 +09:00
Tom Lane	d73f4c74dd	In the executor, use an array of pointers to access the rangetable. Instead of doing a lot of list_nth() accesses to es_range_table, create a flattened pointer array during executor startup and index into that to get at individual RangeTblEntrys. This eliminates one source of O(N^2) behavior with lots of partitions. (I'm not exactly convinced that it's the most important source, but it's an easy one to fix.) Amit Langote and David Rowley Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 15:48:17 -04:00
Tom Lane	9ddef36278	Centralize executor's opening/closing of Relations for rangetable entries. Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan. This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.) To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID. Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 14:03:42 -04:00
Alvaro Herrera	fb9e93a2c5	Fix duplicate primary keys in partitions When using the CREATE TABLE .. PARTITION OF syntax, it's possible to cause a partition to get two primary keys if the parent already has one. Tighten the check to disallow that. Reported-by: Rajkumar Raghuwanshi Author: Amul Sul Discussion: https://postgr.es/m/CAKcux6=OnSV3-qd8Gb6W=KPPwcCz6Fe_O_MQYjTa24__Xn8XxA@mail.gmail.com	2018-10-04 11:40:36 -03:00
Michael Paquier	09921f397b	Refactor user-facing SQL functions signalling backends This moves the system administration functions for signalling backends from backend/utils/adt/misc.c into a separate file dedicated to backend signalling. No new functionality is introduced in this commit. Author: Daniel Gustafsson Reviewed-by: Michael Paquier, Álvaro Herrera Discussion: https://postgr.es/m/C2C7C3EC-CC5F-44B6-9C78-637C88BD7D14@yesql.se	2018-10-04 18:27:25 +09:00
Michael Paquier	803b1301e8	Add option SKIP_LOCKED to VACUUM and ANALYZE When specified, this option allows VACUUM to skip the work on a relation if there is a conflicting lock on it when trying to open it at the beginning of its processing. Similarly to autovacuum, this comes with a couple of limitations while the relation is processed which can cause the process to still block: - when opening the relation indexes. - when acquiring row samples for table inheritance trees, partition trees or certain types of foreign tables, and that a lock is taken on some leaves of such trees. Author: Nathan Bossart Reviewed-by: Michael Paquier, Andres Freund, Masahiko Sawada Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com Discussion: https://postgr.es/m/20171201160907.27110.74730@wrigleys.postgresql.org	2018-10-04 09:00:33 +09:00
Tom Lane	9a3cebeaa7	Change executor to just Assert that table locks were already obtained. Instead of locking tables during executor startup, just Assert that suitable locks were obtained already during the parse/plan pipeline (or re-obtained by the plan cache). This must be so, else we have a hazard that concurrent DDL has invalidated the plan. This is pretty inefficient as well as undercommented, but it's all going to go away shortly, so I didn't try hard. This commit is just another attempt to use the buildfarm to see if we've missed anything in the plan to simplify the executor's table management. Note that the change needed here in relation_open() exposes that parallel workers now really are accessing tables without holding any lock of their own, whereas they were not doing that before this commit. This does not give me a warm fuzzy feeling about that aspect of parallel query; it does not seem like a good design, and we now know that it's had exactly no actual testing. I think that we should modify parallel query so that that change can be reverted. Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-03 16:05:12 -04:00
Andres Freund	c03c1449c0	Fix issues around EXPLAIN with JIT. I (Andres) was more than a bit hasty in committing `33001fd7a7` after last minute changes, leading to a number of problems (jit output was only shown for JIT in parallel workers, and just EXPLAIN without ANALYZE didn't work). Lukas luckily found these issues quickly. Instead of combining instrumentation in in standard_ExecutorEnd(), do so on demand in the new ExplainPrintJITSummary(). Also update a documentation example of the JIT output, changed in `52050ad8eb`. Author: Lukas Fittl, with minor changes by me Discussion: https://postgr.es/m/CAP53PkxmgJht69pabxBXJBM+0oc6kf3KHMborLP7H2ouJ0CCtQ@mail.gmail.com Backpatch: 11, where JIT compilation was introduced	2018-10-03 12:48:37 -07:00
Amit Kapila	9bc9f72b28	MAXALIGN the target address where we store flattened value. The API (EOH_flatten_into) that flattens the expanded value representation expects the target address to be maxaligned. All it's usage adhere to that principle except when serializing datums for parallel query. Fix that usage. Diagnosed-by: Tom Lane Author: Tom Lane and Amit Kapila Backpatch-through: 9.6 Discussion: https://postgr.es/m/11629.1536550032@sss.pgh.pa.us	2018-10-03 09:15:03 +05:30
Tom Lane	6e35939feb	Change rewriter/planner/executor/plancache to depend on RTE rellockmode. Instead of recomputing the required lock levels in all these places, just use what commit `fdba460a2` made the parser store in the RTE fields. This already simplifies the code measurably in these places, and follow-on changes will remove a bunch of no-longer-needed infrastructure. In a few cases, this change causes us to acquire a higher lock level than we did before. This is OK primarily because said higher lock level should've been acquired already at query parse time; thus, we're saving a useless extra trip through the shared lock manager to acquire a lesser lock alongside the original lock. The only known exception to this is that re-execution of a previously planned SELECT FOR UPDATE/SHARE query, for a table that uses ROW_MARK_REFERENCE or ROW_MARK_COPY methods, might have gotten only AccessShareLock before. Now it will get RowShareLock like the first execution did, which seems fine. While there's more to do, push it in this state anyway, to let the buildfarm help verify that nothing bad happened. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-02 14:43:09 -04:00
Andres Freund	cc2905e963	Use slots more widely in tuple mapping code and make naming more consistent. It's inefficient to use a single slot for mapping between tuple descriptors for multiple tuples, as previously done when using ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors change for every tuple. Previously we also, via ConvertPartitionTupleSlot(), built new tuples after the mapping even in cases where we, immediately afterwards, access individual columns again. Refactor the code so one slot, on demand, is used for each partition. That avoids having to change the descriptor (and allows to use the more efficient "fixed" tuple slots). Then use slot->slot mapping, to avoid unnecessarily forming a tuple. As the naming between the tuple and slot mapping functions wasn't consistent, rename them to execute_attr_map_{tuple,slot}. It's likely that we'll also rename convert_tuples_by_* to denote that these functions "only" build a map, but that's left for later. Author: Amit Khandekar and Amit Langote, editorialized by me Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund Discussion: https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp Backpatch: -	2018-10-02 11:14:26 -07:00
Tom Lane	3d0f68dd30	Fix corner-case failures in has_foo_privilege() family of functions. The variants of these functions that take numeric inputs (OIDs or column numbers) are supposed to return NULL rather than failing on bad input; this rule reduces problems with snapshot skew when queries apply the functions to all rows of a catalog. has_column_privilege() had careless handling of the case where the table OID didn't exist. You might get something like this: select has_column_privilege(9999,'nosuchcol','select'); ERROR: column "nosuchcol" of relation "(null)" does not exist or you might get a crash, depending on the platform's printf's response to a null string pointer. In addition, while applying the column-number variant to a dropped column returned NULL as desired, applying the column-name variant did not: select has_column_privilege('mytable','........pg.dropped.2........','select'); ERROR: column "........pg.dropped.2........" of relation "mytable" does not exist It seems better to make this case return NULL as well. Also, the OID-accepting variants of has_foreign_data_wrapper_privilege, has_server_privilege, and has_tablespace_privilege didn't follow the principle of returning NULL for nonexistent OIDs. Superusers got TRUE, everybody else got an error. Per investigation of Jaime Casanova's report of a new crash in HEAD. These behaviors have been like this for a long time, so back-patch to all supported branches. Patch by me; thanks to Stephen Frost for discussion and review Discussion: https://postgr.es/m/CAJGNTeP=-6Gyqq5TN9OvYEydi7Fv1oGyYj650LGTnW44oAzYCg@mail.gmail.com	2018-10-02 11:54:12 -04:00
Michael Paquier	e3a25ab9ea	Refactor relation opening for VACUUM and ANALYZE VACUUM and ANALYZE share similar logic when it comes to opening a relation to work on in terms of how the relation is opened, in which order locks are tried and how logs should be generated when something does not work as expected. This commit refactors things so as both use the same code path to handle the way a relation is opened, so as the integration of new options becomes easier. Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/20180927075152.GT1659@paquier.xyz	2018-10-02 08:53:38 +09:00
Peter Eisentraut	cf3dfea45b	Change PROCEDURE to FUNCTION in CREATE EVENT TRIGGER syntax This was claimed to have been done in `0a63f996e0`, but that actually only changed the documentation and not the grammar. (That commit did fully change it for CREATE TRIGGER.)	2018-10-01 23:02:55 +02:00
Tom Lane	b04aeb0a05	Add assertions that we hold some relevant lock during relation open. Opening a relation with no lock at all is unsafe; there's no guarantee that we'll see a consistent state of the relevant catalog entries. While use of MVCC scans to read the catalogs partially addresses that complaint, it's still possible to switch to a new catalog snapshot partway through loading the relcache entry. Moreover, whether or not you trust the reasoning behind sometimes using less than AccessExclusiveLock for ALTER TABLE, that reasoning is certainly not valid if concurrent users of the table don't hold a lock corresponding to the operation they want to perform. Hence, add some assertion-build-only checks that require any caller of relation_open(x, NoLock) to hold at least AccessShareLock. This isn't a full solution, since we can't verify that the lock level is semantically appropriate for the action --- but it's definitely of some use, because it's already caught two bugs. We can also assert that callers of addRangeTableEntryForRelation() hold at least the lock level specified for the new RTE. Amit Langote and Tom Lane Discussion: https://postgr.es/m/16565.1538327894@sss.pgh.pa.us	2018-10-01 12:43:21 -04:00
Tom Lane	e27453bd83	Fix ALTER COLUMN TYPE to not open a relation without any lock. If the column being modified is referenced by a foreign key constraint of another table, ALTER TABLE would open the other table (to re-parse the constraint's definition) without having first obtained a lock on it. This was evidently intentional, but that doesn't mean it's really safe. It's especially not safe in 9.3, which pre-dates use of MVCC scans for catalog reads, but even in current releases it doesn't seem like a good idea. We know we'll need AccessExclusiveLock shortly to drop the obsoleted constraint, so just get that a little sooner to close the hole. Per testing with a patch that complains if we open a relation without holding any lock on it. I don't plan to back-patch that patch, but we should close the holes it identifies in all supported branches. Discussion: https://postgr.es/m/2038.1538335244@sss.pgh.pa.us	2018-10-01 11:39:13 -04:00
Tom Lane	fdba460a26	Create an RTE field to record the query's lock mode for each relation. Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-09-30 13:55:51 -04:00
Stephen Frost	8bddc86400	Add application_name to connection authorized msg The connection authorized message has quite a bit of useful information in it, but didn't include the application_name (when provided), so let's add that as it can be very useful. Note that at the point where we're emitting the connection authorized message, we haven't processed GUCs, so it's not possible to get this by using log_line_prefix (which pulls from the GUC). There's also something to be said for having this included in the connection authorized message and then not needing to repeat it for every line, as having it in log_line_prefix would do. The GUC cleans the application name to pure-ascii, so do that here too, but pull out the logic for cleaning up a string into its own function in common and re-use it from those places, and check_cluster_name which was doing the same thing. Author: Don Seiler <don@seiler.us> Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com	2018-09-28 19:04:50 -04:00
Tom Lane	2b04dfc472	Improve error reporting for unsupported effective_io_concurrency setting. Give a specific error complaining about lack of posix_fadvise() when someone tries to set effective_io_concurrency > 0 on platforms without that. This probably isn't worth extensive back-patching, but I (tgl) felt cramming it into v11 was reasonable. James Robinson Discussion: https://postgr.es/m/153771876450.14994.560017943128223619@wrigleys.postgresql.org Discussion: https://postgr.es/m/A3942987-5BC7-4F05-B54D-2A0EC2914B33@jlr-photo.com	2018-09-28 16:12:13 -04:00

1 2 3 4 5 ...

18739 Commits