postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	3cac0ec859	Reorder steps in ConditionVariablePrepareToSleep for more safety. In the admittedly-very-unlikely case that AddWaitEventToSet fails, ConditionVariablePrepareToSleep would error out after already having set cv_sleep_target, which is probably bad, and after having already set cv_wait_event_set, which is very bad. Transaction abort might or might not clean up cv_sleep_target properly; but there is nothing that would be aware that the WaitEventSet wasn't fully constructed, so that all future condition variable sleeps would be broken. We can easily guard against these hazards with slight restructuring. Back-patch to v10 where condition_variable.c was introduced. Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com	2018-01-05 19:42:49 -05:00
Tom Lane	aced5a92bf	Rewrite ConditionVariableBroadcast() to avoid live-lock. The original implementation of ConditionVariableBroadcast was, per its self-description, "the dumbest way possible". Thomas Munro found out it was a bit too dumb. An awakened process may immediately re-queue itself, if the specific condition it's waiting for is not yet satisfied. If this happens before ConditionVariableBroadcast is able to see the wait queue as empty, then ConditionVariableBroadcast will re-awaken the same process, repeating the cycle. Given unlucky timing this back-and-forth can repeat indefinitely; loops lasting thousands of seconds have been seen in testing. To fix, add our own process to the end of the wait queue to serve as a sentinel, and exit the broadcast loop once our process is not there anymore. There are various special considerations described in the comments, the principal disadvantage being that wakers can no longer be sure whether they awakened a real waiter or just a sentinel. But in practice nobody pays attention to the result of ConditionVariableSignal or ConditionVariableBroadcast anyway, so that problem seems hypothetical. Back-patch to v10 where condition_variable.c was introduced. Tom Lane and Thomas Munro Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com	2018-01-05 19:21:30 -05:00
Robert Haas	19c47e7c82	Factor error generation out of ExecPartitionCheck. At present, we always raise an ERROR if the partition constraint is violated, but a pending patch for UPDATE tuple routing will consider instead moving the tuple to the correct partition. Refactor to make that simpler. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me. Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com	2018-01-05 15:22:33 -05:00
Alvaro Herrera	df9f682c7b	Fix failure to delete spill files of aborted transactions Logical decoding's reorderbuffer.c may spill transaction files to disk when transactions are large. These are supposed to be removed when they become "too old" by xid; but file removal requires the boundary LSNs of the transaction to be known. The final_lsn is only set when we see the commit or abort record for the transaction, but nothing sets the value for transactions that crash, so the removal code misbehaves -- in assertion-enabled builds, it crashes by a failed assertion. To fix, modify the final_lsn of transactions that don't have a value set, to the LSN of the very latest change in the transaction. This causes the spilled files to be removed appropriately. Author: Atsushi Torikoshi Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp	2018-01-05 12:17:10 -03:00
Peter Eisentraut	054e8c6cdb	Another attempt at fixing build with various OpenSSL versions It seems we can't easily work around the lack of X509_get_signature_nid(), so revert the previous attempts and just disable the tls-server-end-point feature if we don't have it.	2018-01-04 19:09:27 -05:00
Peter Eisentraut	1834c1e432	Add missing includes <openssl/x509.h> is necessary to look into the X509 struct, used by `ac3ff8b1d8`.	2018-01-04 17:56:09 -05:00
Robert Haas	ef6087ee5f	Minor preparatory refactoring for UPDATE row movement. Generalize is_partition_attr to has_partition_attrs and make it accessible from outside tablecmds.c. Change map_partition_varattnos to clarify that it can be used for mapping between any two relations in a partitioning hierarchy, not just parent -> child. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me. Some comment changes by me. Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com	2018-01-04 16:25:49 -05:00
Peter Eisentraut	ac3ff8b1d8	Fix build with older OpenSSL versions Apparently, X509_get_signature_nid() is only in fairly new OpenSSL versions, so use the lower-level interface it is built on instead.	2018-01-04 16:22:06 -05:00
Robert Haas	cc6337d2fe	Simplify and encapsulate tuple routing support code. Instead of having ExecSetupPartitionTupleRouting return multiple out parameters, have it return a pointer to a structure containing all of those different things. Also, provide and use a cleanup function, ExecCleanupTupleRouting, instead of cleaning up all of the resources allocated by ExecSetupPartitionTupleRouting individually. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com	2018-01-04 15:48:15 -05:00
Peter Eisentraut	d3fb72ea6d	Implement channel binding tls-server-end-point for SCRAM This adds a second standard channel binding type for SCRAM. It is mainly intended for third-party clients that cannot implement tls-unique, for example JDBC. Author: Michael Paquier <michael.paquier@gmail.com>	2018-01-04 15:29:50 -05:00
Peter Eisentraut	f3049a603a	Refactor channel binding code to fetch cbind_data only when necessary As things stand now, channel binding data is fetched from OpenSSL and saved into the SCRAM exchange context for any SSL connection attempted for a SCRAM authentication, resulting in data fetched but not used if no channel binding is used or if a different channel binding type is used than what the data is here for. Refactor the code in such a way that binding data is fetched from the SSL stack only when a specific channel binding is used for both the frontend and the backend. In order to achieve that, save the libpq connection context directly in the SCRAM exchange state, and add a dependency to SSL in the low-level SCRAM routines. This makes the interface in charge of initializing the SCRAM context cleaner as all its data comes from either PGconn* (for frontend) or Port* (for the backend). Author: Michael Paquier <michael.paquier@gmail.com>	2018-01-04 13:55:12 -05:00
Peter Eisentraut	3ad2afc2e9	Define LDAPS_PORT if it's missing and disable implicit LDAPS on Windows Some versions of Windows don't define LDAPS_PORT. Also, Windows' ldap_sslinit() is documented to use LDAPS even if you said secure=0 when the port number happens to be 636 or 3269. Let's avoid using the port number to imply that you want LDAPS, so that connection strings have the same meaning on Windows and Unix. Author: Thomas Munro Discussion: https://postgr.es/m/CAEepm%3D23B7GV4AUz3MYH1TKpTv030VHxD2Sn%2BLYWDv8d-qWxww%40mail.gmail.com	2018-01-04 10:34:41 -05:00
Robert Haas	c759395617	Code review for Parallel Append. - Remove unnecessary #include mistakenly added in execnodes.h. - Fix mistake in comment in choose_next_subplan_for_leader. - Adjust row estimates in cost_append for a possibly-different parallel divisor. - Clamp row estimates in cost_append after operations that may not produce integers. Amit Kapila, with cosmetic adjustments by me. Discussion: http://postgr.es/m/CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com	2018-01-04 07:56:09 -05:00
Tom Lane	47c6772eb7	Clean up tupdesc.c for recent changes. TupleDescCopy needs to have the same effects as CreateTupleDescCopy in that, since it doesn't copy constraints, it should clear the per-attribute fields associated with them. Oversight in commit `cc5f81366`. Since TupleDescCopy has already established the presumption that it can just flat-copy the entire attribute array in one go, propagate that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr. (I'm suspicious that this would lead to valgrind complaints if we had any trailing padding in the struct, but we do not, and anyway fixing that seems like a job for a separate commit.) Add some better comments. Thomas Munro, reviewed by Vik Fearing, some additional hacking by me Discussion: https://postgr.es/m/CAEepm=0NvOGZ8B6GbQyQe2C_c2m3LKJ9w=8OMBaYRLgZ_Gw6Nw@mail.gmail.com	2018-01-03 17:53:41 -05:00
Alvaro Herrera	bab2969867	Fix typo Author: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no	2018-01-03 19:12:06 -03:00
Alvaro Herrera	3c27944fb2	Make XactLockTableWait work for transactions that are not yet self-locked XactLockTableWait assumed that its xid argument has already added itself to the lock table. That assumption led to another assumption that if locking the xid has succeeded but the xid is reported as still in progress, then the input xid must have been a subtransaction. These assumptions hold true for the original uses of this code in locking related to on-disk tuples, but they break down in logical replication slot snapshot building -- in particular, when a standby snapshot logged contains an xid that's already in ProcArray but not yet in the lock table. This leads to assertion failures that can be reproduced all the way back to 9.4, when logical decoding was introduced. To fix, change SubTransGetParent to SubTransGetTopmostTransaction which has a slightly different API: it returns the argument Xid if there is no parent, and it goes all the way to the top instead of moving up the levels one by one. Also, to avoid busy-waiting, add a 1ms sleep to give the other process time to register itself in the lock table. For consistency, change ConditionalXactLockTableWait the same way. Author: Petr Jelínek Discussion: https://postgr.es/m/1B3E32D8-FCF4-40B4-AEF9-5C0E3AC57969@postgrespro.ru Reported-by: Konstantin Knizhnik Diagnosed-by: Stas Kelvich, Petr Jelínek Reviewed-by: Andres Freund, Robert Haas	2018-01-03 17:26:20 -03:00
Tom Lane	6fcde24063	Fix some minor errors in new PHJ code. Correct ExecParallelHashTuplePrealloc's estimate of whether the space_allowed limit is exceeded. Be more consistent about tuples that are exactly HASH_CHUNK_THRESHOLD in size (they're "small", not "large"). Neither of these things explain the current buildfarm unhappiness, but they're still bugs. Thomas Munro, per gripe by me Discussion: https://postgr.es/m/CAEepm=34PDuR69kfYVhmZPgMdy8pSA-MYbpesEN1SR+2oj3Y+w@mail.gmail.com	2018-01-03 12:53:49 -05:00
Tom Lane	3decd150a2	Teach eval_const_expressions() to handle some more cases. Add some infrastructure (mostly macros) to make it easier to write typical cases for constant-expression simplification. Add simplification processing for ArrayRef, RowExpr, and ScalarArrayOpExpr node types, which formerly went unsimplified even if all their inputs were constants. Also teach it to simplify FieldSelect from a composite constant. Make use of the new infrastructure to reduce the amount of code needed for the existing ArrayExpr and ArrayCoerceExpr cases. One existing test case changes output as a result of the fact that RowExpr can now be folded to a constant. All the new code is exercised by existing test cases according to gcov, so I feel no need to add additional tests. Tom Lane, reviewed by Dmitry Dolgov Discussion: https://postgr.es/m/3be3b82c-e29c-b674-2163-bf47d98817b1@iki.fi	2018-01-03 12:35:09 -05:00
Peter Eisentraut	35c0754fad	Allow ldaps when using ldap authentication While ldaptls=1 provides an RFC 4513 conforming way to do LDAP authentication with TLS encryption, there was an earlier de facto standard way to do LDAP over SSL called LDAPS. Even though it's not enshrined in a standard, it's still widely used and sometimes required by organizations' network policies. There seems to be no reason not to support it when available in the client library. Therefore, add support when using OpenLDAP 2.4+ or Windows. It can be configured with ldapscheme=ldaps or ldapurl=ldaps://... Add tests for both ways of requesting LDAPS and a test for the pre-existing ldaptls=1. Modify the 001_auth.pl test for "diagnostic messages", which was previously relying on the server rejecting ldaptls=1. Author: Thomas Munro Reviewed-By: Peter Eisentraut Discussion: https://postgr.es/m/CAEepm=1s+pA-LZUjQ-9GQz0Z4rX_eK=DFXAF1nBQ+ROPimuOYQ@mail.gmail.com	2018-01-03 10:11:26 -05:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Andres Freund	f9ccf92e16	Simplify representation of aggregate transition values a bit. Previously aggregate transition values for hash and other forms of aggregation (i.e. sort and no group by) were represented differently. Hash based aggregation used a grouping set indexed array pointing to an array of transition values, whereas other forms of aggregation used one flattened array with the index being computed out of grouping set and transition offsets. That made upcoming changes hard, so represent both as grouping set indexed array of per-group data. As a nice side-effect this also makes aggregation slightly faster, because computing offsets with `transno + (setno * numTrans)` turns out not to be that cheap (too big for x86 lea for example). Author: Andres Freund Discussion: https://postgr.es/m/20171128003121.nmxbm2ounxzb6n2t@alap3.anarazel.de	2018-01-02 18:23:37 -08:00
Tom Lane	5dc692f78d	Ensure proper alignment of tuples in HashMemoryChunkData buffers. The previous coding relied (without any documentation) on the data[] member of HashMemoryChunkData being at a MAXALIGN'ed offset. If it was not, the tuples would not be maxaligned either, leading to failures on alignment-picky machines. While there seems to be no live bug on any platform we support, this is clearly pretty fragile: any addition to or rearrangement of the fields in HashMemoryChunkData could break it. Let's remove the hazard by getting rid of the data[] member and instead using pointer arithmetic with an explicitly maxalign'ed offset. Discussion: https://postgr.es/m/14483.1514938129@sss.pgh.pa.us	2018-01-02 21:23:06 -05:00
Alvaro Herrera	54eff5311d	Fix deadlock hazard in CREATE INDEX CONCURRENTLY Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are supposed to be able to work in parallel, as evidenced by fixes in commit `c3d09b3bd2` specifically to support this case. In reality, one of the sessions would be aborted by a misterious "deadlock detected" error. Jeff Janes diagnosed that this is because of leftover snapshots used for system catalog scans -- this was broken by `8aa3e47510` keeping track of (registering) the catalog snapshot. To fix the deadlocks, it's enough to de-register that snapshot prior to waiting. Backpatch to 9.4, which introduced MVCC catalog scans. Include an isolationtester spec that 8 out of 10 times reproduces the deadlock with the unpatched code for me (Álvaro). Author: Jeff Janes Diagnosed-by: Jeff Janes Reported-by: Jeremy Finzel Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com	2018-01-02 19:16:16 -03:00
Peter Eisentraut	438036264a	Don't cast between GinNullCategory and bool The original idea was that we could use an isNull-style bool array directly as a GinNullCategory array. However, the existing code already acknowledges that that doesn't really work, because of the possibility that bool as currently defined can have arbitrary bit patterns for true values. So it has to loop through the nullFlags array to set each bool value to an acceptable value. But if we are looping through the whole array anyway, we might as well build a proper GinNullCategory array instead and abandon the type casting. That makes the code much safer in case bool is ever changed to something else. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-01-02 12:20:56 -05:00
Andres Freund	93ea78b17c	Fix EXPLAIN ANALYZE output for Parallel Hash. In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size information. Refactor so that participants report only on batches they worked on rather than trying to report on all of them, and teach explain.c to consider the HashInstrumentation object from all participants instead of picking the first one it can find. This should fix an occasional build farm failure in the "join" regression test. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us	2018-01-01 14:38:23 -08:00
Andres Freund	b40933101c	Perform slot validity checks in a separate pass over expression. This reduces code duplication a bit, but the primary benefit that it makes JITing expression evaluation easier. When doing so we can't, as previously done in the interpreted case, really change opcode without recompiling. Nor dow we just carry around unnecessary branches to avoid re-checking over and over. As a minor side-effect this makes ExecEvalStepOp() O(log(N)) rather than O(N). Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2017-12-29 12:45:25 -08:00
Andres Freund	4717fdb14c	Rely on executor utils to build targetlist for DML RETURNING. This is useful because it gets rid of the sole direct user of ExecAssignResultType(). A future commit will likely make use of that and combine creating the targetlist with the initialization of the result slot. But it seems like good code hygiene anyway. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2017-12-29 12:26:29 -08:00
Magnus Hagander	d02974e32e	Properly set base backup backends to active in pg_stat_activity When walsenders were included in pg_stat_activity, only the ones actually streaming WAL were listed as active when they were active. In particular, the connections sending base backups were listed as being idle. Which means that a regular pg_basebackup would show up with one active and one idle connection, when both were active. This patch updates to set all walsenders to active when they are (including those doing very fast things like IDENTIFY_SYSTEM), and then back to idle. Details about exactly what they are doing is available in pg_stat_replication. Patch by me, review by Michael Paquier and David Steele.	2017-12-29 16:28:32 +01:00
Simon Riggs	48c9f49265	Fix race condition when changing synchronous_standby_names A momentary window exists when synchronous_standby_names changes that allows commands issued after the change to continue to act as async until the change becomes visible. Remove the race by using more appropriate test in syncrep.c Author: Asim Rama Praveen and Ashwin Agrawal Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen Reviewed-by: Michael Paquier, Masahiko Sawada	2017-12-29 14:30:33 +00:00
Simon Riggs	2958a672b1	Extend near-wraparound hints to include replication slots Author: Feike Steenbergen Reviewed-by: Michael Paquier	2017-12-29 14:01:25 +00:00
Andres Freund	f83040c62a	Fix rare assertion failure in parallel hash join. When a backend runs out of inner tuples to hash, it should detach from grow_batch_barrier only after it has flushed all batches to disk and merged counters, not before. Otherwise a concurrent backend in ExecParallelHashIncreaseNumBatches() could stop waiting for this backend and try to read tuples before they have been written. This commit reorders those operations and should fix the assertion failures seen occasionally on the build farm since commit `1804284042`. Author: Thomas Munro Discussion: https://postgr.es/m/E1eRwXy-0004IK-TO%40gemulon.postgresql.org	2017-12-28 02:41:53 -08:00
Alvaro Herrera	be2343221f	Protect against hypothetical memory leaks in RelationGetPartitionKey Also, fix a comment that commit `8a0596cb65` made obsolete. Reported-by: Robert Haas Discussion: http://postgr.es/m/CA+TgmoYbpuUUUp2GhYNwWm0qkah39spiU7uOiNXLz20ASfKYoA@mail.gmail.com	2017-12-27 18:06:14 -03:00
Robert Haas	62d02f39e7	Fix race-under-concurrency in PathNameCreateTemporaryDir. Thomas Munro Discussion: http://postgr.es/m/CAEepm=1Vp1e3KtftLtw4B60ZV9teNeKu6HxoaaBptQMsRWjJbQ@mail.gmail.com	2017-12-27 10:56:14 -08:00
Teodor Sigaev	ad337c76b6	Update relation's stats in pg_class during vacuum full. Hash index depends on estimation of numbers of tuples and pages of relations, incorrect value could be a reason of significantly growing of index. Vacuum full recreates heap and reindex all indexes before renewal stats. The patch fixes that, so indexes will see correct values. Backpatch to v10 only because earlier versions haven't usable hash index and growing of hash index is a single user-visible symptom. Author: Amit Kapila Reviewed-by: Ashutosh Sharma, me Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net	2017-12-27 18:25:37 +03:00
Teodor Sigaev	ff963b393c	Add polygon opclass for SP-GiST Polygon opclass uses compress method feature of SP-GiST added earlier. For now it's a single operator class which uses this feature. SP-GiST actually indexes a bounding boxes of input polygons, so part of supported operations are lossy. Opclass uses most methods of corresponding opclass over boxes of SP-GiST and treats bounding boxes as point in 4D-space. Bump catalog version. Authors: Nikita Glukhov, Alexander Korotkov with minor editorization by me Reviewed-By: all authors + Darafei Praliaskouski Discussion: https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru	2017-12-25 18:59:38 +03:00
Andres Freund	4e2970f880	Fix assert with side effects in the new PHJ code. Instead of asserting the assert just set the value to what it was supposed to test... Per coverity.	2017-12-24 02:57:55 -08:00
Tom Lane	c4c2885cbb	Fix UNION/INTERSECT/EXCEPT over no columns. Since 9.4, we've allowed the syntax "select union select" and variants of that. However, the planner wasn't expecting a no-column set operation and ended up treating the set operation as if it were UNION ALL. Turns out it's trivial to fix in v10 and later; we just need to be careful about not generating a Sort node with no sort keys. However, since a weird corner case like this is never going to be exercised by developers, we'd better have thorough regression tests if we want to consider it supported. Per report from Victor Yegorov. Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com	2017-12-22 12:08:06 -05:00
Teodor Sigaev	854823fa33	Add optional compression method to SP-GiST Patch allows to have different types of column and value stored in leaf tuples of SP-GiST. The main application of feature is to transform complex column type to simple indexed type or for truncating too long value, transformation could be lossy. Simple example: polygons are converted to their bounding boxes, this opclass follows. Authors: me, Heikki Linnakangas, Alexander Korotkov, Nikita Glukhov Reviewed-By: all authors + Darafei Praliaskouski Discussions: https://www.postgresql.org/message-id/5447B3FF.2080406@sigaev.ru https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru#54907069.1030506@sigaev.ru	2017-12-22 13:33:16 +03:00
Alvaro Herrera	9373baa0f7	Minor edits to catalog files and scripts This fixes a few typos and small mistakes; it also cleans a few minor stylistic issues. The biggest functional change is that Gen_fmgrtab.pl no longer knows the OID of language 'internal'. Author: John Naylor Discussion: https://postgr.es/m/CAJVSVGXAkwbk-A9QHHHf00N905kKisyQbaYwKqaRpze_gPXGfg@mail.gmail.com	2017-12-21 19:07:32 -03:00
Robert Haas	cce1ecfc77	Adjust assertion in GetCurrentCommandId. currentCommandIdUsed is only used to skip redundant increments of the command counter, and CommandCounterIncrement() is categorically denied under parallelism anyway. Therefore, it's OK for GetCurrentCommandId() to mark the counter value used, as long as it happens in the leader, not a worker. Prior to commit `e9baa5e9fa`, the slightly incorrect check didn't matter, but now it does. A test case added by commit `1804284042` uncovered the problem by accident; it caused failures with force_parallel_mode=on/regress. Report and review by Andres Freund. Patch by me. Discussion: http://postgr.es/m/20171221143106.5lhtygohvmazli3x@alap3.anarazel.de	2017-12-21 13:19:59 -05:00
Tom Lane	6719b238e8	Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit. This patch does three interrelated things: * Create a new expression execution step type EEOP_PARAM_CALLBACK and add the infrastructure needed for add-on modules to generate that. As discussed, the best control mechanism for that seems to be to add another hook function to ParamListInfo, which will be called by ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found. For stand-alone expressions, we add a new entry point to allow the ParamListInfo to be specified directly, since it can't be retrieved from the parent plan node's EState. * Redesign the API for the ParamListInfo paramFetch hook so that the ParamExternData array can be entirely virtual. This also lets us get rid of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to decide which param IDs should be accessible or not. plpgsql_param_fetch was already doing the identical masking check, so having callers do it too seemed redundant. While I was at it, I added a "speculative" flag to paramFetch that the planner can specify as TRUE to avoid unwanted failures. This solves an ancient problem for plpgsql that it couldn't provide values of non-DTYPE_VAR variables to the planner for fear of triggering premature "record not assigned yet" or "field not found" errors during planning. * Rework plpgsql to get rid of the need for "unshared" parameter lists, by dint of turning the single ParamListInfo per estate into a nearly read-only data structure that doesn't instantiate any per-variable data. Instead, the paramFetch hook controls access to per-variable data and can make the right decisions on the fly, replacing the cases that we used to need multiple ParamListInfos for. This might perhaps have been a performance loss on its own, but by using a paramCompile hook we can bypass plpgsql_param_fetch entirely during normal query execution. (It's now only called when, eg, we copy the ParamListInfo into a cursor portal. copyParamList() or SerializeParamList() effectively instantiate the virtual parameter array as a simple physical array without a paramFetch hook, which is what we want in those cases.) This allows reverting most of commit `6c82d8d1f`, though I kept the cosmetic code-consolidation aspects of that (eg the assign_simple_var function). Performance testing shows this to be at worst a break-even change, and it can provide wins ranging up to 20% in test cases involving accesses to fields of "record" variables. The fact that values of such variables can now be exposed to the planner might produce wins in some situations, too, but I've not pursued that angle. In passing, remove the "parent" pointer from the arguments to ExecInitExprRec and related functions, instead storing that pointer in a transient field in ExprState. The ParamListInfo pointer for a stand-alone expression is handled the same way; we'd otherwise have had to add yet another recursively-passed-down argument in expression compilation. Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us	2017-12-21 12:57:45 -05:00
Alvaro Herrera	8a0596cb65	Get rid of copy_partition_key That function currently exists to avoid leaking memory in CacheMemoryContext in case of trouble while the partition key is being built, but there's a better way: allocate everything in a memcxt that goes away if the current (sub)transaction fails, and once the partition key is built and no further errors can occur, make the memcxt permanent by making it a child of CacheMemoryContext. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql	2017-12-21 14:21:39 -03:00
Alvaro Herrera	9ef6aba1d3	Fix typo	2017-12-21 13:36:52 -03:00
Tom Lane	c98c35cd08	Avoid putting build-location-dependent strings into generated files. Various Perl scripts we use to generate files were in the habit of printing things like "generated by $0" into their output files. That looks like a fine idea at first glance, but it results in non-reproducible output, because in VPATH builds $0 won't be just the name of the script file, but a full path for it. We'd prefer that you get identical results whether using VPATH or not, so this is a bad thing. Some of these places also printed their input file name(s), causing an additional hazard of the same type. Hence, establish a policy that thou shalt not print $0, nor input file pathnames, into output files (they're still allowed in error messages, though). Instead just write the script name verbatim. While we are at it, we can make these annotations more useful by giving the script's full relative path name within the PG source tree, eg instead of Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl. Not all of the changes made here actually affect any files shipped in finished tarballs today, but it seems best to apply the policy everyplace so that nobody copies unsafe code into places where it could matter. Christoph Berg and Tom Lane Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de	2017-12-21 10:57:06 -05:00
Robert Haas	59d1e2b95a	Cancel CV sleep during subtransaction abort. Generally, error recovery paths that need to do things like LWLockReleaseAll and pgstat_report_wait_end also need to call ConditionVariableCancelSleep, but AbortSubTransaction was missed. Since subtransaction abort might destroy up the DSM segment that contains the ConditionVariable stored in cv_sleep_target, this can result in a crash for anything using condition variables. Reported and diagnosed by Andres Freund. Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de	2017-12-21 09:24:30 -05:00
Andres Freund	1804284042	Add parallel-aware hash joins. Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel Hash Join with Parallel Hash. While hash joins could already appear in parallel queries, they were previously always parallel-oblivious and had a partial subplan only on the outer side, meaning that the work of the inner subplan was duplicated in every worker. After this commit, the planner will consider using a partial subplan on the inner side too, using the Parallel Hash node to divide the work over the available CPU cores and combine its results in shared memory. If the join needs to be split into multiple batches in order to respect work_mem, then workers process different batches as much as possible and then work together on the remaining batches. The advantages of a parallel-aware hash join over a parallel-oblivious hash join used in a parallel query are that it: * avoids wasting memory on duplicated hash tables * avoids wasting disk space on duplicated batch files * divides the work of building the hash table over the CPUs One disadvantage is that there is some communication between the participating CPUs which might outweigh the benefits of parallelism in the case of small hash tables. This is avoided by the planner's existing reluctance to supply partial plans for small scans, but it may be necessary to estimate synchronization costs in future if that situation changes. Another is that outer batch 0 must be written to disk if multiple batches are required. A potential future advantage of parallel-aware hash joins is that right and full outer joins could be supported, since there is a single set of matched bits for each hashtable, but that is not yet implemented. A new GUC enable_parallel_hash is defined to control the feature, defaulting to on. Author: Thomas Munro Reviewed-By: Andres Freund, Robert Haas Tested-By: Rafia Sabih, Prabhat Sahu Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com	2017-12-21 00:43:41 -08:00
Robert Haas	f94eec490b	When passing query strings to workers, pass the terminating \0. Otherwise, when the query string is read, we might trailing garbage beyond the end, unless there happens to be a \0 there by good luck. Report and patch by Thomas Munro. Reviewed by Rafia Sabih. Discussion: http://postgr.es/m/CAEepm=2SJs7X+_vx8QoDu8d1SMEOxtLhxxLNzZun_BvNkuNhrw@mail.gmail.com	2017-12-20 17:26:50 -05:00
Robert Haas	8526bcb2df	Try again to fix accumulation of parallel worker instrumentation. When a Gather or Gather Merge node is started and stopped multiple times, accumulate instrumentation data only once, at the end, instead of after each execution, to avoid recording inflated totals. Commit `778e78ae9f`, the previous attempt at a fix, instead reset the state after every execution, which worked for the general instrumentation data but had problems for the additional instrumentation specific to Sort and Hash nodes. Report by hubert depesz lubaczewski. Analysis and fix by Amit Kapila, following a design proposal from Thomas Munro, with a comment tweak by me. Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com	2017-12-19 12:21:56 -05:00
Robert Haas	38fc54703e	Re-fix wrong costing of Sort under Gather Merge. Commit `dc02c7bca4` changed this call to create_sort_path() to take -1 rather than limit_tuples because, at that time, there was no way for a Sort beneath a Gather Merge to become a top-N sort. Later, commit `3452dc5240` provided a way for a Sort beneath a Gather Merge to become a top-N sort, but failed to revert the previous commit in the process. Do that. Report and analysis by Jeff Janes; patch by Thomas Munro; review by Amit Kapila and by me. Discussion: http://postgr.es/m/CAEepm=1BWtC34vUroA0Uqjw02MaqdUrW+d6WD85_k8SLyPiKHQ@mail.gmail.com	2017-12-19 10:42:17 -05:00
Andres Freund	ab9e0e718a	Add shared tuplestores. SharedTuplestore allows multiple participants to write into it and then read the tuples back from it in parallel. Each reader receives partial results. For now it always uses disk files, but other buffering policies and other kinds of scans (ie each reader receives complete results) may be useful in future. The upcoming parallel hash join feature will use this facility. Author: Thomas Munro Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com	2017-12-18 14:23:19 -08:00
Peter Eisentraut	25d532698d	Move SCRAM-related name definitions to scram-common.h Mechanism names for SCRAM and channel binding names have been included in scram.h by the libpq frontend code, and this header references a set of routines which are only used by the backend. scram-common.h is on the contrary usable by both the backend and libpq, so getting those names from there seems more reasonable. Author: Michael Paquier <michael.paquier@gmail.com>	2017-12-18 16:59:48 -05:00
Fujii Masao	56a95ee511	Fix bug in cancellation of non-exclusive backup to avoid assertion failure. Previously an assertion failure occurred when pg_stop_backup() for non-exclusive backup was aborted while it's waiting for WAL files to be archived. This assertion failure happened in do_pg_abort_backup() which was called when a non-exclusive backup was canceled. do_pg_abort_backup() assumes that there is at least one non-exclusive backup running when it's called. But pg_stop_backup() can be canceled even after it marks the end of non-exclusive backup (e.g., during waiting for WAL archiving). This broke the assumption that do_pg_abort_backup() relies on, and which caused an assertion failure. This commit changes do_pg_abort_backup() so that it does nothing when non-exclusive backup has been already marked as completed. That is, the asssumption is also changed, and do_pg_abort_backup() now can handle even the case where it's called when there is no running backup. Backpatch to 9.6 where SQL-callable non-exclusive backup was added. Author: Masahiko Sawada and Michael Paquier Reviewed-By: Robert Haas and Fujii Masao Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com	2017-12-19 03:46:14 +09:00
Robert Haas	fd7c0fa732	Fix crashes on plans with multiple Gather (Merge) nodes. es_query_dsa turns out to be broken by design, because it supposes that there is only one DSA for the whole query, whereas there is actually one per Gather (Merge) node. For now, work around that problem by setting and clearing the pointer around the sections of code that might need it. It's probably a better idea to get rid of es_query_dsa altogether in favor of having each node keep track individually of which DSA is relevant, but that seems like more than we would want to back-patch. Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit Kapila, and by me. Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com	2017-12-18 12:22:31 -05:00
Magnus Hagander	7731c32087	Fix typo on comment Author: David Rowley	2017-12-18 11:24:55 +01:00
Tom Lane	b31a9d7dd3	Suppress compiler warning about no function return value. Compilers that don't know that ereport(ERROR) doesn't return complained about the new coding in scanint8() introduced by commit `101c7ee3e`. Tweak coding to avoid the warning. Per buildfarm.	2017-12-17 00:41:41 -05:00
Andres Freund	699bf7d05c	Perform a lot more sanity checks when freezing tuples. The previous commit has shown that the sanity checks around freezing aren't strong enough. Strengthening them seems especially important because the existance of the bug has caused corruption that we don't want to make even worse during future vacuum cycles. The errors are emitted with ereport rather than elog, despite being "should never happen" messages, so a proper error code is emitted. To avoid superflous translations, mark messages as internal. Author: Andres Freund and Alvaro Herrera Reviewed-By: Alvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de Backpatch: 9.3-	2017-12-14 18:20:47 -08:00
Andres Freund	9c2f0a6c3c	Fix pruning of locked and updated tuples. Previously it was possible that a tuple was not pruned during vacuum, even though its update xmax (i.e. the updating xid in a multixact with both key share lockers and an updater) was below the cutoff horizon. As the freezing code assumed, rightly so, that that's not supposed to happen, xmax would be preserved (as a member of a new multixact or xmax directly). That causes two problems: For one the tuple is below the xmin horizon, which can cause problems if the clog is truncated or once there's an xid wraparound. The bigger problem is that that will break HOT chains, which in turn can lead two to breakages: First, failing index lookups, which in turn can e.g lead to constraints being violated. Second, future hot prunes / vacuums can end up making invisible tuples visible again. There's other harmful scenarios. Fix the problem by recognizing that tuples can be DEAD instead of RECENTLY_DEAD, even if the multixactid has alive members, if the update_xid is below the xmin horizon. That's safe because newer versions of the tuple will contain the locking xids. A followup commit will harden the code somewhat against future similar bugs and already corrupted data. Author: Andres Freund, with changes by Alvaro Herrera Reported-By: Daniel Wood Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de Backpatch: 9.3-	2017-12-14 18:20:47 -08:00
Andrew Dunstan	0fedb4ea69	Fix walsender timeouts when decoding a large transaction The logical slots have a fast code path for sending data so as not to impose too high a per message overhead. The fast path skips checks for interrupts and timeouts. However, the existing coding failed to consider the fact that a transaction with a large number of changes may take a very long time to be processed and sent to the client. This causes the walsender to ignore interrupts for potentially a long time and more importantly it will result in the walsender being killed due to timeout at the end of such a transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when the last keepalive check happened less than half the walsender timeout ago. Otherwise the slower code path will be taken. Backpatched to 9.4 Petr Jelinek, reviewed by Kyotaro HORIGUCHI, Yura Sokolov, Craig Ringer and Robert Haas. Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com	2017-12-14 11:13:14 -05:00
Andres Freund	538d114f6d	Allow executor nodes to change their ExecProcNode function. In order for executor nodes to be able to change their ExecProcNode function after ExecInitNode() has finished, provide ExecSetExecProcNode(). This allows any wrappers functions that only execProcnode.c knows about to be reinstalled. The motivation for wanting to change ExecProcNode after ExecInitNode() has finished is that it is not known until later whether parallel query is available, so if a parallel variant is to be installed then ExecInitNode() is too soon to decide. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com	2017-12-13 15:47:01 -08:00
Andres Freund	923e8dee88	Add defenses against pre-crash files to BufFileOpenShared(). Crash restarts currently don't clean up temporary files, as a debugging aid. If a left-over file happens to have the same name as a segment file we're trying to create, we'll just truncate and reuse it, but there is a problem: BufFileOpenShared() determines how many segment files exist by trying to open .0, .1, .2, ... until it finds no more files. It might be confused by a junk file that has the next segment number. To defend against that, make sure we always create a gap after the end file by unlinking the following name if it exists. Also make it an error to try to open a BufFile that doesn't exist (has no segment 0), so as not to encourage the development of client code that depends on an interface that we can't reliably provide. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D2jhCbC_GFQJaaDhWxLB4EXtT3vVd5czuRNaqF5CWSTog%40mail.gmail.com	2017-12-13 13:27:41 -08:00
Robert Haas	884a60840c	Fix parallel index scan hang with deleted or half-dead pages. The previous coding forgot to release the scan before seizing it again, leading to a lockup. Report by Patrick Hemmer. Diagnosis by Thomas Munro. Patch by Amit Kapila. Discussion: http://postgr.es/m/CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com	2017-12-13 16:15:44 -05:00
Robert Haas	1d6fb35ad6	Revert "Fix accumulation of parallel worker instrumentation." This reverts commit `2c09a5c12a`. Per further discussion, that doesn't seem to be the best possible fix. Discussion: http://postgr.es/m/CAA4eK1LW2aFKzY3=vwvc=t-juzPPVWP2uT1bpx_MeyEqnM+p8g@mail.gmail.com	2017-12-13 15:19:28 -05:00
Tom Lane	9fa6f00b13	Rethink MemoryContext creation to improve performance. This patch makes a number of interrelated changes to reduce the overhead involved in creating/deleting memory contexts. The key ideas are: * Include the AllocSetContext header of an aset.c context in its first malloc request, rather than allocating it separately in TopMemoryContext. This means that we now always create an initial or "keeper" block in an aset, even if it never receives any allocation requests. * Create freelists in which we can save and recycle recently-destroyed asets (this idea is due to Robert Haas). * In the common case where the name of a context is a constant string, just store a pointer to it in the context header, rather than copying the string. The first change eliminates a palloc/pfree cycle per context, and also avoids bloat in TopMemoryContext, at the price that creating a context now involves a malloc/free cycle even if the context never receives any allocations. That would be a loser for some common usage patterns, but recycling short-lived contexts via the freelist eliminates that pain. Avoiding copying constant strings not only saves strlen() and strcpy() overhead, but is an essential part of the freelist optimization because it makes the context header size constant. Currently we make no attempt to use the freelist for contexts with non-constant names. (Perhaps someday we'll need to think harder about that, but in current usage, most contexts with custom names are long-lived anyway.) The freelist management in this initial commit is pretty simplistic, and we might want to refine it later --- but in common workloads that will never matter because the freelists will never get full anyway. To create a context with a non-constant name, one is now required to call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME option. AllocSetContextCreate becomes a wrapper macro, and it includes a test that will complain about non-string-literal context name parameters on gcc and similar compilers. An unfortunate side effect of making AllocSetContextCreate a macro is that one is now required to use the size parameter abstraction macros (ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of writing out individual size parameters no longer works unless you switch to AllocSetContextCreateExtended. Internally to the memory-context-related modules, the context creation APIs are simplified, removing the rather baroque original design whereby a context-type module called mcxt.c which then called back into the context-type module. That saved a bit of code duplication, but not much, and it prevented context-type modules from exercising control over the allocation of context headers. In passing, I converted the test-and-elog validation of aset size parameters into Asserts to save a few more cycles. The original thought was that callers might compute size parameters on the fly, but in practice nobody does that, so it's useless to expend cycles on checking those numbers in production builds. Also, mark the memory context method-pointer structs "const", just for cleanliness. Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us	2017-12-13 13:55:16 -05:00
Peter Eisentraut	3d8874224f	Fix crash when using CALL on an aggregate Author: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reported-by: Rushabh Lathia <rushabh.lathia@gmail.com>	2017-12-13 10:37:48 -05:00
Andres Freund	8e211f5391	Add float.h include to int8.c, for isnan(). port.h redirects isnan() to _isnan() on windows, which in turn is provided by float.h rather than math.h. Therefore include the latter as well. Per buildfarm.	2017-12-12 23:32:43 -08:00
Andres Freund	f512a6e132	Consistently use PG_INT(16\|32\|64)_(MIN\|MAX). Per buildfarm animal woodlouse.	2017-12-12 18:19:13 -08:00
Andres Freund	101c7ee3ee	Use new overflow aware integer operations. A previous commit added inline functions that provide fast(er) and correct overflow checks for signed integer math. Use them in a significant portion of backend code. There's more to touch in both backend and frontend code, but these were the easily identifiable cases. The old overflow checks are noticeable in integer heavy workloads. A secondary benefit is that getting rid of overflow checks that rely on signed integer overflow wrapping around, will allow us to get rid of -fwrapv in the future. Which in turn slows down other code. Author: Andres Freund Discussion: https://postgr.es/m/20171024103954.ztmatprlglz3rwke@alap3.anarazel.de	2017-12-12 16:55:37 -08:00
Robert Haas	95b52351fe	Remove obsolete comment. Commit `8b304b8b72` removed replacement selection, but left behind this comment text. The optimization to which the comment refers is not relevant without replacement selection, because if we had so few tuples as to require only one tape, we would have just completed the sort in memory. Peter Geoghegan Discussion: http://postgr.es/m/CAH2-WznqupLA8CMjp+vqzoe0yXu0DYYbQSNZxmgN76tLnAOZ_w@mail.gmail.com	2017-12-12 19:33:50 -05:00
Robert Haas	d329dc2ea4	Remove bug from OPTIMIZER_DEBUG code for partition-wise join. Etsuro Fujita, reviewed by Ashutosh Bapat Discussion: http://postgr.es/m/5A2A60E6.6000008@lab.ntt.co.jp	2017-12-12 10:52:15 -05:00
Peter Eisentraut	4034db215b	Fix comment Reported-by: Noah Misch <noah@leadboat.com>	2017-12-11 16:37:39 -05:00
Tom Lane	7eb16ab17d	Fix corner-case coredump in _SPI_error_callback(). I noticed that _SPI_execute_plan initially sets spierrcontext.arg = NULL, and only fills it in some time later. If an error were to happen in between, _SPI_error_callback would try to dereference the null pointer. This is unlikely --- there's not much between those points except push-snapshot calls --- but it's clearly not impossible. Tweak the callback to do nothing if the pointer isn't set yet. It's been like this for awhile, so back-patch to all supported branches.	2017-12-11 16:34:28 -05:00
Robert Haas	01a0ca1bed	Improve comment about PartitionBoundInfoData. Ashutosh Bapat, per discussion with Julien Rouhaund, who also reviewed this patch. Discussion: http://postgr.es/m/CAFjFpReBR3ftK9C23LLCZY_TDXhhjB_dgE-L9+mfTnA=gkvdvQ@mail.gmail.com	2017-12-11 12:52:15 -05:00
Magnus Hagander	d8f632caec	Fix typo Reported by Robins Tharakan	2017-12-09 11:40:31 +01:00
Peter Eisentraut	005ac298b1	Prohibit identity columns on typed tables and partitions Those cases currently crash and supporting them is more work then originally thought, so we'll just prohibit these scenarios for now. Author: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Reported-by: Мансур Галиев <gomer94@yandex.ru> Bug: #14866	2017-12-08 12:13:04 -05:00
Peter Eisentraut	af9f8b7ca3	Fix mistake in comment Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>	2017-12-08 11:23:36 -05:00
Peter Eisentraut	2d2d06b7e2	Apply identity sequence values on COPY A COPY into a table should apply identity sequence values just like it does for ordinary defaults. This was previously forgotten, leading to null values being inserted, which in turn would fail because identity columns have not-null constraints. Author: Michael Paquier <michael.paquier@gmail.com> Reported-by: Steven Winfield <steven.winfield@cantabcapital.com> Bug: #14952	2017-12-08 09:18:18 -05:00
Robert Haas	28724fd90d	Report failure to start a background worker. When a worker is flagged as BGW_NEVER_RESTART and we fail to start it, or if it is not marked BGW_NEVER_RESTART but is terminated before startup succeeds, what BgwHandleStatus should be reported? The previous code really hadn't considered this possibility (as indicated by the comments which ignore it completely) and would typically return BGWH_NOT_YET_STARTED, but that's not a good answer, because then there's no way for code using GetBackgroundWorkerPid() to tell the difference between a worker that has not started but will start later and a worker that has not started and will never be started. So, when this case happens, return BGWH_STOPPED instead. Update the comments to reflect this. The preceding fix by itself is insufficient to fix the problem, because the old code also didn't send a notification to the process identified in bgw_notify_pid when startup failed. That might've been technically correct under the theory that the status of the worker was BGWH_NOT_YET_STARTED, because the status would indeed not change when the worker failed to start, but now that we're more usefully reporting BGWH_STOPPED, a notification is needed. Without these fixes, code which starts background workers and then uses the recommended APIs to wait for those background workers to start would hang indefinitely if the postmaster failed to fork a worker. Amit Kapila and Robert Haas Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com	2017-12-06 08:58:27 -05:00
Robert Haas	9c64ddd414	Fix Parallel Append crash. Reported by Tom Lane and the buildfarm. Amul Sul and Amit Khandekar Discussion: http://postgr.es/m/17868.1512519318@sss.pgh.pa.us Discussion: http://postgr.es/m/CAJ3gD9cJQ4d-XhmZ6BqM9rMM2KDBfpkdgOAb4+psz56uBuMQ_A@mail.gmail.com	2017-12-06 08:42:50 -05:00
Robert Haas	ab72716778	Support Parallel Append plan nodes. When we create an Append node, we can spread out the workers over the subplans instead of piling on to each subplan one at a time, which should typically be a bit more efficient, both because the startup cost of any plan executed entirely by one worker is paid only once and also because of reduced contention. We can also construct Append plans using a mix of partial and non-partial subplans, which may allow for parallelism in places that otherwise couldn't support it. Unfortunately, this patch doesn't handle the important case of parallelizing UNION ALL by running each branch in a separate worker; the executor infrastructure is added here, but more planner work is needed. Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com	2017-12-05 17:28:39 -05:00
Robert Haas	2c09a5c12a	Fix accumulation of parallel worker instrumentation. When a Gather or Gather Merge node is started and stopped multiple times, the old code wouldn't reset the shared state between executions, potentially resulting in dramatically inflated instrumentation data for nodes beneath it. (The per-worker instrumentation ended up OK, I think, but the overall totals were inflated.) Report by hubert depesz lubaczewski. Analysis and fix by Amit Kapila, reviewed and tweaked a bit by me. Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com	2017-12-05 14:35:33 -05:00
Andres Freund	5bcf389ecf	Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate. If a hash join appears in a parallel query, there may be no hash table available for explain.c to inspect even though a hash table may have been built in other processes. This could happen either because parallel_leader_participation was set to off or because the leader happened to hit the end of the outer relation immediately (even though the complete relation is not empty) and decided not to build the hash table. Commit `bf11e7ee` introduced a way for workers to exchange instrumentation via the DSM segment for Sort nodes even though they are not parallel-aware. This commit does the same for Hash nodes, so that explain.c has a way to find instrumentation data from an arbitrary participant that actually built the hash table. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com	2017-12-05 10:55:56 -08:00
Tom Lane	8dc3c971a9	Treat directory open failures as hard errors in ResetUnloggedRelations(). Previously, this code just reported such problems at LOG level and kept going. The problem with this approach is that transient failures (e.g., ENFILE) could prevent us from resetting unlogged relations to empty, yet allow recovery to appear to complete successfully. That seems like a data corruption hazard large enough to treat such problems as reasons to fail startup. For the same reason, treat unlink failures for unlogged files as hard errors not just LOG messages. It's a little odd that we did it like that when file-level errors in other steps (copy_file, fsync_fname) are ERRORs. The sole case that I left alone is that ENOENT failure on a tablespace (not database) directory is not an error, though it will now be logged rather than just silently ignored. This is to cover the scenario where a previous DROP TABLESPACE removed the tablespace directory but failed before removing the pg_tblspc symlink. I'm not sure that that's very likely in practice, but that seems like the only real excuse for the old behavior here, so let's allow for it. (As coded, this will also allow ENOENT on $PGDATA/base/. But since we'll fail soon enough if that's gone, I don't think we need to complicate this code by distinguishing that from a true tablespace case.) Discussion: https://postgr.es/m/21040.1512418508@sss.pgh.pa.us	2017-12-04 20:52:59 -05:00
Tom Lane	066bc21c0e	Simplify do_pg_start_backup's API by opening pg_tblspc internally. do_pg_start_backup() expects its callers to pass in an open DIR pointer for the pg_tblspc directory, but there's no apparent advantage in that. It complicates the callers without adding any flexibility, and there's no robustness advantage, since we surely have to be prepared for errors during the scan of pg_tblspc anyway. In fact, by holding an extra kernel resource during operations like the preliminary checkpoint, we might be making things a fraction more failure-prone not less. Hence, remove that argument and open the directory just for the duration of the actual scan. Discussion: https://postgr.es/m/28752.1512413887@sss.pgh.pa.us	2017-12-04 18:37:54 -05:00
Tom Lane	561885db05	Improve error handling in RemovePgTempFiles(). Modify this function and its subsidiaries so that syscall failures are reported via ereport(LOG), rather than silently ignored as before. We don't want to throw a hard ERROR, as that would prevent database startup, and getting rid of leftover temporary files is not important enough for that. On the other hand, not reporting trouble at all seems like an odd choice not in line with current project norms, especially since any failure here is quite unexpected. On the same reasoning, adjust these functions' AllocateDir/ReadDir calls so that failure to scan a directory results in LOG not ERROR. I also removed the previous practice of silently ignoring ENOENT failures during directory opens --- there are some corner cases where that could happen given a previous database crash, but that seems like a bad excuse for ignoring a condition that isn't expected in most cases. A LOG message during postmaster start seems OK in such situations, and better than no output at all. In passing, make RemovePgTempRelationFiles' test for "is the file name all digits" look more like the way it's done elsewhere. Discussion: https://postgr.es/m/19907.1512402254@sss.pgh.pa.us	2017-12-04 17:59:35 -05:00
Tom Lane	2069e6faa0	Clean up assorted messiness around AllocateDir() usage. This patch fixes a couple of low-probability bugs that could lead to reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE) concerning directory-open or file-open failures. It also fixes places where we took shortcuts in reporting such errors, either by using elog instead of ereport or by using ereport but forgetting to specify an errcode. And it eliminates a lot of just plain redundant error-handling code. In service of all this, export fd.c's formerly-static function ReadDirExtended, so that external callers can make use of the coding pattern dir = AllocateDir(path); while ((de = ReadDirExtended(dir, path, LOG)) != NULL) if they'd like to treat directory-open failures as mere LOG conditions rather than errors. Also fix FreeDir to be a no-op if we reach it with dir == NULL, as such a coding pattern would cause. Then, remove code at many call sites that was throwing an error or log message for AllocateDir failure, as ReadDir or ReadDirExtended can handle that job just fine. Aside from being a net code savings, this gets rid of a lot of not-quite-up-to-snuff reports, as mentioned above. (In some places these changes result in replacing a custom error message such as "could not open tablespace directory" with more generic wording "could not open directory", but it was agreed that the custom wording buys little as long as we report the directory name.) In some other call sites where we can't just remove code, change the error reports to be fully project-style-compliant. Also reorder code in restoreTwoPhaseData that was acquiring a lock between AllocateDir and ReadDir; in the unlikely but surely not impossible case that LWLockAcquire changes errno, AllocateDir failures would be misreported. There is no great value in opening the directory before acquiring TwoPhaseStateLock, so just do it in the other order. Also fix CheckXLogRemoved to guarantee that it preserves errno, as quite a number of call sites are implicitly assuming. (Again, it's unlikely but I think not impossible that errno could change during a SpinLockAcquire. If so, this function was broken for its own purposes as well as breaking callers.) And change a few places that were using not-per-project-style messages, such as "could not read directory" when "could not open directory" is more correct. Back-patch the exporting of ReadDirExtended, in case we have occasion to back-patch some fix that makes use of it; it's not needed right now but surely making it global is pretty harmless. Also back-patch the restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is essentially cosmetic and need not get back-patched. Michael Paquier, with a bit of additional work by me Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com	2017-12-04 17:02:56 -05:00
Robert Haas	ab6eaee884	When VACUUM or ANALYZE skips a concurrently dropped table, log it. Hopefully, the additional logging will help avoid confusion that could otherwise result. Nathan Bossart, reviewed by Michael Paquier, Fabrízio Mello, and me	2017-12-04 15:25:55 -05:00
Tom Lane	ecc27d55f4	Support boolean columns in functional-dependency statistics. There's no good reason that the multicolumn stats stuff shouldn't work on booleans. But it looked only for "Var = pseudoconstant" clauses, and it will seldom find those for boolean Vars, since earlier phases of planning will fold "boolvar = true" or "boolvar = false" to just "boolvar" or "NOT boolvar" respectively. Improve dependencies_clauselist_selectivity() to recognize such clauses as equivalent to equality restrictions. This fixes a failure of the extended stats mechanism to apply in a case reported by Vitaliy Garnashevich. It's not a complete solution to his problem because the bitmap-scan costing code isn't consulting extended stats where it should, but that's surely an independent issue. In passing, improve some comments, get rid of a NumRelids() test that's redundant with the preceding bms_membership() test, and fix dependencies_clauselist_selectivity() so that estimatedclauses actually is a pure output argument as stated by its API contract. Back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/73a4936d-2814-dc08-ed0c-978f76f435b0@gmail.com	2017-12-04 11:51:43 -05:00
Robert Haas	9f4992e2a9	Remove memory leak protection from Gather and Gather Merge nodes. Before commit `6b65a7fe62`, tqueue.c could perform tuple remapping and thus leak memory, which is why commit `af33039317` made TupleQueueReaderNext run in a short-lived context. Now, however, tqueue.c has been reduced to a shadow of its former self, and there shouldn't be any chance of leaks any more. Accordingly, remove some tuple copying and memory context manipulation to speed up processing. Patch by me, reviewed by Amit Kapila. Some testing by Rafia Sabih. Discussion: http://postgr.es/m/CAA4eK1LSDydwrNjmYSNkfJ3ZivGSWH9SVswh6QpNzsMdj_oOQA@mail.gmail.com	2017-12-04 10:39:24 -05:00
Andres Freund	ec6a040056	Adjust #ifdef EXEC_BACKEND RemovePgTempFilesInDir() call. Other callers were adjusted in the course of `dc6c4c9dc2`. Per buildfarm.	2017-12-01 17:28:05 -08:00
Andres Freund	dc6c4c9dc2	Add infrastructure for sharing temporary files between backends. SharedFileSet allows temporary files to be created by one backend and then exported for read-only access by other backends, with clean-up managed by reference counting associated with a DSM segment. This includes changes to fd.c and buffile.c to support the new kind of temporary file. This will be used by an upcoming patch adding support for parallel hash joins. Author: Thomas Munro Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas, Rushabh Lathia Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com https://postgr.es/m/CAH2-WznJ_UgLux=_jTgCQ4yFz0iBntudsNKa1we3kN1BAG=88w@mail.gmail.com	2017-12-01 16:30:56 -08:00
Robert Haas	35438e5763	Minor code beautification in partition_bounds_equal. Use get_greatest_modulus more consistently, instead of doing the same thing in an ad-hoc manner in this one place. Ashutosh Bapat Discussion: http://postgr.es/m/CAFjFpReT9L4RCiJBKOyWC2=i02kv9uG2fx=4Fv7kFY2t0SPCgw@mail.gmail.com	2017-12-01 13:52:59 -05:00
Robert Haas	87c37e3291	Re-allow INSERT .. ON CONFLICT DO NOTHING on partitioned tables. Commit `8355a011a0` was reverted in `f05230752d`, but this attempt is hopefully better-considered: we now pass the correct value to ExecOpenIndices, which should avoid the crash that we hit before. Amit Langote, reviewed by Simon Riggs and by me. Some final editing by me. Discussion: http://postgr.es/m/7ff1e8ec-dc39-96b1-7f47-ff5965dceeac@lab.ntt.co.jp	2017-12-01 12:53:21 -05:00
Robert Haas	1cbc17aaca	Try to exclude partitioned tables in toto. Ashutosh Bapat, reviewed by Jeevan Chalke. Comment by me. Discussion: http://postgr.es/m/CAFjFpRcuRaydz88CY_aQekmuvmN2A9ax5z0k=ppT+s8KS8xMRA@mail.gmail.com	2017-12-01 10:59:09 -05:00
Robert Haas	59c8078744	Fix uninitialized memory reference. Without this, when partdesc->nparts == 0, we end up calling ExecBuildSlotPartitionKeyDescription without initializing values and isnull. Reported by Coverity via Michael Paquier. Patch by Michael Paquier, reviewed and revised by Amit Langote. Discussion: http://postgr.es/m/CAB7nPqQ3mwkdMoPY-ocgTpPnjd8TKOadMxdTtMLvEzF8480Zfg@mail.gmail.com	2017-12-01 10:05:00 -05:00
Peter Eisentraut	86ab28fbd1	Check channel binding flag at end of SCRAM exchange We need to check whether the channel-binding flag encoded in the client-final-message is the same one sent in the client-first-message. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-12-01 09:53:26 -05:00
Robert Haas	06ae669c92	Remove extra word from comment. David Rowley, who also was the primary author of the patch that added this function; the attribution in my previous commit, `84940644de`, was incorrect due to sloppiness on my part. Discussion: http://postgr.es/m/CAKJS1f_0iSiLQsf_c06AzOWAc3eS6ePjjVQFpcFv3W-O5aktnQ@mail.gmail.com	2017-11-30 16:22:38 -05:00
Peter Eisentraut	e4128ee767	SQL procedures This adds a new object type "procedure" that is similar to a function but does not have a return type and is invoked by the new CALL statement instead of SELECT or similar. This implementation is aligned with the SQL standard and compatible with or similar to other SQL implementations. This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well as ALTER/DROP ROUTINE that can refer to either a function or a procedure (or an aggregate function, as an extension to SQL). There is also support for procedures in various utility commands such as COMMENT and GRANT, as well as support in pg_dump and psql. Support for defining procedures is available in all the languages supplied by the core distribution. While this commit is mainly syntax sugar around existing functionality, future features will rely on having procedures as a separate object type. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>	2017-11-30 11:03:20 -05:00
Robert Haas	1761653bbb	Make create_unique_path manage memory like mark_dummy_rel. Put the unique path in the same context as the owning RelOptInfo, rather than the toplevel planner context. This is how this function worked originally, but commit `f41803bb39` changed it without explanation. mark_dummy_rel adopted the older (or newer?) technique in commit `eca75a12a2`, which also featured a much better explanation of why it is correct. So, switch back to that technique here, with the same explanation given there. Although this fixes a possible memory leak when GEQO is in use, the leak is minor and probably nobody cares, so no back-patch. Ashutosh Bapat, reviewed by Tom Lane and by me Discussion: http://postgr.es/m/CAFjFpRcXkHHrXyD9BCvkgGJV4TnHG2SWJ0PhJfrDu3NAcQvh7g@mail.gmail.com	2017-11-30 09:50:10 -05:00
Tom Lane	7ca25b7de6	Fix neqjoinsel's behavior for semi/anti join cases. Previously, this function estimated the selectivity as 1 minus eqjoinsel() for the negator equality operator, regardless of join type (I think there was an expectation that eqjoinsel would handle the join type). But actually this is completely wrong for semijoin cases: the fraction of the LHS that has a non-matching row is not one minus the fraction of the LHS that has a matching row. In reality a semijoin with <> will nearly always succeed: it can only fail when the RHS is empty, or it contains a single distinct value that is equal to the particular LHS value, or the LHS value is null. The only one of those things we should have much confidence in estimating is the fraction of LHS values that are null, so let's just take the selectivity as 1 minus outer nullfrac. Per coding convention, antijoin should be estimated the same as semijoin. Arguably this is a bug fix, but in view of the lack of field complaints and the risk of destabilizing plans, no back-patch. Thomas Munro, reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com	2017-11-29 22:00:37 -05:00
Andres Freund	1145acc70d	Add a barrier primitive for synchronizing backends. Provide support for dynamic or static parties of processes to wait for all processes to reach point in the code before continuing. This is similar to the mechanism of the same name in POSIX threads and MPI, though has explicit phasing and dynamic party support like the Java core library's Phaser. This will be used by an upcoming patch adding support for parallel hash joins. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=2_y7oi01OjA_wLvYcWMc9_d=LaoxrY3eiROCZkB_qakA@mail.gmail.com	2017-11-29 17:07:16 -08:00
Robert Haas	84940644de	New C function: bms_add_range This will be used by pending patches to improve partition pruning. Amit Langote and Kyotaro Horiguchi, per a suggestion from David Rowley. Review and testing of the larger patch set of which this is a part by Ashutosh Bapat, David Rowley, Dilip Kumar, Jesper Pedersen, Rajkumar Raghuwanshi, Beena Emerson, Amul Sul, and Kyotaro Horiguchi. Discussion: http://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp	2017-11-29 17:12:05 -05:00
Robert Haas	eaedf0df71	Update typedefs.list and re-run pgindent Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com	2017-11-29 09:24:24 -05:00
Tom Lane	801386af62	Clarify old comment about qual_is_pushdown_safe's handling of subplans. This comment glossed over the difference between initplans and subplans, but they are indeed different for our purposes here.	2017-11-28 23:32:17 -05:00
Alvaro Herrera	3848d21673	Make memset() use sizeof() rather than re-compute size This is simpler and more closely follows overwhelming precedent. Report and patch by Mark Dilger. Discussion: https://postgr.es/m/9A68FB88-5F45-4848-9926-8586E2D777D1@gmail.com	2017-11-29 00:13:45 -03:00
Alvaro Herrera	414cd434ff	Fix extstat collection when no stats are produced for a column This is a mistakenly placed conditional in `bf2a691e02`. Reported by Justin Pryzby Discussion: https://postgr.es/m/20171117214352.GE25796@telsasoft.com	2017-11-28 23:40:11 -03:00
Robert Haas	cd482f2951	Fix wrong function name in comment. Rushabh Lathia Discussion: http://postgr.es/m/CAGPqQf2z5g+7YmGZSZgKoiFsaUB+63Rzmz8-5PQHuS6hd14FEg@mail.gmail.com	2017-11-28 14:17:21 -05:00
Robert Haas	2d7950f222	If a range-partitioned table has no default partition, reject null keys. Commit `4e5fe9ad19` introduced this problem. Also add a test so it doesn't get broken again. Report by Rushabh Lathia. Fix by Amit Langote. Reviewed by Rushabh Lathia and Amul Sul. Tweaked by me. Discussion: http://postgr.es/m/CAGPqQf0Y1iJyk4QJBdMf=pS9i6Q0JUMM_h5-qkR3OMJ-e04PyA@mail.gmail.com	2017-11-28 14:11:16 -05:00
Robert Haas	445dbd82a3	Fix ReinitializeParallelDSM to tolerate finding no error queues. Commit `d466335064` changed things so that shm_toc_lookup would fail with an error rather than silently returning NULL in the hope that such failures would be reported in a useful way rather than via a system crash. However, it overlooked the fact that the lookup of PARALLEL_KEY_ERROR_QUEUE in ReinitializeParallelDSM is expected to fail when no DSM segment was created in the first place; in that case, we end up with a backend-private memory segment that still contains an entry for PARALLEL_KEY_FIXED but no others. Consequently a benign failure to initialize parallelism can escalate into an elog(ERROR); repair. Discussion: http://postgr.es/m/CA+Tgmob8LFw55DzH1QEREpBEA9RJ_W_amhBFCVZ6WMwUhVpOqg@mail.gmail.com	2017-11-28 12:15:38 -05:00
Robert Haas	c6755e233b	Teach bitmap heap scan to cope with absence of a DSA. If we have a plan that uses parallelism but are unable to execute it using parallelism, for example due to a lack of available DSM segments, then the EState's es_query_dsa will be NULL. Parallel bitmap heap scan needs to fall back to a non-parallel scan in such cases. Patch by me, reviewed by Dilip Kumar Discussion: http://postgr.es/m/CAEepm=0kADK5inNf_KuemjX=HQ=PuTP0DykM--fO5jS5ePVFEA@mail.gmail.com	2017-11-28 11:44:59 -05:00
Robert Haas	7b88d63a91	Add null test to partition constraint for default range partitions. Non-default range partitions have a constraint which include null tests, and both default and non-default list partitions also have a constraint which includes null tests, but for some reason this was missed for default range partitions. This could cause the partition constraint to evaluate to false for rows that were (correctly) routed to that partition by insert tuple routing, which could in turn cause constraint exclusion to prune the default partition in cases where it should not. Amit Langote, reviewed by Kyotaro Horiguchi Discussion: http://postgr.es/m/ba7aaeb1-4399-220e-70b4-62eade1522d0@lab.ntt.co.jp	2017-11-28 10:51:01 -05:00
Robert Haas	487a0c1518	Fix typo. Masahiko Sawada Discussion: http://postgr.es/m/CAD21AoCq_QG6UEo6yb-purmhLQjDLsCA2_B+8Wh3ah9P2-XmrQ@mail.gmail.com	2017-11-28 08:19:01 -05:00
Tom Lane	cb03fa33ae	Fix assorted syscache lookup sloppiness in partition-related code. heap_drop_with_catalog and ATExecDetachPartition neglected to check for SearchSysCache failures, as noted in bugs #14927 and #14928 from Pan Bian. Such failures are pretty unlikely, since we should already have some sort of lock on the rel at these points, but it's neither a good idea nor per project style to omit a check for failure. Also, StorePartitionKey contained a syscache lookup that it never did anything with, including never releasing the result. Presumably the reason why we don't see refcount-leak complaints is that the lookup always fails; but in any case it's pretty useless, so remove it. All of these errors were evidently introduced by the relation partitioning feature. Back-patch to v10 where that came in. Amit Langote and Tom Lane Discussion: https://postgr.es/m/20171127090105.1463.3962@wrigleys.postgresql.org Discussion: https://postgr.es/m/20171127091341.1468.72696@wrigleys.postgresql.org	2017-11-27 19:22:08 -05:00
Tom Lane	9a785ad573	Fix creation of resjunk tlist entries for inherited mixed UPDATE/DELETE. rewriteTargetListUD's processing is dependent on the relkind of the query's target table. That was fine at the time it was made to act that way, even for queries on inheritance trees, because all tables in an inheritance tree would necessarily be plain tables. However, the 9.5 feature addition allowing some members of an inheritance tree to be foreign tables broke the assumption that rewriteTargetListUD's output tlist could be applied to all child tables with nothing more than column-number mapping. This led to visible failures if foreign child tables had row-level triggers, and would also break in cases where child tables belonged to FDWs that used methods other than CTID for row identification. To fix, delay running rewriteTargetListUD until after the planner has expanded inheritance, so that it is applied separately to the (already mapped) tlist for each child table. We can conveniently call it from preprocess_targetlist. Refactor associated code slightly to avoid the need to heap_open the target relation multiple times during preprocess_targetlist. (The APIs remain a bit ugly, particularly around the point of which steps scribble on parse->targetList and which don't. But avoiding such scribbling would require a change in FDW callback APIs, which is more pain than it's worth.) Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when we transition from rows providing a CTID to rows that don't. (That's really an independent bug, but it manifests in much the same cases.) Add a regression test checking one manifestation of this problem, which was that row-level triggers on a foreign child table did not work right. Back-patch to 9.5 where the problem was introduced. Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru	2017-11-27 17:54:07 -05:00
Simon Riggs	59af8d4384	Pad XLogReaderState's per-buffer data_bufsz more aggressively. Originally, we palloc'd this buffer just barely big enough to hold the largest xlog backup block seen so far. We now MAXALIGN the palloc size. The original coding could result in many repeated palloc cycles, in the worst case where we see a series ofgradually larger xlog records. We ameliorate that similarly to `8735978e7a` by imposing a minimum buffer size of BLCKSZ. Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org	2017-11-27 09:34:05 +00:00
Tom Lane	8735978e7a	Pad XLogReaderState's main_data buffer more aggressively. Originally, we palloc'd this buffer just barely big enough to hold the largest xlog record seen so far. It turns out that that can result in valgrind complaints, because some compilers will emit code that assumes it can safely fetch padding bytes at the end of a struct, and those padding bytes were unallocated so far as aset.c was concerned. We can fix that by MAXALIGN'ing the palloc request size, ensuring that it is big enough to include any possible padding that might've been omitted from the on-disk record. An additional objection to the original coding is that it could result in many repeated palloc cycles, in the worst case where we see a series of gradually larger xlog records. We can ameliorate that cheaply by imposing a minimum buffer size that's large enough for most xlog records. BLCKSZ/2 was chosen after a bit of discussion. In passing, remove an obsolete comment in struct xl_heap_new_cid that the combocid field is free due to alignment considerations. Perhaps that was true at some point, but it's not now. Back-patch to 9.5 where this code came in. Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org	2017-11-26 15:17:24 -05:00
Joe Conway	752714dd9d	Make has_sequence_privilege support WITH GRANT OPTION The various has_*_privilege() functions all support an optional WITH GRANT OPTION added to the supported privilege types to test whether the privilege is held with grant option. That is, all except has_sequence_privilege() variations. Fix that. Back-patch to all supported branches. Discussion: https://postgr.es/m/005147f6-8280-42e9-5a03-dd2c1e4397ef@joeconway.com	2017-11-26 09:49:40 -08:00
Tom Lane	9b63c13f0a	Repair failure with SubPlans in multi-row VALUES lists. When nodeValuesscan.c was written, it was impossible to have a SubPlan in VALUES --- any sub-SELECT there would have to be uncorrelated and thereby would produce an InitPlan instead. We therefore took a shortcut in the logic that throws away a ValuesScan's per-row expression evaluation data structures. This was broken by the introduction of LATERAL however; a sub-SELECT containing a lateral reference produces a correlated SubPlan. The cleanest fix for this would be to give up the optimization of discarding the expression eval state. But that still seems pretty unappetizing for long VALUES lists. It seems to work to just prevent the subexpressions from hooking into the ValuesScan node's subPlan list, so let's do that and see how well it works. (If this breaks, due to additional connections between the subexpressions and the outer query structures, we might consider compromises like throwing away data only for VALUES rows not containing SubPlans.) Per bug #14924 from Christian Duta. Back-patch to 9.3 where LATERAL was introduced. Discussion: https://postgr.es/m/20171124120836.1463.5310@wrigleys.postgresql.org	2017-11-25 14:15:48 -05:00
Tom Lane	ab97aaac8f	Update buffile.h/.c comments for removal of non-temp option. Commit `11e264517` removed BufFile's isTemp flag, thereby eliminating the possibility of resurrecting BufFileCreate(). But it left that function in place, as well as a bunch of comments describing how things worked for the non-temp-file case. At best, that's now a source of confusion. So remove the long-since-commented-out function and change relevant comments. I (tgl) wanted to rename BufFileCreateTemp() to BufFileCreate(), but that seems not to be the consensus position, so leave it as-is. In passing, fix commit f0828b2fc's failure to update BufFileSeek's comment to match the change of its argument type from long to off_t. (I think that might actually have been intentional at the time, but now that 64-bit off_t is nearly universal, it looks anachronistic.) Thomas Munro and Tom Lane Discussion: https://postgr.es/m/E1eFVyl-0008J1-RO@gemulon.postgresql.org	2017-11-25 13:19:43 -05:00
Tom Lane	df3a66e282	Improve planner's handling of set-returning functions in grouping columns. Improve query_is_distinct_for() to accept SRFs in the targetlist when we can prove distinctness from a DISTINCT clause. In that case the de-duplication will surely happen after SRF expansion, so the proof still works. Continue to punt in the case where we'd try to prove distinctness from GROUP BY (or, in the future, source relations). To do that, we'd have to determine whether the SRFs were in the grouping columns or elsewhere in the tlist, and it still doesn't seem worth the trouble. But this trivial change allows us to recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces unique-ified output, which seems worth having. Also, fix estimate_num_groups() to consider the possibility of SRFs in the grouping columns. Its failure to do so was masked before v10 because grouping_planner() scaled up plan rowcount estimates by the estimated SRF multiplier after performing grouping. That doesn't happen anymore, which is more correct, but it means we need an adjustment in the estimate for the number of groups. Failure to do this leads to an underestimate for the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)" compared to what 9.6 and earlier estimated, thus breaking plan choices in some cases. Per report from Dmitry Shalashov. Back-patch to v10 to avoid degraded plan choices compared to previous releases. Discussion: https://postgr.es/m/CAKPeCUGAeHgoh5O=SvcQxREVkoX7UdeJUMj1F5=aBNvoTa+O8w@mail.gmail.com	2017-11-25 11:48:09 -05:00
Robert Haas	b10967eddf	Avoid projecting tuples unnecessarily in Gather and Gather Merge. It's most often the case that the target list for the Gather (Merge) node matches the target list supplied by the underlying plan node; when this is so, we can avoid the overhead of projecting. This depends on commit `f455e1125e` for proper functioning. Idea by Andres Freund. Patch by me. Review by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com	2017-11-25 10:49:17 -05:00
Tom Lane	0f2458ff5f	Improve valgrind logic in aset.c, and fix multiple issues in generation.c. Revise aset.c so that all the "private" fields of chunk headers are marked NOACCESS when outside the module, improving on the previous coding which protected only requested_size. Fix a couple of corner case bugs, such as failing to re-protect the header during a failure exit from AllocSetRealloc, and wrong padding-size calculation for an oversize allocation request. Apply the same design to generation.c, and also fix several bugs therein that I found by dint of hacking the code to use generation.c as the standard allocator and then running the core regression tests with it. Notably, we have to track the actual size of each block, else the wipe_mem call in GenerationReset clears the wrong amount of memory for an oversize-chunk block; and GenerationCheck needs a way of identifying freed chunks that isn't fooled by palloc(0). I chose to fix the latter by resetting the context pointer to NULL in a freed chunk, roughly like what happens in a freed aset.c chunk. Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org	2017-11-24 19:28:19 -05:00
Tom Lane	f65d21b258	Mostly-cosmetic improvements in memory chunk header alignment coding. Add commentary about what we're doing and why. Apply the method used for padding in GenerationChunk to AllocChunkData, replacing the rather ad-hoc solution used in commit `7e3aa03b4`. Reorder fields in GenerationChunk so that the padding calculation will work even if sizeof(size_t) is different from sizeof(void *) --- likely that will never happen, but we don't need the assumption if we do it like this. Improve static assertions about alignment. In passing, fix a couple of oversights in the "large chunk" path in GenerationAlloc(). Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org	2017-11-24 15:50:22 -05:00
Tom Lane	cc3c4af4a9	Fix bug in generation.c's valgrind support. This doesn't look like the last such bug, but it's one that the test_decoding regression test is tripping over. Per buildfarm. Tomas Vondra Discussion: https://postgr.es/m/c903f275-2150-fa52-64bf-dca7b53ebf8d@fuzzy.cz	2017-11-24 13:43:34 -05:00
Dean Rasheed	9c55391f0f	RLS comment fixes. The comments in get_policies_for_relation() say that CREATE POLICY does not support defining restrictive policies. This is no longer true, starting from PG10.	2017-11-24 14:14:40 +00:00
Andres Freund	59b71c6fe6	Fix handling of NULLs returned by aggregate combine functions. When strict aggregate combine functions, used in multi-stage/parallel aggregation, returned NULL, we didn't check for that, invoking the combine function with NULL the next round, despite it being strict. The equivalent code invoking normal transition functions has a check for that situation, which did not get copied in `a7de3dc5c3`. Fix the bug by adding the equivalent check. Based on a quick look I could not find any strict combine functions in core actually returning NULL, and it doesn't seem very likely external users have done so. So this isn't likely to have caused issues in practice. Add tests verifying transition / combine functions returning NULL is tested. Reported-By: Andres Freund Author: Andres Freund Discussion: https://postgr.es/m/20171121033642.7xvmjqrl4jdaaat3@alap3.anarazel.de Backpatch: 9.6, where parallel aggregation was introduced	2017-11-23 17:15:27 -08:00
Tom Lane	07bd77b95a	Ensure sizeof(GenerationChunk) is maxaligned. Per buildfarm. Also improve some comments.	2017-11-23 17:02:15 -05:00
Simon Riggs	b99661c2ff	Tweak code for older compilers Attempt to quiesce build farm Author: Tomas Vondra	2017-11-23 06:55:18 +11:00
Simon Riggs	a4ccc1cef5	Generational memory allocator Add new style of memory allocator, known as Generational appropriate for use in cases where memory is allocated and then freed in roughly oldest first order (FIFO). Use new allocator for logical decoding’s reorderbuffer to significantly reduce memory usage and improve performance. Author: Tomas Vondra Reviewed-by: Simon Riggs	2017-11-23 05:45:07 +11:00
Simon Riggs	7e17a6889a	Set es_output_cid in replication worker Allows triggers to operate correctly Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>	2017-11-22 16:28:14 +11:00
Robert Haas	ae65f6066d	Provide for forward compatibility with future minor protocol versions. Previously, any attempt to request a 3.x protocol version other than 3.0 would lead to a hard connection failure, which made the minor protocol version really no different from the major protocol version and precluded gentle protocol version breaks. Instead, when the client requests a 3.x protocol version where x is greater than 0, send the new NegotiateProtocolVersion message to convey that we support only 3.0. This makes it possible to introduce new minor protocol versions without requiring a connection retry when the server is older. In addition, if the startup packet includes name/value pairs where the name starts with "_pq_.", assume that those are protocol options, not GUCs. Include those we don't support (i.e. all of them, at present) in the NegotiateProtocolVersion message so that the client knows they were not understood. This makes it possible for the client to request previously-unsupported features without bumping the protocol version at all; the client can tell from the server's response whether the option was understood. It will take some time before servers that support these new facilities become common in the wild; to speed things up and make things easier for a future 3.1 protocol version, back-patch to all supported releases. Robert Haas and Badrul Chowdhury Discussion: http://postgr.es/m/BN6PR21MB0772FFA0CBD298B76017744CD1730@BN6PR21MB0772.namprd21.prod.outlook.com Discussion: http://postgr.es/m/30788.1498672033@sss.pgh.pa.us	2017-11-21 13:56:24 -05:00
Robert Haas	f3b0897a12	Fix multiple problems with satisfies_hash_partition. Fix the function header comment to describe the actual behavior. Check that table OID, modulus, and remainder arguments are not NULL before accessing them. Check that the modulus and remainder are sensible. If the table OID doesn't exist, return NULL instead of emitting an internal error, similar to what we do elsewhere. Check that the actual argument types match, or at least are binary coercible to, the expected argument types. Correctly handle invocation of this function using the VARIADIC syntax. Add regression tests. Robert Haas and Amul Sul, per a report by Andreas Seltenreich and subsequent followup investigation. Discussion: http://postgr.es/m/871sl4sdrv.fsf@ansel.ydns.eu	2017-11-21 13:06:32 -05:00
Tom Lane	84669c9b06	Use out-of-line M68K spinlock code for OpenBSD as well as NetBSD. David Carlier (from a patch being carried by OpenBSD packagers) Discussion: https://postgr.es/m/CA+XhMqzwFSGVU7MEnfhCecc8YdP98tigXzzpd0AAdwaGwaVXEA@mail.gmail.com	2017-11-20 18:05:17 -05:00
Simon Riggs	2ede45c3a4	Fix pg_control_checkpoint from commit `4b0d28de06` Author: Simon Riggs <simon@2ndQuadrant.com> Reported-By: Andreas Seltenreich <seltenreich@gmx.de>	2017-11-21 08:00:54 +11:00
Robert Haas	0510421ee0	Tweak use of ExecContextForcesOids by Gather (Merge). Specifically, pass the outer plan's PlanState instead of our own PlanState. At present, ExecContextForcesOids doesn't actually care which PlanState we pass; it just looks through to the underlying EState to find the result relation or top-level eflags. However, in the future it might care. If that happens, and if our goal is to get a tuple descriptor that matches that of the outer plan, then I think what we care about is whether the outer plan's context forces OIDs, rather than whether our own context forces OIDs, just as we use the outer node's target list rather than our own. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com	2017-11-20 12:34:06 -05:00
Robert Haas	f455e1125e	Pass eflags down to parallel workers. Currently, there are no known consequences of this oversight, so no back-patch. Several of the EXEC_FLAG_* constants aren't usable in parallel mode anyway, and potential problems related to the presence or absence of OIDs (see EXEC_FLAG_WITH_OIDS, EXEC_FLAG_WITHOUT_OIDS) seem at present to be masked by the unconditional projection step performed by Gather and Gather Merge. In general, however, it seems important that all participants agree on the values of these flags, which modify executor behavior globally, and a pending patch to skip projection in Gather (Merge) would be outright broken in certain cases without this fix. Patch by me, based on investigation of a test case provided by Amit Kapila. This patch was also reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com	2017-11-20 12:00:33 -05:00
Simon Riggs	c2513365a0	Parameter toast_tuple_target controls TOAST for new rows Specifies the point at which we try to move long column values into TOAST tables. No effect on existing rows. Discussion: https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2Ya00OyQQ@mail.gmail.com Author: Simon Riggs <simon@2ndQudrant.com> Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndQuadrant.com>	2017-11-20 09:50:10 +11:00
Tom Lane	52f63bd916	Fix compiler warning in rangetypes_spgist.c. On gcc 7.2.0, comparing pointer to (Datum) 0 produces a warning. Treat it as a simple pointer to avoid that; this is more consistent with comparable code elsewhere, anyway. Tomas Vondra Discussion: https://postgr.es/m/99410021-61ef-9a9a-9bc8-f733ece637ee@2ndquadrant.com	2017-11-18 16:46:29 -05:00
Tom Lane	4797f9b519	Merge near-duplicate code in RI triggers. Merge ri_restrict_del and ri_restrict_upd into one function ri_restrict. Create a function ri_setnull that is the common implementation of RI_FKey_setnull_del and RI_FKey_setnull_upd. Likewise create a function ri_setdefault that is the common implementation of RI_FKey_setdefault_del and RI_FKey_setdefault_upd. All of these pairs of functions were identical except for needing to check for no-actual-key-change in the UPDATE cases; the one extra if-test is a small price to pay for saving so much code. Aside from removing about 400 lines of essentially duplicate code, this allows us to recognize that we were uselessly caching two identical plans whenever there were pairs of triggers using these duplicated functions (which is likely very common). Ildar Musin, reviewed by Ildus Kurbangaliev Discussion: https://postgr.es/m/ca7064a7-6adc-6f22-ca47-8615ba9425a5@postgrespro.ru	2017-11-18 16:24:05 -05:00
Tom Lane	976a1a48fc	Improve to_date/to_number/to_timestamp behavior with multibyte characters. The documentation says that these functions skip one input character per literal (non-pattern) format character. Actually, though, they skipped one input byte per literal byte, which could be hugely confusing if either data or format contained multibyte characters. To fix, adjust the FormatNode representation and parse_format() so that multibyte format characters are stored as one FormatNode not several, and adjust the data-skipping bits to advance by pg_mblen() not necessarily one byte. There's no user-visible behavior change on the to_char() side, although the internal representation changes. Commit `e87d4965b` had already fixed most places where we skip characters on the basis of non-literal format patterns to advance by characters not bytes, but this gets one more place, the SKIP_THth macro. I think everything in formatting.c gets that right now. It'd be nice to have some regression test cases covering this behavior; but of course there's no way to do so in an encoding-agnostic way, and many of the interesting aspects would also require unportable locale selections. So I've not bothered here. Discussion: https://postgr.es/m/28186.1510957703@sss.pgh.pa.us	2017-11-18 12:42:52 -05:00
Tom Lane	63ca86318d	Fix quoted-substring handling in format parsing for to_char/to_number/etc. This code evidently intended to treat backslash as an escape character within double-quoted substrings, but it was sufficiently confused that cases like ..."foo\\"... did not work right: the second backslash managed to quote the double-quote after it, despite being quoted itself. Rewrite to get that right, while preserving the existing behavior outside double-quoted substrings, which is that backslash isn't special except in the combination \". Comparing to Oracle, it seems that their version of to_char() for timestamps allows literal alphanumerics only within double quotes, while non-alphanumerics are allowed outside quotes; backslashes aren't special anywhere; there is no way at all to emit a literal double quote. (Bizarrely, their to_char() for numbers is different; it doesn't allow literal text at all AFAICT.) The fact that they don't treat backslash as special justifies our existing behavior for backslash outside double quotes. I considered making backslash inside double quotes act the same way (ie, special only if before "), which in a green field would be a more consistent behavior. But that would likely break more existing SQL code than what this patch does. Add some test cases illustrating this behavior. (Only the last new case actually changes behavior in this commit.) Little of this behavior was documented, either, so fix that. Discussion: https://postgr.es/m/3626.1510949486@sss.pgh.pa.us	2017-11-18 12:16:37 -05:00
Peter Eisentraut	9288d62bb4	Support channel binding 'tls-unique' in SCRAM This is the basic feature set using OpenSSL to support the feature. In order to allow the frontend and the backend to fetch the sent and expected TLS Finished messages, a PG-like API is added to be able to make the interface pluggable for other SSL implementations. This commit also adds a infrastructure to facilitate the addition of future channel binding types as well as libpq parameters to control the SASL mechanism names and channel binding names. Those will be added by upcoming commits. Some tests are added to the SSL test suite to test SCRAM authentication with channel binding. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>	2017-11-18 10:15:54 -05:00
Robert Haas	611fe7d479	Update postgresql.conf.sample comment for bgwriter_lru_maxpages Commit `14ca9abfbe` should have done this, but did not. Jeff Janes Discussion: http://postgr.es/m/CAMkU=1yWOvL+YFYzGM9yXSoWjxr_5_Ny78pPzLKQCkfgB7H-JQ@mail.gmail.com	2017-11-17 14:52:00 -05:00
Tom Lane	e87d4965bd	Prevent to_number() from losing data when template doesn't match exactly. Non-data template patterns would consume characters whether or not those characters were what the pattern expected, for example SELECT TO_NUMBER('1234', '9,999'); produced 134 because the '2' got eaten by the comma pattern. This seems undesirable, not least because it doesn't happen in Oracle. For the ',' and 'G' template patterns, we can fix this by consuming characters only if they match what the pattern would output. For non-data patterns such as 'L' and 'TH', it seems impractical to tighten things up to the point of consuming only exact matches to what the pattern would output; but we can improve matters quite a lot by redefining the behavior as "consume only characters that aren't digits, signs, decimal point, or comma". Also, fix it so that the behavior is to consume the number of characters the pattern would output, not the number of bytes. The old coding would do surprising things with non-ASCII currency symbols, for example. (It would be good to apply that rule for literal text as well, but this commit only fixes it for non-data patterns.) Oliver Ford, reviewed by Thomas Munro and Nathan Wagner, and whacked around a bit more by me Discussion: https://postgr.es/m/CAGMVOdvpbMqPf9XWNzOwBpzJfErkydr_fEGhmuDGa015z97mwg@mail.gmail.com	2017-11-17 12:04:13 -05:00
Andres Freund	11e264517d	Remove BufFile's isTemp flag. The isTemp flag controls whether buffile.c chops BufFile data up into 1GB segments on disk. Since it was badly named and always true, get rid of it. Author: Thomas Munro (based on suggestion by Peter Geoghegan) Reviewed-By: Peter Geoghegan, Andres Freund Discussion: https://postgr.es/m/CAH2-Wz%3D%2B9Rfqh5UdvdW9rGezdhrMGGH-JL1X9FXXVZdeeGeOJA%40mail.gmail.com	2017-11-16 17:52:57 -08:00
Andres Freund	7082e614c0	Provide DSM segment to ExecXXXInitializeWorker functions. Previously, executor nodes running in parallel worker processes didn't have access to the dsm_segment object used for parallel execution. In order to support resource management based on DSM segment lifetime, they need that. So create a ParallelWorkerContext object to hold it and pass it to all InitializeWorker functions. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com	2017-11-16 17:39:18 -08:00
Tom Lane	687f096ea9	Make PL/Python handle domain-type conversions correctly. Fix PL/Python so that it can handle domains over composite, and so that it enforces domain constraints correctly in other cases that were not always done properly before. Notably, it didn't do arrays of domains right (oversight in commit `c12d570fa`), and it failed to enforce domain constraints when returning a composite type containing a domain field, and if a transform function is being used for a domain's base type then it failed to enforce domain constraints on the result. Also, in many places it missed checking domain constraints on null values, because the plpy_typeio code simply wasn't called for Py_None. Rather than try to band-aid these problems, I made a significant refactoring of the plpy_typeio logic. The existing design of recursing for array and composite members is extended to also treat domains as containers requiring recursion, and the APIs for the module are cleaned up and simplified. The patch also modifies plpy_typeio to rely on the typcache more than it did before (which was pretty much not at all). This reduces the need for repetitive lookups, and lets us get rid of an ad-hoc scheme for detecting changes in composite types. I added a couple of small features to typcache to help with that. Although some of this is fixing bugs that long predate v11, I don't think we should risk a back-patch: it's a significant amount of code churn, and there've been no complaints from the field about the bugs. Tom Lane, reviewed by Anthony Bykov Discussion: https://postgr.es/m/24449.1509393613@sss.pgh.pa.us	2017-11-16 16:23:04 -05:00
Robert Haas	3b2787e1f8	Fix broken cleanup interlock for GIN pending list. The pending list must (for correctness) always be cleaned up by vacuum, and should (for the avoidance of surprising behavior) always be cleaned up by an explicit call to gin_clean_pending_list, but cleanup is optional when inserting. The old logic got this backward: cleanup was forced if (stats == NULL), but that's going to be false when vacuuming and true for inserts. Masahiko Sawada, reviewed by me. Discussion: http://postgr.es/m/CAD21AoBLUSyiYKnTYtSAbC+F=XDjiaBrOUEGK+zUXdQ8owfPKw@mail.gmail.com	2017-11-16 14:19:27 -05:00
Robert Haas	6b2cd278a9	Fix typo in comment. Etsuro Fujita Discussion: http://postgr.es/m/5A0D7C3D.80803@lab.ntt.co.jp	2017-11-16 14:16:45 -05:00
Robert Haas	79f2d63713	Update postgresql.conf.sample to match pg_settings classificaitons. A handful of settings, most notably shared_preload_libraries, were just plain the wrong place compared to their assigned config_group value in guc.c (and thus pg_settings). In other cases the names of the sections in postgresql.conf.sample were mildly different from the corresponding entries in config_group_names[]. Make it all consistent. Adrián Escoms, reviewed by me. Discussion: http://postgr.es/m/CACksPC2veEmFRYqwYepWYO9U7aFhAx6sYq+WqjTyHw7uV=E=pw@mail.gmail.com	2017-11-16 12:57:17 -05:00
Robert Haas	e89a71fb44	Pass InitPlan values to workers via Gather (Merge). If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan that computes it is attached to or above the Gather (Merge), force the value to be computed before starting parallelism and pass it down to all workers. This allows us to use parallelism in cases where it previously would have had to be rejected as unsafe. We do - in this case - lose the optimization that the value is only computed if it's actually used. An alternative strategy would be to have the first worker that needs the value compute it, but one downside of that approach is that we'd then need to select a parallel-safe path to compute the parameter value; it couldn't for example contain a Gather (Merge) node. At some point in the future, we might want to consider both approaches. Independent of that consideration, there is a great deal more work that could be done to make more kinds of PARAM_EXEC parameters parallel-safe. This infrastructure could be used to allow a Gather (Merge) on the inner side of a nested loop (although that's not a very appealing plan) and cases where the InitPlan is attached below the Gather (Merge) could be addressed as well using various techniques. But this is a good start. Amit Kapila, reviewed and revised by me. Reviewing and testing from Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja. Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com	2017-11-16 12:06:14 -05:00
Andrew Dunstan	98d54bb779	Back out the session_start and session_end hooks feature. It's become apparent during testing that there are problems with at least the testing regime. I don't think we should have it without a working test regime, and the difficulties might indicate implementation problems anyway, so I'm backing out the whole thing until that's sorted out. This reverts commits `7459484` `9989f92` `cd8ce3a`	2017-11-16 11:35:02 -05:00
Robert Haas	4e5fe9ad19	Centralize executor-related partitioning code. Some code is moved from partition.c, which has grown very quickly lately; splitting the executor parts out might help to keep it from getting totally out of control. Other code is moved from execMain.c. All is moved to a new file execPartition.c. get_partition_for_tuple now has a new interface that more clearly separates executor concerns from generic concerns. Amit Langote. A slight comment tweak by me. Discussion: http://postgr.es/m/1f0985f8-3b61-8bc4-4350-baa6d804cb6d@lab.ntt.co.jp	2017-11-15 10:26:25 -05:00
Andrew Dunstan	cd8ce3a22c	Add hooks for session start and session end These hooks can be used in loadable modules. A simple test module is included. Discussion: https://postgr.es/m/20170720204733.40f2b7eb.nagata@sraoss.co.jp Fabrízio de Royes Mello and Yugo Nagata Reviewed by Michael Paquier and Aleksandr Parfenov	2017-11-15 10:16:34 -05:00
Robert Haas	ebc189e122	Fix typo. Jesper Pedersen Discussion: http://postgr.es/m/000f92d6-f623-95a5-b341-46e2c0495cea@redhat.com	2017-11-15 08:37:41 -05:00
Robert Haas	e5253fdc4f	Add parallel_leader_participation GUC. Sometimes, for testing, it's useful to have the leader do nothing but read tuples from workers; and it's possible that could work out better even in production. Thomas Munro, reviewed by Amit Kapila and by me. A few final tweaks by me. Discussion: http://postgr.es/m/CAEepm=2U++Lp3bNTv2Bv_kkr5NE2pOyHhxU=G0YTa4ZhSYhHiw@mail.gmail.com	2017-11-15 08:23:18 -05:00
Alvaro Herrera	a61f5ab986	Simplify index_[constraint_]create API Instead of passing large swaths of boolean arguments, define some flags that can be used in a bitmask. This makes it easier not only to figure out what each call site is doing, but also to add some new flags. The flags are split in two -- one set for index_create directly and another for constraints. index_create() itself receives both, and then passes down the latter to index_constraint_create(), which can also be called standalone. Discussion: https://postgr.es/m/20171023151251.j75uoe27gajdjmlm@alvherre.pgsql Reviewed-by: Simon Riggs	2017-11-14 15:19:05 +01:00
Robert Haas	44ae64c388	Push target list evaluation through Gather Merge. We already do this for Gather, but it got overlooked for Gather Merge. Amit Kapila, with review and minor revisions by Rushabh Lathia and by me. Discussion: http://postgr.es/m/CAA4eK1KUC5Uyu7qaifxrjpHxbSeoQh3yzwN3bThnJsmJcZ-qtA@mail.gmail.com	2017-11-13 16:37:42 -05:00
Robert Haas	e64861c79b	Track in the plan the types associated with PARAM_EXEC parameters. Up until now, we only tracked the number of parameters, which was sufficient to allocate an array of Datums of the appropriate size, but not sufficient to, for example, know how to serialize a Datum stored in one of those slots. An upcoming patch wants to do that, so add this tracking to make it possible. Patch by me, reviewed by Tom Lane and Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoYqpxDKn8koHdW8BEKk8FMUL0=e8m2Qe=M+r0UBjr3tuQ@mail.gmail.com	2017-11-13 15:24:12 -05:00
Stephen Frost	cfd8c87e16	Fix typo Determinisitcally -> Deterministically Author: Michael Paquier <michael.paquier@gmail.com> Discussion: https://postgr.es/m/CAB7nPqSauJ9gUMzj1aiXQVxqEkyko+WZ+wUac8_hB_M_bO6U_A@mail.gmail.com	2017-11-13 09:40:30 -05:00
Noah Misch	cbfffee41c	Install Windows crash dump handler before all else. Apart from calling write_stderr() on failure, the handler depends on no PostgreSQL facilities. We have experienced crashes before reaching the former call site. Given such an early crash, this change cannot hurt and may produce a helpful dump. Absent an early crash, this change has no effect. Back-patch to 9.3 (all supported versions). Takayuki Tsunakawa Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F80CD13@G01JPEXMBYT05	2017-11-12 14:31:00 -08:00
Noah Misch	e02571b73f	Don't call pgwin32_message_to_UTF16() without CurrentMemoryContext. PostgreSQL running as a Windows service crashed upon calling write_stderr() before MemoryContextInit(). This fix completes work started in `5735efee15`. Messages this early contain only ASCII bytes; if we removed the CurrentMemoryContext requirement, the ensuing conversions would have no effect. Back-patch to 9.3 (all supported versions). Takayuki Tsunakawa, reviewed by Michael Paquier. Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F80CC73@G01JPEXMBYT05	2017-11-12 13:03:15 -08:00
Noah Misch	2918fcedbf	Ignore XML declaration in xpath_internal(), for UTF8 databases. When a value contained an XML declaration naming some other encoding, this function interpreted UTF8 bytes as the named encoding, yielding mojibake. xml_parse() already has similar logic. This would be necessary but not sufficient for non-UTF8 databases, so preserve behavior there until the xpath facility can support such databases comprehensively. Back-patch to 9.3 (all supported versions). Pavel Stehule and Noah Misch Discussion: https://postgr.es/m/CAFj8pRC-dM=tT=QkGi+Achkm+gwPmjyOayGuUfXVumCxkDgYWg@mail.gmail.com	2017-11-11 11:10:53 -08:00
Robert Haas	5edc63bda6	Account for the effect of lossy pages when costing bitmap scans. Dilip Kumar, reviewed by Alexander Kumenkov, Amul Sul, and me. Some final adjustments by me. Discussion: http://postgr.es/m/CAFiTN-sYtqUOXQ4SpuhTv0Z9gD0si3YxZGv_PQAAMX8qbOotcg@mail.gmail.com	2017-11-10 16:50:50 -05:00
Peter Eisentraut	0c98d0dd5c	Fix some null pointer dereferences in LDAP auth code An LDAP URL without a host name such as "ldap://" or without a base DN such as "ldap://localhost" would cause a crash when reading pg_hba.conf. If no binddn is configured, an error message might end up trying to print a null pointer, which could crash on some platforms. Author: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-11-10 14:21:32 -05:00
Peter Eisentraut	0e1539ba0d	Add some const decorations to prototypes Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2017-11-10 13:38:57 -05:00
Robert Haas	b9941d3468	Fix incorrect comment. Etsuro Fujita Discussion: http://postgr.es/m/5A05728E.4050009@lab.ntt.co.jp	2017-11-10 10:55:09 -05:00
Robert Haas	1aba8e651a	Add hash partitioning. Hash partitioning is useful when you want to partition a growing data set evenly. This can be useful to keep table sizes reasonable, which makes maintenance operations such as VACUUM faster, or to enable partition-wise join. At present, we still depend on constraint exclusion for partitioning pruning, and the shape of the partition constraints for hash partitioning is such that that doesn't work. Work is underway to fix that, which should both improve performance and make partitioning pruning work with hash partitioning. Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me. A few final tweaks also by me. Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com	2017-11-09 18:07:44 -05:00
Tom Lane	ae20b23a9e	Refactor permissions checks for large objects. Up to now, ACL checks for large objects happened at the level of the SQL-callable functions, which led to CVE-2017-7548 because of a missing check. Push them down to be enforced in inv_api.c as much as possible, in hopes of preventing future bugs. This does have the effect of moving read and write permission errors to happen at lo_open time not loread or lowrite time, but that seems acceptable. Michael Paquier and Tom Lane Discussion: https://postgr.es/m/CAB7nPqRHmNOYbETnc_2EjsuzSM00Z+BWKv9sy6tnvSd5gWT_JA@mail.gmail.com	2017-11-09 12:56:07 -05:00
Tom Lane	5ecc0d738e	Restrict lo_import()/lo_export() via SQL permissions not hard-wired checks. While it's generally unwise to give permissions on these functions to anyone but a superuser, we've been moving away from hard-wired permission checks inside functions in favor of using the SQL permission system to control access. Bring lo_import() and lo_export() into compliance with that approach. In particular, this removes the manual configuration option ALLOW_DANGEROUS_LO_FUNCTIONS. That dates back to 1999 (commit `4cd4a54c8`); it's unlikely anyone has used it in many years. Moreover, if you really want such behavior, now you can get it with GRANT ... TO PUBLIC instead. Michael Paquier Discussion: https://postgr.es/m/CAB7nPqRHmNOYbETnc_2EjsuzSM00Z+BWKv9sy6tnvSd5gWT_JA@mail.gmail.com	2017-11-09 12:36:58 -05:00
Tom Lane	6c3a7ba5bb	Fix typo in ALTER SYSTEM output. The header comment written into postgresql.auto.conf by ALTER SYSTEM should match what initdb put there originally. Feike Steenbergen Discussion: https://postgr.es/m/CAK_s-G0KcKdO=0hqZkwb3s+tqZuuHwWqmF5BDsmoO9FtX75r0g@mail.gmail.com	2017-11-09 11:57:20 -05:00
Tom Lane	c5269472ea	Fix two violations of the ResourceOwnerEnlarge/Remember protocol. The point of having separate ResourceOwnerEnlargeFoo and ResourceOwnerRememberFoo functions is so that resource allocation can happen in between. Doing it in some other order is just wrong. OpenTemporaryFile() did open(), enlarge, remember, which would leak the open file if the enlarge step ran out of memory. Because fd.c has its own layer of resource-remembering, the consequences look like they'd be limited to an intratransaction FD leak, but it's still not good. IncrBufferRefCount() did enlarge, remember, incr-refcount, which would blow up if the incr-refcount step ever failed. It was safe enough when written, but since the introduction of PrivateRefCountHash, I think the assumption that no error could happen there is pretty shaky. The odds of real problems from either bug are probably small, but still, back-patch to supported branches. Thomas Munro and Tom Lane, per a comment from Andres Freund	2017-11-08 16:50:12 -05:00
Peter Eisentraut	2eb4a831e5	Change TRUE/FALSE to true/false The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-11-08 11:37:28 -05:00
Robert Haas	698e525437	Fix typo in comment. Masahiko Sawada Discussion: http://postgr.es/m/CAD21AoDrf5AOpZ-mX-j6O=zFNFfKaTdHkv3o1X2eSs2nBXALug@mail.gmail.com	2017-11-07 16:34:48 -05:00
Simon Riggs	4b0d28de06	Remove secondary checkpoint Previously server reserved WAL for last two checkpoints, which used too much disk space for small servers. Bumps PG_CONTROL_VERSION Author: Simon Riggs <simon@2ndQuadrant.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-11-07 12:56:30 -05:00
Simon Riggs	98267ee83e	Exclude pg_internal.init from BASE_BACKUP Add docs to explain this for other backup mechanisms Author: David Steele <david@pgmasters.net> Reviewed-by: Petr Jelinek <petr.jelinek@2ndQuadrant.com> et al	2017-11-07 12:28:35 -05:00
Dean Rasheed	87b2ebd352	Always require SELECT permission for ON CONFLICT DO UPDATE. The update path of an INSERT ... ON CONFLICT DO UPDATE requires SELECT permission on the columns of the arbiter index, but it failed to check for that in the case of an arbiter specified by constraint name. In addition, for a table with row level security enabled, it failed to check updated rows against the table's SELECT policies when the update path was taken (regardless of how the arbiter index was specified). Backpatch to 9.5 where ON CONFLICT DO UPDATE and RLS were introduced. Security: CVE-2017-15099	2017-11-06 09:19:22 +00:00
Noah Misch	bab3a714b6	Ignore CatalogSnapshot when checking COPY FREEZE prerequisites. This restores the ability, essentially lost in commit `ffaa44cb55`, to use COPY FREEZE under REPEATABLE READ isolation. Back-patch to 9.4, like that commit. Reviewed by Tom Lane. Discussion: https://postgr.es/m/CA+TgmoahWDm-7fperBxzU9uZ99LPMUmEpSXLTw9TmrOgzwnORw@mail.gmail.com	2017-11-05 09:25:52 -08:00
Peter Eisentraut	cb29ff8315	Fix incorrect use of bool NSUnLinkModule() doesn't take a bool as second argument but one of set of specific constants. The numeric values are the same in this case, but clean it up while we're cleaning up bool use elsewhere. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-11-04 12:00:58 -04:00
Tom Lane	a9169f0200	Avoid looping through line pointers twice in PageRepairFragmentation(). There doesn't seem to be any good reason to do the filling of the itemidbase[] array separately from the first traversal of the pointers. It's certainly not a win if there are any line pointers with storage, and even if there aren't, this change doesn't insert code into the part of the first loop that will be traversed in that case. So let's just merge the two loops. Yura Sokolov, reviewed by Claudio Freire Discussion: https://postgr.es/m/e49befcc6f1d7099834c6fdf5c675a60@postgrespro.ru	2017-11-03 17:21:59 -04:00
Tom Lane	4c11d2c559	Flag index metapages as standard-format in xlog.c calls. btree, hash, and bloom indexes all set up their metapages in standard format (that is, with pd_lower and pd_upper correctly delimiting the unused area); but they mostly didn't inform the xlog routines of this. When calling log_newpage[_buffer], this is bad because it loses the opportunity to compress unused data out of the WAL record. When calling XLogRegisterBuffer, it's not such a performance problem because all of these call sites also use REGBUF_WILL_INIT, preventing an FPI image from being written. But it's still a good idea to provide the flag when relevant, because that aids WAL consistency checking. This completes the project of getting all the in-core index AMs to handle their metapage WAL operations similarly. Amit Kapila, reviewed by Michael Paquier Discussion: https://postgr.es/m/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp	2017-11-03 16:31:32 -04:00
Alvaro Herrera	1b890562b8	Fix thinkos in BRIN summarization The previous commit contained a thinko that made a single-range summarization request process from there to end of table. Fix by setting the correct end range point. Per buildfarm.	2017-11-03 20:45:36 +01:00
Peter Eisentraut	a9fce66729	Don't reset additional columns on subscriber to NULL on UPDATE When a publisher table has fewer columns than a subscriber, the update of a row on the publisher should result in updating of only the columns in common. The previous coding mistakenly reset the values of additional columns on the subscriber to NULL because it failed to skip updates of columns not found in the attribute map. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-11-03 12:27:59 -04:00
Alvaro Herrera	ec42a1dcb3	Fix BRIN summarization concurrent with extension If a process is extending a table concurrently with some BRIN summarization process, it is possible for the latter to miss pages added by the former because the number of pages is computed ahead of time. Fix by determining a fresh relation size after inserting the placeholder tuple: any process that further extends the table concurrently will update the placeholder tuple, while previous pages will be processed by the heap scan. Reported-by: Tomas Vondra Reviewed-by: Tom Lane Author: Álvaro Herrera Discussion: https://postgr.es/m/083d996a-4a8a-0e13-800a-851dd09ad8cc@2ndquadrant.com Backpatch-to: 9.5	2017-11-03 17:23:13 +01:00
Tom Lane	81e334ce4e	Set the metapage's pd_lower correctly in brin, gin, and spgist indexes. Previously, these index types left the pd_lower field set to the default SizeOfPageHeaderData, which is really a lie because it ought to point past whatever space is being used for metadata. The coding accidentally failed to fail because we never told xlog.c that the metapage is of standard format --- but that's not very good, because it impedes WAL consistency checking, and in some cases prevents compression of full-page images. To fix, ensure that we set pd_lower correctly, not only when creating a metapage but whenever we write it out (these apparently redundant steps are needed to cope with pg_upgrade'd indexes that don't yet contain the right value). This allows telling xlog.c that the page is of standard format. The WAL consistency check mask functions are made to mask only if pd_lower appears valid, which I think is likely unnecessary complication, since any metapage appearing in a v11 WAL stream should contain valid pd_lower. But it doesn't cost much to be paranoid. Amit Langote, reviewed by Michael Paquier and Amit Kapila Discussion: https://postgr.es/m/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp	2017-11-02 17:22:08 -04:00
Tom Lane	62a16572d5	Fix corner-case errors in brin_doupdate(). In some cases the BRIN code releases lock on an index page, and later re-acquires lock and tries to check that the tuple it was working on is still there. That check was a couple bricks shy of a load. It didn't consider that the page might have turned into a "revmap" page. (The samepage code path doesn't call brin_getinsertbuffer(), so it isn't protected by the checks for revmap status there.) It also didn't check whether the tuple offset was now off the end of the linepointer array. Since commit `24992c6db` the latter case is pretty common, but at least in principle it could have occurred before that. The net result is that concurrent updates of a BRIN index could fail with errors like "invalid index offnum" or "inconsistent range map". Per report from Tomas Vondra. Back-patch to 9.5, since this code is substantially the same in all versions containing BRIN. Discussion: https://postgr.es/m/10d2b9f9-f427-03b8-8ad9-6af4ecacbee9@2ndquadrant.com	2017-11-02 12:54:55 -04:00
Tom Lane	7b6c075471	Teach planner to account for HAVING quals in aggregation plan nodes. For some reason, we have never accounted for either the evaluation cost or the selectivity of filter conditions attached to Agg and Group nodes (which, in practice, are always conditions from a HAVING clause). Applying our regular selectivity logic to post-grouping conditions is a bit bogus, but it's surely better than taking the selectivity as 1.0. Perhaps someday the extended-statistics mechanism can be taught to provide statistics that would help us in getting non-default estimates here. Per a gripe from Benjamin Coutu. This is surely a bug fix, but I'm hesitant to back-patch because of the prospect of destabilizing existing plan choices. Given that it took us this long to notice the bug, it's probably not hurting too many people in the field. Discussion: https://postgr.es/m/20968.1509486337@sss.pgh.pa.us	2017-11-02 11:24:12 -04:00
Alvaro Herrera	c6764eb3ae	Revert bogus fixes of HOT-freezing bug It turns out we misdiagnosed what the real problem was. Revert the previous changes, because they may have worse consequences going forward. A better fix is forthcoming. The simplistic test case is kept, though disabled. Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de	2017-11-02 15:51:41 +01:00
Tom Lane	7c70996ebf	Allow bitmap scans to operate as index-only scans when possible. If we don't have to return any columns from heap tuples, and there's no need to recheck qual conditions, and the heap page is all-visible, then we can skip fetching the heap page altogether. Skip prefetching pages too, when possible, on the assumption that the recheck flag will remain the same from one page to the next. While that assumption is hardly bulletproof, it seems like a good bet most of the time, and better than prefetching pages we don't need. This commit installs the executor infrastructure, but doesn't change any planner cost estimates, thus possibly causing bitmap scans to not be chosen in cases where this change renders them the best choice. I (tgl) am not entirely convinced that we need to account for this behavior in the planner, because I think typically the bitmap scan would get chosen anyway if it's the best bet. In any case the submitted patch took way too many shortcuts, resulting in too many clearly-bad choices, to be committable. Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around rather heavily by me. Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru	2017-11-01 17:38:20 -04:00
Tom Lane	af20e2d728	Fix ALTER TABLE code to update domain constraints when needed. It's possible for dropping a column, or altering its type, to require changes in domain CHECK constraint expressions; but the code was previously only expecting to find dependent table CHECK constraints. Make the necessary adjustments. This is a fairly old oversight, but it's a lot easier to encounter the problem in the context of domains over composite types than it was before. Given the lack of field complaints, I'm not going to bother with a back-patch, though I'd be willing to reconsider that decision if someone does complain. Patch by me, reviewed by Michael Paquier Discussion: https://postgr.es/m/30656.1509128130@sss.pgh.pa.us	2017-11-01 13:32:23 -04:00
Robert Haas	ee4673ac07	Don't exaggerate the number of temporary blocks read. A read that returns zero bytes (or an error) should not increment the number of temporary blocks read. Thomas Munro Discussion: http://postgr.es/m/CAEepm=21xgihg=WaG+O5MFotEZfN6kFETpfw+RkSnEqNQqGn2Q@mail.gmail.com	2017-10-31 14:56:09 +05:30
Robert Haas	cf7ab13bfb	Fix code related to partitioning schemes for dropped columns. The entry in appinfo->translated_vars can be NULL; if so, we must avoid dereferencing it. Ashutosh Bapat Discussion: http://postgr.es/m/CAFjFpReL7+1ien=-21rhjpO3bV7aAm1rQ8XgLVk2csFagSzpZQ@mail.gmail.com	2017-10-31 14:43:05 +05:30
Alvaro Herrera	be72b9c378	Fix autovacuum work item error handling In autovacuum's "work item" processing, a few strings were allocated in the current transaction's memory context, which goes away during error handling; if an error happened during execution of the work item, the pfree() calls to clean up afterwards would try to release already-released memory, possibly leading to a crash. In branch master, this was already fixed by commit `335f3d04e4`, so backpatch that to REL_10_STABLE to fix the problem there too. As a secondary problem, verify that the autovacuum worker is connected to the right database for each work item; otherwise some items would be discarded by workers in other databases. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20171014035732.GB31726@telsasoft.com	2017-10-30 15:52:02 +01:00
Magnus Hagander	854b643c8e	Fix typo in comment Etsuro Fujita	2017-10-30 14:37:00 +01:00
Robert Haas	846fcc8516	Fix problems with the "role" GUC and parallel query. Without this fix, dropping a role can sometimes result in parallel query failures in sessions that have used "SET ROLE" to assume the dropped role, even if that setting isn't active any more. Report by Pavan Deolasee. Patch by Amit Kapila, reviewed by me. Discussion: http://postgr.es/m/CABOikdOomRcZsLsLK+Z+qENM1zxyaWnAvFh3MJZzZnnKiF+REg@mail.gmail.com	2017-10-29 12:58:40 +05:30
Robert Haas	11c1d555ce	Improve comments for parallel executor estimation functions. The previous comment (which was copied as boilerplate from one file to the next) implied that it was the executor node itself which was being serialized, but that's not right. We're not serializing the executor nodes; we're just allowing them to store some additional information in DSM. Adjusts the comment to reflect this. Discussion: http://postgr.es/m/CA+TgmoaHVinxG=3h6qBAsyV8xaDyQwbzK7YZnYfE8nJFMK1=FA@mail.gmail.com	2017-10-28 11:50:22 +02:00
Robert Haas	24fd674a1a	Fix grammar. Etsuro Fujita Discussion: http://postgr.es/m/cc7767b6-6a1b-74a2-8b3c-48b8e64c12ed@lab.ntt.co.jp	2017-10-28 11:14:23 +02:00
Robert Haas	1310ac258c	Fix misplaced ReleaseSysCache call in get_default_partition_oid. Julien Rouhaud Discussion: http://postgr.es/m/CAOBaU_Y4omLA+VbsVdA-JwBLoJWiPxfdKCkMjrZM7NMZxa1fKw@mail.gmail.com	2017-10-28 11:10:21 +02:00
Tom Lane	d76886c2d3	Dept of second thoughts: keep aliasp_item in sync with tlistitem. Commit `d5b760ecb` wasn't quite right, on second thought: if the caller didn't ask for column names then it would happily emit more Vars than if the caller did ask for column names. This is surely not a good idea. Advance the aliasp_item whether or not we're preparing a colnames list.	2017-10-27 18:16:24 -04:00
Tom Lane	d5b760ecb5	Fix crash when columns have been added to the end of a view. expandRTE() supposed that an RTE_SUBQUERY subquery must have exactly as many non-junk tlist items as the RTE has column aliases for it. This was true at the time the code was written, and is still true so far as parse analysis is concerned --- but when the function is used during planning, the subquery might have appeared through insertion of a view that now has more columns than it did when the outer query was parsed. This results in a core dump if, for instance, we have to expand a whole-row Var that references the subquery. To avoid crashing, we can either stop expanding the RTE when we run out of aliases, or invent new aliases for the added columns. While the latter might be more useful, the former is consistent with what expandRTE() does for composite-returning functions in the RTE_FUNCTION case, so it seems like we'd better do it that way. Per bug #14876 from Samuel Horwitz. This has been busted since commit `ff1ea2173` allowed views to acquire more columns, so back-patch to all supported branches. Discussion: https://postgr.es/m/20171026184035.1471.82810@wrigleys.postgresql.org	2017-10-27 17:28:54 -04:00
Robert Haas	682ce911f8	Allow parallel query for prepared statements with generic plans. This was always intended to work, but due to an oversight in max_parallel_hazard_walker, it didn't. In testing, we missed the fact that it was only working for custom plans, where the parameter value has been substituted for the parameter itself early enough that everything worked. In a generic plan, the Param node survives and must be treated as parallel-safe. SerializeParamList provides for the transmission of parameter values to workers. Amit Kapila with help from Kuntal Ghosh. Some changes by me. Discussion: http://postgr.es/m/CAA4eK1+_BuZrmVCeua5Eqnm4Co9DAXdM5HPAOE2J19ePbR912Q@mail.gmail.com	2017-10-27 22:22:39 +02:00
Tom Lane	6784d7a1dc	Rethink the dependencies recorded for FieldSelect/FieldStore nodes. On closer investigation, commits `f3ea3e3e8` et al were a few bricks shy of a load. What we need is not so much to lock down the result type of a FieldSelect, as to lock down the existence of the column it's trying to extract. Otherwise, we can break it by dropping that column. The dependency on the result type is then held indirectly through the column, and doesn't need to be recorded explicitly. Out of paranoia, I left in the code to record a dependency on the result type, but it's used only if we can't identify the pg_class OID for the column. That shouldn't ever happen right now, AFAICS, but it seems possible that in future the input node could be marked as being of type RECORD rather than some specific composite type. Likewise for FieldStore. Like the previous patch, back-patch to all supported branches. Discussion: https://postgr.es/m/22571.1509064146@sss.pgh.pa.us	2017-10-27 12:19:09 -04:00
Robert Haas	639c1a6bb9	Fix mistaken failure to allow parallelism in corner case. If we try to run a parallel plan in serial mode because, for example, it's going to be scanned via a cursor, but for some reason we're already in parallel mode (for example because an outer query is running in parallel), we'd incorrectly try to launch workers. Fix by adding a flag to the EState, so that we can be certain that ExecutePlan() and ExecGather()/ExecGatherMerge() will have the same idea about whether we are executing serially or in parallel. Report and fix by Amit Kapila with help from Kuntal Ghosh. A few tweaks by me. Discussion: http://postgr.es/m/CAA4eK1+_BuZrmVCeua5Eqnm4Co9DAXdM5HPAOE2J19ePbR912Q@mail.gmail.com	2017-10-27 16:04:01 +02:00
Tom Lane	37a795a60b	Support domains over composite types. This is the last major omission in our domains feature: you can now make a domain over anything that's not a pseudotype. The major complication from an implementation standpoint is that places that might be creating tuples of a domain type now need to be prepared to apply domain_check(). It seems better that unprepared code fail with an error like "<type> is not composite" than that it silently fail to apply domain constraints. Therefore, relevant infrastructure like get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted to treat domain-over-composite as a distinct case that unprepared code won't recognize, rather than just transparently treating it the same as plain composite. This isn't a 100% solution to the possibility of overlooked domain checks, but it catches most places. In passing, improve typcache.c's support for domains (it can now cache the identity of a domain's base type), and rewrite the argument handling logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative per-call lookups. I believe this is code-complete so far as the core and contrib code go. The PLs need varying amounts of work, which will be tackled in followup patches. Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us	2017-10-26 13:47:45 -04:00
Tom Lane	08f1e1f0a4	Make setrefs.c match by ressortgroupref even for plain Vars. Previously, we skipped using search_indexed_tlist_for_sortgroupref() if the tlist expression being sought in the child plan node was merely a Var. This is purely an optimization, based on the theory that search_indexed_tlist_for_var() is faster, and one copy of a Var should be as good as another. However, the GROUPING SETS patch broke the latter assumption: grouping columns containing the "same" Var can sometimes have different outputs, as shown in the test case added here. So do it the hard way whenever a ressortgroupref marking exists. (If this seems like a bottleneck, we could imagine building a tlist index data structure for ressortgroupref values, as we do for Vars. But I'll let that idea go until there's some evidence it's worthwhile.) Back-patch to 9.6. The problem also exists in 9.5 where GROUPING SETS came in, but this patch is insufficient to resolve the problem in 9.5: there is some obscure dependency on the upper-planner-pathification work that happened in 9.6. Given that this is such a weird corner case, and no end users have complained about it, it doesn't seem worth the work to develop a fix for 9.5. Patch by me, per a report from Heikki Linnakangas. (This does not fix Heikki's original complaint, just the follow-on one.) Discussion: https://postgr.es/m/aefc657e-edb2-64d5-6df1-a0828f6e9104@iki.fi	2017-10-26 12:17:40 -04:00
Andrew Dunstan	adee9e4e31	Undo inadvertent change in capitalization in commit `18fc4ec`.	2017-10-26 08:20:00 -04:00
Robert Haas	b55509332f	In relevant log messages, indicate whether vacuums are aggressive. Kyotaro Horiguchi, reviewed Masahiko Sawada, David G. Johnston, Álvaro Herrera, and me. Grammar correction to the final posted patch by me. Discussion: http://postgr.es/m/20170329.124649.193656100.horiguchi.kyotaro@lab.ntt.co.jp	2017-10-26 12:38:10 +02:00
Andrew Dunstan	18fc4ecf4a	Process variadic arguments consistently in json functions json_build_object and json_build_array and the jsonb equivalents did not correctly process explicit VARIADIC arguments. They are modified to use the new extract_variadic_args() utility function which abstracts away the details of the call method. Michael Paquier, reviewed by Tom Lane and Dmitry Dolgov. Backpatch to 9.5 for the jsonb fixes and 9.4 for the json fixes, as that's where they originated.	2017-10-25 07:34:00 -04:00
Andrew Dunstan	f3c6e8a27a	Add a utility function to extract variadic function arguments This is epecially useful in the case or "VARIADIC ANY" functions. The caller can get the artguments and types regardless of whether or not and explicit VARIADIC array argument has been used. The function also provides an option to convert arguments on type "unknown" to to "text". Michael Paquier and me, reviewed by Tom Lane. Backpatch to 9.4 in order to support the following json bug fix.	2017-10-25 07:13:11 -04:00
Tom Lane	896eb5efbd	In the planner, delete joinaliasvars lists after we're done with them. Although joinaliasvars lists coming out of the parser are quite simple, those lists can contain arbitrarily complex expressions after subquery pullup. We do not perform expression preprocessing on them, meaning that expressions in those lists will not meet the expectations of later phases of the planner (for example, that they do not contain SubLinks). This had been thought pretty harmless, since we don't intentionally touch those lists in later phases --- but Andreas Seltenreich found a case in which adjust_appendrel_attrs() could recurse into a joinaliasvars list and then die on its assertion that it never sees a SubLink. We considered a couple of localized fixes to prevent that specific case from looking at the joinaliasvars lists, but really this seems like a generic hazard for all expression processing in the planner. Therefore, probably the best answer is to delete the joinaliasvars lists from the parsetree at the end of expression preprocessing, so that there are no reachable expressions that haven't been through preprocessing. The case Andreas found seems to be harmless in non-Assert builds, and so far there are no field reports suggesting that there are user-visible effects in other cases. I considered back-patching this anyway, but it turns out that Andreas' test doesn't fail at all in 9.4-9.6, because in those versions adjust_appendrel_attrs contains code (added in commit `842faa714` and removed again in commit `215b43cdc`) to process SubLinks rather than complain about them. Barring discovery of another path by which unprocessed joinaliasvars lists can cause trouble, the most prudent compromise seems to be to patch this into v10 but not further. Patch by me, with thanks to Amit Langote for initial investigation and review. Discussion: https://postgr.es/m/87r2tvt9f1.fsf@ansel.ydns.eu	2017-10-24 18:42:47 -04:00
Tom Lane	f3ea3e3e82	Fix some oversights in expression dependency recording. find_expr_references() neglected to record a dependency on the result type of a FieldSelect node, allowing a DROP TYPE to break a view or rule that contains such an expression. I think we'd omitted this case intentionally, reasoning that there would always be a related dependency ensuring that the DROP would cascade to the view. But at least with nested field selection expressions, that's not true, as shown in bug #14867 from Mansur Galiev. Add the dependency, and for good measure a dependency on the node's exposed collation. Likewise add a dependency on the result type of a FieldStore. I think here the reasoning was that it'd only appear within an assignment to a field, and the dependency on the field's column would be enough ... but having seen this example, I think that's wrong for nested-composites cases. Looking at nearby code, I notice we're not recording a dependency on the exposed collation of CoerceViaIO, which seems inconsistent with our choices for related node types. Maybe that's OK but I'm feeling suspicious of this code today, so let's add that; it certainly can't hurt. This patch does not do anything to protect already-existing views, only views created after it's installed. But seeing that the issue has been there a very long time and nobody noticed till now, that's probably good enough. Back-patch to all supported branches. Discussion: https://postgr.es/m/20171023150118.1477.19174@wrigleys.postgresql.org	2017-10-23 13:57:45 -04:00
Tom Lane	36ea99c84d	Fix typcache's failure to treat ranges as container types. Like the similar logic for arrays and records, it's necessary to examine the range's subtype to decide whether the range type can support hashing. We can omit checking the subtype for btree-defined operations, though, since range subtypes are required to have those operations. (Possibly that simplification for btree cases led us to overlook that it does not apply for hash cases.) This is only an issue if the subtype lacks hash support, which is not true of any built-in range type, but it's easy to demonstrate a problem with a range type over, eg, money: you can get a "could not identify a hash function" failure when the planner is misled into thinking that hash join or aggregation would work. This was born broken, so back-patch to all supported branches.	2017-10-20 17:12:27 -04:00
Tom Lane	a8f1efc8ac	Fix misimplementation of typcache logic for extended hashing. The previous coding would report that an array type supports extended hashing if its element type supports regular hashing. This bug is only latent at the moment, since AFAICS there is not yet any code that depends on checking presence of extended-hashing support to make any decisions. (And in any case it wouldn't matter unless the element type has only regular hashing, which isn't true of any core data type.) But that doesn't make it less broken. Extend the cache_array_element_properties infrastructure to check this properly.	2017-10-20 16:08:17 -04:00
Magnus Hagander	752871b6de	Fix typos David Rowley	2017-10-19 13:58:30 +02:00
Magnus Hagander	275c4be19d	Fix typo Masahiko Sawada	2017-10-19 13:57:20 +02:00
Peter Eisentraut	927e1ee2cb	UCS_to_most.pl: Process encodings in sorted order Otherwise the order depends on the Perl hash implementation, making it cumbersome to scan the output when debugging.	2017-10-19 05:58:39 -04:00
Tom Lane	7421f4b89a	Fix incorrect handling of CTEs and ENRs as DML target relations. setTargetTable threw an error if the proposed target RangeVar's relname matched any visible CTE or ENR. This breaks backwards compatibility in the CTE case, since pre-v10 we never looked for a CTE here at all, so that CTE names did not mask regular tables. It does seem like a good idea to throw an error for the ENR case, though, thus causing ENRs to mask tables for this purpose; ENRs are new in v10 so we're not breaking existing code, and we may someday want to allow them to be the targets of DML. To fix that, replace use of getRTEForSpecialRelationTypes, which was overkill anyway, with use of scanNameSpaceForENR. A second problem was that the check neglected to verify null schemaname, so that a CTE or ENR could incorrectly be thought to match a qualified RangeVar. That happened because getRTEForSpecialRelationTypes relied on its caller to have checked for null schemaname. Even though the one remaining caller got it right, this is obviously bug-prone, so move the check inside getRTEForSpecialRelationTypes. Also, revert commit 18ce3a4ab's extremely poorly thought out decision to add a NULL return case to parserOpenTable --- without either documenting that or adjusting any of the callers to check for it. The current bug seems to have arisen in part due to working around that bad idea. In passing, remove the one-line shim functions transformCTEReference and transformENRReference --- they don't seem to be adding any clarity or functionality. Per report from Hugo Mercier (via Julien Rouhaud). Back-patch to v10 where the bug was introduced. Thomas Munro, with minor editing by me Discussion: https://postgr.es/m/CAOBaU_YdPVH+PTtiKSSLOiiW3mVDYsnNUekK+XPbHXiP=wrFLA@mail.gmail.com	2017-10-16 17:56:54 -04:00
Peter Eisentraut	4211673622	Exclude flex-generated code from coverage testing Flex generates a lot of functions that are not actually used. In order to avoid coverage figures being ruined by that, mark up the part of the .l files where the generated code appears by lcov exclusion markers. That way, lcov will typically only reported on coverage for the .l file, which is under our control, but not for the .c file. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-10-16 16:28:11 -04:00
Tom Lane	cf5ba7c30c	Treat aggregate direct arguments as per-agg data not per-trans data. There is no reason to insist that direct arguments must match before we can merge transition states of two aggregate calls. They're only used during the finalfn call, so we can treat them as like the finalfn itself. This allows, eg, merging of select percentile_cont(0.25) within group (order by a), percentile_disc(0.5) within group (order by a) from ... This didn't matter (and could not have been tested) before we allowed state merging of OSAs otherwise. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com	2017-10-16 16:02:51 -04:00
Tom Lane	be0ebb65f5	Allow the built-in ordered-set aggregates to share transition state. The built-in OSAs all share the same transition function, so they can share transition state as long as the final functions cooperate to not do the sort step more than once. To avoid running the tuplesort object in randomAccess mode unnecessarily, add a bit of infrastructure to nodeAgg.c to let the aggregate functions find out whether the transition state is actually being shared or not. This doesn't work for the hypothetical aggregates, since those inject a hypothetical row that isn't traceable to the shared input state. So they remain marked aggfinalmodify = 'w'. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com	2017-10-16 15:51:23 -04:00
Tom Lane	c3dfe0fec0	Repair breakage of aggregate FILTER option. An aggregate's input expression(s) are not supposed to be evaluated at all for a row where its FILTER test fails ... but commit `8ed3f11bb` overlooked that requirement. Reshuffle so that aggregates having a filter clause evaluate their arguments separately from those without. This still gets the benefit of doing only one ExecProject in the common case of multiple Aggrefs, none of which have filters. While at it, arrange for filter clauses to be included in the common ExecProject evaluation, thus perhaps buying a little bit even when there are filters. Back-patch to v10 where the bug was introduced. Discussion: https://postgr.es/m/30065.1508161354@sss.pgh.pa.us	2017-10-16 15:24:36 -04:00
Alvaro Herrera	60a1d96ed7	Rework DefineIndex relkind check Simplify coding using a switch rather than nested if tests. Author: Álvaro Reviewed-by: Robert Haas, Amit Langote, Michaël Paquier Discussion: https://postgr.es/m/20171013163820.pai7djcaxrntaxtn@alvherre.pgsql	2017-10-16 12:22:18 +02:00
Tom Lane	5fc438fb25	Restore nodeAgg.c's ability to check for improperly-nested aggregates. While poking around in the aggregate logic, I noticed that commit `8ed3f11bb` broke the logic in nodeAgg.c that purports to detect nested aggregates, by moving initialization of regular aggregate argument expressions out of the code segment that checks for that. You could argue that this check is unnecessary, but it's not much code so I'm inclined to keep it as a backstop against parser and planner bugs. However, there's certainly zero value in checking only some of the subexpressions. We can make the check complete again, and as a bonus make it a good deal more bulletproof against future mistakes of the same ilk, by moving it out to the outermost level of ExecInitAgg. This means we need to check only once per Agg node not once per aggregate, which also seems like a good thing --- if the check does find something wrong, it's not urgent that we report it before the plan node initialization finishes. Since this requires remembering the original length of the aggs list, I deleted a long-obsolete stanza that changed numaggs from 0 to 1. That's so old it predates our decision that palloc(0) is a valid operation, in (digs...) 2004, see commit `24a1e20f1`. In passing improve a few comments. Back-patch to v10, just in case.	2017-10-15 19:19:18 -04:00
Tom Lane	4de2d4fba3	Explicitly track whether aggregate final functions modify transition state. Up to now, there's been hard-wired assumptions that normal aggregates' final functions never modify their transition states, while ordered-set aggregates' final functions always do. This has always been a bit limiting, and in particular it's getting in the way of improving the built-in ordered-set aggregates to allow merging of transition states. Therefore, let's introduce catalog and CREATE AGGREGATE infrastructure that lets the finalfn's behavior be declared explicitly. There are now three possibilities for the finalfn behavior: it's purely read-only, it trashes the transition state irrecoverably, or it changes the state in such a way that no more transfn calls are possible but the state can still be passed to other, compatible finalfns. There are no examples of this third case today, but we'll shortly make the built-in OSAs act like that. This change allows user-defined aggregates to explicitly disclaim support for use as window functions, and/or to prevent transition state merging, if their implementations cannot handle that. While it was previously possible to handle the window case with a run-time error check, there was not any way to prevent transition state merging, which in retrospect is something commit `804163bc2` should have provided for. But better late than never. In passing, split out pg_aggregate.c's extern function declarations into a new header file pg_aggregate_fn.h, similarly to what we've done for some other catalog headers, so that pg_aggregate.h itself can be safe for frontend files to include. This lets pg_dump use the symbolic names for relevant constants. Discussion: https://postgr.es/m/4834.1507849699@sss.pgh.pa.us	2017-10-14 15:21:39 -04:00
Andres Freund	141fd1b66c	Improve sys/catcache performance. The following are the individual improvements: 1) Avoidance of FunctionCallInfo based function calls, replaced by more efficient functions with a native C argument interface. 2) Don't extract columns from a cache entry's tuple whenever matching entries - instead store them as a Datum array. This also allows to get rid of having to build dummy tuples for negative & list entries, and of a hack for dealing with cstring vs. text weirdness. 3) Reorder members of catcache.h struct, so imortant entries are more likely to be on one cacheline. 4) Allowing the compiler to specialize critical SearchCatCache for a specific number of attributes allows to unroll loops and avoid other nkeys dependant initialization. 5) Only initializing the ScanKey when necessary, i.e. catcache misses, greatly reduces cache unnecessary cpu cache misses. 6) Split of the cache-miss case from the hash lookup, reducing stack allocations etc in the common case. 7) CatCTup and their corresponding heaptuple are allocated in one piece. This results in making cache lookups themselves roughly three times as fast - full-system benchmarks obviously improve less than that. I've also evaluated further techniques: - replace open coded hash with simplehash - the list walk right now shows up in profiles. Unfortunately it's not easy to do so safely as an entry's memory location can change at various times, which doesn't work well with the refcounting and cache invalidation. - Cacheline-aligning CatCTup entries - helps some with performance, but the win isn't big and the code for it is ugly, because the tuples have to be freed as well. - add more proper functions, rather than macros for SearchSysCacheCopyN etc., but right now they don't show up in profiles. The reason the macro wrapper for syscache.c/h have to be changed, rather than just catcache, is that doing otherwise would require exposing the SysCache array to the outside. That might be a good idea anyway, but it's for another day. Author: Andres Freund Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de	2017-10-13 14:22:41 -07:00
Robert Haas	6393613b6a	Fix possible crash with Parallel Bitmap Heap Scan. If a Parallel Bitmap Heap scan's chain of leftmost descendents includes a BitmapOr whose first child is a BitmapAnd, the prior coding would mistakenly create a non-shared TIDBitmap and then try to perform shared iteration. Report by Tomas Vondra. Patch by Dilip Kumar. Discussion: http://postgr.es/m/50e89684-8ad9-dead-8767-c9545bafd3b6@2ndquadrant.com	2017-10-13 15:02:45 -04:00
Tom Lane	73937119bf	Improve implementation of CRE-stack-flattening in map_variable_attnos(). I (tgl) objected to the obscure implementation introduced in commit `1c497fa72`. This one seems a bit less action-at-a-distance-y, at the price of repeating a few lines of code. Improve the comments about what the function is doing, too. Amit Khandekar, whacked around a bit more by me Discussion: https://postgr.es/m/CAJ3gD9egYTyHUH0nTMxm8-1m3RvdqEbaTyGC-CUNtYf7tKNDaQ@mail.gmail.com	2017-10-13 13:43:55 -04:00
Peter Eisentraut	7d1b8e7591	Attempt to fix LDAP build Apparently, an older spelling of LDAP_OPT_DIAGNOSTIC_MESSAGE is LDAP_OPT_ERROR_STRING, so fall back to that one.	2017-10-12 23:47:48 -04:00
Peter Eisentraut	cf1238cd97	Log diagnostic messages if errors occur during LDAP auth. Diagnostic messages seem likely to help users diagnose root causes more easily, so let's report them as errdetail. Author: Thomas Munro Reviewed-By: Ashutosh Bapat, Christoph Berg, Alvaro Herrera, Peter Eisentraut Discussion: https://postgr.es/m/CAEepm=2_dA-SYpFdmNVwvKsEBXOUj=K4ooKovHmvj6jnMdt8dw@mail.gmail.com	2017-10-12 22:37:14 -04:00
Peter Eisentraut	1feff99fe4	Improve LDAP cleanup code in error paths. After calling ldap_unbind_s() we probably shouldn't try to use the LDAP connection again to call ldap_get_option(), even if it failed. The OpenLDAP man page for ldap_unbind[_s] says "Once it is called, the connection to the LDAP server is closed, and the ld structure is invalid." Otherwise, as a general rule we should probably call ldap_unbind() before returning in all paths to avoid leaking resources. It is unlikely there is any practical leak problem since failure to authenticate currently results in the backend exiting soon afterwards. Author: Thomas Munro Reviewed-By: Alvaro Herrera, Peter Eisentraut Discussion: https://postgr.es/m/20170914141205.eup4kxzlkagtmfac%40alvherre.pgsql	2017-10-12 22:37:14 -04:00
Robert Haas	1c497fa72d	Avoid coercing a whole-row variable that is already coerced. Marginal efficiency and beautification hack. I'm not sure whether this case ever arises currently, but the pending patch for update tuple routing will cause it to arise. Amit Khandekar Discussion: http://postgr.es/m/CAJ3gD9cazfppe7-wwUbabPcQ4_0SubkiPFD1+0r5_DkVNWo5yg@mail.gmail.com	2017-10-12 17:10:48 -04:00
Robert Haas	60f7c0abef	Use ResultRelInfo ** rather than ResultRelInfo * for tuple routing. The previous convention doesn't lend itself to creating ResultRelInfos lazily, as we already do in ExecGetTriggerResultRel. This patch doesn't make anything lazier than before, but the pending patch for UPDATE tuple routing proposes to do so (and there might be other opportunities as well). Amit Khandekar with some adjustments by me. Discussion: http://postgr.es/m/CA+TgmoYPVP9Lyf6vUFA5DwxS4c--x6LOj2y36BsJaYtp62eXPQ@mail.gmail.com	2017-10-12 16:50:53 -04:00
Tom Lane	305cf1fd72	Fix AggGetAggref() so it won't lie to aggregate final functions. If we merge the transition calculations for two different aggregates, it's reasonable to assume that the transition function should not care which of those Aggref structs it gets from AggGetAggref(). It is not reasonable to make the same assumption about an aggregate final function, however. Commit `804163bc2` broke this, as it will pass whichever Aggref was first associated with the transition state in both cases. This doesn't create an observable bug so far as the core system is concerned, because the only existing uses of AggGetAggref() are in ordered-set aggregates that happen to not pay attention to anything but the input properties of the Aggref; and besides that, we disabled sharing of transition calculations for OSAs yesterday. Nonetheless, if some third-party code were using AggGetAggref() in a normal aggregate, they would be entitled to call this a bug. Hence, back-patch the fix to 9.6 where the problem was introduced. In passing, improve some of the comments about transition state sharing. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com	2017-10-12 15:20:16 -04:00
Robert Haas	ad4a7ed099	Synchronize error messages. Commits `6476b26115` and `14f67a8ee2` didn't use quite the same error message for what is basically the same situation. Amit Langote, pared back a bit by me. Discussion: http://postgr.es/m/54dc76d0-3b5b-ba5a-27dc-fb31a3975b61@lab.ntt.co.jp	2017-10-12 15:14:22 -04:00
Alvaro Herrera	e9ef11ac8b	Infer functional dependency past RelabelType Vars hidden within a RelabelType would not be detected as compatible with some functional dependency. Repair by properly ignoring the RelabelType. Author: David Rowley Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/CAKJS1f-y-UEy=rsBXynBOgiW1fKMr_LVoYSGL9QOc36mLEC-ww@mail.gmail.com	2017-10-12 17:23:47 +02:00
Robert Haas	360fd1a7b2	Fix logical replication to fire BEFORE ROW DELETE triggers. Before, that would fail to happen unless a BEFORE ROW UPDATE trigger was also present. Noted by me while reviewing a patch from Masahiko Sawada, who also wrote this patch. Reviewed by Petr Jelinek. Discussion: http://postgr.es/m/CA+TgmobAZvCxduG8y_mQKBK7nz-vhbdLvjM354KEFozpuzMN5A@mail.gmail.com	2017-10-12 10:26:55 -04:00
Andres Freund	31079a4a8e	Replace remaining uses of pq_sendint with pq_sendint{8,16,32}. pq_sendint() remains, so extension code doesn't unnecessarily break. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de	2017-10-11 21:00:46 -07:00
Tom Lane	52328727be	Prevent sharing transition states between ordered-set aggregates. This ought to work, but the built-in OSAs are not capable of coping, because their final-functions destructively modify their transition state (specifically, the contained tuplesort object). That was fine when those functions were written, but commit `804163bc2` moved the goalposts without telling orderedsetaggs.c. We should fix the built-in OSAs to support this, but it will take a little work, especially if we don't want to sacrifice performance in the normal non-shared-state case. Given that it took a year after 9.6 release for anyone to notice this bug, we should not prioritize sharable-state over nonsharable-state performance. And a proper fix is likely to be more complicated than we'd want to back-patch, too. Therefore, let's just put in this stop-gap patch to prevent nodeAgg.c from choosing to use shared state for OSAs. We can revert it in HEAD when we get a better fix. Report from Lukas Eder, diagnosis by me, patch by David Rowley. Back-patch to 9.6 where the problem was introduced. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com	2017-10-11 22:18:10 -04:00
Andres Freund	4c119fbcd4	Improve performance of SendRowDescriptionMessage. There's three categories of changes leading to better performance: - Splitting the per-attribute part of SendRowDescriptionMessage into a v2 and a v3 version allows avoiding branches for every attribute. - Preallocating the size of the buffer to be big enough for all attributes and then using pq_write* avoids unnecessary buffer size checks & resizing. - Reusing a persistently allocated StringInfo for all SendRowDescriptionMessage() invocations avoids repeated allocations & reallocations. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de	2017-10-11 17:23:23 -07:00
Robert Haas	cff440d368	pg_stat_statements: Widen query IDs from 32 bits to 64 bits. This takes advantage of the infrastructure introduced by commit `81c5e46c49` to greatly reduce the likelihood that two different queries will end up with the same query ID. It's still possible, of course, but whereas before it the chances of a collision reached 25% around 50,000 queries, it will now take more than 3 billion queries. Backward incompatibility: Because the type exposed at the SQL level is int8, users may now see negative query IDs in the pg_stat_statements view (and also, query IDs more than 4 billion, which was the old limit). Patch by me, reviewed by Michael Paquier and Peter Geoghegan. Discussion: http://postgr.es/m/CA+TgmobG_Kp4cBKFmsznUAaM1GWW6hhRNiZC0KjRMOOeYnz5Yw@mail.gmail.com	2017-10-11 19:52:46 -04:00
Andres Freund	f2dec34e19	Use one stringbuffer for all rows printed in printtup.c. This avoids newly allocating, and then possibly growing, the stringbuffer for every row. For wide rows this can substantially reduce memory allocator overhead, at the price of not immediately reducing memory usage after outputting an especially wide row. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de	2017-10-11 16:26:35 -07:00
Andres Freund	1de09ad8eb	Add more efficient functions to pqformat API. There's three prongs to achieve greater efficiency here: 1) Allow reusing a stringbuffer across pq_beginmessage/endmessage, with the new pq_beginmessage_reuse/endmessage_reuse. This can be beneficial both because it avoids allocating the initial buffer, and because it's more likely to already have an correctly sized buffer. 2) Replacing pq_sendint() with pq_sendint$width() inline functions. Previously unnecessary and unpredictable branches in pq_sendint() were needed. Additionally the replacement functions are implemented more efficiently. pq_sendint is now deprecated, a separate commit will convert all in-tree callers. 3) Add pq_writeint$width(), pq_writestring(). These rely on sufficient space in the StringInfo's buffer, avoiding individual space checks & potential individual resizing. To allow this to be used for strings, expose mbutil.c's MAX_CONVERSION_GROWTH. Followup commits will make use of these facilities. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de	2017-10-11 16:01:52 -07:00
Andres Freund	70c2d1be2b	Allow to avoid NUL-byte management for stringinfos and use in format.c. In a lot of the places having appendBinaryStringInfo() maintain a trailing NUL byte wasn't actually meaningful, e.g. when appending an integer which can contain 0 in one of its bytes. Removing this yields some small speedup, but more importantly will be more consistent when providing faster variants of pq_sendint etc. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de	2017-10-11 16:01:52 -07:00
Tom Lane	5fa6b0d102	Remove unnecessary PG_TRY overhead for CurrentResourceOwner changes. resowner/README contained advice to use a PG_TRY block to restore the old CurrentResourceOwner value anywhere that that variable is transiently changed. That advice was only inconsistently followed, however, and on reflection it seems like unnecessary overhead. We don't bother with such a convention for transient CurrentMemoryContext changes, on the grounds that any (sub)transaction abort will start out by resetting CurrentMemoryContext to what it wants. But the same is true of CurrentResourceOwner, so there seems no need to treat it differently. Hence, remove PG_TRY blocks that exist only to restore CurrentResourceOwner before re-throwing the error. There are a couple of places that restore it along with some other actions, and I left those alone; the restore is probably unnecessary but no noticeable gain will result from removing it. Discussion: https://postgr.es/m/5236.1507583529@sss.pgh.pa.us	2017-10-11 17:44:09 -04:00
Andres Freund	f676616651	Prevent idle in transaction session timeout from sometimes being ignored. The previous coding in ProcessInterrupts() could lead to idle_in_transaction_session_timeout being ignored, when statement_timeout occurred earlier. The problem was that ProcessInterrupts() would return before processing the transaction timeout if QueryCancelPending was set while QueryCancelHoldoffCount != 0 - which is the case when reading new commands from the client. Ergo when the idle transaction timeout would hit. Fix that by removing the early return. Alternatively the transaction timeout code could have been moved up, but that early return seems like an issue that could hit other cases too. Author: Lukas Fittl Bug: #14821 Discussion: https://www.postgresql.org/message-id/20170921010956.17345.61461%40wrigleys.postgresql.org https://www.postgresql.org/message-id/CAP53PkxQnv3OWJpyNPGJYT62uY=n1=2CF_Lpc6gVOFnc0-gazw@mail.gmail.com Backpatch: 9.6-, where idle_in_transaction_session_timeout was introduced.	2017-10-11 14:02:41 -07:00
Tom Lane	2860596832	Doc: fix missing explanation of default object privileges. The GRANT reference page, which lists the default privileges for new objects, failed to mention that USAGE is granted by default for data types and domains. As a lesser sin, it also did not specify anything about the initial privileges for sequences, FDWs, foreign servers, or large objects. Fix that, and add a comment to acldefault() in the probably vain hope of getting people to maintain this list in future. Noted by Laurenz Albe, though I editorialized on the wording a bit. Back-patch to all supported branches, since they all have this behavior. Discussion: https://postgr.es/m/1507620895.4152.1.camel@cybertec.at	2017-10-11 16:57:14 -04:00
Robert Haas	20d210bf5b	Fix mistakes in comments. Masahiko Sawada Discussion: http://postgr.es/m/CAD21AoBsfYsMHD6_SL9iN3n_Foaa+oPbL5jG55DxU1ChaujqwQ@mail.gmail.com	2017-10-11 15:55:10 -04:00
Tom Lane	118e99c3d7	Fix low-probability loss of NOTIFY messages due to XID wraparound. Up to now async.c has used TransactionIdIsInProgress() to detect whether a notify message's source transaction is still running. However, that function has a quick-exit path that reports that XIDs before RecentXmin are no longer running. If a listening backend is doing nothing but listening, and not running any queries, there is nothing that will advance its value of RecentXmin. Once 2 billion transactions elapse, the RecentXmin check causes active transactions to be reported as not running. If they aren't committed yet according to CLOG, async.c decides they aborted and discards their messages. The timing for that is a bit tight but it can happen when multiple backends are sending notifies concurrently. The net symptom therefore is that a sufficiently-long-surviving listen-only backend starts to miss some fraction of NOTIFY traffic, but only under heavy load. The only function that updates RecentXmin is GetSnapshotData(). A brute-force fix would therefore be to take a snapshot before processing incoming notify messages. But that would add cycles, as well as contention for the ProcArrayLock. We can be smarter: having taken the snapshot, let's use that to check for running XIDs, and not call TransactionIdIsInProgress() at all. In this way we reduce the number of ProcArrayLock acquisitions from one per message to one per notify interrupt; that's the same under light load but should be a benefit under heavy load. Light testing says that this change is a wash performance-wise for normal loads. I looked around for other callers of TransactionIdIsInProgress() that might be at similar risk, and didn't find any; all of them are inside transactions that presumably have already taken a snapshot. Problem report and diagnosis by Marko Tiikkaja, patch by me. Back-patch to all supported branches, since it's been like this since 9.0. Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org	2017-10-11 14:28:33 -04:00
Andres Freund	fffd651e83	Rewrite strnlen replacement implementation from `8a241792f9`. The previous placement of the fallback implementation in libpgcommon was problematic, because libpqport functions need strnlen functionality. Move replacement into libpgport. Provide strnlen() under its posix name, instead of pg_strnlen(). Fix stupid configure bug, executing the test only when compiled with threading support. Author: Andres Freund Discussion: https://postgr.es/m/E1e1gR2-0005fB-SI@gemulon.postgresql.org	2017-10-10 14:50:30 -07:00
Andres Freund	82c117cb90	Fix pnstrdup() to not memcpy() the maximum allowed length. The previous behaviour was dangerous if the length passed wasn't the size of the underlying buffer, but the maximum size of the underlying buffer. Author: Andres Freund Discussion: https://postgr.es/m/20161003215524.mwz5p45pcverrkyk@alap3.anarazel.de	2017-10-09 15:20:42 -07:00
Andres Freund	84ad4b036d	Reduce memory usage of targetlist SRFs. Previously nodeProjectSet only released memory once per input tuple, rather than once per returned tuple. If the computation of an individual returned tuple requires a lot of memory, that can lead to problems. Instead change things so that the expression context can be reset once per output tuple, which requires a new memory context to store SRF arguments in. This is a longstanding issue, but was hard to fix before 9.6, due to the way tSRFs where evaluated. But it's fairly easy to fix now. We could backpatch this into 10, but given there've been fewc omplaints that doesn't seem worth the risk so far. Reported-By: Lucas Fairchild Author: Andres Freund, per discussion with Tom Lane Discussion: https://postgr.es/m/4514.1507318623@sss.pgh.pa.us	2017-10-08 15:08:25 -07:00
Tom Lane	643c27e36f	Increase distance between flush requests during bulk file copies. copy_file() reads and writes data 64KB at a time (with default BLCKSZ), and historically has issued a pg_flush_data request after each write. This turns out to interact really badly with macOS's new APFS file system: a large file copy takes over 100X longer than it ought to on APFS, as reported by Brent Dearth. While that's arguably a macOS bug, it's not clear whether Apple will do anything about it in the near future, and in any case experimentation suggests that issuing flushes a bit less often can be helpful on other platforms too. Hence, rearrange the logic in copy_file() so that flush requests are issued once per N writes rather than every time through the loop. I set the FLUSH_DISTANCE to 32MB on macOS (any less than that still results in a noticeable speed degradation on APFS), but 1MB elsewhere. In limited testing on Linux and FreeBSD, this seems slightly faster than the previous code, and certainly no worse. It helps noticeably on macOS even with the older HFS filesystem. A simpler change would have been to just increase the size of the copy buffer without changing the loop logic, but that seems likely to trash the processor cache without really helping much. Back-patch to 9.6 where we introduced msync() as an implementation option for pg_flush_data(). The problem seems specific to APFS's mmap/msync support, so I don't think we need to go further back. Discussion: https://postgr.es/m/CADkxhTNv-j2jw2g8H57deMeAbfRgYBoLmVuXkC=YCFBXRuCOww@mail.gmail.com	2017-10-08 15:25:26 -04:00
Tom Lane	8ec5429e2f	Reduce "X = X" to "X IS NOT NULL", if it's easy to do so. If the operator is a strict btree equality operator, and X isn't volatile, then the clause must yield true for any non-null value of X, or null if X is null. At top level of a WHERE clause, we can ignore the distinction between false and null results, so it's valid to simplify the clause to "X IS NOT NULL". This is a useful improvement mainly because we'll get a far better selectivity estimate in most cases. Because such cases seldom arise in well-written queries, it is unappetizing to expend a lot of planner cycles looking for them ... but it turns out that there's a place we can shoehorn this in practically for free, because equivclass.c already has to detect and reject candidate equivalences of the form X = X. That doesn't catch every place that it would be valid to simplify to X IS NOT NULL, but it catches the typical case. Working harder doesn't seem justified. Patch by me, reviewed by Petr Jelinek Discussion: https://postgr.es/m/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com	2017-10-08 12:23:32 -04:00
Tom Lane	1518d07842	Fix crash when logical decoding is invoked from a PL function. The logical decoding functions do BeginInternalSubTransaction and RollbackAndReleaseCurrentSubTransaction to clean up after themselves. It turns out that AtEOSubXact_SPI has an unrecognized assumption that we always need to cancel the active SPI operation in the SPI context that surrounds the subtransaction (if there is one). That's true when the RollbackAndReleaseCurrentSubTransaction call is coming from the SPI-using function itself, but not when it's happening inside some unrelated function invoked by a SPI query. In practice the affected callers are the various PLs. To fix, record the current subtransaction ID when we begin a SPI operation, and clean up only if that ID is the subtransaction being canceled. Also, remove AtEOSubXact_SPI's assertion that it must have cleaned up the surrounding SPI context's active tuptable. That's proven wrong by the same test case. Also clarify (or, if you prefer, reinterpret) the calling conventions for _SPI_begin_call and _SPI_end_call. The memory context cleanup in the latter means that these have always had the flavor of a matched resource-management pair, but they weren't documented that way before. Per report from Ben Chobot. Back-patch to 9.4 where logical decoding came in. In principle, the SPI changes should go all the way back, since the problem dates back to commit `7ec1c5a86`. But given the lack of field complaints it seems few people are using internal subtransactions in this way. So I don't feel a need to take any risks in 9.2/9.3. Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com	2017-10-06 19:18:58 -04:00
Robert Haas	45866c7550	Copy information from the relcache instead of pointing to it. We have the relations continuously locked, but not open, so relcache pointers are not guaranteed to be stable. Per buildfarm member prion. Ashutosh Bapat. I fixed a typo. Discussion: http://postgr.es/m/CAFjFpRcRBqoKLZSNmRsjKr81uEP=ennvqSQaXVCCBTXvJ2rW+Q@mail.gmail.com	2017-10-06 15:28:07 -04:00
Tom Lane	a1c2c430d3	Fix intra-query memory leakage in nodeProjectSet.c. Both ExecMakeFunctionResultSet() and evaluation of simple expressions need to be done in the per-tuple memory context, not per-query, else we leak data until end of query. This is a consideration that was missed while refactoring code in the ProjectSet patch (note that in pre-v10, ExecMakeFunctionResult is called in the per-tuple context). Per bug #14843 from Ben M. Diagnosed independently by Andres and myself. Discussion: https://postgr.es/m/20171005230321.28561.15927@wrigleys.postgresql.org	2017-10-06 14:28:42 -04:00
Tom Lane	6b87416c9a	Fix access-off-end-of-array in clog.c. Sloppy loop coding in set_status_by_pages() resulted in fetching one array element more than it should from the subxids[] array. The odds of this resulting in SIGSEGV are pretty small, but we've certainly seen that happen with similar mistakes elsewhere. While at it, we can get rid of an extra TransactionIdToPage() calculation per loop. Per report from David Binderman. Back-patch to all supported branches, since this code is quite old. Discussion: https://postgr.es/m/HE1PR0802MB2331CBA919CBFFF0C465EB429C710@HE1PR0802MB2331.eurprd08.prod.outlook.com	2017-10-06 12:20:23 -04:00
Alvaro Herrera	a5736bf754	Fix traversal of half-frozen update chains When some tuple versions in an update chain are frozen due to them being older than freeze_min_age, the xmax/xmin trail can become broken. This breaks HOT (and probably other things). A subsequent VACUUM can break things in more serious ways, such as leaving orphan heap-only tuples whose root HOT redirect items were removed. This can be seen because index creation (or REINDEX) complain like ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t" Because of relfrozenxid contraints, we cannot avoid the freezing of the early tuples, so we must cope with the results: whenever we see an Xmin of FrozenTransactionId, consider it a match for whatever the previous Xmax value was. This problem seems to have appeared in 9.3 with multixact changes, though strictly speaking it seems unrelated. Since 9.4 we have commit `37484ad2a` "Change the way we mark tuples as frozen", so the fix is simple: just compare the raw Xmin (still stored in the tuple header, since freezing merely set an infomask bit) to the Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so the original value is lost and we have nothing to compare the Xmax with. To cope with that case we need to compare the Xmin with FrozenXid, assume it's a match, and hope for the best. Sadly, since you can pg_upgrade a 9.3 instance containing half-frozen pages to newer releases, we need to keep the old check in newer versions too, which seems a bit brittle; I hope we can somehow get rid of that. I didn't optimize the new function for performance. The new coding is probably a bit slower than before, since there is a function call rather than a straight comparison, but I'd rather have it work correctly than be fast but wrong. This is a followup after `20b6552242` fixed a few related problems. Apparently, in 9.6 and up there are more ways to get into trouble, but in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so there must be a separate bug. Reported-by: Peter Geoghegan Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood, Yi Wen Wong, Álvaro Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com	2017-10-06 17:20:01 +02:00
Robert Haas	f49842d1ee	Basic partition-wise join functionality. Instead of joining two partitioned tables in their entirety we can, if it is an equi-join on the partition keys, join the matching partitions individually. This involves teaching the planner about "other join" rels, which are related to regular join rels in the same way that other member rels are related to baserels. This can use significantly more CPU time and memory than regular join planning, because there may now be a set of "other" rels not only for every base relation but also for every join relation. In most practical cases, this probably shouldn't be a problem, because (1) it's probably unusual to join many tables each with many partitions using the partition keys for all joins and (2) if you do that scenario then you probably have a big enough machine to handle the increased memory cost of planning and (3) the resulting plan is highly likely to be better, so what you spend in planning you'll make up on the execution side. All the same, for now, turn this feature off by default. Currently, we can only perform joins between two tables whose partitioning schemes are absolutely identical. It would be nice to cope with other scenarios, such as extra partitions on one side or the other with no match on the other side, but that will have to wait for a future patch. Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit Khandekar, and by me. A few final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com	2017-10-06 11:11:10 -04:00
Tom Lane	fe9ba28ee8	Fix typo in README. s/BeginInternalSubtransaction/BeginInternalSubTransaction/	2017-10-05 15:06:01 -04:00
Robert Haas	6476b26115	On CREATE TABLE, consider skipping validation of subpartitions. This is just like commit `14f67a8ee2`, but for CREATE PARTITION rather than ATTACH PARTITION. Jeevan Ladhe, with test case changes by me. Discussion: http://postgr.es/m/CAOgcT0MWwG8WBw8frFMtRYHAgDD=tpt6U7WcsO_L2k0KYpm4Jg@mail.gmail.com	2017-10-05 13:23:28 -04:00
Robert Haas	14f67a8ee2	On attach, consider skipping validation of subpartitions individually. If the table attached as a partition is itself partitioned, individual partitions might have constraints strong enough to skip scanning the table even if the table actually attached does not. This is pretty cheap to check, and possibly a big win if it works out. Amit Langote, with test case changes by me. Discussion: http://postgr.es/m/1f08b844-0078-aa8d-452e-7af3bf77d05f@lab.ntt.co.jp	2017-10-05 13:06:46 -04:00
Robert Haas	c31e9d4baf	Improve error message when skipping scan of default partition. It seems like a good idea to clearly distinguish between skipping the scan of the new partition itself and skipping the scan of the default partition. Amit Langote Discussion: http://postgr.es/m/1f08b844-0078-aa8d-452e-7af3bf77d05f@lab.ntt.co.jp	2017-10-05 12:19:40 -04:00
Robert Haas	e9baa5e9fa	Allow DML commands that create tables to use parallel query. Haribabu Kommi, reviewed by Dilip Kumar and Rafia Sabih. Various cosmetic changes by me to explain why this appears to be safe but allowing inserts in parallel mode in general wouldn't be. Also, I removed the REFRESH MATERIALIZED VIEW case from Haribabu's patch, since I'm not convinced that case is OK, and hacked on the documentation somewhat. Discussion: http://postgr.es/m/CAJrrPGdo5bak6qnPWe8Kpi8g_jfQEs-G4SYmG9y+OFaw2-dPvA@mail.gmail.com	2017-10-05 11:40:48 -04:00
Tom Lane	4d85c2900b	Improve comments in vacuum_rel() and analyze_rel(). Remove obsolete references to get_rel_oids(). Avoid listing specific relkinds in the comments, since we seem unable to keep such things in sync with the code, and it's not all that helpful anyhow. Noted by Michael Paquier, though I rewrote the comments a bit more. Discussion: https://postgr.es/m/CAB7nPqTWiN9zwKTaOrsnKiGDChqRt7C1+CiiDk4N4OMn92rs6A@mail.gmail.com	2017-10-05 10:47:47 -04:00
Robert Haas	4b2ba1fe02	Fix typo. Etsuro Fujita Discussion: http://postgr.es/m/1b2e9ac7-b99a-2769-5e42-afdf62bfa7fa@lab.ntt.co.jp	2017-10-05 08:45:24 -04:00
Robert Haas	c097b271e8	Fix more user-visible elog() calls. Michael Paquier discovered that this could be triggered via SQL; give a nicer message instead. Patch by Michael Paquier, reviewed by Masahiko Sawada. Discussion: http://postgr.es/m/CAB7nPqQtPg+LKKtzdKN26judHcvPZ0s1gNigzOT4j8CYuuuBYg@mail.gmail.com	2017-10-05 07:58:02 -04:00
Peter Eisentraut	036166f26e	Document and use SPI_result_code_string() A lot of semi-internal code just prints out numeric SPI error codes, which is not very helpful. We already have an API function to convert the codes to a string, so let's make more use of that. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-10-04 22:14:21 -04:00
Peter Eisentraut	582bbcf37f	Move SPI error reporting out of ri_ReportViolation() These are two completely unrelated code paths, so it doesn't make sense to pack them into one function. Add attribute noreturn to ri_ReportViolation(). Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-10-04 22:14:21 -04:00
Andres Freund	212e6f34d5	Replace binary search in fmgr_isbuiltin with a lookup array. Turns out we have enough functions that the binary search is quite noticeable in profiles. Thus have Gen_fmgrtab.pl build a new mapping from a builtin function's oid to an index in the existing fmgr_builtins array. That keeps the additional memory usage at a reasonable amount. Author: Andres Freund, with input from Tom Lane Discussion: https://postgr.es/m/20170914065128.a5sk7z4xde5uy3ei@alap3.anarazel.de	2017-10-04 00:22:38 -07:00
Andres Freund	18f791ab2b	Move genbki.pl's find_defined_symbol to Catalog.pm. Will be used in Gen_fmgrtab.pl in a followup commit.	2017-10-04 00:11:36 -07:00
Tom Lane	11d8d72c27	Allow multiple tables to be specified in one VACUUM or ANALYZE command. Not much to say about this; does what it says on the tin. However, formerly, if there was a column list then the ANALYZE action was implied; now it must be specified, or you get an error. This is because it would otherwise be a bit unclear what the user meant if some tables have column lists and some don't. Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some editorialization by me Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com	2017-10-03 18:53:44 -04:00
Tom Lane	45f9d08684	Fix race condition with unprotected use of a latch pointer variable. Commit `597a87ccc` introduced a latch pointer variable to replace use of a long-lived shared latch in the shared WalRcvData structure. This was not well thought out, because there are now hazards of the pointer variable changing while it's being inspected by another process. This could obviously lead to a core dump in code like if (WalRcv->latch) SetLatch(WalRcv->latch); and there's a more remote risk of a torn read, if we have any platforms where reading/writing a pointer is not atomic. An actual problem would occur only if the walreceiver process exits (gracefully) while the startup process is trying to signal it, but that seems well within the realm of possibility. To fix, treat the pointer variable (not the referenced latch) as being protected by the WalRcv->mutex spinlock. There remains a race condition that we could apply SetLatch to a process latch that no longer belongs to the walreceiver, but I believe that's harmless: at worst it'd cause an extra wakeup of the next process to use that PGPROC structure. Back-patch to v10 where the faulty code was added. Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us	2017-10-03 14:00:56 -04:00
Alvaro Herrera	89e434b59c	Fix coding rules violations in walreceiver.c 1. Since commit `b1a9bad9e7` we had pstrdup() inside a spinlock-protected critical section; reported by Andreas Seltenreich. Turn those into strlcpy() to stack-allocated variables instead. Backpatch to 9.6. 2. Since commit `9ed551e0a4` we had a pfree() uselessly inside a spinlock-protected critical section. Tom Lane noticed in code review. Move down. Backpatch to 9.6. 3. Since commit `64233902d2` we had GetCurrentTimestamp() (a kernel call) inside a spinlock-protected critical section. Tom Lane noticed in code review. Move it up. Backpatch to 9.2. 4. Since commit `1bb2558046` we did elog(PANIC) while holding spinlock. Tom Lane noticed in code review. Release spinlock before dying. Backpatch to 9.2. Discussion: https://postgr.es/m/87h8vhtgj2.fsf@ansel.ydns.eu	2017-10-03 14:58:25 +02:00
Andres Freund	0ba99c84e8	Replace most usages of ntoh[ls] and hton[sl] with pg_bswap.h. All postgres internal usages are replaced, it's just libpq example usages that haven't been converted. External users of libpq can't generally rely on including postgres internal headers. Note that this includes replacing open-coded byte swapping of 64bit integers (using two 32 bit swaps) with a single 64bit swap. Where it looked applicable, I have removed netinet/in.h and arpa/inet.h usage, which previously provided the relevant functionality. It's perfectly possible that I missed other reasons for including those, the buildfarm will tell. Author: Andres Freund Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de	2017-10-01 15:36:14 -07:00
Tom Lane	c12d570fa1	Support arrays over domains. Allowing arrays with a domain type as their element type was left un-done in the original domain patch, but not for any very good reason. This omission leads to such surprising results as array_agg() not working on a domain column, because the parser can't identify a suitable output type for the polymorphic aggregate. In order to fix this, first clean up the APIs of coerce_to_domain() and some internal functions in parse_coerce.c so that we consistently pass around a CoercionContext along with CoercionForm. Previously, we sometimes passed an "isExplicit" boolean flag instead, which is strictly less information; and coerce_to_domain() didn't even get that, but instead had to reverse-engineer isExplicit from CoercionForm. That's contrary to the documentation in primnodes.h that says that CoercionForm only affects display and not semantics. I don't think this change fixes any live bugs, but it makes things more consistent. The main reason for doing it though is that now build_coercion_expression() receives ccontext, which it needs in order to be able to recursively invoke coerce_to_target_type(). Next, reimplement ArrayCoerceExpr so that the node does not directly know any details of what has to be done to the individual array elements while performing the array coercion. Instead, the per-element processing is represented by a sub-expression whose input is a source array element and whose output is a target array element. This simplifies life in parse_coerce.c, because it can build that sub-expression by a recursive invocation of coerce_to_target_type(). The executor now handles the per-element processing as a compiled expression instead of hard-wired code. The main advantage of this is that we can use a single ArrayCoerceExpr to handle as many as three successive steps per element: base type conversion, typmod coercion, and domain constraint checking. The old code used two stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty inefficient, and adding yet another array deconstruction to do domain constraint checking seemed very unappetizing. In the case where we just need a single, very simple coercion function, doing this straightforwardly leads to a noticeable increase in the per-array-element runtime cost. Hence, add an additional shortcut evalfunc in execExprInterp.c that skips unnecessary overhead for that specific form of expression. The runtime speed of simple cases is within 1% or so of where it was before, while cases that previously required two levels of array processing are significantly faster. Finally, create an implicit array type for every domain type, as we do for base types, enums, etc. Everything except the array-coercion case seems to just work without further effort. Tom Lane, reviewed by Andrew Dunstan Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us	2017-09-30 13:40:56 -04:00
Tom Lane	19de0ab23c	Fix inadequate locking during get_rel_oids(). get_rel_oids used to not take any relation locks at all, but that stopped being a good idea with commit `3c3bb9933`, which inserted a syscache lookup into the function. A concurrent DROP TABLE could now produce "cache lookup failed", which we don't want to have happen in normal operation. The best solution seems to be to transiently take a lock on the relation named by the RangeVar (which also makes the result of RangeVarGetRelid a lot less spongy). But we shouldn't hold the lock beyond this function, because we don't want VACUUM to lock more than one table at a time. (That would not be a big problem right now, but it will become one after the pending feature patch to allow multiple tables to be named in VACUUM.) In passing, adjust vacuum_rel and analyze_rel to document that we don't trust the passed RangeVar to be accurate, and allow the RangeVar to possibly be NULL --- which it is anyway for a whole-database VACUUM, though we accidentally didn't crash for that case. The passed RangeVar is in fact inaccurate when dealing with a child partition, as of v10, and it has been wrong for a whole long time in the case of vacuum_rel() recursing to a TOAST table. None of these things present visible bugs up to now, because the passed RangeVar is in fact only consulted for autovacuum logging, and in that particular context it's always accurate because autovacuum doesn't let vacuum.c expand partitions nor recurse to toast tables. Still, this seems like trouble waiting to happen, so let's nail the door at least partly shut. (Further cleanup is planned, in HEAD only, as part of the pending feature patch.) Fix some sadly inaccurate/obsolete comments too. Back-patch to v10. Michael Paquier and Tom Lane Discussion: https://postgr.es/m/25023.1506107590@sss.pgh.pa.us	2017-09-29 16:26:31 -04:00
Tom Lane	136ab7c5a5	Marginal improvement for generated code in execExprInterp.c. Avoid the coding pattern "*op->resvalue = f();", as some compilers think that requires them to evaluate "op->resvalue" before the function call. Unless there are lots of free registers, this can lead to a useless register spill and reload across the call. I changed all the cases like this in ExecInterpExpr(), but didn't bother in the out-of-line opcode eval subroutines, since those are presumably not as performance-critical. Discussion: https://postgr.es/m/2508.1506630094@sss.pgh.pa.us	2017-09-29 11:32:05 -04:00
Peter Eisentraut	5373bc2a08	Add background worker type Add bgw_type field to background worker structure. It is intended to be set to the same value for all workers of the same type, so they can be grouped in pg_stat_activity, for example. The backend_type column in pg_stat_activity now shows bgw_type for a background worker. The ps listing also no longer calls out that a process is a background worker but just show the bgw_type. That way, being a background worker is more of an implementation detail now that is not shown to the user. However, most log messages still refer to 'background worker "%s"'; otherwise constructing sensible and translatable log messages would become tricky. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2017-09-29 11:08:24 -04:00
Robert Haas	8b304b8b72	Remove replacement selection sort. At the time replacement_sort_tuples was introduced, there were still cases where replacement selection sort noticeably outperformed using quicksort even for the first run. However, those cases seem to have evaporated as a result of further improvements made since that time (and perhaps also advances in CPU technology). So remove replacement selection and the controlling GUC entirely. This makes tuplesort.c noticeably simpler and probably paves the way for further optimizations someone might want to do later. Peter Geoghegan, with review and testing by Tomas Vondra and me. Discussion: https://postgr.es/m/CAH2-WzmmNjG_K0R9nqYwMq3zjyJJK+hCbiZYNGhAy-Zyjs64GQ@mail.gmail.com	2017-09-29 10:25:44 -04:00
Alvaro Herrera	20b6552242	Fix freezing of a dead HOT-updated tuple Vacuum calls page-level HOT prune to remove dead HOT tuples before doing liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But concurrent transaction commit/abort may turn DEAD some of the HOT tuples that survived the prune, before HeapTupleSatisfiesVacuum tests them. This happens to activate the code that decides to freeze the tuple ... which resuscitates it, duplicating data. (This is especially bad if there's any unique constraints, because those are now internally violated due to the duplicate entries, though you won't know until you try to REINDEX or dump/restore the table.) One possible fix would be to simply skip doing anything to the tuple, and hope that the next HOT prune would remove it. But there is a problem: if the tuple is older than freeze horizon, this would leave an unfrozen XID behind, and if no HOT prune happens to clean it up before the containing pg_clog segment is truncated away, it'd later cause an error when the XID is looked up. Fix the problem by having the tuple freezing routines cope with the situation: don't freeze the tuple (and keep it dead). In the cases that the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag so that there is no need to look up the XID in pg_clog later on. An isolation test is included, authored by Michael Paquier, loosely based on Daniel Wood's original reproducer. It only tests one particular scenario, though, not all the possible ways for this problem to surface; it be good to have a more reliable way to test this more fully, but it'd require more work. In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql I outlined another test case (more closely matching Dan Wood's) that exposed a few more ways for the problem to occur. Backpatch all the way back to 9.3, where this problem was introduced by multixact juggling. In branches 9.3 and 9.4, this includes a backpatch of commit e5ff9fefcd50 (of 9.5 era), since the original is not correctable without matching the coding pattern in 9.5 up. Reported-by: Daniel Wood Diagnosed-by: Daniel Wood Reviewed-by: Yi Wen Wong, Michaël Paquier Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com	2017-09-28 16:44:01 +02:00
Tom Lane	7769fc000a	Fix behavior when converting a float infinity to numeric. float8_numeric() and float4_numeric() failed to consider the possibility that the input is an IEEE infinity. The results depended on the platform-specific behavior of sprintf(): on most platforms you'd get something like ERROR: invalid input syntax for type numeric: "inf" but at least on Windows it's possible for the conversion to succeed and deliver a finite value (typically 1), due to a nonstandard output format from sprintf and lack of syntax error checking in these functions. Since our numeric type lacks the concept of infinity, a suitable conversion is impossible; the best thing to do is throw an explicit error before letting sprintf do its thing. While at it, let's use snprintf not sprintf. Overrunning the buffer should be impossible if sprintf does what it's supposed to, but this is cheap insurance against a stack smash if it doesn't. Problem reported by Taiki Kondo. Patch by me based on fix suggestion from KaiGai Kohei. Back-patch to all supported branches. Discussion: https://postgr.es/m/12A9442FBAE80D4E8953883E0B84E088C8C7A2@BPXM01GP.gisp.nec.co.jp	2017-09-27 17:05:53 -04:00
Tom Lane	28e0727076	Revert to 9.6 treatment of ALTER TYPE enumtype ADD VALUE. This reverts commit `15bc038f9`, along with the followon commits `1635e80d3` and `984c92074` that tried to clean up the problems exposed by bug #14825. The result was incomplete because it failed to address parallel-query requirements. With 10.0 release so close upon us, now does not seem like the time to be adding more code to fix that. I hope we can un-revert this code and add the missing parallel query support during the v11 cycle. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org	2017-09-27 16:14:43 -04:00
Tom Lane	9a50a93c7b	Improve wording of error message added in commit `714805010`. Per suggestions from Peter Eisentraut and David Johnston. Back-patch, like the previous commit. Discussion: https://postgr.es/m/E1dv9jI-0006oT-Fn@gemulon.postgresql.org	2017-09-26 15:25:56 -04:00
Tom Lane	5ea96efaa0	Fix failure-to-read-man-page in commit `899bd785c`. posix_fallocate() is not quite a drop-in replacement for fallocate(), because it is defined to return the error code as its function result, not in "errno". I (tgl) missed this because RHEL6's version seems to set errno as well. That is not the case on more modern Linuxen, though, as per buildfarm results. Aside from fixing the return-convention confusion, remove the test for ENOSYS; we expect that glibc will mask that for posix_fallocate, though it does not for fallocate. Keep the test for EINTR, because POSIX specifies that as a possible result, and buildfarm results suggest that it can happen in practice. Back-patch to 9.4, like the previous commit. Thomas Munro Discussion: https://postgr.es/m/1002664500.12301802.1471008223422.JavaMail.yahoo@mail.yahoo.com	2017-09-26 13:42:53 -04:00
Tom Lane	984c92074d	Remove heuristic same-transaction test from check_safe_enum_use(). The blacklist mechanism added by the preceding commit directly fixes most of the practical cases that the same-transaction test was meant to cover. What remains is use-cases like begin; create type e as enum('x'); alter type e add value 'y'; -- use 'y' somehow commit; However, because the same-transaction test is heuristic, it fails on small variants of that, such as renaming the type or changing its owner. Rather than try to explain the behavior to users, let's remove it and just have a rule that the newly added value can't be used before being committed, full stop. Perhaps later it will be worth the implementation effort and overhead to have a more accurate test for type-was-created-in-this-transaction. We'll wait for some field experience with v10 before deciding to do that. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org	2017-09-26 13:14:46 -04:00
Tom Lane	1635e80d30	Use a blacklist to distinguish original from add-on enum values. Commit `15bc038f9` allowed ALTER TYPE ADD VALUE to be executed inside transaction blocks, by disallowing the use of the added value later in the same transaction, except under limited circumstances. However, the test for "limited circumstances" was heuristic and could reject references to enum values that were created during CREATE TYPE AS ENUM, not just later. This breaks the use-case of restoring pg_dump scripts in a single transaction, as reported in bug #14825 from Balazs Szilfai. We can improve this by keeping a "blacklist" table of enum value OIDs created by ALTER TYPE ADD VALUE during the current transaction. Any visible-but-uncommitted value whose OID is not in the blacklist must have been created by CREATE TYPE AS ENUM, and can be used safely because it could not have a lifespan shorter than its parent enum type. This change also removes the restriction that a renamed enum value can't be used before being committed (unless it was on the blacklist). Andrew Dunstan, with cosmetic improvements by me. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org	2017-09-26 13:14:46 -04:00
Peter Eisentraut	ab28feae2b	Handle heap rewrites better in logical replication A FOR ALL TABLES publication naturally considers all base tables to be a candidate for replication. This includes transient heaps that are created during a table rewrite during DDL. This causes failures on the subscriber side because it will not have a table like pg_temp_16386 to receive data (and if it did, it would be the wrong table). The prevent this problem, we filter out any tables that match this naming pattern and match an actual table from FOR ALL TABLES publications. This is only a heuristic, meaning that user tables that match that naming could accidentally be omitted. A more robust solution might require an explicit marking of such tables in pg_class somehow. Reported-by: yxq <yxq@o2.pl> Bug: #14785 Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-09-26 10:13:43 -04:00
Robert Haas	22c5e73562	Remove lsn from HashScanPosData. This was intended as infrastructure for weakening VACUUM's locking requirements, similar to what was done for btree indexes in commit `2ed5b87f96`. However, for hash indexes, it seems that the improvements which are possible are actually extremely marginal. Furthermore, performing the LSN cross-check will end up skipping cleanup far more often than is necessary; we only care about page modifications due to a VACUUM, but the LSN check will fail if ANY modification has occurred. So, rather than pressing forward with that "optimization", just rip the LSN field out. Patch by me, reviewed by Ashutosh Sharma and Amit Kapila Discussion: http://postgr.es/m/CAA4eK1JxqqcuC5Un7YLQVhOYSZBS+t=3xqZuEkt5RyquyuxpwQ@mail.gmail.com	2017-09-26 09:16:45 -04:00
Robert Haas	79a4a665c0	Fix trivial mistake in README. You might think I (Robert) could manage to count to five without messing it up, but if you did, you would be wrong. Amit Kapila Discussion: http://postgr.es/m/CAA4eK1JxqqcuC5Un7YLQVhOYSZBS+t=3xqZuEkt5RyquyuxpwQ@mail.gmail.com	2017-09-26 09:01:43 -04:00
Tom Lane	899bd785c0	Avoid SIGBUS on Linux when a DSM memory request overruns tmpfs. On Linux, shared memory segments created with shm_open() are backed by swap files created in tmpfs. If the swap file needs to be extended, but there's no tmpfs space left, you get a very unfriendly SIGBUS trap. To avoid this, force allocation of the full request size when we create the segment. This adds a few cycles, but none that we wouldn't expend later anyway, assuming the request isn't hugely bigger than the actual need. Make this code #ifdef __linux__, because (a) there's not currently a reason to think the same problem exists on other platforms, and (b) applying posix_fallocate() to an FD created by shm_open() isn't very portable anyway. Back-patch to 9.4 where the DSM code came in. Thomas Munro, per a bug report from Amul Sul Discussion: https://postgr.es/m/1002664500.12301802.1471008223422.JavaMail.yahoo@mail.yahoo.com	2017-09-25 16:09:19 -04:00
Tom Lane	716ea626a8	Make construct_[md_]array return a valid empty array for zero-size input. If construct_array() or construct_md_array() were given a dimension of zero, they'd produce an array that contains no elements but has positive dimension. This violates a general expectation that empty arrays should have ndims = 0; in particular, while arrays like this print as empty, they don't compare equal to other empty arrays. Up to now we've expected callers to avoid making such calls and instead be careful to call construct_empty_array() if there would be no elements. But this has always been an easily missed case, and we've repeatedly had to fix callers to do it right. In bug #14826, Erwin Brandstetter pointed out yet another such oversight, in ts_lexize(); and a bit of examination of other call sites found at least two more with similar issues. So let's fix the problem centrally and permanently by changing these two functions to construct a proper zero-D empty array whenever the array would be empty. This renders a few explicit calls of construct_empty_array() redundant, but the only such place I found that really seemed worth changing was in ExecEvalArrayExpr(). Although this fixes some very old bugs, no back-patch: the problem is pretty minor and the risk of changing behavior seems to outweigh the benefit in stable branches. Discussion: https://postgr.es/m/20170923125723.1448.39412@wrigleys.postgresql.org Discussion: https://postgr.es/m/20570.1506198383@sss.pgh.pa.us	2017-09-25 11:55:24 -04:00
Peter Eisentraut	6dda0998af	Allow ICU to use SortSupport on Windows with UTF-8 There is no reason to ever prevent the use of SortSupport on Windows when ICU locales are used. We previously avoided SortSupport on Windows with UTF-8 server encoding and a non C-locale due to restrictions in Windows' libc functionality. This is now considered to be a restriction in one platform's libc collation provider, and not a more general platform restriction. Reported-by: Peter Geoghegan <pg@bowt.ie>	2017-09-24 07:55:24 -04:00
Tom Lane	24541ffd78	... and the very same bug in publicationListToArray(). Sigh.	2017-09-23 15:16:48 -04:00
Tom Lane	737639017c	Fix bogus size calculation in strlist_to_textarray(). It's making an array of Datum, not an array of text *. The mistake is harmless since those are currently the same size, but it's still wrong.	2017-09-23 15:01:59 -04:00
Tom Lane	335f3d04e4	Improve memory management in autovacuum.c. Invoke vacuum(), as well as "work item" processing, in the PortalContext that do_autovacuum() has manufactured, which will be reset before each such invocation. This ensures cleanup of any memory leaked by these operations. It also avoids the rather dangerous practice of calling vacuum() in a context that vacuum() itself will destroy while it runs. There's no known live bug there, but it's not hard to imagine introducing one if we leave it like this. Tom Lane, reviewed by Michael Paquier and Alvaro Herrera Discussion: https://postgr.es/m/13849.1506114543@sss.pgh.pa.us	2017-09-23 13:28:16 -04:00
Peter Eisentraut	0c5803b450	Refactor new file permission handling The file handling functions from fd.c were called with a diverse mix of notations for the file permissions when they were opening new files. Almost all files created by the server should have the same permissions set. So change the API so that e.g. OpenTransientFile() automatically uses the standard permissions set, and OpenTransientFilePerm() is a new function that takes an explicit permissions set for the few cases where it is needed. This also saves an unnecessary argument for call sites that are just opening an existing file. While we're reviewing these APIs, get rid of the FileName typedef and use the standard const char * for the file name and mode_t for the file mode. This makes these functions match other file handling functions and removes an unnecessary layer of mysteriousness. We can also get rid of a few casts that way. Author: David Steele <david@pgmasters.net>	2017-09-23 10:16:18 -04:00
Peter Eisentraut	aa6b7b72d9	Fix saving and restoring umask In two cases, we set a different umask for some piece of code and restore it afterwards. But if the contained code errors out, the umask is not restored. So add TRY/CATCH blocks to fix that.	2017-09-22 17:10:36 -04:00
Andres Freund	791961f59b	Add inline murmurhash32(uint32) function. The function already existed in tidbitmap.c but more users requiring fast hashing of 32bit ints are coming up. Author: Andres Freund Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de	2017-09-22 13:38:42 -07:00
Robert Haas	6a2fa09c0c	For wal_consistency_checking, mask page checksum as well as page LSN. If the LSN is different, the checksum will be different, too. Ashwin Agrawal, reviewed by Michael Paquier and Kuntal Ghosh Discussion: http://postgr.es/m/CALfoeis5iqrAU-+JAN+ZzXkpPr7+-0OAGv7QUHwFn=-wDy4o4Q@mail.gmail.com	2017-09-22 14:28:22 -04:00
Robert Haas	7c75ef5715	hash: Implement page-at-a-time scan. Commit `09cb5c0e7d` added a similar optimization to btree back in 2006, but nobody bothered to implement the same thing for hash indexes, probably because they weren't WAL-logged and had lots of other performance problems as well. As with the corresponding btree case, this eliminates the problem of potentially needing to refind our position within the page, and cuts down on pin/unpin traffic as well. Ashutosh Sharma, reviewed by Alexander Korotkov, Jesper Pedersen, Amit Kapila, and me. Some final edits to comments and README by me. Discussion: http://postgr.es/m/CAE9k0Pm3KTx93K8_5j6VMzG4h5F+SyknxUwXrN-zqSZ9X8ZS3w@mail.gmail.com	2017-09-22 13:56:27 -04:00
Tom Lane	ed87e19807	Mop-up for commit `85feb77aa0`. Adjust commentary in regc_pg_locale.c to remove mention of the possibility of not having <wctype.h> functions, since we no longer consider that. Eliminate duplicate code in wparser_def.c by generalizing the p_iswhat macro to take a parameter saying what to return for non-ASCII chars in C locale. (That's not really a consequence of the USE_WIDE_UPPER_LOWER-ectomy, but I noticed it while doing that.)	2017-09-22 11:35:12 -04:00
Tom Lane	85feb77aa0	Assume wcstombs(), towlower(), and sibling functions are always present. These functions are required by SUS v2, which is our minimum baseline for Unix platforms, and are present on all interesting Windows versions as well. Even our oldest buildfarm members have them. Thus, we were not testing the "!USE_WIDE_UPPER_LOWER" code paths, which explains why the bug fixed in commit `e6023ee7f` escaped detection. Per discussion, there seems to be no more real-world value in maintaining this option. Hence, remove the configure-time tests for wcstombs() and towlower(), remove the USE_WIDE_UPPER_LOWER symbol, and remove all the !USE_WIDE_UPPER_LOWER code. There's not actually all that much of the latter, but simplifying the #if nests is a win in itself. Discussion: https://postgr.es/m/20170921052928.GA188913@rfd.leadboat.com	2017-09-22 11:00:58 -04:00
Peter Eisentraut	e6023ee7fa	Fix build with !USE_WIDE_UPPER_LOWER The placement of the ifdef blocks in formatting.c was pretty bogus, so the code failed to compile if USE_WIDE_UPPER_LOWER was not defined. Reported-by: Peter Geoghegan <pg@bowt.ie> Reported-by: Noah Misch <noah@leadboat.com>	2017-09-22 09:26:38 -04:00
Tom Lane	7148050105	Give a better error for duplicate entries in VACUUM/ANALYZE column list. Previously, the code didn't think about this case and would just try to analyze such a column twice. That would fail at the point of inserting the second version of the pg_statistic row, with obscure error messsages like "duplicate key value violates unique constraint" or "tuple already updated by self", depending on context and PG version. We could allow the case by ignoring duplicate column specifications, but it seems better to reject it explicitly. The bogus error messages seem like arguably a bug, so back-patch to all supported versions. Nathan Bossart, per a report from Michael Paquier, and whacked around a bit by me. Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com	2017-09-21 18:13:11 -04:00
Andrew Dunstan	28ae524bbf	Quieten warnings about unused variables These variables are only ever written to in assertion-enabled builds, and the latest Microsoft compilers complain about such variables in non-assertion-enabled builds. Apparently they don't worry so much about variables that are written to but not read from, so most of our PG_USED_FOR_ASSERTS_ONLY variables don't cause the problem. Discussion: https://postgr.es/m/7800.1505950322@sss.pgh.pa.us	2017-09-21 08:41:14 -04:00
Robert Haas	9140cf8269	Associate partitioning information with each RelOptInfo. This is not used for anything yet, but it is necessary infrastructure for partition-wise join and for partition pruning without constraint exclusion. Ashutosh Bapat, reviewed by Amit Langote and with quite a few changes, mostly cosmetic, by me. Additional review and testing of this patch series by Antonin Houska, Amit Khandekar, Rafia Sabih, Rajkumar Raghuwanshi, Thomas Munro, and Dilip Kumar. Discussion: http://postgr.es/m/CAFjFpRfneFG3H+F6BaiXemMrKF+FY-POpx3Ocy+RiH3yBmXSNw@mail.gmail.com	2017-09-20 23:39:13 -04:00
Tom Lane	7b86c2ac95	Improve dubious memory management in pg_newlocale_from_collation(). pg_newlocale_from_collation() used malloc() and strdup() directly, which is generally not per backend coding style, and it didn't bother to check for failure results, but would just SIGSEGV instead. Also, if one of the numerous error checks in the middle of the function failed, the already-allocated memory would be leaked permanently. Admittedly, it's not a lot of memory, but it could build up if this function were called repeatedly for a bad collation. The first two problems are easily cured by palloc'ing in TopMemoryContext instead of calling libc directly. We can fairly easily dodge the leakage problem for the struct pg_locale_struct by filling in a temporary variable and allocating permanent storage only once we reach the bottom of the function. It's harder to get rid of the potential leakage for ICU's copy of the collcollate string, but at least that's only allocated after most of the error checks; so live with that aspect. Back-patch to v10 where this code came in, with one or another of the ICU patches.	2017-09-20 13:52:36 -04:00
Robert Haas	57eebca03a	Fix create_lateral_join_info to handle dead relations properly. Commit `0a480502b0` broke it. Report by Andreas Seltenreich. Fix by Ashutosh Bapat. Discussion: http://postgr.es/m/874ls2vrnx.fsf@ansel.ydns.eu	2017-09-20 10:20:10 -04:00
Robert Haas	7f3a3312ab	Fix typo. Thomas Munro Discussion: http://postgr.es/m/CAEepm=2j-HAgnBUrAazwS0ry7Z_ihk+d7g+Ye3u99+6WbiGt_Q@mail.gmail.com	2017-09-20 10:07:53 -04:00
Peter Eisentraut	be87b70b61	Sync process names between ps and pg_stat_activity Remove gratuitous differences in the process names shown in pg_stat_activity.backend_type and the ps output. Reviewed-by: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>	2017-09-20 08:59:03 -04:00
Andres Freund	fc49e24fa6	Make WAL segment size configurable at initdb time. For performance reasons a larger segment size than the default 16MB can be useful. A larger segment size has two main benefits: Firstly, in setups using archiving, it makes it easier to write scripts that can keep up with higher amounts of WAL, secondly, the WAL has to be written and synced to disk less frequently. But at the same time large segment size are disadvantageous for smaller databases. So far the segment size had to be configured at compile time, often making it unrealistic to choose one fitting to a particularly load. Therefore change it to a initdb time setting. This includes a breaking changes to the xlogreader.h API, which now requires the current segment size to be configured. For that and similar reasons a number of binaries had to be taught how to recognize the current segment size. Author: Beena Emerson, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com	2017-09-19 22:03:48 -07:00
Tom Lane	2d484f9b05	Remove no-op GiST support functions in the core GiST opclasses. The preceding patch allowed us to remove useless GiST support functions. This patch actually does that for all the no-op cases in the core GiST code. This buys us whatever performance gain is to be had, and more importantly exercises the preceding patch. There remain no-op functions in the contrib GiST opclasses, but those will take more work to remove. Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com	2017-09-19 23:32:59 -04:00
Tom Lane	d3a4f89d8a	Allow no-op GiST support functions to be omitted. There are common use-cases in which the compress and/or decompress functions can be omitted, with the result being that we make no data transformation when storing or retrieving index values. Previously, you had to provide a no-op function anyway, but this patch allows such opclass support functions to be omitted. Furthermore, if the compress function is omitted, then the core code knows that the stored representation is the same as the original data. This means we can allow index-only scans without requiring a fetch function to be provided either. Previously you had to provide a no-op fetch function if you wanted IOS to work. This reportedly provides a small performance benefit in such cases, but IMO the real reason for doing it is just to reduce the amount of useless boilerplate code that has to be written for GiST opclasses. Andrey Borodin, reviewed by Dmitriy Sarafannikov Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com	2017-09-19 23:32:59 -04:00
Andres Freund	896537f078	s/NULL byte/NUL byte/ in comment refering to C string terminator. Reported-By: Robert Haas Discussion: https://postgr.es/m/CA+Tgmoa+YBvWgFST2NVoeXjVSohEpK=vqnVCsoCkhTVVxfLcVQ@mail.gmail.com	2017-09-19 16:41:07 -07:00
Andres Freund	71edbb6f66	Avoid use of non-portable strnlen() in pgstat_clip_activity(). The use of strnlen rather than strlen was just paranoia. Instead of giving up on the paranoia, just implement the safeguard differently. And add a comment explaining why we're careful. Author: Andres Freund Discussion: https://postgr.es/m/E1duOkJ-0001Mc-U5@gemulon.postgresql.org	2017-09-19 14:25:47 -07:00
Andres Freund	54b6cd589a	Speedup pgstat_report_activity by moving mb-aware truncation to read side. Previously multi-byte aware truncation was done on every pgstat_report_activity() call - proving to be a bottleneck for workloads with long query strings that execute quickly. Instead move the truncation to the read side, which commonly is executed far less frequently. That's possible because all server encodings allow to determine the length of a multi-byte string from the first byte. Rename PgBackendStatus.st_activity to st_activity_raw so existing extension users of the field break - their code has to be adjusted to use pgstat_clip_activity(). Author: Andres Freund Tested-By: Khuntal Ghosh Reviewed-By: Robert Haas, Tom Lane Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkviecpz@alap3.anarazel.de	2017-09-19 12:51:14 -07:00
Tom Lane	ed22fb8b00	Cache datatype-output-function lookup info across calls of concat(). Testing indicates this can save a third to a half of the runtime of the function. Pavel Stehule, reviewed by Alexander Kuzmenkov Discussion: https://postgr.es/m/CAFj8pRAT62pRgjoHbgTfJUc2uLmeQ4saUj+yVJAEZUiMwNCmdg@mail.gmail.com	2017-09-19 15:09:38 -04:00
Andres Freund	f8e5f156b3	Rearm statement_timeout after each executed query. Previously statement_timeout, in the extended protocol, affected all messages till a Sync message. For clients that pipeline/batch query execution that's problematic. Instead disable timeout after each Execute message, and enable, if necessary, the timer in start_xact_command(). As that's done only for Execute and not Parse / Bind, pipelining the latter two could still cause undesirable timeouts. But a survey of protocol implementations shows that all drivers issue Sync messages when preparing, and adding timeout rearming to both is fairly expensive for the common parse / bind / execute sequence. Author: Tatsuo Ishii, editorialized by Andres Freund Reviewed-By: Takayuki Tsunakawa, Andres Freund Discussion: https://postgr.es/m/20170222.115044.1665674502985097185.t-ishii@sraoss.co.jp	2017-09-18 19:36:44 -07:00
Andres Freund	0fb9e4ace5	Fix uninitialized variable in dshash.c. A bugfix for commit `8c0d7bafad`. The code would have crashed if hashtable->size_log2 ever had the same value as hashtable->control->size_log2 by coincidence. Per Valgrind. Author: Thomas Munro Reported-By: Tomas Vondra Discussion: https://postgr.es/m/e72fb33c-4f31-f276-e972-263d9b59554d%402ndquadrant.com	2017-09-18 17:43:37 -07:00
Andres Freund	ec9e05b3c3	Fix crash restart bug introduced in `8356753c21`. The bug was caused by not re-reading the control file during crash recovery restarts, which lead to an attempt to pfree() shared memory contents. The fix is to re-read the control file, which seems good anyway. It's unclear as of this moment, whether we want to keep the refactoring introduced in the commit referenced above, or come up with an alternative approach. But fixing the bug in the mean time seems like a good idea regardless. A followup commit will introduce regression test coverage for crash restarts. Reported-By: Tom Lane Discussion: https://postgr.es/m/14134.1505572349@sss.pgh.pa.us	2017-09-18 17:25:49 -07:00
Tom Lane	eb5c404b17	Minor code-cleanliness improvements for btree. Make the btree page-flags test macros (P_ISLEAF and friends) return clean boolean values, rather than values that might not fit in a bool. Use them in a few places that were randomly referencing the flag bits directly. In passing, change access/nbtree/'s only direct use of BUFFER_LOCK_SHARE to BT_READ. (Some think we should go the other way, but as long as we have BT_READ/BT_WRITE, let's use them consistently.) Masahiko Sawada, reviewed by Doug Doole Discussion: https://postgr.es/m/CAD21AoBmWPeN=WBB5Jvyz_Nt3rmW1ebUyAnk3ZbJP3RMXALJog@mail.gmail.com	2017-09-18 16:36:28 -04:00
Tom Lane	66917bfaa7	Make ExplainOpenGroup and ExplainCloseGroup public. Extensions with custom plan nodes might like to use these in their EXPLAIN output. Hadi Moshayedi Discussion: https://postgr.es/m/CA+_kT_dU-rHCN0u6pjA6bN5CZniMfD=-wVqPY4QLrKUY_uJq5w@mail.gmail.com	2017-09-18 16:01:16 -04:00
Tom Lane	4bd1994650	Make DatumGetFoo/PG_GETARG_FOO/PG_RETURN_FOO macro names more consistent. By project convention, these names should include "P" when dealing with a pointer type; that is, if the result of a GETARG macro is of type FOO , it should be called PG_GETARG_FOO_P not just PG_GETARG_FOO. Some newer types such as JSONB and ranges had not followed the convention, and a number of contrib modules hadn't gotten that memo either. Rename the offending macros to improve consistency. In passing, fix a few places that thought PG_DETOAST_DATUM() returns a Datum; it does not, it returns "struct varlena ". Applying DatumGetPointer to that happens not to cause any bad effects today, but it's formally wrong. Also, adjust an ltree macro that was designed without any thought for what pgindent would do with it. This is all cosmetic and shouldn't have any impact on generated code. Mark Dilger, some further tweaks by me Discussion: https://postgr.es/m/EA5676F4-766F-4F38-8348-ECC7DB427C6A@gmail.com	2017-09-18 15:21:23 -04:00
Tom Lane	3e1683d37e	Fix, or at least ameliorate, bugs in logicalrep_worker_launch(). If we failed to get a background worker slot, the code just walked away from the logicalrep-worker slot it already had, leaving that looking like the worker is still starting up. This led to an indefinite hang in subscription startup, as reported by Thomas Munro. We must release the slot on failure. Also fix a thinko: we must capture the worker slot's generation before releasing LogicalRepWorkerLock the first time, else testing to see if it's changed is pretty meaningless. BTW, the CHECK_FOR_INTERRUPTS() in WaitForReplicationWorkerAttach is a ticking time bomb, even without considering the possibility of elog(ERROR) in one of the other functions it calls. Really, this entire business needs a redesign with some actual thought about error recovery. But for now I'm just band-aiding the case observed in testing. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/CAEepm=2bP3TBMFBArP6o20AZaRduWjMnjCjt22hSdnA-EvrtCw@mail.gmail.com	2017-09-18 11:39:55 -04:00
Peter Eisentraut	8edacab209	Fix DROP SUBSCRIPTION hang When ALTER SUBSCRIPTION DISABLE is run in the same transaction before DROP SUBSCRIPTION, the latter will hang because workers will still be running, not having seen the DISABLE committed, and DROP SUBSCRIPTION will wait until the workers have vacated the replication origin slots. Previously, DROP SUBSCRIPTION killed the logical replication workers immediately only if it was going to drop the replication slot, otherwise it scheduled the worker killing for the end of the transaction, as a result of `7e174fa793`. This, however, causes the present problem. To fix, kill the workers immediately in all cases. This covers all cases: A subscription that doesn't have a replication slot must be disabled. It was either disabled in the same transaction, or it was already disabled before the current transaction, but then there shouldn't be any workers left and this won't make a difference. Reported-by: Arseny Sher <a.sher@postgrespro.ru> Discussion: https://www.postgresql.org/message-id/flat/87mv6av84w.fsf%40ars-thinkpad	2017-09-17 22:00:23 -04:00
Tom Lane	6f44fe7f12	Allow rel_is_distinct_for() to look through RelabelType below OpExpr. This lets it do the right thing for, eg, varchar columns. Back-patch to 9.5 where this logic appeared. David Rowley, per report from Kim Rose Carlsen Discussion: https://postgr.es/m/VI1PR05MB17091F9A9876528055D6A827C76D0@VI1PR05MB1709.eurprd05.prod.outlook.com	2017-09-17 15:28:51 -04:00
Tom Lane	27c6619e9c	Fix possible dangling pointer dereference in trigger.c. AfterTriggerEndQuery correctly notes that the query_stack could get repalloc'd during a trigger firing, but it nonetheless passes the address of a query_stack entry to afterTriggerInvokeEvents, so that if such a repalloc occurs, afterTriggerInvokeEvents is already working with an obsolete dangling pointer while it scans the rest of the events. Oops. The only code at risk is its "delete_ok" cleanup code, so we can prevent unsafe behavior by passing delete_ok = false instead of true. However, that could have a significant performance penalty, because the point of passing delete_ok = true is to not have to re-scan possibly a large number of dead trigger events on the next time through the loop. There's more than one way to skin that cat, though. What we can do is delete all the "chunks" in the event list except the last one, since we know all events in them must be dead. Deleting the chunks is work we'd have had to do later in AfterTriggerEndQuery anyway, and it ends up saving rescanning of just about the same events we'd have gotten rid of with delete_ok = true. In v10 and HEAD, we also have to be careful to mop up any per-table after_trig_events pointers that would become dangling. This is slightly annoying, but I don't think that normal use-cases will traverse this code path often enough for it to be a performance problem. It's pretty hard to hit this in practice because of the unlikelihood of the query_stack getting resized at just the wrong time. Nonetheless, it's definitely a live bug of ancient standing, so back-patch to all supported branches. Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us	2017-09-17 14:50:01 -04:00
Tom Lane	fd31f9f033	Ensure that BEFORE STATEMENT triggers fire the right number of times. Commit `0f79440fb` introduced mechanism to keep AFTER STATEMENT triggers from firing more than once per statement, which was formerly possible if more than one FK enforcement action had to be applied to a given table. Add a similar mechanism for BEFORE STATEMENT triggers, so that we don't have the unexpected situation of firing BEFORE STATEMENT triggers more often than AFTER STATEMENT. As with the previous patch, back-patch to v10. Discussion: https://postgr.es/m/22315.1505584992@sss.pgh.pa.us	2017-09-17 12:16:38 -04:00
Tom Lane	cad22075bc	Fix bogus size calculation introduced by commit `cc5f81366`. The elements of RecordCacheArray are TupleDesc, not TupleDesc *. Those are actually the same size, so that this error is harmless, but it's still wrong --- and it might bite us someday, if TupleDesc ever became a struct, say. Per Coverity.	2017-09-17 11:35:27 -04:00
Tom Lane	0f79440fb0	Fix SQL-spec incompatibilities in new transition table feature. The standard says that all changes of the same kind (insert, update, or delete) caused in one table by a single SQL statement should be reported in a single transition table; and by that, they mean to include foreign key enforcement actions cascading from the statement's direct effects. It's also reasonable to conclude that if the standard had wCTEs, they would say that effects of wCTEs applying to the same table as each other or the outer statement should be merged into one transition table. We weren't doing it like that. Hence, arrange to merge tuples from multiple update actions into a single transition table as much as we can. There is a problem, which is that if the firing of FK enforcement triggers and after-row triggers with transition tables is interspersed, we might need to report more tuples after some triggers have already seen the transition table. It seems like a bad idea for the transition table to be mutable between trigger calls. There's no good way around this without a major redesign of the FK logic, so for now, resolve it by opening a new transition table each time this happens. Also, ensure that AFTER STATEMENT triggers fire just once per statement, or once per transition table when we're forced to make more than one. Previous versions of Postgres have allowed each FK enforcement query to cause an additional firing of the AFTER STATEMENT triggers for the referencing table, but that's certainly not per spec. (We're still doing multiple firings of BEFORE STATEMENT triggers, though; is that something worth changing?) Also, forbid using transition tables with column-specific UPDATE triggers. The spec requires such transition tables to show only the tuples for which the UPDATE trigger would have fired, which means maintaining multiple transition tables or else somehow filtering the contents at readout. Maybe someday we'll bother to support that option, but it looks like a lot of trouble for a marginal feature. The transition tables are now managed by the AfterTriggers data structures, rather than being directly the responsibility of ModifyTable nodes. This removes a subtransaction-lifespan memory leak introduced by my previous band-aid patch `3c4359521`. In passing, refactor the AfterTriggers data structures to reduce the management overhead for them, by using arrays of structs rather than several parallel arrays for per-query-level and per-subtransaction state. I failed to resist the temptation to do some copy-editing on the SGML docs about triggers, above and beyond merely documenting the effects of this patch. Back-patch to v10, because we don't want the semantics of transition tables to change post-release. Patch by me, with help and review from Thomas Munro. Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org	2017-09-16 13:20:36 -04:00
Robert Haas	9361f6f54e	After a MINVALUE/MAXVALUE bound, allow only more of the same. In the old syntax, which used UNBOUNDED, we had a similar restriction, but commit `d363d42bb9`, which changed the syntax, eliminated it. Put it back. Patch by me, reviewed by Dean Rasheed. Discussion: http://postgr.es/m/CA+Tgmobs+pLPC27tS3gOpEAxAffHrq5w509cvkwTf9pF6cWYbg@mail.gmail.com	2017-09-15 21:15:55 -04:00
Peter Eisentraut	3012061b86	Apply pg_get_serial_sequence() to identity column sequences as well Bug: #14813	2017-09-15 14:21:20 -04:00
Tom Lane	71aa4801a8	Get rid of shared_record_typmod_registry_worker_detach; it doesn't work. This code is unsafe, as proven by buildfarm failures, because it tries to access shared memory that might already be gone. It's also unnecessary, because we're about to exit the process anyway and so the record type cache should never be accessed again. The idea was to lay some foundations for someday recycling workers --- which would require attaching to a different shared tupdesc registry --- but that will require considerably more thought. In the meantime let's save some bytes by just removing the nonfunctional code. Problem identification, and proposal to fix by removing functionality from the detach function, by Thomas Munro. I went a bit further by removing the function altogether. Discussion: https://postgr.es/m/E1dsguX-00056N-9x@gemulon.postgresql.org	2017-09-15 10:52:30 -04:00
Tom Lane	eaa4070543	Don't use anonymous unions. Commit `cc5f81366c` introduced a language feature that is not acceptable to strict C89 compilers. Thomas Munro Per buildfarm.	2017-09-15 00:57:38 -04:00
Andres Freund	6b65a7fe62	Remove TupleDesc remapping logic from tqueue.c. With the introduction of a shared memory record typmod registry, it is no longer necessary to remap record typmods when sending tuples between backends so most of tqueue.c can be removed. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-09-14 19:59:29 -07:00
Andres Freund	cc5f81366c	Add support for coordinating record typmods among parallel workers. Tuples can have type RECORDOID and a typmod number that identifies a blessed TupleDesc in a backend-private cache. To support the sharing of such tuples through shared memory and temporary files, provide a typmod registry in shared memory. To achieve that, introduce per-session DSM segments, created on demand when a backend first runs a parallel query. The per-session DSM segment has a table-of-contents just like the per-query DSM segment, and initially the contents are a shared record typmod registry and a DSA area to provide the space it needs to grow. State relating to the current session is accessed via a Session object reached through global variable CurrentSession that may require significant redesign further down the road as we figure out what else needs to be shared or remodelled. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-09-14 19:59:21 -07:00
Robert Haas	81276fdd39	Add missing tags to GetCommandLogLevel. Otherwise, log_statement = 'ddl' causes errors if those statement types are used. Michael Paquier, reviewed by Ashutosh Sharma Discussion: http://postgr.es/m/CAB7nPqStC3HkE76Q1MnHsVd1vF1Td9zXApzYadzDMyLMRkkGrw@mail.gmail.com	2017-09-14 17:19:04 -04:00
Andres Freund	8356753c21	Perform only one ReadControlFile() during startup. Previously we read the control file in multiple places. But soon the segment size will be configurable and stored in the control file, and that needs to be available earlier than it currently is needed. Instead of adding yet another place where it's read, refactor things so there's a single processing of the control file during startup (in EXEC_BACKEND that's every individual backend's startup). Author: Andres Freund Discussion: http://postgr.es/m/20170913092828.aozd3gvvmw67gmyc@alap3.anarazel.de	2017-09-14 14:14:34 -07:00
Robert Haas	0a480502b0	Expand partitioned table RTEs level by level, without flattening. Flattening the partitioning hierarchy at this stage makes various desirable optimizations difficult. The original use case for this patch was partition-wise join, which wants to match up the partitions in one partitioning hierarchy with those in another such hierarchy. However, it now seems that it will also be useful in making partition pruning work using the PartitionDesc rather than constraint exclusion, because with a flattened expansion, we have no easy way to figure out which PartitionDescs apply to which leaf tables in a multi-level partition hierarchy. As it turns out, we end up creating both rte->inh and !rte->inh RTEs for each intermediate partitioned table, just as we previously did for the root table. This seems unnecessary since the partitioned tables have no storage and are not scanned. We might want to go back and rejigger things so that no partitioned tables (including the parent) need !rte->inh RTEs, but that seems to require some adjustments not related to the core purpose of this patch. Ashutosh Bapat, reviewed by me and by Amit Langote. Some final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRd=1venqLL7oGU=C1dEkuvk2DJgvF+7uKbnPHaum1mvHQ@mail.gmail.com	2017-09-14 15:41:08 -04:00
Robert Haas	77b6b5e9ce	Make RelationGetPartitionDispatchInfo expand depth-first. With this change, the order of leaf partitions as returned by RelationGetPartitionDispatchInfo should now be the same as the order used by expand_inherited_rtentry. This will make it simpler for future patches to match up the partition dispatch information with the planner data structures. The new code is also, in my opinion anyway, simpler and easier to understand. Amit Langote, reviewed by Amit Khandekar. I also reviewed and made a few cosmetic revisions. Discussion: http://postgr.es/m/d98d4761-5071-1762-501e-0e15047c714b@lab.ntt.co.jp	2017-09-14 12:28:50 -04:00
Robert Haas	42651bdd68	Fix inconsistent capitalization. Amit Langote Discussion: http://postgr.es/m/a83a0899-19f5-594c-9aac-3ba0f16989a1@lab.ntt.co.jp	2017-09-14 11:11:12 -04:00
Robert Haas	1555566d9e	Set partitioned_rels appropriately when UNION ALL is used. In most cases, this omission won't matter, because the appropriate locks will have been acquired during parse/plan or by AcquireExecutorLocks. But it's a bug all the same. Report by Ashutosh Bapat. Patch by me, reviewed by Amit Langote. Discussion: http://postgr.es/m/CAFjFpRdHb_ZnoDTuBXqrudWXh3H1ibLkr6nHsCFT96fSK4DXtA@mail.gmail.com	2017-09-14 11:00:39 -04:00
Andres Freund	1ab973ab60	Properly check interrupts in execScan.c. During the development of `d47cfef711` the CFI()s in ExecScan() were moved back and forth, ending up in the wrong place. Thus queries that largely spend their time in ExecScan(), and have neither projection nor a qual, can't be cancelled in a timely manner. Reported-By: Jeff Janes Author: Andres Freund Discussion: https://postgr.es/m/CAMkU=1weDXp8eLLPt9SO1LEUsJYYK9cScaGhLKpuN+WbYo9b5g@mail.gmail.com Backpatch: 10, as `d47cfef711`	2017-09-14 02:00:14 -07:00
Tom Lane	7d08ce286c	Distinguish selectivity of < from <= and > from >=. Historically, the selectivity functions have simply not distinguished < from <=, or > from >=, arguing that the fraction of the population that satisfies the "=" aspect can be considered to be vanishingly small, if the comparison value isn't any of the most-common-values for the variable. (If it is, the code path that executes the operator against each MCV will take care of things properly.) But that isn't really true unless we're dealing with a continuum of variable values, and in practice we seldom are. If "x = const" would estimate a nonzero number of rows for a given const value, then it follows that we ought to estimate different numbers of rows for "x < const" and "x <= const", even if the const is not one of the MCVs. Handling this more honestly makes a significant difference in edge cases, such as the estimate for a tight range (x BETWEEN y AND z where y and z are close together). Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly split scalargtsel into scalargtsel/scalargesel. Adjust <= and >= operator definitions to reference the new selectivity functions. Improve the core ineq_histogram_selectivity() function to make a correction for equality. (Along the way, I learned quite a bit about exactly why that function gives good answers, which I tried to memorialize in improved comments.) The corresponding join selectivity functions were, and remain, just stubs. But I chose to split them similarly, to avoid confusion and to prevent the need for doing this exercise again if someone ever makes them less stubby. In passing, change ineq_histogram_selectivity's clamp for extreme probability estimates so that it varies depending on the histogram size, instead of being hardwired at 0.0001. With the default histogram size of 100 entries, you still get the old clamp value, but bigger histograms should allow us to put more faith in edge values. Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us	2017-09-13 11:12:39 -04:00
Peter Eisentraut	61975d6c2c	Improve error message in WAL sender The previous error message when attempting to run a general SQL command in a physical replication WAL sender was a bit sloppy. Reported-by: Fujii Masao <masao.fujii@gmail.com>	2017-09-13 08:31:03 -04:00
Peter Eisentraut	1a2fdc99a4	Define LDAP_NO_ATTRS if necessary. Commit `83aaac41c6` introduced the use of LDAP_NO_ATTRS to avoid requesting a dummy attribute when doing search+bind LDAP authentication. It turns out that not all LDAP implementations define that macro, but its value is fixed by the protocol so we can define it ourselves if it's missing. Author: Thomas Munro Reported-By: Ashutosh Sharma Discussion: https://postgr.es/m/CAE9k0Pm6FKCfPCiAr26-L_SMGOA7dT_k0%2B3pEbB8%2B-oT39xRpw%40mail.gmail.com	2017-09-13 08:22:42 -04:00
Andres Freund	6e7baa3227	Introduce BYTES unit for GUCs. This is already useful for track_activity_query_size, and will further be used in a later commit making the WAL segment size configurable. Author: Beena Emerson Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAOG9ApEu8bXVwBxkOO9J7ZpM76TASK_vFMEEiCEjwhMmSLiaqQ@mail.gmail.com	2017-09-12 12:13:12 -07:00
Peter Eisentraut	83aaac41c6	Allow custom search filters to be configured for LDAP auth Before, only filters of the form "(<ldapsearchattribute>=<user>)" could be used to search an LDAP server. Introduce ldapsearchfilter so that more general filters can be configured using patterns, like "(\|(uid=$username)(mail=$username))" and "(&(uid=$username) (objectClass=posixAccount))". Also allow search filters to be included in an LDAP URL. Author: Thomas Munro Reviewed-By: Peter Eisentraut, Mark Cave-Ayland, Magnus Hagander Discussion: https://postgr.es/m/CAEepm=0XTkYvMci0WRubZcf_1am8=gP=7oJErpsUfRYcKF2gwg@mail.gmail.com	2017-09-12 09:49:04 -04:00
Andres Freund	c1898c3e1e	Constify numeric.c. This allows the compiler/linker to move the static variables to a read-only segment. Not all the signature changes are necessary, but it seems better to apply const in a consistent manner. Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20170910232154.asgml44ji2b7lv3d@alap3.anarazel.de	2017-09-11 13:44:37 -07:00
Peter Eisentraut	821fb8cdbf	Message style fixes	2017-09-11 11:21:27 -04:00
Tom Lane	3c43595217	Quick-hack fix for foreign key cascade vs triggers with transition tables. AFTER triggers using transition tables crashed if they were fired due to a foreign key ON CASCADE update. This is because ExecEndModifyTable flushes the transition tables, on the assumption that any trigger that could need them was already fired during ExecutorFinish. Normally that's true, because we don't allow transition-table-using triggers to be deferred. However, foreign key CASCADE updates force any triggers on the referencing table to be deferred to the outer query level, by means of the EXEC_FLAG_SKIP_TRIGGERS flag. I don't recall all the details of why it's like that and am pretty loath to redesign it right now. Instead, just teach ExecEndModifyTable to skip destroying the TransitionCaptureState when that flag is set. This will allow the transition table data to survive until end of the current subtransaction. This isn't a terribly satisfactory solution, because (1) we might be leaking the transition tables for much longer than really necessary, and (2) as things stand, an AFTER STATEMENT trigger will fire once per RI updating query, ie once per row updated or deleted in the referenced table. I suspect that is not per SQL spec. But redesigning this is a research project that we're certainly not going to get done for v10. So let's go with this hackish answer for now. In passing, tweak AfterTriggerSaveEvent to not save the transition_capture pointer into the event record for a deferrable trigger. This is not necessary to fix the current bug, but it avoids letting dangling pointers to long-gone transition tables persist in the trigger event queue. That's at least a safety feature. It might also allow merging shared trigger states in more cases than before. I added a regression test that demonstrates the crash on unpatched code, and also exposes the behavior of firing the AFTER STATEMENT triggers once per row update. Per bug #14808 from Philippe Beaudoin. Back-patch to v10. Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org	2017-09-10 14:59:56 -04:00
Tom Lane	f80e782a6b	Remove pre-order and post-order traversal logic for red-black trees. This code isn't used, and there's no clear reason why anybody would ever want to use it. These traversal mechanisms don't yield a visitation order that is semantically meaningful for any external purpose, nor are they any faster or simpler than the left-to-right or right-to-left traversals. (In fact, some rough testing suggests they are slower :-(.) Moreover, these mechanisms are impossible to test in any arm's-length fashion; doing so requires knowledge of the red-black tree's internal implementation. Hence, let's just jettison them. Discussion: https://postgr.es/m/17735.1505003111@sss.pgh.pa.us	2017-09-10 13:19:11 -04:00
Tom Lane	fdf87ed451	Fix failure-to-copy bug in commit `6f6b99d13`. The previous coding of get_qual_for_list() was careful to copy everything it was using from the input data structure. The new version missed making a copy of pass-by-ref datum values that it's inserting into Consts. This is not optional, however, as revealed by buildfarm failures on machines running -DRELCACHE_FORCE_RELEASE: we're copying from a relcache entry that could go away before the required lifespan of our output expression. I'm pretty sure -DCLOBBER_CACHE_ALWAYS machines won't like this either, but none of them have reported in yet.	2017-09-08 20:45:31 -04:00
Tom Lane	e56dd7cf50	Fix uninitialized-variable bug. map_partition_varattnos() failed to set its found_whole_row output parameter if the given expression list was NIL. This seems to be a pre-existing bug that chanced to be exposed by commit `6f6b99d13`. It might be unreachable in v10, but I have little faith in that proposition, so back-patch. Per buildfarm.	2017-09-08 19:04:32 -04:00
Robert Haas	6f6b99d133	Allow a partitioned table to have a default partition. Any tuples that don't route to any other partition will route to the default partition. Jeevan Ladhe, Beena Emerson, Ashutosh Bapat, Rahila Syed, and Robert Haas, with review and testing at various stages by (at least) Rushabh Lathia, Keith Fiske, Amit Langote, Amul Sul, Rajkumar Raghuanshi, Sven Kunze, Kyotaro Horiguchi, Thom Brown, Rafia Sabih, and Dilip Kumar. Discussion: http://postgr.es/m/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com Discussion: http://postgr.es/m/CAOG9ApEYj34fWMcvBMBQ-YtqR9fTdXhdN82QEKG0SVZ6zeL1xg@mail.gmail.com	2017-09-08 17:28:04 -04:00
Tom Lane	3cf17c9d47	Remove mention of password_encryption = plain in postgresql.conf.sample. Evidently missed in commit `eb61136dc`. Spotted by Oleg Bartunov. Discussion: https://postgr.es/m/CAF4Au4wz_iK5r4fnTnnd8XqioAZQs-P7-VsEAfivW34zMVpAmw@mail.gmail.com	2017-09-08 14:38:54 -04:00
Robert Haas	f0a0c17c1b	Refactor get_partition_for_tuple a bit. Pending patches for both default partitioning and hash partitioning find the current coding pattern to be inconvenient. Change it so that we switch on the partitioning method first and then do whatever is needed. Amul Sul, reviewed by Jeevan Ladhe, with a few adjustments by me. Discussion: http://postgr.es/m/CAAJ_b97mTb=dG2pv6+1ougxEVZFVnZJajW+0QHj46mEE7WsoOQ@mail.gmail.com Discussion: http://postgr.es/m/CAOgcT0M37CAztEinpvjJc18EdHfm23fw0EG9-36Ya=+rEFUqaQ@mail.gmail.com	2017-09-07 21:13:42 -04:00
Tom Lane	3ca930fc39	Improve performance of get_actual_variable_range with recently-dead tuples. In commit `fccebe421`, we hacked get_actual_variable_range() to scan the index with SnapshotDirty, so that if there are many uncommitted tuples at the end of the index range, it wouldn't laboriously scan through all of them looking for a live value to return. However, that didn't fix it for the case of many recently-dead tuples at the end of the index; SnapshotDirty recognizes those as committed dead and so we're back to the same problem. To improve the situation, invent a "SnapshotNonVacuumable" snapshot type and use that instead. The reason this helps is that, if the snapshot rejects a given index entry, we know that the indexscan will mark that index entry as killed. This means the next get_actual_variable_range() scan will proceed past that entry without visiting the heap, making the scan a lot faster. We may end up accepting a recently-dead tuple as being the estimated extremal value, but that doesn't seem much worse than the compromise we made before to accept not-yet-committed extremal values. The cost of the scan is still proportional to the number of dead index entries at the end of the range, so in the interval after a mass delete but before VACUUM's cleaned up the mess, it's still possible for get_actual_variable_range() to take a noticeable amount of time, if you've got enough such dead entries. But the constant factor is much much better than before, since all we need to do with each index entry is test its "killed" bit. We chose to back-patch commit `fccebe421` at the time, but I'm hesitant to do so here, because this form of the problem seems to affect many fewer people. Also, even when it happens, it's less bad than the case fixed by commit `fccebe421` because we don't get the contention effects from expensive TransactionIdIsInProgress tests. Dmitriy Sarafannikov, reviewed by Andrey Borodin Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru	2017-09-07 19:41:51 -04:00
Peter Eisentraut	1356f78ea9	Reduce excessive dereferencing of function pointers It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com	2017-09-07 13:56:09 -04:00
Robert Haas	9d71323dac	Even if some partitions are foreign, allow tuple routing. This doesn't allow routing tuple to the foreign partitions themselves, but it permits tuples to be routed to regular partitions despite the presence of foreign partitions in the same inheritance hierarchy. Etsuro Fujita, reviewed by Amit Langote and by me. Discussion: http://postgr.es/m/bc3db4c1-1693-3b8a-559f-33ad2b50b7ad@lab.ntt.co.jp	2017-09-07 10:58:21 -04:00
Tom Lane	6eb52da394	Fix handling of savepoint commands within multi-statement Query strings. Issuing a savepoint-related command in a Query message that contains multiple SQL statements led to a FATAL exit with a complaint about "unexpected state STARTED". This is a shortcoming of commit `4f896dac1`, which attempted to prevent such misbehaviors in multi-statement strings; its quick hack of marking the individual statements as "not top-level" does the wrong thing in this case, and isn't a very accurate description of the situation anyway. To fix, let's introduce into xact.c an explicit model of what happens for multi-statement Query strings. This is an "implicit transaction block in progress" state, which for many purposes works like the normal TBLOCK_INPROGRESS state --- in particular, IsTransactionBlock returns true, causing the desired result that PreventTransactionChain will throw error. But in case of error abort it works like TBLOCK_STARTED, allowing the transaction to be cancelled without need for an explicit ROLLBACK command. Commit `4f896dac1` is reverted in toto, so that we go back to treating the individual statements as "top level". We could have left it as-is, but this allows sharpening the error message for PreventTransactionChain calls inside functions. Except for getting a normal error instead of a FATAL exit for savepoint commands, this patch should result in no user-visible behavioral change (other than that one error message rewording). There are some things we might want to do in the line of changing the appearance or wording of error and warning messages around this behavior, which would be much simpler to do now that it's an explicitly modeled state. But I haven't done them here. Although this fixes a long-standing bug, no backpatch. The consequences of the bug don't seem severe enough to justify the risk that this commit itself creates some new issue. Patch by me, but it owes something to previous investigation by Takayuki Tsunakawa, who also reported the bug in the first place. Also thanks to Michael Paquier for reviewing. Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05	2017-09-07 09:49:55 -04:00
Simon Riggs	f06588a8e6	Exclude special values in recovery_target_time recovery_target_time accepts timestamp input, though does not allow use of special values, e.g. “today”. Report a useful error message for these cases. Reported-by: Piotr Stefaniak Author: Simon Riggs Discussion: https://postgr.es/m/CANP8+jJdKA+BkkYLWz9zAm16Y0s2ExBv0WfpAwXdTpPfWnA9Bg@mail.gmail.com	2017-09-07 04:56:34 -07:00
Simon Riggs	5b6d13eec7	Allow SET STATISTICS on expression indexes Index columns are referenced by ordinal number rather than name, e.g. CREATE INDEX coord_idx ON measured (x, y, (z + t)); ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000; Incompatibility note for release notes: \d+ for indexes now also displays Stats Target Authors: Alexander Korotkov, with contribution by Adrien NAYRAT Review: Adrien NAYRAT, Simon Riggs Wordsmith: Simon Riggs	2017-09-06 13:46:01 -07:00
Tom Lane	8689e38263	Clean up handling of dropped columns in NAMEDTUPLESTORE RTEs. The NAMEDTUPLESTORE patch piggybacked on the infrastructure for TABLEFUNC/VALUES/CTE RTEs, none of which can ever have dropped columns, so the possibility was ignored most places. Fix that, including adding a specification to parsenodes.h about what it's supposed to look like. In passing, clean up assorted comments that hadn't been maintained properly by said patch. Per bug #14799 from Philippe Beaudoin. Back-patch to v10. Discussion: https://postgr.es/m/20170906120005.25630.84360@wrigleys.postgresql.org	2017-09-06 10:41:05 -04:00
Tom Lane	6e427aa4e5	Use lfirst_node() and linitial_node() where appropriate in planner.c. There's no particular reason to target this module for the first wholesale application of these macros; but we gotta start somewhere. Ashutosh Bapat and Jeevan Chalke Discussion: https://postgr.es/m/CAFjFpRcNr3r=u0ni=7A4GD9NnHQVq+dkFafzqo2rS6zy=dt1eg@mail.gmail.com	2017-09-05 15:57:48 -04:00
Peter Eisentraut	17273d059c	Remove unnecessary parentheses in return statements The parenthesized style has only been used in a few modules. Change that to use the style that is predominant across the whole tree. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>	2017-09-05 14:52:55 -04:00
Alvaro Herrera	ebd346caf4	Correct base backup throttling Throttling for sending a base backup in walsender is broken for the case where there is a lot of WAL traffic, because the latch used to put the walsender to sleep is also signalled by regular WAL traffic (and each signal causes an additional batch of data to be sent); the net effect is that there is no or little actual throttling. This is undesirable, so rewrite the sleep into a loop to achieve the desired effeect. Author: Jeff Janes, small tweaks by me Reviewed-by: Antonin Houska Discussion: https://postgr.es/m/CAMkU=1xH6mde-yL-Eo1TKBGNd0PB1-TMxvrNvqcAkN-qr2E9mw@mail.gmail.com	2017-09-05 17:27:30 +02:00
Tom Lane	4faa1dc2eb	Suppress compiler warnings in dshash.c. Some compilers complain, not unreasonably, about left-shifting an int32 "1" and then assigning the result to an int64. In practice I sure hope that this data structure never gets large enough that an overflow would actually occur; but let's cast the constant to the right type to avoid the hazard. In passing, fix a typo in dshash.h. Amit Kapila, adjusted as per comment from Thomas Munro. Discussion: https://postgr.es/m/CAA4eK1+5vfVMYtjK_NX8O3-42yM3o80qdqWnQzGquPrbq6mb+A@mail.gmail.com	2017-09-03 11:12:29 -04:00
Tom Lane	51daa7bdb3	Improve division of labor between execParallel.c and nodeGather[Merge].c. Move the responsibility for creating/destroying TupleQueueReaders into execParallel.c, to avoid duplicative coding in nodeGather.c and nodeGatherMerge.c. Also, instead of having DestroyTupleQueueReader do shm_mq_detach, do it in the caller (which is now only ExecParallelFinish). This means execParallel.c does both the attaching and detaching of the tuple-queue-reader shm_mqs, which seems less weird than the previous arrangement. These changes also eliminate a vestigial memory leak (of the pei->tqueue array). It's now demonstrable that rescans of Gather or GatherMerge don't leak memory. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us	2017-09-01 17:39:01 -04:00
Peter Eisentraut	c039ba0716	Add memory info to getrusage output Add the maxrss field to the getrusage output (log_*_stats). This was previously omitted because of portability concerns, but we feel this might not be a concern anymore. based on patch by Justin Pryzby <pryzby@telsasoft.com>	2017-09-01 15:36:33 -04:00
Robert Haas	0cb8b7531d	Tighten up some code in RelationBuildPartitionDesc. This probably doesn't save anything meaningful in terms of performance, but making the code simpler is a good idea anyway. Code by Beena Emerson, extracted from a larger patch by Jeevan Ladhe, slightly adjusted by me. Discussion: http://postgr.es/m/CAOgcT0ONgwajdtkoq+AuYkdTPY9cLWWLjxt_k4SXue3eieAr+g@mail.gmail.com	2017-09-01 15:16:44 -04:00
Robert Haas	baaf272ac9	Use group updates when setting transaction status in clog. Commit `0e141c0fbb` introduced a mechanism to reduce contention on ProcArrayLock by having a single process clear XIDs in the procArray on behalf of multiple processes, reducing the need to hand the lock around. A previous attempt to introduce a similar mechanism for CLogControlLock in `ccce90b398` crashed and burned, but the design problem which resulted in those failures is believed to have been corrected in this version. Amit Kapila, with some cosmetic changes by me. See the previous commit message for additional credits. Discussion: http://postgr.es/m/CAA4eK1KudxzgWhuywY_X=yeSAhJMT4DwCjroV5Ay60xaeB2Eew@mail.gmail.com	2017-09-01 11:45:40 -04:00
Alvaro Herrera	a6979c3a68	Restore behavior for replication origin drop Do for replication origins what the previous commit did for replication slots: restore the original behavior of replication origin drop to raise an error rather than blocking, because users might be depending on the original behavior. Maintain the blocking behavior when invoked internally from logical replication subscription handling. Discussion: https://postgr.es/m/20170830133922.tlpo3lgfejm4n2cs@alvherre.pgsql	2017-09-01 16:30:02 +02:00
Alvaro Herrera	be7161566d	Add a WAIT option to DROP_REPLICATION_SLOT Commit `9915de6c1c` changed the default behavior of DROP_REPLICATION_SLOT so that it would wait until any session holding the slot active would release it, instead of raising an error. But users are already depending on the original behavior, so revert to it by default and add a WAIT option to invoke the new behavior. Per complaint from Simone Gotti, in Discussion: https://postgr.es/m/CAEvsy6Wgdf90O6pUvg2wSVXL2omH5OPC-38OD4Zzgk-FXavj3Q@mail.gmail.com	2017-09-01 13:44:14 +02:00
Robert Haas	7b69b6ceb8	Fix assorted carelessness about Datum vs. int64 vs. uint64 Bugs introduced by commit `81c5e46c49`	2017-09-01 00:14:54 -04:00
Robert Haas	0d9506d125	Try to repair poorly-considered code in previous commit.	2017-08-31 23:09:00 -04:00
Robert Haas	81c5e46c49	Introduce 64-bit hash functions with a 64-bit seed. This will be useful for hash partitioning, which needs a way to seed the hash functions to avoid problems such as a hash index on a hash partitioned table clumping all values into a small portion of the bucket space; it's also useful for anything that wants a 64-bit hash value rather than a 32-bit hash value. Just in case somebody wants a 64-bit hash value that is compatible with the existing 32-bit hash values, make the low 32-bits of the 64-bit hash value match the 32-bit hash value when the seed is 0. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com	2017-08-31 22:21:21 -04:00
Tom Lane	2d44c58c79	Avoid memory leaks when a GatherMerge node is rescanned. Rescanning a GatherMerge led to leaking some memory in the executor's query-lifespan context, because most of the node's working data structures were simply abandoned and rebuilt from scratch. In practice, this might never amount to much, given the cost of relaunching worker processes --- but it's still pretty messy, so let's fix it. We can rearrange things so that the tuple arrays are simply cleared and reused, and we don't need to rebuild the TupleTableSlots either, just clear them. One small complication is that because we might get a different number of workers on each iteration, we can't keep the old convention that the leader's gm_slots[] entry is the last one; the leader might clobber a TupleTableSlot that we need for a worker in a future iteration. Hence, adjust the logic so that the leader has slot 0 always, while the active workers have slots 1..n. Back-patch to v10 to keep all the existing versions of nodeGatherMerge.c in sync --- because of the renumbering of the slots, there would otherwise be a very large risk that any future backpatches in this module would introduce bugs. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us	2017-08-31 16:21:05 -04:00
Robert Haas	30833ba154	Expand partitioned tables in PartDesc order. Previously, we expanded the inheritance hierarchy in the order in which find_all_inheritors had locked the tables, but that turns out to block quite a bit of useful optimization. For example, a partition-wise join can't count on two tables with matching bounds to get expanded in the same order. Where possible, this change results in expanding partitioned tables in bound order. Bound order isn't well-defined for a list-partitioned table with a null-accepting partition or for a list-partitioned table where the bounds for a single partition are interleaved with other partitions. However, when expansion in bound order is possible, it opens up further opportunities for optimization, such as strength-reducing MergeAppend to Append when the expansion order matches the desired sort order. Patch by me, with cosmetic revisions by Ashutosh Bapat. Discussion: http://postgr.es/m/CA+TgmoZrKj7kEzcMSum3aXV4eyvvbh9WD=c6m=002WMheDyE3A@mail.gmail.com	2017-08-31 15:50:18 -04:00
Tom Lane	6708e447ef	Clean up shm_mq cleanup. The logic around shm_mq_detach was a few bricks shy of a load, because (contrary to the comments for shm_mq_attach) all it did was update the shared shm_mq state. That left us leaking a bit of process-local memory, but much worse, the on_dsm_detach callback for shm_mq_detach was still armed. That means that whenever we ultimately detach from the DSM segment, we'd run shm_mq_detach again for already-detached, possibly long-dead queues. This accidentally fails to fail today, because we only ever re-use a shm_mq's memory for another shm_mq, and multiple detach attempts on the last such shm_mq are fairly harmless. But it's gonna bite us someday, so let's clean it up. To do that, change shm_mq_detach's API so it takes a shm_mq_handle not the underlying shm_mq. This makes the callers simpler in most cases anyway. Also fix a few places in parallel.c that were just pfree'ing the handle structs rather than doing proper cleanup. Back-patch to v10 because of the risk that the revenant shm_mq_detach callbacks would cause a live bug sometime. Since this is an API change, it's too late to do it in 9.6. (We could make a variant patch that preserves API, but I'm not excited enough to do that.) Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us	2017-08-31 15:10:24 -04:00
Tom Lane	04e9678614	Code review for nodeGatherMerge.c. Comment the fields of GatherMergeState, and organize them a bit more sensibly. Comment GMReaderTupleBuffer more usefully too. Improve assorted other comments that were obsolete or just not very good English. Get rid of the use of a GMReaderTupleBuffer for the leader process; that was confusing, since only the "done" field was used, and that in a way redundant with need_to_scan_locally. In gather_merge_init, avoid calling load_tuple_array for already-known-exhausted workers. I'm not sure if there's a live bug there, but the case is unlikely to be well tested due to timing considerations. Remove some useless code, such as duplicating the tts_isempty test done by TupIsNull. Remove useless initialization of ps.qual, replacing that with an assertion that we have no qual to check. (If we did, the code would fail to check it.) Avoid applying heap_copytuple to a null tuple. While that fails to crash, it's confusing and it makes the code less legible not more so IMO. Propagate a couple of these changes into nodeGather.c, as well. Back-patch to v10, partly because of the possibility that the gather_merge_init change is fixing a live bug, but mostly to keep the branches in sync to ease future bug fixes.	2017-08-30 17:21:08 -04:00
Tom Lane	41b0dd987d	Separate reinitialization of shared parallel-scan state from ExecReScan. Previously, the parallel executor logic did reinitialization of shared state within the ExecReScan code for parallel-aware scan nodes. This is problematic, because it means that the ExecReScan call has to occur synchronously (ie, during the parent Gather node's ReScan call). That is swimming very much against the tide so far as the ExecReScan machinery is concerned; the fact that it works at all today depends on a lot of fragile assumptions, such as that no plan node between Gather and a parallel-aware scan node is parameterized. Another objection is that because ExecReScan might be called in workers as well as the leader, hacky extra tests are needed in some places to prevent unwanted shared-state resets. Hence, let's separate this code into two functions, a ReInitializeDSM call and the ReScan call proper. ReInitializeDSM is called only in the leader and is guaranteed to run before we start new workers. ReScan is returned to its traditional function of resetting only local state, which means that ExecReScan's usual habits of delaying or eliminating child rescan calls are safe again. As with the preceding commit `7df2c1f8d`, it doesn't seem to be necessary to make these changes in 9.6, which is a good thing because the FDW and CustomScan APIs are impacted. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com	2017-08-30 13:18:16 -04:00
Tom Lane	7df2c1f8da	Force rescanning of parallel-aware scan nodes below a Gather[Merge]. The ExecReScan machinery contains various optimizations for postponing or skipping rescans of plan subtrees; for example a HashAgg node may conclude that it can re-use the table it built before, instead of re-reading its input subtree. But that is wrong if the input contains a parallel-aware table scan node, since the portion of the table scanned by the leader process is likely to vary from one rescan to the next. This explains the timing-dependent buildfarm failures we saw after commit `a2b70c89c`. The established mechanism for showing that a plan node's output is potentially variable is to mark it as depending on some runtime Param. Hence, to fix this, invent a dummy Param (one that has a PARAM_EXEC parameter number, but carries no actual value) associated with each Gather or GatherMerge node, mark parallel-aware nodes below that node as dependent on that Param, and arrange for ExecReScanGather[Merge] to flag that Param as changed whenever the Gather[Merge] node is rescanned. This solution breaks an undocumented assumption made by the parallel executor logic, namely that all rescans of nodes below a Gather[Merge] will happen synchronously during the ReScan of the top node itself. But that's fundamentally contrary to the design of the ExecReScan code, and so was doomed to fail someday anyway (even if you want to argue that the bug being fixed here wasn't a failure of that assumption). A follow-on patch will address that issue. In the meantime, the worst that's expected to happen is that given very bad timing luck, the leader might have to do all the work during a rescan, because workers think they have nothing to do, if they are able to start up before the eventual ReScan of the leader's parallel-aware table scan node has reset the shared scan state. Although this problem exists in 9.6, there does not seem to be any way for it to manifest there. Without GatherMerge, it seems that a plan tree that has a rescan-short-circuiting node below Gather will always also have one above it that will short-circuit in the same cases, preventing the Gather from being rescanned. Hence we won't take the risk of back-patching this change into 9.6. But v10 needs it. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com	2017-08-30 09:29:55 -04:00
Robert Haas	bf11e7ee2e	Propagate sort instrumentation from workers back to leader. Up until now, when parallel query was used, no details about the sort method or space used by the workers were available; details were shown only for any sorting done by the leader. Fix that. Commit `1177ab1dab` forced the test case added by commit `1f6d515a67` to run without parallelism; now that we have this infrastructure, allow that again, with a little tweaking to make it pass with and without force_parallel_mode. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com	2017-08-29 13:26:33 -04:00
Robert Haas	3452dc5240	Push tuple limits through Gather and Gather Merge. If we only need, say, 10 tuples in total, then we certainly don't need more than 10 tuples from any single process. Pushing down the limit lets workers exit early when possible. For Gather Merge, there is an additional benefit: a Sort immediately below the Gather Merge can be done as a bounded sort if there is an applicable limit. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+TgmoYa3QKKrLj5rX7UvGqhH73G1Li4B-EKxrmASaca2tFu9Q@mail.gmail.com	2017-08-29 13:16:55 -04:00
Tom Lane	3f4c7917b3	Code review for pushing LIMIT through subqueries. Minor improvements for commit `1f6d515a6`. We do not need the (rather expensive) test for SRFs in the targetlist, because since v10 any such SRFs would appear in separate ProjectSet nodes. Also, make the code look more like the existing cases by turning it into a simple recursion --- the argument that there might be some performance benefit to contorting the code seems unfounded to me, especially since any good compiler should turn the tail-recursion into iteration anyway. Discussion: http://postgr.es/m/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w@mail.gmail.com	2017-08-25 09:05:17 -04:00
Andres Freund	d7694fc148	Consolidate the function pointer types used by dshash.c. Commit `8c0d7bafad` introduced dshash with hash and compare functions like DynaHash's, and also variants that take a user data pointer instead of size. Simplify the interface by merging them into a single pair of function pointer types that take both size and a user data pointer. Since it is anticipated that memcmp and tag_hash behavior will be a common requirement, provide wrapper functions dshash_memcmp and dshash_memhash that conform to the new function types. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de	2017-08-24 17:01:36 -07:00
Andres Freund	4569715bd6	Fix unlikely shared memory leak after failure in dshash_create(). Tidy-up for commit `8c0d7bafad`, based on a complaint from Andres Freund. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de	2017-08-24 16:58:30 -07:00
Andres Freund	20fbf25533	Fix harmless thinko in dsa.c. Commit `16be2fd100` added DSA_ALLOC_HUGE, DSA_ALLOC_ZERO and DSA_ALLOC_NO_OOM which have the same numerical values and meanings as the similarly named MCXT_... macros. In one place we accidentally used MCXT_ALLOC_NO_OOM when DSA_ALLOC_NO_OOM is wanted, so tidy that up. Author: Thomas Munro Discussion: http://postgr.es/m/CAEepm=2AimHxVkkxnMfQvbZMkXy0uKbVa0-D38c5-qwrCm4CMQ@mail.gmail.com Backpatch: 10, where dsa was introduced.	2017-08-24 15:07:40 -07:00
Peter Eisentraut	1e1b01cd16	Fix outdated comment Author: Thomas Munro <thomas.munro@enterprisedb.com>	2017-08-23 14:19:35 -04:00
Peter Eisentraut	237a0b87b1	Improve plural handling in error message This does not use the normal plural handling, because no numbers appear in the actual message.	2017-08-23 13:56:59 -04:00
Peter Eisentraut	85f4d6393d	Tweak some SCRAM error messages and code comments Clarify/correct some error messages, fix up some code comments that confused SASL and SCRAM, and other minor fixes. No changes in functionality.	2017-08-23 12:29:38 -04:00
Peter Eisentraut	580ddcec39	Fix translation marker This was erroneously removed in `55a70a023c`.	2017-08-23 09:56:38 -04:00
Andres Freund	8c0d7bafad	Hash tables backed by DSA shared memory. Add general purpose chaining hash tables for DSA memory. Unlike DynaHash in shared memory mode, these hash tables can grow as required, and cope with being mapped into different addresses in different backends. There is a wide range of potential users for such a hash table, though it's very likely the interface will need to evolve as we come to understand the needs of different kinds of users. E.g support for iterators and incremental resizing is planned for later commits and the details of the callback signatures are likely to change. Author: Thomas Munro Reviewed-By: John Gorman, Andres Freund, Dilip Kumar, Robert Haas Discussion: https://postgr.es/m/CAEepm=3d8o8XdVwYT6O=bHKsKAM2pu2D6sV1S_=4d+jStVCE7w@mail.gmail.com https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-08-22 22:43:07 -07:00
Andres Freund	35ea75632a	Refactor typcache.c's record typmod hash table. Previously, tuple descriptors were stored in chains keyed by a fixed size array of OIDs. That meant there were effectively two levels of collision chain -- one inside and one outside the hash table. Instead, let dynahash.c look after conflicts for us by supplying a proper hash and equal function pair. This is a nice cleanup on its own, but also simplifies followup changes allowing blessed TupleDescs to be shared between backends participating in parallel query. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D34GVhOL%2BarUx56yx7OPk7%3DqpGsv3CpO54feqjAwQKm5g%40mail.gmail.com	2017-08-22 16:11:54 -07:00
Peter Eisentraut	2bfd1b1ee5	Don't install ICU collation keyword variants Users can still create them themselves. Instead, document Unicode TR 35 collation options for ICU, so users can create all this themselves. Reviewed-by: Peter Geoghegan <pg@bowt.ie>	2017-08-21 19:21:07 -04:00
Peter Eisentraut	51e225da30	Expand set of predefined ICU locales Install language+region combinations even if they are not distinct from the language's base locale. This gives better long-term stability of the set of predefined locales and makes the predefined locales less implementation-dependent and more practical for users. Reviewed-by: Peter Geoghegan <pg@bowt.ie>	2017-08-21 19:21:07 -04:00
Robert Haas	1f6d515a67	Push limit through subqueries to underlying sort, where possible. Douglas Doole, reviewed by Ashutosh Bapat and by me. Minor formatting change by me. Discussion: http://postgr.es/m/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w@mail.gmail.com	2017-08-21 14:19:44 -04:00
Robert Haas	79ccd7cbd5	pg_prewarm: Add automatic prewarm feature. Periodically while the server is running, and at shutdown, write out a list of blocks in shared buffers. When the server reaches consistency -- unfortunatey, we can't do it before that point without breaking things -- reload those blocks into any still-unused shared buffers. Mithun Cy and Robert Haas, reviewed and tested by Beena Emerson, Amit Kapila, Jim Nasby, and Rafia Sabih. Discussion: http://postgr.es/m/CAD__OugubOs1Vy7kgF6xTjmEqTR4CrGAv8w+ZbaY_+MZeitukw@mail.gmail.com	2017-08-21 14:17:39 -04:00
Noah Misch	66ed3829df	Inject $(ICU_LIBS) regardless of platform. It appeared in a conditional that excludes AIX, Cygwin and MinGW. Give ICU support a chance to work on those platforms. Back-patch to v10, where ICU support was introduced.	2017-08-20 21:22:18 -07:00
Andres Freund	c6293249dc	Partially flatten struct tupleDesc so that it can be used in DSM. TupleDesc's attributes were already stored in contiguous memory after the struct. Go one step further and get rid of the array of pointers to attributes so that they can be stored in shared memory mapped at different addresses in each backend. This won't work for TupleDescs with contraints and defaults, since those point to other objects, but for many purposes only attributes are needed. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-08-20 11:19:12 -07:00
Andres Freund	2cd7084524	Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n). This is a mechanical change in preparation for a later commit that will change the layout of TupleDesc. Introducing a macro to abstract the details of where attributes are stored will allow us to change that in separate step and revise it in future. Author: Thomas Munro, editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-08-20 11:19:07 -07:00
Peter Eisentraut	24620fc52b	Fix creation of ICU comments for keyword variants It would create the comment referring to the keyword-less parent locale. This was broken in `ddb5fdc068`.	2017-08-18 23:02:28 -04:00
Robert Haas	c4b841ba6a	Fix interaction of triggers, partitioning, and EXPLAIN ANALYZE. Add a new EState member es_leaf_result_relations, so that the trigger code knows about ResultRelInfos created by tuple routing. Also make sure ExplainPrintTriggers knows about partition-related ResultRelInfos. Etsuro Fujita, reviewed by Amit Langote Discussion: http://postgr.es/m/57163e18-8e56-da83-337a-22f2c0008051@lab.ntt.co.jp	2017-08-18 13:01:05 -04:00
Robert Haas	54cde0c4c0	Don't lock tables in RelationGetPartitionDispatchInfo. Instead, lock them in the caller using find_all_inheritors so that they get locked in the standard order, minimizing deadlock risks. Also in RelationGetPartitionDispatchInfo, avoid opening tables which are not partitioned; there's no need. Amit Langote, reviewed by Ashutosh Bapat and Amit Khandekar Discussion: http://postgr.es/m/91b36fa1-c197-b72f-ca6e-56c593bae68c@lab.ntt.co.jp	2017-08-17 15:43:09 -04:00
Robert Haas	ecfe59e50f	Refactor validation of new partitions a little bit. Move some logic that is currently in ATExecAttachPartition to separate functions to facilitate future code reuse. Ashutosh Bapat and Jeevan Ladhe Discussion: http://postgr.es/m/CA+Tgmobbnamyvii0pRdg9pp_jLHSUvq7u5SiRrVV0tEFFU58Tg@mail.gmail.com	2017-08-17 14:49:45 -04:00
Robert Haas	1e56883a52	Attempt to clarify comments related to force_parallel_mode. Per discussion with Tom Lane. Discussion: http://postgr.es/m/28589.1502902172@sss.pgh.pa.us	2017-08-17 14:09:14 -04:00
Tom Lane	a2b70c89ca	Fix ExecReScanGatherMerge. Not surprisingly, since it'd never ever been tested, ExecReScanGatherMerge didn't work. Fix it, and add a regression test case to exercise it. Amit Kapila Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com	2017-08-17 13:49:22 -04:00
Tom Lane	963af96920	Add missing "static" marker. Per pademelon.	2017-08-17 11:17:39 -04:00
Heikki Linnakangas	dcd052c8d2	Fix pg_atomic_u64 initialization. As Andres pointed out, pg_atomic_init_u64 must be used to initialize an atomic variable, before it can be accessed with the actual atomic ops. Trying to use pg_atomic_write_u64 on an uninitialized variable leads to a failure with the fallback implementation that uses a spinlock. Discussion: https://www.postgresql.org/message-id/20170816191346.d3ke5tpshhco4bnd%40alap3.anarazel.de	2017-08-17 00:48:44 +03:00
Tom Lane	2b74303637	Make the planner assume that the entries in a VALUES list are distinct. Previously, if we had to estimate the number of distinct values in a VALUES column, we fell back on the default behavior used whenever we lack statistics, which effectively is that there are Min(# of entries, 200) distinct values. This can be very badly off with a large VALUES list, as noted by Jeff Janes. We could consider actually running an ANALYZE-like scan on the VALUES, but that seems unduly expensive, and anyway it could not deliver reliable info if the entries are not all constants. What seems like a better choice is to assume that the values are all distinct. This will sometimes be just as wrong as the old code, but it seems more likely to be more nearly right in many common cases. Also, it is more consistent with what happens in some related cases, for example WHERE x = ANY(ARRAY[1,2,3,...,n]) and WHERE x = ANY(VALUES (1),(2),(3),...,(n)) now are estimated similarly. This was discussed some time ago, but consensus was it'd be better to slip it in at the start of a development cycle not near the end. (It should've gone into v10, really, but I forgot about it.) Discussion: https://postgr.es/m/CAMkU=1xHkyPa8VQgGcCNg3RMFFvVxUdOpus1gKcFuvVi0w6Acg@mail.gmail.com	2017-08-16 15:37:20 -04:00
Heikki Linnakangas	ac883ac453	Fix shm_toc.c to always return buffer-aligned memory. Previously, if you passed a non-aligned size to shm_toc_create(), the memory returned by shm_toc_allocate() would be similarly non-aligned. This was exposed by commit `3cda10f41b`, which allocated structs containing a pg_atomic_uint64 field with shm_toc_allocate(). On systems with MAXIMUM_ALIGNOF = 4, such structs still need to be 8-bytes aligned, but the memory returned by shm_toc_allocate() was only 4-bytes aligned. It's quite bogus that we abuse BUFFERALIGN to align the structs for pg_atomic_uint64. It doesn't really have anything to do with buffers. But that's a separate issue. This ought to fix the buildfarm failures on 32-bit x86 systems. Discussion: https://www.postgresql.org/message-id/7e0a73a5-0df9-1859-b8ae-9acf122dc38d@iki.fi	2017-08-16 21:52:38 +03:00
Peter Eisentraut	9b5140fb50	Correct representation of foreign tables in information schema tables.table_type is supposed to be 'FOREIGN' rather than 'FOREIGN TABLE' according to the SQL standard.	2017-08-16 11:03:33 -04:00
Heikki Linnakangas	3cda10f41b	Use atomic ops to hand out pages to scan in parallel scan. With a lot of CPUs, the spinlock that protects the current scan location in a parallel scan can become a bottleneck. Use an atomic fetch-and-add instruction instead. David Rowley Discussion: https://www.postgresql.org/message-id/CAKJS1f9tgsPhqBcoPjv9_KUPZvTLCZ4jy%3DB%3DbhqgaKn7cYzm-w@mail.gmail.com	2017-08-16 16:18:41 +03:00
Heikki Linnakangas	0c504a80cf	Remove dedicated B-tree root-split record types. Since commit `40dae7ec53`, which changed the way b-tree page splitting works, there has been no difference in the handling of root, and non-root split WAL records. We don't need to distinguish them anymore If you're worried about the loss of debugging information, note that usually a root split record will normally be followed by a WAL record to create the new root page. The root page will also have the BTP_ROOT flag set on the page itself, and there is a pointer to it from the metapage. Author: Aleksander Alekseev Discussion: https://www.postgresql.org/message-id/20170406122116.GA11081@e733.localdomain	2017-08-16 12:24:40 +03:00
Peter Eisentraut	77d05706be	Fix up some misusage of appendStringInfo() and friends Change to appendStringInfoChar() or appendStringInfoString() where those can be used. Author: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>	2017-08-15 23:34:39 -04:00
Peter Eisentraut	4d4c891715	Initialize replication_slot_catalog_xmin in procarray Although not confirmed and probably rare, if the newly allocated memory is not already zero, this could possibly have caused some problems. Also reorder the initializations slightly so they match the order of the struct definition. Author: Wong, Yi Wen <yiwong@amazon.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>	2017-08-15 21:05:21 -04:00
Peter Eisentraut	0659465caa	Include foreign tables in information_schema.table_privileges This appears to have been an omission in the original commit `0d692a0dc9`. All related information_schema views already include foreign tables. Reported-by: Nicolas Thauvin <nicolas.thauvin@dalibo.com>	2017-08-15 19:27:22 -04:00
Alvaro Herrera	31ae1638ce	Simplify autovacuum work-item implementation The initial implementation of autovacuum work-items used a dynamic shared memory area (DSA). However, it's argued that dynamic shared memory is not portable enough, so we cannot rely on it being supported everywhere; at the same time, autovacuum work-items are now a critical part of the server, so it's not acceptable that they don't work in the cases where dynamic shared memory is disabled. Therefore, let's fall back to a simpler implementation of work-items that just uses autovacuum's main shared memory segment for storage. Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com	2017-08-15 18:14:07 -03:00
Peter Eisentraut	70b573b267	Fix logical replication protocol comparison logic Since we currently only have one protocol, this doesn't make much of a difference other than the error message. Author: Yugo Nagata <nagata@sraoss.co.jp>	2017-08-15 16:21:19 -04:00
Peter Eisentraut	e42351ae07	Simplify some code in logical replication launcher Avoid unnecessary locking calls when a subscription is disabled. Author: Yugo Nagata <nagata@sraoss.co.jp>	2017-08-15 15:13:06 -04:00
Tom Lane	4867d7f62f	Avoid out-of-memory in a hash join with many duplicate inner keys. The executor is capable of splitting buckets during a hash join if too much memory is being used by a small number of buckets. However, this only helps if a bucket's population is actually divisible; if all the hash keys are alike, the tuples still end up in the same new bucket. This can result in an OOM failure if there are enough inner keys with identical hash values. The planner's cost estimates will bias it against choosing a hash join in such situations, but not by so much that it will never do so. To mitigate the OOM hazard, explicitly estimate the hash bucket space needed by just the inner side's most common value, and if that would exceed work_mem then add disable_cost to the hash cost estimate. This approach doesn't account for the possibility that two or more common values would share the same hash value. On the other hand, work_mem is normally a fairly conservative bound, so that eating two or more times that much space is probably not going to kill us. If we have no stats about the inner side, ignore this consideration. There was some discussion of making a conservative assumption, but that would effectively result in disabling hash join whenever we lack stats, which seems like an overreaction given how seldom the problem manifests in the field. Per a complaint from David Hinkle. Although this could be viewed as a bug fix, the lack of similar complaints weighs against back- patching; indeed we waited for v11 because it seemed already rather late in the v10 cycle to be making plan choice changes like this one. Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us	2017-08-15 14:05:53 -04:00
Alvaro Herrera	d9a622cee1	Fix error handling path in autovacuum launcher The original code (since `00e6a16d01`) was assuming aborting the transaction in autovacuum launcher was sufficient to release all resources, but in reality the launcher runs quite a lot of code out of any transactions. Re-introduce individual cleanup calls to make abort more robust. Reported-by: Robert Haas Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com	2017-08-15 13:35:12 -03:00
Robert Haas	e139f1953f	Assorted preparatory refactoring for partition-wise join. Instead of duplicating the logic to search for a matching ParamPathInfo in multiple places, factor it out into a separate function. Pass only the relevant bits of the PartitionKey to partition_bounds_equal instead of the whole thing, because partition-wise join will want to call this without having a PartitionKey available. Adjust allow_star_schema_join and calc_nestloop_required_outer to take relevant Relids rather than the entire Path, because partition-wise join will want to call it with the top-parent relids to determine whether a child join is allowable. Ashutosh Bapat. Review and testing of the larger patch set of which this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih, Thomas Munro, Dilip Kumar, and me. Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com	2017-08-15 12:30:38 -04:00
Tom Lane	00418c6124	Simplify plpgsql's check for simple expressions. plpgsql wants to recognize expressions that it can execute directly via ExecEvalExpr() instead of going through the full SPI machinery. Originally the test for this consisted of recursively groveling through the post-planning expression tree to see if it contained only nodes that plpgsql recognized as safe. That was a major maintenance headache, since it required updating plpgsql every time we added any kind of expression node. It was also kind of expensive, so over time we added various pre-planning checks to try to short-circuit having to do that. Robert Haas pointed out that as of the SRF-processing changes in v10, particularly the addition of Query.hasTargetSRFs, there really isn't any reason to make the recursive scan at all: the initial checks cover everything we really care about. We do have to make sure that those checks agree with what inline_function() considers, so that inlining of a function that formerly wasn't inlined can't cause an expression considered simple to become non-simple. Hence, delete the recursive function exec_simple_check_node(), and tweak those other tests to more exactly agree with inline_function(). Adjust some comments and function naming to match. Discussion: https://postgr.es/m/CA+TgmoZGZpwdEV2FQWaVxA_qZXsQE1DAS5Fu8fwxXDNvfndiUQ@mail.gmail.com	2017-08-15 12:28:39 -04:00
Tom Lane	f3a4d7e7c2	Distinguish wait-for-connection from wait-for-write-ready on Windows. The API for WaitLatch and friends followed the Unix convention in which waiting for a socket connection to complete is identical to waiting for the socket to accept a write. While Windows provides a select(2) emulation that agrees with that, the native WaitForMultipleObjects API treats them as quite different --- and for some bizarre reason, it will report a not-yet-connected socket as write-ready. libpq itself has so far escaped dealing with this because it waits with select(), but in libpqwalreceiver.c we want to wait using WaitLatchOrSocket. The semantics mismatch resulted in replication connection failures on Windows, but only for remote connections (apparently, localhost connections complete immediately, or at least too fast for anyone to have noticed the problem in single-machine testing). To fix, introduce an additional WL_SOCKET_CONNECTED wait flag for WaitLatchOrSocket, which is identical to WL_SOCKET_WRITEABLE on non-Windows, but results in waiting for FD_CONNECT events on Windows. Ideally, we would also distinguish the two conditions in the API for PQconnectPoll(), but changing that API at this point seems infeasible. Instead, cheat by checking for PQstatus() == CONNECTION_STARTED to determine that we're still waiting for the connection to complete. (This is a cheat mainly because CONNECTION_STARTED is documented as an internal state rather than something callers should rely on. Perhaps we ought to change the documentation ... but this patch doesn't.) Per reports from Jobin Augustine and Igor Neyman. Back-patch to v10 where commit `1e8a85009` exposed this longstanding shortcoming. Andres Freund, minor fix and some code review/beautification by me Discussion: https://postgr.es/m/CAHBggj8g2T+ZDcACZ2FmzX9CTxkWjKBsHd6NkYB4i9Ojf6K1Fw@mail.gmail.com	2017-08-15 11:07:57 -04:00
Robert Haas	480f1f4329	Teach adjust_appendrel_attrs(_multilevel) to do multiple translations. Currently, child relations are always base relations, so when we translate parent relids to child relids, we only need to translate a singler relid. However, the proposed partition-wise join feature will create child joins, which will mean we need to translate a set of parent relids to the corresponding child relids. This is preliminary refactoring to make that possible. Ashutosh Bapat. Review and testing of the larger patch set of which this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih, Thomas Munro, Dilip Kumar, and me. Some adjustments, mostly cosmetic, by me. Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com	2017-08-15 10:49:06 -04:00
Robert Haas	d57929afc7	Avoid unnecessary single-child Append nodes. Before commit `d3cc37f1d8`, an inheritance parent whose only children were temp tables of other sessions would end up as a simple scan of the parent; but with that commit, we end up with an Append node, per a report from Ashutosh Bapat. Tweak the logic so that we go back to the old way, and update the function header comment for partitioning while we're at it. Ashutosh Bapat, reviewed by Amit Langote and adjusted by me. Discussion: http://postgr.es/m/CAFjFpReWJr1yTkHU=OqiMBmcYCMoSW3VPR39RBuQ_ovwDFBT5Q@mail.gmail.com	2017-08-15 09:16:33 -04:00
Robert Haas	1295a77788	Add missing call to ExecReScanGatherMerge. Amit Kapila Discussion: http://postgr.es/m/CAA4eK1KeQWZOoDmDmGMwuqzPW9JhRS+ditQVFdAfGjNmMZzqMQ@mail.gmail.com	2017-08-15 08:06:36 -04:00
Tom Lane	21d304dfed	Final pgindent + perltidy run for v10.	2017-08-14 17:29:33 -04:00
Tom Lane	5b6289c1e0	Handle elog(FATAL) during ROLLBACK more robustly. Stress testing by Andreas Seltenreich disclosed longstanding problems that occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are trying to execute a ROLLBACK of an already-failed transaction. In such a case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction would skip AbortTransaction and go straight to CleanupTransaction. This led to an assert failure in an assert-enabled build (due to the ROLLBACK's portal still having a cleanup hook) or without assertions, to a FATAL exit complaining about "cannot drop active portal". The latter's not disastrous, perhaps, but it's messy enough to want to improve it. We don't really want to run all of AbortTransaction in this code path. The minimum required to clean up the open portal safely is to do AtAbort_Memory and AtAbort_Portals. It seems like a good idea to do AtAbort_Memory unconditionally, to be entirely sure that we are starting with a safe CurrentMemoryContext. That means that if the main loop in AbortOutOfAnyTransaction does nothing, we need an extra step at the bottom to restore CurrentMemoryContext = TopMemoryContext, which I chose to do by invoking AtCleanup_Memory. This'll result in calling AtCleanup_Memory twice in many of the paths through this function, but that seems harmless and reasonably inexpensive. The original motivation for the assertion in AtCleanup_Portals was that we wanted to be sure that any user-defined code executed as a consequence of the cleanup hook runs during AbortTransaction not CleanupTransaction. That still seems like a valid concern, and now that we've seen one case of the assertion firing --- which means that exactly that would have happened in a production build --- let's replace the Assert with a runtime check. If we see the cleanup hook still set, we'll emit a WARNING and just drop the hook unexecuted. This has been like this a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu	2017-08-14 15:43:20 -04:00
Peter Eisentraut	7f1bb1d734	Fix typo Author: Masahiko Sawada <sawada.mshk@gmail.com>	2017-08-14 13:53:05 -04:00
Tom Lane	004a9702e0	Remove AtEOXact_CatCache(). The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com	2017-08-13 16:15:14 -04:00
Alvaro Herrera	2336f84284	Reword comment for clarity Reported by Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com	2017-08-12 23:26:35 -04:00
Peter Eisentraut	a1ef920e27	Remove uses of "slave" in replication contexts This affects mostly code comments, some documentation, and tests. Official APIs already used "standby".	2017-08-10 22:55:41 -04:00
Robert Haas	bb5d6e80b1	Improve the error message when creating an empty range partition. The previous message didn't mention the name of the table or the bounds. Put the table name in the primary error message and the bounds in the detail message. Amit Langote, changed slightly by me. Suggestions on the exac phrasing from Tom Lane, David G. Johnston, and Dean Rasheed. Discussion: http://postgr.es/m/CA+Tgmoae6bpwVa-1BMaVcwvCCeOoJ5B9Q9-RHWo-1gJxfPBZ5Q@mail.gmail.com	2017-08-10 13:46:56 -04:00
Robert Haas	ec99dd5aee	Remove incorrect assertion in clog.c We must advance the oldest XID that can be safely looked up in clog before truncating CLOG, and the oldest XID that can't be reused after truncating CLOG. This assertion, and the accompanying comment, are confused; remove them. Reported by Neha Sharma. Discussion: http://postgr.es/m/CANiYTQumC3T=UMBMd1Hor=5XWZYuCEQBioL3ug0YtNQCMMT5wQ@mail.gmail.com	2017-08-10 11:20:57 -04:00
Tom Lane	749c7c4170	Fix handling of container types in find_composite_type_dependencies. find_composite_type_dependencies correctly found columns that are of the specified type, and columns that are of arrays of that type, but not columns that are domains or ranges over the given type, its array type, etc. The most general way to handle this seems to be to assume that any type that is directly dependent on the specified type can be treated as a container type, and processed recursively (allowing us to handle nested cases such as ranges over domains over arrays ...). Since a type's array type already has such a dependency, we can drop the existing special case for the array type. The very similar logic in get_rels_with_domain was likewise a few bricks shy of a load, as it supposed that a directly dependent type could only be a sub-domain. This is already wrong for ranges over domains, and it'll someday be wrong for arrays over domains. Add test cases illustrating the problems, and back-patch to all supported branches. Discussion: https://postgr.es/m/15268.1502309024@sss.pgh.pa.us	2017-08-09 17:03:09 -04:00
Tom Lane	9bf4068cc3	Fix datumSerialize infrastructure to not crash on non-varlena data. Commit `1efc7e538` did a poor job of emulating existing logic for touching Datums that might be expanded-object pointers. It didn't check for typlen being -1 first, which meant it could crash on fixed-length pass-by-ref values, and probably on cstring values as well. It also didn't use DatumGetPointer before VARATT_IS_EXTERNAL_EXPANDED, which while currently harmless is not according to documentation nor prevailing style. I also think the lack of any explanation as to why datumSerialize makes these particular nonobvious choices is pretty awful, so fix that. Per report from Jarred Ward. Back-patch to 9.6 where this code came in. Discussion: https://postgr.es/m/6F61E6D2-2F5E-4794-9479-A429BE1CEA4B@simple.com	2017-08-08 19:18:22 -04:00
Alvaro Herrera	77d2c00af7	Reword some unclear comments	2017-08-08 18:48:01 -04:00
Alvaro Herrera	f5d54ef97a	Fix typo in comment	2017-08-08 18:34:25 -04:00
Alvaro Herrera	b2c95a3798	Fix replication origin-related race conditions Similar to what was fixed in commit `9915de6c1c` for replication slots, but this time it's related to replication origins: DROP SUBSCRIPTION attempts to drop the replication origin, but that fails if the replication worker process hasn't yet marked it unused. This causes failures in the buildfarm: ERROR: could not drop replication origin with OID 1, in use by PID 34069 Like the aforementioned commit, fix by having the process running DROP SUBSCRIPTION sleep until the worker marks the the replication origin struct as free. This uses a condition variable on each replication origin shmem state struct, so that the session trying to drop can sleep and expect to be awakened by the process keeping the origin open. Also fix a SGML markup in the previous commit. Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql	2017-08-08 16:07:46 -04:00
Alvaro Herrera	030273b7ea	Fix inadequacies in recently added wait events In commit `9915de6c1c`, we introduced a new wait point for replication slots and incorrectly labelled it as wait event PG_WAIT_LOCK. That's wrong, so invent an appropriate new wait event instead, and document it properly. While at it, fix numerous other problems in the vicinity: - two different walreceiver wait events were being mixed up in a single wait event (which wasn't documented either); split it out so that they can be distinguished, and document the new events properly. - ParallelBitmapPopulate was documented but didn't exist. - ParallelBitmapScan was not documented (I think this should be called "ParallelBitmapScanInit" instead.) - Logical replication wait events weren't documented - various symbols had been added in dartboard order in various places. Put them in alphabetical order instead, as was originally intended. Discussion: https://postgr.es/m/20170808181131.mu4fjepuh5m75cyq@alvherre.pgsql	2017-08-08 15:37:44 -04:00
Peter Eisentraut	cdc47d1f39	Update SQL features list	2017-08-07 14:30:24 -04:00
Peter Eisentraut	f7668b2b35	Translation updates Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 1a0b5e655d7871506c2b1c7ba562c2de6b6a55de	2017-08-07 13:55:34 -04:00
Peter Eisentraut	fca17a933b	Fix local/remote attribute mix-up in logical replication This would lead to failures if local and remote tables have a different column order. The tests previously didn't catch that because they only tested the initial data copy. So add another test that exercises the apply worker. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-08-07 10:49:08 -04:00
Peter Eisentraut	0e58455dd4	Fix handling of dropped columns in logical replication The relation attribute map was not initialized for dropped columns, leading to errors later on. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Scott Milliken <scott@deltaex.com> Bug: #14769	2017-08-07 10:28:35 -04:00
Tom Lane	8d9881911f	Require update permission for the large object written by lo_put(). lo_put() surely should require UPDATE permission, the same as lowrite(), but it failed to check for that, as reported by Chapman Flack. Oversight in commit c50b7c09d; backpatch to 9.4 where that was introduced. Tom Lane and Michael Paquier Security: CVE-2017-7548	2017-08-07 10:19:19 -04:00
Noah Misch	e568e1eee4	Again match pg_user_mappings to information_schema.user_mapping_options. Commit `3eefc51053` claimed to make pg_user_mappings enforce the qualifications user_mapping_options had been enforcing, but its removal of a longstanding restriction left them distinct when the current user is the subject of a mapping yet has no server privileges. user_mapping_options emits no rows for such a mapping, but pg_user_mappings includes full umoptions. Change pg_user_mappings to show null for umoptions. Back-patch to 9.2, like the above commit. Reviewed by Tom Lane. Reported by Jeff Janes. Security: CVE-2017-7547	2017-08-07 07:09:28 -07:00
Heikki Linnakangas	bf6b9e9444	Don't allow logging in with empty password. Some authentication methods allowed it, others did not. In the client-side, libpq does not even try to authenticate with an empty password, which makes using empty passwords hazardous: an administrator might think that an account with an empty password cannot be used to log in, because psql doesn't allow it, and not realize that a different client would in fact allow it. To clear that confusion and to be be consistent, disallow empty passwords in all authentication methods. All the authentication methods that used plaintext authentication over the wire, except for BSD authentication, already checked that the password received from the user was not empty. To avoid forgetting it in the future again, move the check to the recv_password_packet function. That only forbids using an empty password with plaintext authentication, however. MD5 and SCRAM need a different fix: * In stable branches, check that the MD5 hash stored for the user does not not correspond to an empty string. This adds some overhead to MD5 authentication, because the server needs to compute an extra MD5 hash, but it is not noticeable in practice. * In HEAD, modify CREATE and ALTER ROLE to clear the password if an empty string, or a password hash that corresponds to an empty string, is specified. The user-visible behavior is the same as in the stable branches, the user cannot log in, but it seems better to stop the empty password from entering the system in the first place. Secondly, it is fairly expensive to check that a SCRAM hash doesn't correspond to an empty string, because computing a SCRAM hash is much more expensive than an MD5 hash by design, so better avoid doing that on every authentication. We could clear the password on CREATE/ALTER ROLE also in stable branches, but we would still need to check at authentication time, because even if we prevent empty passwords from being stored in pg_authid, there might be existing ones there already. Reported by Jeroen van der Ham, Ben de Graaff and Jelte Fennema. Security: CVE-2017-7546	2017-08-07 17:03:42 +03:00

... 7 8 9 10 11 ...

18225 Commits