postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	7b86c2ac95	Improve dubious memory management in pg_newlocale_from_collation(). pg_newlocale_from_collation() used malloc() and strdup() directly, which is generally not per backend coding style, and it didn't bother to check for failure results, but would just SIGSEGV instead. Also, if one of the numerous error checks in the middle of the function failed, the already-allocated memory would be leaked permanently. Admittedly, it's not a lot of memory, but it could build up if this function were called repeatedly for a bad collation. The first two problems are easily cured by palloc'ing in TopMemoryContext instead of calling libc directly. We can fairly easily dodge the leakage problem for the struct pg_locale_struct by filling in a temporary variable and allocating permanent storage only once we reach the bottom of the function. It's harder to get rid of the potential leakage for ICU's copy of the collcollate string, but at least that's only allocated after most of the error checks; so live with that aspect. Back-patch to v10 where this code came in, with one or another of the ICU patches.	2017-09-20 13:52:36 -04:00
Robert Haas	57eebca03a	Fix create_lateral_join_info to handle dead relations properly. Commit `0a480502b0` broke it. Report by Andreas Seltenreich. Fix by Ashutosh Bapat. Discussion: http://postgr.es/m/874ls2vrnx.fsf@ansel.ydns.eu	2017-09-20 10:20:10 -04:00
Robert Haas	7f3a3312ab	Fix typo. Thomas Munro Discussion: http://postgr.es/m/CAEepm=2j-HAgnBUrAazwS0ry7Z_ihk+d7g+Ye3u99+6WbiGt_Q@mail.gmail.com	2017-09-20 10:07:53 -04:00
Peter Eisentraut	be87b70b61	Sync process names between ps and pg_stat_activity Remove gratuitous differences in the process names shown in pg_stat_activity.backend_type and the ps output. Reviewed-by: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>	2017-09-20 08:59:03 -04:00
Andres Freund	fc49e24fa6	Make WAL segment size configurable at initdb time. For performance reasons a larger segment size than the default 16MB can be useful. A larger segment size has two main benefits: Firstly, in setups using archiving, it makes it easier to write scripts that can keep up with higher amounts of WAL, secondly, the WAL has to be written and synced to disk less frequently. But at the same time large segment size are disadvantageous for smaller databases. So far the segment size had to be configured at compile time, often making it unrealistic to choose one fitting to a particularly load. Therefore change it to a initdb time setting. This includes a breaking changes to the xlogreader.h API, which now requires the current segment size to be configured. For that and similar reasons a number of binaries had to be taught how to recognize the current segment size. Author: Beena Emerson, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com	2017-09-19 22:03:48 -07:00
Tom Lane	2d484f9b05	Remove no-op GiST support functions in the core GiST opclasses. The preceding patch allowed us to remove useless GiST support functions. This patch actually does that for all the no-op cases in the core GiST code. This buys us whatever performance gain is to be had, and more importantly exercises the preceding patch. There remain no-op functions in the contrib GiST opclasses, but those will take more work to remove. Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com	2017-09-19 23:32:59 -04:00
Tom Lane	d3a4f89d8a	Allow no-op GiST support functions to be omitted. There are common use-cases in which the compress and/or decompress functions can be omitted, with the result being that we make no data transformation when storing or retrieving index values. Previously, you had to provide a no-op function anyway, but this patch allows such opclass support functions to be omitted. Furthermore, if the compress function is omitted, then the core code knows that the stored representation is the same as the original data. This means we can allow index-only scans without requiring a fetch function to be provided either. Previously you had to provide a no-op fetch function if you wanted IOS to work. This reportedly provides a small performance benefit in such cases, but IMO the real reason for doing it is just to reduce the amount of useless boilerplate code that has to be written for GiST opclasses. Andrey Borodin, reviewed by Dmitriy Sarafannikov Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com	2017-09-19 23:32:59 -04:00
Andres Freund	896537f078	s/NULL byte/NUL byte/ in comment refering to C string terminator. Reported-By: Robert Haas Discussion: https://postgr.es/m/CA+Tgmoa+YBvWgFST2NVoeXjVSohEpK=vqnVCsoCkhTVVxfLcVQ@mail.gmail.com	2017-09-19 16:41:07 -07:00
Andres Freund	71edbb6f66	Avoid use of non-portable strnlen() in pgstat_clip_activity(). The use of strnlen rather than strlen was just paranoia. Instead of giving up on the paranoia, just implement the safeguard differently. And add a comment explaining why we're careful. Author: Andres Freund Discussion: https://postgr.es/m/E1duOkJ-0001Mc-U5@gemulon.postgresql.org	2017-09-19 14:25:47 -07:00
Andres Freund	54b6cd589a	Speedup pgstat_report_activity by moving mb-aware truncation to read side. Previously multi-byte aware truncation was done on every pgstat_report_activity() call - proving to be a bottleneck for workloads with long query strings that execute quickly. Instead move the truncation to the read side, which commonly is executed far less frequently. That's possible because all server encodings allow to determine the length of a multi-byte string from the first byte. Rename PgBackendStatus.st_activity to st_activity_raw so existing extension users of the field break - their code has to be adjusted to use pgstat_clip_activity(). Author: Andres Freund Tested-By: Khuntal Ghosh Reviewed-By: Robert Haas, Tom Lane Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkviecpz@alap3.anarazel.de	2017-09-19 12:51:14 -07:00
Tom Lane	ed22fb8b00	Cache datatype-output-function lookup info across calls of concat(). Testing indicates this can save a third to a half of the runtime of the function. Pavel Stehule, reviewed by Alexander Kuzmenkov Discussion: https://postgr.es/m/CAFj8pRAT62pRgjoHbgTfJUc2uLmeQ4saUj+yVJAEZUiMwNCmdg@mail.gmail.com	2017-09-19 15:09:38 -04:00
Andres Freund	f8e5f156b3	Rearm statement_timeout after each executed query. Previously statement_timeout, in the extended protocol, affected all messages till a Sync message. For clients that pipeline/batch query execution that's problematic. Instead disable timeout after each Execute message, and enable, if necessary, the timer in start_xact_command(). As that's done only for Execute and not Parse / Bind, pipelining the latter two could still cause undesirable timeouts. But a survey of protocol implementations shows that all drivers issue Sync messages when preparing, and adding timeout rearming to both is fairly expensive for the common parse / bind / execute sequence. Author: Tatsuo Ishii, editorialized by Andres Freund Reviewed-By: Takayuki Tsunakawa, Andres Freund Discussion: https://postgr.es/m/20170222.115044.1665674502985097185.t-ishii@sraoss.co.jp	2017-09-18 19:36:44 -07:00
Andres Freund	0fb9e4ace5	Fix uninitialized variable in dshash.c. A bugfix for commit `8c0d7bafad`. The code would have crashed if hashtable->size_log2 ever had the same value as hashtable->control->size_log2 by coincidence. Per Valgrind. Author: Thomas Munro Reported-By: Tomas Vondra Discussion: https://postgr.es/m/e72fb33c-4f31-f276-e972-263d9b59554d%402ndquadrant.com	2017-09-18 17:43:37 -07:00
Andres Freund	ec9e05b3c3	Fix crash restart bug introduced in `8356753c21`. The bug was caused by not re-reading the control file during crash recovery restarts, which lead to an attempt to pfree() shared memory contents. The fix is to re-read the control file, which seems good anyway. It's unclear as of this moment, whether we want to keep the refactoring introduced in the commit referenced above, or come up with an alternative approach. But fixing the bug in the mean time seems like a good idea regardless. A followup commit will introduce regression test coverage for crash restarts. Reported-By: Tom Lane Discussion: https://postgr.es/m/14134.1505572349@sss.pgh.pa.us	2017-09-18 17:25:49 -07:00
Tom Lane	eb5c404b17	Minor code-cleanliness improvements for btree. Make the btree page-flags test macros (P_ISLEAF and friends) return clean boolean values, rather than values that might not fit in a bool. Use them in a few places that were randomly referencing the flag bits directly. In passing, change access/nbtree/'s only direct use of BUFFER_LOCK_SHARE to BT_READ. (Some think we should go the other way, but as long as we have BT_READ/BT_WRITE, let's use them consistently.) Masahiko Sawada, reviewed by Doug Doole Discussion: https://postgr.es/m/CAD21AoBmWPeN=WBB5Jvyz_Nt3rmW1ebUyAnk3ZbJP3RMXALJog@mail.gmail.com	2017-09-18 16:36:28 -04:00
Tom Lane	66917bfaa7	Make ExplainOpenGroup and ExplainCloseGroup public. Extensions with custom plan nodes might like to use these in their EXPLAIN output. Hadi Moshayedi Discussion: https://postgr.es/m/CA+_kT_dU-rHCN0u6pjA6bN5CZniMfD=-wVqPY4QLrKUY_uJq5w@mail.gmail.com	2017-09-18 16:01:16 -04:00
Tom Lane	4bd1994650	Make DatumGetFoo/PG_GETARG_FOO/PG_RETURN_FOO macro names more consistent. By project convention, these names should include "P" when dealing with a pointer type; that is, if the result of a GETARG macro is of type FOO , it should be called PG_GETARG_FOO_P not just PG_GETARG_FOO. Some newer types such as JSONB and ranges had not followed the convention, and a number of contrib modules hadn't gotten that memo either. Rename the offending macros to improve consistency. In passing, fix a few places that thought PG_DETOAST_DATUM() returns a Datum; it does not, it returns "struct varlena ". Applying DatumGetPointer to that happens not to cause any bad effects today, but it's formally wrong. Also, adjust an ltree macro that was designed without any thought for what pgindent would do with it. This is all cosmetic and shouldn't have any impact on generated code. Mark Dilger, some further tweaks by me Discussion: https://postgr.es/m/EA5676F4-766F-4F38-8348-ECC7DB427C6A@gmail.com	2017-09-18 15:21:23 -04:00
Tom Lane	3e1683d37e	Fix, or at least ameliorate, bugs in logicalrep_worker_launch(). If we failed to get a background worker slot, the code just walked away from the logicalrep-worker slot it already had, leaving that looking like the worker is still starting up. This led to an indefinite hang in subscription startup, as reported by Thomas Munro. We must release the slot on failure. Also fix a thinko: we must capture the worker slot's generation before releasing LogicalRepWorkerLock the first time, else testing to see if it's changed is pretty meaningless. BTW, the CHECK_FOR_INTERRUPTS() in WaitForReplicationWorkerAttach is a ticking time bomb, even without considering the possibility of elog(ERROR) in one of the other functions it calls. Really, this entire business needs a redesign with some actual thought about error recovery. But for now I'm just band-aiding the case observed in testing. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/CAEepm=2bP3TBMFBArP6o20AZaRduWjMnjCjt22hSdnA-EvrtCw@mail.gmail.com	2017-09-18 11:39:55 -04:00
Peter Eisentraut	8edacab209	Fix DROP SUBSCRIPTION hang When ALTER SUBSCRIPTION DISABLE is run in the same transaction before DROP SUBSCRIPTION, the latter will hang because workers will still be running, not having seen the DISABLE committed, and DROP SUBSCRIPTION will wait until the workers have vacated the replication origin slots. Previously, DROP SUBSCRIPTION killed the logical replication workers immediately only if it was going to drop the replication slot, otherwise it scheduled the worker killing for the end of the transaction, as a result of `7e174fa793`. This, however, causes the present problem. To fix, kill the workers immediately in all cases. This covers all cases: A subscription that doesn't have a replication slot must be disabled. It was either disabled in the same transaction, or it was already disabled before the current transaction, but then there shouldn't be any workers left and this won't make a difference. Reported-by: Arseny Sher <a.sher@postgrespro.ru> Discussion: https://www.postgresql.org/message-id/flat/87mv6av84w.fsf%40ars-thinkpad	2017-09-17 22:00:23 -04:00
Tom Lane	6f44fe7f12	Allow rel_is_distinct_for() to look through RelabelType below OpExpr. This lets it do the right thing for, eg, varchar columns. Back-patch to 9.5 where this logic appeared. David Rowley, per report from Kim Rose Carlsen Discussion: https://postgr.es/m/VI1PR05MB17091F9A9876528055D6A827C76D0@VI1PR05MB1709.eurprd05.prod.outlook.com	2017-09-17 15:28:51 -04:00
Tom Lane	27c6619e9c	Fix possible dangling pointer dereference in trigger.c. AfterTriggerEndQuery correctly notes that the query_stack could get repalloc'd during a trigger firing, but it nonetheless passes the address of a query_stack entry to afterTriggerInvokeEvents, so that if such a repalloc occurs, afterTriggerInvokeEvents is already working with an obsolete dangling pointer while it scans the rest of the events. Oops. The only code at risk is its "delete_ok" cleanup code, so we can prevent unsafe behavior by passing delete_ok = false instead of true. However, that could have a significant performance penalty, because the point of passing delete_ok = true is to not have to re-scan possibly a large number of dead trigger events on the next time through the loop. There's more than one way to skin that cat, though. What we can do is delete all the "chunks" in the event list except the last one, since we know all events in them must be dead. Deleting the chunks is work we'd have had to do later in AfterTriggerEndQuery anyway, and it ends up saving rescanning of just about the same events we'd have gotten rid of with delete_ok = true. In v10 and HEAD, we also have to be careful to mop up any per-table after_trig_events pointers that would become dangling. This is slightly annoying, but I don't think that normal use-cases will traverse this code path often enough for it to be a performance problem. It's pretty hard to hit this in practice because of the unlikelihood of the query_stack getting resized at just the wrong time. Nonetheless, it's definitely a live bug of ancient standing, so back-patch to all supported branches. Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us	2017-09-17 14:50:01 -04:00
Tom Lane	fd31f9f033	Ensure that BEFORE STATEMENT triggers fire the right number of times. Commit `0f79440fb` introduced mechanism to keep AFTER STATEMENT triggers from firing more than once per statement, which was formerly possible if more than one FK enforcement action had to be applied to a given table. Add a similar mechanism for BEFORE STATEMENT triggers, so that we don't have the unexpected situation of firing BEFORE STATEMENT triggers more often than AFTER STATEMENT. As with the previous patch, back-patch to v10. Discussion: https://postgr.es/m/22315.1505584992@sss.pgh.pa.us	2017-09-17 12:16:38 -04:00
Tom Lane	cad22075bc	Fix bogus size calculation introduced by commit `cc5f81366`. The elements of RecordCacheArray are TupleDesc, not TupleDesc *. Those are actually the same size, so that this error is harmless, but it's still wrong --- and it might bite us someday, if TupleDesc ever became a struct, say. Per Coverity.	2017-09-17 11:35:27 -04:00
Tom Lane	0f79440fb0	Fix SQL-spec incompatibilities in new transition table feature. The standard says that all changes of the same kind (insert, update, or delete) caused in one table by a single SQL statement should be reported in a single transition table; and by that, they mean to include foreign key enforcement actions cascading from the statement's direct effects. It's also reasonable to conclude that if the standard had wCTEs, they would say that effects of wCTEs applying to the same table as each other or the outer statement should be merged into one transition table. We weren't doing it like that. Hence, arrange to merge tuples from multiple update actions into a single transition table as much as we can. There is a problem, which is that if the firing of FK enforcement triggers and after-row triggers with transition tables is interspersed, we might need to report more tuples after some triggers have already seen the transition table. It seems like a bad idea for the transition table to be mutable between trigger calls. There's no good way around this without a major redesign of the FK logic, so for now, resolve it by opening a new transition table each time this happens. Also, ensure that AFTER STATEMENT triggers fire just once per statement, or once per transition table when we're forced to make more than one. Previous versions of Postgres have allowed each FK enforcement query to cause an additional firing of the AFTER STATEMENT triggers for the referencing table, but that's certainly not per spec. (We're still doing multiple firings of BEFORE STATEMENT triggers, though; is that something worth changing?) Also, forbid using transition tables with column-specific UPDATE triggers. The spec requires such transition tables to show only the tuples for which the UPDATE trigger would have fired, which means maintaining multiple transition tables or else somehow filtering the contents at readout. Maybe someday we'll bother to support that option, but it looks like a lot of trouble for a marginal feature. The transition tables are now managed by the AfterTriggers data structures, rather than being directly the responsibility of ModifyTable nodes. This removes a subtransaction-lifespan memory leak introduced by my previous band-aid patch `3c4359521`. In passing, refactor the AfterTriggers data structures to reduce the management overhead for them, by using arrays of structs rather than several parallel arrays for per-query-level and per-subtransaction state. I failed to resist the temptation to do some copy-editing on the SGML docs about triggers, above and beyond merely documenting the effects of this patch. Back-patch to v10, because we don't want the semantics of transition tables to change post-release. Patch by me, with help and review from Thomas Munro. Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org	2017-09-16 13:20:36 -04:00
Robert Haas	9361f6f54e	After a MINVALUE/MAXVALUE bound, allow only more of the same. In the old syntax, which used UNBOUNDED, we had a similar restriction, but commit `d363d42bb9`, which changed the syntax, eliminated it. Put it back. Patch by me, reviewed by Dean Rasheed. Discussion: http://postgr.es/m/CA+Tgmobs+pLPC27tS3gOpEAxAffHrq5w509cvkwTf9pF6cWYbg@mail.gmail.com	2017-09-15 21:15:55 -04:00
Peter Eisentraut	3012061b86	Apply pg_get_serial_sequence() to identity column sequences as well Bug: #14813	2017-09-15 14:21:20 -04:00
Tom Lane	71aa4801a8	Get rid of shared_record_typmod_registry_worker_detach; it doesn't work. This code is unsafe, as proven by buildfarm failures, because it tries to access shared memory that might already be gone. It's also unnecessary, because we're about to exit the process anyway and so the record type cache should never be accessed again. The idea was to lay some foundations for someday recycling workers --- which would require attaching to a different shared tupdesc registry --- but that will require considerably more thought. In the meantime let's save some bytes by just removing the nonfunctional code. Problem identification, and proposal to fix by removing functionality from the detach function, by Thomas Munro. I went a bit further by removing the function altogether. Discussion: https://postgr.es/m/E1dsguX-00056N-9x@gemulon.postgresql.org	2017-09-15 10:52:30 -04:00
Tom Lane	eaa4070543	Don't use anonymous unions. Commit `cc5f81366c` introduced a language feature that is not acceptable to strict C89 compilers. Thomas Munro Per buildfarm.	2017-09-15 00:57:38 -04:00
Andres Freund	6b65a7fe62	Remove TupleDesc remapping logic from tqueue.c. With the introduction of a shared memory record typmod registry, it is no longer necessary to remap record typmods when sending tuples between backends so most of tqueue.c can be removed. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-09-14 19:59:29 -07:00
Andres Freund	cc5f81366c	Add support for coordinating record typmods among parallel workers. Tuples can have type RECORDOID and a typmod number that identifies a blessed TupleDesc in a backend-private cache. To support the sharing of such tuples through shared memory and temporary files, provide a typmod registry in shared memory. To achieve that, introduce per-session DSM segments, created on demand when a backend first runs a parallel query. The per-session DSM segment has a table-of-contents just like the per-query DSM segment, and initially the contents are a shared record typmod registry and a DSA area to provide the space it needs to grow. State relating to the current session is accessed via a Session object reached through global variable CurrentSession that may require significant redesign further down the road as we figure out what else needs to be shared or remodelled. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-09-14 19:59:21 -07:00
Robert Haas	81276fdd39	Add missing tags to GetCommandLogLevel. Otherwise, log_statement = 'ddl' causes errors if those statement types are used. Michael Paquier, reviewed by Ashutosh Sharma Discussion: http://postgr.es/m/CAB7nPqStC3HkE76Q1MnHsVd1vF1Td9zXApzYadzDMyLMRkkGrw@mail.gmail.com	2017-09-14 17:19:04 -04:00
Andres Freund	8356753c21	Perform only one ReadControlFile() during startup. Previously we read the control file in multiple places. But soon the segment size will be configurable and stored in the control file, and that needs to be available earlier than it currently is needed. Instead of adding yet another place where it's read, refactor things so there's a single processing of the control file during startup (in EXEC_BACKEND that's every individual backend's startup). Author: Andres Freund Discussion: http://postgr.es/m/20170913092828.aozd3gvvmw67gmyc@alap3.anarazel.de	2017-09-14 14:14:34 -07:00
Robert Haas	0a480502b0	Expand partitioned table RTEs level by level, without flattening. Flattening the partitioning hierarchy at this stage makes various desirable optimizations difficult. The original use case for this patch was partition-wise join, which wants to match up the partitions in one partitioning hierarchy with those in another such hierarchy. However, it now seems that it will also be useful in making partition pruning work using the PartitionDesc rather than constraint exclusion, because with a flattened expansion, we have no easy way to figure out which PartitionDescs apply to which leaf tables in a multi-level partition hierarchy. As it turns out, we end up creating both rte->inh and !rte->inh RTEs for each intermediate partitioned table, just as we previously did for the root table. This seems unnecessary since the partitioned tables have no storage and are not scanned. We might want to go back and rejigger things so that no partitioned tables (including the parent) need !rte->inh RTEs, but that seems to require some adjustments not related to the core purpose of this patch. Ashutosh Bapat, reviewed by me and by Amit Langote. Some final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRd=1venqLL7oGU=C1dEkuvk2DJgvF+7uKbnPHaum1mvHQ@mail.gmail.com	2017-09-14 15:41:08 -04:00
Robert Haas	77b6b5e9ce	Make RelationGetPartitionDispatchInfo expand depth-first. With this change, the order of leaf partitions as returned by RelationGetPartitionDispatchInfo should now be the same as the order used by expand_inherited_rtentry. This will make it simpler for future patches to match up the partition dispatch information with the planner data structures. The new code is also, in my opinion anyway, simpler and easier to understand. Amit Langote, reviewed by Amit Khandekar. I also reviewed and made a few cosmetic revisions. Discussion: http://postgr.es/m/d98d4761-5071-1762-501e-0e15047c714b@lab.ntt.co.jp	2017-09-14 12:28:50 -04:00
Robert Haas	42651bdd68	Fix inconsistent capitalization. Amit Langote Discussion: http://postgr.es/m/a83a0899-19f5-594c-9aac-3ba0f16989a1@lab.ntt.co.jp	2017-09-14 11:11:12 -04:00
Robert Haas	1555566d9e	Set partitioned_rels appropriately when UNION ALL is used. In most cases, this omission won't matter, because the appropriate locks will have been acquired during parse/plan or by AcquireExecutorLocks. But it's a bug all the same. Report by Ashutosh Bapat. Patch by me, reviewed by Amit Langote. Discussion: http://postgr.es/m/CAFjFpRdHb_ZnoDTuBXqrudWXh3H1ibLkr6nHsCFT96fSK4DXtA@mail.gmail.com	2017-09-14 11:00:39 -04:00
Andres Freund	1ab973ab60	Properly check interrupts in execScan.c. During the development of `d47cfef711` the CFI()s in ExecScan() were moved back and forth, ending up in the wrong place. Thus queries that largely spend their time in ExecScan(), and have neither projection nor a qual, can't be cancelled in a timely manner. Reported-By: Jeff Janes Author: Andres Freund Discussion: https://postgr.es/m/CAMkU=1weDXp8eLLPt9SO1LEUsJYYK9cScaGhLKpuN+WbYo9b5g@mail.gmail.com Backpatch: 10, as `d47cfef711`	2017-09-14 02:00:14 -07:00
Tom Lane	7d08ce286c	Distinguish selectivity of < from <= and > from >=. Historically, the selectivity functions have simply not distinguished < from <=, or > from >=, arguing that the fraction of the population that satisfies the "=" aspect can be considered to be vanishingly small, if the comparison value isn't any of the most-common-values for the variable. (If it is, the code path that executes the operator against each MCV will take care of things properly.) But that isn't really true unless we're dealing with a continuum of variable values, and in practice we seldom are. If "x = const" would estimate a nonzero number of rows for a given const value, then it follows that we ought to estimate different numbers of rows for "x < const" and "x <= const", even if the const is not one of the MCVs. Handling this more honestly makes a significant difference in edge cases, such as the estimate for a tight range (x BETWEEN y AND z where y and z are close together). Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly split scalargtsel into scalargtsel/scalargesel. Adjust <= and >= operator definitions to reference the new selectivity functions. Improve the core ineq_histogram_selectivity() function to make a correction for equality. (Along the way, I learned quite a bit about exactly why that function gives good answers, which I tried to memorialize in improved comments.) The corresponding join selectivity functions were, and remain, just stubs. But I chose to split them similarly, to avoid confusion and to prevent the need for doing this exercise again if someone ever makes them less stubby. In passing, change ineq_histogram_selectivity's clamp for extreme probability estimates so that it varies depending on the histogram size, instead of being hardwired at 0.0001. With the default histogram size of 100 entries, you still get the old clamp value, but bigger histograms should allow us to put more faith in edge values. Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us	2017-09-13 11:12:39 -04:00
Peter Eisentraut	61975d6c2c	Improve error message in WAL sender The previous error message when attempting to run a general SQL command in a physical replication WAL sender was a bit sloppy. Reported-by: Fujii Masao <masao.fujii@gmail.com>	2017-09-13 08:31:03 -04:00
Peter Eisentraut	1a2fdc99a4	Define LDAP_NO_ATTRS if necessary. Commit `83aaac41c6` introduced the use of LDAP_NO_ATTRS to avoid requesting a dummy attribute when doing search+bind LDAP authentication. It turns out that not all LDAP implementations define that macro, but its value is fixed by the protocol so we can define it ourselves if it's missing. Author: Thomas Munro Reported-By: Ashutosh Sharma Discussion: https://postgr.es/m/CAE9k0Pm6FKCfPCiAr26-L_SMGOA7dT_k0%2B3pEbB8%2B-oT39xRpw%40mail.gmail.com	2017-09-13 08:22:42 -04:00
Andres Freund	6e7baa3227	Introduce BYTES unit for GUCs. This is already useful for track_activity_query_size, and will further be used in a later commit making the WAL segment size configurable. Author: Beena Emerson Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAOG9ApEu8bXVwBxkOO9J7ZpM76TASK_vFMEEiCEjwhMmSLiaqQ@mail.gmail.com	2017-09-12 12:13:12 -07:00
Peter Eisentraut	83aaac41c6	Allow custom search filters to be configured for LDAP auth Before, only filters of the form "(<ldapsearchattribute>=<user>)" could be used to search an LDAP server. Introduce ldapsearchfilter so that more general filters can be configured using patterns, like "(\|(uid=$username)(mail=$username))" and "(&(uid=$username) (objectClass=posixAccount))". Also allow search filters to be included in an LDAP URL. Author: Thomas Munro Reviewed-By: Peter Eisentraut, Mark Cave-Ayland, Magnus Hagander Discussion: https://postgr.es/m/CAEepm=0XTkYvMci0WRubZcf_1am8=gP=7oJErpsUfRYcKF2gwg@mail.gmail.com	2017-09-12 09:49:04 -04:00
Andres Freund	c1898c3e1e	Constify numeric.c. This allows the compiler/linker to move the static variables to a read-only segment. Not all the signature changes are necessary, but it seems better to apply const in a consistent manner. Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20170910232154.asgml44ji2b7lv3d@alap3.anarazel.de	2017-09-11 13:44:37 -07:00
Peter Eisentraut	821fb8cdbf	Message style fixes	2017-09-11 11:21:27 -04:00
Tom Lane	3c43595217	Quick-hack fix for foreign key cascade vs triggers with transition tables. AFTER triggers using transition tables crashed if they were fired due to a foreign key ON CASCADE update. This is because ExecEndModifyTable flushes the transition tables, on the assumption that any trigger that could need them was already fired during ExecutorFinish. Normally that's true, because we don't allow transition-table-using triggers to be deferred. However, foreign key CASCADE updates force any triggers on the referencing table to be deferred to the outer query level, by means of the EXEC_FLAG_SKIP_TRIGGERS flag. I don't recall all the details of why it's like that and am pretty loath to redesign it right now. Instead, just teach ExecEndModifyTable to skip destroying the TransitionCaptureState when that flag is set. This will allow the transition table data to survive until end of the current subtransaction. This isn't a terribly satisfactory solution, because (1) we might be leaking the transition tables for much longer than really necessary, and (2) as things stand, an AFTER STATEMENT trigger will fire once per RI updating query, ie once per row updated or deleted in the referenced table. I suspect that is not per SQL spec. But redesigning this is a research project that we're certainly not going to get done for v10. So let's go with this hackish answer for now. In passing, tweak AfterTriggerSaveEvent to not save the transition_capture pointer into the event record for a deferrable trigger. This is not necessary to fix the current bug, but it avoids letting dangling pointers to long-gone transition tables persist in the trigger event queue. That's at least a safety feature. It might also allow merging shared trigger states in more cases than before. I added a regression test that demonstrates the crash on unpatched code, and also exposes the behavior of firing the AFTER STATEMENT triggers once per row update. Per bug #14808 from Philippe Beaudoin. Back-patch to v10. Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org	2017-09-10 14:59:56 -04:00
Tom Lane	f80e782a6b	Remove pre-order and post-order traversal logic for red-black trees. This code isn't used, and there's no clear reason why anybody would ever want to use it. These traversal mechanisms don't yield a visitation order that is semantically meaningful for any external purpose, nor are they any faster or simpler than the left-to-right or right-to-left traversals. (In fact, some rough testing suggests they are slower :-(.) Moreover, these mechanisms are impossible to test in any arm's-length fashion; doing so requires knowledge of the red-black tree's internal implementation. Hence, let's just jettison them. Discussion: https://postgr.es/m/17735.1505003111@sss.pgh.pa.us	2017-09-10 13:19:11 -04:00
Tom Lane	fdf87ed451	Fix failure-to-copy bug in commit `6f6b99d13`. The previous coding of get_qual_for_list() was careful to copy everything it was using from the input data structure. The new version missed making a copy of pass-by-ref datum values that it's inserting into Consts. This is not optional, however, as revealed by buildfarm failures on machines running -DRELCACHE_FORCE_RELEASE: we're copying from a relcache entry that could go away before the required lifespan of our output expression. I'm pretty sure -DCLOBBER_CACHE_ALWAYS machines won't like this either, but none of them have reported in yet.	2017-09-08 20:45:31 -04:00
Tom Lane	e56dd7cf50	Fix uninitialized-variable bug. map_partition_varattnos() failed to set its found_whole_row output parameter if the given expression list was NIL. This seems to be a pre-existing bug that chanced to be exposed by commit `6f6b99d13`. It might be unreachable in v10, but I have little faith in that proposition, so back-patch. Per buildfarm.	2017-09-08 19:04:32 -04:00
Robert Haas	6f6b99d133	Allow a partitioned table to have a default partition. Any tuples that don't route to any other partition will route to the default partition. Jeevan Ladhe, Beena Emerson, Ashutosh Bapat, Rahila Syed, and Robert Haas, with review and testing at various stages by (at least) Rushabh Lathia, Keith Fiske, Amit Langote, Amul Sul, Rajkumar Raghuanshi, Sven Kunze, Kyotaro Horiguchi, Thom Brown, Rafia Sabih, and Dilip Kumar. Discussion: http://postgr.es/m/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com Discussion: http://postgr.es/m/CAOG9ApEYj34fWMcvBMBQ-YtqR9fTdXhdN82QEKG0SVZ6zeL1xg@mail.gmail.com	2017-09-08 17:28:04 -04:00
Tom Lane	3cf17c9d47	Remove mention of password_encryption = plain in postgresql.conf.sample. Evidently missed in commit `eb61136dc`. Spotted by Oleg Bartunov. Discussion: https://postgr.es/m/CAF4Au4wz_iK5r4fnTnnd8XqioAZQs-P7-VsEAfivW34zMVpAmw@mail.gmail.com	2017-09-08 14:38:54 -04:00
Robert Haas	f0a0c17c1b	Refactor get_partition_for_tuple a bit. Pending patches for both default partitioning and hash partitioning find the current coding pattern to be inconvenient. Change it so that we switch on the partitioning method first and then do whatever is needed. Amul Sul, reviewed by Jeevan Ladhe, with a few adjustments by me. Discussion: http://postgr.es/m/CAAJ_b97mTb=dG2pv6+1ougxEVZFVnZJajW+0QHj46mEE7WsoOQ@mail.gmail.com Discussion: http://postgr.es/m/CAOgcT0M37CAztEinpvjJc18EdHfm23fw0EG9-36Ya=+rEFUqaQ@mail.gmail.com	2017-09-07 21:13:42 -04:00
Tom Lane	3ca930fc39	Improve performance of get_actual_variable_range with recently-dead tuples. In commit `fccebe421`, we hacked get_actual_variable_range() to scan the index with SnapshotDirty, so that if there are many uncommitted tuples at the end of the index range, it wouldn't laboriously scan through all of them looking for a live value to return. However, that didn't fix it for the case of many recently-dead tuples at the end of the index; SnapshotDirty recognizes those as committed dead and so we're back to the same problem. To improve the situation, invent a "SnapshotNonVacuumable" snapshot type and use that instead. The reason this helps is that, if the snapshot rejects a given index entry, we know that the indexscan will mark that index entry as killed. This means the next get_actual_variable_range() scan will proceed past that entry without visiting the heap, making the scan a lot faster. We may end up accepting a recently-dead tuple as being the estimated extremal value, but that doesn't seem much worse than the compromise we made before to accept not-yet-committed extremal values. The cost of the scan is still proportional to the number of dead index entries at the end of the range, so in the interval after a mass delete but before VACUUM's cleaned up the mess, it's still possible for get_actual_variable_range() to take a noticeable amount of time, if you've got enough such dead entries. But the constant factor is much much better than before, since all we need to do with each index entry is test its "killed" bit. We chose to back-patch commit `fccebe421` at the time, but I'm hesitant to do so here, because this form of the problem seems to affect many fewer people. Also, even when it happens, it's less bad than the case fixed by commit `fccebe421` because we don't get the contention effects from expensive TransactionIdIsInProgress tests. Dmitriy Sarafannikov, reviewed by Andrey Borodin Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru	2017-09-07 19:41:51 -04:00
Peter Eisentraut	1356f78ea9	Reduce excessive dereferencing of function pointers It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com	2017-09-07 13:56:09 -04:00
Robert Haas	9d71323dac	Even if some partitions are foreign, allow tuple routing. This doesn't allow routing tuple to the foreign partitions themselves, but it permits tuples to be routed to regular partitions despite the presence of foreign partitions in the same inheritance hierarchy. Etsuro Fujita, reviewed by Amit Langote and by me. Discussion: http://postgr.es/m/bc3db4c1-1693-3b8a-559f-33ad2b50b7ad@lab.ntt.co.jp	2017-09-07 10:58:21 -04:00
Tom Lane	6eb52da394	Fix handling of savepoint commands within multi-statement Query strings. Issuing a savepoint-related command in a Query message that contains multiple SQL statements led to a FATAL exit with a complaint about "unexpected state STARTED". This is a shortcoming of commit `4f896dac1`, which attempted to prevent such misbehaviors in multi-statement strings; its quick hack of marking the individual statements as "not top-level" does the wrong thing in this case, and isn't a very accurate description of the situation anyway. To fix, let's introduce into xact.c an explicit model of what happens for multi-statement Query strings. This is an "implicit transaction block in progress" state, which for many purposes works like the normal TBLOCK_INPROGRESS state --- in particular, IsTransactionBlock returns true, causing the desired result that PreventTransactionChain will throw error. But in case of error abort it works like TBLOCK_STARTED, allowing the transaction to be cancelled without need for an explicit ROLLBACK command. Commit `4f896dac1` is reverted in toto, so that we go back to treating the individual statements as "top level". We could have left it as-is, but this allows sharpening the error message for PreventTransactionChain calls inside functions. Except for getting a normal error instead of a FATAL exit for savepoint commands, this patch should result in no user-visible behavioral change (other than that one error message rewording). There are some things we might want to do in the line of changing the appearance or wording of error and warning messages around this behavior, which would be much simpler to do now that it's an explicitly modeled state. But I haven't done them here. Although this fixes a long-standing bug, no backpatch. The consequences of the bug don't seem severe enough to justify the risk that this commit itself creates some new issue. Patch by me, but it owes something to previous investigation by Takayuki Tsunakawa, who also reported the bug in the first place. Also thanks to Michael Paquier for reviewing. Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05	2017-09-07 09:49:55 -04:00
Simon Riggs	f06588a8e6	Exclude special values in recovery_target_time recovery_target_time accepts timestamp input, though does not allow use of special values, e.g. “today”. Report a useful error message for these cases. Reported-by: Piotr Stefaniak Author: Simon Riggs Discussion: https://postgr.es/m/CANP8+jJdKA+BkkYLWz9zAm16Y0s2ExBv0WfpAwXdTpPfWnA9Bg@mail.gmail.com	2017-09-07 04:56:34 -07:00
Simon Riggs	5b6d13eec7	Allow SET STATISTICS on expression indexes Index columns are referenced by ordinal number rather than name, e.g. CREATE INDEX coord_idx ON measured (x, y, (z + t)); ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000; Incompatibility note for release notes: \d+ for indexes now also displays Stats Target Authors: Alexander Korotkov, with contribution by Adrien NAYRAT Review: Adrien NAYRAT, Simon Riggs Wordsmith: Simon Riggs	2017-09-06 13:46:01 -07:00
Tom Lane	8689e38263	Clean up handling of dropped columns in NAMEDTUPLESTORE RTEs. The NAMEDTUPLESTORE patch piggybacked on the infrastructure for TABLEFUNC/VALUES/CTE RTEs, none of which can ever have dropped columns, so the possibility was ignored most places. Fix that, including adding a specification to parsenodes.h about what it's supposed to look like. In passing, clean up assorted comments that hadn't been maintained properly by said patch. Per bug #14799 from Philippe Beaudoin. Back-patch to v10. Discussion: https://postgr.es/m/20170906120005.25630.84360@wrigleys.postgresql.org	2017-09-06 10:41:05 -04:00
Tom Lane	6e427aa4e5	Use lfirst_node() and linitial_node() where appropriate in planner.c. There's no particular reason to target this module for the first wholesale application of these macros; but we gotta start somewhere. Ashutosh Bapat and Jeevan Chalke Discussion: https://postgr.es/m/CAFjFpRcNr3r=u0ni=7A4GD9NnHQVq+dkFafzqo2rS6zy=dt1eg@mail.gmail.com	2017-09-05 15:57:48 -04:00
Peter Eisentraut	17273d059c	Remove unnecessary parentheses in return statements The parenthesized style has only been used in a few modules. Change that to use the style that is predominant across the whole tree. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>	2017-09-05 14:52:55 -04:00
Alvaro Herrera	ebd346caf4	Correct base backup throttling Throttling for sending a base backup in walsender is broken for the case where there is a lot of WAL traffic, because the latch used to put the walsender to sleep is also signalled by regular WAL traffic (and each signal causes an additional batch of data to be sent); the net effect is that there is no or little actual throttling. This is undesirable, so rewrite the sleep into a loop to achieve the desired effeect. Author: Jeff Janes, small tweaks by me Reviewed-by: Antonin Houska Discussion: https://postgr.es/m/CAMkU=1xH6mde-yL-Eo1TKBGNd0PB1-TMxvrNvqcAkN-qr2E9mw@mail.gmail.com	2017-09-05 17:27:30 +02:00
Tom Lane	4faa1dc2eb	Suppress compiler warnings in dshash.c. Some compilers complain, not unreasonably, about left-shifting an int32 "1" and then assigning the result to an int64. In practice I sure hope that this data structure never gets large enough that an overflow would actually occur; but let's cast the constant to the right type to avoid the hazard. In passing, fix a typo in dshash.h. Amit Kapila, adjusted as per comment from Thomas Munro. Discussion: https://postgr.es/m/CAA4eK1+5vfVMYtjK_NX8O3-42yM3o80qdqWnQzGquPrbq6mb+A@mail.gmail.com	2017-09-03 11:12:29 -04:00
Tom Lane	51daa7bdb3	Improve division of labor between execParallel.c and nodeGather[Merge].c. Move the responsibility for creating/destroying TupleQueueReaders into execParallel.c, to avoid duplicative coding in nodeGather.c and nodeGatherMerge.c. Also, instead of having DestroyTupleQueueReader do shm_mq_detach, do it in the caller (which is now only ExecParallelFinish). This means execParallel.c does both the attaching and detaching of the tuple-queue-reader shm_mqs, which seems less weird than the previous arrangement. These changes also eliminate a vestigial memory leak (of the pei->tqueue array). It's now demonstrable that rescans of Gather or GatherMerge don't leak memory. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us	2017-09-01 17:39:01 -04:00
Peter Eisentraut	c039ba0716	Add memory info to getrusage output Add the maxrss field to the getrusage output (log_*_stats). This was previously omitted because of portability concerns, but we feel this might not be a concern anymore. based on patch by Justin Pryzby <pryzby@telsasoft.com>	2017-09-01 15:36:33 -04:00
Robert Haas	0cb8b7531d	Tighten up some code in RelationBuildPartitionDesc. This probably doesn't save anything meaningful in terms of performance, but making the code simpler is a good idea anyway. Code by Beena Emerson, extracted from a larger patch by Jeevan Ladhe, slightly adjusted by me. Discussion: http://postgr.es/m/CAOgcT0ONgwajdtkoq+AuYkdTPY9cLWWLjxt_k4SXue3eieAr+g@mail.gmail.com	2017-09-01 15:16:44 -04:00
Robert Haas	baaf272ac9	Use group updates when setting transaction status in clog. Commit `0e141c0fbb` introduced a mechanism to reduce contention on ProcArrayLock by having a single process clear XIDs in the procArray on behalf of multiple processes, reducing the need to hand the lock around. A previous attempt to introduce a similar mechanism for CLogControlLock in `ccce90b398` crashed and burned, but the design problem which resulted in those failures is believed to have been corrected in this version. Amit Kapila, with some cosmetic changes by me. See the previous commit message for additional credits. Discussion: http://postgr.es/m/CAA4eK1KudxzgWhuywY_X=yeSAhJMT4DwCjroV5Ay60xaeB2Eew@mail.gmail.com	2017-09-01 11:45:40 -04:00
Alvaro Herrera	a6979c3a68	Restore behavior for replication origin drop Do for replication origins what the previous commit did for replication slots: restore the original behavior of replication origin drop to raise an error rather than blocking, because users might be depending on the original behavior. Maintain the blocking behavior when invoked internally from logical replication subscription handling. Discussion: https://postgr.es/m/20170830133922.tlpo3lgfejm4n2cs@alvherre.pgsql	2017-09-01 16:30:02 +02:00
Alvaro Herrera	be7161566d	Add a WAIT option to DROP_REPLICATION_SLOT Commit `9915de6c1c` changed the default behavior of DROP_REPLICATION_SLOT so that it would wait until any session holding the slot active would release it, instead of raising an error. But users are already depending on the original behavior, so revert to it by default and add a WAIT option to invoke the new behavior. Per complaint from Simone Gotti, in Discussion: https://postgr.es/m/CAEvsy6Wgdf90O6pUvg2wSVXL2omH5OPC-38OD4Zzgk-FXavj3Q@mail.gmail.com	2017-09-01 13:44:14 +02:00
Robert Haas	7b69b6ceb8	Fix assorted carelessness about Datum vs. int64 vs. uint64 Bugs introduced by commit `81c5e46c49`	2017-09-01 00:14:54 -04:00
Robert Haas	0d9506d125	Try to repair poorly-considered code in previous commit.	2017-08-31 23:09:00 -04:00
Robert Haas	81c5e46c49	Introduce 64-bit hash functions with a 64-bit seed. This will be useful for hash partitioning, which needs a way to seed the hash functions to avoid problems such as a hash index on a hash partitioned table clumping all values into a small portion of the bucket space; it's also useful for anything that wants a 64-bit hash value rather than a 32-bit hash value. Just in case somebody wants a 64-bit hash value that is compatible with the existing 32-bit hash values, make the low 32-bits of the 64-bit hash value match the 32-bit hash value when the seed is 0. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com	2017-08-31 22:21:21 -04:00
Tom Lane	2d44c58c79	Avoid memory leaks when a GatherMerge node is rescanned. Rescanning a GatherMerge led to leaking some memory in the executor's query-lifespan context, because most of the node's working data structures were simply abandoned and rebuilt from scratch. In practice, this might never amount to much, given the cost of relaunching worker processes --- but it's still pretty messy, so let's fix it. We can rearrange things so that the tuple arrays are simply cleared and reused, and we don't need to rebuild the TupleTableSlots either, just clear them. One small complication is that because we might get a different number of workers on each iteration, we can't keep the old convention that the leader's gm_slots[] entry is the last one; the leader might clobber a TupleTableSlot that we need for a worker in a future iteration. Hence, adjust the logic so that the leader has slot 0 always, while the active workers have slots 1..n. Back-patch to v10 to keep all the existing versions of nodeGatherMerge.c in sync --- because of the renumbering of the slots, there would otherwise be a very large risk that any future backpatches in this module would introduce bugs. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us	2017-08-31 16:21:05 -04:00
Robert Haas	30833ba154	Expand partitioned tables in PartDesc order. Previously, we expanded the inheritance hierarchy in the order in which find_all_inheritors had locked the tables, but that turns out to block quite a bit of useful optimization. For example, a partition-wise join can't count on two tables with matching bounds to get expanded in the same order. Where possible, this change results in expanding partitioned tables in bound order. Bound order isn't well-defined for a list-partitioned table with a null-accepting partition or for a list-partitioned table where the bounds for a single partition are interleaved with other partitions. However, when expansion in bound order is possible, it opens up further opportunities for optimization, such as strength-reducing MergeAppend to Append when the expansion order matches the desired sort order. Patch by me, with cosmetic revisions by Ashutosh Bapat. Discussion: http://postgr.es/m/CA+TgmoZrKj7kEzcMSum3aXV4eyvvbh9WD=c6m=002WMheDyE3A@mail.gmail.com	2017-08-31 15:50:18 -04:00
Tom Lane	6708e447ef	Clean up shm_mq cleanup. The logic around shm_mq_detach was a few bricks shy of a load, because (contrary to the comments for shm_mq_attach) all it did was update the shared shm_mq state. That left us leaking a bit of process-local memory, but much worse, the on_dsm_detach callback for shm_mq_detach was still armed. That means that whenever we ultimately detach from the DSM segment, we'd run shm_mq_detach again for already-detached, possibly long-dead queues. This accidentally fails to fail today, because we only ever re-use a shm_mq's memory for another shm_mq, and multiple detach attempts on the last such shm_mq are fairly harmless. But it's gonna bite us someday, so let's clean it up. To do that, change shm_mq_detach's API so it takes a shm_mq_handle not the underlying shm_mq. This makes the callers simpler in most cases anyway. Also fix a few places in parallel.c that were just pfree'ing the handle structs rather than doing proper cleanup. Back-patch to v10 because of the risk that the revenant shm_mq_detach callbacks would cause a live bug sometime. Since this is an API change, it's too late to do it in 9.6. (We could make a variant patch that preserves API, but I'm not excited enough to do that.) Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us	2017-08-31 15:10:24 -04:00
Tom Lane	04e9678614	Code review for nodeGatherMerge.c. Comment the fields of GatherMergeState, and organize them a bit more sensibly. Comment GMReaderTupleBuffer more usefully too. Improve assorted other comments that were obsolete or just not very good English. Get rid of the use of a GMReaderTupleBuffer for the leader process; that was confusing, since only the "done" field was used, and that in a way redundant with need_to_scan_locally. In gather_merge_init, avoid calling load_tuple_array for already-known-exhausted workers. I'm not sure if there's a live bug there, but the case is unlikely to be well tested due to timing considerations. Remove some useless code, such as duplicating the tts_isempty test done by TupIsNull. Remove useless initialization of ps.qual, replacing that with an assertion that we have no qual to check. (If we did, the code would fail to check it.) Avoid applying heap_copytuple to a null tuple. While that fails to crash, it's confusing and it makes the code less legible not more so IMO. Propagate a couple of these changes into nodeGather.c, as well. Back-patch to v10, partly because of the possibility that the gather_merge_init change is fixing a live bug, but mostly to keep the branches in sync to ease future bug fixes.	2017-08-30 17:21:08 -04:00
Tom Lane	41b0dd987d	Separate reinitialization of shared parallel-scan state from ExecReScan. Previously, the parallel executor logic did reinitialization of shared state within the ExecReScan code for parallel-aware scan nodes. This is problematic, because it means that the ExecReScan call has to occur synchronously (ie, during the parent Gather node's ReScan call). That is swimming very much against the tide so far as the ExecReScan machinery is concerned; the fact that it works at all today depends on a lot of fragile assumptions, such as that no plan node between Gather and a parallel-aware scan node is parameterized. Another objection is that because ExecReScan might be called in workers as well as the leader, hacky extra tests are needed in some places to prevent unwanted shared-state resets. Hence, let's separate this code into two functions, a ReInitializeDSM call and the ReScan call proper. ReInitializeDSM is called only in the leader and is guaranteed to run before we start new workers. ReScan is returned to its traditional function of resetting only local state, which means that ExecReScan's usual habits of delaying or eliminating child rescan calls are safe again. As with the preceding commit `7df2c1f8d`, it doesn't seem to be necessary to make these changes in 9.6, which is a good thing because the FDW and CustomScan APIs are impacted. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com	2017-08-30 13:18:16 -04:00
Tom Lane	7df2c1f8da	Force rescanning of parallel-aware scan nodes below a Gather[Merge]. The ExecReScan machinery contains various optimizations for postponing or skipping rescans of plan subtrees; for example a HashAgg node may conclude that it can re-use the table it built before, instead of re-reading its input subtree. But that is wrong if the input contains a parallel-aware table scan node, since the portion of the table scanned by the leader process is likely to vary from one rescan to the next. This explains the timing-dependent buildfarm failures we saw after commit `a2b70c89c`. The established mechanism for showing that a plan node's output is potentially variable is to mark it as depending on some runtime Param. Hence, to fix this, invent a dummy Param (one that has a PARAM_EXEC parameter number, but carries no actual value) associated with each Gather or GatherMerge node, mark parallel-aware nodes below that node as dependent on that Param, and arrange for ExecReScanGather[Merge] to flag that Param as changed whenever the Gather[Merge] node is rescanned. This solution breaks an undocumented assumption made by the parallel executor logic, namely that all rescans of nodes below a Gather[Merge] will happen synchronously during the ReScan of the top node itself. But that's fundamentally contrary to the design of the ExecReScan code, and so was doomed to fail someday anyway (even if you want to argue that the bug being fixed here wasn't a failure of that assumption). A follow-on patch will address that issue. In the meantime, the worst that's expected to happen is that given very bad timing luck, the leader might have to do all the work during a rescan, because workers think they have nothing to do, if they are able to start up before the eventual ReScan of the leader's parallel-aware table scan node has reset the shared scan state. Although this problem exists in 9.6, there does not seem to be any way for it to manifest there. Without GatherMerge, it seems that a plan tree that has a rescan-short-circuiting node below Gather will always also have one above it that will short-circuit in the same cases, preventing the Gather from being rescanned. Hence we won't take the risk of back-patching this change into 9.6. But v10 needs it. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com	2017-08-30 09:29:55 -04:00
Robert Haas	bf11e7ee2e	Propagate sort instrumentation from workers back to leader. Up until now, when parallel query was used, no details about the sort method or space used by the workers were available; details were shown only for any sorting done by the leader. Fix that. Commit `1177ab1dab` forced the test case added by commit `1f6d515a67` to run without parallelism; now that we have this infrastructure, allow that again, with a little tweaking to make it pass with and without force_parallel_mode. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com	2017-08-29 13:26:33 -04:00
Robert Haas	3452dc5240	Push tuple limits through Gather and Gather Merge. If we only need, say, 10 tuples in total, then we certainly don't need more than 10 tuples from any single process. Pushing down the limit lets workers exit early when possible. For Gather Merge, there is an additional benefit: a Sort immediately below the Gather Merge can be done as a bounded sort if there is an applicable limit. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+TgmoYa3QKKrLj5rX7UvGqhH73G1Li4B-EKxrmASaca2tFu9Q@mail.gmail.com	2017-08-29 13:16:55 -04:00
Tom Lane	3f4c7917b3	Code review for pushing LIMIT through subqueries. Minor improvements for commit `1f6d515a6`. We do not need the (rather expensive) test for SRFs in the targetlist, because since v10 any such SRFs would appear in separate ProjectSet nodes. Also, make the code look more like the existing cases by turning it into a simple recursion --- the argument that there might be some performance benefit to contorting the code seems unfounded to me, especially since any good compiler should turn the tail-recursion into iteration anyway. Discussion: http://postgr.es/m/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w@mail.gmail.com	2017-08-25 09:05:17 -04:00
Andres Freund	d7694fc148	Consolidate the function pointer types used by dshash.c. Commit `8c0d7bafad` introduced dshash with hash and compare functions like DynaHash's, and also variants that take a user data pointer instead of size. Simplify the interface by merging them into a single pair of function pointer types that take both size and a user data pointer. Since it is anticipated that memcmp and tag_hash behavior will be a common requirement, provide wrapper functions dshash_memcmp and dshash_memhash that conform to the new function types. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de	2017-08-24 17:01:36 -07:00
Andres Freund	4569715bd6	Fix unlikely shared memory leak after failure in dshash_create(). Tidy-up for commit `8c0d7bafad`, based on a complaint from Andres Freund. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de	2017-08-24 16:58:30 -07:00
Andres Freund	20fbf25533	Fix harmless thinko in dsa.c. Commit `16be2fd100` added DSA_ALLOC_HUGE, DSA_ALLOC_ZERO and DSA_ALLOC_NO_OOM which have the same numerical values and meanings as the similarly named MCXT_... macros. In one place we accidentally used MCXT_ALLOC_NO_OOM when DSA_ALLOC_NO_OOM is wanted, so tidy that up. Author: Thomas Munro Discussion: http://postgr.es/m/CAEepm=2AimHxVkkxnMfQvbZMkXy0uKbVa0-D38c5-qwrCm4CMQ@mail.gmail.com Backpatch: 10, where dsa was introduced.	2017-08-24 15:07:40 -07:00
Peter Eisentraut	1e1b01cd16	Fix outdated comment Author: Thomas Munro <thomas.munro@enterprisedb.com>	2017-08-23 14:19:35 -04:00
Peter Eisentraut	237a0b87b1	Improve plural handling in error message This does not use the normal plural handling, because no numbers appear in the actual message.	2017-08-23 13:56:59 -04:00
Peter Eisentraut	85f4d6393d	Tweak some SCRAM error messages and code comments Clarify/correct some error messages, fix up some code comments that confused SASL and SCRAM, and other minor fixes. No changes in functionality.	2017-08-23 12:29:38 -04:00
Peter Eisentraut	580ddcec39	Fix translation marker This was erroneously removed in `55a70a023c`.	2017-08-23 09:56:38 -04:00
Andres Freund	8c0d7bafad	Hash tables backed by DSA shared memory. Add general purpose chaining hash tables for DSA memory. Unlike DynaHash in shared memory mode, these hash tables can grow as required, and cope with being mapped into different addresses in different backends. There is a wide range of potential users for such a hash table, though it's very likely the interface will need to evolve as we come to understand the needs of different kinds of users. E.g support for iterators and incremental resizing is planned for later commits and the details of the callback signatures are likely to change. Author: Thomas Munro Reviewed-By: John Gorman, Andres Freund, Dilip Kumar, Robert Haas Discussion: https://postgr.es/m/CAEepm=3d8o8XdVwYT6O=bHKsKAM2pu2D6sV1S_=4d+jStVCE7w@mail.gmail.com https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-08-22 22:43:07 -07:00
Andres Freund	35ea75632a	Refactor typcache.c's record typmod hash table. Previously, tuple descriptors were stored in chains keyed by a fixed size array of OIDs. That meant there were effectively two levels of collision chain -- one inside and one outside the hash table. Instead, let dynahash.c look after conflicts for us by supplying a proper hash and equal function pair. This is a nice cleanup on its own, but also simplifies followup changes allowing blessed TupleDescs to be shared between backends participating in parallel query. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D34GVhOL%2BarUx56yx7OPk7%3DqpGsv3CpO54feqjAwQKm5g%40mail.gmail.com	2017-08-22 16:11:54 -07:00
Peter Eisentraut	2bfd1b1ee5	Don't install ICU collation keyword variants Users can still create them themselves. Instead, document Unicode TR 35 collation options for ICU, so users can create all this themselves. Reviewed-by: Peter Geoghegan <pg@bowt.ie>	2017-08-21 19:21:07 -04:00
Peter Eisentraut	51e225da30	Expand set of predefined ICU locales Install language+region combinations even if they are not distinct from the language's base locale. This gives better long-term stability of the set of predefined locales and makes the predefined locales less implementation-dependent and more practical for users. Reviewed-by: Peter Geoghegan <pg@bowt.ie>	2017-08-21 19:21:07 -04:00
Robert Haas	1f6d515a67	Push limit through subqueries to underlying sort, where possible. Douglas Doole, reviewed by Ashutosh Bapat and by me. Minor formatting change by me. Discussion: http://postgr.es/m/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w@mail.gmail.com	2017-08-21 14:19:44 -04:00
Robert Haas	79ccd7cbd5	pg_prewarm: Add automatic prewarm feature. Periodically while the server is running, and at shutdown, write out a list of blocks in shared buffers. When the server reaches consistency -- unfortunatey, we can't do it before that point without breaking things -- reload those blocks into any still-unused shared buffers. Mithun Cy and Robert Haas, reviewed and tested by Beena Emerson, Amit Kapila, Jim Nasby, and Rafia Sabih. Discussion: http://postgr.es/m/CAD__OugubOs1Vy7kgF6xTjmEqTR4CrGAv8w+ZbaY_+MZeitukw@mail.gmail.com	2017-08-21 14:17:39 -04:00
Noah Misch	66ed3829df	Inject $(ICU_LIBS) regardless of platform. It appeared in a conditional that excludes AIX, Cygwin and MinGW. Give ICU support a chance to work on those platforms. Back-patch to v10, where ICU support was introduced.	2017-08-20 21:22:18 -07:00
Andres Freund	c6293249dc	Partially flatten struct tupleDesc so that it can be used in DSM. TupleDesc's attributes were already stored in contiguous memory after the struct. Go one step further and get rid of the array of pointers to attributes so that they can be stored in shared memory mapped at different addresses in each backend. This won't work for TupleDescs with contraints and defaults, since those point to other objects, but for many purposes only attributes are needed. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-08-20 11:19:12 -07:00
Andres Freund	2cd7084524	Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n). This is a mechanical change in preparation for a later commit that will change the layout of TupleDesc. Introducing a macro to abstract the details of where attributes are stored will allow us to change that in separate step and revise it in future. Author: Thomas Munro, editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-08-20 11:19:07 -07:00
Peter Eisentraut	24620fc52b	Fix creation of ICU comments for keyword variants It would create the comment referring to the keyword-less parent locale. This was broken in `ddb5fdc068`.	2017-08-18 23:02:28 -04:00
Robert Haas	c4b841ba6a	Fix interaction of triggers, partitioning, and EXPLAIN ANALYZE. Add a new EState member es_leaf_result_relations, so that the trigger code knows about ResultRelInfos created by tuple routing. Also make sure ExplainPrintTriggers knows about partition-related ResultRelInfos. Etsuro Fujita, reviewed by Amit Langote Discussion: http://postgr.es/m/57163e18-8e56-da83-337a-22f2c0008051@lab.ntt.co.jp	2017-08-18 13:01:05 -04:00
Robert Haas	54cde0c4c0	Don't lock tables in RelationGetPartitionDispatchInfo. Instead, lock them in the caller using find_all_inheritors so that they get locked in the standard order, minimizing deadlock risks. Also in RelationGetPartitionDispatchInfo, avoid opening tables which are not partitioned; there's no need. Amit Langote, reviewed by Ashutosh Bapat and Amit Khandekar Discussion: http://postgr.es/m/91b36fa1-c197-b72f-ca6e-56c593bae68c@lab.ntt.co.jp	2017-08-17 15:43:09 -04:00
Robert Haas	ecfe59e50f	Refactor validation of new partitions a little bit. Move some logic that is currently in ATExecAttachPartition to separate functions to facilitate future code reuse. Ashutosh Bapat and Jeevan Ladhe Discussion: http://postgr.es/m/CA+Tgmobbnamyvii0pRdg9pp_jLHSUvq7u5SiRrVV0tEFFU58Tg@mail.gmail.com	2017-08-17 14:49:45 -04:00
Robert Haas	1e56883a52	Attempt to clarify comments related to force_parallel_mode. Per discussion with Tom Lane. Discussion: http://postgr.es/m/28589.1502902172@sss.pgh.pa.us	2017-08-17 14:09:14 -04:00
Tom Lane	a2b70c89ca	Fix ExecReScanGatherMerge. Not surprisingly, since it'd never ever been tested, ExecReScanGatherMerge didn't work. Fix it, and add a regression test case to exercise it. Amit Kapila Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com	2017-08-17 13:49:22 -04:00
Tom Lane	963af96920	Add missing "static" marker. Per pademelon.	2017-08-17 11:17:39 -04:00
Heikki Linnakangas	dcd052c8d2	Fix pg_atomic_u64 initialization. As Andres pointed out, pg_atomic_init_u64 must be used to initialize an atomic variable, before it can be accessed with the actual atomic ops. Trying to use pg_atomic_write_u64 on an uninitialized variable leads to a failure with the fallback implementation that uses a spinlock. Discussion: https://www.postgresql.org/message-id/20170816191346.d3ke5tpshhco4bnd%40alap3.anarazel.de	2017-08-17 00:48:44 +03:00
Tom Lane	2b74303637	Make the planner assume that the entries in a VALUES list are distinct. Previously, if we had to estimate the number of distinct values in a VALUES column, we fell back on the default behavior used whenever we lack statistics, which effectively is that there are Min(# of entries, 200) distinct values. This can be very badly off with a large VALUES list, as noted by Jeff Janes. We could consider actually running an ANALYZE-like scan on the VALUES, but that seems unduly expensive, and anyway it could not deliver reliable info if the entries are not all constants. What seems like a better choice is to assume that the values are all distinct. This will sometimes be just as wrong as the old code, but it seems more likely to be more nearly right in many common cases. Also, it is more consistent with what happens in some related cases, for example WHERE x = ANY(ARRAY[1,2,3,...,n]) and WHERE x = ANY(VALUES (1),(2),(3),...,(n)) now are estimated similarly. This was discussed some time ago, but consensus was it'd be better to slip it in at the start of a development cycle not near the end. (It should've gone into v10, really, but I forgot about it.) Discussion: https://postgr.es/m/CAMkU=1xHkyPa8VQgGcCNg3RMFFvVxUdOpus1gKcFuvVi0w6Acg@mail.gmail.com	2017-08-16 15:37:20 -04:00
Heikki Linnakangas	ac883ac453	Fix shm_toc.c to always return buffer-aligned memory. Previously, if you passed a non-aligned size to shm_toc_create(), the memory returned by shm_toc_allocate() would be similarly non-aligned. This was exposed by commit `3cda10f41b`, which allocated structs containing a pg_atomic_uint64 field with shm_toc_allocate(). On systems with MAXIMUM_ALIGNOF = 4, such structs still need to be 8-bytes aligned, but the memory returned by shm_toc_allocate() was only 4-bytes aligned. It's quite bogus that we abuse BUFFERALIGN to align the structs for pg_atomic_uint64. It doesn't really have anything to do with buffers. But that's a separate issue. This ought to fix the buildfarm failures on 32-bit x86 systems. Discussion: https://www.postgresql.org/message-id/7e0a73a5-0df9-1859-b8ae-9acf122dc38d@iki.fi	2017-08-16 21:52:38 +03:00
Peter Eisentraut	9b5140fb50	Correct representation of foreign tables in information schema tables.table_type is supposed to be 'FOREIGN' rather than 'FOREIGN TABLE' according to the SQL standard.	2017-08-16 11:03:33 -04:00
Heikki Linnakangas	3cda10f41b	Use atomic ops to hand out pages to scan in parallel scan. With a lot of CPUs, the spinlock that protects the current scan location in a parallel scan can become a bottleneck. Use an atomic fetch-and-add instruction instead. David Rowley Discussion: https://www.postgresql.org/message-id/CAKJS1f9tgsPhqBcoPjv9_KUPZvTLCZ4jy%3DB%3DbhqgaKn7cYzm-w@mail.gmail.com	2017-08-16 16:18:41 +03:00
Heikki Linnakangas	0c504a80cf	Remove dedicated B-tree root-split record types. Since commit `40dae7ec53`, which changed the way b-tree page splitting works, there has been no difference in the handling of root, and non-root split WAL records. We don't need to distinguish them anymore If you're worried about the loss of debugging information, note that usually a root split record will normally be followed by a WAL record to create the new root page. The root page will also have the BTP_ROOT flag set on the page itself, and there is a pointer to it from the metapage. Author: Aleksander Alekseev Discussion: https://www.postgresql.org/message-id/20170406122116.GA11081@e733.localdomain	2017-08-16 12:24:40 +03:00
Peter Eisentraut	77d05706be	Fix up some misusage of appendStringInfo() and friends Change to appendStringInfoChar() or appendStringInfoString() where those can be used. Author: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>	2017-08-15 23:34:39 -04:00
Peter Eisentraut	4d4c891715	Initialize replication_slot_catalog_xmin in procarray Although not confirmed and probably rare, if the newly allocated memory is not already zero, this could possibly have caused some problems. Also reorder the initializations slightly so they match the order of the struct definition. Author: Wong, Yi Wen <yiwong@amazon.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>	2017-08-15 21:05:21 -04:00
Peter Eisentraut	0659465caa	Include foreign tables in information_schema.table_privileges This appears to have been an omission in the original commit `0d692a0dc9`. All related information_schema views already include foreign tables. Reported-by: Nicolas Thauvin <nicolas.thauvin@dalibo.com>	2017-08-15 19:27:22 -04:00
Alvaro Herrera	31ae1638ce	Simplify autovacuum work-item implementation The initial implementation of autovacuum work-items used a dynamic shared memory area (DSA). However, it's argued that dynamic shared memory is not portable enough, so we cannot rely on it being supported everywhere; at the same time, autovacuum work-items are now a critical part of the server, so it's not acceptable that they don't work in the cases where dynamic shared memory is disabled. Therefore, let's fall back to a simpler implementation of work-items that just uses autovacuum's main shared memory segment for storage. Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com	2017-08-15 18:14:07 -03:00
Peter Eisentraut	70b573b267	Fix logical replication protocol comparison logic Since we currently only have one protocol, this doesn't make much of a difference other than the error message. Author: Yugo Nagata <nagata@sraoss.co.jp>	2017-08-15 16:21:19 -04:00
Peter Eisentraut	e42351ae07	Simplify some code in logical replication launcher Avoid unnecessary locking calls when a subscription is disabled. Author: Yugo Nagata <nagata@sraoss.co.jp>	2017-08-15 15:13:06 -04:00
Tom Lane	4867d7f62f	Avoid out-of-memory in a hash join with many duplicate inner keys. The executor is capable of splitting buckets during a hash join if too much memory is being used by a small number of buckets. However, this only helps if a bucket's population is actually divisible; if all the hash keys are alike, the tuples still end up in the same new bucket. This can result in an OOM failure if there are enough inner keys with identical hash values. The planner's cost estimates will bias it against choosing a hash join in such situations, but not by so much that it will never do so. To mitigate the OOM hazard, explicitly estimate the hash bucket space needed by just the inner side's most common value, and if that would exceed work_mem then add disable_cost to the hash cost estimate. This approach doesn't account for the possibility that two or more common values would share the same hash value. On the other hand, work_mem is normally a fairly conservative bound, so that eating two or more times that much space is probably not going to kill us. If we have no stats about the inner side, ignore this consideration. There was some discussion of making a conservative assumption, but that would effectively result in disabling hash join whenever we lack stats, which seems like an overreaction given how seldom the problem manifests in the field. Per a complaint from David Hinkle. Although this could be viewed as a bug fix, the lack of similar complaints weighs against back- patching; indeed we waited for v11 because it seemed already rather late in the v10 cycle to be making plan choice changes like this one. Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us	2017-08-15 14:05:53 -04:00
Alvaro Herrera	d9a622cee1	Fix error handling path in autovacuum launcher The original code (since `00e6a16d01`) was assuming aborting the transaction in autovacuum launcher was sufficient to release all resources, but in reality the launcher runs quite a lot of code out of any transactions. Re-introduce individual cleanup calls to make abort more robust. Reported-by: Robert Haas Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com	2017-08-15 13:35:12 -03:00
Robert Haas	e139f1953f	Assorted preparatory refactoring for partition-wise join. Instead of duplicating the logic to search for a matching ParamPathInfo in multiple places, factor it out into a separate function. Pass only the relevant bits of the PartitionKey to partition_bounds_equal instead of the whole thing, because partition-wise join will want to call this without having a PartitionKey available. Adjust allow_star_schema_join and calc_nestloop_required_outer to take relevant Relids rather than the entire Path, because partition-wise join will want to call it with the top-parent relids to determine whether a child join is allowable. Ashutosh Bapat. Review and testing of the larger patch set of which this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih, Thomas Munro, Dilip Kumar, and me. Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com	2017-08-15 12:30:38 -04:00
Tom Lane	00418c6124	Simplify plpgsql's check for simple expressions. plpgsql wants to recognize expressions that it can execute directly via ExecEvalExpr() instead of going through the full SPI machinery. Originally the test for this consisted of recursively groveling through the post-planning expression tree to see if it contained only nodes that plpgsql recognized as safe. That was a major maintenance headache, since it required updating plpgsql every time we added any kind of expression node. It was also kind of expensive, so over time we added various pre-planning checks to try to short-circuit having to do that. Robert Haas pointed out that as of the SRF-processing changes in v10, particularly the addition of Query.hasTargetSRFs, there really isn't any reason to make the recursive scan at all: the initial checks cover everything we really care about. We do have to make sure that those checks agree with what inline_function() considers, so that inlining of a function that formerly wasn't inlined can't cause an expression considered simple to become non-simple. Hence, delete the recursive function exec_simple_check_node(), and tweak those other tests to more exactly agree with inline_function(). Adjust some comments and function naming to match. Discussion: https://postgr.es/m/CA+TgmoZGZpwdEV2FQWaVxA_qZXsQE1DAS5Fu8fwxXDNvfndiUQ@mail.gmail.com	2017-08-15 12:28:39 -04:00
Tom Lane	f3a4d7e7c2	Distinguish wait-for-connection from wait-for-write-ready on Windows. The API for WaitLatch and friends followed the Unix convention in which waiting for a socket connection to complete is identical to waiting for the socket to accept a write. While Windows provides a select(2) emulation that agrees with that, the native WaitForMultipleObjects API treats them as quite different --- and for some bizarre reason, it will report a not-yet-connected socket as write-ready. libpq itself has so far escaped dealing with this because it waits with select(), but in libpqwalreceiver.c we want to wait using WaitLatchOrSocket. The semantics mismatch resulted in replication connection failures on Windows, but only for remote connections (apparently, localhost connections complete immediately, or at least too fast for anyone to have noticed the problem in single-machine testing). To fix, introduce an additional WL_SOCKET_CONNECTED wait flag for WaitLatchOrSocket, which is identical to WL_SOCKET_WRITEABLE on non-Windows, but results in waiting for FD_CONNECT events on Windows. Ideally, we would also distinguish the two conditions in the API for PQconnectPoll(), but changing that API at this point seems infeasible. Instead, cheat by checking for PQstatus() == CONNECTION_STARTED to determine that we're still waiting for the connection to complete. (This is a cheat mainly because CONNECTION_STARTED is documented as an internal state rather than something callers should rely on. Perhaps we ought to change the documentation ... but this patch doesn't.) Per reports from Jobin Augustine and Igor Neyman. Back-patch to v10 where commit `1e8a85009` exposed this longstanding shortcoming. Andres Freund, minor fix and some code review/beautification by me Discussion: https://postgr.es/m/CAHBggj8g2T+ZDcACZ2FmzX9CTxkWjKBsHd6NkYB4i9Ojf6K1Fw@mail.gmail.com	2017-08-15 11:07:57 -04:00
Robert Haas	480f1f4329	Teach adjust_appendrel_attrs(_multilevel) to do multiple translations. Currently, child relations are always base relations, so when we translate parent relids to child relids, we only need to translate a singler relid. However, the proposed partition-wise join feature will create child joins, which will mean we need to translate a set of parent relids to the corresponding child relids. This is preliminary refactoring to make that possible. Ashutosh Bapat. Review and testing of the larger patch set of which this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih, Thomas Munro, Dilip Kumar, and me. Some adjustments, mostly cosmetic, by me. Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com	2017-08-15 10:49:06 -04:00
Robert Haas	d57929afc7	Avoid unnecessary single-child Append nodes. Before commit `d3cc37f1d8`, an inheritance parent whose only children were temp tables of other sessions would end up as a simple scan of the parent; but with that commit, we end up with an Append node, per a report from Ashutosh Bapat. Tweak the logic so that we go back to the old way, and update the function header comment for partitioning while we're at it. Ashutosh Bapat, reviewed by Amit Langote and adjusted by me. Discussion: http://postgr.es/m/CAFjFpReWJr1yTkHU=OqiMBmcYCMoSW3VPR39RBuQ_ovwDFBT5Q@mail.gmail.com	2017-08-15 09:16:33 -04:00
Robert Haas	1295a77788	Add missing call to ExecReScanGatherMerge. Amit Kapila Discussion: http://postgr.es/m/CAA4eK1KeQWZOoDmDmGMwuqzPW9JhRS+ditQVFdAfGjNmMZzqMQ@mail.gmail.com	2017-08-15 08:06:36 -04:00
Tom Lane	21d304dfed	Final pgindent + perltidy run for v10.	2017-08-14 17:29:33 -04:00
Tom Lane	5b6289c1e0	Handle elog(FATAL) during ROLLBACK more robustly. Stress testing by Andreas Seltenreich disclosed longstanding problems that occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are trying to execute a ROLLBACK of an already-failed transaction. In such a case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction would skip AbortTransaction and go straight to CleanupTransaction. This led to an assert failure in an assert-enabled build (due to the ROLLBACK's portal still having a cleanup hook) or without assertions, to a FATAL exit complaining about "cannot drop active portal". The latter's not disastrous, perhaps, but it's messy enough to want to improve it. We don't really want to run all of AbortTransaction in this code path. The minimum required to clean up the open portal safely is to do AtAbort_Memory and AtAbort_Portals. It seems like a good idea to do AtAbort_Memory unconditionally, to be entirely sure that we are starting with a safe CurrentMemoryContext. That means that if the main loop in AbortOutOfAnyTransaction does nothing, we need an extra step at the bottom to restore CurrentMemoryContext = TopMemoryContext, which I chose to do by invoking AtCleanup_Memory. This'll result in calling AtCleanup_Memory twice in many of the paths through this function, but that seems harmless and reasonably inexpensive. The original motivation for the assertion in AtCleanup_Portals was that we wanted to be sure that any user-defined code executed as a consequence of the cleanup hook runs during AbortTransaction not CleanupTransaction. That still seems like a valid concern, and now that we've seen one case of the assertion firing --- which means that exactly that would have happened in a production build --- let's replace the Assert with a runtime check. If we see the cleanup hook still set, we'll emit a WARNING and just drop the hook unexecuted. This has been like this a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu	2017-08-14 15:43:20 -04:00
Peter Eisentraut	7f1bb1d734	Fix typo Author: Masahiko Sawada <sawada.mshk@gmail.com>	2017-08-14 13:53:05 -04:00
Tom Lane	004a9702e0	Remove AtEOXact_CatCache(). The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com	2017-08-13 16:15:14 -04:00
Alvaro Herrera	2336f84284	Reword comment for clarity Reported by Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com	2017-08-12 23:26:35 -04:00
Peter Eisentraut	a1ef920e27	Remove uses of "slave" in replication contexts This affects mostly code comments, some documentation, and tests. Official APIs already used "standby".	2017-08-10 22:55:41 -04:00
Robert Haas	bb5d6e80b1	Improve the error message when creating an empty range partition. The previous message didn't mention the name of the table or the bounds. Put the table name in the primary error message and the bounds in the detail message. Amit Langote, changed slightly by me. Suggestions on the exac phrasing from Tom Lane, David G. Johnston, and Dean Rasheed. Discussion: http://postgr.es/m/CA+Tgmoae6bpwVa-1BMaVcwvCCeOoJ5B9Q9-RHWo-1gJxfPBZ5Q@mail.gmail.com	2017-08-10 13:46:56 -04:00
Robert Haas	ec99dd5aee	Remove incorrect assertion in clog.c We must advance the oldest XID that can be safely looked up in clog before truncating CLOG, and the oldest XID that can't be reused after truncating CLOG. This assertion, and the accompanying comment, are confused; remove them. Reported by Neha Sharma. Discussion: http://postgr.es/m/CANiYTQumC3T=UMBMd1Hor=5XWZYuCEQBioL3ug0YtNQCMMT5wQ@mail.gmail.com	2017-08-10 11:20:57 -04:00
Tom Lane	749c7c4170	Fix handling of container types in find_composite_type_dependencies. find_composite_type_dependencies correctly found columns that are of the specified type, and columns that are of arrays of that type, but not columns that are domains or ranges over the given type, its array type, etc. The most general way to handle this seems to be to assume that any type that is directly dependent on the specified type can be treated as a container type, and processed recursively (allowing us to handle nested cases such as ranges over domains over arrays ...). Since a type's array type already has such a dependency, we can drop the existing special case for the array type. The very similar logic in get_rels_with_domain was likewise a few bricks shy of a load, as it supposed that a directly dependent type could only be a sub-domain. This is already wrong for ranges over domains, and it'll someday be wrong for arrays over domains. Add test cases illustrating the problems, and back-patch to all supported branches. Discussion: https://postgr.es/m/15268.1502309024@sss.pgh.pa.us	2017-08-09 17:03:09 -04:00
Tom Lane	9bf4068cc3	Fix datumSerialize infrastructure to not crash on non-varlena data. Commit `1efc7e538` did a poor job of emulating existing logic for touching Datums that might be expanded-object pointers. It didn't check for typlen being -1 first, which meant it could crash on fixed-length pass-by-ref values, and probably on cstring values as well. It also didn't use DatumGetPointer before VARATT_IS_EXTERNAL_EXPANDED, which while currently harmless is not according to documentation nor prevailing style. I also think the lack of any explanation as to why datumSerialize makes these particular nonobvious choices is pretty awful, so fix that. Per report from Jarred Ward. Back-patch to 9.6 where this code came in. Discussion: https://postgr.es/m/6F61E6D2-2F5E-4794-9479-A429BE1CEA4B@simple.com	2017-08-08 19:18:22 -04:00
Alvaro Herrera	77d2c00af7	Reword some unclear comments	2017-08-08 18:48:01 -04:00
Alvaro Herrera	f5d54ef97a	Fix typo in comment	2017-08-08 18:34:25 -04:00
Alvaro Herrera	b2c95a3798	Fix replication origin-related race conditions Similar to what was fixed in commit `9915de6c1c` for replication slots, but this time it's related to replication origins: DROP SUBSCRIPTION attempts to drop the replication origin, but that fails if the replication worker process hasn't yet marked it unused. This causes failures in the buildfarm: ERROR: could not drop replication origin with OID 1, in use by PID 34069 Like the aforementioned commit, fix by having the process running DROP SUBSCRIPTION sleep until the worker marks the the replication origin struct as free. This uses a condition variable on each replication origin shmem state struct, so that the session trying to drop can sleep and expect to be awakened by the process keeping the origin open. Also fix a SGML markup in the previous commit. Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql	2017-08-08 16:07:46 -04:00
Alvaro Herrera	030273b7ea	Fix inadequacies in recently added wait events In commit `9915de6c1c`, we introduced a new wait point for replication slots and incorrectly labelled it as wait event PG_WAIT_LOCK. That's wrong, so invent an appropriate new wait event instead, and document it properly. While at it, fix numerous other problems in the vicinity: - two different walreceiver wait events were being mixed up in a single wait event (which wasn't documented either); split it out so that they can be distinguished, and document the new events properly. - ParallelBitmapPopulate was documented but didn't exist. - ParallelBitmapScan was not documented (I think this should be called "ParallelBitmapScanInit" instead.) - Logical replication wait events weren't documented - various symbols had been added in dartboard order in various places. Put them in alphabetical order instead, as was originally intended. Discussion: https://postgr.es/m/20170808181131.mu4fjepuh5m75cyq@alvherre.pgsql	2017-08-08 15:37:44 -04:00
Peter Eisentraut	cdc47d1f39	Update SQL features list	2017-08-07 14:30:24 -04:00
Peter Eisentraut	f7668b2b35	Translation updates Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 1a0b5e655d7871506c2b1c7ba562c2de6b6a55de	2017-08-07 13:55:34 -04:00
Peter Eisentraut	fca17a933b	Fix local/remote attribute mix-up in logical replication This would lead to failures if local and remote tables have a different column order. The tests previously didn't catch that because they only tested the initial data copy. So add another test that exercises the apply worker. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-08-07 10:49:08 -04:00
Peter Eisentraut	0e58455dd4	Fix handling of dropped columns in logical replication The relation attribute map was not initialized for dropped columns, leading to errors later on. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Scott Milliken <scott@deltaex.com> Bug: #14769	2017-08-07 10:28:35 -04:00
Tom Lane	8d9881911f	Require update permission for the large object written by lo_put(). lo_put() surely should require UPDATE permission, the same as lowrite(), but it failed to check for that, as reported by Chapman Flack. Oversight in commit c50b7c09d; backpatch to 9.4 where that was introduced. Tom Lane and Michael Paquier Security: CVE-2017-7548	2017-08-07 10:19:19 -04:00
Noah Misch	e568e1eee4	Again match pg_user_mappings to information_schema.user_mapping_options. Commit `3eefc51053` claimed to make pg_user_mappings enforce the qualifications user_mapping_options had been enforcing, but its removal of a longstanding restriction left them distinct when the current user is the subject of a mapping yet has no server privileges. user_mapping_options emits no rows for such a mapping, but pg_user_mappings includes full umoptions. Change pg_user_mappings to show null for umoptions. Back-patch to 9.2, like the above commit. Reviewed by Tom Lane. Reported by Jeff Janes. Security: CVE-2017-7547	2017-08-07 07:09:28 -07:00
Heikki Linnakangas	bf6b9e9444	Don't allow logging in with empty password. Some authentication methods allowed it, others did not. In the client-side, libpq does not even try to authenticate with an empty password, which makes using empty passwords hazardous: an administrator might think that an account with an empty password cannot be used to log in, because psql doesn't allow it, and not realize that a different client would in fact allow it. To clear that confusion and to be be consistent, disallow empty passwords in all authentication methods. All the authentication methods that used plaintext authentication over the wire, except for BSD authentication, already checked that the password received from the user was not empty. To avoid forgetting it in the future again, move the check to the recv_password_packet function. That only forbids using an empty password with plaintext authentication, however. MD5 and SCRAM need a different fix: * In stable branches, check that the MD5 hash stored for the user does not not correspond to an empty string. This adds some overhead to MD5 authentication, because the server needs to compute an extra MD5 hash, but it is not noticeable in practice. * In HEAD, modify CREATE and ALTER ROLE to clear the password if an empty string, or a password hash that corresponds to an empty string, is specified. The user-visible behavior is the same as in the stable branches, the user cannot log in, but it seems better to stop the empty password from entering the system in the first place. Secondly, it is fairly expensive to check that a SCRAM hash doesn't correspond to an empty string, because computing a SCRAM hash is much more expensive than an MD5 hash by design, so better avoid doing that on every authentication. We could clear the password on CREATE/ALTER ROLE also in stable branches, but we would still need to check at authentication time, because even if we prevent empty passwords from being stored in pg_authid, there might be existing ones there already. Reported by Jeroen van der Ham, Ben de Graaff and Jelte Fennema. Security: CVE-2017-7546	2017-08-07 17:03:42 +03:00
Peter Eisentraut	86524f0387	Fix function name in code comment Reported-by: Peter Geoghegan <pg@bowt.ie>	2017-08-07 09:49:55 -04:00
Peter Eisentraut	ad2ca3cba6	Improve wording of subscription refresh debug messages Reported-by: Yugo Nagata <nagata@sraoss.co.jp>	2017-08-07 09:40:12 -04:00
Peter Eisentraut	6f81306e4d	Downgrade subscription refresh messages to DEBUG1 The NOTICE messages about tables being added or removed during subscription refresh would be incorrect and possibly confusing if the transaction rolls back, so silence them but keep them available for debugging. Discussion: https://www.postgresql.org/message-id/CAD21AoAvaXizc2h7aiNyK_i0FQSa-tmhpdOGwbhh7Jy544Ad4Q%40mail.gmail.com	2017-08-07 09:16:03 -04:00
Andres Freund	5af4456a56	Fix thinko introduced in `2bef06d516` et al. The callers for GetOldestSafeDecodingTransactionId() all inverted the argument for the argument introduced in `2bef06d516`. Luckily this appears to be inconsequential for the moment, as we wait for concurrent in-progress transaction when assembling a snapshot. Additionally this could only make a difference when adding a second logical slot, because only a pre-existing slot could cause an issue by lowering the returned xid dangerously much. Reported-By: Antonin Houska Discussion: https://postgr.es/m/32704.1496993134@localhost Backport: 9.4-, where `2bef06d516` was backpatched to.	2017-08-06 14:20:55 -07:00
Tom Lane	e9f4ac1389	Suppress unused-variable warnings when building with ICU 4.2. Tidy-up for commit `eccead9ed`.	2017-08-05 11:48:43 -04:00
Robert Haas	52f8a59dd9	Make pg_stop_backup's wait_for_archive flag work on standbys. Previously, it had no effect. Now, if archive_mode=always, it will work, and if not, you'll get a warning. Masahiko Sawada, Michael Paquier, and Robert Haas. The patch as submitted also changed the behavior so that we would write and remove history files on standbys, but that seems like material for a separate patch to me. Discussion: http://postgr.es/m/CAD21AoC2Xw6M=ZJyejq_9d_iDkReC_=rpvQRw5QsyzKQdfYpkw@mail.gmail.com	2017-08-05 10:49:26 -04:00
Peter Eisentraut	eccead9ed4	Add support for ICU 4.2 Supporting ICU 4.2 seems useful because it ships with CentOS 6. Versions before ICU 4.6 don't support pkg-config, so document an installation method without using pkg-config. In ICU 4.2, ucol_getKeywordsForLocale() sometimes returns values that will not be accepted by uloc_toLanguageTag(). Skip loading keyword variants in that version. Reported-by: Victor Wagner <vitus@wagner.pp.ru>	2017-08-05 09:32:42 -04:00
Robert Haas	f85f88bcc2	Fix bug in deciding whether to scan newly-attached partition. If the table being attached had different attribute numbers than the parent, the old code could incorrectly decide it needed to be scanned. Amit Langote, reviewed by Ashutosh Bapat Discussion: http://postgr.es/m/CA+TgmobexgbBr2+Utw-pOMw9uxaBRKRjMW_-mmzKKx9PejPLMg@mail.gmail.com	2017-08-04 22:01:37 -04:00
Peter Eisentraut	7e174fa793	Only kill sync workers at commit time in subscription DDL This allows a transaction abort to avoid killing those workers. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-08-04 21:17:47 -04:00
Robert Haas	ff98a5e1e4	hash: Immediately after a bucket split, try to clean the old bucket. If it works, then we won't be storing two copies of all the tuples that were just moved. If not, VACUUM will still take care of it eventually. Per a report from AP and analysis from Amit Kapila, it seems that a bulk load can cause splits fast enough that VACUUM won't deal with the problem in time to prevent bloat. Amit Kapila; I rewrote the comment. Discussion: http://postgr.es/m/20170704105728.mwb72jebfmok2nm2@zip.com.au	2017-08-04 19:33:01 -04:00
Tom Lane	c30f1770a9	Apply ALTER ... SET NOT NULL recursively in ALTER ... ADD PRIMARY KEY. If you do ALTER COLUMN SET NOT NULL against an inheritance parent table, it will recurse to mark all the child columns as NOT NULL as well. This is necessary for consistency: if the column is labeled NOT NULL then reading it should never produce nulls. However, that didn't happen in the case where ALTER ... ADD PRIMARY KEY marks a target column NOT NULL that wasn't before. That was questionable from the beginning, and now Tushar Ahuja points out that it can lead to dump/restore failures in some cases. So let's make that case recurse too. Although this is meant to fix a bug, it's enough of a behavioral change that I'm pretty hesitant to back-patch, especially in view of the lack of similar field complaints. It doesn't seem to be too late to put it into v10 though. Michael Paquier, editorialized on slightly by me Discussion: https://postgr.es/m/b8794d6a-38f0-9d7c-ad4b-e85adf860fc9@enterprisedb.com	2017-08-04 11:45:18 -04:00
Tom Lane	97d3a0b090	Disallow SSL session tickets. We don't actually support session tickets, since we do not create an SSL session identifier. But it seems that OpenSSL will issue a session ticket on-demand anyway, which will then fail when used. This results in reconnection failures when using ticket-aware client-side SSL libraries (such as the Npgsql .NET driver), as reported by Shay Rojansky. To fix, just tell OpenSSL not to issue tickets. At some point in the far future, we might consider enabling tickets instead. But the security implications of that aren't entirely clear; and besides it would have little benefit except for very short-lived database connections, which is Something We're Bad At anyhow. It would take a lot of other work to get to a point where that would really be an exciting thing to do. While at it, also tell OpenSSL not to use a session cache. This doesn't really do anything, since a backend would never populate the cache anyway, but it might gain some micro-efficiencies and/or reduce security exposures. Patch by me, per discussion with Heikki Linnakangas and Shay Rojansky. Back-patch to all supported versions. Discussion: https://postgr.es/m/CADT4RqBU8N-csyZuzaook-c795dt22Zcwg1aHWB6tfVdAkodZA@mail.gmail.com	2017-08-04 11:07:10 -04:00
Peter Eisentraut	b374481221	Further unify ROLE and USER command grammar rules ALTER USER ... SET did not support all the syntax variants of ALTER ROLE ... SET. Fix that, and to avoid further deviations of this kind, unify many the grammar rules for ROLE/USER/GROUP commands. Reported-by: Pavel Golub <pavel@microolap.com>	2017-08-03 20:34:45 -04:00
Robert Haas	972b6ec20b	Fix lock upgrade hazard in ATExecAttachPartition. Amit Langote Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com	2017-08-03 14:21:00 -04:00
Robert Haas	583df3b5c5	Code beautification for ATExecAttachPartition. Amit Langote Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com	2017-08-03 14:19:59 -04:00
Robert Haas	86705aa8c3	Allow a foreign table CHECK constraint to be initially NOT VALID. For a table, the constraint can be considered validated immediately, because the table must be empty. But for a foreign table this is not necessarily the case. Fixes a bug in commit `f27a6b15e6`. Amit Langote, with some changes by me. Discussion: http://postgr.es/m/d2b7419f-4a71-cf86-cc99-bfd0f359a1ea@lab.ntt.co.jp	2017-08-03 13:24:48 -04:00
Robert Haas	12a34f59bf	Improve ExecModifyTable comments. Some of these comments wrongly implied that only an AFTER ROW trigger will cause a 'wholerow' attribute to be present for a foreign table, but a BEFORE ROW trigger can have the same effect. Others implied that it would always be present for a foreign table, but that's not true either. Etsuro Fujita and Robert Haas Discussion: http://postgr.es/m/10026bc7-1403-ef85-9e43-c6100c1cc0e3@lab.ntt.co.jp	2017-08-03 12:47:00 -04:00
Robert Haas	610e8ebb0f	Teach map_partition_varattnos to handle whole-row expressions. Otherwise, partitioned tables with RETURNING expressions or subject to a WITH CHECK OPTION do not work properly. Amit Langote, reviewed by Amit Khandekar and Etsuro Fujita. A few comment changes by me. Discussion: http://postgr.es/m/9a39df80-871e-6212-0684-f93c83be4097@lab.ntt.co.jp	2017-08-03 11:21:29 -04:00
Tom Lane	9d4e566999	Remove broken and useless entry-count printing in HASH_DEBUG code. init_htab(), with #define HASH_DEBUG, prints a bunch of hashtable parameters. It used to also print nentries, but commit `44ca4022f` changed that to "hash_get_num_entries(hctl)", which is wrong (the parameter should be "hashp"). Rather than correct the coding, though, let's just remove that field from the printout. The table must be empty, since we just finished building it, so expensively calculating the number of entries is rather pointless. Moreover hash_get_num_entries makes assumptions (about not needing locks) which we could do without in debugging code. Noted by Choi Doo-Won in bug #14764. Back-patch to 9.6 where the faulty code was introduced. Discussion: https://postgr.es/m/20170802032353.8424.12274@wrigleys.postgresql.org	2017-08-02 12:17:08 -04:00
Peter Eisentraut	cf65201833	Get a snapshot before COPY in table sync This fixes a crash if the local table has a function index and the function makes non-immutable calls. Reported-by: Scott Milliken <scott@deltaex.com> Author: Masahiko Sawada <sawada.mshk@gmail.com>	2017-08-02 11:34:42 -04:00
Tom Lane	f352f91cbf	Remove duplicate setting of SSL_OP_SINGLE_DH_USE option. Commit `c0a15e07c` moved the setting of OpenSSL's SSL_OP_SINGLE_DH_USE option into a new subroutine initialize_dh(), but forgot to remove it from where it was. SSL_CTX_set_options() is a trivial function, amounting indeed to just "ctx->options \|= op", hence there's no reason to contort the code or break separation of concerns to avoid calling it twice. So separating the DH setup from disabling of old protocol versions is a good change, but we need to finish the job. Noted while poking into the question of SSL session tickets.	2017-08-02 11:28:49 -04:00
Peter Eisentraut	41cefbb6db	Fix OBJECT_TYPE/OBJECT_DOMAIN confusion This doesn't have a significant impact except that now SECURITY LABEL ON DOMAIN rejects types that are not domains. Reported-by: 高增琦 <pgf00a@gmail.com>	2017-08-02 10:40:32 -04:00
Tom Lane	514f613293	Second try at getting useful errors out of newlocale/_create_locale. The early buildfarm returns for commit `1e165d05f` are pretty awful: not only does Windows not return a useful error, but it looks like a lot of Unix-ish platforms don't either. Given the number of different errnos seen so far, guess that what's really going on is that some newlocale() implementations fail to set errno at all. Hence, let's try zeroing errno just before newlocale() and then if it's still zero report as though it's ENOENT. That should cover the Windows case too. It's clear that we'll have to drop the regression test case, unless we want to maintain a separate expected-file for platforms without HAVE_LOCALE_T. But I'll leave it there awhile longer to see if this actually improves matters or not. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com	2017-08-01 17:17:20 -04:00
Tom Lane	1e165d05fe	Try to deliver a sane message for _create_locale() failure on Windows. We were just printing errno, which is certainly not gonna work on Windows. Now, it's not entirely clear from Microsoft's documentation whether _create_locale() adheres to standard Windows error reporting conventions, but let's assume it does and try to map the GetLastError result to an errno. If this turns out not to work, probably the best thing to do will be to assume the error is always ENOENT on Windows. This is a longstanding bug, but given the lack of previous field complaints, I'm not excited about back-patching it. Per report from Murtuza Zabuawala. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com	2017-08-01 16:11:51 -04:00
Tom Lane	f97256570f	Allow creation of C/POSIX collations without depending on libc behavior. Most of our collations code has special handling for the locale names "C" and "POSIX", allowing those collations to be used whether or not the system libraries think those locale names are valid, or indeed whether said libraries even have any locale support. But we missed handling things that way in CREATE COLLATION. This meant you couldn't clone the C/POSIX collations, nor explicitly define a new collation using those locale names, unless the libraries allow it. That's pretty pointless, as well as being a violation of pg_newlocale_from_collation's API specification. The practical effect of this change is quite limited: it allows creating such collations even on platforms that don't HAVE_LOCALE_T, and it allows making "POSIX" collation objects on Windows, which before this would only let you make "C" collation objects. Hence, even though this is a bug fix IMO, it doesn't seem worth the trouble to back-patch. In passing, suppress the DROP CASCADE detail messages at the end of the collation regression test. I'm surprised we've never been bit by message ordering issues there. Per report from Murtuza Zabuawala. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com	2017-08-01 13:51:05 -04:00
Dean Rasheed	4de6216877	Comment fix for partition_rbound_cmp(). This was an oversight in `d363d42`. Beena Emerson	2017-08-01 09:40:45 +01:00
Peter Eisentraut	f40254a799	Fix typo Author: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>	2017-07-31 17:08:14 -04:00
Heikki Linnakangas	c0a15e07cd	Always use 2048 bit DH parameters for OpenSSL ephemeral DH ciphers. 1024 bits is considered weak these days, but OpenSSL always passes 1024 as the key length to the tmp_dh callback. All the code to handle other key lengths is, in fact, dead. To remedy those issues: * Only include hard-coded 2048-bit parameters. * Set the parameters directly with SSL_CTX_set_tmp_dh(), without the callback * The name of the file containing the DH parameters is now a GUC. This replaces the old hardcoded "dh1024.pem" filename. (The files for other key lengths, dh512.pem, dh2048.pem, etc. were never actually used.) This is not a new problem, but it doesn't seem worth the risk and churn to backport. If you care enough about the strength of the DH parameters on old versions, you can create custom DH parameters, with as many bits as you wish, and put them in the "dh1024.pem" file. Per report by Nicolas Guini and Damian Quiroga. Reviewed by Michael Paquier. Discussion: https://www.postgresql.org/message-id/CAMxBoUyjOOautVozN6ofzym828aNrDjuCcOTcCquxjwS-L2hGQ@mail.gmail.com	2017-07-31 22:36:09 +03:00
Tatsuo Ishii	393d47ed0f	Add missing comment in postgresql.conf. current_source requires to restart server to reflect the new value. Per Yugo Nagata and Masahiko Sawada. Back patched to 9.2 and beyond.	2017-07-31 11:24:51 +09:00
Tatsuo Ishii	8b015dd723	Add missing comment in postgresql.conf. dynamic_shared_memory_type requires to restart server to reflect the new value. Per Yugo Nagata and Masahiko Sawada. Back pached to 9.4 and beyond.	2017-07-31 11:06:37 +09:00
Tatsuo Ishii	9fe63092b5	Add missing comment in postgresql.conf. max_logical_replication_workers requires to restart server to reflect the new value. Per Yugo Nagata. Minor editing by me.	2017-07-31 10:46:32 +09:00
Andres Freund	cc9f08b6b8	Move ExecProcNode from dispatch to function pointer based model. This allows us to add stack-depth checks the first time an executor node is called, and skip that overhead on following calls. Additionally it yields a nice speedup. While it'd probably have been a good idea to have that check all along, it has become more important after the new expression evaluation framework in `b8d7f053c5` - there's no stack depth check in common paths anymore now. We previously relied on ExecEvalExpr() being executed somewhere. We should move towards that model for further routines, but as this is required for v10, it seems better to only do the necessary (which already is quite large). Author: Andres Freund, Tom Lane Reported-By: Julien Rouhaud Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com	2017-07-30 16:18:21 -07:00
Andres Freund	d47cfef711	Move interrupt checking from ExecProcNode() to executor nodes. In a followup commit ExecProcNode(), and especially the large switch it contains, will largely be replaced by a function pointer directly to the correct node. The node functions will then get invoked by a thin inline function wrapper. To avoid having to include miscadmin.h in headers - CHECK_FOR_INTERRUPTS() - move the interrupt checks into the individual executor routines. While looking through all executor nodes, I noticed a number of arguably missing interrupt checks, add these too. Author: Andres Freund, Tom Lane Reviewed-By: Tom Lane Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us	2017-07-30 16:06:42 -07:00
Alvaro Herrera	5e3254f086	Update copyright in recently added files	2017-07-26 18:17:18 -04:00
Alvaro Herrera	459c64d322	Fix concurrent locking of tuple update chain If several sessions are concurrently locking a tuple update chain with nonconflicting lock modes using an old snapshot, and they all succeed, it may happen that some of them fail because of restarting the loop (due to a concurrent Xmax change) and getting an error in the subsequent pass while trying to obtain a tuple lock that they already have in some tuple version. This can only happen with very high concurrency (where a row is being both updated and FK-checked by multiple transactions concurrently), but it's been observed in the field and can have unpleasant consequences such as an FK check failing to see a tuple that definitely exists: ERROR: insert or update on table "child_table" violates foreign key constraint "fk_constraint_name" DETAIL: Key (keyid)=(123456) is not present in table "parent_table". (where the key is observably present in the table). Discussion: https://postgr.es/m/20170714210011.r25mrff4nxjhmf3g@alvherre.pgsql	2017-07-26 17:24:16 -04:00
Alvaro Herrera	c28e4f4dc6	Remove obsolete comments about functional dependencies Initial submitted versions of the functional dependencies patch ignored row groups that were smaller than a configured size. However, that consideration was removed in late stages of the patch just before commit, but some comments referring to it remained. Remove them to avoid confusion. Author: Atsushi Torikoshi Discussion: https://postgr.es/m/7cfb23fc-4493-9c02-5da9-e505fd0115d2@lab.ntt.co.jp	2017-07-26 11:40:39 -04:00
Alvaro Herrera	9915de6c1c	Fix race conditions in replication slot operations It is relatively easy to get a replication slot to look as still active while one process is in the process of getting rid of it; when some other process tries to "acquire" the slot, it would fail with an error message of "replication slot XYZ is active for PID N". The error message in itself is fine, except that when the intention is to drop the slot, it is unhelpful: the useful behavior would be to wait until the slot is no longer acquired, so that the drop can proceed. To implement this, we use a condition variable so that slot acquisition can be told to wait on that condition variable if the slot is already acquired, and we make any change in active_pid broadcast a signal on the condition variable. Thus, as soon as the slot is released, the drop will proceed properly. Reported by: Tom Lane Discussion: https://postgr.es/m/11904.1499039688@sss.pgh.pa.us Authors: Petr Jelínek, Álvaro Herrera	2017-07-25 13:26:49 -04:00
Robert Haas	4132dbec69	Fix partitioning crashes during error reporting. In various places where we reverse-map a tuple before calling ExecBuildSlotValueDescription, we neglected to ensure that the slot descriptor matched the tuple stored in it. Amit Langote and Amit Khandekar, reviewed by Etsuro Fujita Discussion: http://postgr.es/m/CAJ3gD9cqpP=WvJj=dv1ONkPWjy8ZuUaOM4_x86i3uQPas=0_jg@mail.gmail.com	2017-07-24 18:08:08 -04:00
Tom Lane	e2c8100e60	Fix race condition in predicate-lock init code in EXEC_BACKEND builds. Trading a little too heavily on letting the code path be the same whether we were creating shared data structures or only attaching to them, InitPredicateLocks() inserted the "scratch" PredicateLockTargetHash entry unconditionally. This is just wrong if we're in a postmaster child, which would only reach this code in EXEC_BACKEND builds. Most of the time, the hash_search(HASH_ENTER) call would simply report that the entry already existed, causing no visible effect since the code did not bother to check for that possibility. However, if this happened while some other backend had transiently removed the "scratch" entry, then that other backend's eventual RestoreScratchTarget would suffer an assert failure; this appears to be the explanation for a recent failure on buildfarm member culicidae. In non-assert builds, there would be no visible consequences there either. But nonetheless this is a pretty bad bug for EXEC_BACKEND builds, for two reasons: 1. Each new backend would perform the hash_search(HASH_ENTER) call without holding any lock that would prevent concurrent access to the PredicateLockTargetHash hash table. This creates a low but certainly nonzero risk of corruption of that hash table. 2. In the event that the race condition occurred, by reinserting the scratch entry too soon, we were defeating the entire purpose of the scratch entry, namely to guarantee that transaction commit could move hash table entries around with no risk of out-of-memory failure. The odds of an actual OOM failure are quite low, but not zero, and if it did happen it would again result in corruption of the hash table. The user-visible symptoms of such corruption are a little hard to predict, but would presumably amount to misbehavior of SERIALIZABLE transactions that'd require a crash or postmaster restart to fix. To fix, just skip the hash insertion if IsUnderPostmaster. I also inserted a bunch of assertions that the expected things happen depending on whether IsUnderPostmaster is true. That might be overkill, since most comparable code in other functions isn't quite that paranoid, but once burnt twice shy. In passing, also move a couple of lines to places where they seemed to make more sense. Diagnosis of problem by Thomas Munro, patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us	2017-07-24 16:45:58 -04:00
Robert Haas	7086be6e36	When WCOs are present, disable direct foreign table modification. If the user modifies a view that has CHECK OPTIONs and this gets translated into a modification to an underlying relation which happens to be a foreign table, the check options should be enforced. In the normal code path, that was happening properly, but it was not working properly for "direct" modification because the whole operation gets pushed to the remote side in that case and we never have an option to enforce the constraint against individual tuples. Fix by disabling direct modification when there is a need to enforce CHECK OPTIONs. Etsuro Fujita, reviewed by Kyotaro Horiguchi and by me. Discussion: http://postgr.es/m/f8a48f54-6f02-9c8a-5250-9791603171ee@lab.ntt.co.jp	2017-07-24 15:57:24 -04:00
Tom Lane	b4af9e3f37	Ensure that pg_get_ruledef()'s output matches pg_get_viewdef()'s. Various cases involving renaming of view columns are handled by having make_viewdef pass down the view's current relation tupledesc to get_query_def, which then takes care to use the column names from the tupledesc for the output column names of the SELECT. For some reason though, we'd missed teaching make_ruledef to do similarly when it is printing an ON SELECT rule, even though this is exactly the same case. The results from pg_get_ruledef would then be different and arguably wrong. In particular, this breaks pre-v10 versions of pg_dump, which in some situations would define views by means of emitting a CREATE RULE ... ON SELECT command. Third-party tools might not be happy either. In passing, clean up some crufty code in make_viewdef; we'd apparently modernized the equivalent code in make_ruledef somewhere along the way, and missed this copy. Per report from Gilles Darold. Back-patch to all supported versions. Discussion: https://postgr.es/m/ec05659a-40ff-4510-fc45-ca9d965d0838@dalibo.com	2017-07-24 15:16:31 -04:00
Tom Lane	278cb43411	Be more consistent about errors for opfamily member lookup failures. Add error checks in some places that were calling get_opfamily_member or get_opfamily_proc and just assuming that the call could never fail. Also, standardize the wording for such errors in some other places. None of these errors are expected in normal use, hence they're just elog not ereport. But they may be handy for diagnosing omissions in custom opclasses. Rushabh Lathia found the oversight in RelationBuildPartitionKey(); I found the others by grepping for all callers of these functions. Discussion: https://postgr.es/m/CAGPqQf2R9Nk8htpv0FFi+FP776EwMyGuORpc9zYkZKC8sFQE3g@mail.gmail.com	2017-07-24 11:23:27 -04:00
Tom Lane	ab2324fd46	Improve comments about partitioned hash table freelists. While I couldn't find any live bugs in commit `44ca4022f`, the comments seemed pretty far from adequate; in particular it was not made plain that "borrowing" entries from other freelists is critical for correctness. Try to improve the commentary. A couple of very minor code style tweaks, as well. Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us	2017-07-22 18:02:26 -04:00
Alvaro Herrera	de38489b92	Fix typo in comment Commit `fd31cd2651` renamed the variable to skipping_blocks, but forgot to update this comment. Noticed while inspecting code.	2017-07-21 20:08:53 -04:00
Teodor Sigaev	7e1fb4c59e	Fix double shared memory allocation. SLRU buffer lwlocks are allocated twice by oversight in commit `fe702a7b3f` where that locks were moved to separate tranche. The bug doesn't have user-visible effects except small overspending of shared memory. Backpatch to 9.6 where it was introduced. Alexander Korotkov with small editorization by me.	2017-07-21 13:31:20 +03:00
Dean Rasheed	d363d42bb9	Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition bounds. Previously, UNBOUNDED meant no lower bound when used in the FROM list, and no upper bound when used in the TO list, which was OK for single-column range partitioning, but problematic with multiple columns. For example, an upper bound of (10.0, UNBOUNDED) would not be collocated with a lower bound of (10.0, UNBOUNDED), thus making it difficult or impossible to define contiguous multi-column range partitions in some cases. Fix this by using MINVALUE and MAXVALUE instead of UNBOUNDED to represent a partition column that is unbounded below or above respectively. This syntax removes any ambiguity, and ensures that if one partition's lower bound equals another partition's upper bound, then the partitions are contiguous. Also drop the constraint prohibiting finite values after an unbounded column, and just document the fact that any values after MINVALUE or MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED multiple times, which was needlessly verbose. Note: Forces a post-PG 10 beta2 initdb. Report by Amul Sul, original patch by Amit Langote with some additional hacking by me. Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com	2017-07-21 09:20:47 +01:00
Tom Lane	eb145fdfea	Fix dumping of outer joins with empty qual lists. Normally, a JoinExpr would have empty "quals" only if it came from CROSS JOIN syntax. However, it's possible to get to this state by specifying NATURAL JOIN between two tables with no common column names, and there might be other ways too. The code previously printed no ON clause if "quals" was empty; that's right for CROSS JOIN but syntactically invalid if it's some type of outer join. Fix by printing ON TRUE in that case. This got broken by commit `2ffa740be`, which stopped using NATURAL JOIN syntax in ruleutils output due to its brittleness in the face of column renamings. Back-patch to 9.3 where that commit appeared. Per report from Tushar Ahuja. Discussion: https://postgr.es/m/98b283cd-6dda-5d3f-f8ac-87db8c76a3da@enterprisedb.com	2017-07-20 11:29:36 -04:00
Tom Lane	3cb29c42f9	Add static assertions about pg_control fitting into one disk sector. When pg_control was first designed, sizeof(ControlFileData) was small enough that a comment seemed like plenty to document the assumption that it'd fit into one disk sector. Now it's nearly 300 bytes, raising the possibility that somebody would carelessly add enough stuff to create a problem. Let's add a StaticAssertStmt() to ensure that the situation doesn't pass unnoticed if it ever occurs. While at it, rename PG_CONTROL_SIZE to PG_CONTROL_FILE_SIZE to make it clearer what that symbol means, and convert the existing runtime comparisons of sizeof(ControlFileData) vs. PG_CONTROL_FILE_SIZE to be static asserts --- we didn't have that technology when this code was first written. Discussion: https://postgr.es/m/9192.1500490591@sss.pgh.pa.us	2017-07-19 16:16:57 -04:00
Tom Lane	04a2c7f412	Improve make_tsvector() to handle empty input, and simplify its callers. It seemed a bit silly that each caller of make_tsvector() was laboriously special-casing the situation where no lexemes were found, when it would be easy and much more bullet-proof to make make_tsvector() handle that.	2017-07-18 13:13:47 -04:00
Tom Lane	b4c6d31c0b	Fix serious performance problems in json(b) to_tsvector(). In an off-list followup to bug #14745, Bob Jones complained that to_tsvector() on a 2MB jsonb value took an unreasonable amount of time and space --- enough to draw the wrath of the OOM killer on his machine. On my machine, his example proved to require upwards of 18 seconds and 4GB, which seemed pretty bogus considering that to_tsvector() on the same data treated as text took just a couple hundred msec and 10 or so MB. On investigation, the problem is that the implementation scans each string element of the json(b) and converts it to tsvector separately, then applies tsvector_concat() to join those separate tsvectors. The unreasonable memory usage came from leaking every single one of the transient tsvectors --- but even without that mistake, this is an O(N^2) or worse algorithm, because tsvector_concat() has to repeatedly process the words coming from earlier elements. We can fix it by accumulating all the lexeme data and applying make_tsvector() just once. As a side benefit, that also makes the desired adjustment of lexeme positions far cheaper, because we can just tweak the running "pos" counter between JSON elements. In passing, try to make the explanation of that tweak more intelligible. (I didn't think that a barely-readable comment far removed from the actual code was helpful.) And do some minor other code beautification.	2017-07-18 12:45:51 -04:00
Robert Haas	c85ec643ff	Reverse-convert row types in ExecWithCheckOptions. Just as we already do in ExecConstraints, and for the same reason: to improve the quality of error messages. Etsuro Fujita, reviewed by Amit Langote Discussion: http://postgr.es/m/56e0baa8-e458-2bbb-7936-367f7d832e43@lab.ntt.co.jp	2017-07-17 21:56:31 -04:00
Robert Haas	f81a91db4d	Use a real RT index when setting up partition tuple routing. Before, we always used a dummy value of 1, but that's not right when the partitioned table being modified is inside of a WITH clause rather than part of the main query. Amit Langote, reported and reviewd by Etsuro Fujita, with a comment change by me. Discussion: http://postgr.es/m/ee12f648-8907-77b5-afc0-2980bcb0aa37@lab.ntt.co.jp	2017-07-17 21:29:45 -04:00
Robert Haas	09c2e7cd2f	hash: Fix write-ahead logging bugs related to init forks. One, logging for CREATE INDEX was oblivious to the fact that when an unlogged table is created, only operations on the init fork should be logged. Two, init fork buffers need to be flushed after they are written; otherwise, a filesystem-level copy following recovery may do the wrong thing. (There may be a better fix for this issue than the one used here, but this is transposed from the similar logic already present in XLogReadBufferForRedoExtended, and a broader refactoring after beta2 seems inadvisable.) Amit Kapila, reviewed by Ashutosh Sharma, Kyotaro Horiguchi, and Michael Paquier Discussion: http://postgr.es/m/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w@mail.gmail.com	2017-07-17 12:03:35 -04:00
Tom Lane	de2af6e001	Improve comments for execExpr.c's handling of FieldStore subexpressions. Given this code's general eagerness to use subexpressions' output variables as temporary workspace, it's not exactly clear that it is safe for FieldStore to tell a newval subexpression that it can write into the same variable that is being supplied as a potential input. Document the chain of assumptions needed for that to be safe.	2017-07-15 16:57:43 -04:00
Tom Lane	e9b64824a0	Improve comments for execExpr.c's isAssignmentIndirectionExpr(). I got confused about why this function doesn't need to recursively search the expression tree for a CaseTestExpr node. After figuring that out, add a comment to save the next person some time.	2017-07-15 14:03:39 -04:00
Tom Lane	decb08ebdf	Code review for NextValueExpr expression node type. Add missing infrastructure for this node type, notably in ruleutils.c where its lack could demonstrably cause EXPLAIN to fail. Add outfuncs/readfuncs support. (outfuncs support is useful today for debugging purposes. The readfuncs support may never be needed, since at present it would only matter for parallel query and NextValueExpr should never appear in a parallelizable query; but it seems like a bad idea to have a primnode type that isn't fully supported here.) Teach planner infrastructure that NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node with cost cpu_operator_cost. Given its limited scope of usage, there might be no live bug today from the lack of that knowledge, but it's certainly going to bite us on the rear someday. Teach pg_stat_statements about the new node type, too. While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction, XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost. Failing to do this for SQLValueFunction was an oversight in my commit `0bb51aa96`. The others are longer-standing oversights, but no time like the present to fix them. (In principle, CoerceToDomain could have cost much higher than this, but it doesn't presently seem worth trying to examine the domain's constraints here.) Modify execExprInterp.c to execute NextValueExpr as an out-of-line function; it seems quite unlikely to me that it's worth insisting that it be inlined in all expression eval methods. Besides, providing the out-of-line function doesn't stop anyone from inlining if they want to. Adjust some places where NextValueExpr support had been inserted with the aid of a dartboard rather than keeping it in the same order as elsewhere. Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us	2017-07-14 15:25:43 -04:00

... 2 3 4 5 6 ...

17669 Commits