postgresql

Commit Graph

Author	SHA1	Message	Date
Alvaro Herrera	f65b94f639	Avoid pin scan for replay of XLOG_BTREE_VACUUM in all cases Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require complex interlocking that matched the requirements on the master. This required an O(N) operation that became a significant problem with large indexes, causing replication delays of seconds or in some cases minutes while the XLOG_BTREE_VACUUM was replayed. This commit skips the “pin scan” that was previously required, by observing in detail when and how it is safe to do so, with full documentation. The pin scan is skipped only in replay; the VACUUM code path on master is not touched here. No tests included. Manual tests using an additional patch to view WAL records and their timing have shown the change in WAL records and their handling has successfully reduced replication delay. This is a back-patch of commits `687f2cd7a0`, `3e4b7d8798`, `b602842613` by Simon Riggs, to branches 9.4 and 9.5. No further backpatch is possible because this depends on catalog scans being MVCC. I (Álvaro) additionally updated a slight problem in the README, which explains why this touches the 9.6 and master branches.	2016-11-17 13:31:30 -03:00
Robert Haas	b40b4dd9e1	Reserve zero as an invalid DSM handle. Previously, the handle for the control segment could not be zero, but some other DSM segment could potentially have a handle value of zero. However, that means that if someone wanted to store a dsm_handle that might or might not be valid, they would need a separate boolean to keep track of whether the associated value is legal. That's annoying, so change things so that no DSM segment can ever have a handle of 0 - or as we call it here, DSM_HANDLE_INVALID. Thomas Munro. This was submitted as part of a much larger patch to add an malloc-like allocator for dynamic shared memory, but this part seems like a good idea independently of the rest of the patch.	2016-11-15 16:33:29 -05:00
Tom Lane	ffaa44cb55	Account for catalog snapshot in PGXACT->xmin updates. The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting for whether MyPgXact->xmin could be cleared or advanced. In normal transactions this was masked by the fact that the transaction snapshot would be older, but during backend startup and certain utility commands it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had been cleared, meaning that recently-deleted rows could be pruned even though this snapshot could still see them, causing unexpected catalog lookup failures. This effect appears to be the explanation for a recent failure on buildfarm member piculet. To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever it is valid. In the previous logic, it was possible for the CatalogSnapshot to remain valid across waits for client input, but with this change that would mean it delays advance of global xmin in cases where it did not before. To avoid possibly causing new table-bloat problems with clients that sit idle for long intervals, add code to invalidate the CatalogSnapshot before waiting for client input. (When the backend is busy, it's unlikely that the CatalogSnapshot would be the oldest snap for very long, so we don't worry about forcing early invalidation of it otherwise.) In passing, remove the CatalogSnapshotStale flag in favor of using "CatalogSnapshot != NULL" to represent validity, as we do for the other special snapshots in snapmgr.c. And improve some obsolete comments. No regression test because I don't know a deterministic way to cause this failure. But the stress test shown in the original discussion provokes "cache lookup failed for relation 1255" within a few dozen seconds for me. Back-patch to 9.4 where MVCC catalog scans were introduced. (Note: it's quite easy to produce similar failures with the same test case in branches before 9.4. But MVCC catalog scans were supposed to fix that.) Discussion: <16447.1478818294@sss.pgh.pa.us>	2016-11-15 15:55:35 -05:00
Robert Haas	fc19c1801b	Limit the number of number of tapes used for a sort to 501. Gigantic numbers of tapes don't work out well. Original patch by Peter Geoghegan; comments entirely rewritten by me.	2016-11-15 10:30:33 -05:00
Robert Haas	00c6d8077f	Fix broken statement in UCS_to_most.pl. This has been wrong for a very long time, and it's puzzling to me how it ever worked for anyone. Kyotaro Horiguchi	2016-11-15 09:41:53 -05:00
Magnus Hagander	c99f876e9a	Fix typo in comment The function was renamed in `908e23473`, but the comment never learned about it.	2016-11-14 17:31:35 +01:00
Tom Lane	24aef33804	Cleanup of rewriter and planner handling of Query.hasRowSecurity flag. Be sure to pull up the subquery's hasRowSecurity flag when flattening a subquery in pull_up_simple_subquery(). This isn't a bug today because we don't look at the hasRowSecurity flag during planning, but it could easily be a bug tomorrow. Likewise, make rewriteRuleAction() pull up the hasRowSecurity flag when absorbing RTEs from a rule action. This isn't a bug either, for the opposite reason: the flag should never be set yet. But again, it seems like good future proofing. Add a comment explaining why rewriteTargetView() should not set hasRowSecurity when adding stuff to securityQuals. Improve some nearby comments about securityQuals processing, and document that field more completely in parsenodes.h. Patch by me, analysis by Dean Rasheed. Discussion: <CAEZATCXZ8tb2DV6f=bkhsMV6u_gRcZ0CZBw2J-qU84RxSukZog@mail.gmail.com>	2016-11-10 16:16:33 -05:00
Tom Lane	279c439c7f	Support "COPY view FROM" for views with INSTEAD OF INSERT triggers. We just pass the data to the INSTEAD trigger. Haribabu Kommi, reviewed by Dilip Kumar Patch: <CAJrrPGcSQkrNkO+4PhLm4B8UQQQmU9YVUuqmtgM=pmzMfxWaWQ@mail.gmail.com>	2016-11-10 14:13:43 -05:00
Tom Lane	e1b449bea9	Fix partial aggregation for the case of a degenerate GROUP BY clause. The plan generated for sorted partial aggregation with "GROUP BY constant" included a Sort node with no sort keys, which the executor does not like. Per report from Steve Randall. I'd add a regression test case if I could think of a compact one, but it doesn't seem worth expending lots of cycles on. Report: <CABVd52UAdGXpg_rCk46egpNKYdXOzCjuJ1zG26E2xBe_8bj+Fg@mail.gmail.com>	2016-11-10 11:31:56 -05:00
Tom Lane	1833f1a1c3	Simplify code by getting rid of SPI_push, SPI_pop, SPI_restore_connection. The idea behind SPI_push was to allow transitioning back into an "unconnected" state when a SPI-using procedure calls unrelated code that might or might not invoke SPI. That sounds good, but in practice the only thing it does for us is to catch cases where a called SPI-using function forgets to call SPI_connect --- which is a highly improbable failure mode, since it would be exposed immediately by direct testing of said function. As against that, we've had multiple bugs induced by forgetting to call SPI_push/SPI_pop around code that might invoke SPI-using functions; these are much harder to catch and indeed have gone undetected for years in some cases. And we've had to band-aid around some problems of this ilk by introducing conditional push/pop pairs in some places, which really kind of defeats the purpose altogether; if we can't draw bright lines between connected and unconnected code, what's the point? Hence, get rid of SPI_push[_conditional], SPI_pop[_conditional], and the underlying state variable _SPI_curid. It turns out SPI_restore_connection can go away too, which is a nice side benefit since it was never more than a kluge. Provide no-op macros for the deleted functions so as to avoid an API break for external modules. A side effect of this removal is that SPI_palloc and allied functions no longer permit being called when unconnected; they'll throw an error instead. The apparent usefulness of the previous behavior was a mirage as well, because it was depended on by only a few places (which I fixed in preceding commits), and it posed a risk of allocations being unexpectedly long-lived if someone forgot a SPI_push call. Discussion: <20808.1478481403@sss.pgh.pa.us>	2016-11-08 17:39:57 -05:00
Tom Lane	9257f07872	Replace uses of SPI_modifytuple that intend to allocate in current context. Invent a new function heap_modify_tuple_by_cols() that is functionally equivalent to SPI_modifytuple except that it always allocates its result by simple palloc. I chose however to make the API details a bit more like heap_modify_tuple: pass a tupdesc rather than a Relation, and use bool convention for the isnull array. Use this function in place of SPI_modifytuple at all call sites where the intended behavior is to allocate in current context. (There actually are only two call sites left that depend on the old behavior, which makes me wonder if we should just drop this function rather than keep it.) This new function is easier to use than heap_modify_tuple() for purposes of replacing a single column (or, really, any fixed number of columns). There are a number of places where it would simplify the code to change over, but I resisted that temptation for the moment ... everywhere except in plpgsql's exec_assign_value(); changing that might offer some small performance benefit, so I did it. This is on the way to removing SPI_push/SPI_pop, but it seems like good code cleanup in its own right. Discussion: <9633.1478552022@sss.pgh.pa.us>	2016-11-08 15:36:44 -05:00
Robert Haas	dce429b117	Fix typo. Michael Paquier	2016-11-08 15:33:57 -05:00
Tom Lane	6d30fb1f75	Make SPI_fnumber() reject dropped columns. There's basically no scenario where it's sensible for this to match dropped columns, so put a test for dropped-ness into SPI_fnumber() itself, and excise the test from the small number of callers that were paying attention to the case. (Most weren't :-(.) In passing, normalize tests at call sites: always reject attnum <= 0 if we're disallowing system columns. Previously there was a mixture of "< 0" and "<= 0" tests. This makes no practical difference since SPI_fnumber() never returns 0, but I'm feeling pedantic today. Also, in the places that are actually live user-facing code and not legacy cruft, distinguish "column not found" from "can't handle system column". Per discussion with Jim Nasby; thi supersedes his original patch that just changed the behavior at one call site. Discussion: <b2de8258-c4c0-1cb8-7b97-e8538e5c975c@BlueTreble.com>	2016-11-08 13:11:26 -05:00
Robert Haas	60379f66c8	Fix mistake in XLOG_SEG_SIZE test. The intent of the test is to check whether XLOG_SEG_SIZE is in a particular range, but actually in one case it compares XLOG_BLCKSZ by mistake. Repair. Commit `88e9823026` introduced this faulty test. Kuntal Ghosh, reviewed by Michael Paquier.	2016-11-08 12:12:19 -05:00
Robert Haas	f0e72a25b0	Improve handling of dead tuples in hash indexes. When squeezing a bucket during vacuum, it's not necessary to retain any tuples already marked as dead, so ignore them when deciding which tuples must be moved in order to empty a bucket page. Similarly, when splitting a bucket, relocating dead tuples to the new bucket is a waste of effort; instead, just ignore them. Amit Kapila, reviewed by me. Testing help provided by Ashutosh Sharma.	2016-11-08 10:52:51 -05:00
Peter Eisentraut	a5954de105	Save redundant code for pseudotype I/O functions Use a macro to generate the in and out functions for pseudotypes that reject all input and output, saving many lines of redundant code. Parameterize the error messages to reduce translatable strings.	2016-11-07 09:21:00 -05:00
Tom Lane	5485c99e7f	Fix silly nil-pointer-dereference bug introduced in commit `d5f6f13f8`. Don't fetch record->xl_info before we've verified that record isn't NULL. Per Coverity. Michael Paquier	2016-11-06 11:29:40 -05:00
Tom Lane	34ca090570	Adjust cost_merge_append() to reflect use of binaryheap_replace_first(). Commit `7a2fe9bd0` improved merge append so that replacement of a tuple takes log(N) operations, not twice log(N). Since cost_merge_append knew about that explicitly, we should adjust it. This probably makes little difference in practice, but the obsolete comment is confusing. Ideally this would have been put in in 9.3 with the underlying behavior change; but I'm not going to back-patch it, since there's some small chance of changing a plan choice that somebody's optimized for. Thomas Munro Discussion: <CAEepm=0WQBSvuYcMOUj4Ga4NXpu2J=ejZcE=e=eiTjTX-6_gDw@mail.gmail.com>	2016-11-05 13:48:11 -04:00
Tom Lane	d5f6f13f8e	Be more consistent about masking xl_info with ~XLR_INFO_MASK. Generally, WAL resource managers are only supposed to examine the top 4 bits of a WAL record's xl_info; the rest are reserved for the WAL mechanism itself. A few places were not consistent about doing this with respect to XLOG_CHECKPOINT and XLOG_SWITCH records. There's no bug currently, since no additional bits ever get set in these specific record types, but that might not be true forever. Let's follow the generic coding rule here too. Michael Paquier	2016-11-04 13:26:49 -04:00
Kevin Grittner	8c48375e5f	Implement syntax for transition tables in AFTER triggers. This is infrastructure for the complete SQL standard feature. No support is included at this point for execution nodes or PLs. The intent is to add that soon. As this patch leaves things, standard syntax can create tuplestores to contain old and/or new versions of rows affected by a statement. References to these tuplestores are in the TriggerData structure. C triggers can access the tuplestores directly, so they are usable, but they cannot yet be referenced within a SQL statement.	2016-11-04 10:49:50 -05:00
Robert Haas	f2e6a2ccf1	Add API to check if an existing exclusive lock allows cleanup. LockBufferForCleanup() acquires a cleanup lock unconditionally, and ConditionalLockBufferForCleanup() acquires a cleanup lock if it is possible to do so without waiting; this patch adds a new API, IsBufferCleanupOK(), which tests whether an exclusive lock already held happens to be a cleanup lock. This is possible because a cleanup lock simply means an exclusive lock plus the assurance any other pins on the buffer are newer than our own pin. Therefore, just as the existing functions decide that the exclusive lock that they've just taken is a cleanup lock if they observe the pin count to be 1, this new function allows us to observe that the pin count is 1 on a buffer we've already locked. This is useful in situations where a backend definitely wishes to modify the buffer and also wishes to perform cleanup operations if possible. The patch to eliminate heavyweight locking by hash indexes uses this, and it may have other applications as well. Amit Kapila, per a suggestion from me. Some comment adjustments by me as well.	2016-11-04 09:32:24 -04:00
Tom Lane	770671062f	Don't make FK-based selectivity estimates in inheritance situations. The foreign-key-aware logic for estimation of join sizes (added in commit `100340e2d`) blindly tried to apply the concept to rels that are actually parents of inheritance trees. This is just plain wrong so far as the referenced relation is concerned, since the inheritance scan may well produce lots of rows that are not participating in the constraint. It's wrong for the referencing relation too, for the same reason; although on that end we could conceivably detect whether all members of the inheritance tree have equivalent FK constraints pointing to the same referenced rel, and then proceed more or less as we do now. But pending somebody writing code to do that, we must disable this, because it's producing completely silly estimates when there's an FK linking the heads of inheritance trees. Per bug #14404 from Clinton Adams. Back-patch to 9.6 where the new estimation logic came in. Report: <20161028200412.15987.96482@wrigleys.postgresql.org>	2016-11-02 15:50:15 -04:00
Tom Lane	da8f3ebf30	Don't convert Consts into Vars during setrefs.c processing. While converting expressions in an upper-level plan node so that they reference Vars and expressions provided by the input plan node(s), don't convert plain Const items, even if there happens to be a matching Const in the input. It's silly to do so because a Var is more expensive to execute than a Const. Moreover, converting can fool ExecCheckPlanOutput's check that an insert or update query inserts nulls into dropped columns, leading to "query provides a value for a dropped column" errors during INSERT or UPDATE on a table with a dropped column. We could solve this by making that check more complicated, but I don't see the point; this fix should save a marginal number of cycles, and it also makes for less messy EXPLAIN output, as shown by the ensuing regression test result changes. Per report from Pavel Hanák. I have not incorporated a test case based on that example, as there doesn't seem to be a simple way of checking this in isolation without making a bunch of assumptions about other planner and SQL-function behavior. Back-patch to 9.6. This setrefs.c behavior exists much further back, but there is not currently reason to think that it causes problems before 9.6. Discussion: <83shraampf.fsf@is-it.eu>	2016-11-02 14:32:13 -04:00
Peter Eisentraut	3a47c704fb	Add make rules to download raw Unicode mapping files This serves as implicit documentation and is handy if someone wants to tweak things. The rules are not part of a normal build, like this entire directory.	2016-11-01 11:54:58 -04:00
Tom Lane	5ec81aceec	Fix nasty performance problem in tsquery_rewrite(). tsquery_rewrite() tries to find matches to subsets of AND/OR conditions; for example, in the query 'a \| b \| c' the substitution subquery 'a \| c' should match and lead to replacement of the first and third items. That's fine, but the matching algorithm apparently takes about O(2^N) for an N-clause query (I say "apparently" because the code is also both unintelligible and uncommented). We could probably do better than that even without any extra assumptions --- but actually, we know that the subclauses are sorted, indeed are depending on that elsewhere in this very same function. So we can just scan the two lists a single time to detect matches, as though we were doing a merge join. Also do a re-flattening call (QTNTernary()) in tsquery_rewrite_query, just to make sure that the tree fits the expectations of the next search cycle. I didn't try to devise a test case for this, but I'm pretty sure that the oversight could have led to failure to match in some cases where a match would be expected. Improve comments, and also stick a CHECK_FOR_INTERRUPTS into dofindsubquery, just in case it's still too slow for somebody. Per report from Andreas Seltenreich. Back-patch to all supported branches. Discussion: <8760oasf2y.fsf@credativ.de>	2016-10-30 17:35:42 -04:00
Tom Lane	24ebc444c6	Fix bogus tree-flattening logic in QTNTernary(). QTNTernary() contains logic to flatten, eg, '(a & b) & c' into 'a & b & c', which is all well and good, but it tries to do that to NOT nodes as well, so that '!!a' gets changed to '!a'. Explicitly restrict the conversion to be done only on AND and OR nodes, and add a test case illustrating the bug. In passing, provide some comments for the sadly naked functions in tsquery_util.c, and simplify some baroque logic in QTNFree(), which I think may have been leaking some items it intended to free. Noted while investigating a complaint from Andreas Seltenreich. Back-patch to all supported versions.	2016-10-30 15:24:40 -04:00
Tom Lane	9a00f03e47	Improve speed of aggregates that use array_append as transition function. In the previous coding, if an aggregate's transition function returned an expanded array, nodeAgg.c and nodeWindowAgg.c would always copy it and thus force it into the flat representation. This led to ping-ponging between flat and expanded formats, which costs a lot. For an aggregate using array_append as transition function, I measured about a 15X slowdown compared to the pre-9.5 code, when working on simple int[] arrays. Of course, the old code was already O(N^2) in this usage due to copying flat arrays all the time, but it wasn't quite this inefficient. To fix, teach nodeAgg.c and nodeWindowAgg.c to allow expanded transition values without copying, so long as the transition function takes care to return the transition value already properly parented under the aggcontext. That puts a bit of extra responsibility on the transition function, but doing it this way allows us to not need any extra logic in the fast path of advance_transition_function (ie, with a pass-by-value transition value, or with a modified-in-place pass-by-reference value). We already know that that's a hot spot so I'm loath to add any cycles at all there. Also, while only array_append currently knows how to follow this convention, this solution allows other transition functions to opt-in without needing to have a whitelist in the core aggregation code. (The reason we would need a whitelist is that currently, if you pass a R/W expanded-object pointer to an arbitrary function, it's allowed to do anything with it including deleting it; that breaks the core agg code's assumption that it should free discarded values. Returning a value under aggcontext is the transition function's signal that it knows it is an aggregate transition function and will play nice. Possibly the API rules for expanded objects should be refined, but that would not be a back-patchable change.) With this fix, an aggregate using array_append is no longer O(N^2), so it's much faster than pre-9.5 code rather than much slower. It's still a bit slower than the bespoke infrastructure for array_agg, but the differential seems to be only about 10%-20% rather than orders of magnitude. Discussion: <6315.1477677885@sss.pgh.pa.us>	2016-10-30 12:27:41 -04:00
Robert Haas	33839b5ffb	Fix leftover reference to background writer performing checkpoints. This was changed in PostgreSQL 9.2, but somehow this comment never got updated.	2016-10-28 09:09:00 -04:00
Robert Haas	4f714b2fd2	If the stats collector dies during Hot Standby, restart it. This bug exists as far back as 9.0, when Hot Standby was introduced, so back-patch to all supported branches. Report and patch by Takayuki Tsunakawa, reviewed by Michael Paquier and Kuntal Ghosh.	2016-10-27 14:27:40 -04:00
Robert Haas	f267c1c244	Fix possible pg_basebackup failure on standby with "include WAL". If a restartpoint flushed no dirty buffers, it could fail to update the minimum recovery point, leading to a minimum recovery point prior to the starting REDO location. perform_base_backup() would interpret that as meaning that no WAL files at all needed to be included in the backup, failing an internal sanity check. To fix, have restartpoints always update the minimum recovery point to just after the checkpoint record itself, so that the file (or files) containing the checkpoint record will always be included in the backup. Code by Amit Kapila, per a design suggestion by me, with some additional work on the code comment by me. Test case by Michael Paquier. Report by Kyotaro Horiguchi.	2016-10-27 11:19:51 -04:00
Peter Eisentraut	c32fe432af	Avoid using a C++ keyword in header file per cpluspluscheck	2016-10-26 22:41:56 -04:00
Bruce Momjian	586a46c22c	Properly indent postgresql.conf comments to align A few comments were misaligned.	2016-10-26 21:16:50 -04:00
Tom Lane	a522fc3d80	Fix incorrect trigger-property updating in ALTER CONSTRAINT. The code to change the deferrability properties of a foreign-key constraint updated all the associated triggers to match; but a moment's examination of the code that creates those triggers in the first place shows that only some of them should track the constraint's deferrability properties. This leads to odd failures in subsequent exercise of the foreign key, as the triggers are fired at the wrong times. Fix that, and add a regression test comparing the trigger properties produced by ALTER CONSTRAINT with those you get by creating the constraint as-intended to begin with. Per report from James Parks. Back-patch to 9.4 where this ALTER functionality was introduced. Report: <CAJ3Xv+jzJ8iNNUcp4RKW8b6Qp1xVAxHwSXVpjBNygjKxcVuE9w@mail.gmail.com>	2016-10-26 17:05:06 -04:00
Heikki Linnakangas	56f39009c5	Fix typos in comments. Vinayak Pokale	2016-10-26 11:12:31 +03:00
Bruce Momjian	10c064ce4d	Consistently mention 'SELECT pg_reload_conf()' in config files Previously we only mentioned SIGHUP and 'pg_ctl reload' in postgresql.conf and pg_hba.conf.	2016-10-25 11:26:15 -04:00
Alvaro Herrera	00f15338b2	Preserve commit timestamps across clean restart An oversight in setting the boundaries of known commit timestamps during startup caused old commit timestamps to become inaccessible after a server restart. Author and reporter: Julien Rouhaud Review, test code: Craig Ringer	2016-10-24 09:45:48 -03:00
Tom Lane	8f1fb7d621	Avoid testing tuple visibility without buffer lock. INSERT ... ON CONFLICT (specifically ExecCheckHeapTupleVisible) contains another example of this unsafe coding practice. It is much harder to get a failure out of it than the case fixed in commit `6292c2339`, because in most scenarios any hint bits that could be set would have already been set earlier in the command. However, Konstantin Knizhnik reported a failure with a custom transaction manager, and it's clearly possible to get a failure via a race condition in async-commit mode. For lack of a reproducible example, no regression test case in this commit. I did some testing with Asserts added to tqual.c's functions, and can say that running "make check-world" exposed these two bugs and no others. The Asserts are messy enough that I've not added them to the code for now. Report: <57EE93C8.8080504@postgrespro.ru> Related-Discussion: <CAO3NbwOycQjt2Oqy2VW-eLTq2M5uGMyHnGm=RNga4mjqcYD7gQ@mail.gmail.com>	2016-10-23 19:14:32 -04:00
Tom Lane	a6c0a5b6e8	Don't throw serialization errors for self-conflicts in INSERT ON CONFLICT. A transaction that conflicts against itself, for example INSERT INTO t(pk) VALUES (1),(1) ON CONFLICT DO NOTHING; should behave the same regardless of isolation level. It certainly shouldn't throw a serialization error, as retrying will not help. We got this wrong due to the ON CONFLICT logic not considering the case, as reported by Jason Dusek. Core of this patch is by Peter Geoghegan (based on an earlier patch by Thomas Munro), though I didn't take his proposed code refactoring for fear that it might have unexpected side-effects. Test cases by Thomas Munro and myself. Report: <CAO3NbwOycQjt2Oqy2VW-eLTq2M5uGMyHnGm=RNga4mjqcYD7gQ@mail.gmail.com> Related-Discussion: <57EE93C8.8080504@postgrespro.ru>	2016-10-23 18:36:13 -04:00
Tom Lane	6292c23391	Avoid testing tuple visibility without buffer lock in RI_FKey_check(). Despite the argumentation I wrote in commit `7a2fe85b0`, it's unsafe to do this, because in corner cases it's possible for HeapTupleSatisfiesSelf to try to set hint bits on the target tuple; and at least since 8.2 we have required the buffer content lock to be held while setting hint bits. The added regression test exercises one such corner case. Unpatched, it causes an assertion failure in assert-enabled builds, or otherwise would cause a hint bit change in a buffer we don't hold lock on, which given the right race condition could result in checksum failures or other data consistency problems. The odds of a problem in the field are probably pretty small, but nonetheless back-patch to all supported branches. Report: <19391.1477244876@sss.pgh.pa.us>	2016-10-23 15:01:24 -04:00
Robert Haas	919c811ca1	Fix comment formatting.	2016-10-21 12:04:21 -04:00
Robert Haas	7012b132d0	postgres_fdw: Push down aggregates to remote servers. Now that the upper planner uses paths, and now that we have proper hooks to inject paths into the upper planning process, it's possible for foreign data wrappers to arrange to push aggregates to the remote side instead of fetching all of the rows and aggregating them locally. This figures to be a massive win for performance, so teach postgres_fdw to do it. Jeevan Chalke and Ashutosh Bapat. Reviewed by Ashutosh Bapat with additional testing by Prabhat Sahu. Various mostly cosmetic changes by me.	2016-10-21 09:54:29 -04:00
Tom Lane	709e461bef	Fix EXPLAIN so that it doesn't emit invalid XML in corner cases. With track_io_timing = on, EXPLAIN (ANALYZE, BUFFERS) will emit fields named like "I/O Read Time". The slash makes that invalid as an XML element name, so that adding FORMAT XML would produce invalid XML. We already have code in there to translate spaces to dashes, so let's generalize that to convert anything that isn't a valid XML name character, viz letters, digits, hyphens, underscores, and periods. We could just reject slashes, which would run a bit faster. But the fact that this went unnoticed for so long doesn't give me a warm feeling that we'd notice the next creative violation, so let's make it a permanent fix. Reported by Markus Winand, though this isn't his initial patch proposal. Back-patch to 9.2 where track_io_timing was added. The problem is only latent in 9.1, so I don't feel a need to fix it there. Discussion: <E0BF6A45-68E8-45E6-918F-741FB332C6BB@winand.at>	2016-10-20 17:17:50 -04:00
Robert Haas	f82ec32ac3	Rename "pg_xlog" directory to "pg_wal". "xlog" is not a particularly clear abbreviation for "write-ahead log", and it sometimes confuses users into believe that the contents of the "pg_xlog" directory are not critical data, leading to unpleasant consequences. So, rename the directory to "pg_wal". This patch modifies pg_upgrade and pg_basebackup to understand both the old and new directory layouts; the former is necessary given the purpose of the tool, while the latter merely avoids an unnecessary backward-compatibility break. We may wish to consider renaming other programs, switches, and functions which still use the old "xlog" naming to also refer to "wal". However, that's still under discussion, so let's do just this much for now. Discussion: CAB7nPqTeC-8+zux8_-4ZD46V7YPwooeFxgndfsq5Rg8ibLVm1A@mail.gmail.com Michael Paquier	2016-10-20 11:32:18 -04:00
Tom Lane	a3215431ab	Suppress "Factory" zone in pg_timezone_names view for tzdata >= 2016g. IANA got rid of the really silly "abbreviation" and replaced it with one that's only moderately silly. But it's still pointless, so keep on not showing it.	2016-10-19 18:11:49 -04:00
Peter Eisentraut	9ffe4a8b4c	Make getrusage() output a little more readable Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Peter Geoghegan <pg@heroku.com>	2016-10-19 09:53:16 -04:00
Heikki Linnakangas	917dc7d239	Fix WAL-logging of FSM and VM truncation. When a relation is truncated, it is important that the FSM is truncated as well. Otherwise, after recovery, the FSM can return a page that has been truncated away, leading to errors like: ERROR: could not read block 28991 in file "base/16390/572026": read only 0 of 8192 bytes We were using MarkBufferDirtyHint() to dirty the buffer holding the last remaining page of the FSM, but during recovery, that might in fact not dirty the page, and the FSM update might be lost. To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty() requires us to do WAL-logging ourselves, to protect from a torn page, if checksumming is enabled. Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log when checksumming is enabled. Analysis by Pavan Deolasee. Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>	2016-10-19 14:26:05 +03:00
Tom Lane	6f13a682c8	Fix cidin() to handle values above 2^31 platform-independently. CommandId is declared as uint32, and values up to 4G are indeed legal. cidout() handles them properly by treating the value as unsigned int. But cidin() was just using atoi(), which has platform-dependent behavior for values outside the range of signed int, as reported by Bart Lengkeek in bug #14379. Use strtoul() instead, as xidin() does. In passing, make some purely cosmetic changes to make xidin/xidout look more like cidin/cidout; the former didn't have a monopoly on best practice IMO. Neither xidin nor cidin make any attempt to throw error for invalid input. I didn't change that here, and am not sure it's worth worrying about since neither is really a user-facing type. The point is just to ensure that indubitably-valid inputs work as expected. It's been like this for a long time, so back-patch to all supported branches. Report: <20161018152550.1413.6439@wrigleys.postgresql.org>	2016-10-18 12:24:46 -04:00
Heikki Linnakangas	faae1c918e	Revert "Replace PostmasterRandom() with a stronger way of generating randomness." This reverts commit `9e083fd468`. That was a few bricks shy of a load: * Query cancel stopped working * Buildfarm member pademelon stopped working, because the box doesn't have /dev/urandom nor /dev/random. This clearly needs some more discussion, and a quite different patch, so revert for now.	2016-10-18 16:28:23 +03:00
Robert Haas	7d3235ba42	By default, set log_line_prefix = '%m [%p] '. This value might not be to everyone's taste; in particular, some people might prefer %t to %m, and others may want %u, %d, or other fields. However, it's a vast improvement on the old default of ''. Christoph Berg	2016-10-17 16:34:48 -04:00
Heikki Linnakangas	d8589946dd	Fix use-after-free around DISTINCT transition function calls. Have tuplesort_gettupleslot() copy the contents of its current table slot as needed. This is based on an approach taken by tuplestore_gettupleslot(). In the future, tuplesort_gettupleslot() may also be taught to avoid copying the tuple where caller can determine that that is safe (the tuplestore_gettupleslot() interface already offers this option to callers). Patch by Peter Geoghegan. Fixes bug #14344, reported by Regina Obe. Report: <20160929035538.20224.39628@wrigleys.postgresql.org> Backpatch-through: 9.6	2016-10-17 12:13:16 +03:00

1 2 3 4 5 ...

16356 Commits