postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	6292c23391	Avoid testing tuple visibility without buffer lock in RI_FKey_check(). Despite the argumentation I wrote in commit `7a2fe85b0`, it's unsafe to do this, because in corner cases it's possible for HeapTupleSatisfiesSelf to try to set hint bits on the target tuple; and at least since 8.2 we have required the buffer content lock to be held while setting hint bits. The added regression test exercises one such corner case. Unpatched, it causes an assertion failure in assert-enabled builds, or otherwise would cause a hint bit change in a buffer we don't hold lock on, which given the right race condition could result in checksum failures or other data consistency problems. The odds of a problem in the field are probably pretty small, but nonetheless back-patch to all supported branches. Report: <19391.1477244876@sss.pgh.pa.us>	2016-10-23 15:01:24 -04:00
Robert Haas	919c811ca1	Fix comment formatting.	2016-10-21 12:04:21 -04:00
Robert Haas	7012b132d0	postgres_fdw: Push down aggregates to remote servers. Now that the upper planner uses paths, and now that we have proper hooks to inject paths into the upper planning process, it's possible for foreign data wrappers to arrange to push aggregates to the remote side instead of fetching all of the rows and aggregating them locally. This figures to be a massive win for performance, so teach postgres_fdw to do it. Jeevan Chalke and Ashutosh Bapat. Reviewed by Ashutosh Bapat with additional testing by Prabhat Sahu. Various mostly cosmetic changes by me.	2016-10-21 09:54:29 -04:00
Tom Lane	709e461bef	Fix EXPLAIN so that it doesn't emit invalid XML in corner cases. With track_io_timing = on, EXPLAIN (ANALYZE, BUFFERS) will emit fields named like "I/O Read Time". The slash makes that invalid as an XML element name, so that adding FORMAT XML would produce invalid XML. We already have code in there to translate spaces to dashes, so let's generalize that to convert anything that isn't a valid XML name character, viz letters, digits, hyphens, underscores, and periods. We could just reject slashes, which would run a bit faster. But the fact that this went unnoticed for so long doesn't give me a warm feeling that we'd notice the next creative violation, so let's make it a permanent fix. Reported by Markus Winand, though this isn't his initial patch proposal. Back-patch to 9.2 where track_io_timing was added. The problem is only latent in 9.1, so I don't feel a need to fix it there. Discussion: <E0BF6A45-68E8-45E6-918F-741FB332C6BB@winand.at>	2016-10-20 17:17:50 -04:00
Robert Haas	f82ec32ac3	Rename "pg_xlog" directory to "pg_wal". "xlog" is not a particularly clear abbreviation for "write-ahead log", and it sometimes confuses users into believe that the contents of the "pg_xlog" directory are not critical data, leading to unpleasant consequences. So, rename the directory to "pg_wal". This patch modifies pg_upgrade and pg_basebackup to understand both the old and new directory layouts; the former is necessary given the purpose of the tool, while the latter merely avoids an unnecessary backward-compatibility break. We may wish to consider renaming other programs, switches, and functions which still use the old "xlog" naming to also refer to "wal". However, that's still under discussion, so let's do just this much for now. Discussion: CAB7nPqTeC-8+zux8_-4ZD46V7YPwooeFxgndfsq5Rg8ibLVm1A@mail.gmail.com Michael Paquier	2016-10-20 11:32:18 -04:00
Tom Lane	a3215431ab	Suppress "Factory" zone in pg_timezone_names view for tzdata >= 2016g. IANA got rid of the really silly "abbreviation" and replaced it with one that's only moderately silly. But it's still pointless, so keep on not showing it.	2016-10-19 18:11:49 -04:00
Peter Eisentraut	9ffe4a8b4c	Make getrusage() output a little more readable Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Peter Geoghegan <pg@heroku.com>	2016-10-19 09:53:16 -04:00
Heikki Linnakangas	917dc7d239	Fix WAL-logging of FSM and VM truncation. When a relation is truncated, it is important that the FSM is truncated as well. Otherwise, after recovery, the FSM can return a page that has been truncated away, leading to errors like: ERROR: could not read block 28991 in file "base/16390/572026": read only 0 of 8192 bytes We were using MarkBufferDirtyHint() to dirty the buffer holding the last remaining page of the FSM, but during recovery, that might in fact not dirty the page, and the FSM update might be lost. To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty() requires us to do WAL-logging ourselves, to protect from a torn page, if checksumming is enabled. Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log when checksumming is enabled. Analysis by Pavan Deolasee. Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>	2016-10-19 14:26:05 +03:00
Tom Lane	6f13a682c8	Fix cidin() to handle values above 2^31 platform-independently. CommandId is declared as uint32, and values up to 4G are indeed legal. cidout() handles them properly by treating the value as unsigned int. But cidin() was just using atoi(), which has platform-dependent behavior for values outside the range of signed int, as reported by Bart Lengkeek in bug #14379. Use strtoul() instead, as xidin() does. In passing, make some purely cosmetic changes to make xidin/xidout look more like cidin/cidout; the former didn't have a monopoly on best practice IMO. Neither xidin nor cidin make any attempt to throw error for invalid input. I didn't change that here, and am not sure it's worth worrying about since neither is really a user-facing type. The point is just to ensure that indubitably-valid inputs work as expected. It's been like this for a long time, so back-patch to all supported branches. Report: <20161018152550.1413.6439@wrigleys.postgresql.org>	2016-10-18 12:24:46 -04:00
Heikki Linnakangas	faae1c918e	Revert "Replace PostmasterRandom() with a stronger way of generating randomness." This reverts commit `9e083fd468`. That was a few bricks shy of a load: * Query cancel stopped working * Buildfarm member pademelon stopped working, because the box doesn't have /dev/urandom nor /dev/random. This clearly needs some more discussion, and a quite different patch, so revert for now.	2016-10-18 16:28:23 +03:00
Robert Haas	7d3235ba42	By default, set log_line_prefix = '%m [%p] '. This value might not be to everyone's taste; in particular, some people might prefer %t to %m, and others may want %u, %d, or other fields. However, it's a vast improvement on the old default of ''. Christoph Berg	2016-10-17 16:34:48 -04:00
Heikki Linnakangas	d8589946dd	Fix use-after-free around DISTINCT transition function calls. Have tuplesort_gettupleslot() copy the contents of its current table slot as needed. This is based on an approach taken by tuplestore_gettupleslot(). In the future, tuplesort_gettupleslot() may also be taught to avoid copying the tuple where caller can determine that that is safe (the tuplestore_gettupleslot() interface already offers this option to callers). Patch by Peter Geoghegan. Fixes bug #14344, reported by Regina Obe. Report: <20160929035538.20224.39628@wrigleys.postgresql.org> Backpatch-through: 9.6	2016-10-17 12:13:16 +03:00
Heikki Linnakangas	9e083fd468	Replace PostmasterRandom() with a stronger way of generating randomness. This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom - /dev/random Original patch by Magnus Hagander, with further work by Michael Paquier and me. Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>	2016-10-17 11:52:50 +03:00
Andres Freund	5dfc198146	Use more efficient hashtable for execGrouping.c to speed up hash aggregation. The more efficient hashtable speeds up hash-aggregations with more than a few hundred groups significantly. Improvements of over 120% have been measured. Due to the the different hash table queries that not fully determined (e.g. GROUP BY without ORDER BY) may change their result order. The conversion is largely straight-forward, except that, due to the static element types of simplehash.h type hashes, the additional data some users store in elements (e.g. the per-group working data for hash aggregaters) is now stored in TupleHashEntryData->additional. The meaning of BuildTupleHashTable's entrysize (renamed to additionalsize) has been changed to only be about the additionally stored size. That size is only used for the initial sizing of the hash-table. Reviewed-By: Tomas Vondra Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>	2016-10-14 17:22:51 -07:00
Andres Freund	75ae538bc3	Use more efficient hashtable for tidbitmap.c to speed up bitmap scans. Use the new simplehash.h to speed up tidbitmap.c uses. For bitmap scan heavy queries speedups of over 100% have been measured. Both lossy and exact scans benefit, but the wins are bigger for mostly exact scans. The conversion is mostly trivial, except that tbm_lossify() now restarts lossifying at the point it previously stopped. Otherwise the hash table becomes unbalanced because the scan in done in hash-order, leaving the end of the hashtable more densely filled then the beginning. That caused performance issues with dynahash as well, but due to the open chaining they were less pronounced than with the linear adressing from simplehash.h. Reviewed-By: Tomas Vondra Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>	2016-10-14 16:08:11 -07:00
Tom Lane	32fdf42cf5	Fix assorted integer-overflow hazards in varbit.c. bitshiftright() and bitshiftleft() would recursively call each other infinitely if the user passed INT_MIN for the shift amount, due to integer overflow in negating the shift amount. To fix, clamp to -VARBITMAXLEN. That doesn't change the results since any shift distance larger than the input bit string's length produces an all-zeroes result. Also fix some places that seemed inadequately paranoid about input typmods exceeding VARBITMAXLEN. While a typmod accepted by anybit_typmodin() will certainly be much less than that, at least some of these spots are reachable with user-chosen integer values. Andreas Seltenreich and Tom Lane Discussion: <87d1j2zqtz.fsf@credativ.de>	2016-10-14 16:28:34 -04:00
Tom Lane	81e82a2bd4	Fix handling of pgstat counters for TRUNCATE in a prepared transaction. pgstat_twophase_postcommit is supposed to duplicate the math in AtEOXact_PgStat, but it had missed out the bit about clearing t_delta_live_tuples/t_delta_dead_tuples for a TRUNCATE. It's harder than you might think to replicate the issue here, because those counters would only be nonzero when a previous transaction in the same backend had added/deleted tuples in the truncated table, and those counts hadn't been sent to the stats collector yet. Evident oversight in commit `d42358efb`. I've not added a regression test for this; we tried to add one in `d42358efb`, and had to revert it because it was too timing-sensitive for the buildfarm. Back-patch to 9.5 where `d42358efb` came in. Stas Kelvich Discussion: <EB57BF68-C06D-4737-BDDC-4BA778F4E62B@postgrespro.ru>	2016-10-13 19:46:05 -04:00
Tom Lane	3cca13cbfc	Fix another bug in merging of inherited CHECK constraints. It's not good for an inherited child constraint to be marked connoinherit; that would result in the constraint not propagating to grandchild tables, if any are created later. The code mostly prevented this from happening but there was one case that was missed. This is somewhat related to commit `e55a946a8`, which also tightened checks on constraint merging. Hence, back-patch to 9.2 like that one. This isn't so much because there's a concrete feature-related reason to stop there, as to avoid having more distinct behaviors than we have to in this area. Amit Langote Discussion: <b28ee774-7009-313d-dd55-5bdd81242c41@lab.ntt.co.jp>	2016-10-13 17:05:14 -04:00
Tom Lane	cb775768e3	Try to find out the actual hugepage size when making a MAP_HUGETLB request. Even if Linux's mmap() is okay with a partial-hugepage request, munmap() is not, as reported by Chris Richards. Therefore it behooves us to try a bit harder to find out the actual hugepage size, instead of assuming that we can skate by with a guess. For the moment, just look into /proc/meminfo to find out the default hugepage size, and use that. Later, on kernels that support requests for nondefault sizes, we might try to consider other alternatives. But that smells more like a new feature than a bug fix, especially if we want to provide any way for the DBA to control it, so leave it for another day. I set this up to allow easy addition of platform-specific code for non-Linux platforms, if needed; but right now there are no reports suggesting that we need to work harder on other platforms. Back-patch to 9.4 where hugepage support was introduced. Discussion: <31056.1476303954@sss.pgh.pa.us>	2016-10-13 15:06:46 -04:00
Tom Lane	15fc5e1581	Clean up handling of anonymous mmap'd shared-memory segment. Fix detaching of the mmap'd segment to have its own on_shmem_exit callback, rather than piggybacking on the one for detaching from the SysV segment. That was confusing, and given the distance between the two attach calls, it was trouble waiting to happen. Make the detaching calls idempotent by clearing AnonymousShmem to show we've already unmapped. I spent quite a bit of time yesterday trying to find a path that would allow the munmap()'s to be done twice, and while I did not succeed, it seems silly that there's even a question. Make the #ifdef logic less confusing by separating "do we want to use anonymous shmem" from EXEC_BACKEND. Even though there's no current scenario where those conditions are different, it is not helpful for different places in the same file to be testing EXEC_BACKEND for what are fundamentally different reasons. Don't do on_exit_reset() in StartBackgroundWorker(). At best that's useless (InitPostmasterChild would have done it already) and at worst it could zap some callback that's unrelated to shared memory. Improve comments, and simplify the huge_pages enablement logic slightly. Back-patch to 9.4 where hugepage support was introduced. Arguably this should go into 9.3 as well, but the code looks significantly different there, and I doubt it's worth the trouble of adapting the patch given I can't show a live bug.	2016-10-13 13:59:56 -04:00
Tom Lane	9c4cc9e2c7	Fix broken jsonb_set() logic for replacing array elements. Commit `0b62fd036` did a fairly sloppy job of refactoring setPath() to support jsonb_insert() along with jsonb_set(). In its defense, though, there was no regression test case exercising the case of replacing an existing element in a jsonb array. Per bug #14366 from Peng Sun. Back-patch to 9.6 where bug was introduced. Report: <20161012065349.1412.47858@wrigleys.postgresql.org>	2016-10-13 00:25:48 -04:00
Tom Lane	5c80642aa8	Remove unnecessary int2vector-specific hash function and equality operator. These functions were originally added in commit `d8cedf67a` to support use of int2vector columns as catcache lookup keys. However, there are no catcaches that use such columns. (Indeed I now think it must always have been dead code: a catcache with such a key column would need an underlying unique index on the column, but we've never had an int2vector btree opclass.) Getting rid of the int2vector-specific operator and function does not lose any functionality, because operations on int2vectors will now fall back to the generic anyarray support. This avoids a wart that a btree index on an int2vector column (made using anyarray_ops) would fail to match equality searches, because int2vectoreq wasn't a member of the opclass. We don't really care much about that, since int2vector is not meant as a type for users to use, but it's silly to have extra code and less functionality. If we ever do want a catcache to be indexed by an int2vector column, we'd need to put back full btree and hash opclasses for int2vector, comparable to the support for oidvector. (The anyarray code can't be used at such a low level, because it needs to do catcache lookups.) But we'll deal with that if/when the need arises. Also worth noting is that removal of the hash int2vector_ops opclass will break any user-created hash indexes on int2vector columns. While hash anyarray_ops would serve the same purpose, it would probably not compute the same hash values and thus wouldn't be on-disk-compatible. Given that int2vector isn't a user-facing type and we're planning other incompatible changes in hash indexes for v10 anyway, this doesn't seem like something to worry about, but it's probably worth mentioning here. Amit Langote Discussion: <d9bb74f8-b194-7307-9ebd-90645d377e45@lab.ntt.co.jp>	2016-10-12 14:54:08 -04:00
Heikki Linnakangas	b75f467b6e	Simplify the code for logical tape read buffers. Pass the buffer size as argument to LogicalTapeRewindForRead, rather than setting it earlier with the separate LogicTapeAssignReadBufferSize call. This way, the buffer size is set closer to where it's actually used, which makes the code easier to understand. This makes the calculation for how much memory to use for the buffers less precise. We now use the same amount of memory for every tape, rounded down to the nearest BLCKSZ boundary, instead of using one more block for some tapes, to get the total up to exact amount of memory available. That should be OK, merging isn't too sensitive to the exact amount of memory used. Reviewed by Peter Geoghegan Discussion: <0f607c4b-df23-353e-bf56-c0389d28495f@iki.fi>	2016-10-12 12:05:45 +03:00
Tom Lane	2f1eaf87e8	Drop server support for FE/BE protocol version 1.0. While this isn't a lot of code, it's been essentially untestable for a very long time, because libpq doesn't support anything older than protocol 2.0, and has not since release 6.3. There's no reason to believe any other client-side code still uses that protocol, either. Discussion: <2661.1475849167@sss.pgh.pa.us>	2016-10-11 12:19:18 -04:00
Tom Lane	2b860f52ed	Remove "sco" and "unixware" ports. SCO OpenServer and SCO UnixWare are more or less dead platforms. We have never had a buildfarm member testing the "sco" port, and the last "unixware" member was last heard from in 2012, so it's fair to doubt that the code even compiles anymore on either one. Remove both ports. We can always undo this if someone shows up with an interest in maintaining and testing these platforms. Discussion: <17177.1476136994@sss.pgh.pa.us>	2016-10-11 11:26:04 -04:00
Heikki Linnakangas	6fb12cbcd6	Remove some unnecessary #includes. Amit Langote	2016-10-10 12:22:58 +03:00
Peter Eisentraut	52f0142eb4	Add a noreturn attribute to help static analyzers	2016-10-09 21:36:42 -04:00
Tom Lane	ac4a9d92fc	Fix incorrect handling of polymorphic aggregates used as window functions. The transfunction was told that its first argument and result were of the window function output type, not the aggregate state type. This'd only matter if the transfunction consults get_fn_expr_argtype, which typically only polymorphic functions would do. Although we have several regression tests around polymorphic aggs, none of them detected this mistake --- in fact, they still didn't fail when I injected the same mistake into nodeAgg.c. So add some more tests covering both plain agg and window-function-agg cases. Per report from Sebastian Luque. Back-patch to 9.6 where the error was introduced (by sloppy refactoring in commit `804163bc2`, looks like). Report: <87int2qkat.fsf@gmail.com>	2016-10-09 12:49:37 -04:00
Tom Lane	e55a946a81	Fix two bugs in merging of inherited CHECK constraints. Historically, we've allowed users to add a CHECK constraint to a child table and then add an identical CHECK constraint to the parent. This results in "merging" the two constraints so that the pre-existing child constraint ends up with both conislocal = true and coninhcount > 0. However, if you tried to do it in the other order, you got a duplicate constraint error. This is problematic for pg_dump, which needs to issue separated ADD CONSTRAINT commands in some cases, but has no good way to ensure that the constraints will be added in the required order. And it's more than a bit arbitrary, too. The goal of complaining about duplicated ADD CONSTRAINT commands can be served if we reject the case of adding a constraint when the existing one already has conislocal = true; but if it has conislocal = false, let's just make the ADD CONSTRAINT set conislocal = true. In this way, either order of adding the constraints has the same end result. Another problem was that the code allowed creation of a parent constraint marked convalidated that is merged with a child constraint that is !convalidated. In this case, an inheritance scan of the parent table could emit some rows violating the constraint condition, which would be an unexpected result given the marking of the parent constraint as validated. Hence, forbid merging of constraints in this case. (Note: valid child and not-valid parent seems fine, so continue to allow that.) Per report from Benedikt Grundmann. Back-patch to 9.2 where we introduced possibly-not-valid check constraints. The second bug obviously doesn't apply before that, and I think the first doesn't either, because pg_dump only gets into this situation when dealing with not-valid constraints. Report: <CADbMkNPT-Jz5PRSQ4RbUASYAjocV_KHUWapR%2Bg8fNvhUAyRpxA%40mail.gmail.com> Discussion: <22108.1475874586@sss.pgh.pa.us>	2016-10-08 19:29:27 -04:00
Tom Lane	8811f5d3a4	libpqwalreceiver needs to link with libintl when using --enable-nls. The need for this was previously obscured even on picky platforms by the hack we used to support direct cross-module references in the transforms contrib modules. Now that that hack is gone, the undefined symbol is exposed, as reported by Robert Haas. Back-patch to 9.5 where we started to use -Wl,-undefined,dynamic_lookup. I'm a bit surprised that the older branches don't seem to contain any gettext references in this module, but since they don't fail at build time, they must not. (We might be able to get away with leaving this alone in 9.5/9.6, but I think it's cleaner if the reference gets resolved at link time.) Report: <CA+TgmoaHJKU5kcWZcYduATYVT7Mnx+8jUnycaYYL7OtCwCigug@mail.gmail.com>	2016-10-07 21:12:25 -04:00
Andres Freund	b0779abb3a	Fix fallback implementation of pg_atomic_write_u32(). I somehow had assumed that in the spinlock (in turn possibly using semaphores) based fallback atomics implementation 32 bit writes could be done without a lock. As far as the write goes that's correct, since postgres supports only platforms with single-copy atomicity for aligned 32bit writes. But writing without holding the spinlock breaks read-modify-write operations like pg_atomic_compare_exchange_u32(), since they'll potentially "miss" a concurrent write, which can't happen in actual hardware implementations. In 9.6+ when using the fallback atomics implementation this could lead to buffer header locks not being properly marked as released, and potentially some related state corruption. I don't see a related danger in 9.5 (earliest release with the API), because pg_atomic_write_u32() wasn't used in a concurrent manner there. The state variable of local buffers, before this change, were manipulated using pg_atomic_write_u32(), to avoid unnecessary synchronization overhead. As that'd not be the case anymore, introduce and use pg_atomic_unlocked_write_u32(), which does not correctly interact with RMW operations. This bug only caused issues when postgres is compiled on platforms without atomics support (i.e. no common new platform), or when compiled with --disable-atomics, which explains why this wasn't noticed in testing. Reported-By: Tom Lane Discussion: <14947.1475690465@sss.pgh.pa.us> Backpatch: 9.5-, where the atomic operations API was introduced.	2016-10-07 16:55:15 -07:00
Heikki Linnakangas	0aec7f9aec	Remove bogus mapping from UTF-8 to SJIS conversion table. 0xc19c is not a valid UTF-8 byte sequence. It doesn't do any harm, AFAICS, but it's surely not intentional. No backpatching though, just to be sure. In the passing, also add a file header comment to the file, like the UCS_to_SJIS.pl script would produce. (The file was originally created with UCS_to_SJIS.pl, but has been modified by hand since then. That's questionable, but I'll leave fixing that for later.) Kyotaro Horiguchi Discussion: <20160907.155050.233844095.horiguchi.kyotaro@lab.ntt.co.jp>	2016-10-07 23:56:42 +03:00
Heikki Linnakangas	b56fb691b0	Fix excessive memory consumption in the new sort pre-reading code. LogicalTapeRewind() should not allocate large read buffer, if the tape is completely empty. The calling code relies on that, for its calculation of how much memory to allocate for the read buffers. That lead to massive overallocation of memory, if maxTapes was high, but only a few tapes were actually used. Reported by Tomas Vondra Discussion: <7303da46-daf7-9c68-3cc1-9f83235cf37e@2ndquadrant.com>	2016-10-06 09:46:40 +03:00
Robert Haas	eb3bc0bd1a	Re-alphabetize #include directives. Thomas Munro	2016-10-05 08:24:25 -04:00
Robert Haas	d2ce38e204	Rename WAIT_* constants to PG_WAIT_*. Windows apparently has a constant named WAIT_TIMEOUT, and some of these other names are pretty generic, too. Insert "PG_" at the front of each name in order to disambiguate. Michael Paquier	2016-10-05 08:04:52 -04:00
Robert Haas	6c9c95ed1b	Fix another Windows compile break. Commit `6f3bd98ebf` is still making the buildfarm unhappy. This time it's mastodon that is complaining.	2016-10-04 13:14:19 -04:00
Robert Haas	9445d1121d	Fix Windows compile break in `6f3bd98ebf`.	2016-10-04 12:18:05 -04:00
Heikki Linnakangas	d4fca5e6c7	Fix another outdated comment. Preloading is done by logtape.c now.	2016-10-04 19:16:00 +03:00
Robert Haas	6f3bd98ebf	Extend framework from commit `53be0b1ad` to report latch waits. WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an additional wait_event_info parameter; legal values are defined in pgstat.h. This makes it possible to uniquely identify every point in the core code where we are waiting for a latch; extensions can pass WAIT_EXTENSION. Because latches were the major wait primitive not previously covered by this patch, it is now possible to see information in pg_stat_activity on a large number of important wait events not previously addressed, such as ClientRead, ClientWrite, and SyncRep. Unfortunately, many of the wait events added by this patch will fail to appear in pg_stat_activity because they're only used in background processes which don't currently appear in pg_stat_activity. We should fix this either by creating a separate view for such information, or else by deciding to include them in pg_stat_activity after all. Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and Thomas Munro.	2016-10-04 11:01:42 -04:00
Heikki Linnakangas	c86c2d9d57	Update comment. mergepreread()/mergeprereadone() don't exist anymore, the function that does roughly the same is now called mergereadnext().	2016-10-04 09:47:54 +03:00
Andres Freund	61633f7904	Correct logical decoding restore behaviour for subtransactions. Before initializing iteration over a subtransaction's changes, the last few changes were not spilled to disk. That's correct if the transaction didn't spill to disk, but otherwise... This bug can lead to missed or misorderd subtransaction contents when they were spilled to disk. Move spilling of the remaining in-memory changes to ReorderBufferIterTXNInit(), where it can easily be applied to the top transaction and, if present, subtransactions. Since this code had too many bugs already, noticeably increase test coverage. Fixes: #14319 Reported-By: Huan Ruan Discussion: <20160909012610.20024.58169@wrigleys.postgresql.org> Backport: 9,4-, where logical decoding was added	2016-10-03 22:11:36 -07:00
Tom Lane	6bc811c992	Show a sensible value in pg_settings.unit for GUC_UNIT_XSEGS variables. Commit `88e982302` invented GUC_UNIT_XSEGS for min_wal_size and max_wal_size, but neglected to make it display sensibly in pg_settings.unit (by adding a case to the switch in GetConfigOptionByNum). Fix that, and adjust said switch to throw a run-time error the next time somebody forgets. In passing, avoid using a static buffer for the output string --- the rest of this function pstrdup's from a local buffer, and I see no very good reason why the units code should do it differently and less safely. Per report from Otar Shavadze. Back-patch to 9.5 where the new unit type was added. Report: <CAG-jOyA=iNFhN+yB4vfvqh688B7Tr5SArbYcFUAjZi=0Exp-Lg@mail.gmail.com>	2016-10-03 16:40:25 -04:00
Stephen Frost	814b9e9b8e	Fix RLS with COPY (col1, col2) FROM tab Attempting to COPY a subset of columns from a table with RLS enabled would fail due to an invalid query being constructed (using a single ColumnRef with the list of fields to exact in 'fields', but that's for the different levels of an indirection for a single column, not for specifying multiple columns). Correct by building a ColumnRef and then RestTarget for each column being requested and then adding those to the targetList for the select query. Include regression tests to hopefully catch if this is broken again in the future. Patch-By: Adam Brightwell Reviewed-By: Michael Paquier	2016-10-03 16:22:57 -04:00
Heikki Linnakangas	e94568ecc1	Change the way pre-reading in external sort's merge phase works. Don't pre-read tuples into SortTuple slots during merge. Instead, use the memory for larger read buffers in logtape.c. We're doing the same number of READTUP() calls either way, but managing the pre-read SortTuple slots is much more complicated. Also, the on-tape representation is more compact than SortTuples, so we can fit more pre-read tuples into the same amount of memory this way. And we have better cache-locality, when we use just a small number of SortTuple slots. Now that we only hold one tuple from each tape in the SortTuple slots, we can greatly simplify the "batch memory" management. We now maintain a small set of fixed-sized slots, to hold the tuples, and fall back to palloc() for larger tuples. We use this method during all merge phases, not just the final merge, and also when randomAccess is requested, and also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we do an external sort. Reviewed by Peter Geoghegan and Claudio Freire. Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>	2016-10-03 13:37:49 +03:00
Tom Lane	e8bdee2770	Add ALTER EXTENSION ADD/DROP ACCESS METHOD, and use it in pg_upgrade. Without this, an extension containing an access method is not properly dumped/restored during pg_upgrade --- the AM ends up not being a member of the extension after upgrading. Another oversight in commit `473b93287`, reported by Andrew Dunstan. Report: <f7ac29f3-515c-2a44-21c5-ec925053265f@dunslane.net>	2016-10-02 14:31:28 -04:00
Tom Lane	3b90e38c5d	Do ClosePostmasterPorts() earlier in SubPostmasterMain(). In standard Unix builds, postmaster child processes do ClosePostmasterPorts immediately after InitPostmasterChild, that is almost immediately after being spawned. This is important because we don't want children holding open the postmaster's end of the postmaster death watch pipe. However, in EXEC_BACKEND builds, SubPostmasterMain was postponing this responsibility significantly, in order to make it slightly more convenient to pass the right flag value to ClosePostmasterPorts. This is bad, particularly seeing that process_shared_preload_libraries() might invoke nearly-arbitrary code. Rearrange so that we do it as soon as we've fetched the socket FDs via read_backend_variables(). Also move the comment explaining about randomize_va_space to before the call of PGSharedMemoryReAttach, which is where it's relevant. The old placement was appropriate when the reattach happened inside CreateSharedMemoryAndSemaphores, but that was a long time ago. Back-patch to 9.3; the patch doesn't apply cleanly before that, and it doesn't seem worth a lot of effort given that we've had no actual field complaints traceable to this. Discussion: <4157.1475178360@sss.pgh.pa.us>	2016-10-01 17:15:09 -04:00
Peter Eisentraut	6ad8ac6026	Exclude additional directories in pg_basebackup The list of files and directories that pg_basebackup excludes from the backup was somewhat incomplete and unorganized. Change that with having the exclusion driven from tables. Clean up some code around it. Also document the exclusions in more detail so that users of pg_start_backup can make use of it as well. The contents of these directories are now excluded from the backup: pg_dynshmem, pg_notify, pg_serial, pg_snapshots, pg_subtrans Also fix a bug that a pg_repl_slot or pg_stat_tmp being a symlink would cause a corrupt tar header to be created. Now such symlinks are included in the backup as empty directories. Bug found by Ashutosh Sharma <ashu.coek88@gmail.com>. From: David Steele <david@pgmasters.net> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2016-09-28 12:00:00 -04:00
Alvaro Herrera	b82d5a2c7c	Silence compiler warnings Reported by Peter Eisentraut. Coding suggested by Tom Lane.	2016-09-28 19:31:58 -03:00
Tom Lane	83bed06be4	Rationalize format-picture caching logic in formatting.c. Add a validity flag to DCHCacheEntry and NUMCacheEntry entries, and do not set it true until after we've parsed the supplied format string. This allows dealing with possible errors while parsing the format without the baroque hack that was there before (which only covered errors within NUMDesc_prepare, anyway). We can get rid of the PG_TRY in NUMDesc_prepare, as well as last_NUMCacheEntry and NUM_cache_remove. (Essentially, this reverts commit `ff783fbae` in favor of a less fragile solution; the problems with that approach are well illustrated by later hacking such as 55f927a46.) In passing, define the size of these caches as DCH_CACHE_ENTRIES not DCH_CACHE_FIELDS + 1 (whoever thought that was a good definition?) and likewise for the NUM cache. Also const-ify format string parameters where convenient, and merge duplicated cache lookup logic. This is primarily driven by a proposed patch from Artur Zakirov, which introduced some ereport's into format string parsing for the datetime case. He proposed preventing the creation of invalid cache entries by parsing the format string first into a local-variable array, and then copying that to a cache entry. That seemed a bit ugly to me, and anyway randomly different from the way the identical problem had been solved for the numeric case. Let's make the two sets of code more similar not less so. I'm not sure whether we'll adopt the new error conditions Artur proposes, but this patch seems like good code cleanup and future-proofing in any case. The existing code is critically (and undocumented-ly) dependent on no elog being thrown out of several nontrivial functions, which is trouble waiting to happen, though it doesn't seem to be actively broken today. Discussion: <b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru>	2016-09-28 17:08:40 -04:00
Tom Lane	d3cd36a133	Make to_timestamp() and to_date() range-check fields of their input. Historically, something like to_date('2009-06-40','YYYY-MM-DD') would return '2009-07-10' because there was no prohibition on out-of-range month or day numbers. This has been widely panned, and it also turns out that Oracle throws an error in such cases. Since these functions are nominally Oracle-compatibility features, let's change that. There's no particular restriction on year (modulo the fact that the scanner may not believe that more than 4 digits are year digits, a matter to be addressed separately if at all). But we now check month, day, hour, minute, second, and fractional-second fields, as well as day-of-year and second-of-day fields if those are used. Currently, no checks are made on ISO-8601-style week numbers or day numbers; it's not very clear what the appropriate rules would be there, and they're probably so little used that it's not worth sweating over. Artur Zakirov, reviewed by Amul Sul, further adjustments by me Discussion: <1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com> See-Also: <57786490.9010201@wars-nicht.de>	2016-09-28 14:36:17 -04:00
Peter Eisentraut	967ed9205b	Remove dead line of code	2016-09-28 12:00:00 -04:00
Peter Eisentraut	e79e6c4da1	Fix CRC check handling in get_controlfile The previous patch broke this by returning NULL for a failed CRC check, which pg_controldata would then try to read. Fix by returning the result of the CRC check in a separate argument. Michael Paquier and myself	2016-09-28 12:00:00 -04:00
Robert Haas	308985b0b4	Fix dangling pointer problem in ReorderBufferSerializeChange. Commit `3fe3511d05` introduced a new case into this function, but neglected to ensure that the "ondisk" pointer got updated after a possible reallocation as the code does in other cases. Stas Kelvich, per diagnosis by Konstantin Knizhnik.	2016-09-28 11:19:46 -04:00
Heikki Linnakangas	babe05bc2b	Turn password_encryption GUC into an enum. This makes the parameter easier to extend, to support other password-based authentication protocols than MD5. (SCRAM is being worked on.) The GUC still accepts on/off as aliases for "md5" and "plain", although we may want to remove those once we actually add support for another password hash type. Michael Paquier, reviewed by David Steele, with some further edits by me. Discussion: <CAB7nPqSMXU35g=W9X74HVeQp0uvgJxvYOuA4A-A3M+0wfEBv-w@mail.gmail.com>	2016-09-28 12:22:44 +03:00
Tom Lane	72daabc7a3	Disallow pushing volatile quals past set-returning functions. Pushing an upper-level restriction clause into an unflattened subquery-in-FROM is okay when the subquery contains no SRFs in its targetlist, or when it does but the SRFs are unreferenced by the clause and the clause is not volatile. Otherwise, we're changing the number of times the clause is evaluated, which is bad for volatile quals, and possibly changing the result, since a volatile qual might succeed for some SRF output rows and not others despite not referencing any of the changing columns. (Indeed, if the clause is something like "random() > 0.5", the user is probably expecting exactly that behavior.) We had most of these restrictions down, but not the one about the upper clause not being volatile. Fix that, and add a regression test to illustrate the expected behavior. Although this is definitely a bug, it doesn't seem like back-patch material, since possibly some users don't realize that the broken behavior is broken and are relying on what happens now. Also, while the added test is quite cheap in the wake of commit `a4c35ea1c`, it would be much more expensive (or else messier) in older branches. Per report from Tom van Tilburg. Discussion: <CAP3PPDiucxYCNev52=YPVkrQAPVF1C5PFWnrQPT7iMzO1fiKFQ@mail.gmail.com>	2016-09-27 18:43:36 -04:00
Alvaro Herrera	51c3e9fade	Include <sys/select.h> where needed <sys/select.h> is required by POSIX.1-2001 to get the prototype of select(2), but nearly no systems enforce that because older standards let you get away with including some other headers. Recent OpenBSD hacking has removed that frail touch of friendliness, however, which broke some compiles; fix all the way back to 9.1 by adding the required standard. Only vacuumdb.c was reported to fail, but it seems easier to fix the whole lot in a fell swoop. Per bug #14334 by Sean Farrell.	2016-09-27 01:05:21 -03:00
Tom Lane	fdc9186f7e	Replace the built-in GIN array opclasses with a single polymorphic opclass. We had thirty different GIN array opclasses sharing the same operators and support functions. That still didn't cover all the built-in types, nor did it cover arrays of extension-added types. What we want is a single polymorphic opclass for "anyarray". There were two missing features needed to make this possible: 1. We have to be able to declare the index storage type as ANYELEMENT when the opclass is declared to index ANYARRAY. This just takes a few more lines in index_create(). Although this currently seems of use only for GIN, there's no reason to make index_create() restrict it to that. 2. We have to be able to identify the proper GIN compare function for the index storage type. This patch proceeds by making the compare function optional in GIN opclass definitions, and specifying that the default btree comparison function for the index storage type will be looked up when the opclass omits it. Again, that seems pretty generically useful. Since the comparison function lookup is done in initGinState(), making use of the second feature adds an additional cache lookup to GIN index access setup. It seems unlikely that that would be very noticeable given the other costs involved, but maybe at some point we should consider making GinState data persist longer than it now does --- we could keep it in the index relcache entry, perhaps. Rather fortuitously, we don't seem to need to do anything to get this change to play nice with dump/reload or pg_upgrade scenarios: the new opclass definition is automatically selected to replace existing index definitions, and the on-disk data remains compatible. Also, if a user has created a custom opclass definition for a non-builtin type, this doesn't break that, since CREATE INDEX will prefer an exact match to opcintype over a match to ANYARRAY. However, if there's anyone out there with handwritten DDL that explicitly specifies _bool_ops or one of the other replaced opclass names, they'll need to adjust that. Tom Lane, reviewed by Enrique Meneses Discussion: <14436.1470940379@sss.pgh.pa.us>	2016-09-26 14:52:44 -04:00
Tom Lane	da6c4f6ca8	Refer to OS X as "macOS", except for the port name which is still "darwin". We weren't terribly consistent about whether to call Apple's OS "OS X" or "Mac OS X", and the former is probably confusing to people who aren't Apple users. Now that Apple has rebranded it "macOS", follow their lead to establish a consistent naming pattern. Also, avoid the use of the ancient project name "Darwin", except as the port code name which does not seem desirable to change. (In short, this patch touches documentation and comments, but no actual code.) I didn't touch contrib/start-scripts/osx/, either. I suspect those are obsolete and due for a rewrite, anyway. I dithered about whether to apply this edit to old release notes, but those were responsible for quite a lot of the inconsistencies, so I ended up changing them too. Anyway, Apple's being ahistorical about this, so why shouldn't we be?	2016-09-25 15:40:57 -04:00
Tom Lane	959ea7fa76	Remove useless code. Apparent copy-and-pasteo in standby_desc_invalidations() had two entries for msg->id == SHAREDINVALRELMAP_ID. Aleksander Alekseev Discussion: <20160923090814.GB1238@e733>	2016-09-23 10:44:50 -04:00
Tom Lane	8e6b4ee21f	Don't trust CreateFileMapping() to clear the error code on success. We must test GetLastError() even when CreateFileMapping() returns a non-null handle. If that value were left over from some previous system call, we might be fooled into thinking the segment already existed. Experimentation on Windows 7 suggests that CreateFileMapping() clears the error code on success, but it is not documented to do so, so let's not rely on that happening in all Windows releases. Amit Kapila Discussion: <20811.1474390987@sss.pgh.pa.us>	2016-09-23 10:09:52 -04:00
Tom Lane	49a91b88e6	Avoid using PostmasterRandom() for DSM control segment ID. Commits `470d886c3` et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom. Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom. Discussion: <20789.1474390434@sss.pgh.pa.us>	2016-09-23 09:54:11 -04:00
Bruce Momjian	1ff0042165	C comment: fix function header comment Fix for transformOnConflictClause(). Author: Tomonari Katsumata	2016-09-22 17:34:24 -04:00
Tom Lane	8023b5827f	Remove nearly-unused SizeOfIptrData macro. Past refactorings have removed all but one reference to SizeOfIptrData (and that one place was in a pretty noncritical spot). Since nobody's complained, it seems probable that there are no supported compilers that don't think sizeof(ItemPointerData) is 6. If there are, we're wasting MAXALIGN per heap tuple anyway, so it's rather silly to worry about whether we can shave space in places like WAL records. Pavan Deolasee Discussion: <CABOikdOOawDda4hwLOT6zdA6MFfPLu3Z2YBZkX0JdayNS6JOeQ@mail.gmail.com>	2016-09-22 14:30:33 -04:00
Tom Lane	96dd77d349	Be sure to rewind the tuplestore read pointer in non-leader CTEScan nodes. ExecInitCteScan supposed that it didn't have to do anything to the extra tuplestore read pointer it gets from tuplestore_alloc_read_pointer. However, it needs this read pointer to be positioned at the start of the tuplestore, while tuplestore_alloc_read_pointer is actually defined as cloning the current position of read pointer 0. In normal situations that accidentally works because we initialize the whole plan tree at once, before anything gets read. But it fails in an EvalPlanQual recheck, as illustrated in bug #14328 from Dima Pavlov. To fix, just forcibly rewind the pointer after tuplestore_alloc_read_pointer. The cost of doing so is negligible unless the tuplestore is already in TSS_READFILE state, which wouldn't happen in normal cases. We could consider altering tuplestore's API to make that case cheaper, but that would make for a more invasive back-patch and it doesn't seem worth it. This has been broken probably for as long as we've had CTEs, so back-patch to all supported branches. Discussion: <32468.1474548308@sss.pgh.pa.us>	2016-09-22 11:35:03 -04:00
Peter Eisentraut	ebdf5bf7d1	Delay updating control file to "in production" Move the updating of the control file to "in production" status until the point where WAL writes are allowed. Before, there could be a significant gap between the control file update and write transactions actually being allowed. This makes it more reliable to use the control status to verify the end of a promotion. From: Michael Paquier <michael.paquier@gmail.com>	2016-09-21 12:00:00 -04:00
Peter Eisentraut	c1dc51d484	pg_ctl: Detect current standby state from pg_control pg_ctl used to determine whether a server was in standby mode by looking for a recovery.conf file. With this change, it instead looks into pg_control, which is potentially more accurate. There are also occasional discussions about removing recovery.conf, so this removes one dependency. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2016-09-21 12:00:00 -04:00
Robert Haas	470d886c32	Use PostmasterRandom(), not random(), for DSM control segment ID. Otherwise, every startup gets the same "random" value, which is definitely not what was intended.	2016-09-20 12:26:29 -04:00
Robert Haas	419113dfdc	Retry DSM control segment creation if Windows indicates access denied. Otherwise, attempts to run multiple postmasters running on the same machine may fail, because Windows sometimes returns ERROR_ACCESS_DENIED rather than ERROR_ALREADY_EXISTS when there is an existing segment. Hitting this bug is much more likely because of another defect not fixed by this patch, namely that dsm_postmaster_startup() uses random() which returns the same value every time. But that's not a reason not to fix this. Kyotaro Horiguchi and Amit Kapila, reviewed by Michael Paquier Discussion: <CAA4eK1JyNdMeF-dgrpHozDecpDfsRZUtpCi+1AbtuEkfG3YooQ@mail.gmail.com>	2016-09-20 12:04:41 -04:00
Heikki Linnakangas	45310221a9	Fix outdated comments, GIST search queue is not an RBTree anymore. The GiST search queue is implemented as a pairing heap rather than as Red-Black Tree, since 9.5 (commit `e7032610`). I neglected these comments in that commit.	2016-09-20 11:38:25 +03:00
Tom Lane	d8c61c9765	Add debugging aid "bmsToString(Bitmapset *bms)". This function has no direct callers at present, but it's convenient for manual use in a debugger, rather than having to inspect memory and do bit-counting in your head. In passing, get rid of useless outBitmapset() wrapper around _outBitmapset(); let's just export the function that does the work. Likewise for outToken(). Ashutosh Bapat, tweaked a bit by me Discussion: <CAFjFpRdiht8e1HTVirbubr4YzaON5iZTzFJjq909y4sU8M_6eA@mail.gmail.com>	2016-09-16 09:36:24 -04:00
Robert Haas	5225c66336	Clarify policy on marking inherited constraints as valid. Amit Langote and Robert Haas	2016-09-15 17:24:54 -04:00
Heikki Linnakangas	5c6df67e0c	Fix building with LibreSSL. LibreSSL defines OPENSSL_VERSION_NUMBER to claim that it is version 2.0.0, but it doesn't have the functions added in OpenSSL 1.1.0. Add autoconf checks for the individual functions we need, and stop relying on OPENSSL_VERSION_NUMBER. Backport to 9.5 and 9.6, like the patch that broke this. In the back-branches, there are still a few OPENSSL_VERSION_NUMBER checks left, to check for OpenSSL 0.9.8 or 0.9.7. I left them as they were - LibreSSL has all those functions, so they work as intended. Per buildfarm member curculio. Discussion: <2442.1473957669@sss.pgh.pa.us>	2016-09-15 22:52:51 +03:00
Robert Haas	ffccee4736	Fix typo in comment. Amit Langote	2016-09-15 11:35:57 -04:00
Tom Lane	5472ed3e9b	Make min_parallel_relation_size's default value platform-independent. The documentation states that the default value is 8MB, but this was only true at BLCKSZ = 8kB, because the default was hard-coded as 1024. Make the code match the docs by computing the default as 8MB/BLCKSZ. Oversight in commit `75be66464`, noted pursuant to a gripe from Peter E. Discussion: <90634e20-097a-e4fd-67d5-fb2c42f0dd71@2ndquadrant.com>	2016-09-15 11:23:25 -04:00
Heikki Linnakangas	593d4e47db	Support OpenSSL 1.1.0. Changes needed to build at all: - Check for SSL_new in configure, now that SSL_library_init is a macro. - Do not access struct members directly. This includes some new code in pgcrypto, to use the resource owner mechanism to ensure that we don't leak OpenSSL handles, now that we can't embed them in other structs anymore. - RAND_SSLeay() -> RAND_OpenSSL() Changes that were needed to silence deprecation warnings, but were not strictly necessary: - RAND_pseudo_bytes() -> RAND_bytes(). - SSL_library_init() and OpenSSL_config() -> OPENSSL_init_ssl() - ASN1_STRING_data() -> ASN1_STRING_get0_data() - DH_generate_parameters() -> DH_generate_parameters() - Locking callbacks are not needed with OpenSSL 1.1.0 anymore. (Good riddance!) Also change references to SSLEAY_VERSION_NUMBER with OPENSSL_VERSION_NUMBER, for the sake of consistency. OPENSSL_VERSION_NUMBER has existed since time immemorial. Fix SSL test suite to work with OpenSSL 1.1.0. CA certificates must have the "CA:true" basic constraint extension now, or OpenSSL will refuse them. Regenerate the test certificates with that. The "openssl" binary, used to generate the certificates, is also now more picky, and throws an error if an X509 extension is specified in "req_extensions", but that section is empty. Backpatch to all supported branches, per popular demand. In back-branches, we still support OpenSSL 0.9.7 and above. OpenSSL 0.9.6 should still work too, but I didn't test it. In master, we only support 0.9.8 and above. Patch by Andreas Karlsson, with additional changes by me. Discussion: <20160627151604.GD1051@msg.df7cb.de>	2016-09-15 14:42:29 +03:00
Heikki Linnakangas	c99dd5bfed	Fix and clarify comments on replacement selection. These were modified by the patch to only use replacement selection for the first run in an external sort.	2016-09-15 11:51:43 +03:00
Peter Eisentraut	656df624c0	Add overflow checks to money type input function The money type input function did not have any overflow checks at all. There were some regression tests that purported to check for overflow, but they actually checked for the overflow behavior of the int8 type before casting to money. Remove those unnecessary checks and add some that actually check the money input function. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2016-09-14 12:00:00 -05:00
Tom Lane	55c3391d1e	Be pickier about converting between Name and Datum. We were misapplying NameGetDatum() to plain C strings in some places. This worked, because it was just a pointer cast anyway, but it's a type cheat in some sense. Use CStringGetDatum instead, and modify the NameGetDatum macro so it won't compile if applied to something that's not a pointer to NameData. This should result in no changes to generated code, but it is logically cleaner. Mark Dilger, tweaked a bit by me Discussion: <EFD8AC94-4C1F-40C1-A5EA-304080089C1B@gmail.com>	2016-09-13 17:17:48 -04:00
Tom Lane	fdc79e1909	Fix executor/README to reflect disallowing SRFs in UPDATE. The parenthetical comment here is obsoleted by commit `a4c35ea1c`. Noted by Andres Freund.	2016-09-13 14:25:35 -04:00
Tom Lane	a4c35ea1c2	Improve parser's and planner's handling of set-returning functions. Teach the parser to reject misplaced set-returning functions during parse analysis using p_expr_kind, in much the same way as we do for aggregates and window functions (cf commit `eaccfded9`). While this isn't complete (it misses nesting-based restrictions), it's much better than the previous error reporting for such cases, and it allows elimination of assorted ad-hoc expression_returns_set() error checks. We could add nesting checks later if it seems important to catch all cases at parse time. There is one case the parser will now throw error for although previous versions allowed it, which is SRFs in the tlist of an UPDATE. That never behaved sensibly (since it's ill-defined which generated row should be used to perform the update) and it's hard to see why it should not be treated as an error. It's a release-note-worthy change though. Also, add a new Query field hasTargetSRFs reporting whether there are any SRFs in the targetlist (including GROUP BY/ORDER BY expressions). The parser can now set that basically for free during parse analysis, and we can use it in a number of places to avoid expression_returns_set searches. (There will be more such checks soon.) In some places, this allows decontorting the logic since it's no longer expensive to check for SRFs in the tlist --- so I made the checks parallel to the handling of hasAggs/hasWindowFuncs wherever it seemed appropriate. catversion bump because adding a Query field changes stored rules. Andres Freund and Tom Lane Discussion: <24639.1473782855@sss.pgh.pa.us>	2016-09-13 13:54:24 -04:00
Robert Haas	445a38aba2	Have heapam.h include lockdefs.h rather than lock.h. lockdefs.h was only split from lock.h relatively recently, and represents a minimal subset of the old lock.h. heapam.h only needs that smaller subset, so adjust it to include only that. This requires some corresponding adjustments elsewhere. Peter Geoghegan	2016-09-13 09:21:35 -04:00
Simon Riggs	4068eb9918	Fix copy/pasto in file identification Daniel Gustafsson	2016-09-12 09:01:58 +01:00
Simon Riggs	fc3d4a44e9	Identify walsenders in pg_stat_activity Following `8299471c37` walsender procs are now visible in pg_stat_activity. Set query to ‘walsender’ for walsender procs to allow them to be identified. Discussion:CAB7nPqS8c76KPSufK_HSDeYrbtg+zZ7D0EEkjeM6txSEuCB_jA@mail.gmail.com Michael Paquier, issue raised by Fujii Masao, reviewed by Tom Lane	2016-09-12 08:57:14 +01:00
Simon Riggs	c3c0d7bd70	Raise max setting of checkpoint_timeout to 1d Previously checkpoint_timeout was capped at 3600s New max setting is 86400s = 24h = 1d Discussion: 32558.1454471895@sss.pgh.pa.us	2016-09-11 23:26:18 +01:00
Tom Lane	40b449ae84	Allow CREATE EXTENSION to follow extension update paths. Previously, to update an extension you had to produce both a version-update script and a new base installation script. It's become more and more obvious that that's tedious, duplicative, and error-prone. This patch attempts to improve matters by allowing the new base installation script to be omitted. CREATE EXTENSION will install a requested version if it can find a base script and a chain of update scripts that will get there. As in the existing update logic, shorter chains are preferred if there's more than one possibility, with an arbitrary tie-break rule for chains of equal length. Also adjust the pg_available_extension_versions view to show such versions as installable. While at it, refactor the code so that CASCADE processing works for extensions requested during ApplyExtensionUpdates(). Without this, addition of a new requirement in an updated extension would require creating a new base script, even if there was no other reason to do that. (It would be easy at this point to add a CASCADE option to ALTER EXTENSION UPDATE, to allow the same thing to happen during a manually-commanded version update, but I have not done that here.) Tom Lane, reviewed by Andres Freund Discussion: <20160905005919.jz2m2yh3und2dsuy@alap3.anarazel.de>	2016-09-11 14:15:07 -04:00
Heikki Linnakangas	24598337c8	Implement binary heap replace-top operation in a smarter way. In external sort's merge phase, we maintain a binary heap holding the next tuple from each input tape. On each step, the topmost tuple is returned, and replaced with the next tuple from the same tape. We were doing the replacement by deleting the top node in one operation, and inserting the next tuple after that. However, you can do a "replace-top" operation more efficiently, in one "sift-up". A deletion will always walk the heap from top to bottom, but in a replacement, we can stop as soon as we find the right place for the new tuple. This is particularly helpful, if the tapes are not in completely random order, so that the next tuple from a tape is likely to land near the top of the heap. Peter Geoghegan, reviewed by Claudio Freire, with some editing by me. Discussion: <CAM3SWZRhBhiknTF_=NjDSnNZ11hx=U_SEYwbc5vd=x7M4mMiCw@mail.gmail.com>	2016-09-11 16:27:27 +03:00
Tom Lane	ddc8893179	Fix miserable coding in pg_stat_get_activity(). Commit `dd1a3bccc` replaced a test on whether a subroutine returned a null pointer with a test on whether &pointer->backendStatus was null. This accidentally failed to fail, at least on common compilers, because backendStatus is the first field in the struct; but it was surely trouble waiting to happen. Commit `f91feba87` then messed things up further, changing the logic to local_beentry = pgstat_fetch_stat_local_beentry(curr_backend); if (!local_beentry) continue; beentry = &local_beentry->backendStatus; if (!beentry) { where the second "if" is now dead code, so that the intended behavior of printing a row with "<backend information not available>" cannot occur. I suspect this is all moot because pgstat_fetch_stat_local_beentry will never actually return null in this function's usage, but it's still very poor coding. Repair back to 9.4 where the original problem was introduced.	2016-09-10 13:49:04 -04:00
Tom Lane	24992c6db9	Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple. The full generality of deleting an arbitrary number of tuples is no longer needed, so let's save some code and cycles by replacing the original coding with an implementation based on PageIndexTupleDelete. We can always get back the old code from git if we need it again for new callers (though I don't care for its willingness to mess with line pointers it wasn't told to mess with). Discussion: <552.1473445163@sss.pgh.pa.us>	2016-09-09 19:00:59 -04:00
Tom Lane	1a4be103a5	Convert PageAddItem into a macro to save a few cycles. Nowadays this is just a backwards-compatibility wrapper around PageAddItemExtended, so let's avoid the extra level of function call. In addition, because pretty much all callers are passing constants for the two bool arguments, compilers will be able to constant-fold the conversion to a flags bitmask. Discussion: <552.1473445163@sss.pgh.pa.us>	2016-09-09 18:17:07 -04:00
Tom Lane	b1328d78f8	Invent PageIndexTupleOverwrite, and teach BRIN and GiST to use it. PageIndexTupleOverwrite performs approximately the same function as PageIndexTupleDelete (or PageIndexDeleteNoCompact) followed by PageAddItem targeting the same item pointer offset. But in the case where the new tuple is the same size as the old, it avoids shuffling other data around on the page, because the new tuple is placed where the old one was rather than being appended to the end of the page. This has been shown to provide a substantial speedup for some GiST use-cases. Also, this change allows some API simplifications: we can get rid of the rather klugy and error-prone PAI_ALLOW_FAR_OFFSET flag for PageAddItemExtended, since that was used only to cover a corner case for BRIN that's better expressed by using PageIndexTupleOverwrite. Note that this patch causes a rather subtle WAL incompatibility: the physical page content change represented by certain WAL records is now different than it was before, because while the tuples have the same itempointer line numbers, the tuples themselves are in different places. I have not bumped the WAL version number because I think it doesn't matter unless you are trying to do bitwise comparisons of original and replayed pages, and in any case we're early in a devel cycle and there will probably be more WAL changes before v10 gets out the door. There is probably room to make use of PageIndexTupleOverwrite in SP-GiST and GIN too, but that is left for a future patch. Andrey Borodin, reviewed by Anastasia Lubennikova, whacked around a bit by me Discussion: <CAJEAwVGQjGGOj6mMSgMwGvtFd5Kwe6VFAxY=uEPZWMDjzbn4VQ@mail.gmail.com>	2016-09-09 18:02:36 -04:00
Alvaro Herrera	5c609a742f	Fix locking a tuple updated by an aborted (sub)transaction When heap_lock_tuple decides to follow the update chain, it tried to also lock any version of the tuple that was created by an update that was subsequently rolled back. This is pointless, since for all intents and purposes that tuple exists no more; and moreover it causes misbehavior, as reported independently by Marko Tiikkaja and Marti Raudsepp: some SELECT FOR UPDATE/SHARE queries may fail to return the tuples, and assertion-enabled builds crash. Fix by having heap_lock_updated_tuple test the xmin and return success immediately if the tuple was created by an aborted transaction. The condition where tuples become invisible occurs when an updated tuple chain is followed by heap_lock_updated_tuple, which reports the problem as HeapTupleSelfUpdated to its caller heap_lock_tuple, which in turn propagates that code outwards possibly leading the calling code (ExecLockRows) to believe that the tuple exists no longer. Backpatch to 9.3. Only on 9.5 and newer this leads to a visible failure, because of commit 27846f02c176; before that, heap_lock_tuple skips the whole dance when the tuple is already locked by the same transaction, because of the ancient HeapTupleSatisfiesUpdate behavior. Still, the buggy condition may also exist in more convoluted scenarios involving concurrent transactions, so it seems safer to fix the bug in the old branches too. Discussion: https://www.postgresql.org/message-id/CABRT9RC81YUf1=jsmWopcKJEro=VoeG2ou6sPwyOUTx_qteRsg@mail.gmail.com https://www.postgresql.org/message-id/48d3eade-98d3-8b9a-477e-1a8dc32a724d@joh.to	2016-09-09 15:54:29 -03:00
Tom Lane	984d0a14e8	In PageIndexTupleDelete, don't assume stored item lengths are MAXALIGNed. PageAddItem stores the item length as-is. It MAXALIGN's the amount of space actually allocated for each tuple, but not the stored length. PageRepairFragmentation, PageIndexMultiDelete, and PageIndexDeleteNoCompact are all on board with this and MAXALIGN item lengths after fetching them. But PageIndexTupleDelete expects the stored length to be a MAXALIGN multiple already. This accidentally works for existing index AMs because they all maxalign their tuple sizes internally; but we don't do that for heap tuples, and it shouldn't be a requirement for index tuples either. So, sync PageIndexTupleDelete with the rest of bufpage.c by having it maxalign the item size after fetching. Also add a check that pd_special is maxaligned, to ensure that the test "(offset + size) > phdr->pd_special" is still doing the right thing. (If offset and pd_special are aligned, it doesn't matter whether size is.) Again, this is in sync with the rest of the routines here, except for PageAddItem which doesn't test because it doesn't actually do anything for which pd_special alignment matters. This shouldn't have any immediate functional impact; it just adds the flexibility to use PageIndexTupleDelete on index tuples with non-aligned lengths. Discussion: <3814.1473366762@sss.pgh.pa.us>	2016-09-09 12:21:09 -04:00
Tom Lane	967a7b0fc9	Avoid reporting "cache lookup failed" for some user-reachable cases. We have a not-terribly-thoroughly-enforced-yet project policy that internal errors with SQLSTATE XX000 (ie, plain elog) should not be triggerable from SQL. record_in, domain_in, and PL validator functions all failed to meet this standard, because they threw plain elog("cache lookup failed for XXX") errors on bad OIDs, and those are all invokable from SQL. For record_in, the best fix is to upgrade typcache.c (lookup_type_cache) to throw a user-facing error for this case. That seems consistent because it was more than halfway there already, having user-facing errors for shell types and non-composite types. Having done that, tweak domain_in to rely on the typcache to throw an appropriate error. (This costs little because InitDomainConstraintRef would fetch the typcache entry anyway.) For the PL validator functions, we already have a single choke point at CheckFunctionValidatorAccess, so just fix its error to be user-facing. Dilip Kumar, reviewed by Haribabu Kommi Discussion: <87wpxfygg9.fsf@credativ.de>	2016-09-09 09:20:34 -04:00
Simon Riggs	ec253de1fd	Fix corruption of 2PC recovery with subxacts Reading 2PC state files during recovery was borked, causing corruptions during recovery. Effect limited to servers with 2PC, subtransactions and recovery/replication. Stas Kelvich, reviewed by Michael Paquier and Pavan Deolasee	2016-09-09 11:55:12 +01:00
Andres Freund	45e191e3aa	Improve scalability of md.c for large relations. So far md.c used a linked list of segments. That proved to be a problem when processing large relations, because every smgr.c/md.c level access to a page incurred walking through a linked list of all preceding segments. Thus making accessing pages O(#segments). Replace the linked list of segments hanging off SMgrRelationData with an array of opened segments. That allows O(1) access to individual segments, if they've previously been opened. Discussion: <20140331101001.GE13135@alap3.anarazel.de> Reviewed-By: Peter Geoghegan, Tom Lane (in an older version)	2016-09-08 17:18:46 -07:00
Andres Freund	417fefaf08	Faster PageIsVerified() for the all zeroes case. That's primarily useful for testing very large relations, using sparse files. Discussion: <20140331101001.GE13135@alap3.anarazel.de> Reviewed-By: Peter Geoghegan	2016-09-08 17:02:43 -07:00
Andres Freund	769fd9d8e0	Fix mdtruncate() to close fd.c handle of deleted segments. mdtruncate() forgot to FileClose() a segment's mdfd_vfd, when deleting it. That lead to a fd.c handle to a truncated file being kept open until backend exit. The issue appears to have been introduced way back in `1a5c450f30`, before that the handle was closed inside FileUnlink(). The impact of this bug is limited - only VACUUM and ON COMMIT TRUNCATE for temporary tables, truncate files in place (i.e. TRUNCATE itself is not affected), and the relation has to be bigger than 1GB. The consequences of a leaked fd.c handle aren't severe either. Discussion: <20160908220748.oqh37ukwqqncbl3n@alap3.anarazel.de> Backpatch: all supported releases	2016-09-08 16:51:09 -07:00
Simon Riggs	67c6bd1ca3	Fix minor memory leak in Standby startup StandbyRecoverPreparedTransactions() leaked the buffer used for two phase state file. This was leaked once at startup and at every shutdown checkpoint seen. Backpatch to 9.6 Stas Kelvich	2016-09-08 10:32:58 +01:00
Tom Lane	0ab9c56d0f	Support renaming an existing value of an enum type. Not much to be said about this patch: it does what it says on the tin. In passing, rename AlterEnumStmt.skipIfExists to skipIfNewValExists to clarify what it actually does. In the discussion of this patch we considered supporting other similar options, such as IF EXISTS on the type as a whole or IF NOT EXISTS on the target name. This patch doesn't actually add any such feature, but it might happen later. Dagfinn Ilmari Mannsåker, reviewed by Emre Hasegeli Discussion: <CAO=2mx6uvgPaPDf-rHqG8=1MZnGyVDMQeh8zS4euRyyg4D35OQ@mail.gmail.com>	2016-09-07 16:11:56 -04:00
Tom Lane	4f405c8ef4	Add a HINT for client-vs-server COPY failure cases. Users often get confused between COPY and \copy and try to use client-side paths with COPY. The server then cannot find the file (if remote), or sees a permissions problem (if local), or some variant of that. Emit a hint about this in the most common cases. In future we might want to expand the set of errnos for which the hint gets printed, but be conservative for now. Craig Ringer, reviewed by Christoph Berg and Tom Lane Discussion: <CAMsr+YEqtD97qPEzQDqrCt5QiqPbWP_X4hmvy2pQzWC0GWiyPA@mail.gmail.com>	2016-09-06 23:55:55 -04:00
Peter Eisentraut	49eb0fd097	Add location field to DefElem Add a location field to the DefElem struct, used to parse many utility commands. Update various error messages to supply error position information. To propogate the error position information in a more systematic way, create a ParseState in standard_ProcessUtility() and pass that to interested functions implementing the utility commands. This seems better than passing the query string and then reassembling a parse state ad hoc, which violates the encapsulation of the ParseState type. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2016-09-06 12:00:00 -04:00
Tom Lane	f032722f86	Guard against possible memory allocation botch in batchmemtuples(). Negative availMemLessRefund would be problematic. It's not entirely clear whether the case can be hit in the code as it stands, but this seems like good future-proofing in any case. While we're at it, insist that the value be not merely positive but not tiny, so as to avoid doing a lot of repalloc work for little gain. Peter Geoghegan Discussion: <CAM3SWZRVkuUB68DbAkgw=532gW0f+fofKueAMsY7hVYi68MuYQ@mail.gmail.com>	2016-09-06 15:50:31 -04:00
Simon Riggs	dcb12ce8d8	Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL lazy_truncate_heap() was waiting for VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds not milliseconds as originally intended. Found by code inspection. Simon Riggs	2016-09-06 15:35:47 +01:00
Tom Lane	25794e841e	Cosmetic code cleanup in commands/extension.c. Some of the comments added by the CREATE EXTENSION CASCADE patch were a bit sloppy, and I didn't care for redeclaring the same local variable inside a nested block either. No functional changes.	2016-09-05 18:53:33 -04:00
Tom Lane	c54159d44c	Make locale-dependent regex character classes work for large char codes. Previously, we failed to recognize Unicode characters above U+7FF as being members of locale-dependent character classes such as [[:alpha:]]. (Actually, the same problem occurs for large pg_wchar values in any multibyte encoding, but UTF8 is the only case people have actually complained about.) It's impractical to get Spencer's original code to handle character classes or ranges containing many thousands of characters, because it insists on considering each member character individually at regex compile time, whether or not the character will ever be of interest at run time. To fix, choose a cutoff point MAX_SIMPLE_CHR below which we process characters individually as before, and deal with entire ranges or classes as single entities above that. We can actually make things cheaper than before for chars below the cutoff, because the color map can now be a simple linear array for those chars, rather than the multilevel tree structure Spencer designed. It's more expensive than before for chars above the cutoff, because we must do a binary search in a list of high chars and char ranges used in the regex pattern, plus call iswalpha() and friends for each locale-dependent character class used in the pattern. However, multibyte encodings are normally designed to give smaller codes to popular characters, so that we can expect that the slow path will be taken relatively infrequently. In any case, the speed penalty appears minor except when we have to apply iswalpha() etc. to high character codes at runtime --- and the previous coding gave wrong answers for those cases, so whether it was faster is moot. Tom Lane, reviewed by Heikki Linnakangas Discussion: <15563.1471913698@sss.pgh.pa.us>	2016-09-05 17:06:29 -04:00
Tom Lane	15bc038f9b	Relax transactional restrictions on ALTER TYPE ... ADD VALUE. To prevent possibly breaking indexes on enum columns, we must keep uncommitted enum values from getting stored in tables, unless we can be sure that any such column is new in the current transaction. Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE from being executed at all in a transaction block, unless the target enum type had been created in the current transaction. This patch removes that restriction, and instead insists that an uncommitted enum value can't be referenced unless it belongs to an enum type created in the same transaction as the value. Per discussion, this should be a bit less onerous. It does require each function that could possibly return a new enum value to SQL operations to check this restriction, but there aren't so many of those that this seems unmaintainable. Andrew Dunstan and Tom Lane Discussion: <4075.1459088427@sss.pgh.pa.us>	2016-09-05 12:59:55 -04:00
Simon Riggs	016abf1fb8	Add debug check function LWLockHeldByMeInMode() Tests whether my process holds a lock in given mode. Add initial usage in MarkBufferDirty(). Thomas Munro	2016-09-05 10:38:08 +01:00
Simon Riggs	d851bef2d6	Dirty replication slots when using sql interface When pg_logical_slot_get_changes(...) sets confirmed_flush_lsn to the point at which replay stopped, it doesn't dirty the replication slot. So if the replay didn't cause restart_lsn or catalog_xmin to change as well, this change will not get written out to disk. Even on a clean shutdown. If Pg crashes or restarts, a subsequent pg_logical_slot_get_changes(...) call will see the same changes already replayed since it uses the slot's confirmed_flush_lsn as the start point for fetching changes. The caller can't specify a start LSN when using the SQL interface. Mark the slot as dirty after reading changes using the SQL interface so that users won't see repeated changes after a clean shutdown. Repeated changes still occur when using the walsender interface or after an unclean shutdown. Craig Ringer	2016-09-05 09:44:38 +01:00
Tom Lane	b6182081be	Remove duplicate code from ReorderBufferCleanupTXN(). Andres is apparently the only hacker who thinks this code is better as-is. I (tgl) follow some of his logic, but the fact that it's setting off warnings from static code analyzers seems like a sufficient reason to put the complexity into a comment rather than the code. Aleksander Alekseev Discussion: <20160404190345.54d84ee8@fujitsu>	2016-09-04 20:49:44 -04:00
Heikki Linnakangas	e21db14b8a	Clarify the new Red-Black post-order traversal code a bit. Coverity complained about the for(;;) loop, because it never actually iterated. It was used just to be able to use "break" to exit it early. I agree with Coverity, that's a bit confusing, so refactor the code to use if-else instead. While we're at it, use a local variable to hold the "current" node. That's shorter and clearer than referring to "iter->last_visited" all the time.	2016-09-04 15:02:06 +03:00
Tom Lane	600dc4c0da	Fix multiple bugs in numeric_poly_deserialize(). These were evidently introduced by yesterday's commit `9cca11c91`, which perhaps needs more review than it got. Per report from Andreas Seltenreich and additional examination of nearby code. Report: <87oa45qfwq.fsf@credativ.de>	2016-09-03 14:18:55 -04:00
Tom Lane	60893786d5	Fix corrupt GIN_SEGMENT_ADDITEMS WAL records on big-endian hardware. computeLeafRecompressWALData() tried to produce a uint16 WAL log field by memcpy'ing the first two bytes of an int-sized variable. That accidentally works on little-endian hardware, but not at all on big-endian. Replay then thinks it's looking at an ADDITEMS action with zero entries, and reads the first two bytes of the first TID therein as the next segno/action, typically leading to "unexpected GIN leaf action" errors during replay. Even if replay failed to crash, the resulting GIN index page would surely be incorrect. To fix, just declare the variable as uint16 instead. Per bug #14295 from Spencer Thomason (much thanks to Spencer for turning his problem into a self-contained test case). This likely also explains a previous report of the same symptom from Bernd Helmle. Back-patch to 9.4 where the problem was introduced (by commit `14d02f0bb`). Discussion: <20160826072658.15676.7628@wrigleys.postgresql.org> Possible-Report: <2DA7350F7296B2A142272901@eje.land.credativ.lan>	2016-09-03 13:28:53 -04:00
Simon Riggs	35250b6ad7	New recovery target recovery_target_lsn Michael Paquier	2016-09-03 17:48:01 +01:00
Tom Lane	39b691f251	Don't require dynamic timezone abbreviations to match underlying time zone. Previously, we threw an error if a dynamic timezone abbreviation did not match any abbreviation recorded in the referenced IANA time zone entry. That seemed like a good consistency check at the time, but it turns out that a number of the abbreviations in the IANA database are things that Olson and crew made up out of whole cloth. Their current policy is to remove such names in favor of using simple numeric offsets. Perhaps unsurprisingly, a lot of these made-up abbreviations have varied in meaning over time, which meant that our commit `b2cbced9e` and later changes made them into dynamic abbreviations. So with newer IANA database versions that don't mention these abbreviations at all, we fail, as reported in bug #14307 from Neil Anderson. It's worse than just a few unused-in-the-wild abbreviations not working, because the pg_timezone_abbrevs view stops working altogether (since its underlying function tries to compute the whole view result in one call). We considered deleting these abbreviations from our abbreviations list, but the problem with that is that we can't stay ahead of possible future IANA changes. Instead, let's leave the abbreviations list alone, and treat any "orphaned" dynamic abbreviation as just meaning the referenced time zone. It will behave a bit differently than it used to, in that you can't any longer override the zone's standard vs. daylight rule by using the "wrong" abbreviation of a pair, but that's better than failing entirely. (Also, this solution can be interpreted as adding a small new feature, which is that any abbreviation a user wants can be defined as referencing a time zone name.) Back-patch to all supported branches, since this problem affects all of them when using tzdata 2016f or newer. Report: <20160902031551.15674.67337@wrigleys.postgresql.org> Discussion: <6189.1472820913@sss.pgh.pa.us>	2016-09-02 17:30:02 -04:00
Heikki Linnakangas	ec136d19b2	Move code shared between libpq and backend from backend/libpq/ to common/. When building libpq, ip.c and md5.c were symlinked or copied from src/backend/libpq into src/interfaces/libpq, but now that we have a directory specifically for routines that are shared between the server and client binaries, src/common/, move them there. Some routines in ip.c were only used in the backend. Keep those in src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file that's now in common. Fix the comment in src/common/Makefile to reflect how libpq actually links those files. There are two more files that libpq symlinks directly from src/backend: encnames.c and wchar.c. I don't feel compelled to move those right now, though. Patch by Michael Paquier, with some changes by me. Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>	2016-09-02 13:49:59 +03:00
Heikki Linnakangas	9cca11c915	Speed up SUM calculation in numeric aggregates. This introduces a numeric sum accumulator, which performs better than repeatedly calling add_var(). The performance comes from using wider digits and delaying carry propagation, tallying positive and negative values separately, and avoiding a round of palloc/pfree on every value. This speeds up SUM(), as well as other standard aggregates like AVG() and STDDEV() that also calculate a sum internally. Reviewed-by: Andrey Borodin Discussion: <c0545351-a467-5b76-6d46-4840d1ea8aa4@iki.fi>	2016-09-02 11:51:49 +03:00
Heikki Linnakangas	9f85784cae	Support multiple iterators in the Red-Black Tree implementation. While we don't need multiple iterators at the moment, the interface is nicer and less dangerous this way. Aleksander Alekseev, with some changes by me.	2016-09-02 08:39:39 +03:00
Tom Lane	6c03d981a6	Change API of ShmemAlloc() so it throws error rather than returning NULL. A majority of callers seem to have believed that this was the API spec already, because they omitted any check for a NULL result, and hence would crash on an out-of-shared-memory failure. The original proposal was to just add such error checks everywhere, but that does nothing to prevent similar omissions in future. Instead, let's make ShmemAlloc() throw the error (so we can remove the caller-side checks that do exist), and introduce a new function ShmemAllocNoError() that has the previous behavior of returning NULL, for the small number of callers that need that and are prepared to do the right thing. This also lets us remove the rather wishy-washy behavior of printing a WARNING for out-of-shmem, which never made much sense: either the caller has a strategy for dealing with that, or it doesn't. It's not ShmemAlloc's business to decide whether a warning is appropriate. The v10 release notes will need to call this out as a significant source-code change. It's likely that it will be a bug fix for extension callers too, but if not, they'll need to change to using ShmemAllocNoError(). This is nominally a bug fix, but the odds that it's fixing any live bug are actually rather small, because in general the requests being made by the unchecked callers were already accounted for in determining the overall shmem size, so really they ought not fail. Between that and the possible impact on extensions, no back-patch. Discussion: <24843.1472563085@sss.pgh.pa.us>	2016-09-01 10:13:55 -04:00
Tom Lane	65a588b4c3	Try to fix portability issue in enum renumbering (again). The hack embodied in commit `4ba61a487` no longer works after today's change to allow DatumGetFloat4/Float4GetDatum to be inlined (commit `14cca1bf8`). Probably what's happening is that the faulty compilers are deciding that the now-inlined assignment is a no-op and so they're not required to round to float4 width. We had a bunch of similar issues earlier this year in the degree-based trig functions, and eventually settled on using volatile intermediate variables as the least ugly method of forcing recalcitrant compilers to do what the C standard says (cf commit `82311bcdd`). Let's see if that method works here. Discussion: <4640.1472664476@sss.pgh.pa.us>	2016-08-31 13:58:01 -04:00
Tom Lane	679226337a	Remove no-longer-useful SSL-specific Port.count field. Since we removed SSL renegotiation, there's no longer any reason to keep track of the amount of data transferred over the link. Daniel Gustafsson Discussion: <FEA7F89C-ECDF-4799-B789-2F8DDCBA467F@yesql.se>	2016-08-31 09:24:19 -04:00
Heikki Linnakangas	14cca1bf8e	Use static inline functions for float <-> Datum conversions. Now that we are OK with using static inline functions, we can use them to avoid function call overhead of pass-by-val versions of Float4GetDatum, DatumGetFloat8, and Float8GetDatum. Those functions are only a few CPU instructions long, but they could not be written into macros previously, because we need a local union variable for the conversion. I kept the pass-by-ref versions as regular functions. They are very simple too, but they call palloc() anyway, so shaving a few instructions from the function call doesn't seem so important there. Discussion: <dbb82a4a-2c15-ba27-dd0a-009d2aa72b77@iki.fi>	2016-08-31 16:00:28 +03:00
Tom Lane	0e0f43d6fd	Prevent starting a standalone backend with standby_mode on. This can't really work because standby_mode expects there to be more WAL arriving, which there will not ever be because there's no WAL receiver process to fetch it. Moreover, if standby_mode is on then hot standby might also be turned on, causing even more strangeness because that expects read-only sessions to be executing in parallel. Bernd Helmle reported a case where btree_xlog_delete_get_latestRemovedXid got confused, but rather than band-aiding individual problems it seems best to prevent getting anywhere near this state in the first place. Back-patch to all supported branches. In passing, also fix some omissions of errcodes in other ereport's in readRecoveryCommandFile(). Michael Paquier (errcode hacking by me) Discussion: <00F0B2CEF6D0CEF8A90119D4@eje.credativ.lan>	2016-08-31 08:52:13 -04:00
Tom Lane	052cc223d5	Fix a bunch of places that called malloc and friends with no NULL check. Where possible, use palloc or pg_malloc instead; otherwise, insert explicit NULL checks. Generally speaking, these are places where an actual OOM is quite unlikely, either because they're in client programs that don't allocate all that much, or they're very early in process startup so that we'd likely have had a fork() failure instead. Hence, no back-patch, even though this is nominally a bug fix. Michael Paquier, with some adjustments by me Discussion: <CAB7nPqRu07Ot6iht9i9KRfYLpDaF2ZuUv5y_+72uP23ZAGysRg@mail.gmail.com>	2016-08-30 18:22:43 -04:00
Alvaro Herrera	8e1e3f958f	Split hash.h → hash_xlog.h Since the hash AM is going to be revamped to have WAL, this is a good opportunity to clean up the include file a little bit to avoid including a lot of extra stuff in the future. Author: Amit Kapila	2016-08-29 18:55:49 -03:00
Heikki Linnakangas	9b7cd59af1	Remove support for OpenSSL versions older than 0.9.8. OpenSSL officially only supports 1.0.1 and newer. Some OS distributions still provide patches for 0.9.8, but anything older than that is not interesting anymore. Let's simplify things by removing compatibility code. Andreas Karlsson, with small changes by me.	2016-08-29 20:16:02 +03:00
Tom Lane	cf34fdbbe1	Make AllocSetContextCreate throw an error for bad context-size parameters. The previous behavior was to silently change them to something valid. That obscured the bugs fixed in commit `ea268cdc9`, and generally seems less useful than complaining. Unlike the previous commit, though, we'll do this in HEAD only --- it's a bit too late to be possibly breaking third-party code in 9.6. Discussion: <CA+TgmobNcELVd3QmLD3tx=w7+CokRQiC4_U0txjz=WHpfdkU=w@mail.gmail.com>	2016-08-29 09:29:26 -04:00
Fujii Masao	bd082231ed	Fix typos in comments.	2016-08-29 16:06:40 +09:00
Fujii Masao	bab7823a49	Fix pg_xlogdump so that it handles cross-page XLP_FIRST_IS_CONTRECORD record. Previously pg_xlogdump failed to dump the contents of the WAL file if the file starts with the continuation WAL record which spans more than one pages. Since pg_xlogdump assumed that the continuation record always fits on a page, it could not find the valid WAL record to start reading from in that case. This patch changes pg_xlogdump so that it can handle a continuation WAL record which crosses a page boundary and find the valid record to start reading from. Back-patch to 9.3 where pg_xlogdump was introduced. Author: Pavan Deolasee Reviewed-By: Michael Paquier and Craig Ringer Discussion: CABOikdPsPByMiG6J01DKq6om2+BNkxHTPkOyqHM2a4oYwGKsqQ@mail.gmail.com	2016-08-29 14:34:58 +09:00
Tom Lane	ea268cdc9a	Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>	2016-08-27 17:50:38 -04:00
Tom Lane	26fa446da6	Add a nonlocalized version of the severity field to client error messages. This has been requested a few times, but the use-case for it was never entirely clear. The reason for adding it now is that transmission of error reports from parallel workers fails when NLS is active, because pq_parse_errornotice() wrongly assumes that the existing severity field is nonlocalized. There are other ways we could have fixed that, but the other options were basically kluges, whereas this way provides something that's at least arguably a useful feature along with the bug fix. Per report from Jakob Egger. Back-patch into 9.6, because otherwise parallel query is essentially unusable in non-English locales. The problem exists in 9.5 as well, but we don't want to risk changing on-the-wire behavior in 9.5 (even though the possibility of new error fields is specifically called out in the protocol document). It may be sufficient to leave the issue unfixed in 9.5, given the very limited usefulness of pq_parse_errornotice in that version. Discussion: <A88E0006-13CB-49C6-95CC-1A77D717213C@eggerapps.at>	2016-08-26 16:20:17 -04:00
Tom Lane	78dcd027e8	Fix potential memory leakage from HandleParallelMessages(). HandleParallelMessages leaked memory into the caller's context. Since it's called from ProcessInterrupts, there is basically zero certainty as to what CurrentMemoryContext is, which means we could be leaking into long-lived contexts. Over the processing of many worker messages that would grow to be a problem. Things could be even worse than just a leak, if we happened to service the interrupt while ErrorContext is current: elog.c thinks it can reset that on its own whim, possibly yanking storage out from under HandleParallelMessages. Give HandleParallelMessages its own dedicated context instead, which we can reset during each call to ensure there's no accumulation of wasted memory. Discussion: <16610.1472222135@sss.pgh.pa.us>	2016-08-26 15:04:05 -04:00
Tom Lane	45a36e6853	Put static forward declarations in elog.c back into same order as code. The guiding principle for the last few patches in this area apparently involved throwing darts. Cosmetic only, but back-patch to 9.6 because there is no reason for 9.6 and HEAD to diverge yet in this file.	2016-08-26 14:19:03 -04:00
Tom Lane	8529036b53	Fix assorted small bugs in ThrowErrorData(). Copy the palloc'd strings into the correct context, ie ErrorContext not wherever the source ErrorData is. This would be a large bug, except that it appears that all catchers of thrown errors do either EmitErrorReport or CopyErrorData before doing anything that would cause transient memory contexts to be cleaned up. Still, it's wrong and it will bite somebody someday. Fix failure to copy cursorpos and internalpos. Utter the appropriate incantations involving recursion_depth, so that we'll behave sanely if we get an error inside pstrdup. (In general, the body of this function ought to act like, eg, errdetail().) Per code reading induced by Jakob Egger's report.	2016-08-26 14:15:47 -04:00
Tom Lane	fbf28b6b52	Fix logic for adding "parallel worker" context line to worker errors. The previous coding here was capable of adding a "parallel worker" context line to errors that were not, in fact, returned from a parallel worker. Instead of using an errcontext callback to add that annotation, just paste it onto the message by hand; this looks uglier but is more reliable. Discussion: <19757.1472151987@sss.pgh.pa.us>	2016-08-26 10:07:28 -04:00
Tom Lane	ae4760d667	Fix small query-lifespan memory leak in bulk updates. When there is an identifiable REPLICA IDENTITY index on the target table, heap_update leaks the id_attrs bitmapset. That's not many bytes, but it adds up over enough rows, since the code typically runs in a query-lifespan context. Bug introduced in commit `e55704d8b`, which did a rather poor job of cloning the existing use-pattern for RelationGetIndexAttrBitmap(). Per bug #14293 from Zhou Digoal. Back-patch to 9.4 where the bug was introduced. Report: <20160824114320.15676.45171@wrigleys.postgresql.org>	2016-08-24 22:20:25 -04:00
Tom Lane	2c00fad286	Fix improper repetition of previous results from a hashed aggregate. ExecReScanAgg's check for whether it could re-use a previously calculated hashtable neglected the possibility that the Agg node might reference PARAM_EXEC Params that are not referenced by its input plan node. That's okay if the Params are in upper tlist or qual expressions; but if one appears in aggregate input expressions, then the hashtable contents need to be recomputed when the Param's value changes. To avoid unnecessary performance degradation in the case of a Param that isn't within an aggregate input, add logic to the planner to determine which Params are within aggregate inputs. This requires a new field in struct Agg, but fortunately we never write plans to disk, so this isn't an initdb-forcing change. Per report from Jeevan Chalke. This has been broken since forever, so back-patch to all supported branches. Andrew Gierth, with minor adjustments by me Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>	2016-08-24 14:38:12 -04:00
Tom Lane	71e006f031	Suppress compiler warnings in non-cassert builds. With Asserts off, these variables are set but never used, resulting in warnings from pickier compilers. Fix that with our standard solution. Per report from Jeff Janes.	2016-08-23 23:21:10 -04:00
Tom Lane	32909a57f9	Fix network_spgist.c build failures from missing AF_INET definition. AF_INET is apparently defined in something that's pulled in automatically on Linux, but the buildfarm says that's not true everywhere. Comparing to network_gist.c suggests that including <sys/socket.h> ought to fix it, and the POSIX standard concurs.	2016-08-23 16:25:35 -04:00
Tom Lane	77e2906821	Create an SP-GiST opclass for inet/cidr. This seems to offer significantly better search performance than the existing GiST opclass for inet/cidr, at least on data with a wide mix of network mask lengths. (That may suggest that the data splitting heuristics in the GiST opclass could be improved.) Emre Hasegeli, with mostly-cosmetic adjustments by me Discussion: <CAE2gYzxtth9qatW_OAqdOjykS0bxq7AYHLuyAQLPgT7H9ZU0Cw@mail.gmail.com>	2016-08-23 15:16:30 -04:00
Robert Haas	0fda682e54	Extend dsm API with a new function dsm_unpin_segment. If you have previously pinned a segment and decide that you don't actually want to keep it around until shutdown, this new API lets you remove the pin. This is pretty trivial except on Windows, where it requires closing the duplicate handle that was used to implement the pin. Thomas Munro and Amit Kapila, reviewed by Amit Kapila and by me.	2016-08-23 14:32:23 -04:00
Robert Haas	19998730ae	Remove duplicate function prototype. Kyotaro Horiguchi	2016-08-23 13:44:18 -04:00
Tom Lane	d2ddee63b4	Improve SP-GiST opclass API to better support unlabeled nodes. Previously, the spgSplitTuple action could only create a new upper tuple containing a single labeled node. This made it useless for opclasses that prefer to work with fixed sets of nodes (labeled or otherwise), which meant that restrictive prefixes could not be used with such node definitions. Change the output field set for the choose() method to allow it to specify any valid node set for the new upper tuple, and to specify which of these nodes to place the modified lower tuple in. In addition to its primary use for fixed node sets, this feature could allow existing opclasses that use variable node sets to skip a separate spgAddNode action when splitting a tuple, by setting up the node needed for the incoming value as part of the spgSplitTuple action. However, care would have to be taken to add the extra node only when it would not make the tuple bigger than before. (spgAddNode can enlarge the tuple, spgSplitTuple can't.) This is a prerequisite for an upcoming SP-GiST inet opclass, but is being committed separately to increase the visibility of the API change. In passing, improve the documentation about the traverse-values feature that was added by commit `ccd6eb49a`. Emre Hasegeli, with cosmetic adjustments and documentation rework by me Discussion: <CAE2gYzxtth9qatW_OAqdOjykS0bxq7AYHLuyAQLPgT7H9ZU0Cw@mail.gmail.com>	2016-08-23 12:10:34 -04:00
Robert Haas	86f31695f3	Add txid_current_ifassigned(). Add a variant of txid_current() that returns NULL if no transaction ID is assigned. This version can be used even on a standby server, although it will always return NULL since no transaction IDs can be assigned during recovery. Craig Ringer, per suggestion from Jim Nasby. Reviewed by Petr Jelinek and by me.	2016-08-23 10:30:52 -04:00
Robert Haas	ff36700c3b	Remove duplicate word from comment. Erik Rijkers	2016-08-23 10:05:13 -04:00
Tom Lane	7b405b3e04	Refactor some network.c code to create cidr_set_masklen_internal(). Merge several copies of "copy an inet value and adjust the mask length" code to create a single, conveniently C-callable function. This function is exported for future use by inet SPGiST support, but it's good cleanup anyway since we had three slightly-different-for-no-good-reason copies. (Extracted from a larger patch, to separate new code from refactoring of old code) Emre Hasegeli	2016-08-23 09:39:54 -04:00
Robert Haas	008c4135cc	Fix possible sorting error when aborting use of abbreviated keys. Due to an error in the abbreviated key abort logic, the most recently processed SortTuple could be incorrectly marked NULL, resulting in an incorrect final sort order. In the worst case, this could result in a corrupt btree index, which would need to be rebuild using REINDEX. However, abbrevation doesn't abort very often, not all data types use it, and only one tuple would end up in the wrong place, so the practical impact of this mistake may be somewhat limited. Report and patch by Peter Geoghegan.	2016-08-22 15:22:11 -04:00
Robert Haas	af5743851d	Improve header comment for LockHasWaitersRelation. Dimitry Ivanov spotted a typo, and I added a bit of wordsmithing.	2016-08-22 11:53:20 -04:00
Tom Lane	8299471c37	Use LEFT JOINs in some system views in case referenced row doesn't exist. In particular, left join to pg_authid so that rows in pg_stat_activity don't disappear if the session's owning user has been dropped. Also convert a few joins to pg_database to left joins, in the same spirit, though that case might be harder to hit. We were doing this in other views already, so it was a bit inconsistent that these views didn't. Oskari Saarenmaa, with some further tweaking by me Discussion: <56E87CD8.60007@ohmu.fi>	2016-08-19 17:13:47 -04:00
Tom Lane	65a603e903	Guard against parallel-restricted functions in VALUES expressions. Obvious brain fade in set_rel_consider_parallel(). Noticed it while adjusting the adjacent RTE_FUNCTION case. In 9.6, also make the code look more like what I just did in HEAD by removing the unnecessary function_rte_parallel_ok subroutine (it does nothing that expression_tree_walker wouldn't do).	2016-08-19 14:35:32 -04:00
Tom Lane	da1c91631e	Speed up planner's scanning for parallel-query hazards. We need to scan the whole parse tree for parallel-unsafe functions. If there are none, we'll later need to determine whether particular subtrees contain any parallel-restricted functions. The previous coding retained no knowledge from the first scan, even though this is very wasteful in the common case where the query contains only parallel-safe functions. We can bypass all of the later scans by remembering that fact. This provides a small but measurable speed improvement when the case applies, and shouldn't cost anything when it doesn't. Patch by me, reviewed by Robert Haas Discussion: <3740.1471538387@sss.pgh.pa.us>	2016-08-19 14:03:13 -04:00

1 2 3 4 5 ...

16418 Commits