postgresql

Commit Graph

Author	SHA1	Message	Date
Andres Freund	5aa2350426	Introduce replication progress tracking infrastructure. When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit `73c986add`) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de	2015-04-29 19:30:53 +02:00
Peter Eisentraut	f95425478a	Fix hstore_plperl regression tests on some platforms On some platforms, plperl and plperlu cannot be loaded at the same time. So split the test into two separate test files.	2015-04-26 16:13:58 -04:00
Peter Eisentraut	cac7658205	Add transforms feature This provides a mechanism for specifying conversions between SQL data types and procedural languages. As examples, there are transforms for hstore and ltree for PL/Perl and PL/Python. reviews by Pavel Stěhule and Andres Freund	2015-04-26 10:33:14 -04:00
Tom Lane	f320cbb615	Fix typo in linux startup script. Missed a "$" in what was meant to be a variable substitution. Careless mistake in commit `f23425fa95`.	2015-04-26 09:43:15 -04:00
Peter Eisentraut	dcae5facca	Improve speed of make check-world Before, make check-world would create a new temporary installation for each test suite, which is slow and wasteful. Instead, we now create one test installation that is used by all test suites that are part of a make run. The management of the temporary installation is removed from pg_regress and handled in the makefiles. This allows for better control, and unifies the code with that of test suites not run through pg_regress. review and msvc support by Michael Paquier <michael.paquier@gmail.com> more review by Fabien Coelho <coelho@cri.ensmp.fr>	2015-04-23 08:59:52 -04:00
Stephen Frost	4ccc5bd28e	Pull in tableoid for inheiritance with rowMarks As noted by Etsuro Fujita [1] and Dean Rasheed[2], `cb1ca4d800` changed ExecBuildAuxRowMark() to always look for the tableoid in the target list, but didn't also change preprocess_targetlist() to always include the tableoid. This resulted in errors with soon-to-be-added RLS with inheritance tests, and errors when using inheritance with foreign tables. Authors: Etsuro Fujita and Dean Rasheed (independently) Minor word-smithing on the comments by me. [1] 552CF0B6.8010006@lab.ntt.co.jp [2] CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com	2015-04-22 11:29:35 -04:00
Andres Freund	cef939c347	Rename pg_replication_slot's new active_in to active_pid. In `d811c037ce` active_in was added but discussion since showed that active_pid is preferred as a name. Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com	2015-04-22 09:43:40 +02:00
Peter Eisentraut	b0a738f428	Move pg_xlogdump from contrib/ to src/bin/ Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-21 19:03:49 -04:00
Andres Freund	d811c037ce	Add 'active_in' column to pg_replication_slots. Right now it is visible whether a replication slot is active in any session, but not in which. Adding the active_in column, containing the pid of the backend having acquired the slot, makes it much easier to associate pg_replication_slots entries with the corresponding pg_stat_replication/pg_stat_activity row. This should have been done from the start, but I (Andres) dropped the ball there somehow. Author: Craig Ringer, revised by me Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com	2015-04-21 11:51:06 +02:00
Peter Eisentraut	528c2e44ab	Move pg_test_timing from contrib/ to src/bin/ Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-20 21:30:12 -04:00
Peter Eisentraut	00882d9e5c	Move pg_test_fsync from contrib/ to src/bin/ Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-19 22:20:49 -04:00
Peter Eisentraut	9fa8b0ee90	Move pg_upgrade from contrib/ to src/bin/ Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-14 19:26:38 -04:00
Peter Eisentraut	30982be4e5	Integrate pg_upgrade_support module into backend Previously, these functions were created in a schema "binary_upgrade", which was deleted after pg_upgrade was finished. Because we don't want to keep that schema around permanently, move them to pg_catalog but rename them with a binary_upgrade_... prefix. The provided functions are only small wrappers around global variables that were added specifically for pg_upgrade use, so keeping the module separate does not create any modularity. The functions still check that they are only called in binary upgrade mode, so it is not possible to call these during normal operation. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-14 19:26:37 -04:00
Heikki Linnakangas	4f700bcd20	Reorganize our CRC source files again. Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.	2015-04-14 17:03:42 +03:00
Peter Eisentraut	81134af3ec	Move pgbench from contrib/ to src/bin/ Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-13 13:07:16 -04:00
Peter Eisentraut	83aca89f7c	Move pg_archivecleanup from contrib/ to src/bin/ Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-11 23:29:18 -04:00
Alvaro Herrera	27846f02c1	Optimize locking a tuple already locked by another subxact Locking and updating the same tuple repeatedly led to some strange multixacts being created which had several subtransactions of the same parent transaction holding locks of the same strength. However, once a subxact of the current transaction holds a lock of a given strength, it's not necessary to acquire the same lock again. This made some coding patterns much slower than required. The fix is twofold. First we change HeapTupleSatisfiesUpdate to return HeapTupleBeingUpdated for the case where the current transaction is already a single-xid locker for the given tuple; it used to return HeapTupleMayBeUpdated for that case. The new logic is simpler, and the change to pgrowlocks is a testament to that: previously we needed to check for the single-xid locker separately in a very ugly way. That test is simpler now. As fallout from the HTSU change, some of its callers need to be amended so that tuple-locked-by-own-transaction is taken into account in the BeingUpdated case rather than the MayBeUpdated case. For many of them there is no difference; but heap_delete() and heap_update now check explicitely and do not grab tuple lock in that case. The HTSU change also means that routine MultiXactHasRunningRemoteMembers introduced in commit `11ac4c73cb` is no longer necessary and can be removed; the case that used to require it is now handled naturally as result of the changes to heap_delete and heap_update. The second part of the fix to the performance issue is to adjust heap_lock_tuple to avoid the slowness: 1. Previously we checked for the case that our own transaction already held a strong enough lock and returned MayBeUpdated, but only in the multixact case. Now we do it for the plain Xid case as well, which saves having to LockTuple. 2. If the current transaction is the only locker of the tuple (but with a lock not as strong as what we need; otherwise it would have been caught in the check mentioned above), we can skip sleeping on the multixact, and instead go straight to create an updated multixact with the additional lock strength. 3. Most importantly, make sure that both the single-xid-locker case and the multixact-locker case optimization are applied always. We do this by checking both in a single place, rather than them appearing in two separate portions of the routine -- something that is made possible by the HeapTupleSatisfiesUpdate API change. Previously we would only check for the single-xid case when HTSU returned MayBeUpdated, and only checked for the multixact case when HTSU returned BeingUpdated. This was at odds with what HTSU actually returned in one case: if our own transaction was locker in a multixact, it returned MayBeUpdated, so the optimization never applied. This is what led to the large multixacts in the first place. Per bug report #8470 by Oskari Saarenmaa.	2015-04-10 13:47:15 -03:00
Robert Haas	e41beea0dd	Improve pgbench error reporting. This would have been worth doing on general principle anyway, but the recent addition of an expression syntax to pgbench makes it an even better idea than it would have been otherwise. Fabien Coelho	2015-04-02 16:26:49 -04:00
Andres Freund	62e2a8dc2c	Define integer limits independently from the system definitions. In `83ff1618` we defined integer limits iff they're not provided by the system. That turns out not to be the greatest idea because there's different ways some datatypes can be represented. E.g. on OSX PG's 64bit datatype will be a 'long int', but OSX unconditionally uses 'long long'. That disparity then can lead to warnings, e.g. around printf formats. One way to fix that would be to back int64 using stdint.h's int64_t. While a good idea it's not that easy to implement. We would e.g. need to include stdint.h in our external headers, which we don't today. Also computing the correct int64 printf formats in that case is nontrivial. Instead simply prefix the integer limits with PG_ and define them unconditionally. I've adjusted all the references to them in code, but not the ones in comments; the latter seems unnecessary to me. Discussion: 20150331141423.GK4878@alap3.anarazel.de	2015-04-02 17:43:35 +02:00
Bruce Momjian	a0efc71453	pg_upgrade: call 'postgres' binary to get data directory location This matches the binary 'pg_ctl' calls. Previously we called the 'postmaster'. Report by Christoph Berg	2015-04-01 18:25:45 -04:00
Bruce Momjian	0cf16b44cb	btree_gin: properly call DirectFunctionCall1() Previously we called DirectFunctionCall3() with dummy arguments. Fixed version of previous patch. Report by Jon Nelson	2015-03-31 10:26:45 -04:00
Heikki Linnakangas	1d0db8de04	Remove spurious semicolons. Petr Jelinek	2015-03-31 15:12:27 +03:00
Andrew Dunstan	fa1e5afa8a	Run pg_upgrade and pg_resetxlog with restricted token on Windows As with initdb these programs need to run with a restricted token, and if they don't pg_upgrade will fail when run as a user with Adminstrator privileges. Backpatch to all live branches. On the development branch the code is reorganized so that the restricted token code is now in a single location. On the stable bramches a less invasive change is made by simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c. Patches and bug report from Muhammad Asif Naeem, reviewed by Michael Paquier, slightly edited by me.	2015-03-30 17:07:52 -04:00
Tom Lane	542320c2bd	Be more careful about printing constants in ruleutils.c. The previous coding in get_const_expr() tried to avoid quoting integer, float, and numeric literals if at all possible. While that looks nice, it means that dumped expressions might re-parse to something that's semantically equivalent but not the exact same parsetree; for example a FLOAT8 constant would re-parse as a NUMERIC constant with a cast to FLOAT8. Though the result would be the same after constant-folding, this is problematic in certain contexts. In particular, Jeff Davis pointed out that this could cause unexpected failures in ALTER INHERIT operations because of child tables having not-exactly-equivalent CHECK expressions. Therefore, favor correctness over legibility and dump such constants in quotes except in the limited cases where they'll be interpreted as the same type even without any casting. This results in assorted small changes in the regression test outputs, and will affect display of user-defined views and rules similarly. The odds of that causing problems in the field seem non-negligible; given the lack of previous complaints, it seems best not to change this in the back branches.	2015-03-30 14:59:49 -04:00
Tom Lane	e9dd03c03a	Minor code cleanups in pgbench expression support. Get rid of unnecessary expr_yylex declaration (we haven't supported flex 2.5.4 in a long time, and even if we still did, the declaration in pgbench.h makes this one unnecessary and inappropriate). Fix copyright dates, improve some layout choices, etc.	2015-03-29 13:06:59 -04:00
Tom Lane	2c33e0fbce	Better fix for misuse of Float8GetDatumFast(). We can use that macro as long as we put the value into a local variable. Commit `735cd6128` was not wrong on its own terms, but I think this way looks nicer, and it should save a few cycles on 32-bit machines.	2015-03-28 13:56:37 -04:00
Andrew Dunstan	cfe12763c3	Use standard librart sqrt function in pg_stat_statements The stddev calculation included a faster but unportable sqrt function. This is not worth the extra effort, and won't work everywhere. If the standard library function is good enough for the SQL function it should be good enough here too.	2015-03-28 09:22:51 -04:00
Heikki Linnakangas	e09b48316c	Add index-only scan support to btree_gist. inet, cidr, and timetz indexes still cannot support index-only scans, because they don't store the original unmodified value in the index, but a derived approximate value.	2015-03-27 23:35:16 +02:00
Andrew Dunstan	735cd6128a	Fix portability issues with stddev in pg_stat_statements Stddev is calculated on the fly, and the code in commit `717f709532` was using Float8GetDatumFast() inappropriately to convert the result to a Datum. Mea culpa. It now uses Float8GetDatum().	2015-03-27 17:29:59 -04:00
Andrew Dunstan	717f709532	Add stats for min, max, mean, stddev times to pg_stat_statements. The new fields are min_time, max_time, mean_time and stddev_time. Based on an original patch from Mitsumasa KONDO, modified by me. Reviewed by Petr Jelínek.	2015-03-27 15:43:22 -04:00
Heikki Linnakangas	8816af65e4	Minor refactoring of btree_gist code. The gbt_var_key_copy function was doing two different things depending on the boolean argument. Seems cleaner to have two separate functions. Remove unused argument from gbt_num_compress.	2015-03-26 23:10:10 +02:00
Tom Lane	785941cdc3	Tweak __attribute__-wrapping macros for better pgindent results. This improves on commit `bbfd7edae5` by making two simple changes: * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn(). Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces pgindent's tendency to misformat declarations involving them. * attributes are now always attached to function declarations, not definitions. Previously some places were taking creative shortcuts, which were not merely candidates for bad misformatting by pgindent but often were outright wrong anyway. (It does little good to put a noreturn annotation where callers can't see it.) In any case, if we would like to believe that these macros can be used with non-gcc compilers, we should avoid gratuitous variance in usage patterns. I also went through and manually improved the formatting of a lot of declarations, and got rid of excessively repetitive (and now obsolete anyway) comments informing the reader what pg_attribute_printf is for.	2015-03-26 14:03:25 -04:00
Andres Freund	83ff1618bc	Centralize definition of integer limits. Several submitted and even committed patches have run into the problem that C89, our baseline, does not provide minimum/maximum values for various integer datatypes. C99's stdint.h does, but we can't rely on it. Several parts of the code defined limits locally, so instead centralize the definitions to c.h. This patch also changes the more obvious usages of literal limit values; there's more places that could be changed, but it's less clear whether it's beneficial to change those. Author: Andrew Gierth Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk	2015-03-25 22:39:42 +01:00
Bruce Momjian	11226e3817	Revert commit `843cd0bfe6` Report by Tom Lane	2015-03-24 22:35:05 -04:00
Bruce Momjian	843cd0bfe6	btree_gin: properly call DirectFunctionCall1() Previously we called DirectFunctionCall3() with dummy arguments. Patch by Jon Nelson	2015-03-24 20:53:29 -04:00
Tom Lane	cb1ca4d800	Allow foreign tables to participate in inheritance. Foreign tables can now be inheritance children, or parents. Much of the system was already ready for this, but we had to fix a few things of course, mostly in the area of planner and executor handling of row locks. As side effects of this, allow foreign tables to have NOT VALID CHECK constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS. Continuing to disallow these things would've required bizarre and inconsistent special cases in inheritance behavior. Since foreign tables don't enforce CHECK constraints anyway, a NOT VALID one is a complete no-op, but that doesn't mean we shouldn't allow it. And it's possible that some FDWs might have use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops for most. An additional change in support of this is that when a ModifyTable node has multiple target tables, they will all now be explicitly identified in EXPLAIN output, for example: Update on pt1 (cost=0.00..321.05 rows=3541 width=46) Update on pt1 Foreign Update on ft1 Foreign Update on ft2 Update on child3 -> Seq Scan on pt1 (cost=0.00..0.00 rows=1 width=46) -> Foreign Scan on ft1 (cost=100.00..148.03 rows=1170 width=46) -> Foreign Scan on ft2 (cost=100.00..148.03 rows=1170 width=46) -> Seq Scan on child3 (cost=0.00..25.00 rows=1200 width=46) This was done mainly to provide an unambiguous place to attach "Remote SQL" fields, but it is useful for inherited updates even when no foreign tables are involved. Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro Horiguchi, some additional hacking by me	2015-03-22 13:53:21 -04:00
Tom Lane	8d1f239003	Replace insertion sort in contrib/intarray with qsort(). It's all very well to claim that a simplistic sort is fast in easy cases, but O(N^2) in the worst case is not good ... especially if the worst case is as easy to hit as "descending order input". Replace that bit with our standard qsort. Per bug #12866 from Maksym Boguk. Back-patch to all active branches.	2015-03-15 23:22:03 -04:00
Tom Lane	7b8b8a4331	Improve representation of PlanRowMark. This patch fixes two inadequacies of the PlanRowMark representation. First, that the original LockingClauseStrength isn't stored (and cannot be inferred for foreign tables, which always get ROW_MARK_COPY). Since some PlanRowMarks are created out of whole cloth and don't actually have an ancestral RowMarkClause, this requires adding a dummy LCS_NONE value to enum LockingClauseStrength, which is fairly annoying but the alternatives seem worse. This fix allows getting rid of the use of get_parse_rowmark() in FDWs (as per the discussion around commits `462bd95705` and `8ec8760fc8`), and it simplifies some things elsewhere. Second, that the representation assumed that all child tables in an inheritance hierarchy would use the same RowMarkType. That's true today but will soon not be true. We add an "allMarkTypes" field that identifies the union of mark types used in all a parent table's children, and use that where appropriate (currently, only in preprocess_targetlist()). In passing fix a couple of minor infelicities left over from the SKIP LOCKED patch, notably that _outPlanRowMark still thought waitPolicy is a bool. Catversion bump is required because the numeric values of enum LockingClauseStrength can appear in on-disk rules. Extracted from a much larger patch to support foreign table inheritance; it seemed worth breaking this out, since it's a separable concern. Shigeru Hanada and Etsuro Fujita, somewhat modified by me	2015-03-15 18:41:47 -04:00
Robert Haas	e96b7c6b9f	sepgsql: Improve error message when unsupported object type is labeled. KaiGai Kohei, reviewed by Álvaro Herrera and myself	2015-03-11 12:12:10 -04:00
Andres Freund	bbfd7edae5	Add macros wrapping all usage of gcc's __attribute__. Until now __attribute__() was defined to be empty for all compilers but gcc. That's problematic because it prevents using it in other compilers; which is necessary e.g. for atomics portability. It's also just generally dubious to do so in a header as widely included as c.h. Instead add pg_attribute_format_arg, pg_attribute_printf, pg_attribute_noreturn macros which are implemented in the compilers that understand them. Also add pg_attribute_noreturn and pg_attribute_packed, but don't provide fallbacks, since they can affect functionality. This means that external code that, possibly unwittingly, relied on __attribute__ defined to be empty on !gcc compilers may now run into warnings or errors on those compilers. But there shouldn't be many occurances of that and it's hard to work around... Discussion: 54B58BA3.8040302@ohmu.fi Author: Oskari Saarenmaa, with some minor changes by me.	2015-03-11 14:30:01 +01:00
Fujii Masao	57aa5b2bb1	Add GUC to enable compression of full page images stored in WAL. When newly-added GUC parameter, wal_compression, is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay. This commit changes the WAL format (so bumping WAL version number) so that the one-byte flag indicating whether a full page image is compressed or not is included in its header information. This means that the commit increases the WAL volume one-byte per a full page image even if WAL compression is not used at all. We can save that one-byte by borrowing one-bit from the existing field like hole_offset in the header and using it as the flag, for example. But which would reduce the code readability and the extensibility of the feature. Per discussion, it's not worth paying those prices to save only one-byte, so we decided to add the one-byte flag to the header. This commit doesn't introduce any new compression algorithm like lz4. Currently a full page image is compressed using the existing PGLZ algorithm. Per discussion, we decided to use it at least in the first version of the feature because there were no performance reports showing that its compression ratio is unacceptably lower than that of other algorithm. Of course, in the future, it's worth considering the support of other compression algorithm for the better compression. Rahila Syed and Michael Paquier, reviewed in various versions by myself, Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.	2015-03-11 15:52:24 +09:00
Alvaro Herrera	e491bd2ee3	Move BRIN page type to page's last two bytes ... which is the usual convention among AMs, so that pg_filedump and similar utilities can tell apart pages of different AMs. It was also the intent of the original code, but I failed to realize that alignment considerations would move the whole thing to the previous-to-last word in the page. The new definition of the associated macro makes surrounding code a bit leaner, too. Per note from Heikki at http://www.postgresql.org/message-id/546A16EF.9070005@vmware.com	2015-03-10 12:27:15 -03:00
Heikki Linnakangas	f1fd515b39	Move WAL-related definitions from dbcommands.h to separate header file. This makes it easier to write frontend programs that needs to understand the WAL record format of CREATE/DROP DATABASE. dbcommands.h cannot easily be #included in a frontend program, because it pulls in other header files that need backend stuff, but the new dbcommands_xlog.h header file has fewer dependencies.	2015-03-09 15:50:49 +02:00
Alvaro Herrera	c6ee39bc85	Fix contrib/file_fdw's expected file I forgot to update it on yesterday's `cf34e373fc`.	2015-03-06 11:47:09 -03:00
Robert Haas	e5f3690249	pgbench: Fix mistakes in Makefile. My commit `878fdcb843` was not quite right. Tom Lane pointed out one of the mistakes fixed here, and I noticed the other myself while reviewing what I'd committed.	2015-03-03 10:48:16 -05:00
Robert Haas	878fdcb843	pgbench: Add a real expression syntax to \set Previously, you could do \set variable operand1 operator operand2, but nothing more complicated. Now, you can \set variable expression, which makes it much simpler to do multi-step calculations here. This also adds support for the modulo operator (%), with the same semantics as in C. Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and Stephen Frost	2015-03-02 14:21:41 -05:00
Tom Lane	2e211211a7	Use FLEXIBLE_ARRAY_MEMBER in a number of other places. I think we're about done with this...	2015-02-21 16:12:14 -05:00
Tom Lane	e1a11d9311	Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[]. This requires changing quite a few places that were depending on sizeof(HeapTupleHeaderData), but it seems for the best. Michael Paquier, some adjustments by me	2015-02-21 15:13:06 -05:00
Tom Lane	c110eff132	Use FLEXIBLE_ARRAY_MEMBER in struct RecordIOData. I (tgl) fixed this last night in rowtypes.c, but I missed that the code had been copied into a couple of other places. Michael Paquier	2015-02-20 17:03:12 -05:00
Tom Lane	09d8d110a6	Use FLEXIBLE_ARRAY_MEMBER in a bunch more places. Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]". Aside from being more self-documenting, this should help prevent bogus warnings from static code analyzers and perhaps compiler misoptimizations. This patch is just a down payment on eliminating the whole problem, but it gets rid of a lot of easy-to-fix cases. Note that the main problem with doing this is that one must no longer rely on computing sizeof(the containing struct), since the result would be compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf also warns against spelling that offsetof(struct, lastfield[0]). Michael Paquier, review and additional fixes by me.	2015-02-20 00:11:42 -05:00
Kevin Grittner	c923e82a23	Eliminate unnecessary NULL checks in picksplit method of intarray. Where these checks were being done there was no code path which could leave them NULL. Michael Paquier per Coverity	2015-02-16 15:26:23 -06:00
Tom Lane	80986e85aa	Avoid returning undefined bytes in chkpass_in(). We can't really fix the problem that the result is defined to depend on random(), so it is still going to fail the "unstable input conversion" test in parse_type.c. However, we can at least satify valgrind. (It looks like this code used to be valgrind-clean, actually, until somebody did a careless s/strncpy/strlcpy/g on it.) In passing, let's just make real sure that chkpass_out doesn't overrun its output buffer. No need for backpatch, I think, since this is just to satisfy debugging tools. Asif Naeem	2015-02-14 12:20:56 -05:00
Bruce Momjian	dc01efa5cc	pg_upgrade: improve checksum mismatch error message Patch by Greg Sabino Mullane, slight adjustments by me	2015-02-11 22:22:26 -05:00
Bruce Momjian	056764b102	pg_upgrade: quote directory names in delete_old_cluster script This allows the delete script to properly function when special characters appear in directory paths, e.g. spaces. Backpatch through 9.0	2015-02-11 22:06:04 -05:00
Heikki Linnakangas	c619c2351f	Move pg_crc.c to src/common, and remove pg_crc_tables.h To get CRC functionality in a client program, you now need to link with libpgcommon instead of libpgport. The CRC code has nothing to do with portability, so libpgcommon is a better home. (libpgcommon didn't exist when pg_crc.c was originally moved to src/port.) Remove the possibility to get CRC functionality by just #including pg_crc_tables.h. I'm not aware of any extensions that actually did that and couldn't simply link with libpgcommon. This also moves the pg_crc.h header file from src/include/utils to src/include/common, which will require changes to any external programs that currently does #include "utils/pg_crc.h". That seems acceptable, as include/common is clearly the right home for it now, and the change needed to any such programs is trivial.	2015-02-09 11:17:56 +02:00
Robert Haas	370b3a4618	pgcrypto: Code cleanup for decrypt_internal. Remove some unnecessary null-tests, and replace a goto-label construct with an "if" block. Michael Paquier, reviewed by me.	2015-02-04 08:46:32 -05:00
Heikki Linnakangas	4eaafa0453	Remove dead code. Commit `13629df` changed metaphone() function to return an empty string on empty input, but it left the old error message in place. It's now dead code. Michael Paquier, per Coverity warning.	2015-02-03 09:43:44 +02:00
Noah Misch	59b919822a	Prevent Valgrind Memcheck errors around px_acquire_system_randomness(). This function uses uninitialized stack and heap buffers as supplementary entropy sources. Mark them so Memcheck will not complain. Back-patch to 9.4, where Valgrind Memcheck cooperation first appeared. Marko Tiikkaja	2015-02-02 10:00:45 -05:00
Noah Misch	8b59672d8d	Cherry-pick security-relevant fixes from upstream imath library. This covers alterations to buffer sizing and zeroing made between imath 1.3 and imath 1.20. Valgrind Memcheck identified the buffer overruns and reliance on uninitialized data; their exploit potential is unknown. Builds specifying --with-openssl are unaffected, because they use the OpenSSL BIGNUM facility instead of imath. Back-patch to 9.0 (all supported versions). Security: CVE-2015-0243	2015-02-02 10:00:45 -05:00
Noah Misch	1dc7551586	Fix buffer overrun after incomplete read in pullf_read_max(). Most callers pass a stack buffer. The ensuing stack smash can crash the server, and we have not ruled out the viability of attacks that lead to privilege escalation. Back-patch to 9.0 (all supported versions). Marko Tiikkaja Security: CVE-2015-0243	2015-02-02 10:00:45 -05:00
Tom Lane	a59ee88197	Fix Coverity warning about contrib/pgcrypto's mdc_finish(). Coverity points out that mdc_finish returns a pointer to a local buffer (which of course is gone as soon as the function returns), leaving open a risk of misbehaviors possibly as bad as a stack overwrite. In reality, the only possible call site is in process_data_packets() which does not examine the returned pointer at all. So there's no live bug, but nonetheless the code is confusing and risky. Refactor to avoid the issue by letting process_data_packets() call mdc_finish() directly instead of going through the pullf_read() API. Although this is only cosmetic, it seems good to back-patch so that the logic in pgp-decrypt.c stays in sync across all branches. Marko Kreen	2015-01-30 13:05:30 -05:00
Tom Lane	37507962c3	Handle unexpected query results, especially NULLs, safely in connectby(). connectby() didn't adequately check that the constructed SQL query returns what it's expected to; in fact, since commit `08c33c426b` it wasn't checking that at all. This could result in a null-pointer-dereference crash if the constructed query returns only one column instead of the expected two. Less excitingly, it could also result in surprising data conversion failures if the constructed query returned values that were not I/O-conversion-compatible with the types specified by the query calling connectby(). In all branches, insist that the query return at least two columns; this seems like a minimal sanity check that can't break any reasonable use-cases. In HEAD, insist that the constructed query return the types specified by the outer query, including checking for typmod incompatibility, which the code never did even before it got broken. This is to hide the fact that the implementation does a conversion to text and back; someday we might want to improve that. In back branches, leave that alone, since adding a type check in a minor release is more likely to break things than make people happy. Type inconsistencies will continue to work so long as the actual type and declared type are I/O representation compatible, and otherwise will fail the same way they used to. Also, in all branches, be on guard for NULL results from the constructed query, which formerly would cause null-pointer dereference crashes. We now print the row with the NULL but don't recurse down from it. In passing, get rid of the rather pointless idea that build_tuplestore_recursively() should return the same tuplestore that's passed to it. Michael Paquier, adjusted somewhat by me	2015-01-29 20:18:33 -05:00
Andres Freund	ed127002d8	Align buffer descriptors to cache line boundaries. Benchmarks has shown that aligning the buffer descriptor array to cache lines is important for scalability; especially on bigger, multi-socket, machines. Currently the array sometimes already happens to be aligned by happenstance, depending how large previous shared memory allocations were. That can lead to wildly varying performance results after minor configuration changes. In addition to aligning the start of descriptor array, also force the size of individual descriptors to be of a common cache line size (64 bytes). That happens to already be the case on 64bit platforms, but this way we can change the struct BufferDesc more easily. As the alignment primarily matters in highly concurrent workloads which probably all are 64bit these days, and the space wastage of element alignment would be a bit more noticeable on 32bit systems, we don't force the stride to be cacheline sized on 32bit platforms for now. If somebody does actual performance testing, we can reevaluate that decision by changing the definition of BUFFERDESC_PADDED_SIZE. Discussion: 20140202151319.GD32123@awork2.anarazel.de Per discussion with Bruce Momjan, Tom Lane, Robert Haas, and Peter Geoghegan.	2015-01-29 22:48:45 +01:00
Heikki Linnakangas	670bf71f65	Remove dead NULL-pointer checks in GiST code. gist_poly_compress() and gist_circle_compress() checked for a NULL-pointer key argument, but that was dead code; the gist code never passes a NULL-pointer to the "compress" method. This commit also removes a documentation note added in commit `a0a3883`, about doing NULL-pointer checks in the "compress" method. It was added based on the fact that some implementations were doing NULL-pointer checks, but those checks were unnecessary in the first place. The NULL-pointer check in gbt_var_same() function was also unnecessary. The arguments to the "same" method come from the "compress", "union", or "picksplit" methods, but none of them return a NULL pointer. None of this is to be confused with SQL NULL values. Those are dealt with by the gist machinery, and are never passed to the GiST opclass methods. Michael Paquier	2015-01-28 10:03:58 +02:00
Tom Lane	dabda64152	Fix volatile-safety issue in dblink's materializeQueryResult(). Some fields of the sinfo struct are modified within PG_TRY and then referenced within PG_CATCH, so as with recent patch to async.c, "volatile" is necessary for strict POSIX compliance; and that propagates to a couple of subroutines as well as materializeQueryResult() itself. I think the risk of actual issues here is probably higher than in async.c, because storeQueryResult() is likely to get inlined into materializeQueryResult(), leaving the compiler free to conclude that its stores into sinfo fields are dead code.	2015-01-26 15:17:33 -05:00
Tom Lane	586dd5d6a5	Replace a bunch more uses of strncpy() with safer coding. strncpy() has a well-deserved reputation for being unsafe, so make an effort to get rid of nearly all occurrences in HEAD. A large fraction of the remaining uses were passing length less than or equal to the known strlen() of the source, in which case no null-padding can occur and the behavior is equivalent to memcpy(), though doubtless slower and certainly harder to reason about. So just use memcpy() in these cases. In other cases, use either StrNCpy() or strlcpy() as appropriate (depending on whether padding to the full length of the destination buffer seems useful). I left a few strncpy() calls alone in the src/timezone/ code, to keep it in sync with upstream (the IANA tzcode distribution). There are also a few such calls in ecpg that could possibly do with more analysis. AFAICT, none of these changes are more than cosmetic, except for the four occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength source leads to a non-null-terminated destination buffer and ensuing misbehavior. These don't seem like security issues, first because no stack clobber is possible and second because if your values of sslcert etc are coming from untrusted sources then you've got problems way worse than this. Still, it's undesirable to have unpredictable behavior for overlength inputs, so back-patch those four changes to all active branches.	2015-01-24 13:05:42 -05:00
Tom Lane	eb213acfe2	Prevent duplicate escape-string warnings when using pg_stat_statements. contrib/pg_stat_statements will sometimes run the core lexer a second time on submitted statements. Formerly, if you had standard_conforming_strings turned off, this led to sometimes getting two copies of any warnings enabled by escape_string_warning. While this is probably no longer a big deal in the field, it's a pain for regression testing. To fix, change the lexer so it doesn't consult the escape_string_warning GUC variable directly, but looks at a copy in the core_yy_extra_type state struct. Then, pg_stat_statements can change that copy to disable warnings while it's redoing the lexing. It seemed like a good idea to make this happen for all three of the GUCs consulted by the lexer, not just escape_string_warning. There's not an immediate use-case for callers to adjust the other two AFAIK, but making it possible is easy enough and seems like good future-proofing. Arguably this is a bug fix, but there doesn't seem to be enough interest to justify a back-patch. We'd not be able to back-patch exactly as-is anyway, for fear of breaking ABI compatibility of the struct. (We could perhaps back-patch the addition of only escape_string_warning by adding it at the end of the struct, where there's currently alignment padding space.)	2015-01-22 18:11:00 -05:00
Tom Lane	8e166e164c	Rearrange explain.c's API so callers need not embed sizeof(ExplainState). The folly of the previous arrangement was just demonstrated: there's no convenient way to add fields to ExplainState without breaking ABI, even if callers have no need to touch those fields. Since we might well need to do that again someday in back branches, let's change things so that only explain.c has to have sizeof(ExplainState) compiled into it. This costs one extra palloc() per EXPLAIN operation, which is surely pretty negligible.	2015-01-15 13:39:33 -05:00
Robert Haas	0b49642b99	pg_standby: Avoid writing one byte beyond the end of the buffer. Previously, read() might have returned a length equal to the buffer length, and then the subsequent store to buf[len] would write a zero-byte one byte past the end. This doesn't seem likely to be a security issue, but there's some chance it could result in pg_standby misbehaving. Spotted by Coverity; patch by Michael Paquier, reviewed by me.	2015-01-15 09:26:03 -05:00
Robert Haas	4a0a5f21fa	vacuumlo: Avoid unlikely memory leak. Spotted by Coverity. This isn't likely to matter in practice, but there's no harm in fixing it. Michael Paquier	2015-01-14 15:14:20 -05:00
Heikki Linnakangas	e37d474f91	Silence Coverity warnings about unused return values from pushJsonbValue() Similar warnings from backend were silenced earlier by commit `c8315930`, but there were a few more contrib/hstore. Michael Paquier	2015-01-13 14:33:05 +02:00
Bruce Momjian	ac7009abd2	pg_upgrade: fix one-byte per empty db memory leak Report by Tatsuo Ishii, Coverity	2015-01-09 12:12:30 -05:00
Bruce Momjian	4baaf863ec	Update copyright for 2015 Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Andres Freund	8cadeb792c	Correctly handle test durations of more than 2147s in pg_test_timing. Previously the computation of the total test duration, measured in microseconds, accidentally overflowed due to accidentally using signed 32bit arithmetic. As the only consequence is that pg_test_timing invocations with such, overly large, durations never finished the practical consequences of this bug are minor. Pointed out by Coverity. Backpatch to 9.2 where pg_test_timing was added.	2015-01-04 15:44:49 +01:00
Andres Freund	d1c575230d	Fix off-by-one in pg_xlogdump's fuzzy_open_file(). In the unlikely case of stdin (fd 0) being closed, the off-by-one would lead to pg_xlogdump failing to open files. Spotted by Coverity. Backpatch to 9.3 where pg_xlogdump was introduced.	2015-01-04 15:35:46 +01:00
Andres Freund	58bc4747be	Add missing va_end() call to a early exit in dmetaphone.c's StringAt(). Pointed out by Coverity. Backpatch to all supported branches, the code has been that way for a long while.	2015-01-04 15:35:46 +01:00
Tatsuo Ishii	3b5a89c482	Fix resource leak pointed out by Coverity.	2014-12-30 20:33:01 +09:00
Bruce Momjian	83bcc70459	pgbench: remove odd trailing period in init progress output	2014-12-24 09:21:09 -05:00
Heikki Linnakangas	7f0dccaed6	Turn much of the btree_gin macros into real functions. This makes the functions much nicer to read and edit, and also makes debugging easier.	2014-12-22 17:11:53 +02:00
Tom Lane	4a14f13a0a	Improve hash_create's API for selecting simple-binary-key hash functions. Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate. Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely. In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before. For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c. Teodor Sigaev and Tom Lane	2014-12-18 13:36:36 -05:00
Noah Misch	f6dc6dd5ba	Lock down regression testing temporary clusters on Windows. Use SSPI authentication to allow connections exclusively from the OS user that launched the test suite. This closes on Windows the vulnerability that commit `be76a6d39e` closed on other platforms. Users of "make installcheck" or custom test harnesses can run "pg_regress --config-auth=DATADIR" to activate the same authentication configuration that "make check" would use. Back-patch to 9.0 (all supported versions). Security: CVE-2014-0067	2014-12-17 22:48:40 -05:00
Tom Lane	fc2ac1fb41	Allow CHECK constraints to be placed on foreign tables. As with NOT NULL constraints, we consider that such constraints are merely reports of constraints that are being enforced by the remote server (or other underlying storage mechanism). Their only real use is to allow planner optimizations, for example in constraint-exclusion checks. Thus, the code changes here amount to little more than removal of the error that was formerly thrown for applying CHECK to a foreign table. (In passing, do a bit of cleanup of the ALTER FOREIGN TABLE reference page, which had accumulated some weird decisions about ordering etc.) Shigeru Hanada and Etsuro Fujita, reviewed by Kyotaro Horiguchi and Ashutosh Bapat.	2014-12-17 17:00:53 -05:00
Magnus Hagander	cef0ae498c	Update .gitignore for pg_upgrade Add Windows versions of generated scripts, and make sure we only ignore the scripts int he root directory. Michael Paquier	2014-12-17 11:55:22 +01:00
Tom Lane	de8e46f5f5	Suppress bogus statistics when pgbench failed to complete any transactions. Code added in 9.4 would attempt to divide by zero in such cases. Noted while testing fix for missing-pclose problem.	2014-12-16 14:53:55 -05:00
Tom Lane	d38e8d30ce	Fix file descriptor leak after failure of a \setshell command in pgbench. If the called command fails to return data, runShellCommand forgot to pclose() the pipe before returning. This is fairly harmless in the current code, because pgbench would then abandon further processing of that client thread; so no more than nclients descriptors could be leaked this way. But it's not hard to imagine future improvements whereby that wouldn't be true. In any case, it's sloppy coding, so patch all branches. Found by Coverity.	2014-12-16 13:31:42 -05:00
Tom Lane	8ec8760fc8	Revert misguided change to postgres_fdw FOR UPDATE/SHARE code. In commit `462bd95705`, I changed postgres_fdw to rely on get_plan_rowmark() instead of get_parse_rowmark(). I still think that's a good idea in the long run, but as Etsuro Fujita pointed out, it doesn't work today because planner.c forces PlanRowMarks to have markType = ROW_MARK_COPY for all foreign tables. There's no urgent reason to change this in the back branches, so let's just revert that part of yesterday's commit rather than trying to design a better solution under time pressure. Also, add a regression test case showing what postgres_fdw does with FOR UPDATE/SHARE. I'd blithely assumed there was one already, else I'd have realized yesterday that this code didn't work.	2014-12-12 12:41:49 -05:00
Tom Lane	462bd95705	Fix planning of SELECT FOR UPDATE on child table with partial index. Ordinarily we can omit checking of a WHERE condition that matches a partial index's condition, when we are using an indexscan on that partial index. However, in SELECT FOR UPDATE we must include the "redundant" filter condition in the plan so that it gets checked properly in an EvalPlanQual recheck. The planner got this mostly right, but improperly omitted the filter condition if the index in question was on an inheritance child table. In READ COMMITTED mode, this could result in incorrectly returning just-updated rows that no longer satisfy the filter condition. The cause of the error is using get_parse_rowmark() when get_plan_rowmark() is what should be used during planning. In 9.3 and up, also fix the same mistake in contrib/postgres_fdw. It's currently harmless there (for lack of inheritance support) but wrong is wrong, and the incorrect code might get copied to someplace where it's more significant. Report and fix by Kyotaro Horiguchi. Back-patch to all supported branches.	2014-12-11 21:02:25 -05:00
Alvaro Herrera	dcbfc00aba	pg_xlogdump/.gitignore: add committsdesc.c Author: Michael Paquier	2014-12-09 09:54:14 -03:00
Heikki Linnakangas	ebc2b681b8	Fix pg_xlogdump's calculation of full-page image data. The old formula was completely bogus with the new WAL record format.	2014-12-05 11:40:27 +02:00
Peter Eisentraut	1e95bbc870	Fix SHLIB_PREREQS use in contrib, allowing PGXS builds dblink and postgres_fdw use SHLIB_PREREQS = submake-libpq to build libpq first. This doesn't work in a PGXS build, because there is no libpq to build. So just omit setting SHLIB_PREREQS in this case. Note that PGXS users can still use SHLIB_PREREQS (although it is not documented). The problem here is only that contrib modules can be built in-tree or using PGXS, and the prerequisite is only applicable in the former case. Commit `6697aa2bc2` previously attempted to address this by creating a somewhat fake submake-libpq target in Makefile.global. That was not the right fix, and it was also done in a nonportable way, so revert that.	2014-12-04 07:58:12 -05:00
Alvaro Herrera	73c986adde	Keep track of transaction commit timestamps Transactions can now set their commit timestamp directly as they commit, or an external transaction commit timestamp can be fed from an outside system using the new function TransactionTreeSetCommitTsData(). This data is crash-safe, and truncated at Xid freeze point, same as pg_clog. This module is disabled by default because it causes a performance hit, but can be enabled in postgresql.conf requiring only a server restart. A new test in src/test/modules is included. Catalog version bumped due to the new subdirectory within PGDATA and a couple of new SQL functions. Authors: Álvaro Herrera and Petr Jelínek Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven Singer, Peter Eisentraut	2014-12-03 11:53:02 -03:00
Andres Freund	0fd38e1370	Don't skip SQL backends in logical decoding for visibility computation. The logical decoding patchset introduced PROC_IN_LOGICAL_DECODING flag PGXACT flag, that allows such backends to be skipped when computing the xmin horizon/snapshots. That's fine and sensible for walsenders streaming out logical changes, but not at all fine for SQL backends doing logical decoding. If the latter set that flag any change they have performed outside of logical decoding will not be regarded as visible - which e.g. can lead to that change being vacuumed away. Note that not setting the flag for SQL backends isn't particularly bothersome - the SQL backend doesn't do streaming, so it only runs for a limited amount of time. Per buildfarm member 'tick' and Alvaro. Backpatch to 9.4, where logical decoding was introduced.	2014-12-02 23:47:08 +01:00
Alvaro Herrera	b52cb4690e	pageinspect/BRIN: minor tweaks Michael Paquier Double-dash additions suggested by Peter Geoghegan	2014-12-02 12:20:50 -03:00
Andrew Dunstan	e09996ff8d	Fix hstore_to_json_loose's detection of valid JSON number values. We expose a function IsValidJsonNumber that internally calls the lexer for json numbers. That allows us to use the same test everywhere, instead of inventing a broken test for hstore conversions. The new function is also used in datum_to_json, replacing the code that is now moved to the new function. Backpatch to 9.3 where hstore_to_json_loose was introduced.	2014-12-01 11:28:45 -05:00
Alvaro Herrera	22dfd116a1	Move test modules from contrib to src/test/modules This is advance preparation for introducing even more test modules; the easy solution is to add them to contrib, but that's bloated enough that it seems a good time to think of something different. Moved modules are dummy_seclabel, test_shm_mq, test_parser and worker_spi. (test_decoding was also a candidate, but there was too much opposition to moving that one. We can always reconsider later.)	2014-11-29 23:55:00 -03:00
Tom Lane	f4e031c662	Add bms_next_member(), and use it where appropriate. This patch adds a way of iterating through the members of a bitmapset nondestructively, unlike the old way with bms_first_member(). While bms_next_member() is very slightly slower than bms_first_member() (at least for typical-size bitmapsets), eliminating the need to palloc and pfree a temporary copy of the target bitmapset is a significant win. So this method should be preferred in all cases where a temporary copy would be necessary. Tom Lane, with suggestions from Dean Rasheed and David Rowley	2014-11-28 13:37:25 -05:00
Tom Lane	c168ba3112	Free libxml2/libxslt resources in a safer order. Mark Simonetti reported that libxslt sometimes crashes for him, and that swapping xslt_process's object-freeing calls around to do them in reverse order of creation seemed to fix it. I've not reproduced the crash, but valgrind clearly shows a reference to already-freed memory, which is consistent with the idea that shutdown of the xsltTransformContext is trying to reference the already-freed stylesheet or input document. With this patch, valgrind is no longer unhappy. I have an inquiry in to see if this is a libxslt bug or if we're just abusing the library; but even if it's a library bug, we'd want to adjust our code so it doesn't fail with unpatched libraries. Back-patch to all supported branches, because we've been doing this in the wrong(?) order for a long time.	2014-11-27 11:13:29 -05:00
Heikki Linnakangas	e453cc2741	Make Port->ssl_in_use available, even when built with !USE_SSL Code that check the flag no longer need #ifdef's, which is more convenient. In particular, makes it easier to write extensions that depend on it. In the passing, modify sslinfo's ssl_is_used function to check ssl_in_use instead of the OpenSSL specific 'ssl' pointer. It doesn't make any difference currently, as sslinfo is only compiled when built with OpenSSL, but seems cleaner anyway.	2014-11-25 09:46:11 +02:00
Robert Haas	f5d9698a84	Add infrastructure to save and restore GUC values. This is further infrastructure for parallelism. Amit Khandekar, Noah Misch, Robert Haas	2014-11-24 16:37:56 -05:00
Tom Lane	9c58101117	Fix mishandling of system columns in FDW queries. postgres_fdw would send query conditions involving system columns to the remote server, even though it makes no effort to ensure that system columns other than CTID match what the remote side thinks. tableoid, in particular, probably won't match and might have some use in queries. Hence, prevent sending conditions that include non-CTID system columns. Also, create_foreignscan_plan neglected to check local restriction conditions while determining whether to set fsSystemCol for a foreign scan plan node. This again would bollix the results for queries that test a foreign table's tableoid. Back-patch the first fix to 9.3 where postgres_fdw was introduced. Back-patch the second to 9.2. The code is probably broken in 9.1 as well, but the patch doesn't apply cleanly there; given the weak state of support for FDWs in 9.1, it doesn't seem worth fixing. Etsuro Fujita, reviewed by Ashutosh Bapat, and somewhat modified by me	2014-11-22 16:01:05 -05:00
Heikki Linnakangas	3a82bc6f8a	Add pageinspect functions for inspecting GIN indexes. Patch by me, Peter Geoghegan and Michael Paquier, reviewed by Amit Kapila.	2014-11-21 11:58:07 +02:00
Heikki Linnakangas	2c03216d83	Revamp the WAL record format. Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.	2014-11-20 18:46:41 +02:00
Robert Haas	a016555361	Avoid file descriptor leak in pg_test_fsync. This can cause problems on Windows, where files that are still open can't be unlinked. Jeff Janes	2014-11-19 12:06:24 -05:00
Alvaro Herrera	f9ef578d05	postgres_fdw.h: don't pull in rel.h when relcache.h is enough	2014-11-14 21:48:53 -03:00
Andres Freund	89fd41b390	Fix and improve cache invalidation logic for logical decoding. There are basically three situations in which logical decoding needs to perform cache invalidation. During/After replaying a transaction with catalog changes, when skipping a uninteresting transaction that performed catalog changes and when erroring out while replaying a transaction. Unfortunately these three cases were all done slightly differently - partially because `8de3e410fa`, which greatly simplifies matters, got committed in the midst of the development of logical decoding. The actually problematic case was when logical decoding skipped transaction commits (and thus processed invalidations). When used via the SQL interface cache invalidation could access the catalog - bad, because we didn't set up enough state to allow that correctly. It'd not be hard to setup sufficient state, but the simpler solution is to always perform cache invalidation outside a valid transaction. Also make the different cache invalidation cases look as similar as possible, to ease code review. This fixes the assertion failure reported by Antonin Houska in 53EE02D9.7040702@gmail.com. The presented testcase has been expanded into a regression test. Backpatch to 9.4, where logical decoding was introduced.	2014-11-13 20:34:31 +01:00
Robert Haas	c0828b78e9	Move the guts of our Levenshtein implementation into core. The hope is that we can use this to produce better diagnostics in some cases. Peter Geoghegan, reviewed by Michael Paquier, with some further changes by me.	2014-11-13 12:33:26 -05:00
Andres Freund	ec5896aed3	Fix several weaknesses in slot and logical replication on-disk serialization. Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and snapbuild.c were missing the FIN_CRC32 call when computing/checking checksums of on disk files. That doesn't lower the the error detection capabilities of the checksum, but is inconsistent with other usages. In a followup mail Heikki also noticed that, contrary to a comment, the 'version' and 'length' struct fields of replication slot's on disk data where not covered by the checksum. That's not likely to lead to actually missed corruption as those fields are cross checked with the expected version and the actual file length. But it's wrong nonetheless. As fixing these issues makes existing on disk files unreadable, bump the expected versions of on disk files for both slots and logical decoding historic catalog snapshots. This means that loading old files will fail with ERROR: "replication slot file ... has unsupported version 1" and ERROR: "snapbuild state file ... has unsupported version 1 instead of 2" respectively. Given the low likelihood of anybody already using these new features in a production setup that seems acceptable. Fixing these issues made me notice that there's no regression test covering the loading of historic snapshot from disk - so add one. Backpatch to 9.4 where these features were introduced.	2014-11-12 18:52:49 +01:00
Andres Freund	bd4ae0f396	Add interrupt checks to contrib/pg_prewarm. Currently the extension's pg_prewarm() function didn't check interrupts once it started "warming" data. Since individual calls can take a long while it's important for them to be interruptible. Backpatch to 9.4 where pg_prewarm was introduced.	2014-11-12 18:52:49 +01:00
Tom Lane	f2ad2bdd0a	Loop when necessary in contrib/pgcrypto's pktreader_pull(). This fixes a scenario in which pgp_sym_decrypt() failed with "Wrong key or corrupt data" on messages whose length is 6 less than a power of 2. Per bug #11905 from Connor Penhale. Fix by Marko Tiikkaja, regression test case from Jeff Janes.	2014-11-11 17:22:15 -05:00
Robert Haas	99e8f08fab	Update pg_xlogdump's .gitignore for brindesc.c.	2014-11-07 15:41:52 -05:00
Alvaro Herrera	7516f52594	BRIN: Block Range Indexes BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.	2014-11-07 16:38:14 -03:00
Heikki Linnakangas	2076db2aea	Move the backup-block logic from XLogInsert to a new file, xloginsert.c. xlog.c is huge, this makes it a little bit smaller, which is nice. Functions related to putting together the WAL record are in xloginsert.c, and the lower level stuff for managing WAL buffers and such are in xlog.c. Also move the definition of XLogRecord to a separate header file. This causes churn in the #includes of all the files that write WAL records, and redo routines, but it avoids pulling in xlog.h into most places. Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.	2014-11-06 13:55:36 +02:00
Tom Lane	66c029c842	Fix volatility markings of some contrib I/O functions. In general, datatype I/O functions are supposed to be immutable or at worst stable. Some contrib I/O functions were, through oversight, not marked with any volatility property at all, which made them VOLATILE. Since (most of) these functions actually behave immutably, the erroneous marking isn't terribly harmful; but it can be user-visible in certain circumstances, as per a recent bug report from Joe Van Dyk in which a cast to text was disallowed in an expression index definition. To fix, just adjust the declarations in the extension SQL scripts. If we were being very fussy about this, we'd bump the extension version numbers, but that seems like more trouble (for both developers and users) than the problem is worth. A fly in the ointment is that chkpass_in actually is volatile, because of its use of random() to generate a fresh salt when presented with a not-yet-encrypted password. This is bad because of the general assumption that I/O functions aren't volatile: the consequence is that records or arrays containing chkpass elements may have input behavior a bit different from a bare chkpass column. But there seems no way to fix this without breaking existing usage patterns for chkpass, and the consequences of the inconsistency don't seem bad enough to justify that. So for the moment, just document it in a comment. Since we're not bumping version numbers, there seems no harm in back-patching these fixes; at least future installations will get the functions marked correctly.	2014-11-05 11:34:11 -05:00
Heikki Linnakangas	5028f22f6e	Switch to CRC-32C in WAL and other places. The old algorithm was found to not be the usual CRC-32 algorithm, used by Ethernet et al. We were using a non-reflected lookup table with code meant for a reflected lookup table. That's a strange combination that AFAICS does not correspond to any bit-wise CRC calculation, which makes it difficult to reason about its properties. Although it has worked well in practice, seems safer to use a well-known algorithm. Since we're changing the algorithm anyway, we might as well choose a different polynomial. The Castagnoli polynomial has better error-correcting properties than the traditional CRC-32 polynomial, even if we had implemented it correctly. Another reason for picking that is that some new CPUs have hardware support for calculating CRC-32C, but not CRC-32, let alone our strange variant of it. This patch doesn't add any support for such hardware, but a future patch could now do that. The old algorithm is kept around for tsquery and pg_trgm, which use the values in indexes that need to remain compatible so that pg_upgrade works. While we're at it, share the old lookup table for CRC-32 calculation between hstore, ltree and core. They all use the same table, so might as well.	2014-11-04 11:39:48 +02:00
Tom Lane	f443de873e	Docs: fix incorrect spelling of contrib/pgcrypto option. pgp_sym_encrypt's option is spelled "sess-key", not "enable-session-key". Spotted by Jeff Janes. In passing, improve a comment in pgp-pgsql.c to make it clearer that the debugging options are intentionally undocumented.	2014-11-03 11:11:34 -05:00
Noah Misch	1ed8e771ad	Remove dead-since-introduction pgcrypto code. Marko Tiikkaja	2014-11-02 21:43:39 -05:00
Peter Eisentraut	83dc5908c2	pg_test_fsync: Update output format Apparently, computers are now a bit faster than when this was first added, so we need to make room for a digit or two in the ops/sec format. While we're at it, adjust some of the other output for a more consistent line length.	2014-10-20 15:36:51 -04:00
Tom Lane	488a7c9ccf	Fix file-identification comment in contrib/pgcrypto/pgcrypto--1.2.sql. Cosmetic oversight in commit `32984d8fc3`. Marko Tiikkaja	2014-10-20 10:53:57 -04:00
Tom Lane	b2cbced9ee	Support timezone abbreviations that sometimes change. Up to now, PG has assumed that any given timezone abbreviation (such as "EDT") represents a constant GMT offset in the usage of any particular region; we had a way to configure what that offset was, but not for it to be changeable over time. But, as with most things horological, this view of the world is too simplistic: there are numerous regions that have at one time or another switched to a different GMT offset but kept using the same timezone abbreviation. Almost the entire Russian Federation did that a few years ago, and later this month they're going to do it again. And there are similar examples all over the world. To cope with this, invent the notion of a "dynamic timezone abbreviation", which is one that is referenced to a particular underlying timezone (as defined in the IANA timezone database) and means whatever it currently means in that zone. For zones that use or have used daylight-savings time, the standard and DST abbreviations continue to have the property that you can specify standard or DST time and get that time offset whether or not DST was theoretically in effect at the time. However, the abbreviations mean what they meant at the time in question (or most recently before that time) rather than being absolutely fixed. The standard abbreviation-list files have been changed to use this behavior for abbreviations that have actually varied in meaning since 1970. The old simple-numeric definitions are kept for abbreviations that have not changed, since they are a bit faster to resolve. While this is clearly a new feature, it seems necessary to back-patch it into all active branches, because otherwise use of Russian zone abbreviations is going to become even more problematic than it already was. This change supersedes the changes in commit `513d06ded` et al to modify the fixed meanings of the Russian abbreviations; since we've not shipped that yet, this will avoid an undesirably incompatible (not to mention incorrect) change in behavior for timestamps between 2011 and 2014. This patch makes some cosmetic changes in ecpglib to keep its usage of datetime lookup tables as similar as possible to the backend code, but doesn't do anything about the increasingly obsolete set of timezone abbreviation definitions that are hard-wired into ecpglib. Whatever we do about that will likely not be appropriate material for back-patching. Also, a potential free() of a garbage pointer after an out-of-memory failure in ecpglib has been fixed. This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that caused it to produce unexpected results near a timezone transition, if both the "before" and "after" states are marked as standard time. We'd only ever thought about or tested transitions between standard and DST time, but that's not what's happening when a zone simply redefines their base GMT offset. In passing, update the SGML documentation to refer to the Olson/zoneinfo/ zic timezone database as the "IANA" database, since it's now being maintained under the auspices of IANA.	2014-10-16 15:22:10 -04:00
Tom Lane	90063a7612	Print planning time only in EXPLAIN ANALYZE, not plain EXPLAIN. We've gotten enough push-back on that change to make it clear that it wasn't an especially good idea to do it like that. Revert plain EXPLAIN to its previous behavior, but keep the extra output in EXPLAIN ANALYZE. Per discussion. Internally, I set this up as a separate flag ExplainState.summary that controls printing of planning time and execution time. For now it's just copied from the ANALYZE option, but we could consider exposing it to users.	2014-10-15 18:50:13 -04:00
Heikki Linnakangas	98aed6c721	Add --latency-limit option to pgbench. This allows transactions that take longer than specified limit to be counted separately. With --rate, transactions that are already late by the time we get to execute them are skipped altogether. Using --latency-limit with --rate allows you to "catch up" more quickly, if there's a hickup in the server causing a lot of transactions to stall momentarily. Fabien COELHO, reviewed by Rukh Meski and heavily refactored by me.	2014-10-13 20:50:24 +03:00
Bruce Momjian	dc9c612767	pg_upgrade: prefix Unix shell script name output with "./" This more clearly suggests the current directory. While this also works on Windows, it might be confusing. Report by Christoph Berg	2014-10-11 18:38:41 -04:00
Heikki Linnakangas	733be2a5cd	Remove unnecessary initialization of local variables. Oops, forgot these in the prveious commit.	2014-10-10 13:00:53 +03:00
Heikki Linnakangas	33755e8edf	Change the way encoding and locale checks are done in pg_upgrade. Lc_collate and lc_ctype have been per-database settings since server version 8.4, but pg_upgrade was still treating them as cluster-wide options. It fetched the values for the template0 databases in old and new cluster, and compared them. That's backwards; the encoding and locale of the template0 database doesn't matter, as template0 is guaranteed to contain only ASCII characters. But if there are any other databases that exist on both clusters (in particular template1 and postgres databases), their encodings and locales must be compatible. Also, make the locale comparison more lenient. If the locale names are not equal, try to canonicalize both of them by passing them to setlocale(). We used to do that only when upgrading from 9.1 or below, but it seems like a good idea even with newer versions. If we change the canonical form of a locale, this allows pg_upgrade to still work. I'm about to do just that to fix bug #11431, by mapping a locale name that contains non-ASCII characters to a pure-ASCII alias of the same locale. No backpatching, because earlier versions of pg_upgrade still support upgrading from 8.3 servers. That would be more complicated, so it doesn't seem worth it, given that we haven't received any complaints about this from users.	2014-10-10 10:39:32 +03:00
Heikki Linnakangas	86f809088c	Fix typo in error message.	2014-10-02 15:51:31 +03:00
Heikki Linnakangas	84f0ea3f68	Refactor pgbench log-writing code to a separate function. The doCustom function was incredibly long, this makes it a little bit more readable.	2014-10-02 14:01:19 +03:00
Heikki Linnakangas	32984d8fc3	Add functions for dealing with PGP armor header lines to pgcrypto. This add a new pgp_armor_headers function to extract armor headers from an ASCII-armored blob, and a new overloaded variant of the armor function, for constructing an ASCII-armor with extra headers. Marko Tiikkaja and me.	2014-10-01 16:03:39 +03:00
Andres Freund	0ef3c29a4b	Improve documentation about binary/textual output mode for output plugins. Also improve related error message as it contributed to the confusion. Discussion: CAB7nPqQrqFzjqCjxu4GZzTrD9kpj6HMn9G5aOOMwt1WZ8NfqeA@mail.gmail.com, CAB7nPqQXc_+g95zWnqaa=mVQ4d3BVRs6T41frcEYi2ocUrR3+A@mail.gmail.com Per discussion between Michael Paquier, Robert Haas and Andres Freund Backpatch to 9.4 where logical decoding was introduced.	2014-10-01 13:22:17 +02:00
Bruce Momjian	35419aeb83	pg_upgrade: have pg_upgrade fail for old 9.4 JSONB format Backpatch through 9.4	2014-09-29 20:19:59 -04:00
Andres Freund	9b6bb9b471	Define META_FREE in a way that doesn't cause -Wempty-body warnings. That get rids of the only -Wempty-body warning when compiling postgres with gcc 4.8/9. As `6550b901f` shows, it's useful to be able to use that option routinely. Without asserts there's many more warnings, but that's food for another commit.	2014-09-26 02:55:38 +02:00
Heikki Linnakangas	1dcfb8da09	Refactor space allocation for base64 encoding/decoding in pgcrypto. Instead of trying to accurately calculate the space needed, use a StringInfo that's enlarged as needed. This is just moving things around currently - the old code was not wrong - but this is in preparation for a patch that adds support for extra armor headers, and would make the space calculation more complicated. Marko Tiikkaja	2014-09-25 16:36:58 +03:00
Andres Freund	604f7956b9	Improve code around the recently added rm_identify rmgr callback. There are four weaknesses in728f152e07f998d2cb4fe5f24ec8da2c3bda98f2: * append_init() in heapdesc.c was ugly and required that rm_identify return values are only valid till the next call. Instead just add a couple more switch() cases for the INIT_PAGE cases. Now the returned value will always be valid. * a couple rm_identify() callbacks missed masking xl_info with ~XLR_INFO_MASK. * pg_xlogdump didn't map a NULL rm_identify to UNKNOWN or a similar string. * append_init() was called when id=NULL - which should never actually happen. But it's better to be careful.	2014-09-22 17:49:34 +02:00
Tom Lane	898f8a96ef	Fix failure of contrib/auto_explain to print per-node timing information. This has been broken since commit `af7914c662`, which added the EXPLAIN (TIMING) option. Although that commit included updates to auto_explain, they evidently weren't tested very carefully, because the code failed to print node timings even when it should, due to failure to set es.timing in the ExplainState struct. Reported off-list by Neelakanth Nadgir of Salesforce. In passing, clean up the documentation for auto_explain's options a little bit, including re-ordering them into what seems to me a more logical order.	2014-09-19 13:19:27 -04:00
Andres Freund	bdd5726c34	Add the capability to display summary statistics to pg_xlogdump. The new --stats/--stats=record options to pg_xlogdump display per rmgr/per record statistics about the parsed WAL. This is useful to understand what the WAL primarily consists of, to allow targeted optimizations on application, configuration, and core code level. It is likely that we will want to fine tune the statistics further, but the feature already is quite helpful. Author: Abhijit Menon-Sen, slightly editorialized by me Reviewed-By: Andres Freund, Dilip Kumar and Furuya Osamu Discussion: 20140604104716.GA3989@toroid.org	2014-09-19 16:33:16 +02:00
Andres Freund	728f152e07	Add rmgr callback to name xlog record types for display purposes. This is primarily useful for the upcoming pg_xlogdump --stats feature, but also allows to remove some duplicated code in the rmgr_desc routines. Due to the separation and harmonization, the output of dipsplayed records changes somewhat. But since this isn't enduser oriented content that's ok. It's potentially desirable to further change pg_xlogdump's display of records. It previously wasn't possible to show the record type separately from the description forcing it to be in the last column. But that's better done in a separate commit. Author: Abhijit Menon-Sen, slightly editorialized by me Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas Discussion: 20140604104716.GA3989@toroid.org	2014-09-19 16:20:29 +02:00
Bruce Momjian	c3c75fcd7a	pg_upgrade: adjust C comments	2014-09-11 18:44:00 -04:00
Heikki Linnakangas	01a2bfd172	Fix Windows build. I renamed a variable, but missed an #ifdef WIN32 block.	2014-09-11 15:15:40 +03:00
Heikki Linnakangas	54a2d5b37b	Simplify calculation of Poisson distributed delays in pgbench --rate mode. The previous coding first generated a uniform random value between 0.0 and 1.0, then converted that to an integer between 1 and 10000, and divided that again by 10000. Those conversions are unnecessary; we can use the double value that pg_erand48() returns directly. While we're at it, put the logic into a helper function, getPoissonRand(). The largest delay generated by the old coding was about 9.2 times the average, because of the way the uniformly distributed value used for the calculation was truncated to 1/10000 granularity. The new coding doesn't have such clamping. With my laptop's DBL_MIN value, the maximum delay with the new coding is about 700x the average. That seems acceptable - any reasonable pgbench session should last long enough to average that out. Backpatch to 9.4.	2014-09-11 13:00:48 +03:00
Heikki Linnakangas	02e3bcc661	Change the way latency is calculated with pgbench --rate option. The reported latency values now include the "schedule lag" time, that is, the time between the transaction's scheduled start time and the time it actually started. This relates better to a model where requests arrive at a certain rate, and we are interested in the response time to the end user or application, rather than the response time of the database itself. Also, when --rate is used, include the schedule lag time in the log output. The --rate option is new in 9.4, so backpatch to 9.4. It seems better to make this change in 9.4, while we're still in the beta period, than ship a 9.4 version that calculates the values differently than 9.5.	2014-09-11 12:57:32 +03:00
Bruce Momjian	acc8e41681	pg_upgrade: compare control version, not catalog version Also modify test for the possibility the large object value might not exist in the old cluster. Fix for commit `e1598a15f4`	2014-09-10 20:22:10 -04:00
Bruce Momjian	e1598a15f4	pg_upgrade: check for large object size compatibility	2014-09-10 19:23:36 -04:00
Peter Eisentraut	220bb39dee	doc: Reflect renaming of Mac OS X to OS X bug #10528	2014-09-09 13:56:29 -04:00
Bruce Momjian	a74a4aa23b	pg_upgrade: preserve the timestamp epoch This is useful for replication tools like Slony and Skytools. Report by Sergey Konoplev	2014-09-05 19:19:41 -04:00
Peter Eisentraut	303f4d1012	Assorted message fixes and improvements	2014-09-05 01:25:27 -04:00
Andres Freund	d6fa44fce7	Add skip-empty-xacts option to test_decoding for use in the regression tests. The regression tests for contrib/test_decoding regularly failed on postgres instances that were very slow. Either because the hardware itself was slow or because very expensive debugging options like CLOBBER_CACHE_ALWAYS were used. The reason they failed was just that some additional transactions were decoded. Analyze and vacuum, triggered by autovac. To fix just add a option to test_decoding to only display transactions in which a change was actually displayed. That's not pretty because it removes information from the tests; but better than constantly failing tests in very likely harmless ways. Backpatch to 9.4 where logical decoding was introduced. Discussion: 20140629142511.GA26930@awork2.anarazel.de	2014-09-01 15:59:44 +02:00
Andres Freund	4b4b680c3d	Make backend local tracking of buffer pins memory efficient. Since the dawn of time (aka Postgres95) multiple pins of the same buffer by one backend have been optimized not to modify the shared refcount more than once. This optimization has always used a NBuffer sized array in each backend keeping track of a backend's pins. That array (PrivateRefCount) was one of the biggest per-backend memory allocations, depending on the shared_buffers setting. Besides the waste of memory it also has proven to be a performance bottleneck when assertions are enabled as we make sure that there's no remaining pins left at the end of transactions. Also, on servers with lots of memory and a correspondingly high shared_buffers setting the amount of random memory accesses can also lead to poor cpu cache efficiency. Because of these reasons a backend's buffers pins are now kept track of in a small statically sized array that overflows into a hash table when necessary. Benchmarks have shown neutral to positive performance results with considerably lower memory usage. Patch by me, review by Robert Haas. Discussion: 20140321182231.GA17111@alap3.anarazel.de	2014-08-30 14:03:21 +02:00
Tom Lane	7f7eec89b6	Fix citext upgrade script for disallowance of oidvector element assignment. In commit `45e02e3232`, we intentionally disallowed updates on individual elements of oidvector columns. While that still seems like a sane idea in the abstract, we (I) forgot that citext's "upgrade from unpackaged" script did in fact perform exactly such updates, in order to fix the problem that citext indexes should have a collation but would not in databases dumped or upgraded from pre-9.1 installations. Even if we wanted to add casts to allow such updates, there's no practical way to do so in the back branches, so the only real alternative is to make citext's kluge even klugier. In this patch, I cast the oidvector to text, fix its contents with regexp_replace, and cast back to oidvector. (Ugh!) Since the aforementioned commit went into all active branches, we have to fix this in all branches that contain the now-broken update script. Per report from Eric Malm.	2014-08-28 18:21:05 -04:00
Peter Eisentraut	2d759341d9	Fix whitespace	2014-08-26 17:26:45 -04:00
Andres Freund	57ca1d4f01	Specify the port in dblink and postgres_fdw tests. That allows to run those tests against a postmaster listening on a nonstandard port without requiring to export PGPORT in postmaster's environment. This still doesn't support connecting to a nondefault host without configuring it in postmaster's environment. That's harder and less frequently used though. So this is a useful step.	2014-08-26 12:28:08 +02:00
Andres Freund	ddc2504dbc	Don't hardcode contrib_regression dbname in postgres_fdw and dblink tests. That allows parallel installcheck to succeed inside contrib/. The output is not particularly pretty unless make's -O option to synchronize the output is used. There's other tests, outside contrib, that use a hardcoded, non-unique, database name. Those prohibit paralell installcheck to be used across more directories; but that's something for a separate patch.	2014-08-26 12:27:26 +02:00

1 2 3 4 5 ...

2891 Commits