Commit Graph

2891 Commits

Author SHA1 Message Date
Andres Freund 5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consist out of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in a efficient and
   crash safe manner.
3) The ability to filter out changes performed on the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies. E.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
Peter Eisentraut f95425478a Fix hstore_plperl regression tests on some platforms
On some platforms, plperl and plperlu cannot be loaded at the same
time.  So split the test into two separate test files.
2015-04-26 16:13:58 -04:00
Peter Eisentraut cac7658205 Add transforms feature
This provides a mechanism for specifying conversions between SQL data
types and procedural languages.  As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.

reviews by Pavel Stěhule and Andres Freund
2015-04-26 10:33:14 -04:00
Tom Lane f320cbb615 Fix typo in linux startup script.
Missed a "$" in what was meant to be a variable substitution.  Careless
mistake in commit f23425fa95.
2015-04-26 09:43:15 -04:00
Peter Eisentraut dcae5facca Improve speed of make check-world
Before, make check-world would create a new temporary installation for
each test suite, which is slow and wasteful.  Instead, we now create one
test installation that is used by all test suites that are part of a
make run.

The management of the temporary installation is removed from pg_regress
and handled in the makefiles.  This allows for better control, and
unifies the code with that of test suites not run through pg_regress.

review and msvc support by Michael Paquier <michael.paquier@gmail.com>

more review by Fabien Coelho <coelho@cri.ensmp.fr>
2015-04-23 08:59:52 -04:00
Stephen Frost 4ccc5bd28e Pull in tableoid for inheiritance with rowMarks
As noted by Etsuro Fujita [1] and Dean Rasheed[2],
cb1ca4d800 changed ExecBuildAuxRowMark()
to always look for the tableoid in the target list, but didn't also
change preprocess_targetlist() to always include the tableoid.  This
resulted in errors with soon-to-be-added RLS with inheritance tests,
and errors when using inheritance with foreign tables.

Authors: Etsuro Fujita and Dean Rasheed (independently)

Minor word-smithing on the comments by me.

[1] 552CF0B6.8010006@lab.ntt.co.jp
[2] CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com
2015-04-22 11:29:35 -04:00
Andres Freund cef939c347 Rename pg_replication_slot's new active_in to active_pid.
In d811c037ce active_in was added but discussion since showed that
active_pid is preferred as a name.

Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-22 09:43:40 +02:00
Peter Eisentraut b0a738f428 Move pg_xlogdump from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-21 19:03:49 -04:00
Andres Freund d811c037ce Add 'active_in' column to pg_replication_slots.
Right now it is visible whether a replication slot is active in any
session, but not in which.  Adding the active_in column, containing the
pid of the backend having acquired the slot, makes it much easier to
associate pg_replication_slots entries with the corresponding
pg_stat_replication/pg_stat_activity row.

This should have been done from the start, but I (Andres) dropped the
ball there somehow.

Author: Craig Ringer, revised by me Discussion:
CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-21 11:51:06 +02:00
Peter Eisentraut 528c2e44ab Move pg_test_timing from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-20 21:30:12 -04:00
Peter Eisentraut 00882d9e5c Move pg_test_fsync from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-19 22:20:49 -04:00
Peter Eisentraut 9fa8b0ee90 Move pg_upgrade from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-14 19:26:38 -04:00
Peter Eisentraut 30982be4e5 Integrate pg_upgrade_support module into backend
Previously, these functions were created in a schema "binary_upgrade",
which was deleted after pg_upgrade was finished.  Because we don't want
to keep that schema around permanently, move them to pg_catalog but
rename them with a binary_upgrade_... prefix.

The provided functions are only small wrappers around global variables
that were added specifically for pg_upgrade use, so keeping the module
separate does not create any modularity.

The functions still check that they are only called in binary upgrade
mode, so it is not possible to call these during normal operation.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-14 19:26:37 -04:00
Heikki Linnakangas 4f700bcd20 Reorganize our CRC source files again.
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.

Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-04-14 17:03:42 +03:00
Peter Eisentraut 81134af3ec Move pgbench from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-13 13:07:16 -04:00
Peter Eisentraut 83aca89f7c Move pg_archivecleanup from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-11 23:29:18 -04:00
Alvaro Herrera 27846f02c1 Optimize locking a tuple already locked by another subxact
Locking and updating the same tuple repeatedly led to some strange
multixacts being created which had several subtransactions of the same
parent transaction holding locks of the same strength.  However,
once a subxact of the current transaction holds a lock of a given
strength, it's not necessary to acquire the same lock again.  This made
some coding patterns much slower than required.

The fix is twofold.  First we change HeapTupleSatisfiesUpdate to return
HeapTupleBeingUpdated for the case where the current transaction is
already a single-xid locker for the given tuple; it used to return
HeapTupleMayBeUpdated for that case.  The new logic is simpler, and the
change to pgrowlocks is a testament to that: previously we needed to
check for the single-xid locker separately in a very ugly way.  That
test is simpler now.

As fallout from the HTSU change, some of its callers need to be amended
so that tuple-locked-by-own-transaction is taken into account in the
BeingUpdated case rather than the MayBeUpdated case.  For many of them
there is no difference; but heap_delete() and heap_update now check
explicitely and do not grab tuple lock in that case.

The HTSU change also means that routine MultiXactHasRunningRemoteMembers
introduced in commit 11ac4c73cb is no longer necessary and can be
removed; the case that used to require it is now handled naturally as
result of the changes to heap_delete and heap_update.

The second part of the fix to the performance issue is to adjust
heap_lock_tuple to avoid the slowness:

1. Previously we checked for the case that our own transaction already
held a strong enough lock and returned MayBeUpdated, but only in the
multixact case.  Now we do it for the plain Xid case as well, which
saves having to LockTuple.

2. If the current transaction is the only locker of the tuple (but with
a lock not as strong as what we need; otherwise it would have been
caught in the check mentioned above), we can skip sleeping on the
multixact, and instead go straight to create an updated multixact with
the additional lock strength.

3. Most importantly, make sure that both the single-xid-locker case and
the multixact-locker case optimization are applied always.  We do this
by checking both in a single place, rather than them appearing in two
separate portions of the routine -- something that is made possible by
the HeapTupleSatisfiesUpdate API change.  Previously we would only check
for the single-xid case when HTSU returned MayBeUpdated, and only
checked for the multixact case when HTSU returned BeingUpdated.  This
was at odds with what HTSU actually returned in one case: if our own
transaction was locker in a multixact, it returned MayBeUpdated, so the
optimization never applied.  This is what led to the large multixacts in
the first place.

Per bug report #8470 by Oskari Saarenmaa.
2015-04-10 13:47:15 -03:00
Robert Haas e41beea0dd Improve pgbench error reporting.
This would have been worth doing on general principle anyway, but the
recent addition of an expression syntax to pgbench makes it an even
better idea than it would have been otherwise.

Fabien Coelho
2015-04-02 16:26:49 -04:00
Andres Freund 62e2a8dc2c Define integer limits independently from the system definitions.
In 83ff1618 we defined integer limits iff they're not provided by the
system. That turns out not to be the greatest idea because there's
different ways some datatypes can be represented. E.g. on OSX PG's 64bit
datatype will be a 'long int', but OSX unconditionally uses 'long
long'. That disparity then can lead to warnings, e.g. around printf
formats.

One way to fix that would be to back int64 using stdint.h's
int64_t. While a good idea it's not that easy to implement. We would
e.g. need to include stdint.h in our external headers, which we don't
today. Also computing the correct int64 printf formats in that case is
nontrivial.

Instead simply prefix the integer limits with PG_ and define them
unconditionally. I've adjusted all the references to them in code, but
not the ones in comments; the latter seems unnecessary to me.

Discussion: 20150331141423.GK4878@alap3.anarazel.de
2015-04-02 17:43:35 +02:00
Bruce Momjian a0efc71453 pg_upgrade: call 'postgres' binary to get data directory location
This matches the binary 'pg_ctl' calls.  Previously we called the
'postmaster'.

Report by Christoph Berg
2015-04-01 18:25:45 -04:00
Bruce Momjian 0cf16b44cb btree_gin: properly call DirectFunctionCall1()
Previously we called DirectFunctionCall3() with dummy arguments.  Fixed
version of previous patch.

Report by Jon Nelson
2015-03-31 10:26:45 -04:00
Heikki Linnakangas 1d0db8de04 Remove spurious semicolons.
Petr Jelinek
2015-03-31 15:12:27 +03:00
Andrew Dunstan fa1e5afa8a Run pg_upgrade and pg_resetxlog with restricted token on Windows
As with initdb these programs need to run with a restricted token, and
if they don't pg_upgrade will fail when run as a user with Adminstrator
privileges.

Backpatch to all live branches. On the development branch the code is
reorganized so that the restricted token code is now in a single
location. On the stable bramches a less invasive change is made by
simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c.

Patches and bug report from Muhammad Asif Naeem, reviewed by Michael
Paquier, slightly edited by me.
2015-03-30 17:07:52 -04:00
Tom Lane 542320c2bd Be more careful about printing constants in ruleutils.c.
The previous coding in get_const_expr() tried to avoid quoting integer,
float, and numeric literals if at all possible.  While that looks nice,
it means that dumped expressions might re-parse to something that's
semantically equivalent but not the exact same parsetree; for example
a FLOAT8 constant would re-parse as a NUMERIC constant with a cast to
FLOAT8.  Though the result would be the same after constant-folding,
this is problematic in certain contexts.  In particular, Jeff Davis
pointed out that this could cause unexpected failures in ALTER INHERIT
operations because of child tables having not-exactly-equivalent CHECK
expressions.  Therefore, favor correctness over legibility and dump
such constants in quotes except in the limited cases where they'll
be interpreted as the same type even without any casting.

This results in assorted small changes in the regression test outputs,
and will affect display of user-defined views and rules similarly.
The odds of that causing problems in the field seem non-negligible;
given the lack of previous complaints, it seems best not to change
this in the back branches.
2015-03-30 14:59:49 -04:00
Tom Lane e9dd03c03a Minor code cleanups in pgbench expression support.
Get rid of unnecessary expr_yylex declaration (we haven't supported
flex 2.5.4 in a long time, and even if we still did, the declaration in
pgbench.h makes this one unnecessary and inappropriate).  Fix copyright
dates, improve some layout choices, etc.
2015-03-29 13:06:59 -04:00
Tom Lane 2c33e0fbce Better fix for misuse of Float8GetDatumFast().
We can use that macro as long as we put the value into a local variable.
Commit 735cd6128 was not wrong on its own terms, but I think this way
looks nicer, and it should save a few cycles on 32-bit machines.
2015-03-28 13:56:37 -04:00
Andrew Dunstan cfe12763c3 Use standard librart sqrt function in pg_stat_statements
The stddev calculation included a faster but unportable sqrt function.
This is not worth the extra effort, and won't work everywhere. If the
standard library function is good enough for the SQL function it
should be good enough here too.
2015-03-28 09:22:51 -04:00
Heikki Linnakangas e09b48316c Add index-only scan support to btree_gist.
inet, cidr, and timetz indexes still cannot support index-only scans,
because they don't store the original unmodified value in the index, but a
derived approximate value.
2015-03-27 23:35:16 +02:00
Andrew Dunstan 735cd6128a Fix portability issues with stddev in pg_stat_statements
Stddev is calculated on the fly, and the code in commit 717f709532 was
using Float8GetDatumFast() inappropriately to convert the result to a
Datum. Mea culpa. It now uses Float8GetDatum().
2015-03-27 17:29:59 -04:00
Andrew Dunstan 717f709532 Add stats for min, max, mean, stddev times to pg_stat_statements.
The new fields are min_time, max_time, mean_time and stddev_time.

Based on an original patch from Mitsumasa KONDO, modified by me. Reviewed by Petr Jelínek.
2015-03-27 15:43:22 -04:00
Heikki Linnakangas 8816af65e4 Minor refactoring of btree_gist code.
The gbt_var_key_copy function was doing two different things depending on
the boolean argument. Seems cleaner to have two separate functions.

Remove unused argument from gbt_num_compress.
2015-03-26 23:10:10 +02:00
Tom Lane 785941cdc3 Tweak __attribute__-wrapping macros for better pgindent results.
This improves on commit bbfd7edae5 by
making two simple changes:

* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed().  This reduces
pgindent's tendency to misformat declarations involving them.

* attributes are now always attached to function declarations, not
definitions.  Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway.  (It does little good to put a
noreturn annotation where callers can't see it.)  In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.

I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
2015-03-26 14:03:25 -04:00
Andres Freund 83ff1618bc Centralize definition of integer limits.
Several submitted and even committed patches have run into the problem
that C89, our baseline, does not provide minimum/maximum values for
various integer datatypes. C99's stdint.h does, but we can't rely on
it.

Several parts of the code defined limits locally, so instead centralize
the definitions to c.h.

This patch also changes the more obvious usages of literal limit values;
there's more places that could be changed, but it's less clear whether
it's beneficial to change those.

Author: Andrew Gierth
Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk
2015-03-25 22:39:42 +01:00
Bruce Momjian 11226e3817 Revert commit 843cd0bfe6
Report by Tom Lane
2015-03-24 22:35:05 -04:00
Bruce Momjian 843cd0bfe6 btree_gin: properly call DirectFunctionCall1()
Previously we called DirectFunctionCall3() with dummy arguments.

Patch by Jon Nelson
2015-03-24 20:53:29 -04:00
Tom Lane cb1ca4d800 Allow foreign tables to participate in inheritance.
Foreign tables can now be inheritance children, or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.

As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS.  Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior.  Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it.  And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.

An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:

 Update on pt1  (cost=0.00..321.05 rows=3541 width=46)
   Update on pt1
   Foreign Update on ft1
   Foreign Update on ft2
   Update on child3
   ->  Seq Scan on pt1  (cost=0.00..0.00 rows=1 width=46)
   ->  Foreign Scan on ft1  (cost=100.00..148.03 rows=1170 width=46)
   ->  Foreign Scan on ft2  (cost=100.00..148.03 rows=1170 width=46)
   ->  Seq Scan on child3  (cost=0.00..25.00 rows=1200 width=46)

This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.

Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me
2015-03-22 13:53:21 -04:00
Tom Lane 8d1f239003 Replace insertion sort in contrib/intarray with qsort().
It's all very well to claim that a simplistic sort is fast in easy
cases, but O(N^2) in the worst case is not good ... especially if the
worst case is as easy to hit as "descending order input".  Replace that
bit with our standard qsort.

Per bug #12866 from Maksym Boguk.  Back-patch to all active branches.
2015-03-15 23:22:03 -04:00
Tom Lane 7b8b8a4331 Improve representation of PlanRowMark.
This patch fixes two inadequacies of the PlanRowMark representation.

First, that the original LockingClauseStrength isn't stored (and cannot be
inferred for foreign tables, which always get ROW_MARK_COPY).  Since some
PlanRowMarks are created out of whole cloth and don't actually have an
ancestral RowMarkClause, this requires adding a dummy LCS_NONE value to
enum LockingClauseStrength, which is fairly annoying but the alternatives
seem worse.  This fix allows getting rid of the use of get_parse_rowmark()
in FDWs (as per the discussion around commits 462bd95705 and
8ec8760fc8), and it simplifies some things elsewhere.

Second, that the representation assumed that all child tables in an
inheritance hierarchy would use the same RowMarkType.  That's true today
but will soon not be true.  We add an "allMarkTypes" field that identifies
the union of mark types used in all a parent table's children, and use
that where appropriate (currently, only in preprocess_targetlist()).

In passing fix a couple of minor infelicities left over from the SKIP
LOCKED patch, notably that _outPlanRowMark still thought waitPolicy
is a bool.

Catversion bump is required because the numeric values of enum
LockingClauseStrength can appear in on-disk rules.

Extracted from a much larger patch to support foreign table inheritance;
it seemed worth breaking this out, since it's a separable concern.

Shigeru Hanada and Etsuro Fujita, somewhat modified by me
2015-03-15 18:41:47 -04:00
Robert Haas e96b7c6b9f sepgsql: Improve error message when unsupported object type is labeled.
KaiGai Kohei, reviewed by Álvaro Herrera and myself
2015-03-11 12:12:10 -04:00
Andres Freund bbfd7edae5 Add macros wrapping all usage of gcc's __attribute__.
Until now __attribute__() was defined to be empty for all compilers but
gcc. That's problematic because it prevents using it in other compilers;
which is necessary e.g. for atomics portability.  It's also just
generally dubious to do so in a header as widely included as c.h.

Instead add pg_attribute_format_arg, pg_attribute_printf,
pg_attribute_noreturn macros which are implemented in the compilers that
understand them. Also add pg_attribute_noreturn and pg_attribute_packed,
but don't provide fallbacks, since they can affect functionality.

This means that external code that, possibly unwittingly, relied on
__attribute__ defined to be empty on !gcc compilers may now run into
warnings or errors on those compilers. But there shouldn't be many
occurances of that and it's hard to work around...

Discussion: 54B58BA3.8040302@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.
2015-03-11 14:30:01 +01:00
Fujii Masao 57aa5b2bb1 Add GUC to enable compression of full page images stored in WAL.
When newly-added GUC parameter, wal_compression, is on, the PostgreSQL server
compresses a full page image written to WAL when full_page_writes is on or
during a base backup. A compressed page image will be decompressed during WAL
replay. Turning this parameter on can reduce the WAL volume without increasing
the risk of unrecoverable data corruption, but at the cost of some extra CPU
spent on the compression during WAL logging and on the decompression during
WAL replay.

This commit changes the WAL format (so bumping WAL version number) so that
the one-byte flag indicating whether a full page image is compressed or not is
included in its header information. This means that the commit increases the
WAL volume one-byte per a full page image even if WAL compression is not used
at all. We can save that one-byte by borrowing one-bit from the existing field
like hole_offset in the header and using it as the flag, for example. But which
would reduce the code readability and the extensibility of the feature.
Per discussion, it's not worth paying those prices to save only one-byte, so we
decided to add the one-byte flag to the header.

This commit doesn't introduce any new compression algorithm like lz4.
Currently a full page image is compressed using the existing PGLZ algorithm.
Per discussion, we decided to use it at least in the first version of the
feature because there were no performance reports showing that its compression
ratio is unacceptably lower than that of other algorithm. Of course,
in the future, it's worth considering the support of other compression
algorithm for the better compression.

Rahila Syed and Michael Paquier, reviewed in various versions by myself,
Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.
2015-03-11 15:52:24 +09:00
Alvaro Herrera e491bd2ee3 Move BRIN page type to page's last two bytes
... which is the usual convention among AMs, so that pg_filedump and
similar utilities can tell apart pages of different AMs.  It was also
the intent of the original code, but I failed to realize that alignment
considerations would move the whole thing to the previous-to-last word
in the page.

The new definition of the associated macro makes surrounding code a bit
leaner, too.

Per note from Heikki at
http://www.postgresql.org/message-id/546A16EF.9070005@vmware.com
2015-03-10 12:27:15 -03:00
Heikki Linnakangas f1fd515b39 Move WAL-related definitions from dbcommands.h to separate header file.
This makes it easier to write frontend programs that needs to understand
the WAL record format of CREATE/DROP DATABASE. dbcommands.h cannot easily
be #included in a frontend program, because it pulls in other header files
that need backend stuff, but the new dbcommands_xlog.h header file has
fewer dependencies.
2015-03-09 15:50:49 +02:00
Alvaro Herrera c6ee39bc85 Fix contrib/file_fdw's expected file
I forgot to update it on yesterday's cf34e373fc.
2015-03-06 11:47:09 -03:00
Robert Haas e5f3690249 pgbench: Fix mistakes in Makefile.
My commit 878fdcb843 was not quite
right.  Tom Lane pointed out one of the mistakes fixed here, and I
noticed the other myself while reviewing what I'd committed.
2015-03-03 10:48:16 -05:00
Robert Haas 878fdcb843 pgbench: Add a real expression syntax to \set
Previously, you could do \set variable operand1 operator operand2, but
nothing more complicated.  Now, you can \set variable expression, which
makes it much simpler to do multi-step calculations here.  This also
adds support for the modulo operator (%), with the same semantics as in
C.

Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and
Stephen Frost
2015-03-02 14:21:41 -05:00
Tom Lane 2e211211a7 Use FLEXIBLE_ARRAY_MEMBER in a number of other places.
I think we're about done with this...
2015-02-21 16:12:14 -05:00
Tom Lane e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
Tom Lane c110eff132 Use FLEXIBLE_ARRAY_MEMBER in struct RecordIOData.
I (tgl) fixed this last night in rowtypes.c, but I missed that the
code had been copied into a couple of other places.

Michael Paquier
2015-02-20 17:03:12 -05:00
Tom Lane 09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
Kevin Grittner c923e82a23 Eliminate unnecessary NULL checks in picksplit method of intarray.
Where these checks were being done there was no code path which
could leave them NULL.

Michael Paquier per Coverity
2015-02-16 15:26:23 -06:00
Tom Lane 80986e85aa Avoid returning undefined bytes in chkpass_in().
We can't really fix the problem that the result is defined to depend on
random(), so it is still going to fail the "unstable input conversion"
test in parse_type.c.  However, we can at least satify valgrind.  (It
looks like this code used to be valgrind-clean, actually, until somebody
did a careless s/strncpy/strlcpy/g on it.)

In passing, let's just make real sure that chkpass_out doesn't overrun
its output buffer.

No need for backpatch, I think, since this is just to satisfy debugging
tools.

Asif Naeem
2015-02-14 12:20:56 -05:00
Bruce Momjian dc01efa5cc pg_upgrade: improve checksum mismatch error message
Patch by Greg Sabino Mullane, slight adjustments by me
2015-02-11 22:22:26 -05:00
Bruce Momjian 056764b102 pg_upgrade: quote directory names in delete_old_cluster script
This allows the delete script to properly function when special
characters appear in directory paths, e.g. spaces.

Backpatch through 9.0
2015-02-11 22:06:04 -05:00
Heikki Linnakangas c619c2351f Move pg_crc.c to src/common, and remove pg_crc_tables.h
To get CRC functionality in a client program, you now need to link with
libpgcommon instead of libpgport. The CRC code has nothing to do with
portability, so libpgcommon is a better home. (libpgcommon didn't exist
when pg_crc.c was originally moved to src/port.)

Remove the possibility to get CRC functionality by just #including
pg_crc_tables.h. I'm not aware of any extensions that actually did that and
couldn't simply link with libpgcommon.

This also moves the pg_crc.h header file from src/include/utils to
src/include/common, which will require changes to any external programs
that currently does #include "utils/pg_crc.h". That seems acceptable, as
include/common is clearly the right home for it now, and the change needed
to any such programs is trivial.
2015-02-09 11:17:56 +02:00
Robert Haas 370b3a4618 pgcrypto: Code cleanup for decrypt_internal.
Remove some unnecessary null-tests, and replace a goto-label construct
with an "if" block.

Michael Paquier, reviewed by me.
2015-02-04 08:46:32 -05:00
Heikki Linnakangas 4eaafa0453 Remove dead code.
Commit 13629df changed metaphone() function to return an empty string on
empty input, but it left the old error message in place. It's now dead code.

Michael Paquier, per Coverity warning.
2015-02-03 09:43:44 +02:00
Noah Misch 59b919822a Prevent Valgrind Memcheck errors around px_acquire_system_randomness().
This function uses uninitialized stack and heap buffers as supplementary
entropy sources.  Mark them so Memcheck will not complain.  Back-patch
to 9.4, where Valgrind Memcheck cooperation first appeared.

Marko Tiikkaja
2015-02-02 10:00:45 -05:00
Noah Misch 8b59672d8d Cherry-pick security-relevant fixes from upstream imath library.
This covers alterations to buffer sizing and zeroing made between imath
1.3 and imath 1.20.  Valgrind Memcheck identified the buffer overruns
and reliance on uninitialized data; their exploit potential is unknown.
Builds specifying --with-openssl are unaffected, because they use the
OpenSSL BIGNUM facility instead of imath.  Back-patch to 9.0 (all
supported versions).

Security: CVE-2015-0243
2015-02-02 10:00:45 -05:00
Noah Misch 1dc7551586 Fix buffer overrun after incomplete read in pullf_read_max().
Most callers pass a stack buffer.  The ensuing stack smash can crash the
server, and we have not ruled out the viability of attacks that lead to
privilege escalation.  Back-patch to 9.0 (all supported versions).

Marko Tiikkaja

Security: CVE-2015-0243
2015-02-02 10:00:45 -05:00
Tom Lane a59ee88197 Fix Coverity warning about contrib/pgcrypto's mdc_finish().
Coverity points out that mdc_finish returns a pointer to a local buffer
(which of course is gone as soon as the function returns), leaving open
a risk of misbehaviors possibly as bad as a stack overwrite.

In reality, the only possible call site is in process_data_packets()
which does not examine the returned pointer at all.  So there's no
live bug, but nonetheless the code is confusing and risky.  Refactor
to avoid the issue by letting process_data_packets() call mdc_finish()
directly instead of going through the pullf_read() API.

Although this is only cosmetic, it seems good to back-patch so that
the logic in pgp-decrypt.c stays in sync across all branches.

Marko Kreen
2015-01-30 13:05:30 -05:00
Tom Lane 37507962c3 Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426b it wasn't
checking that at all.  This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two.  Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().

In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.

In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken.  This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.

In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy.  Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.

Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.

In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.

Michael Paquier, adjusted somewhat by me
2015-01-29 20:18:33 -05:00
Andres Freund ed127002d8 Align buffer descriptors to cache line boundaries.
Benchmarks has shown that aligning the buffer descriptor array to
cache lines is important for scalability; especially on bigger,
multi-socket, machines.

Currently the array sometimes already happens to be aligned by
happenstance, depending how large previous shared memory allocations
were. That can lead to wildly varying performance results after minor
configuration changes.

In addition to aligning the start of descriptor array, also force the
size of individual descriptors to be of a common cache line size (64
bytes). That happens to already be the case on 64bit platforms, but
this way we can change the struct BufferDesc more easily.

As the alignment primarily matters in highly concurrent workloads
which probably all are 64bit these days, and the space wastage of
element alignment would be a bit more noticeable on 32bit systems, we
don't force the stride to be cacheline sized on 32bit platforms for
now. If somebody does actual performance testing, we can reevaluate
that decision by changing the definition of BUFFERDESC_PADDED_SIZE.

Discussion: 20140202151319.GD32123@awork2.anarazel.de

Per discussion with Bruce Momjan, Tom Lane, Robert Haas, and Peter
Geoghegan.
2015-01-29 22:48:45 +01:00
Heikki Linnakangas 670bf71f65 Remove dead NULL-pointer checks in GiST code.
gist_poly_compress() and gist_circle_compress() checked for a NULL-pointer
key argument, but that was dead code; the gist code never passes a
NULL-pointer to the "compress" method.

This commit also removes a documentation note added in commit a0a3883,
about doing NULL-pointer checks in the "compress" method. It was added
based on the fact that some implementations were doing NULL-pointer
checks, but those checks were unnecessary in the first place.

The NULL-pointer check in gbt_var_same() function was also unnecessary.
The arguments to the "same" method come from the "compress", "union", or
"picksplit" methods, but none of them return a NULL pointer.

None of this is to be confused with SQL NULL values. Those are dealt with
by the gist machinery, and are never passed to the GiST opclass methods.

Michael Paquier
2015-01-28 10:03:58 +02:00
Tom Lane dabda64152 Fix volatile-safety issue in dblink's materializeQueryResult().
Some fields of the sinfo struct are modified within PG_TRY and then
referenced within PG_CATCH, so as with recent patch to async.c, "volatile"
is necessary for strict POSIX compliance; and that propagates to a couple
of subroutines as well as materializeQueryResult() itself.  I think the
risk of actual issues here is probably higher than in async.c, because
storeQueryResult() is likely to get inlined into materializeQueryResult(),
leaving the compiler free to conclude that its stores into sinfo fields are
dead code.
2015-01-26 15:17:33 -05:00
Tom Lane 586dd5d6a5 Replace a bunch more uses of strncpy() with safer coding.
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about.  So just use memcpy() in
these cases.

In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).

I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution).  There are also a
few such calls in ecpg that could possibly do with more analysis.

AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior.  These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
2015-01-24 13:05:42 -05:00
Tom Lane eb213acfe2 Prevent duplicate escape-string warnings when using pg_stat_statements.
contrib/pg_stat_statements will sometimes run the core lexer a second time
on submitted statements.  Formerly, if you had standard_conforming_strings
turned off, this led to sometimes getting two copies of any warnings
enabled by escape_string_warning.  While this is probably no longer a big
deal in the field, it's a pain for regression testing.

To fix, change the lexer so it doesn't consult the escape_string_warning
GUC variable directly, but looks at a copy in the core_yy_extra_type state
struct.  Then, pg_stat_statements can change that copy to disable warnings
while it's redoing the lexing.

It seemed like a good idea to make this happen for all three of the GUCs
consulted by the lexer, not just escape_string_warning.  There's not an
immediate use-case for callers to adjust the other two AFAIK, but making
it possible is easy enough and seems like good future-proofing.

Arguably this is a bug fix, but there doesn't seem to be enough interest to
justify a back-patch.  We'd not be able to back-patch exactly as-is anyway,
for fear of breaking ABI compatibility of the struct.  (We could perhaps
back-patch the addition of only escape_string_warning by adding it at the
end of the struct, where there's currently alignment padding space.)
2015-01-22 18:11:00 -05:00
Tom Lane 8e166e164c Rearrange explain.c's API so callers need not embed sizeof(ExplainState).
The folly of the previous arrangement was just demonstrated: there's no
convenient way to add fields to ExplainState without breaking ABI, even
if callers have no need to touch those fields.  Since we might well need
to do that again someday in back branches, let's change things so that
only explain.c has to have sizeof(ExplainState) compiled into it.  This
costs one extra palloc() per EXPLAIN operation, which is surely pretty
negligible.
2015-01-15 13:39:33 -05:00
Robert Haas 0b49642b99 pg_standby: Avoid writing one byte beyond the end of the buffer.
Previously, read() might have returned a length equal to the buffer
length, and then the subsequent store to buf[len] would write a
zero-byte one byte past the end.  This doesn't seem likely to be
a security issue, but there's some chance it could result in
pg_standby misbehaving.

Spotted by Coverity; patch by Michael Paquier, reviewed by me.
2015-01-15 09:26:03 -05:00
Robert Haas 4a0a5f21fa vacuumlo: Avoid unlikely memory leak.
Spotted by Coverity.  This isn't likely to matter in practice, but
there's no harm in fixing it.

Michael Paquier
2015-01-14 15:14:20 -05:00
Heikki Linnakangas e37d474f91 Silence Coverity warnings about unused return values from pushJsonbValue()
Similar warnings from backend were silenced earlier by commit c8315930,
but there were a few more contrib/hstore.

Michael Paquier
2015-01-13 14:33:05 +02:00
Bruce Momjian ac7009abd2 pg_upgrade: fix one-byte per empty db memory leak
Report by Tatsuo Ishii, Coverity
2015-01-09 12:12:30 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Andres Freund 8cadeb792c Correctly handle test durations of more than 2147s in pg_test_timing.
Previously the computation of the total test duration, measured in
microseconds, accidentally overflowed due to accidentally using signed
32bit arithmetic.  As the only consequence is that pg_test_timing
invocations with such, overly large, durations never finished the
practical consequences of this bug are minor.

Pointed out by Coverity.

Backpatch to 9.2 where pg_test_timing was added.
2015-01-04 15:44:49 +01:00
Andres Freund d1c575230d Fix off-by-one in pg_xlogdump's fuzzy_open_file().
In the unlikely case of stdin (fd 0) being closed, the off-by-one
would lead to pg_xlogdump failing to open files.

Spotted by Coverity.

Backpatch to 9.3 where pg_xlogdump was introduced.
2015-01-04 15:35:46 +01:00
Andres Freund 58bc4747be Add missing va_end() call to a early exit in dmetaphone.c's StringAt().
Pointed out by Coverity.

Backpatch to all supported branches, the code has been that way for a
long while.
2015-01-04 15:35:46 +01:00
Tatsuo Ishii 3b5a89c482 Fix resource leak pointed out by Coverity. 2014-12-30 20:33:01 +09:00
Bruce Momjian 83bcc70459 pgbench: remove odd trailing period in init progress output 2014-12-24 09:21:09 -05:00
Heikki Linnakangas 7f0dccaed6 Turn much of the btree_gin macros into real functions.
This makes the functions much nicer to read and edit, and also makes
debugging easier.
2014-12-22 17:11:53 +02:00
Tom Lane 4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Noah Misch f6dc6dd5ba Lock down regression testing temporary clusters on Windows.
Use SSPI authentication to allow connections exclusively from the OS
user that launched the test suite.  This closes on Windows the
vulnerability that commit be76a6d39e
closed on other platforms.  Users of "make installcheck" or custom test
harnesses can run "pg_regress --config-auth=DATADIR" to activate the
same authentication configuration that "make check" would use.
Back-patch to 9.0 (all supported versions).

Security: CVE-2014-0067
2014-12-17 22:48:40 -05:00
Tom Lane fc2ac1fb41 Allow CHECK constraints to be placed on foreign tables.
As with NOT NULL constraints, we consider that such constraints are merely
reports of constraints that are being enforced by the remote server (or
other underlying storage mechanism).  Their only real use is to allow
planner optimizations, for example in constraint-exclusion checks.  Thus,
the code changes here amount to little more than removal of the error that
was formerly thrown for applying CHECK to a foreign table.

(In passing, do a bit of cleanup of the ALTER FOREIGN TABLE reference page,
which had accumulated some weird decisions about ordering etc.)

Shigeru Hanada and Etsuro Fujita, reviewed by Kyotaro Horiguchi and
Ashutosh Bapat.
2014-12-17 17:00:53 -05:00
Magnus Hagander cef0ae498c Update .gitignore for pg_upgrade
Add Windows versions of generated scripts, and make sure we only
ignore the scripts int he root directory.

Michael Paquier
2014-12-17 11:55:22 +01:00
Tom Lane de8e46f5f5 Suppress bogus statistics when pgbench failed to complete any transactions.
Code added in 9.4 would attempt to divide by zero in such cases.
Noted while testing fix for missing-pclose problem.
2014-12-16 14:53:55 -05:00
Tom Lane d38e8d30ce Fix file descriptor leak after failure of a \setshell command in pgbench.
If the called command fails to return data, runShellCommand forgot to
pclose() the pipe before returning.  This is fairly harmless in the current
code, because pgbench would then abandon further processing of that client
thread; so no more than nclients descriptors could be leaked this way.  But
it's not hard to imagine future improvements whereby that wouldn't be true.
In any case, it's sloppy coding, so patch all branches.  Found by Coverity.
2014-12-16 13:31:42 -05:00
Tom Lane 8ec8760fc8 Revert misguided change to postgres_fdw FOR UPDATE/SHARE code.
In commit 462bd95705, I changed postgres_fdw
to rely on get_plan_rowmark() instead of get_parse_rowmark().  I still
think that's a good idea in the long run, but as Etsuro Fujita pointed out,
it doesn't work today because planner.c forces PlanRowMarks to have
markType = ROW_MARK_COPY for all foreign tables.  There's no urgent reason
to change this in the back branches, so let's just revert that part of
yesterday's commit rather than trying to design a better solution under
time pressure.

Also, add a regression test case showing what postgres_fdw does with FOR
UPDATE/SHARE.  I'd blithely assumed there was one already, else I'd have
realized yesterday that this code didn't work.
2014-12-12 12:41:49 -05:00
Tom Lane 462bd95705 Fix planning of SELECT FOR UPDATE on child table with partial index.
Ordinarily we can omit checking of a WHERE condition that matches a partial
index's condition, when we are using an indexscan on that partial index.
However, in SELECT FOR UPDATE we must include the "redundant" filter
condition in the plan so that it gets checked properly in an EvalPlanQual
recheck.  The planner got this mostly right, but improperly omitted the
filter condition if the index in question was on an inheritance child
table.  In READ COMMITTED mode, this could result in incorrectly returning
just-updated rows that no longer satisfy the filter condition.

The cause of the error is using get_parse_rowmark() when get_plan_rowmark()
is what should be used during planning.  In 9.3 and up, also fix the same
mistake in contrib/postgres_fdw.  It's currently harmless there (for lack
of inheritance support) but wrong is wrong, and the incorrect code might
get copied to someplace where it's more significant.

Report and fix by Kyotaro Horiguchi.  Back-patch to all supported branches.
2014-12-11 21:02:25 -05:00
Alvaro Herrera dcbfc00aba pg_xlogdump/.gitignore: add committsdesc.c
Author: Michael Paquier
2014-12-09 09:54:14 -03:00
Heikki Linnakangas ebc2b681b8 Fix pg_xlogdump's calculation of full-page image data.
The old formula was completely bogus with the new WAL record format.
2014-12-05 11:40:27 +02:00
Peter Eisentraut 1e95bbc870 Fix SHLIB_PREREQS use in contrib, allowing PGXS builds
dblink and postgres_fdw use SHLIB_PREREQS = submake-libpq to build libpq
first.  This doesn't work in a PGXS build, because there is no libpq to
build.  So just omit setting SHLIB_PREREQS in this case.

Note that PGXS users can still use SHLIB_PREREQS (although it is not
documented).  The problem here is only that contrib modules can be built
in-tree or using PGXS, and the prerequisite is only applicable in the
former case.

Commit 6697aa2bc2 previously attempted to
address this by creating a somewhat fake submake-libpq target in
Makefile.global.  That was not the right fix, and it was also done in a
nonportable way, so revert that.
2014-12-04 07:58:12 -05:00
Alvaro Herrera 73c986adde Keep track of transaction commit timestamps
Transactions can now set their commit timestamp directly as they commit,
or an external transaction commit timestamp can be fed from an outside
system using the new function TransactionTreeSetCommitTsData().  This
data is crash-safe, and truncated at Xid freeze point, same as pg_clog.

This module is disabled by default because it causes a performance hit,
but can be enabled in postgresql.conf requiring only a server restart.

A new test in src/test/modules is included.

Catalog version bumped due to the new subdirectory within PGDATA and a
couple of new SQL functions.

Authors: Álvaro Herrera and Petr Jelínek

Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert
Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven
Singer, Peter Eisentraut
2014-12-03 11:53:02 -03:00
Andres Freund 0fd38e1370 Don't skip SQL backends in logical decoding for visibility computation.
The logical decoding patchset introduced PROC_IN_LOGICAL_DECODING flag
PGXACT flag, that allows such backends to be skipped when computing
the xmin horizon/snapshots. That's fine and sensible for walsenders
streaming out logical changes, but not at all fine for SQL backends
doing logical decoding. If the latter set that flag any change they
have performed outside of logical decoding will not be regarded as
visible - which e.g. can lead to that change being vacuumed away.

Note that not setting the flag for SQL backends isn't particularly
bothersome - the SQL backend doesn't do streaming, so it only runs for
a limited amount of time.

Per buildfarm member 'tick' and Alvaro.

Backpatch to 9.4, where logical decoding was introduced.
2014-12-02 23:47:08 +01:00
Alvaro Herrera b52cb4690e pageinspect/BRIN: minor tweaks
Michael Paquier

Double-dash additions suggested by Peter Geoghegan
2014-12-02 12:20:50 -03:00
Andrew Dunstan e09996ff8d Fix hstore_to_json_loose's detection of valid JSON number values.
We expose a function IsValidJsonNumber that internally calls the lexer
for json numbers. That allows us to use the same test everywhere,
instead of inventing a broken test for hstore conversions. The new
function is also used in datum_to_json, replacing the code that is now
moved to the new function.

Backpatch to 9.3 where hstore_to_json_loose was introduced.
2014-12-01 11:28:45 -05:00
Alvaro Herrera 22dfd116a1 Move test modules from contrib to src/test/modules
This is advance preparation for introducing even more test modules; the
easy solution is to add them to contrib, but that's bloated enough that
it seems a good time to think of something different.

Moved modules are dummy_seclabel, test_shm_mq, test_parser and
worker_spi.

(test_decoding was also a candidate, but there was too much opposition
to moving that one.  We can always reconsider later.)
2014-11-29 23:55:00 -03:00
Tom Lane f4e031c662 Add bms_next_member(), and use it where appropriate.
This patch adds a way of iterating through the members of a bitmapset
nondestructively, unlike the old way with bms_first_member().  While
bms_next_member() is very slightly slower than bms_first_member()
(at least for typical-size bitmapsets), eliminating the need to palloc
and pfree a temporary copy of the target bitmapset is a significant win.
So this method should be preferred in all cases where a temporary copy
would be necessary.

Tom Lane, with suggestions from Dean Rasheed and David Rowley
2014-11-28 13:37:25 -05:00
Tom Lane c168ba3112 Free libxml2/libxslt resources in a safer order.
Mark Simonetti reported that libxslt sometimes crashes for him, and that
swapping xslt_process's object-freeing calls around to do them in reverse
order of creation seemed to fix it.  I've not reproduced the crash, but
valgrind clearly shows a reference to already-freed memory, which is
consistent with the idea that shutdown of the xsltTransformContext is
trying to reference the already-freed stylesheet or input document.
With this patch, valgrind is no longer unhappy.

I have an inquiry in to see if this is a libxslt bug or if we're just
abusing the library; but even if it's a library bug, we'd want to adjust
our code so it doesn't fail with unpatched libraries.

Back-patch to all supported branches, because we've been doing this in
the wrong(?) order for a long time.
2014-11-27 11:13:29 -05:00
Heikki Linnakangas e453cc2741 Make Port->ssl_in_use available, even when built with !USE_SSL
Code that check the flag no longer need #ifdef's, which is more convenient.
In particular, makes it easier to write extensions that depend on it.

In the passing, modify sslinfo's ssl_is_used function to check ssl_in_use
instead of the OpenSSL specific 'ssl' pointer. It doesn't make any
difference currently, as sslinfo is only compiled when built with OpenSSL,
but seems cleaner anyway.
2014-11-25 09:46:11 +02:00
Robert Haas f5d9698a84 Add infrastructure to save and restore GUC values.
This is further infrastructure for parallelism.

Amit Khandekar, Noah Misch, Robert Haas
2014-11-24 16:37:56 -05:00
Tom Lane 9c58101117 Fix mishandling of system columns in FDW queries.
postgres_fdw would send query conditions involving system columns to the
remote server, even though it makes no effort to ensure that system
columns other than CTID match what the remote side thinks.  tableoid,
in particular, probably won't match and might have some use in queries.
Hence, prevent sending conditions that include non-CTID system columns.

Also, create_foreignscan_plan neglected to check local restriction
conditions while determining whether to set fsSystemCol for a foreign
scan plan node.  This again would bollix the results for queries that
test a foreign table's tableoid.

Back-patch the first fix to 9.3 where postgres_fdw was introduced.
Back-patch the second to 9.2.  The code is probably broken in 9.1 as
well, but the patch doesn't apply cleanly there; given the weak state
of support for FDWs in 9.1, it doesn't seem worth fixing.

Etsuro Fujita, reviewed by Ashutosh Bapat, and somewhat modified by me
2014-11-22 16:01:05 -05:00
Heikki Linnakangas 3a82bc6f8a Add pageinspect functions for inspecting GIN indexes.
Patch by me, Peter Geoghegan and Michael Paquier, reviewed by Amit Kapila.
2014-11-21 11:58:07 +02:00
Heikki Linnakangas 2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
Robert Haas a016555361 Avoid file descriptor leak in pg_test_fsync.
This can cause problems on Windows, where files that are still open
can't be unlinked.

Jeff Janes
2014-11-19 12:06:24 -05:00
Alvaro Herrera f9ef578d05 postgres_fdw.h: don't pull in rel.h when relcache.h is enough 2014-11-14 21:48:53 -03:00
Andres Freund 89fd41b390 Fix and improve cache invalidation logic for logical decoding.
There are basically three situations in which logical decoding needs
to perform cache invalidation. During/After replaying a transaction
with catalog changes, when skipping a uninteresting transaction that
performed catalog changes and when erroring out while replaying a
transaction. Unfortunately these three cases were all done slightly
differently - partially because 8de3e410fa, which greatly simplifies
matters, got committed in the midst of the development of logical
decoding.

The actually problematic case was when logical decoding skipped
transaction commits (and thus processed invalidations). When used via
the SQL interface cache invalidation could access the catalog - bad,
because we didn't set up enough state to allow that correctly. It'd
not be hard to setup sufficient state, but the simpler solution is to
always perform cache invalidation outside a valid transaction.

Also make the different cache invalidation cases look as similar as
possible, to ease code review.

This fixes the assertion failure reported by Antonin Houska in
53EE02D9.7040702@gmail.com. The presented testcase has been expanded
into a regression test.

Backpatch to 9.4, where logical decoding was introduced.
2014-11-13 20:34:31 +01:00
Robert Haas c0828b78e9 Move the guts of our Levenshtein implementation into core.
The hope is that we can use this to produce better diagnostics in
some cases.

Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
2014-11-13 12:33:26 -05:00
Andres Freund ec5896aed3 Fix several weaknesses in slot and logical replication on-disk serialization.
Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and
snapbuild.c were missing the FIN_CRC32 call when computing/checking
checksums of on disk files. That doesn't lower the the error detection
capabilities of the checksum, but is inconsistent with other usages.

In a followup mail Heikki also noticed that, contrary to a comment,
the 'version' and 'length' struct fields of replication slot's on disk
data where not covered by the checksum. That's not likely to lead to
actually missed corruption as those fields are cross checked with the
expected version and the actual file length. But it's wrong
nonetheless.

As fixing these issues makes existing on disk files unreadable, bump
the expected versions of on disk files for both slots and logical
decoding historic catalog snapshots.  This means that loading old
files will fail with
ERROR: "replication slot file ... has unsupported version 1"
and
ERROR: "snapbuild state file ... has unsupported version 1 instead of
2" respectively. Given the low likelihood of anybody already using
these new features in a production setup that seems acceptable.

Fixing these issues made me notice that there's no regression test
covering the loading of historic snapshot from disk - so add one.

Backpatch to 9.4 where these features were introduced.
2014-11-12 18:52:49 +01:00
Andres Freund bd4ae0f396 Add interrupt checks to contrib/pg_prewarm.
Currently the extension's pg_prewarm() function didn't check
interrupts once it started "warming" data. Since individual calls can
take a long while it's important for them to be interruptible.

Backpatch to 9.4 where pg_prewarm was introduced.
2014-11-12 18:52:49 +01:00
Tom Lane f2ad2bdd0a Loop when necessary in contrib/pgcrypto's pktreader_pull().
This fixes a scenario in which pgp_sym_decrypt() failed with "Wrong key
or corrupt data" on messages whose length is 6 less than a power of 2.

Per bug #11905 from Connor Penhale.  Fix by Marko Tiikkaja, regression
test case from Jeff Janes.
2014-11-11 17:22:15 -05:00
Robert Haas 99e8f08fab Update pg_xlogdump's .gitignore for brindesc.c. 2014-11-07 15:41:52 -05:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Heikki Linnakangas 2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Tom Lane 66c029c842 Fix volatility markings of some contrib I/O functions.
In general, datatype I/O functions are supposed to be immutable or at
worst stable.  Some contrib I/O functions were, through oversight, not
marked with any volatility property at all, which made them VOLATILE.
Since (most of) these functions actually behave immutably, the erroneous
marking isn't terribly harmful; but it can be user-visible in certain
circumstances, as per a recent bug report from Joe Van Dyk in which a
cast to text was disallowed in an expression index definition.

To fix, just adjust the declarations in the extension SQL scripts.  If we
were being very fussy about this, we'd bump the extension version numbers,
but that seems like more trouble (for both developers and users) than the
problem is worth.

A fly in the ointment is that chkpass_in actually is volatile, because
of its use of random() to generate a fresh salt when presented with a
not-yet-encrypted password.  This is bad because of the general assumption
that I/O functions aren't volatile: the consequence is that records or
arrays containing chkpass elements may have input behavior a bit different
from a bare chkpass column.  But there seems no way to fix this without
breaking existing usage patterns for chkpass, and the consequences of the
inconsistency don't seem bad enough to justify that.  So for the moment,
just document it in a comment.

Since we're not bumping version numbers, there seems no harm in
back-patching these fixes; at least future installations will get the
functions marked correctly.
2014-11-05 11:34:11 -05:00
Heikki Linnakangas 5028f22f6e Switch to CRC-32C in WAL and other places.
The old algorithm was found to not be the usual CRC-32 algorithm, used by
Ethernet et al. We were using a non-reflected lookup table with code meant
for a reflected lookup table. That's a strange combination that AFAICS does
not correspond to any bit-wise CRC calculation, which makes it difficult to
reason about its properties. Although it has worked well in practice, seems
safer to use a well-known algorithm.

Since we're changing the algorithm anyway, we might as well choose a
different polynomial. The Castagnoli polynomial has better error-correcting
properties than the traditional CRC-32 polynomial, even if we had
implemented it correctly. Another reason for picking that is that some new
CPUs have hardware support for calculating CRC-32C, but not CRC-32, let
alone our strange variant of it. This patch doesn't add any support for such
hardware, but a future patch could now do that.

The old algorithm is kept around for tsquery and pg_trgm, which use the
values in indexes that need to remain compatible so that pg_upgrade works.
While we're at it, share the old lookup table for CRC-32 calculation
between hstore, ltree and core. They all use the same table, so might as
well.
2014-11-04 11:39:48 +02:00
Tom Lane f443de873e Docs: fix incorrect spelling of contrib/pgcrypto option.
pgp_sym_encrypt's option is spelled "sess-key", not "enable-session-key".
Spotted by Jeff Janes.

In passing, improve a comment in pgp-pgsql.c to make it clearer that
the debugging options are intentionally undocumented.
2014-11-03 11:11:34 -05:00
Noah Misch 1ed8e771ad Remove dead-since-introduction pgcrypto code.
Marko Tiikkaja
2014-11-02 21:43:39 -05:00
Peter Eisentraut 83dc5908c2 pg_test_fsync: Update output format
Apparently, computers are now a bit faster than when this was first
added, so we need to make room for a digit or two in the ops/sec format.

While we're at it, adjust some of the other output for a more consistent
line length.
2014-10-20 15:36:51 -04:00
Tom Lane 488a7c9ccf Fix file-identification comment in contrib/pgcrypto/pgcrypto--1.2.sql.
Cosmetic oversight in commit 32984d8fc3.

Marko Tiikkaja
2014-10-20 10:53:57 -04:00
Tom Lane b2cbced9ee Support timezone abbreviations that sometimes change.
Up to now, PG has assumed that any given timezone abbreviation (such as
"EDT") represents a constant GMT offset in the usage of any particular
region; we had a way to configure what that offset was, but not for it
to be changeable over time.  But, as with most things horological, this
view of the world is too simplistic: there are numerous regions that have
at one time or another switched to a different GMT offset but kept using
the same timezone abbreviation.  Almost the entire Russian Federation did
that a few years ago, and later this month they're going to do it again.
And there are similar examples all over the world.

To cope with this, invent the notion of a "dynamic timezone abbreviation",
which is one that is referenced to a particular underlying timezone
(as defined in the IANA timezone database) and means whatever it currently
means in that zone.  For zones that use or have used daylight-savings time,
the standard and DST abbreviations continue to have the property that you
can specify standard or DST time and get that time offset whether or not
DST was theoretically in effect at the time.  However, the abbreviations
mean what they meant at the time in question (or most recently before that
time) rather than being absolutely fixed.

The standard abbreviation-list files have been changed to use this behavior
for abbreviations that have actually varied in meaning since 1970.  The
old simple-numeric definitions are kept for abbreviations that have not
changed, since they are a bit faster to resolve.

While this is clearly a new feature, it seems necessary to back-patch it
into all active branches, because otherwise use of Russian zone
abbreviations is going to become even more problematic than it already was.
This change supersedes the changes in commit 513d06ded et al to modify the
fixed meanings of the Russian abbreviations; since we've not shipped that
yet, this will avoid an undesirably incompatible (not to mention incorrect)
change in behavior for timestamps between 2011 and 2014.

This patch makes some cosmetic changes in ecpglib to keep its usage of
datetime lookup tables as similar as possible to the backend code, but
doesn't do anything about the increasingly obsolete set of timezone
abbreviation definitions that are hard-wired into ecpglib.  Whatever we
do about that will likely not be appropriate material for back-patching.
Also, a potential free() of a garbage pointer after an out-of-memory
failure in ecpglib has been fixed.

This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that
caused it to produce unexpected results near a timezone transition, if
both the "before" and "after" states are marked as standard time.  We'd
only ever thought about or tested transitions between standard and DST
time, but that's not what's happening when a zone simply redefines their
base GMT offset.

In passing, update the SGML documentation to refer to the Olson/zoneinfo/
zic timezone database as the "IANA" database, since it's now being
maintained under the auspices of IANA.
2014-10-16 15:22:10 -04:00
Tom Lane 90063a7612 Print planning time only in EXPLAIN ANALYZE, not plain EXPLAIN.
We've gotten enough push-back on that change to make it clear that it
wasn't an especially good idea to do it like that.  Revert plain EXPLAIN
to its previous behavior, but keep the extra output in EXPLAIN ANALYZE.
Per discussion.

Internally, I set this up as a separate flag ExplainState.summary that
controls printing of planning time and execution time.  For now it's
just copied from the ANALYZE option, but we could consider exposing it
to users.
2014-10-15 18:50:13 -04:00
Heikki Linnakangas 98aed6c721 Add --latency-limit option to pgbench.
This allows transactions that take longer than specified limit to be counted
separately. With --rate, transactions that are already late by the time we
get to execute them are skipped altogether. Using --latency-limit with
--rate allows you to "catch up" more quickly, if there's a hickup in the
server causing a lot of transactions to stall momentarily.

Fabien COELHO, reviewed by Rukh Meski and heavily refactored by me.
2014-10-13 20:50:24 +03:00
Bruce Momjian dc9c612767 pg_upgrade: prefix Unix shell script name output with "./"
This more clearly suggests the current directory.  While this also works
on Windows, it might be confusing.

Report by Christoph Berg
2014-10-11 18:38:41 -04:00
Heikki Linnakangas 733be2a5cd Remove unnecessary initialization of local variables.
Oops, forgot these in the prveious commit.
2014-10-10 13:00:53 +03:00
Heikki Linnakangas 33755e8edf Change the way encoding and locale checks are done in pg_upgrade.
Lc_collate and lc_ctype have been per-database settings since server version
8.4, but pg_upgrade was still treating them as cluster-wide options. It
fetched the values for the template0 databases in old and new cluster, and
compared them. That's backwards; the encoding and locale of the template0
database doesn't matter, as template0 is guaranteed to contain only ASCII
characters. But if there are any other databases that exist on both clusters
(in particular template1 and postgres databases), their encodings and
locales must be compatible.

Also, make the locale comparison more lenient. If the locale names are not
equal, try to canonicalize both of them by passing them to setlocale(). We
used to do that only when upgrading from 9.1 or below, but it seems like a
good idea even with newer versions. If we change the canonical form of a
locale, this allows pg_upgrade to still work. I'm about to do just that to
fix bug #11431, by mapping a locale name that contains non-ASCII characters
to a pure-ASCII alias of the same locale.

No backpatching, because earlier versions of pg_upgrade still support
upgrading from 8.3 servers. That would be more complicated, so it doesn't
seem worth it, given that we haven't received any complaints about this
from users.
2014-10-10 10:39:32 +03:00
Heikki Linnakangas 86f809088c Fix typo in error message. 2014-10-02 15:51:31 +03:00
Heikki Linnakangas 84f0ea3f68 Refactor pgbench log-writing code to a separate function.
The doCustom function was incredibly long, this makes it a little bit more
readable.
2014-10-02 14:01:19 +03:00
Heikki Linnakangas 32984d8fc3 Add functions for dealing with PGP armor header lines to pgcrypto.
This add a new pgp_armor_headers function to extract armor headers from an
ASCII-armored blob, and a new overloaded variant of the armor function, for
constructing an ASCII-armor with extra headers.

Marko Tiikkaja and me.
2014-10-01 16:03:39 +03:00
Andres Freund 0ef3c29a4b Improve documentation about binary/textual output mode for output plugins.
Also improve related error message as it contributed to the confusion.

Discussion: CAB7nPqQrqFzjqCjxu4GZzTrD9kpj6HMn9G5aOOMwt1WZ8NfqeA@mail.gmail.com,
    CAB7nPqQXc_+g95zWnqaa=mVQ4d3BVRs6T41frcEYi2ocUrR3+A@mail.gmail.com

Per discussion between Michael Paquier, Robert Haas and Andres Freund

Backpatch to 9.4 where logical decoding was introduced.
2014-10-01 13:22:17 +02:00
Bruce Momjian 35419aeb83 pg_upgrade: have pg_upgrade fail for old 9.4 JSONB format
Backpatch through 9.4
2014-09-29 20:19:59 -04:00
Andres Freund 9b6bb9b471 Define META_FREE in a way that doesn't cause -Wempty-body warnings.
That get rids of the only -Wempty-body warning when compiling postgres
with gcc 4.8/9. As 6550b901f shows, it's useful to be able to use that
option routinely.

Without asserts there's many more warnings, but that's food for
another commit.
2014-09-26 02:55:38 +02:00
Heikki Linnakangas 1dcfb8da09 Refactor space allocation for base64 encoding/decoding in pgcrypto.
Instead of trying to accurately calculate the space needed, use a StringInfo
that's enlarged as needed. This is just moving things around currently - the
old code was not wrong - but this is in preparation for a patch that adds
support for extra armor headers, and would make the space calculation more
complicated.

Marko Tiikkaja
2014-09-25 16:36:58 +03:00
Andres Freund 604f7956b9 Improve code around the recently added rm_identify rmgr callback.
There are four weaknesses in728f152e07f998d2cb4fe5f24ec8da2c3bda98f2:

* append_init() in heapdesc.c was ugly and required that rm_identify
  return values are only valid till the next call. Instead just add a
  couple more switch() cases for the INIT_PAGE cases. Now the returned
  value will always be valid.
* a couple rm_identify() callbacks missed masking xl_info with
  ~XLR_INFO_MASK.
* pg_xlogdump didn't map a NULL rm_identify to UNKNOWN or a similar
  string.
* append_init() was called when id=NULL - which should never actually
  happen. But it's better to be careful.
2014-09-22 17:49:34 +02:00
Tom Lane 898f8a96ef Fix failure of contrib/auto_explain to print per-node timing information.
This has been broken since commit af7914c662,
which added the EXPLAIN (TIMING) option.  Although that commit included
updates to auto_explain, they evidently weren't tested very carefully,
because the code failed to print node timings even when it should, due to
failure to set es.timing in the ExplainState struct.  Reported off-list by
Neelakanth Nadgir of Salesforce.

In passing, clean up the documentation for auto_explain's options a
little bit, including re-ordering them into what seems to me a more
logical order.
2014-09-19 13:19:27 -04:00
Andres Freund bdd5726c34 Add the capability to display summary statistics to pg_xlogdump.
The new --stats/--stats=record options to pg_xlogdump display per
rmgr/per record statistics about the parsed WAL. This is useful to
understand what the WAL primarily consists of, to allow targeted
optimizations on application, configuration, and core code level.

It is likely that we will want to fine tune the statistics further,
but the feature already is quite helpful.

Author: Abhijit Menon-Sen, slightly editorialized by me
Reviewed-By: Andres Freund, Dilip Kumar and Furuya Osamu
Discussion: 20140604104716.GA3989@toroid.org
2014-09-19 16:33:16 +02:00
Andres Freund 728f152e07 Add rmgr callback to name xlog record types for display purposes.
This is primarily useful for the upcoming pg_xlogdump --stats feature,
but also allows to remove some duplicated code in the rmgr_desc
routines.

Due to the separation and harmonization, the output of dipsplayed
records changes somewhat. But since this isn't enduser oriented
content that's ok.

It's potentially desirable to further change pg_xlogdump's display of
records. It previously wasn't possible to show the record type
separately from the description forcing it to be in the last
column. But that's better done in a separate commit.

Author: Abhijit Menon-Sen, slightly editorialized by me
Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas
Discussion: 20140604104716.GA3989@toroid.org
2014-09-19 16:20:29 +02:00
Bruce Momjian c3c75fcd7a pg_upgrade: adjust C comments 2014-09-11 18:44:00 -04:00
Heikki Linnakangas 01a2bfd172 Fix Windows build.
I renamed a variable, but missed an #ifdef WIN32 block.
2014-09-11 15:15:40 +03:00
Heikki Linnakangas 54a2d5b37b Simplify calculation of Poisson distributed delays in pgbench --rate mode.
The previous coding first generated a uniform random value between 0.0 and
1.0, then converted that to an integer between 1 and 10000, and divided that
again by 10000. Those conversions are unnecessary; we can use the double
value that pg_erand48() returns directly. While we're at it, put the logic
into a helper function, getPoissonRand().

The largest delay generated by the old coding was about 9.2 times the
average, because of the way the uniformly distributed value used for the
calculation was truncated to 1/10000 granularity. The new coding doesn't
have such clamping. With my laptop's DBL_MIN value, the maximum delay with
the new coding is about 700x the average. That seems acceptable - any
reasonable pgbench session should last long enough to average that out.

Backpatch to 9.4.
2014-09-11 13:00:48 +03:00
Heikki Linnakangas 02e3bcc661 Change the way latency is calculated with pgbench --rate option.
The reported latency values now include the "schedule lag" time, that is,
the time between the transaction's scheduled start time and the time it
actually started. This relates better to a model where requests arrive at a
certain rate, and we are interested in the response time to the end user or
application, rather than the response time of the database itself.

Also, when --rate is used, include the schedule lag time in the log output.

The --rate option is new in 9.4, so backpatch to 9.4. It seems better to
make this change in 9.4, while we're still in the beta period, than ship a
9.4 version that calculates the values differently than 9.5.
2014-09-11 12:57:32 +03:00
Bruce Momjian acc8e41681 pg_upgrade: compare control version, not catalog version
Also modify test for the possibility the large object value might not
exist in the old cluster.

Fix for commit e1598a15f4
2014-09-10 20:22:10 -04:00
Bruce Momjian e1598a15f4 pg_upgrade: check for large object size compatibility 2014-09-10 19:23:36 -04:00
Peter Eisentraut 220bb39dee doc: Reflect renaming of Mac OS X to OS X
bug #10528
2014-09-09 13:56:29 -04:00
Bruce Momjian a74a4aa23b pg_upgrade: preserve the timestamp epoch
This is useful for replication tools like Slony and Skytools.

Report by Sergey Konoplev
2014-09-05 19:19:41 -04:00
Peter Eisentraut 303f4d1012 Assorted message fixes and improvements 2014-09-05 01:25:27 -04:00
Andres Freund d6fa44fce7 Add skip-empty-xacts option to test_decoding for use in the regression tests.
The regression tests for contrib/test_decoding regularly failed on
postgres instances that were very slow. Either because the hardware
itself was slow or because very expensive debugging options like
CLOBBER_CACHE_ALWAYS were used.

The reason they failed was just that some additional transactions were
decoded. Analyze and vacuum, triggered by autovac.

To fix just add a option to test_decoding to only display transactions
in which a change was actually displayed. That's not pretty because it
removes information from the tests; but better than constantly failing
tests in very likely harmless ways.

Backpatch to 9.4 where logical decoding was introduced.

Discussion: 20140629142511.GA26930@awork2.anarazel.de
2014-09-01 15:59:44 +02:00
Andres Freund 4b4b680c3d Make backend local tracking of buffer pins memory efficient.
Since the dawn of time (aka Postgres95) multiple pins of the same
buffer by one backend have been optimized not to modify the shared
refcount more than once. This optimization has always used a NBuffer
sized array in each backend keeping track of a backend's pins.

That array (PrivateRefCount) was one of the biggest per-backend memory
allocations, depending on the shared_buffers setting. Besides the
waste of memory it also has proven to be a performance bottleneck when
assertions are enabled as we make sure that there's no remaining pins
left at the end of transactions. Also, on servers with lots of memory
and a correspondingly high shared_buffers setting the amount of random
memory accesses can also lead to poor cpu cache efficiency.

Because of these reasons a backend's buffers pins are now kept track
of in a small statically sized array that overflows into a hash table
when necessary. Benchmarks have shown neutral to positive performance
results with considerably lower memory usage.

Patch by me, review by Robert Haas.

Discussion: 20140321182231.GA17111@alap3.anarazel.de
2014-08-30 14:03:21 +02:00
Tom Lane 7f7eec89b6 Fix citext upgrade script for disallowance of oidvector element assignment.
In commit 45e02e3232, we intentionally
disallowed updates on individual elements of oidvector columns.  While that
still seems like a sane idea in the abstract, we (I) forgot that citext's
"upgrade from unpackaged" script did in fact perform exactly such updates,
in order to fix the problem that citext indexes should have a collation
but would not in databases dumped or upgraded from pre-9.1 installations.

Even if we wanted to add casts to allow such updates, there's no practical
way to do so in the back branches, so the only real alternative is to make
citext's kluge even klugier.  In this patch, I cast the oidvector to text,
fix its contents with regexp_replace, and cast back to oidvector.  (Ugh!)

Since the aforementioned commit went into all active branches, we have to
fix this in all branches that contain the now-broken update script.

Per report from Eric Malm.
2014-08-28 18:21:05 -04:00
Peter Eisentraut 2d759341d9 Fix whitespace 2014-08-26 17:26:45 -04:00
Andres Freund 57ca1d4f01 Specify the port in dblink and postgres_fdw tests.
That allows to run those tests against a postmaster listening on a
nonstandard port without requiring to export PGPORT in postmaster's
environment.

This still doesn't support connecting to a nondefault host without
configuring it in postmaster's environment. That's harder and less
frequently used though. So this is a useful step.
2014-08-26 12:28:08 +02:00
Andres Freund ddc2504dbc Don't hardcode contrib_regression dbname in postgres_fdw and dblink tests.
That allows parallel installcheck to succeed inside contrib/. The
output is not particularly pretty unless make's -O option to
synchronize the output is used.

There's other tests, outside contrib, that use a hardcoded,
non-unique, database name. Those prohibit paralell installcheck to be
used across more directories; but that's something for a separate
patch.
2014-08-26 12:27:26 +02:00