Commit Graph

25242 Commits

Author SHA1 Message Date
Noah Misch c31305de5f Address ccvalid/ccnoinherit in TupleDesc support functions.
equalTupleDescs() neglected both of these ConstrCheck fields, and
CreateTupleDescCopyConstr() neglected ccnoinherit.  At this time, the
only known behavior defect resulting from these omissions is constraint
exclusion disregarding a CHECK constraint validated by an ALTER TABLE
VALIDATE CONSTRAINT statement issued earlier in the same transaction.
Back-patch to 9.2, where these fields were introduced.
2014-03-23 02:13:43 -04:00
Heikki Linnakangas 4c0e97c2d5 Fix thinkos in GinLogicValue enum.
It was incorrectly declared as global variable, not an enum type, and
the comments for GIN_FALSE and GIN_TRUE were backwards.
2014-03-21 23:41:37 +01:00
Heikki Linnakangas dea6ed2c98 Fix build with LWLOCK_STATS or dtrace.
Also fix the name of the dtrace probe for LWLockAcquireOrWait(). The
function was renamed from LWLockWaitUntilFree to LWLockAqcuireOrWait, but
the dtrace probe was neglected.

Pointed out by Andres Freund and the buildfarm.
2014-03-21 23:26:34 +01:00
Bruce Momjian 1494931d73 Remove MinGW readdir/errno bug workaround fixed on 2003-10-10 2014-03-21 13:47:37 -04:00
Bruce Momjian 6f03927fce Properly check for readdir/closedir() failures
Clear errno before calling readdir() and handle old MinGW errno bug
while adding full test coverage for readdir/closedir failures.

Backpatch through 8.4.
2014-03-21 13:45:11 -04:00
Heikki Linnakangas 68a2e52bba Replace the XLogInsert slots with regular LWLocks.
The special feature the XLogInsert slots had over regular LWLocks is the
insertingAt value that was updated atomically with releasing backends
waiting on it. Add new functions to the LWLock API to do that, and replace
the slots with LWLocks. This reduces the amount of duplicated code.
(There's still some duplication, but at least it's all in lwlock.c now.)

Reviewed by Andres Freund.
2014-03-21 15:10:48 +01:00
Tom Lane af930e606a Again fix initialization of auto-tuned effective_cache_size.
The previous method was overly complex and underly correct; in particular,
by assigning the default value with PGC_S_OVERRIDE, it prevented later
attempts to change the setting in postgresql.conf, as noted by Jeff Janes.
We should just assign the default value with source PGC_S_DYNAMIC_DEFAULT,
which will have the desired priority relative to the boot_val as well as
user-set values.

There is still a gap in this method: if there's an explicit assignment of
effective_cache_size = -1 in the postgresql.conf file, and that assignment
appears before shared_buffers is assigned, the code will substitute 4 times
the bootstrap default for shared_buffers, and that value will then persist
(since it will have source PGC_S_FILE).  I don't see any very nice way
to avoid that though, and it's not a case to be expected in practice.
The existing comments in guc-file.l look forward to a redesign of the
DYNAMIC_DEFAULT mechanism; if that ever happens, we should consider this
case as one of the things we'd like to improve.
2014-03-20 12:58:30 -04:00
Bruce Momjian a4c8f14364 libpq: pass a memory allocation failure error up to PQconndefaults()
Previously user name memory allocation failures were ignored and the
default user name set to NULL.
2014-03-20 11:48:31 -04:00
Alvaro Herrera f88d4cfc9d Setup error context callback for transaction lock waits
With this in place, a session blocking behind another one because of
tuple locks will get a context line mentioning the relation name, tuple
TID, and operation being done on tuple.  For example:

LOG:  process 11367 still waiting for ShareLock on transaction 717 after 1000.108 ms
DETAIL:  Process holding the lock: 11366. Wait queue: 11367.
CONTEXT:  while updating tuple (0,2) in relation "foo"
STATEMENT:  UPDATE foo SET value = 3;

Most usefully, the new line is displayed by log entries due to
log_lock_waits, although of course it will be printed by any other log
message as well.

Author: Christian Kruse, some tweaks by Álvaro Herrera
Reviewed-by: Amit Kapila, Andres Freund, Tom Lane, Robert Haas
2014-03-19 15:10:36 -03:00
Tom Lane ea8c7e9054 Fix memory leak during regular expression execution.
For a regex containing backrefs, pg_regexec() might fail to free all the
sub-DFAs that were created during execution, resulting in a permanent
(session lifespan) memory leak.  Problem was introduced by me in commit
587359479a.  Per report from Sandro Santilli;
diagnosis by Greg Stark.
2014-03-19 11:09:24 -04:00
Fujii Masao fb1d92a9fa Some minor improvements to logical decoding document.
Also improve help message in pg_recvlogical.
2014-03-19 22:13:05 +09:00
Heikki Linnakangas 59a5ab3f42 Remove rm_safe_restartpoint machinery.
It is no longer used, none of the resource managers have multi-record
actions that would make it unsafe to perform a restartpoint.

Also don't allow rm_cleanup to write WAL records, it's also no longer
required. Move the call to rm_cleanup routines to make it more symmetric
with rm_startup.
2014-03-18 22:10:35 +02:00
Heikki Linnakangas 1d3b258cbe Fix misc typos in comments. 2014-03-18 21:05:18 +02:00
Robert Haas a3b30d4cfe Fix uninitialized variable.
Report from Andres Freund, but not his fix.
2014-03-18 14:54:35 -04:00
Heikki Linnakangas 40dae7ec53 Make the handling of interrupted B-tree page splits more robust.
Splitting a page consists of two separate steps: splitting the child page,
and inserting the downlink for the new right page to the parent. Previously,
we handled the case that you crash in between those steps with a cleanup
routine after the WAL recovery had finished, which finished the incomplete
split. However, that doesn't help if the page split is interrupted but the
database doesn't crash, so that you don't perform WAL recovery. That could
happen for example if you run out of disk space.

Remove the end-of-recovery cleanup step. Instead, when a page is split, the
left page is marked with a new INCOMPLETE_SPLIT flag, and when the downlink
is inserted to the parent, the flag is cleared again. If an insertion sees
a page with the flag set, it knows that the split was interrupted for some
reason, and inserts the missing downlink before proceeding.

I used the same approach to fix GIN and GiST split algorithms earlier. This
was the last WAL cleanup routine, so we could get rid of that whole
machinery now, but I'll leave that for a separate patch.

Reviewed by Peter Geoghegan.
2014-03-18 20:50:44 +02:00
Robert Haas 8bdd12bbf0 Add pg_recvlogical, a tool to receive data logical decoding data.
This is fairly basic at the moment, but it's at least useful for
testing and debugging, and possibly more.

Andres Freund
2014-03-18 12:25:14 -04:00
Robert Haas 250f8a7bbe Rewrite comment for shm_mq_receive_bytes.
The comment and the code diverged at some point before the initial
commit of this feature, and I failed to notice.

Noted by Tom Lane.
2014-03-18 11:53:28 -04:00
Tom Lane f7271c4427 Fix relcache reference leak in refresh_by_match_merge().
One path through the loop over indexes forgot to do index_close().  Rather
than adding a fourth call, restructure slightly so that there's only one.

In passing, get rid of an unnecessary syscache lookup: the pg_index struct
for the index is already available from its relcache entry.

Per report from YAMAMOTO Takashi, though this is a bit different from his
suggested patch.  This is new code in HEAD, so no need for back-patch.
2014-03-18 11:36:53 -04:00
Robert Haas 3bd261ca18 Improve shm_mq portability around MAXIMUM_ALIGNOF and sizeof(Size).
Revise the original decision to expose a uint64-based interface and
use Size everywhere possible.  Avoid assuming that MAXIMUM_ALIGNOF is
8, or making any assumption about the relationship between that value
and sizeof(Size).  If MAXIMUM_ALIGNOF is bigger, we'll now insert
padding after the length word; if it's smaller, we are now prepared
to read and write the length word in chunks.

Per discussion with Tom Lane.
2014-03-18 11:23:13 -04:00
Tom Lane 19f2d6cdae Fix pg_dumpall option parsing: -i doesn't take an argument.
This used to work properly, but got fat-fingered in commit
3dee636e04.  Per bug #9620 from
Nicolas Payart.
2014-03-18 10:38:25 -04:00
Fujii Masao e726e59dc4 Fix help message and document in pg_receivexlog.
Add SLOTNAME placeholder to --slot option in help message and
document.
2014-03-18 21:15:45 +09:00
Robert Haas 79a4d24f31 Make it easy to detach completely from shared memory.
The new function dsm_detach_all() can be used either by postmaster
children that don't wish to take any risk of accidentally corrupting
shared memory; or by forked children of regular backends with
the same need.  This patch also updates the postmaster children that
already do PGSharedMemoryDetach() to do dsm_detach_all() as well.

Per discussion with Tom Lane.
2014-03-18 07:58:53 -04:00
Tom Lane d70cf811f7 During index build, check and elog (not just Assert) for broken HOT chain.
The recently-fixed bug in WAL replay could result in not finding a parent
tuple for a heap-only tuple.  The existing code would either Assert or
generate an invalid index entry, neither of which is desirable.  Throw a
regular error instead.
2014-03-17 12:36:11 -04:00
Heikki Linnakangas d663d4399e Fix thinko: have trueTriConsistentFn return GIN_TRUE.
While we're at it, also improve comments in ginlogic.c.
2014-03-17 17:29:04 +02:00
Fujii Masao 2bccced110 Fix typos in comments.
Thom Brown
2014-03-17 20:47:28 +09:00
Fujii Masao 5c6d9fc4b2 Fix bug in clean shutdown of walsender that pg_receiving is connecting to.
On clean shutdown, walsender waits for all WAL to be replicated to a standby,
and exits. It determined whether that replication had been completed by
checking whether its sent location had been equal to a standby's flush
location. Unfortunately this condition never becomes true when the standby
such as pg_receivexlog which always returns an invalid flush location is
connecting to walsender, and then walsender waits forever.

This commit changes walsender so that it just checks a standby's write
location if a flush location is invalid.

Back-patch to 9.1 where enough infrastructure for this exists.
2014-03-17 20:37:50 +09:00
Magnus Hagander 02703ff227 Fix small typo in comment
Michael Paquier
2014-03-17 09:09:21 +01:00
Alvaro Herrera bd1154edec plperl: Fix memory leak in hek2cstr
Backpatch all the way back to 9.1, where it was introduced by commit
50d89d42.

Reported by Sergey Burladyan in #9223
Author: Alex Hunsaker
2014-03-16 23:22:21 -03:00
Peter Eisentraut 2861e8e9cb Make punctuation consistent 2014-03-16 21:47:35 -04:00
Peter Eisentraut e2b959478c Fix whitespace 2014-03-16 21:47:35 -04:00
Tom Lane f4051e363c Fix advertised dispsize for libpq's sslmode connection parameter.
"8" was correct back when "disable" was the longest allowed value, but
since "verify-full" was added, it should be "12".  Given the lack of
complaints, I wouldn't be surprised if nobody is actually using these
values ... but still, if they're in the API, they should be right.

Noticed while pursuing a different problem.  It's been wrong for quite
a long time, so back-patch to all supported branches.
2014-03-16 21:43:40 -04:00
Magnus Hagander 0294023a6b Cleanups from the remove-native-krb5 patch
krb_srvname is actually not available anymore as a parameter server-side, since
with gssapi we accept all principals in our keytab. It's still used in libpq for
client side specification.

In passing remove declaration of krb_server_hostname, where all the functionality
was already removed.

Noted by Stephen Frost, though a different solution than his suggestion
2014-03-16 15:22:45 +01:00
Tom Lane aba7f56779 Update time zone data files to tzdata release 2014a.
DST law changes in Fiji, Turkey; historical changes in Israel, Ukraine.
2014-03-15 13:36:07 -04:00
Heikki Linnakangas efada2b8e9 Fix race condition in B-tree page deletion.
In short, we don't allow a page to be deleted if it's the rightmost child
of its parent, but that situation can change after we check for it.

Problem
-------

We check that the page to be deleted is not the rightmost child of its
parent, and then lock its left sibling, the page itself, its right sibling,
and the parent, in that order. However, if the parent page is split after
the check but before acquiring the locks, the target page might become the
rightmost child, if the split happens at the right place. That leads to an
error in vacuum (I reproduced this by setting a breakpoint in debugger):

ERROR:  failed to delete rightmost child 41 of block 3 in index "foo_pkey"

We currently re-check that the page is still the rightmost child, and throw
the above error if it's not. We could easily just give up rather than throw
an error, but that approach doesn't scale to half-dead pages. To recap,
although we don't normally allow deleting the rightmost child, if the page
is the *only* child of its parent, we delete the child page and mark the
parent page as half-dead in one atomic operation. But before we do that, we
check that the parent can later be deleted, by checking that it in turn is
not the rightmost child of the grandparent (potentially recursing all the
way up to the root). But the same situation can arise there - the
grandparent can be split while we're not holding the locks. We end up with
a half-dead page that we cannot delete.

To make things worse, the keyspace of the deleted page has already been
transferred to its right sibling. As the README points out, the keyspace at
the grandparent level is "out-of-whack" until the half-dead page is deleted,
and if enough tuples with keys in the transferred keyspace are inserted, the
page might get split and a downlink might be inserted into the grandparent
that is out-of-order. That might not cause any serious problem if it's
transient (as the README ponders), but is surely bad if it stays that way.

Solution
--------

This patch changes the page deletion algorithm to avoid that problem. After
checking that the topmost page in the chain of to-be-deleted pages is not
the rightmost child of its parent, and then deleting the pages from bottom
up, unlink the pages from top to bottom. This way, the intermediate stages
are similar to the intermediate stages in page splitting, and there is no
transient stage where the keyspace is "out-of-whack". The topmost page in
the to-be-deleted chain doesn't have a downlink pointing to it, like a page
split before the downlink has been inserted.

This also allows us to get rid of the cleanup step after WAL recovery, if we
crash during page deletion. The deletion will be continued at next VACUUM,
but the tree is consistent for searches and insertions at every step.

This bug is old, all supported versions are affected, but this patch is too
big to back-patch (and changes the WAL record formats of related records).
We have not heard any reports of the bug from users, so clearly it's not
easy to bump into. Maybe backpatch later, after this has had some field
testing.

Reviewed by Kevin Grittner and Peter Geoghegan.
2014-03-14 16:07:19 +02:00
Tom Lane 6c461cb92f Prevent interrupts while reporting non-ERROR elog messages.
This should eliminate the risk of recursive entry to syslog(3), which
appears to be the cause of the hang reported in bug #9551 from James
Morton.

Arguably, the real problem here is auth.c's willingness to turn on
ImmediateInterruptOK while executing fairly wide swaths of backend code.
We may well need to work at narrowing the code ranges in which the
authentication_timeout interrupt is enabled.  For the moment, though,
this is a cheap and reasonably noninvasive fix for a field-reported
failure; the other approach would be complex and not necessarily
bug-free itself.

Back-patch to all supported branches.
2014-03-13 20:59:42 -04:00
Tom Lane f70a78bc1f Allow psql to print COPY command status in more cases.
Previously, psql would print the "COPY nnn" command status only for COPY
commands executed server-side.  Now it will print that for frontend copies
too (including \copy).  However, we continue to suppress the command status
for COPY TO STDOUT, since in that case the copy data has been routed to the
same place that the command status would go, and there is a risk of the
status line being mistaken for another line of COPY data.  Doing that would
break existing scripts, and it doesn't seem worth the benefit --- this case
seems fairly analogous to SELECT, for which we also suppress the command
status.

Kumar Rajeev Rastogi, with substantial review by Amit Khandekar
2014-03-13 13:49:03 -04:00
Tom Lane 7bae0284ee Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted.  The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction.  Which
of course would usually be the point of receiving the message.  We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.

The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing.  But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.

Per bug #9557 from Marko Tiikkaja.  This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.

Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 12:02:54 -04:00
Bruce Momjian 242c2737fb C comments: remove odd blank lines after #ifdef WIN32 lines
A few more
2014-03-13 01:42:24 -04:00
Bruce Momjian 886c0be3f6 C comments: remove odd blank lines after #ifdef WIN32 lines 2014-03-13 01:34:42 -04:00
Heikki Linnakangas a3115f0d9e Only WAL-log the modified portion in an UPDATE, if possible.
When a row is updated, and the new tuple version is put on the same page as
the old one, only WAL-log the part of the new tuple that's not identical to
the old. This saves significantly on the amount of WAL that needs to be
written, in the common case that most fields are not modified.

Amit Kapila, with a lot of back and forth with me, Robert Haas, and others.
2014-03-12 23:28:36 +02:00
Heikki Linnakangas 17d787a3b1 Items on GIN data pages are no longer always 6 bytes; update gincostestimate.
Also improve the comments a bit.
2014-03-12 20:52:22 +02:00
Fujii Masao 588fb50715 Show PIDs of lock holders and waiters in log_lock_waits log message.
Christian Kruse, reviewed by Kumar Rajeev Rastogi.
2014-03-13 03:26:47 +09:00
Robert Haas 336a578b8c Fix incorrect assertion about historical snapshots.
Also fix some nearby comments.

Andres Freund
2014-03-12 14:07:41 -04:00
Robert Haas 890194f14d Comment fixes related to logical decoding.
Andres Freund, per complaints by Peter Eisentraut.
2014-03-12 14:03:09 -04:00
Heikki Linnakangas c5608ea26a Allow opclasses to provide tri-valued GIN consistent functions.
With the GIN "fast scan" feature, GIN can skip items without fetching all
the keys for them, if it can prove that they don't match regardless of
those keys. So far, it has done the proving by calling the boolean
consistent function with all combinations of TRUE/FALSE for the unfetched
keys, but since that's O(n^2), it becomes unfeasible with more than a few
keys. We can avoid calling consistent with all the combinations, if we can
tell the operator class implementation directly which keys are unknown.

This commit includes a triConsistent function for the built-in array and
tsvector opclasses.

Alexander Korotkov, with some changes by me.
2014-03-12 17:51:30 +02:00
Heikki Linnakangas fecfc2b913 In WAL replay, restore GIN metapage unconditionally to avoid torn page.
We don't take a full-page image of the GIN metapage; instead, the WAL record
contains all the information required to reconstruct it from scratch. But
to avoid torn page hazards, we must re-initialize it from the WAL record
every time, even if it already has a greater LSN, similar to how normal full
page images are restored.

This was highly unlikely to cause any problems in practice, because the GIN
metapage is small. We rely on an update smaller than a 512 byte disk sector
to be atomic elsewhere, at least in pg_control. But better safe than sorry,
and this would be easy to overlook if more fields are added to the metapage
so that it's no longer small.

Reported by Noah Misch. Backpatch to all supported versions.
2014-03-12 10:04:57 +02:00
Tom Lane e85a5ffba8 Fix tracking of psql script line numbers during \copy from another place.
Commit 08146775ac changed do_copy() to
temporarily scribble on pset.cur_cmd_source.  That was a mighty ugly bit of
code in any case, but in particular it broke handleCopyIn's ability to tell
whether it was reading from the current script source file (in which case
pset.lineno should be incremented for each line of COPY data), or from
someplace else (in which case it shouldn't).  The former case still worked,
the latter not so much.  The visible effect was that line numbers reported
for errors in a script file would be wrong if there were an earlier \copy
that was reading anything other than inline-in-the-script-file data.

To fix, introduce another pset field that holds the file do_copy wants the
COPY code to use.  This is a little bit ugly, but less so than passing the
file down explicitly through several layers that aren't COPY-specific.

Extracted from a larger patch by Kumar Rajeev Rastogi; that patch also
changes printing of COPY command tags, which is not a bug fix and shouldn't
get back-patched.  This particular idea was from a suggestion by Amit
Khandekar, if I'm reading the thread correctly.

Back-patch to 9.2 where the faulty code was introduced.
2014-03-10 15:47:40 -04:00
Robert Haas 8722017bbc Allow dynamic shared memory segments to be kept until shutdown.
Amit Kapila, reviewed by Kyotaro Horiguchi, with some further
changes by me.
2014-03-10 14:04:47 -04:00
Robert Haas 5a991ef869 Allow logical decoding via the walsender interface.
In order for this to work, walsenders need the optional ability to
connect to a database, so the "replication" keyword now allows true
or false, for backward-compatibility, and the new value "database"
(which causes the "dbname" parameter to be respected).

walsender needs to loop not only when idle but also when sending
decoded data to the user and when waiting for more xlog data to decode.
This means that there are now three separate loops inside walsender.c;
although some refactoring has been done here, this is still a bit ugly.

Andres Freund, with contributions from Álvaro Herrera, and further
review by me.
2014-03-10 13:50:28 -04:00
Robert Haas cb9a0c7987 Teach on_exit_reset() to discard pending cleanups for dsm.
If a postmaster child invokes fork() and then calls on_exit_reset, that
should be sufficient to let it exit() without breaking anything, but
dynamic shared memory broke that by not updating on_exit_reset() to
discard callbacks registered with dynamic shared memory segments.

Per investigation of a complaint from Tom Lane.
2014-03-10 10:17:19 -04:00
Simon Riggs 77049443a1 Correct copy/pasto in comment for REPLICA IDENTITY 2014-03-09 09:05:16 +00:00
Bruce Momjian 5024044a20 C comments: improve description of relfilenode uniqueness
Report by Antonin Houska
2014-03-08 12:20:30 -05:00
Bruce Momjian 11d205e2bd pg_ctl: improve handling of invalid data directory
Return '4' and report a meaningful error message when a non-existent or
invalid data directory is passed.  Previously, pg_ctl would just report
the server was not running.

Patch by me and Amit Kapila
Report from Peter Eisentraut
2014-03-08 12:15:25 -05:00
Tom Lane ea177a3ba7 Remove unportable use of anonymous unions from reorderbuffer.h.
In b89e151054 I had assumed it was ok to use anonymous unions as
struct members, but while a longstanding extension in many compilers,
it's only been standardized in C11.

To fix, remove one of the anonymous unions which tried to hide some
implementation specific enum values and give the other a name. The
latter unfortunately requires changes in output plugins, but since the
feature has only been added a few days ago...

Andres Freund
2014-03-07 17:03:26 -05:00
Bruce Momjian 91d9de9751 fix ReplicationSlotsCountDBSlots for dropping unrelated databases
YAMAMOTO Takashi
2014-03-07 11:42:18 -05:00
Heikki Linnakangas 55566c9a74 Fix dangling smgr_owner pointer when a fake relcache entry is freed.
A fake relcache entry can "own" a SmgrRelation object, like a regular
relcache entry. But when it was free'd, the owner field in SmgrRelation
was not cleared, so it was left pointing to free'd memory.

Amazingly this apparently hasn't caused crashes in practice, or we would've
heard about it earlier. Andres found this with Valgrind.

Report and fix by Andres Freund, with minor modifications by me. Backpatch
to all supported versions.
2014-03-07 13:28:52 +02:00
Heikki Linnakangas ad7b48ea08 Avoid memcpy() with same source and destination address.
The behavior of that is undefined, although unlikely to lead to problems in
practice.

Found by running regression tests with Valgrind.
2014-03-07 13:14:33 +02:00
Tom Lane 7c31874945 Avoid getting more than AccessShareLock when deparsing a query.
In make_ruledef and get_query_def, we have long used AcquireRewriteLocks
to ensure that the querytree we are about to deparse is up-to-date and
the schemas of the underlying relations aren't changing.  Howwever, that
function thinks the query is about to be executed, so it acquires locks
that are stronger than necessary for the purpose of deparsing.  Thus for
example, if pg_dump asks to deparse a rule that includes "INSERT INTO t",
we'd acquire RowExclusiveLock on t.  That results in interference with
concurrent transactions that might for example ask for ShareLock on t.
Since pg_dump is documented as being purely read-only, this is unexpected.
(Worse, it used to actually be read-only; this behavior dates back only
to 8.1, cf commit ba4200246.)

Fix this by adding a parameter to AcquireRewriteLocks to tell it whether
we want the "real" execution locks or only AccessShareLock.

Report, diagnosis, and patch by Dean Rasheed.  Back-patch to all supported
branches.
2014-03-06 19:31:05 -05:00
Heikki Linnakangas a0c2fa9b5c isdigit() needs an unsigned char argument.
Per the C standard, the routine should be passed an int, with a value that's
representable as an unsigned char or EOF. Passing a signed char is wrong,
because a negative value is not representable as an unsigned char.
Unfortunately no compiler warns about that.
2014-03-06 21:40:10 +02:00
Heikki Linnakangas 94ae6ba74d Send keepalives from walsender even when busy sending WAL.
If walsender doesn't hear from the client for the time specified by
wal_sender_timeout, it will conclude the connection or client is dead, and
disconnect. When half of wal_sender_timeout has elapsed, it sends a ping
to the client, leaving it the remainig half of wal_sender_timeout to
respond. However, it only checked if half of wal_sender_timeout had elapsed
when it was about to sleep, so if it was busy sending WAL to the client for
long enough, it would not send the ping request in time. Then the client
would not know it needs to send a reply, and the walsender will disconnect
even though the client is still alive. Fix that.

Andres Freund, reviewed by Robert Haas, and some further changes by me.
Backpatch to 9.3. Earlier versions relied on the client to send the
keepalives on its own, and hence didn't have this problem.
2014-03-06 21:38:51 +02:00
Tom Lane bf4052faa1 Don't reject ROW_MARK_REFERENCE rowmarks for materialized views.
We should allow this so that matviews can be referenced in UPDATE/DELETE
statements in READ COMMITTED isolation level.  The requirement for that
is that a re-fetch by TID will see the same row version the query saw
earlier, which is true of matviews, so there's no reason for the
restriction.  Per bug #9398.

Michael Paquier, after a suggestion by me
2014-03-06 11:37:02 -05:00
Bruce Momjian 0024a3a3b6 C comment update: relfilenode is only unique with a tablespace
Report from Antonin Houska
2014-03-05 20:52:34 -05:00
Bruce Momjian b44fc39fce pg_dump: make argument combination error exit code consistent
Per report from Pavel Golub
2014-03-05 18:15:49 -05:00
Tom Lane f1ba94bcd9 Fix portability issues in recently added make_timestamp/make_interval code.
Explicitly reject infinity/NaN inputs, rather than just assuming that
something else will do it for us.  Per buildfarm.

While at it, make some over-parenthesized and under-legible code
more readable.
2014-03-05 16:42:18 -05:00
Tom Lane 8cf0ad1ea3 Add comment that ec_relids excludes "child" EquivalenceClass members.
This was already documented a few lines further down, but the comment
just beside the field declaration could be misleading.  Per gripe
from Kyotaro Horiguchi.
2014-03-05 16:00:33 -05:00
Robert Haas 406a1a9ef0 Fix some typos introduced by the logical decoding patch.
Erik Rijkers
2014-03-05 13:00:22 -05:00
Tom Lane 114b26c06f Remove unused field "evttype".
Apparent oversight in commit 3855968f.
2014-03-05 11:57:53 -05:00
Alvaro Herrera 2b4f2ab33d Remove the correct pgstat file on DROP DATABASE
We were unlinking the permanent file, not the non-permanent one.  But
since the stat collector already unlinks all permanent files on startup,
there was nothing for it to unlink.  The non-permanent file remained in
place, and was copied to the permanent directory on shutdown, so in
effect no file was ever dropped.

Backpatch to 9.3, where the issue was introduced by commit 187492b6c2.
Before that, there were no per-database files and thus no file to drop
on DROP DATABASE.

Per report from Thom Brown.

Author: Tomáš Vondra
2014-03-05 13:03:29 -03:00
Stephen Frost dd917bb793 Allocate fresh memory for post_opts/exec_path
Instead of having read_post_opts() depend on the memory allocated for
the config file (which is now getting free'd), pg_strdup() for
post_opts and exec_path (similar to how it's being done elsewhere).

Noted by Thom Brown.
2014-03-05 08:50:12 -05:00
Heikki Linnakangas 956685f82b Do wal_level and hot standby checks when doing crash-then-archive recovery.
CheckRequiredParameterValues() should perform the checks if archive recovery
was requested, even if we are going to perform crash recovery first.

Reported by Kyotaro HORIGUCHI. Backpatch to 9.2, like the crash-then-archive
recovery mode.
2014-03-05 14:48:14 +02:00
Heikki Linnakangas af246c37c0 Fix lastReplayedEndRecPtr calculation when starting from shutdown checkpoint.
When entering crash recovery followed by archive recovery, and the latest
checkpoint is a shutdown checkpoint, and there are no more WAL records to
replay before transitioning from crash to archive recovery, we would not
immediately allow read-only connections in hot standby mode even if we
could. That's because when starting from a shutdown checkpoint, we set
lastReplayedEndRecPtr incorrectly to the record before the checkpoint
record, instead of the checkpoint record itself. We don't run the redo
routine of the shutdown checkpoint record, but starting recovery from it
goes through the same motions, so it should be considered as replayed.

Reported by Kyotaro HORIGUCHI. All versions with hot standby are affected,
so backpatch to 9.0.
2014-03-05 13:51:19 +02:00
Stephen Frost eb933162cd Fix issues with pg_ctl
The new, small, free_readfile managed to have bug in it which could
cause it to try and free something it shouldn't, and fix the case
where it was being called with an invalid pointer leading to a
segfault.

Noted by Bruce, issues introduced and fixed by me.
2014-03-05 01:30:03 -05:00
Andrew Dunstan 3b5e03dca2 Provide a FORCE NULL option to COPY in CSV mode.
This forces an input field containing the quoted null string to be
returned as a NULL. Without this option, only unquoted null strings
behave this way. This helps where some CSV producers insist on quoting
every field, whether or not it is needed. The option takes a list of
fields, and only applies to those columns. There is an equivalent
column-level option added to file_fdw.

Ian Barwick, with some tweaking by Andrew Dunstan, reviewed by Payal
Singh.
2014-03-04 17:31:59 -05:00
Alvaro Herrera 84df54b22e Constructors for interval, timestamp, timestamptz
Author: Pavel Stěhule, editorialized somewhat by Álvaro Herrera
Reviewed-by: Tomáš Vondra, Marko Tiikkaja
With input from Fabrízio de Royes Mello, Jim Nasby
2014-03-04 15:09:43 -03:00
Robert Haas af2543e884 Allow VACUUM FULL/CLUSTER to bump freeze horizons even for pg_class.
pg_class is a special case for CLUSTER and VACUUM FULL, so although
commit 3cff1879f8 caused these
operations to advance relfrozenxid and relminmxid for all other
tables, it did not provide the same benefit for pg_class.  This
plugs that gap.

Andres Freund
2014-03-04 11:08:18 -05:00
Robert Haas 7e8db2dc42 Minor corrections to logical decoding patch. 2014-03-04 11:07:54 -05:00
Heikki Linnakangas 7558cc95d3 Error out on send failure in walsender loop.
I changed the loop in 9.3 to use "goto send_failure" instead of "break" on
errors, but I missed this one case. It was a relatively harmless bug: if
the flush fails once it will most likely fail again as soon as we try to
flush the output again. But it's a bug nevertheless.

Report and fix by Andres Freund.
2014-03-04 15:36:05 +02:00
Robert Haas b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00
Peter Eisentraut de94b47c0a Fix whitespace 2014-03-03 14:05:33 -05:00
Heikki Linnakangas f8ce16d0d2 Rename huge_tlb_pages to huge_pages, and improve docs.
Christian Kruse
2014-03-03 20:52:48 +02:00
Alvaro Herrera 9067310cc5 pg_dump et al: Add --if-exists option
This option makes pg_dump, pg_dumpall and pg_restore inject an IF EXISTS
clause to each DROP command they emit.  (In pg_dumpall, the clause is
not added to individual objects drops, but rather to the CREATE DATABASE
commands, as well as CREATE ROLE and CREATE TABLESPACE.)

This allows for a better user dump experience when using --clean in case
some objects do not already exist.  Per bug #7873 by Dave Rolsky.

Author: Pavel Stěhule
Reviewed-by: Jeevan Chalke, Álvaro Herrera, Josh Kupershmidt
2014-03-03 15:02:18 -03:00
Robert Haas 34c6d9611d Use a longer buffer in libpqrcv_startstreaming.
Because of the new SLOT clause in the START_REPLICATION command, it's
possible for the command to end up too long for the old maximum buffer
length.

Andres Freund
2014-03-03 07:24:52 -05:00
Robert Haas a8e9b86b5e Bump catversion.
The previous patch should have entailed a catversion bump, but I
forgot.
2014-03-03 07:22:20 -05:00
Robert Haas d83ee62231 Corrections to replication slots code and documentation.
Andres Freund, per a report from Vik Faering
2014-03-03 07:16:54 -05:00
Robert Haas ae95f5f74a Define LSNOID in pg_type.h.
Most other built-in types have a similarly-named constant, so this
type should probably have one, too.

Michael Paquier
2014-03-03 07:03:41 -05:00
Stephen Frost 5592ebac55 Another round of Coverity fixes
Additional non-security issues/improvements spotted by Coverity.

In backend/libpq, no sense trying to protect against port->hba being
NULL after we've already dereferenced it in the switch() statement.

Prevent against possible overflow due to 32bit arithmitic in
basebackup throttling (not yet released, so no security concern).

Remove nonsensical check of array pointer against NULL in procarray.c,
looks to be a holdover from 9.1 and earlier when there were pointers
being used but now it's just an array.

Remove pointer check-against-NULL in tsearch/spell.c as we had already
dereferenced it above (in the strcmp()).

Remove dead code from adt/orderedsetaggs.c, isnull is checked
immediately after each tuplesort_getdatum() call and if true we return,
so no point checking it again down at the bottom.

Remove recently added minor error-condition memory leak in pg_regress.
2014-03-03 03:18:51 -05:00
Stephen Frost b1aebbb6a8 Various Coverity-spotted fixes
A number of issues were identified by the Coverity scanner and are
addressed in this patch.  None of these appear to be security issues
and many are mostly cosmetic changes.

Short comments for each of the changes follows.

Correct the semi-colon placement in be-secure.c regarding SSL retries.
Remove a useless comparison-to-NULL in proc.c (value is dereferenced
  prior to this check and therefore can't be NULL).
Add checking of chmod() return values to initdb.
Fix a couple minor memory leaks in initdb.
Fix memory leak in pg_ctl- involves free'ing the config file contents.
Use an int to capture fgetc() return instead of an enum in pg_dump.
Fix minor memory leaks in pg_dump.
  (note minor change to convertOperatorReference()'s API)
Check fclose()/remove() return codes in psql.
Check fstat(), find_my_exec() return codes in psql.
Various ECPG memory leak fixes.
Check find_my_exec() return in ECPG.
Explicitly ignore pqFlush return in libpq error-path.
Change PQfnumber() to avoid doing an strdup() when no changes required.
Remove a few useless check-against-NULL's (value deref'd beforehand).
Check rmtree(), malloc() results in pg_regress.
Also check get_alternative_expectfile() return in pg_regress.
2014-03-01 22:14:14 -05:00
Tom Lane 9662143f0c Allow regex operations to be terminated early by query cancel requests.
The regex code didn't have any provision for query cancel; which is
unsurprising given its non-Postgres origin, but still problematic since
some operations can take a long time.  Introduce a callback function to
check for a pending query cancel or session termination request, and
call it in a couple of strategic spots where we can make the regex code
exit with an error indicator.

If we ever actually split out the regex code as a standalone library,
some additional work will be needed to let the cancel callback function
be specified externally to the library.  But that's straightforward
(certainly so by comparison to putting the locale-dependent character
classification logic on a similar arms-length basis), and there seems
no need to do it right now.

A bigger issue is that there may be more places than these two where
we need to check for cancels.  We can always add more checks later,
now that the infrastructure is in place.

Since there are known examples of not-terribly-long regexes that can
lock up a backend for a long time, back-patch to all supported branches.
I have hopes of fixing the known performance problems later, but adding
query cancel ability seems like a good idea even if they were all fixed.
2014-03-01 15:20:56 -05:00
Heikki Linnakangas d8a42b150f Remove bogus while-loop.
Commit abf5c5c9a4 added a bogus while-
statement after the for(;;)-loop. It went unnoticed in testing, because
it was dead code.

Report by KONDO Mitsumasa. Backpatch to 9.3. The commit that introduced
this was also applied to 9.2, but not the bogus while-loop part, because
the code in 9.2 looks quite different.
2014-02-28 13:33:41 +02:00
Alvaro Herrera ef5856fd9b Allow BASE_BACKUP to be throttled
A new MAX_RATE option allows imposing a limit to the network transfer
rate from the server side.  This is useful to limit the stress that
taking a base backup has on the server.

pg_basebackup is now able to specify a value to the server, too.

Author: Antonin Houska

Patch reviewed by Stefan Radomski, Andres Freund, Zoltán Böszörményi,
Fujii Masao, and Álvaro Herrera.
2014-02-27 18:55:57 -03:00
Alvaro Herrera 6bfa88acd3 Fix WAL replay of locking an updated tuple
We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
WAL replay of a tuple-lock operation, which is incorrect when the tuple
is already updated.

Back-patch to 9.3.  The clearing of both header elements was there
previously, but since no update could be present on a tuple that was
being locked, it was harmless.

Bug reported by Peter Geoghegan and Greg Stark in
CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com and
CAM-w4HPTOeMT4KP0OJK+mGgzgcTOtLRTvFZyvD0O4aH-7dxo3Q@mail.gmail.com
respectively; diagnosis by Andres Freund.
2014-02-27 11:13:39 -03:00
Heikki Linnakangas 00976f202c btbuild no longer calls _bt_doinsert(), update comment.
Peter Geoghegan
2014-02-26 18:49:04 +02:00
Jeff Davis 486ea0b19e Fix crash in json_to_record().
json_to_record() depends on get_call_result_type() for the tuple
descriptor of the record that should be returned, but in some cases
that cannot be determined. Add a guard to check if the tuple
descriptor has been properly resolved, similar to other callers of
get_call_result_type().

Also add guard for two other callers of get_call_result_type() in
jsonfuncs.c. Although json_to_record() is the only actual bug, it's a
good idea to follow convention.
2014-02-26 07:47:41 -08:00
Tom Lane fccebe421d Use SnapshotDirty rather than an active snapshot to probe index endpoints.
If there are lots of uncommitted tuples at the end of the index range,
get_actual_variable_range() ends up fetching each one and doing an MVCC
visibility check on it, until it finally hits a visible tuple.  This is
bad enough in isolation, considering that we don't need an exact answer
only an approximate one.  But because the tuples are not yet committed,
each visibility check does a TransactionIdIsInProgress() test, which
involves scanning the ProcArray.  When multiple sessions do this
concurrently, the ensuing contention results in horrid performance loss.
20X overall throughput loss on not-too-complicated queries is easy to
demonstrate in the back branches (though someone's made it noticeably
less bad in HEAD).

We can dodge the problem fairly effectively by using SnapshotDirty rather
than a normal MVCC snapshot.  This will cause the index probe to take
uncommitted tuples as good, so that we incur only one tuple fetch and test
even if there are many such tuples.  The extent to which this degrades the
estimate is debatable: it's possible the result is actually a more accurate
prediction than before, if the endmost tuple has become committed by the
time we actually execute the query being planned.  In any case, it's not
very likely that it makes the estimate a lot worse.

SnapshotDirty will still reject tuples that are known committed dead, so
we won't give bogus answers if an invalid outlier has been deleted but not
yet vacuumed from the index.  (Because btrees know how to mark such tuples
dead in the index, we shouldn't have a big performance problem in the case
that there are many of them at the end of the range.)  This consideration
motivates not using SnapshotAny, which was also considered as a fix.

Note: the back branches were using SnapshotNow instead of an MVCC snapshot,
but the problem and solution are the same.

Per performance complaints from Bartlomiej Romanski, Josh Berkus, and
others.  Back-patch to 9.0, where the issue was introduced (by commit
40608e7f94).
2014-02-25 16:04:06 -05:00
Robert Haas cf6aa68bbd Update a few comments to mention materialized views.
Etsuro Fujita
2014-02-25 13:40:12 -05:00
Robert Haas dd1a3bccca Show xid and xmin in pg_stat_activity and pg_stat_replication.
Christian Kruse, reviewed by Andres Freund and myself, with further
minor adjustments by me.
2014-02-25 12:34:04 -05:00
Robert Haas 278c94209b pg_basebackup: Skip only the *contents* of pg_replslot.
Include the directory itself.

Fujii Masao
2014-02-25 11:23:45 -05:00
Peter Eisentraut 32001ab0b7 Update and clarify ssl_ciphers default
- Write HIGH:MEDIUM instead of DEFAULT:!LOW:!EXP for clarity.
- Order 3DES last to work around inappropriate OpenSSL default.
- Remove !MD5 and @STRENGTH, because they are irrelevant.
- Add clarifying documentation.

Effectively, the new default is almost the same as the old one, but it
is arguably easier to understand and modify.

Author: Marko Kreen <markokr@gmail.com>
2014-02-24 20:30:28 -05:00
Bruce Momjian 848ae330a4 Increase work_mem and maintenance_work_mem defaults by 4x
New defaults are 4MB and 64MB.
2014-02-24 13:04:51 -05:00
Bruce Momjian 4bad548d98 psql: add separate \d display for disabled system triggers
Previously if you disabled all triggers, only user triggers would
show as disabled

Per report from Andres Freund
2014-02-24 12:44:55 -05:00
Bruce Momjian d613861b95 pg_dump: fix subtle memory leak in func and arg signature processing 2014-02-24 12:32:41 -05:00
Bruce Momjian 423f69ab64 Allow single-point polygons to be converted to circles
This allows finding the center of a single-point polygon and converting
it to a point.

Per report from Josef Grahn
2014-02-24 12:24:00 -05:00
Bruce Momjian 8457d0beca docs: document behavior of CHAR() comparisons with chars < space
Space trimming rather than space-padding causes unusual behavior, which
might not be standards-compliant.

Also remove recently-added now-redundant C comment.
2014-02-24 12:09:23 -05:00
Robert Haas 6615e77439 Use pg_lsn data type in pg_stat_replication, too.
Michael Paquier, per a suggestion from Andres Freund
2014-02-24 10:38:45 -05:00
Robert Haas bb818b53d4 Remove a couple of comments from the pg_lsn regression test.
Previously, one of these was a negative test case, but that got
changed along the way and the comments didn't get the memo.

Michael Paquier
2014-02-24 09:32:21 -05:00
Tom Lane 769065c1b2 Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion.
A large majority of the callers of pg_do_encoding_conversion were
specifying the database encoding as either source or target of the
conversion, meaning that we can use the less general functions
pg_any_to_server/pg_server_to_any instead.

The main advantage of using the latter functions is that they can make use
of a cached conversion-function lookup in the common case that the other
encoding is the current client_encoding.  It's notationally cleaner too in
most cases, not least because of the historical artifact that the latter
functions use "char *" rather than "unsigned char *" in their APIs.

Note that pg_any_to_server will apply an encoding verification step in
some cases where pg_do_encoding_conversion would have just done nothing.
This seems to me to be a good idea at most of these call sites, though
it partially negates the performance benefit.

Per discussion of bug #9210.
2014-02-23 16:59:05 -05:00
Tom Lane 49c817eab7 Plug some more holes in encoding conversion.
Various places assume that pg_do_encoding_conversion() and
pg_server_to_any() will ensure encoding validity of their results;
but they failed to do so in the case that the source encoding is SQL_ASCII
while the destination is not.  We cannot perform any actual "conversion"
in that scenario, but we should still validate the string according to the
destination encoding.  Per bug #9210 from Digoal Zhou.

Arguably this is a back-patchable bug fix, but on the other hand adding
more enforcing of encoding checks might break existing applications that
were being sloppy.  On balance there doesn't seem to be much enthusiasm
for a back-patch, so fix in HEAD only.

While at it, remove some apparently-no-longer-needed provisions for
letting pg_do_encoding_conversion() "work" outside a transaction ---
if you consider it "working" to silently fail to do the requested
conversion.

Also, make a few cosmetic improvements in mbutils.c, notably removing
some Asserts that are certainly dead code since the variables they
assert aren't null are never null, even at process start.  (I think
this wasn't true at one time, but it is now.)
2014-02-23 15:22:50 -05:00
Peter Eisentraut fb05f3ce83 pg_basebackup: Add support for relocating tablespaces
Tablespaces can be relocated in plain backup mode by specifying one or
more -T olddir=newdir options.

Author: Steeve Lennmark <steevel@handeldsbanken.se>
Reviewed-by: Peter Eisentraut <peter_e@gmx.net>
2014-02-22 13:38:06 -05:00
Tom Lane 77585bce03 Do ScalarArrayOp estimation correctly when array is a stable expression.
Most estimation functions apply estimate_expression_value to see if they
can reduce an expression to a constant; the key difference is that it
allows evaluation of stable as well as immutable functions in hopes of
ending up with a simple Const node.  scalararraysel didn't get the memo
though, and neither did gincost_opexpr/gincost_scalararrayopexpr.  Fix
that, and remove a now-unnecessary estimate_expression_value step in the
subsidiary function scalararraysel_containment.

Per complaint from Alexey Klyukin.  Back-patch to 9.3.  The problem
goes back further, but I'm hesitant to change estimation behavior in
long-stable release branches.
2014-02-21 17:10:46 -05:00
Heikki Linnakangas 8f09ca436d Improve comment on setting data_checksum GUC.
There was an extra space there, and "fixed" wasn't very descriptive.
2014-02-20 10:58:30 +02:00
Tom Lane ae5266f259 Remove inappropriate EXPORTS line.
Looks like this gets added later ...
2014-02-19 21:08:50 -05:00
Tom Lane 4f5f485d10 Avoid using dllwrap to build pgevent in Mingw builds.
If this works, we can get rid of configure's support for locating dllwrap
... but let's see what the buildfarm says, first.

Hiroshi Inoue
2014-02-19 19:34:50 -05:00
Tom Lane 52acfd27f1 Fix some missing .gitignore and "make clean" items in ecpg.
Some of the files we optionally link in from elsewhere weren't ignored
and/or weren't cleaned up at "make clean".  Noted while testing on a
machine that needs our version of snprintf.c.
2014-02-19 18:50:48 -05:00
Robert Haas 6f289c2b7d Switch various builtin functions to use pg_lsn instead of text.
The functions in slotfuncs.c don't exist in any released version,
but the changes to xlogfuncs.c represent backward-incompatibilities.
Per discussion, we're hoping that the queries using these functions
are few enough and simple enough that this won't cause too much
breakage for users.

Michael Paquier, reviewed by Andres Freund and further modified
by me.
2014-02-19 11:37:43 -05:00
Robert Haas 694e3d139a Further code review for pg_lsn data type.
Change input function error messages to be more consistent with what is
done elsewhere.  Remove a bunch of redundant type casts, so that the
compiler will warn us if we screw up.  Don't pass LSNs by value on
platforms where a Datum is only 32 bytes, per buildfarm.  Move macros
for packing and unpacking LSNs to pg_lsn.h so that we can include
access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.
2014-02-19 10:06:59 -05:00
Robert Haas 844a28a9dd pg_lsn macro naming and type behavior revisions.
Change pg_lsn_mi so that it can return negative values when subtracting
LSNs, and clean up some perhaps ill-considered macro names.
2014-02-19 09:34:15 -05:00
Robert Haas 7d03a83f4d Add a pg_lsn data type, to represent an LSN.
Robert Haas and Michael Paquier
2014-02-19 08:35:23 -05:00
Tom Lane a222f7fda6 Remove broken code that tried to handle OVERLAPS with a single argument.
The SQL standard says that OVERLAPS should have a two-element row
constructor on each side.  The original coding of OVERLAPS support in
our grammar attempted to extend that by allowing a single-element row
constructor, which it internally duplicated ... or tried to, anyway.
But that code has certainly not worked since our List infrastructure was
rewritten in 2004, and I'm none too sure it worked before that.  As it
stands, it ends up building a List that includes itself, leading to
assorted undesirable behaviors later in the parser.

Even if it worked as intended, it'd be a bit evil because of the
possibility of duplicate evaluation of a volatile function that the user
had written only once.  Given the lack of documentation, test cases, or
complaints, let's just get rid of the idea and only support the standard
syntax.

While we're at it, improve the error cursor positioning for the
wrong-number-of-arguments errors, and inline the makeOverlaps() function
since it's only called in one place anyway.

Per bug #9227 from Joshua Yanovski.  Initial patch by Joshua Yanovski,
extended a bit by me.
2014-02-18 12:44:20 -05:00
Magnus Hagander 7f3e17b482 Disable RandomizedBaseAddress on MSVC builds
The ASLR in Windows 8/Windows 2012 can break PostgreSQL's shared memory. It
doesn't fail every time (which is explained by the Random part in ASLR), but
can fail with errors abut failing to reserve shared memory region.

MauMau, reviewed by Craig Ringer
2014-02-18 14:45:58 +01:00
Heikki Linnakangas 057152b37c Fix comment; checkpointer, not bgwriter, performs checkpoints since 9.2.
Amit Langote
2014-02-18 09:48:18 +02:00
Robert Haas 876f78d575 Fix capitalization in README.
Vik Fearing
2014-02-17 14:03:41 -05:00
Tom Lane 01824385ae Prevent potential overruns of fixed-size buffers.
Coverity identified a number of places in which it couldn't prove that a
string being copied into a fixed-size buffer would fit.  We believe that
most, perhaps all of these are in fact safe, or are copying data that is
coming from a trusted source so that any overrun is not really a security
issue.  Nonetheless it seems prudent to forestall any risk by using
strlcpy() and similar functions.

Fixes by Peter Eisentraut and Jozef Mlich based on Coverity reports.

In addition, fix a potential null-pointer-dereference crash in
contrib/chkpass.  The crypt(3) function is defined to return NULL on
failure, but chkpass.c didn't check for that before using the result.
The main practical case in which this could be an issue is if libc is
configured to refuse to execute unapproved hashing algorithms (e.g.,
"FIPS mode").  This ideally should've been a separate commit, but
since it touches code adjacent to one of the buffer overrun changes,
I included it in this commit to avoid last-minute merge issues.
This issue was reported by Honza Horak.

Security: CVE-2014-0065 for buffer overruns, CVE-2014-0066 for crypt()
2014-02-17 11:20:21 -05:00
Noah Misch 31400a6733 Predict integer overflow to avoid buffer overruns.
Several functions, mostly type input functions, calculated an allocation
size such that the calculation wrapped to a small positive value when
arguments implied a sufficiently-large requirement.  Writes past the end
of the inadvertent small allocation followed shortly thereafter.
Coverity identified the path_in() vulnerability; code inspection led to
the rest.  In passing, add check_stack_depth() to prevent stack overflow
in related functions.

Back-patch to 8.4 (all supported versions).  The non-comment hstore
changes touch code that did not exist in 8.4, so that part stops at 9.0.

Noah Misch and Heikki Linnakangas, reviewed by Tom Lane.

Security: CVE-2014-0064
2014-02-17 09:33:31 -05:00
Noah Misch 4318daecc9 Fix handling of wide datetime input/output.
Many server functions use the MAXDATELEN constant to size a buffer for
parsing or displaying a datetime value.  It was much too small for the
longest possible interval output and slightly too small for certain
valid timestamp input, particularly input with a long timezone name.
The long input was rejected needlessly; the long output caused
interval_out() to overrun its buffer.  ECPG's pgtypes library has a copy
of the vulnerable functions, which bore the same vulnerabilities along
with some of its own.  In contrast to the server, certain long inputs
caused stack overflow rather than failing cleanly.  Back-patch to 8.4
(all supported versions).

Reported by Daniel Schüssler, reviewed by Tom Lane.

Security: CVE-2014-0063
2014-02-17 09:33:31 -05:00
Robert Haas 5f173040e3 Avoid repeated name lookups during table and index DDL.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts.  At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.

This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older).  In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches.  A field has also been added
to the Constraint node (FkConstraint in 8.4).  Third-party code calling
these functions or using the Constraint node will require updating.

Report by Andres Freund.  Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.

Security: CVE-2014-0062
2014-02-17 09:33:31 -05:00
Noah Misch 537cbd35c8 Prevent privilege escalation in explicit calls to PL validators.
The primary role of PL validators is to be called implicitly during
CREATE FUNCTION, but they are also normal functions that a user can call
explicitly.  Add a permissions check to each validator to ensure that a
user cannot use explicit validator calls to achieve things he could not
otherwise achieve.  Back-patch to 8.4 (all supported versions).
Non-core procedural language extensions ought to make the same two-line
change to their own validators.

Andres Freund, reviewed by Tom Lane and Noah Misch.

Security: CVE-2014-0061
2014-02-17 09:33:31 -05:00
Noah Misch fea164a72a Shore up ADMIN OPTION restrictions.
Granting a role without ADMIN OPTION is supposed to prevent the grantee
from adding or removing members from the granted role.  Issuing SET ROLE
before the GRANT bypassed that, because the role itself had an implicit
right to add or remove members.  Plug that hole by recognizing that
implicit right only when the session user matches the current role.
Additionally, do not recognize it during a security-restricted operation
or during execution of a SECURITY DEFINER function.  The restriction on
SECURITY DEFINER is not security-critical.  However, it seems best for a
user testing his own SECURITY DEFINER function to see the same behavior
others will see.  Back-patch to 8.4 (all supported versions).

The SQL standards do not conflate roles and users as PostgreSQL does;
only SQL roles have members, and only SQL users initiate sessions.  An
application using PostgreSQL users and roles as SQL users and roles will
never attempt to grant membership in the role that is the session user,
so the implicit right to add or remove members will never arise.

The security impact was mostly that a role member could revoke access
from others, contrary to the wishes of his own grantor.  Unapproved role
member additions are less notable, because the member can still largely
achieve that by creating a view or a SECURITY DEFINER function.

Reviewed by Andres Freund and Tom Lane.  Reported, independently, by
Jonas Sundman and Noah Misch.

Security: CVE-2014-0060
2014-02-17 09:33:31 -05:00
Tom Lane fa1f0d7859 PGDLLIMPORT-ify MainLWLockArray, ProcDiePending, proc_exit_inprogress.
These are needed in HEAD to make assorted contrib modules build on Windows.
Now that all the MSVC and Mingw buildfarm members seem to be on the same
page about the need for them, we can have some confidence that future
problems of this ilk will be detected promptly; there seems nothing more
to be learned by delaying this fix further.

I chose to mark QueryCancelPending as well, since it's easy to imagine code
that wants to touch ProcDiePending also caring about QueryCancelPending.
2014-02-16 20:12:43 -05:00
Tom Lane a1c802712c Fix unportable coding in tarCreateHeader().
uid_t and gid_t might be wider than int on some platforms.
Per buildfarm member brolga.
2014-02-16 20:01:18 -05:00
Tom Lane 8d6e2d4abf Revert to using --enable-auto-import in Cygwin builds.
Disabling auto-import requires that all libraries we use be careful about
declspecs for exported variables; and it seems they aren't.  This means
that Cygwin will not give us useful info about missing PGDLLIMPORT markers;
but it's probably sufficient that MSVC and Mingw builds do.
2014-02-16 15:14:04 -05:00
Tom Lane a5cf60682e PGDLLIMPORT'ify DateStyle and IntervalStyle.
This is needed on Windows to support contrib/postgres_fdw.  Although it's
been broken since last March, we didn't notice until recently because there
were no active buildfarm members that complained about missing PGDLLIMPORT
marking.  Efforts are underway to improve that situation, in support of
which we're delaying fixing some other cases of global variables that
should be marked PGDLLIMPORT.  However, this case affects 9.3, so we
can't wait any longer to fix it.

I chose to mark DateOrder as well, though it's not strictly necessary
for postgres_fdw.
2014-02-16 12:37:07 -05:00
Tom Lane 56caaf195e On Windows, expect to find Tcl DLL in bin directory not lib directory.
Still another step in the continuing saga of trying to get
--disable-auto-import to work.

Hiroshi Inoue
2014-02-16 11:24:38 -05:00
Tom Lane 643f75ca9b Fix unportable coding in BackgroundWorkerStateChange().
PIDs aren't necessarily ints; our usual practice for printing them
is to explicitly cast to long.  Per buildfarm member rover_firefly.
2014-02-15 17:15:05 -05:00
Tom Lane f0ee42d59b Fix unportable coding in DetermineSleepTime().
We should not assume that struct timeval.tv_sec is a long, because
it ain't necessarily.  (POSIX says that it's a time_t, which might
well be 64 bits now or in the future; or for that matter might be
32 bits on machines with 64-bit longs.)  Per buildfarm member panther.

Back-patch to 9.3 where the dubious coding was introduced.
2014-02-15 17:09:50 -05:00
Tom Lane 60ff2fdd99 Centralize getopt-related declarations in a new header file pg_getopt.h.
We used to have externs for getopt() and its API variables scattered
all over the place.  Now that we find we're going to need to tweak the
variable declarations for Cygwin, it seems like a good idea to have
just one place to tweak.

In this commit, the variables are declared "#ifndef HAVE_GETOPT_H".
That may or may not work everywhere, but we'll soon find out.

Andres Freund
2014-02-15 14:31:30 -05:00
Bruce Momjian 32be1c8e90 Remove use of sscanf in pg_upgrade, and add C comment to pg_dump
Per report from Jackie Chang
2014-02-15 11:50:56 -05:00
Bruce Momjian a0d8947acb psql: Add C comment about gset_prefix being freed later 2014-02-15 00:09:40 -05:00
Tom Lane 1c5143a0b5 Ooops, forgot to remove solar87 and friends from src/timezone/Makefile.
Per buildfarm.
2014-02-14 23:20:08 -05:00
Tom Lane e04641f4b4 Update time zone data files to tzdata release 2013i.
DST law changes in Jordan; historical changes in Cuba.

Also, remove the zones Asia/Riyadh87, Asia/Riyadh88, and Asia/Riyadh89.
Per the upstream announcement:
    The files solar87, solar88, and solar89 are no longer distributed.
    They were a negative experiment -- that is, a demonstration that
    tz data can represent solar time only with some difficulty and error.
    Their presence in the distribution caused confusion, as Riyadh
    civil time was generally not solar time in those years.
2014-02-14 21:59:13 -05:00
Tom Lane 638b153f2a Fix fat-fingered makefile changes for pltcl.
I put the OBJS assignments in the wrong order.  Per buildfarm.
2014-02-14 17:10:53 -05:00
Tom Lane dcbf39774f In mingw builds, make our own import library for libtcl, too.
Per buildfarm results.
2014-02-14 13:13:06 -05:00
Tom Lane 02b61dd08f In mingw builds, make our own import library for libperl.
Borrow the method already used by plpython.  This is pretty ugly, but
it might fix the build failure exhibited by buildfarm member narwhal
since commit 846e91e022.

Hiroshi Inoue
2014-02-14 11:51:02 -05:00
Tom Lane a7983e989d Cosmetic improvements in plpython's make rule for libpython import library.
This build technique is remarkably ugly, but that doesn't mean it has
to be unreadable too.  Be a bit more liberal with the vertical whitespace,
and give the .def file a proper dependency, just in case.
2014-02-14 11:31:35 -05:00
Heikki Linnakangas 4d894b41cd Change the order that pg_xlog and WAL archive are polled for WAL segments.
If there is a WAL segment with same ID but different TLI present in both
the WAL archive and pg_xlog, prefer the one with higher TLI. Before this
patch, the archive was polled first, for all expected TLIs, and only if no
file was found was pg_xlog scanned. This was a change in behavior from 9.3,
which first scanned archive and pg_xlog for the highest TLI, then archive
and pg_xlog for the next highest TLI and so forth. This patch reverts the
behavior back to what it was in 9.2.

The reason for this is that if for example you try to do archive recovery
to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is
not archived yet, we would replay past the timeline switch point on
timeline 1 using the archived files, before even looking timeline 2's files
in pg_xlog

Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior
was changed.
2014-02-14 15:15:09 +02:00
Peter Eisentraut 0f2ca0075c Fix typo
Stefan Kaltenbrunner
2014-02-13 21:50:43 -05:00
Bruce Momjian 9c57d11fca Add C comment about problems with CHAR() space trimming 2014-02-13 21:46:03 -05:00
Tom Lane b8f00a46bc Clean up error cases in psql's COPY TO STDOUT/FROM STDIN code.
Adjust handleCopyOut() to stop trying to write data once it's failed
one time.  For typical cases such as out-of-disk-space or broken-pipe,
additional attempts aren't going to do anything but waste time, and
in any case clean truncation of the output seems like a better behavior
than randomly dropping blocks in the middle.

Also remove dubious (and misleadingly documented) attempt to force our way
out of COPY_OUT state if libpq didn't do that.  If we did have a situation
like that, it'd be a bug in libpq and would be better fixed there, IMO.
We can hope that commit fa4440f516 took care
of any such problems, anyway.

Also fix longstanding bug in handleCopyIn(): PQputCopyEnd() only supports
a non-null errormsg parameter in protocol version 3, and will actively
fail if one is passed in version 2.  This would've made our attempts
to get out of COPY_IN state after a failure into infinite loops when
talking to pre-7.4 servers.

Back-patch the COPY_OUT state change business back to 9.2 where it was
introduced, and the other two fixes into all supported branches.
2014-02-13 18:45:58 -05:00
Alvaro Herrera 801c2dc72c Separate multixact freezing parameters from xid's
Previously we were piggybacking on transaction ID parameters to freeze
multixacts; but since there isn't necessarily any relationship between
rates of Xid and multixact consumption, this turns out not to be a good
idea.

Therefore, we now have multixact-specific freezing parameters:

vacuum_multixact_freeze_min_age: when to remove multis as we come across
them in vacuum (default to 5 million, i.e. early in comparison to Xid's
default of 50 million)

vacuum_multixact_freeze_table_age: when to force whole-table scans
instead of scanning only the pages marked as not all visible in
visibility map (default to 150 million, same as for Xids).  Whichever of
both which reaches the 150 million mark earlier will cause a whole-table
scan.

autovacuum_multixact_freeze_max_age: when for cause emergency,
uninterruptible whole-table scans (default to 400 million, double as
that for Xids).  This means there shouldn't be more frequent emergency
vacuuming than previously, unless multixacts are being used very
rapidly.

Backpatch to 9.3 where multixacts were made to persist enough to require
freezing.  To avoid an ABI break in 9.3, VacuumStmt has a couple of
fields in an unnatural place, and StdRdOptions is split in two so that
the newly added fields can go at the end.

Patch by me, reviewed by Robert Haas, with additional input from Andres
Freund and Tom Lane.
2014-02-13 19:36:31 -03:00
Tom Lane 44c2163302 Fix length checking for Unicode identifiers containing escapes (U&"...").
We used the length of the input string, not the de-escaped string, as
the trigger for NAMEDATALEN truncation.  AFAICS this would only result
in sometimes printing a phony truncation warning; but it's just luck
that there was no worse problem, since we were violating the API spec
for truncate_identifier().  Per bug #9204 from Joshua Yanovski.

This has been wrong since the Unicode-identifier support was added,
so back-patch to all supported branches.
2014-02-13 14:24:42 -05:00
Tom Lane fa4440f516 Improve libpq's error recovery for connection loss during COPY.
In pqSendSome, if the connection is already closed at entry, discard any
queued output data before returning.  There is no possibility of ever
sending the data, and anyway this corresponds to what we'd do if we'd
detected a hard error while trying to send().  This avoids possible
indefinite bloat of the output buffer if the application keeps trying
to send data (or even just keeps trying to do PQputCopyEnd, as psql
indeed will).

Because PQputCopyEnd won't transition out of PGASYNC_COPY_IN state
until it's successfully queued the COPY END message, and pqPutMsgEnd
doesn't distinguish a queuing failure from a pqSendSome failure,
this omission allowed an infinite loop in psql if the connection closure
occurred when we had at least 8K queued to send.  It might be worth
refactoring so that we can make that distinction, but for the moment
the other changes made here seem to offer adequate defenses.

To guard against other variants of this scenario, do not allow
PQgetResult to return a PGRES_COPY_XXX result if the connection is
already known dead.  Make sure it returns PGRES_FATAL_ERROR instead.

Per report from Stephen Frost.  Back-patch to all active branches.
2014-02-12 17:50:57 -05:00