Commit Graph

25186 Commits

Author SHA1 Message Date
Tom Lane b23fd2d8b3 Tweak position of $(DLL_DEFFILE) in shared-library link commands.
Reading the GNU ld man page suggests that this is order-sensitive
and should go in front of library references.  Correction to commit
846e91e022.
2014-02-12 11:22:23 -05:00
Tom Lane a5eed4d770 Make gendef.pl emit DATA annotations for global variables.
This should make the MSVC build act more like builds for other platforms,
i.e. backend global variables will be automatically available to loadable
libraries without need for explicit PGDLLIMPORT marking.

Craig Ringer
2014-02-11 13:39:14 -05:00
Tom Lane 7a98d323df Flush a stray definition of $(DLLTOOL).
Even if this is needed, it'd be configure's responsibility to set it.
2014-02-11 12:59:48 -05:00
Tom Lane 846e91e022 Get rid of use of dlltool in Mingw builds.
We are almost completely out of the dlltool game, if this works.

Hiroshi Inoue
2014-02-11 12:56:20 -05:00
Tom Lane cba6ffaef3 Cygwin build fixes.
Get rid of use of dlltool for linking the main postgres executable.
dlltool is obsolete and we'd prefer to stop depending on it.

Also, include $(LDAP_LIBS_FE) in $(libpq_pgport).  (It's not clear that
this is really needed, or why it's not a linker bug if it is needed.
But reports are that it's needed on current Cygwin.)

We might want to back-patch this if it works, but first let's see
what the buildfarm thinks.

Marco Atzeri
2014-02-11 12:10:52 -05:00
Peter Eisentraut d3c4c47155 scripts: Remove newlines from end of generated SQL
This results in spurious empty lines in the server log.  Instead, add
the newlines only when printing out the --echo output.  In some cases,
this was already done, leading to two newlines being printed.  Clean
that up as well.

From: Fabrízio de Royes Mello <fabriziomello@gmail.com>
2014-02-10 21:47:19 -05:00
Tom Lane 2895415205 Don't generate plain-text HISTORY and src/test/regress/README anymore.
Providing this information as plain text was doubtless worth the trouble
ten years ago, but it seems likely that hardly anyone reads it in this
format anymore.  And the effort required to maintain these files (in the
form of extra-complex markup rules in the relevant parts of the SGML
documentation) is significant.  So, let's stop doing that and rely solely
on the other documentation formats.

Per discussion, the plain-text INSTALL instructions might still be worth
their keep, so we continue to generate that file.

Rather than remove HISTORY and src/test/regress/README from distribution
tarballs entirely, replace them with simple stub files that tell the reader
where to find the relevant documentation.  This is mainly to avoid possibly
breaking packaging recipes that expect these files to exist.

Back-patch to all supported branches, because simplifying the markup
requirements for release notes won't help much unless we do it in all
branches.
2014-02-10 20:48:04 -05:00
Heikki Linnakangas d699ba4134 Fix WakeupWaiters() to not wake up an exclusive locker unnecessarily.
WakeupWaiters() is supposed to wake up all LW_WAIT_UNTIL_FREE waiters of
the slot, but the loop incorrectly also woke up the first LW_EXCLUSIVE
waiter, if there was no LW_WAIT_UNTIL_FREE waiters in the queue.

Noted by Andres Freund. This code is new in 9.4, so no backpatching.
2014-02-10 15:18:18 +02:00
Heikki Linnakangas 6c2744f1d3 Use memmove() instead of memcpy() for copying overlapping regions.
In commit d2495f272c, I fixed this bug in
to_tsquery(), but missed the fact that plainto_tsquery() has the same bug.
2014-02-10 09:57:59 +02:00
Stephen Frost dfb1e9bdc0 Further pg_dump / ftello improvements
Make ftello error-checking consistent to all calls and remove a
bit of ftello-related code which has been #if 0'd out since 2001.

Note that we are not concerned with the ftello() call under
snprintf() failing as it is just building a string to call
exit_horribly() with; printing -1 in such a case is fine.
2014-02-09 18:28:14 -05:00
Stephen Frost 5e8e794e3b Focus on ftello result < 0 instead of errno
Rather than reset errno (or just hope that its cleared already),
check just the result of the ftello for < 0 to determine if there
was an issue.

Oversight by me, pointed out by Tom.
2014-02-09 13:29:36 -05:00
Magnus Hagander 8198a321c9 Limit pg_basebackup progress output to 1/second
This prevents pg_basebackup from generating excessive output when
dumping large clusters. The status is now updated once / second,
still making it possible to see that there is progress happening,
but limiting the total bandwidth.

Mika Eloranta, reviewed by Sawada Masahiko and Oskari Saarenmaa
2014-02-09 12:51:42 +01:00
Magnus Hagander 01025d80a1 Avoid printing uninitialized filename variable in verbose mode
When using verbose mode for pg_basebackup, in tar format sent to
stdout, we'd print an unitialized buffer as the filename.

Reported by Pontus Lundkvist
2014-02-09 12:05:14 +01:00
Stephen Frost cfa1b4a711 Minor pg_dump improvements
Improve pg_dump by checking results on various fgetc() calls which
previously were unchecked, ditto for ftello.  Also clean up a couple
of very minor memory leaks by waiting to allocate structures until
after the initial check(s).

Issues spotted by Coverity.
2014-02-08 21:25:47 -05:00
Peter Eisentraut 66c04c981d Mark some more variables as static or include the appropriate header
Detected by clang's -Wmissing-variable-declarations.

From: Andres Freund <andres@anarazel.de>
2014-02-08 21:21:46 -05:00
Heikki Linnakangas 6aa2bdf6a0 Initialize the entryRes array between each call to triConsistent.
The shimTriConstistentFn, which calls the opclass's consistent function with
all combinations of TRUE/FALSE for any MAYBE argument, modifies the entryRes
array passed by the caller. Change startScanKey to re-initialize it between
each call to accommodate that.

It's actually a bad habit by shimTriConsistentFn to modify its argument. But
the only caller that doesn't already re-initialize the entryRes array was
startScanKey, and it's easy for startScanKey to do so. Add a comment to
shimTriConsistentFn about that.

Note: this does not give a free pass to opclass-provided consistent
functions to modify the entryRes argument; shimTriConsistent assumes that
they don't, even though it does it itself.

While at it, refactor startScanKey to allocate the requiredEntries and
additionalEntries after it knows exactly how large they need to be. Saves a
little bit of memory, and looks nicer anyway.

Per complaint by Tom Lane, buildfarm and the pg_trgm regression test.
2014-02-07 18:53:31 +02:00
Heikki Linnakangas dbc649fd77 Speed up "rare & frequent" type GIN queries.
If you have a GIN query like "rare & frequent", we currently fetch all the
items that match either rare or frequent, call the consistent function for
each item, and let the consistent function filter out items that only match
one of the terms. However, if we can deduce that "rare" must be present for
the overall qual to be true, we can scan all the rare items, and for each
rare item, skip over to the next frequent item with the same or greater TID.
That greatly speeds up "rare & frequent" type queries.

To implement that, introduce the concept of a tri-state consistent function,
where the 3rd value is MAYBE, indicating that we don't know if that term is
present. Operator classes only provide a boolean consistent function, so we
simulate the tri-state consistent function by calling the boolean function
several times, with the MAYBE arguments set to all combinations of TRUE and
FALSE. Testing all combinations is only feasible for a small number of MAYBE
arguments, but it is envisioned that we'll provide a way for operator
classes to provide a native tri-state consistent function, which can be much
more efficient. But that is not included in this patch.

We were already using that trick to for lossy pages, calling the consistent
function with the lossy entry set to TRUE and FALSE. Now that we have the
tri-state consistent function, use it for lossy pages too.

Alexander Korotkov, with fair amount of refactoring by me.
2014-02-07 15:22:48 +02:00
Heikki Linnakangas e001030c27 Fix thinko in comment.
Amit Langote
2014-02-07 10:28:06 +02:00
Tom Lane 8de3e410fa In RelationClearRelation, postpone cache reload if !IsTransactionState().
We may process relcache flush requests during transaction startup or
shutdown.  In general it's not terribly safe to do catalog access at those
times, so the code's habit of trying to immediately revalidate unflushable
relcache entries is risky.  Although there are no field trouble reports
that are positively traceable to this, we have been able to demonstrate
failure of the assertions recently added in RelationIdGetRelation() and
SearchCatCache().  On the other hand, it seems safe to just postpone
revalidation of the cache entry until we're inside a valid transaction.
The one case where this is questionable is where we're exiting a
subtransaction and the outer transaction is holding the relcache entry open
--- but if we made any significant changes to the rel inside such a
subtransaction, we've got problems anyway.  There are mechanisms in place
to prevent that (to wit, locks for cross-session cases and
CheckTableNotInUse() for intra-session cases), so let's trust to those
mechanisms to keep us out of trouble.
2014-02-06 19:38:06 -05:00
Andrew Dunstan 45e1b6c4c4 Alphabeticize list in OBJS definition in utils/adt Makefile. 2014-02-06 12:11:49 -05:00
Tom Lane ddfc9cb054 Assert(IsTransactionState()) in RelationIdGetRelation().
Commit 42c80c696e added an
Assert(IsTransactionState()) in SearchCatCache(), to catch
any code that thought it could do a catcache lookup outside
transactions.  Extend the same idea to relcache lookups.
2014-02-06 11:28:13 -05:00
Peter Eisentraut f65233755c Fix whitespace 2014-02-05 23:12:51 -05:00
Tom Lane ac8bc3b6e4 Remove unnecessary relcache flushes after changing btree metapages.
These flushes were added in my commit d2896a9ed, which added the btree
logic that keeps a cached copy of the index metapage data in index relcache
entries.  The idea was to ensure that other backends would promptly update
their cached copies after a change.  However, this is not really necessary,
since _bt_getroot() has adequate defenses against believing a stale root
page link, and _bt_getrootheight() doesn't have to be 100% right.
Moreover, if it were necessary, a relcache flush would be an unreliable way
to do it, since the sinval mechanism believes that relcache flush requests
represent transactional updates, and therefore discards them on transaction
rollback.  Therefore, we might as well drop these flush requests and save
the time to rebuild the whole relcache entry after a metapage change.

If we ever try to support in-place truncation of btree indexes, it might
be necessary to revisit this issue so that _bt_getroot() can't get caught
by trying to follow a metapage link to a page that no longer exists.
A possible solution to that is to make use of an smgr, rather than
relcache, inval request to force other backends to discard their cached
metapages.  But for the moment this is not worth pursuing.
2014-02-05 13:43:46 -05:00
Peter Eisentraut 4e18236180 PL/Perl: Fix compiler warning
The code was assigning a (Datum) 0 to a void pointer.  That creates a
warning from clang 3.4.  It was probably a thinko to begin with.
2014-02-04 20:08:39 -05:00
Fujii Masao 489e6ac5a1 Fix comparison of an array of characters with zero to compare with '\0' instead.
Report from Andres Freund.
2014-02-04 10:59:39 +09:00
Tom Lane 0c2338abbb Fix lexing of U& sequences just before EOF.
Commit a5ff502fce was a brick shy of a load
in the backend lexer too, not just psql.  Per further testing of bug #9068.

In passing, improve related comments.
2014-02-03 19:47:57 -05:00
Tom Lane 0def2573c5 Fix *-qualification of named parameters in SQL-language functions.
Given a composite-type parameter named x, "$1.*" worked fine, but "x.*"
not so much.  This has been broken since named parameter references were
added in commit 9bff0780cf, so patch back
to 9.2.  Per bug #9085 from Hardy Falk.
2014-02-03 14:47:17 -05:00
Robert Haas 80353f3528 Adjust pg_sleep_for/pg_sleep_until to use clock_timestamp.
Otherwise, pg_sleep_until does the wrong thing in a multi-statement
transaction.

Julien Rouhaud
2014-02-03 14:33:43 -05:00
Andrew Dunstan d3ee45152b In json code, clean up temp memory contexts after processing.
Craig Ringer.
2014-02-03 10:40:12 -05:00
Fujii Masao 3e8554a54a Make pg_basebackup skip temporary statistics files.
The temporary statistics files don't need to be included in the backup
because they are always reset at the beginning of the archive recovery.
This patch changes pg_basebackup so that it skips all files located in
$PGDATA/pg_stat_tmp or the directory specified by stats_temp_directory
parameter.
2014-02-03 23:19:49 +09:00
Tom Lane 47aaebaac9 Switch in psql_scan() must cover all lexer states (except backslash cases).
Oversight in commit f7559c0101, which changed
UESCAPE lexing in psql.  Per bug #9068 from Manuel Gómez.
2014-02-02 18:59:34 -05:00
Tom Lane 46825d4978 Clean up some sloppy coding in repl_gram.y.
Remove unused copy-and-pasted macro definitions, and improve formatting
of recently-added productions.

I got interested in this because buildfarm member protosciurus has been
crashing in "bison repl_gram.y" since commit 858ec11.  It's a long shot
that this will fix that, though maybe the missing trailing semicolon
has something to do with it?  In any case, there's no need to approve
of dead code, nor of code whose formatting isn't even self-consistent
let alone consistent with what's around it.
2014-02-02 12:51:14 -05:00
Fujii Masao 0753bdb352 Add primary_slotname to recovery.conf.sample. 2014-02-03 00:41:50 +09:00
Fujii Masao 63be3b78f6 Fix typos in docs and comments.
Thom Brown
2014-02-02 10:28:18 +09:00
Andrew Dunstan 9abed7d1cb Fix makefile syntax. 2014-02-01 19:52:39 -05:00
Tom Lane 082c0dfa14 Fix some wide-character bugs in the text-search parser.
In p_isdigit and other character class test functions generated by the
p_iswhat macro, the code path for non-C locales with multibyte encodings
contained a bogus pointer cast that would accidentally fail to malfunction
if types wchar_t and wint_t have the same width.  Apparently that is true
on most platforms, but not on recent Cygwin releases.  Remove the cast,
as it seems completely unnecessary (I think it arose from a false analogy
to the need to cast to unsigned char when dealing with the <ctype.h>
functions).  Per bug #8970 from Marco Atzeri.

In the same functions, the code path for C locale with a multibyte encoding
simply ANDed each wide character with 0xFF before passing it to the
corresponding <ctype.h> function.  This could result in false positive
answers for some non-ASCII characters, so use a range test instead.
Noted by me while investigating Marco's complaint.

Also, remove some useless though not actually buggy maskings and casts
in the hand-coded p_isalnum and p_isalpha functions, which evidently
got tested a bit more carefully than the macro-generated functions.
2014-02-01 18:27:34 -05:00
Andrew Dunstan c8158a2eed fix whitespace 2014-02-01 16:30:26 -05:00
Tom Lane 214c7a4f0b Fix some more bugs in signal handlers and process shutdown logic.
WalSndKill was doing things exactly backwards: it should first clear
MyWalSnd (to stop signal handlers from touching MyWalSnd->latch),
then disown the latch, and only then mark the WalSnd struct unused by
clearing its pid field.

Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve
errno, which is surely a requirement for any signal handler.

Per discussion of recent buildfarm failures.  Back-patch as far
as the relevant code exists.
2014-02-01 16:21:23 -05:00
Andrew Dunstan 7e1531a450 Don't use deprecated dllwrap on Cygwin.
The preferred method is to use "cc -shared", and this allows binaries
to be rebased if required, unlike dllwrap.

Backpatch to 9.0 where we have buildfarm coverage.

There are still some issues with Cygwin, especially modern Cygwin, but
this helps us get closer to good support.

Marco Atzeri.
2014-02-01 16:08:33 -05:00
Andrew Dunstan d587298b80 Copy the libpq DLL to the bin directory on Mingw and Cygwin.
This has long been done by the MSVC build system, and has caused
confusion in the past when programs like psql have failed to start
because they can't find the DLL. If it's in the same directory as it now
will be they will find it.

Backpatch to all live branches.
2014-02-01 15:11:13 -05:00
Bruce Momjian d0ee93797d arrays: tighten checks for multi-dimensional input
Previously an input array string that started with a single-element
array dimension would then later accept a multi-dimensional segment.

BACKWARD INCOMPATIBILITY
2014-02-01 10:49:17 -05:00
Robert Haas 858ec11858 Introduce replication slots.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts.  Slots have some advantages over existing
techniques, as explained in the documentation.

In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.

Andres Freund and Robert Haas
2014-01-31 22:45:36 -05:00
Bruce Momjian 5168c76964 pg_restore: make help output plural for multi-enabled options
per report from Josh Kupershmidt
2014-01-31 22:29:01 -05:00
Robert Haas d1981719ad Clear MyProc and MyProcSignalState before they become invalid.
Evidence from buildfarm member crake suggests that the new test_shm_mq
module is routinely crashing the server due to the arrival of a SIGUSR1
after the shared memory segment has been unmapped.  Although processes
using the new dynamic background worker facilities are more likely to
receive a SIGUSR1 around this time, the problem is also possible on older
branches, so I'm back-patching the parts of this change that apply to
older branches as far as they apply.

It's already generally the case that code checks whether these pointers
are NULL before deferencing them, so the important thing is mostly to
make sure that they do get set to NULL before they become invalid.  But
in master, there's one case in procsignal_sigusr1_handler that lacks a
NULL guard, so add that.

Patch by me; review by Tom Lane.
2014-01-31 21:31:08 -05:00
Tom Lane 326e1d73c4 Disallow use of SSL v3 protocol in the server as well as in libpq.
Commit 820f08cabd claimed to make the server
and libpq handle SSL protocol versions identically, but actually the server
was still accepting SSL v3 protocol while libpq wasn't.  Per discussion,
SSL v3 is obsolete, and there's no good reason to continue to accept it.
So make the code really equivalent on both sides.  The behavior now is
that we use the highest mutually-supported TLS protocol version.

Marko Kreen, some comment-smithing by me
2014-01-31 17:51:18 -05:00
Bruce Momjian fc4ffba968 system catalogs: reorder pg_amproc entries into proper sections
Report form Antonin Houska
2014-01-31 16:04:18 -05:00
Bruce Momjian 290d2cb500 pgindent: add Perl comment 2014-01-31 14:46:00 -05:00
Bruce Momjian cad1e022b2 pgindent: add --list-of-typedefs option
Allows typedefs to be specified on the command line, per request from
Andrew.
2014-01-31 13:35:50 -05:00
Fujii Masao a87ae38be8 Add tab completion for ALTER TABLESPACE MOVE in psql. 2014-02-01 01:45:48 +09:00
Bruce Momjian 5ff47acf8f entab: add new options
Add new entab options to process only C comment whitespace after
periods, and to protect leading whitespace.
2014-01-31 11:05:21 -05:00
Bruce Momjian db98b31329 pgindent: preserve blank lines around #else/#endif
This requires a new version of pg_bsd_indent, version 1.3, to be
downloaded.
2014-01-30 22:40:05 -05:00
Robert Haas 760c770ff6 Add convenience functions pg_sleep_for and pg_sleep_until.
Vik Fearing, reviewed by Pavel Stehule and myself
2014-01-30 15:47:56 -05:00
Tom Lane 043f6ff05d Fix bogus handling of "postponed" lateral quals.
When pulling a "postponed" qual from a LATERAL subquery up into the quals
of an outer join, we must make sure that the postponed qual is included
in those seen by make_outerjoininfo().  Otherwise we might compute a
too-small min_lefthand or min_righthand for the outer join, leading to
"JOIN qualification cannot refer to other relations" failures from
distribute_qual_to_rels.  Subtler errors in the created plan seem possible,
too, if the extra qual would only affect join ordering constraints.

Per bug #9041 from David Leverton.  Back-patch to 9.3.
2014-01-30 14:51:16 -05:00
Bruce Momjian 146604ec43 Add checks for interval overflow/underflow
New checks include input, month/day/time internal adjustments, addition,
subtraction, multiplication, and negation.  Also adjust docs to
correctly specify interval size in bytes.

Report from Rok Kralj
2014-01-30 09:41:43 -05:00
Tom Lane 571addd729 Fix unsafe references to errno within error messaging logic.
Various places were supposing that errno could be expected to hold still
within an ereport() nest or similar contexts.  This isn't true necessarily,
though in some cases it accidentally failed to fail depending on how the
compiler chanced to order the subexpressions.  This class of thinko
explains recent reports of odd failures on clang-built versions, typically
missing or inappropriate HINT fields in messages.

Problem identified by Christian Kruse, who also submitted the patch this
commit is based on.  (I fixed a few issues in his patch and found a couple
of additional places with the same disease.)

Back-patch as appropriate to all supported branches.
2014-01-29 20:04:43 -05:00
Andrew Dunstan 120c5cc761 Silence compiler warnings about possibly unset variables.
They are in fact set in every case where they are needed, but the
compiler doesn't know that.

Per gripe from Tom Lane.
2014-01-29 18:54:14 -05:00
Andrew Dunstan 5e52e9d6d4 Forgot to bump catalog version for json_array_elements_text. 2014-01-29 16:38:31 -05:00
Robert Haas 9347baa5bb Include planning time in EXPLAIN ANALYZE output.
This doesn't work for prepared queries, but it's not too easy to get
the information in that case and there's some debate as to exactly
what the right thing to measure is, so just do this for now.

Andreas Karlsson, with slight doc changes by me.
2014-01-29 16:09:15 -05:00
Andrew Dunstan 5264d91541 Add json_array_elements_text function.
This was a notable omission from the json functions added in 9.3 and
there have been numerous complaints about its absence.

Laurence Rowe.
2014-01-29 15:39:01 -05:00
Heikki Linnakangas 699b1f40da Fix thinko in huge_tlb_pages patch.
We calculated the rounded-up size for the allocation, but then failed to
use the rounded-up value in the mmap() call. Oops.

Also, initialize allocsize, to silence warnings seen with some compilers,
as pointed out by Jeff Janes.
2014-01-29 21:33:56 +02:00
Heikki Linnakangas 626a120656 Further optimize GIN multi-key searches.
When skipping over some items in a posting tree, re-find the new location
by descending the tree from root, rather than walking the right links.
This can save a lot of I/O.

Heavily modified from Alexander Korotkov's fast scan patch.
2014-01-29 21:24:38 +02:00
Bruce Momjian 8440897b38 Fix pointer processing in new entab.c function 2014-01-29 13:31:11 -05:00
Bruce Momjian e93f7253a7 Add C functions to centralize entab processing 2014-01-29 12:48:07 -05:00
Bruce Momjian db90bcf8df Add more C comments to entab.c. 2014-01-29 12:22:22 -05:00
Heikki Linnakangas 25b1dafab6 Further optimize multi-key GIN searches.
If we're skipping past a certain TID, avoid decoding posting list segments
that only contain smaller TIDs.

Extracted from Alexander Korotkov's fast scan patch, heavily modified.
2014-01-29 18:26:40 +02:00
Heikki Linnakangas e20c70cb0f Allow skipping some items in a multi-key GIN search.
In a multi-key search, ie. something like "col @> 'foo' AND col @> 'bar'",
as soon as we find the next item that matches the first criteria, we don't
need to check the second criteria for TIDs smaller the first match. That
saves a lot of effort, especially if one of the terms is rare, while the
second occurs very frequently.

Based on ideas from Alexander Korotkov's fast scan patch.
2014-01-29 17:53:39 +02:00
Heikki Linnakangas 1a3458b6d8 Allow using huge TLB pages on Linux (MAP_HUGETLB)
This patch adds an option, huge_tlb_pages, which allows requesting the
shared memory segment to be allocated using huge pages, by using the
MAP_HUGETLB flag in mmap(). This can improve performance.

The default is 'try', which means that we will attempt using huge pages,
and fall back to non-huge pages if it doesn't work. Currently, only Linux
has MAP_HUGETLB. On other platforms, the default 'try' behaves the same as
'off'.

In the passing, don't try to round the mmap() size to a multiple of
pagesize. mmap() doesn't require that, and there's no particular reason for
PostgreSQL to do that either. When using MAP_HUGETLB, however, round the
request size up to nearest 2MB boundary. This is to work around a bug in
some Linux kernel versions, but also to avoid wasting memory, because the
kernel will round the size up anyway.

Many people were involved in writing this patch, including Christian Kruse,
Richard Poole, Abhijit Menon-Sen, reviewed by Peter Geoghegan, Andres Freund
and me.
2014-01-29 14:08:30 +02:00
Robert Haas b7643b19f0 Fix compiler warning in EXEC_BACKEND builds.
Per a report by Rajeev Rastogi.
2014-01-28 23:35:50 -05:00
Andrew Dunstan 7043ac7100 Add new make targets check-tests and installcheck-tests.
These do not run any specific schedule of tests, but only those
specified as part of the invocation, e.g.:

    make check-tests TESTS="json jsonb"
2014-01-28 18:10:00 -05:00
Andrew Dunstan 105639900b New json functions.
json_build_array() and json_build_object allow for the construction of
arbitrarily complex json trees. json_object() turns a one or two
dimensional array, or two separate arrays, into a json_object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name value pairs.

Catalog version bumped.

Andrew Dunstan, reviewed by Marko Tiikkaja.
2014-01-28 17:48:21 -05:00
Fujii Masao 9132b189bf Add pg_stat_archiver statistics view.
This view shows the statistics about the WAL archiver process's activity.

Gabriele Bartolini, reviewed by Michael Paquier, refactored a bit by me.
2014-01-29 02:58:22 +09:00
Bruce Momjian c871e8f53b Revert C comment change in slot_attisnull()
Revert 89774b58b0
2014-01-28 12:28:14 -05:00
Bruce Momjian 051b3341c1 Remove orphaned prototype
Rajeev rastogi
2014-01-28 11:29:39 -05:00
Stephen Frost aef61bf433 Revert dup2() checking in syslogger.c
Per the expanded comment-

As we're just trying to reset these to go to DEVNULL, there's not
much point in checking for failure from the close/dup2 calls here,
if they fail then presumably the file descriptors are closed and
any writes will go into the bitbucket anyway.

Pointed out by Tom.
2014-01-28 08:40:41 -05:00
Tom Lane 64e43c59b8 Log a detail message for auth failures due to missing or expired password.
It's worth distinguishing these cases from run-of-the-mill wrong-password
problems, since users have been known to waste lots of time pursuing the
wrong theory about what's failing.  Now, our longstanding policy about how
to report authentication failures is that we don't really want to tell the
*client* such things, since that might be giving information to a bad guy.
But there's nothing wrong with reporting the details to the postmaster log,
and indeed the comments in this area of the code contemplate that
interesting details should be so reported.  We just weren't handling these
particular interesting cases usefully.

To fix, add infrastructure allowing subroutines of ClientAuthentication()
to return a string to be added to the errdetail_log field of the main
authentication-failed error report.  We might later want to use this to
report other subcases of authentication failure the same way, but for the
moment I just dealt with password cases.

Per discussion of a patch from Josh Drake, though this is not what
he proposed.
2014-01-27 21:04:09 -05:00
Robert Haas ea9df812d8 Relax the requirement that all lwlocks be stored in a single array.
This makes it possible to store lwlocks as part of some other data
structure in the main shared memory segment, or in a dynamic shared
memory segment.  There is still a main LWLock array and this patch does
not move anything out of it, but it provides necessary infrastructure
for doing that in the future.

This change is likely to increase the size of LWLockPadded on some
platforms, especially 32-bit platforms where it was previously only
16 bytes.

Patch by me.  Review by Andres Freund and KaiGai Kohei.
2014-01-27 11:07:44 -05:00
Heikki Linnakangas f62eba204f Fix typo in README
Amit Langote
2014-01-27 09:33:18 +02:00
Tom Lane 2850896961 Code review for auto-tuned effective_cache_size.
Fix integer overflow issue noted by Magnus Hagander, as well as a bunch
of other infelicities in commit ee1e5662d8
and its unreasonably large number of followups.
2014-01-27 00:05:56 -05:00
Fujii Masao dd515d4082 Change the suffix of auto conf temporary file from "temp" to "tmp".
Michael Paquier
2014-01-27 12:39:11 +09:00
Fujii Masao 7c619be623 Fix typos in comments for ALTER SYSTEM.
Michael Paquier
2014-01-27 12:23:20 +09:00
Stephen Frost 152d24f5dd Fix minor leak in pg_dump
Move allocation to after we check the remote server version, to avoid
a possible, very minor, memory leak.  This makes us more consistent
throughout as most places in pg_dump are done in the same way (due, in
part, to previous fixes like this).

Spotted by the Coverity scanner.
2014-01-26 17:58:48 -05:00
Andrew Dunstan a7e5f7bf68 Provide for client-only installs with MSVC.
MauMau.
2014-01-26 17:03:13 -05:00
Stephen Frost 790eaa699e Check dup2() results in syslogger
Consistently check the dup2() call results throughout syslogger.c.
It's pretty unlikely that they'll error out, but if they do,
ereport(FATAL) instead of blissfully continuing on.

Spotted by the Coverity scanner.
2014-01-26 16:26:18 -05:00
Magnus Hagander f2795f8b53 Move the options column of \db+ before the description
The convention is to have the description field at the end.

Noted by Tom Lane
2014-01-26 21:16:02 +01:00
Magnus Hagander cae10ca27e Include tablespace options in verbose output of \db 2014-01-26 18:42:08 +01:00
Andrew Dunstan cec8394b5c Enable building with Visual Studion 2013.
Backpatch to 9.3.

Brar Piening.
2014-01-26 09:49:10 -05:00
Bruce Momjian 89774b58b0 Adjust C comment in slot_attisnull() regarding nulls. 2014-01-25 16:43:36 -05:00
Heikki Linnakangas 71c6a8e375 Add recovery_target='immediate' option.
This allows ending recovery as a consistent state has been reached. Without
this, there was no easy way to e.g restore an online backup, without
replaying any extra WAL after the backup ended.

MauMau and me.
2014-01-25 17:34:04 +02:00
Noah Misch 820f08cabd libpq: Support TLS versions beyond TLSv1.
Per report from Jeffrey Walton, libpq has been accepting only TLSv1
exactly.  Along the lines of the backend code, libpq will now support
new versions as OpenSSL adds them.

Marko Kreen, reviewed by Wim Lewis.
2014-01-24 19:29:06 -05:00
Noah Misch 3a5313265d psql: Mention SSL protocol version in \conninfo.
Marko Kreen, reviewed by Wim Lewis.
2014-01-24 19:23:56 -05:00
Stephen Frost 6794a9f9a1 Avoid minor leak in parallel pg_dump
During parallel pg_dump, a worker process closing the connection caused
a minor memory leak (particularly minor as we are likely about to exit
anyway).  Instead, free the memory in this case prior to returning NULL
to indicate connection closed.

Spotting by the Coverity scanner.

Back patch to 9.3 where this was introduced.
2014-01-24 15:10:08 -05:00
Heikki Linnakangas d150ff5781 Reset unused fields in GIN data leaf page footer.
The maxoff field is not used in the new, compressed page format. Let's
reset it when converting an old-format page to the new format. The code
won't care either way, but this makes it possible to use the field for
something else in the future.
2014-01-24 19:10:10 +02:00
Heikki Linnakangas a8f374849f Fix off-by-one in newly-introdcued GIN assertion.
Spotted by Alexander Korotkov
2014-01-24 11:10:09 +02:00
Heikki Linnakangas 398cf255ad In GIN recompression code, use mmemove rather than memcpy, for vacuum.
When vacuuming a data leaf page, any compressed posting lists that are not
modified, are copied back to the buffer from a later location in the same
buffer rather than from  a palloc'd copy. IOW, they are just moved
downwards in the same buffer. Because the source and destination addresses
can overlap, we must use memmove rather than memcpy.

Report and fix by Alexander Korotkov.
2014-01-24 10:48:45 +02:00
Stephen Frost fbe19ee3b8 ALTER TABLESPACE ... MOVE ... OWNED BY
Add the ability to specify the objects to move by who those objects are
owned by (as relowner) and change ALL to mean ALL objects.  This
makes the command always operate against a well-defined set of objects
and not have the objects-to-be-moved based on the role of the user
running the command.

Per discussion with Simon and Tom.
2014-01-23 23:52:40 -05:00
Tom Lane ac4ef637ad Allow use of "z" flag in our printf calls, and use it where appropriate.
Since C99, it's been standard for printf and friends to accept a "z" size
modifier, meaning "whatever size size_t has".  Up to now we've generally
dealt with printing size_t values by explicitly casting them to unsigned
long and using the "l" modifier; but this is really the wrong thing on
platforms where pointers are wider than longs (such as Win64).  So let's
start using "z" instead.  To ensure we can do that on all platforms, teach
src/port/snprintf.c to understand "z", and add a configure test to force
use of that implementation when the platform's version doesn't handle "z".

Having done that, modify a bunch of places that were using the
unsigned-long hack to use "z" instead.  This patch doesn't pretend to have
gotten everyplace that could benefit, but it catches many of them.  I made
an effort in particular to ensure that all uses of the same error message
text were updated together, so as not to increase the number of
translatable strings.

It's possible that this change will result in format-string warnings from
pre-C99 compilers.  We might have to reconsider if there are any popular
compilers that will warn about this; but let's start by seeing what the
buildfarm thinks.

Andres Freund, with a little additional work by me
2014-01-23 17:18:33 -05:00
Heikki Linnakangas ec8f692c3c Fix alignment of GIN in-line posting lists stored in entry tuples.
The Sparc machines in the buildfarm are crashing because of misaligned
access to posting lists stored in entry tuples.

I accidentally removed a critical SHORTALIGN() from ginFormTuple, as part
of the packed posting lists patch. Perhaps I thought it was unnecessary,
because the index_form_tuple() call above the SHORTALIGN already aligned
the size, missing the fact that the null-category byte makes it misaligned
again (I think the SHORTALIGN is indeed unnecessary if there's no null-
category byte, but let's just play it safe...)
2014-01-23 22:58:12 +02:00
Heikki Linnakangas 0fdb2f7d7c Silence compiler warning.
Not all compilers understand that elog(ERROR, ...) never returns.
2014-01-23 22:15:31 +02:00
Alvaro Herrera b152c6cd0d Make DROP IF EXISTS more consistently not fail
Some cases were still reporting errors and aborting, instead of a NOTICE
that the object was being skipped.  This makes it more difficult to
cleanly handle pg_dump --clean, so change that to instead skip missing
objects properly.

Per bug #7873 reported by Dave Rolsky; apparently this affects a large
number of users.

Authors: Pavel Stehule and Dean Rasheed.  Some tweaks by Álvaro Herrera
2014-01-23 14:40:29 -03:00
Fujii Masao 9f80f4835a Add libpq function PQhostaddr().
There was a bug in the psql's meta command \conninfo. When the
IP address was specified in the hostaddr and psql used it to create
a connection (i.e., psql -d "hostaddr=xxx"), \conninfo could not
display that address. This is because \conninfo got the connection
information only from PQhost() which could not return hostaddr.

This patch adds PQhostaddr(), and changes \conninfo so that it
can display not only the host name that PQhost() returns but also
the IP address which PQhostaddr() returns.

The bug has existed since 9.1 where \conninfo was introduced.
But it's too late to add new libpq function into the released versions,
so no backpatch.
2014-01-24 02:32:39 +09:00
Andrew Dunstan d5bc6ce6ac Allow case insensitive build version argument for MSVC.
Dilip Kumar.
2014-01-23 12:18:15 -05:00
Fujii Masao 77035fa8a9 Fix bugs in PQhost().
In the platform that doesn't support Unix-domain socket, when
neither host nor hostaddr are specified, the default host
'localhost' is used to connect to the server and PQhost() must
return that, but it didn't. This patch fixes PQhost() so that
it returns the default host in that case.

Also this patch fixes PQhost() so that it doesn't return
Unix-domain socket directory path in the platform that doesn't
support Unix-domain socket.

Back-patch to all supported versions.
2014-01-23 22:58:58 +09:00
Heikki Linnakangas 6668ad1d70 Fix declaration of GinVacuumState.
gcc 4.8 was happy with having a duplicate typedef, but most compilers seem not
to be, per buildfarm.
2014-01-22 19:55:36 +02:00
Heikki Linnakangas 36a35c550a Compress GIN posting lists, for smaller index size.
GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might be come handy in the future, if we want to drop support for the
uncompressed format.

Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
2014-01-22 19:20:58 +02:00
Andrew Dunstan 243ee26633 Reindent json.c and jsonfuncs.c.
This will help in preparation of clean patches for upcoming
json work.
2014-01-22 08:46:51 -05:00
Stephen Frost 6c36f383df Allow type_func_name_keywords in even more places
A while back, 2c92edad48 allowed
type_func_name_keywords to be used in more places, including role
identifiers.  Unfortunately, that commit missed out on cases where
name_list was used for lists-of-roles, eg: for DROP ROLE.  This
resulted in the unfortunate situation that you could CREATE a role
with a type_func_name_keywords-allowed identifier, but not DROP it
(directly- ALTER could be used to rename it to something which
could be DROP'd).

This extends allowing type_func_name_keywords to places where role
lists can be used.

Back-patch to 9.0, as 2c92edad48 was.
2014-01-21 22:49:22 -05:00
Tom Lane 69c7a9838c Tweak parse location assignment for CURRENT_DATE and related constructs.
All these constructs generate parse trees consisting of a Const and
a run-time type coercion (perhaps a FuncExpr or a CoerceViaIO).  Modify
the raw parse output so that we end up with the original token's location
attached to the type coercion node while the Const has location -1;
before, it was the other way around.  This makes no difference in terms
of what exprLocation() will say about the parse tree as a whole, so it
should not have any user-visible impact.  The point of changing it is that
we do not want contrib/pg_stat_statements to treat these constructs as
replaceable constants.  It will do the right thing if the Const has
location -1 rather than a valid location.

This is a pretty ugly hack, but then this code is ugly already; we should
someday replace this translation with special-purpose parse node(s) that
would allow ruleutils.c to reconstruct the original query text.

(See also commit 5d3fcc4c2e, which also
hacked location assignment rules for the benefit of pg_stat_statements.)

Back-patch to 9.2 where pg_stat_statements grew the ability to recognize
replaceable constants.

Kyotaro Horiguchi
2014-01-21 16:34:28 -05:00
Robert Haas 01f7808b3e Add a cardinality function for arrays.
Unlike our other array functions, this considers the total number of
elements across all dimensions, and returns 0 rather than NULL when the
array has no elements.  But it seems that both of those behaviors are
almost universally disliked, so hopefully that's OK.

Marko Tiikkaja, reviewed by Dean Rasheed and Pavel Stehule
2014-01-21 12:38:53 -05:00
Robert Haas 033b2343fa Fix inadvertent semantics change in last patch to plug memory leaks.
Commit a5bca4ef03 accidentally changed
the semantics when the "skipping missing configuration file" is
emitted, because it forced OK to true instead of leaving the value
untouched.

Spotted by Tom Lane.
2014-01-21 11:42:37 -05:00
Robert Haas 5709b8acc6 Avoid a possible relcache leak in get_object_address_attribute.
There's no apparent way to trigger this, so I'm not going to worry
about back-patching it for now.  But it's still wrong.

Marti Raudsepp
2014-01-21 10:02:37 -05:00
Robert Haas a5bca4ef03 Plug more memory leaks when reloading config file.
Commit 138184adc5 plugged some but not
all of the leaks from commit 2a0c81a12c.
This tightens things up some more.

Amit Kapila, per an observation by Tom Lane
2014-01-21 09:41:40 -05:00
Alvaro Herrera d2458e3b20 Expose a routine to print triggers during EXPLAIN ANALYZE
This is so that auto_explain can use it.

Kyotaro HORIGUCHI
2014-01-20 17:13:47 -03:00
Tom Lane 9a8f5729b4 Fix to_timestamp/to_date's handling of consecutive spaces in format string.
When there are consecutive spaces (or other non-format-code characters) in
the format, we should advance over exactly that many characters of input.
The previous coding mistakenly did a "skip whitespace" action between such
characters, possibly allowing more input to be skipped than the user
intended.  We only need to skip whitespace just before an actual field.

This is really a bug fix, but given the minimal number of field complaints
and the risk of breaking applications coded to expect the old behavior,
let's not back-patch it.

Jeevan Chalke
2014-01-20 13:45:51 -05:00
Fujii Masao 5363c7f2bc Fix typo in comment.
Sawada Masahiko
2014-01-21 02:24:17 +09:00
Simon Riggs 4d1e2aeb1a Speed up COPY into tables with DEFAULT nextval()
Previously the presence of a nextval() prevented the
use of batch-mode COPY.  This patch introduces a
special case just for nextval() functions. In future
we will introduce a general case solution for
labelling volatile functions as safe for use.
2014-01-20 17:22:38 +00:00
Magnus Hagander 74a72ec208 Rename msvc build option krb5 to gss
In the MSVC build system we've never separated krb5 from gss,
and always built them both. Since the removal of native krb5
support, this parameter only controls GSSAPI, so rename it
accordingly.
2014-01-19 17:07:15 +01:00
Magnus Hagander 98de86e422 Remove support for native krb5 authentication
krb5 has been deprecated since 8.3, and the recommended way to do
Kerberos authentication is using the GSSAPI authentication method
(which is still fully supported).

libpq retains the ability to identify krb5 authentication, but only
gives an error message about it being unsupported. Since all authentication
is initiated from the backend, there is no need to keep it at all
in the backend.
2014-01-19 17:05:01 +01:00
Magnus Hagander 4b8f2859cc Adjust the SSL connection notification message
Suggested by Tom
2014-01-19 13:27:22 +01:00
Stephen Frost 5254958e92 Add CREATE TABLESPACE ... WITH ... Options
Tablespaces have a few options which can be set on them to give PG hints
as to how the tablespace behaves (perhaps it's faster for sequential
scans, or better able to handle random access, etc).  These options were
only available through the ALTER TABLESPACE command.

This adds the ability to set these options at CREATE TABLESPACE time,
removing the need to do both a CREATE TABLESPACE and ALTER TABLESPACE to
get the correct options set on the tablespace.

Vik Fearing, reviewed by Michael Paquier.
2014-01-18 20:59:31 -05:00
Tom Lane 115f414124 Fix VACUUM's reporting of dead-tuple counts to the stats collector.
Historically, VACUUM has just reported its new_rel_tuples estimate
(the same thing it puts into pg_class.reltuples) to the stats collector.
That number counts both live and dead-but-not-yet-reclaimable tuples.
This behavior may once have been right, but modern versions of the
pgstats code track live and dead tuple counts separately, so putting
the total into n_live_tuples and zero into n_dead_tuples is surely
pretty bogus.  Fix it to report live and dead tuple counts separately.

This doesn't really do much for situations where updating transactions
commit concurrently with a VACUUM scan (possibly causing double-counting or
omission of the tuples they add or delete); but it's clearly an improvement
over what we were doing before.

Hari Babu, reviewed by Amit Kapila
2014-01-18 19:24:33 -05:00
Stephen Frost 76e91b38ba Add ALTER TABLESPACE ... MOVE command
This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of
objects from one tablespace to another.  This can be extremely handy and avoids
a lot of error-prone scripting.  ALTER TABLESPACE ... MOVE will only move
objects the user owns, will notify the user if no objects were found, and can
be used to move ALL objects or specific types of objects (TABLES, INDEXES, or
MATERIALIZED VIEWS).
2014-01-18 18:56:40 -05:00
Stephen Frost 6f25c62d78 Allow SET TABLESPACE to database default
We've always allowed CREATE TABLE to create tables in the database's default
tablespace without checking for CREATE permissions on that tablespace.
Unfortunately, the original implementation of ALTER TABLE ... SET TABLESPACE
didn't pick up on that exception.

This changes ALTER TABLE ... SET TABLESPACE to allow the database's default
tablespace without checking for CREATE rights on that tablespace, just as
CREATE TABLE works today.  Users could always do this through a series of
commands (CREATE TABLE ... AS SELECT * FROM ...; DROP TABLE ...; etc), so
let's fix the oversight in SET TABLESPACE's original implementation.
2014-01-18 18:41:52 -05:00
Tom Lane 0d79c0a8cc Make various variables const (read-only).
These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.

Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.

Oskari Saarenmaa, with some editorialization by me
2014-01-18 16:04:32 -05:00
Andrew Dunstan 7d7eee8bb7 Export a few more symbols required for test_shm_mq module.
Patch from Amit Kapila.
2014-01-18 15:29:45 -05:00
Peter Eisentraut ad6bf0291a Fix client-only installation
The psql Makefile was not creating $(datadir) before installing
psqlrc.sample there.

In most cases, the directory would be created in some other way, but for
the documented from-source client-only installation procedure, it could
fail.

Reported-by: Mike Blackwell <mike.blackwell@rrd.com>
2014-01-17 23:08:22 -05:00
Andrew Dunstan 708c529c7f Export set_latch_on_sigusr1 symbol for Windows.
Per buildfarm currawong and grip from David Rowley.
2014-01-17 12:48:23 -05:00
Andrew Dunstan b64d956d58 Prevent double macro definition of WIN32.
David Rowley.
2014-01-17 11:49:44 -05:00
Magnus Hagander 4cba1f6bbf Show SSL encryption information when logging connections
Expand the messages when log_connections is enabled to include the
fact that SSL is used and the SSL cipher information.

Dr. Andreas Kunert, review by Marko Kreen
2014-01-17 13:32:31 +01:00
Magnus Hagander 9c14dd22e1 Define WIN32 when _WIN32 is set
_WIN32 is set by the compiler, whereas our code uses WIN32 that is
normally set through our build system. To make it possible to build
extensions out of tree we cannot rely on that, so set the WIN32
symbol explicitly whenever the compiler has set _WIN32.

Not setting this symbol causes double inclusion of pg_config_os.h,
and possibly other errors as well.

Craig Ringer
2014-01-17 12:41:32 +01:00
Bruce Momjian 7e1955b861 docs: update PL/pgSQL docs about the use of := and = 2014-01-16 16:40:58 -05:00
Heikki Linnakangas a472ae1e4e Fix Hot Standby feedback sending when streaming busily.
Commit 6f60fdd701 accidentally removed a
call to XLogWalRcvSendHSFeedback() after flushing received WAL to disk.
The consequence is that when walsender is busy streaming WAL, it doesn't
send HS feedback messages. One is sent if nothing is received from the
master for 100ms, but if there's a steady stream of WAL, it never happens.

Backpatch to 9.3.

Andres Freund and Amit Kapila
2014-01-16 23:15:41 +02:00
Alvaro Herrera 61bee9f756 Split ecpg_execute() in constituent parts
Split the rather long ecpg_execute() function into ecpg_build_params(),
ecpg_autostart_transaction(), a smaller ecpg_execute() and
ecpg_process_output().  There is no user-visible change here, only code
reorganization to support future patches.

Author: Zoltán Böszörményi

Reviewed by Antonin Houska.  Larger, older versions of this patch were
reviewed by Noah Misch and Michael Meskes.
2014-01-16 18:06:50 -03:00
Tom Lane 515d2c596c Add display of oprcode (the underlying function's name) to psql's \do+.
The + modifier of \do didn't use to do anything, but now it adds an oprcode
column.  This is useful both as an additional form of documentation of what
the operator does, and to save a step when finding out properties of the
underlying function.

Marko Tiikkaja, reviewed by Rushabh Lathia, adjusted a bit by me
2014-01-16 15:29:33 -05:00
Alvaro Herrera 3291301385 Split ECPGdo() in constituent parts
This splits ECPGdo() into ecpg_prologue(), ecpg_do() and
ecpg_epilogue(), and renames free_params() into ecpg_free_params() and
exports it.  This makes it possible for future code to use these
routines for their own purposes.

There is no user-visible functionality change here, only code
reorganization.

Zoltán Böszörményi

Reviewed by Antonin Houska.  Larger, older versions of this patch were
reviewed by Noah Misch and Michael Meskes.
2014-01-16 16:36:41 -03:00
Heikki Linnakangas 8ba288da5d Suppress Coverity complaints in readfuncs.c.
Coverity is complaining that the value returned by pg_strtok in
READ_LOCATION_FIELD and READ_BITMAPSET_FIELD macros is not used. In commit
39bfc94c86, we did this to the other macros
to placate compilers that complained when the variable was completely
unused, this extends that to the last remaining macros.
2014-01-16 12:00:19 +02:00
Robert Haas ed46758381 Logging running transactions every 15 seconds.
Previously, we did this just once per checkpoint, but that could make
Hot Standby take a long time to initialize.  To avoid busying an
otherwise-idle system, we don't do this if no WAL has been written
since we did it last.

Andres Freund
2014-01-15 12:41:20 -05:00
Robert Haas d02c0ddb15 Fix missing parentheses resulting in wrong order of dereference.
This could result in referencing uninitialized memory.

Michael Paquier, in response to a complaint from Andres Freund
2014-01-15 11:00:50 -05:00
Tom Lane 5df99f6481 Improve FILES section of psql reference page.
Primarily, explain where to find the system-wide psqlrc file, per recent
gripe from John Sutton.  Do some general wordsmithing and improve the
markup, too.

Also adjust psqlrc.sample so its comments about file location are somewhat
trustworthy.  (Not sure why we bother with this file when it's empty,
but whatever.)

Back-patch to 9.2 where the startup file naming scheme was last changed.
2014-01-14 19:27:57 -05:00
Tom Lane 061b079f89 Fix multiple bugs in index page locking during hot-standby WAL replay.
In ordinary operation, VACUUM must be careful to take a cleanup lock on
each leaf page of a btree index; this ensures that no indexscans could
still be "in flight" to heap tuples due to be deleted.  (Because of
possible index-tuple motion due to concurrent page splits, it's not enough
to lock only the pages we're deleting index tuples from.)  In Hot Standby,
the WAL replay process must likewise lock every leaf page.  There were
several bugs in the code for that:

* The replay scan might come across unused, all-zero pages in the index.
While btree_xlog_vacuum itself did the right thing (ie, nothing) with
such pages, xlogutils.c supposed that such pages must be corrupt and
would throw an error.  This accounts for various reports of replication
failures with "PANIC: WAL contains references to invalid pages".  To
fix, add a ReadBufferMode value that instructs XLogReadBufferExtended
not to complain when we're doing this.

* btree_xlog_vacuum performed the extra locking if standbyState ==
STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up
for hot standby queries until the database has reached consistency, and
we don't want to do the extra locking till then either, for fear of reading
corrupted pages (which bufmgr.c would complain about).  Fix by exporting a
new function from xlog.c that will report whether we're actually in hot
standby replay mode.

* To ensure full coverage of the index in the replay scan, btvacuumscan
would emit a dummy WAL record for the last page of the index, if no
vacuuming work had been done on that page.  However, if the last page
of the index is all-zero, that would result in corruption of said page,
since the functions called on it weren't prepared to handle that case.
There's no need to lock any such pages, so change the logic to target
the last normal leaf page instead.

The first two of these bugs were diagnosed by Andres Freund, the other one
by me.  Fixes based on ideas from Heikki Linnakangas and myself.

This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
2014-01-14 17:35:21 -05:00
Robert Haas 246a9a8d0c Fix typo in comment.
Etsuro Fujita
2014-01-14 14:34:57 -05:00
Robert Haas ec9037df26 Single-reader, single-writer, lightweight shared message queue.
This code provides infrastructure for user backends to communicate
relatively easily with background workers.  The message queue is
structured as a ring buffer and allows messages of arbitary length
to be sent and received.

Patch by me.  Review by KaiGai Kohei and Andres Freund.
2014-01-14 12:23:22 -05:00
Robert Haas 6ddd5137b2 Simple table of contents for a shared memory segment.
This interface is intended to make it simple to divide a dynamic shared
memory segment into different regions with distinct purposes.  It
therefore serves much the same purpose that ShmemIndex accomplishes for
the main shared memory segment, but it is intended to be more
lightweight.

Patch by me.  Review by Andres Freund.
2014-01-14 12:18:58 -05:00
Robert Haas 05ff5062da Code improvements for ALTER SYSTEM .. SET.
Move FreeConfigVariables() later to make sure ErrorConfFile is valid
when we use it, and get rid of an unnecessary string copy operation.

Amit Kapila, kibitzed by me.
2014-01-13 14:54:00 -05:00
Robert Haas 2bb1f14b89 Make bitmap heap scans show exact/lossy block info in EXPLAIN ANALYZE.
Etsuro Fujita
2014-01-13 14:42:16 -05:00
Michael Meskes 976a7d1156 Always use the same way to addres a descriptor in ecpg's regression tests. 2014-01-13 10:41:53 +01:00
Bruce Momjian bb953ad164 Fix pg_dumpall on pre-8.1 servers
rolname did not exist in pg_shadow.

Backpatch to 9.3

Report by Andrew Gierth via IRC
2014-01-12 22:25:36 -05:00
Tom Lane 158b7fa6a3 Disallow LATERAL references to the target table of an UPDATE/DELETE.
On second thought, commit 0c051c9008 was
over-hasty: rather than allowing this case, we ought to reject it for now.
That leaves the field clear for a future feature that allows the target
table to be re-specified in the FROM (or USING) clause, which will enable
left-joining the target table to something else.  We can then also allow
LATERAL references to such an explicitly re-specified target table.
But allowing them right now will create ambiguities or worse for such a
feature, and it isn't something we documented 9.3 as supporting.

While at it, add a convenience subroutine to avoid having several copies
of the ereport for disalllowed-LATERAL-reference cases.
2014-01-11 19:03:12 -05:00
Tom Lane 910bac5953 Fix possible crashes due to using elog/ereport too early in startup.
Per reports from Andres Freund and Luke Campbell, a server failure during
set_pglocale_pgservice results in a segfault rather than a useful error
message, because the infrastructure needed to use ereport hasn't been
initialized; specifically, MemoryContextInit hasn't been called.
One known cause of this is starting the server in a directory it
doesn't have permission to read.

We could try to prevent set_pglocale_pgservice from using anything that
depends on palloc or elog, but that would be messy, and the odds of future
breakage seem high.  Moreover there are other things being called in main.c
that look likely to use palloc or elog too --- perhaps those things
shouldn't be there, but they are there today.  The best solution seems to
be to move the call of MemoryContextInit to very early in the backend's
real main() function.  I've verified that an elog or ereport occurring
immediately after that is now capable of sending something useful to
stderr.

I also added code to elog.c to print something intelligible rather than
just crashing if MemoryContextInit hasn't created the ErrorContext.
This could happen if MemoryContextInit itself fails (due to malloc
failure), and provides some future-proofing against someone trying to
sneak in new code even earlier in server startup.

Back-patch to all supported branches.  Since we've only heard reports of
this type of failure recently, it may be that some recent change has made
it more likely to see a crash of this kind; but it sure looks like it's
broken all the way back.
2014-01-11 16:36:07 -05:00
Tom Lane 6286526207 Fix compute_scalar_stats() for case that all values exceed WIDTH_THRESHOLD.
The standard typanalyze functions skip over values whose detoasted size
exceeds WIDTH_THRESHOLD (1024 bytes), so as to limit memory bloat during
ANALYZE.  However, we (I think I, actually :-() failed to consider the
possibility that *every* non-null value in a column is too wide.  While
compute_minimal_stats() seems to behave reasonably anyway in such a case,
compute_scalar_stats() just fell through and generated no pg_statistic
entry at all.  That's unnecessarily pessimistic: we can still produce
valid stanullfrac and stawidth values in such cases, since we do include
too-wide values in the average-width calculation.  Furthermore, since the
general assumption in this code is that too-wide values are probably all
distinct from each other, it seems reasonable to set stadistinct to -1
("all distinct").

Per complaint from Kadri Raudsepp.  This has been like this since roughly
neolithic times, so back-patch to all supported branches.
2014-01-11 13:42:42 -05:00
Tom Lane 28233ffaa4 Add another regression test cross-checking operator and function comments.
Add a query that lists all the functions that are operator implementation
functions and have a SQL comment that doesn't just say "implementation of
XYZ operator".  (Note that the preceding test checks that such functions'
comments exactly match the corresponding operators' comments.)

While it's not forbidden to add more functions to this list, that should
only be done when we're encouraging users to use either the function or
operator syntax for the functionality, which is a fairly rare situation.
2014-01-11 00:16:08 -05:00
Andrew Dunstan 11829ff8b2 Remove DESCR entries for json operator functions.
Per -hackers discussion.
2014-01-10 22:25:04 -05:00
Bruce Momjian 111022eac6 Move username lookup functions from /port to /common
Per suggestion from Peter E and Alvaro
2014-01-10 18:03:28 -05:00
Alvaro Herrera 423e1211a8 Accept pg_upgraded tuples during multixact freezing
The new MultiXact freezing routines introduced by commit 8e9a16ab8f
neglected to consider tuples that came from a pg_upgrade'd database; a
vacuum run that tried to freeze such tuples would die with an error such
as
ERROR: MultiXactId 11415437 does no longer exist -- apparent wraparound

To fix, ensure that GetMultiXactIdMembers is allowed to return empty
multis when the infomask bits are right, as is done in other callsites.

Per trouble report from F-Secure.

In passing, fix a copy&paste bug reported by Andrey Karpov from VIVA64
from their PVS-Studio static checked, that instead of setting relminmxid
to Invalid, we were setting relfrozenxid twice.  Not an important
mistake because that code branch is about relations for which we don't
use the frozenxid/minmxid values at all in the first place, but seems to
warrants a fix nonetheless.
2014-01-10 18:03:18 -03:00
Tom Lane faab7a957d Remove unnecessary local variables to work around an icc optimization bug.
Buildfarm member dunlin has been crashing since commit 8b49a60, but other
machines seem fine with that code.  It turns out that removing the local
variables in ordered_set_startup() that are copies of fields in "qstate"
dodges the problem.  This might cost a few cycles on register-rich
machines, but it's probably a wash on others, and in any case this code
isn't performance-critical.  Thanks to Jeremy Drake for off-list
investigation.
2014-01-09 12:59:55 -05:00
Michael Meskes 192b4aacad Changed regression test to ecpg test suite for alignment problem just with last
commit.
2014-01-09 16:20:19 +01:00
Michael Meskes d685e24249 Fix descriptor output in ECPG.
While working on most platforms the old way sometimes created alignment
problems. This should fix it. Also the regresion tests were updated to test for
the reported case.

Report and fix by MauMau <maumau307@gmail.com>
2014-01-09 16:20:19 +01:00
Heikki Linnakangas c945af80cf Refactor checking whether we've reached the recovery target.
Makes the replay loop slightly more readable, by separating the concerns of
whether to stop and whether to delay, and how to extract the timestamp from
a record.

This has the user-visible change that the timestamp of the last applied
record is now updated after actually applying it. Before, it was updated
just before applying it. That meant that pg_last_xact_replay_timestamp()
could return the timestamp of a commit record that is in process of being
replayed, but not yet applied. Normally the difference is small, but if
min_recovery_apply_delay is set, there could be a significant delay between
reading a record and applying it.

Another behavioral change is that if you recover to a restore point, we stop
after the restore point record, not before it. It makes no difference as far
as running queries on the server is concerned, as applying a restore point
record changes nothing, but if examine the timeline history you will see
that the new timeline branched off just after the restore point record, not
before it. One practical consequence is that if you do PITR to the new
timeline, and set recovery target to the same named restore point again, it
will find and stop recovery at the same restore point. Conceptually, I think
it makes more sense to consider the restore point as part of the new
timeline's history than not.

In principle, setting the last-replayed timestamp before actually applying
the record was a bug all along, but it doesn't seem worth the risk to
backpatch, since min_recovery_apply_delay was only added in 9.4.
2014-01-09 14:00:39 +02:00
Tom Lane 220b34331f We don't need to include pg_sema.h in s_lock.h anymore.
Minor improvement to commit daa7527afc2274432094ebe7ceb03aa41f916607:
s_lock.h no longer has any need to mention PGSemaphoreData, so we can
rip out the #include that supplies that.  In a non-HAVE_SPINLOCKS
build, this doesn't really buy much since we still need the #include
in spin.h --- but everywhere else, this reduces #include footprint by
some trifle, and helps keep the different locking facilities separate.
2014-01-08 20:58:22 -05:00
Tom Lane 080b7db72e Fix "cannot accept a set" error when only some arms of a CASE return a set.
In commit c1352052ef, I implemented an
optimization that assumed that a function's argument expressions would
either always return a set (ie multiple rows), or always not.  This is
wrong however: we allow CASE expressions in which some arms return a set
of some type and others just return a scalar of that type.  There may be
other examples as well.  To fix, replace the run-time test of whether an
argument returned a set with a static precheck (expression_returns_set).
This adds a little bit of query startup overhead, but it seems barely
measurable.

Per bug #8228 from David Johnston.  This has been broken since 8.0,
so patch all supported branches.
2014-01-08 20:18:58 -05:00
Robert Haas daa7527afc Reduce the number of semaphores used under --disable-spinlocks.
Instead of allocating a semaphore from the operating system for every
spinlock, allocate a fixed number of semaphores (by default, 1024)
from the operating system and multiplex all the spinlocks that get
created onto them.  This could self-deadlock if a process attempted
to acquire more than one spinlock at a time, but since processes
aren't supposed to execute anything other than short stretches of
straight-line code while holding a spinlock, that shouldn't happen.

One motivation for this change is that, with the introduction of
dynamic shared memory, it may be desirable to create spinlocks that
last for less than the lifetime of the server.  Without this change,
attempting to use such facilities under --disable-spinlocks would
quickly exhaust any supply of available semaphores.  Quite apart
from that, it's desirable to contain the quantity of semaphores
needed to run the server simply on convenience grounds, since using
too many may make it harder to get PostgreSQL running on a new
platform, which is mostly the point of --disable-spinlocks in the
first place.

Patch by me; review by Tom Lane.
2014-01-08 18:58:00 -05:00
Heikki Linnakangas 3739e5ab93 Fix pause_at_recovery_target + recovery_target_inclusive combination.
If pause_at_recovery_target is set, recovery pauses *before* applying the
target record, even if recovery_target_inclusive is set. If you then
continue with pg_xlog_replay_resume(), it will apply the target record
before ending recovery. In other words, if you log in while it's paused
and verify that the database looks OK, ending recovery changes its state
again, possibly destroying data that you were tring to salvage with PITR.

Backpatch to 9.1, this has been broken since pause_at_recovery_target was
added.
2014-01-08 23:28:52 +02:00
Heikki Linnakangas 815d71deed If multiple recovery_targets are specified, use the latest one.
The docs say that only one of recovery_target_xid, recovery_target_time, or
recovery_target_name can be specified. But the code actually did something
different, so that a name overrode time, and xid overrode both time and name.
Now the target specified last takes effect, whether it's an xid, time or
name.

With this patch, we still accept multiple recovery_target settings, even
though docs say that only one can be specified. It's a general property of
the recovery.conf file parser that you if you specify the same option twice,
the last one takes effect, like with postgresql.conf.
2014-01-08 22:26:39 +02:00
Tom Lane 847e46abc9 Avoid extra AggCheckCallContext() checks in ordered-set aggregates.
In the transition functions, we don't really need to recheck this after the
first call.  I had been feeling paranoid about possibly getting a non-null
argument value in some other context; but it's probably game over anyway
if we have a non-null "internal" value that's not what we are expecting.

In the final functions, the general convention in pre-existing final
functions seems to be that an Assert() is good enough, so do it like that
here too.

This seems to save a few tenths of a percent of overall query runtime,
which isn't much, but still it's just overhead if there's not a plausible
case where the checks would fire.
2014-01-08 14:33:52 -05:00
Tom Lane e6336b8b57 Save a few cycles in advance_transition_function().
Keep a pre-initialized FunctionCallInfoData in AggStatePerAggData, and
re-use that at each row instead of doing InitFunctionCallInfoData each
time.  This saves only half a dozen assignments and maybe some stack
manipulation, and yet that seems to be good for a percent or two of the
overall query run time for simple aggregates such as count(*).  The cost
is that the FunctionCallInfoData (which is about a kilobyte, on 64-bit
machines) stays allocated for the duration of the query instead of being
short-lived stack data.  But we're already paying an equivalent space cost
for each regular FuncExpr or OpExpr node, so I don't feel bad about paying
it for aggregate functions.  The code seems a little cleaner this way too,
since the number of things passed to advance_transition_function decreases.
2014-01-08 13:58:37 -05:00
Heikki Linnakangas d59ff6c110 Fix bug in determining when recovery has reached consistency.
When starting WAL replay from an online checkpoint, the last replayed WAL
record variable was initialized using the checkpoint record's location, even
though the records between the REDO location and the checkpoint record had
not been replayed yet. That was noted as "slightly confusing" but harmless
in the comment, but in some cases, it fooled CheckRecoveryConsistency to
incorrectly conclude that we had already reached a consistent state
immediately at the beginning of WAL replay. That caused the system to accept
read-only connections in hot standby mode too early, and also PANICs with
message "WAL contains references to invalid pages".

Fix by initializing the variables to the REDO location instead.

In 9.2 and above, change CheckRecoveryConsistency() to use
lastReplayedEndRecPtr variable when checking if backup end location has
been reached. It was inconsistently using EndRecPtr for that check, but
lastReplayedEndRecPtr when checking min recovery point. It made no
difference before this patch, because in all the places where
CheckRecoveryConsistency was called the two variables were the same, but
it was always an accident waiting to happen, and would have been wrong
after this patch anyway.

Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0,
where hot standby was introduced.
2014-01-08 15:03:09 +02:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane 0c051c9008 Fix LATERAL references to target table of UPDATE/DELETE.
I failed to think much about UPDATE/DELETE when implementing LATERAL :-(.
The implemented behavior ended up being that subqueries in the FROM or
USING clause (respectively) could access the update/delete target table as
though it were a lateral reference; which seems fine if they said LATERAL,
but certainly ought to draw an error if they didn't.  Fix it so you get a
suitable error when you omit LATERAL.  Per report from Emre Hasegeli.
2014-01-07 15:25:27 -05:00
Heikki Linnakangas f68220df92 Silence compiler warning on MSVC.
MSVC doesn't know that elog(ERROR) doesn't return, and gives a warning about
missing return. Silence that.

Amit Kapila
2014-01-07 21:49:15 +02:00
Magnus Hagander 9544cc0d65 Move permissions check from do_pg_start_backup to pg_start_backup
And the same for do_pg_stop_backup. The code in do_pg_* is not allowed
to access the catalogs. For manual base backups, the permissions
check can be handled in the calling function, and for streaming
base backups only users with the required permissions can get past
the authentication step in the first place.

Reported by Antonin Houska, diagnosed by Andres Freund
2014-01-07 17:50:56 +01:00
Magnus Hagander b168c5ef27 Avoid including tablespaces inside PGDATA twice in base backups
If a tablespace was crated inside PGDATA it was backed up both as part
of the PGDATA backup and as the backup of the tablespace. Avoid this
by skipping any directory inside PGDATA that contains one of the active
tablespaces.

Dimitri Fontaine and Magnus Hagander
2014-01-07 17:11:32 +01:00
Peter Eisentraut edc43458d7 Add more use of psprintf() 2014-01-06 21:30:26 -05:00
Heikki Linnakangas 10a82cda67 Remove bogus -K option from pg_dump.
I added it to the getopt call by accident in commit
691e595dd9.

Amit Kapila
2014-01-06 12:30:19 +02:00
Tom Lane 8b49a6044d Cache catalog lookup data across groups in ordered-set aggregates.
The initial commit of ordered-set aggregates just did all the setup work
afresh each time the aggregate function is started up.  But in a GROUP BY
query, the catalog lookups need not be repeated for each group, since the
column datatypes and sort information won't change.  When there are many
small groups, this makes for a useful, though not huge, performance
improvement.  Per suggestion from Andrew Gierth.

Profiling of these cases suggests that it might be profitable to avoid
duplicate lookups within tuplesort startup as well; but changing the
tuplesort APIs would have much broader impact, so I left that for
another day.
2014-01-05 12:28:39 -05:00
Tom Lane 92459e7a7f Fix translatability markings in psql, and add defenses against future bugs.
Several previous commits have added columns to various \d queries without
updating their translate_columns[] arrays, leading to potentially incorrect
translations in NLS-enabled builds.  Offenders include commit 893686762
(added prosecdef to \df+), c9ac00e6e (added description to \dc+) and
3b17efdfd (added description to \dC+).  Fix those cases back to 9.3 or
9.2 as appropriate.

Since this is evidently more easily missed than one would like, in HEAD
also add an Assert that the supplied array is long enough.  This requires
an API change for printQuery(), so it seems inappropriate for back
branches, but presumably all future changes will be tested in HEAD anyway.

In HEAD and 9.3, also clean up a whole lot of sloppiness in the emitted
SQL for \dy (event triggers): lack of translatability due to failing to
pass words-to-be-translated through gettext_noop(), inadequate schema
qualification, and sloppy formatting resulting in unnecessarily ugly
-E output.

Peter Eisentraut and Tom Lane, per bug #8702 from Sergey Burladyan
2014-01-04 16:05:16 -05:00
Tom Lane 5858cf8ab2 Fix header comment for bitncmp().
The result is an int less than, equal to, or greater than zero, in the
style of memcmp (and, in fact, exactly the output of memcmp in some cases).
This comment previously said -1, 1, or 0, which was an overspecification,
as noted by Emre Hasegeli.  All of the existing callers appear to be fine
with the actual behavior, so just fix the comment.

In passing, improve infelicitous formatting of some call sites.
2014-01-04 14:01:51 -05:00
Alvaro Herrera 1a3e82a7f9 Restore some comments lost during 15732b34e8
Michael Paquier
2014-01-03 13:22:03 -03:00
Tom Lane a3b4aeecfe Ooops, should use double not single quotes in StaticAssertStmt().
That's what I get for testing this on an older compiler.
2014-01-02 21:54:20 -05:00
Tom Lane a7ef273e1c Fix calculation of maximum statistics-message size.
The PGSTAT_NUM_TABENTRIES macro should have been updated when new fields
were added to struct PgStat_MsgTabstat in commit 644828908, but it wasn't.
Fix that.

Also, add a static assertion that we didn't overrun the intended size limit
on stats messages.  This will not necessarily catch every mistake in
computing the maximum array size for stats messages, but it will catch ones
that have practical consequences.  (The assertion in fact doesn't complain
about the aforementioned error in PGSTAT_NUM_TABENTRIES, because that was
not big enough to cause the array length to increase.)

No back-patch, as there's no actual bug in existing releases; this is just
in the nature of future-proofing.

Mark Dilger and Tom Lane
2014-01-02 21:45:51 -05:00
Alvaro Herrera 638cf09e76 Handle 5-char filenames in SlruScanDirectory
Original users of slru.c were all producing 4-digit filenames, so that
was all that that code was prepared to handle.  Changes to multixact.c
in the course of commit 0ac5ad5134 made pg_multixact/members create
5-digit filenames once a certain threshold was reached, which
SlruScanDirectory wasn't prepared to deal with; in particular,
5-digit-name files were not removed during truncation.  Change that
routine to make it aware of those files, and have it process them just
like any others.

Right now, some pg_multixact/members directories will contain a mixture
of 4-char and 5-char filenames.  A future commit is expected fix things
so that each slru.c user declares the correct maximum width for the
files it produces, to avoid such unsightly mixtures.

Noticed while investigating bug #8673 reported by Serge Negodyuck.
2014-01-02 18:17:29 -03:00
Alvaro Herrera a50d976254 Wrap multixact/members correctly during extension
In the 9.2 code for extending multixact/members, the logic was very
simple because the number of entries in a members page was a proper
divisor of 2^32, and thus at 2^32 wraparound the logic for page switch
was identical than at any other page boundary.  In commit 0ac5ad5134 I
failed to realize this and introduced code that was not able to go over
the 2^32 boundary.  Fix that by ensuring that when we reach the last
page of the last segment we correctly zero the initial page of the
initial segment, using correct uint32-wraparound-safe arithmetic.

Noticed while investigating bug #8673 reported by Serge Negodyuck, as
diagnosed by Andres Freund.
2014-01-02 18:17:07 -03:00
Alvaro Herrera 722acf51a0 Handle wraparound during truncation in multixact/members
In pg_multixact/members, relying on modulo-2^32 arithmetic for
wraparound handling doesn't work all that well.  Because we don't
explicitely track wraparound of the allocation counter for members, it
is possible that the "live" area exceeds 2^31 entries; trying to remove
SLRU segments that are "old" according to the original logic might lead
to removal of segments still in use.  To fix, have the truncation
routine use a tailored SlruScanDirectory callback that keeps track of
the live area in actual use; that way, when the live range exceeds 2^31
entries, the oldest segments still live will not get removed untimely.

This new SlruScanDir callback needs to take care not to remove segments
that are "in the future": if new SLRU segments appear while the
truncation is ongoing, make sure we don't remove them.  This requires
examination of shared memory state to recheck for false positives, but
testing suggests that this doesn't cause a problem.  The original coding
didn't suffer from this pitfall because segments created when truncation
is running are never considered to be removable.

Per Andres Freund's investigation of bug #8673 reported by Serge
Negodyuck.
2014-01-02 18:16:54 -03:00
Robert Haas 3cff1879f8 Aggressively freeze tables when CLUSTER or VACUUM FULL rewrites them.
We haven't wanted to do this in the past on the grounds that in rare
cases the original xmin value will be needed for forensic purposes, but
commit 37484ad2aa removes that objection,
so now we can.

Per extensive discussion, among many people, on pgsql-hackers.
2014-01-02 15:15:51 -05:00
Robert Haas 4b351841fa Rename walLogHints to wal_log_hints for easier grepping.
Michael Paquier
2014-01-01 20:17:00 -05:00
Michael Meskes 7c957ec83e Do not use an empty hostname.
When trying to connect to a given database libecpg should not try using an
empty hostname if no hostname was given.
2014-01-01 12:39:31 +01:00
Tom Lane c01bc51f8d Fix broken support for event triggers as extension members.
CREATE EVENT TRIGGER forgot to mark the event trigger as a member of its
extension, and pg_dump didn't pay any attention anyway when deciding
whether to dump the event trigger.  Per report from Moshe Jacobson.

Given the obvious lack of testing here, it's rather astonishing that
ALTER EXTENSION ADD/DROP EVENT TRIGGER work, but they seem to.
2013-12-30 14:00:02 -05:00
Tom Lane f7fbf4b0be Remove dead code now that orindxpath.c is history.
We don't need make_restrictinfo_from_bitmapqual() anymore at all.
generate_bitmap_or_paths() doesn't need to be exported, and we can
drop its rather klugy restriction_only flag.
2013-12-30 12:50:31 -05:00
Tom Lane f343a880d5 Extract restriction OR clauses whether or not they are indexable.
It's possible to extract a restriction OR clause from a join clause that
has the form of an OR-of-ANDs, if each sub-AND includes a clause that
mentions only one specific relation.  While PG has been aware of that idea
for many years, the code previously only did it if it could extract an
indexable OR clause.  On reflection, though, that seems a silly limitation:
adding a restriction clause can be a win by reducing the number of rows
that have to be filtered at the join step, even if we have to test the
clause as a plain filter clause during the scan.  This should be especially
useful for foreign tables, where the change can cut the number of rows that
have to be retrieved from the foreign server; but testing shows it can win
even on local tables.  Per a suggestion from Robert Haas.

As a heuristic, I made the code accept an extracted restriction clause
if its estimated selectivity is less than 0.9, which will probably result
in accepting extracted clauses just about always.  We might need to tweak
that later based on experience.

Since the code no longer has even a weak connection to Path creation,
remove orindxpath.c and create a new file optimizer/util/orclauses.c.

There's some additional janitorial cleanup of now-dead code that needs
to happen, but it seems like that's a fit subject for a separate commit.
2013-12-30 12:24:37 -05:00
Kevin Grittner 47f50262e7 Don't attempt to limit target database for pg_restore.
There was an apparent attempt to limit the target database for
pg_restore to version 7.1.0 or later.  Due to a leading zero this
was interpreted as an octal number, which allowed targets with
version numbers down to 2.87.36.  The lowest actual release above
that was 6.0.0, so that was effectively the limit.

Since the success of the restore attempt will depend primarily on
on what statements were generated by the dump run, we don't want
pg_restore trying to guess whether a given target should be allowed
based on version number.  Allow a connection to any version.  Since
it is very unlikely that anyone would be using a recent version of
pg_restore to restore to a pre-6.0 database, this has little to no
practical impact, but it makes the code less confusing to read.

Issue reported and initial patch suggestion from Joel Jacobson
based on an article by Andrey Karpov reporting on issues found by
PVS-Studio static code analyzer.  Final patch based on analysis by
Tom Lane.  Back-patch to all supported branches.
2013-12-29 15:17:52 -06:00
Tom Lane ed011d9754 Undo autoconf 2.69's attempt to #define _DARWIN_USE_64_BIT_INODE.
Defining this symbol causes OS X 10.5 to use a buggy version of readdir(),
which can sometimes fail with EINVAL if the previously-fetched directory
entry has been deleted or renamed.  In later OS X versions that bug has
been repaired, but we still don't need the #define because it's on by
default.  So this is just an all-around bad idea, and we can do without it.
2013-12-29 12:57:56 -05:00
Peter Eisentraut 71812a98cb Update grammar
From: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
2013-12-28 20:54:23 -05:00
Peter Eisentraut b986270bd4 Fix whitespace 2013-12-27 19:51:49 -05:00
Andrew Dunstan 29dcf7ded5 Properly detect invalid JSON numbers when generating JSON.
Instead of looking for characters that aren't valid in JSON numbers, we
simply pass the output string through the JSON number parser, and if it
fails the string is quoted. This means among other things that money and
domains over money will be quoted correctly and generate valid JSON.

Fixes bug #8676 reported by Anderson Cristian da Silva.

Backpatched to 9.2 where JSON generation was introduced.
2013-12-27 17:04:00 -05:00
Kevin Grittner a133bf7031 Fix misplaced right paren bugs in pgstatfuncs.c.
The bug would only show up if the C sockaddr structure contained
zero in the first byte for a valid address; otherwise it would
fail to fail, which is probably why it went unnoticed for so long.

Patch submitted by Joel Jacobson after seeing an article by Andrey
Karpov in which he reports finding this through static code
analysis using PVS-Studio.  While I was at it I moved a definition
of a local variable referenced in the buggy code to a more local
context.

Backpatch to all supported branches.
2013-12-27 15:26:24 -06:00
Peter Eisentraut a09e3fd776 Fix whitespace 2013-12-26 23:51:56 -05:00
Tom Lane 1def747db6 Fix inadequately-tested code path in tuplesort_skiptuples().
Per report from Jeff Davis.
2013-12-24 17:13:02 -05:00
Tom Lane 4eeda92d86 Fix ANALYZE failure on a column that's a domain over a range.
Most other range operations seem to work all right on domains,
but this one not so much, at least not since commit 918eee0c.
Per bug #8684 from Brett Neumeier.
2013-12-23 22:18:48 -05:00
Robert Haas d43760b624 Revise documentation for new freezing method.
Commit 37484ad2aa invalidated a good
chunk of documentation, so patch it up to reflect the new state of
play.  Along the way, patch remaining documentation references to
FrozenXID to say instead FrozenTransactionId, so that they match the
way we actually spell it in the code.
2013-12-23 20:36:31 -05:00
Tom Lane cf63c641ca Fix portability issue in ordered-set patch.
Overly compact coding in makeOrderedSetArgs() led to a platform dependency:
if the compiler chose to execute the subexpressions in the wrong order,
list_length() might get applied to an already-modified List, giving a
value we didn't want.  Per buildfarm.
2013-12-23 20:24:07 -05:00
Tom Lane 8d65da1f01 Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 16:11:35 -05:00
Robert Haas 37484ad2aa Change the way we mark tuples as frozen.
Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN.  A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change.  To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly.  HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked commited or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund
2013-12-22 15:49:09 -05:00
Fujii Masao 961bf59fb7 Rename wal_log_hintbits to wal_log_hints, per discussion on pgsql-hackers.
Sawada Masahiko
2013-12-21 03:33:16 +09:00
Alvaro Herrera 6130208e75 Avoid useless palloc during transaction commit
We can allocate the initial relations-to-drop array when first needed,
instead of at function entry; this avoids allocating it when the
function is not going to do anything, which is most of the time.

Backpatch to 9.3, where this behavior was introduced by commit
279628a0a7.

There's more that could be done here, such as possible reworking of the
code to avoid having to palloc anything, but that doesn't sound as
backpatchable as this relatively minor change.

Per complaint from Noah Misch in
20131031145234.GA621493@tornado.leadboat.com
2013-12-20 12:37:30 -03:00
Robert Haas c32afe53c2 pg_prewarm, a contrib module for prewarming relationd data.
Patch by me.  Review by Álvaro Herrera, Amit Kapila, Jeff Janes,
Gurjeet Singh, and others.
2013-12-20 08:14:13 -05:00
Alvaro Herrera 6eda3e9c27 isolationtester: Ensure stderr is unbuffered, too 2013-12-19 22:09:30 -03:00
Bruce Momjian 527fdd9df1 Move pg_upgrade_support global variables to their own include file
Previously their declarations were spread around to avoid accidental
access.
2013-12-19 16:10:07 -05:00
Alvaro Herrera 73bcb76b77 Make stdout unbuffered
This ensures that all stdout output is flushed immediately, to match
stderr.  This eliminates the need for fflush(stdout) calls sprinkled all
over the place.

Per Daniel Wood in message 519A79C6.90308@salesforce.com
2013-12-19 17:26:27 -03:00
Alvaro Herrera 13aa624431 Optimize updating a row that's locked by same xid
Updating or locking a row that was already locked by the same
transaction under the same Xid caused a MultiXact to be created; but
this is unnecessary, because there's no usefulness in being able to
differentiate two locks by the same transaction.  In particular, if a
transaction executed SELECT FOR UPDATE followed by an UPDATE that didn't
modify columns of the key, we would dutifully represent the resulting
combination as a multixact -- even though a single key-update is
sufficient.

Optimize the case so that only the strongest of both locks/updates is
represented in Xmax.  This can save some Xmax's from becoming
MultiXacts, which can be a significant optimization.

This missed optimization opportunity was spotted by Andres Freund while
investigating a bug reported by Oliver Seemann in message
CANCipfpfzoYnOz5jj=UZ70_R=CwDHv36dqWSpwsi27vpm1z5sA@mail.gmail.com
and also directly as a performance regression reported by Dong Ye in
message
d54b8387.000012d8.00000010@YED-DEVD1.vmware.com
Reportedly, this patch fixes the performance regression.

Since the missing optimization was reported as a significant performance
regression from 9.2, backpatch to 9.3.

Andres Freund, tweaked by Álvaro Herrera
2013-12-19 16:53:49 -03:00
Fujii Masao 084e385a2f Add tab completion for ALTER SYSTEM SET in psql. 2013-12-20 02:33:27 +09:00
Peter Eisentraut 94b899b829 Upgrade to Autoconf 2.69 2013-12-18 20:53:23 -05:00
Robert Haas 001a573a20 Allow on-detach callbacks for dynamic shared memory segments.
Just as backends must clean up their shared memory state (releasing
lwlocks, buffer pins, etc.) before exiting, they must also perform
any similar cleanups related to dynamic shared memory segments they
have mapped before unmapping those segments.  So add a mechanism to
ensure that.

Existing on_shmem_exit hooks include both "user level" cleanup such
as transaction abort and removal of leftover temporary relations and
also "low level" cleanup that forcibly released leftover shared
memory resources.  On-detach callbacks should run after the first
group but before the second group, so create a new before_shmem_exit
function for registering the early callbacks and keep on_shmem_exit
for the regular callbacks.  (An earlier draft of this patch added an
additional argument to on_shmem_exit, but that had a much larger
footprint and probably a substantially higher risk of breaking third
party code for no real gain.)

Patch by me, reviewed by KaiGai Kohei and Andres Freund.
2013-12-18 13:09:09 -05:00
Bruce Momjian 613c6d26bd Fix incorrect error message reported for non-existent users
Previously, lookups of non-existent user names could return "Success";
it will now return "User does not exist" by resetting errno.  This also
centralizes the user name lookup code in libpgport.

Report and analysis by Nicolas Marchildon;  patch by me
2013-12-18 12:16:21 -05:00
Alvaro Herrera 11ac4c73cb Don't ignore tuple locks propagated by our updates
If a tuple was locked by transaction A, and transaction B updated it,
the new version of the tuple created by B would be locked by A, yet
visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the
lock held by A wouldn't get checked if transaction B later deleted (or
key-updated) the new version of the tuple.  This might cause referential
integrity checks to give false positives (that is, allow deletes that
should have been rejected).

This is an easy oversight to have made, because prior to improved tuple
locks in commit 0ac5ad5134 it wasn't possible to have tuples created by
our own transaction that were also locked by remote transactions, and so
locks weren't even considered in that code path.

It is recommended that foreign keys be rechecked manually in bulk after
installing this update, in case some referenced rows are missing with
some referencing row remaining.

Per bug reported by Daniel Wood in
CAPweHKe5QQ1747X2c0tA=5zf4YnS2xcvGf13Opd-1Mq24rF1cQ@mail.gmail.com
2013-12-18 13:45:51 -03:00
Tatsuo Ishii 65d6e4cb5c Add ALTER SYSTEM command to edit the server configuration file.
Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii,
Boszormenyi Zoltan, Andres Freund, Greg Smith and others.
2013-12-18 23:42:44 +09:00
Bruce Momjian dba5a9dda9 Comment: COPY comment improvement
Etsuro Fujita
2013-12-17 12:51:16 -05:00
Alvaro Herrera 3b97e6823b Rework tuple freezing protocol
Tuple freezing was broken in connection to MultiXactIds; commit
8e53ae025d tried to fix it, but didn't go far enough.  As noted by
Noah Misch, freezing a tuple whose Xmax is a multi containing an aborted
update might cause locks in the multi to go ignored by later
transactions.  This is because the code depended on a multixact above
their cutoff point not having any lock-only member older than the cutoff
point for Xids, which is easily defeated in READ COMMITTED transactions.

The fix for this involves creating a new MultiXactId when necessary.
But this cannot be done during WAL replay, and moreover multixact
examination requires using CLOG access routines which are not supposed
to be used during WAL replay either; so tuple freezing cannot be done
with the old freeze WAL record.  Therefore, separate the freezing
computation from its execution, and change the WAL record to carry all
necessary information.  At WAL replay time, it's easy to re-execute
freezing because we don't need to re-compute the new infomask/Xmax
values but just take them from the WAL record.

While at it, restructure the coding to ensure all page changes occur in
a single critical section without much room for failures.  The previous
coding wasn't using a critical section, without any explanation as to
why this was acceptable.

In replication scenarios using the 9.3 branch, standby servers must be
upgraded before their master, so that they are prepared to deal with the
new WAL record once the master is upgraded; failure to do so will cause
WAL replay to die with a PANIC message.  Later upgrade of the standby
will allow the process to continue where it left off, so there's no
disruption of the data in the standby in any case.  Standbys know how to
deal with the old WAL record, so it's okay to keep the master running
the old code for a while.

In master, the old freeze WAL record is gone, for cleanliness' sake;
there's no compatibility concern there.

Backpatch to 9.3, where the original bug was introduced and where the
previous fix was backpatched.

Álvaro Herrera and Andres Freund
2013-12-16 11:29:50 -03:00
Heikki Linnakangas 30b96549ab Mark variables 'static' where possible. Move GinFuzzySearchLimit to ginget.c
Per "clang -Wmissing-variable-declarations" output, posted by Andres Freund.
I didn't silence all those warnings, though, only the most obvious cases.
2013-12-16 11:41:17 +02:00
Tatsuo Ishii 1f0626ee40 Add "SHIFT_JIS" as an accepted encoding name for locale checking.
When locale is "ja_JP.SJIS", nl_langinfo(CODESET) returns "SHIFT_JIS"
on some platforms, at least on RedHat Linux. So the encoding/locale
match table (encoding_match_list) needs the entry. Otherwise client
encoding is set to SQL_ASCII.

Back patch to all supported branches.
2013-12-15 11:09:05 +09:00
Tom Lane 1b4f7f93b4 Allow empty target list in SELECT.
This fixes a problem noted as a followup to bug #8648: if a query has a
semantically-empty target list, e.g. SELECT * FROM zero_column_table,
ruleutils.c will dump it as a syntactically-empty target list, which was
not allowed.  There doesn't seem to be any reliable way to fix this by
hacking ruleutils (note in particular that the originally zero-column table
might since have had columns added to it); and even if we had such a fix,
it would do nothing for existing dump files that might contain bad syntax.
The best bet seems to be to relax the syntactic restriction.

Also, add parse-analysis errors for SELECT DISTINCT with no columns (after
*-expansion) and RETURNING with no columns.  These cases previously
produced unexpected behavior because the parsed Query looked like it had
no DISTINCT or RETURNING clause, respectively.  If anyone ever offers
a plausible use-case for this, we could work a bit harder on making the
situation distinguishable.

Arguably this is a bug fix that should be back-patched, but I'm worried
that there may be client apps or PLs that expect "SELECT ;" to throw a
syntax error.  The issue doesn't seem important enough to risk changing
behavior in minor releases.
2013-12-14 20:23:26 -05:00
Tom Lane c03ad5602f Fix inherited UPDATE/DELETE with UNION ALL subqueries.
Fix an oversight in commit b3aaf9081a1a95c245fd605dcf02c91b3a5c3a29: we do
indeed need to process the planner's append_rel_list when copying RTE
subqueries, because if any of them were flattenable UNION ALL subqueries,
the append_rel_list shows which subquery RTEs were pulled up out of which
other ones.  Without this, UNION ALL subqueries aren't correctly inserted
into the update plans for inheritance child tables after the first one,
typically resulting in no update happening for those child table(s).
Per report from Victor Yegorov.

Experimentation with this case also exposed a fault in commit
a7b965382cf0cb30aeacb112572718045e6d4be7: if an inherited UPDATE/DELETE
was proven totally dummy by constraint exclusion, we might arrive at
add_rtes_to_flat_rtable with root->simple_rel_array being NULL.  This
should be interpreted as not having any RelOptInfos.  I chose to code
the guard as a check against simple_rel_array_size, so as to also
provide some protection against indexing off the end of the array.

Back-patch to 9.2 where the faulty code was added.
2013-12-14 17:33:53 -05:00
Alvaro Herrera 60eea3780c Fix typo 2013-12-13 17:27:16 -03:00
Alvaro Herrera d881dd6233 Rework MultiXactId cache code
The original performs too poorly; in some scenarios it shows way too
high while profiling.  Try to make it a bit smarter to avoid excessive
cosst.  In particular, make it have a maximum size, and have entries be
sorted in LRU order; once the max size is reached, evict the oldest
entry to avoid it from growing too large.

Per complaint from Andres Freund in connection with new tuple freezing
code.
2013-12-13 17:16:25 -03:00
Tom Lane 2efc6dc256 Add HOLD/RESUME_INTERRUPTS in HandleCatchupInterrupt/HandleNotifyInterrupt.
This prevents a possible longjmp out of the signal handler if a timeout
or SIGINT occurs while something within the handler has transiently set
ImmediateInterruptOK.  For safety we must hold off the timeout or cancel
error until we're back in mainline, or at least till we reach the end of
the signal handler when ImmediateInterruptOK was true at entry.  This
syncs these functions with the logic now present in handle_sig_alarm.

AFAICT there is no live bug here in 9.0 and up, because I don't think we
currently can wait for any heavyweight lock inside these functions, and
there is no other code (except read-from-client) that will turn on
ImmediateInterruptOK.  However, that was not true pre-9.0: in older
branches ProcessIncomingNotify might block trying to lock pg_listener, and
then a SIGINT could lead to undesirable control flow.  It might be all
right anyway given the relatively narrow code ranges in which NOTIFY
interrupts are enabled, but for safety's sake I'm back-patching this.
2013-12-13 14:05:51 -05:00
Heikki Linnakangas dde6282500 Fix more instances of "the the" in comments.
Plus one instance of "to to" in the docs.
2013-12-13 20:02:01 +02:00
Tom Lane e8312b4f03 Don't let timeout interrupts happen unless ImmediateInterruptOK is set.
Serious oversight in commit 16e1b7a1b7f7ffd8a18713e83c8cd72c9ce48e07:
we should not allow an interrupt to take control away from mainline code
except when ImmediateInterruptOK is set.  Just to be safe, let's adopt
the same save-clear-restore dance that's been used for many years in
HandleCatchupInterrupt and HandleNotifyInterrupt, so that nothing bad
happens if a timeout handler invokes code that tests or even manipulates
ImmediateInterruptOK.

Per report of "stuck spinlock" failures from Christophe Pettus, though
many other symptoms are possible.  Diagnosis by Andres Freund.
2013-12-13 11:50:15 -05:00
Heikki Linnakangas 50e547096c Add GUC to enable WAL-logging of hint bits, even with checksums disabled.
WAL records of hint bit updates is useful to tools that want to examine
which pages have been modified. In particular, this is required to make
the pg_rewind tool safe (without checksums).

This can also be used to test how much extra WAL-logging would occur if
you enabled checksums, without actually enabling them (which you can't
currently do without re-initdb'ing).

Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with
further changes by me.
2013-12-13 16:26:14 +02:00
Heikki Linnakangas a49633d8dc Fix WAL-logging of setting the visibility map bit.
The operation that removes the remaining dead tuples from the page must
be WAL-logged before the setting of the VM bit. Otherwise, if you replay
the WAL to between those two records, you end up with the VM bit set, but
the dead tuples are still there.

Backpatch to 9.3, where this bug was introduced.
2013-12-13 14:15:04 +02:00
Tom Lane ccca6f56f5 Fix ancient docs/comments thinko: XID comparison is mod 2^32, not 2^31.
Pointed out by Gianni Ciolli.
2013-12-12 12:39:48 -05:00
Tom Lane f26099057a Improve EXPLAIN to print the grouping columns in Agg and Group nodes.
Per request from Kevin Grittner.
2013-12-12 11:24:38 -05:00
Simon Riggs 8693559cac New autovacuum_work_mem parameter
If autovacuum_work_mem is set, autovacuum workers now use
this parameter in preference to maintenance_work_mem.

Peter Geoghegan
2013-12-12 11:42:39 +00:00
Simon Riggs 36da3cfb45 Allow time delayed standbys and recovery
Set min_recovery_apply_delay to force a delay in recovery apply for commit and
restore point WAL records. Other records are replayed immediately. Delay is
measured between WAL record time and local standby time.

Robert Haas, Fabrízio de Royes Mello and Simon Riggs
Detailed review by Mitsumasa Kondo
2013-12-12 10:53:20 +00:00
Heikki Linnakangas 108e3992cd Display old and new values in pg_resetxlog -n output.
For extra clarity.

Rajeev Rastogi, reviewed by Amit Kapila
2013-12-12 11:57:18 +02:00
Tom Lane 22310b808d Remove bogus executable permissions on xlog.c.
Apparently fat-fingered in 1a3d104475.
Noted by Peter Geoghegan.
2013-12-11 22:12:25 -05:00
Tom Lane 6bff0e7d92 Add a regression test case for plpython function returning setof RECORD.
We had coverage for functions returning setof a named composite type,
but not for anonymous records, which is a somewhat different code path.
In view of recent crash report from Sergey Konoplev, this seems worth
testing, though I doubt there's any deterministic bug here today.
2013-12-11 17:22:55 -05:00
Simon Riggs cf589c9c1f Regression tests for SCHEMA commands
Hari Babu Kommi reviewed by David Rowley
2013-12-11 20:45:15 +00:00
Simon Riggs b921a26fb8 Regression tests for ALTER TABLESPACE RENAME,OWNER
Hari Babu Kommi reviewed by David Rowley
2013-12-11 20:42:58 +00:00
Tom Lane b5e0a2a384 Tweak placement of explicit ANALYZE commands in the regression tests.
Make the COPY test, which loads most of the large static tables used in
the tests, also explicitly ANALYZE those tables.  This allows us to get
rid of various ad-hoc, and rather redundant, ANALYZE commands that had
gotten stuck into various test scripts over time to ensure we got
consistent plan choices.  (We could have done a database-wide ANALYZE,
but that would cause stats to get attached to the small static tables
too, which results in plan changes compared to the historical behavior.
I'm not sure that's a good idea, so not going that far for now.)

Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing
regression tests for lack of an "ANALYZE tenk1" in the subselect test.
There's no need for this in 8.4 since we didn't print any plans back
then.
2013-12-11 15:09:15 -05:00
Robert Haas 60dd40bbda Under wal_level=logical, when saving old tuples, always save OID.
There's no real point in not doing this.  It doesn't cost anything
in performance or space.  So let's go wild.

Andres Freund, with substantial editing as to style by me.
2013-12-11 13:19:31 -05:00
Kevin Grittner 09df854b8a Add table name to VACUUM statement in matview.c.
The test only needs the one table to be vacuumed.  Vacuuming the
database may affect other tests.

Per gripe from Tom Lane.  Back-patch to 9.3, where the test was
was added.
2013-12-11 08:53:03 -06:00
Peter Eisentraut e5dc4cc24d PL/Perl: Add event trigger support
From: Dimitri Fontaine <dimitri@2ndQuadrant.fr>
2013-12-11 08:11:59 -05:00
Robert Haas 6bea96dd49 Add a new option, -g, to createuser, to add membership in a role.
Chistopher Browne, reviewed by Sameer Thakur, Amit Kapila, and
Peter Eisentraut.
2013-12-11 07:50:36 -05:00
Robert Haas 66abc2608c Add a new reloption, user_catalog_table.
When this reloption is set and wal_level=logical is configured,
we'll record the CIDs stamped by inserts, updates, and deletes to
the table just as we would for an actual catalog table.  This will
allow logical decoding to use historical MVCC snapshots to access
such tables just as they access ordinary catalog tables.

Replication solutions built around the logical decoding machinery
will likely need to set this operation for their configuration
tables; it might also be needed by extensions which perform table
access in their output functions.

Andres Freund, reviewed by myself and others.
2013-12-10 19:17:34 -05:00
Robert Haas e55704d8b2 Add new wal_level, logical, sufficient for logical decoding.
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983.  This makes it possible
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all.  Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.

Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values of stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.

Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
2013-12-10 19:01:40 -05:00
Tom Lane 9ec6199d18 Fix possible crash with nested SubLinks.
An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...)
could produce an invalid plan that results in a crash at execution time,
if the planner attempts to flatten the outer IN into a semi-join.
This happens because convert_testexpr() was not expecting any nested
SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging
to the inner SubLink.  (I think the comment denying that this case could
happen was wrong when written; it's certainly been wrong for quite a long
time, since very early versions of the semijoin flattening logic.)

Per report from Teodor Sigaev.  Back-patch to all supported branches.
2013-12-10 16:10:17 -05:00
Noah Misch 53685d7981 Rename TABLE() to ROWS FROM().
SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and
other collection types.  This feature, however, deals with set-returning
functions.  Use a different syntax for this feature to keep open the
possibility of implementing the standard TABLE().
2013-12-10 09:34:37 -05:00
Robert Haas d9250da032 Fixups for dsm.c's file descriptor handling.
Per complaint from Tom Lane.
2013-12-09 11:15:19 -05:00
Peter Eisentraut 3164721462 SSL: Support ECDH key exchange
This sets up ECDH key exchange, when compiling against OpenSSL that
supports EC.  Then the ECDHE-RSA and ECDHE-ECDSA cipher suites can be
used for SSL connections.  The latter one means that EC keys are now
usable.

The reason for EC key exchange is that it's faster than DHE and it
allows to go to higher security levels where RSA will be horribly slow.

There is also new GUC option ssl_ecdh_curve that specifies the curve
name used for ECDH.  It defaults to "prime256v1", which is the most
common curve in use in HTTPS.

From: Marko Kreen <markokr@gmail.com>
Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
2013-12-07 15:11:44 -05:00
Peter Eisentraut ef3267523d SSL: Add configuration option to prefer server cipher order
By default, OpenSSL (and SSL/TLS in general) lets the client cipher
order take priority.  This is OK for browsers where the ciphers were
tuned, but few PostgreSQL client libraries make the cipher order
configurable.  So it makes sense to have the cipher order in
postgresql.conf take priority over client defaults.

This patch adds the setting "ssl_prefer_server_ciphers" that can be
turned on so that server cipher order is preferred.  Per discussion,
this now defaults to on.

From: Marko Kreen <markokr@gmail.com>
Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
2013-12-07 08:13:50 -05:00
Alvaro Herrera 312bde3d40 Fix improper abort during update chain locking
In 247c76a989, I added some code to do fine-grained checking of
MultiXact status of locking/updating transactions when traversing an
update chain.  There was a thinko in that patch which would have the
traversing abort, that is return HeapTupleUpdated, when the other
transaction is a committed lock-only.  In this case we should ignore it
and return success instead.  Of course, in the case where there is a
committed update, HeapTupleUpdated is the correct return value.

A user-visible symptom of this bug is that in REPEATABLE READ and
SERIALIZABLE transaction isolation modes spurious serializability errors
can occur:
  ERROR:  could not serialize access due to concurrent update

In order for this to happen, there needs to be a tuple that's key-share-
locked and also updated, and the update must abort; a subsequent
transaction trying to acquire a new lock on that tuple would abort with
the above error.  The reason is that the initial FOR KEY SHARE is seen
as committed by the new locking transaction, which triggers this bug.
(If the UPDATE commits, then the serialization error is correctly
reported.)

When running a query in READ COMMITTED mode, what happens is that the
locking is aborted by the HeapTupleUpdated return value, then
EvalPlanQual fetches the newest version of the tuple, which is then the
only version that gets locked.  (The second time the tuple is checked
there is no misbehavior on the committed lock-only, because it's not
checked by the code that traverses update chains; so no bug.) Only the
newest version of the tuple is locked, not older ones, but this is
harmless.

The isolation test added by this commit illustrates the desired
behavior, including the proper serialization errors that get thrown.

Backpatch to 9.3.
2013-12-05 17:47:51 -03:00
Tom Lane 74242c23c1 Clear retry flags properly in replacement OpenSSL sock_write function.
Current OpenSSL code includes a BIO_clear_retry_flags() step in the
sock_write() function.  Either we failed to copy the code correctly, or
they added this since we copied it.  In any case, lack of the clear step
appears to be the cause of the server lockup after connection loss reported
in bug #8647 from Valentine Gogichashvili.  Assume that this is correct
coding for all OpenSSL versions, and hence back-patch to all supported
branches.

Diagnosis and patch by Alexander Kukushkin.
2013-12-05 12:48:28 -05:00
Alvaro Herrera 07aeb1fec5 Avoid resetting Xmax when it's a multi with an aborted update
HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while
checking the contents of a multixact and finding it contains an aborted
update, by setting the HEAP_XMAX_INVALID bit.  This would lead to
concurrent transactions not noticing any previous locks held by
transactions that might still be running, and thus being able to acquire
subsequent locks they wouldn't be normally able to acquire.

This bug was introduced in commit 1ce150b7bb; backpatch this fix to 9.3,
like that commit.

This change reverts the change to the delete-abort-savept isolation test
in 1ce150b7bb, because that behavior change was caused by this bug.

Noticed by Andres Freund while investigating a different issue reported
by Noah Misch.
2013-12-05 12:21:55 -03:00