Commit Graph

13902 Commits

Author SHA1 Message Date
Tom Lane cba6ffaef3 Cygwin build fixes.
Get rid of use of dlltool for linking the main postgres executable.
dlltool is obsolete and we'd prefer to stop depending on it.

Also, include $(LDAP_LIBS_FE) in $(libpq_pgport).  (It's not clear that
this is really needed, or why it's not a linker bug if it is needed.
But reports are that it's needed on current Cygwin.)

We might want to back-patch this if it works, but first let's see
what the buildfarm thinks.

Marco Atzeri
2014-02-11 12:10:52 -05:00
Heikki Linnakangas d699ba4134 Fix WakeupWaiters() to not wake up an exclusive locker unnecessarily.
WakeupWaiters() is supposed to wake up all LW_WAIT_UNTIL_FREE waiters of
the slot, but the loop incorrectly also woke up the first LW_EXCLUSIVE
waiter, if there was no LW_WAIT_UNTIL_FREE waiters in the queue.

Noted by Andres Freund. This code is new in 9.4, so no backpatching.
2014-02-10 15:18:18 +02:00
Heikki Linnakangas 6c2744f1d3 Use memmove() instead of memcpy() for copying overlapping regions.
In commit d2495f272c, I fixed this bug in
to_tsquery(), but missed the fact that plainto_tsquery() has the same bug.
2014-02-10 09:57:59 +02:00
Peter Eisentraut 66c04c981d Mark some more variables as static or include the appropriate header
Detected by clang's -Wmissing-variable-declarations.

From: Andres Freund <andres@anarazel.de>
2014-02-08 21:21:46 -05:00
Heikki Linnakangas 6aa2bdf6a0 Initialize the entryRes array between each call to triConsistent.
The shimTriConstistentFn, which calls the opclass's consistent function with
all combinations of TRUE/FALSE for any MAYBE argument, modifies the entryRes
array passed by the caller. Change startScanKey to re-initialize it between
each call to accommodate that.

It's actually a bad habit by shimTriConsistentFn to modify its argument. But
the only caller that doesn't already re-initialize the entryRes array was
startScanKey, and it's easy for startScanKey to do so. Add a comment to
shimTriConsistentFn about that.

Note: this does not give a free pass to opclass-provided consistent
functions to modify the entryRes argument; shimTriConsistent assumes that
they don't, even though it does it itself.

While at it, refactor startScanKey to allocate the requiredEntries and
additionalEntries after it knows exactly how large they need to be. Saves a
little bit of memory, and looks nicer anyway.

Per complaint by Tom Lane, buildfarm and the pg_trgm regression test.
2014-02-07 18:53:31 +02:00
Heikki Linnakangas dbc649fd77 Speed up "rare & frequent" type GIN queries.
If you have a GIN query like "rare & frequent", we currently fetch all the
items that match either rare or frequent, call the consistent function for
each item, and let the consistent function filter out items that only match
one of the terms. However, if we can deduce that "rare" must be present for
the overall qual to be true, we can scan all the rare items, and for each
rare item, skip over to the next frequent item with the same or greater TID.
That greatly speeds up "rare & frequent" type queries.

To implement that, introduce the concept of a tri-state consistent function,
where the 3rd value is MAYBE, indicating that we don't know if that term is
present. Operator classes only provide a boolean consistent function, so we
simulate the tri-state consistent function by calling the boolean function
several times, with the MAYBE arguments set to all combinations of TRUE and
FALSE. Testing all combinations is only feasible for a small number of MAYBE
arguments, but it is envisioned that we'll provide a way for operator
classes to provide a native tri-state consistent function, which can be much
more efficient. But that is not included in this patch.

We were already using that trick to for lossy pages, calling the consistent
function with the lossy entry set to TRUE and FALSE. Now that we have the
tri-state consistent function, use it for lossy pages too.

Alexander Korotkov, with fair amount of refactoring by me.
2014-02-07 15:22:48 +02:00
Heikki Linnakangas e001030c27 Fix thinko in comment.
Amit Langote
2014-02-07 10:28:06 +02:00
Tom Lane 8de3e410fa In RelationClearRelation, postpone cache reload if !IsTransactionState().
We may process relcache flush requests during transaction startup or
shutdown.  In general it's not terribly safe to do catalog access at those
times, so the code's habit of trying to immediately revalidate unflushable
relcache entries is risky.  Although there are no field trouble reports
that are positively traceable to this, we have been able to demonstrate
failure of the assertions recently added in RelationIdGetRelation() and
SearchCatCache().  On the other hand, it seems safe to just postpone
revalidation of the cache entry until we're inside a valid transaction.
The one case where this is questionable is where we're exiting a
subtransaction and the outer transaction is holding the relcache entry open
--- but if we made any significant changes to the rel inside such a
subtransaction, we've got problems anyway.  There are mechanisms in place
to prevent that (to wit, locks for cross-session cases and
CheckTableNotInUse() for intra-session cases), so let's trust to those
mechanisms to keep us out of trouble.
2014-02-06 19:38:06 -05:00
Andrew Dunstan 45e1b6c4c4 Alphabeticize list in OBJS definition in utils/adt Makefile. 2014-02-06 12:11:49 -05:00
Tom Lane ddfc9cb054 Assert(IsTransactionState()) in RelationIdGetRelation().
Commit 42c80c696e added an
Assert(IsTransactionState()) in SearchCatCache(), to catch
any code that thought it could do a catcache lookup outside
transactions.  Extend the same idea to relcache lookups.
2014-02-06 11:28:13 -05:00
Peter Eisentraut f65233755c Fix whitespace 2014-02-05 23:12:51 -05:00
Tom Lane ac8bc3b6e4 Remove unnecessary relcache flushes after changing btree metapages.
These flushes were added in my commit d2896a9ed, which added the btree
logic that keeps a cached copy of the index metapage data in index relcache
entries.  The idea was to ensure that other backends would promptly update
their cached copies after a change.  However, this is not really necessary,
since _bt_getroot() has adequate defenses against believing a stale root
page link, and _bt_getrootheight() doesn't have to be 100% right.
Moreover, if it were necessary, a relcache flush would be an unreliable way
to do it, since the sinval mechanism believes that relcache flush requests
represent transactional updates, and therefore discards them on transaction
rollback.  Therefore, we might as well drop these flush requests and save
the time to rebuild the whole relcache entry after a metapage change.

If we ever try to support in-place truncation of btree indexes, it might
be necessary to revisit this issue so that _bt_getroot() can't get caught
by trying to follow a metapage link to a page that no longer exists.
A possible solution to that is to make use of an smgr, rather than
relcache, inval request to force other backends to discard their cached
metapages.  But for the moment this is not worth pursuing.
2014-02-05 13:43:46 -05:00
Fujii Masao 489e6ac5a1 Fix comparison of an array of characters with zero to compare with '\0' instead.
Report from Andres Freund.
2014-02-04 10:59:39 +09:00
Tom Lane 0c2338abbb Fix lexing of U& sequences just before EOF.
Commit a5ff502fce was a brick shy of a load
in the backend lexer too, not just psql.  Per further testing of bug #9068.

In passing, improve related comments.
2014-02-03 19:47:57 -05:00
Tom Lane 0def2573c5 Fix *-qualification of named parameters in SQL-language functions.
Given a composite-type parameter named x, "$1.*" worked fine, but "x.*"
not so much.  This has been broken since named parameter references were
added in commit 9bff0780cf, so patch back
to 9.2.  Per bug #9085 from Hardy Falk.
2014-02-03 14:47:17 -05:00
Andrew Dunstan d3ee45152b In json code, clean up temp memory contexts after processing.
Craig Ringer.
2014-02-03 10:40:12 -05:00
Fujii Masao 3e8554a54a Make pg_basebackup skip temporary statistics files.
The temporary statistics files don't need to be included in the backup
because they are always reset at the beginning of the archive recovery.
This patch changes pg_basebackup so that it skips all files located in
$PGDATA/pg_stat_tmp or the directory specified by stats_temp_directory
parameter.
2014-02-03 23:19:49 +09:00
Tom Lane 46825d4978 Clean up some sloppy coding in repl_gram.y.
Remove unused copy-and-pasted macro definitions, and improve formatting
of recently-added productions.

I got interested in this because buildfarm member protosciurus has been
crashing in "bison repl_gram.y" since commit 858ec11.  It's a long shot
that this will fix that, though maybe the missing trailing semicolon
has something to do with it?  In any case, there's no need to approve
of dead code, nor of code whose formatting isn't even self-consistent
let alone consistent with what's around it.
2014-02-02 12:51:14 -05:00
Fujii Masao 0753bdb352 Add primary_slotname to recovery.conf.sample. 2014-02-03 00:41:50 +09:00
Fujii Masao 63be3b78f6 Fix typos in docs and comments.
Thom Brown
2014-02-02 10:28:18 +09:00
Tom Lane 082c0dfa14 Fix some wide-character bugs in the text-search parser.
In p_isdigit and other character class test functions generated by the
p_iswhat macro, the code path for non-C locales with multibyte encodings
contained a bogus pointer cast that would accidentally fail to malfunction
if types wchar_t and wint_t have the same width.  Apparently that is true
on most platforms, but not on recent Cygwin releases.  Remove the cast,
as it seems completely unnecessary (I think it arose from a false analogy
to the need to cast to unsigned char when dealing with the <ctype.h>
functions).  Per bug #8970 from Marco Atzeri.

In the same functions, the code path for C locale with a multibyte encoding
simply ANDed each wide character with 0xFF before passing it to the
corresponding <ctype.h> function.  This could result in false positive
answers for some non-ASCII characters, so use a range test instead.
Noted by me while investigating Marco's complaint.

Also, remove some useless though not actually buggy maskings and casts
in the hand-coded p_isalnum and p_isalpha functions, which evidently
got tested a bit more carefully than the macro-generated functions.
2014-02-01 18:27:34 -05:00
Tom Lane 214c7a4f0b Fix some more bugs in signal handlers and process shutdown logic.
WalSndKill was doing things exactly backwards: it should first clear
MyWalSnd (to stop signal handlers from touching MyWalSnd->latch),
then disown the latch, and only then mark the WalSnd struct unused by
clearing its pid field.

Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve
errno, which is surely a requirement for any signal handler.

Per discussion of recent buildfarm failures.  Back-patch as far
as the relevant code exists.
2014-02-01 16:21:23 -05:00
Bruce Momjian d0ee93797d arrays: tighten checks for multi-dimensional input
Previously an input array string that started with a single-element
array dimension would then later accept a multi-dimensional segment.

BACKWARD INCOMPATIBILITY
2014-02-01 10:49:17 -05:00
Robert Haas 858ec11858 Introduce replication slots.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts.  Slots have some advantages over existing
techniques, as explained in the documentation.

In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.

Andres Freund and Robert Haas
2014-01-31 22:45:36 -05:00
Robert Haas d1981719ad Clear MyProc and MyProcSignalState before they become invalid.
Evidence from buildfarm member crake suggests that the new test_shm_mq
module is routinely crashing the server due to the arrival of a SIGUSR1
after the shared memory segment has been unmapped.  Although processes
using the new dynamic background worker facilities are more likely to
receive a SIGUSR1 around this time, the problem is also possible on older
branches, so I'm back-patching the parts of this change that apply to
older branches as far as they apply.

It's already generally the case that code checks whether these pointers
are NULL before deferencing them, so the important thing is mostly to
make sure that they do get set to NULL before they become invalid.  But
in master, there's one case in procsignal_sigusr1_handler that lacks a
NULL guard, so add that.

Patch by me; review by Tom Lane.
2014-01-31 21:31:08 -05:00
Tom Lane 326e1d73c4 Disallow use of SSL v3 protocol in the server as well as in libpq.
Commit 820f08cabd claimed to make the server
and libpq handle SSL protocol versions identically, but actually the server
was still accepting SSL v3 protocol while libpq wasn't.  Per discussion,
SSL v3 is obsolete, and there's no good reason to continue to accept it.
So make the code really equivalent on both sides.  The behavior now is
that we use the highest mutually-supported TLS protocol version.

Marko Kreen, some comment-smithing by me
2014-01-31 17:51:18 -05:00
Tom Lane 043f6ff05d Fix bogus handling of "postponed" lateral quals.
When pulling a "postponed" qual from a LATERAL subquery up into the quals
of an outer join, we must make sure that the postponed qual is included
in those seen by make_outerjoininfo().  Otherwise we might compute a
too-small min_lefthand or min_righthand for the outer join, leading to
"JOIN qualification cannot refer to other relations" failures from
distribute_qual_to_rels.  Subtler errors in the created plan seem possible,
too, if the extra qual would only affect join ordering constraints.

Per bug #9041 from David Leverton.  Back-patch to 9.3.
2014-01-30 14:51:16 -05:00
Bruce Momjian 146604ec43 Add checks for interval overflow/underflow
New checks include input, month/day/time internal adjustments, addition,
subtraction, multiplication, and negation.  Also adjust docs to
correctly specify interval size in bytes.

Report from Rok Kralj
2014-01-30 09:41:43 -05:00
Tom Lane 571addd729 Fix unsafe references to errno within error messaging logic.
Various places were supposing that errno could be expected to hold still
within an ereport() nest or similar contexts.  This isn't true necessarily,
though in some cases it accidentally failed to fail depending on how the
compiler chanced to order the subexpressions.  This class of thinko
explains recent reports of odd failures on clang-built versions, typically
missing or inappropriate HINT fields in messages.

Problem identified by Christian Kruse, who also submitted the patch this
commit is based on.  (I fixed a few issues in his patch and found a couple
of additional places with the same disease.)

Back-patch as appropriate to all supported branches.
2014-01-29 20:04:43 -05:00
Andrew Dunstan 120c5cc761 Silence compiler warnings about possibly unset variables.
They are in fact set in every case where they are needed, but the
compiler doesn't know that.

Per gripe from Tom Lane.
2014-01-29 18:54:14 -05:00
Robert Haas 9347baa5bb Include planning time in EXPLAIN ANALYZE output.
This doesn't work for prepared queries, but it's not too easy to get
the information in that case and there's some debate as to exactly
what the right thing to measure is, so just do this for now.

Andreas Karlsson, with slight doc changes by me.
2014-01-29 16:09:15 -05:00
Andrew Dunstan 5264d91541 Add json_array_elements_text function.
This was a notable omission from the json functions added in 9.3 and
there have been numerous complaints about its absence.

Laurence Rowe.
2014-01-29 15:39:01 -05:00
Heikki Linnakangas 699b1f40da Fix thinko in huge_tlb_pages patch.
We calculated the rounded-up size for the allocation, but then failed to
use the rounded-up value in the mmap() call. Oops.

Also, initialize allocsize, to silence warnings seen with some compilers,
as pointed out by Jeff Janes.
2014-01-29 21:33:56 +02:00
Heikki Linnakangas 626a120656 Further optimize GIN multi-key searches.
When skipping over some items in a posting tree, re-find the new location
by descending the tree from root, rather than walking the right links.
This can save a lot of I/O.

Heavily modified from Alexander Korotkov's fast scan patch.
2014-01-29 21:24:38 +02:00
Heikki Linnakangas 25b1dafab6 Further optimize multi-key GIN searches.
If we're skipping past a certain TID, avoid decoding posting list segments
that only contain smaller TIDs.

Extracted from Alexander Korotkov's fast scan patch, heavily modified.
2014-01-29 18:26:40 +02:00
Heikki Linnakangas e20c70cb0f Allow skipping some items in a multi-key GIN search.
In a multi-key search, ie. something like "col @> 'foo' AND col @> 'bar'",
as soon as we find the next item that matches the first criteria, we don't
need to check the second criteria for TIDs smaller the first match. That
saves a lot of effort, especially if one of the terms is rare, while the
second occurs very frequently.

Based on ideas from Alexander Korotkov's fast scan patch.
2014-01-29 17:53:39 +02:00
Heikki Linnakangas 1a3458b6d8 Allow using huge TLB pages on Linux (MAP_HUGETLB)
This patch adds an option, huge_tlb_pages, which allows requesting the
shared memory segment to be allocated using huge pages, by using the
MAP_HUGETLB flag in mmap(). This can improve performance.

The default is 'try', which means that we will attempt using huge pages,
and fall back to non-huge pages if it doesn't work. Currently, only Linux
has MAP_HUGETLB. On other platforms, the default 'try' behaves the same as
'off'.

In the passing, don't try to round the mmap() size to a multiple of
pagesize. mmap() doesn't require that, and there's no particular reason for
PostgreSQL to do that either. When using MAP_HUGETLB, however, round the
request size up to nearest 2MB boundary. This is to work around a bug in
some Linux kernel versions, but also to avoid wasting memory, because the
kernel will round the size up anyway.

Many people were involved in writing this patch, including Christian Kruse,
Richard Poole, Abhijit Menon-Sen, reviewed by Peter Geoghegan, Andres Freund
and me.
2014-01-29 14:08:30 +02:00
Robert Haas b7643b19f0 Fix compiler warning in EXEC_BACKEND builds.
Per a report by Rajeev Rastogi.
2014-01-28 23:35:50 -05:00
Andrew Dunstan 105639900b New json functions.
json_build_array() and json_build_object allow for the construction of
arbitrarily complex json trees. json_object() turns a one or two
dimensional array, or two separate arrays, into a json_object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name value pairs.

Catalog version bumped.

Andrew Dunstan, reviewed by Marko Tiikkaja.
2014-01-28 17:48:21 -05:00
Fujii Masao 9132b189bf Add pg_stat_archiver statistics view.
This view shows the statistics about the WAL archiver process's activity.

Gabriele Bartolini, reviewed by Michael Paquier, refactored a bit by me.
2014-01-29 02:58:22 +09:00
Bruce Momjian c871e8f53b Revert C comment change in slot_attisnull()
Revert 89774b58b0
2014-01-28 12:28:14 -05:00
Stephen Frost aef61bf433 Revert dup2() checking in syslogger.c
Per the expanded comment-

As we're just trying to reset these to go to DEVNULL, there's not
much point in checking for failure from the close/dup2 calls here,
if they fail then presumably the file descriptors are closed and
any writes will go into the bitbucket anyway.

Pointed out by Tom.
2014-01-28 08:40:41 -05:00
Tom Lane 64e43c59b8 Log a detail message for auth failures due to missing or expired password.
It's worth distinguishing these cases from run-of-the-mill wrong-password
problems, since users have been known to waste lots of time pursuing the
wrong theory about what's failing.  Now, our longstanding policy about how
to report authentication failures is that we don't really want to tell the
*client* such things, since that might be giving information to a bad guy.
But there's nothing wrong with reporting the details to the postmaster log,
and indeed the comments in this area of the code contemplate that
interesting details should be so reported.  We just weren't handling these
particular interesting cases usefully.

To fix, add infrastructure allowing subroutines of ClientAuthentication()
to return a string to be added to the errdetail_log field of the main
authentication-failed error report.  We might later want to use this to
report other subcases of authentication failure the same way, but for the
moment I just dealt with password cases.

Per discussion of a patch from Josh Drake, though this is not what
he proposed.
2014-01-27 21:04:09 -05:00
Robert Haas ea9df812d8 Relax the requirement that all lwlocks be stored in a single array.
This makes it possible to store lwlocks as part of some other data
structure in the main shared memory segment, or in a dynamic shared
memory segment.  There is still a main LWLock array and this patch does
not move anything out of it, but it provides necessary infrastructure
for doing that in the future.

This change is likely to increase the size of LWLockPadded on some
platforms, especially 32-bit platforms where it was previously only
16 bytes.

Patch by me.  Review by Andres Freund and KaiGai Kohei.
2014-01-27 11:07:44 -05:00
Heikki Linnakangas f62eba204f Fix typo in README
Amit Langote
2014-01-27 09:33:18 +02:00
Tom Lane 2850896961 Code review for auto-tuned effective_cache_size.
Fix integer overflow issue noted by Magnus Hagander, as well as a bunch
of other infelicities in commit ee1e5662d8
and its unreasonably large number of followups.
2014-01-27 00:05:56 -05:00
Fujii Masao dd515d4082 Change the suffix of auto conf temporary file from "temp" to "tmp".
Michael Paquier
2014-01-27 12:39:11 +09:00
Fujii Masao 7c619be623 Fix typos in comments for ALTER SYSTEM.
Michael Paquier
2014-01-27 12:23:20 +09:00
Stephen Frost 790eaa699e Check dup2() results in syslogger
Consistently check the dup2() call results throughout syslogger.c.
It's pretty unlikely that they'll error out, but if they do,
ereport(FATAL) instead of blissfully continuing on.

Spotted by the Coverity scanner.
2014-01-26 16:26:18 -05:00
Andrew Dunstan cec8394b5c Enable building with Visual Studion 2013.
Backpatch to 9.3.

Brar Piening.
2014-01-26 09:49:10 -05:00
Bruce Momjian 89774b58b0 Adjust C comment in slot_attisnull() regarding nulls. 2014-01-25 16:43:36 -05:00
Heikki Linnakangas 71c6a8e375 Add recovery_target='immediate' option.
This allows ending recovery as a consistent state has been reached. Without
this, there was no easy way to e.g restore an online backup, without
replaying any extra WAL after the backup ended.

MauMau and me.
2014-01-25 17:34:04 +02:00
Heikki Linnakangas d150ff5781 Reset unused fields in GIN data leaf page footer.
The maxoff field is not used in the new, compressed page format. Let's
reset it when converting an old-format page to the new format. The code
won't care either way, but this makes it possible to use the field for
something else in the future.
2014-01-24 19:10:10 +02:00
Heikki Linnakangas a8f374849f Fix off-by-one in newly-introdcued GIN assertion.
Spotted by Alexander Korotkov
2014-01-24 11:10:09 +02:00
Heikki Linnakangas 398cf255ad In GIN recompression code, use mmemove rather than memcpy, for vacuum.
When vacuuming a data leaf page, any compressed posting lists that are not
modified, are copied back to the buffer from a later location in the same
buffer rather than from  a palloc'd copy. IOW, they are just moved
downwards in the same buffer. Because the source and destination addresses
can overlap, we must use memmove rather than memcpy.

Report and fix by Alexander Korotkov.
2014-01-24 10:48:45 +02:00
Stephen Frost fbe19ee3b8 ALTER TABLESPACE ... MOVE ... OWNED BY
Add the ability to specify the objects to move by who those objects are
owned by (as relowner) and change ALL to mean ALL objects.  This
makes the command always operate against a well-defined set of objects
and not have the objects-to-be-moved based on the role of the user
running the command.

Per discussion with Simon and Tom.
2014-01-23 23:52:40 -05:00
Tom Lane ac4ef637ad Allow use of "z" flag in our printf calls, and use it where appropriate.
Since C99, it's been standard for printf and friends to accept a "z" size
modifier, meaning "whatever size size_t has".  Up to now we've generally
dealt with printing size_t values by explicitly casting them to unsigned
long and using the "l" modifier; but this is really the wrong thing on
platforms where pointers are wider than longs (such as Win64).  So let's
start using "z" instead.  To ensure we can do that on all platforms, teach
src/port/snprintf.c to understand "z", and add a configure test to force
use of that implementation when the platform's version doesn't handle "z".

Having done that, modify a bunch of places that were using the
unsigned-long hack to use "z" instead.  This patch doesn't pretend to have
gotten everyplace that could benefit, but it catches many of them.  I made
an effort in particular to ensure that all uses of the same error message
text were updated together, so as not to increase the number of
translatable strings.

It's possible that this change will result in format-string warnings from
pre-C99 compilers.  We might have to reconsider if there are any popular
compilers that will warn about this; but let's start by seeing what the
buildfarm thinks.

Andres Freund, with a little additional work by me
2014-01-23 17:18:33 -05:00
Heikki Linnakangas ec8f692c3c Fix alignment of GIN in-line posting lists stored in entry tuples.
The Sparc machines in the buildfarm are crashing because of misaligned
access to posting lists stored in entry tuples.

I accidentally removed a critical SHORTALIGN() from ginFormTuple, as part
of the packed posting lists patch. Perhaps I thought it was unnecessary,
because the index_form_tuple() call above the SHORTALIGN already aligned
the size, missing the fact that the null-category byte makes it misaligned
again (I think the SHORTALIGN is indeed unnecessary if there's no null-
category byte, but let's just play it safe...)
2014-01-23 22:58:12 +02:00
Heikki Linnakangas 0fdb2f7d7c Silence compiler warning.
Not all compilers understand that elog(ERROR, ...) never returns.
2014-01-23 22:15:31 +02:00
Alvaro Herrera b152c6cd0d Make DROP IF EXISTS more consistently not fail
Some cases were still reporting errors and aborting, instead of a NOTICE
that the object was being skipped.  This makes it more difficult to
cleanly handle pg_dump --clean, so change that to instead skip missing
objects properly.

Per bug #7873 reported by Dave Rolsky; apparently this affects a large
number of users.

Authors: Pavel Stehule and Dean Rasheed.  Some tweaks by Álvaro Herrera
2014-01-23 14:40:29 -03:00
Heikki Linnakangas 6668ad1d70 Fix declaration of GinVacuumState.
gcc 4.8 was happy with having a duplicate typedef, but most compilers seem not
to be, per buildfarm.
2014-01-22 19:55:36 +02:00
Heikki Linnakangas 36a35c550a Compress GIN posting lists, for smaller index size.
GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might be come handy in the future, if we want to drop support for the
uncompressed format.

Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
2014-01-22 19:20:58 +02:00
Andrew Dunstan 243ee26633 Reindent json.c and jsonfuncs.c.
This will help in preparation of clean patches for upcoming
json work.
2014-01-22 08:46:51 -05:00
Stephen Frost 6c36f383df Allow type_func_name_keywords in even more places
A while back, 2c92edad48 allowed
type_func_name_keywords to be used in more places, including role
identifiers.  Unfortunately, that commit missed out on cases where
name_list was used for lists-of-roles, eg: for DROP ROLE.  This
resulted in the unfortunate situation that you could CREATE a role
with a type_func_name_keywords-allowed identifier, but not DROP it
(directly- ALTER could be used to rename it to something which
could be DROP'd).

This extends allowing type_func_name_keywords to places where role
lists can be used.

Back-patch to 9.0, as 2c92edad48 was.
2014-01-21 22:49:22 -05:00
Tom Lane 69c7a9838c Tweak parse location assignment for CURRENT_DATE and related constructs.
All these constructs generate parse trees consisting of a Const and
a run-time type coercion (perhaps a FuncExpr or a CoerceViaIO).  Modify
the raw parse output so that we end up with the original token's location
attached to the type coercion node while the Const has location -1;
before, it was the other way around.  This makes no difference in terms
of what exprLocation() will say about the parse tree as a whole, so it
should not have any user-visible impact.  The point of changing it is that
we do not want contrib/pg_stat_statements to treat these constructs as
replaceable constants.  It will do the right thing if the Const has
location -1 rather than a valid location.

This is a pretty ugly hack, but then this code is ugly already; we should
someday replace this translation with special-purpose parse node(s) that
would allow ruleutils.c to reconstruct the original query text.

(See also commit 5d3fcc4c2e, which also
hacked location assignment rules for the benefit of pg_stat_statements.)

Back-patch to 9.2 where pg_stat_statements grew the ability to recognize
replaceable constants.

Kyotaro Horiguchi
2014-01-21 16:34:28 -05:00
Robert Haas 01f7808b3e Add a cardinality function for arrays.
Unlike our other array functions, this considers the total number of
elements across all dimensions, and returns 0 rather than NULL when the
array has no elements.  But it seems that both of those behaviors are
almost universally disliked, so hopefully that's OK.

Marko Tiikkaja, reviewed by Dean Rasheed and Pavel Stehule
2014-01-21 12:38:53 -05:00
Robert Haas 033b2343fa Fix inadvertent semantics change in last patch to plug memory leaks.
Commit a5bca4ef03 accidentally changed
the semantics when the "skipping missing configuration file" is
emitted, because it forced OK to true instead of leaving the value
untouched.

Spotted by Tom Lane.
2014-01-21 11:42:37 -05:00
Robert Haas 5709b8acc6 Avoid a possible relcache leak in get_object_address_attribute.
There's no apparent way to trigger this, so I'm not going to worry
about back-patching it for now.  But it's still wrong.

Marti Raudsepp
2014-01-21 10:02:37 -05:00
Robert Haas a5bca4ef03 Plug more memory leaks when reloading config file.
Commit 138184adc5 plugged some but not
all of the leaks from commit 2a0c81a12c.
This tightens things up some more.

Amit Kapila, per an observation by Tom Lane
2014-01-21 09:41:40 -05:00
Alvaro Herrera d2458e3b20 Expose a routine to print triggers during EXPLAIN ANALYZE
This is so that auto_explain can use it.

Kyotaro HORIGUCHI
2014-01-20 17:13:47 -03:00
Tom Lane 9a8f5729b4 Fix to_timestamp/to_date's handling of consecutive spaces in format string.
When there are consecutive spaces (or other non-format-code characters) in
the format, we should advance over exactly that many characters of input.
The previous coding mistakenly did a "skip whitespace" action between such
characters, possibly allowing more input to be skipped than the user
intended.  We only need to skip whitespace just before an actual field.

This is really a bug fix, but given the minimal number of field complaints
and the risk of breaking applications coded to expect the old behavior,
let's not back-patch it.

Jeevan Chalke
2014-01-20 13:45:51 -05:00
Fujii Masao 5363c7f2bc Fix typo in comment.
Sawada Masahiko
2014-01-21 02:24:17 +09:00
Simon Riggs 4d1e2aeb1a Speed up COPY into tables with DEFAULT nextval()
Previously the presence of a nextval() prevented the
use of batch-mode COPY.  This patch introduces a
special case just for nextval() functions. In future
we will introduce a general case solution for
labelling volatile functions as safe for use.
2014-01-20 17:22:38 +00:00
Magnus Hagander 98de86e422 Remove support for native krb5 authentication
krb5 has been deprecated since 8.3, and the recommended way to do
Kerberos authentication is using the GSSAPI authentication method
(which is still fully supported).

libpq retains the ability to identify krb5 authentication, but only
gives an error message about it being unsupported. Since all authentication
is initiated from the backend, there is no need to keep it at all
in the backend.
2014-01-19 17:05:01 +01:00
Magnus Hagander 4b8f2859cc Adjust the SSL connection notification message
Suggested by Tom
2014-01-19 13:27:22 +01:00
Stephen Frost 5254958e92 Add CREATE TABLESPACE ... WITH ... Options
Tablespaces have a few options which can be set on them to give PG hints
as to how the tablespace behaves (perhaps it's faster for sequential
scans, or better able to handle random access, etc).  These options were
only available through the ALTER TABLESPACE command.

This adds the ability to set these options at CREATE TABLESPACE time,
removing the need to do both a CREATE TABLESPACE and ALTER TABLESPACE to
get the correct options set on the tablespace.

Vik Fearing, reviewed by Michael Paquier.
2014-01-18 20:59:31 -05:00
Tom Lane 115f414124 Fix VACUUM's reporting of dead-tuple counts to the stats collector.
Historically, VACUUM has just reported its new_rel_tuples estimate
(the same thing it puts into pg_class.reltuples) to the stats collector.
That number counts both live and dead-but-not-yet-reclaimable tuples.
This behavior may once have been right, but modern versions of the
pgstats code track live and dead tuple counts separately, so putting
the total into n_live_tuples and zero into n_dead_tuples is surely
pretty bogus.  Fix it to report live and dead tuple counts separately.

This doesn't really do much for situations where updating transactions
commit concurrently with a VACUUM scan (possibly causing double-counting or
omission of the tuples they add or delete); but it's clearly an improvement
over what we were doing before.

Hari Babu, reviewed by Amit Kapila
2014-01-18 19:24:33 -05:00
Stephen Frost 76e91b38ba Add ALTER TABLESPACE ... MOVE command
This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of
objects from one tablespace to another.  This can be extremely handy and avoids
a lot of error-prone scripting.  ALTER TABLESPACE ... MOVE will only move
objects the user owns, will notify the user if no objects were found, and can
be used to move ALL objects or specific types of objects (TABLES, INDEXES, or
MATERIALIZED VIEWS).
2014-01-18 18:56:40 -05:00
Stephen Frost 6f25c62d78 Allow SET TABLESPACE to database default
We've always allowed CREATE TABLE to create tables in the database's default
tablespace without checking for CREATE permissions on that tablespace.
Unfortunately, the original implementation of ALTER TABLE ... SET TABLESPACE
didn't pick up on that exception.

This changes ALTER TABLE ... SET TABLESPACE to allow the database's default
tablespace without checking for CREATE rights on that tablespace, just as
CREATE TABLE works today.  Users could always do this through a series of
commands (CREATE TABLE ... AS SELECT * FROM ...; DROP TABLE ...; etc), so
let's fix the oversight in SET TABLESPACE's original implementation.
2014-01-18 18:41:52 -05:00
Tom Lane 0d79c0a8cc Make various variables const (read-only).
These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.

Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.

Oskari Saarenmaa, with some editorialization by me
2014-01-18 16:04:32 -05:00
Magnus Hagander 4cba1f6bbf Show SSL encryption information when logging connections
Expand the messages when log_connections is enabled to include the
fact that SSL is used and the SSL cipher information.

Dr. Andreas Kunert, review by Marko Kreen
2014-01-17 13:32:31 +01:00
Heikki Linnakangas a472ae1e4e Fix Hot Standby feedback sending when streaming busily.
Commit 6f60fdd701 accidentally removed a
call to XLogWalRcvSendHSFeedback() after flushing received WAL to disk.
The consequence is that when walsender is busy streaming WAL, it doesn't
send HS feedback messages. One is sent if nothing is received from the
master for 100ms, but if there's a steady stream of WAL, it never happens.

Backpatch to 9.3.

Andres Freund and Amit Kapila
2014-01-16 23:15:41 +02:00
Heikki Linnakangas 8ba288da5d Suppress Coverity complaints in readfuncs.c.
Coverity is complaining that the value returned by pg_strtok in
READ_LOCATION_FIELD and READ_BITMAPSET_FIELD macros is not used. In commit
39bfc94c86, we did this to the other macros
to placate compilers that complained when the variable was completely
unused, this extends that to the last remaining macros.
2014-01-16 12:00:19 +02:00
Robert Haas ed46758381 Logging running transactions every 15 seconds.
Previously, we did this just once per checkpoint, but that could make
Hot Standby take a long time to initialize.  To avoid busying an
otherwise-idle system, we don't do this if no WAL has been written
since we did it last.

Andres Freund
2014-01-15 12:41:20 -05:00
Robert Haas d02c0ddb15 Fix missing parentheses resulting in wrong order of dereference.
This could result in referencing uninitialized memory.

Michael Paquier, in response to a complaint from Andres Freund
2014-01-15 11:00:50 -05:00
Tom Lane 061b079f89 Fix multiple bugs in index page locking during hot-standby WAL replay.
In ordinary operation, VACUUM must be careful to take a cleanup lock on
each leaf page of a btree index; this ensures that no indexscans could
still be "in flight" to heap tuples due to be deleted.  (Because of
possible index-tuple motion due to concurrent page splits, it's not enough
to lock only the pages we're deleting index tuples from.)  In Hot Standby,
the WAL replay process must likewise lock every leaf page.  There were
several bugs in the code for that:

* The replay scan might come across unused, all-zero pages in the index.
While btree_xlog_vacuum itself did the right thing (ie, nothing) with
such pages, xlogutils.c supposed that such pages must be corrupt and
would throw an error.  This accounts for various reports of replication
failures with "PANIC: WAL contains references to invalid pages".  To
fix, add a ReadBufferMode value that instructs XLogReadBufferExtended
not to complain when we're doing this.

* btree_xlog_vacuum performed the extra locking if standbyState ==
STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up
for hot standby queries until the database has reached consistency, and
we don't want to do the extra locking till then either, for fear of reading
corrupted pages (which bufmgr.c would complain about).  Fix by exporting a
new function from xlog.c that will report whether we're actually in hot
standby replay mode.

* To ensure full coverage of the index in the replay scan, btvacuumscan
would emit a dummy WAL record for the last page of the index, if no
vacuuming work had been done on that page.  However, if the last page
of the index is all-zero, that would result in corruption of said page,
since the functions called on it weren't prepared to handle that case.
There's no need to lock any such pages, so change the logic to target
the last normal leaf page instead.

The first two of these bugs were diagnosed by Andres Freund, the other one
by me.  Fixes based on ideas from Heikki Linnakangas and myself.

This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
2014-01-14 17:35:21 -05:00
Robert Haas ec9037df26 Single-reader, single-writer, lightweight shared message queue.
This code provides infrastructure for user backends to communicate
relatively easily with background workers.  The message queue is
structured as a ring buffer and allows messages of arbitary length
to be sent and received.

Patch by me.  Review by KaiGai Kohei and Andres Freund.
2014-01-14 12:23:22 -05:00
Robert Haas 6ddd5137b2 Simple table of contents for a shared memory segment.
This interface is intended to make it simple to divide a dynamic shared
memory segment into different regions with distinct purposes.  It
therefore serves much the same purpose that ShmemIndex accomplishes for
the main shared memory segment, but it is intended to be more
lightweight.

Patch by me.  Review by Andres Freund.
2014-01-14 12:18:58 -05:00
Robert Haas 05ff5062da Code improvements for ALTER SYSTEM .. SET.
Move FreeConfigVariables() later to make sure ErrorConfFile is valid
when we use it, and get rid of an unnecessary string copy operation.

Amit Kapila, kibitzed by me.
2014-01-13 14:54:00 -05:00
Robert Haas 2bb1f14b89 Make bitmap heap scans show exact/lossy block info in EXPLAIN ANALYZE.
Etsuro Fujita
2014-01-13 14:42:16 -05:00
Tom Lane 158b7fa6a3 Disallow LATERAL references to the target table of an UPDATE/DELETE.
On second thought, commit 0c051c9008 was
over-hasty: rather than allowing this case, we ought to reject it for now.
That leaves the field clear for a future feature that allows the target
table to be re-specified in the FROM (or USING) clause, which will enable
left-joining the target table to something else.  We can then also allow
LATERAL references to such an explicitly re-specified target table.
But allowing them right now will create ambiguities or worse for such a
feature, and it isn't something we documented 9.3 as supporting.

While at it, add a convenience subroutine to avoid having several copies
of the ereport for disalllowed-LATERAL-reference cases.
2014-01-11 19:03:12 -05:00
Tom Lane 910bac5953 Fix possible crashes due to using elog/ereport too early in startup.
Per reports from Andres Freund and Luke Campbell, a server failure during
set_pglocale_pgservice results in a segfault rather than a useful error
message, because the infrastructure needed to use ereport hasn't been
initialized; specifically, MemoryContextInit hasn't been called.
One known cause of this is starting the server in a directory it
doesn't have permission to read.

We could try to prevent set_pglocale_pgservice from using anything that
depends on palloc or elog, but that would be messy, and the odds of future
breakage seem high.  Moreover there are other things being called in main.c
that look likely to use palloc or elog too --- perhaps those things
shouldn't be there, but they are there today.  The best solution seems to
be to move the call of MemoryContextInit to very early in the backend's
real main() function.  I've verified that an elog or ereport occurring
immediately after that is now capable of sending something useful to
stderr.

I also added code to elog.c to print something intelligible rather than
just crashing if MemoryContextInit hasn't created the ErrorContext.
This could happen if MemoryContextInit itself fails (due to malloc
failure), and provides some future-proofing against someone trying to
sneak in new code even earlier in server startup.

Back-patch to all supported branches.  Since we've only heard reports of
this type of failure recently, it may be that some recent change has made
it more likely to see a crash of this kind; but it sure looks like it's
broken all the way back.
2014-01-11 16:36:07 -05:00
Tom Lane 6286526207 Fix compute_scalar_stats() for case that all values exceed WIDTH_THRESHOLD.
The standard typanalyze functions skip over values whose detoasted size
exceeds WIDTH_THRESHOLD (1024 bytes), so as to limit memory bloat during
ANALYZE.  However, we (I think I, actually :-() failed to consider the
possibility that *every* non-null value in a column is too wide.  While
compute_minimal_stats() seems to behave reasonably anyway in such a case,
compute_scalar_stats() just fell through and generated no pg_statistic
entry at all.  That's unnecessarily pessimistic: we can still produce
valid stanullfrac and stawidth values in such cases, since we do include
too-wide values in the average-width calculation.  Furthermore, since the
general assumption in this code is that too-wide values are probably all
distinct from each other, it seems reasonable to set stadistinct to -1
("all distinct").

Per complaint from Kadri Raudsepp.  This has been like this since roughly
neolithic times, so back-patch to all supported branches.
2014-01-11 13:42:42 -05:00
Bruce Momjian 111022eac6 Move username lookup functions from /port to /common
Per suggestion from Peter E and Alvaro
2014-01-10 18:03:28 -05:00
Alvaro Herrera 423e1211a8 Accept pg_upgraded tuples during multixact freezing
The new MultiXact freezing routines introduced by commit 8e9a16ab8f
neglected to consider tuples that came from a pg_upgrade'd database; a
vacuum run that tried to freeze such tuples would die with an error such
as
ERROR: MultiXactId 11415437 does no longer exist -- apparent wraparound

To fix, ensure that GetMultiXactIdMembers is allowed to return empty
multis when the infomask bits are right, as is done in other callsites.

Per trouble report from F-Secure.

In passing, fix a copy&paste bug reported by Andrey Karpov from VIVA64
from their PVS-Studio static checked, that instead of setting relminmxid
to Invalid, we were setting relfrozenxid twice.  Not an important
mistake because that code branch is about relations for which we don't
use the frozenxid/minmxid values at all in the first place, but seems to
warrants a fix nonetheless.
2014-01-10 18:03:18 -03:00
Tom Lane faab7a957d Remove unnecessary local variables to work around an icc optimization bug.
Buildfarm member dunlin has been crashing since commit 8b49a60, but other
machines seem fine with that code.  It turns out that removing the local
variables in ordered_set_startup() that are copies of fields in "qstate"
dodges the problem.  This might cost a few cycles on register-rich
machines, but it's probably a wash on others, and in any case this code
isn't performance-critical.  Thanks to Jeremy Drake for off-list
investigation.
2014-01-09 12:59:55 -05:00
Heikki Linnakangas c945af80cf Refactor checking whether we've reached the recovery target.
Makes the replay loop slightly more readable, by separating the concerns of
whether to stop and whether to delay, and how to extract the timestamp from
a record.

This has the user-visible change that the timestamp of the last applied
record is now updated after actually applying it. Before, it was updated
just before applying it. That meant that pg_last_xact_replay_timestamp()
could return the timestamp of a commit record that is in process of being
replayed, but not yet applied. Normally the difference is small, but if
min_recovery_apply_delay is set, there could be a significant delay between
reading a record and applying it.

Another behavioral change is that if you recover to a restore point, we stop
after the restore point record, not before it. It makes no difference as far
as running queries on the server is concerned, as applying a restore point
record changes nothing, but if examine the timeline history you will see
that the new timeline branched off just after the restore point record, not
before it. One practical consequence is that if you do PITR to the new
timeline, and set recovery target to the same named restore point again, it
will find and stop recovery at the same restore point. Conceptually, I think
it makes more sense to consider the restore point as part of the new
timeline's history than not.

In principle, setting the last-replayed timestamp before actually applying
the record was a bug all along, but it doesn't seem worth the risk to
backpatch, since min_recovery_apply_delay was only added in 9.4.
2014-01-09 14:00:39 +02:00
Tom Lane 220b34331f We don't need to include pg_sema.h in s_lock.h anymore.
Minor improvement to commit daa7527afc2274432094ebe7ceb03aa41f916607:
s_lock.h no longer has any need to mention PGSemaphoreData, so we can
rip out the #include that supplies that.  In a non-HAVE_SPINLOCKS
build, this doesn't really buy much since we still need the #include
in spin.h --- but everywhere else, this reduces #include footprint by
some trifle, and helps keep the different locking facilities separate.
2014-01-08 20:58:22 -05:00
Tom Lane 080b7db72e Fix "cannot accept a set" error when only some arms of a CASE return a set.
In commit c1352052ef, I implemented an
optimization that assumed that a function's argument expressions would
either always return a set (ie multiple rows), or always not.  This is
wrong however: we allow CASE expressions in which some arms return a set
of some type and others just return a scalar of that type.  There may be
other examples as well.  To fix, replace the run-time test of whether an
argument returned a set with a static precheck (expression_returns_set).
This adds a little bit of query startup overhead, but it seems barely
measurable.

Per bug #8228 from David Johnston.  This has been broken since 8.0,
so patch all supported branches.
2014-01-08 20:18:58 -05:00
Robert Haas daa7527afc Reduce the number of semaphores used under --disable-spinlocks.
Instead of allocating a semaphore from the operating system for every
spinlock, allocate a fixed number of semaphores (by default, 1024)
from the operating system and multiplex all the spinlocks that get
created onto them.  This could self-deadlock if a process attempted
to acquire more than one spinlock at a time, but since processes
aren't supposed to execute anything other than short stretches of
straight-line code while holding a spinlock, that shouldn't happen.

One motivation for this change is that, with the introduction of
dynamic shared memory, it may be desirable to create spinlocks that
last for less than the lifetime of the server.  Without this change,
attempting to use such facilities under --disable-spinlocks would
quickly exhaust any supply of available semaphores.  Quite apart
from that, it's desirable to contain the quantity of semaphores
needed to run the server simply on convenience grounds, since using
too many may make it harder to get PostgreSQL running on a new
platform, which is mostly the point of --disable-spinlocks in the
first place.

Patch by me; review by Tom Lane.
2014-01-08 18:58:00 -05:00