Commit Graph

34783 Commits

Author SHA1 Message Date
Heikki Linnakangas abf5c5c9a4 If recovery.conf is created after "pg_ctl stop -m i", do crash recovery.
If you create a base backup using an atomic filesystem snapshot, and try to
perform PITR starting from that base backup, or if you just kill a master
server and create recovery.conf to put it into standby mode, we don't know
how far we need to recover before reaching consistency. Normally in crash
recovery, we replay all the WAL present in pg_xlog, and assume that we're
consistent after that. And normally in archive recovery, minRecoveryPoint,
backupEndRequired, or backupEndPoint is set in the control file, indicating
how far we need to replay to reach consistency. But if the server was
previously up and running normally, and you kill -9 it or take an atomic
filesystem snapshot, none of those fields are set in the control file.

The solution is to perform crash recovery first, replaying all the WAL in
pg_xlog. After that's done, we assume that the system is consistent like in
normal crash recovery, and switch to archive recovery mode after that.

Per report from Kyotaro HORIGUCHI. In his scenario, recovery.conf was
created after "pg_ctl stop -m i". I'm not sure we need to support that exact
scenario, but we should support backing up using a filesystem snapshot,
which looks identical.

This issue goes back to at least 9.0, where hot standby was introduced and
we started to track when consistency is reached. In 9.1 and 9.2, we would
open up for hot standby too early, and queries could briefly see an
inconsistent state. But 9.2 made it more visible, as we started to PANIC if
we see a reference to a non-existing page during recovery, if we've already
reached consistency. This is a fairly big patch, so back-patch to 9.2 only,
where the issue is more visible. We can consider back-patching further after
this has received some more testing in 9.2 and master.
2013-02-22 12:32:41 +02:00
Alvaro Herrera a730183926 Move relpath() to libpgcommon
This enables non-backend code, such as pg_xlogdump, to use it easily.
The previous location, in src/backend/catalog/catalog.c, made that
essentially impossible because that file depends on many backend-only
facilities; so this needs to live separately.
2013-02-21 22:46:17 -03:00
Alvaro Herrera 6e3fd96463 Remove useless variable
Per Jeff Janes
2013-02-21 11:46:46 -03:00
Tom Lane 54a2786835 Need to decorate XactIsoLevel as PGDLLIMPORT for postgres_fdw.
Per buildfarm.
2013-02-21 09:28:42 -05:00
Tom Lane 699d70b2ec Teach MSVC build system about postgres_fdw.
Per buildfarm.
2013-02-21 06:43:15 -05:00
Tom Lane d0d75c4022 Add postgres_fdw contrib module.
There's still a lot of room for improvement, but it basically works,
and we need this to be present before we can do anything much with the
writable-foreign-tables patch.  So let's commit it and get on with testing.

Shigeru Hanada, reviewed by KaiGai Kohei and Tom Lane
2013-02-21 05:27:16 -05:00
Heikki Linnakangas f435cd1d38 Fix pg_dumpall with database names containing =
If a database name contained a '=' character, pg_dumpall failed. The problem
was in the way pg_dumpall passes the database name to pg_dump on the
command line. If it contained a '=' character, pg_dump would interpret it
as a libpq connection string instead of a plain database name.

To fix, pass the database name to pg_dump as a connection string,
"dbname=foo", with the database name escaped if necessary.

Back-patch to all supported branches.
2013-02-20 17:08:54 +02:00
Heikki Linnakangas 2930c05634 Don't pass NULL to fprintf, if a bogus connection string is given to pg_dump.
Back-patch to all supported branches.
2013-02-20 16:33:24 +02:00
Heikki Linnakangas 5d6899dbae Fix yet another typo in comment.
Etsuro Fujita
2013-02-20 12:31:26 +02:00
Alvaro Herrera a40d09e27f Move ExceptionalCondition back to postgres.h
It needs to be defined in the backend even when assertions are not
enabled.  It's cleaner to put it back, than create a separate #ifdef
section in c.h.

Per trouble report from Jeff Janes
2013-02-18 18:53:32 -03:00
Alvaro Herrera 187492b6c2 Split pgstat file in smaller pieces
We now write one file per database and one global file, instead of
having the whole thing in a single huge file.  This reduces the I/O that
must be done when partial data is required -- which is all the time,
because each process only needs information on its own database anyway.
Also, the autovacuum launcher does not need data about tables and
functions in each database; having the global stats for all DBs is
enough.

Catalog version bumped because we have a new subdir under PGDATA.

Author: Tomas Vondra.  Some rework by Álvaro
Testing by Jeff Janes
Other discussion by Heikki Linnakangas, Tom Lane.
2013-02-18 18:12:52 -03:00
Peter Eisentraut 9475db3a4e Add ALTER ROLE ALL SET command
This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE
... SET functionality to allow creating settings that apply to all users
in all databases.

reviewed by Pavel Stehule
2013-02-17 23:45:36 -05:00
Bruce Momjian 17f1523932 Warn about initdb using mount-points
Add code to detect and warn about trying to initdb or create pg_xlog on
mount points.
2013-02-16 18:52:50 -05:00
Heikki Linnakangas 1bd42cd70a Better fix for "unarchived WAL files get deleted on crash recovery" bug.
Revert my earlier fix for the bug that unarchived WAL files get deleted on
crash recovery, commit c9cc7e05c6. We create
a .done file for files streamed or restored from archive, so the WAL file
recycling logic used during normal operation works just as well during
archive recovery.

Per Fujii Masao's suggestion.
2013-02-15 19:33:31 +02:00
Simon Riggs c2f79ba269 Force archive_status of .done for xlogs created by dearchival/replication.
This is a forward-patch of commit 6f4b8a4f4f,
applied to 9.2 back in August. The plan was to do something else in master,
but it looks like it's not going to happen, so let's just apply the 9.2
solution to master as well.

Fujii Masao
2013-02-15 19:28:06 +02:00
Heikki Linnakangas c9cc7e05c6 Don't delete unarchived WAL files during crash recovery.
Bug reported by Jehan-Guillaume (ioguix) de Rorthais. This was introduced
with the change to keep WAL files restored from archive in pg_xlog, in 9.2.
2013-02-15 17:43:59 +02:00
Peter Eisentraut 8e6c8da16a pgindent: Fix order in instructions
The previous order of steps didn't literally work, because git clean
-fdx would delete the downloaded typedefs.list.  Also, pgindent needs to
be called with a path when one is in at the top of the build tree.
2013-02-14 21:40:05 -05:00
Tom Lane fdaf44862b Invent pre-commit/pre-prepare/pre-subcommit events for xact callbacks.
Currently it's only possible for loadable modules to get control during
post-commit cleanup of a transaction.  That doesn't work too well if they
want to do something that could throw an error; for example, an FDW might
need to issue a remote commit, which could well fail.  To improve matters,
extend the existing APIs for XactCallback and SubXactCallback functions
to provide new pre-commit events for this purpose.

The release notes will need to mention that existing callback functions
should be checked to make sure they don't do something unwanted when one
of the new event types occurs.  In the examples within our source tree,
contrib/sepgsql was fine but plpgsql had been a bit too cute.
2013-02-14 20:35:08 -05:00
Bruce Momjian 4765dd7921 pg_upgrade: conditionally create cluster delete script
If users create tablespaces inside the old cluster directory, it is
impossible for the delete script to delete _only_ the old cluster files,
so don't create a script in that case, and issue a message to the user.
2013-02-14 10:53:03 -05:00
Bruce Momjian 74205266d4 Fix pg_upgrade log file cleanup code
Recent pg_upgrade parallel improvements introduced a bug that prevented
cleanup of per-database log files.
2013-02-14 00:04:15 -05:00
Peter Eisentraut ff64fd49ce doc: Add make target to produce EPUB from DocBook 2013-02-13 23:12:21 -05:00
Tom Lane 71627f3d19 Fix CVE-2013-0255 properly.
Revert commit ab0f7b6089 (in HEAD only)
in favor of the proper solution, which is to declare enum_recv() correctly
in the system catalogs.  It should be declared to take type "internal"
not "cstring".

Also improve the type_sanity regression test, which should have caught
this typo, so that it actually would.  Most of the relevant checks on
the signature of type I/O functions should not have been restricted to
basetypes/pseudotypes, as they should apply to any type's I/O functions.
2013-02-13 16:20:01 -05:00
Tom Lane 9728eda792 Fix contrib/pg_trgm's similarity() function for trigram-free strings.
Cases such as similarity('', '') produced a NaN result due to computing
0/0.  Per discussion, make it return zero instead.

This appears to be the basic cause of bug #7867 from Michele Baravalle,
although it remains unclear why her installation doesn't think Cyrillic
letters are letters.

Back-patch to all active branches.
2013-02-13 14:07:06 -05:00
Tom Lane cd89965aab Fix bogus when-to-deregister-from-listener-array logic.
Since a backend adds itself to the global listener array during
Exec_ListenPreCommit, it's inappropriate for it to remove itself during
Exec_UnlistenCommit or Exec_UnlistenAllCommit --- that leads to failure
when committing a transaction that did UNLISTEN then LISTEN, since we end
up not registered though we should be.  (This leads to missing later
notifications, or to Assert failures in assert-enabled builds.)  Instead
deal with deregistering at the bottom of AtCommit_Notify, when we know the
final state of the listenChannels list.

Also, simplify the representation of registration status by replacing the
transient backendHasExecutedInitialListen flag with an amRegisteredListener
flag.

Per report from Greg Sabino Mullane.  Back-patch to 9.0, where the problem
was introduced during the LISTEN/NOTIFY rewrite.
2013-02-13 12:48:05 -05:00
Heikki Linnakangas fdf9e21196 Update visibility map in the second phase of vacuum.
There's a high chance that a page becomes all-visible when the second phase
of vacuum removes all the dead tuples on it, so it makes sense to check for
that. Otherwise the visibility map won't get updated until the next vacuum.

Pavan Deolasee, reviewed by Jeff Janes.
2013-02-13 17:52:10 +02:00
Alvaro Herrera 0e81ddde2c Rename "string" pstrdup argument to "in"
The former name collides with a symbol also used in the isolation test's
parser, causing assorted failures in certain platforms.
2013-02-12 12:43:09 -03:00
Alvaro Herrera 0f980b0e17 Don't build libpgcommon_srv.a just yet
It's empty, and some archivers do not support that case.
2013-02-12 12:21:27 -03:00
Alvaro Herrera 8396447cdb Create libpgcommon, and move pg_malloc et al to it
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines.  We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.

The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.

At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly.  To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc().  This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.

This lets us clean up some places that were already with
localized hacks.

Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that.  libpgcommon infrastructure was authored by Álvaro.
2013-02-12 11:21:05 -03:00
Peter Eisentraut 0cb1fac3b1 Add noreturn attributes to some error reporting functions 2013-02-12 07:13:22 -05:00
Heikki Linnakangas 62401db45c Support unlogged GiST index.
The reason this wasn't supported before was that GiST indexes need an
increasing sequence to detect concurrent page-splits. In a regular WAL-
logged GiST index, the LSN of the page-split record is used for that
purpose, and in a temporary index, we can get away with a backend-local
counter. Neither of those methods works for an unlogged relation.

To provide such an increasing sequence of numbers, create a "fake LSN"
counter that is saved and restored across shutdowns. On recovery, unlogged
relations are blown away, so the counter doesn't need to survive that
either.

Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
2013-02-11 23:07:09 +02:00
Heikki Linnakangas b669f416ce Fix checkpoint after fast promotion.
The intention was to request a regular online checkpoint immediately after
end of recovery, when performing "fast promotion". However, because the
checkpoint was requested before other backends were allowed to write WAL,
the checkpointer process performed a restartpoint rather than a checkpoint.

Delay the RequestCheckPoint call until after recovery has truly ended, so
that you get a real checkpoint.
2013-02-11 22:22:08 +02:00
Heikki Linnakangas 7803e9327d Include previous TLI in end-of-recovery and shutdown checkpoint records.
This isn't used for anything but a sanity check at the moment, but it could
be highly valuable for debugging purposes. It could also be used to recreate
timeline history by traversing WAL, which seems useful.
2013-02-11 18:16:25 +02:00
Tom Lane c352ea2d74 Further cleanup of gistsplit.c.
After further reflection I was unconvinced that the existing coding is
guaranteed to return valid union datums in every code path for multi-column
indexes.  Fix that by forcing a gistunionsubkey() call at the end of the
recursion.  Having done that, we can remove some clearly-redundant calls
elsewhere.  This should be a little faster for multi-column indexes (since
the previous coding would uselessly do such a call for each column while
unwinding the recursion), as well as much harder to break.

Also, simplify the handling of cases where one side or the other of a
primary split contains only don't-care tuples.  The previous coding used a
very ugly hack in removeDontCares() that essentially forced one random
tuple to be treated as non-don't-care, providing a random initial choice of
seed datum for the secondary split.  It seems unlikely that that method
will give better-than-random splits.  Instead, treat such a split as
degenerate and just let the next column determine the split, the same way
that we handle fully degenerate cases where the two sides produce identical
union datums.
2013-02-10 16:21:26 -05:00
Tom Lane db3d7e9f0d Remove useless picksplit-doesn't-support-secondary-split log spam.
This LOG message was put in over five years ago with the evident
expectation that we'd make all GiST opclasses support secondary split
directly.  However, no such thing ever happened, and indeed the number of
opclasses supporting it decreased to zero in 9.2.  The reason is that
improving on the default implementation isn't that easy --- the
opclass-specific code that did exist, before 9.2, doesn't appear to have
been any improvement over the default.

Hence, remove the message altogether.  There's certainly no point in
nagging users about this in released branches, but I doubt that we'll
ever implement complete opclass-specific support anyway.
2013-02-10 13:07:40 -05:00
Tom Lane dacc185f52 Remove vestigial secondary-split support in gist_box_picksplit().
Not only is this implementation of secondary-split not better than the
default implementation in gistsplit.c, it's actually worse.  The gistsplit.c
code at least looks to see if switching the left and right sides would make
a better merge with the previously-split tuples, while this doesn't.

In any case it's rather useless to support secondary split only in an edge
case.  There used to be more complete support for it here (in chooseLR()),
but that was removed in commit 7f3bd86843.
It appears to me though that the chooseLR() code was really isomorphic to
the default implementation, since it was still based on choosing the cheaper
way of adding two sub-split vectors that had been chosen without regard to
the primary split initially.  I think an implementation of secondary split
that could beat the default implementation would have to be pretty fully
integrated into the split algorithm, not plastered on at the end.

Back-patch to 9.2, but not further; previous branches have the chooseLR()
code which I don't feel a great need to mess with.  This is mainly so we
just have two behaviors and not three among the various branches (IOW, this
patch is cleanup for commit 7f3bd86843e5aad84585a57d3f6b80db3c609916's
incomplete removal of secondary-split support).
2013-02-10 12:40:09 -05:00
Tom Lane 0fd0f3688b Document and clean up gistsplit.c.
Improve comments, rename some variables and functions, slightly simplify
a couple of APIs, in an attempt to make this code readable by people other
than its original author.

Even though this is essentially just cosmetic, back-patch to all active
branches, because otherwise it's going to make back-patching future fixes
in this file very painful.
2013-02-10 11:58:15 -05:00
Tom Lane a187c96d26 Reduce log level of picksplit-doesn't-support-secondary-split whining.
This was agreed to back in 2007, but never actually done.

Josh Hansen
2013-02-09 12:17:55 -05:00
Tom Lane 3a1f8cdfa9 Add an example of attaching a default value to an updatable view.
This is probably the single most useful thing that ALTER VIEW can do,
particularly now that we have auto-updatable views.  So show an explicit
example.
2013-02-09 11:43:48 -05:00
Peter Eisentraut 0343a59d11 psql: Improve unaligned expanded output for zero rows
This used to erroneously print an empty line.  Now it prints nothing.
2013-02-09 00:11:58 -05:00
Peter Eisentraut 8ade58a4ea psql: Improve expanded print output in tuples-only mode
When there are zero result rows, in expanded mode, "(No rows)" is
printed.  So far, there was no way to turn this off.  Now, when
tuples-only mode is turned on, nothing is printed in this case.
2013-02-09 00:11:58 -05:00
Tom Lane c61e26ee3e Add support for ALTER RULE ... RENAME TO.
Ali Dar, reviewed by Dean Rasheed.
2013-02-08 23:58:40 -05:00
Tom Lane f806c191a3 Simplify box_overlap computations.
Given the assumption that a box's high coordinates are not less than its
low coordinates, the tests in box_ov() are overly complicated and can be
reduced to about half as much work.  Since many other functions in
geo_ops.c rely on that assumption, there doesn't seem to be a good reason
not to use it here.

Per discussion of Alexander Korotkov's GiST fix, which was already using
the simplified logic (in a non-fuzzy form, but the equivalence holds just
as well for fuzzy).
2013-02-08 18:26:08 -05:00
Tom Lane 3c29b196b0 Fix gist_box_same and gist_point_consistent to handle fuzziness correctly.
While there's considerable doubt that we want fuzzy behavior in the
geometric operators at all (let alone as currently implemented), nobody is
stepping forward to redesign that stuff.  In the meantime it behooves us
to make sure that index searches agree with the behavior of the underlying
operators.  This patch fixes two problems in this area.

First, gist_box_same was using fuzzy equality, but it really needs to use
exact equality to prevent not-quite-identical upper index keys from being
treated as identical, which for example would prevent an existing upper
key from being extended by an amount less than epsilon.  This would result
in inconsistent indexes.  (The next release notes will need to recommend
that users reindex GiST indexes on boxes, polygons, circles, and points,
since all four opclasses use gist_box_same.)

Second, gist_point_consistent used exact comparisons for upper-page
comparisons in ~= searches, when it needs to use fuzzy comparisons to
ensure it finds all matches; and it used fuzzy comparisons for point <@ box
searches, when it needs to use exact comparisons because that's what the
<@ operator (rather inconsistently) does.

The added regression test cases illustrate all three misbehaviors.

Back-patch to all active branches.  (8.4 did not have GiST point_ops,
but it still seems prudent to apply the gist_box_same patch to it.)

Alexander Korotkov, reviewed by Noah Misch
2013-02-08 18:03:17 -05:00
Alvaro Herrera 381d4b70a9 Clean up c.h / postgres.h after Assert() move
Per Tom
2013-02-08 12:50:58 -03:00
Alvaro Herrera 5766228bc6 Fix Xmax freeze conditions
I broke this in 0ac5ad5134; previously, freezing a tuple marked with an
IS_MULTI xmax was not necessary.

Per brokenness report from Jeff Janes.
2013-02-08 12:50:58 -03:00
Tom Lane 335c5e9206 doc: Fix mistakes in the most recent set of release notes.
Improve description of the vacuum_freeze_table_age bug (it's much more
serious than we realized at the time the fix was committed), and correct
attribution of pg_upgrade -O/-o fix (Marti Raudsepp contributed that,
but Bruce forgot to credit him in the commit log).

No need to back-patch right now, it'll happen when the next set of
release notes are prepared.
2013-02-08 10:41:15 -05:00
Magnus Hagander c572bfaf39 Fix another typo in a comment
Noted by Thom Brown
2013-02-08 15:42:01 +01:00
Peter Eisentraut cf4d67e819 Exclude access/rmgrlist.h from cpluspluscheck
It is not meant to be included standalone.
2013-02-08 07:01:21 -05:00
Peter Eisentraut 4760142146 scripts: Add build prerequisite on libpgport
Without this, building in src/bin/scripts directly will fail if
libpgport wasn't built first.  Other bin components are handled the same
way.

Phil Sorber
2013-02-08 06:43:54 -05:00
Magnus Hagander 733701d274 Fix typo in comment
Etsuro Fujita
2013-02-08 11:45:42 +01:00