Commit Graph

12127 Commits

Author SHA1 Message Date
Tom Lane b3aaf9081a Rearrange planner to save the whole PlannerInfo (subroot) for a subquery.
Formerly, set_subquery_pathlist and other creators of plans for subqueries
saved only the rangetable and rowMarks lists from the lower-level
PlannerInfo.  But there's no reason not to remember the whole PlannerInfo,
and indeed this turns out to simplify matters in a number of places.

The immediate reason for doing this was so that the subroot will still be
accessible when we're trying to extract column statistics out of an
already-planned subquery.  But now that I've done it, it seems like a good
code-beautification effort in its own right.

I also chose to get rid of the transient subrtable and subrowmark fields in
SubqueryScan nodes, in favor of having setrefs.c look up the subquery's
RelOptInfo.  That required changing all the APIs in setrefs.c to pass
PlannerInfo not PlannerGlobal, which was a large but quite mechanical
transformation.

One side-effect not foreseen at the beginning is that this finally broke
inheritance_planner's assumption that replanning the same subquery RTE N
times would necessarily give interchangeable results each time.  That
assumption was always pretty risky, but now we really have to make a
separate RTE for each instance so that there's a place to carry the
separate subroots.
2011-09-03 15:36:24 -04:00
Peter Eisentraut 42ad992fdc Add archive_command example 2011-09-03 01:29:09 +03:00
Peter Eisentraut f1e4f3d44f Whitespace adjustment for consistency in the file 2011-09-03 01:28:05 +03:00
Tom Lane 5b562644fe Teach ANALYZE to clear pg_class.relhassubclass when appropriate.
In the past, relhassubclass always remained true if a relation had ever had
child relations, even if the last subclass was long gone.  While this had
only marginal performance implications in most cases, it was annoying, and
I'm now considering some planner changes that would raise the cost of a
false positive.  It was previously impractical to fix this because of race
condition concerns.  However, given the recent change that made tablecmds.c
take ShareExclusiveLock on relations that are gaining a child (commit
fbcf4b92aa), we can now allow ANALYZE to
clear the flag when it's no longer relevant.  There is no additional
locking cost to do so, since ANALYZE takes ShareExclusiveLock anyway.
2011-09-02 14:29:31 -04:00
Bruce Momjian 10af3ab2b2 Add C comment about needed include. 2011-09-01 12:53:45 -04:00
Tom Lane e5b012b788 Put back improperly removed #include. 2011-09-01 11:57:46 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Heikki Linnakangas a88b6e4cfb setlocale() on Windows doesn't work correctly if the locale name contains
dots. I previously worked around this in initdb, mapping the known
problematic locale names to aliases that work, but Hiroshi Inoue pointed
out that that's not enough because even if you use one of the aliases, like
"Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie.
"Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by
passing that value back to setlocale(), it fails. Note that you are affected
by this bug also if you use one of those short-form names manually, so just
reverting the hack in initdb won't fix it.

To work around that, move the locale name mapping from initdb to a wrapper
around setlocale(), so that the mapping is invoked on every setlocale() call.

Also, add a few checks for failed setlocale() calls in the backend. These
calls shouldn't fail, and if they do there isn't much we can do about it,
but at least you'll get a warning.

Backpatch to 9.1, where the initdb hack was introduced. The Windows bug
affects older versions too if you set locale manually to one of the aliases,
but given the lack of complaints from the field, I'm hesitent to backpatch.
2011-09-01 11:08:32 +03:00
Tom Lane 0d3b231eeb Further repair of eqjoinsel ndistinct-clamping logic.
Examination of examples provided by Mark Kirkwood and others has convinced
me that actually commit 7f3eba30c9 was quite
a few bricks shy of a load.  The useful part of that patch was clamping
ndistinct for the inner side of a semi or anti join, and the reason why
that's needed is that it's the only way that restriction clauses
eliminating rows from the inner relation can affect the estimated size of
the join result.  I had not clearly understood why the clamping was
appropriate, and so mis-extrapolated to conclude that we should clamp
ndistinct for the outer side too, as well as for both sides of regular
joins.  These latter actions were all wrong, and are reverted with this
patch.  In addition, the clamping logic is now made to affect the behavior
of both paths in eqjoinsel_semi, with or without MCV lists to compare.
When we have MCVs, we suppose that the most common values are the ones
that are most likely to survive the decimation resulting from a lower
restriction clause, so we think of the clamping as eliminating non-MCV
values, or potentially even the least-common MCVs for the inner relation.

Back-patch to 8.4, same as previous fixes in this area.
2011-09-01 00:19:38 -04:00
Tom Lane 97930cf578 Improve eqjoinsel's ndistinct clamping to work for multiple levels of join.
This patch fixes an oversight in my commit
7f3eba30c9 of 2008-10-23.  That patch
accounted for baserel restriction clauses that reduced the number of rows
coming out of a table (and hence the number of possibly-distinct values of
a join variable), but not for join restriction clauses that might have been
applied at a lower level of join.  To account for the latter, look up the
sizes of the min_lefthand and min_righthand inputs of the current join,
and clamp with those in the same way as for the base relations.

Noted while investigating a complaint from Ben Chobot, although this in
itself doesn't seem to explain his report.

Back-patch to 8.4; previous versions used different estimation methods
for which this heuristic isn't relevant.
2011-08-31 16:05:43 -04:00
Tom Lane 5bba65de94 Fix a missed case in code for "moving average" estimate of reltuples.
It is possible for VACUUM to scan no pages at all, if the visibility map
shows that all pages are all-visible.  In this situation VACUUM has no new
information to report about the relation's tuple density, so it wasn't
changing pg_class.reltuples ... but it updated pg_class.relpages anyway.
That's wrong in general, since there is no evidence to justify changing the
density ratio reltuples/relpages, but it's particularly bad if the previous
state was relpages=reltuples=0, which means "unknown tuple density".
We just replaced "unknown" with "zero".  ANALYZE would eventually recover
from this, but it could take a lot of repetitions of ANALYZE to do so if
the relation size is much larger than the maximum number of pages ANALYZE
will scan, because of the moving-average behavior introduced by commit
b4b6923e03.

The only known situation where we could have relpages=reltuples=0 and yet
the visibility map asserts everything's visible is immediately following
a pg_upgrade.  It might be advisable for pg_upgrade to try to preserve the
relpages/reltuples statistics; but in any case this code is wrong on its
own terms, so fix it.  Per report from Sergey Koposov.

Back-patch to 8.4, where the visibility map was introduced, same as the
previous change.
2011-08-30 14:51:38 -04:00
Robert Haas 8a3d33c8e6 Fix parsing of time string followed by yesterday/today/tomorrow.
Previously, 'yesterday 04:00:00'::timestamp didn't do the same thing as
'04:00:00 yesterday'::timestamp, and the return value from the latter
was midnight rather than the specified time.

Dean Rasheed, with some stylistic changes
2011-08-30 11:38:42 -04:00
Robert Haas eab2ef6164 Remove some tabs from README file.
Some of the ASCII art expected 8-space tab stops, and some of it
expected 4-space tab stops.

Per report from YAMAMOTO Takashi.
2011-08-29 22:26:29 -04:00
Tom Lane a5b7640ba0 Fix concat_ws() to not insert a separator after leading NULL argument(s).
Per bug #6181 from Itagaki Takahiro.  Also do some marginal code cleanup
and improve error handling.
2011-08-29 15:20:57 -04:00
Robert Haas c01c25fbe5 Improve spinlock performance for HP-UX, ia64, non-gcc.
At least on this architecture, it's very important to spin on a
non-atomic instruction and only retry the atomic once it appears
that it will succeed.  To fix this, split TAS() into two macros:
TAS(), for trying to grab the lock the first time, and TAS_SPIN(),
for spinning until we get it.  TAS_SPIN() defaults to same as TAS(),
but we can override it when we know there's a better way.

It's likely that some of the other cases in s_lock.h require
similar treatment, but this is the only one we've got conclusive
evidence for at present.
2011-08-29 10:05:48 -04:00
Bruce Momjian 4bd7333b14 Allow more include files to be compiled in their own by adding missing
include dependencies.

Modify pgcompinclude to skip a common fcinfo error.
2011-08-27 11:05:33 -04:00
Peter Eisentraut fd5b397ca4 Implement the information schema with_hierarchy column
In PostgreSQL, this is included in the SELECT privilege, so show YES
or NO depending on whether SELECT is granted.
2011-08-27 15:03:02 +03:00
Bruce Momjian f261deb4b4 Add missing includes after pgrminclude run. 2011-08-26 18:15:14 -04:00
Bruce Momjian f8fc37b337 Add markers for skips. 2011-08-26 18:15:13 -04:00
Tom Lane 00eb036c11 Fix potential memory clobber in tsvector_concat().
tsvector_concat() allocated its result workspace using the "conservative"
estimate of the sum of the two input tsvectors' sizes.  Unfortunately that
wasn't so conservative as all that, because it supposed that the number of
pad bytes required could not grow.  Which it can, as per test case from
Jesper Krogh, if there's a mix of lexemes with positions and lexemes
without them in the input data.  The fix is to assume that we might add
a not-previously-present pad byte for each and every lexeme in the two
inputs; which really is conservative, but it doesn't seem worthwhile to
try to be more precise.

This is an aboriginal bug in tsvector_concat, so back-patch to all
versions containing it.
2011-08-26 16:51:34 -04:00
Tom Lane ecf248737a Add makefile rules to check for backtracking in backend and psql lexers.
Per discussion, we should enforce the policy of "no backtracking" in these
performance-sensitive scanners.
2011-08-25 14:44:17 -04:00
Tom Lane 2e95f1f002 Add "%option warn" to all flex input files that lacked it.
This is recommended in the flex manual, and there seems no good reason
not to use it everywhere.
2011-08-25 13:55:57 -04:00
Robert Haas 48bc57657d Tweak postgresql.conf.sample's comments on listen_addresess.
This makes it slightly more clear that '*' is not part of the default
value, in case that wasn't obvious.

As requested by Dougal Sutherland.
2011-08-25 09:41:24 -04:00
Tom Lane cb5c2ba2d8 Fix multiple bugs in extension dropping.
When we implemented extensions, we made findDependentObjects() treat
EXTENSION dependency links similarly to INTERNAL links.  However, that
logic contained an implicit assumption that an object could have at most
one INTERNAL dependency, so it did not work correctly for objects having
both INTERNAL and DEPENDENCY links.  This led to failure to drop some
extension member objects when dropping the extension.  Furthermore, we'd
never actually exercised the case of recursing to an internally-referenced
(owning) object from anything other than a NORMAL dependency, and it turns
out that passing the incoming dependency's flags to the owning object is
the Wrong Thing.  This led to sometimes dropping a whole extension silently
when we should have rejected the drop command for lack of CASCADE.

Since we obviously were under-testing extension drop scenarios, add some
regression test cases.  Unfortunately, such test cases require some
extensions (duh), so we can't test for problems in the core regression
tests.  I chose to add them to the earthdistance contrib module, which is
a good test case because it has a dependency on the cube contrib module.

Back-patch to 9.1.  Arguably these are pre-existing bugs in INTERNAL
dependency handling, but since it appears that the cases can never arise
pre-9.1, I'll refrain from back-patching the logic changes further than
that.
2011-08-24 13:09:06 -04:00
Tom Lane d4aa491493 Make CREATE EXTENSION check schema creation permissions.
When creating a new schema for a non-relocatable extension, we neglected
to check whether the calling user has permission to create schemas.
That didn't matter in the original coding, since we had already checked
superuserness, but in the new dispensation where users need not be
superusers, we should check it.  Use CreateSchemaCommand() rather than
calling NamespaceCreate() directly, so that we also enforce the rules
about reserved schema names.

Per complaint from KaiGai Kohei, though this isn't the same as his patch.
2011-08-23 21:49:07 -04:00
Tom Lane 43f0c20839 Fix overoptimistic assumptions in column width estimation for subqueries.
set_append_rel_pathlist supposed that, while computing per-column width
estimates for the appendrel, it could ignore child rels for which the
translated reltargetlist entry wasn't a Var.  This gave rise to completely
silly estimates in some common cases, such as constant outputs from some or
all of the arms of a UNION ALL.  Instead, fall back on get_typavgwidth to
estimate from the value's datatype; which might be a poor estimate but at
least it's not completely wacko.

That problem was exposed by an Assert in set_subquery_size_estimates, which
unfortunately was still overoptimistic even with that fix, since we don't
compute attr_widths estimates for appendrels that are entirely excluded by
constraints.  So remove the Assert; we'll just fall back on get_typavgwidth
in such cases.

Also, since set_subquery_size_estimates calls set_baserel_size_estimates
which calls set_rel_width, there's no need for set_subquery_size_estimates
to call get_typavgwidth; set_rel_width will handle it for us if we just
leave the estimate set to zero.  Remove the unnecessary code.

Per report from Erik Rijkers and subsequent investigation.
2011-08-23 17:13:12 -04:00
Peter Eisentraut 1af55e2751 Use consistent format for reporting GetLastError()
Use something like "error code %lu" for reporting GetLastError()
values on Windows.  Previously, a mix of different wordings and
formats were in use.
2011-08-23 22:00:52 +03:00
Robert Haas 7488936478 Typo fix. 2011-08-22 12:16:27 -04:00
Tom Lane 660a081c3f Fix handling of extension membership when filling in a shell operator.
The previous coding would result in deleting and not re-creating the
extension membership pg_depend rows, since there was no
CommandCounterIncrement that would allow recordDependencyOnCurrentExtension
to see that the deletion had happened.  Make it work like the shell type
case, ie, keep the existing entries (and then throw an error if they're for
the wrong extension).

Per bug #6172 from Hitoshi Harada.  Investigation and fix by Dimitri
Fontaine.
2011-08-22 10:55:47 -04:00
Tom Lane b33f78df17 Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.
Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER
ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger
fired for the same update.  Fix by not trying to overload the use of
estate->es_trig_tuple_slot.  Per report from Yoran Heling.

Back-patch to 9.0, when trigger WHEN conditions were introduced.
2011-08-21 18:15:55 -04:00
Tom Lane 08e1eedf24 Fix performance problem when building a lossy tidbitmap.
As pointed out by Sergey Koposov, repeated invocations of tbm_lossify can
make building a large tidbitmap into an O(N^2) operation.  To fix, make
sure we remove more than the minimum amount of information per call, and
add a fallback path to behave sanely if we're unable to fit the bitmap
within the requested amount of memory.

This has been wrong since the tidbitmap code was written, so back-patch
to all supported branches.
2011-08-20 14:51:02 -04:00
Robert Haas 0f7acbeddf Make lazy_vacuum_rel call pg_rusage_init only if needed.
do_analyze_rel already does it this way.

Euler Taveira de Oliveira
2011-08-18 09:55:04 -04:00
Robert Haas 24bf1552f6 Remove obsolete README file.
Perhaps we ought to add some other kind of documentation here instead,
but for now let's get rid of this woefully obsolete description of the
sinval machinery.
2011-08-18 09:49:41 -04:00
Peter Eisentraut 1bf80041e3 Translation updates 2011-08-17 14:07:46 +03:00
Heikki Linnakangas 1d0392b245 Fix comment about which version had BACKUP METHOD line in backup_lable, again.
It was invalidated again by Fujii's patch to 9.1.
2011-08-17 12:31:23 +03:00
Tom Lane b5282aa893 Revise sinval code to remove no-longer-used tuple TID from inval messages.
This requires adjusting the API for syscache callback functions: they now
get a hash value, not a TID, to identify the target tuple.  Most of them
weren't paying any attention to that argument anyway, but plancache did
require a small amount of fixing.

Also, improve performance a trifle by avoiding sending duplicate inval
messages when a heap_update isn't changing the catcache lookup columns.
2011-08-16 19:27:46 -04:00
Tom Lane 632ae6829f Forget about targeting catalog cache invalidations by tuple TID.
The TID isn't stable enough: we might queue an sinval event before a VACUUM
FULL, and then process it afterwards, when the target tuple no longer has
the same TID.  So we must invalidate entries on the basis of hash value
only.  The old coding can be shown to result in various bizarre,
hard-to-reproduce errors in the presence of concurrent VACUUM FULLs on
system catalogs, and could easily result in permanent catalog corruption,
up to and including complete loss of tables.

This commit is just a minimal fix that removes the unsafe comparison.
We should remove transmission of the tuple TID from sinval messages
altogether, and then arrange to suppress the extra message in the common
case of a heap_update that doesn't change the key hashvalue.  But that's
going to be much more invasive, and will only produce a probably-marginal
performance gain, so it doesn't seem like material for a back-patch.

Back-patch to 9.0.  Before that, VACUUM FULL refused to do any tuple moving
if it found any INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples (and
CLUSTER would give up altogether), so there was no risk of moving a tuple
that might be the subject of an unsent sinval message.
2011-08-16 15:26:22 -04:00
Tom Lane f4d7f1adba Fix incorrect order of operations during sinval reset processing.
We have to be sure that we have revalidated each nailed-in-cache relcache
entry before we try to use it to load data for some other relcache entry.
The introduction of "mapped relations" in 9.0 broke this, because although
we updated the state kept in relmapper.c early enough, we failed to
propagate that information into relcache entries soon enough; in
particular, we could try to fetch pg_class rows out of pg_class before
we'd updated its relcache entry's rd_node.relNode value from the map.

This bug accounts for Dave Gould's report of failures after "vacuum full
pg_class", and I believe that there is risk for other system catalogs
as well.

The core part of the fix is to copy relmapper data into the relcache
entries during "phase 1" in RelationCacheInvalidate(), before they'll be
used in "phase 2".  To try to future-proof the code against other similar
bugs, I also rearranged the order in which nailed relations are visited
during phase 2: now it's pg_class first, then pg_class_oid_index, then
other nailed relations.  This should ensure that RelationClearRelation can
apply RelationReloadIndexInfo to all nailed indexes without risking use
of not-yet-revalidated relcache entries.

Back-patch to 9.0 where the relation mapper was introduced.
2011-08-16 14:38:20 -04:00
Tom Lane 7b0d0e9356 Preserve toast value OIDs in toast-swap-by-content for CLUSTER/VACUUM FULL.
This works around the problem that a catalog cache entry might contain a
toast pointer that we try to dereference just as a VACUUM FULL completes
on that catalog.  We will see the sinval message on the cache entry when
we acquire lock on the toast table, but by that point we've already told
tuptoaster.c "here's the pointer to fetch", so it's difficult from a code
structural standpoint to update the pointer before we use it.  Much less
painful to ensure that toast pointers are not invalidated in the first
place.  We have to add a bit of code to deal with the case that a value
that previously wasn't toasted becomes so; but that should be a
seldom-exercised corner case, so the inefficiency shouldn't be significant.

Back-patch to 9.0.  In prior versions, we didn't allow CLUSTER on system
catalogs, and VACUUM FULL didn't result in reassignment of toast OIDs, so
there was no problem.
2011-08-16 13:48:04 -04:00
Tom Lane 2ada6779c5 Fix race condition in relcache init file invalidation.
The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale.  The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.

Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages.  This is more straightforward, and might even be a bit
faster since only one unlink call is needed.

This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.
2011-08-16 13:11:54 -04:00
Heikki Linnakangas 2877c67bc2 Fix bogus comment that claimed that the new BACKUP METHOD line in
backup_label was new in 9.0. Spotted by Fujii Masao.
2011-08-16 12:23:51 +03:00
Peter Eisentraut e5475a80d2 Add "Reason code" prefix to internal SSI error messages
This makes it clearer that the error message is perhaps not supposed
to be understood by users, and it also makes it somewhat clearer that
it was not accidentally omitted from translation.

Idea from Heikki Linnakangas, except that we don't mark "Reason code"
for translation at this point, because that would make the
implementation too cumbersome.
2011-08-15 15:20:16 +03:00
Tom Lane 52994e9e56 Fix unsafe order of operations in foreign-table DDL commands.
When updating or deleting a system catalog tuple, it's necessary to acquire
RowExclusiveLock on the catalog before looking up the tuple; otherwise a
concurrent VACUUM FULL on the catalog might move the tuple to a different
TID before we can apply the update.  Coding patterns that find the tuple
via a table scan aren't at risk here, but when obtaining the tuple from a
catalog cache, correct ordering is important; and several routines in
foreigncmds.c got it wrong.  Noted while running the regression tests in
parallel with VACUUM FULL of assorted system catalogs.

For consistency I moved all the heap_open calls to the starts of their
functions, including a couple for which there was no actual bug.

Back-patch to 8.4 where foreigncmds.c was added.
2011-08-14 15:40:21 -04:00
Tom Lane 592b615d71 Fix incorrect timeout handling during initial authentication transaction.
The statement start timestamp was not set before initiating the transaction
that is used to look up client authentication information in pg_authid.
In consequence, enable_sig_alarm computed a wrong value (far in the past)
for statement_fin_time.  That didn't have any immediate effect, because the
timeout alarm was set without reference to statement_fin_time; but if we
subsequently blocked on a lock for a short time, CheckStatementTimeout
would consult the bogus value when we cancelled the lock timeout wait,
and then conclude we'd timed out, leading to immediate failure of the
connection attempt.  Thus an innocent "vacuum full pg_authid" would cause
failures of concurrent connection attempts.  Noted while testing other,
more serious consequences of vacuum full on system catalogs.

We should set the statement timestamp before StartTransactionCommand(),
so that the transaction start timestamp is also valid.  I'm not sure if
there are any non-cosmetic effects of it not being valid, but the xact
timestamp is at least sent to the statistics machinery.

Back-patch to 9.0.  Before that, the client authentication timeout was done
outside any transaction and did not depend on this state to be valid.
2011-08-13 17:52:24 -04:00
Tom Lane a180776f7a Teach unix_latch.c to use poll() where available.
poll() is preferred over select() on platforms where both are available,
because it tends to be a bit faster and it doesn't have an arbitrary limit
on the range of FD numbers that can be accessed.  The FD range limit does
not appear to be a risk factor for any 9.1 usages, so this doesn't need to
be back-patched, but we need to have it in place if we keep on expanding
the uses of WaitLatch.
2011-08-11 12:50:22 -04:00
Robert Haas 5057366eed Unbreak legacy syntax "COMMENT ON RULE x IS y", with no relation name.
check_object_ownership() isn't happy about the null relation pointer.
We could fix it there, but this seems more future-proof.
2011-08-11 11:23:51 -04:00
Tom Lane cff75130b5 Remove wal_sender_delay GUC, because it's no longer useful.
The latch infrastructure is now capable of detecting all cases where the
walsender loop needs to wake up, so there is no reason to have an arbitrary
timeout.

Also, modify the walsender loop logic to follow the standard pattern of
ResetLatch, test for work to do, WaitLatch.  The previous coding was both
hard to follow and buggy: it would sometimes busy-loop despite having
nothing available to do, eg between receipt of a signal and the next time
it was caught up with new WAL, and it also had interesting choices like
deciding to update to WALSNDSTATE_STREAMING on the strength of information
known to be obsolete.
2011-08-10 18:50:28 -04:00
Tom Lane 79b2ee20c8 Add a bit of debug logging to backend_read_statsfile().
This is in hopes of learning more about what causes "pgstat wait timeout"
warnings in the buildfarm.  This patch should probably be reverted once
we've learned what we can.  As coded, it will result in regression test
"failures" at half the delay that the existing code does, so I expect
to see a few more than before.
2011-08-10 16:45:43 -04:00
Tom Lane 4dab3d5ae1 Change the autovacuum launcher to use WaitLatch instead of a poll loop.
In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens.  This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno.  Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch.  Much of
this has to be back-patched into 9.1.

Peter Geoghegan, with additional work by Tom
2011-08-10 12:22:21 -04:00
Heikki Linnakangas 41f9ffd928 If backup-end record is not seen, and we reach end of recovery from a
streamed backup, throw an error and refuse to start up. The restore has not
finished correctly in that case and the data directory is possibly corrupt.
We already errored out in case of archive recovery, but could not during
crash recovery because we couldn't distinguish between the case that
pg_start_backup() was called and the database then crashed (must not error,
data is OK), and the case that we're restoring from a backup and not all
the needed WAL was replayed (data can be corrupt).

To distinguish those cases, add a line to backup_label to indicate
whether the backup was taken with pg_start/stop_backup(), or by streaming
(ie. pg_basebackup).

This requires re-initdb, because of a new field added to the control file.
2011-08-10 09:22:49 +03:00