Commit Graph

1768 Commits

Author SHA1 Message Date
Robert Haas d86d51a958 Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.

Thanks to Tom Lane for design help and Alvaro Herrera for the review.
2010-01-05 21:54:00 +00:00
Heikki Linnakangas 06f82b2961 Write an end-of-backup WAL record at pg_stop_backup(), and wait for it at
recovery instead of reading the backup history file. This is more robust,
as it stops you from prematurely starting up an inconsisten cluster if the
backup history file is lost for some reason, or if the base backup was
never finished with pg_stop_backup().

This also paves the way for a simpler streaming replication patch, which
doesn't need to care about backup history files anymore.

The backup history file is still created and archived as before, but it's
not used by the system anymore. It's just for informational purposes now.

Bump PG_CONTROL_VERSION as the location of the backup startpoint is now
written to a new field in pg_control, and catversion because initdb is
required

Original patch by Fujii Masao per Simon's idea, with further fixes by me.
2010-01-04 12:50:50 +00:00
Tom Lane 5b76bb180f Dept of second thoughts: my first cut at supporting "x IS NOT NULL" btree
indexscans would do the wrong thing if index_rescan() was called with a
NULL instead of a new set of scankeys and the index was DESC order,
because sk_strategy would not get flipped a second time.  I think
that those provisions for a NULL argument are dead code now as far as the
core backend goes, but possibly somebody somewhere is still using it.
In any case, this refactoring seems clearer, and it's definitely shorter.
2010-01-03 05:39:08 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 29c4ad9829 Support "x IS NOT NULL" clauses as indexscan conditions. This turns out
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL".  Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
2010-01-01 21:53:49 +00:00
Tom Lane 85d02a6586 Redefine Datum as uintptr_t, instead of unsigned long.
This is more in keeping with modern practice, and is a first step towards
porting to Win64 (which has sizeof(pointer) > sizeof(long)).

Tsutomu Yamada, Magnus Hagander, Tom Lane
2009-12-31 19:41:37 +00:00
Heikki Linnakangas ff1e1e45b9 Reset minRecoveryPoint at checkpoints, so that we don't uselessly update
it in the control file at crash recovery following an archive recovery.

Per Fujii Masao and subsequent discussion.
2009-12-30 08:37:21 +00:00
Tom Lane 668e37d138 Fix wrong WAL info value generated when gistContinueInsert() performs an
index page split.  This would result in index corruption, or even more likely
an error during WAL replay, if we were unlucky enough to crash during
end-of-recovery cleanup after having completed an incomplete GIST insertion.

Yoichi Hirai
2009-12-24 17:52:04 +00:00
Simon Riggs efc16ea520 Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 01:32:45 +00:00
Tom Lane 62aba76568 Prevent indirect security attacks via changing session-local state within
an allegedly immutable index function.  It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user.  However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.

The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue.  GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation.  Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement.  (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)

Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.

Security: CVE-2009-4136
2009-12-09 21:57:51 +00:00
Tom Lane 0cb65564e5 Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.

Jeff Davis, reviewed by Robert Haas
2009-12-07 05:22:23 +00:00
Heikki Linnakangas cd87b6f8a5 Fix an old bug in multixact and two-phase commit. Prepared transactions can
be part of multixacts, so allocate a slot for each prepared transaction in
the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer
the oldest member value from the current backends slot to the prepared xact
slot. Also save and recover the value from the 2pc state file.

The symptom of the bug was that after a transaction prepared, a shared lock
still held by the prepared transaction was sometimes ignored by other
transactions.

Fix back to 8.1, where both 2PC and multixact were introduced.
2009-11-23 09:58:36 +00:00
Teodor Sigaev 5e75f6790c Fix multicolumn GIN's wrong results with fastupdate enabled.
User-defined consistent functions believes the check array
contains at least one true element which was not a true for
scanning pending list.

Per report from Yury Don <yura@vpcit.ru>
2009-11-13 11:17:04 +00:00
Tom Lane 7d535ebe5b Dept of second thoughts: after studying index_getnext() a bit more I realize
that it can scribble on scan->xs_ctup.t_self while following HOT chains,
so we can't rely on that to stay valid between hashgettuple() calls.
Introduce a private variable in HashScanOpaque, instead.
2009-11-01 22:30:54 +00:00
Tom Lane c4afdca4c2 Fix two serious bugs introduced into hash indexes by the 8.4 patch that made
hash indexes keep entries sorted by hash value.  First, the original plans for
concurrency assumed that insertions would happen only at the end of a page,
which is no longer true; this could cause scans to transiently fail to find
index entries in the presence of concurrent insertions.  We can compensate
by teaching scans to re-find their position after re-acquiring read locks.
Second, neither the bucket split nor the bucket compaction logic had been
fixed to preserve hashvalue ordering, so application of either of those
processes could lead to permanent corruption of an index, in the sense
that searches might fail to find entries that are present.

This patch fixes the split and compaction logic to preserve hashvalue
ordering, but it cannot do anything about pre-existing corruption.  We will
need to recommend reindexing all hash indexes in the 8.4.2 release notes.

To buy back the performance loss hereby induced in split and compaction,
fix them to use PageIndexMultiDelete instead of retail PageIndexDelete
operations.  We might later want to do something with qsort'ing the
page contents rather than doing a binary search for each insertion,
but that seemed more invasive than I cared to risk in a back-patch.

Per bug #5157 from Jeff Janes and subsequent investigation.
2009-11-01 21:25:25 +00:00
Tom Lane 8d54c2482b Code review for LIKE INCLUDING patch --- clean up some cosmetic and not
so cosmetic stuff.
2009-10-13 00:53:08 +00:00
Andrew Dunstan faa1afc6c1 CREATE LIKE INCLUDING COMMENTS and STORAGE, and INCLUDING ALL shortcut. Itagaki Takahiro. 2009-10-12 19:49:24 +00:00
Tom Lane c970292a94 Remove very ancient tuple-counting infrastructure (IncrRetrieved() and
friends).  This code has all been ifdef'd out for many years, and doesn't
seem to have any prospect of becoming any more useful in the future.
EXPLAIN ANALYZE is what people use in practice, and I think if we did want
process-wide counters we'd be more likely to put in dtrace events for that
than try to resurrect this code.  Get rid of it so as to have one less detail
to worry about while refactoring execMain.c.
2009-10-08 22:34:57 +00:00
Tom Lane e66d714386 Make sure that GIN fast-insert and regular code paths enforce the same
tuple size limit.  Improve the error message for index-tuple-too-large
so that it includes the actual size, the limit, and the index name.
Sync with the btree occurrences of the same error.

Back-patch to 8.4 because it appears that the out-of-sync problem
is occurring in the field.

Teodor and Tom
2009-10-02 21:14:04 +00:00
Teodor Sigaev f92bbb899a Fix incorrect arguments for gist_box_penalty call. The bug could be observed
only for secondary page split (i.e. for non-first columns of index)

 Patch by Paul Ramsey <pramsey@opengeo.org>
2009-09-18 14:01:56 +00:00
Tom Lane 384cad5c7b Fix two distinct errors in creation of GIN_INSERT_LISTPAGE xlog records.
In practice these mistakes were always masked when full_page_writes was on,
because XLogInsert would always choose to log the full page, and then
ginRedoInsertListPage wouldn't try to do anything.  But with full_page_writes
off a WAL replay failure was certain.

The GIN_INSERT_LISTPAGE record type could probably be eliminated entirely
in favor of using XLOG_HEAP_NEWPAGE, but I refrained from doing that now
since it would have required a significantly more invasive patch.

In passing do a little bit of code cleanup, including making the accounting
for free space on GIN list pages more precise.  (This wasn't a bug as the
errors were always in the conservative direction.)

Per report from Simon.  Back-patch to 8.4 which contains the identical code.
2009-09-15 20:31:30 +00:00
Heikki Linnakangas 7f2a10fecd Don't error out if recycling or removing an old WAL segment fails at the end
of checkpoint. Although the checkpoint has been written to WAL at that point
already, so that all data is safe, and we'll retry removing the WAL segment at
the next checkpoint, if such a failure persists we won't be able to remove any
other old WAL segments either and will eventually run out of disk space. It's
better to treat the failure as non-fatal, and move on to clean any other WAL
segment and continue with any other end-of-checkpoint cleanup.

We don't normally expect any such failures, but on Windows it can happen with
some anti-virus or backup software that lock files without FILE_SHARE_DELETE
flag.

Also, the loop in pgrename() to retry when the file is locked was broken. If a
file is locked on Windows, you get ERROR_SHARE_VIOLATION, not
ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left
the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct
in some environment), and added ERROR_LOCK_VIOLATION to be consistent with
similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to
10s, on the grounds that since it's been broken, we've effectively had a
timeout of 0s and no-one has complained, so a smaller timeout is actually
closer to the old behavior. A longer timeout would mean that if recycling a
WAL file fails because it's locked for some reason, InstallXLogFileSegment()
will hold ControlFileLock for longer, potentially blocking other backends, so
a long timeout isn't totally harmless.

While we're at it, set errno correctly in pgrename().

Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c
changes would make sense on other platforms and thus on older versions as
well, but since there's no such locking issues on other platforms, it's not
worth it.
2009-09-13 18:32:08 +00:00
Heikki Linnakangas 4e2d5efc6a On Windows, when a file is deleted and another process still has an open
file handle on it, the file goes into "pending deletion" state where it
still shows up in directory listing, but isn't accessible otherwise. That
confuses RemoveOldXLogFiles(), making it think that the file hasn't been
archived yet, while it actually was, and it was deleted along with the .done
file.

Fix that by renaming the file with ".deleted" extension before deleting it.
Also check the return value of rename() and unlink(), so that if the removal
fails for any reason (e.g another process is holding the file locked), we
don't delete the .done file until the WAL file is really gone.

Backpatch to 8.2, which is the oldest version supported on Windows.
2009-09-10 09:42:10 +00:00
Tom Lane 794e3e81a0 Force VACUUM to recalculate oldestXmin even when we haven't changed our
own database's datfrozenxid, if the current value is old enough to be
forcing autovacuums or warning messages.  This ensures that a bogus
value is replaced as soon as possible.  Per a comment from Heikki.
2009-09-01 04:46:49 +00:00
Tom Lane 14f445fccf Actually, we need to bump the format identifier on twophase files
because of readjustment of 2PC rmgr IDs for flatfile removal.
2009-09-01 04:15:45 +00:00
Alvaro Herrera a8bb8eb583 Remove flatfiles.c, which is now obsolete.
Recent commits have removed the various uses it was supporting.  It was a
performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems
it slowed down user creation after a billion users.
2009-09-01 02:54:52 +00:00
Tom Lane 25ec228ef7 Track the current XID wrap limit (or more accurately, the oldest unfrozen
XID) in checkpoint records.  This eliminates the need to recompute the value
from scratch during database startup, which is one of the two remaining
reasons for the flatfile code to exist.  It should also simplify life for
hot-standby operation.

To avoid bloating the checkpoint records unreasonably, I switched from
tracking the oldest database by name to tracking it by OID.  This turns
out to save cycles in general (everywhere but the warning-generating
paths, which we hardly care about) and also helps us deal with the case
that the oldest database got dropped instead of being vacuumed.  The prior
coding might go for a long time without updating the wrap limit in that case,
which is bad because it might result in a lot of useless autovacuum activity.
2009-08-31 02:23:23 +00:00
Alvaro Herrera 53af86c55c Fix handling of autovacuum reloptions.
In the original coding, setting a single reloption would cause default
values to be used for all the other reloptions.  This is a problem
particularly for autovacuum reloptions.

Itagaki Takahiro
2009-08-27 17:18:44 +00:00
Heikki Linnakangas 9cd6685f91 In the checkpoint written at the end of archive recovery, the WAL page header
was incorrectly initialized with timeline ID 0. That rendered the WAL page
unrecoverable, making a subsequent archive recovery stop at that point.
ThisTimeLineID needs to be initialized before calling AdvanceXLInsertBuffer().

This fixes bug #5011 reported by James Bardin. Backpatch to 8.4, as the bug
was introduced by the changes to use of bgwriter for writing the
end-of-archive-recovery checkpoint. Patch by Tom Lane.
2009-08-27 07:15:41 +00:00
Tom Lane 7fc7a7c4d0 Fix a violation of WAL coding rules in the recent patch to include an
"all tuples visible" flag in heap page headers.  The flag update *must*
be applied before calling XLogInsert, but heap_update and the tuple
moving routines in VACUUM FULL were ignoring this rule.  A crash and
replay could therefore leave the flag incorrectly set, causing rows
to appear visible in seqscans when they should not be.  This might explain
recent reports of data corruption from Jeff Ross and others.

In passing, do a bit of editorialization on comments in visibilitymap.c.
2009-08-24 02:18:32 +00:00
Tom Lane 67a5f8ff9e Department of marginal improvements: teach tupconvert.c to avoid doing a
physical conversion when there are dropped columns in the same places in
the input and output tupdescs.  This avoids possible performance loss from
the recent patch to improve dropped-column handling, in some cases where
the old code would have worked.
2009-08-17 20:34:31 +00:00
Tom Lane 04011cc970 Allow backends to start up without use of the flat-file copy of pg_database.
To make this work in the base case, pg_database now has a nailed-in-cache
relation descriptor that is initialized using hardwired knowledge in
relcache.c.  This means pg_database is added to the set of relations that
need to have a Schema_pg_xxx macro maintained in pg_attribute.h.  When this
path is taken, we'll have to do a seqscan of pg_database to find the row
we need.

In the normal case, we are able to do an indexscan to find the database's row
by name.  This is made possible by storing a global relcache init file that
describes only the shared catalogs and their indexes (and therefore is usable
by all backends in any database).  A new backend loads this cache file,
finds its database OID after an indexscan on pg_database, and then loads
the local relcache init file for that database.

This change should effectively eliminate number of databases as a factor
in backend startup time, even with large numbers of databases.  However,
the real reason for doing it is as a first step towards getting rid of
the flat files altogether.  There are still several other sub-projects
to be tackled before that can happen.
2009-08-12 20:53:31 +00:00
Tom Lane 97e14f6e93 Document that LocalSetXLogInsertAllowed can be re-executed.
Per comment from Simon.
2009-08-08 16:39:17 +00:00
Tom Lane 87740caa01 rm_cleanup functions need to be allowed to write WAL entries. This oversight
appears to explain the recent reports of "PANIC: cannot make new WAL entries
during recovery".
2009-08-07 19:29:49 +00:00
Tom Lane dcb2bda9b7 Improve plpgsql's ability to cope with rowtypes containing dropped columns,
by supporting conversions in places that used to demand exact rowtype match.

Since this issue is certain to come up elsewhere (in fact, already has,
in ExecEvalConvertRowtype), factor out the support code into new core
functions for tuple conversion.  I chose to put these in a new source
file since heaptuple.c is already overly long.

Heavily revised version of a patch by Pavel Stehule.
2009-08-06 20:44:32 +00:00
Tom Lane 9072592946 Add ALTER TABLE ... ALTER COLUMN ... SET STATISTICS DISTINCT
Robert Haas
2009-08-02 22:14:53 +00:00
Tom Lane 527f0ae3fa Department of second thoughts: let's show the exact key during unique index
build failures, too.  Refactor a bit more since that error message isn't
spelled the same.
2009-08-01 20:59:17 +00:00
Tom Lane b680ae4bdb Improve unique-constraint-violation error messages to include the exact
values being complained of.

In passing, also remove the arbitrary length limitation in the similar
error detail message for foreign key violations.

Itagaki Takahiro
2009-08-01 19:59:41 +00:00
Tom Lane 25d9bf2e3e Support deferrable uniqueness constraints.
The current implementation fires an AFTER ROW trigger for each tuple that
looks like it might be non-unique according to the index contents at the
time of insertion.  This works well as long as there aren't many conflicts,
but won't scale to massive unique-key reassignments.  Improving that case
is a TODO item.

Dean Rasheed
2009-07-29 20:56:21 +00:00
Tom Lane ca7c8168de Tweak TOAST code so that columns marked with MAIN storage strategy are
not forced out-of-line unless that is necessary to make the row fit on a
page.  Previously, they were forced out-of-line if needed to get the row
down to the default target size (1/4th page).

Kevin Grittner
2009-07-22 01:21:22 +00:00
Peter Eisentraut de160e2c00 Make backend header files C++ safe
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright.  You still
(probably) need extern "C" { }; around the inclusion of backend headers.

based on a patch by Kurt Harriman <harriman@acm.org>

Also add a script cpluspluscheck to check for C++ compatibility in the
future.  As of right now, this passes without error for me.
2009-07-16 06:33:46 +00:00
Tom Lane 2de48a83e6 Cleanup and code review for the patch that made bgwriter active during
archive recovery.  Invent a separate state variable and inquiry function
for XLogInsertAllowed() to clarify some tests and make the management of
writing the end-of-recovery checkpoint less klugy.  Fix several places
that were incorrectly testing InRecovery when they should be looking at
RecoveryInProgress or XLogInsertAllowed (because they will now be executed
in the bgwriter not startup process).  Clarify handling of bad LSNs passed
to XLogFlush during recovery.  Use a spinlock for setting/testing
SharedRecoveryInProgress.  Improve quite a lot of comments.

Heikki and Tom
2009-06-26 20:29:04 +00:00
Heikki Linnakangas 7e48b77b1c Fix some serious bugs in archive recovery, now that bgwriter is active
during it:

When bgwriter is active, the startup process can't perform mdsync() correctly
because it won't see the fsync requests accumulated in bgwriter's private
pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery
checkpoint as well, when it's active.

When bgwriter is active (= archive recovery), the startup process must not
accumulate fsync requests to its own pendingOpsTable, since bgwriter won't
see them there when it performs restartpoints. Make startup process drop its
pendingOpsTable when bgwriter is launched to avoid that.

Update minimum recovery point one last time when leaving archive recovery.
It won't be updated by the end-of-recovery checkpoint because XLogFlush()
sees us as out of recovery already.

This fixes bug #4879 reported by Fujii Masao.
2009-06-25 21:36:00 +00:00
Heikki Linnakangas ebaa1952f1 The code to unlink dropped relations in FinishPreparedTransaction() was
acting like runs inside WAL recovery, but it doesn't. I must've copy-pasted
this from a redo-function in the relation forks patch. Noticed by Tom Lane
while he was looking through callers of smgrdounlink().
2009-06-25 19:05:52 +00:00
Peter Eisentraut 4183b10661 Correct grammar in picksplit debug messages 2009-06-24 15:16:22 +00:00
Heikki Linnakangas efa8544fd5 Fix a few errors in comments. Patch by Fujii Masao, plus the one in
visibilitymap.c by me.
2009-06-18 10:08:08 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Peter Eisentraut 746c28a8e4 Improve capitalization and punctuation in recently added GiST message. 2009-06-10 20:02:15 +00:00
Tom Lane 61dd4185ff Keep rs_startblock the same during heap_rescan, so that a rescan of a SeqScan
node starts from the same place as the first scan did.  This avoids surprising
behavior of scrollable and WITH HOLD cursors, as seen in Mark Kirkwood's bug
report of yesterday.

It's not entirely clear whether a rescan should be forced to drop out of the
syncscan mode, but for the moment I left the code behaving the same on that
point.  Any change there would only be a performance and not a correctness
issue, anyway.

Back-patch to 8.3, since the unstable behavior was created by the syncscan
patch.
2009-06-10 18:54:16 +00:00
Tom Lane 32ea236361 Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane
behavior in cases where we don't know the heap tuple count accurately; in
particular partial vacuum, but this also makes the API a bit more useful
for ANALYZE.  This patch adds "estimated_count" flags to both structs so
that an approximate count can be flagged as such, and adjusts the logic
so that approximate counts are not used for updating pg_class.reltuples.

This fixes my previous complaint that VACUUM was putting ridiculous values
into pg_class.reltuples for indexes.  The actual impact of that bug is
limited, because the planner only pays attention to reltuples for an index
if the index is partial; which probably explains why beta testers hadn't
noticed a degradation in plan quality from it.  But it needs to be fixed.

The whole thing is a bit messy and should be redesigned in future, because
reltuples now has the potential to drift quite far away from reality when
a long period elapses with no non-partial vacuums.  But this is as good as
it's going to get for 8.4.
2009-06-06 22:13:52 +00:00
Tom Lane 356eea24ce Fix a serious bug introduced into GIN in 8.4: now that MergeItemPointers()
is supposed to remove duplicate heap TIDs, we have to be sure to reduce the
tuple size and posting-item count accordingly in addItemPointersToTuple().
Failing to do so resulted in the effective injection of garbage TIDs into the
index contents, ie, whatever happened to be in the memory palloc'd for the
new tuple.  I'm not sure that this fully explains the index corruption
reported by Tatsuo Ishii, but the test case I'm using no longer fails.
2009-06-06 02:39:40 +00:00
Heikki Linnakangas 7c8d7a2eec Only recycle normal files in pg_xlog as WAL segments. pg_standby creates
symbolic links with the -l option, and as Fujii Masao pointed out we ended up
overwriting files in the archive directory before this patch. Patch by
Aidan Van Dyk, Fujii Masao and me.

Backpatch to 8.3, where pg_standby was introduced.
2009-06-02 06:18:06 +00:00
Heikki Linnakangas 2e6107cb62 When archiving is enabled, rotate the last WAL segment at shutdown so that
all transactions are archived.

Original patch by Guillaume Smet.
2009-05-28 11:02:16 +00:00
Tom Lane c3707a4fcd Use more-portable coding for the check on handing out the last available
relopt_kind value in add_reloption_kind().  Per Zdenek Kotala.
2009-05-24 22:22:44 +00:00
Tom Lane 7280fab717 Fix bug #4814 (wrong subscript in consistent-function call), and add some
minimal regression test coverage for matchPartialInPendingList().
2009-05-19 02:48:26 +00:00
Tom Lane 4616d57dad Fix all the server-side SIGQUIT handlers (grumble ... why so many identical
copies?) to ensure they really don't run proc_exit/shmem_exit callbacks,
as was intended.  I broke this behavior recently by installing atexit
callbacks without thinking about the one case where we truly don't want
to run those callback functions.  Noted in an example from Dave Page.
2009-05-15 15:56:39 +00:00
Tom Lane bfab3f19e3 Include recovery_end_command in recovery.conf.sample.
Per suggestion of Jaime Casanova.
2009-05-14 22:22:01 +00:00
Tom Lane 284e12c398 Improve a couple of comments. 2009-05-14 21:28:35 +00:00
Heikki Linnakangas 9e403c2587 Add recovery_end_command option to recovery.conf. recovery_end_command
is run at the end of archive recovery, providing a chance to do external
cleanup. Modify pg_standby so that it no longer removes the trigger file,
that is to be done using the recovery_end_command now.

Provide a "smart" failover mode in pg_standby, where we don't fail over
immediately, but only after recovering all unapplied WAL from the archive.
That gives you zero data loss assuming all WAL was archived before
failover, which is what most users of pg_standby actually want.

recovery_end_command by Simon Riggs, pg_standby changes by Fujii Masao and
myself.
2009-05-14 20:31:09 +00:00
Tom Lane 23543c732b Rewrite xml.c's memory management (yet again). Give up on the idea of
redirecting libxml's allocations into a Postgres context.  Instead, just let
it use malloc directly, and add PG_TRY blocks as needed to be sure we release
libxml data structures in error recovery code paths.  This is ugly but seems
much more likely to play nicely with third-party uses of libxml, as seen in
recent trouble reports about using Perl XML facilities in pl/perl and bug
#4774 about contrib/xml2.

I left the code for allocation redirection in place, but it's only
built/used if you #define USE_LIBXMLCONTEXT.  This is because I found it
useful to corral libxml's allocations in a palloc context when hunting
for libxml memory leaks, and we're surely going to have more of those
in the future with this type of approach.  But we don't want it turned on
in a normal build because it breaks exactly what we need to fix.

I have not re-indented most of the code sections that are now wrapped
by PG_TRY(); that's for ease of review.  pg_indent will fix it.

This is a pre-existing bug in 8.3, but I don't dare back-patch this change
until it's gotten a reasonable amount of field testing.
2009-05-13 20:27:17 +00:00
Tom Lane f23bdda324 Fix LOCK TABLE to eliminate the race condition that could make it give weird
errors when tables are concurrently dropped.  To do this we must take lock
on each relation before we check its privileges.  The old code was trying
to do that the other way around, which is a bit pointless when there are lots
of other commands that lock relations before checking privileges.  I did keep
it checking each relation's privilege before locking the next relation, which
is a detail that ALTER TABLE isn't too picky about.
2009-05-12 16:43:32 +00:00
Heikki Linnakangas 223431cba1 Request XLOG switch before writing checkpoint in pg_start_backup(). Otherwise
you can end up with an unrecoverable backup if you start a new base backup
right after finishing archive recovery. In that scenario, the redo pointer of
the checkpoint that pg_start_backup() writes points to the XLOG segment where
the timeline-changing end-of-archive-recovery checkpoint is. The beginning
of that segment contains pages with the old timeline ID, and we don't accept
that in recovery unless we find a history file covering the old timeline ID.
If you omit pg_xlog from the base backup and clear the archive directory
before starting the backup, there will be no such history file available.

The bug is present in all versions since PITR was introduced in 8.0, but I'm
back-patching only back to 8.2. Earlier versions didn't have XLOG switch
records, making this fix unfeasible. Given the lack of reports until now,
it doesn't seem worthwhile to spend more effort to fix 8.0 and 8.1.

Per report and suggestion by Mikael Krantz
2009-05-07 11:25:25 +00:00
Tom Lane 8f348112f3 Insert CHECK_FOR_INTERRUPTS() calls into btree and hash index scans at the
points where we step right or left to the next page.  This should ensure
reasonable response time to a query cancel request during an unsuccessful
index scan, as seen in recent gripe from Marc Cousin.  It's a bit trickier
than it might seem at first glance, because CHECK_FOR_INTERRUPTS() is a no-op
if executed while holding a buffer lock.  So we have to do it just at the
point where we've dropped one page lock and not yet acquired the next.

Remove CHECK_FOR_INTERRUPTS calls at the top level of btgetbitmap and
hashgetbitmap, since they're pointless given the added checks.

I think that GIST is okay already --- at least, there's a CHECK_FOR_INTERRUPTS
at a plausible-looking place in gistnext().  I don't claim to know GIN well
enough to try to poke it for this, if indeed it has a problem at all.

This is a pre-existing issue, but in view of the lack of prior complaints
I'm not going to risk back-patching.
2009-05-05 19:36:32 +00:00
Tom Lane 2aa5ca952f Update comment for _bt_relandgetbuf. 2009-05-05 19:02:22 +00:00
Tom Lane 8d4f2ecd41 Change the default value of max_prepared_transactions to zero, and add
documentation warnings against setting it nonzero unless active use of
prepared transactions is intended and a suitable transaction manager has been
installed.  This should help to prevent the type of scenario we've seen
several times now where a prepared transaction is forgotten and eventually
causes severe maintenance problems (or even anti-wraparound shutdown).

The only real reason we had the default be nonzero in the first place was to
support regression testing of the feature.  To still be able to do that,
tweak pg_regress to force a nonzero value during "make check".  Since we
cannot force a nonzero value in "make installcheck", add a variant regression
test "expected" file that shows the results that will be obtained when
max_prepared_transactions is zero.

Also, extend the HINT messages for transaction wraparound warnings to mention
the possibility that old prepared transactions are causing the problem.

All per today's discussion.
2009-04-23 00:23:46 +00:00
Heikki Linnakangas bae8102f52 After archive recovery, mark the last WAL segment from the parent timeline
ready for archival. It was marked at the next checkpoint anyway, but
waiting for the next checkpoint is an unnecessary delay.

Fujii Masao
2009-04-22 19:51:12 +00:00
Tom Lane 387060951e Add an optional parameter to pg_start_backup() that specifies whether to do
the checkpoint in immediate or lazy mode.  This is to address complaints
that pg_start_backup() takes a long time even when there's no need to minimize
its I/O consumption.
2009-04-07 00:31:26 +00:00
Teodor Sigaev 09368d23db Fix 'all at one page bug' in picksplit method of R-tree emulation. Add defense
from buggy user-defined picksplit to GiST.
2009-04-06 14:27:27 +00:00
Teodor Sigaev 329a5322e9 Fix infinite loop while checking of partial match in pending list.
Improve comments. Now GIN-indexable operators should be strict.
Per Tom's questions/suggestions.
2009-04-05 11:32:01 +00:00
Tom Lane 090173a3f9 Remove the recently added node types ReloptElem and OptionDefElem in favor
of adding optional namespace and action fields to DefElem.  Having three
node types that do essentially the same thing bloats the code and leads
to errors of confusion, such as in yesterday's bug report from Khee Chin.
2009-04-04 21:12:31 +00:00
Alvaro Herrera 1c855f01ea Disallow setting fillfactor for TOAST tables.
To implement this without almost duplicating the reloption table, treat
relopt_kind as a bitmask instead of an integer value.  This decreases the
range of allowed values, but it's not clear that there's need for that much
values anyway.

This patch also makes heap_reloptions explicitly a no-op for relation kinds
other than heap and TOAST tables.

Patch by ITAGAKI Takahiro with minor edits from me.  (In particular I removed
the bit about adding relation kind to an error message, which I intend to
commit separately.)
2009-04-04 00:45:02 +00:00
Bruce Momjian 0e550ff617 Revert DTrace patch from Robert Lor 2009-04-02 20:59:10 +00:00
Bruce Momjian 227f817c1f Add support for additional DTrace probes.
Robert Lor
2009-04-02 19:14:34 +00:00
Tom Lane 793d5662e8 Fix an oversight in the support for storing/retrieving "minimal tuples" in
TupleTableSlots.  We have functions for retrieving a minimal tuple from a slot
after storing a regular tuple in it, or vice versa; but these were implemented
by converting the internal storage from one format to the other.  The problem
with that is it invalidates any pass-by-reference Datums that were already
fetched from the slot, since they'll be pointing into the just-freed version
of the tuple.  The known problem cases involve fetching both a whole-row
variable and a pass-by-reference value from a slot that is fed from a
tuplestore or tuplesort object.  The added regression tests illustrate some
simple cases, but there may be other failure scenarios traceable to the same
bug.  Note that the added tests probably only fail on unpatched code if it's
built with --enable-cassert; otherwise the bug leads to fetching from freed
memory, which will not have been overwritten without additional conditions.

Fix by allowing a slot to contain both formats simultaneously; which turns out
not to complicate the logic much at all, if anything it seems less contorted
than before.

Back-patch to 8.2, where minimal tuples were introduced.
2009-03-30 04:08:43 +00:00
Tom Lane 87b8db3774 Adjust the APIs for GIN opclass support functions to allow the extractQuery()
method to pass extra data to the consistent() and comparePartial() methods.
This is the core infrastructure needed to support the soon-to-appear
contrib/btree_gin module.  The APIs are still upward compatible with the
definitions used in 8.3 and before, although *not* with the previous 8.4devel
function definitions.

catversion bump for changes in pg_proc entries (although these are just
cosmetic, since GIN doesn't actually look at the function signature before
calling it...)

Teodor Sigaev and Oleg Bartunov
2009-03-25 22:19:02 +00:00
Tom Lane e5efda442c Install a search tree depth limit in GIN bulk-insert operations, to prevent
them from degrading badly when the input is sorted or nearly so.  In this
scenario the tree is unbalanced to the point of becoming a mere linked list,
so insertions become O(N^2).  The easiest and most safely back-patchable
solution is to stop growing the tree sooner, ie limit the growth of N.  We
might later consider a rebalancing tree algorithm, but it's not clear that
the benefit would be worth the cost and complexity.  Per report from Sergey
Burladyan and an earlier complaint from Heikki.

Back-patch to 8.2; older versions didn't have GIN indexes.
2009-03-24 22:06:03 +00:00
Tom Lane ff301d6e69 Implement "fastupdate" support for GIN indexes, in which we try to accumulate
multiple index entries in a holding area before adding them to the main index
structure.  This helps because bulk insert is (usually) significantly faster
than retail insert for GIN.

This patch also removes GIN support for amgettuple-style index scans.  The
API defined for amgettuple is difficult to support with fastupdate, and
the previously committed partial-match feature didn't really work with
it either.  We might eventually figure a way to put back amgettuple
support, but it won't happen for 8.4.

catversion bumped because of change in GIN's pg_am entry, and because
the format of GIN indexes changed on-disk (there's a metapage now,
and possibly a pending list).

Teodor Sigaev
2009-03-24 20:17:18 +00:00
Tom Lane 1079564979 Const-ify the parse table passed to fillRelOptions. The previous coding
meant it had to be built on-the-fly at each entry to default_reloptions.
2009-03-23 16:36:27 +00:00
Tom Lane e04810e8c4 Code review for dtrace probes added (so far) to 8.4. Adjust placement of
some bufmgr probes, take out redundant and memory-leak-inducing path arguments
to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to
recalculate space used in sort__done, clean up formatting in places where
I'm not sure pgindent will do a nice job by itself.
2009-03-11 23:19:25 +00:00
Heikki Linnakangas fb7df896fc Reload config file in startup process on SIGHUP.
Fujii Masao
2009-03-04 13:56:40 +00:00
Tom Lane 640796ff41 Reduce the maximum value of vacuum_cost_delay and autovacuum_vacuum_cost_delay
to 100ms (from 1000).  This still seems to be comfortably larger than the
useful range of the parameter, and it should help discourage people from
picking uselessly large values.  Tweak the documentation to recommend small
values, too.  Per discussion of a couple weeks ago.
2009-02-28 00:10:52 +00:00
Heikki Linnakangas bc134d7a51 Change the signaling of end-of-recovery. Startup process now indicates end
of recovery by exiting with exit code 0, like in previous releases. Per
Tom's suggestion.
2009-02-23 09:28:50 +00:00
Heikki Linnakangas cdd46c7654 Start background writer during archive recovery. Background writer now performs
its usual buffer cleaning duties during archive recovery, and it's responsible
for performing restartpoints.

This requires some changes in postmaster. When the startup process has done
all the initialization and is ready to start WAL redo, it signals the
postmaster to launch the background writer. The postmaster is signaled again
when the point in recovery is reached where we know that the database is in
consistent state. Postmaster isn't interested in that at the moment, but
that's the point where we could let other backends in to perform read-only
queries. The postmaster is signaled third time when the recovery has ended,
so that postmaster knows that it's safe to start accepting connections.

The startup process now traps SIGTERM, and performs a "clean" shutdown. If
you do a fast shutdown during recovery, a shutdown restartpoint is performed,
like a shutdown checkpoint, and postmaster kills the processes cleanly. You
still have to continue the recovery at next startup, though.

Currently, the background writer is only launched during archive recovery.
We could launch it during crash recovery as well, but it seems better to keep
that codepath as simple as possible, for the sake of robustness. And it
couldn't do any restartpoints during crash recovery anyway, so it wouldn't be
that useful.

log_restartpoints is gone. Use log_checkpoints instead. This is yet to be
documented.

This whole operation is a pre-requisite for Hot Standby, but has some value of
its own whether the hot standby patch makes 8.4 or not.

Simon Riggs, with lots of modifications by me.
2009-02-18 15:58:41 +00:00
Tom Lane 8205258fa6 Adopt Bob Jenkins' improved hash function for hash_any(). This changes the
contents of hash indexes (again), so bump catversion.

Kenneth Marshall
2009-02-09 21:18:28 +00:00
Alvaro Herrera 834a6da4f7 Update autovacuum to use reloptions instead of a system catalog, for
per-table overrides of parameters.

This removes a whole class of problems related to misusing the catalog,
and perhaps more importantly, gives us pg_dump support for the parameters.

Based on a patch by Euler Taveira de Oliveira, heavily reworked by me.
2009-02-09 20:57:59 +00:00
Heikki Linnakangas b75b66332a Fix obsolete comment. Zdenek Kotala 2009-02-07 10:49:36 +00:00
Alvaro Herrera 3a5b773715 Allow reloption names to have qualifiers, initially supporting a TOAST
qualifier, and add support for this in pg_dump.

This allows TOAST tables to have user-defined fillfactor, and will also
enable us to move the autovacuum parameters to reloptions without taking
away the possibility of setting values for TOAST tables.
2009-02-02 19:31:40 +00:00
Alvaro Herrera c0f92b57dc Allow extracting and parsing of reloptions from a bare pg_class tuple, and
refactor the relcache code that used to do that.  This allows other callers
(particularly autovacuum) to do the same without necessarily having to open
and lock a table.
2009-01-26 19:41:06 +00:00
Heikki Linnakangas 9187cedd7c Put back fast-path for the case that there's no backup blocks in
RestoreBkpBlocks. Went missing in my recent refactoring patch, as pointed
out by Simon's hot standby patch.
2009-01-23 11:19:34 +00:00
Tom Lane 3cb5d6580a Support column-level privileges, as required by SQL standard.
Stephen Frost, with help from KaiGai Kohei and others
2009-01-22 20:16:10 +00:00
Heikki Linnakangas b2a667b9ee Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should
be used instead of the normal exclusive lock, and make WAL redo functions
responsible for calling RestoreBkpBlocks(). They know better what kind of a
lock they need.

At the moment, this just moves things around with no functional change, but
makes the hot standby patch that's under review cleaner.
2009-01-20 18:59:37 +00:00
Alvaro Herrera 8ebe1e356c Simplify the writing of amoptions routines by introducing a convenience
fillRelOptions routine that stores the parsed values in the struct using a
table-based approach.  Per Tom suggestion.  Also remove the "continue"
in HANDLE_*_RELOPTION macros, which were useless and in spirit they were
assuming too much of how the macros were going to be used.  (Note that these
macros are now unused, but the intention is to introduce some usage in a
future autovacuum patch, which is why they weren't completely removed.)

Also, do not call the string validation routine when not validating.  It seems
less error-prone this way, per commentary on the amoptions SGML docs.
2009-01-12 21:02:15 +00:00
Tom Lane 1a37056a74 Re-enable the old code in xlog.c that tried to use posix_fadvise(), so that
we can get some buildfarm feedback about whether that function is still
problematic.  (Note that the planned async-preread patch will not really
prove anything one way or the other in buildfarm testing, since it will
be inactive with default GUC settings.)
2009-01-11 18:02:17 +00:00
Tom Lane 43a57cf365 Revise the TIDBitmap API to support multiple concurrent iterations over a
bitmap.  This is extracted from Greg Stark's posix_fadvise patch; it seems
worth committing separately, since it's potentially useful independently of
posix_fadvise.
2009-01-10 21:08:36 +00:00
Alvaro Herrera b813c8daca A couple further reloptions improvements, per KaiGai Kohei: add a validation
function to the string type and add a couple of macros for string handling.

In passing, fix an off-by-one bug of mine.
2009-01-08 19:34:41 +00:00
Alvaro Herrera b25433da5d Fix string reloption handling, per KaiGai Kohei. 2009-01-06 14:47:37 +00:00
Bruce Momjian 1d2a185eae Suppress compiler warning in a different way, per Alvaro. 2009-01-06 03:15:51 +00:00
Bruce Momjian 02aa5f19c6 Supress compiler warning. 2009-01-06 02:44:17 +00:00
Alvaro Herrera ba748f7a11 Change the reloptions machinery to use a table-based parser, and provide
a more complete framework for writing custom option processing routines
by user-defined access methods.

Catalog version bumped due to the general API changes, which are going to
affect user-defined "amoptions" routines.
2009-01-05 17:14:28 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Bruce Momjian 4ee79fd20d Change the name of dtrace wal tracepoints:
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY

Robert Lor
2008-12-24 20:41:29 +00:00
Bruce Momjian 5a90bc1fbe The attached patch contains a couple of fixes in the existing probes and
includes a few new ones.

- Fixed compilation errors on OS X for probes that use typedefs
- Fixed a number of probes to pass ForkNumber per the relation forks
patch
- The new probes are those that were taken out from the previous
submitted patch and required simple fixes. Will submit the other probes
that may require more discussion in a separate patch.

Robert Lor
2008-12-17 01:39:04 +00:00
Tom Lane fc3297d828 Make heap_update() set newtup->t_tableOid correctly, for consistency with
the other major heapam.c functions.  The only known consequence of this
omission is that UPDATE RETURNING failed to return the correct value for
"tableoid", as per report from KaiGai Kohei.

Back-patch to 8.2.  Arguably it's wrong all the way back; but without
evidence of visible breakage before RETURNING was added, I'll desist from
patching the older branches.
2008-12-16 16:26:08 +00:00
Tom Lane 17dc173660 To reduce confusion over whether VACUUM FULL is needed for anti-wraparound
vacuuming (it's not), say "database-wide VACUUM" instead of "full-database
VACUUM" in the relevant hint messages.  Also, document the permissions needed
to do this.  Per today's discussion.
2008-12-11 18:16:18 +00:00
Heikki Linnakangas dea81a6cf6 Revert SIGUSR1 multiplexing patch, per Tom's objection. 2008-12-09 15:59:39 +00:00
Heikki Linnakangas 7b05b3fa39 Provide support for multiplexing SIGUSR1 signal. The upcoming synchronous
replication patch needs a signal, but we've already used SIGUSR1 and
SIGUSR2 in normal backends. This patch allows reusing SIGUSR1 for that,
and for other purposes too if the need arises.
2008-12-09 14:28:20 +00:00
Heikki Linnakangas 7a567d9407 MAPSIZE macro needs to use MAXALIGN(SizeOfPageHeaderData) instead of
SizeOfPageHeaderData, like PageGetContents does. Per report by Pavan
Deolasee.
2008-12-06 17:31:37 +00:00
Alvaro Herrera 7b640b0345 Fix a couple of snapshot management bugs in the new ResourceOwner world:
non-writable large objects need to have their snapshots registered on the
transaction resowner, not the current portal's, because it must persist until
the large object is closed (which the portal does not).  Also, ensure that the
serializable snapshot is recorded by the transaction resource owner too, even
when a subtransaction has changed the current resource owner before
serializable is taken.

Per bug reports from Pavan Deolasee.
2008-12-04 14:51:02 +00:00
Teodor Sigaev 69b3383cfb Initialize GISTScanOpaque->qual_ok even if there is no conditions. 2008-12-04 11:08:46 +00:00
Heikki Linnakangas 608195a3a3 Introduce visibility map. The visibility map is a bitmap with one bit per
heap page, where a set bit indicates that all tuples on the page are
visible to all transactions, and the page therefore doesn't need
vacuuming. It is stored in a new relation fork.

Lazy vacuum uses the visibility map to skip pages that don't need
vacuuming. Vacuum is also responsible for setting the bits in the map.
In the future, this can hopefully be used to implement index-only-scans,
but we can't currently guarantee that the visibility map is always 100%
up-to-date.

In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on
each heap page, also indicating that all tuples on the page are visible to
all transactions. It's important that this flag is kept up-to-date. It
is also used to skip visibility tests in sequential scans, which gives a
small performance gain on seqscans.
2008-12-03 13:05:22 +00:00
Heikki Linnakangas b457b2a24e If pg_stop_backup() is called just after switching to a new xlog file,
wait for the previous instead of the new file to be archived.

Based on patch by Simon Riggs.
2008-12-03 08:20:11 +00:00
Tom Lane c1f3073333 Clean up the API for DestReceiver objects by eliminating the assumption
that a Portal is a useful and sufficient additional argument for
CreateDestReceiver --- it just isn't, in most cases.  Instead formalize
the approach of passing any needed parameters to the receiver separately.

One unexpected benefit of this change is that we can declare typedef Portal
in a less surprising location.

This patch is just code rearrangement and doesn't change any functionality.
I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.
2008-11-30 20:51:25 +00:00
Heikki Linnakangas 9858a8c81c Rely on relcache invalidation to update the cached size of the FSM. 2008-11-26 17:08:58 +00:00
Heikki Linnakangas 3396000684 Rethink the way FSM truncation works. Instead of WAL-logging FSM
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
make that cleaner from modularity point of view, move the WAL-logging one
level up to RelationTruncate, and move RelationTruncate and all the
related WAL-logging to new src/backend/catalog/storage.c file. Introduce
new RelationCreateStorage and RelationDropStorage functions that are used
instead of calling smgrcreate/smgrscheduleunlink directly. Move the
pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
functions. This leaves smgr.c as a thin wrapper around md.c; all the
transactional stuff is now in storage.c.

This will make it easier to add new forks with similar truncation logic,
like the visibility map.
2008-11-19 10:34:52 +00:00
Alvaro Herrera 03e5248d0f Replace the usage of heap_addheader to create pg_attribute tuples with regular
heap_form_tuple.  Since this removes the last remaining caller of
heap_addheader, remove it.

Extracted from the column privileges patch from Stephen Frost, with further
code cleanups by me.
2008-11-14 01:57:42 +00:00
Tom Lane 10e3acb8e7 Prevent synchronous scan during GIN index build, because GIN is optimized
for inserting tuples in increasing TID order.  It's not clear whether this
fully explains Ivan Sergio Borgonovo's complaint, but simple testing
confirms that a scan that doesn't start at block 0 can slow GIN build by
a factor of three or four.

Backpatch to 8.3.  Sync scan didn't exist before that.
2008-11-13 17:42:10 +00:00
Tom Lane cad3a26a95 Fix sloppy omission of now-required #include's. 2008-11-11 14:17:02 +00:00
Heikki Linnakangas 7e8b0b9ab1 Change error messages to print the physical path, like
"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1",
per Alvaro's suggestion. I didn't change the messages in the higher-level
index, heap and FSM routines, though, where the fork is implicit.
2008-11-11 13:19:16 +00:00
Tom Lane 1d577f5e49 Add a startup check that pg_xlog and pg_xlog/archive_status exist.
If the latter doesn't exist, automatically recreate it.  (We don't do
this for pg_xlog, though, per discussion.)

Jonah Harris
2008-11-09 17:51:15 +00:00
Tom Lane 85e2cedf98 Improve bulk-insert performance by keeping the current target buffer pinned
(but not locked, as that would risk deadlocks).  Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.

Robert Haas, after an idea of Simon Riggs'.
2008-11-06 20:51:15 +00:00
Heikki Linnakangas 5ae29525d1 The logic in systable_beginscan to translate heap attribute numbers to
index column numbers needs to handle the case where you have more than
one scankey on the same index column. toast_fetch_datum_slice() needs it.
2008-11-06 13:07:08 +00:00
Tom Lane b4eae023bb Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage
by splitting it into three functions with better-defined behaviors.

Zdenek Kotala
2008-11-03 20:47:49 +00:00
Alvaro Herrera 4ff0468371 Fix silly typo in previous commit. 2008-11-03 19:26:07 +00:00
Alvaro Herrera d698bf83d1 Fix TransactionIdSetStatusBit so that it doesn't try to change a transaction
from COMMITTED to SUBCOMMITTED during recovery.  This wasn't previously
possible, but it is now due to the recent changes on clog commit protocol for
subtransactions.

Simon Riggs
2008-11-03 19:24:03 +00:00
Alvaro Herrera b107299c40 Fix mistakes in comment headers 2008-11-03 15:10:17 +00:00
Tom Lane d7112cfa88 Remove the last vestiges of the MAKE_PTR/MAKE_OFFSET mechanism. We haven't
allowed different processes to have different addresses for the shmem segment
in quite a long time, but there were still a few places left that used the
old coding convention.  Clean them up to reduce confusion and improve the
compiler's ability to detect pointer type mismatches.

Kris Jurka
2008-11-02 21:24:52 +00:00
Tom Lane 902d1cb35f Remove all uses of the deprecated functions heap_formtuple, heap_modifytuple,
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values).  Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.

Kris Jurka
2008-11-02 01:45:28 +00:00
Heikki Linnakangas e9816533e3 Update FSM on WAL replay. This is a bit limited; the FSM is only updated
on non-full-page-image WAL records, and quite arbitrarily, only if there's
less than 20% free space on the page after the insert/update (not on HOT
updates, though). The 20% cutoff should avoid most of the overhead, when
replaying a bulk insertion, for example, while ensuring that pages that
are full are marked as full in the FSM.

This is mostly to avoid the nasty worst case scenario, where you replay
from a PITR archive, and the FSM information in the base backup is really
out of date. If there was a lot of pages that the outdated FSM claims to
have free space, but don't actually have any, the first unlucky inserter
after the recovery would traverse through all those pages, just to find
out that they're full. We didn't have this problem with the old FSM
implementation, because we simply threw the FSM information away on a
non-clean shutdown.
2008-10-31 19:40:27 +00:00
Heikki Linnakangas 19c8dc839b Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer
functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".

Add fork number to some error messages in bufmgr.c, that still lacked it.
2008-10-31 15:05:00 +00:00
Tom Lane 2314baef38 Fix recoveryLastXTime logic so that it actually does what one would expect.
Per gripe from Kevin Grittner.  Backpatch to 8.3, where the bug was introduced.
2008-10-30 04:06:16 +00:00
Alvaro Herrera c9d1efda96 No need for extra code to log freezing zero tuples. Callers already check that
they are freezing a nonzero amount anyway.
2008-10-27 21:50:12 +00:00
Teodor Sigaev b9856b67a7 Fix GiST's killing tuple: GISTScanOpaque->curpos wasn't
correctly set. As result, killtuple() marks as dead
wrong tuple on page. Bug was introduced by me while fixing
possible duplicates during GiST index scan.
2008-10-22 12:53:56 +00:00
Alvaro Herrera 97227e9ec0 These functions no longer return a value, per complaint from gothic_moth via
Zdenek Kotala.
2008-10-20 20:38:24 +00:00
Alvaro Herrera 06da3c570f Rework subtransaction commit protocol for hot standby.
This patch eliminates the marking of subtransactions as SUBCOMMITTED in pg_clog
during their commit; instead they remain in-progress until main transaction
commit.  At main transaction commit, the commit protocol is atomic-by-page
instead of one transaction at a time.  To avoid a race condition with some
subtransactions appearing committed before others in the case where they span
more than one pg_clog page, we conserve the logic that marks them subcommitted
before marking the parent committed.

Simon Riggs with minor help from me
2008-10-20 19:18:18 +00:00
Teodor Sigaev 3afffbc902 Remove support of backward scan in GiST. Per discussion
http://archives.postgresql.org/pgsql-hackers/2008-10/msg00857.php
2008-10-20 16:35:14 +00:00
Teodor Sigaev 77db9d9ff2 Remove mark/restore support in GIN and GiST indexes.
Per Tom's comment.
Also revome useless GISTScanOpaque->flags field.
2008-10-20 13:39:44 +00:00
Tom Lane af59a0650b Remove useless mark/restore support in hash index AM, per discussion.
(I'm leaving GiST/GIN cleanup to Teodor.)
2008-10-17 23:50:57 +00:00
Teodor Sigaev beeb3562dd During repeated rescan of GiST index it's possible that scan key
is NULL but SK_SEARCHNULL is not set. Add checking IS NULL of keys
to set during key initialization. If key is NULL and SK_SEARCHNULL is not
set then nothnig can be satisfied.
With assert-enabled compilation that causes coredump.

Bug was introduced in 8.3 by support of IS NULL index scan.
2008-10-17 17:02:21 +00:00
Tom Lane 30584cda35 Fix small query-lifespan memory leak introduced by 8.4 change in index AM API
for bitmap index scans.  Per report and test case from Kevin Grittner.
2008-10-10 14:17:08 +00:00
Tom Lane 3437286356 Modify the parser's error reporting to include a specific hint for the case
of referencing a WITH item that's not yet in scope according to the SQL
spec's semantics.  This seems to be an easy error to make, and the bare
"relation doesn't exist" message doesn't lead one's mind in the correct
direction to fix it.
2008-10-08 01:14:44 +00:00
Heikki Linnakangas 89f373bf5b Index FSMs needs to be vacuumed as well. Report by Jeff Davis. 2008-10-06 08:04:11 +00:00
Bruce Momjian 2cc1633a35 Update README.HOT to reflect new snapshot tracking and xmin advancement
code in 8.4.
2008-10-02 20:59:31 +00:00
Heikki Linnakangas 15c121b3ed Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the
free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).

This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.

Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.
2008-09-30 10:52:14 +00:00
Heikki Linnakangas 61d9674988 Make LC_COLLATE and LC_CTYPE database-level settings. Collation and
ctype are now more like encoding, stored in new datcollate and datctype
columns in pg_database.

This is a stripped-down version of Radek Strnad's patch, with further
changes by me.
2008-09-23 09:20:39 +00:00
Tom Lane 4adc2f72a4 Change hash indexes to store only the hash code rather than the whole indexed
value.  This means that hash index lookups are always lossy and have to be
rechecked when the heap is visited; however, the gain in index compactness
outweighs this when the indexed values are wide.  Also, we only need to
perform datatype comparisons when the hash codes match exactly, rather than
for every entry in the hash bucket; so it could also win for datatypes that
have expensive comparison functions.  A small additional win is gained by
keeping hash index pages sorted by hash code and using binary search to reduce
the number of index tuples we have to look at.

Xiao Meng

This commit also incorporates Zdenek Kotala's patch to isolate hash metapages
and hash bitmaps a bit better from the page header datastructures.
2008-09-15 18:43:41 +00:00
Alvaro Herrera d53a56687f Initialize the minimum frozen Xid in vac_update_datfrozenxid using
GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not
depend on the latter being correctly set elsewhere, and while it is more
expensive, this code path is not performance-critical.  This is a real
risk for autovacuum, because it can execute whole cycles without doing
a single vacuum, which would mean that RecentGlobalXmin would stay at its
initialization value, FirstNormalTransactionId, causing a bogus value to be
inserted in pg_database.  This bug could explain some recent reports of
failure to truncate pg_clog.

At the same time, change the initialization of RecentGlobalXmin to
InvalidTransactionId, and ensure that it's set to something else whenever
it's going to be used.  Using it as FirstNormalTransactionId in HOT page
pruning could incur in data loss.  InitPostgres takes care of setting it
to a valid value, but the extra checks are there to prevent "special"
backends from behaving in unusual ways.

Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
2008-09-11 14:01:10 +00:00
Tom Lane ead21631e8 Fix a couple of problems pointed out by Fujii Masao in the 2008-Apr-05 patch
for pg_stop_backup.  First, it is possible that the history file name is not
alphabetically later than the last WAL file name, so we should explicitly
check that both have been archived.  Second, the previous coding would wait
forever if a checkpoint had managed to remove the WAL file before we look for
it.

Simon Riggs, plus some code cleanup by me.
2008-09-08 16:42:15 +00:00
Teodor Sigaev 5373817cf2 Fix strategy propagation to scanEntry for partial match by moving propagation
to initializaion of scanEntry.
2008-09-04 11:47:05 +00:00
Teodor Sigaev 1dcf6fdf1b Fix possible duplicate tuples while GiST scan. Now page is processed
at once and ItemPointers are collected in memory.

Remove tuple's killing by killtuple() if tuple was moved to another
page - it could produce unaceptable overhead.

Backpatch up to 8.1 because the bug was introduced by GiST's concurrency support.
2008-08-23 10:37:24 +00:00
Heikki Linnakangas 3f0e808c4a Introduce the concept of relation forks. An smgr relation can now consist
of multiple forks, and each fork can be created and grown separately.

The bulk of this patch is about changing the smgr API to include an extra
ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
smgrdounlink no longer implicitly call smgrclose, because other forks might
still exist after unlinking one. The callers of those functions have been
modified to call smgrclose instead.

This patch in itself doesn't have any user-visible effect, but provides the
infrastructure needed for upcoming patches. The additional forks envisioned
are a rewritten FSM implementation that doesn't rely on a fixed-size shared
memory block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.
2008-08-11 11:05:11 +00:00
Alvaro Herrera e36e6b1cab Add a few more DTrace probes to the backend.
Robert Lor
2008-08-01 13:16:09 +00:00
Tom Lane 11c794f224 Use guc.c's parse_int() instead of pg_atoi() to parse fillfactor in
default_reloptions().  The previous coding was really a bug because pg_atoi()
will always throw elog on bad input data, whereas default_reloptions is not
supposed to complain about bad input unless its validate parameter is true.
Right now you could only expose the problem by hand-modifying
pg_class.reloptions into an invalid state, so it doesn't seem worth
back-patching; but we should get it right in HEAD because there might be other
situations in future.  Noted while studying GIN fast-update patch.
2008-07-23 17:29:53 +00:00
Tom Lane 9d035f4254 Clean up the use of some page-header-access macros: principally, use
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.

I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.
2008-07-13 20:45:47 +00:00
Tom Lane 27cb66fdfe Multi-column GIN indexes. Teodor Sigaev 2008-07-11 21:06:29 +00:00
Neil Conway 68af3752de Minor improvements to the Gin internal documentation. 2008-07-08 03:25:42 +00:00
Teodor Sigaev 2a59b7910e Fix initialization of GinScanEntryData.partialMatch 2008-07-04 13:21:18 +00:00
Bruce Momjian 6b797c852b Fix recovery.conf boolean variables to take the same range of string
values as postgresql.conf.
2008-06-30 22:10:43 +00:00
Tom Lane 7ea9b997ef Remove unnecessary coziness of GIN code with datum copying. Now that
space is tracked via GetMemoryChunkSpace, there's really no advantage
to duplicating datumCopy's innards here.  This is one bit of my toast
indirection patch that should go in anyway.
2008-06-29 21:04:01 +00:00
Alvaro Herrera a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
Tom Lane 71ff461a18 Fix 64-bit problem in recent patch. 2008-06-15 01:41:37 +00:00
Tom Lane 55a56845ed Improve the various elog messages in tuptoaster.c to report which TOAST table
the problem happened in.  These are all supposedly can't-happen cases, but
when they do happen it's useful to know where.

Back-patch to 8.3, but not further because the patch doesn't apply cleanly
further back.  Given the lack of response to my proposal of this, there
doesn't seem to be enough interest to justify much back-porting effort.
2008-06-13 02:59:47 +00:00
Heikki Linnakangas a213f1ee6c Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation
forks. XLogOpenRelation() and the associated light-weight relation cache in
xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
instead of Relation.

For functions that still need a Relation struct during WAL replay, there's a
new function called CreateFakeRelcacheEntry() that returns a fake entry like
XLogOpenRelation() used to.
2008-06-12 09:12:31 +00:00
Heikki Linnakangas 96675bff1f Fix bug in the WAL recovery code to finish an incomplete split.
CacheInvalidateRelcache() crashes if called in WAL recovery, because the
invalidation infrastructure hasn't been initialized yet.

Back-patch to 8.2, where the bug was introduced.
2008-06-11 08:38:56 +00:00
Alvaro Herrera 7f15f8f2e7 Fix breakage caused by conflicting patches, as evidenced by the buildfarm. 2008-06-08 23:16:43 +00:00
Tom Lane 281a724d5c Rewrite DROP's dependency traversal algorithm into an honest two-pass
algorithm, replacing the original intention of a one-pass search, which
had been hacked up over time to be partially two-pass in hopes of handling
various corner cases better.  It still wasn't quite there, especially as
regards emitting unwanted NOTICE messages.  More importantly, this approach
lets us fix a number of open bugs concerning concurrent DROP scenarios,
because we can take locks during the first pass and avoid traversing to
dependent objects that were just deleted by someone else.

There is more that can be done here, but I'll go ahead and commit the
base patch before working on the options.
2008-06-08 22:41:04 +00:00
Alvaro Herrera cc87402d6e Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It is
more logical that way, and also it reduces the amount of unnecessary includes
in bufpage.h, which is widely used.

Zdenek Kotala.

My previous patch to bufpage.h should also have credited him as author, but I
forgot (sorry about that).
2008-06-08 22:00:48 +00:00
Magnus Hagander 8eee526c19 Set hidden field for guc enum missed in previous commit. 2008-05-28 15:22:05 +00:00
Tom Lane 7b8a63c3e9 Alter the xxx_pattern_ops opclasses to use the regular equality operator of
the associated datatype as their equality member.  This means that these
opclasses can now support plain equality comparisons along with LIKE tests,
thus avoiding the need for an extra index in some applications.  This
optimization was not possible when the pattern opclasses were first introduced,
because we didn't insist that text equality meant bitwise equality; but we
do now, so there is no semantic difference between regular and pattern
equality operators.

I removed the name_pattern_ops opclass altogether, since it's really useless:
name's regular comparisons are just strcmp() and are unlikely to become
something different.  Instead teach indxpath.c that btree name_ops can be
used for LIKE whether or not the locale is C.  This might lead to a useful
speedup in LIKE queries on the system catalogs in non-C locales.

The ~=~ and ~<>~ operators are gone altogether.  (It would have been nice to
keep them for backward compatibility's sake, but since the pg_amop structure
doesn't allow multiple equality operators per opclass, there's no way.)

A not-immediately-obvious incompatibility is that the sort order within
bpchar_pattern_ops indexes changes --- it had been identical to plain
strcmp, but is now trailing-blank-insensitive.  This will impact
in-place upgrades, if those ever happen.

Per discussions a couple months ago.
2008-05-27 00:13:09 +00:00
Heikki Linnakangas 50ff07d5b1 Remove arbitrary 10MB limit on two-phase state file size. It's not that hard
to go beoynd 10MB, as demonstrated by Gavin Sharry's example of dropping a
schema with ~25000 objects. The really bogus thing about the limit was that
it was enforced when a state file file was read in, not when it was written,
so you would end up with a prepared transaction that you can't commit or
abort, and the only recourse was to shut down the server and remove the file
by hand.

Raise the limit to MaxAllocSize, and enforce it also when a state file is
written. We could've removed the limit altogether, but reading in a file
larger than MaxAllocSize would fail anyway because we read it into a
palloc'd buffer.

Backpatch down to 8.1, where 2PC and this issue was introduced.
2008-05-19 18:16:26 +00:00
Tom Lane 1a604b4e31 Fix a subtle bug exposed by recent wal_sync_method rearrangements.
Formerly, the default value of wal_sync_method was determined inside xlog.c,
but now it is determined inside guc.c.  guc.c was reading xlogdefs.h
without having read <fcntl.h>, leading to wrong determination of
DEFAULT_SYNC_METHOD.  Obviously xlogdefs.h needs to include <fcntl.h>
for itself to ensure stable results.
2008-05-17 17:24:57 +00:00
Tom Lane 8a2f5d221b Reduce unnecessary PANIC to ERROR, improve a couple of comments. 2008-05-16 19:15:05 +00:00
Tom Lane e6dbcb72fa Extend GIN to support partial-match searches, and extend tsquery to support
prefix matching using this facility.

Teodor Sigaev and Oleg Bartunov
2008-05-16 16:31:02 +00:00
Tom Lane 8282d6fc70 Persuade GIN to react to control-C in a reasonable amount of time
while building a GIN index.
2008-05-16 01:27:06 +00:00
Magnus Hagander 9bf1db04c0 Remove the special variable for open_sync_bit used in O_SYNC and O_DSYNC
modes, replacing it with a call to a function that derives it from the
sync_method variable, now that it has distinct values for these two cases.

This means that assign_xlog_sync_method() no longer changes any settings,
thus fixing the bug introduced in the change to use a guc enum for
wal_sync_method.
2008-05-14 14:02:57 +00:00
Magnus Hagander 72e2db86b9 Don't try to close negative file descriptors, since this can cause
crashes on certain platforms. In particular, the MSVC runtime is known
to do this.

Fixes bug #4162, reported and diagnosed by Javier Pimas
2008-05-13 20:53:52 +00:00
Bruce Momjian d82a1d582c This is the patch replace offnum++ by OffsetNumberNext, to be
consistent.  OffsetNumberNext() has some casting that makes it useful.

Fujii Masao
2008-05-13 15:44:08 +00:00
Alvaro Herrera 5da9da71c4 Improve snapshot manager by keeping explicit track of snapshots.
There are two ways to track a snapshot: there's the "registered" list, which
is used for arbitrary long-lived snapshots; and there's the "active stack",
which is used for the snapshot that is considered "active" at any time.
This also allows users of snapshots to stop worrying about snapshot memory
allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot
assignment.  This is all done automatically now.

As a consequence, this allows us to reset MyProc->xmin when there are no
more snapshots registered in the current backend, reducing the impact that
long-running transactions have on VACUUM.
2008-05-12 20:02:02 +00:00
Magnus Hagander aa82790fca Fix breakage by the wal_sync_method patch in installations that use
O_DSYNC (specifically this broke all the Windows buildfarm members)
2008-05-12 19:45:23 +00:00
Alvaro Herrera 9084399782 Put back bufmgr.h in bufpage.h -- it is needed by some macros.
Remove #include bufmgr.h from (most?) source files which already include
bufpage.h.
2008-05-12 16:06:10 +00:00
Magnus Hagander 2739a4e1d2 Report which WAL sync method we are trying to change *to* when it fails,
not which one we had before (that worked, and thus is completley irrelevant)
2008-05-12 14:27:47 +00:00
Magnus Hagander f99760c19f Convert wal_sync_method to guc enum. 2008-05-12 08:35:05 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Tom Lane cd902b331d Change the rules for inherited CHECK constraints to be essentially the same
as those for inherited columns; that is, it's no longer allowed for a child
table to not have a check constraint matching one that exists on a parent.
This satisfies the principle of least surprise (rows selected from the parent
will always appear to meet its check constraints) and eliminates some
longstanding bogosity in pg_dump, which formerly had to guess about whether
check constraints were really inherited or not.

The implementation involves adding conislocal and coninhcount columns to
pg_constraint (paralleling attislocal and attinhcount in pg_attribute)
and refactoring various ALTER TABLE actions to be more like those for
columns.

Alex Hunsaker, Nikhil Sontakke, Tom Lane
2008-05-09 23:32:05 +00:00
Heikki Linnakangas c5f42ce8d5 Fix Assert introduced in previous patch. 2008-05-09 15:27:17 +00:00
Heikki Linnakangas f0eb3e5e58 Fix incorrect archive truncation point calculation in the %r recovery_command
parameter. This fixes bug 4137 reported by Wojciech Strzalka, where a WAL
file is deleted too early when starting the recovery of a warm standby server.

Also add a sanity check in pg_standby so that it will refuse to delete anything
earlier than the file being restored, and improve the debug message in case
nothing is deleted.

Simon Riggs. Backpatch to 8.3, which is where %r was introduced.
2008-05-09 14:27:47 +00:00
Magnus Hagander 380d1ee69e Update error messages, per notes from Tom.
Laurenz Albe
2008-04-24 14:23:43 +00:00
Magnus Hagander c979a1fefa Prevent shutdown in normal mode if online backup is running, and
have pg_ctl warn about this.

Cancel running online backups (by renaming the backup_label file,
thus rendering the backup useless) when shutting down in fast mode.

Laurenz Albe
2008-04-23 13:44:59 +00:00
Teodor Sigaev cf23b75b4d Fix using too many LWLocks bug, reported by Craig Ringer
<craig@postnewspapers.com.au>.
It was my mistake, I missed limitation of number of held locks, now GIN doesn't
use continiuous locks, but still hold buffers pinned to prevent interference
with vacuum's deletion algorithm.

Backpatch is needed.
2008-04-22 17:52:43 +00:00
Tom Lane 8472bf7a73 Allow float8, int8, and related datatypes to be passed by value on machines
where Datum is 8 bytes wide.  Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior.  Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.

Zoltan Boszormenyi, plus configurability stuff by me.
2008-04-21 00:26:47 +00:00
Alvaro Herrera 2f0f7b4bce Clean up a few places where Datums were being treated as pointers (and vice
versa) without going through DatumGetPointer.

Gavin Sherry, with Feng Tian.
2008-04-17 21:37:28 +00:00
Tom Lane d1cbd26ded Repair two places where SIGTERM exit could leave shared memory state
corrupted.  (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.)  To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook.  Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.

Backpatch as far as 8.2.  The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
2008-04-16 23:59:40 +00:00
Tom Lane 9b5c8d45f6 Push index operator lossiness determination down to GIST/GIN opclass
"consistent" functions, and remove pg_amop.opreqcheck, as per recent
discussion.  The main immediate benefit of this is that we no longer need
8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery
searches on GIN indexes.  In future it should be possible to optimize some
other queries better than is done now, by detecting at runtime whether the
index match is exact or not.

Tom Lane, after an idea of Heikki's, and with some help from Teodor.
2008-04-14 17:05:34 +00:00
Tom Lane 24558da14a Phase 2 of project to make index operator lossiness be determined at runtime
instead of plan time.  Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now).  For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions.  The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
2008-04-13 19:18:14 +00:00
Tom Lane ec498cdcbb Create new routines systable_beginscan_ordered, systable_getnext_ordered,
systable_endscan_ordered that have API similar to systable_beginscan etc
(in particular, the passed-in scankeys have heap not index attnums),
but guarantee ordered output, unlike the existing functions.  For the moment
these are just very thin wrappers around index_beginscan/index_getnext/etc.
Someday they might need to get smarter; but for now this is just a code
refactoring exercise to reduce the number of direct callers of index_getnext,
in preparation for changing that function's API.

In passing, remove index_getnext_indexitem, which has been dead code for
quite some time, and will have even less use than that in the presence
of run-time-lossy indexes.
2008-04-12 23:14:21 +00:00
Tom Lane 4e82a95476 Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole
indexscan always occurs in one call, and the results are returned in a
TIDBitmap instead of a limited-size array of TIDs.  This should improve
speed a little by reducing AM entry/exit overhead, and it is necessary
infrastructure if we are ever to support bitmap indexes.

In an only slightly related change, add support for TIDBitmaps to preserve
(somewhat lossily) the knowledge that particular TIDs reported by an index
need to have their quals rechecked when the heap is visited.  This facility
is not really used yet; we'll need to extend the forced-recheck feature to
plain indexscans before it's useful, and that hasn't been coded yet.
The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
with weighted queries.  There might be other uses in future, but that one
alone is sufficient reason.

Heikki Linnakangas, with some adjustments by me.
2008-04-10 22:25:26 +00:00
Tom Lane 2604359251 Improve hash_any() to use word-wide fetches when hashing suitably aligned
data.  This makes for a significant speedup at the cost that the results
now vary between little-endian and big-endian machines; which forces us
to add explicit ORDER BYs in a couple of regression tests to preserve
machine-independent comparison results.  Also, force initdb by bumping
catversion, since the contents of hash indexes will change (at least on
big-endian machines).

Kenneth Marshall and Tom Lane, based on work from Bob Jenkins.  This commit
does not adopt Bob's new faster mix() algorithm, however, since we still need
to convince ourselves that that doesn't degrade the quality of the hashing.
2008-04-06 16:54:49 +00:00
Bruce Momjian 2a1cf97c22 Have pg_stop_backup() wait for all archive files to be sent, rather than
returing right away.  This guarantees that when pg_stop_backup()
returns, you have a valid backup.

Simon Riggs
2008-04-05 01:34:06 +00:00
Tom Lane b03271590e Remove heap_release_fetch, which is no longer used anywhere; this simplifies
heap_fetch a little.
2008-04-03 17:12:27 +00:00
Alvaro Herrera 73b0300b2a Move the HTSU_Result enum definition into snapshot.h, to avoid including
tqual.h into heapam.h.  This makes all inclusion of tqual.h explicit.

I also sorted alphabetically the includes on some source files.
2008-03-26 21:10:39 +00:00
Alvaro Herrera 78f02ca1f5 Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.
Per complaint from Tom Lane.
2008-03-26 18:48:59 +00:00
Alvaro Herrera d43b085d57 Separate snapshot management code from tuple visibility code, create a
snapmgmt.c file for the former.  The header files have also been reorganized
in three parts: the most basic snapshot definitions are now in a new file
snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c.
tqual.h has been reduced to the bare minimum.

This patch is just a first step towards managing live snapshots within a
transaction; there is no functionality change.

Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and
subsequent discussion.
2008-03-26 16:20:48 +00:00
Tom Lane 220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
Bruce Momjian fca9fff41b More README src cleanups. 2008-03-21 13:23:29 +00:00
Bruce Momjian 4e228447aa Make source code READMEs more consistent. Add CVS tags to all README files. 2008-03-20 17:55:15 +00:00
Peter Eisentraut a7b7b07af3 Enable probes to work with Mac OS X Leopard and other OSes that will
support DTrace in the future.

Switch from using DTRACE_PROBEn macros to the dynamically generated macros.
Use "dtrace -h" to create a header file that contains the dynamically
generated macros to be used in the source code instead of the DTRACE_PROBEn
macros.  A dummy header file is generated for builds without DTrace support.

Author: Robert Lor <Robert.Lor@sun.com>
2008-03-17 19:44:41 +00:00
Tom Lane 32846f8152 Fix TransactionIdIsCurrentTransactionId() to use binary search instead of
linear search when checking child-transaction XIDs.  This makes for an
important speedup in transactions that have large numbers of children,
as in a recent example from Craig Ringer.  We can also get rid of an
ugly kluge that represented lists of TransactionIds as lists of OIDs.

Heikki Linnakangas
2008-03-17 02:18:55 +00:00
Tom Lane 787eba734b When creating a large hash index, pre-sort the index entries by estimated
bucket number, so as to ensure locality of access to the index during the
insertion step.  Without this, building an index significantly larger than
available RAM takes a very long time because of thrashing.  On the other
hand, sorting is just useless overhead when the index does fit in RAM.
We choose to sort when the initial index size exceeds effective_cache_size.

This is a revised version of work by Tom Raney and Shreya Bhargava.
2008-03-16 23:15:08 +00:00
Tom Lane c9a1cc694a Change hash index creation so that rather than always establishing exactly
two buckets at the start, we create a number of buckets appropriate for the
estimated size of the table.  This avoids a lot of expensive bucket-split
actions during initial index build on an already-populated table.

This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch
to reduce hash index build time.  I'm committing it separately to make it
easier for people to test the effects of this separately from the effects
of their other core idea (pre-sorting the index entries by bucket number).
2008-03-15 20:46:31 +00:00
Tom Lane 3e701a04fe Fix heap_page_prune's problem with failing to send cache invalidation
messages if the calling transaction aborts later on.  Collapsing out line
pointer redirects is a done deal as soon as we complete the page update,
so syscache *must* be notified even if the VACUUM FULL as a whole doesn't
complete.  To fix, add some functionality to inval.c to allow the pending
inval messages to be sent immediately while heap_page_prune is still
running.  The implementation is a bit chintzy: it will only work in the
context of VACUUM FULL.  But that's all we need now, and it can always be
extended later if needed.  Per my trouble report of a week ago.
2008-03-13 18:00:32 +00:00
Tom Lane 611b4393f2 Make TransactionIdIsInProgress check transam.c's single-item XID status cache
before it goes groveling through the ProcArray.  In situations where the same
recently-committed transaction ID is checked repeatedly by tqual.c, this saves
a lot of shared-memory searches.  And it's cheap enough that it shouldn't
hurt noticeably when it doesn't help.
Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.
2008-03-11 20:20:35 +00:00
Tom Lane 2fc2795456 Remove no-longer-used XLogCacheByte field of XLogCtl.
Itagaki Takahiro
2008-03-10 02:13:22 +00:00
Tom Lane 6f10eb2111 Refactor heap_page_prune so that instead of changing item states on-the-fly,
it accumulates the set of changes to be made and then applies them.  It had
to accumulate the set of changes anyway to prepare a WAL record for the
pruning action, so this isn't an enormous change; the only new complexity is
to not doubly mark tuples that are visited twice in the scan.  The main
advantage is that we can substantially reduce the scope of the critical
section in which the changes are applied, thus avoiding PANIC in foreseeable
cases like running out of memory in inval.c.  A nice secondary advantage is
that it is now far clearer that WAL replay will actually do the same thing
that the original pruning did.

This commit doesn't do anything about the open problem that
CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change
caused by collapsing out a redirect pointer.  But whatever we do about that,
it'll be a good idea to not do it inside a critical section.
2008-03-08 21:57:59 +00:00
Tom Lane ad434473eb This patch addresses some issues in TOAST compression strategy that
were discussed last year, but we felt it was too late in the 8.3 cycle to
change the code immediately.  Specifically, the patch:

* Reduces the minimum datum size to be considered for compression from
256 to 32 bytes, as suggested by Greg Stark.

* Increases the required compression rate for compressed storage from
20% to 25%, again per Greg's suggestion.

* Replaces force_input_size (size above which compression is forced)
with a maximum size to be considered for compression.  It was agreed
that allowing large inputs to escape the minimum-compression-rate
requirement was not bright, and that indeed we'd rather have a knob
that acted in the other direction.  I set this value to 1MB for the
moment, but it could use some performance studies to tune it.

* Adds an early-failure path to the compressor as suggested by Jan:
if it's been unable to find even one compressible substring in the
first 1KB (parameterizable), assume we're looking at incompressible
input and give up.  (Possibly this logic can be improved, but I'll
commit it as-is for now.)

* Improves the toasting heuristics so that when we have very large
fields with attstorage 'x' or 'e', we will push those out to toast
storage before considering inline compression of shorter fields.
This also responds to a suggestion of Greg's, though my original
proposal for a solution was a bit off base because it didn't fix
the problem for large 'e' fields.

There was some discussion in the earlier threads of exposing some
of the compression knobs to users, perhaps even on a per-column
basis.  I have not done anything about that here.  It seems to me
that if we are changing around the parameters, we'd better get some
experience and be sure we are happy with the design before we set
things in stone by providing user-visible knobs.
2008-03-07 23:20:21 +00:00
Tom Lane 6a17826621 Change hashscan.c to keep its list of active hash index scans in
TopMemoryContext, rather than scattered through executor per-query contexts.
This poses no danger of memory leak since the ResourceOwner mechanism
guarantees release of no-longer-needed items.  It is needed because the
per-query context might already be released by the time we try to clean up
the hash scan list.  Report by ykhuang, diagnosis by Heikki.

Back-patch to 8.0, where the ResourceOwner-based cleanup was introduced.
The given test case does not fail before 8.2, probably because we rearranged
transaction abort processing somehow; but this coding is undoubtedly risky
so I'll patch 8.0 and 8.1 anyway.
2008-03-07 15:59:03 +00:00
Tom Lane 7d6e6e2e97 Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a
temporary table; we can't support that because there's no way to clean up the
source backend's internal state if the eventual COMMIT PREPARED is done by
another backend.  This was checked correctly in 8.1 but I broke it in 8.2 :-(.
Patch by Heikki Linnakangas, original trouble report by John Smith.
2008-03-04 19:54:06 +00:00
Tom Lane c67f6f2f57 Reducing the assumed alignment of struct varlena means that the compiler
is also licensed to put a local variable declared that way at an unaligned
address.  Which will not work if the variable is then manipulated with
SET_VARSIZE or other macros that assume alignment.  So the previous patch
is not an unalloyed good, but on balance I think it's still a win, since
we have very few places that do that sort of thing.  Fix the one place in
tuptoaster.c that does it.  Per buildfarm results from gypsy_moth
(I'm a bit surprised that only one machine showed a failure).
2008-02-29 17:47:41 +00:00
Tom Lane 9713c06319 Change the declaration of struct varlena so that the length word is
represented as "char ...[4]" not "int32".  Since the length word is never
supposed to be accessed via this struct member anyway, this won't break
any existing code that is following the rules.  The advantage is that C
compilers will no longer assume that a pointer to struct varlena is
word-aligned, which prevents incorrect optimizations in TOAST-pointer
access and perhaps other places.  gcc doesn't seem to do this (at least
not at -O2), but the problem is demonstrable on some other compilers.

I changed struct inet as well, but didn't bother to touch a lot of other
struct definitions in which it wouldn't make any difference because there
were other fields forcing int alignment anyway.  Hopefully none of those
struct definitions are used for accessing unaligned Datums.
2008-02-23 19:11:45 +00:00
Peter Eisentraut 1f4a587fc3 Remove another target I forgot during the refactoring 2008-02-19 11:49:12 +00:00
Peter Eisentraut 0474dcb608 Refactor backend makefiles to remove lots of duplicate code 2008-02-19 10:30:09 +00:00
Tom Lane cd00406774 Replace time_t with pg_time_t (same values, but always int64) in on-disk
data structures and backend internal APIs.  This solves problems we've seen
recently with inconsistent layout of pg_control between machines that have
32-bit time_t and those that have already migrated to 64-bit time_t.  Also,
we can get out from under the problem that Windows' Unix-API emulation is not
consistent about the width of time_t.

There are a few remaining places where local time_t variables are used to hold
the current or recent result of time(NULL).  I didn't bother changing these
since they do not affect any cross-module APIs and surely all platforms will
have 64-bit time_t before overflow becomes an actual risk.  time_t should
be avoided for anything visible to extension modules, however.
2008-02-17 02:09:32 +00:00
Tom Lane 47df4f6688 Add a GUC variable "synchronize_seqscans" to allow clients to disable the new
synchronized-scanning behavior, and make pg_dump disable sync scans so that
it will reliably preserve row ordering.  Per recent discussions.
2008-01-30 18:35:55 +00:00
Peter Eisentraut 6f8f8d2daa Provide a clearer error message if the pg_control version number looks
wrong because of mismatched byte ordering.
2008-01-21 11:17:46 +00:00
Tom Lane ac12412ede Revise memory management for libxml calls. Instead of keeping libxml's data
in whichever context happens to be current during a call of an xml.c function,
use a dedicated context that will not go away until we explicitly delete it
(which we do at transaction end or subtransaction abort).  This makes recovery
after an error much simpler --- we don't have to individually delete the data
structures created by libxml.  Also, we need to initialize and cleanup libxml
only once per transaction (if there's no error) instead of once per function
call, so it should be a bit faster.  We'll need to keep an eye out for
intra-transaction memory leaks, though.  Alvaro and Tom.
2008-01-15 18:57:00 +00:00
Tom Lane d3b1b1f9d8 Fix CREATE INDEX CONCURRENTLY so that it won't use synchronized scan for
its second pass over the table.  It has to start at block zero, else the
"merge join" logic for detecting which TIDs are already in the index
doesn't work.  Hence, extend heapam.c's API so that callers can enable or
disable syncscan.  (I put in an option to disable buffer access strategy,
too, just in case somebody needs it.)  Per report from Hannes Dorbath.
2008-01-14 01:39:09 +00:00
Tom Lane eedb068c0a Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,
and CLUSTER) execute as the table owner rather than the calling user, using
the same privilege-switching mechanism already used for SECURITY DEFINER
functions.  The purpose of this change is to ensure that user-defined
functions used in index definitions cannot acquire the privileges of a
superuser account that is performing routine maintenance.  While a function
used in an index is supposed to be IMMUTABLE and thus not able to do anything
very interesting, there are several easy ways around that restriction; and
even if we could plug them all, there would remain a risk of reading sensitive
information and broadcasting it through a covert channel such as CPU usage.

To prevent bypassing this security measure, execution of SET SESSION
AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context.

Thanks to Itagaki Takahiro for reporting this vulnerability.

Security: CVE-2007-6600
2008-01-03 21:23:15 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Tom Lane ac1ae9f2fa Improve a number of elog messages for not-supposed-to-happen cases in btrees,
since these seem to happen after all in corrupted indexes.  Make sure we
supply the index name in all cases, and provide relevant block numbers where
available.  Also consistently identify the index name as such.

Back-patch to 8.2, in hopes that this might help Mason Hale figure out his
problem.
2007-12-31 04:52:05 +00:00
Tom Lane 265f904d8f Code review for LIKE ... INCLUDING INDEXES patch. Fix failure to propagate
constraint status of copied indexes (bug #3774), as well as various other
small bugs such as failure to pstrdup when needed.  Allow INCLUDING INDEXES
indexes to be merged with identical declared indexes (perhaps not real useful,
but the code is there and having it not apply to LIKE indexes seems pretty
unorthogonal).  Avoid useless work in generateClonedIndexStmt().  Undo some
poorly chosen API changes, and put a couple of routines in modules that seem
to be better places for them.
2007-12-01 23:44:44 +00:00
Tom Lane 895a94de6d Avoid incrementing the CommandCounter when CommandCounterIncrement is called
but no database changes have been made since the last CommandCounterIncrement.
This should result in a significant improvement in the number of "commands"
that can typically be performed within a transaction before hitting the 2^32
CommandId size limit.  In particular this buys back (and more) the possible
adverse consequences of my previous patch to fix plan caching behavior.

The implementation requires tracking whether the current CommandCounter
value has been "used" to mark any tuples.  CommandCounter values stored into
snapshots are presumed not to be used for this purpose.  This requires some
small executor changes, since the executor used to conflate the curcid of
the snapshot it was using with the command ID to mark output tuples with.
Separating these concepts allows some small simplifications in executor APIs.

Something for the TODO list: look into having CommandCounterIncrement not do
AcceptInvalidationMessages.  It seems fairly bogus to be doing it there,
but exactly where to do it instead isn't clear, and I'm disinclined to mess
with asynchronous behavior during late beta.
2007-11-30 21:22:54 +00:00
Tom Lane 6f4cfe48ac Improve GIN index build's tracking of memory usage by using
GetMemoryChunkSpace, not just the palloc request size.  This brings the
allocatedMemory counter close enough to reality (as measured by
MemoryContextStats printouts) that I think we can get rid of the arbitrary
factor-of-2 adjustment that was put into the code initially.  Given the
sensitivity of GIN build to work memory size, not using as much of work
memory as we're allowed to seems a pretty bad idea.
2007-11-16 21:55:59 +00:00
Tom Lane 93190c3098 Repair still another bug in the btree page split WAL reduction patch:
it failed for splits of non-leaf pages because in such pages the first
data key on a page is suppressed, and so we can't just copy the first
key from the right page to reconstitute the left page's high key.
Problem found by Koichi Suzuki, patch by Heikki.
2007-11-16 19:53:50 +00:00
Bruce Momjian f639df0d61 Small comment spacing improvement. 2007-11-16 01:51:22 +00:00
Bruce Momjian 7d4c99b414 Fix pgindent to properly handle 'else' and single-line comments on the
same line;  previous fix was only partial.  Re-run pgindent on files
that need it.
2007-11-15 23:23:44 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Peter Eisentraut b30769ee54 When logging the recovery.conf parameters, show them quoted as they would
appear in the configuration file.
2007-11-15 22:02:12 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 6cc4451b5c Prevent re-use of a deleted relation's relfilenode until after the next
checkpoint.  This guards against an unlikely data-loss scenario in which
we re-use the relfilenode, then crash, then replay the deletion and
recreation of the file.  Even then we'd be OK if all insertions into the
new relation had been WAL-logged ... but that's not guaranteed given all
the no-WAL-logging optimizations that have recently been added.

Patch by Heikki Linnakangas, per a discussion last month.
2007-11-15 20:36:40 +00:00
Tom Lane b40c0a4bb0 Clean up some stray references to tsearch2. 2007-11-13 23:36:26 +00:00
Tom Lane 0bd4da23a4 Ensure that typmod decoration on a datatype name is validated in all cases,
even in code paths where we don't pay any subsequent attention to the typmod
value.  This seems needed in view of the fact that 8.3's generalized typmod
support will accept a lot of bogus syntax, such as "timestamp(foo)" or
"record(int, 42)" --- if we allow such things to pass without comment,
users will get confused.  Per a recent example from Greg Stark.

To implement this in a way that's not very vulnerable to future
bugs-of-omission, refactor the API of parse_type.c's TypeName lookup routines
so that typmod validation is folded into the base lookup operation.  Callers
can still choose not to receive the encoded typmod, but we'll check the
decoration anyway if it's present.
2007-11-11 19:22:49 +00:00
Bruce Momjian 82748bc253 Reduce error level of ROLLBACK outside a transaction from WARNING to
NOTICE.
2007-11-10 14:36:44 +00:00
Peter Eisentraut 5f9869d0ee Use "alternative" instead of "alternate" where it is clearer. 2007-11-07 12:24:24 +00:00
Teodor Sigaev bf5ccf382c - Add check of already changed page while replay WAL. This touches only
ginRedoInsert(), because other ginRedo* functions rewrite whole page or
make changes which could be applied several times without consistent's loss

- Remove check of identifying of corresponding split record:
it's possible that replaying of WAL starts after actual page split, but before
removing of that split from incomplete splits list. In this case, that check
cause FATAL error.

Per stress test which reproduces bug reported by Craig McElroy
<craig.mcelroy@contegix.com>
2007-10-29 19:26:57 +00:00
Teodor Sigaev 85376c6f7d Fix coredump during replay WAL after crash. Change entrySplitPage() to prevent
usage of any information from system catalog, because it could be called during
replay of WAL.

Per bug report from Craig McElroy <craig.mcelroy@contegix.com>. Patch doesn't
change on-disk storage.
2007-10-29 13:49:21 +00:00
Alvaro Herrera 745c1b2c2a Rearrange vacuum-related bits in PGPROC as a bitmask, to better support
having several of them.  Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).

Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
2007-10-24 20:55:36 +00:00
Tom Lane 9226ba817b Keep heap_page_prune from marking the buffer dirty when it didn't
really change anything.  Per report from Itagaki Takahiro.  Fix by
Pavan Deolasee.
2007-10-24 13:05:57 +00:00
Tom Lane 56303abff0 Tweak toast-related logic in heapam.c so that the toaster is only invoked
when relkind = RELKIND_RELATION.  This syncs these tests with the Asserts
in tuptoaster.c, and ensures that we won't ever try to, for example,
compress a sequence's tuple.  Problem found by Greg Stark while stress-testing
with much-smaller-than-normal page sizes.
2007-10-16 17:05:26 +00:00
Tom Lane 5c8eb929e6 When telling the bgwriter that we need a checkpoint because too much xlog
has been consumed, recheck against the latest value of RedoRecPtr before
really sending the signal.  This avoids useless checkpoint activity if
XLogWrite is executed when we have a very stale local copy of RedoRecPtr.
The potential for useless checkpoint is very much worse in 8.3 because of
the walwriter process (which never does XLogInsert), so while this behavior
was intentional, it needs to be changed.  Per report from Itagaki Takahiro.
2007-10-12 19:39:59 +00:00
Tom Lane 56b7695cf5 Remove incorrect use of VARSIZE() on a toasted datum. We can just remove it
instead of fix it, since once we've set toast_action[i] to 'p' it no longer
matters what toast_sizes[i] is.  Greg Stark
2007-10-11 18:19:58 +00:00
Tom Lane b526462f9e Avoid assuming that struct varattrib_pointer doesn't get padded by the
compiler --- at least on ARM, it does.  I suspect that the varvarlena patch
has been creating larger-than-intended toast pointers all along on ARM,
but it wasn't exposed until the latest tweak added some Asserts that
calculated the expected size in a different way.  We could probably have
fixed this by adding __attribute__((packed)) as is done for ItemPointerData,
but struct varattrib_pointer isn't really all that useful anyway, so it
seems cleanest to just get rid of it and have only struct varattrib_1b_e.
Per results from buildfarm member quagga.
2007-10-01 16:25:56 +00:00
Tom Lane 27b8922221 Add an extra header byte to TOAST-pointer datums to represent their size
explicitly.  This means a TOAST pointer takes 18 bytes instead of 17 --- still
smaller than in 8.2 --- which seems a good tradeoff to ensure we won't have
painted ourselves into a corner if we want to support multiple types of TOAST
pointer later on.  Per discussion with Greg Stark.
2007-09-30 19:54:58 +00:00