Commit Graph

417 Commits

Author SHA1 Message Date
Tom Lane c970292a94 Remove very ancient tuple-counting infrastructure (IncrRetrieved() and
friends).  This code has all been ifdef'd out for many years, and doesn't
seem to have any prospect of becoming any more useful in the future.
EXPLAIN ANALYZE is what people use in practice, and I think if we did want
process-wide counters we'd be more likely to put in dtrace events for that
than try to resurrect this code.  Get rid of it so as to have one less detail
to worry about while refactoring execMain.c.
2009-10-08 22:34:57 +00:00
Tom Lane e66d714386 Make sure that GIN fast-insert and regular code paths enforce the same
tuple size limit.  Improve the error message for index-tuple-too-large
so that it includes the actual size, the limit, and the index name.
Sync with the btree occurrences of the same error.

Back-patch to 8.4 because it appears that the out-of-sync problem
is occurring in the field.

Teodor and Tom
2009-10-02 21:14:04 +00:00
Tom Lane 527f0ae3fa Department of second thoughts: let's show the exact key during unique index
build failures, too.  Refactor a bit more since that error message isn't
spelled the same.
2009-08-01 20:59:17 +00:00
Tom Lane b680ae4bdb Improve unique-constraint-violation error messages to include the exact
values being complained of.

In passing, also remove the arbitrary length limitation in the similar
error detail message for foreign key violations.

Itagaki Takahiro
2009-08-01 19:59:41 +00:00
Tom Lane 25d9bf2e3e Support deferrable uniqueness constraints.
The current implementation fires an AFTER ROW trigger for each tuple that
looks like it might be non-unique according to the index contents at the
time of insertion.  This works well as long as there aren't many conflicts,
but won't scale to massive unique-key reassignments.  Improving that case
is a TODO item.

Dean Rasheed
2009-07-29 20:56:21 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane 32ea236361 Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane
behavior in cases where we don't know the heap tuple count accurately; in
particular partial vacuum, but this also makes the API a bit more useful
for ANALYZE.  This patch adds "estimated_count" flags to both structs so
that an approximate count can be flagged as such, and adjusts the logic
so that approximate counts are not used for updating pg_class.reltuples.

This fixes my previous complaint that VACUUM was putting ridiculous values
into pg_class.reltuples for indexes.  The actual impact of that bug is
limited, because the planner only pays attention to reltuples for an index
if the index is partial; which probably explains why beta testers hadn't
noticed a degradation in plan quality from it.  But it needs to be fixed.

The whole thing is a bit messy and should be redesigned in future, because
reltuples now has the potential to drift quite far away from reality when
a long period elapses with no non-partial vacuums.  But this is as good as
it's going to get for 8.4.
2009-06-06 22:13:52 +00:00
Tom Lane 8f348112f3 Insert CHECK_FOR_INTERRUPTS() calls into btree and hash index scans at the
points where we step right or left to the next page.  This should ensure
reasonable response time to a query cancel request during an unsuccessful
index scan, as seen in recent gripe from Marc Cousin.  It's a bit trickier
than it might seem at first glance, because CHECK_FOR_INTERRUPTS() is a no-op
if executed while holding a buffer lock.  So we have to do it just at the
point where we've dropped one page lock and not yet acquired the next.

Remove CHECK_FOR_INTERRUPTS calls at the top level of btgetbitmap and
hashgetbitmap, since they're pointless given the added checks.

I think that GIST is okay already --- at least, there's a CHECK_FOR_INTERRUPTS
at a plausible-looking place in gistnext().  I don't claim to know GIN well
enough to try to poke it for this, if indeed it has a problem at all.

This is a pre-existing issue, but in view of the lack of prior complaints
I'm not going to risk back-patching.
2009-05-05 19:36:32 +00:00
Tom Lane 2aa5ca952f Update comment for _bt_relandgetbuf. 2009-05-05 19:02:22 +00:00
Tom Lane ff301d6e69 Implement "fastupdate" support for GIN indexes, in which we try to accumulate
multiple index entries in a holding area before adding them to the main index
structure.  This helps because bulk insert is (usually) significantly faster
than retail insert for GIN.

This patch also removes GIN support for amgettuple-style index scans.  The
API defined for amgettuple is difficult to support with fastupdate, and
the previously committed partial-match feature didn't really work with
it either.  We might eventually figure a way to put back amgettuple
support, but it won't happen for 8.4.

catversion bumped because of change in GIN's pg_am entry, and because
the format of GIN indexes changed on-disk (there's a metapage now,
and possibly a pending list).

Teodor Sigaev
2009-03-24 20:17:18 +00:00
Heikki Linnakangas b2a667b9ee Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should
be used instead of the normal exclusive lock, and make WAL redo functions
responsible for calling RestoreBkpBlocks(). They know better what kind of a
lock they need.

At the moment, this just moves things around with no functional change, but
makes the hot standby patch that's under review cleaner.
2009-01-20 18:59:37 +00:00
Alvaro Herrera ba748f7a11 Change the reloptions machinery to use a table-based parser, and provide
a more complete framework for writing custom option processing routines
by user-defined access methods.

Catalog version bumped due to the general API changes, which are going to
affect user-defined "amoptions" routines.
2009-01-05 17:14:28 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Heikki Linnakangas 3396000684 Rethink the way FSM truncation works. Instead of WAL-logging FSM
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
make that cleaner from modularity point of view, move the WAL-logging one
level up to RelationTruncate, and move RelationTruncate and all the
related WAL-logging to new src/backend/catalog/storage.c file. Introduce
new RelationCreateStorage and RelationDropStorage functions that are used
instead of calling smgrcreate/smgrscheduleunlink directly. Move the
pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
functions. This leaves smgr.c as a thin wrapper around md.c; all the
transactional stuff is now in storage.c.

This will make it easier to add new forks with similar truncation logic,
like the visibility map.
2008-11-19 10:34:52 +00:00
Tom Lane 10e3acb8e7 Prevent synchronous scan during GIN index build, because GIN is optimized
for inserting tuples in increasing TID order.  It's not clear whether this
fully explains Ivan Sergio Borgonovo's complaint, but simple testing
confirms that a scan that doesn't start at block 0 can slow GIN build by
a factor of three or four.

Backpatch to 8.3.  Sync scan didn't exist before that.
2008-11-13 17:42:10 +00:00
Tom Lane b4eae023bb Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage
by splitting it into three functions with better-defined behaviors.

Zdenek Kotala
2008-11-03 20:47:49 +00:00
Heikki Linnakangas 19c8dc839b Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer
functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".

Add fork number to some error messages in bufmgr.c, that still lacked it.
2008-10-31 15:05:00 +00:00
Heikki Linnakangas 89f373bf5b Index FSMs needs to be vacuumed as well. Report by Jeff Davis. 2008-10-06 08:04:11 +00:00
Heikki Linnakangas 15c121b3ed Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the
free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).

This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.

Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.
2008-09-30 10:52:14 +00:00
Heikki Linnakangas 3f0e808c4a Introduce the concept of relation forks. An smgr relation can now consist
of multiple forks, and each fork can be created and grown separately.

The bulk of this patch is about changing the smgr API to include an extra
ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
smgrdounlink no longer implicitly call smgrclose, because other forks might
still exist after unlinking one. The callers of those functions have been
modified to call smgrclose instead.

This patch in itself doesn't have any user-visible effect, but provides the
infrastructure needed for upcoming patches. The additional forks envisioned
are a rewritten FSM implementation that doesn't rely on a fixed-size shared
memory block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.
2008-08-11 11:05:11 +00:00
Tom Lane 9d035f4254 Clean up the use of some page-header-access macros: principally, use
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.

I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.
2008-07-13 20:45:47 +00:00
Alvaro Herrera a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
Heikki Linnakangas a213f1ee6c Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation
forks. XLogOpenRelation() and the associated light-weight relation cache in
xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
instead of Relation.

For functions that still need a Relation struct during WAL replay, there's a
new function called CreateFakeRelcacheEntry() that returns a fake entry like
XLogOpenRelation() used to.
2008-06-12 09:12:31 +00:00
Heikki Linnakangas 96675bff1f Fix bug in the WAL recovery code to finish an incomplete split.
CacheInvalidateRelcache() crashes if called in WAL recovery, because the
invalidation infrastructure hasn't been initialized yet.

Back-patch to 8.2, where the bug was introduced.
2008-06-11 08:38:56 +00:00
Tom Lane 7b8a63c3e9 Alter the xxx_pattern_ops opclasses to use the regular equality operator of
the associated datatype as their equality member.  This means that these
opclasses can now support plain equality comparisons along with LIKE tests,
thus avoiding the need for an extra index in some applications.  This
optimization was not possible when the pattern opclasses were first introduced,
because we didn't insist that text equality meant bitwise equality; but we
do now, so there is no semantic difference between regular and pattern
equality operators.

I removed the name_pattern_ops opclass altogether, since it's really useless:
name's regular comparisons are just strcmp() and are unlikely to become
something different.  Instead teach indxpath.c that btree name_ops can be
used for LIKE whether or not the locale is C.  This might lead to a useful
speedup in LIKE queries on the system catalogs in non-C locales.

The ~=~ and ~<>~ operators are gone altogether.  (It would have been nice to
keep them for backward compatibility's sake, but since the pg_amop structure
doesn't allow multiple equality operators per opclass, there's no way.)

A not-immediately-obvious incompatibility is that the sort order within
bpchar_pattern_ops indexes changes --- it had been identical to plain
strcmp, but is now trailing-blank-insensitive.  This will impact
in-place upgrades, if those ever happen.

Per discussions a couple months ago.
2008-05-27 00:13:09 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Tom Lane d1cbd26ded Repair two places where SIGTERM exit could leave shared memory state
corrupted.  (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.)  To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook.  Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.

Backpatch as far as 8.2.  The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
2008-04-16 23:59:40 +00:00
Tom Lane 24558da14a Phase 2 of project to make index operator lossiness be determined at runtime
instead of plan time.  Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now).  For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions.  The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
2008-04-13 19:18:14 +00:00
Tom Lane 4e82a95476 Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole
indexscan always occurs in one call, and the results are returned in a
TIDBitmap instead of a limited-size array of TIDs.  This should improve
speed a little by reducing AM entry/exit overhead, and it is necessary
infrastructure if we are ever to support bitmap indexes.

In an only slightly related change, add support for TIDBitmaps to preserve
(somewhat lossily) the knowledge that particular TIDs reported by an index
need to have their quals rechecked when the heap is visited.  This facility
is not really used yet; we'll need to extend the forced-recheck feature to
plain indexscans before it's useful, and that hasn't been coded yet.
The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
with weighted queries.  There might be other uses in future, but that one
alone is sufficient reason.

Heikki Linnakangas, with some adjustments by me.
2008-04-10 22:25:26 +00:00
Alvaro Herrera 73b0300b2a Move the HTSU_Result enum definition into snapshot.h, to avoid including
tqual.h into heapam.h.  This makes all inclusion of tqual.h explicit.

I also sorted alphabetically the includes on some source files.
2008-03-26 21:10:39 +00:00
Alvaro Herrera 78f02ca1f5 Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.
Per complaint from Tom Lane.
2008-03-26 18:48:59 +00:00
Alvaro Herrera d43b085d57 Separate snapshot management code from tuple visibility code, create a
snapmgmt.c file for the former.  The header files have also been reorganized
in three parts: the most basic snapshot definitions are now in a new file
snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c.
tqual.h has been reduced to the bare minimum.

This patch is just a first step towards managing live snapshots within a
transaction; there is no functionality change.

Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and
subsequent discussion.
2008-03-26 16:20:48 +00:00
Bruce Momjian fca9fff41b More README src cleanups. 2008-03-21 13:23:29 +00:00
Bruce Momjian 4e228447aa Make source code READMEs more consistent. Add CVS tags to all README files. 2008-03-20 17:55:15 +00:00
Tom Lane 787eba734b When creating a large hash index, pre-sort the index entries by estimated
bucket number, so as to ensure locality of access to the index during the
insertion step.  Without this, building an index significantly larger than
available RAM takes a very long time because of thrashing.  On the other
hand, sorting is just useless overhead when the index does fit in RAM.
We choose to sort when the initial index size exceeds effective_cache_size.

This is a revised version of work by Tom Raney and Shreya Bhargava.
2008-03-16 23:15:08 +00:00
Peter Eisentraut 0474dcb608 Refactor backend makefiles to remove lots of duplicate code 2008-02-19 10:30:09 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Tom Lane ac1ae9f2fa Improve a number of elog messages for not-supposed-to-happen cases in btrees,
since these seem to happen after all in corrupted indexes.  Make sure we
supply the index name in all cases, and provide relevant block numbers where
available.  Also consistently identify the index name as such.

Back-patch to 8.2, in hopes that this might help Mason Hale figure out his
problem.
2007-12-31 04:52:05 +00:00
Tom Lane 93190c3098 Repair still another bug in the btree page split WAL reduction patch:
it failed for splits of non-leaf pages because in such pages the first
data key on a page is suppressed, and so we can't just copy the first
key from the right page to reconstitute the left page's high key.
Problem found by Koichi Suzuki, patch by Heikki.
2007-11-16 19:53:50 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 282d2a03dd HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
2007-09-20 17:56:33 +00:00
Tom Lane 6889303531 Redefine the lp_flags field of item pointers as having four states, rather
than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time).  This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len.  The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.
2007-09-12 22:10:26 +00:00
Peter Eisentraut f4a3789b39 Clarify some error messages about duplicate things. 2007-06-03 22:16:03 +00:00
Tom Lane d526575f89 Make large sequential scans and VACUUMs work in a limited-size "ring" of
buffers, rather than blowing out the whole shared-buffer arena.  Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer.  Those flushes will now occur only once per
ring-ful.  The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done.  The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it.  To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.
2007-05-30 20:12:03 +00:00
Tom Lane 77947c51c0 Fix up pgstats counting of live and dead tuples to recognize that committed
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.

Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
2007-05-27 03:50:39 +00:00
Tom Lane a8d539f124 To support external compression of archived WAL data, add a flag bit to
WAL records that shows whether it is safe to remove full-page images
(ie, whether or not an on-line backup was in progress when the WAL entry
was made).  Also make provision for an XLOG_NOOP record type that can be
used to fill in the extra space when decompressing the data for restore.

This is the portion of Koichi Suzuki's "full page writes" patch that
has to go into the core database.  The remainder of that work is two
external compression and decompression programs, which for the time being
will undergo separate development on pgfoundry.  Per discussion.

Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
possible to compress them (the previous coding caused essential info
to be omitted).  The other commonly-used record types seem OK already,
with the possible exception of GIN and GIST WAL records, which I don't
understand well enough to opine on.
2007-05-20 21:08:19 +00:00
Tom Lane 226a100568 Code review for btree page split WAL reduction patch. Make it actually work
(original code *always* created a full-page image for the left page, thus
leaving the intended savings unrealized), avoid risk of not having enough room
on the page during xlog restore, squeeze out another couple bytes in the xlog
record, clean up neglected comments.
2007-04-11 20:47:38 +00:00
Tom Lane 56218fbc48 Minor tweaking of index special-space definitions so that the various
index types can be reliably distinguished by examining the special space
on an index page.  Per my earlier proposal, plus the realization that
there's no need for btree's vacuum cycle ID to cycle through every possible
16-bit value.  Restricting its range a little costs nearly nothing and
eliminates the possibility of collisions.
Memo to self: remember to make bitmap indexes play along with this scheme,
assuming that patch ever gets accepted.
2007-04-09 22:04:08 +00:00
Tom Lane 7b78474da3 Make CLUSTER MVCC-safe. Heikki Linnakangas 2007-04-08 01:26:33 +00:00