Commit Graph

187 Commits

Author SHA1 Message Date
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane 3695a55513 Replace simple constant pg_am.amcanreturn with an AM support function.
The need for this was debated when we put in the index-only-scan feature,
but at the time we had no near-term expectation of having AMs that could
support such scans for only some indexes; so we kept it simple.  However,
the SP-GiST AM forces the issue, so let's fix it.

This patch only installs the new API; no behavior actually changes.
2011-12-18 15:50:37 -05:00
Tom Lane cbfa92c23c Improve index-only scans to avoid repeated access to the index page.
We copy all the matched tuples off the page during _bt_readpage, instead of
expensively re-locking the page during each subsequent tuple fetch.  This
costs a bit more local storage, but not more than 2*BLCKSZ worth, and the
reduction in LWLock traffic is certainly worth that.  What's more, this
lets us get rid of the API wart in the original patch that said an index AM
could randomly decline to supply an index tuple despite having asserted
pg_am.amcanreturn.  That will be important for future improvements in the
index-only-scan feature, since the executor will now be able to rely on
having the index data available.
2011-10-09 00:21:08 -04:00
Tom Lane a2822fb933 Support index-only scans using the visibility map to avoid heap fetches.
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page.  This patch depends
on the previous patches that made the visibility map reliable.

There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.

Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
2011-10-07 20:14:13 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Robert Haas 4da99ea423 Avoid having two copies of the HOT-chain search logic.
It's been like this since HOT was originally introduced, but the logic
is complex enough that this is a recipe for bugs, as we've already
found out with SSI.  So refactor heap_hot_search_buffer() so that it
can satisfy the needs of index_getnext(), and make index_getnext() use
that rather than duplicating the logic.

This change was originally proposed by Heikki Linnakangas as part of a
larger refactoring oriented towards allowing index-only scans.  I
extracted and adjusted this part, since it seems to have independent
merit.  Review by Jeff Davis.
2011-06-27 10:27:17 -04:00
Heikki Linnakangas 0a0e2b52a5 Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCC
snapshots, like in REINDEX, are basically non-transactional operations. The
DDL operation itself might participate in SSI, but there's separate
functions for that.

Kevin Grittner and Dan Ports, with some changes by me.
2011-06-15 12:11:18 +03:00
Bruce Momjian 6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Heikki Linnakangas 3103f9a77d The row-version chaining in Serializable Snapshot Isolation was still wrong.
On further analysis, it turns out that it is not needed to duplicate predicate
locks to the new row version at update, the lock on the version that the
transaction saw as visible is enough. However, there was a different bug in
the code that checks for dangerous structures when a new rw-conflict happens.
Fix that bug, and remove all the row-version chaining related code.

Kevin Grittner & Dan Ports, with some comment editorialization by me.
2011-05-30 20:47:17 +03:00
Tom Lane d2f60a3ab0 Add an Assert that indexam.c isn't used on an index awaiting reindexing.
This might have caught the recent embarrassment over trying to modify
pg_index while its indexes were being rebuilt.

Noah Misch
2011-04-16 19:29:10 -04:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane b310b6e31c Revise collation derivation method and expression-tree representation.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record).  Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.

Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree.  This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.

Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
2011-03-19 20:30:08 -04:00
Tom Lane 149b2673c2 Fix incorrect access to pg_index.indcollation.
Since this field is after a variable-length field, it can't simply be
accessed via the C struct for pg_index.  Fortunately, the relcache already
did the dirty work of pulling the information out to where it can be
accessed easily, so this is a one-line fix.

Andres Freund
2011-03-06 12:10:50 -05:00
Peter Eisentraut 414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Heikki Linnakangas dafaa3efb7 Implement genuine serializable isolation level.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.

To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.

A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.

Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.

We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
2011-02-08 00:09:08 +02:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Tom Lane d583f10b7e Create core infrastructure for KNNGIST.
This is a heavily revised version of builtin_knngist_core-0.9.  The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too.  Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values.  AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane
2010-12-02 20:51:37 -05:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Simon Riggs efc16ea520 Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 01:32:45 +00:00
Tom Lane 25d9bf2e3e Support deferrable uniqueness constraints.
The current implementation fires an AFTER ROW trigger for each tuple that
looks like it might be non-unique according to the index contents at the
time of insertion.  This works well as long as there aren't many conflicts,
but won't scale to massive unique-key reassignments.  Improving that case
is a TODO item.

Dean Rasheed
2009-07-29 20:56:21 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane ff301d6e69 Implement "fastupdate" support for GIN indexes, in which we try to accumulate
multiple index entries in a holding area before adding them to the main index
structure.  This helps because bulk insert is (usually) significantly faster
than retail insert for GIN.

This patch also removes GIN support for amgettuple-style index scans.  The
API defined for amgettuple is difficult to support with fastupdate, and
the previously committed partial-match feature didn't really work with
it either.  We might eventually figure a way to put back amgettuple
support, but it won't happen for 8.4.

catversion bumped because of change in GIN's pg_am entry, and because
the format of GIN indexes changed on-disk (there's a metapage now,
and possibly a pending list).

Teodor Sigaev
2009-03-24 20:17:18 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane 30584cda35 Fix small query-lifespan memory leak introduced by 8.4 change in index AM API
for bitmap index scans.  Per report and test case from Kevin Grittner.
2008-10-10 14:17:08 +00:00
Alvaro Herrera d53a56687f Initialize the minimum frozen Xid in vac_update_datfrozenxid using
GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not
depend on the latter being correctly set elsewhere, and while it is more
expensive, this code path is not performance-critical.  This is a real
risk for autovacuum, because it can execute whole cycles without doing
a single vacuum, which would mean that RecentGlobalXmin would stay at its
initialization value, FirstNormalTransactionId, causing a bogus value to be
inserted in pg_database.  This bug could explain some recent reports of
failure to truncate pg_clog.

At the same time, change the initialization of RecentGlobalXmin to
InvalidTransactionId, and ensure that it's set to something else whenever
it's going to be used.  Using it as FirstNormalTransactionId in HOT page
pruning could incur in data loss.  InitPostgres takes care of setting it
to a valid value, but the extra checks are there to prevent "special"
backends from behaving in unusual ways.

Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
2008-09-11 14:01:10 +00:00
Alvaro Herrera a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Tom Lane 24558da14a Phase 2 of project to make index operator lossiness be determined at runtime
instead of plan time.  Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now).  For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions.  The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
2008-04-13 19:18:14 +00:00
Tom Lane ec498cdcbb Create new routines systable_beginscan_ordered, systable_getnext_ordered,
systable_endscan_ordered that have API similar to systable_beginscan etc
(in particular, the passed-in scankeys have heap not index attnums),
but guarantee ordered output, unlike the existing functions.  For the moment
these are just very thin wrappers around index_beginscan/index_getnext/etc.
Someday they might need to get smarter; but for now this is just a code
refactoring exercise to reduce the number of direct callers of index_getnext,
in preparation for changing that function's API.

In passing, remove index_getnext_indexitem, which has been dead code for
quite some time, and will have even less use than that in the presence
of run-time-lossy indexes.
2008-04-12 23:14:21 +00:00
Tom Lane 4e82a95476 Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole
indexscan always occurs in one call, and the results are returned in a
TIDBitmap instead of a limited-size array of TIDs.  This should improve
speed a little by reducing AM entry/exit overhead, and it is necessary
infrastructure if we are ever to support bitmap indexes.

In an only slightly related change, add support for TIDBitmaps to preserve
(somewhat lossily) the knowledge that particular TIDs reported by an index
need to have their quals rechecked when the heap is visited.  This facility
is not really used yet; we'll need to extend the forced-recheck feature to
plain indexscans before it's useful, and that hasn't been coded yet.
The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
with weighted queries.  There might be other uses in future, but that one
alone is sufficient reason.

Heikki Linnakangas, with some adjustments by me.
2008-04-10 22:25:26 +00:00
Alvaro Herrera 73b0300b2a Move the HTSU_Result enum definition into snapshot.h, to avoid including
tqual.h into heapam.h.  This makes all inclusion of tqual.h explicit.

I also sorted alphabetically the includes on some source files.
2008-03-26 21:10:39 +00:00
Alvaro Herrera 78f02ca1f5 Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.
Per complaint from Tom Lane.
2008-03-26 18:48:59 +00:00
Alvaro Herrera d43b085d57 Separate snapshot management code from tuple visibility code, create a
snapmgmt.c file for the former.  The header files have also been reorganized
in three parts: the most basic snapshot definitions are now in a new file
snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c.
tqual.h has been reduced to the bare minimum.

This patch is just a first step towards managing live snapshots within a
transaction; there is no functionality change.

Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and
subsequent discussion.
2008-03-26 16:20:48 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 282d2a03dd HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
2007-09-20 17:56:33 +00:00
Tom Lane 77947c51c0 Fix up pgstats counting of live and dead tuples to recognize that committed
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.

Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
2007-05-27 03:50:39 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane a78fcfb512 Restructure operator classes to allow improved handling of cross-data-type
cases.  Operator classes now exist within "operator families".  While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.

This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later.  Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default.  I owe some more documentation work, too.  But that can all
be done in smaller pieces once this infrastructure is in place.
2006-12-23 00:43:13 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane 09d3670df3 Change the relation_open protocol so that we obtain lock on a relation
(table or index) before trying to open its relcache entry.  This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load.  Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.

Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
2006-07-31 20:09:10 +00:00
Tom Lane 09cb5c0e7d Rewrite btree index scans to work a page at a time in all cases (both
btgettuple and btgetmulti).  This eliminates the problem of "re-finding" the
exact stopping point, since the stopping point is effectively always a page
boundary, and index items are never moved across pre-existing page boundaries.
A small penalty is that the keys_are_unique optimization is effectively
disabled (and, therefore, is removed in this patch), causing us to apply
_bt_checkkeys() to at least one more tuple than necessary when looking up a
unique key.  However, the advantages for non-unique cases seem great enough to
accept this tradeoff.  Aside from simplifying and (sometimes) speeding up the
indexscan code, this will allow us to reimplement btbulkdelete as a largely
sequential scan instead of index-order traversal, thereby significantly
reducing the cost of VACUUM.  Those changes will come in a separate patch.

Original patch by Heikki Linnakangas, rework by Tom Lane.
2006-05-07 01:21:30 +00:00
Tom Lane e57345975c Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion.
This formulation requires every AM to provide amvacuumcleanup, unlike before,
but it's surely a whole lot cleaner.  Also, add an 'amstorage' column to
pg_am so that we can get rid of hardwired knowledge in DefineOpClass().
2006-05-02 22:25:10 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane fd267c1ebc Skip ambulkdelete scan if there's nothing to delete and the index is not
partial.  None of the existing AMs do anything useful except counting
tuples when there's nothing to delete, and we can get a tuple count
from the heap as long as it's not a partial index.  (hash actually can
skip anyway because it maintains a tuple count in the index metapage.)
GIST is not currently able to exploit this optimization because, due to
failure to index NULLs, GIST is always effectively partial.  Possibly
we should fix that sometime.
Simon Riggs w/ some review by Tom Lane.
2006-02-11 23:31:34 +00:00
Bruce Momjian 77bb65d3fc Revert based on Tom's recommendation:
> Allow VACUUM to complete faster by avoiding scanning the indexes when no
> rows were removed from the heap by the VACUUM.
2006-02-11 17:14:09 +00:00
Bruce Momjian bf324946b3 Allow VACUUM to complete faster by avoiding scanning the indexes when no
rows were removed from the heap by the VACUUM.

Simon Riggs
2006-02-11 16:59:09 +00:00
Tom Lane a98871b7ac Tweak indexscan machinery to avoid taking an AccessShareLock on an index
if we already have a stronger lock due to the index's table being the
update target table of the query.  Same optimization I applied earlier
at the table level.  There doesn't seem to be much interest in the more
radical idea of not locking indexes at all, so do what we can ...
2005-12-03 05:51:03 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane cb8b6618ce Revise pgstats stuff to fix the problems with not counting accesses
generated by bitmap index scans.  Along the way, simplify and speed up
the code for counting sequential and index scans; it was both confusing
and inefficient to be taking care of that in the per-tuple loops, IMHO.
initdb forced because of internal changes in pg_stat view definitions.
2005-10-06 02:29:23 +00:00
Teodor Sigaev e8cab5fe49 Concurrency for GiST
- full concurrency for insert/update/select/vacuum:
        - select and vacuum never locks more than one page simultaneously
        - select (gettuple) hasn't any lock across it's calls
        - insert never locks more than two page simultaneously:
                - during search of leaf to insert it locks only one page
                  simultaneously
                - while walk upward to the root it locked only parent (may be
                  non-direct parent) and child. One of them X-lock, another may
                  be S- or X-lock
- 'vacuum full' locks index
- improve gistgetmulti
- simplify XLOG records

Fix bug in index_beginscan_internal: LockRelation may clean
  rd_aminfo structure, so move GET_REL_PROCEDURE after LockRelation
2005-06-27 12:45:23 +00:00
Tom Lane c186c93148 Change the planner to allow indexscan qualification clauses to use
nonconsecutive columns of a multicolumn index, as per discussion around
mid-May (pghackers thread "Best way to scan on-disk bitmaps").  This
turns out to require only minimal changes in btree, and so far as I can
see none at all in GiST.  btcostestimate did need some work, but its
original assumption that index selectivity == heap selectivity was
quite bogus even before this.
2005-06-13 23:14:49 +00:00
Tom Lane 32e8fc4a28 Arrange to cache fmgr lookup information for an index's access method
routines in the index's relcache entry, instead of doing a fresh fmgr_info
on every index access.  We were already doing this for the index's opclass
support functions; not sure why we didn't think to do it for the AM
functions too.  This supersedes the former method of caching (only)
amgettuple in indexscan scan descriptors; it's an improvement because the
function lookup can be amortized across multiple statements instead of
being repeated for each statement.  Even though lookup for builtin
functions is pretty cheap, this seems to drop a percent or two off some
simple benchmarks.
2005-05-27 23:31:21 +00:00
Tom Lane 2ef172a2a4 Fix latent bug in ExecSeqRestrPos: it leaves the plan node's result slot
in an inconsistent state.  (This is only latent because in reality
ExecSeqRestrPos is dead code at the moment ... but someday maybe it won't
be.)  Add some comments about what the API for plan node mark/restore
actually is, because it's not immediately obvious.
2005-05-15 21:19:55 +00:00
Tom Lane 162bd08b3f Completion of project to use fixed OIDs for all system catalogs and
indexes.  Replace all heap_openr and index_openr calls by heap_open
and index_open.  Remove runtime lookups of catalog OID numbers in
various places.  Remove relcache's support for looking up system
catalogs by name.  Bulky but mostly very boring patch ...
2005-04-14 20:03:27 +00:00
Tom Lane bf3dbb5881 First steps towards index scans with heap access decoupled from index
access: define new index access method functions 'amgetmulti' that can
fetch multiple TIDs per call.  (The functions exist but are totally
untested as yet.)  Since I was modifying pg_am anyway, remove the
no-longer-needed 'rel' parameter from amcostestimate functions, and
also remove the vestigial amowner column that was creating useless
work for Alvaro's shared-object-dependencies project.
Initdb forced due to changes in pg_am.
2005-03-27 23:53:05 +00:00
Tom Lane ee4ddac137 Convert index-related tuple handling routines from char 'n'/' ' to bool
convention for isnull flags.  Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
2005-03-21 01:24:04 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane 9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Tom Lane d2af5f8a3e Adjust index locking rules as per my proposal of earlier today. You
now are supposed to take some kind of lock on an index whenever you
are going to access the index contents, rather than relying only on a
lock on the parent table.
2004-09-30 23:21:26 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane 37fa3b6c89 Tweak indexscan and seqscan code to arrange that steps from one page to
the next are handled by ReleaseAndReadBuffer rather than separate
ReleaseBuffer and ReadBuffer calls.  This cuts the number of acquisitions
of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens
to pull successive rows from the same heap page).  Unfortunately this
doesn't seem enough to get us out of the recently discussed context-switch
storm problem, but it's surely worth doing anyway.
2004-04-21 18:24:26 +00:00
PostgreSQL Daemon 969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Peter Eisentraut feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Bruce Momjian 46785776c4 Another pgindent run with updated typedefs. 2003-08-08 21:42:59 +00:00
Bruce Momjian f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian 089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane ec7aa4b515 Error message editing in backend/access. 2003-07-21 20:29:40 +00:00
Tom Lane fddc2d94ce Modify keys_are_unique optimization to release buffer pins before it
returns NULL.  This avoids out-of-buffers failures during many-way
indexscans, as in Shraibman's complaint of 21-Mar.
2003-03-24 21:42:33 +00:00
Tom Lane 0489783011 Adjust amrescan code so that it's allowed to call index_rescan with a
NULL key pointer, indicating that the existing scan key should be reused.
This behavior isn't used yet but will be needed for my planned fix to
the keys_are_unique code.
2003-03-23 23:01:03 +00:00
Tom Lane 799bc58dc7 More infrastructure for btree compaction project. Tree-traversal code
now knows what to do upon hitting a dead page (in theory anyway, it's
untested...).  Add a post-VACUUM-cleanup entry point for index AMs, to
provide a place for dead-page scavenging to happen.
Also, fix oversight that broke btpo_prev links in temporary indexes.
initdb forced due to additions in pg_am.
2003-02-22 00:45:05 +00:00
Tom Lane cbca6c4896 Fix for bug #866. 7.3 contains new logic for avoiding redundant calls to
the index AM when we know we are fetching a unique row.  However, this
logic did not consider the possibility that it would be asked to fetch
backwards.  Also fix mark/restore to work correctly in this scenario.
2003-01-08 19:41:40 +00:00
Bruce Momjian e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Bruce Momjian d84fe82230 Update copyright to 2002. 2002-06-20 20:29:54 +00:00
Tom Lane de09da547a Wups, managed to break ANALYZE with one aspect of that heap_fetch change. 2002-05-24 19:52:43 +00:00
Tom Lane 3f4d488022 Mark index entries "killed" when they are no longer visible to any
transaction, so as to avoid returning them out of the index AM.  Saves
repeated heap_fetch operations on frequently-updated rows.  Also detect
queries on unique keys (equality to all columns of a unique index), and
don't bother continuing scan once we have found first match.

Killing is implemented in the btree and hash AMs, but not yet in rtree
or gist, because there isn't an equally convenient place to do it in
those AMs (the outer amgetnext routine can't do it without re-pinning
the index page).

Did some small cleanup on APIs of HeapTupleSatisfies, heap_fetch, and
index_insert to make this a little easier.
2002-05-24 18:57:57 +00:00
Tom Lane 44fbe20d62 Restructure indexscan API (index_beginscan, index_getnext) per
yesterday's proposal to pghackers.  Also remove unnecessary parameters
to heap_beginscan, heap_rescan.  I modified pg_proc.h to reflect the
new numbers of parameters for the AM interface routines, but did not
force an initdb because nothing actually looks at those fields.
2002-05-20 23:51:44 +00:00
Tom Lane 27a54ae282 Opclasses live in namespaces. I also took the opportunity to create
an 'opclass owner' column in pg_opclass.  Nothing is done with it at
present, but since there are plans to invent a CREATE OPERATOR CLASS
command soon, we'll probably want DROP OPERATOR CLASS too, which
suggests that a notion of ownership would be a good idea.
2002-04-17 20:57:57 +00:00
Tom Lane 1dbf8aa7a8 pg_class has a relnamespace column. You can create and access tables
in schemas other than the system namespace; however, there's no search
path yet, and not all operations work yet on tables outside the system
namespace.
2002-03-26 19:17:02 +00:00
Tom Lane 7d05310828 Fix problem reported by Alex Korn: if a relation has been dropped and
recreated since the start of our transaction, our first reference to it
errored out because we'd try to reuse our old relcache entry for it.
Do this by accepting SI inval messages just before relcache search in
heap_openr, so that dead relcache entries will be flushed before we
search.  Also, break heap_open/openr into two pairs of routines,
relation_open(r) and heap_open(r).  The relation_open routines make
no tests on relkind and so can be used to open anything that has a
pg_class entry.  The heap_open routines are wrappers that add a relkind
test to preserve their established behavior.  Use the relation_open
routines in several places that had various kluge solutions for opening
rels that might be either heap or index rels.

Also, remove the old 'heap stats' code that's been superseded by Jan's
stats collector, and clean up some inconsistencies in error reporting
between the different types of ALTER TABLE.
2001-11-02 16:30:29 +00:00
Bruce Momjian b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Tom Lane 85801a4dbd Rearrange fmgr.c and relcache so that it's possible to keep FmgrInfo
lookup info in the relcache for index access method support functions.
This makes a huge difference for dynamically loaded support functions,
and should save a few cycles even for built-in ones.  Also tweak dfmgr.c
so that load_external_function is called only once, not twice, when
doing fmgr_info for a dynamically loaded function.  All per performance
gripe from Teodor Sigaev, 5-Oct-01.
2001-10-06 23:21:45 +00:00
Tom Lane c8076f09d2 Restructure index AM interface for index building and index tuple deletion,
per previous discussion on pghackers.  Most of the duplicate code in
different AMs' ambuild routines has been moved out to a common routine
in index.c; this means that all index types now do the right things about
inserting recently-dead tuples, etc.  (I also removed support for EXTEND
INDEX in the ambuild routines, since that's about to go away anyway, and
it cluttered the code a lot.)  The retail indextuple deletion routines have
been replaced by a "bulk delete" routine in which the indexscan is inside
the access method.  I haven't pushed this change as far as it should go yet,
but it should allow considerable simplification of the internal bookkeeping
for deletions.  Also, add flag columns to pg_am to eliminate various
hardcoded tests on AM OIDs, and remove unused pg_am columns.

Fix rtree and gist index types to not attempt to store NULLs; before this,
gist usually crashed, while rtree managed not to crash but computed wacko
bounding boxes for NULL entries (which might have had something to do with
the performance problems we've heard about occasionally).

Add AtEOXact routines to hash, rtree, and gist, all of which have static
state that needs to be reset after an error.  We discovered this need long
ago for btree, but missed the other guys.

Oh, one more thing: concurrent VACUUM is now the default.
2001-07-15 22:48:19 +00:00
Jan Wieck 8d80b0d980 Statistical system views (yet without the config stuff, but
it's hard to keep such massive changes in sync with the tree
so I need to get it in and work from there now).

Jan
2001-06-22 19:16:24 +00:00
Tom Lane 0b370ea7c8 Clean up some minor problems exposed by further thought about Panon's bug
report on old-style functions invoked by RI triggers.  We had a number of
other places that were being sloppy about which memory context FmgrInfo
subsidiary data will be allocated in.  Turns out none of them actually
cause a problem in 7.1, but this is for arcane reasons such as the fact
that old-style triggers aren't supported anyway.  To avoid getting burnt
later, I've restructured the trigger support so that we don't keep trigger
FmgrInfo structs in relcache memory.  Some other related cleanups too:
it's not really necessary to call fmgr_info at all while setting up
the index support info in relcache entries, because those ScanKeyEntry
structs are never used to invoke the functions.  This should speed up
relcache initialization a tiny bit.
2001-06-01 02:41:36 +00:00
Tom Lane 3043810d97 Updates to make GIST work with multi-key indexes (from Oleg Bartunov
and Teodor Sigaev).  Declare key values as Datum where appropriate,
rather than char* (Tom Lane).
2001-05-31 18:16:55 +00:00
Bruce Momjian 0686d49da0 Remove dashes in comments that don't need them, rewrap with pgindent. 2001-03-22 06:16:21 +00:00
Bruce Momjian 623bf843d2 Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. 2001-01-24 19:43:33 +00:00
Tom Lane 6bfe64032e Cleanup of code for creating index entries. Functional indexes with
pass-by-ref data types --- eg, an index on lower(textfield) --- no longer
leak memory during index creation or update.  Clean up a lot of redundant
code ... did you know that copy, vacuum, truncate, reindex, extend index,
and bootstrap each basically duplicated the main executor's logic for
extracting information about an index and preparing index entries?
Functional indexes should be a little faster now too, due to removal
of repeated function lookups.
CREATE INDEX 'opt_type' clause is deimplemented by these changes,
but I haven't removed it from the parser yet (need to merge with
Thomas' latest change set first).
2000-07-14 22:18:02 +00:00
Tom Lane f2d1205322 Another batch of fmgr updates. I think I have gotten all old-style
functions that take pass-by-value datatypes.  Should be ready for
port testing ...
2000-06-13 07:35:40 +00:00
Tom Lane 0f1e39643d Third round of fmgr updates: eliminate calls using fmgr() and
fmgr_faddr() in favor of new-style calls.  Lots of cleanup of
sloppy casts to use XXXGetDatum and DatumGetXXX ...
2000-05-30 04:25:00 +00:00
Tom Lane 0a7fb4e918 First round of changes for new fmgr interface. fmgr itself and the
key call sites are changed, but most called functions are still oldstyle.
An exception is that the PL managers are updated (so, for example, NULL
handling now behaves as expected in plperl and plpgsql functions).
NOTE initdb is forced due to added column in pg_proc.
2000-05-28 17:56:29 +00:00
Bruce Momjian 52f77df613 Ye-old pgindent run. Same 4-space tabs. 2000-04-12 17:17:23 +00:00
Tom Lane 34235a295b Cache fmgr lookup data for index's getnext() function in IndexScanDesc,
so that the fmgr lookup only has to happen once per index scan and not
once per tuple.  Seems to save 5% or so of CPU time for an indexscan.
2000-03-14 23:52:01 +00:00
Bruce Momjian 5c25d60244 Add:
* Portions Copyright (c) 1996-2000, PostgreSQL, Inc

to all files copyright Regents of Berkeley.  Man, that's a lot of files.
2000-01-26 05:58:53 +00:00
Tom Lane 71ed7eb494 Revise handling of index-type-specific indexscan cost estimation, per
pghackers discussion of 5-Jan-2000.  The amopselect and amopnpages
estimators are gone, and in their place is a per-AM amcostestimate
procedure (linked to from pg_am, not pg_amop).
2000-01-22 23:50:30 +00:00
Tom Lane 9c95f8c9b2 Repair bugs discussed in pghackers thread of 15 May 1999: creation of a
relcache entry no longer leaks a small amount of memory.  index_endscan
now releases all the memory acquired by index_beginscan, so callers of it
should NOT pfree the scan descriptor anymore.
1999-12-30 05:05:13 +00:00