If the hash table backing a catalog cache becomes too full (fillfactor > 2),
enlarge it. A new buckets array, double the size of the old, is allocated,
and all entries in the old hash are moved to the right bucket in the new
hash.
This has two benefits. First, cache lookups don't get so expensive when
there are lots of entries in a cache, like if you access hundreds of
thousands of tables. Second, we can make the (initial) sizes of the caches
much smaller, which saves memory.
This patch dials down the initial sizes of the catcaches. The new sizes are
chosen so that a backend that only runs a few basic queries still won't need
to enlarge any of them.
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after
do not need access to the list header, and indeed insisting on that negates
one of the main advantages of a doubly-linked list.
In consequence, revert addition of "cache_bucket" field to CatCTup.
Provide a common implementation of embedded singly-linked and
doubly-linked lists. "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.
Most of the implementation uses inlineable functions (where supported),
for performance.
Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes. Other uses can (and probably will) be
converted in the future. Since dllist.c is unused after this conversion,
it has been removed.
Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
Now that cache invalidation callbacks get only a hash value, and not a
tuple TID (per commits 632ae6829f and
b5282aa893), the only way they can restrict
what they invalidate is to know what the hash values mean. setrefs.c was
doing this via a hard-wired assumption but that seems pretty grotty, and
it'll only get worse as more cases come up. So let's expose a calculation
function that takes the same parameters as SearchSysCache. Per complaint
from Marko Kreen.
This requires adjusting the API for syscache callback functions: they now
get a hash value, not a TID, to identify the target tuple. Most of them
weren't paying any attention to that argument anyway, but plancache did
require a small amount of fixing.
Also, improve performance a trifle by avoiding sending duplicate inval
messages when a heap_update isn't changing the catcache lookup columns.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
needed by nothing else.
The restructuring I just finished doing on cache management exposed to me how
silly this routine was. Its function was to go into the catcache and blow
away all entries related to a given relation when there was a relcache flush
on that relation. However, there is no point in removing a catcache entry
if the catalog row it represents is still valid --- and if it isn't valid,
there must have been a catcache entry flush on it, because that's triggered
directly by heap_update or heap_delete on the catalog row. So this routine
accomplished nothing except to blow away valid cache entries that we'd very
likely be wanting in the near future to help reconstruct the relcache entry.
Dumb.
On top of which, it required a subtle and easy-to-get-wrong attribute in
syscache definitions, ie, the column containing the OID of the related
relation if any. Removing that is a very useful maintenance simplification.
of shared or nailed system catalogs. This has two key benefits:
* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
* We no longer have to use an unsafe reindex-in-place approach for reindexing
shared catalogs.
CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.
Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.
This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch. As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
been initialized yet. This can happen because there are code paths that call
SysCacheGetAttr() on a tuple originally fetched from a different syscache
(hopefully on the same catalog) than the one specified in the call. It
doesn't seem useful or robust to try to prevent that from happening, so just
improve the function to cope instead. Per bug#2678 from Jeff Trout. The
specific example shown by Jeff is new in 8.1, but to be on the safe side
I'm backpatching 8.0 as well. We could patch 7.x similarly but I think
that's probably overkill, given the lack of evidence of old bugs of this ilk.
remove the infrastructure needed to enforce the limit, ie, the global
LRU list of cache entries. On small-to-middling databases this wins
because maintaining the LRU list is a waste of time. On large databases
this wins because it's better to keep more cache entries (we assume
such users can afford to use some more per-backend memory than was
contemplated in the Berkeley-era catcache design). This provides a
noticeable improvement in the speed of psql \d on a 10000-table
database, though it doesn't make it instantaneous.
While at it, use per-catcache settings for the number of hash buckets
per catcache, rather than the former one-size-fits-all value. It's a
bit silly to be using the same number of hash buckets for, eg, pg_am
and pg_attribute. The specific values I used might need some tuning,
but they seem to be in the right ballpark based on CATCACHE_STATS
results from the standard regression tests.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
SearchCatCacheList and ReleaseCatCacheList. Previously, we incremented
and decremented the refcounts of list member tuples along with the list
itself, but that's unnecessary, and very expensive when the list is big.
It's cheaper to change only the list refcount. When we are considering
deleting a cache entry, we have to check not only its own refcount but
its parent list's ... but it's easy to arrange the code so that this
check is not made in any commonly-used paths, so the cost is really nil.
The bigger gain though is to refrain from DLMoveToFront'ing each individual
member tuple each time the list is referenced. To keep some semblance
of fair space management, lists are just marked as used or not since the
last cache cleanout search, and we do a MoveToFront pass only when about
to run a cleanout. In combination, these changes reduce the costs of
SearchCatCacheList and ReleaseCatCacheList from about 4.5% of pgbench
runtime to under 1%, according to my gprof results.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
when open references remain during normal cleanup of a resource owner.
This restores the system's ability to warn about leaks to what it was
before 8.0. Not really a user-level bug, but helpful for development.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
keep track of portal-related resources separately from transaction-related
resources. This allows cursors to work in a somewhat sane fashion with
nested transactions. For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor. We might want to change that later.
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.
Alvaro Herrera, with some help from Tom Lane.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number. Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations. I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
in schemas other than the system namespace; however, there's no search
path yet, and not all operations work yet on tables outside the system
namespace.
PostgreSQL. This hash function replaces the one used by hash indexes and
the catalog cache. Hash joins use a different, relatively poor-quality
hash function, but I'll fix that later.
As suggested by Tom Lane, this patch also changes the size of the fixed
hash table used by the catalog cache to be a power-of-2 (instead of a
prime: I chose 256 instead of 257). This allows the catcache to lookup
hash buckets using a simple bitmask. This should improve the performance
of the catalog cache slightly, since the previous method (modulo a
prime) was slow.
In my tests, this improves the performance of hash indexes by between 4%
and 8%; the performance when using btree indexes or seqscans is
basically unchanged.
Neil Conway <neilconway@rogers.com>
speed up repetitive failed searches; per pghackers discussion in late
January. inval.c logic substantially simplified, since we can now treat
inserts and deletes alike as far as inval events are concerned. Some
repair work needed in heap_create_with_catalog, which turns out to have
been doing CommandCounterIncrement at a point where the new relation has
non-self-consistent catalog entries. With the new inval code, that
resulted in assert failures during a relcache entry rebuild.
Improve 'pg_internal.init' relcache entry preload mechanism so that it is
safe to use for all system catalogs, and arrange to preload a realistic
set of system-catalog entries instead of only the three nailed-in-cache
indexes that were formerly loaded this way. Fix mechanism for deleting
out-of-date pg_internal.init files: this must be synchronized with transaction
commit, not just done at random times within transactions. Drive it off
relcache invalidation mechanism so that no special-case tests are needed.
Cache additional information in relcache entries for indexes (their pg_index
tuples and index-operator OIDs) to eliminate repeated lookups. Also cache
index opclass info at the per-opclass level to avoid repeated lookups during
relcache load.
Generalize 'systable scan' utilities originally developed by Hiroshi,
move them into genam.c, use in a number of places where there was formerly
ugly code for choosing either heap or index scan. In particular this allows
simplification of the logic that prevents infinite recursion between syscache
and relcache during startup: we can easily switch to heapscans in relcache.c
when and where needed to avoid recursion, so IndexScanOK becomes simpler and
does not need any expensive initialization.
Eliminate useless opening of a heapscan data structure while doing an indexscan
(this saves an mdnblocks call and thus at least one kernel call).