Add a macro templatized hashtable.

dynahash.c hash tables aren't quite fast enough for some use-cases. There are several reasons for lacking performance: - the use of chaining for collision handling makes them cache inefficient, that's especially an issue when the tables get bigger. - as the element sizes for dynahash are only determined at runtime, offset computations are somewhat expensive - hash and element comparisons are indirect function calls, causing unnecessary pipeline stalls - it's two level structure has some benefits (somewhat natural partitioning), but increases the number of indirections to fix several of these the hash tables have to be adjusted to the individual use-case at compile-time. C unfortunately doesn't provide a good way to do compile code generation (like e.g. c++'s templates for all their weaknesses do). Thus the somewhat ugly approach taken here is to allow for code generation using a macro-templatized header file, which generates functions and types based on a prefix and other parameters. Later patches use this infrastructure to use such hash tables for tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation, ...). In queries where these use up a large fraction of the time, this has been measured to lead to performance improvements of over 100%. There are other cases where this could be useful (e.g. catcache.c). The hash table design chosen is a variant of linear open-addressing. The biggest disadvantage of simple linear addressing schemes are highly variable lookup times due to clustering, and deletions leaving a lot of tombstones around. To address these issues a variant of "robin hood" hashing is employed. Robin hood hashing optimizes chaining lengths by moving elements close to their optimal bucket ("rich" elements), out of the way if a to-be-inserted element is further away from its optimal position (i.e. it's "poor"). While that can make insertions slower, the average lookup performance is a lot better, and higher fill factors can be used in a still performant manner. To avoid tombstones - which normally solve the issue that a deleted node's presence is relevant to determine whether a lookup needs to continue looking or is done - buckets following a deleted element are shifted backwards, unless they're empty or already at their optimal position. There's further possible improvements that can be made to this implementation. Amongst others: - Use distance as a termination criteria during searches. This is generally a good idea, but I've been able to see the overhead of distance calculations in some cases. - Consider combining the 'empty' status into the hashvalue, and enforce storing the hashvalue. That could, in some cases, increase memory density and remove a few instructions. - Experiment further with the, very conservatively choosen, fillfactor. - Make maximum size of hashtable configurable, to allow storing very very large tables. That'd require 64bit hash values to be more common than now, though. - some smaller memcpy calls could be optimized to copy larger chunks But since the new implementation is already considerably faster than dynahash it seem sensible to start using it. Reviewed-By: Tomas Vondra Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
2016-10-14 16:05:30 -07:00 · 2016-10-14 16:05:30 -07:00 · b30d3ea824
parent aa3ca5e3dd
commit b30d3ea824
2 changed files with 881 additions and 0 deletions
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@ -0,0 +1,878 @@
+/*
+ * simplehash.h
+ *
+ *	  Hash table implementation which will be specialized to user-defined
+ *	  types, by including this file to generate the required code.  It's
+ *	  probably not worthwhile to do so for hash tables that aren't performance
+ *	  or space sensitive.
+ *
+ * Usage notes:
+ *
+ *	  To generate a hash-table and associated functions for a use case several
+ *	  macros have to be #define'ed before this file is included.  Including
+ *	  the file #undef's all those, so a new hash table can be generated
+ *	  afterwards.
+ *	  The relevant parameters are:
+ *	  - SH_PREFIX - prefix for all symbol names generated. A prefix of 'foo'
+ *		will result in hash table type 'foo_hash' and functions like
+ *		'foo_insert'/'foo_lookup' and so forth.
+ *	  - SH_ELEMENT_TYPE - type of the contained elements
+ *	  - SH_KEY_TYPE - type of the hashtable's key
+ *	  - SH_DECLARE - if defined function prototypes and type declarations are
+ *		generated
+ *	  - SH_DEFINE - if defined function definitions are generated
+ *	  - SH_SCOPE - in which scope (e.g. extern, static inline) do function
+ *		declarations reside
+ *	  The following parameters are only relevant when SH_DEFINE is defined:
+ *	  - SH_KEY - name of the element in SH_ELEMENT_TYPE containing the hash key
+ *	  - SH_EQUAL(table, a, b) - compare two table keys
+ *	  - SH_HASH_KEY(table, key) - generate hash for the key
+ *	  - SH_STORE_HASH - if defined the hash is stored in the elements
+ *	  - SH_GET_HASH(tb, a) - return the field to store the hash in
+ *
+ *	  For examples of usage look at simplehash.c (file local definition) and
+ *	  execnodes.h/execGrouping.c (exposed declaration, file local
+ *	  implementation).
+ *
+ * Hash table design:
+ *
+ *	  The hash table design chosen is a variant of linear open-addressing. The
+ *	  reason for doing so is that linear addressing is CPU cache & pipeline
+ *	  friendly. The biggest disadvantage of simple linear addressing schemes
+ *	  are highly variable lookup times due to clustering, and deletions
+ *	  leaving a lot of tombstones around.  To address these issues a variant
+ *	  of "robin hood" hashing is employed.  Robin hood hashing optimizes
+ *	  chaining lengths by moving elements close to their optimal bucket
+ *	  ("rich" elements), out of the way if a to-be-inserted element is further
+ *	  away from its optimal position (i.e. it's "poor").  While that can make
+ *	  insertions slower, the average lookup performance is a lot better, and
+ *	  higher fill factors can be used in a still performant manner.  To avoid
+ *	  tombstones - which normally solve the issue that a deleted node's
+ *	  presence is relevant to determine whether a lookup needs to continue
+ *	  looking or is done - buckets following a deleted element are shifted
+ *	  backwards, unless they're empty or already at their optimal position.
+ */
+
+/* helpers */
+#define SH_MAKE_PREFIX(a) CppConcat(a,_)
+#define SH_MAKE_NAME(name) SH_MAKE_NAME_(SH_MAKE_PREFIX(SH_PREFIX),name)
+#define SH_MAKE_NAME_(a,b) CppConcat(a,b)
+
+/* name macros for: */
+
+/* type declarations */
+#define SH_TYPE SH_MAKE_NAME(hash)
+#define SH_STATUS SH_MAKE_NAME(status)
+#define SH_STATUS_EMPTY SH_MAKE_NAME(EMPTY)
+#define SH_STATUS_IN_USE SH_MAKE_NAME(IN_USE)
+#define SH_ITERATOR SH_MAKE_NAME(iterator)
+
+/* function declarations */
+#define SH_CREATE SH_MAKE_NAME(create)
+#define SH_DESTROY SH_MAKE_NAME(destroy)
+#define SH_INSERT SH_MAKE_NAME(insert)
+#define SH_DELETE SH_MAKE_NAME(delete)
+#define SH_LOOKUP SH_MAKE_NAME(lookup)
+#define SH_GROW SH_MAKE_NAME(grow)
+#define SH_START_ITERATE SH_MAKE_NAME(start_iterate)
+#define SH_START_ITERATE_AT SH_MAKE_NAME(start_iterate_at)
+#define SH_ITERATE SH_MAKE_NAME(iterate)
+#define SH_STAT SH_MAKE_NAME(stat)
+
+/* internal helper functions (no externally visible prototypes) */
+#define SH_COMPUTE_PARAMETERS SH_MAKE_NAME(compute_parameters)
+#define SH_NEXT SH_MAKE_NAME(next)
+#define SH_PREV SH_MAKE_NAME(prev)
+#define SH_DISTANCE_FROM_OPTIMAL SH_MAKE_NAME(distance)
+#define SH_INITIAL_BUCKET SH_MAKE_NAME(initial_bucket)
+#define SH_ENTRY_HASH SH_MAKE_NAME(entry_hash)
+
+/* generate forward declarations necessary to use the hash table */
+#ifdef SH_DECLARE
+
+/* type definitions */
+typedef struct SH_TYPE
+{
+	/*
+	 * Size of data / bucket array, 64 bits to handle UINT32_MAX sized hash
+	 * tables.  Note that the maximum number of elements is lower
+	 * (SH_MAX_FILLFACTOR)
+	 */
+	uint64		size;
+
+	/* how many elements have valid contents */
+	uint32		members;
+
+	/* mask for bucket and size calculations, based on size */
+	uint32		sizemask;
+
+	/* boundary after which to grow hashtable */
+	uint32		grow_threshold;
+
+	/* hash buckets */
+	SH_ELEMENT_TYPE *data;
+
+	/* memory context to use for allocations */
+	MemoryContext ctx;
+
+	/* user defined data, useful for callbacks */
+	void	   *private;
+} SH_TYPE;
+
+typedef enum SH_STATUS
+{
+	SH_STATUS_EMPTY = 0x00,
+	SH_STATUS_IN_USE = 0x01
+} SH_STATUS;
+
+typedef struct SH_ITERATOR
+{
+	uint32		cur;			/* current element */
+	uint32		end;
+	bool		done;			/* iterator exhausted? */
+} SH_ITERATOR;
+
+/* externally visible function prototypes */
+SH_SCOPE SH_TYPE *SH_CREATE(MemoryContext ctx, uint32 nelements);
+SH_SCOPE void SH_DESTROY(SH_TYPE *tb);
+SH_SCOPE void SH_GROW(SH_TYPE *tb, uint32 newsize);
+SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE *tb, SH_KEY_TYPE key, bool *found);
+SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE *tb, SH_KEY_TYPE key);
+SH_SCOPE bool SH_DELETE(SH_TYPE *tb, SH_KEY_TYPE key);
+SH_SCOPE void SH_START_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter);
+SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE *tb, SH_ITERATOR *iter, uint32 at);
+SH_SCOPE SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter);
+SH_SCOPE void SH_STAT(SH_TYPE *tb);
+
+#endif   /* SH_DECLARE */
+
+
+/* generate implementation of the hash table */
+#ifdef SH_DEFINE
+
+#include "utils/memutils.h"
+
+/* conservative fillfactor for a robin hood table, might want to adjust */
+#define SH_FILLFACTOR (0.8)
+/* increase fillfactor if we otherwise would error out */
+#define SH_MAX_FILLFACTOR (0.95)
+/* max data array size,we allow up to PG_UINT32_MAX buckets, including 0 */
+#define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
+
+#ifdef SH_STORE_HASH
+#define SH_COMPARE_KEYS(tb, ahash, akey, b) (ahash == SH_GET_HASH(tb, b) && SH_EQUAL(tb, b->SH_KEY, akey))
+#else
+#define SH_COMPARE_KEYS(tb, ahash, akey, b) (SH_EQUAL(tb, b->SH_KEY, akey))
+#endif
+
+/* FIXME: can we move these to a central location? */
+
+/* calculate ceil(log base 2) of num */
+static inline uint64
+sh_log2(uint64 num)
+{
+	int			i;
+	uint64		limit;
+
+	for (i = 0, limit = 1; limit < num; i++, limit <<= 1)
+		;
+	return i;
+}
+
+/* calculate first power of 2 >= num */
+static inline uint64
+sh_pow2(uint64 num)
+{
+	return ((uint64) 1) << sh_log2(num);
+}
+
+/*
+ * Compute sizing parameters for hashtable. Called when creating and growing
+ * the hashtable.
+ */
+static inline void
+SH_COMPUTE_PARAMETERS(SH_TYPE *tb, uint32 newsize)
+{
+	uint64		size;
+
+	/* supporting zero sized hashes would complicate matters */
+	size = Max(newsize, 2);
+
+	/* round up size to the next power of 2, that's the bucketing works  */
+	size = sh_pow2(size);
+	Assert(size <= SH_MAX_SIZE);
+
+	/*
+	 * Verify allocation of ->data is possible on platform, without
+	 * overflowing Size.
+	 */
+	if ((((uint64) sizeof(SH_ELEMENT_TYPE)) * size) >= MaxAllocHugeSize)
+		elog(ERROR, "hash table too large");
+
+	/* now set size */
+	tb->size = size;
+
+	if (tb->size == SH_MAX_SIZE)
+		tb->sizemask = 0;
+	else
+		tb->sizemask = tb->size - 1;
+
+	/*
+	 * Compute growth threshold here and after growing the table, to make
+	 * computations during insert cheaper.
+	 */
+	if (tb->size == SH_MAX_SIZE)
+		tb->grow_threshold = ((double) tb->size) * SH_MAX_FILLFACTOR;
+	else
+		tb->grow_threshold = ((double) tb->size) * SH_FILLFACTOR;
+}
+
+/* return the optimal bucket for the hash */
+static inline uint32
+SH_INITIAL_BUCKET(SH_TYPE *tb, uint32 hash)
+{
+	return hash & tb->sizemask;
+}
+
+/* return next bucket after the current, handling wraparound */
+static inline uint32
+SH_NEXT(SH_TYPE *tb, uint32 curelem, uint32 startelem)
+{
+	curelem = (curelem + 1) & tb->sizemask;
+
+	Assert(curelem != startelem);
+
+	return curelem;
+}
+
+/* return bucket before the current, handling wraparound */
+static inline uint32
+SH_PREV(SH_TYPE *tb, uint32 curelem, uint32 startelem)
+{
+	curelem = (curelem - 1) & tb->sizemask;
+
+	Assert(curelem != startelem);
+
+	return curelem;
+}
+
+/* return distance between bucket and it's optimal position */
+static inline uint32
+SH_DISTANCE_FROM_OPTIMAL(SH_TYPE *tb, uint32 optimal, uint32 bucket)
+{
+	if (optimal <= bucket)
+		return bucket - optimal;
+	else
+		return (tb->size + bucket) - optimal;
+}
+
+static inline uint32
+SH_ENTRY_HASH(SH_TYPE *tb, SH_ELEMENT_TYPE * entry)
+{
+#ifdef SH_STORE_HASH
+	return SH_GET_HASH(tb, entry);
+#else
+	return SH_HASH_KEY(tb, entry->SH_KEY);
+#endif
+}
+
+/*
+ * Create a hash table with enough space for `nelements` distinct members,
+ * allocating required memory in the passed-in context.
+ */
+SH_SCOPE SH_TYPE *
+SH_CREATE(MemoryContext ctx, uint32 nelements)
+{
+	SH_TYPE    *tb;
+	uint64		size;
+
+	tb = MemoryContextAllocZero(ctx, sizeof(SH_TYPE));
+	tb->ctx = ctx;
+
+	/* increase nelements by fillfactor, want to store nelements elements */
+	size = Min((double) SH_MAX_SIZE, ((double) nelements) / SH_FILLFACTOR);
+
+	SH_COMPUTE_PARAMETERS(tb, size);
+
+	tb->data = MemoryContextAllocExtended(tb->ctx,
+										  sizeof(SH_ELEMENT_TYPE) * tb->size,
+										  MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
+
+	return tb;
+}
+
+/* destroy a previously created hash table */
+SH_SCOPE void
+SH_DESTROY(SH_TYPE *tb)
+{
+	pfree(tb->data);
+	pfree(tb);
+}
+
+/*
+ * Grow a hash table to at least `newsize` buckets.
+ *
+ * Usually this will automatically be called by insertions/deletions, when
+ * necessary. But resizing to the exact input size can be advantageous
+ * performance-wise, when known at some point.
+ */
+SH_SCOPE void
+SH_GROW(SH_TYPE *tb, uint32 newsize)
+{
+	uint64		oldsize = tb->size;
+	SH_ELEMENT_TYPE *olddata = tb->data;
+	SH_ELEMENT_TYPE *newdata;
+	uint32		i;
+	uint32		startelem = 0;
+	uint32		copyelem;
+
+	Assert(oldsize == sh_pow2(oldsize));
+	Assert(oldsize != SH_MAX_SIZE);
+	Assert(oldsize < newsize);
+
+	/* compute parameters for new table */
+	SH_COMPUTE_PARAMETERS(tb, newsize);
+
+	tb->data = MemoryContextAllocExtended(
+								 tb->ctx, sizeof(SH_ELEMENT_TYPE) * tb->size,
+										  MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
+
+	newdata = tb->data;
+
+	/*
+	 * Copy entries from the old data to newdata. We theoretically could use
+	 * SH_INSERT here, to avoid code duplication, but that's more general than
+	 * we need. We neither want tb->members increased, nor do we need to do
+	 * deal with deleted elements, nor do we need to compare keys. So a
+	 * special-cased implementation is lot faster. As resizing can be time
+	 * consuming and frequent, that's worthwile to optimize.
+	 *
+	 * To be able to simply move entries over, we have to start not at the
+	 * first bucket (i.e olddata[0]), but find the first bucket that's either
+	 * empty, or is occupied by an entry at it's optimal position. Such a
+	 * bucket has to exist in any table with a load factor under 1, as not all
+	 * buckets are occupied, i.e. there always has to be an empty bucket.  By
+	 * starting at such a bucket we can move the entries to the larger table,
+	 * without having to deal with conflicts.
+	 */
+
+	/* search for the first element in the hash that's not wrapped around */
+	for (i = 0; i < oldsize; i++)
+	{
+		SH_ELEMENT_TYPE *oldentry = &olddata[i];
+		uint32		hash;
+		uint32		optimal;
+
+		if (oldentry->status != SH_STATUS_IN_USE)
+		{
+			startelem = i;
+			break;
+		}
+
+		hash = SH_ENTRY_HASH(tb, oldentry);
+		optimal = SH_INITIAL_BUCKET(tb, hash);
+
+		if (optimal == i)
+		{
+			startelem = i;
+			break;
+		}
+	}
+
+	/* and copy all elements in the old table */
+	copyelem = startelem;
+	for (i = 0; i < oldsize; i++)
+	{
+		SH_ELEMENT_TYPE *oldentry = &olddata[copyelem];
+
+		if (oldentry->status == SH_STATUS_IN_USE)
+		{
+			uint32		hash;
+			uint32		startelem;
+			uint32		curelem;
+			SH_ELEMENT_TYPE *newentry;
+
+			hash = SH_ENTRY_HASH(tb, oldentry);
+			startelem = SH_INITIAL_BUCKET(tb, hash);
+			curelem = startelem;
+
+			/* find empty element to put data into */
+			while (true)
+			{
+				newentry = &newdata[curelem];
+
+				if (newentry->status == SH_STATUS_EMPTY)
+				{
+					break;
+				}
+
+				curelem = SH_NEXT(tb, curelem, startelem);
+			}
+
+			/* copy entry to new slot */
+			memcpy(newentry, oldentry, sizeof(SH_ELEMENT_TYPE));
+		}
+
+		/* can't use SH_NEXT here, would use new size */
+		copyelem++;
+		if (copyelem >= oldsize)
+		{
+			copyelem = 0;
+		}
+	}
+
+	pfree(olddata);
+}
+
+/*
+ * Insert the key key into the hash-table, set *found to true if the key
+ * already exists, false otherwise. Returns the hash-table entry in either
+ * case.
+ */
+SH_SCOPE SH_ELEMENT_TYPE *
+SH_INSERT(SH_TYPE *tb, SH_KEY_TYPE key, bool *found)
+{
+	uint32		hash = SH_HASH_KEY(tb, key);
+	uint32		startelem;
+	uint32		curelem;
+	SH_ELEMENT_TYPE *data;
+	uint32		insertdist = 0;
+
+	/*
+	 * We do the grow check even if the key is actually present, to avoid
+	 * doing the check inside the loop. This also lets us avoid having to
+	 * re-find our position in the hashtable after resizing.
+	 */
+	if (unlikely(tb->members >= tb->grow_threshold))
+	{
+		if (tb->size == SH_MAX_SIZE)
+		{
+			elog(ERROR, "hash table size exceeded");
+		}
+
+		/*
+		 * When optimizing, it can be very useful to print these out.
+		 */
+		/* SH_STAT(tb); */
+		SH_GROW(tb, tb->size * 2);
+		/* SH_STAT(tb); */
+	}
+
+	/* perform insert, start bucket search at optimal location */
+	data = tb->data;
+	startelem = SH_INITIAL_BUCKET(tb, hash);
+	curelem = startelem;
+	while (true)
+	{
+		uint32		curdist;
+		uint32		curhash;
+		uint32		curoptimal;
+		SH_ELEMENT_TYPE *entry = &data[curelem];
+
+		/* any empty bucket can directly be used */
+		if (entry->status == SH_STATUS_EMPTY)
+		{
+			tb->members++;
+			entry->SH_KEY = key;
+#ifdef SH_STORE_HASH
+			SH_GET_HASH(tb, entry) = hash;
+#endif
+			entry->status = SH_STATUS_IN_USE;
+			*found = false;
+			return entry;
+		}
+
+		/*
+		 * If the bucket is not empty, we either found a match (in which case
+		 * we're done), or we have to decide whether to skip over or move the
+		 * colliding entry. When the the colliding elements distance to it's
+		 * optimal position is smaller than the to-be-inserted entry's, we
+		 * shift the colliding entry (and it's followers) forward by one.
+		 */
+
+		if (SH_COMPARE_KEYS(tb, hash, key, entry))
+		{
+			Assert(entry->status == SH_STATUS_IN_USE);
+			*found = true;
+			return entry;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, entry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+		curdist = SH_DISTANCE_FROM_OPTIMAL(tb, curoptimal, curelem);
+
+		if (insertdist > curdist)
+		{
+			SH_ELEMENT_TYPE *lastentry = entry;
+			uint32		emptyelem = curelem;
+			uint32		moveelem;
+
+			/* find next empty bucket */
+			while (true)
+			{
+				SH_ELEMENT_TYPE *emptyentry;
+
+				emptyelem = SH_NEXT(tb, emptyelem, startelem);
+				emptyentry = &data[emptyelem];
+
+				if (emptyentry->status == SH_STATUS_EMPTY)
+				{
+					lastentry = emptyentry;
+					break;
+				}
+			}
+
+			/* shift forward, starting at last occupied element */
+
+			/*
+			 * TODO: This could be optimized to be one memcpy in may cases,
+			 * excepting wrapping around at the end of ->data. Hasn't shown up
+			 * in profiles so far though.
+			 */
+			moveelem = emptyelem;
+			while (moveelem != curelem)
+			{
+				SH_ELEMENT_TYPE *moveentry;
+
+				moveelem = SH_PREV(tb, moveelem, startelem);
+				moveentry = &data[moveelem];
+
+				memcpy(lastentry, moveentry, sizeof(SH_ELEMENT_TYPE));
+				lastentry = moveentry;
+			}
+
+			/* and fill the now empty spot */
+			tb->members++;
+
+			entry->SH_KEY = key;
+#ifdef SH_STORE_HASH
+			SH_GET_HASH(tb, entry) = hash;
+#endif
+			entry->status = SH_STATUS_IN_USE;
+			*found = false;
+			return entry;
+		}
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		insertdist++;
+	}
+}
+
+/*
+ * Lookup up entry in hash table.  Returns NULL if key not present.
+ */
+SH_SCOPE SH_ELEMENT_TYPE *
+SH_LOOKUP(SH_TYPE *tb, SH_KEY_TYPE key)
+{
+	uint32		hash = SH_HASH_KEY(tb, key);
+	const uint32 startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem = startelem;
+
+	while (true)
+	{
+		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
+
+		if (entry->status == SH_STATUS_EMPTY)
+		{
+			return NULL;
+		}
+
+		Assert(entry->status == SH_STATUS_IN_USE);
+
+		if (SH_COMPARE_KEYS(tb, hash, key, entry))
+			return entry;
+
+		/*
+		 * TODO: we could stop search based on distance. If the current
+		 * buckets's distance-from-optimal is smaller than what we've skipped
+		 * already, the entry doesn't exist. Probably only do so if
+		 * SH_STORE_HASH is defined, to avoid re-computing hashes?
+		 */
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+	}
+}
+
+/*
+ * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * present.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE *tb, SH_KEY_TYPE key)
+{
+	uint32		hash = SH_HASH_KEY(tb, key);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem = startelem;
+
+	while (true)
+	{
+		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
+
+		if (entry->status == SH_STATUS_EMPTY)
+			return false;
+
+		if (entry->status == SH_STATUS_IN_USE &&
+			SH_COMPARE_KEYS(tb, hash, key, entry))
+		{
+			SH_ELEMENT_TYPE *lastentry = entry;
+
+			tb->members--;
+
+			/*
+			 * Backward shift following elements till either an empty element
+			 * or an element at its optimal position is encounterered.
+			 *
+			 * While that sounds expensive, the average chain length is short,
+			 * and deletions would otherwise require toombstones.
+			 */
+			while (true)
+			{
+				SH_ELEMENT_TYPE *curentry;
+				uint32		curhash;
+				uint32		curoptimal;
+
+				curelem = SH_NEXT(tb, curelem, startelem);
+				curentry = &tb->data[curelem];
+
+				if (curentry->status != SH_STATUS_IN_USE)
+				{
+					lastentry->status = SH_STATUS_EMPTY;
+					break;
+				}
+
+				curhash = SH_ENTRY_HASH(tb, curentry);
+				curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+				/* current is at optimal position, done */
+				if (curoptimal == curelem)
+				{
+					lastentry->status = SH_STATUS_EMPTY;
+					break;
+				}
+
+				/* shift */
+				memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+				lastentry = curentry;
+			}
+
+			return true;
+		}
+
+		/* TODO: return false; if distance too big */
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+	}
+}
+
+/*
+ * Initialize iterator.
+ */
+SH_SCOPE void
+SH_START_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter)
+{
+	int			i;
+	uint64		startelem = PG_UINT64_MAX;
+
+	/*
+	 * Search for the first empty element. As deletions during iterations are
+	 * supported, we want to start/end at an element that cannot be affected
+	 * by elements being shifted.
+	 */
+	for (i = 0; i < tb->size; i++)
+	{
+		SH_ELEMENT_TYPE *entry = &tb->data[i];
+
+		if (entry->status != SH_STATUS_IN_USE)
+		{
+			startelem = i;
+			break;
+		}
+	}
+
+	Assert(startelem < SH_MAX_SIZE);
+
+	/*
+	 * Iterate backwards, that allows the current element to be deleted, even
+	 * if there are backward shifts
+	 */
+	iter->cur = startelem;
+	iter->end = iter->cur;
+	iter->done = false;
+}
+
+/*
+ * Initialize iterator to a specific bucket. That's really only useful for
+ * cases where callers are partially iterating over the hashspace, and that
+ * iteration deletes and inserts elements based on visited entries. Doing that
+ * repeatedly could lead to an unbalanced keyspace when always starting at the
+ * same position.
+ */
+SH_SCOPE void
+SH_START_ITERATE_AT(SH_TYPE *tb, SH_ITERATOR *iter, uint32 at)
+{
+	/*
+	 * Iterate backwards, that allows the current element to be deleted, even
+	 * if there are backward shifts.
+	 */
+	iter->cur = at & tb->sizemask;		/* ensure at is within a valid range */
+	iter->end = iter->cur;
+	iter->done = false;
+}
+
+/*
+ * Iterate over all entries in the hash-table. Return the next occupied entry,
+ * or NULL if done.
+ *
+ * During iteration the current entry in the hash table may be deleted,
+ * without leading to elements being skipped or returned twice.  Additionally
+ * the rest of the table may be modified (i.e. there can be insertions or
+ * deletions), but if so, there's neither a guarantee that all nodes are
+ * visited at least once, nor a guarantee that a node is visited at most once.
+ */
+SH_SCOPE SH_ELEMENT_TYPE *
+SH_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter)
+{
+	while (!iter->done)
+	{
+		SH_ELEMENT_TYPE *elem;
+
+		elem = &tb->data[iter->cur];
+
+		/* next element in backward direction */
+		iter->cur = (iter->cur - 1) & tb->sizemask;
+
+		if ((iter->cur & tb->sizemask) == (iter->end & tb->sizemask))
+			iter->done = true;
+		if (elem->status == SH_STATUS_IN_USE)
+		{
+			return elem;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Report some statistics about the state of the hashtable. For
+ * debugging/profiling purposes only.
+ */
+SH_SCOPE void
+SH_STAT(SH_TYPE *tb)
+{
+	uint32		max_chain_length = 0;
+	uint32		total_chain_length = 0;
+	double		avg_chain_length;
+	double		fillfactor;
+	uint32		i;
+
+	uint32	   *collisions = palloc0(tb->size * sizeof(uint32));
+	uint32		total_collisions = 0;
+	uint32		max_collisions = 0;
+	double		avg_collisions;
+
+	for (i = 0; i < tb->size; i++)
+	{
+		uint32		hash;
+		uint32		optimal;
+		uint32		dist;
+		SH_ELEMENT_TYPE *elem;
+
+		elem = &tb->data[i];
+
+		if (elem->status != SH_STATUS_IN_USE)
+			continue;
+
+		hash = SH_ENTRY_HASH(tb, elem);
+		optimal = SH_INITIAL_BUCKET(tb, hash);
+		dist = SH_DISTANCE_FROM_OPTIMAL(tb, optimal, i);
+
+		if (dist > max_chain_length)
+			max_chain_length = dist;
+		total_chain_length += dist;
+
+		collisions[optimal]++;
+	}
+
+	for (i = 0; i < tb->size; i++)
+	{
+		uint32		curcoll = collisions[i];
+
+		if (curcoll == 0)
+			continue;
+
+		/* single contained element is not a collision */
+		curcoll--;
+		total_collisions += curcoll;
+		if (curcoll > max_collisions)
+			max_collisions = curcoll;
+	}
+
+	if (tb->members > 0)
+	{
+		fillfactor = tb->members / ((double) tb->size);
+		avg_chain_length = ((double) total_chain_length) / tb->members;
+		avg_collisions = ((double) total_collisions) / tb->members;
+	}
+	else
+	{
+		fillfactor = 0;
+		avg_chain_length = 0;
+		avg_collisions = 0;
+	}
+
+	elog(LOG, "size: " UINT64_FORMAT ", members: %u, filled: %f, total chain: %u, max chain: %u, avg chain: %f, total_collisions: %u, max_collisions: %i, avg_collisions: %f",
+		 tb->size, tb->members, fillfactor, total_chain_length, max_chain_length, avg_chain_length,
+		 total_collisions, max_collisions, avg_collisions);
+}
+
+#endif   /* SH_DEFINE */
+
+
+/* undefine external paramters, so next hash table can be defined */
+#undef SH_PREFIX
+#undef SH_KEY_TYPE
+#undef SH_KEY
+#undef SH_ELEMENT_TYPE
+#undef SH_HASH_KEY
+#undef SH_SCOPE
+#undef SH_DECLARE
+#undef SH_DEFINE
+#undef SH_GET_HASH
+#undef SH_STORE_HASH
+
+/* undefine locally declared macros */
+#undef SH_MAKE_PREFIX
+#undef SH_MAKE_NAME
+#undef SH_MAKE_NAME_
+#undef SH_FILLFACTOR
+#undef SH_MAX_FILLFACTOR
+#undef SH_MAX_SIZE
+
+/* types */
+#undef SH_TYPE
+#undef SH_STATUS
+#undef SH_STATUS_EMPTY
+#undef SH_STATUS_IN_USE
+#undef SH_ITERTOR
+
+/* external function names */
+#undef SH_CREATE
+#undef SH_DESTROY
+#undef SH_INSERT
+#undef SH_DELETE
+#undef SH_LOOKUP
+#undef SH_GROW
+#undef SH_START_ITERATE
+#undef SH_START_ITERATE_AT
+#undef SH_ITERATE
+#undef SH_STAT
+
+/* internal function names */
+#undef SH_COMPUTE_PARAMETERS
+#undef SH_COMPARE_KEYS
+#undef SH_INITIAL_BUCKET
+#undef SH_NEXT
+#undef SH_PREV
+#undef SH_DISTANCE_FROM_OPTIMAL
+#undef SH_ENTRY_HASH
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@ -1797,6 +1797,9 @@ SERIALIZABLEXIDTAG
 SERVICE_STATUS
 SERVICE_STATUS_HANDLE
 SERVICE_TABLE_ENTRY
+SH_TYPE
+SH_ITERATOR
+SH_STATUS
 SHA1_CTX
 SHA224_CTX
 SHA256_CTX