/*
** You can have as many strategies as you please in GiSTs, as
** long as your consistent method can handle them
*/
#define GISTNStrategies 100
^^^
- too big number:
strat.h->StrategyEvaluationData->StrategyExpression expression[12]
^^
- so 12 is real max # of strategies, or StrategyEvaluationIsValid
crashes backend (called if CASSER defined).
Reply-To: hackers@hub.org, Dan McGuirk <mcguirk@indirect.com>
To: hackers@hub.org
Subject: [HACKERS] tmin writeback optimization
I was doing some profiling of the backend, and noticed that during a certain
benchmark I was running somewhere between 30% and 75% of the backend's CPU
time was being spent in calls to TransactionIdDidCommit() from
HeapTupleSatisfiesNow() or HeapTupleSatisfiesItself() to determine that
changed rows' transactions had in fact been committed even though the rows'
tmin values had not yet been set.
When a query looks at a given row, it needs to figure out whether the
transaction that changed the row has been committed and hence it should pay
attention to the row, or whether on the other hand the transaction is still
in progress or has been aborted and hence the row should be ignored. If
a tmin value is set, it is known definitively that the row's transaction
has been committed. However, if tmin is not set, the transaction
referred to in xmin must be looked up in pg_log, and this is what the
backend was spending a lot of time doing during my benchmark.
So, implementing a method suggested by Vadim, I created the following
patch that, the first time a query finds a committed row whose tmin value
is not set, sets it, and marks the buffer where the row is stored as
dirty. (It works for tmax, too.) This doesn't result in the boost in
real time performance I was hoping for, however it does decrease backend
CPU usage by up to two-thirds in certain situations, so it could be
rather beneficial in high-concurrency settings.
Patches from: aoki@CS.Berkeley.EDU (Paul M. Aoki)
i gave jolly my btree bulkload code a long, long time ago but never
gave him a bunch of my bugfixes. here's a diff against the 6.0
baseline.
for some reason, this code has slowed down somewhat relative to the
insertion-build code on very small tables. don't know why -- it used
to be within about 10%. anyway, here are some (highly unscientific!)
timings on a dec 3000/300 for synthetic tables with 10k, 100k and
1000k tuples (basically, 1mb, 10mb and 100mb heaps). 'c' means
clustered (pre-sorted) inputs and 'u' means unclustered (randomly
ordered) inputs. the 10k table basically fits in the buffer pool, but
the 100k and 1000k tables don't. as you can see, insertion build is
fine if you've sorted your heaps on your index key or if your heap
fits in core, but is absolutely horrible on unordered data (yes,
that's 7.5 hours to index 100mb of data...) because of the zillions of
random i/os.
if it doesn't work for you for whatever reason, you can always turn it
back off by flipping the FastBuild flag in nbtree.c. i don't have
time to maintain it.
good luck!
baseline code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest
real 8.6
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest
real 9.1
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest
real 59.2
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest
real 652.4
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest
real 636.1
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest
real 26772.9
bulkloading code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest
real 11.3
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest
real 10.4
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest
real 59.5
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest
real 63.5
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest
real 636.9
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest
real 701.0
Changes:
* Unique index capability works using the syntax 'create unique
index'.
* Duplicate OID's in the system tables are removed. I put
little scripts called 'duplicate_oids' and 'find_oid' in
include/catalog that help to find and remove duplicate OID's.
I also moved 'unused_oids' from backend/catalog to
include/catalog, since it has to be in the same directory
as the include files in order to work.
* The backend tries converting the name of a function or aggregate
to all lowercase if the original name given doesn't work (mostly
for compatibility with ODBC).
* You can 'SELECT NULL' to your heart's content.
* I put my _bt_updateitem fix in instead, which uses
_bt_insertonpg so that even if the new key is so big that
the page has to be split, everything still works.
* All literal references to system catalog OID's have been
replaced with references to define'd constants from the catalog
header files.
* I added a couple of node copy functions. I think this was a
preliminary attempt to get rules to work.
Note. all include files that have been hit so far have had extraneous
include files cleaned out and are reduced to...the lowest common
"include file", based on 'cc -Wall -I. test.c', where test.c is:
#include "postgres.h"
#include "<top of branches>" (ie. top of branches this time was utils/fcache2.h)