this one could be useful for people experiencing out-of-memory crashes while
executing queries which retrieve or use a very large number of tuples.
The problem happens when storage is allocated for functions results used in
a large query, for example:
select upper(name) from big_table;
select big_table.array[1] from big_table;
select count(upper(name)) from big_table;
This patch is a dirty hack that fixes the out-of-memory problem for the most
common cases, like the above ones. It is not the final solution for the
problem but it can work for some people, so I'm posting it.
The patch should be safe because all changes are under #ifdef. Furthermore
the feature can be enabled or disabled at runtime by the `free_tuple_memory'
options in the pg_options file. The option is disabled by default and must
be explicitly enabled at runtime to have any effect.
To enable the patch add the follwing line to Makefile.custom:
CUSTOM_COPT += -DFREE_TUPLE_MEMORY
To enable the option at runtime add the following line to pg_option:
free_tuple_memory=1
Massimo
and possibly for other cases too:
DO NOT cache status of transaction in unknown state
(i.e. non-committed and non-aborted ones)
Example:
T1 reads row updated/inserted by running T2 and cache T2 status.
T2 commits.
Now T1 reads a row updated by T2 and with HEAP_XMAX_COMMITTED
in t_infomask (so cached T2 status is not changed).
Now T1 EvalPlanQual gets updated row version without HEAP_XMIN_COMMITTED
-> TransactionIdDidCommit(t_xmin) and TransactionIdDidAbort(t_xmin)
return FALSE and T2 decides that t_xmin is not committed and gets
ERROR above.
It's too late to find more smart way to handle such cases and so
I just changed xact status caching and got rid TransactionIdFlushCache()
from code.
Changed: transam.c, xact.c, lmgr.c and transam.h - last three
just because of TransactionIdFlushCache() is removed.
2. heapam.c:
T1 marked a row for update. T2 waits for T1 commit/abort.
T1 commits. T3 updates the row before T2 locks row page.
Now T2 sees that new row t_xmax is different from xact id (T1)
T2 was waiting for. Old code did Assert here. New one goes to
HeapTupleSatisfiesUpdate. Obvious changes too.
3. Added Assert to vacuum.c
4. bufmgr.c: break
Assert(buf->r_locks == 0 && !buf->ri_lock)
into two Asserts.
2. varsup.c:ReadNewTransactionId(): don't read nextXid from disk -
this func doesn't allocate next xid, so ShmemVariableCache->nextXid
may be used (but GetNewTransactionId() must be called first).
3. vacuum.c: change elog(ERROR, "Child item....") to elog(NOTICE) -
this is not ERROR, proper handling is just not implemented, yet.
4. s_lock.c: increase S_MAX_BUSY by 2 times.
5. shmem.c:GetSnapshotData(): have to call ReadNewTransactionId()
_after_ SpinAcquire(ShmemIndexLock).
transactions will not assume that MyProc transaction was committed
before snapshot calculations. With old MyProc->xid assignment
(in xact.c:StartTransaction()) there was ability to see the same
row twice (I used gdb for this)!...
2. Assignments of InvalidTransactionId to MyProc->xid and MyProc->xmin
are moved from xact.c:CommitTransaction() to
xact.c:RecordTransactionCommit() - this invalidation must be done
before releasing transaction locks or bad (too high) XmaxRecent value
might be used by vacuum ("ERROR: Child itemid marked as unused"
reported by "Hiroshi Inoue" <Inoue@tpf.co.jp>; once again, gdb
allowed me reproduce this error).
files to be closed automatically at transaction abort or commit, should
they still be open. Also close any still-open stdio files allocated with
AllocateFile at abort/commit. This should eliminate problems with leakage
of file descriptors after an error. Also, put in some primitive buffered-IO
support so that psort.c can use virtual files without severe performance
penalties.
can be generated in a buffer and then sent to the frontend in a single
libpq call. This solves problems with NOTICE and ERROR messages generated
in the middle of a data message or COPY OUT operation.
indexes.
1. Index Scan using plural indexids never scan backward
as to the order of indexids.
2. The cursor using Index scan is not usable after moving
past the end.
This patch solves above bugs.
Moreover the change of _bt_first() would be useful to extend
ORDER BY patch by Jan Wieck for all descending order cases.
Hiroshi Inoue
2. Much faster btree tuples deletion in the case when first on page
index tuple is deleted (no movement to the left page(s)).
3. Remember blkno of new root page in BTPageOpaque of
left/right siblings when root page is splitted.
I would like some feedback on what the hash function for the int8 hash
function
in the ./backend/access/hash/hashfunc.c should return.
Also, could someone (maybe Tomas Lockhart?) look-over the patch and make
sure
the system table entries are correct? I've tried to research them as
much as I
could, but some of them are still not clear to me.
Thanks,
-Ryan
Ok. I made patches replacing all of "#if FALSE" or "#if 0" to "#ifdef
NOT_USED" for current. I have tested these patches in that the
postgres binaries are identical.
so that fetching an attribute value needs only one SearchSysCacheTuple call
instead of two redundant searches. This speeds up a large SELECT by about
ten percent, and probably will help GROUP BY and SELECT DISTINCT too.
for against a just updated CVS tree. It contains
Partial new rewrite system that handles subselects, view
aggregate columns, insert into select from view, updates
with set col = view-value and select rules restriction to
view definition.
Updates for rule/view backparsing utility functions to
handle subselects correct.
New system views pg_tables and pg_indexes (where you can
see the complete index definition in the latter one).
Enabling array references on query parameters.
Bugfix for functional index.
Little changes to system views pg_rules and pg_views.
The rule system isn't a release-stopper any longer.
But another stopper is that I don't know if the latest
changes to PL/pgSQL (not already in CVS) made it compile on
AIX. Still wait for some response from Dave.
Jan
After some playing with gdb I found that in printtup() there is a non null
attribute with typeinfo->attrs[i]->atttypid = 0 (invalid oid). Unfortunately
attibutes with invalid type are neither printed nor marked as null, and this
explains why psql doesn't get all the expected data.
So I made this patch to printtup():
> tprintf.patch
>
> tprintf.patch
>
> adds functions and macros which implement a conditional trace package
> with the ability to change flags and numeric options of running
> backends at runtime.
> Options/flags can be specified in the command line and/or read from
> the file pg_options in the data directory.
no longer returns buffer pointer, can be gotten from scan;
descriptor; bootstrap can create multi-key indexes;
pg_procname index now is multi-key index; oidint2, oidint4, oidname
are gone (must be removed from regression tests); use System Cache
rather than sequential scan in many places; heap_modifytuple no
longer takes buffer parameter; remove unused buffer parameter in
a few other functions; oid8 is not index-able; remove some use of
single-character variable names; cleanup Buffer variables usage
and scan descriptor looping; cleaned up allocation and freeing of
tuples; 18k lines of diff;
As Bruce mentioned, this is due to the conflict among changes we made.
Included patches should fix the problem(I changed all MB to
MULTIBYTE). Please let me know if you have further problem.
P.S. I did not include pathces to configure and gram.c to save the
file size(configure.in and gram.y modified).
calls. Outside a transaction, the backend detects them as buffer
leaks; it sends a NOTICE, and frees them. This sometimes cause a
segmentation fault (at least on Linux). These indexes are initialized
on the first lo_read/lo_write/lo_tell call, and (normally) closed
on a lo_close call. Thus the buffer leaks appear when lo direct
access functions are used, and not with lo_import/lo_export functions
(libpq version calls lo_close before ending the command, and the
backend version uses another path).
The included patches (against recent snapshot, and against 6.3.2)
cause indexes to be closed on transaction end (that is on explicit
'END' statment, or on command termination outside trasaction blocks),
thus preventing the buffer leaks while increasing performance inside
transactions. Some (all?) 'classic' memory leaks are also removed.
I hope it will be ok.
--- Pascal ANDRE, graduated from Ecole Centrale Paris andre@via.ecp.fr
I have implemented a framework of encoding translation between the
backend and the frontend. Also I have added a new variable setting
command:
SET CLIENT_ENCODING TO 'encoding';
Other features include:
Latin1 support more 8 bit cleaness
See doc/README.mb for more details. Note that the pacthes are
against May 30 snapshot.
Tatsuo Ishii
1. Removes the unnecessary "#define AbcRegProcedure 123"'s from
pg_proc.h.
2. Changes those #defines to use the names already defined in
fmgr.h.
3. Forces the make of fmgr.h in backend/Makefile instead of having
it
made as a dependency in access/common/Makefile *hack*hack*hack*
4. Rearranged the #includes to a less helter-skelter arrangement,
also
changing <file.h> to "file.h" to signify a non-system header.
5. Removed "pg_proc.h" from files where its only purpose was for
the
#defines removed in item #1.
6. Added "fmgr.h" to each file changed for completeness sake.
Turns out that #6 was not necessary for some files because fmgr.h
was being included in a roundabout way SIX levels deep by the first
include.
"access/genam.h"
->"access/relscan.h"
->"utils/rel.h"
->"access/strat.h"
->"access/skey.h"
->"fmgr.h"
So adding fmgr.h really didn't add anything to the compile, hopefully
just made it clearer to the programmer.
S Darren.
Attached you'll find a (big) patch that fixes make dep and make
depend in all Makefiles where I found it to be appropriate.
It also removes the dependency in Makefile.global for NAMEDATALEN
and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh
a little smarter.
This no longer requires initdb.sh that is turned into initdb with
a sed script when installing Postgres, hence initdb.sh should be
renamed to initdb (after the patch has been applied :-) )
This patch is against the 6.3 sources, as it took a while to
complete.
Please review and apply,
Cheers,
Jeroen van Vianen
1. Remove the char2, char4, char8 and char16 types from postgresql
2. Change references of char16 to name in the regression tests.
3. Rename the char16.sql regression test to name.sql. 4. Modify
the regression test scripts and outputs to match up.
Might require new regression.{SYSTEM} files...
Darren King
#define TAPETEMP "pg_btsortXXXXXX"
to:
#define TAPETEMP "pg_btsortXXXXXXX"
For some reason, under FreeBSD, it appears that the mktemp() value needs the
extra 'X' to improve/ensure uniqueness
select from a table with attrs (a int, b char(20))
crashed in bpcharout() (palloc of -1 bytes). But a table
with attrs (a int, b varchar(20)) worked.
From: Jan Wieck <jwieck@debis.com>
varchar length.
Cleans up code so attlen is always length.
Removed varchar() hack added earlier.
Will fix bug in selecting varchar() fields, and varchar() can be
variable length.
Patch by: wieck@sapserv.debis.de (Jan Wieck)
One of the design rules of PostgreSQL is extensibility. And
to follow this rule means (at least for me) that there should
not only be a builtin PL. Instead I would prefer a defined
interface for PL implemetations.
==========================================
What follows is a set of diffs that cleans up the usage of BLCKSZ.
As a side effect, the person compiling the code can change the
value of BLCKSZ _at_their_own_risk_. By that, I mean that I've
tried it here at 4096 and 16384 with no ill-effects. A value
of 4096 _shouldn't_ affect much as far as the kernel/file system
goes, but making it bigger than 8192 can have severe consequences
if you don't know what you're doing. 16394 worked for me, _BUT_
when I went to 32768 and did an initdb, the SCSI driver broke and
the partition that I was running under went to hell in a hand
basket. Had to reboot and do a good bit of fsck'ing to fix things up.
The patch can be safely applied though. Just leave BLCKSZ = 8192
and everything is as before. It basically only cleans up all of the
references to BLCKSZ in the code.
If this patch is applied, a comment in the config.h file though above
the BLCKSZ define with warning about monkeying around with it would
be a good idea.
Darren darrenk@insightdist.com
(Also cleans up some of the #includes in files referencing BLCKSZ.)
==========================================
Essentially, this cleans things up so that if PORTNAME isn't defined (I'm
working on getting rid of it for FreeBSD, at least, to see if its possible)
none of the PORTNAME related stuff gets passed around.
Had a little bit of -I related redundancy as well
Subject: [PORTS] Patches for Irix 6.4
I have worked out how to compile PostgreSQL on Irix 6.4 using the -n32 compiler
mode and version 7.1 of the C compiler. (The n32 compiler use 32 bits
addressing,
but allows access to all the instructions in the MIPS4 instruction set.)
There were several problems:
1) The ld command is not referenced as a macro in all the Makefiles. On
this platform, you have to include -n32 on all the ld commands. Makefiles
were changed as needed.
3) Lots of warnings are generated from the compiler. Since the regression
tests worked OK, I didn't attempt to fix them. If anyone wants the compilation
log, please let me know, and I'll email it to you.
The version of postgresql was 970602. Here is Makefile.custom:
CUSTOM_COPT = -O2 -n32
MK_NO_LORDER = 1
LD = ld -n32
CC += -n32
Subject: [PATCHES] Re: [PORTS] AIX 6.1 fixes...
Here are the patches for the two things that wouldn't make it thru the AIX
compiler. The geo_ops.c change is harmless I believe. The nbtcompare.c patch
fixes me, but I don't know about any other ports. Maybe wait on that one
until Vadim decides what to do about the unsigned vs signed chars varlena
issue.
when btree used in innerscan with run-time key which value
passed by pointer.
Fix: keys ordering stuff moved to _bt_first().
Pointed by Thomas Lockhart.
OK, here are a passel of patches for the geometric data types.
These add a "circle" data type, new operators and functions
for the existing data types, and change the default formats
for some of the existing types to make them consistant with
each other. Current formatting conventions (e.g. compatible
with v6.0 to allow dump/reload) are supported, but the new
conventions should be an improvement and we can eventually
drop the old conventions entirely.
For example, there are two kinds of paths (connected line segments),
open and closed, and the old format was
'(1,2,1,2,3,4)' for a closed path with two points (1,2) and (3,4)
'(0,2,1,2,3,4)' for an open path with two points (1,2) and (3,4)
Pretty arcane, huh? The new format for paths is
'((1,2),(3,4))' for a closed path with two points (1,2) and (3,4)
'[(1,2),(3,4)]' for an open path with two points (1,2) and (3,4)
For polygons, the old convention is
'(0,4,2,0,4,3)' for a triangle with points at (0,0),(4,4), and (2,3)
and the new convention is
'((0,0),(4,4),(2,3))' for a triangle with points at (0,0),(4,4), and (2,3)
Other data types which are also represented as lists of points
(e.g. boxes, line segments, and polygons) have similar representations
(they surround each point with parens).
For v6.1, any format which can be interpreted as the old style format
is decoded as such; we can remove that backwards compatibility but ugly
convention for v7.0. This will allow dump/reloads from v6.0.
These include some updates to the regression test files to change the test
for creating a data type from "circle" to "widget" to keep the test from
trashing the new builtin circle type.
index tuple (logical position within A LEVEL). bti_oid & bti_dummy
taken off from BTItemData.
2. Fix for multi-column indices (nbtsearch.c):
_bt_binsrch() - for searches on internal pages having keysize <
number of attrs we point at the last item < the scankey, not at the
first item = the scankey;
_bt_moveright() - if keysize < number of attrs we compare scankey with
_last_ item on current page to decide should we move right or
not.
Reply-To: hackers@hub.org, Dan McGuirk <mcguirk@indirect.com>
To: hackers@hub.org
Subject: [HACKERS] tmin writeback optimization
I was doing some profiling of the backend, and noticed that during a certain
benchmark I was running somewhere between 30% and 75% of the backend's CPU
time was being spent in calls to TransactionIdDidCommit() from
HeapTupleSatisfiesNow() or HeapTupleSatisfiesItself() to determine that
changed rows' transactions had in fact been committed even though the rows'
tmin values had not yet been set.
When a query looks at a given row, it needs to figure out whether the
transaction that changed the row has been committed and hence it should pay
attention to the row, or whether on the other hand the transaction is still
in progress or has been aborted and hence the row should be ignored. If
a tmin value is set, it is known definitively that the row's transaction
has been committed. However, if tmin is not set, the transaction
referred to in xmin must be looked up in pg_log, and this is what the
backend was spending a lot of time doing during my benchmark.
So, implementing a method suggested by Vadim, I created the following
patch that, the first time a query finds a committed row whose tmin value
is not set, sets it, and marks the buffer where the row is stored as
dirty. (It works for tmax, too.) This doesn't result in the boost in
real time performance I was hoping for, however it does decrease backend
CPU usage by up to two-thirds in certain situations, so it could be
rather beneficial in high-concurrency settings.
Actually required by multi-column indices support.
We still don't use btree for 'A is (not) null', but
now btree keep items with NULL attrs using single rule
for placing/finding items on pages:
NULLs greater NOT_NULLs and NULL = NULL.
+ Bulkload code (nbtsort.c) support for multi-column indices
building and NULLs.
+ Fix for btendscan()->pfree(scanopaque) from Chris Dunlop.
Subject: [HACKERS] linux/alpha patches
These patches lay the groundwork for a Linux/Alpha port. The port doesn't
actually work unless you tweak the linker to put all the pointers in the
first 32 bits of the address space, but it's at least a start. It
implements the test-and-set instruction in Alpha assembly, and also fixes
a lot of pointer-to-integer conversions, which is probably good anyway.
Subject: [HACKERS] abort failed transaction patch
This patch allows you to end a transaction that has failed on an error
using the 'ABORT' statement without generating another error message.
(By default you get an error unless you use 'END' to terminate the
transaction, which has already been aborted anyway.)