diff --git a/doc/TODO.detail/transactions b/doc/TODO.detail/transactions deleted file mode 100644 index ce7af3e4b3..0000000000 --- a/doc/TODO.detail/transactions +++ /dev/null @@ -1,1177 +0,0 @@ -From pgsql-hackers-owner+M11649@postgresql.org Wed Aug 1 15:22:46 2001 -Return-path: -Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) - by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f71JMjN09768 - for ; Wed, 1 Aug 2001 15:22:45 -0400 (EDT) -Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) - by postgresql.org (8.11.3/8.11.1) with SMTP id f71JMUf62338; - Wed, 1 Aug 2001 15:22:30 -0400 (EDT) - (envelope-from pgsql-hackers-owner+M11649@postgresql.org) -Received: from sectorbase2.sectorbase.com (sectorbase2.sectorbase.com [63.88.121.62] (may be forged)) - by postgresql.org (8.11.3/8.11.1) with SMTP id f71J4df57086 - for ; Wed, 1 Aug 2001 15:04:40 -0400 (EDT) - (envelope-from vmikheev@SECTORBASE.COM) -Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) - id ; Wed, 1 Aug 2001 12:04:31 -0700 -Message-ID: <3705826352029646A3E91C53F7189E32016705@sectorbase2.sectorbase.com> -From: "Mikheev, Vadim" -To: "'pgsql-hackers@postgresql.org'" -Subject: [HACKERS] Using POSIX mutex-es -Date: Wed, 1 Aug 2001 12:04:24 -0700 -MIME-Version: 1.0 -X-Mailer: Internet Mail Service (5.5.2653.19) -Content-Type: text/plain; - charset="koi8-r" -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: RO - -1. Just changed - TAS(lock) to pthread_mutex_trylock(lock) - S_LOCK(lock) to pthread_mutex_lock(lock) - S_UNLOCK(lock) to pthread_mutex_unlock(lock) -(and S_INIT_LOCK to share mutex-es between processes). - -2. pgbench was initialized with scale 10. - SUN WS 10 (512Mb), Solaris 2.6 (I'm unable to test on E4500 -:() - -B 16384, wal_files 8, wal_buffers 256, - checkpoint_segments 64, checkpoint_timeout 3600 - 50 clients x 100 transactions - (after initialization DB dir was saved and before each test - copyed back and vacuum-ed). - -3. No difference. - Mutex version maybe 0.5-1 % faster (eg: 37.264238 tps vs 37.083339 tps). - -So - no gain, but no performance loss "from using pthread library" -(I've also run tests with 1 client), at least on Solaris. - -And so - looks like we can use POSIX mutex-es and conditional variables -(not semaphores; man pthread_cond_wait) and should implement light lmgr, -probably with priority locking. - -Vadim - ----------------------------(end of broadcast)--------------------------- -TIP 2: you can get off all lists at once with the unregister command - (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) - -From pgsql-hackers-owner+M18052=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 13:39:19 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0NIdIU26480 - for ; Wed, 23 Jan 2002 13:39:18 -0500 (EST) -Received: (qmail 59371 invoked by alias); 23 Jan 2002 18:39:18 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 23 Jan 2002 18:39:18 -0000 -Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0NIJ8l47400 - for ; Wed, 23 Jan 2002 13:19:08 -0500 (EST) - (envelope-from pgman@candle.pha.pa.us) -Received: (from pgman@localhost) - by candle.pha.pa.us (8.11.6/8.10.1) id g0NIJ5i24508 - for pgsql-hackers@postgreSQL.org; Wed, 23 Jan 2002 13:19:05 -0500 (EST) -From: Bruce Momjian -Message-ID: <200201231819.g0NIJ5i24508@candle.pha.pa.us> -Subject: [HACKERS] Savepoints -To: PostgreSQL-development -Date: Wed, 23 Jan 2002 13:19:05 -0500 (EST) -X-Mailer: ELM [version 2.4ME+ PL96 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: RO - -I have talked in the past about a possible implementation of -savepoints/nested transactions. I would like to more formally outline -my ideas below. - -We have talked about using WAL for such a purpose, but that requires WAL -files to remain for the life of a transaction, which seems unacceptable. -Other database systems do that, and it is a pain for administrators. I -realized we could do some sort of WAL compaction, but that seems quite -complex too. - -Basically, under my plan, WAL would be unchanged. WAL's function is -crash recovery, and it would retain that. There would also be no -on-disk changes. I would use the command counter in certain cases to -identify savepoints. - -My idea is to keep savepoint undo information in a private area per -backend, either in memory or on disk. We can either save the -relid/tids of modified rows, or if there are too many, discard the -saved ones and just remember the modified relids. On rollback to save -point, either clear up the modified relid/tids, or sequential scan -through the relid and clear up all the tuples that have our transaction -id and have command counters that are part of the undo savepoint. - -It seems marking undo savepoint rows with a fixed aborted transaction id -would be the easiest solution. - -Of course, we only remember modified rows when we are in savepoints, and -only undo them when we rollback to a savepoint. Transaction processing -remains the same. - -There is no reason for other backend to be able to see savepoint undo -information, and keeping it private greatly simplifies the -implementation. - --- - Bruce Momjian | http://candle.pha.pa.us - pgman@candle.pha.pa.us | (610) 853-3000 - + If your life is a hard drive, | 830 Blythe Avenue - + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 - ----------------------------(end of broadcast)--------------------------- -TIP 6: Have you searched our list archives? - -http://archives.postgresql.org - -From hstenger@adinet.com.uy Wed Jan 23 14:13:33 2002 -Return-path: -Received: from correo.adinet.com.uy (fecorreo01.adinet.com.uy [206.99.44.217]) - by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0NJDWU29832 - for ; Wed, 23 Jan 2002 14:13:33 -0500 (EST) -Received: from adinet.com.uy (200.61.76.155) by correo.adinet.com.uy (5.5.052) (authenticated as hstenger@adinet.com.uy) - id 3C4DBC5C00017E9F; Wed, 23 Jan 2002 16:13:25 -0300 -Message-ID: <3C4F0BC0.5CFBB919@adinet.com.uy> -Date: Wed, 23 Jan 2002 16:15:12 -0300 -From: Haroldo Stenger -X-Mailer: Mozilla 4.78 [en] (Win98; U) -X-Accept-Language: en -MIME-Version: 1.0 -To: Bruce Momjian -cc: PostgreSQL-development -Subject: Re: [HACKERS] Savepoints -References: <200201231819.g0NIJ5i24508@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Basically, under my plan, WAL would be unchanged. WAL's function is -> crash recovery, and it would retain that. There would also be no -> on-disk changes. I would use the command counter in certain cases to -> identify savepoints. - -This is a pointer to the previous August thread, where your original proposal -was posted, and some WAL/not WAL discussion took place. Just not to repeat the -already mentioned points. Oh, it's google archive just for fun, and to not -overload hub.org ;-) - -http://groups.google.com/groups?hl=en&threadm=200108050432.f754Wdo11696%40candle.pha.pa.us&rnum=1&prev=/groups%3Fhl%3Den%26selm%3D200108050432.f754Wdo11696%2540candle.pha.pa.us - -Regards, -Haroldo. - -From vmikheev@SECTORBASE.COM Wed Jan 23 18:23:04 2002 -Return-path: -Received: from sectorbase2.sectorbase.com ([66.106.163.120]) - by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0NNN3U21442 - for ; Wed, 23 Jan 2002 18:23:04 -0500 (EST) -Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) - id ; Wed, 23 Jan 2002 15:22:52 -0800 -Message-ID: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> -From: "Mikheev, Vadim" -To: "'Bruce Momjian'" , - PostgreSQL-development - -Subject: RE: [HACKERS] Savepoints -Date: Wed, 23 Jan 2002 15:22:42 -0800 -MIME-Version: 1.0 -X-Mailer: Internet Mail Service (5.5.2653.19) -Content-Type: text/plain; - charset="iso-8859-1" -Status: ORr - -> I have talked in the past about a possible implementation of -> savepoints/nested transactions. I would like to more formally outline -> my ideas below. - -Well, I would like to do the same -:) - -> ... -> There is no reason for other backend to be able to see savepoint undo -> information, and keeping it private greatly simplifies the -> implementation. - -Yes... and requires additional memory/disk space: we keep old records -in data files and we'll store them again... - -How about: use overwriting smgr + put old records into rollback -segments - RS - (you have to keep them somewhere till TX's running -anyway) + use WAL only as REDO log (RS will be used to rollback TX' -changes and WAL will be used for RS/data files recovery). -Something like what Oracle does. - -Vadim - -From pgsql-hackers-owner+M18085=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:15:02 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1F1U26461 - for ; Wed, 23 Jan 2002 20:15:02 -0500 (EST) -Received: (qmail 92866 invoked by alias); 24 Jan 2002 01:14:59 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 24 Jan 2002 01:14:59 -0000 -Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O18ml91949 - for ; Wed, 23 Jan 2002 20:08:50 -0500 (EST) - (envelope-from pgman@candle.pha.pa.us) -Received: (from pgman@localhost) - by candle.pha.pa.us (8.11.6/8.10.1) id g0O18jV26044; - Wed, 23 Jan 2002 20:08:45 -0500 (EST) -From: Bruce Momjian -Message-ID: <200201240108.g0O18jV26044@candle.pha.pa.us> -Subject: Re: [HACKERS] Savepoints -In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> -To: "Mikheev, Vadim" -Date: Wed, 23 Jan 2002 20:08:45 -0500 (EST) -cc: PostgreSQL-development -X-Mailer: ELM [version 2.4ME+ PL96 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: OR - -Mikheev, Vadim wrote: -> > I have talked in the past about a possible implementation of -> > savepoints/nested transactions. I would like to more formally outline -> > my ideas below. -> -> Well, I would like to do the same -:) - -Good. - -> > ... -> > There is no reason for other backend to be able to see savepoint undo -> > information, and keeping it private greatly simplifies the -> > implementation. -> -> Yes... and requires additional memory/disk space: we keep old records -> in data files and we'll store them again... - -I was suggesting keeping only relid/tid or in some cases only relid. -Seems like one or the other will fit all needs: relid/tid for update of -a few rows, relid for many rows updated in the same table. I saw no -need to store the actual data. - -> How about: use overwriting smgr + put old records into rollback -> segments - RS - (you have to keep them somewhere till TX's running -> anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> changes and WAL will be used for RS/data files recovery). -> Something like what Oracle does. - -Why record the old data rows rather than the tids? While the -transaction is running, the rows can't be moved anyway. Also, why store -them in a shared area. That has additional requirements because one old -transaction can require all transactions to keep their stuff around. -Why not just make it a private data file for each backend? - --- - Bruce Momjian | http://candle.pha.pa.us - pgman@candle.pha.pa.us | (610) 853-3000 - + If your life is a hard drive, | 830 Blythe Avenue - + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 - ----------------------------(end of broadcast)--------------------------- -TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org - -From pgsql-hackers-owner+M18086=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:25:47 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1PkU26964 - for ; Wed, 23 Jan 2002 20:25:47 -0500 (EST) -Received: (qmail 94878 invoked by alias); 24 Jan 2002 01:25:44 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 24 Jan 2002 01:25:44 -0000 -Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O1L1l94075 - for ; Wed, 23 Jan 2002 20:21:01 -0500 (EST) - (envelope-from pgman@candle.pha.pa.us) -Received: (from pgman@localhost) - by candle.pha.pa.us (8.11.6/8.10.1) id g0O1Kwm26748; - Wed, 23 Jan 2002 20:20:58 -0500 (EST) -From: Bruce Momjian -Message-ID: <200201240120.g0O1Kwm26748@candle.pha.pa.us> -Subject: Re: [HACKERS] Savepoints -In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> -To: "Mikheev, Vadim" -Date: Wed, 23 Jan 2002 20:20:58 -0500 (EST) -cc: PostgreSQL-development -X-Mailer: ELM [version 2.4ME+ PL96 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: OR - -> > There is no reason for other backend to be able to see savepoint undo -> > information, and keeping it private greatly simplifies the -> > implementation. -> -> Yes... and requires additional memory/disk space: we keep old records -> in data files and we'll store them again... -> -> How about: use overwriting smgr + put old records into rollback -> segments - RS - (you have to keep them somewhere till TX's running -> anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> changes and WAL will be used for RS/data files recovery). -> Something like what Oracle does. - -I am sorry. I see what you are saying now. I missed the words -"overwriting smgr". You are suggesting going to an overwriting storage -manager. Is this to be done only because of savepoints. Doesn't seem -worth it when I have a possible solution without such a drastic change. -Also, overwriting storage manager will require MVCC to read through -there to get accurate MVCC visibility, right? - --- - Bruce Momjian | http://candle.pha.pa.us - pgman@candle.pha.pa.us | (610) 853-3000 - + If your life is a hard drive, | 830 Blythe Avenue - + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 - ----------------------------(end of broadcast)--------------------------- -TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org - -From vmikheev@SECTORBASE.COM Wed Jan 23 21:03:29 2002 -Return-path: -Received: from sectorbase2.sectorbase.com ([66.106.163.120]) - by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O23TU28813 - for ; Wed, 23 Jan 2002 21:03:29 -0500 (EST) -Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) - id ; Wed, 23 Jan 2002 18:03:18 -0800 -Message-ID: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com> -From: "Mikheev, Vadim" -To: "'Bruce Momjian'" -cc: PostgreSQL-development -Subject: RE: [HACKERS] Savepoints -Date: Wed, 23 Jan 2002 18:03:11 -0800 -MIME-Version: 1.0 -X-Mailer: Internet Mail Service (5.5.2653.19) -Content-Type: text/plain; - charset="iso-8859-1" -Status: ORr - -> > How about: use overwriting smgr + put old records into rollback -> > segments - RS - (you have to keep them somewhere till TX's running -> > anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> > changes and WAL will be used for RS/data files recovery). -> > Something like what Oracle does. -> -> I am sorry. I see what you are saying now. I missed the words - -And I'm sorry for missing your notes about storing relid+tid only. - -> "overwriting smgr". You are suggesting going to an overwriting -> storage manager. Is this to be done only because of savepoints. - -No. One point I made a few monthes ago (and never got objections) -is - why to keep old data in data files sooooo long? -Imagine long running TX (eg pg_dump). Why other TX-s must read -again and again completely useless (for them) old data we keep -for pg_dump? - -> Doesn't seem worth it when I have a possible solution without -> such a drastic change. -> Also, overwriting storage manager will require MVCC to read -> through there to get accurate MVCC visibility, right? - -Right... just like now non-overwriting smgr requires *ALL* -TX-s to read old data in data files. But with overwriting smgr -TX will read RS only when it is required and as far (much) as -it is required. - -Simple solutions are not always the best ones. -Compare Oracle and InterBase. Both have MVCC. -Smgr-s are different. What RDBMS is more cool? -Why doesn't Oracle use more simple non-overwriting smgr -(as InterBase... and we do)? - -Vadim - -From dhogaza@pacifier.com Wed Jan 23 21:05:37 2002 -Return-path: -Received: from comet.pacifier.com ([199.2.117.155]) - by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O25bU28962 - for ; Wed, 23 Jan 2002 21:05:37 -0500 (EST) -Received: from pacifier.com (dsl-dhogaza.pacifier.net [207.202.226.68]) - by comet.pacifier.com (8.11.2/8.11.1) with ESMTP id g0O24qX29917; - Wed, 23 Jan 2002 18:04:52 -0800 (PST) -Message-ID: <3C4F6BF0.2010406@pacifier.com> -Date: Wed, 23 Jan 2002 18:05:36 -0800 -From: Don Baccus -User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011221 -X-Accept-Language: en-us -MIME-Version: 1.0 -To: Bruce Momjian -cc: "Mikheev, Vadim" , - PostgreSQL-development -Subject: Re: [HACKERS] Savepoints -References: <200201240120.g0O1Kwm26748@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii; format=flowed -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: - - -> I am sorry. I see what you are saying now. I missed the words -> "overwriting smgr". You are suggesting going to an overwriting storage -> manager. - - -Overwriting storage managers don't suffer from unbounded growth of -datafiles until garbage collection (vacuum) is performed. In fact, -there's no need for a vacuum-style utility. The rollback segments only -need to keep around enough past history to rollback transactions that -are executing. - -Of course, then the size of your transactions are limited by the size of -your rollback segments, which in Oracle are fixed in length when you -build your database (there are ways to change this when you figure out -that you didn't pick a good number when creating it). - - >Is this to be done only because of savepoints. - -Not in traditional storage managers such as Oracle uses. The complexity -of managing visibility and the like are traded off against the fact that -you're not stuck ever needing to garbage collect a database that -occupies a roomful of disks. - -It's a trade-off. PG's current storage manager seems to work awfully -well in a lot of common database scenarios, and Tom's new vacuum is -meant to help mitigate against the drawbacks. But overwriting storage -managers certainly have their advantages, too. - - > Doesn't seem - -> worth it when I have a possible solution without such a drastic change. -> Also, overwriting storage manager will require MVCC to read through -> there to get accurate MVCC visibility, right? - - -Yep... - --- -Don Baccus -Portland, OR -http://donb.photo.net, http://birdnotes.net, http://openacs.org - - -From Inoue@tpf.co.jp Thu Jan 24 11:34:48 2002 -Return-path: -Received: from p2272.nsk.ne.jp (p2272.nsk.ne.jp [210.145.18.145]) - by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0OGYjU23980 - for ; Thu, 24 Jan 2002 11:34:47 -0500 (EST) -Received: from mcadnote1 (ppm132.noc.fukui.nsk.ne.jp [61.198.95.32]) - by p2272.nsk.ne.jp (8.9.3/3.7W-20000722) with SMTP id BAA12147; - Fri, 25 Jan 2002 01:34:24 +0900 (JST) -From: "Hiroshi Inoue" -To: "Mikheev, Vadim" -cc: "PostgreSQL-development" , - "'Bruce Momjian'" -Subject: RE: [HACKERS] Savepoints -Date: Fri, 25 Jan 2002 01:34:29 +0900 -Message-ID: -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) -In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> -X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 -Importance: Normal -Status: OR - -> -----Original Message----- -> From: Mikheev, Vadim -> -> How about: use overwriting smgr + put old records into rollback -> segments - RS - (you have to keep them somewhere till TX's running -> anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> changes and WAL will be used for RS/data files recovery). -> Something like what Oracle does. - -As long as we use no overwriting manager -1) Rollback(data) isn't needed in case of a db crash. -2) Rollback(data) isn't needed to cancal a transaction entirely. -3) We don't need to mind the transaction size so much. - -We can't use the db any longer if a REDO recovery fails now. -Under overwriting smgr we can't use the db any longer either -if rollback fails. How could PG be not less reliable than now ? - -regards, -Hiroshi Inoue - -From pgsql-hackers-owner+M18123=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:15:11 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJFAU12547 - for ; Thu, 24 Jan 2002 14:15:10 -0500 (EST) -Received: (qmail 43413 invoked by alias); 24 Jan 2002 19:13:48 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 24 Jan 2002 19:13:48 -0000 -Received: from sectorbase2.sectorbase.com ([66.106.163.120]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJC4l42011 - for ; Thu, 24 Jan 2002 14:12:04 -0500 (EST) - (envelope-from vmikheev@SECTORBASE.COM) -Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) - id ; Thu, 24 Jan 2002 11:11:54 -0800 -Message-ID: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com> -From: "Mikheev, Vadim" -To: "'Hiroshi Inoue'" -cc: PostgreSQL-development , - "'Bruce Momjian'" - -Subject: Re: [HACKERS] Savepoints -Date: Thu, 24 Jan 2002 11:11:52 -0800 -MIME-Version: 1.0 -X-Mailer: Internet Mail Service (5.5.2653.19) -Content-Type: text/plain; - charset="iso-8859-1" -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: OR - -> > How about: use overwriting smgr + put old records into rollback -> > segments - RS - (you have to keep them somewhere till TX's running -> > anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> > changes and WAL will be used for RS/data files recovery). -> > Something like what Oracle does. -> -> As long as we use no overwriting manager -> 1) Rollback(data) isn't needed in case of a db crash. -> 2) Rollback(data) isn't needed to cancal a transaction entirely. - --1) But vacuum must read a huge amount of data to remove dirt. --2) But TX-s must read data they are not interested at all. - -> 3) We don't need to mind the transaction size so much. - --3) The same with overwriting smgr and WAL used *only as REDO log*: -we are not required to keep WAL files for duration of transaction -- as soon as server knows that changes logged in some WAL file -applied to data files and RS on disk (and archived, for WAL-based -BAR) that file may be reused/removed. Old data will still occupy -space in RS but their space in data files will be available -for reuse. - -> We can't use the db any longer if a REDO recovery fails now. - -Reset WAL and use/dump it. Annoying? Agreed. Fix bugs and/or -use good RAM - whatever caused problem with restart. - -> Under overwriting smgr we can't use the db any longer either -> if rollback fails. - -Why should it fail? Bugs? Fix them. - -> How could PG be not less reliable than now ? - -Is today' RG more reliable than Oracle, Informix, DB2? - -Vadim - ----------------------------(end of broadcast)--------------------------- -TIP 3: if posting/reading through Usenet, please send an appropriate -subscribe-nomail command to majordomo@postgresql.org so that your -message can get through to the mailing list cleanly - -From pgsql-hackers-owner+M18125=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:23:42 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJNfU13481 - for ; Thu, 24 Jan 2002 14:23:42 -0500 (EST) -Received: (qmail 49604 invoked by alias); 24 Jan 2002 19:23:40 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 24 Jan 2002 19:23:40 -0000 -Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJMTl48885 - for ; Thu, 24 Jan 2002 14:22:29 -0500 (EST) - (envelope-from pgman@candle.pha.pa.us) -Received: (from pgman@localhost) - by candle.pha.pa.us (8.11.6/8.10.1) id g0OJMJf13378; - Thu, 24 Jan 2002 14:22:19 -0500 (EST) -From: Bruce Momjian -Message-ID: <200201241922.g0OJMJf13378@candle.pha.pa.us> -Subject: Re: [HACKERS] Savepoints -In-Reply-To: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com> -To: "Mikheev, Vadim" -Date: Thu, 24 Jan 2002 14:22:19 -0500 (EST) -cc: PostgreSQL-development -X-Mailer: ELM [version 2.4ME+ PL96 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: OR - - -OK, I have had time to think about this, and I think I can put the two -proposals into perspective. I will use Vadim's terminology. - -In our current setup, rollback/undo data is kept in the same file as our -live data. This data is used for two purposes, one, for rollback of -transactions, and perhaps subtransactions in the future, and second, for -MVCC visibility for backends making changes. - -So, it seems the real question is whether a database modification should -write the old data into a separate rollback segment and modify the heap -data, or just create a new row and require the old row to be removed -later by vacuum. - -Let's look at this behavior without MVCC. In such cases, if someone -tries to read a modified row, it will block and wait for the modifying -backend to commit or rollback, when it will then continue. In such -cases, there is no reason for the waiting transaction to read the old -data in the redo segment because it can't continue anyway. - -Now, with MVCC, the backend has to read through the redo segment to get -the original data value for that row. - -Now, while rollback segments do help with cleaning out old UPDATE rows, -how does it improve DELETE performance? Seems it would just mark it as -expired like we do now. - -One objection I always had to redo segments was that if I start a -transaction in the morning and walk away, none of the redo segments can -be recycled. I was going to ask if we can force some type of redo -segment compaction to keep old active rows and delete rows no longer -visible to any transaction. However, I now realize that our VACUUM has -the same problem. Tuples with XID >= GetOldestXmin() are not recycled, -meaning we have this problem in our current implementation too. (I -wonder if our vacuum could be smarter about knowing which rows are -visible, perhaps by creating a sorted list of xid's and doing a binary -search on the list to determine visibility.) - -So, I guess the issue is, do we want to keep redo information in the -main table, or split it out into redo segments. Certainly we have to -eliminate the Oracle restrictions that redo segment size is fixed at -install time. - -The advantages of a redo segment is that hopefully we don't have -transactions reading through irrelevant undo information. The -disadvantage is that we now have redo information grouped into table -files where a sequential scan can be performed. (Index scans of redo -info are a performance problem currently.) We would have to somehow -efficiently access redo information grouped into the redo segments. -Perhaps a hash based in relid would help here. Another disadvantage is -concurrency. When we start modifying heap data in place, we have to -prevent other backends from seeing that modification while we move the -old data to the redo segment. - -I guess my feeling is that if we can get vacuum to happen automatically, -how is our current non-overwriting storage manager different from redo -segments? - -One big advantage of redo segments would be that right now, if someone -updates a row repeatedly, there are lots of heap versions of the row -that are difficult to shrink in the table, while if they are in the redo -segments, we can more efficiently remove them, and there is only on heap -row. - -How is recovery handled with rollback segments? Do we write old and new -data to WAL? We just write new data to WAL now, right? Do we fsync -rollback segments? - -Have I outlined this accurately? - ---------------------------------------------------------------------------- - -Mikheev, Vadim wrote: -> > > How about: use overwriting smgr + put old records into rollback -> > > segments - RS - (you have to keep them somewhere till TX's running -> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> > > changes and WAL will be used for RS/data files recovery). -> > > Something like what Oracle does. -> > -> > I am sorry. I see what you are saying now. I missed the words -> -> And I'm sorry for missing your notes about storing relid+tid only. -> -> > "overwriting smgr". You are suggesting going to an overwriting -> > storage manager. Is this to be done only because of savepoints. -> -> No. One point I made a few monthes ago (and never got objections) -> is - why to keep old data in data files sooooo long? -> Imagine long running TX (eg pg_dump). Why other TX-s must read -> again and again completely useless (for them) old data we keep -> for pg_dump? -> -> > Doesn't seem worth it when I have a possible solution without -> > such a drastic change. -> > Also, overwriting storage manager will require MVCC to read -> > through there to get accurate MVCC visibility, right? -> -> Right... just like now non-overwriting smgr requires *ALL* -> TX-s to read old data in data files. But with overwriting smgr -> TX will read RS only when it is required and as far (much) as -> it is required. -> -> Simple solutions are not always the best ones. -> Compare Oracle and InterBase. Both have MVCC. -> Smgr-s are different. What RDBMS is more cool? -> Why doesn't Oracle use more simple non-overwriting smgr -> (as InterBase... and we do)? -> -> Vadim -> - --- - Bruce Momjian | http://candle.pha.pa.us - pgman@candle.pha.pa.us | (610) 853-3000 - + If your life is a hard drive, | 830 Blythe Avenue - + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 - ----------------------------(end of broadcast)--------------------------- -TIP 2: you can get off all lists at once with the unregister command - (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) - -From pgsql-hackers-owner+M18141=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 19:43:38 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0P0hbU15026 - for ; Thu, 24 Jan 2002 19:43:38 -0500 (EST) -Received: (qmail 28642 invoked by alias); 25 Jan 2002 00:43:24 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 25 Jan 2002 00:43:24 -0000 -Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) - by postgresql.org (8.11.3/8.11.4) with SMTP id g0P0YIl27208 - for ; Thu, 24 Jan 2002 19:34:18 -0500 (EST) - (envelope-from Inoue@tpf.co.jp) -Received: (qmail 3661 invoked from network); 25 Jan 2002 00:34:19 -0000 -Received: from unknown (HELO viscomail.tpf.co.jp) (100.0.0.108) - by sd2.tpf-fw-c.co.jp with SMTP; 25 Jan 2002 00:34:19 -0000 -Received: from tpf.co.jp (3dgateway1 [126.0.1.60]) - by viscomail.tpf.co.jp (8.8.8+Sun/8.8.8) with ESMTP id JAA00756; - Fri, 25 Jan 2002 09:34:18 +0900 (JST) -Message-ID: <3C50A807.32A29E09@tpf.co.jp> -Date: Fri, 25 Jan 2002 09:34:15 +0900 -From: Hiroshi Inoue -X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U) -X-Accept-Language: ja -MIME-Version: 1.0 -To: "Mikheev, Vadim" -cc: PostgreSQL-development , - "'Bruce Momjian'" -Subject: Re: [HACKERS] Savepoints -References: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com> -Content-Type: text/plain; charset=iso-2022-jp -Content-Transfer-Encoding: 7bit -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: OR - -"Mikheev, Vadim" wrote: -> -> > > How about: use overwriting smgr + put old records into rollback -> > > segments - RS - (you have to keep them somewhere till TX's running -> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> > > changes and WAL will be used for RS/data files recovery). -> > > Something like what Oracle does. -> > -> > As long as we use no overwriting manager -> > 1) Rollback(data) isn't needed in case of a db crash. -> > 2) Rollback(data) isn't needed to cancal a transaction entirely. -> -> -1) But vacuum must read a huge amount of data to remove dirt. -> -2) But TX-s must read data they are not interested at all. -> -> > 3) We don't need to mind the transaction size so much. -> -> -3) The same with overwriting smgr and WAL used *only as REDO log*: - -The larger RS becomes the longer it would take time to cancel -the transaction whereas it is executed in a momemnt under no -overwriting smgr and for example if RS exhausted all disk space -is PG really safe ? Other backends would also fail because they -couldn't write RS any mode. Many transactions would execute -UNDO operations simultaneously but there's no space to write -WALs (UNDO operations must be written to WAL also) and PG -system would abort. And could PG restart under such situations ? -Even though there's a way to recover from the situation, I -think we should avoid such dangerous situations from the -first. Basically recovery operations should never fail. - -> -> > We can't use the db any longer if a REDO recovery fails now. -> -> Reset WAL and use/dump it. Annoying? Agreed. Fix bugs and/or -> use good RAM - whatever caused problem with restart. - -As I already mentioned recovery operations should never fail. -> -> > Under overwriting smgr we can't use the db any longer either -> > if rollback fails. -> -> Why should it fail? Bugs? Fix them. - -Rollback operations are executed much more often than -REDO recovery and it is hard to fix such bugs once PG -was released. Most people in such troubles have no -time to persue the cause. In reality I replied to the -PG restart troubles twice (with --wal-debug and pg_resetxlog -suggestions ) in Japan but got no further replies. - -> -> > How could PG be not less reliable than now ? -> -> Is today' RG more reliable than Oracle, Informix, DB2? - -I have never been and would never be optiomistic -about recovery. Is 7.1 more reliable than 7.0 from the -recovery POV ? I see no reason why overwriting smgr is -more relaible than no overwriting smgr as for recovery. - -regards, -Hiroshi Inoue - ----------------------------(end of broadcast)--------------------------- -TIP 6: Have you searched our list archives? - -http://archives.postgresql.org - -From ZeugswetterA@spardat.at Fri Jan 25 09:21:40 2002 -Return-path: -Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1]) - by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0PELde10640 - for ; Fri, 25 Jan 2002 09:21:39 -0500 (EST) -Received: from m01x1.s-mxs.net [10.3.55.201] - by smxsat1.smxs.net - with XWall v3.18f ; - Fri, 25 Jan 2002 15:22:51 +0100 -Received: from m0103.s-mxs.net [10.3.55.3] - by m01x1.s-mxs.net - with XWall v3.18a ; - Fri, 25 Jan 2002 15:21:23 +0100 -Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966); - Fri, 25 Jan 2002 15:21:22 +0100 -X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 -content-class: urn:content-classes:message -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Subject: RE: [HACKERS] Savepoints -Date: Fri, 25 Jan 2002 15:21:22 +0100 -Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA42128DE@m0114.s-mxs.net> -Thread-Topic: [HACKERS] Savepoints -Thread-Index: AcGkZ8SMKn//UUTjS3mi+qC7+gZAwwBQ4YMA -From: "Zeugswetter Andreas SB SD" -To: "Mikheev, Vadim" , - "Bruce Momjian" , - "PostgreSQL-development" -X-OriginalArrivalTime: 25 Jan 2002 14:21:22.0648 (UTC) FILETIME=[9090BD80:01C1A5AB] -Content-Transfer-Encoding: 8bit -X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id g0PELde10640 -Status: OR - -Vadim wrote: -> How about: use overwriting smgr + put old records into rollback -> segments - RS - (you have to keep them somewhere till TX's running -> anyway) + use WAL only as REDO log (RS will be used to rollback TX' -> changes and WAL will be used for RS/data files recovery). -> Something like what Oracle does. - -We have all the info we need in WAL and in the old rows, -why would you want to write them to RS ? -You only need RS for overwriting smgr. - -Andreas - -From pgsql-hackers-owner+M18209=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:14:02 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLE1e19182 - for ; Fri, 25 Jan 2002 16:14:01 -0500 (EST) -Received: (qmail 85111 invoked by alias); 25 Jan 2002 21:13:59 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 25 Jan 2002 21:13:59 -0000 -Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PL48l79366 - for ; Fri, 25 Jan 2002 16:04:09 -0500 (EST) - (envelope-from ZeugswetterA@spardat.at) -Received: from m01x1.s-mxs.net [10.3.55.201] - by smxsat1.smxs.net - with XWall v3.18f ; - Fri, 25 Jan 2002 22:05:21 +0100 -Received: from m0102.s-mxs.net [10.3.55.2] - by m01x1.s-mxs.net - with XWall v3.18a ; - Fri, 25 Jan 2002 22:03:54 +0100 -Received: from m0114.s-mxs.net ([10.3.55.14]) by m0102.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966); - Fri, 25 Jan 2002 22:03:53 +0100 -X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 -content-class: urn:content-classes:message -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Subject: Re: [HACKERS] Savepoints -Date: Fri, 25 Jan 2002 22:03:53 +0100 -Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net> -Thread-Topic: [HACKERS] Savepoints -Thread-Index: AcGlDMGVwSWndt4kT1C7QhclLvQPWgA1arbw -From: "Zeugswetter Andreas SB SD" -To: "Bruce Momjian" , - "Mikheev, Vadim" -cc: "PostgreSQL-development" -X-OriginalArrivalTime: 25 Jan 2002 21:03:53.0685 (UTC) FILETIME=[CBB48850:01C1A5E3] -Content-Transfer-Encoding: 8bit -X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLDAm83732 -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: ORr - - -> Now, with MVCC, the backend has to read through the redo segment to get - -You mean rollback segment, but ... - -> the original data value for that row. - -Will only need to be looked up if the row is currently beeing modified by -a not yet comitted txn (at least in the default read committed mode) - -> -> Now, while rollback segments do help with cleaning out old UPDATE rows, -> how does it improve DELETE performance? Seems it would just mark it as -> expired like we do now. - -delete would probably be: -1. mark original deleted and write whole row to RS - -I don't think you would like to mix looking up deleted rows in heap -but updated rows in RS - -Andreas - -PS: not that I like overwrite with MVCC now -If you think of VACUUM as garbage collection PG is highly trendy with -the non-overwriting smgr. - ----------------------------(end of broadcast)--------------------------- -TIP 5: Have you checked our extensive FAQ? - -http://www.postgresql.org/users-lounge/docs/faq.html - -From pgsql-hackers-owner+M18211=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:53:45 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLrie22174 - for ; Fri, 25 Jan 2002 16:53:44 -0500 (EST) -Received: (qmail 96831 invoked by alias); 25 Jan 2002 21:53:43 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 25 Jan 2002 21:53:43 -0000 -Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PLpRl96298 - for ; Fri, 25 Jan 2002 16:51:27 -0500 (EST) - (envelope-from ZeugswetterA@spardat.at) -Received: from m01x1.s-mxs.net [10.3.55.201] - by smxsat1.smxs.net - with XWall v3.18f ; - Fri, 25 Jan 2002 22:52:54 +0100 -Received: from m0103.s-mxs.net [10.3.55.3] - by m01x1.s-mxs.net - with XWall v3.18a ; - Fri, 25 Jan 2002 22:51:25 +0100 -Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966); - Fri, 25 Jan 2002 22:51:25 +0100 -X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 -content-class: urn:content-classes:message -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Subject: Re: [HACKERS] Savepoints -Date: Fri, 25 Jan 2002 22:51:24 +0100 -Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C5@m0114.s-mxs.net> -Thread-Topic: [HACKERS] Savepoints -Thread-Index: AcGlznYKFcqoYpMnSlGQHhQuEf6LuAAGpxnQ -From: "Zeugswetter Andreas SB SD" -To: "Mikheev, Vadim" -cc: -X-OriginalArrivalTime: 25 Jan 2002 21:51:25.0008 (UTC) FILETIME=[6F39E500:01C1A5EA] -Content-Transfer-Encoding: 8bit -X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLrP196418 -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: OR - - -> > > How about: use overwriting smgr + put old records into rollback -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > > segments - RS - (you have to keep them somewhere till TX's running -> > > anyway) + use WAL only as REDO log (RS will be used to -> rollback TX' -> > > changes and WAL will be used for RS/data files recovery). -> > > Something like what Oracle does. -> > -> > We have all the info we need in WAL and in the old rows, -> > why would you want to write them to RS ? -> > You only need RS for overwriting smgr. -> -> This is what I'm saying - implement Overwriting smgr... - -Yes I am sorry, I am catching up on email and had not read Bruce's -comment (nor yours correctly) :-( - -I was also long in the pro overwriting camp, because I am used to -non MVCC dbs like DB/2 and Informix. (which I like very much) -But I am starting to doubt that overwriting is really so good for -an MVCC db. And I don't think PG wants to switch to non MVCC :-) - -Imho it would only need a much more aggressive VACUUM backend. -(aka garbage collector :-) Maybe It could be designed to sniff the -redo log (buffer) to get a hint at what to actually clean out next. - -Andreas - ----------------------------(end of broadcast)--------------------------- -TIP 4: Don't 'kill -9' the postmaster - -From pgsql-hackers-owner+M18218=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 19:14:24 2002 -Return-path: -Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) - by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0Q0ENe03543 - for ; Fri, 25 Jan 2002 19:14:23 -0500 (EST) -Received: (qmail 22482 invoked by alias); 26 Jan 2002 00:13:55 -0000 -Received: from unknown (HELO postgresql.org) (64.49.215.8) - by www.postgresql.org with SMTP; 26 Jan 2002 00:13:55 -0000 -Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) - by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PNw1l20714 - for ; Fri, 25 Jan 2002 18:58:01 -0500 (EST) - (envelope-from pgman@candle.pha.pa.us) -Received: (from pgman@localhost) - by candle.pha.pa.us (8.11.6/8.10.1) id g0PNvoL02515; - Fri, 25 Jan 2002 18:57:50 -0500 (EST) -From: Bruce Momjian -Message-ID: <200201252357.g0PNvoL02515@candle.pha.pa.us> -Subject: Re: [HACKERS] Savepoints -In-Reply-To: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net> -To: Zeugswetter Andreas SB SD -Date: Fri, 25 Jan 2002 18:57:50 -0500 (EST) -cc: "Mikheev, Vadim" , - PostgreSQL-development -X-Mailer: ELM [version 2.4ME+ PL96 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -Precedence: bulk -Sender: pgsql-hackers-owner@postgresql.org -Status: OR - -Zeugswetter Andreas SB SD wrote: -> -> > Now, with MVCC, the backend has to read through the redo segment to get -> -> You mean rollback segment, but ... - - -Sorry, yes. I get redo/undo/rollback mixed up sometimes. :-) - -> > the original data value for that row. -> -> Will only need to be looked up if the row is currently beeing modified by -> a not yet comitted txn (at least in the default read committed mode) - -Uh, not really. The transaction may have completed after my transaction -started, meaning even though it looks like it is committed, to me, it is -not visible. Most MVCC visibility will require undo lookup. - -> -> > -> > Now, while rollback segments do help with cleaning out old UPDATE rows, -> > how does it improve DELETE performance? Seems it would just mark it as -> > expired like we do now. -> -> delete would probably be: -> 1. mark original deleted and write whole row to RS -> -> I don't think you would like to mix looking up deleted rows in heap -> but updated rows in RS - -Yes, so really the overwriting is only a big win for UPDATE. Right now, -UPDATE is DELETE/INSERT, and that DELETE makes MVCC happy. :-) - -My whole goal was to simplify this so we can see the differences. - - -> PS: not that I like overwrite with MVCC now -> If you think of VACUUM as garbage collection PG is highly trendy with -> the non-overwriting smgr. - -Yes, that is basically what it is now, a garbage collector that collects -in heap rather than in undo. - --- - Bruce Momjian | http://candle.pha.pa.us - pgman@candle.pha.pa.us | (610) 853-3000 - + If your life is a hard drive, | 830 Blythe Avenue - + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 - ----------------------------(end of broadcast)--------------------------- -TIP 3: if posting/reading through Usenet, please send an appropriate -subscribe-nomail command to majordomo@postgresql.org so that your -message can get through to the mailing list cleanly - -From pgman Wed Jan 23 10:36:13 2002 -Subject: Savepoints -To: PostgreSQL-development -Date: Wed, 23 Jan 2002 13:19:05 -0500 (EST) -X-Mailer: ELM [version 2.4ME+ PL96 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -Content-Length: 1829 -Status: OR - -I have talked in the past about a possible implementation of -savepoints/nested transactions. I would like to more formally outline -my ideas below. - -We have talked about using WAL for such a purpose, but that requires WAL -files to remain for the life of a transaction, which seems unacceptable. -Other database systems do that, and it is a pain for administrators. I -realized we could do some sort of WAL compaction, but that seems quite -complex too. - -Basically, under my plan, WAL would be unchanged. WAL's function is -crash recovery, and it would retain that. There would also be no -on-disk changes. I would use the command counter in certain cases to -identify savepoints. - -My idea is to keep savepoint undo information in a private area per -backend, either in memory or on disk. We can either save the -relid/tids of modified rows, or if there are too many, discard the -saved ones and just remember the modified relids. On rollback to save -point, either clear up the modified relid/tids, or sequential scan -through the relid and clear up all the tuples that have our transaction -id and have command counters that are part of the undo savepoint. - -It seems marking undo savepoint rows with a fixed aborted transaction id -would be the easiest solution. - -Of course, we only remember modified rows when we are in savepoints, and -only undo them when we rollback to a savepoint. Transaction processing -remains the same. - -There is no reason for other backend to be able to see savepoint undo -information, and keeping it private greatly simplifies the -implementation. - --- - Bruce Momjian | http://candle.pha.pa.us - pgman@candle.pha.pa.us | (610) 853-3000 - + If your life is a hard drive, | 830 Blythe Avenue - + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 -