diff --git a/doc/TODO.detail/transactions b/doc/TODO.detail/transactions index 8898580bc2..ce7af3e4b3 100644 --- a/doc/TODO.detail/transactions +++ b/doc/TODO.detail/transactions @@ -167,3 +167,1011 @@ http://groups.google.com/groups?hl=en&threadm=200108050432.f754Wdo11696%40candle Regards, Haroldo. +From vmikheev@SECTORBASE.COM Wed Jan 23 18:23:04 2002 +Return-path: +Received: from sectorbase2.sectorbase.com ([66.106.163.120]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0NNN3U21442 + for ; Wed, 23 Jan 2002 18:23:04 -0500 (EST) +Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) + id ; Wed, 23 Jan 2002 15:22:52 -0800 +Message-ID: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> +From: "Mikheev, Vadim" +To: "'Bruce Momjian'" , + PostgreSQL-development + +Subject: RE: [HACKERS] Savepoints +Date: Wed, 23 Jan 2002 15:22:42 -0800 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2653.19) +Content-Type: text/plain; + charset="iso-8859-1" +Status: ORr + +> I have talked in the past about a possible implementation of +> savepoints/nested transactions. I would like to more formally outline +> my ideas below. + +Well, I would like to do the same -:) + +> ... +> There is no reason for other backend to be able to see savepoint undo +> information, and keeping it private greatly simplifies the +> implementation. + +Yes... and requires additional memory/disk space: we keep old records +in data files and we'll store them again... + +How about: use overwriting smgr + put old records into rollback +segments - RS - (you have to keep them somewhere till TX's running +anyway) + use WAL only as REDO log (RS will be used to rollback TX' +changes and WAL will be used for RS/data files recovery). +Something like what Oracle does. 
+ +Vadim + +From pgsql-hackers-owner+M18085=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:15:02 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1F1U26461 + for ; Wed, 23 Jan 2002 20:15:02 -0500 (EST) +Received: (qmail 92866 invoked by alias); 24 Jan 2002 01:14:59 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 24 Jan 2002 01:14:59 -0000 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O18ml91949 + for ; Wed, 23 Jan 2002 20:08:50 -0500 (EST) + (envelope-from pgman@candle.pha.pa.us) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g0O18jV26044; + Wed, 23 Jan 2002 20:08:45 -0500 (EST) +From: Bruce Momjian +Message-ID: <200201240108.g0O18jV26044@candle.pha.pa.us> +Subject: Re: [HACKERS] Savepoints +In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> +To: "Mikheev, Vadim" +Date: Wed, 23 Jan 2002 20:08:45 -0500 (EST) +cc: PostgreSQL-development +X-Mailer: ELM [version 2.4ME+ PL96 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + +Mikheev, Vadim wrote: +> > I have talked in the past about a possible implementation of +> > savepoints/nested transactions. I would like to more formally outline +> > my ideas below. +> +> Well, I would like to do the same -:) + +Good. + +> > ... +> > There is no reason for other backend to be able to see savepoint undo +> > information, and keeping it private greatly simplifies the +> > implementation. +> +> Yes... and requires additional memory/disk space: we keep old records +> in data files and we'll store them again... + +I was suggesting keeping only relid/tid or in some cases only relid. 
+Seems like one or the other will fit all needs: relid/tid for update of +a few rows, relid for many rows updated in the same table. I saw no +need to store the actual data. + +> How about: use overwriting smgr + put old records into rollback +> segments - RS - (you have to keep them somewhere till TX's running +> anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> changes and WAL will be used for RS/data files recovery). +> Something like what Oracle does. + +Why record the old data rows rather than the tids? While the +transaction is running, the rows can't be moved anyway. Also, why store +them in a shared area. That has additional requirements because one old +transaction can require all transactions to keep their stuff around. +Why not just make it a private data file for each backend? + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + +---------------------------(end of broadcast)--------------------------- +TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org + +From pgsql-hackers-owner+M18086=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:25:47 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1PkU26964 + for ; Wed, 23 Jan 2002 20:25:47 -0500 (EST) +Received: (qmail 94878 invoked by alias); 24 Jan 2002 01:25:44 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 24 Jan 2002 01:25:44 -0000 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O1L1l94075 + for ; Wed, 23 Jan 2002 20:21:01 -0500 (EST) + (envelope-from pgman@candle.pha.pa.us) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g0O1Kwm26748; + Wed, 23 
Jan 2002 20:20:58 -0500 (EST) +From: Bruce Momjian +Message-ID: <200201240120.g0O1Kwm26748@candle.pha.pa.us> +Subject: Re: [HACKERS] Savepoints +In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> +To: "Mikheev, Vadim" +Date: Wed, 23 Jan 2002 20:20:58 -0500 (EST) +cc: PostgreSQL-development +X-Mailer: ELM [version 2.4ME+ PL96 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + +> > There is no reason for other backend to be able to see savepoint undo +> > information, and keeping it private greatly simplifies the +> > implementation. +> +> Yes... and requires additional memory/disk space: we keep old records +> in data files and we'll store them again... +> +> How about: use overwriting smgr + put old records into rollback +> segments - RS - (you have to keep them somewhere till TX's running +> anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> changes and WAL will be used for RS/data files recovery). +> Something like what Oracle does. + +I am sorry. I see what you are saying now. I missed the words +"overwriting smgr". You are suggesting going to an overwriting storage +manager. Is this to be done only because of savepoints. Doesn't seem +worth it when I have a possible solution without such a drastic change. +Also, overwriting storage manager will require MVCC to read through +there to get accurate MVCC visibility, right? + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. 
| Drexel Hill, Pennsylvania 19026 + +---------------------------(end of broadcast)--------------------------- +TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org + +From vmikheev@SECTORBASE.COM Wed Jan 23 21:03:29 2002 +Return-path: +Received: from sectorbase2.sectorbase.com ([66.106.163.120]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O23TU28813 + for ; Wed, 23 Jan 2002 21:03:29 -0500 (EST) +Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) + id ; Wed, 23 Jan 2002 18:03:18 -0800 +Message-ID: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com> +From: "Mikheev, Vadim" +To: "'Bruce Momjian'" +cc: PostgreSQL-development +Subject: RE: [HACKERS] Savepoints +Date: Wed, 23 Jan 2002 18:03:11 -0800 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2653.19) +Content-Type: text/plain; + charset="iso-8859-1" +Status: ORr + +> > How about: use overwriting smgr + put old records into rollback +> > segments - RS - (you have to keep them somewhere till TX's running +> > anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> > changes and WAL will be used for RS/data files recovery). +> > Something like what Oracle does. +> +> I am sorry. I see what you are saying now. I missed the words + +And I'm sorry for missing your notes about storing relid+tid only. + +> "overwriting smgr". You are suggesting going to an overwriting +> storage manager. Is this to be done only because of savepoints. + +No. One point I made a few months ago (and never got objections) +is - why to keep old data in data files sooooo long? +Imagine long running TX (eg pg_dump). Why other TX-s must read +again and again completely useless (for them) old data we keep +for pg_dump? + +> Doesn't seem worth it when I have a possible solution without +> such a drastic change. +> Also, overwriting storage manager will require MVCC to read +> through there to get accurate MVCC visibility, right? + +Right...
just like now non-overwriting smgr requires *ALL* +TX-s to read old data in data files. But with overwriting smgr +TX will read RS only when it is required and as far (much) as +it is required. + +Simple solutions are not always the best ones. +Compare Oracle and InterBase. Both have MVCC. +Smgr-s are different. What RDBMS is more cool? +Why doesn't Oracle use more simple non-overwriting smgr +(as InterBase... and we do)? + +Vadim + +From dhogaza@pacifier.com Wed Jan 23 21:05:37 2002 +Return-path: +Received: from comet.pacifier.com ([199.2.117.155]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O25bU28962 + for ; Wed, 23 Jan 2002 21:05:37 -0500 (EST) +Received: from pacifier.com (dsl-dhogaza.pacifier.net [207.202.226.68]) + by comet.pacifier.com (8.11.2/8.11.1) with ESMTP id g0O24qX29917; + Wed, 23 Jan 2002 18:04:52 -0800 (PST) +Message-ID: <3C4F6BF0.2010406@pacifier.com> +Date: Wed, 23 Jan 2002 18:05:36 -0800 +From: Don Baccus +User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011221 +X-Accept-Language: en-us +MIME-Version: 1.0 +To: Bruce Momjian +cc: "Mikheev, Vadim" , + PostgreSQL-development +Subject: Re: [HACKERS] Savepoints +References: <200201240120.g0O1Kwm26748@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii; format=flowed +Content-Transfer-Encoding: 7bit +Status: OR + +Bruce Momjian wrote: + + +> I am sorry. I see what you are saying now. I missed the words +> "overwriting smgr". You are suggesting going to an overwriting storage +> manager. + + +Overwriting storage managers don't suffer from unbounded growth of +datafiles until garbage collection (vacuum) is performed. In fact, +there's no need for a vacuum-style utility. The rollback segments only +need to keep around enough past history to rollback transactions that +are executing. 
+ +Of course, then the size of your transactions is limited by the size of +your rollback segments, which in Oracle are fixed in length when you +build your database (there are ways to change this when you figure out +that you didn't pick a good number when creating it). + + >Is this to be done only because of savepoints. + +Not in traditional storage managers such as Oracle uses. The complexity +of managing visibility and the like is traded off against the fact that +you're not stuck ever needing to garbage collect a database that +occupies a roomful of disks. + +It's a trade-off. PG's current storage manager seems to work awfully +well in a lot of common database scenarios, and Tom's new vacuum is +meant to help mitigate against the drawbacks. But overwriting storage +managers certainly have their advantages, too. + + > Doesn't seem + +> worth it when I have a possible solution without such a drastic change. +> Also, overwriting storage manager will require MVCC to read through +> there to get accurate MVCC visibility, right? + + +Yep...
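A minimal Python sketch of the overwriting scheme discussed above, under stated assumptions: a single toy in-memory table, a row lock that allows at most one uncommitted writer per row, and snapshots modeled simply as the set of xids whose effects are visible. OverwritingStore, Version and all method names are hypothetical; this is neither PostgreSQL nor Oracle code. The heap keeps only the newest version of each row, update() copies the before-image into a per-row rollback segment (RS), read() consults the RS only when the heap version is not visible to the reader's snapshot, and rollback() restores before-images from the RS.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Version:
    value: str
    xmin: int  # xid of the transaction that wrote this version


class OverwritingStore:
    """Toy overwriting storage manager (hypothetical, illustration only).

    The heap holds only the newest version of each row; older versions
    (before-images) are kept in a per-row rollback segment, newest last.
    """

    def __init__(self) -> None:
        self.heap: Dict[int, Version] = {}
        self.rs: Dict[int, List[Version]] = {}
        self.committed: set = set()

    def update(self, xid: int, row: int, value: str) -> None:
        old = self.heap.get(row)
        if old is not None:
            # Save the before-image in the RS, then overwrite in place.
            self.rs.setdefault(row, []).append(old)
        self.heap[row] = Version(value, xid)

    def commit(self, xid: int) -> None:
        self.committed.add(xid)

    def snapshot(self, xid: int) -> FrozenSet[int]:
        # Assumed visibility model: everything committed so far plus own writes.
        return frozenset(self.committed) | {xid}

    def read(self, snap: FrozenSet[int], row: int) -> Optional[str]:
        cur = self.heap.get(row)
        if cur is None:
            return None
        if cur.xmin in snap:
            return cur.value  # common case: the RS is never touched
        # Walk back through older versions until one is visible.
        for old in reversed(self.rs.get(row, [])):
            if old.xmin in snap:
                return old.value
        return None

    def rollback(self, xid: int) -> None:
        # UNDO: replace every heap version written by xid with its before-image.
        for row in list(self.heap):
            while row in self.heap and self.heap[row].xmin == xid:
                older = self.rs.get(row)
                if older:
                    self.heap[row] = older.pop()
                else:
                    del self.heap[row]  # the row was inserted by xid


if __name__ == "__main__":
    store = OverwritingStore()
    store.update(1, row=7, value="v1")
    store.commit(1)
    reader = store.snapshot(2)               # txn 2 takes its snapshot
    store.update(3, row=7, value="v2")       # txn 3 overwrites row 7 in place
    print(store.read(reader, 7))             # prints v1 (rebuilt from the RS)
    print(store.read(store.snapshot(3), 7))  # prints v2 (txn 3 sees its own write)
    store.rollback(3)
    print(store.heap[7].value)               # prints v1 (restored from the RS)
```

Discarding RS entries that no live snapshot can still see (the bounded-history property described above) is left out to keep the sketch short, as is WAL-logging of RS writes, which a crash-safe design would also need.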
+ +-- +Don Baccus +Portland, OR +http://donb.photo.net, http://birdnotes.net, http://openacs.org + + +From Inoue@tpf.co.jp Thu Jan 24 11:34:48 2002 +Return-path: +Received: from p2272.nsk.ne.jp (p2272.nsk.ne.jp [210.145.18.145]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0OGYjU23980 + for ; Thu, 24 Jan 2002 11:34:47 -0500 (EST) +Received: from mcadnote1 (ppm132.noc.fukui.nsk.ne.jp [61.198.95.32]) + by p2272.nsk.ne.jp (8.9.3/3.7W-20000722) with SMTP id BAA12147; + Fri, 25 Jan 2002 01:34:24 +0900 (JST) +From: "Hiroshi Inoue" +To: "Mikheev, Vadim" +cc: "PostgreSQL-development" , + "'Bruce Momjian'" +Subject: RE: [HACKERS] Savepoints +Date: Fri, 25 Jan 2002 01:34:29 +0900 +Message-ID: +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) +In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com> +X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 +Importance: Normal +Status: OR + +> -----Original Message----- +> From: Mikheev, Vadim +> +> How about: use overwriting smgr + put old records into rollback +> segments - RS - (you have to keep them somewhere till TX's running +> anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> changes and WAL will be used for RS/data files recovery). +> Something like what Oracle does. + +As long as we use no overwriting manager +1) Rollback(data) isn't needed in case of a db crash. +2) Rollback(data) isn't needed to cancel a transaction entirely. +3) We don't need to mind the transaction size so much. + +We can't use the db any longer if a REDO recovery fails now. +Under overwriting smgr we can't use the db any longer either +if rollback fails. How could PG be not less reliable than now ?
+ +regards, +Hiroshi Inoue + +From pgsql-hackers-owner+M18123=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:15:11 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJFAU12547 + for ; Thu, 24 Jan 2002 14:15:10 -0500 (EST) +Received: (qmail 43413 invoked by alias); 24 Jan 2002 19:13:48 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 24 Jan 2002 19:13:48 -0000 +Received: from sectorbase2.sectorbase.com ([66.106.163.120]) + by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJC4l42011 + for ; Thu, 24 Jan 2002 14:12:04 -0500 (EST) + (envelope-from vmikheev@SECTORBASE.COM) +Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) + id ; Thu, 24 Jan 2002 11:11:54 -0800 +Message-ID: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com> +From: "Mikheev, Vadim" +To: "'Hiroshi Inoue'" +cc: PostgreSQL-development , + "'Bruce Momjian'" + +Subject: Re: [HACKERS] Savepoints +Date: Thu, 24 Jan 2002 11:11:52 -0800 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2653.19) +Content-Type: text/plain; + charset="iso-8859-1" +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + +> > How about: use overwriting smgr + put old records into rollback +> > segments - RS - (you have to keep them somewhere till TX's running +> > anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> > changes and WAL will be used for RS/data files recovery). +> > Something like what Oracle does. +> +> As long as we use no overwriting manager +> 1) Rollback(data) isn't needed in case of a db crash. +> 2) Rollback(data) isn't needed to cancel a transaction entirely. + +-1) But vacuum must read a huge amount of data to remove dirt. +-2) But TX-s must read data they are not interested in at all. + +> 3) We don't need to mind the transaction size so much.
+ +-3) The same with overwriting smgr and WAL used *only as REDO log*: +we are not required to keep WAL files for the duration of a transaction +- as soon as server knows that changes logged in some WAL file +applied to data files and RS on disk (and archived, for WAL-based +BAR) that file may be reused/removed. Old data will still occupy +space in RS but their space in data files will be available +for reuse. + +> We can't use the db any longer if a REDO recovery fails now. + +Reset WAL and use/dump it. Annoying? Agreed. Fix bugs and/or +use good RAM - whatever caused problem with restart. + +> Under overwriting smgr we can't use the db any longer either +> if rollback fails. + +Why should it fail? Bugs? Fix them. + +> How could PG be not less reliable than now ? + +Is today's PG more reliable than Oracle, Informix, DB2? + +Vadim + +---------------------------(end of broadcast)--------------------------- +TIP 3: if posting/reading through Usenet, please send an appropriate +subscribe-nomail command to majordomo@postgresql.org so that your +message can get through to the mailing list cleanly + +From pgsql-hackers-owner+M18125=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:23:42 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJNfU13481 + for ; Thu, 24 Jan 2002 14:23:42 -0500 (EST) +Received: (qmail 49604 invoked by alias); 24 Jan 2002 19:23:40 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 24 Jan 2002 19:23:40 -0000 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJMTl48885 + for ; Thu, 24 Jan 2002 14:22:29 -0500 (EST) + (envelope-from pgman@candle.pha.pa.us) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g0OJMJf13378; + Thu, 24 Jan 2002 14:22:19 -0500 (EST) +From: Bruce Momjian +Message-ID:
<200201241922.g0OJMJf13378@candle.pha.pa.us> +Subject: Re: [HACKERS] Savepoints +In-Reply-To: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com> +To: "Mikheev, Vadim" +Date: Thu, 24 Jan 2002 14:22:19 -0500 (EST) +cc: PostgreSQL-development +X-Mailer: ELM [version 2.4ME+ PL96 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + + +OK, I have had time to think about this, and I think I can put the two +proposals into perspective. I will use Vadim's terminology. + +In our current setup, rollback/undo data is kept in the same file as our +live data. This data is used for two purposes, one, for rollback of +transactions, and perhaps subtransactions in the future, and second, for +MVCC visibility for backends making changes. + +So, it seems the real question is whether a database modification should +write the old data into a separate rollback segment and modify the heap +data, or just create a new row and require the old row to be removed +later by vacuum. + +Let's look at this behavior without MVCC. In such cases, if someone +tries to read a modified row, it will block and wait for the modifying +backend to commit or rollback, when it will then continue. In such +cases, there is no reason for the waiting transaction to read the old +data in the redo segment because it can't continue anyway. + +Now, with MVCC, the backend has to read through the redo segment to get +the original data value for that row. + +Now, while rollback segments do help with cleaning out old UPDATE rows, +how does it improve DELETE performance? Seems it would just mark it as +expired like we do now. + +One objection I always had to redo segments was that if I start a +transaction in the morning and walk away, none of the redo segments can +be recycled. 
I was going to ask if we can force some type of redo +segment compaction to keep old active rows and delete rows no longer +visible to any transaction. However, I now realize that our VACUUM has +the same problem. Tuples with XID >= GetOldestXmin() are not recycled, +meaning we have this problem in our current implementation too. (I +wonder if our vacuum could be smarter about knowing which rows are +visible, perhaps by creating a sorted list of xid's and doing a binary +search on the list to determine visibility.) + +So, I guess the issue is, do we want to keep redo information in the +main table, or split it out into redo segments. Certainly we have to +eliminate the Oracle restrictions that redo segment size is fixed at +install time. + +The advantage of a redo segment is that hopefully we don't have +transactions reading through irrelevant undo information. The +disadvantage is that we now have redo information grouped into table +files where a sequential scan can be performed. (Index scans of redo +info are a performance problem currently.) We would have to somehow +efficiently access redo information grouped into the redo segments. +Perhaps a hash based on relid would help here. Another disadvantage is +concurrency. When we start modifying heap data in place, we have to +prevent other backends from seeing that modification while we move the +old data to the redo segment. + +I guess my feeling is that if we can get vacuum to happen automatically, +how is our current non-overwriting storage manager different from redo +segments? + +One big advantage of redo segments would be that right now, if someone +updates a row repeatedly, there are lots of heap versions of the row +that are difficult to shrink in the table, while if they are in the redo +segments, we can more efficiently remove them, and there is only one heap +row. + +How is recovery handled with rollback segments? Do we write old and new +data to WAL? We just write new data to WAL now, right?
Do we fsync +rollback segments? + +Have I outlined this accurately? + +--------------------------------------------------------------------------- + +Mikheev, Vadim wrote: +> > > How about: use overwriting smgr + put old records into rollback +> > > segments - RS - (you have to keep them somewhere till TX's running +> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> > > changes and WAL will be used for RS/data files recovery). +> > > Something like what Oracle does. +> > +> > I am sorry. I see what you are saying now. I missed the words +> +> And I'm sorry for missing your notes about storing relid+tid only. +> +> > "overwriting smgr". You are suggesting going to an overwriting +> > storage manager. Is this to be done only because of savepoints. +> +> No. One point I made a few months ago (and never got objections) +> is - why to keep old data in data files sooooo long? +> Imagine long running TX (eg pg_dump). Why other TX-s must read +> again and again completely useless (for them) old data we keep +> for pg_dump? +> +> > Doesn't seem worth it when I have a possible solution without +> > such a drastic change. +> > Also, overwriting storage manager will require MVCC to read +> > through there to get accurate MVCC visibility, right? +> +> Right... just like now non-overwriting smgr requires *ALL* +> TX-s to read old data in data files. But with overwriting smgr +> TX will read RS only when it is required and as far (much) as +> it is required. +> +> Simple solutions are not always the best ones. +> Compare Oracle and InterBase. Both have MVCC. +> Smgr-s are different. What RDBMS is more cool? +> Why doesn't Oracle use more simple non-overwriting smgr +> (as InterBase... and we do)? +> +> Vadim +> + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup.
| Drexel Hill, Pennsylvania 19026 + +---------------------------(end of broadcast)--------------------------- +TIP 2: you can get off all lists at once with the unregister command + (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) + +From pgsql-hackers-owner+M18141=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 19:43:38 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0P0hbU15026 + for ; Thu, 24 Jan 2002 19:43:38 -0500 (EST) +Received: (qmail 28642 invoked by alias); 25 Jan 2002 00:43:24 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 25 Jan 2002 00:43:24 -0000 +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by postgresql.org (8.11.3/8.11.4) with SMTP id g0P0YIl27208 + for ; Thu, 24 Jan 2002 19:34:18 -0500 (EST) + (envelope-from Inoue@tpf.co.jp) +Received: (qmail 3661 invoked from network); 25 Jan 2002 00:34:19 -0000 +Received: from unknown (HELO viscomail.tpf.co.jp) (100.0.0.108) + by sd2.tpf-fw-c.co.jp with SMTP; 25 Jan 2002 00:34:19 -0000 +Received: from tpf.co.jp (3dgateway1 [126.0.1.60]) + by viscomail.tpf.co.jp (8.8.8+Sun/8.8.8) with ESMTP id JAA00756; + Fri, 25 Jan 2002 09:34:18 +0900 (JST) +Message-ID: <3C50A807.32A29E09@tpf.co.jp> +Date: Fri, 25 Jan 2002 09:34:15 +0900 +From: Hiroshi Inoue +X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U) +X-Accept-Language: ja +MIME-Version: 1.0 +To: "Mikheev, Vadim" +cc: PostgreSQL-development , + "'Bruce Momjian'" +Subject: Re: [HACKERS] Savepoints +References: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com> +Content-Type: text/plain; charset=iso-2022-jp +Content-Transfer-Encoding: 7bit +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + +"Mikheev, Vadim" wrote: +> +> > > How about: use overwriting smgr + put old records into rollback +> > > segments - RS - (you have to keep them somewhere till TX's 
running +> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> > > changes and WAL will be used for RS/data files recovery). +> > > Something like what Oracle does. +> > +> > As long as we use no overwriting manager +> > 1) Rollback(data) isn't needed in case of a db crash. +> > 2) Rollback(data) isn't needed to cancel a transaction entirely. +> +> -1) But vacuum must read a huge amount of data to remove dirt. +> -2) But TX-s must read data they are not interested in at all. +> +> > 3) We don't need to mind the transaction size so much. +> +> -3) The same with overwriting smgr and WAL used *only as REDO log*: + +The larger RS becomes, the longer it would take to cancel +the transaction, whereas it is executed in a moment under no +overwriting smgr. And, for example, if RS exhausted all disk space, +is PG really safe ? Other backends would also fail because they +couldn't write RS any more. Many transactions would execute +UNDO operations simultaneously but there's no space to write +WALs (UNDO operations must be written to WAL also) and PG +system would abort. And could PG restart under such situations ? +Even though there's a way to recover from the situation, I +think we should avoid such dangerous situations in the +first place. Basically recovery operations should never fail. + +> +> > We can't use the db any longer if a REDO recovery fails now. +> +> Reset WAL and use/dump it. Annoying? Agreed. Fix bugs and/or +> use good RAM - whatever caused problem with restart. + +As I already mentioned recovery operations should never fail. +> +> > Under overwriting smgr we can't use the db any longer either +> > if rollback fails. +> +> Why should it fail? Bugs? Fix them. + +Rollback operations are executed much more often than +REDO recovery and it is hard to fix such bugs once PG +has been released. Most people in such troubles have no +time to pursue the cause.
In reality I replied to the +PG restart troubles twice (with --wal-debug and pg_resetxlog +suggestions) in Japan but got no further replies. + +> +> > How could PG be not less reliable than now ? +> +> Is today's PG more reliable than Oracle, Informix, DB2? + +I have never been and would never be optimistic +about recovery. Is 7.1 more reliable than 7.0 from the +recovery POV ? I see no reason why overwriting smgr is +more reliable than no overwriting smgr as for recovery. + +regards, +Hiroshi Inoue + +---------------------------(end of broadcast)--------------------------- +TIP 6: Have you searched our list archives? + +http://archives.postgresql.org + +From ZeugswetterA@spardat.at Fri Jan 25 09:21:40 2002 +Return-path: +Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0PELde10640 + for ; Fri, 25 Jan 2002 09:21:39 -0500 (EST) +Received: from m01x1.s-mxs.net [10.3.55.201] + by smxsat1.smxs.net + with XWall v3.18f ; + Fri, 25 Jan 2002 15:22:51 +0100 +Received: from m0103.s-mxs.net [10.3.55.3] + by m01x1.s-mxs.net + with XWall v3.18a ; + Fri, 25 Jan 2002 15:21:23 +0100 +Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966); + Fri, 25 Jan 2002 15:21:22 +0100 +X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 +content-class: urn:content-classes:message +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Subject: RE: [HACKERS] Savepoints +Date: Fri, 25 Jan 2002 15:21:22 +0100 +Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA42128DE@m0114.s-mxs.net> +Thread-Topic: [HACKERS] Savepoints +Thread-Index: AcGkZ8SMKn//UUTjS3mi+qC7+gZAwwBQ4YMA +From: "Zeugswetter Andreas SB SD" +To: "Mikheev, Vadim" , + "Bruce Momjian" , + "PostgreSQL-development" +X-OriginalArrivalTime: 25 Jan 2002 14:21:22.0648 (UTC) FILETIME=[9090BD80:01C1A5AB] +Content-Transfer-Encoding: 8bit +X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id
g0PELde10640 +Status: OR + +Vadim wrote: +> How about: use overwriting smgr + put old records into rollback +> segments - RS - (you have to keep them somewhere till TX's running +> anyway) + use WAL only as REDO log (RS will be used to rollback TX' +> changes and WAL will be used for RS/data files recovery). +> Something like what Oracle does. + +We have all the info we need in WAL and in the old rows, +why would you want to write them to RS ? +You only need RS for overwriting smgr. + +Andreas + +From pgsql-hackers-owner+M18209=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:14:02 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLE1e19182 + for ; Fri, 25 Jan 2002 16:14:01 -0500 (EST) +Received: (qmail 85111 invoked by alias); 25 Jan 2002 21:13:59 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 25 Jan 2002 21:13:59 -0000 +Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1]) + by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PL48l79366 + for ; Fri, 25 Jan 2002 16:04:09 -0500 (EST) + (envelope-from ZeugswetterA@spardat.at) +Received: from m01x1.s-mxs.net [10.3.55.201] + by smxsat1.smxs.net + with XWall v3.18f ; + Fri, 25 Jan 2002 22:05:21 +0100 +Received: from m0102.s-mxs.net [10.3.55.2] + by m01x1.s-mxs.net + with XWall v3.18a ; + Fri, 25 Jan 2002 22:03:54 +0100 +Received: from m0114.s-mxs.net ([10.3.55.14]) by m0102.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966); + Fri, 25 Jan 2002 22:03:53 +0100 +X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 +content-class: urn:content-classes:message +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Subject: Re: [HACKERS] Savepoints +Date: Fri, 25 Jan 2002 22:03:53 +0100 +Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net> +Thread-Topic: [HACKERS] Savepoints +Thread-Index: AcGlDMGVwSWndt4kT1C7QhclLvQPWgA1arbw +From:
"Zeugswetter Andreas SB SD" +To: "Bruce Momjian" , + "Mikheev, Vadim" +cc: "PostgreSQL-development" +X-OriginalArrivalTime: 25 Jan 2002 21:03:53.0685 (UTC) FILETIME=[CBB48850:01C1A5E3] +Content-Transfer-Encoding: 8bit +X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLDAm83732 +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: ORr + + +> Now, with MVCC, the backend has to read through the redo segment to get + +You mean rollback segment, but ... + +> the original data value for that row. + +Will only need to be looked up if the row is currently being modified by +a not yet committed txn (at least in the default read committed mode). + +> +> Now, while rollback segments do help with cleaning out old UPDATE rows, +> how does it improve DELETE performance? Seems it would just mark it as +> expired like we do now. + +delete would probably be: +1. mark original deleted and write whole row to RS + +I don't think you would like to mix looking up deleted rows in heap +but updated rows in RS + +Andreas + +PS: not that I like overwrite with MVCC now +If you think of VACUUM as garbage collection PG is highly trendy with +the non-overwriting smgr. + +---------------------------(end of broadcast)--------------------------- +TIP 5: Have you checked our extensive FAQ?
+ +http://www.postgresql.org/users-lounge/docs/faq.html + +From pgsql-hackers-owner+M18211=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:53:45 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLrie22174 + for ; Fri, 25 Jan 2002 16:53:44 -0500 (EST) +Received: (qmail 96831 invoked by alias); 25 Jan 2002 21:53:43 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 25 Jan 2002 21:53:43 -0000 +Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1]) + by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PLpRl96298 + for ; Fri, 25 Jan 2002 16:51:27 -0500 (EST) + (envelope-from ZeugswetterA@spardat.at) +Received: from m01x1.s-mxs.net [10.3.55.201] + by smxsat1.smxs.net + with XWall v3.18f ; + Fri, 25 Jan 2002 22:52:54 +0100 +Received: from m0103.s-mxs.net [10.3.55.3] + by m01x1.s-mxs.net + with XWall v3.18a ; + Fri, 25 Jan 2002 22:51:25 +0100 +Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966); + Fri, 25 Jan 2002 22:51:25 +0100 +X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 +content-class: urn:content-classes:message +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Subject: Re: [HACKERS] Savepoints +Date: Fri, 25 Jan 2002 22:51:24 +0100 +Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C5@m0114.s-mxs.net> +Thread-Topic: [HACKERS] Savepoints +Thread-Index: AcGlznYKFcqoYpMnSlGQHhQuEf6LuAAGpxnQ +From: "Zeugswetter Andreas SB SD" +To: "Mikheev, Vadim" +cc: +X-OriginalArrivalTime: 25 Jan 2002 21:51:25.0008 (UTC) FILETIME=[6F39E500:01C1A5EA] +Content-Transfer-Encoding: 8bit +X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLrP196418 +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + + +> > > How about: use overwriting smgr + put old records into rollback +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> 
> > segments - RS - (you have to keep them somewhere till TX's running +> > > anyway) + use WAL only as REDO log (RS will be used to +> rollback TX' +> > > changes and WAL will be used for RS/data files recovery). +> > > Something like what Oracle does. +> > +> > We have all the info we need in WAL and in the old rows, +> > why would you want to write them to RS ? +> > You only need RS for overwriting smgr. +> +> This is what I'm saying - implement Overwriting smgr... + +Yes I am sorry, I am catching up on email and had not read Bruce's +comment (nor yours correctly) :-( + +I was also long in the pro overwriting camp, because I am used to +non MVCC dbs like DB/2 and Informix. (which I like very much) +But I am starting to doubt that overwriting is really so good for +an MVCC db. And I don't think PG wants to switch to non MVCC :-) + +Imho it would only need a much more aggressive VACUUM backend. +(aka garbage collector :-) Maybe It could be designed to sniff the +redo log (buffer) to get a hint at what to actually clean out next. 
+ +Andreas + +---------------------------(end of broadcast)--------------------------- +TIP 4: Don't 'kill -9' the postmaster + +From pgsql-hackers-owner+M18218=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 19:14:24 2002 +Return-path: +Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) + by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0Q0ENe03543 + for ; Fri, 25 Jan 2002 19:14:23 -0500 (EST) +Received: (qmail 22482 invoked by alias); 26 Jan 2002 00:13:55 -0000 +Received: from unknown (HELO postgresql.org) (64.49.215.8) + by www.postgresql.org with SMTP; 26 Jan 2002 00:13:55 -0000 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PNw1l20714 + for ; Fri, 25 Jan 2002 18:58:01 -0500 (EST) + (envelope-from pgman@candle.pha.pa.us) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g0PNvoL02515; + Fri, 25 Jan 2002 18:57:50 -0500 (EST) +From: Bruce Momjian +Message-ID: <200201252357.g0PNvoL02515@candle.pha.pa.us> +Subject: Re: [HACKERS] Savepoints +In-Reply-To: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net> +To: Zeugswetter Andreas SB SD +Date: Fri, 25 Jan 2002 18:57:50 -0500 (EST) +cc: "Mikheev, Vadim" , + PostgreSQL-development +X-Mailer: ELM [version 2.4ME+ PL96 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + +Zeugswetter Andreas SB SD wrote: +> +> > Now, with MVCC, the backend has to read through the redo segment to get +> +> You mean rollback segment, but ... + + +Sorry, yes. I get redo/undo/rollback mixed up sometimes. :-) + +> > the original data value for that row. +> +> Will only need to be looked up if the row is currently beeing modified by +> a not yet comitted txn (at least in the default read committed mode) + +Uh, not really. 
The transaction may have completed after my transaction +started, meaning even though it looks like it is committed, to me, it is +not visible. Most MVCC visibility will require undo lookup. + +> +> > +> > Now, while rollback segments do help with cleaning out old UPDATE rows, +> > how does it improve DELETE performance? Seems it would just mark it as +> > expired like we do now. +> +> delete would probably be: +> 1. mark original deleted and write whole row to RS +> +> I don't think you would like to mix looking up deleted rows in heap +> but updated rows in RS + +Yes, so really the overwriting is only a big win for UPDATE. Right now, +UPDATE is DELETE/INSERT, and that DELETE makes MVCC happy. :-) + +My whole goal was to simplify this so we can see the differences. + + +> PS: not that I like overwrite with MVCC now +> If you think of VACUUM as garbage collection PG is highly trendy with +> the non-overwriting smgr. + +Yes, that is basically what it is now, a garbage collector that collects +in heap rather than in undo. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + +---------------------------(end of broadcast)--------------------------- +TIP 3: if posting/reading through Usenet, please send an appropriate +subscribe-nomail command to majordomo@postgresql.org so that your +message can get through to the mailing list cleanly + +From pgman Wed Jan 23 10:36:13 2002 +Subject: Savepoints +To: PostgreSQL-development +Date: Wed, 23 Jan 2002 13:19:05 -0500 (EST) +X-Mailer: ELM [version 2.4ME+ PL96 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Content-Length: 1829 +Status: OR + +I have talked in the past about a possible implementation of +savepoints/nested transactions. I would like to more formally outline +my ideas below. 
+ +We have talked about using WAL for such a purpose, but that requires WAL +files to remain for the life of a transaction, which seems unacceptable. +Other database systems do that, and it is a pain for administrators. I +realized we could do some sort of WAL compaction, but that seems quite +complex too. + +Basically, under my plan, WAL would be unchanged. WAL's function is +crash recovery, and it would retain that. There would also be no +on-disk changes. I would use the command counter in certain cases to +identify savepoints. + +My idea is to keep savepoint undo information in a private area per +backend, either in memory or on disk. We can either save the +relid/tids of modified rows, or if there are too many, discard the +saved ones and just remember the modified relids. On rollback to save +point, either clear up the modified relid/tids, or sequential scan +through the relid and clear up all the tuples that have our transaction +id and have command counters that are part of the undo savepoint. + +It seems marking undo savepoint rows with a fixed aborted transaction id +would be the easiest solution. + +Of course, we only remember modified rows when we are in savepoints, and +only undo them when we rollback to a savepoint. Transaction processing +remains the same. + +There is no reason for other backend to be able to see savepoint undo +information, and keeping it private greatly simplifies the +implementation. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 +