From goran@kirra.net Mon Dec 20 14:30:54 1999
From: Goran Thyni
Date: Mon, 20 Dec 1999 21:29:06 +0100
To: Bruce Momjian
CC: "neil d. quiogue", PostgreSQL-development
Subject: Re: [HACKERS] Re: QUESTION: Replication

Bruce Momjian wrote:
> We need major work in this area, or at least a plan and an FAQ item.
> We are getting major questions on this, and I don't know enough even to
> make an FAQ item telling people their options.

My 2 cents, or 2 ören since I'm a Swede, on this:

It is pretty simple to build a replication with pg_dump, transfer,
empty replica and reload.
But if we want "live replicas" we'd better base our efforts on a
mechanism using WAL logs to roll forward the replicas.

regards,
-----------------
Göran Thyni
On quiet nights you can hear Windows NT reboot!
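
Göran's dump, transfer, empty and reload approach can be scripted with the standard PostgreSQL client programs. What follows is a minimal sketch, not a finished tool. It assumes one machine can reach both servers, so the "transfer" happens over the client connections. The database name, host names and dump file name are placeholders. The code only builds the command lines and does not run them. Note that `dropdb` fails if the replica does not exist yet.

```python
import shlex
from typing import List


def dump_and_reload_commands(dbname: str, source_host: str, target_host: str,
                             dump_file: str = "replica.sql") -> List[List[str]]:
    """Return the argv lists for a one-shot dump/reload copy, in order."""
    return [
        # 1. Dump the source database to a plain-SQL file.
        ["pg_dump", "-h", source_host, "-f", dump_file, dbname],
        # 2. Empty the replica by dropping and recreating it.
        ["dropdb", "-h", target_host, dbname],
        ["createdb", "-h", target_host, dbname],
        # 3. Reload the dump into the fresh, empty replica.
        ["psql", "-h", target_host, "-d", dbname, "-f", dump_file],
    ]


if __name__ == "__main__":
    # Print the commands as shell lines instead of executing them.
    for argv in dump_and_reload_commands("sales", "primary.example.com",
                                         "replica.example.com"):
        print(shlex.join(argv))
```

To run the steps, pass each argv list in order to `subprocess.run(argv, check=True)`. With `check=True`, a failing step raises `CalledProcessError` and stops the sequence.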
From owner-pgsql-hackers@hub.org Fri Dec 24 10:01:18 1999
From: DWalker@black-oak.com
To: pgsql-hackers@postgreSQL.org
Date: Fri, 24 Dec 1999 10:27:59 -0500
Subject: [HACKERS] database replication

I've been toying with the idea of implementing database replication for the last few days. The system I'm proposing will be a separate program which can be run on any machine and will most likely be implemented in Python. What I'm looking for at this point are gaping holes in my thinking/logic/etc. Here's what I'm thinking...

1) I want to make this program an additional layer over PostgreSQL. I really don't want to hack server code if I can get away with it. At this point I don't feel I need to.

2) The replication system will need to add at least one field to each table in each database that needs to be replicated. This field will be a date/time stamp which identifies the "last update" of the record. This field will be called PGR_TIME for lack of a better name. Because this field will be used from within programs and triggers, its name can be longer so as not to mistake it for a user field.

3) For each table to be replicated, the replication system will programmatically add one plpgsql function and trigger to modify the PGR_TIME field on both UPDATEs and INSERTs. The name of this function and trigger will be along the lines of <table_name>_replication_update_trigger and <table_name>_replication_update_function. The function is a simple two-line chunk of code to set the field PGR_TIME equal to NOW. The trigger is called before each insert/update. When looking at the Docs I see that times are stored in Zulu (GMT) time. Because of this I don't have to worry about time zones and the like. I need direction on this part (such as "hey dummy, look at page N of file X.").

4) At this point we have tables which can, at a basic level, tell the replication system when they were last updated.

5) The replication system will have a database of its own to record the last replication event, hold configuration, logs, etc. I'd prefer to store the configuration in a PostgreSQL table but it could just as easily be stored in a text file on the filesystem somewhere.

6) To handle replication I basically check the local "last replication time" and compare it against the remote PGR_TIME fields. If the remote PGR_TIME is greater than the last replication time, then change the local copy of the database; otherwise, change the remote end of the database. At this point I don't have a way to know WHICH field changed between the two replicas, so either I do ROW-level replication or I check each field. I check PGR_TIME to determine which field is the most current. Some fine-tuning of this process will no doubt have to occur.

7) The command-line utility, fired off by something like cron, could run several times during the day; command-line parameters can be implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES FROM SERVER B.
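
Steps 2 and 3 come down to a little DDL per table. This minimal sketch generates it in Python, the language the proposal mentions. It assumes current PostgreSQL syntax (`RETURNS trigger`, `timestamptz`). Releases of this era spelled some of this differently; for example, trigger functions were declared `RETURNS opaque`. It also assumes table names are trusted, simple identifiers, because nothing is quoted or escaped. The function and trigger names follow step 3. The column is written as lowercase `pgr_time` because PostgreSQL folds unquoted identifiers to lowercase anyway.

```python
from typing import List


def replication_ddl(table: str, column: str = "pgr_time") -> List[str]:
    """Return the statements that add the timestamp column and its trigger."""
    func = f"{table}_replication_update_function"
    trig = f"{table}_replication_update_trigger"
    return [
        # Step 2: the per-row "last update" timestamp column.
        f"ALTER TABLE {table} ADD COLUMN {column} timestamptz;",
        # Step 3: the two-line plpgsql body that stamps the row with now().
        f"CREATE FUNCTION {func}() RETURNS trigger AS $$\n"
        "BEGIN\n"
        f"    NEW.{column} := now();\n"
        "    RETURN NEW;\n"
        "END;\n"
        "$$ LANGUAGE plpgsql;",
        # Fire before every INSERT or UPDATE, once per affected row.
        f"CREATE TRIGGER {trig} BEFORE INSERT OR UPDATE ON {table}\n"
        f"    FOR EACH ROW EXECUTE PROCEDURE {func}();",
    ]
```

For example, `replication_ddl("orders")` yields three statements that could be executed in order through any PostgreSQL driver. Because the trigger fires BEFORE the row is written and modifies `NEW`, the timestamp is stored as part of the same INSERT or UPDATE, with no extra UPDATE needed.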

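The PUSH/PULL switches in step 7 could map onto command-line options along these lines. This is a minimal sketch using Python's standard `argparse`. The program name `pgr-sync` and the `--push`/`--pull` option names are hypothetical, and only argument parsing is shown, not the sync itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="pgr-sync",  # hypothetical program name
        description="Push or pull PGR_TIME-stamped changes between servers.")
    # Exactly one direction must be chosen per run.
    direction = parser.add_mutually_exclusive_group(required=True)
    direction.add_argument("--push", metavar="SERVER",
                           help="push all local changes to SERVER")
    direction.add_argument("--pull", metavar="SERVER",
                           help="pull all changes from SERVER")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("push to" if args.push else "pull from", args.push or args.pull)
```

A crontab entry such as `0 */4 * * * pgr-sync --push serverA` would then push every four hours. Making the two options required and mutually exclusive means a cron job cannot run with no direction, or with both.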
 = ;

Questions/Concerns:

1) How far do I go with this? Do I start manhandling the system catalogs (pg_* tables)?

2) As to #2 and #3 above, I really don't like tools automagically changing my tables but at this point I don't see a way around it. I guess this is where the testing comes into play.

3) Security: the replication app will have to have pretty good rights to the database so it can add the necessary functions and triggers, modify table schema, etc.

So, any "you're insane and should run home to momma" comments?

Damond

************
From owner-pgsql-hackers@hub.org Fri Dec 24 18:31:03 1999
From: Ed Loehr
To: DWalker@black-oak.com
CC: pgsql-hackers@postgreSQL.org
Date: Fri, 24 Dec 1999 18:22:05 -0600
Subject: Re: [HACKERS] database replication

DWalker@black-oak.com wrote:
> 6) To handle replication I basically check the local "last
> replication time" and compare it against the remote PGR_TIME
> fields. If the remote PGR_TIME is greater than the last replication
> time then change the local copy of the database, otherwise, change
> the remote end of the database.
> At this point I don't have a way to
> know WHICH field changed between the two replicas so either I do ROW
> level replication or I check each field. I check PGR_TIME to
> determine which field is the most current. Some fine tuning of this
> process will have to occur no doubt.

Interesting idea. I can see how this might sync up two databases
somehow. For true replication, however, I would always want every
replicated database to be, at the very least, internally consistent
(i.e., referential integrity), even if it was a little behind on
processing transactions. In this method, it's not clear how
consistency is ever achieved/guaranteed at any point in time if the
input stream of changes is continuous. If the input stream ceased,
then I can see how this approach might eventually catch up and totally
resync everything, but it looks *very* computationally expensive.

But I might have missed something. How would internal consistency be
maintained?

> 7) The commandline utility, fired off by something like cron, could
> run several times during the day -- command line parameters can be
> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
> FROM SERVER B.

My two cents is that, while I can see this kind of database syncing as
valuable, this is not the kind of "replication" I had in mind. This
may already be possible by simply copying the database. What replication
means to me is a live, continuously streaming sequence of updates from
one database to another where the replicated database is always
internally consistent, available for read-only queries, and never "too
far" out of sync with the source/primary database.

What does replication mean to others?
Cheers,
Ed Loehr

************
From owner-pgsql-hackers@hub.org Fri Dec 24 21:31:10 1999
From: "Damond Walker"
Date: Fri, 24 Dec 1999 22:07:55 -0800
Subject: Re: [HACKERS] database replication

>
> Interesting idea.
> I can see how this might sync up two databases
> somehow. For true replication, however, I would always want every
> replicated database to be, at the very least, internally consistent
> (i.e., referential integrity), even if it was a little behind on
> processing transactions. In this method, it's not clear how
> consistency is ever achieved/guaranteed at any point in time if the
> input stream of changes is continuous. If the input stream ceased,
> then I can see how this approach might eventually catch up and totally
> resync everything, but it looks *very* computationally expensive.
>

What's the typical unit of work for the database? Are we talking about
update transactions which span the entire DB? Or are we talking about
updating maybe 1% or less of the database every day? I'd think it would be
more towards the latter than the former. So, yes, this process would be
computationally expensive, but how many records would actually have to be
sent back and forth?

> But I might have missed something. How would internal consistency be
> maintained?
>

Updates that occur at site A will be moved to site B and vice versa.
Consistency would be maintained. The only problem that I can see right off
the bat would be: what if site A and site B made changes to a row and then
site C was brought into the picture? Which one wins?

Someone *has* to win when it comes to this type of thing. You really
DON'T want to start merging row changes...

>
> My two cents is that, while I can see this kind of database syncing as
> valuable, this is not the kind of "replication" I had in mind. This
> may already be possible by simply copying the database. What replication
> means to me is a live, continuously streaming sequence of updates from
> one database to another where the replicated database is always
> internally consistent, available for read-only queries, and never "too
> far" out of sync with the source/primary database.
>

Sounds like you're talking about distributed transactions to me.
That's an entirely different subject altogether. What you describe can be
done by copying a database...but as you say, this would only work in a
read-only situation.

Damond

************
From owner-pgsql-hackers@hub.org Sat Dec 25 16:35:07 1999
From: Ryan Kirkpatrick
To: DWalker@black-oak.com
cc: pgsql-hackers@postgreSQL.org
Date: Sat, 25 Dec 1999 15:25:47 -0700 (MST)
Subject: Re: [HACKERS] database replication

On Fri, 24 Dec 1999 DWalker@black-oak.com wrote:

> I've been toying with the idea of implementing database replication
> for the last few days.

I too have been thinking about this some over the last year or
two, just trying to find a quick and easy way to do it.
I am not so
interested in replication as in synchronization, such as between a desktop
machine and a laptop, so I can keep the databases on each in sync with
each other. For this sort of purpose, both the local and remote databases
would be "idle" at the time of syncing.

> 2) The replication system will need to add at least one field to each
> table in each database that needs to be replicated. This field will be
> a date/time stamp which identifies the "last update" of the record.
> This field will be called PGR_TIME for lack of a better name.
> Because this field will be used from within programs and triggers it
> can be longer so as to not mistake it for a user field.

How about a single, separate table with the fields 'database',
'tablename', 'oid', 'last_changed', that would store the same data as your
PGR_TIME field? It would be separated from the actual data tables, and
therefore would be totally transparent to any database interface
applications. The 'oid' field would hold each row's OID, a nice, unique
identification number for the row, while the other fields would tell which
table and database the OID is in. Then this table can be compared with the
same table on a remote machine to quickly find updates and changes, and
each difference can be dealt with in turn.

> 3) For each table to be replicated the replication system will
> programmatically add one plpgsql function and trigger to modify the
> PGR_TIME field on both UPDATEs and INSERTs. The name of this function
> and trigger will be along the lines of
> <table_name>_replication_update_trigger and
> <table_name>_replication_update_function. The function is a simple
> two-line chunk of code to set the field PGR_TIME equal to NOW. The
> trigger is called before each insert/update. When looking at the Docs
> I see that times are stored in Zulu (GMT) time. Because of this I
> don't have to worry about time zones and the like. I need direction
> on this part (such as "hey dummy, look at page N of file X.").
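
Ryan's separate change table could be compared across machines roughly as follows. This is a minimal sketch. It assumes each side's table has already been fetched into a dictionary keyed by (database, tablename, oid), with `last_changed` as the value. The function name is hypothetical. As Damond points out later in the thread, OIDs are only unique within one server, so in practice the key would need to be something portable, such as a primary key.

```python
from datetime import datetime
from typing import Dict, Set, Tuple

# One entry per row: (database, tablename, oid) -> last_changed
ChangeKey = Tuple[str, str, int]


def compare_change_tables(
    local: Dict[ChangeKey, datetime],
    remote: Dict[ChangeKey, datetime],
) -> Tuple[Set[ChangeKey], Set[ChangeKey], Set[ChangeKey]]:
    """Split row keys into (only local, only remote, on both but changed)."""
    local_keys, remote_keys = set(local), set(remote)
    only_local = local_keys - remote_keys
    only_remote = remote_keys - local_keys
    # Present on both sides but with differing last_changed stamps.
    changed = {k for k in local_keys & remote_keys if local[k] != remote[k]}
    return only_local, only_remote, changed
```

Only the three result sets need further work. Rows whose stamps match on both sides are skipped, which is what makes comparing the small change tables cheaper than comparing the data tables themselves.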
I like this idea, better than any I have come up with yet. Though,
how are you going to handle DELETEs?

> 6) To handle replication I basically check the local "last replication
> time" and compare it against the remote PGR_TIME fields. If the
> remote PGR_TIME is greater than the last replication time then change
> the local copy of the database, otherwise, change the remote end of
> the database. At this point I don't have a way to know WHICH field
> changed between the two replicas so either I do ROW level replication
> or I check each field. I check PGR_TIME to determine which field is
> the most current. Some fine tuning of this process will have to occur
> no doubt.

Yeah, this is indeed the sticky part, and would indeed require some
fine-tuning. Basically, the way I see it, if the two timestamps for a
single row do not match (or even if the row and therefore timestamp is
missing on one side or the other altogether):

    local ts > remote ts => Local row is exported to remote.
    remote ts > local ts => Remote row is exported to local.
    local ts > last sync time && no remote ts =>
        Local row is inserted on remote.
    local ts < last sync time && no remote ts =>
        Local row is deleted.
    remote ts > last sync time && no local ts =>
        Remote row is inserted on local.
    remote ts < last sync time && no local ts =>
        Remote row is deleted.

where the synchronization process is running on the local machine. By
exported, I mean the local values are sent to the remote machine, and the
row on that remote machine is updated to the local values. How does this
sound?

> 7) The commandline utility, fired off by something like cron, could
> run several times during the day -- command line parameters can be
> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
> FROM SERVER B.

Or run manually for my purposes. Also, maybe follow it
with a vacuum run on both sides for all databases, as this is going to
potentially cause lots of table changes that could stand a cleanup.
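
Ryan's six rules can be written as one per-row decision function. This is a minimal sketch. It assumes each side reports the row's PGR_TIME, or `None` when the row is missing, and that the function runs on the local machine as in his description. The action names are hypothetical. The rules use strict comparisons, so they leave undefined the case where a timestamp exactly equals the last sync time. The sketch treats that case as a delete.

```python
from datetime import datetime
from typing import Optional


def sync_action(local_ts: Optional[datetime],
                remote_ts: Optional[datetime],
                last_sync: datetime) -> str:
    """Decide what to do with one row; None means the row is absent there."""
    if local_ts is not None and remote_ts is not None:
        if local_ts > remote_ts:
            return "export_local"    # local values overwrite the remote row
        if remote_ts > local_ts:
            return "export_remote"   # remote values overwrite the local row
        return "in_sync"             # identical timestamps: nothing to do
    if local_ts is not None:
        # Row exists only locally: newer than the last sync means it was
        # inserted here; otherwise the remote side deleted it since then.
        # (A timestamp equal to last_sync is treated as a delete.)
        return "insert_on_remote" if local_ts > last_sync else "delete_local"
    if remote_ts is not None:
        # Mirror image for a row that exists only on the remote side.
        return "insert_on_local" if remote_ts > last_sync else "delete_remote"
    return "in_sync"                 # absent on both sides
```

This is last-writer-wins. If a row changed on both sides since the last sync, the older change is silently discarded; that is the "someone *has* to win" behavior Damond describes. The function also compares timestamps produced by two different machines, so clock skew between them can pick the wrong winner.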
> 1) How far do I go with this? Do I start manhandling the system catalogs (pg_* tables)?

Initially, I would just stick to user table data... If you have
changes in triggers and other meta-data/executable code, you are going to
want to make syncs of that stuff manually anyway. At least I would want
to.

> 2) As to #2 and #3 above, I really don't like tools automagically
> changing my tables but at this point I don't see a way around it. I
> guess this is where the testing comes into play.

Hence the reason for the separate table with just a row's
identification and last update time. The only modification to the synced
database is the update trigger, which should be pretty harmless.

> 3) Security: the replication app will have to have pretty good rights
> to the database so it can add the necessary functions and triggers,
> modify table schema, etc.

Just run the sync program as the postgres superuser, and there
are no problems. :)

> So, any "you're insane and should run home to momma" comments?

No, not at all. Though it probably should be renamed from
replication to synchronization. The former is usually associated with a
continuous stream of updates between the local and remote databases, so
they are almost always in sync, and have a queuing ability if their
connection is lost for a span of time as well. Very complex and difficult to
implement, and would require hacking server code. :( Something only Sybase
and Oracle have (as far as I know), and from what I have seen of Sybase's
replication server support (dated by 5 yrs) it was a pain to set up and get
running correctly.

The latter, synchronization, is much more manageable, and can still
be useful, especially when you have a large database you want in two
places, mainly for read-only purposes at one end or the other, but don't
want to waste the time/bandwidth to move and load the entire database each
time it changes on one end or the other.
Same idea as mirroring software for
FTP sites: just transfer the changes, and nothing more.

I also like the idea of using Python. I have been using it
recently for some database interfaces (to PostgreSQL of course :), and it
is a very nice language to work with. I have some worries about the
performance of the program though, as Python is only an interpreted
language, and I have yet to really be impressed with the speed of
execution of my database interfaces.

Anyway, it sounds like a good project, and finally one where I
actually have a clue of what is going on, and the skills to help. So, if
you are interested in pursuing this project, I would be more than glad to
help. TTYL.

---------------------------------------------------------------------------
|   "For to me to live is Christ, and to die is gain."                    |
|                                    --- Philippians 1:21 (KJV)           |
---------------------------------------------------------------------------
|   Ryan Kirkpatrick  |  Boulder, Colorado  |  http://www.rkirkpat.net/   |
---------------------------------------------------------------------------

************
From owner-pgsql-hackers@hub.org Sun Dec 26 08:31:09 1999
From: "Damond Walker"
To: "Ryan Kirkpatrick"
Date: Sun, 26 Dec 1999 10:10:41 -0500
Subject: Re: [HACKERS] database replication

>
> I too have been thinking about this some over the last year or
> two, just trying to find a quick and easy way to do it. I am not so
> interested in replication as in synchronization, such as between a desktop
> machine and a laptop, so I can keep the databases on each in sync with
> each other. For this sort of purpose, both the local and remote databases
> would be "idle" at the time of syncing.
>

I don't think it would matter if the databases are idle or not, to be
honest with you. At any single point in time when you replicate I'd figure
that the database would be in a consistent state. So, you should be able
to replicate (or sync) a remote database that is in use. After all, you're
getting a snapshot of the database as it stands at 8:45 PM. At 8:46 PM it
may be totally different...but the next time syncing takes place those
changes would appear in your local copy.

The one problem you may run into is if the remote host is running a
large batch process.
It's very likely that you will get 50% of their changes when you
replicate...but then again, that's why you can schedule the event to work
around such things.

> How about a single, separate table with the fields 'database',
> 'tablename', 'oid', 'last_changed', that would store the same data as your
> PGR_TIME field. It would be separated from the actual data tables, and
> therefore would be totally transparent to any database interface
> applications. The 'oid' field would hold each row's OID, a nice, unique
> identification number for the row, while the other fields would tell which
> table and database the OID is in. Then this table can be compared with the
> same table on a remote machine to quickly find updates and changes, and
> each difference can be dealt with in turn.
>

The problem with OIDs is that they are unique at the local level, but if
you try to use them between servers you can run into overlap. Also, if a
database is under heavy use this table could quickly become VERY large. Add
indexes to this table to help performance and you're taking up even more
disk space.

Using the PGR_TIME field with an index will allow us to find rows which
have changed VERY quickly. All we need to do now is somehow programmatically
find the primary key for a table so the person setting up replication (or
syncing) doesn't have to have an in-depth knowledge of the schema in order
to set up a syncing schedule.

>
> I like this idea, better than any I have come up with yet. Though,
> how are you going to handle DELETEs?
>

Oops...how about defining a trigger for this? With deletion I guess we
would have to move a flag into another table saying we deleted record 'X'
with this primary key from this table.

>
> Yeah, this is indeed the sticky part, and would indeed require some
> fine-tuning.
> Basically, the way I see it, if the two timestamps for a
> single row do not match (or even if the row and therefore timestamp is
> missing on one side or the other altogether):
>     local ts > remote ts => Local row is exported to remote.
>     remote ts > local ts => Remote row is exported to local.
>     local ts > last sync time && no remote ts =>
>         Local row is inserted on remote.
>     local ts < last sync time && no remote ts =>
>         Local row is deleted.
>     remote ts > last sync time && no local ts =>
>         Remote row is inserted on local.
>     remote ts < last sync time && no local ts =>
>         Remote row is deleted.
> where the synchronization process is running on the local machine. By
> exported, I mean the local values are sent to the remote machine, and the
> row on that remote machine is updated to the local values. How does this
> sound?
>

The replication part will be the most complex...that much is for
certain...

I've been writing systems in Lotus Notes/Domino for the last year or so
and I've grown quite spoiled with what it can do in regards to replication.
It's not real-time but you have to gear your applications to this type of
thing (it's possible to create documents, fire off email to notify people of
changes and have the email arrive before the replicated documents do).
Replicating large Notes/Domino databases takes quite a while....I don't see
any kind of replication or syncing running in the blink of an eye.

Having said that, a good algorithm will have to be written to cut down on
network traffic and to keep database conversations down to a minimum. This
will be appreciated by people with low-bandwidth connections, I'm sure
(dial-ups, fractional T1s, etc).

> Or run manually for my purposes. Also, maybe follow it
> with a vacuum run on both sides for all databases, as this is going to
> potentially cause lots of table changes that could stand a cleanup.
>

What would a vacuum do to a system being used by many people?

> No, not at all.
> Though it probably should be renamed from
> replication to synchronization. The former is usually associated with a
> continuous stream of updates between the local and remote databases, so
> they are almost always in sync, and have a queuing ability if their
> connection is lost for a span of time as well. Very complex and difficult to
> implement, and would require hacking server code. :( Something only Sybase
> and Oracle have (as far as I know), and from what I have seen of Sybase's
> replication server support (dated by 5 yrs) it was a pain to set up and get
> running correctly.

It could probably be named either way...but the one thing I really don't
want to do is start hacking server code. The PostgreSQL people have enough
to do without worrying about trying to meld anything I've done into their
server. :)

Besides, I like the idea of having it operate as a stand-alone product.
The only PostgreSQL features we would require would be triggers and
plpgsql...what was the earliest version of PostgreSQL that supported
plpgsql? Even then I don't see the triggers being that complex to boot.

> I also like the idea of using Python. I have been using it
> recently for some database interfaces (to PostgreSQL of course :), and it
> is a very nice language to work with. I have some worries about the
> performance of the program though, as Python is only an interpreted
> language, and I have yet to really be impressed with the speed of
> execution of my database interfaces.

The only thing we'd need for Python is the Python extensions for
PostgreSQL...which in turn require libpq and that's about it. So, it should
be able to run on any platform supported by Python and libpq. Using Tk for
the interface components will require NT people to get additional software
from the 'net. At least it did with older versions of Windows Python. Unix
folks should be happy....assuming they have X running on the machine doing
the replication or syncing.
Even then, I wrote a curses-based Python
interface a while back which allows buttons, progress bars, input fields,
etc. (I called it tinter and it's available at http://iximd.com/~dwalker).
It's a simple interface and could probably be cleaned up a bit but it works.
:)

> Anyway, it sounds like a good project, and finally one where I
> actually have a clue of what is going on, and the skills to help. So, if
> you are interested in pursuing this project, I would be more than glad to
> help. TTYL.
>

That would be a Good Thing. Have webspace somewhere? If I can get
permission from the "powers that be" at the office I could host a website on
our (Domino) webserver.

Damond

************
From owner-pgsql-hackers@hub.org Sun Dec 26 19:11:48 1999
From: Ryan Kirkpatrick
To: Damond Walker
cc: pgsql-hackers@postgreSQL.org
Date: Sun, 26 Dec 1999 18:05:02 -0700 (MST)
Subject: Re: [HACKERS] database replication
<002201bf4fb3$623f0220$b263a8c0@vmware98.walkers.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pgsql-hackers@postgreSQL.org Status: OR On Sun, 26 Dec 1999, Damond Walker wrote: > > How about a single, separate table with the fields of 'database', > >'tablename', 'oid', 'last_changed', that would store the same data as your > >PGR_TIME field. It would be separated from the actual data tables, and ... > The problem with OIDs is that they are unique at the local level but if > you try and use them between servers you can run into overlap. Yeah, I forgot about that point, but it became dead obvious once you mentioned it. Boy, I feel stupid now. :) > Using the PGR_TIME field with an index will allow us to find rows which > have changed VERY quickly. All we need to do now is somehow programmatically > find the primary key for a table so the person setting up replication (or > syncing) doesn't have to have an in-depth knowledge of the schema in order to > set up a syncing schedule. Hmm... Yeah, maybe look to see which field(s) have a primary, unique index on them? Then use those field(s) as a primary key. Just require that any table to be synchronized have some set of fields that uniquely identify each row. Either that, or add another field to each table with our own, cross-system consistent identification system. I don't know which would be more efficient and easier to work with. The former could potentially get sticky if it takes a lot of fields to generate a unique key value, but it has the smallest effect on the table to be synced. The latter could be difficult to keep straight between systems (local vs. remote), and would require a trigger on inserts to generate a new, unique id number that does not exist locally or remotely (nasty issue there), but it would remove the uniqueness requirement. > Oops...how about defining a trigger for this?
With deletion I guess we > would have to move a flag into another table saying we deleted record 'X' > with this primary key from this table. Or, according to my logic below, if a row is missing on one side or the other, then just compare the remaining row's timestamp to the last synchronization time (stored in a separate table/db elsewhere). The result of that comparison, together with which side the row still exists on, tells one whether the row was inserted or deleted since the last sync, and what should be done to perform the sync. > > Yeah, this is indeed the sticky part, and would indeed require some > >fine-tuning. Basically, the way I see it, if the two timestamps for a > >single row do not match (or even if the row, and therefore its timestamp, is > >missing on one side or the other altogether): > > local ts > remote ts => Local row is exported to remote. > > remote ts > local ts => Remote row is exported to local. > > local ts > last sync time && no remote ts => > > Local row is inserted on remote. > > local ts < last sync time && no remote ts => > > Local row is deleted. > > remote ts > last sync time && no local ts => > > Remote row is inserted on local. > > remote ts < last sync time && no local ts => > > Remote row is deleted. > >where the synchronization process is running on the local machine. By > >exported, I mean the local values are sent to the remote machine, and the > >row on that remote machine is updated to the local values. How does this > >sound? > Having said that, a good algo will have to be written to cut down on > network traffic and to keep database conversations down to a minimum. This > will be appreciated by people with low-bandwidth connections I'm sure > (dial-ups, fractional T1s, etc.). Of course! On reflection, the assigned identification number I mentioned above might be the best then, instead of having to transfer the entire set of key fields back and forth. > What would a vacuum do to a system being used by many people?
Probably lock them out of tables while they are vacuumed... Maybe not really required in the end, possibly optional? > It could probably be named either way...but the one thing I really don't > want to do is start hacking server code. The PostgreSQL people have enough > to do without worrying about trying to meld anything I've done to their > server. :) Yeah, they probably would appreciate that. They already have enough on their plate for 7.x as it is! :) > Besides, I like the idea of having it operate as a stand-alone product. > The only PostgreSQL features we would require would be triggers and > plpgsql...what was the earliest version of PostgreSQL that supported > plpgsql? Even then I don't see the triggers being that complex to boot. No, provided that we don't do the identification number idea (which, the more I think about it, probably will not work). As for which version first supported plpgsql, I don't know; one of the more hard-core pgsql hackers can probably tell us that. > The only thing we'd need for Python is the Python extensions for > PostgreSQL...which in turn requires libpq and that's about it. So, it > should be able to run on any platform supported by Python and libpq. Of course. If it ran on NT as well as Linux/Unix, that would be even better. :) > Unix folks should be happy...assuming they have X running on the > machine doing the replication or syncing. Even then I wrote a curses-based > Python interface a while back which allows buttons, progress > bars, input fields, etc. (I called it tinter and it's available at > http://iximd.com/~dwalker). It's a simple interface and could > probably be cleaned up a bit but it works. :) Why would we want any type of GUI (X11 or curses) for this sync program? I imagine just a command-line program with a few options (local machine, remote machine, db name, etc.), and nothing else.
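The per-row timestamp rules quoted above ("local ts > remote ts => ...") can be written as a small decision function. This is a minimal Python sketch, not code from the thread: `Action` and `decide_action` are hypothetical names, `None` stands for "the row is missing on that side", and it assumes both machines' clocks agree (the weakness Duane raises later in the thread). The rules leave the case of a timestamp exactly equal to the last sync time open; the sketch assumes such a row is unchanged since the last sync.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "timestamps match; nothing to do"
    EXPORT_TO_REMOTE = "update the remote row with the local values"
    EXPORT_TO_LOCAL = "update the local row with the remote values"
    INSERT_ON_REMOTE = "insert the local row on the remote side"
    INSERT_ON_LOCAL = "insert the remote row on the local side"
    DELETE_LOCAL = "row was deleted remotely; delete the local row"
    DELETE_REMOTE = "row was deleted locally; delete the remote row"


def decide_action(local_ts: Optional[datetime],
                  remote_ts: Optional[datetime],
                  last_sync: datetime) -> Action:
    """Apply the per-row timestamp rules.

    A timestamp of None means the row is missing on that side.
    """
    if local_ts is None and remote_ts is None:
        raise ValueError("row is missing on both sides")
    if local_ts is not None and remote_ts is not None:
        # Row exists on both sides: the newer copy wins.
        if local_ts > remote_ts:
            return Action.EXPORT_TO_REMOTE
        if remote_ts > local_ts:
            return Action.EXPORT_TO_LOCAL
        return Action.NONE
    # Exactly one side has the row. Assumption: a timestamp equal to
    # last_sync counts as "not changed since the last sync".
    if remote_ts is None:
        return Action.INSERT_ON_REMOTE if local_ts > last_sync else Action.DELETE_LOCAL
    return Action.INSERT_ON_LOCAL if remote_ts > last_sync else Action.DELETE_REMOTE
```

Run over the union of row keys from both sides, with `last_sync` read from the separate sync-time table Ryan mentions, this yields exactly one action per row.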
Though I will take a look at your curses interface, as I have been wanting to make a curses interface to a few db interfaces I have, in as simple a manner as possible. > That would be a Good Thing. Have webspace somewhere? If I can get > permission from the "powers that be" at the office I could host a website on > our (Domino) webserver. Yeah, I have my own web server (www.rkirkpat.net) with 1GB+ of disk space available, sitting on a decent-speed DSL line. I can even set up a virtual server if we want (i.e. pgsync.rkirkpat.net :). A CVS repository, email lists, etc. are possible with some effort (and time). So, where should we start? TTYL. PS. The current pages on my web site are very out of date at the moment (save for the pgsql information). I hope to have updated ones up within the week. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | --------------------------------------------------------------------------- ************ From owner-pgsql-hackers@hub.org Mon Dec 27 12:33:32 1999 Received: from hub.org (hub.org [216.126.84.1]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA24817 for ; Mon, 27 Dec 1999 13:33:29 -0500 (EST) Received: from localhost (majordom@localhost) by hub.org (8.9.3/8.9.3) with SMTP id NAA53391; Mon, 27 Dec 1999 13:29:02 -0500 (EST) (envelope-from owner-pgsql-hackers) Received: by hub.org (bulk_mailer v1.5); Mon, 27 Dec 1999 13:28:38 -0500 Received: (from majordom@localhost) by hub.org (8.9.3/8.9.3) id NAA53248 for pgsql-hackers-outgoing; Mon, 27 Dec 1999 13:27:40 -0500 (EST) (envelope-from owner-pgsql-hackers@postgreSQL.org) Received: from gtv.ca (h139-142-238-17.cg.fiberone.net [139.142.238.17]) by hub.org (8.9.3/8.9.3) with ESMTP id NAA53170 for ; Mon, 27 Dec 1999 13:26:40 -0500 (EST)
(envelope-from aaron@genisys.ca) Received: from stilborne (24.67.90.252.ab.wave.home.com [24.67.90.252]) by gtv.ca (8.9.3/8.8.7) with SMTP id MAA01200 for ; Mon, 27 Dec 1999 12:36:39 -0700 From: "Aaron J. Seigo" To: pgsql-hackers@hub.org Subject: Re: [HACKERS] database replication Date: Mon, 27 Dec 1999 11:23:19 -0700 X-Mailer: KMail [version 1.0.28] Content-Type: text/plain References: <199912271135.TAA10184@netrinsics.com> In-Reply-To: <199912271135.TAA10184@netrinsics.com> MIME-Version: 1.0 Message-Id: <99122711245600.07929@stilborne> Content-Transfer-Encoding: 8bit Sender: owner-pgsql-hackers@postgreSQL.org Status: OR hi.. > Before anyone starts implementing any database replication, I'd strongly > suggest doing some research, first: > > http://sybooks.sybase.com:80/onlinebooks/group-rs/rsg1150e/rs_admin/@Generic__BookView;cs=default;ts=default good idea, but perhaps Sybase isn't the best case study... here's some extremely detailed online coverage of Oracle 8i's replication, from the Oracle online library: http://bach.towson.edu/oracledocs/DOC/server803/A54651_01/toc.htm -- Aaron J.
Seigo Sys Admin ************ From owner-pgsql-hackers@hub.org Thu Dec 30 08:01:09 1999 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA10317 for ; Thu, 30 Dec 1999 09:01:08 -0500 (EST) Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id IAA02365 for ; Thu, 30 Dec 1999 08:37:10 -0500 (EST) Received: from localhost (majordom@localhost) by hub.org (8.9.3/8.9.3) with SMTP id IAA87902; Thu, 30 Dec 1999 08:34:22 -0500 (EST) (envelope-from owner-pgsql-hackers) Received: by hub.org (bulk_mailer v1.5); Thu, 30 Dec 1999 08:32:24 -0500 Received: (from majordom@localhost) by hub.org (8.9.3/8.9.3) id IAA85771 for pgsql-hackers-outgoing; Thu, 30 Dec 1999 08:31:27 -0500 (EST) (envelope-from owner-pgsql-hackers@postgreSQL.org) Received: from sandman.acadiau.ca (dcurrie@sandman.acadiau.ca [131.162.129.111]) by hub.org (8.9.3/8.9.3) with ESMTP id IAA85234 for ; Thu, 30 Dec 1999 08:31:10 -0500 (EST) (envelope-from dcurrie@sandman.acadiau.ca) Received: (from dcurrie@localhost) by sandman.acadiau.ca (8.8.8/8.8.8/Debian/GNU) id GAA18698; Thu, 30 Dec 1999 06:30:58 -0400 From: Duane Currie Message-Id: <199912301030.GAA18698@sandman.acadiau.ca> Subject: Re: [HACKERS] database replication In-Reply-To: from "DWalker@black-oak.com" at "Dec 24, 99 10:27:59 am" To: DWalker@black-oak.com Date: Thu, 30 Dec 1999 10:30:58 +0000 (AST) Cc: pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL39 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-pgsql-hackers@postgresql.org Status: OR Hi Guys, Now for one of my REALLY rare posts. Having done a little bit of work on distributed data systems, I figured I'd pitch in a couple of cents' worth. > 2) The replication system will need to add at least one field to each > table in each database that needs to be replicated. This > field will be a date/time stamp which identifies the "last > update" of the record. This field will be called PGR_TIME > for lack of a better name. Because this field will be used > from within programs and triggers, it can be longer so as to not > mistake it for a user field. I just started reading this thread, but I figured I'd throw in a couple of suggestions for distributed data control (a few idioms I've had to deal with before): - Never use time (not reliable from system to system). Use a version number of some sort that can stay consistent across all replicas. This way, if a system's time is or goes out of whack, it doesn't cause your database to disintegrate, and it's easier to track conflicts (see below; if using time, the algorithm gets nightmarish). - On an insert, set to version 1 - On an update, version++ - On a delete, mark deleted, and add a delete stub somewhere for the replicator process to deal with in sync'ing the databases. - If two records have the same version but different data, there's a conflict. A few choices: 1. Pick one as the correct one (yuck!! invisible data loss) 2. Store both copies, pick one as current, and alert the database owner of the conflict, so they can deal with it "manually." 3. If possible, some conflicts can be merged. If disjoint sets of fields were changed in each instance, these changes may both be applied and the record merged. (Problem: takes a lot more space. Requires a version number for every field, or persistent storage of some old records. However, this might help the "which fields changed" issue you were talking about in #6) - A unique id across all systems should exist (or something that effectively simulates a unique id). Maybe a composition of the originating oid (from the insert) and the originating database (the oid of the database's record?) might do it. Store this as an extra field in every record. (Two extra fields so far: 'unique id' and 'version') I do like your approach: triggers and a separate process.
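Duane's version-number rules can be sketched in Python. Everything here is hypothetical and only illustrates his list: `Row`, `reconcile`, `changed_fields` and `Conflict` are made-up names; the row id is the (originating database, originating oid) pair he suggests; a differing version is assumed to mean the higher one is newer; and choice 3 (merging disjoint field changes) is implemented with the "persistent storage of some old records" option, i.e. by comparing both copies against a stored base copy. Without a base copy, a same-version mismatch is reported as a conflict for the owner to resolve (choice 2). The delete stub for the replicator process is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical cross-replica id: (originating database id, originating oid).
RowId = Tuple[int, int]


@dataclass
class Row:
    uid: RowId
    version: int = 1                      # "On an insert, set to version 1"
    data: Dict[str, object] = field(default_factory=dict)
    deleted: bool = False

    def update(self, **changes: object) -> None:
        self.data.update(changes)
        self.version += 1                 # "On an update, version++"

    def delete(self) -> None:
        # Mark deleted instead of removing the row. Bumping the version here
        # is an assumption so that the deletion propagates like an update.
        self.deleted = True
        self.version += 1


class Conflict(Exception):
    """Same version, different data, and no safe merge is possible."""


def changed_fields(row: Row, base: Row) -> Set[str]:
    """Names of fields whose value differs from the stored base copy."""
    keys = row.data.keys() | base.data.keys()
    return {k for k in keys if row.data.get(k) != base.data.get(k)}


def reconcile(local: Row, remote: Row, base: Optional[Row] = None) -> Row:
    """Return the row both replicas should hold, or raise Conflict."""
    if local.uid != remote.uid:
        raise ValueError("rows describe different records")
    if local.version != remote.version:
        # Assumption: the higher version is the newer copy.
        return local if local.version > remote.version else remote
    if local.data == remote.data and local.deleted == remote.deleted:
        return local                      # already in sync
    if base is None or local.deleted or remote.deleted:
        raise Conflict(local.uid)         # choice 2: keep both, alert the owner
    ours, theirs = changed_fields(local, base), changed_fields(remote, base)
    if ours & theirs:
        raise Conflict(local.uid)         # same field edited on both sides
    merged = dict(base.data)              # choice 3: apply both disjoint edits
    merged.update({k: local.data.get(k) for k in ours})
    merged.update({k: remote.data.get(k) for k in theirs})
    # Assumption: the merged row gets a new version so it propagates everywhere.
    return Row(local.uid, local.version + 1, merged)
```

For example, if one replica changes `name` and the other changes `city` from the same version-1 record, both end up at version 2 with different data; given the base copy, `reconcile` merges them into a version-3 row, while two edits to the same field raise `Conflict`.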
(Maintainable!! :) Anyway, just figured I'd throw in a few suggestions, Duane ************ From owner-pgsql-patches@hub.org Sun Jan 2 23:01:38 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA16274 for ; Mon, 3 Jan 2000 00:01:28 -0500 (EST) Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id XAA02655 for ; Sun, 2 Jan 2000 23:45:55 -0500 (EST) Received: from hub.org (hub.org [216.126.84.1]) by hub.org (8.9.3/8.9.3) with ESMTP id XAA13828; Sun, 2 Jan 2000 23:40:47 -0500 (EST) (envelope-from owner-pgsql-patches@hub.org) Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 02 Jan 2000 23:38:34 +0000 (EST) Received: (from majordom@localhost) by hub.org (8.9.3/8.9.3) id XAA13624 for pgsql-patches-outgoing; Sun, 2 Jan 2000 23:37:36 -0500 (EST) (envelope-from owner-pgsql-patches@postgreSQL.org) Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.org (8.9.3/8.9.3) with ESMTP id XAA13560 for ; Sun, 2 Jan 2000 23:37:02 -0500 (EST) (envelope-from P.Marchesso@Videotron.ca) Received: from Videotron.ca ([207.253.210.234]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.07.30.00.05.p8) with ESMTP id <0FNQ000TEST8VI@falla.videotron.net> for pgsql-patches@postgresql.org; Sun, 2 Jan 2000 23:37:01 -0500 (EST) Date: Sun, 02 Jan 2000 23:39:23 -0500 From: Philippe Marchesseault Subject: [PATCHES] Distributed PostgreSQL! To: pgsql-patches@postgreSQL.org Message-id: <387027FB.EB88D757@Videotron.ca> MIME-version: 1.0 X-Mailer: Mozilla 4.51 [en] (X11; I; Linux 2.2.11 i586) Content-type: MULTIPART/MIXED; BOUNDARY="Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)" X-Accept-Language: en Sender: owner-pgsql-patches@postgreSQL.org Precedence: bulk Status: ORr This is a multi-part message in MIME format. --Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg) Content-type: text/plain; charset=us-ascii Content-transfer-encoding: 7bit Hi all! 
Here is a small patch to make postgres a distributed database. By distributed I mean that you can have the same copy of the database on N different machines and keep them all in sync. It does not improve performance unless you distribute your clients in a sensible manner. It does not allow you to do parallel selects. The support page is pages.infinit.net/daemon and will soon be in English. The patch was tested with RedHat Linux 6.0 on Intel with kernel 2.2.11. Only two machines were used, so I'm not completely sure that it works with more than two (but it should). I would like to know if somebody else is interested in this; otherwise I'm probably not going to keep developing it. So please reply to my e-mail (P.Marchesso@videotron.ca) to give me an idea of how many people are interested in this. Thanks all. Philippe Marchesseault -- It's not the size of the dog in the fight, but the size of the fight in the dog. -Archie Griffen --Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg) Content-type: application/octet-stream; name=replicator-0.1.tgz Content-disposition: attachment; filename=replicator-0.1.tgz Content-transfer-encoding: base64 H4sIAOeZbzgAA+w8a3PayLL7Nara/9Dx3iRAME/brO11tjDGDjc2OIBvTm5OihLSAFoLiUgi LGfX//12z0MaAX7UOU72nLpW7cagmenp6enp9xCwmetYZuQHxR++1QM7pVq5DD8AQG2vxv+W d3b4X/mUsKVUrdbKO7XKLrZWdkqVH2D3m2GkPfMwMgOAH2Z+GI0DFt7ejwW3Nv7nPkGy/xfm NRs5Lnv0Ocql0l6pdOv+l8u1XbH/e7W90m6NWsu7pR+g9OiYbHj+n+9/q904vzppwhFst6A4 D4NiGFjF2Tj84vJPjme5c5vFra5vma5sl21FwzBdFw7AGwb+NQtgKv8mvAXYZWw6HnayXGZ6 d3cVXQ7AeBZMYXukOquv0/TXZKhhGLJrjEvBAvlhYjwbW5b2/r8ycu1Z2B6P7SFs+8lMYRQ4 s/irodA8UJOvwZ3eCXeahqu+GtqyD7SFrEFPNW2aQAO07c6+wPb52m65ztB49pNAQKfZzIys CR7DA+OZ7YxGm5hgaFrXzLOLkeXPiuqgFHzXfnhnC96A+rJX2C3wWf9q5n96dPn/zbaH5L+U 95vkf6VUS/T/Xq2Crbt7ldqT/P8eD+6NWS6V8+VSzXgDxdzlBfTnDE6YBZWfcSMOqtWDnX0o 7+/v54rY4yelELaKE3/KYqYpaoykZNeW3v+XcBkWo+UMZcHkzVqDM7M2vZ6GY/G6mHuOmHEM jL3arrn3cxU/PnO8SJmQan72fs6C5SH22sNeu/m9/T3qWcwBAuiZEfw36hYoQ7l6UKrg6pD9
SiXqATlojSCaMECOiNiUeVFIHwP8s3CiCZjw6vWrPPXw0t3AQkqEMAr8qQREzZpIngW+xcKw IFppEepxRpD5QggPSDB740+lz3B0RBNl/+C9Ya0VXsGrQ9GWXjM2ndbPe03eeEP/MDdkf9zS td+90npy+iJme7V9a29/3/gFnskH4CVHIQoYG7hOGOXBZvSvabmdrywIHJtlD43t7W0x0V3P gyDVSjsmegH5WnWfID7j/6wxZvWgXDso7SSMSX0APuC2oPKRG4DThRHR2IleheD5Ee5hyFxm Rcnm8aEgQcTbQThuv8F9nZqe3UeuhedH0Lg4GfSa581GH16+XKGo2C5cf69/0R9c9M4AefdQ viM2nhLvI93L6mU4wy2NRhnRyH5HQmy9fhFu5VNbnlXd5R+HDwg9m/6MWZRpd06ag3fNj3lo XTYGjW6z3oc/obS3t5fNw0vslAfEZtBr/W8zD6Us/FKKUQWYIdX9ILN1EY57SLVTE81/+wC2 4lk5cwgWURshGOWvlltPz+M8mtiOTdjHnuNu/69cre0I/69Sq5XLpSq1Vit7T/r/ezzFnP4Y /YkTQivk8nOGhrw5ZvgZtSZ6eP4ijIWe43vKnO+9PwfbjMyhGTJAF880bIdk13AeMRuY99UJ fI8kbcEwGv5sGTjjSQSZRpbLboDLieM6M5SNF2ZgTVBNMnPuRoZABTXnODCn4JB6ZSjc/VG0 MAN2CEt/Dhaq8oAl06GcB5TXRXRrpj46M0t6Mfds9NxoQRELpiH4Qsmfta/gjHksMF24nA9x VXDuWMwLmWHivPQmnOAChkve/ZRm78nZ4dRHqJwKh8DQNsAJUIGFRJWKISeQ0PKABkAGKYgI B+DPaFAWsVyCi7ojHldYX7BORkeYHBN/JvcDF7Zw0OseMkDOHM3dvIE94UOr/7Zz1Yd6+yN8 qHe79Xb/4yE3X3xsZV+l6eJMcRsRLC4mML1oiUQxLprdxlvsXz9unbf6Hwnt01a/3ez14LTT hTpc1rv9VuPqvN6Fy6vuZafXLKC6Y4QQM24nJ4wQ0tRHqtksQgWDZpDxETcvRJTQg5yYX0lV W8z5igiZaEvNlg/ZI9f3xsIuizS6HZK2R0Wfh0XgIENE/vruGcnuodL0rEIedvehz5AmDC5d 02KwDb05Da9WS3k4Rj6nrhd1gFKlXC5vl6ulWh6uevWCkTo+RcNAIyQJBRxAf0J8HdJGyxCC Iw+XMAlxBeh8R7gJ4RS3lO+NhwSbiiNGJiXSxPNtZqBZqT8CXKhW6HMmpI6qpUCTI9Vxwvbr 8goWwdzz0LoA31uFS8Cmoi9KPOROvn9tDlrABIQmV4FNBI0vwrQsNkNb2fI9Dy0sRB93GnIE /wOZzAsmrGkSJfTF9gnKwsRl0wwEThuKJ5060QmnSQiKJBla2pmLeq/f7MJxt/MO/1x2Ow1k 02Yva5Ahp7kQke345ECkXrnOcPUdWVrpd2gYeatD0SEJfeuaRen3Hosc/L/oeOv9aXn01rjF D9roBW30gZK3W4l/ZRhffQeF7O9ONPDwZLjLDLlEhOWpjXYeWnjCyNxquH5IG56QmIw8AAvf s4wcQC8IVqaEH2+ImXsT349WvB3JcZp/E02CuWAc3B9SGWjBzhluBceuPpu5y54anrEmZpBD cFM8pN3mWavX79b7rU4bN3gQsDHHmhbxxbHzgIZul4WoDwi1tGmNL4q5fhoRjzGbH4lrz1+I 07NANzWF2wLFO/oIEaKXtsz59NtvWic0F/KENVvqxjlhzClEExdPjq/OFG0z8sRnIV7lAbxA Nn1hZ//uoUGfghK4zEvgZhVIXC7iII16wd+3mPWHdyDw/qp1cnAEL2w+L8JU4BO/gdP1NsdA dwq0ZZFLNeLOgeAaDpID5eeEXNZm5bh1lo35jX9VnVP96g08rFpH/r23setp/eq8r3Xl3zf2 
bJ10L7SO9HVzv3a/q/fDr7f0+5/6eaojft/Ys91BKmo9+XfVkx+mMv9yIw6UMLNQmnFtfzr3 +FnkTqsQqa0TMmHAM6dMqUEu1ddOHXq05ERzkcfPWZeNHZLbbeyeuedkrR8nvqX6cUAIg/7H yyY1Tdl05TRwiMjOzj+YP0rN9k9y9L/Ko2L1gdCbTwz7CAyLNOMsC2/JjOD6GY1G5B+p/Ikt j4UqSpiwIfRL32/HrasqKeY7cjAYl+w6/yCLJzt5KFnWm0+PlxELlfzNdYXFmOrMJTquNaN6 I/uhZflVqjfkKL33ZuYlNruTzxrcaqVYUrABhY0HH9JHM4WF4v0FekEMR4hp9RWmI5J8jXJP b1+noGy8QkXx1dU9cIVqUSsLo+fGkCeLMG6gsYD2/oL8ExRXaOBJ+5zJrY6RD7HBmmTEWyFx NJQs8iTPO413g8t6412zf4BWKzOvD1c7nMXtsgPi0PFc9Kz8cXrWZ3wEJ4MaYjyDVcskRodL uNVtegYb0Wj+rZUA1Q0xZVMhUk18jQRFq95dHnC6MImjE/4aEwXQQRqR73uQkGKzpJPO0tC0 5RIpQqnvEmzEJG6+0XYP/yczihzsWBkJz8xjX3lmNpoHdLQS9bX5KRp0UKem42X4Xl6z5QDP PVp7zH7HlofSWsPXSpeJNlIufrBEFTieCrNMnXm2uERNBZufYu4qFO45+QeZrDj8ZJ6awwYe Jjtg3qdy6fOhWp6UWqQzSbGSN27JfvFYLx6KR6q0OnUaEB6+IdIHQXEwtKfclw1n5sJjNoeJ 7DO3hPAzbTsYoBvPU7J1/KJEGzWOUCELg7aH5J6tOEO4yejaFnJFgYYUcmIc4il8kkz9dNBq N/t56NHJQYHWrF/I8w7Jgb/ruFvI3hGT8BJ2Sh35G6VM43UU0K8YjMyp41JaQWJxuNZj5geU qzmCCTrTYeYcRW6zPbjsdPvZ9c4m/8D/xCNa7frJSXdQb3/kZ+qcBqApwek8/AcL/MzLTBoK vUSR93NiASDdhg7aF4J2ecisbFAuCy9jGLH4XOn0UP1AM+n6QD+OI1vqBoEUMSV6BQqtYxQn A5Rv+tbdNZMY/pC5ijkZBzj3/VlIR4cOOadhWgNt4Fyh/RXrxme0x6VL8gop30O6EWNuJB+C ynLvjbD5oNx/buNqnlzC/VI+cn2nZkPgItxw504mCKM0l2il1d9dRBUTrDnMd1CY20xSHxZz ZzzdlAx/LheiVi6iJxTdYYs4HCS7o2CTMjIxstJ0uOTGtRB9Dz/h1P8hy9ARbY1ItJnKBOSy TqKboKQwOtIQ4fLeU0uiBZHM1UJU4HsrsatCoRCrw02WZMwA2cM1raWrq8RM2kSNNbixxuM2 DwF4/vxuMpGkNkcsWuK0UYK1IJxIuyoifCCdx2YU5LOuVzVPgqiuthI19Po1ZXylJkzvzQ0Z FujmIMQPdHQ5pI1benNH/H9D2v7Rcwz31P9VK3s7qv6vWqtWeP3fTvUp//M9nqf8z1P+5yn/ 83j5H+MnZ4QMN5K5icHbH42f8KvjMe1N/Oqi/rdBo9NuNxsUfOhRLVLcplnIXEzu7lSrCSxp ISrXpFJKA20fJ63VSnm/koBFt/ld86MaCLs/7yRDVTGJatwraQglITw5Eo9v0ijDdGpkuVRB uCLWf3Lcrl80j/7Ymi7t4dbNITmSypMKtYyCTDvRN4asTSzOQwkUjyS9ibpcSB6HDKUppTMc Mg+pA5GcoUsGf2iBg7wWJMjr7n9ed9vzcajzBujfQeeyKaJBiKmCLczLP7ingvPykAXPn+AK gUcMPikSfBbRMxV0wW+x4xwHfvG/eZhIl8TK48HfdaLIxsiX0SYWGLLMjKxHbjrBBZ0xPXtI 4hHlA5rDOJOQNsrz5quZB4xH7mDDKosS0dYJODai44wcFgIzrQmX+3KrkjQintwpNhIrZBx+ 
egOmEJiitPs1cdBFgoUTboKg2uaUfdpFR924ScUBkfh4dpiHchyxUYcnV/zxqfboER/N/vP+ mvof2N2t7Kj6H5Rxu7z+p7z7ZP99j+fJ/nuy/57sv0et/7mj2sXxnMgxXQqP3RoA/5E0O1kB vDRGi55oJpKQ1NQT1eG/Y71Lqre9Mn/kTNm3LIrR8hA/bg7CCzKqUKbsQMYIGV453Lcx43aJ nibPqTz5vclKFZN0WfxKptyFYXVpBiFTMf6RM55Ls5YuX8a1UEJSiPiQIeNIVN4iA4ZyY2Tk 6RHyAd6/kg8oFvudkw78Znrz0BDRLEVCRAg/EW2HSypqyGzxXgpcshO35hHWOt2TSljrz9MI 1D+XUWFi+TaX1TDdfjPhLyUQlVNYAbaeVNictFH7It/cGahOprgr57CeCti0Z4m8EFBvyUhL vCl+OA/Ws+hyy/iG6Y0FZbFz7halSDleFIiOgvI3vCWKZJTOc3QbuPAzw+vVkjFicx+dBREV D1EL2f40Q5Ih0746PxdFJKmZcYIjkP14KxKXymZiyj4wu/9AQvJrJA9I7ce05NKeRtHSabVk FKXlNl9rqizmjtsbh2s5mdWCnY2lFEmCQaVWzLj0TZaTKlP3C5oP4rJmEsEWNTiB9fWWGpx0 Gc4mEqq6BVF+A4nc0LLnWoJBrwFAumhO+qFKRFHpUTo3n5TPpYt70vxwVxXEg4UggbxD9tH/ SQ3lveH2f7tH8//0S8aPOsd9/l9V+n90/3Nnl+5/7JbK5Sf/73s838gES16i9Tn7sj1i/HUx d+b6Q9Mls+byjHRVDsYDSn3hx08UM/x8aKRLF3mzXtynuol6DUrihTyiCOl0oFSEFGcjISys JinuVOIxQMM7EtG3cK2A7Vj0lfXKswGOp8m40ODBKzlDT1zN3C1XPitrz7nXTDTimsoVrRXP c0dtJVU5kd9IMTkqtUZVnJoAvFd073GGfqzNfjPR0QRunl3URbUEKoWMc1RCX+mXmILw+rUj 5WEx1/ahzUQOWoUbxbVXzxc2KDbI1cdqAzLrO+V8pnJqygqvaPKsFKcy26rp0nUgEkPKg6ar WrOoVV3XtzK3EkqS+HaYDzIaOKjXr2NekwpH3d9MsUEetshCOnqBHilaqUe7O9UK2EOyo474 3c5bjCkRIt9Q2v0i5JXcqUl4t+TYOESay/eyiz3MrHc2xA5dvqdCwnmY0QdnaX+SJMTguH4i GUHEtsPxp3Ll589SGyeXVnGpf/cyidKIM+vEgcgg8vzAi/AAqSGLcR9CgFgf4yR31lHH0fT4 0JszimuE6xcWeOBann4aFqLVazHun6hyLq3gWomJxcQXsfj5Csik3pWXEMbCIswk3uJsQPgT JS/PAn6HIYeLD1My4iGH8dQJ8DBbE2Zdg7NyQR1N7qk063gUJEacAg7eqwgWplg/t2adSNWg FIsOdzGs6Uz5Y2SBEjccHaFxhByEPv6cxeXDfDHbb6SVdrThmIqzvjpUzpbiaHE3QYcodzng layX79nvzEqxqNY7sfmIoZ/TkD//xDGCwj3B3vglSxe2L8+6zd6g0bm4qLdPBp13scEXc3al tKNY+x7mjstEgY6kNG7JvCYeBnFMdSzzt9BIsXxsUq4xO9BhdpkZ8HXoBrNeWLK5zw0SWxaF IG4rNZv8qlcH3b3f/CFnexLvps1DVNoNmVAxVBxu4gqSx6QUK/EYFQ7z6W69cnKUCaXYMOR5 IQk8uT8m7qT5VKiLHiBBcjzkaM+S50+Lgmr5pQQdylmlq01/TEdZ5DECPFXGP3ET4T6n6y/x r1a48R4nS3xedbUc9WMD8hCrtKdWsfVBFrGmNT/uiG49pau/YN1kWnXPbimLEjPp8qwg5Knk KV1oJ3VhqxL3ZXJyhEP2H+V//dWP5v/hOTi5aH6DOe75/Z+9cq0W+3/lCv3+G358uv//XZ5u 
EpL7WiqUDcM4b120+tzy7cmUlMwD0sVJLrUj9C4oeUTJFxd1/e+Abp/HXKgUKoVymaR1l9lv zUi27hVKBaPuhn4qrUjgdEgC9II8i+lsHtHdZin2PYavg+sC/T5P6E8ZaY0JjqYK3ZCrMZ5e kvUHiDAqxlD+EAyaRzzxRj/lQvdnCpChlFX9qv+208Uu3G40hsz1F1nDaIEZhvMpaSuyRma+ F5pDx3UiMiqTZJPKQhbgCueJc3sLDwInvMY5WrCgAJLhOtdSlgpx7Xi4skS9cp5aVXp5uvqs sptkTZkWKjpmmOIermkHjJSx+MaErufjbd+aCyHa15EEDpptm+5sYj7nqTeOHIVqIz8ykeoX 9ROO5dwTkCiLS+jYc2FW60lcIABccg8Z8wycwWN2gX7/Rz2G8eEtatRWD1r9X9Ms5JAVQQUp 28JHSFLISLM4yUz1IiKlzHeRLqGHwEOlvayBo4gZcPR4wsgaoIEhMnEkbPTIxB0Ac8qrdAiQ MF5wmDPFJSHeqFJ5jQwaHGEhjZ/tM/E7QUuGQOcznmmg2xeui+zNf5cHKM3KE3NEQwph2+YS 9TtZNgbn4YW5jPdvw5x5osJwmaR3ycARPMrr1YntDfolPjSUKMQh7Ka6NKPk7w9pP1FFW6yu xseRDX6JQZ2IgsFvRdH7dqe9fSuImTk2tdLzpCybUrbG284HOOk0aVfhQ6f77leDJzrp1wm8 NZi0RpXUzSeXWMmhinO9aY+KZ7P5kU3ngFNuHP8dJ7E2vY48idvym+ho1dG+slQfHr9Irtmu XWfXTF2+l15ee21wSPEvS8WG8vpVeFXIv2K5huSnye0wMglgDWPQadx/2wTyW67arYaIINUv Ou0zXqDXk+Tvo9xzPGeqsslobwUmUsfkXMCNJ48Hktbse9pvrcbMSGWV11eP7hJd7OPL5dfp nWidFgU4UTxrqHKQUJ6SuUx3ciqIWfJJMQJ/q1yAAjQV4kAyPvKN2BDFI87cUXKpMLVlHQHM JikcOrTsib+gKwF55TXrW2mjQBdLxAW5hB1tVhj5MzprkAkZo/pCd5mFse/bVANnamKTV2CM jERLMJ3hN/pPYqkyPSQYk4nfjdM8mwIdNC491Ulrd+izsU1JYyGFxA+YhbLjqx78X3vX2tzG dWQ/79T+iAlSFYlVEEyKlmRbqexCJCQhoUiZDzv6tBkSQ3BsAIOdAUhjf/32Od195w4ethMn dtUuUXFsAjP32befp/ueDN5eppdn8th/yHOXEZAy+z4v71WDn43lt2+9dJ09IGP+Xo4d8CUL 2Rd5qNYFDKadHLeQqiAN9MGg00tznd5nfEoZrBH67WTVCw8Ko8dshd6XdbqcA3MxG+9ZQZOe DTa7Lu+1WFsB2xECXv6vYpLepByPeYoc+omXPoihXzyTcwPblSJxHkBJpFFKJ7cvXazizeEm N3Y2s6VaX/omv8mWTGjNZ2sPweAUTQFnqc1KMjtzsqBTUS3b9U1aT97K2a3vxJhNP+TZjEZn d53v8qSOSiUZ7qIJ9xsnxbqc3CtKSWaNXU7Xp8OF2CiKF6UXyUHIyHbrnMvcFKdBdZiGkRj9 Z7TQ1sw2DmqKsr3ZTMfvW+ioQll+1CCxc0BIAaZRyaFZgGaS5M3VuwsnUlPzfJE7VonvtS2O /9lBa3Xuf+t6jblewjhSLBCcSYRQLR4KsJchZVPGCiEzEeZjTn74ZEq5/9+E/dTLClirB92m H7CZohD04hMU0RxfUm6yAs6HYXifalH5ZHvpRankkKXXxUJdstATqijR2z0JHSQPutM0dIyn R6CxDoBEo+sOv6HkbKz2abnWFouxQBdayrFe9Ri+gDNR8Qq2cJ1iflN31MeDrvB3Ne3YIa00 AQzrIntPtYobUf8OUkuV6WQnlE9+6Pk35X/eCy8tF6JO9m6yRyjvb/2J7P/l/F/Uh9j/r37E 
/k+fvzww+3//4OWrA9j/+4fPH+3/X+Pz+999Jqzms/ouSX5vduNNVcyDOrCcwxosq1FefSVP HOzFCoar355bKQ881wdOt2nn8vPhnmkPQTzFbs2k50Uo/yD/OWv+s3nmD7/1gv0f+0TnH5rf v6SPnzr/6SvD/7/Yf7XP+t+Hh4eP+P9f5dOcf1gC0Pv9cgL/e7b2d0Mxv/XYHz+//BOd/6Oz j5+Gp+/++X3I+f989/n//HD/1cs0ffn54YsXL/afHwL/dfji5WP9/1/lo3XAmWgxOB2c90/S j1dvToZHqfwzOL0YJP/mNd2/8QSXbvrnpZhmB19+eZAk6XpKzxdfdvnTzowZzblI0p2fl69e oFppnfbvxSI+yoQhFaNxzgSM/ecHh18y9SJJB/d5tVIjERb9tFgsHJo0X9GSiTKD5Nlr6X6K H4u8ToLXfGJZKu4979LBe3OXzeh5KGgush4IXBpwdSf/pmvyscplbJMctUwuWVmILdWW8FIv Ghc8zfi8LsYzc69m38uX5iiuEqQ2jeAyKjUlhIPnEJDS1EvTNytGAqoM5fK3Z8Yknm5D//1C 7H3tarzMkOKTWxzkx7rCb4mP+dkzQsi/N2tYrfYmosBsLEa/malbayyjl1KHTHZkAgV4Tqnr Y76XXdk5T6IghoEPguUZIAIPd7C0s+XirqyYFsw6umWyrHX7ZEhPL+BT0td2UWVrcjfwmtGr kvhinxTXVVatduU4wX2ZZ6PeXsr4CFz/WRPJTrj0NmJ4CsqyB6oJmULzPGNpkFaOW9frrlT5 LW5HoFPDN7ALmkzmFX0aBHVsH1m9QXvxnmroKgmB+Ig6orOjR2ZjfOlTo51qTFJINDaWV/fS tTs3Hor6bq8bugrODgVgSdNiJ6DcjyzYOEdgJfEXgVwqFtGrjIcppbaoUV6Hl07GeKOjRCMz IhY4Xl/31+bdsOZYItbbHZXmSaKTrebuXJZ4dYFsCu4fuVzNXXEQJNdShIaslJUtZvOyGNfF KBFiBXvCYuYzjQlpJ9oSBg6Srr/Xn0rsSpWHdEV9ipGMer0XFIxG/iDZXV4tcLeWuaKLELDE 8UTLuqLJ1h2NV5Kpirb8IX+SS/FWfsh/yJAh1/UntjZXL1EJugllPtzlOHbJGO5bzlgRM7e5 NMR+4CQdu/tLqKOYaxDMgxa2VlhXHCM6unp6yvjuGjnDCccD1g2kFpEXgh0R5Uk7fSGJMI6a Pr+7fOrEwIRRjVuvlGDo0kt8axLeOLKFSixI8SB7usjn9Vfp04M9yiUVle1VF7JMnorhXCJ+ YmQSSaaHO5TXxhrV/HGSj+WYU+LVtYEt0XQ33mFNg/VtjPvjqBGN73Iv6N5V9vmk9qkQS6m5 QUrwjiPkahvBJVzw3KUw82wRxBnVYSuUnc7KJtFUXeEmQJImkB7u3onYMAdfqHfbHcIAEXFo 88zinRhfYtyijinIy0BzMA9OHOopNZlOd7tsSTHLJoi765QgZGQhRLRPKUsZCNdhqNdTXa5a LeqW18l4JStrKzF59ATW0nKRWTV0nCT8PFl12UnMnhSHyow7MGoR91jLhYgQzt6E4xw/w5cM ugNvJQchElVTjoQ7VjrjKDG6VJRBposeJCcmUcxGxX0xoms4La/JSLSToM904QHKhTZveNos vSk0g8BxVeSiRa96xjQR2ltwm7uhLOI0GzE/m6jF1NfZJqTH7zroUIqsdNJ6YuoGuDxLxS2a 5zQ1vec62Bz7H04u5VMpM1Suecu6cvMVcdpN+F1pXTPG3aV/W0Lb6yX/nvykgiy/Xg7OP1yk /dNj4KiPh1rLBUnTZlJ102PAyYdvrjSUKw9+ODsevrXYLga/bzGULaqSkSMXG/Ec6jHEQChn IEZEFJAEsacFZO8cScwhdb1hO3flBMKlzlam2k5FA73O40zzZGu6vAxsu3rR02XvfNTxdUR7 
BpSmm1BnCcOnWIjmgNF71KnDqSBcHNAL3lqCiFLtGfbRL2iDIJG8Ku5lx+5zXRAdfDPhSfbw lZ5phbvKzKVbfdaWzbP14paJ3QcZUJnoJk2Kv9oQmAH4e0wytbPcIJuRtc/5c8eSiZzNZTbG kj1FRWVhBLcLhKf9BatsY5kzo5C1PylEpbWfZ4nvTNqJe+9A82Tk3E6GYm1Go0phIVmddkR2 dOSg9IW936uCUNq6Emi041y0JkllEopnoyErdRg5vFYWS61suUAYnky5ltadVIBmKm8Tx0fE S29M2TUdIEcsHoXWWsgpeyWJlHXGOXnXF8sokk8qGy0WlIjpBqEl3vNTYYP5HKrXjFaJxlKJ cFLGJfPcMuK9HjAwvohKZNUS6vbc4c8ud8IkATEiuzpQKBWO4M8wWF1Xs2ae1LEeg+2NlWuo zcBwyQmZihRYiiKGwHdBk9CVfizNvLhZlst6or0LzyEvF9qVb6wERsCS2CDjp5LmpBnnsUnc TLJiqsVxXfK/1iqIheLZTLtL9LXaJdatlyiKOeEsIJIACLOYPeYWmk7wDJXIxj6MFIH20rXw RU0/7YoS4WkrVsFdUkuHymsI6c/vVnUBPJLStR5mN9e0J1XwVtZKuxKI6XxBPYr0LwjdH9wy d6WZlPO8oRzT7wziiFlV2wnGOaZxtkQ5W0pkBrAQOtydrLhrslTpNFY0ydrbjNAY/LbKKxc2 uYOEsI8tdCmkIQr3NM8XjkLxEL/L8a+0eGy21xgBitNg0Nt1RhRTIM+/kbXlwsocFVBHkrPk YkJSIhuT6608R1twDjSCtWWEp0/1dBzXG+MgbRKj6M1G6wWAiZ4sM22LGZtRHBIMMNZ7XgSx zu9qFXUxtqi9sZZLXZnaXd7CCGppVADMWi8ZVsHpGSKKp7GoRqEVENAuTcBFv07/Zs9V97D0 Lui14rmwEiB16gZrAPdUlUEMCZ+xyQujFQYb2YS6lKBR/shiKtK2c2GcCJCeJjY0DVJJtGo5 6mOqRiJpK8IyuSZM8sGqyaaIogSCVnqazcqlcBeFL1II81C0OF66leNlbMC+2G37PIVOO0FJ INPAAn3YKdBxhBf2GoeFIoVx4lv1jsiBbLW5XWxh/cCYGM0nE5dfaM4BLvdF/rDGE9lKo+E9 HfyAUsjS1FeOpQsiWyF5RWtT4UFlEwFOGyhBF1+9BLPWkneVibU4kM9mU0MgWGzUbOdaY709 0dzdb8JnDeiq2KWyTa/sszkeNEYTr3qXiRnoOHE7AzAn+YoqQzuPZpdyyapiyBmoy5m0Rlcu VKOKGmKjd2jlP6CNF6rO1qbvTWWN7w2vOGudQd1ZQp9xRLsOP4rmWYpoC8N/CCiiFg1NU96J FHcNp/NyEV5I1oiOmM3QLApmZQr3chajpgmRVs2WJutChYw1VjhNaGkbbhTaW86FkvYKqAO4 cYeonadKgCvDNcCxeVAkEmxtZd24kql54uoOkS9ofOq0qnycVaMJEXG3xDA/QEyrc+xSXuxG YQKMlP73RWCYdZMUS8Uo8v9RUa0XSew6coQ7KlhJO6kOVh0B8tzrVHbpjoZD0xXNmyT/Ia/U /HXHmWVqL6pysnWxIwOqrESdm8Cb4eZUvVUVkDkPieEvNJozJTpsPMYqebMOAuQ8WAt/S0PJ uq5FBmlXLe/URPa02OR9OVkyYTcBALOsAHRTnt7MT3XfhgtdV87/otEp2yRNw0rZKuUOf1xV X5/C+uhhQqowdfXn+R5RNdffwafiPnBFKpLfQCPbIn+TCz9xBxzDcwPP7lCihBnAZWZnSl0a sgKN/tS/Qa4N1BWWk7PdwHeTnLKuUp8yBaGh159BmGOQqkA1RkjXzryf2rh83W5NUGVNezpa TVQ370ZaK6dZhapQS3cMNU5CCB3Vxl7LEnaDRrY5syycJ6rc3fQ+mxTaHBKVhDsv6H/Tea3y 
--Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)-- ************ From owner-pgsql-hackers@hub.org Mon Jan 3 13:47:07 2000 Received: from hub.org (hub.org [216.126.84.1]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA23987 for ; Mon, 3 Jan 2000 14:47:06 -0500 (EST) Received: from localhost
(majordom@localhost) by hub.org (8.9.3/8.9.3) with SMTP id OAA03234; Mon, 3 Jan 2000 14:39:56 -0500 (EST) (envelope-from owner-pgsql-hackers) Received: by hub.org (bulk_mailer v1.5); Mon, 3 Jan 2000 14:39:49 -0500 Received: (from majordom@localhost) by hub.org (8.9.3/8.9.3) id OAA03050 for pgsql-hackers-outgoing; Mon, 3 Jan 2000 14:38:50 -0500 (EST) (envelope-from owner-pgsql-hackers@postgreSQL.org) Received: from ara.zf.jcu.cz (zakkr@ara.zf.jcu.cz [160.217.161.4]) by hub.org (8.9.3/8.9.3) with ESMTP id OAA02975 for ; Mon, 3 Jan 2000 14:38:05 -0500 (EST) (envelope-from zakkr@zf.jcu.cz) Received: from localhost (zakkr@localhost) by ara.zf.jcu.cz (8.9.3/8.9.3/Debian/GNU) with SMTP id UAA19297; Mon, 3 Jan 2000 20:23:35 +0100 Date: Mon, 3 Jan 2000 20:23:35 +0100 (CET) From: Karel Zak - Zakkr To: P.Marchesso@videotron.ca cc: pgsql-hackers Subject: [HACKERS] replicator Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pgsql-hackers@postgresql.org Status: OR Hi, I have looked at your (Philippe's) replicator, but I don't fully understand your replication concept.

node1:
    SQL --IPC--> node-broker
                     |
                   TCP/IP
                     |
                master-node --IPC--> replikator
                                     |    |    |
                                        libpq
                                     |    |    |
                                  node2   node..n

(Is this the right picture?) If I understand correctly, all nodes connect to the master node, and the "replicator" on this master node replicates the data. But the master node is a very critical point in this concept: if the master node fails, replication is lost for *all* nodes. Hmm... but I want to use replication for high-availability applications... IMHO there is also a problem with node registration / authentication on the master node. Why not make the concept more direct? As:

    SQL --IPC--> node-replicator
                 |      |      |
         via libpq send data to all nodes with
         current client/backend auth.

(no master node exists; every node has a connection to every other node) Using the replicator as an external process and copying data from SQL to this replicator via IPC is a very good idea (yours).
Karel ---------------------------------------------------------------------- Karel Zak http://home.zf.jcu.cz/~zakkr/ Docs: http://docs.linux.cz (big docs archive) Kim Project: http://home.zf.jcu.cz/~zakkr/kim/ (process manager) FTP: ftp://ftp2.zf.jcu.cz/users/zakkr/ (C/ncurses/PgSQL) ----------------------------------------------------------------------- ************ From owner-pgsql-hackers@hub.org Tue Jan 4 10:31:01 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA17522 for ; Tue, 4 Jan 2000 11:31:00 -0500 (EST) Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id LAA01541 for ; Tue, 4 Jan 2000 11:27:30 -0500 (EST) Received: from localhost (majordom@localhost) by hub.org (8.9.3/8.9.3) with SMTP id LAA09992; Tue, 4 Jan 2000 11:18:07 -0500 (EST) (envelope-from owner-pgsql-hackers) Received: by hub.org (bulk_mailer v1.5); Tue, 4 Jan 2000 11:17:58 -0500 Received: (from majordom@localhost) by hub.org (8.9.3/8.9.3) id LAA09856 for pgsql-hackers-outgoing; Tue, 4 Jan 2000 11:17:17 -0500 (EST) (envelope-from owner-pgsql-hackers@postgreSQL.org) Received: from ara.zf.jcu.cz (zakkr@ara.zf.jcu.cz [160.217.161.4]) by hub.org (8.9.3/8.9.3) with ESMTP id LAA09763 for ; Tue, 4 Jan 2000 11:16:43 -0500 (EST) (envelope-from zakkr@zf.jcu.cz) Received: from localhost (zakkr@localhost) by ara.zf.jcu.cz (8.9.3/8.9.3/Debian/GNU) with SMTP id RAA31673; Tue, 4 Jan 2000 17:02:06 +0100 Date: Tue, 4 Jan 2000 17:02:06 +0100 (CET) From: Karel Zak - Zakkr To: Philippe Marchesseault cc: pgsql-hackers Subject: Re: [HACKERS] replicator In-Reply-To: <38714B6F.2DECAEC0@Videotron.ca> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pgsql-hackers@postgreSQL.org Status: OR On Mon, 3 Jan 2000, Philippe Marchesseault wrote:

> So it could become:
>
> SQL --IPC--> node-replicator
>              |      |      |
>    via TCP send statements to each node
>        replicator (on local node)
>              |
>    via libpq send data to
>    current (local) backend.
>
> > (no master node exists; every node has a connection to every other node)
>
> Exactly, if the replicator dies only the node dies, everything else keeps
> working.

Hi, I have been exploring the replication concepts of Oracle and Sybase a little (in their manuals). (Does anyone know some interesting links or publications about it?) First, I am sure it is premature to write replication for PgSQL now, while we have no exact concept for it. It needs more input from more developers. We first need answers to the following questions:

1/ Which replication concept should we choose for PG?
2/ How should transactions be managed across nodes? (we also need to define some replication protocol for this)
3/ How should replication be integrated into the current PG transaction code?

My idea (dream :-) is replication that allows full read-write on all nodes and uses the current PG transaction method, so that there is no difference between several backends on one host and several backends on several hosts; it gives "global transaction consistency". Today transactions are managed via IPC (on one host); my dream is to manage them the same way, but between several hosts via TCP. (And to optimize this: transfer committed data/commands only.) Any suggestions? ------------------- Note: (transaction oriented replication) Sybase - I. model (only one node is read-write)

  primary SQL data (READ-WRITE)
       |
  replication agent (transaction log monitoring)
       |
  primary distribution server (one or more repl. servers)
       |     /  |  \
       |    nodes (READ-ONLY)
       |
  secondary dist. server
      /  |  \
    nodes (READ-ONLY)

If the primary SQL server is read-write and the other nodes are *read-only*, the system works well even when the connection is down (data are saved to the replication log, and when the connection is available again the log is written to the node). Sybase - II. model (all nodes read-write) SQL data 1 --->--+ NODE I. | | ^ | | replication agent 1 (transaction log monitoring) V | | V | | replication server 1 | ^ V | replication server 2 NODE II.
| | ^ +-<-->--- SQL data 2 | | replication agent 2 -<-- Sorry, I am not sure I redrew the previous picture entirely correctly. Karel ************ From pgsql-hackers-owner+M3133@hub.org Fri Jun 9 15:02:25 2000 Received: from hub.org (root@hub.org [216.126.84.1]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA22319 for ; Fri, 9 Jun 2000 15:02:24 -0400 (EDT) Received: from hub.org (majordom@localhost [127.0.0.1]) by hub.org (8.10.1/8.10.1) with SMTP id e59IsET81137; Fri, 9 Jun 2000 14:54:14 -0400 (EDT) Received: from ultra2.quiknet.com (ultra2.quiknet.com [207.183.249.4]) by hub.org (8.10.1/8.10.1) with SMTP id e59IrQT80458 for ; Fri, 9 Jun 2000 14:53:26 -0400 (EDT) Received: (qmail 13302 invoked from network); 9 Jun 2000 18:53:21 -0000 Received: from 18.67.tc1.oro.pmpool.quiknet.com (HELO quiknet.com) (pecondon@207.231.67.18) by ultra2.quiknet.com with SMTP; 9 Jun 2000 18:53:21 -0000 Message-ID: <39413D08.A6BDC664@quiknet.com> Date: Fri, 09 Jun 2000 11:52:57 -0700 From: Paul Condon X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.14-5.0 i686) X-Accept-Language: en MIME-Version: 1.0 To: ohp@pyrenet.fr, pgsql-hackers@postgresql.org Subject: [HACKERS] Re: Big project, please help Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Mailing-List: pgsql-hackers@postgresql.org Precedence: bulk Sender: pgsql-hackers-owner@hub.org Status: OR Two-way replication on a single "table" is available in Lotus Notes. In Notes, every record has a timestamp, which contains the time of the last update. (It also has a creation timestamp.) During replication, each row/record's timestamp is compared with the timestamp of the last replication. If, for corresponding rows in two replicas, the timestamp of one row is newer than the last replication, the contents of this newer row are copied to the other replica. But if both of the corresponding rows have newer timestamps, there is a problem. The Lotus Notes solution is to: 1.
send a replication conflict message, containing full copies of both rows, to the Notes Administrator. 2. copy the newest row over the older row in the replicas. 3. provide a mechanism for the Administrator to reverse the default decision in 2, if the semantics of the message history or off-line investigation indicates that the wrong decision was made. In practice, the Administrator is not overwhelmed with replication conflict messages, because updates usually originate only at the site that originally created the row, or updates only fill in fields that were originally 'TBD'. The full logic is perhaps more complicated than I have described here, but it is already complicated enough to give you an idea of what you're really being asked to do. I am not aware of any relational database vendor that really supports two-way replication at the level that Notes supports it, but Notes isn't a relational database. The difficulty of the position you appear to be in is that management might believe the full problem is solved in brand X RDBMS, and you will have trouble convincing management that this is not really true.
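The Notes-style rule described above (compare each row's last-update time with the last replication, copy the newer row, and flag a conflict when both replicas changed) can be sketched as follows. This is a minimal illustration under stated assumptions: `Row` and `reconcile` are hypothetical names, not a Notes or PostgreSQL API, timestamps are plain numbers, and the administrator override in step 3 is not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Row:
    """One replica's copy of a record (hypothetical model, not Notes' API)."""
    data: dict
    modified: float  # time of the last update, e.g. seconds since the epoch


def reconcile(a: Row, b: Row, last_replication: float) -> Tuple[Row, bool]:
    """Pick the row both replicas should hold; the flag marks a conflict."""
    a_changed = a.modified > last_replication
    b_changed = b.modified > last_replication
    if a_changed and b_changed:
        # Both replicas changed the row: the newer one wins by default,
        # and a conflict message (with both rows) goes to the administrator.
        winner = a if a.modified >= b.modified else b
        return winner, True
    if b_changed:
        return b, False   # copy b's contents over a
    return a, False       # a changed, or neither did (replicas already agree)


a = Row({"phone": "555-0100"}, modified=120.0)
b = Row({"phone": "555-0199"}, modified=90.0)
print(reconcile(a, b, last_replication=100.0))  # a wins, no conflict
print(reconcile(a, b, last_replication=50.0))   # both changed: a wins, conflict
```

As the message notes, the real logic is more involved; the sketch only captures the three cases named there.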
From pgsql-hackers-owner+M2401@hub.org Tue May 23 12:19:54 2000 Received: from news.tht.net (news.hub.org [216.126.91.242]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA28410 for ; Tue, 23 May 2000 12:19:53 -0400 (EDT) Received: from hub.org (majordom@hub.org [216.126.84.1]) by news.tht.net (8.9.3/8.9.3) with ESMTP id MAB53304; Tue, 23 May 2000 12:00:08 -0400 (EDT) (envelope-from pgsql-hackers-owner+M2401@hub.org) Received: from gwineta.repas.de (gwineta.repas.de [193.101.49.1]) by hub.org (8.9.3/8.9.3) with ESMTP id LAA39896 for ; Tue, 23 May 2000 11:57:31 -0400 (EDT) (envelope-from kardos@repas-aeg.de) Received: (from smap@localhost) by gwineta.repas.de (8.8.8/8.8.8) id RAA27154 for ; Tue, 23 May 2000 17:57:23 +0200 Received: from dragon.dr.repas.de(172.30.48.206) by gwineta.repas.de via smap (V2.1) id xma027101; Tue, 23 May 00 17:56:20 +0200 Received: from kardos.dr.repas.de ([172.30.48.153]) by dragon.dr.repas.de (UCX V4.2-21C, OpenVMS V6.2 Alpha); Tue, 23 May 2000 17:57:24 +0200 Message-ID: <010201bfc4cf$7334d5a0$99301eac@Dr.repas.de> From: "Kardos, Dr. Andreas" To: "Todd M. Shrider" , References: Subject: Re: [HACKERS] failing over with postgresql Date: Tue, 23 May 2000 17:56:20 +0200 Organization: repas AEG Automation GmbH MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2314.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 X-Mailing-List: pgsql-hackers@postgresql.org Precedence: bulk Sender: pgsql-hackers-owner@hub.org Status: OR For a SCADA system (Supervisory Control and Data Acquisition), consisting of one master and one hot-standby server, I have implemented such a solution. Client workstations (NT and/or UNIX) are connected to these UNIX servers. The database client programs run on both the client and the server side.
When developing this approach I had two goals in mind: 1) Not to become dependent on the PostgreSQL sources, since they change very rapidly. 2) Not to become dependent on the fe/be protocol, since there are discussions about changing it. So the approach is quite simple: forward all database requests to the standby server at the TCP/IP level. On both servers the postmaster listens on port 5433 instead of 5432; my program listens on the standard port 5432 instead. This program forks twice for every incoming connection. The first instance forwards all packets from the frontend to both backends. The second instance receives the packets from all backends and forwards the packets from the master backend to the frontend. So a frontend running on a server machine connects to port 5432 of localhost. Another program runs on the client machine (on NT as a service). This program also forks twice for every incoming connection. The first instance forwards all packets to port 5432 of the current master server, and the second instance forwards the packets from the master server to the frontend. During standby computer startup, the database of the master computer is dumped, zipped, copied to the standby computer, unzipped and loaded into that database. When a standby startup takes place, all client connections are aborted to allow a login into the standby database; the frontends need to reconnect in this case. So the database of the standby computer is always in sync. The disadvantage of this method is that a query cannot be canceled on the standby server, since the cancel request key of the connection gets lost. But we can live with that. Both programs are able to run on Unix and on (native!) NT. On NT, threads are created instead of forked processes. This approach is simple, but it is effective and it works. We hope to get by this way until real replication is implemented in PostgreSQL. Andreas Kardos -----Original Message----- From: Todd M. Shrider To: Sent: Thursday, May 18, 2000 17:48 Subject: [HACKERS] failing over with postgresql > > is anyone working on or have working a fail-over implementation for the > postgresql stuff. i'd be interested in seeing if and how any might be > dealing with just general issues as well as the database syncing issues. > > we are looking to do this with heartbeat and lvs in mind. also if anyone > is load balancing their databases that would be cool to talk about too. > > --- > Todd M. Shrider VA Linux Systems > Systems Engineer > tshrider@valinux.com www.valinux.com > From pgsql-hackers-owner+M3662@postgresql.org Tue Jan 23 16:23:34 2001 Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA04456 for ; Tue, 23 Jan 2001 16:23:34 -0500 (EST) Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLKf004705; Tue, 23 Jan 2001 16:20:41 -0500 (EST) (envelope-from pgsql-hackers-owner+M3662@postgresql.org) Received: from sectorbase2.sectorbase.com ([208.48.122.131]) by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLAe003753 for ; Tue, 23 Jan 2001 16:10:40 -0500 (EST) (envelope-from vmikheev@SECTORBASE.COM) Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) id ; Tue, 23 Jan 2001 12:49:07 -0800 Message-ID: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com> From: "Mikheev, Vadim" To: "'dom@idealx.com'" , pgsql-hackers@postgresql.org Subject: RE: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd) Date: Tue, 23 Jan 2001 13:10:34 -0800 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: ORr > I had thought that the pre-commit information could be stored in an > auxiliary table by the middleware program ; we would then have > to re-implement some sort of higher-level WAL (I thought of the list
of the commands performed in the current transaction, with a sequence > number for each of them that would guarantee correct ordering between > concurrent transactions in case of a REDO). But I fear I am missing This wouldn't work for READ COMMITTED isolation level. But why do you want to log commands into WAL, where each modification is already logged in, hm, the correct order? Well, it makes sense if you're looking for async replication, but you don't need two-phase commit for that, and you should be aware of the problems with the READ COMMITTED isolation level. Back to two-phase commit: it's the easiest part of the work required for distributed transaction processing. Currently we place a single commit record in the log, and a transaction is committed when this record (and so all other transaction records) is on disk. Two-phase commit: 1. For the 1st phase we'll place a "prepared-to-commit" record into the log, and this phase is complete once the record is flushed to disk. At this point the transaction may be committed at any time, because all its modifications are logged. But it still may be rolled back if this phase failed on other sites of the distributed system. 2. When all sites are prepared to commit, we'll place a "committed" record into the log. There is no need to flush it, because in the event of a crash the recoverer will have to contact the other sites to learn the status of every "prepared" transaction anyway. That's all! It is really hard to implement distributed lock and communication managers, but there is no problem with logging two records instead of one. Period.
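The logging side of the two-phase commit outlined above can be sketched as a toy model. `Site`, `prepare` and `two_phase_commit` are hypothetical names, not PostgreSQL code; a flushed record is modeled by a boolean flag, a site that cannot prepare is assumed to log an unflushed "aborted" record, and the distributed lock and communication managers (the hard part) are not modeled at all.

```python
from typing import List, Tuple

LogRecord = Tuple[str, int, bool]  # (record type, xid, flushed to disk?)


class Site:
    """A hypothetical participant in a distributed transaction."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare
        self.log: List[LogRecord] = []

    def write(self, record: str, xid: int, flush: bool) -> None:
        # flush=True stands for "do not return until the record is on disk"
        self.log.append((record, xid, flush))

    def prepare(self, xid: int) -> bool:
        """Phase 1: make every modification durable before voting yes."""
        if not self.can_prepare:
            self.write("aborted", xid, flush=False)
            return False
        self.write("prepared", xid, flush=True)
        return True


def two_phase_commit(sites: List[Site], xid: int) -> bool:
    votes = [site.prepare(xid) for site in sites]
    committed = all(votes)
    for site, voted_yes in zip(sites, votes):
        if voted_yes:
            # Phase 2: no flush needed. After a crash, recovery must ask the
            # other sites about every "prepared" xid anyway.
            site.write("committed" if committed else "aborted", xid, flush=False)
    return committed


a, b = Site("a"), Site("b")
print(two_phase_commit([a, b], xid=501))  # True
print(a.log)  # [('prepared', 501, True), ('committed', 501, False)]
```

Only the "prepared" record is flushed, which is the point of the message: two log records instead of one, with a single extra flush per site.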
Vadim From pgsql-hackers-owner+M3665@postgresql.org Tue Jan 23 17:05:26 2001 Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05972 for ; Tue, 23 Jan 2001 17:05:24 -0500 (EST) Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NM31008120; Tue, 23 Jan 2001 17:03:01 -0500 (EST) (envelope-from pgsql-hackers-owner+M3665@postgresql.org) Received: from candle.pha.pa.us (candle.navpoint.com [162.33.245.46]) by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f0NLsU007188 for ; Tue, 23 Jan 2001 16:54:30 -0500 (EST) (envelope-from pgman@candle.pha.pa.us) Received: (from pgman@localhost) by candle.pha.pa.us (8.9.0/8.9.0) id QAA05300; Tue, 23 Jan 2001 16:53:53 -0500 (EST) From: Bruce Momjian Message-Id: <200101232153.QAA05300@candle.pha.pa.us> Subject: Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd) In-Reply-To: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com> "from Mikheev, Vadim at Jan 23, 2001 01:10:34 pm" To: "Mikheev, Vadim" Date: Tue, 23 Jan 2001 16:53:53 -0500 (EST) CC: "'dom@idealx.com'" , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL77 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR [ Charset ISO-8859-1 unsupported, converting... ] > > I had thought that the pre-commit information could be stored in an > > auxiliary table by the middleware program ; we would then have > > to re-implement some sort of higher-level WAL (I thought of the list > > of the commands performed in the current transaction, with a sequence > > number for each of them that would guarantee correct ordering between > > concurrent transactions in case of a REDO). But I fear I am missing > > This wouldn't work for READ COMMITTED isolation level. 
> But why do you want to log commands into WAL, where each modification > is already logged in, hm, the correct order? > Well, it makes sense if you're looking for async replication, but > you don't need two-phase commit for that, and you should be aware of > the problems with the READ COMMITTED isolation level. > I believe the issue here is that while SERIALIZABLE isolation means all transactions can be run serially, our default is READ COMMITTED, meaning that open transactions see committed transactions, even if those transactions committed after our transaction started. (FYI, see my chapter on transactions for help, http://www.postgresql.org/docs/awbook.html.) To do higher-level WAL, you would have to record not only the queries, but also which other transactions had committed at the start of each command in your transaction. Ideally, you could number every commit by its XID in your log, and then, when processing each query, pass the "committed" transaction ids that were visible at the time each command began. In other words, you can replay the queries in transaction commit order, except that you have to have some transactions committed at specific points while other transactions are open, i.e.:

    XID   Open XIDs   Query
    500               UPDATE t SET col = 3;
    501   500         BEGIN;
    501   500         UPDATE t SET col = 4;
    501               UPDATE t SET col = 5;
    501               COMMIT;

This is a silly example, but it shows that 500 must commit after the first command in transaction 501, but before the second command in the transaction. This is because UPDATE t SET col = 5 actually sees the changes made by transaction 500 in READ COMMITTED isolation level. I am not advocating this; I think WAL is a better choice. I just wanted to outline why replaying the queries in commit order is insufficient. > Back to two-phase commit: it's the easiest part of the work required for > distributed transaction processing. > Currently we place a single commit record in the log, and a transaction is > committed when this record (and so all other transaction records) > is on disk.
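A toy model can make the commit-order point concrete. It is only a sketch: `execute`, `commit_order_replay` and the step format are hypothetical, and it swaps the UPDATEs in the table above for an insert and two row counts, so that snapshot visibility alone decides the outcome (with UPDATEs on the same row, a real server would also make 501 wait for 500's row lock). Each command sees the transactions committed when it starts, as under READ COMMITTED.

```python
from typing import List, Optional, Tuple

Step = Tuple[int, str, Optional[str]]  # (xid, operation, argument)


def execute(schedule: List[Step]) -> List[Tuple[int, int]]:
    """Run steps in order; every command sees the transactions committed
    when it starts (READ COMMITTED style) plus its own changes.
    Returns (xid, row count) for each "count" step."""
    rows: List[Tuple[int, Optional[str]]] = []  # (inserting xid, value)
    committed = set()
    results = []
    for xid, op, arg in schedule:
        snapshot = set(committed)  # taken fresh at the start of each command
        if op == "insert":
            rows.append((xid, arg))
        elif op == "count":
            visible = [v for x, v in rows if x in snapshot or x == xid]
            results.append((xid, len(visible)))
        elif op == "commit":
            committed.add(xid)
    return results


def commit_order_replay(schedule: List[Step]) -> List[Step]:
    """Naive replay: each transaction's steps as one block, in commit order."""
    order = [xid for xid, op, _ in schedule if op == "commit"]
    return [step for xid in order for step in schedule if step[0] == xid]


original: List[Step] = [
    (500, "insert", "a"),
    (501, "count", None),   # 500 still open: sees 0 rows
    (500, "commit", None),
    (501, "count", None),   # 500 committed: sees 1 row
    (501, "commit", None),
]
print(execute(original))                       # [(501, 0), (501, 1)]
print(execute(commit_order_replay(original)))  # [(501, 1), (501, 1)]
```

The naive replay changes what 501's first command saw; recording the committed XIDs visible at the start of each command, as described above, is what would let a replayer reproduce the original interleaving.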
> Two-phase commit: > > 1. For the 1st phase we'll place a "prepared-to-commit" record into the log, > and this phase is complete once the record is flushed to disk. > At this point the transaction may be committed at any time, because > all its modifications are logged. But it still may be rolled back > if this phase failed on other sites of the distributed system. > > 2. When all sites are prepared to commit, we'll place a "committed" > record into the log. There is no need to flush it, because in the event of > a crash the recoverer will have to contact the other sites to learn the > status of every "prepared" transaction anyway. > > That's all! It is really hard to implement distributed lock and > communication managers, but there is no problem with logging two > records instead of one. Period. Great. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 From pgsql-general-owner+M805@postgresql.org Tue Nov 21 23:53:04 2000 Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA19262 for ; Wed, 22 Nov 2000 00:53:03 -0500 (EST) Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAM5qYs47249; Wed, 22 Nov 2000 00:52:34 -0500 (EST) (envelope-from pgsql-general-owner+M805@postgresql.org) Received: from racerx.cabrion.com (racerx.cabrion.com [166.82.231.4]) by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eAM5lJs46653 for ; Wed, 22 Nov 2000 00:47:19 -0500 (EST) (envelope-from rob@cabrion.com) Received: from cabrionhome (gso163-25-211.triad.rr.com [24.163.25.211]) by racerx.cabrion.com (8.8.7/8.8.7) with SMTP id AAA13731 for ; Wed, 22 Nov 2000 00:45:20 -0500 Message-ID: <006501c05447$fb9aa0c0$4100fd0a@cabrion.org> From: "rob" To: Subject: [GENERAL] Synchronization Toolkit Date: Wed, 22 Nov 2000 00:49:29
-0500 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0062_01C0541E.125CAF30" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Precedence: bulk Sender: pgsql-general-owner@postgresql.org Status: OR This is a multi-part message in MIME format. ------=_NextPart_000_0062_01C0541E.125CAF30 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Not to be confused with replication, my concept of synchronization is to manage changes between a server table (or tables) and one or more mobile, disconnected databases (e.g. PalmPilot, laptop, etc.). I read through the notes in the TODO for this topic and devised a toolkit for doing synchronization. I hope that the PostgreSQL development community will find this useful and will help me refine this concept by offering insight, experience and some good old-fashioned hacking if you are so inclined. The bottom of this message describes how to use the attached files. I look forward to your feedback. --rob Methodology: I devised a concept that I call "session versioning". This means that every time a row changes it does NOT get a new version. Rather, it gets stamped with the current session version common to all published tables. Clients, when they connect for synchronization, immediately increment this common version number, reserve the result as a "post version", and then increment the session version again. This version number, implemented as a sequence, is common to all synchronized tables and rows. Any time the server changes a row, that row gets stamped with the current session version; when the client posts its changes, it uses the reserved "post version". The client then makes all its changes, stamping the changed rows with its reserved "post version" rather than the current version. The reason why is explained later.
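The reservation step can be sketched as a small model. The message says the version number is a database sequence; here a Python counter stands in for it, and `SessionVersions`, `start_client_session` and the `reserved` set (standing in for the lock table) are hypothetical names rather than the toolkit's Perl API.

```python
import itertools
from typing import Set


class SessionVersions:
    """Toy model of the shared session-version sequence (hypothetical names;
    the toolkit itself keeps this in a database sequence)."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)   # stands in for the sequence
        self.current = next(self._seq)   # version the server stamps rows with
        self.reserved: Set[int] = set()  # post versions of active clients

    def start_client_session(self) -> int:
        post_version = next(self._seq)   # 1st increment: reserve it
        self.reserved.add(post_version)
        self.current = next(self._seq)   # 2nd increment: server moves on
        return post_version

    def end_client_session(self, post_version: int) -> None:
        self.reserved.discard(post_version)  # release the lock


versions = SessionVersions()
post = versions.start_client_session()
print(post, versions.current)  # 2 3: client rows get 2, server rows now get 3
```

Because the server's stamp moves past the reserved number, rows the client posts during its session can never be confused with concurrent server changes.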
It is important that the client post all its own changes first, so that it does not end up receiving records that changed since its last session and that it is about to update anyway. Reserving the post version is a two-step process. First, the number is simply stored in a variable for later use. Second, the value is added to a lock table (last_stable) to indicate to any concurrent sessions that rows with higher version numbers are to be considered "unstable" at the moment, and that they should not attempt to retrieve them at this time. Each client, upon connection, will use the lowest value in this lock table (max_version) to determine the upper boundary for versions it should retrieve. The lower boundary is simply the previous session's "max_version" plus one. Thus, when the client retrieves changes, it uses the following SQL "where" expression:

    WHERE row_version >= max_version
      and row_version <= last_stable_version
      and version <> this_post_version

Reserving and locking a post version is important because it allows concurrent synchronization by multiple clients. The first of many clients to connect basically dictates to all future clients that they must not take any rows equal to or greater than the one it just reserved and locked. The session version is incremented a second time so that the server may continue to post changes concurrently with any client changes and be certain that these concurrent server changes will not taint rows the client is about to retrieve. Once the client is finished with its session, it removes the lock on its post version. Partitioning data for use by each node is the next challenge we face. How can we control which "slice" of data each client receives? A slice can be horizontal or vertical within a table. Horizontal slices are easy: it's just the where clause of an SQL statement that says "give me the rows that match X criteria".
We handle this by storing and appending a where clause to each client's retrieval statement, in addition to the where clause described above. Actually, two where clauses are stored and appended: one per client and one per publication (table). Horizontal slices are defined by filtering rows; vertical slices, by limiting columns. The toolkit does provide a mechanism for pseudo-vertical partitioning. When a client is "subscribed" to a publication, the toolkit stores which columns that node is to receive during a session. These are stored in the subscribed_cols table. While this does limit the number of columns transmitted, the insert/update/delete triggers do not recognize changes on a per-column basis. The "pseudo" nature of our vertical partitioning is evident from an example: say you have a table with name, address and phone number as columns. You restrict a client to see only name and address. This means that phone number information will not be sent to the client during synchronization, and the client can't attempt to alter the phone number of a given entry. Great, but . . . if, on the server, the phone number (but not the name or address) is changed, the entire row gets marked with a new version. This means that the name and address will get sent to the client even though they didn't change. Well, there's the flaw in vertical partitioning. Other than wasting bandwidth, the extra row does no harm to the process. The workaround for this is to normalize your schema heavily when possible. Collisions are the next crux one encounters with synchronization. When two clients retrieve the same row and both make (different) changes, which one is correct? So far the system operates totally independently of time. This is good because it doesn't rely on the server or client to keep accurate time. We could just ignore time altogether, but then we would force our clients to synchronize on a strict schedule in order to avoid (or reduce) collisions.
If every node synchronized immediately after making changes, we could just stop here. Unfortunately, this isn't reality. Consider two clients: Clients A & B each pick up the same record on Monday. A makes changes on Monday, then leaves for vacation. B makes changes on Wednesday because new information was gathered in A's absence, and posts those changes Wednesday. Meanwhile, client A returns from vacation on Friday and synchronizes his changes. A overwrites B's changes even though A made his changes before the most recent information was posted by B. Clearly we need some form of timestamp to cope with this example. While clocks aren't the most reliable, they are the only common version control available to solve this problem. The system is set up to accept (but not require) timestamps from clients, and changes on the server are timestamped. The system, when presented with a timestamp for a row, will compare the timestamps to figure out who wins in a tie. The system makes certain "sanity" checks with regard to these timestamps. A client may not post a change with a timestamp that is more than one hour in the future (according to what the server thinks "now" is), nor more than one hour before its last synchronization date/time. The client row will be immediately placed into the collision table if the timestamp is that far out of whack. Implementations of the toolkit should take care to ensure that client & server agree on what "now" is before attempting to submit changes with timestamps. Timestamps are not required. If a client is incapable of tracking timestamps, the system will assume that any server row which has been changed since the client's last session wins a tie. This is quite error-prone, so timestamps are encouraged where possible. Inserts pose an interesting challenge. Multiple clients cannot share a sequence (often used as a primary key) while disconnected.
Clients are therefore responsible for generating their own unique "row_id" when inserting records. Inserts accept any arbitrary key, and the server writes back to the client a special kind of update that gives the server's row_id. The client is responsible for making sure that this update takes place locally.

Deletes are the last portion of the process. When deletes occur, the row_id, version, etc. are stored in a "deleted" table. These entries are retrieved by the client using the same version filter as described above. The table is pruned at the end of each session by deleting all records whose versions are less than the lowest last_ver stored for any client.

Having wrapped up the synchronization process, I'll move on to describe some points about managing clients, publications and the like. The tool kit is split into two objects: SyncManager and Synchronize. The Synchronize object exposes an API that client implementations use to submit and receive changes. The SyncManager functions handle system install and uninstall in addition to publication of tables and client subscriptions.

Installation and uninstallation are handled by their corresponding functions in the API (INSTALL and UNINSTALL). All system tables are prefixed and suffixed with four underscores, in the hope that this avoids conflicts with existing tables. Calling the install function more than once will generate an error message. Uninstall will remove all related tables, sequences, functions and triggers from the system.

The first step, after installing the system, is to publish a table. A table can be published more than once under different names; simply provide a unique name as the second argument to the publish function. Since object names are restricted to 32 characters in Postgres, each table is given a unique id, and this id is used to create the trigger and sequence names.
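The pruning rule for the "deleted" table can be modelled in a few lines of Perl. This is a minimal sketch under stated assumptions. prune_deleted() and the sample data are hypothetical. SynKit.pm itself does this in SQL in its DESTROY method: it selects min(last_ver) from ____subscribed____ and then deletes the ____deleted____ rows whose rowver is lower.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical in-memory model of pruning the "deleted" table. An entry may
# be discarded once every subscribed client has synchronized past it, i.e.
# when its rowver is below the lowest last_ver of any client.
sub prune_deleted {
    my ($deleted, $last_ver_by_client) = @_;
    my @versions = values %$last_ver_by_client;
    return @$deleted unless @versions;    # no subscriptions: keep everything
    my $floor = min(@versions);
    return grep { $_->{rowver} >= $floor } @$deleted;
}

my @deleted = (
    { rowid => 7,  rowver => 3 },
    { rowid => 12, rowver => 8 },
    { rowid => 15, rowver => 11 },
);
my %last_ver = ( 'JOE/REMOTE_NODE_NAME' => 9, 'KEN/ANOTHER_REMOTE_NODE_NAME' => 8 );
my @kept = prune_deleted(\@deleted, \%last_ver);
print join(',', map { $_->{rowid} } @kept), "\n";    # prints 12,15
```

KEN has only seen versions up to 8, so the deletion recorded at version 8 must be kept until KEN synchronizes again.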
Since one table can be published multiple times, but only needs one set of triggers and one sequence for change management, a reference count is kept so that we know when to add/drop triggers and functions. By default, all columns are published, but the third argument to the publish function accepts an array reference of column names that allows you to specify a limited set. Information about the table is stored in the "tables" table, information about the publication is in the "publications" table, and column names are stored in the "subscribed_cols" table.

The next step is to subscribe a client to a table. A client is identified by a user name and a node name. The subscribe function takes three arguments: user, node & publication. The subscription process writes an entry into the "subscribed" table with default values. Of note, the "RefreshOnce" attribute is set to true whenever a table is published. This tells the system that a full table refresh should be sent the next time the client connects, even if the client requests synchronization rather than refresh.

The toolkit does not yet provide a way to manage the whereclause stored at either the publication or client level. To use or test this feature, you will need to set the whereclause attributes manually.

Tables and users can be unpublished and unsubscribed using the corresponding functions within the tool kit's management interface. Because postgres lacks an "ALTER TABLE DROP COLUMN" function, the unpublish function only removes default values and indexes for those columns.

The API isn't the most robust thing in the world right now. All functions return undef on success and an error string otherwise (like DBD). I hope to clean up the API considerably over the next month. The code has not been field tested at this time.
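The reference counting for tables published under several names can be sketched as follows. This is a minimal sketch, assuming an in-memory stand-in for the database. publish_table(), unpublish_table() and the logged "actions" are hypothetical. SynKit.pm keeps the count in the refcount column of ____tables____ and increments it in publish(). The unpublish side is not part of this excerpt, so the decrement-and-drop logic below is an assumption based on the description above.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of the per-table reference count. The
# triggers and rowid sequence for a table are created when its first
# publication appears and dropped when its last publication goes away.
my %refcount;        # table name => number of publications using it
my %publications;    # publication name => table name
my @actions;         # what would be done to the database

sub publish_table {
    my ($table, $pub) = @_;
    $pub = $table unless defined $pub;    # publication name defaults to the table name
    return 'Publication already exists' if exists $publications{$pub};
    $publications{$pub} = $table;
    push @actions, "create triggers/sequence for $table" if ++$refcount{$table} == 1;
    return undef;                         # undef means success, as in the toolkit
}

sub unpublish_table {
    my ($pub) = @_;
    my $table = delete $publications{$pub};
    return 'No such publication' unless defined $table;
    if (--$refcount{$table} == 0) {
        delete $refcount{$table};
        push @actions, "drop triggers/sequence for $table";
    }
    return undef;
}

publish_table('sync_test');
publish_table('sync_test', 'sync_test_copy');
unpublish_table('sync_test');
unpublish_table('sync_test_copy');
print "$_\n" for @actions;
```

The triggers are created once for the two publications of sync_test. They are dropped only after the second publication is removed.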
The files attached are:

1) SynKit.pm (a perl module that contains install/uninstall functions and a simple API for synchronization & management)
2) sync_install.pl (sample code to demonstrate the installation, publishing and subscribe process)
3) sync_uninstall.pl (sample code to demonstrate the uninstallation, unpublishing and unsubscribe process)
4) sync.pl (sample code to demonstrate the synchronization process for two users)

To use them on Linux (don't know about Win32 but should work fine):

- set up a test database and make SURE plpgsql is installed
- install perl 5.05 along with the Date::Parse (TimeDate-1.1), DBI and DBD::Pg modules [www.cpan.org]
- copy all four attached files to a test directory
- cd to your test directory
- edit the three sample scripts (sync_install.pl, sync.pl and sync_uninstall.pl) and change the three DBI variables to suit your system (they are clearly marked)
- % perl sync_install.pl
- check out the tables, functions & triggers installed
- % perl sync.pl
- check out the 'sync_test' table, do some updates/inserts/deletes and run sync.pl again
  NOTE: Sanity checks default to allow no more than 50% of the table to be changed by the client in a single session. If you delete all (or most of) the rows you will get errors when you run sync.pl again! (by design)
- % perl sync_uninstall.pl (when you are done)
- check out the sample scripts and the perl module code (commented, but not documented)

------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="sync.pl"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="sync.pl"

# This script depicts the synchronization process for two users.
## CHANGE THESE THREE VARIABLES TO MATCH YOUR SYSTEM ##########
my $dbi_connect_string = 'dbi:Pg:dbname=test;host=snoopy';     #
my $db_user = 'test';                                           #
my $db_pass = 'test';                                           #
#################################################################

my $ret; #holds return value

use SynKit;

#create a synchronization object (pass dbi connection info)
my $s = Synchronize->new($dbi_connect_string,$db_user,$db_pass);

#start a session by passing a user name, "node" identifier and a collision queue name (client or server)
$ret = $s->start_session('JOE','REMOTE_NODE_NAME','server');
print "Handle this error: $ret\n\n" if $ret;

#call this once before attempting to apply individual changes
$ret = $s->start_changes('sync_test',['name']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
#('CLIENTROWID' stands in for the row id the client uses locally)
$ret = $s->apply_change('CLIENTROWID','insert',undef,['ted']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
$ret = $s->apply_change('CLIENTROWID','insert','1973-11-10 11:25:00 AM -05',['tim']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
$ret = $s->apply_change(999,'update',undef,['tom']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
$ret = $s->apply_change(1,'update',undef,['tom']);
print "Handle this error: $ret\n\n" if $ret;

#call this once after all changes have been submitted
$ret = $s->end_changes();
print "Handle this error: $ret\n\n" if $ret;

#call this to get updates from all subscribed tables
$ret = $s->get_all_updates();
print "Handle this error: $ret\n\n" if $ret;

print "\n\nSynchronization session is complete. (JOE) \n\n";

# make some changes to the database (server perspective)
print "\n\nMaking changes to the database. (server side) \n\n";

use DBI;
my $dbh = DBI->connect($dbi_connect_string,$db_user,$db_pass);
$dbh->do("insert into sync_test values ('roger')");
$dbh->do("insert into sync_test values ('john')");
$dbh->do("insert into sync_test values ('harry')");
$dbh->do("delete from sync_test where name = 'roger'");
$dbh->do("update sync_test set name = 'tom' where name = 'harry'");
$dbh->disconnect;

#now do another session for a different user

#start a session by passing a user name, "node" identifier and a collision queue name (client or server)
$ret = $s->start_session('KEN','ANOTHER_REMOTE_NODE_NAME','server');
print "Handle this error: $ret\n\n" if $ret;

#call this to get updates from all subscribed tables
$ret = $s->get_all_updates();
print "Handle this error: $ret\n\n" if $ret;

print "\n\nSynchronization session is complete. (KEN)\n\n";

print "Now look at your database and see what happened, make changes to the test table, etc. and run this again.\n\n";

------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="sync_uninstall.pl"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="sync_uninstall.pl"

# this script uninstalls the synchronization system using the SyncManager object;

use SynKit;

### CHANGE THESE TO MATCH YOUR SYSTEM ##########################
my $dbi_connect_string = 'dbi:Pg:dbname=test;host=snoopy';     #
my $db_user = 'test';                                           #
my $db_pass = 'test';                                           #
#################################################################

my $ret; #holds return value

#create an instance of the SyncManager object
my $m = SyncManager->new($dbi_connect_string,$db_user,$db_pass);

# call this to unsubscribe a user/node (not necessary if you are uninstalling)
print $m->unsubscribe('KEN','ANOTHER_REMOTE_NODE_NAME','sync_test');

#call this to unpublish a table (not necessary if you are uninstalling)
print $m->unpublish('sync_test');

#call this to uninstall the synchronization system
# NOTE: this will automatically unpublish & unsubscribe all users
print $m->UNINSTALL;

# now let's drop our little test table
use DBI;
my $dbh = DBI->connect($dbi_connect_string,$db_user,$db_pass);
$dbh->do("drop table sync_test");
$dbh->disconnect;

print "\n\nI hope you enjoyed this little demonstration\n\n";

------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="sync_install.pl"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="sync_install.pl"

# This script shows how to install the synchronization system
# using the SyncManager object

use SynKit;

### CHANGE THESE TO MATCH YOUR SYSTEM ##########################
my $dbi_connect_string = 'dbi:Pg:dbname=test;host=snoopy';     #
my $db_user = 'test';                                           #
my $db_pass = 'test';                                           #
#################################################################

my $ret; #holds return value

#create an instance of the sync manager object
my $m = SyncManager->new($dbi_connect_string,$db_user,$db_pass);

#Call this to install the synchronization management tables, etc.
$ret = $m->INSTALL;
die "Handle this error: $ret\n\n" if $ret;

#create a test table for us to demonstrate with
use DBI;
my $dbh = DBI->connect($dbi_connect_string,$db_user,$db_pass);
$dbh->do("create table sync_test (name text)");
$dbh->do("insert into sync_test values ('rob')");
$dbh->do("insert into sync_test values ('rob')");
$dbh->do("insert into sync_test values ('rob')");
$dbh->do("insert into sync_test values ('ted')");
$dbh->do("insert into sync_test values ('ted')");
$dbh->do("insert into sync_test values ('ted')");
$dbh->disconnect;

#call this to "publish" a table
$ret = $m->publish('sync_test');
print "Handle this error: $ret\n\n" if $ret;

#call this to "subscribe" a user/node to a publication (table)
$ret = $m->subscribe('JOE','REMOTE_NODE_NAME','sync_test');
print "Handle this error: $ret\n\n" if $ret;

#call this to "subscribe" a user/node to a publication (table)
$ret = $m->subscribe('KEN','ANOTHER_REMOTE_NODE_NAME','sync_test');
print "Handle this error: $ret\n\n" if $ret;

print "Now you can do: 'perl sync.pl' a few times to play\n\n";
print "Do 'perl sync_uninstall.pl' to uninstall the system\n";

------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="SynKit.pm"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="SynKit.pm"

# Perl DB synchronization toolkit

#created for postgres 7.0.2 +

use strict;

BEGIN {
	use vars qw($VERSION);
	# set the version for version checking
	$VERSION = 1.00;
}

package Synchronize;

use DBI;

use Date::Parse;

# new requires 3 arguments: dbi connection string, plus the corresponding username and password to get connected to the database
sub new {
	my $proto = shift;
	my $class = ref($proto) || $proto;
	my $self = {};

	my $dbi = shift;
	my $user = shift;
	my $pass = shift;

	$self->{DBH} = DBI->connect($dbi,$user,$pass) || die "Failed to connect to database: ".DBI->errstr();

	$self->{user} = undef;
	$self->{node} = undef;
	$self->{status} = undef; # holds status of table update portion of session
	$self->{pubs} = {}; #holds hash of pubs available to session with val = 1 if ok to request sync
	$self->{orderpubs} = undef; #holds array ref of subscribed pubs ordered by sync_order
	$self->{this_post_ver} = undef; #holds the version number under which this session will post changes
	$self->{max_ver} = undef; #holds the maximum safe version for getting updates
	$self->{current} = {}; #holds the current publication info to which changes are being applied
	$self->{queue} = 'server'; # tells collide function what to do with collisions. (default is to hold on server)

	$self->{DBLOG} = DBI->connect($dbi,$user,$pass) || die "cannot log to DB: ".DBI->errstr();

	return bless ($self, $class);
}

sub dblog {
	my $self = shift;
	my $msg = $self->{DBLOG}->quote($_[0]);
	my $quser = $self->{DBH}->quote($self->{user});
	my $qnode = $self->{DBH}->quote($self->{node});
	$self->{DBLOG}->do("insert into ____sync_log____ (username, nodename, stamp, message) values ($quser, $qnode, now(), $msg)");
}

#start_session establishes session wide information and other housekeeping chores
# Accepts username, nodename and queue (client or server) as arguments;

sub start_session {
	my $self = shift;
	$self->{user} = shift || die 'Username is required';
	$self->{node} = shift || die 'Nodename is required';
	$self->{queue} = shift;

	if ($self->{queue} ne 'server' && $self->{queue} ne 'client') {
		die "You must provide a queue argument of either 'server' or 'client'";
	}

	my $quser = $self->{DBH}->quote($self->{user});
	my $qnode = $self->{DBH}->quote($self->{node});

	my $sql = "select pubname from ____subscribed____ where username = $quser and nodename = $qnode";
	my @pubs = $self->GetColList($sql);

	# defined(@array) is deprecated; test the element count instead
	return 'User/Node has no subscriptions!' if !@pubs;

	# go through the list and check permissions and rules for each
	foreach my $pub (@pubs) {
		my $qpub = $self->{DBH}->quote($pub);
		my $sql = "select disabled, pubname, fullrefreshonly, refreshonce, post_ver from ____subscribed____ where username = $quser and pubname = $qpub and nodename = $qnode";
		my $sth = $self->{DBH}->prepare($sql) || die $self->{DBH}->errstr;
		$sth->execute || die $self->{DBH}->errstr;
		my @row;
		while (@row = $sth->fetchrow_array) {
			next if $row[0]; #publication is disabled
			next if !defined($row[1]); #publication does not exist (should never occur)
			if ($row[2] || $row[3]) { #refresh or refresh once flag is set
				$self->{pubs}->{$pub} = 0; #refresh only
				next;
			}
			if (!defined($row[4])) { #no previous session exists, must refresh
				$self->{pubs}->{$pub} = 0; #refresh only
				next;
			}
			$self->{pubs}->{$pub} = 1; #OK for sync
		}
		$sth->finish;
	}

	$sql = "select pubname from ____publications____ order by sync_order";
	my @op = $self->GetColList($sql);
	my @orderpubs;

	#loop through ordered pubs and remove non subscribed publications
	foreach my $pub (@op) {
		push @orderpubs, $pub if defined($self->{pubs}->{$pub});
	}

	$self->{orderpubs} = \@orderpubs;

	# Now we obtain a session version number, etc.

	$self->{DBH}->{AutoCommit} = 0; #allows "transactions"
	$self->{DBH}->{RaiseError} = 1; #script [or eval] will automatically die on errors

	eval {
		#start DB transaction

		#lock the version sequence until we determine that we have gotten
		#a good value. Lock will be released on commit.
		$self->{DBH}->do('lock ____version_seq____ in access exclusive mode');

		# remove stale locks if they exist
		my $sql = "delete from ____last_stable____ where username = $quser and nodename = $qnode";
		$self->{DBH}->do($sql);

		# increment version sequence & grab the next val as post_ver
		$sql = "select nextval('____version_seq____')";
		my $sth = $self->{DBH}->prepare($sql);
		$sth->execute;
		($self->{this_post_ver}) = $sth->fetchrow_array();
		$sth->finish;

		# grab max_ver from last_stable
		$sql = "select min(version) from ____last_stable____";
		$sth = $self->{DBH}->prepare($sql);
		$sth->execute;
		($self->{max_ver}) = $sth->fetchrow_array();
		$sth->finish;

		# if there was no version in lock table, then take the ID that was in use
		# when we started the session (this_post_ver - 1)
		$self->{max_ver} = $self->{this_post_ver} - 1 if (!defined($self->{max_ver}));

		# lock post_ver by placing it in last_stable
		$self->{DBH}->do("insert into ____last_stable____ (version, username, nodename) values ($self->{this_post_ver}, $quser, $qnode)");

		# increment version sequence again (discard result)
		$sql = "select nextval('____version_seq____')";
		$sth = $self->{DBH}->prepare($sql);
		$sth->execute;
		$sth->fetchrow_array();
		$sth->finish;

	}; #end eval/transaction
	my $failed = $@;

	if ($failed) { # part of transaction failed: roll back before reporting it
		$self->{DBH}->rollback;
	} else { # all's well, commit block
		$self->{DBH}->commit;
	}
	$self->{DBH}->{AutoCommit} = 1;
	$self->{DBH}->{RaiseError} = 0;

	return $failed ? 'Start session failed: '.$failed : undef;
}

#start changes should be called once before applying individual change requests
# Requires publication and ref to columns that will be updated as arguments
sub start_changes {
	my $self = shift;
	my $pub = shift || die 'Publication is required';
	my $colref = shift || die 'Reference to column array is required';

	$self->{status} = 'starting';

	my $qpub = $self->{DBH}->quote($pub);
	my $quser = $self->{DBH}->quote($self->{user});
	my $qnode = $self->{DBH}->quote($self->{node});

	my @cols = @{$colref};
	my @subcols = $self->GetColList("select col_name from ____subscribed_cols____ where username = $quser and nodename = $qnode and pubname = $qpub");
	my %subcols;
	foreach my $col (@subcols) {
		$subcols{$col} = 1;
	}
	foreach my $col (@cols) {
		return "User/node is not subscribed to column '$col'" if !$subcols{$col};
	}

	my $sql = "select pubname, readonly, last_session, post_ver, last_ver, whereclause, sanity_limit, sanity_delete, sanity_update, sanity_insert from ____subscribed____ where username = $quser and pubname = $qpub and nodename = $qnode";
	my ($junk, $readonly, $last_session, $post_ver, $last_ver, $whereclause, $sanity_limit,
		$sanity_delete, $sanity_update, $sanity_insert) = $self->GetOneRow($sql);

	return 'Publication is read only' if $readonly;

	$sql = "select whereclause from ____publications____ where pubname = $qpub";
	my ($wc) = $self->GetOneRow($sql);

	# combine the client and publication where clauses (either may be empty)
	$whereclause = join(' and ', map { "($_)" } grep { defined($_) && length($_) } ($whereclause, $wc));

	my ($table) = $self->GetOneRow("select tablename from ____publications____ where pubname = $qpub");

	return 'Publication is not registered correctly' if !defined($table);

	my %info;
	$info{pub} = $pub;
	$info{whereclause} = $whereclause;
	$info{post_ver} = $post_ver;

	$last_session =~ s/([+|-]\d\d?)$/ $1/; #put a space before timezone
	$last_session = str2time ($last_session); #convert to perltime (seconds since 1970)
	$info{last_session} = $last_session;
	$info{last_ver} = $last_ver;
	$info{table} = $table;
	$info{cols} = \@cols;

	$sql = "select count(oid) from $table";
	$sql = $sql." where $whereclause" if $whereclause;
	my ($rowcount) = $self->GetOneRow($sql);

	#calculate sanity levels (convert from % to number of rows)
	# limits of zero or less mean no limit
	$info{sanitylimit} = $rowcount * ($sanity_limit / 100) if $sanity_limit > 0;
	$info{insertlimit} = $rowcount * ($sanity_insert / 100) if $sanity_insert > 0;
	$info{updatelimit} = $rowcount * ($sanity_update / 100) if $sanity_update > 0;
	$info{deletelimit} = $rowcount * ($sanity_delete / 100) if $sanity_delete > 0;

	$self->{sanitycount} = 0;
	$self->{updatecount} = 0;
	$self->{insertcount} = 0;
	$self->{deletecount} = 0;

	$self->{current} = \%info;

	$self->{DBH}->{AutoCommit} = 0; #turn on transaction behavior so we can roll back on sanity limits, etc.

	$self->{status} = 'ready';

	return undef;
}

#call this once all changes are submitted to commit them;
sub end_changes {
	my $self = shift;
	return undef if $self->{status} ne 'ready';
	$self->{DBH}->commit;
	$self->{DBH}->{AutoCommit} = 1;
	$self->{status} = 'success';
	return undef;
}

#call apply_change once for each row level client update
# Accepts 4 params: rowid, action, timestamp and reference to data array
# Note: timestamp can be undef, data can be undef
# timestamp must be a date string that Date::Parse::str2time() understands (it is converted to perl time below)
#this routine checks basic timestamp info and sanity limits, then applies the change
sub apply_change {
	my $self = shift;
	my $rowid = shift || return 'Row ID is required'; #don't die just for one bad row
	my $action = shift || return 'Action is required'; #don't die just for one bad row
	my $timestamp = shift;
	my $dataref = shift;
	$action = lc($action);

	$timestamp = str2time($timestamp) if $timestamp;

	return 'Status failure, cannot accept changes: '.$self->{status} if $self->{status} ne 'ready';

	my %info = %{$self->{current}};

	$self->{sanitycount}++;
	if ($info{sanitylimit} && $self->{sanitycount} > $info{sanitylimit}) {
		# too many changes from client
		my $ret = $self->sanity('limit');
		return $ret if $ret;
	}

	if ($timestamp && $timestamp > time() + 3600) { # current time + one hour
		#client's clock is way off, cannot submit changes in future
		my $ret = $self->collide('future', $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
		return $ret if $ret;
	}

	if ($timestamp && $timestamp < $info{last_session} - 3600) { # last session time less one hour
		#client's clock is way off, cannot submit changes that occurred before last sync date
		my $ret = $self->collide('past', $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
		return $ret if $ret;
	}

	my ($crow, $cver, $ctime); #current row,ver,time
	if ($action ne 'insert') {
		my $sql = "select ____rowid____, ____rowver____, ____stamp____ from $info{table} where ____rowid____ = $rowid";
		($crow, $cver, $ctime) = $self->GetOneRow($sql);
		if (!defined($crow)) {
			my $ret = $self->collide('norow', $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
			return $ret if $ret;
		}

		$ctime =~ s/([+|-]\d\d?)$/ $1/; #put space between timezone
		$ctime = str2time($ctime) if $ctime; #convert to perl time

		if ($timestamp) {
			# server row was changed after the client's change: the client's change is stale
			if ($ctime && $ctime > $timestamp) {
				my $ret = $self->collide('time', $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
				return $ret if $ret;
			}
		} else {
			if ($cver > $self->{this_post_ver}) {
				my $ret = $self->collide('version', $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
				return $ret if $ret;
			}
		}
	}

	if ($action eq 'insert') {
		$self->{insertcount}++;
		if ($info{insertlimit} && $self->{insertcount} > $info{insertlimit}) {
			# too many changes from client
			my $ret = $self->sanity('insert');
			return $ret if $ret;
		}

		my $qtable = $self->{DBH}->quote($info{table});
		# call GetOneRow in list context; in scalar context it would return the row's element count
		my ($table_id) = $self->GetOneRow("select table_id from ____tables____ where tablename = $qtable");
		return 'Table incorrectly registered, cannot get rowid sequence name: '.$self->{DBH}->errstr() if not defined $table_id;
		my $rowidsequence = '_'.$table_id.'__rowid_seq';

		my @data;
		foreach my $val (@{$dataref}) {
			push @data, $self->{DBH}->quote($val);
		}

		my $sql = "insert into $info{table} (";
		if ($timestamp) {
			$sql = $sql . join(',',@{$info{cols}}) . ',____rowver____, ____stamp____) values (';
			$sql = $sql . join(',',@data) .','.$self->{this_post_ver}.',\''.localtime($timestamp).'\')';
		} else {
			$sql = $sql . join(',',@{$info{cols}}) . ',____rowver____) values (';
			$sql = $sql . join(',',@data) .','.$self->{this_post_ver}.')';
		}
		my $ret = $self->{DBH}->do($sql);
		if (!$ret) {
			my $ret = $self->collide($self->{DBH}->errstr(), $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
			return $ret if $ret;
		}
		my ($newrowid) = $self->GetOneRow("select currval('$rowidsequence')");
		return 'Failed to get current rowid on inserted row'.$self->{DBH}->errstr if not defined $newrowid;
		$self->changerowid($rowid, $newrowid);
	}

	if ($action eq 'update') {
		$self->{updatecount}++;
		if ($info{updatelimit} && $self->{updatecount} > $info{updatelimit}) {
			# too many changes from client
			my $ret = $self->sanity('update');
			return $ret if $ret;
		}
		my @data;
		foreach my $val (@{$dataref}) {
			push @data, $self->{DBH}->quote($val);
		}

		my $sql = "update $info{table} set ";
		my @cols = @{$info{cols}};
		foreach my $col (@cols) {
			my $val = shift @data;
			$sql = $sql . "$col = $val,";
		}
		$sql = $sql." ____rowver____ = $self->{this_post_ver}";
		$sql = $sql.", ____stamp____ = '".localtime($timestamp)."'" if $timestamp;
		$sql = $sql." where ____rowid____ = $rowid";
		$sql = $sql." and $info{whereclause}" if $info{whereclause};
		my $ret = $self->{DBH}->do($sql);
		if (!$ret) {
			my $ret = $self->collide($self->{DBH}->errstr(), $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
			return $ret if $ret;
		}
	}

	if ($action eq 'delete') {
		$self->{deletecount}++;
		if ($info{deletelimit} && $self->{deletecount} > $info{deletelimit}) {
			# too many changes from client
			my $ret = $self->sanity('delete');
			return $ret if $ret;
		}
		# stamp the row with this session's version (and time) before deleting it
		if ($timestamp) {
			my $sql = "update $info{table} set ____rowver____ = $self->{this_post_ver}, ____stamp____ = '".localtime($timestamp)."' where ____rowid____ = $rowid";
			$sql = $sql . " and $info{whereclause}" if $info{whereclause};
			$self->{DBH}->do($sql) || return 'Predelete update failed: '.$self->{DBH}->errstr;
		} else {
			my $sql = "update $info{table} set ____rowver____ = $self->{this_post_ver} where ____rowid____ = $rowid";
			$sql = $sql . " and $info{whereclause}" if $info{whereclause};
			$self->{DBH}->do($sql) || return 'Predelete update failed: '.$self->{DBH}->errstr;
		}
		my $sql = "delete from $info{table} where ____rowid____ = $rowid";
		$sql = $sql . " and $info{whereclause}" if $info{whereclause};
		my $ret = $self->{DBH}->do($sql);
		if (!$ret) {
			my $ret = $self->collide($self->{DBH}->errstr(), $info{table}, $rowid, $action, undef, $timestamp, $dataref, $self->{queue});
			return $ret if $ret;
		}
	}

	return undef;
}

sub changerowid {
	my $self = shift;
	my $oldid = shift;
	my $newid = shift;
	$self->writeclient('changeid',"$oldid\t$newid");
}

#writes info to client
sub writeclient {
	my $self = shift;
	my $type = shift;
	my @info = @_;
	print "$type: ",join("\t",@info),"\n";
	return undef;
}

# Override this for custom behavior. Default is to echo back the sanity failure reason.
# If you want to override a sanity failure, you can do so by returning undef.
sub sanity {
	my $self = shift;
	my $reason = shift;

	$self->{status} = 'sanity exceeded';
	$self->{DBH}->rollback;

	return $reason;
}

# Override this for custom behavior. Default is to echo back the failure reason.
# If you want to override a collision, you can do so by returning undef.
sub collide {
	my $self = shift;
	my ($reason,$table,$rowid,$action,$rowver,$timestamp,$data,$queue) = @_;

	my @data;
	foreach my $val (@{$data}) {
		push @data, $self->{DBH}->quote($val);
	}

	if ($reason =~ /integrity/i || $reason =~ /constraint/i) {
		$self->{status} = 'integrity violation';
		$self->{DBH}->rollback;
	}

	my $datastring = '';
	my @cols = @{$self->{current}->{cols}};
	foreach my $col (@cols) {
		my $val = shift @data;
		$datastring = $datastring . "$col = $val,";
	}
	chop $datastring; #remove trailing comma

	if ($queue eq 'server') {
		$timestamp = localtime($timestamp) if defined($timestamp);
		$rowid = $self->{DBH}->quote($rowid); # quote() turns undef into NULL
		$rowver = 'null' if !defined($rowver);
		$timestamp = $self->{DBH}->quote($timestamp);
		my $qtable = $self->{DBH}->quote($table);
		my $qreason = $self->{DBH}->quote($reason);
		my $qaction = $self->{DBH}->quote($action);
		my $quser = $self->{DBH}->quote($self->{user});
		my $qnode = $self->{DBH}->quote($self->{node});
		my $qqueue = $self->{DBH}->quote($queue);
		$datastring = $self->{DBH}->quote($datastring);

		# one value per listed column (the queue value was previously missing)
		my $sql = "insert into ____collision____ (rowid, tablename, rowver, stamp, data, reason, action, username, nodename, queue) values ($rowid, $qtable, $rowver, $timestamp, $datastring, $qreason, $qaction, $quser, $qnode, $qqueue)";
		$self->{DBH}->do($sql) || die 'Failed to write to collision table: '.$self->{DBH}->errstr;
	} else {
		$self->writeclient('collision', $rowid, $table, $rowver, $timestamp, $reason, $action, $self->{user}, $self->{node}, $datastring);
	}

	return $reason;
}

#calls get_updates once for each publication the user/node is subscribed to in correct sync_order
sub get_all_updates {
	my $self = shift;
	my @errors;

	foreach my $pub (@{$self->{orderpubs}}) {
		my $ret = $self->get_updates($pub, 1); #request update as sync unless overridden by flags
		push @errors, "$pub: $ret" if $ret;
	}
	return @errors ? join('; ', @errors) : undef;
}

# Call this once for each table the client needs refreshed or sync'ed AFTER all inbound client changes have been posted
# Accepts publication and sync flag as arguments
sub get_updates {
	my $self = shift;
	my $pub = shift || die 'Publication is required';
	my $sync = shift;

	my $qpub = $self->{DBH}->quote($pub);
	my $quser = $self->{DBH}->quote($self->{user});
	my $qnode = $self->{DBH}->quote($self->{node});

	#enforce refresh and refreshonce flags
	undef $sync if !$self->{pubs}->{$pub};

	my @cols = $self->GetColList("select col_name from ____subscribed_cols____ where username = $quser and nodename = $qnode and pubname = $qpub");

	my ($table) = $self->GetOneRow("select tablename from ____publications____ where pubname = $qpub");
	return 'Table incorrectly registered for read' if !defined($table);
	my $qtable = $self->{DBH}->quote($table);

	my $sql = "select pubname, last_session, post_ver, last_ver, whereclause from ____subscribed____ where username = $quser and pubname = $qpub and nodename = $qnode";
	my ($junk, $last_session, $post_ver, $last_ver, $whereclause) = $self->GetOneRow($sql);

	my ($wc) = $self->GetOneRow("select whereclause from ____publications____ where pubname = $qpub");

	# combine the client and publication where clauses (either may be empty)
	$whereclause = join(' and ', map { "($_)" } grep { defined($_) && length($_) } ($whereclause, $wc));

	if ($sync) {
		$self->writeclient('start synchronize', $pub);
	} else {
		$self->writeclient('start refresh', $pub);
		$self->{DBH}->do("update ____subscribed____ set refreshonce = false where pubname = $qpub and username = $quser and nodename = $qnode") || return 'Failed to clear RefreshOnce flag: '.$self->{DBH}->errstr;
	}

	$self->writeclient('columns',@cols);

	$sql = "select ____rowid____, ".join(',', @cols)." from $table";
	if ($sync) {
		$sql = $sql." where (____rowver____ <= $self->{max_ver} and ____rowver____ > $last_ver)";
		if (defined($self->{this_post_ver})) {
			$sql = $sql . " and (____rowver____ <> $post_ver)";
		}
	} else {
		$sql = $sql." where (____rowver____ <= $self->{max_ver})";
	}
	$sql = $sql." and $whereclause" if $whereclause;

	my $sth = $self->{DBH}->prepare($sql) || return 'Failed to prepare SQL for updates: '.$self->{DBH}->errstr;
	$sth->execute || return 'Failed to execute SQL for updates: '.$self->{DBH}->errstr;

	my @row;
	while (@row = $sth->fetchrow_array) {
		$self->writeclient('update/insert',@row);
	}

	$sth->finish;

	# now get deleted rows
	if ($sync) {
		$sql = "select rowid from ____deleted____ where (tablename = $qtable)";
		$sql = $sql." and (rowver <= $self->{max_ver} and rowver > $last_ver)";
		if (defined($self->{this_post_ver})) {
			$sql = $sql . " and (rowver <> $self->{this_post_ver})";
		}
		$sql = $sql." and $whereclause" if $whereclause;

		$sth = $self->{DBH}->prepare($sql) || return 'Failed to prepare SQL for deletes: '.$self->{DBH}->errstr;
		$sth->execute || return 'Failed to execute SQL for deletes: '.$self->{DBH}->errstr;
		while (@row = $sth->fetchrow_array) {
			$self->writeclient('delete',@row);
		}
		$sth->finish;
	}

	if ($sync) {
		$self->writeclient('end synchronize', $pub);
	} else {
		$self->writeclient('end refresh', $pub);
	}

	$self->{DBH}->do("update ____subscribed____ set last_ver = $self->{max_ver}, last_session = now(), post_ver = $self->{this_post_ver} where username = $quser and nodename = $qnode and pubname = $qpub");
	return undef;
}

# Object destructor: Perl calls this automatically when the Synchronize
# object goes away. Does housekeeping.
sub DESTROY {
	my $self = shift;

	#release version from lock table (including old ones)
	my $quser = $self->{DBH}->quote($self->{user});
	my $qnode = $self->{DBH}->quote($self->{node});
	my $sql = "delete from ____last_stable____ where username = $quser and nodename = $qnode";
	$self->{DBH}->do($sql);

	#clean up deleted table
	my ($version) = $self->GetOneRow("select min(last_ver) from ____subscribed____");
	if (defined $version) {
		$self->{DBH}->do("delete from ____deleted____ where rowver < $version")
			|| warn 'Failed to prune deleted table: '.$self->{DBH}->errstr;
	}

	#disconnect from DBD sessions
	$self->{DBH}->disconnect;
	$self->{DBLOG}->disconnect;
	return undef;
}

############# Helper Subs ############
sub GetColList {
	my $self = shift;
	my $sql = shift || die 'Must provide sql select statement';
	my $sth = $self->{DBH}->prepare($sql) || return; # empty list on failure
	$sth->execute || return;
	my $val;
	my @col;
	while (($val) = $sth->fetchrow_array) {
		push @col, $val;
	}
	$sth->finish;
	return @col;
}

sub GetOneRow {
	my $self = shift;
	my $sql = shift || die 'Must provide sql select statement';
	my $sth = $self->{DBH}->prepare($sql) || return; # empty list on failure
	$sth->execute || return;
	my @row = $sth->fetchrow_array;
	$sth->finish;
	return @row;
}

package SyncManager;

use DBI;
# new requires 3 arguments: dbi connection string, plus the corresponding username and password

sub new {
	my $proto = shift;
	my $class = ref($proto) || $proto;
	my $self = {};

	my $dbi = shift;
	my $user = shift;
	my $pass = shift;

	$self->{DBH} = DBI->connect($dbi,$user,$pass) || die "Failed to connect to database: ".DBI->errstr();

	$self->{DBLOG} = DBI->connect($dbi,$user,$pass) || die "cannot log to DB: ".DBI->errstr();

	return bless ($self, $class);
}

sub dblog {
	my $self = shift;
	my $msg = $self->{DBLOG}->quote($_[0]);
	my $quser = $self->{DBH}->quote($self->{user});
	my $qnode = $self->{DBH}->quote($self->{node});
	$self->{DBLOG}->do("insert into ____sync_log____ (username, nodename, stamp, message) values ($quser, $qnode, now(), $msg)");
}

#this should never need to be called, but it might if a node bails without releasing their locks
sub ReleaseAllLocks {
	my $self = shift;
	$self->{DBH}->do("delete from ____last_stable____");
}
# Adds a publication to the system. Also adds triggers, sequences, etc associated with the table if appropriate.
# Accepts two arguments: the name of a physical table and the name under which to publish it
# NOTE: the publication name is optional and will default to the table name if not supplied
# returns undef if ok, else error string;
sub publish {
    my $self = shift;
    my $table = shift || die 'You must provide a table name (and optionally a unique publication name)';
    my $pub = shift;
    $pub = $table if not defined($pub);

    my $qpub = $self->{DBH}->quote($pub);
    my $sql = "select tablename from ____publications____ where pubname = $qpub";
    my ($junk) = $self->GetOneRow($sql);
    return 'Publication already exists' if defined($junk);

    my $qtable = $self->{DBH}->quote($table);

    $sql = "select table_id, refcount from ____tables____ where tablename = $qtable";
    my ($id, $refcount) = $self->GetOneRow($sql);

    if (!defined($id)) {
        $self->{DBH}->do("insert into ____tables____ (tablename, refcount) values ($qtable,1)") || return 'Failed to register table: ' . $self->{DBH}->errstr;
        my $sql = "select table_id from ____tables____ where tablename = $qtable";
        ($id) = $self->GetOneRow($sql);
    }

    if (defined($refcount)) {
        $self->{DBH}->do("update ____tables____ set refcount = refcount+1 where table_id = $id") || return 'Failed to update reference count: ' . $self->{DBH}->errstr;
    } else {

        $id = '_'.$id.'_';

        my @cols = $self->GetTableCols($table, 1); # 1 = get hidden cols too
        my %skip;
        foreach my $col (@cols) {
            $skip{$col} = 1;
        }

        if (!$skip{____rowver____}) {
            $self->{DBH}->do("alter table $table add column ____rowver____ int4"); #don't fail here in case table is being republished, just accept the error silently
        }
        $self->{DBH}->do("update $table set ____rowver____ = ____version_seq____.last_value - 1") || return 'Failed to initialize rowver: ' . $self->{DBH}->errstr;

        if (!$skip{____rowid____}) {
            $self->{DBH}->do("alter table $table add column ____rowid____ int4"); #don't fail here in case table is being republished, just accept the error silently
        }

        my $index = $id.'____rowid____idx';
        $self->{DBH}->do("create index $index on $table(____rowid____)") || return 'Failed to create rowid index: ' . $self->{DBH}->errstr;

        my $sequence = $id.'_rowid_seq';
        $self->{DBH}->do("create sequence $sequence") || return 'Failed to create rowid sequence: ' . $self->{DBH}->errstr;

        $self->{DBH}->do("alter table $table alter column ____rowid____ set default nextval('$sequence')"); #don't fail here in case table is being republished, just accept the error silently

        $self->{DBH}->do("update $table set ____rowid____ = nextval('$sequence')") || return 'Failed to initialize rowid: ' . $self->{DBH}->errstr;

        if (!$skip{____stamp____}) {
            $self->{DBH}->do("alter table $table add column ____stamp____ timestamp"); #don't fail here in case table is being republished, just accept the error silently
        }

        $self->{DBH}->do("update $table set ____stamp____ = now()") || return 'Failed to initialize stamp: ' . $self->{DBH}->errstr;

        my $trigger = $id.'_ver_ins';
        $self->{DBH}->do("create trigger $trigger before insert on $table for each row execute procedure sync_insert_ver()") || return 'Failed to create trigger: ' . $self->{DBH}->errstr;

        $trigger = $id.'_ver_upd';
        $self->{DBH}->do("create trigger $trigger before update on $table for each row execute procedure sync_update_ver()") || return 'Failed to create trigger: ' . $self->{DBH}->errstr;

        $trigger = $id.'_del_row';
        $self->{DBH}->do("create trigger $trigger after delete on $table for each row execute procedure sync_delete_row()") || return 'Failed to create trigger: ' . $self->{DBH}->errstr;
    }

    $self->{DBH}->do("insert into ____publications____ (pubname, tablename) values ($qpub, $qtable)") || return 'Failed to create publication entry: '.$self->{DBH}->errstr;
    return undef;
}

# Removes a publication from the system. Also drops triggers, sequences, etc associated with the table if appropriate.
# accepts one argument: the name of a publication
# returns undef if ok, else error string;
sub unpublish {
    my $self = shift;
    my $pub = shift || return 'You must provide a publication name';
    my $qpub = $self->{DBH}->quote($pub);
    my $sql = "select tablename from ____publications____ where pubname = $qpub";
    my ($table) = $self->GetOneRow($sql);
    return 'Publication does not exist' if !defined($table);

    my $qtable = $self->{DBH}->quote($table);

    $sql = "select table_id, refcount from ____tables____ where tablename = $qtable";
    my ($id, $refcount) = $self->GetOneRow($sql);
    return "Table: $table is not correctly registered!" if not defined($id);

    $self->{DBH}->do("update ____tables____ set refcount = refcount -1 where tablename = $qtable") || return 'Failed to decrement reference count: ' . $self->{DBH}->errstr;

    $self->{DBH}->do("delete from ____subscribed____ where pubname = $qpub") || return 'Failed to delete user subscriptions: ' . $self->{DBH}->errstr;
    $self->{DBH}->do("delete from ____subscribed_cols____ where pubname = $qpub") || return 'Failed to delete subscribed columns: ' . $self->{DBH}->errstr;
    $self->{DBH}->do("delete from ____publications____ where tablename = $qtable and pubname = $qpub") || return 'Failed to delete from publications: ' . $self->{DBH}->errstr;

    #if this is the last reference, we want to drop triggers, etc;
    if ($refcount <= 1) {
        $id = "_".$id."_";

        $self->{DBH}->do("alter table $table alter column ____rowver____ drop default") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;
        $self->{DBH}->do("alter table $table alter column ____rowid____ drop default") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;
        $self->{DBH}->do("alter table $table alter column ____stamp____ drop default") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;

        my $trigger = $id.'_ver_upd';
        $self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to drop trigger: ' . $self->{DBH}->errstr;

        $trigger = $id.'_ver_ins';
        $self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to drop trigger: ' . $self->{DBH}->errstr;

        $trigger = $id.'_del_row';
        $self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to drop trigger: ' . $self->{DBH}->errstr;

        my $sequence = $id.'_rowid_seq';
        $self->{DBH}->do("drop sequence $sequence") || return 'Failed to drop sequence: ' . $self->{DBH}->errstr;

        my $index = $id.'____rowid____idx';
        $self->{DBH}->do("drop index $index") || return 'Failed to drop index: ' . $self->{DBH}->errstr;

        $self->{DBH}->do("delete from ____tables____ where tablename = $qtable") || return 'Failed to remove entry from tables: ' . $self->{DBH}->errstr;
    }
    return undef;
}

#Subscribe user/node to a publication
# Accepts 3 arguments: Username, Nodename, Publication
# NOTE: the remaining arguments can be supplied as column names to which the user/node should be subscribed
# Return undef if ok, else returns an error string

sub subscribe {
    my $self = shift;
    my $user = shift || die 'You must provide user, node and publication as arguments';
    my $node = shift || die 'You must provide user, node and publication as arguments';
    my $pub = shift || die 'You must provide user, node and publication as arguments';
    my @cols = @_;

    my $quser = $self->{DBH}->quote($user);
    my $qnode = $self->{DBH}->quote($node);
    my $qpub = $self->{DBH}->quote($pub);

    my $sql = "select tablename from ____publications____ where pubname = $qpub";
    my ($table) = $self->GetOneRow($sql);
    return "Publication $pub does not exist." if not defined $table;
    my $qtable = $self->{DBH}->quote($table);

    @cols = $self->GetTableCols($table) if !@cols; # get defaults if cols were not specified by caller

    $self->{DBH}->do("insert into ____subscribed____ (username, nodename, pubname, last_ver, refreshonce) values($quser, $qnode, $qpub, 0, true)") || return 'Failed to create subscription: ' . $self->{DBH}->errstr;

    foreach my $col (@cols) {
        my $qcol = $self->{DBH}->quote($col);
        $self->{DBH}->do("insert into ____subscribed_cols____ (username, nodename, pubname, col_name) values ($quser, $qnode, $qpub, $qcol)") || return 'Failed to subscribe column: ' . $self->{DBH}->errstr;
    }

    return undef;
}

#Unsubscribe user/node from a publication
# Accepts 3 arguments: Username, Nodename, Publication
# Return undef if ok, else returns an error string

sub unsubscribe {
    my $self = shift;
    my $user = shift || die 'You must provide user, node and publication as arguments';
    my $node = shift || die 'You must provide user, node and publication as arguments';
    my $pub = shift || die 'You must provide user, node and publication as arguments';
    my @cols = @_;

    my $quser = $self->{DBH}->quote($user);
    my $qnode = $self->{DBH}->quote($node);
    my $qpub = $self->{DBH}->quote($pub);

    my $sql = "select tablename from ____publications____ where pubname = $qpub";
    my ($table) = $self->GetOneRow($sql); # list assignment: a scalar would receive the row count
    return "Publication $pub does not exist." if not defined $table;

    $self->{DBH}->do("delete from ____subscribed_cols____ where pubname = $qpub and username = $quser and nodename = $qnode") || return 'Failed to remove column subscription: '. $self->{DBH}->errstr;
    $self->{DBH}->do("delete from ____subscribed____ where pubname = $qpub and username = $quser and nodename = $qnode") || return 'Failed to remove subscription: '. $self->{DBH}->errstr;

    return undef;
}

#INSTALL creates the necessary management tables.
#returns undef if everything is ok, else returns a string describing the error;
sub INSTALL {
    my $self = shift;

    #check to see if management tables are already installed

    my ($test) = $self->GetOneRow("select * from pg_class where relname = '____publications____'");
    if (defined($test)) {
        return 'It appears that synchronization management tables are already installed here. Please uninstall before reinstalling.';
    };

    #install the management tables, etc.
    $self->{DBH}->do("create table ____publications____ (pubname text primary key, description text, tablename text, sync_order int4, whereclause text)") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create table ____subscribed_cols____ (nodename text, username text, pubname text, col_name text, description text, primary key(nodename, username, pubname, col_name))") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create table ____subscribed____ (nodename text, username text, pubname text, last_session timestamp, post_ver int4, last_ver int4, whereclause text, sanity_limit int4 default 0, sanity_delete int4 default 0, sanity_update int4 default 0, sanity_insert int4 default 50, readonly boolean, disabled boolean, fullrefreshonly boolean, refreshonce boolean, primary key(nodename, username, pubname))") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create table ____last_stable____ (version int4, username text, nodename text, primary key(version, username, nodename))") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create table ____tables____ (tablename text, table_id int4, refcount int4, primary key(tablename, table_id))") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create sequence ____table_id_seq____") || return $self->{DBH}->errstr();

    $self->{DBH}->do("alter table ____tables____ alter column table_id set default nextval('____table_id_seq____')") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create table ____deleted____ (rowid int4, tablename text, rowver int4, stamp timestamp, primary key (rowid, tablename))") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create table ____collision____ (rowid text, tablename text, rowver int4, stamp timestamp, faildate timestamp default now(), data text, reason text, action text, username text, nodename text, queue text)") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create sequence ____version_seq____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("create table ____sync_log____ (username text, nodename text, stamp timestamp, message text)") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create function sync_insert_ver() returns opaque as
'begin
if new.____rowver____ isnull then
new.____rowver____ := ____version_seq____.last_value;
end if;
if new.____stamp____ isnull then
new.____stamp____ := now();
end if;
return NEW;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create function sync_update_ver() returns opaque as
'begin
if new.____rowver____ = old.____rowver____ then
new.____rowver____ := ____version_seq____.last_value;
end if;
if new.____stamp____ = old.____stamp____ then
new.____stamp____ := now();
end if;
return NEW;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();

    $self->{DBH}->do("create function sync_delete_row() returns opaque as
'begin
insert into ____deleted____ (rowid,tablename,rowver,stamp) values
(old.____rowid____, TG_RELNAME, old.____rowver____, old.____stamp____);
return old;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();

    return undef;
}

#removes all management tables & related stuff
#returns undef if ok, else returns an error message as a string
sub UNINSTALL {
    my $self = shift;

    #Make sure all tables are unpublished first
    my $sth = $self->{DBH}->prepare("select pubname from ____publications____");
    $sth->execute;
    my $pub;
    while (($pub) = $sth->fetchrow_array) {
        $self->unpublish($pub);
    }
    $sth->finish;

    $self->{DBH}->do("drop table ____publications____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop table ____subscribed_cols____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop table ____subscribed____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop table ____last_stable____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop table ____deleted____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop table ____collision____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop table ____tables____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop table ____sync_log____") || return $self->{DBH}->errstr();

    $self->{DBH}->do("drop sequence ____table_id_seq____") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop sequence ____version_seq____") || return $self->{DBH}->errstr();

    $self->{DBH}->do("drop function sync_insert_ver()") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop function sync_update_ver()") || return $self->{DBH}->errstr();
    $self->{DBH}->do("drop function sync_delete_row()") || return $self->{DBH}->errstr();

    return undef;
}

sub DESTROY {
    my $self = shift;

    $self->{DBH}->disconnect;
    $self->{DBLOG}->disconnect;
    return undef;
}

############# Helper Subs ############

sub GetOneRow {
    my $self = shift;
    my $sql = shift || die 'Must provide sql select statement';
    my $sth = $self->{DBH}->prepare($sql) || return undef;
    $sth->execute || return undef;
    my @row = $sth->fetchrow_array;
    $sth->finish;
    return @row;
}

#call this with second non-zero value to get hidden columns
sub GetTableCols {
    my $self = shift;
    my $table = shift || die 'Must provide table name';
    my $wanthidden = shift;
    my $sql = "select * from $table where 0 = 1";
    my $sth = $self->{DBH}->prepare($sql) || return undef;
    $sth->execute || return undef;
    my @row = @{$sth->{NAME}};
    $sth->finish;
    return @row if $wanthidden;
    my @cols;
    foreach my $col (@row) {
        next if $col eq '____rowver____';
        next if $col eq '____stamp____';
        next if $col eq '____rowid____';
        push @cols, $col;
    }
    return @cols;
}

1; #happy require

------=_NextPart_000_0062_01C0541E.125CAF30--

From pgsql-hackers-owner+M9917@postgresql.org Mon Jun 11 15:53:25 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BJrPL01206 for ; Mon, 11 Jun 2001 15:53:25 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org
[216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5BJrPE67753; Mon, 11 Jun 2001 15:53:25 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9917@postgresql.org) Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BJmLE65620 for ; Mon, 11 Jun 2001 15:48:21 -0400 (EDT) (envelope-from djohnson@greatbridge.com) Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70]) by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5BJm2Q28847 for ; Mon, 11 Jun 2001 15:48:02 -0400 From: Darren Johnson Date: Mon, 11 Jun 2001 19:46:44 GMT Message-ID: <20010611.19464400@j2.us.greatbridge.com> Subject: [HACKERS] Postgres Replication To: pgsql-hackers@postgresql.org Reply-To: Darren Johnson X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux) X-Priority: 3 (Normal) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5BJmLE65621 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR

We have been researching replication for several months now, and I have some opinions to share with the community for feedback, discussion, and/or participation. Our goal is to get a replication solution for PostgreSQL that will meet most needs of users and applications alike (mission impossible theme here :).

My research work, along with that of other contributors, has been collected and presented here: http://www.greatbridge.org/genpage?replication_top
If there is something missing, especially PostgreSQL-related work, I would like to know about it, and my apologies to anyone who got left off the list. This work is ongoing and doesn't draw a conclusion, which IMHO should be left up to the user, but I'm offering my opinions to spur discussion and/or feedback from this list, and I'll try not to offend anyone.
Here's my opinion: of the approaches we've surveyed, the most promising one is the Postgres-R project from the Information and Communication Systems Group, ETH in Zurich, Switzerland, originally produced by Bettina Kemme, Gustavo Alonso, and others. Although Postgres-R is a synchronous approach, I believe it is the closest to the goal mentioned above. Here is a summary of the advantages.

1) Postgres-R is built on the PostgreSQL-6.4.2 code base. The replication functionality is an optional parameter, so there will be insignificant overhead for non-replication situations. The replication and communication managers are the two new modules added to the PostgreSQL code base.

2) The replication manager's main function is controlling the replication protocol via a message handling process. It receives messages from the local and remote backends and forwards write sets and decision messages via the communication manager to the other servers. The replication manager controls all the transactions running on the local server by keeping track of their states, including which protocol phase (read, send, lock, or write) each transaction is in. The replication manager maintains a two-way channel, implemented as buffered sockets, to each backend.

3) The main task of the communication manager is to provide a simple socket-based interface between the replication manager and the group communication system (currently Ensemble). The communication system is a cluster of servers connected via the communication manager. The replication manager also maintains three one-way channels to the communication system: a broadcast channel to send messages, a total-order channel to receive totally ordered write sets, and a no-order channel to listen for decision messages from the communication system. Decision messages can be received at any time, whereas the reception of totally ordered write sets can be blocked in certain phases.
4) Based on a two-phase locking approach, all deadlock situations are local, are detectable by the Postgres-R code base, and are aborted.

5) The write set messages used to send database changes to other servers can carry either the SQL statements or the actual tuples changed. The choice is a parameter based on the number of tuples changed by a transaction. While sending the tuple changes avoids the overhead of query parsing, planning, and execution, sending a large write set across the network has a negative effect.

6) Postgres-R uses a synchronous approach that keeps the data on all sites consistent and provides serializability. The user does not have to bother with conflict resolution, and receives the same correctness and consistency as a centralized system.

7) Postgres-R could be part of a good fault-resilient and load distribution solution. It is peer-to-peer based and incurs low overhead propagating updates to the other cluster members. All replicated databases process queries locally.

8) Compared to other synchronous replication strategies (e.g., standard distributed 2-phase-locking + 2-phase-commit), Postgres-R has much better performance using 2-phase-locking.

There are some issues that are not currently addressed by Postgres-R, but some enhancements made to PostgreSQL since the 6.4.2 tree are very favorable to addressing these shortcomings.

1) The WAL added in 7.1 contains the information needed to recover failed/off-line servers. Currently, all the servers would have to be stopped, and a copy would be used to get all the servers synchronized before starting again.

2) Being synchronous, Postgres-R would not be a good solution for off-line/WAN scenarios where asynchronous replication is required. There are some theories on this issue which involve servers connecting to and disconnecting from the cluster.
3) As in any serialized synchronous approach, the flow of execution of a transaction changes; while most of these changes can be handled by calling newly developed functions at certain points, synchronous replica control is tightly coupled with concurrency control. Hence, especially in PostgreSQL 7.2, some parts of the concurrency control (MVCC) might have to be adjusted. This can lead to slightly more complicated maintenance than for a system that does not change the backend.

4) Partial replication is not addressed.

Any feedback on this post will be appreciated.

Thanks,

Darren

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-hackers-owner+M9923@postgresql.org Mon Jun 11 18:14:23 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BMENL18644 for ; Mon, 11 Jun 2001 18:14:23 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5BMEQE14877; Mon, 11 Jun 2001 18:14:26 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9923@postgresql.org) Received: from spoetnik.xs4all.nl (spoetnik.xs4all.nl [194.109.249.226]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BM6ME12270 for ; Mon, 11 Jun 2001 18:06:23 -0400 (EDT) (envelope-from reinoud@xs4all.nl) Received: from KAYAK (kayak [192.168.1.20]) by spoetnik.xs4all.nl (Postfix) with SMTP id 865A33E1B for ; Tue, 12 Jun 2001 00:06:16 +0200 (CEST) From: reinoud@xs4all.nl (Reinoud van Leeuwen) To: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Postgres Replication Date: Mon, 11 Jun 2001 22:06:07 GMT Organization: Not organized in any way Reply-To: reinoud@xs4all.nl Message-ID: <3b403d96.562404297@192.168.1.10> References: <20010611.19464400@j2.us.greatbridge.com> In-Reply-To: <20010611.19464400@j2.us.greatbridge.com> X-Mailer: Forte Agent 1.5/32.451 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5BM6PE12276 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR

On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:

>We have been researching replication for several months now, and
>I have some opinions to share with the community for feedback,
>discussion, and/or participation. Our goal is to get a replication
>solution for PostgreSQL that will meet most needs of users
>and applications alike (mission impossible theme here :).
>
>My research work, along with that of other contributors, has been
>collected and presented here: http://www.greatbridge.org/genpage?replication_top
>If there is something missing, especially PostgreSQL-related
>work, I would like to know about it, and my apologies to anyone
>who got left off the list. This work is ongoing and doesn't
>draw a conclusion, which IMHO should be left up to the user,
>but I'm offering my opinions to spur discussion and/or feedback
>from this list, and I'll try not to offend anyone.
>
>Here's my opinion: of the approaches we've surveyed, the most
>promising one is the Postgres-R project from the Information and
>Communication Systems Group, ETH in Zurich, Switzerland, originally
>produced by Bettina Kemme, Gustavo Alonso, and others. Although
>Postgres-R is a synchronous approach, I believe it is the closest to
>the goal mentioned above. Here is a summary of the advantages.
>
>1) Postgres-R is built on the PostgreSQL-6.4.2 code base. The
>replication functionality is an optional parameter, so there will be
>insignificant overhead for non-replication situations. The replication
>and communication managers are the two new modules added to the
>PostgreSQL code base.
>
>2) The replication manager's main function is controlling the
>replication protocol via a message handling process. It receives
>messages from the local and remote backends and forwards write
>sets and decision messages via the communication manager to the
>other servers. The replication manager controls all the transactions
>running on the local server by keeping track of their states, including
>which protocol phase (read, send, lock, or write) each transaction is
>in. The replication manager maintains a two-way channel,
>implemented as buffered sockets, to each backend.

What does "manager controls all the transactions" mean? I hope it does
*not* mean that a bug in the manager would cause transactions not to
commit...

>
>3) The main task of the communication manager is to provide a simple
>socket-based interface between the replication manager and the
>group communication system (currently Ensemble). The
>communication system is a cluster of servers connected via
>the communication manager. The replication manager also maintains
>three one-way channels to the communication system: a broadcast
>channel to send messages, a total-order channel to receive
>totally ordered write sets, and a no-order channel to listen for
>decision messages from the communication system. Decision
>messages can be received at any time, whereas the reception of
>totally ordered write sets can be blocked in certain phases.
>
>4) Based on a two-phase locking approach, all deadlock situations
>are local, are detectable by the Postgres-R code base, and are aborted.

Does this imply locking over different servers? That would mean a
grinding halt when a network outage occurs...

>5) The write set messages used to send database changes to other
>servers can carry either the SQL statements or the actual tuples
>changed. The choice is a parameter based on the number of tuples
>changed by a transaction.
>While sending the tuple changes avoids the overhead of query parsing,
>planning, and execution, sending a large write set across the
>network has a negative effect.
>
>6) Postgres-R uses a synchronous approach that keeps the data on
>all sites consistent and provides serializability. The user does not
>have to bother with conflict resolution, and receives the same
>correctness and consistency as a centralized system.
>
>7) Postgres-R could be part of a good fault-resilient and load
>distribution solution. It is peer-to-peer based and incurs low overhead
>propagating updates to the other cluster members. All replicated
>databases process queries locally.
>
>8) Compared to other synchronous replication strategies (e.g., standard
>distributed 2-phase-locking + 2-phase-commit), Postgres-R has much
>better performance using 2-phase-locking.

Coming from a Sybase background, I have some experience with
replication. The way it works in Sybase Replication Server is as
follows:
- for each replicated database, there is a "log reader" process that
  reads the WAL and captures only *committed transactions* for the
  replication server (it does not make much sense to replicate other
  things IMHO :-).
- the replication server stores incoming data in a queue ("stable
  device") until it is sure the data has reached its final destination.
- a replication server can send data to another replication server in
  a compact (read: WAN-friendly) way. A chain of replication servers can
  be built, depending on the network architecture.
- the final replication server makes an almost standard client
  connection to the target database and translates the compact
  transactions back to SQL statements. By using masks, extra
  functionality can be built in.
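The log-reader and stable-device steps described above can be sketched in Perl, the language of the sync code earlier in this archive. This is a minimal illustration under stated assumptions, not how Sybase Replication Server is implemented: the log record format and the subroutine names `committed_transactions` and `drain_queue` are hypothetical, and the queue is kept in memory instead of on a durable device.

```perl
use strict;
use warnings;

# Hypothetical log record format (an assumption for illustration; this is
# not the real PostgreSQL WAL or Sybase transaction-log layout):
#   [ $xid, 'begin' ], [ $xid, 'change', $sql ],
#   [ $xid, 'commit' ] or [ $xid, 'abort' ]
#
# "Log reader" step: buffer each transaction's changes and release them only
# when its commit record is seen, in commit order. Changes belonging to
# aborted or still-open transactions are never forwarded.
sub committed_transactions {
    my @log = @_;
    my %pending;      # xid => [ changes seen so far ]
    my @committed;    # [ $xid, [ changes ] ] in commit order
    for my $rec (@log) {
        my ($xid, $type, $payload) = @$rec;
        if ($type eq 'begin') {
            $pending{$xid} = [];
        } elsif ($type eq 'change') {
            push @{ $pending{$xid} }, $payload;
        } elsif ($type eq 'commit') {
            my $changes = delete $pending{$xid};
            push @committed, [ $xid, $changes ? $changes : [] ];
        } elsif ($type eq 'abort') {
            delete $pending{$xid};
        }
    }
    return @committed;
}

# "Stable device" step: deliver queued transactions in order and remove each
# one only after $send reports success. If the destination is unreachable,
# the remaining entries stay queued for a later retry.
# (In-memory only here; a real stable device would persist the queue.)
sub drain_queue {
    my ($queue, $send) = @_;
    my $sent = 0;
    while (@$queue) {
        last unless $send->($queue->[0]);
        shift @$queue;
        $sent++;
    }
    return $sent;
}
```

For example, a log in which transaction 1 aborts while transactions 2 and 3 commit yields only transactions 2 and 3, in that order, and `drain_queue` delivers nothing (and loses nothing) for as long as the send callback keeps failing.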
This kind of architecture has several advantages:
- only committed transactions are replicated, which saves overhead
- it has little impact on the performance of the source server
  (apart from reading the WAL)
- since every replication server has a stable device, data is stored
  when the network is down, and nothing gets lost (nor does anything
  stop performing)
- because only the log reader and the connection from the final
  replication server are RDBMS-specific, it is possible to replicate
  from MS to Oracle using a Sybase replication server (or between
  different versions, etc.).

I do not know how much of this is patented or copyrighted, but the
architecture seems elegant and robust to me. I have done
implementations of bi-directional replication too. It *is* possible
but does require some funky setup and maintenance (but it is better
than letting offices on different continents work on the same
database :-).

just my 2 EURO cts :-)
-- 
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen       reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________

From pgsql-hackers-owner+M9924@postgresql.org Mon Jun 11 18:41:51 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BMfpL28917 for ; Mon, 11 Jun 2001 18:41:51 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5BMfsE25092; Mon, 11 Jun 2001 18:41:54 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9924@postgresql.org) Received: from spider.pilosoft.com (p55-222.acedsl.com [160.79.55.222]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BMalE23024 for ; Mon, 11 Jun 2001 18:36:47 -0400 (EDT) (envelope-from alex@pilosoft.com)
Received: from localhost (alexmail@localhost) by spider.pilosoft.com (8.9.3/8.9.3) with ESMTP id SAA06092; Mon, 11 Jun 2001 18:46:05 -0400 (EDT) Date: Mon, 11 Jun 2001 18:46:05 -0400 (EDT) From: Alex Pilosov To: Reinoud van Leeuwen cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Postgres Replication In-Reply-To: <3b403d96.562404297@192.168.1.10> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR On Mon, 11 Jun 2001, Reinoud van Leeuwen wrote: > On Mon, 11 Jun 2001 19:46:44 GMT, you wrote: > what does "manager controls all the transactions" mean? I hope it does > *not* mean that a bug in the manager would cause transactions not to > commit... Well yeah it does. Bugs are a fact of life. :) > >4) Based on a two phase locking approach, all dead lock situations > >are local and detectable by Postgres-R code base, and aborted. > > Does this imply locking over different servers? That would mean a > grinding halt when a network outage occurs... Don't know, but see below. > Coming from a Sybase background I have some experience with > replication. The way it works in Sybase Replication server is as > follows: > - for each replicated database, there is a "log reader" process that > reads the WAL and captures only *committed transactions* to the > replication server. (it does not make much sense to replicate other > things IMHO :-). > - the replication server stores incoming data in a que ("stable > device"), until it is sure it has reached its final destination > > - a replication server can send data to another replication server in > a compact (read: WAN friendly) way. A chain of replication servers can > be made, depending on network architecture) > > - the final replication server makes a almost standard client > connection to the target database and translates the compact > transactions back to SQL statements. By using masks, extra > functionality can be built in. 
> > This kind of architecture has several advantages: > - only committed transactions are replicated which saves overhead > - it does not have very much impact on performance of the source > server (apart from reading the WAL) > - since every replication server has a stable device, data is stored > when the network is down and nothing gets lost (nor stops performing) > - because only the log reader and the connection from the final > replication server are RDBMS specific, it is possible to replicate > from MS to Oracle using a Sybase replication server (or different > versions etc). > > I do not know how much of this is patented or copyrighted, but the > architecture seems elegant and robust to me. I have done > implementations of bi-directional replication too. It *is* possible > but does require some funky setup and maintenance. (but it is better > that letting offices on different continents working on the same > database :-) Yes, the above architecture is what almost every vendor of replication software uses. And I'm sure if you worked much with Sybase, you hate the garbage that their repserver is :). The architectures of postgres-r and repserver are fundamentally different for a good reason: repserver only wants to replicate committed transactions, while postgres-r is more of a 'clustering' solution (although they don't use that word), and is capable of doing much more than a simple rep server. I.e. you can safely point half of your clients at the second server in a replicated postgres-r cluster without worrying that a conflict (or a weird locking situation) may occur. Try that with Sybase: it is fundamentally designed for one-way replication, and the fact that you can do one-way replication in both directions doesn't mean it's safe to do that! I'm not sure how postgres-r handles network problems. To be useful, a good replication solution must have an option of "no network->no updates" as well as "no network->queue updates and send them later".
However, it is far easier to add queuing to a correct 'eager locking' database than it is to add proper locking to a queue-based replicator. -alex ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From pgsql-hackers-owner+M9932@postgresql.org Mon Jun 11 22:17:54 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C2HsL15803 for ; Mon, 11 Jun 2001 22:17:54 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5C2HtE86836; Mon, 11 Jun 2001 22:17:55 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9932@postgresql.org) Received: from femail15.sdc1.sfba.home.com (femail15.sdc1.sfba.home.com [24.0.95.142]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C2BXE85020 for ; Mon, 11 Jun 2001 22:11:33 -0400 (EDT) (envelope-from djohnson@greatbridge.com) Received: from greatbridge.com ([65.2.95.27]) by femail15.sdc1.sfba.home.com (InterMail vM.4.01.03.20 201-229-121-120-20010223) with ESMTP id <20010612021124.OZRG17243.femail15.sdc1.sfba.home.com@greatbridge.com>; Mon, 11 Jun 2001 19:11:24 -0700 Message-ID: <3B257969.6050405@greatbridge.com> Date: Mon, 11 Jun 2001 22:07:37 -0400 From: Darren Johnson User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001108 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: Alex Pilosov , Reinoud van Leeuwen cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Postgres Replication References: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR Thanks for the feedback. I'll try to address both your issues here. 
>> what does "manager controls all the transactions" mean? > The replication manager controls the transactions by serializing the write set messages. This ensures all transactions are committed in the same order on each server, so bugs here are not allowed ;-) >> I hope it does >> *not* mean that a bug in the manager would cause transactions not to >> commit... > > Well yeah it does. Bugs are a fact of life. :) > >>> 4) Based on a two phase locking approach, all dead lock situations >>> are local and detectable by Postgres-R code base, and aborted. >> >> Does this imply locking over different servers? That would mean a >> grinding halt when a network outage occurs... > > Don't know, but see below. There is a branch of the Postgres-R code that has some failure detection implemented, so we will have to merge this functionality with the version of Postgres-R we have, and test this issue. I'll let you know the results. >> >> - the replication server stores incoming data in a que ("stable >> device"), until it is sure it has reached its final destination > I like this idea for recovering servers that have been down for a short period of time, using WAL to recover transactions missed during the outage. >> >> This kind of architecture has several advantages: >> - only committed transactions are replicated which saves overhead >> - it does not have very much impact on performance of the source >> server (apart from reading the WAL) >> - since every replication server has a stable device, data is stored >> when the network is down and nothing gets lost (nor stops performing) >> - because only the log reader and the connection from the final >> replication server are RDBMS specific, it is possible to replicate >> from MS to Oracle using a Sybase replication server (or different >> versions etc). > There are some issues with the "log reader" approach: 1) The databases are not synchronized until the log reader completes its processing.
2) I'm not sure about Sybase, but the log reader sends SQL statements to the other servers, where they are then parsed, planned and executed. This overhead could be avoided if only the tuple changes were replicated. 3) It works fine for read-only situations, but peer-to-peer applications using this approach must be designed with a conflict resolution scheme. Don't get me wrong, I believe we can learn from the replication techniques used by commercial databases like Sybase, and try to implement the good ones in PostgreSQL. Postgres-R is a synchronous approach which outperforms the traditional approaches to synchronous replication. Postgres-R is based on PostgreSQL 6.4.2; bringing this approach into the 7.2 tree might be better than reinventing the wheel. Thanks again, Darren ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl From pgsql-hackers-owner+M9936@postgresql.org Tue Jun 12 03:22:51 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C7MoL11061 for ; Tue, 12 Jun 2001 03:22:50 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5C7MPE35441; Tue, 12 Jun 2001 03:22:25 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9936@postgresql.org) Received: from reorxrsm.server.lan.at (zep3.it-austria.net [213.150.1.73]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C72ZE25009 for ; Tue, 12 Jun 2001 03:02:36 -0400 (EDT) (envelope-from ZeugswetterA@wien.spardat.at) Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149]) by reorxrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5C72Qu27966 for ; Tue, 12 Jun 2001 09:02:26 +0200 Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21) id ; Tue, 12 Jun 2001 09:02:21 +0200 Message-ID:
<11C1E6749A55D411A9670001FA68796336831B@sdexcsrv1.f000.d0188.sd.spardat.at> From: Zeugswetter Andreas SB To: "'Darren Johnson'" , pgsql-hackers@postgresql.org Subject: AW: [HACKERS] Postgres Replication Date: Tue, 12 Jun 2001 09:02:20 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > Although > Postgres-R is a synchronous approach, I believe it is the closest to > the goal mentioned above. Here is an abstract of the advantages. If you only want synchronous replication, why not simply use triggers ? All you would then need is remote query access and two phase commit, and maybe a little script that helps create the appropriate triggers. Doing a replicate all or nothing approach that only works synchronous is imho not flexible enough. Andreas ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? 
http://www.postgresql.org/search.mpl From pgsql-hackers-owner+M9945@postgresql.org Tue Jun 12 10:18:29 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CEISL06372 for ; Tue, 12 Jun 2001 10:18:28 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CEIQE77517; Tue, 12 Jun 2001 10:18:26 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9945@postgresql.org) Received: from krypton.netropolis.org ([208.222.215.99]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CEDuE75514 for ; Tue, 12 Jun 2001 10:13:56 -0400 (EDT) (envelope-from root@generalogic.com) Received: from [132.216.183.103] (helo=localhost) by krypton.netropolis.org with esmtp (Exim 3.12 #1 (Debian)) id 159ouq-0003MU-00 for ; Tue, 12 Jun 2001 10:13:08 -0400 To: pgsql-hackers@postgresql.org Subject: Re: AW: [HACKERS] Postgres Replication In-Reply-To: <20010612.13321600@j2.us.greatbridge.com> References: <20010612.13321600@j2.us.greatbridge.com> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.0 (HANANOEN) MIME-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20010612123623O.root@generalogic.com> Date: Tue, 12 Jun 2001 12:36:23 +0530 From: root X-Dispatcher: imput version 20000414(IM141) Lines: 47 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR Hello I have hacked up a replication layer for Perl code accessing a database through the DBI interface. It works pretty well with MySQL (I can run pre-bender slashcode replicated; I haven't tried the more recent releases). In principle this hack should also work with Pg, but I haven't tried it yet. If someone would like to test it with a complex Pg app and let me know how it went, that would be cool.
The replication layer is based on Eric Newton's Recall replication library (www.fault-tolerant.org/recall), and requires that all database accesses be through the DBI interface. The replicas are live, in that every operation affects all the replicas in real time. Replica outages are invisible to the user, so long as a majority of the replicas are functioning. Disconnected replicas can be used for read-only access. The only code modification that should be required to use the replication layer is to change the DSN in connect():

    my $replicas = '192.168.1.1:7000,192.168.1.2:7000,192.168.1.3:7000';
    my $dbh = DBI->connect("DBI:Recall:database=$replicas");

You should be able to install the replication modules with:

    perl -MCPAN -eshell
    cpan> install Replication::Recall::DBServer

and then install DBD::Recall (which doesn't seem to be accessible from the CPAN shell yet, for some reason), by:

    wget http://www.cpan.org/authors/id/AGUL/DBD-Recall-1.10.tar.gz
    tar xzvf DBD-Recall-1.10.tar.gz
    cd DBD-Recall-1.10
    perl Makefile.PL
    make install

I would be very interested in hearing about your experiences with this... Thanks #!
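The claim that replica outages are invisible "so long as a majority of the replicas are functioning" comes down to majority-quorum arithmetic. The post does not describe Recall's actual protocol, so the sketch below only illustrates that rule; the names `parse_replicas`, `has_quorum` and `replicated_write` are hypothetical. It parses a replica list in the same `host:port,...` form as the DSN above.

```python
from typing import Callable, List, Tuple


def parse_replicas(spec: str) -> List[Tuple[str, int]]:
    """Split a 'host:port,host:port,...' list like the one in the post."""
    result = []
    for item in spec.split(","):
        host, port = item.strip().rsplit(":", 1)
        result.append((host, int(port)))
    return result


def has_quorum(acks: int, total: int) -> bool:
    # A strict majority: 2 of 3, 3 of 4, 3 of 5, ...
    return acks > total // 2


def replicated_write(replicas: List[Tuple[str, int]],
                     apply: Callable[[Tuple[str, int]], None]) -> bool:
    """Send one write to every replica; succeed if a majority acknowledged it."""
    acks = 0
    for replica in replicas:
        try:
            apply(replica)
            acks += 1
        except ConnectionError:
            # An unreachable replica simply does not acknowledge.
            pass
    return has_quorum(acks, len(replicas))
```

With three replicas, a write survives one failed replica but not two. A real system also has to bring the replica that missed the write back up to date, which this sketch ignores.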
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From pgsql-hackers-owner+M9938@postgresql.org Tue Jun 12 05:12:54 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C9CrL15228 for ; Tue, 12 Jun 2001 05:12:53 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5C9CnE91297; Tue, 12 Jun 2001 05:12:49 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9938@postgresql.org) Received: from mobile.hub.org (SHW39-29.accesscable.net [24.138.39.29]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C98DE89175 for ; Tue, 12 Jun 2001 05:08:13 -0400 (EDT) (envelope-from scrappy@hub.org) Received: from localhost (scrappy@localhost) by mobile.hub.org (8.11.3/8.11.1) with ESMTP id f5C97f361630; Tue, 12 Jun 2001 06:07:46 -0300 (ADT) (envelope-from scrappy@hub.org) X-Authentication-Warning: mobile.hub.org: scrappy owned process doing -bs Date: Tue, 12 Jun 2001 06:07:41 -0300 (ADT) From: The Hermit Hacker To: Zeugswetter Andreas SB cc: "'Darren Johnson'" , Subject: Re: AW: [HACKERS] Postgres Replication In-Reply-To: <11C1E6749A55D411A9670001FA68796336831B@sdexcsrv1.f000.d0188.sd.spardat.at> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR which I believe is what the rserv implementation in contrib currently does ... no? it's funny ... what is in contrib right now was developed in a weekend by Vadim, put in contrib, yet nobody has either used it *or* seen fit to submit patches to improve it ... ?
On Tue, 12 Jun 2001, Zeugswetter Andreas SB wrote: > > > Although > > Postgres-R is a synchronous approach, I believe it is the closest to > > the goal mentioned above. Here is an abstract of the advantages. > > If you only want synchronous replication, why not simply use triggers ? > All you would then need is remote query access and two phase commit, > and maybe a little script that helps create the appropriate triggers. > > Doing a replicate all or nothing approach that only works synchronous > is imho not flexible enough. > > Andreas > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://www.postgresql.org/search.mpl > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org From pgsql-hackers-owner+M9940@postgresql.org Tue Jun 12 09:39:08 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CDd8L03200 for ; Tue, 12 Jun 2001 09:39:08 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CDcmE58175; Tue, 12 Jun 2001 09:38:48 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9940@postgresql.org) Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CDYAE56164 for ; Tue, 12 Jun 2001 09:34:10 -0400 (EDT) (envelope-from djohnson@greatbridge.com) Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70]) by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CDXeQ03585; Tue, 12 Jun 2001 09:33:40 -0400 From: Darren Johnson Date: Tue, 12 Jun 2001 13:32:16 GMT Message-ID: 
<20010612.13321600@j2.us.greatbridge.com> Subject: Re: AW: [HACKERS] Postgres Replication To: The Hermit Hacker cc: Zeugswetter Andreas SB , Reply-To: Darren Johnson In-Reply-To: References: X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux) X-Priority: 3 (Normal) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CDYAE56166 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > which I believe is what the rserv implementation in contrib currently does > ... no? We tried rserv, PG Link (Joseph Conway), and PostgreSQL Replicator. All of these projects implement trigger-based asynchronous replication. They all have some advantages over the current functionality of Postgres-R, some of which I believe can be addressed:
1) Partial replication - being able to replicate just one table, or part of one or more tables
2) They make no changes to the PostgreSQL code base. (Postgres-R can't address this one ;)
3) PostgreSQL Replicator has some very nice conflict resolution schemes.
Here are some disadvantages to using a "trigger based" approach:
1) Triggers simply transfer individual data items when they are modified; they do not keep track of transactions.
2) The execution of triggers within a database imposes a performance overhead on that database.
3) Triggers require careful management by database administrators. Someone needs to keep track of all the "alarms" going off.
4) The activation of triggers in a database cannot be easily rolled back or undone.
> On Tue, 12 Jun 2001, Zeugswetter Andreas SB wrote: > > Doing a replicate all or nothing approach that only works synchronous > > is imho not flexible enough. > > I agree. Partial and asynchronous replication need to be addressed, and some of the common functionality of Postgres-R could possibly be used to meet those needs.
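Partial replication, the first advantage listed above for the trigger-based tools, amounts to filtering the change stream per table and per row before shipping it. A minimal sketch, assuming a hypothetical change format `(table, operation, row)` and a hypothetical rule table that maps each replicated table to a row predicate; it is not taken from rserv, PG Link or PostgreSQL Replicator.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical change record: (table, operation, row as a dict).
Change = Tuple[str, str, Dict[str, object]]
# Hypothetical per-table rule: a predicate deciding which rows to replicate.
Rules = Dict[str, Callable[[Dict[str, object]], bool]]


def filter_changes(changes: List[Change], rules: Rules) -> List[Change]:
    """Keep changes for configured tables whose row passes that table's predicate."""
    return [
        (table, op, row)
        for table, op, row in changes
        if table in rules and rules[table](row)
    ]
```

With rules such as `{"customers": lambda row: True, "orders": lambda row: row["region"] == "EU"}`, every `customers` change and only the EU `orders` rows are shipped; changes to tables not listed at all are dropped.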
Thanks for your feedback, Darren ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html From pgsql-hackers-owner+M9969@postgresql.org Tue Jun 12 16:53:45 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CKriL23104 for ; Tue, 12 Jun 2001 16:53:44 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CKrlE87423; Tue, 12 Jun 2001 16:53:47 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9969@postgresql.org) Received: from sectorbase2.sectorbase.com (sectorbase2.sectorbase.com [63.88.121.62] (may be forged)) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CHWkE69562 for ; Tue, 12 Jun 2001 13:32:46 -0400 (EDT) (envelope-from vmikheev@SECTORBASE.COM) Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19) id ; Tue, 12 Jun 2001 10:30:29 -0700 Message-ID: <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com> From: "Mikheev, Vadim" To: "'Darren Johnson'" , The Hermit Hacker cc: Zeugswetter Andreas SB , pgsql-hackers@postgresql.org Subject: RE: AW: [HACKERS] Postgres Replication Date: Tue, 12 Jun 2001 10:30:27 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > Here are some disadvantages to using a "trigger based" approach: > > 1) Triggers simply transfer individual data items when they > are modified, they do not keep track of transactions. I don't know about other *async* replication engines, but Rserv keeps track of transactions (if I understood you correctly).
Rserv transfers not individual modified data items but a *consistent* snapshot of changes that moves the slave database from one *consistent* state (where all RI constraints are satisfied) to another *consistent* state. > 4) The activation of triggers in a database cannot be easily > rolled back or undone. What do you mean? Vadim ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) From pgsql-hackers-owner+M9967@postgresql.org Tue Jun 12 16:42:11 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CKgBL17982 for ; Tue, 12 Jun 2001 16:42:11 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CKgDE80566; Tue, 12 Jun 2001 16:42:13 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9967@postgresql.org) Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CIVdE07561 for ; Tue, 12 Jun 2001 14:31:39 -0400 (EDT) (envelope-from djohnson@greatbridge.com) Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70]) by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CIUfQ10080; Tue, 12 Jun 2001 14:30:41 -0400 From: Darren Johnson Date: Tue, 12 Jun 2001 18:29:20 GMT Message-ID: <20010612.18292000@j2.us.greatbridge.com> Subject: RE: AW: [HACKERS] Postgres Replication To: "Mikheev, Vadim" cc: The Hermit Hacker , Zeugswetter Andreas SB , pgsql-hackers@postgresql.org Reply-To: Darren Johnson <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com> References: <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com> X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux) X-Priority: 3 (Normal) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CIVdE07562 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > > Here are some disadvantages to using a "trigger based" approach: > > > > 1) Triggers simply transfer individual data items when they > > are modified, they do not keep track of transactions. > I don't know about other *async* replication engines but Rserv > keeps track of transactions (if I understood you corectly). > Rserv transfers not individual modified data items but > *consistent* snapshot of changes to move slave database from > one *consistent* state (when all RI constraints satisfied) > to another *consistent* state. I thought Andreas did a good job of correcting me here. Points 1 and 4 do not apply to transaction-based replication with triggers. I should have made a distinction between non-transaction-based and transaction-based replication with triggers. I was not trying to single out rserv or any other project, and I can see how my wording invited this misinterpretation (my apologies). > > 4) The activation of triggers in a database cannot be easily > > rolled back or undone. > What do you mean? Once the trigger fires, it is not an easy task to abort that execution via rollback or undo. Again, this is not an issue with a transaction-based trigger approach.
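Vadim's point is that Rserv ships a consistent set of changes rather than individual row events. The sketch below illustrates that idea only loosely and is not Rserv's code or schema: triggers are assumed to append hypothetical `(sequence, table, key)` entries to a change log, and at sync time the replicator reads the current master row for every key changed since the last sync and applies the whole batch to the slave at once. Plain dicts stand in for the two databases; in a real system the master reads would run inside one transaction so they see a single consistent state, and the slave apply would be a single transaction as well.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, Hashable]            # (table, primary key)
LogEntry = Tuple[int, str, Hashable]  # (sequence number, table, primary key)


def sync_snapshot(master: Dict[Key, str], slave: Dict[Key, str],
                  change_log: List[LogEntry], last_seq: int) -> int:
    """Bring `slave` up to master's current state for keys logged after last_seq.

    Returns the new high-water mark to pass as last_seq next time.
    """
    changed = {(table, key) for seq, table, key in change_log if seq > last_seq}
    # Read the *current* master row once per key: intermediate updates
    # are collapsed, and a missing row means it was deleted.
    batch: Dict[Key, Optional[str]] = {k: master.get(k) for k in changed}
    # Apply all-or-nothing: build the new state first, then swap it in.
    new_state = dict(slave)
    for k, row in batch.items():
        if row is None:
            new_state.pop(k, None)
        else:
            new_state[k] = row
    slave.clear()
    slave.update(new_state)
    return max([last_seq] + [seq for seq, _, _ in change_log])
```

Because only the latest version of each changed row is shipped, a row updated twice between syncs is sent once, and a row inserted and then deleted between syncs never reaches the slave.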
Sincerely, Darren ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) From pgsql-hackers-owner+M9943@postgresql.org Tue Jun 12 10:03:02 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CE32L04619 for ; Tue, 12 Jun 2001 10:03:02 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CE31E70430; Tue, 12 Jun 2001 10:03:01 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9943@postgresql.org) Received: from fizbanrsm.server.lan.at (zep4.it-austria.net [213.150.1.74]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CDoQE64062 for ; Tue, 12 Jun 2001 09:50:26 -0400 (EDT) (envelope-from ZeugswetterA@wien.spardat.at) Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149]) by fizbanrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5CDoJe11224 for ; Tue, 12 Jun 2001 15:50:19 +0200 Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21) id ; Tue, 12 Jun 2001 15:50:15 +0200 Message-ID: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> From: Zeugswetter Andreas SB To: "'Darren Johnson'" , The Hermit Hacker cc: pgsql-hackers@postgresql.org Subject: AW: AW: [HACKERS] Postgres Replication Date: Tue, 12 Jun 2001 15:50:09 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > Here are some disadvantages to using a "trigger based" approach: > > 1) Triggers simply transfer individual data items when they > are modified, they do not keep track of transactions. 
> 2) The execution of triggers within a database imposes a performance > overhead to that database. > 3) Triggers require careful management by database administrators. > Someone needs to keep track of all the "alarms" going off. > 4) The activation of triggers in a database cannot be easily > rolled back or undone. Yes, points 2 and 3 are a given, although point 2 buys you the functionality of transparent locking across all involved db servers. Points 1 and 4 are only the case for a trigger mechanism that does not use remote connection and 2-phase commit. Imho an implementation that opens a separate client connection to the replication target is only suited for async replication, and for that a WAL based solution would probably impose less overhead. Andreas ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) From pgsql-hackers-owner+M9946@postgresql.org Tue Jun 12 10:47:09 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CEl9L08144 for ; Tue, 12 Jun 2001 10:47:09 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CEihE88714; Tue, 12 Jun 2001 10:44:43 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9946@postgresql.org) Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CEd6E85859 for ; Tue, 12 Jun 2001 10:39:06 -0400 (EDT) (envelope-from djohnson@greatbridge.com) Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70]) by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CEcgQ04905; Tue, 12 Jun 2001 10:38:42 -0400 From: Darren Johnson Date: Tue, 12 Jun 2001 14:37:18 GMT Message-ID: <20010612.14371800@j2.us.greatbridge.com> Subject: Re: 
AW: AW: [HACKERS] Postgres Replication To: Zeugswetter Andreas SB cc: pgsql-hackers@postgresql.org Reply-To: Darren Johnson <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux) X-Priority: 3 (Normal) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CEd6E85860 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > Imho an implementation that opens a separate client connection to the > replication target is only suited for async replication, and for that a WAL > based solution would probably impose less overhead. Yes, there is significant overhead in opening a connection to a client, so Postgres-R creates a pool of backends at startup; coupled with the group communication system (Ensemble), this significantly reduces the issue. Very good points, Darren ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://www.postgresql.org/search.mpl From pgsql-hackers-owner+M9982@postgresql.org Tue Jun 12 19:04:06 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CN46E10043 for ; Tue, 12 Jun 2001 19:04:06 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CN4AE62160; Tue, 12 Jun 2001 19:04:10 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9982@postgresql.org) Received: from spoetnik.xs4all.nl (spoetnik.xs4all.nl [194.109.249.226]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CMxaE60194 for ; Tue, 12 Jun 2001 18:59:36 -0400 (EDT) (envelope-from reinoud@xs4all.nl) Received: from KAYAK (kayak [192.168.1.20]) by spoetnik.xs4all.nl (Postfix) with SMTP id 435353E1B for ; Wed, 13 Jun 2001 00:59:28 +0200 (CEST) From: reinoud@xs4all.nl (Reinoud van Leeuwen) To: pgsql-hackers@postgresql.org Subject: Re: AW: AW: [HACKERS] Postgres Replication Date: Tue, 12 Jun 2001 22:59:23 GMT Organization: Not organized in any way Reply-To: reinoud@xs4all.nl Message-ID: <3b499c5b.652202125@192.168.1.10> References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> In-Reply-To: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> X-Mailer: Forte Agent 1.5/32.451 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CMxcE60196 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR On Tue, 12 Jun 2001 15:50:09 +0200, you wrote: > >> Here are some disadvantages to using a "trigger based" approach: >> >> 1) Triggers simply transfer individual data items when they >> are modified, they do not keep track of transactions. >> 2) The execution of triggers within a database imposes a performance >> overhead to that database. 
>> 3) Triggers require careful management by database administrators. >> Someone needs to keep track of all the "alarms" going off. >> 4) The activation of triggers in a database cannot be easily >> rolled back or undone. > >Yes, points 2 and 3 are a given, although point 2 buys you the functionality >of transparent locking across all involved db servers. >Points 1 and 4 are only the case for a trigger mechanism that does >not use remote connection and 2-phase commit. > >Imho an implementation that opens a separate client connection to the >replication target is only suited for async replication, and for that a WAL >based solution would probably impose less overhead. Well, as I read back the thread I see 2 different approaches to replication:

1: tightly integrated replication
pro:
- bi-directional (or multidirectional): updates are possible everywhere
- a cluster of servers always has the same state
- it does not matter which server you connect to
con:
- the network between servers will be a bottleneck, especially if it is a WAN connection
- only full replication is possible
- what happens if one server (or the network between them) is down? Are commits still possible?

2: async replication
pro:
- long distance is possible
- no problems with network outages
- only changes are replicated; selects have no impact
- no locking issues across servers
- partial replication is possible (many->one (data warehouse), or one->many (queries possible everywhere, updates only centrally))
- good for failover situations (backup server is standing by)
con:
- bidirectional replication is hard to set up (you'll have to implement conflict resolution according to your business rules)
- different servers are not guaranteed to be in the same state

I can think of some scenarios where I would definitely want to *choose* one of the options. A load-balanced web environment would likely want the first option, but synchronizing offices on different continents might not work with 2-phase commit over the network....
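The trigger-plus-two-phase-commit proposal and the worry about running it between continents both come down to how two-phase commit behaves when one participant cannot be reached. A minimal in-memory sketch with a hypothetical `Participant` class; it leaves out the durable logging of prepare votes and the commit decision, and the coordinator-failure recovery, that a real implementation needs.

```python
from typing import List


class Participant:
    """Hypothetical database server taking part in a distributed commit.

    `reachable` simulates the network link; `can_commit` is its local vote.
    """

    def __init__(self, name: str, reachable: bool = True, can_commit: bool = True) -> None:
        self.name = name
        self.reachable = reachable
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        if not self.reachable:
            raise ConnectionError(self.name)
        # A yes vote means the server promises it can still commit later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        if self.reachable and self.state == "prepared":
            self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1: collect votes; any "no" or any unreachable server means abort.
    # A list (not a generator) makes every reachable server vote.
    try:
        decision = all([p.prepare() for p in participants])
    except ConnectionError:
        decision = False
    # Phase 2: send the one global decision to every participant.
    for p in participants:
        p.finish(decision)
    return decision
```

A single unreachable server forces the whole transaction to abort, so every commit depends on every link being up, which is exactly what makes synchronous replication over a long-distance link fragile.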
And we have not even started talking about *managing* replicated environments. A lot of fail-over scenarios stop planning after the backup host has taken control. But how do you get back? -- __________________________________________________ "Nothing is as subjective as reality" Reinoud van Leeuwen reinoud@xs4all.nl http://www.xs4all.nl/~reinoud __________________________________________________ ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org From pgsql-hackers-owner+M9986@postgresql.org Tue Jun 12 19:48:48 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CNmmE13125 for ; Tue, 12 Jun 2001 19:48:48 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5CNmqE76673; Tue, 12 Jun 2001 19:48:52 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9986@postgresql.org) Received: from sss.pgh.pa.us ([192.204.191.242]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CNdQE73923 for ; Tue, 12 Jun 2001 19:39:26 -0400 (EDT) (envelope-from tgl@sss.pgh.pa.us) Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) by sss.pgh.pa.us (8.11.3/8.11.3) with ESMTP id f5CNdI016442; Tue, 12 Jun 2001 19:39:18 -0400 (EDT) To: reinoud@xs4all.nl cc: pgsql-hackers@postgresql.org Subject: Re: AW: AW: [HACKERS] Postgres Replication In-Reply-To: <3b499c5b.652202125@192.168.1.10> References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> <3b499c5b.652202125@192.168.1.10> Comments: In-reply-to reinoud@xs4all.nl (Reinoud van Leeuwen) message dated "Tue, 12 Jun 2001 22:59:23 +0000" Date: Tue, 12 Jun 2001 19:39:18 -0400 Message-ID: <16439.992389158@sss.pgh.pa.us> From: Tom Lane Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR reinoud@xs4all.nl (Reinoud van Leeuwen) writes: > Well as I
read back the thread I see 2 different approaches to > replication: > ... > I can think of some scenarios where I would definitely want to > *choose* one of the options. Yes. IIRC, it looks to be possible to support a form of async replication using the Postgres-R approach: you allow the cluster to break apart when communications fail, and then rejoin when your link comes back to life. (This can work in principle, how close it is to reality is another question; but the rejoin operation is the same as crash recovery, so you have to have it anyway.) So this seems to me to allow getting most of the benefits of the async approach. OTOH it is difficult to see how to go the other way: getting the benefits of a synchronous solution atop a basically-async implementation doesn't seem like it can work. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl From pgsql-hackers-owner+M9997@postgresql.org Wed Jun 13 09:05:56 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5DD5tE28260 for ; Wed, 13 Jun 2001 09:05:55 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5DD5xE12437; Wed, 13 Jun 2001 09:05:59 -0400 (EDT) (envelope-from pgsql-hackers-owner+M9997@postgresql.org) Received: from fizbanrsm.server.lan.at (zep4.it-austria.net [213.150.1.74]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5DD19E00635 for ; Wed, 13 Jun 2001 09:01:10 -0400 (EDT) (envelope-from ZeugswetterA@wien.spardat.at) Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149]) by fizbanrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5DD13m08153 for ; Wed, 13 Jun 2001 15:01:03 +0200 Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21) id ; Wed, 13 Jun 2001 15:00:02 
+0200 Message-ID: <11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at> From: Zeugswetter Andreas SB To: "'reinoud@xs4all.nl'" , pgsql-hackers@postgresql.org Subject: AW: AW: AW: [HACKERS] Postgres Replication Date: Wed, 13 Jun 2001 11:55:48 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > Well as I read back the thread I see 2 different approaches to > replication: > > 1: tight integrated replication. > pro: > - bi-directional (or multidirectional): updates are possible everywhere > - A cluster of servers allways has the same state. > - it does not matter to which server you connect > con: > - network between servers will be a bottleneck, especially if it is a > WAN connection > - only full replication possible I do not understand that point. If it is trigger based, you have all the flexibility you need (only some tables, only some rows, different rows to different targets, ...). (Or do you mean not all targets? That could also be achieved with triggers.) > - what happens if one server is down? (or the network between) are > commits still possible No, updates are not possible if one target is not reachable; that would not be synchronous and would again need business rules to resolve conflicts. Allowing updates when a target is not reachable would require admin intervention.
Andreas ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M10005@postgresql.org Wed Jun 13 11:15:48 2001 Return-path: Received: from postgresql.org (webmail.postgresql.org [216.126.85.28]) by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5DFFmE08382 for ; Wed, 13 Jun 2001 11:15:48 -0400 (EDT) Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28]) by postgresql.org (8.11.3/8.11.1) with SMTP id f5DFFoE53621; Wed, 13 Jun 2001 11:15:50 -0400 (EDT) (envelope-from pgsql-hackers-owner+M10005@postgresql.org) Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36]) by postgresql.org (8.11.3/8.11.1) with ESMTP id f5DEk7E38930 for ; Wed, 13 Jun 2001 10:46:07 -0400 (EDT) (envelope-from djohnson@greatbridge.com) Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70]) by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5DEhfQ22566; Wed, 13 Jun 2001 10:43:41 -0400 From: Darren Johnson Date: Wed, 13 Jun 2001 14:44:11 GMT Message-ID: <20010613.14441100@j2.us.greatbridge.com> Subject: Re: AW: AW: AW: [HACKERS] Postgres Replication To: Zeugswetter Andreas SB cc: "'reinoud@xs4all.nl'" , pgsql-hackers@postgresql.org Reply-To: Darren Johnson <11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at> References: <11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at> X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux) X-Priority: 3 (Normal) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5DEk8E38931 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > > - only full replication possible > I do not understand that point, if it is trigger based, you > have all the flexibility you need. 
(only some tables, only some rows, > different rows to different targets ....), > (or do you mean not all targets, that could also be achieved with triggers) Currently with Postgres-R, it is one database replicating all tables to all servers in the group communication system. There are some ways around this by invoking the -r option when a SQL statement should be replicated, and leaving the -r option off for non-replicated scenarios. IMHO this is not a good solution. A better solution will need to be implemented, one which involves a subscription table (or tables) with relation/server information. There are two ideas for subscribing and receiving replicated data. 1) Receiver-driven propagation - a simple solution where all transactions are propagated and the receiving servers reference the subscription information before applying updates. 2) Sender-driven propagation - a more optimal but more complex solution where servers do not receive any messages regarding data items to which they have not subscribed. > > - what happens if one server is down? (or the network between) are > > commits still possible > No, updates are not possible if one target is not reachable, AFAIK, Postgres-R can still replicate if one target is not reachable, but only to the remaining servers ;). There is a scenario that could arise if a server issues a lock request and then fails or goes offline. There is code that checks for this condition, which needs to be merged with the branch we have. > that would not be synchronous and would again need business rules > to resolve conflicts. Yes, the failed server would not be synchronized, and getting this failed server back in sync needs to be addressed. > Allowing updates when a target is not reachable would require admin > intervention. In its current state, yes, but our goal is to eliminate this requirement as well.
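Here is a minimal Python sketch of the difference between the two propagation ideas. It is not Postgres-R code. The `Change` record, the `SUBSCRIPTIONS` mapping (a stand-in for the proposed relation/server subscription table) and both functions are hypothetical. In both strategies the same changes end up applied. They differ only in who consults the subscription information, and therefore in how many messages cross the network.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Change:
    """One replicated row change (hypothetical representation)."""
    relation: str
    payload: dict = field(default_factory=dict)


# Hypothetical subscription table: server name -> relations it subscribes to.
SUBSCRIPTIONS: Dict[str, Set[str]] = {
    "server_a": {"orders", "customers"},
    "server_b": {"customers"},
}


def receiver_driven(changes: List[Change],
                    subscriptions: Dict[str, Set[str]]) -> Tuple[Dict[str, List[Change]], int]:
    """Every change is sent to every server; each receiver filters locally."""
    applied: Dict[str, List[Change]] = {s: [] for s in subscriptions}
    messages = 0
    for change in changes:
        for server, relations in subscriptions.items():
            messages += 1                      # broadcast to all servers
            if change.relation in relations:   # receiver checks its subscription
                applied[server].append(change)
    return applied, messages


def sender_driven(changes: List[Change],
                  subscriptions: Dict[str, Set[str]]) -> Tuple[Dict[str, List[Change]], int]:
    """The sender consults the subscription table and sends only what is wanted."""
    applied: Dict[str, List[Change]] = {s: [] for s in subscriptions}
    messages = 0
    for change in changes:
        for server, relations in subscriptions.items():
            if change.relation in relations:   # sender filters before sending
                messages += 1
                applied[server].append(change)
    return applied, messages
```

Take three changes: one to `orders`, one to `customers`, and one to an unsubscribed `invoices` table. Receiver-driven propagation sends six messages and sender-driven sends three, and both apply identical changes. The price of the sender-driven variant is that every sender must hold an up-to-date copy of the subscription information.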
Darren ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From pgsql-hackers-owner+M18443=candle.pha.pa.us=pgman@postgresql.org Mon Feb 4 19:16:17 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g150GGP03822 for ; Mon, 4 Feb 2002 19:16:16 -0500 (EST) Received: (qmail 77444 invoked by alias); 5 Feb 2002 00:16:11 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 5 Feb 2002 00:16:11 -0000 Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g150Esl77040 for ; Mon, 4 Feb 2002 19:14:54 -0500 (EST) (envelope-from markw@mohawksoft.com) Received: from mohawksoft.com (localhost [127.0.0.1]) by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g150AWh08676 for ; Mon, 4 Feb 2002 19:10:33 -0500 Message-ID: <3C5F22F8.C9B958F0@mohawksoft.com> Date: Mon, 04 Feb 2002 19:10:32 -0500 From: mlw X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686) X-Accept-Language: en MIME-Version: 1.0 To: PostgreSQL-development Subject: [HACKERS] Replication Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it works like the whole rserv project. I don't like it. OK, what the hell do we need to do to get PostgreSQL replicating? 
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M18445=candle.pha.pa.us=pgman@postgresql.org Mon Feb 4 19:57:01 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g150v0P06518 for ; Mon, 4 Feb 2002 19:57:00 -0500 (EST) Received: (qmail 90440 invoked by alias); 5 Feb 2002 00:56:59 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 5 Feb 2002 00:56:59 -0000 Received: from www1.navtechinc.com ([192.234.226.140]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g150rMl89885 for ; Mon, 4 Feb 2002 19:53:22 -0500 (EST) (envelope-from ssinger@navtechinc.com) Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190]) by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id AAA06047; Tue, 5 Feb 2002 00:53:22 GMT Received: from localhost (ssinger@localhost) by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id AAA10675; Tue, 5 Feb 2002 00:52:43 GMT Date: Tue, 5 Feb 2002 00:52:43 +0000 (GMT) From: Steven X-X-Sender: To: mlw cc: PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR On Mon, 4 Feb 2002, mlw wrote: I've developed a replacement for Rserv and we are planning on releasing it as open source (i.e. as a contrib module). Like Rserv it's trigger-based, but it's much more flexible. The key advantages it has over Rserv are that it has -Support for multiple slaves -It preserves transactions while doing the mirroring, i.e. if rows A and B are originally added in the same transaction they will be mirrored in the same transaction. We have plans to add filtering based on data/selective mirroring as well.
(i.e. only rows with COUNTRY='Canada' go to slave A, and rows with COUNTRY='China' go to slave B). But I'm not sure when I'll get to that. Support for conflict resolution (if edits are allowed on the slaves) would be nice. I hope to be able to send a tarball with the source to the pgpatches list within the next few days. We've been using the system operationally for a number of months and have been happy with it. > I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it > works like the whole rserv project. I don't like it. > OK, what the hell do we need to do to get PostgreSQL replicating? > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Steven Singer ssinger@navtechinc.com Aircraft Performance Systems Phone: 519-747-1170 ext 282 Navtech Systems Support Inc. AFTN: CYYZXNSX SITA: YYZNSCR Waterloo, Ontario ARINC: YKFNSCR ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) From pgsql-hackers-owner+M18447=candle.pha.pa.us=pgman@postgresql.org Mon Feb 4 20:06:57 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g1516vP07508 for ; Mon, 4 Feb 2002 20:06:57 -0500 (EST) Received: (qmail 92753 invoked by alias); 5 Feb 2002 01:06:55 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 5 Feb 2002 01:06:55 -0000 Received: from inflicted.crimelabs.net (crimelabs.net [66.92.101.112]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g150vhl91978 for ; Mon, 4 Feb 2002 19:57:44 -0500 (EST) (envelope-from bpalmer@crimelabs.net) Received: from mizer.crimelabs.net (mizer.crimelabs.net [192.168.88.10]) by inflicted.crimelabs.net (Postfix) with ESMTP id 9D6EE8779; Mon, 4 Feb 2002 19:57:46 -0500 (EST) Date: Mon,
4 Feb 2002 19:57:34 -0500 (EST) From: bpalmer To: mlw cc: PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > > OK, what the hell do we need to do to get PostgreSQL replicating? I hope you understand that replication, done right, is a massive project. I know that Darren and myself (and the rest of the pg-repl folks) have been waiting until 7.2 went gold before doing any more work. I think we hope to have master/slave replication working for 7.3 and then target multimaster for 7.4. At least that's the hope. - Brandon ---------------------------------------------------------------------------- c: 646-456-5455 h: 201-798-4983 b. palmer, bpalmer@crimelabs.net pgp:crimelabs.net/bpalmer.pgp5 ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) From pgsql-hackers-owner+M18449=candle.pha.pa.us=pgman@postgresql.org Mon Feb 4 21:16:56 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g152GtP10503 for ; Mon, 4 Feb 2002 21:16:55 -0500 (EST) Received: (qmail 6711 invoked by alias); 5 Feb 2002 02:16:53 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 5 Feb 2002 02:16:53 -0000 Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g151qSl99469 for ; Mon, 4 Feb 2002 20:52:28 -0500 (EST) (envelope-from markw@mohawksoft.com) Received: from mohawksoft.com (localhost [127.0.0.1]) by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g151lph09147; Mon, 4 Feb 2002 20:47:51 -0500 Message-ID: <3C5F39C7.970F4549@mohawksoft.com>
Date: Mon, 04 Feb 2002 20:47:51 -0500 From: mlw X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686) X-Accept-Language: en MIME-Version: 1.0 To: Steven cc: PostgreSQL-development Subject: Re: [HACKERS] Replication References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR Steven wrote: > > On Mon, 4 Feb 2002, mlw wrote: > > I've developed a replacement for Rserv and we are planning on releasing > it as open source(ie as a contrib module). > > Like Rserv its trigger based but its much more flexible. > The key adventages it has over Rserv is that it has > -Support for multiple slaves > -It Perserves transactions while doing the mirroring. Ie If rows A,B are > originally added in the same transaction they will be mirrored in the same > transaction. I did a similar thing. I took the rserv trigger "as is," but rewrote the replication support code. What I eventually did was write a "snapshot daemon" which created snapshot files. Then a "slave daemon" which would check the last snapshot applied and apply all the snapshots, in order, as needed. One would run one of these daemons per slave server. ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? 
http://www.postgresql.org/users-lounge/docs/faq.html From pgsql-hackers-owner+M18448=candle.pha.pa.us=pgman@postgresql.org Mon Feb 4 20:57:25 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g151vOP09239 for ; Mon, 4 Feb 2002 20:57:24 -0500 (EST) Received: (qmail 99828 invoked by alias); 5 Feb 2002 01:57:19 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 5 Feb 2002 01:57:19 -0000 Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g151s0l99529 for ; Mon, 4 Feb 2002 20:54:00 -0500 (EST) (envelope-from markw@mohawksoft.com) Received: from mohawksoft.com (localhost [127.0.0.1]) by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g151nah09156; Mon, 4 Feb 2002 20:49:37 -0500 Message-ID: <3C5F3A30.A4C46FB8@mohawksoft.com> Date: Mon, 04 Feb 2002 20:49:36 -0500 From: mlw X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686) X-Accept-Language: en MIME-Version: 1.0 To: bpalmer cc: PostgreSQL-development Subject: Re: [HACKERS] Replication References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR bpalmer wrote: > > > > > OK, what the hell do we need to do to get PostgreSQL replicating? > > I hope you understand that replication, done right, is a massive > project. I know that Darren any myself (and the rest of the pg-repl > folks) have been waiting till 7.2 went gold till we did anymore work. I > think we hope to have master / slave replicatin working for 7.3 and then > target multimaster for 7.4. At least that's the hope. I do know how hard replication is. I also understand how important it is. If you guys have a project going, and need developers, I am more than willing. 
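The "slave daemon" mlw describes in his reply to Steven above checks the last snapshot applied, then applies the newer ones in order. That reduces to a small per-slave loop. The Python below is a hypothetical illustration, not mlw's code. It assumes snapshots are numbered consecutively, and `SlaveState` and `apply_pending` are invented names. The loop stops at the first missing number rather than skipping ahead, on the assumption that a gap means that snapshot has not been written yet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SlaveState:
    """Per-slave bookkeeping: number of the last snapshot applied (0 = none)."""
    last_applied: int = 0


def apply_pending(snapshots: Dict[int, List[str]], state: SlaveState,
                  apply: Callable[[List[str]], None]) -> int:
    """Apply consecutive snapshots newer than state.last_applied, oldest first.

    Returns how many snapshots were applied. Stops at the first missing
    number so that no snapshot is ever skipped.
    """
    applied = 0
    next_number = state.last_applied + 1
    while next_number in snapshots:
        apply(snapshots[next_number])     # e.g. replay its statements on the slave
        state.last_applied = next_number  # advance only after apply() returned
        next_number += 1
        applied += 1
    return applied
```

With one `SlaveState` per slave server, this matches the "one daemon per slave" arrangement. If `apply` raises an exception, `last_applied` is not advanced, so the same snapshot is retried on the next run.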
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html From pgsql-hackers-owner+M18450=candle.pha.pa.us=pgman@postgresql.org Mon Feb 4 21:42:13 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g152gCP11957 for ; Mon, 4 Feb 2002 21:42:13 -0500 (EST) Received: (qmail 14229 invoked by alias); 5 Feb 2002 02:42:09 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 5 Feb 2002 02:42:09 -0000 Received: from www1.navtechinc.com ([192.234.226.140]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g152SBl10682 for ; Mon, 4 Feb 2002 21:28:11 -0500 (EST) (envelope-from ssinger@navtechinc.com) Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190]) by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id CAA06384; Tue, 5 Feb 2002 02:28:13 GMT Received: from localhost (ssinger@localhost) by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id CAA10682; Tue, 5 Feb 2002 02:27:35 GMT Date: Tue, 5 Feb 2002 02:27:35 +0000 (GMT) From: Steven X-X-Sender: To: mlw cc: PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: <3C5F39C7.970F4549@mohawksoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR DBMirror doesn't use snapshots; instead it records a log of transactions that are committed to the database in a pair of tables. In the case of an INSERT this is the row that is being added; in the case of a DELETE, the primary key of the row being deleted; and in the case of an UPDATE, the primary key before the update along with all of the data the row should have after the update.
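The log entries described here carry just enough information to rebuild each row edit as SQL on a slave. Below is a minimal Python sketch that assumes a hypothetical `PendingEntry` shape; DBMirror's actual Pending tables and Perl script are not reproduced. An INSERT carries the new row, a DELETE the primary key, and an UPDATE the old primary key plus the complete new row. Values are passed as parameters with `%s` placeholders, the style used by Python drivers such as psycopg2. Table and column names are assumed to be trusted identifiers and are not quoted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class PendingEntry:
    """Hypothetical shape of one logged row edit (not DBMirror's real schema)."""
    op: str                        # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: Dict[str, Any]            # primary key; the pre-update key for UPDATE; empty for INSERT
    row: Optional[Dict[str, Any]]  # full new row for INSERT/UPDATE; None for DELETE


def to_sql(entry: PendingEntry) -> Tuple[str, List[Any]]:
    """Rebuild one parameterized statement from a logged row edit."""
    if entry.op == "INSERT":
        cols = list(entry.row)
        placeholders = ", ".join(["%s"] * len(cols))
        sql = f"INSERT INTO {entry.table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, [entry.row[c] for c in cols]

    # DELETE and UPDATE both locate the target row by its (old) primary key.
    where = " AND ".join(f"{c} = %s" for c in entry.key)
    key_params = list(entry.key.values())

    if entry.op == "DELETE":
        return f"DELETE FROM {entry.table} WHERE {where}", key_params
    if entry.op == "UPDATE":
        cols = list(entry.row)
        assignments = ", ".join(f"{c} = %s" for c in cols)
        sql = f"UPDATE {entry.table} SET {assignments} WHERE {where}"
        return sql, [entry.row[c] for c in cols] + key_params
    raise ValueError(f"unknown operation: {entry.op!r}")


def replay_transaction(entries: List[PendingEntry]) -> List[Tuple[str, List[Any]]]:
    """Wrap all edits of one source transaction in a single slave transaction."""
    return [("BEGIN", [])] + [to_sql(e) for e in entries] + [("COMMIT", [])]
```

The UPDATE case uses the old key in the WHERE clause and the full new row in SET, so a primary-key change is replayed correctly. Grouping a source transaction's entries between one BEGIN and COMMIT keeps rows added together on the master together on the slave.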
Then, for each slave database, a Perl script walks through the transactions that are pending for that host and reconstructs SQL to send the row edits to that host. A record of the fact that transaction Y has been sent to host X is also kept. When transaction Y has been sent to all of the hosts in the system, it is deleted from the Pending tables. I suspect that all of the information I'm storing in the Pending tables is also being stored by Postgres in its log, but I haven't investigated how the information could be extracted (or how long it is kept). Extracting it from the log would reduce the extra storage overhead that the replication system imposes. As I remember (it's been a while since I've looked at it), RServ uses OIDs in its tables to point to the data that needs to be replicated. We tried a similar approach but found difficulties with doing partial updates. On Mon, 4 Feb 2002, mlw wrote: > I did a similar thing. I took the rserv trigger "as is," but rewrote the > replication support code. What I eventually did was write a "snapshot daemon" > which created snapshot files. Then a "slave daemon" which would check the last > snapshot applied and apply all the snapshots, in order, as needed. One would > run one of these daemons per slave server. -- Steven Singer ssinger@navtechinc.com Aircraft Performance Systems Phone: 519-747-1170 ext 282 Navtech Systems Support Inc.
AFTN: CYYZXNSX SITA: YYZNSCR Waterloo, Ontario ARINC: YKFNSCR ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) From pgsql-hackers-owner+M18554=candle.pha.pa.us=pgman@postgresql.org Thu Feb 7 02:49:48 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g177nlP04347 for ; Thu, 7 Feb 2002 02:49:47 -0500 (EST) Received: (qmail 22556 invoked by alias); 7 Feb 2002 07:49:49 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 7 Feb 2002 07:49:49 -0000 Received: from linuxworld.com.au (www.linuxworld.com.au [203.34.46.50]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g177QfE19572 for ; Thu, 7 Feb 2002 02:26:42 -0500 (EST) (envelope-from swm@linuxworld.com.au) Received: from localhost (swm@localhost) by linuxworld.com.au (8.11.4/8.11.4) with ESMTP id g177RiU06086; Thu, 7 Feb 2002 18:27:45 +1100 Date: Thu, 7 Feb 2002 18:27:44 +1100 (EST) From: Gavin Sherry To: mlw cc: PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR On Mon, 4 Feb 2002, mlw wrote: > I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it > works like the whole rserv project. I don't like it. > > OK, what the hell do we need to do to get PostgreSQL replicating? The trigger model is not a very sophisticated one. I think I have a better -- though more complicated -- one. This model would be able to handle multiple masters and master->slave. First of all, all machines in the cluster would have to be aware of all the machines in the cluster. This information would have to be stored in a new system table.
The FE/BE protocol would need to be modified to accept parsed node trees generated by pg_analyze_and_rewrite(). These could then be dispatched by the executing server, inside of pg_exec_query_string(), to all other servers in the cluster (excluding itself). Naturally, this dispatch would need to be non-blocking. pg_exec_query_string() would need to check the node tags to make sure SELECTs and perhaps some other commands are not dispatched. Before the executing server runs finish_xact_command(), it would check that the query was successfully executed on all machines, and otherwise abort. Such a system would need a few configuration options: whether or not you abort on failed replication to slaves, the ability to replicate only certain tables, etc. Naturally, this would slow down writes to the system (possibly a lot, depending on the performance difference between the executing machine and the least powerful machine in the cluster), but most uses of PostgreSQL are read-intensive, not write-intensive. Any reason this model would not work?
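The commit rule in this proposal can be illustrated with a two-phase-commit-style coordinator: SELECTs are not dispatched, and a write is committed everywhere only if it succeeded everywhere, otherwise it is aborted. Below is a minimal Python sketch in which everything is hypothetical. `Peer` stands in for a remote backend, and a keyword test stands in for checking node tags. Unlike the proposal, the dispatch here is sequential rather than non-blocking. A real implementation would also have to handle a peer that becomes unreachable after voting yes.

```python
from typing import List


class Peer:
    """Hypothetical stand-in for one server in the cluster."""

    def __init__(self, name: str, reachable: bool = True) -> None:
        self.name = name
        self.reachable = reachable
        self.prepared: List[str] = []   # work done but not yet committed
        self.committed: List[str] = []  # durable, visible work

    def execute_and_prepare(self, stmt: str) -> bool:
        """Phase 1: run the statement and vote yes only if it succeeded."""
        if not self.reachable:
            return False
        self.prepared.append(stmt)
        return True

    def commit(self) -> None:
        """Phase 2 (all voted yes): make the prepared work permanent."""
        self.committed.extend(self.prepared)
        self.prepared.clear()

    def rollback(self) -> None:
        """Phase 2 (some vote failed): discard the prepared work."""
        self.prepared.clear()


def is_write(stmt: str) -> bool:
    """Keyword check standing in for the node-tag test: SELECTs are not dispatched."""
    words = stmt.split()
    return bool(words) and words[0].upper() in {"INSERT", "UPDATE", "DELETE"}


def replicate(stmt: str, local: Peer, peers: List[Peer]) -> bool:
    """Run a write on every server; commit only if every server succeeded."""
    if not is_write(stmt):
        return True  # not dispatched; local read execution is outside this sketch
    participants = [local] + peers
    # Build the full list of votes (no short-circuit) before deciding.
    votes = [p.execute_and_prepare(stmt) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

If one peer in the list is unreachable, `replicate("DELETE FROM t", ...)` returns False and commits nothing anywhere. This is the availability cost raised elsewhere in the thread: a single unreachable server blocks all writes.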
Gavin ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M18558=candle.pha.pa.us=pgman@postgresql.org Thu Feb 7 08:31:00 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17DUxP13923 for ; Thu, 7 Feb 2002 08:30:59 -0500 (EST) Received: (qmail 91796 invoked by alias); 7 Feb 2002 13:30:55 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 7 Feb 2002 13:30:55 -0000 Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g17Cw0E87782 for ; Thu, 7 Feb 2002 07:58:01 -0500 (EST) (envelope-from markw@mohawksoft.com) Received: from mohawksoft.com (localhost [127.0.0.1]) by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g17CqNt16887; Thu, 7 Feb 2002 07:52:24 -0500 Message-ID: <3C627887.CC9FF837@mohawksoft.com> Date: Thu, 07 Feb 2002 07:52:23 -0500 From: mlw X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686) X-Accept-Language: en MIME-Version: 1.0 To: Gavin Sherry cc: PostgreSQL-development Subject: Re: [HACKERS] Replication References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR Gavin Sherry wrote: > Naturally, this would slow down writes to the system (possibly a lot > depending on the performance difference between the executing machine and > the least powerful machine in the cluster), but most usages of postgresql > are read intensive, not write. > > Any reason this model would not work? What, then, is the purpose of replication to multiple masters? I can think of only two reasons why you would want replication. (1) Redundancy: make sure that if one server dies, another server has the same data and is used seamlessly. (2) Increased performance over a single system.
For reason (1), I submit that a server load balancer which sits on top of PostgreSQL, executing writes on both servers while distributing reads, would be best. This is a HUGE project. The load balancer must know EXACTLY how the system is configured, including all functions and everything. For reason (2), your system would fail to provide the scalability that would be needed. If writes take a long time but reads are fine, what is the difference from the trigger-based replicator? I have in the back of my mind an idea of patching into the WAL stuff and using that mechanism to push changes out to the slaves: one machine is still the master, but with no trigger stuff, just a WAL patch. Perhaps some shared-memory paradigm to manage WAL visibility? I'm not sure exactly; the idea hasn't completely formed yet. ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html From pgsql-hackers-owner+M18574=candle.pha.pa.us=pgman@postgresql.org Thu Feb 7 12:51:42 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17HpfP16661 for ; Thu, 7 Feb 2002 12:51:41 -0500 (EST) Received: (qmail 62955 invoked by alias); 7 Feb 2002 17:50:42 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 7 Feb 2002 17:50:42 -0000 Received: from www1.navtechinc.com ([192.234.226.140]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g17HnTE62256 for ; Thu, 7 Feb 2002 12:49:29 -0500 (EST) (envelope-from ssinger@navtechinc.com) Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190]) by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id RAA07908; Thu, 7 Feb 2002 17:49:31 GMT Received: from localhost (ssinger@localhost) by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id RAA05687; Thu, 7 Feb 2002 17:48:52 GMT Date: Thu, 7 Feb 2002 17:48:51
+0000 (GMT) From: Steven Singer X-X-Sender: To: Gavin Sherry cc: mlw , PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR What you describe sounds like a form of two-stage commit protocol. If the command worked on two of the replicated databases but failed on a third, then the executing server would have to be able to undo the command on the replicated databases as well as on itself. The problems with two-stage-commit approaches to replication are 1) speed, as you mentioned. Write speed isn't a concern for some applications, but it is very important in others; and 2) all of the databases must be able to communicate with each other at all times in order for any edits to work. If the servers are connected over some sort of WAN that periodically has short outages, this is a problem. Also, if you're using replication because you want to be able to take down one of the databases for short periods of time without bringing down the others, you're in trouble. BTW: I posted the alternative to Rserv that I mentioned the other day to the pg-patches mailing list. If anyone is interested, you should be able to grab it from the archives. On Thu, 7 Feb 2002, Gavin Sherry wrote: > > First of all, all machines in the cluster would have to be aware all the > machines in the cluster. This would have to be stored in a new system > table. > > The FE/BE protocol would need to be modified to accepted parsed node trees > generated by pg_analyze_and_rewrite(). These could then be dispatched by > the executing server, inside of pg_exec_query_string, to all other servers > in the cluster (excluding itself). Naturally, this dispatch would need to > be non-blocking. > > pg_exec_query_string() would need to check that nodetags to make sure > selects and perhaps some commands are not dispatched.
> > Before the executing server runs finish_xact_command(), it would check > that the query was successfully executed on all machines, otherwise > abort. Such a system would need a few configuration options: whether or > not you abort on failed replication to slaves, the ability to replicate > only certain tables, etc. > > Naturally, this would slow down writes to the system (possibly a lot > depending on the performance difference between the executing machine and > the least powerful machine in the cluster), but most usages of postgresql > are read intensive, not write. > > Any reason this model would not work? > > Gavin > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Steven Singer ssinger@navtechinc.com Aircraft Performance Systems Phone: 519-747-1170 ext 282 Navtech Systems Support Inc. AFTN: CYYZXNSX SITA: YYZNSCR Waterloo, Ontario ARINC: YKFNSCR ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org From pgsql-hackers-owner+M18590=candle.pha.pa.us=pgman@postgresql.org Thu Feb 7 17:50:42 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17MoeP27121 for ; Thu, 7 Feb 2002 17:50:40 -0500 (EST) Received: (qmail 39930 invoked by alias); 7 Feb 2002 22:50:17 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 7 Feb 2002 22:50:17 -0000 Received: from odin.fts.net (wall.icgate.net [209.26.177.2]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g17Ma4E38041 for ; Thu, 7 Feb 2002 17:36:04 -0500 (EST) (envelope-from fharvell@odin.fts.net) Received: from odin.fts.net (fharvell@localhost) by odin.fts.net (8.11.6/8.11.6) with ESMTP id g17MZhR17707; Thu, 7 Feb 2002 17:35:43 -0500 Message-ID: <200202072235.g17MZhR17707@odin.fts.net> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
From: F Harvell To: mlw cc: Gavin Sherry , PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: Message from mlw of "Thu, 07 Feb 2002 07:52:23 EST." <3C627887.CC9FF837@mohawksoft.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 07 Feb 2002 17:35:43 -0500 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR I'm not that familiar with the replication issues in PostgreSQL; however, I would be partial to replication based upon playback of the (a?) journal file. (I believe that the WAL is a journal file.) By being based upon a journal file, it would be possible to accomplish two significant things. First, it would be possible to "restore" a database to an exact state just before a failure. Most commercial databases provide the ability to do this. Banks, etc. log the journal files directly to tape to provide a complete transaction history such that they can rebuild their database from any given snapshot. (Note that the journal file needs to be "editable", as a failure may be a "delete from x" with a missing where clause.) This leads directly into the second advantage: the ability to have a replicated database operating anywhere, over any connection, on any server. Write speed would not be a factor, since the replica does not have to keep up in lockstep. In essence, as long as the replicated database had a snapshot of the database and was then provided with all journal files since the snapshot, it would be possible to build a current database. If the replica got behind in the processing, it would catch up when things slowed down. In my opinion, the first advantage is in many ways the most important. Replication becomes simply the restoration of the database in real time on a second server. The "replication" task becomes the definition of a protocol for distributing the journal file. At least one major database vendor does replication (shadowing) in exactly this manner. Maybe I'm all wet and the journal file and journal playback already exist.
If so, IMHO, basing replication off of this would be the right direction. On Thu, 07 Feb 2002 07:52:23 EST, mlw wrote: > > I have in the back of my mind, an idea of patching into the WAL stuff, and > using that mechanism to push changes out to the slaves. > > Where one machine is still the master, but no trigger stuff, just a WAL patch. > Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure > exactly, the idea hasn't completely formed yet. > ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M18605=candle.pha.pa.us=pgman@postgresql.org Fri Feb 8 00:50:08 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g185o7P27878 for ; Fri, 8 Feb 2002 00:50:07 -0500 (EST) Received: (qmail 17348 invoked by alias); 8 Feb 2002 05:50:03 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 8 Feb 2002 05:50:03 -0000 Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g185cTE15241 for ; Fri, 8 Feb 2002 00:38:29 -0500 (EST) (envelope-from darren.johnson@cox.net) Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP id <20020208053833.YKTV6710.lakemtao03.mgt.cox.net@cox.net> for ; Fri, 8 Feb 2002 00:38:33 -0500 Message-ID: <3C636232.6060206@cox.net> Date: Fri, 08 Feb 2002 00:29:22 -0500 From: Darren Johnson User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001108 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: PostgreSQL-development Subject: Re: [HACKERS] Replication References: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > > The problems with two stage commit type 
approaches to replication are IMHO the biggest problem with two-phase commit is that it doesn't scale. The more servers you add to the replica, the slower it goes. Also, there's the potential for deadlocks across server boundaries. > > 2) All of the databases must be able to communicate with each other at > all times in order for any edits to work. If the servers are > connected over some sort of WAN that periodically has short outages this > is a problem. Also if you're using replication because you want to be able > to take down one of the databases for short periods of time without > bringing down the others you're in trouble. All true for the two-phase commit protocol. To have multi-master replication, you must have all systems communicating, but you can use a multicast group communication system instead of 2PC. Using total order messaging, you can ensure all changes are delivered to all servers in the replica in the same order. This group communication system also allows failures to be detected while other servers in the replica continue processing. A few of us are working with this theory, and trying to integrate it with 7.2. There is a working model for 6.4, but it's very limited (inserts, updates, and deletes). We are currently hosted at http://gborg.postgresql.org/project/pgreplication/projdisplay.php But the site has been down for the last 2 days. I've contacted the webmaster, but haven't seen any results yet. If anyone knows what's going on with gborg, I'd appreciate a status.
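The total order messaging idea above can be illustrated with a toy, single-process model. This sketch assumes a single fixed sequencer that stamps every write-set with a global sequence number; that is only one textbook way to obtain total order, and real group communication systems use more fault-tolerant protocols and also provide the failure detection mentioned above, which is not modeled here. `Sequencer`, `Replica`, and `simulate` are hypothetical names, not part of PostgreSQL or the pgreplication project.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica that applies write-sets strictly in global sequence order."""
    name: str
    applied: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)
    next_seq: int = 0

    def deliver(self, seq, writeset):
        # The network may deliver messages out of order, so buffer each one
        # and apply it only after every lower sequence number has been applied.
        self.pending[seq] = writeset
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


class Sequencer:
    """Hypothetical fixed sequencer: stamps each write-set with a global number."""

    def __init__(self):
        self.counter = 0

    def stamp(self, writeset):
        seq = self.counter
        self.counter += 1
        return seq, writeset


def simulate():
    replicas = [Replica("a"), Replica("b"), Replica("c")]
    sequencer = Sequencer()
    messages = [sequencer.stamp(w) for w in ["INSERT 1", "UPDATE 1", "DELETE 1"]]
    # Each replica receives the same messages in a different network order.
    arrival_orders = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]
    for replica, order in zip(replicas, arrival_orders):
        for i in order:
            replica.deliver(*messages[i])
    return replicas


for r in simulate():
    print(r.name, r.applied)  # every replica prints ['INSERT 1', 'UPDATE 1', 'DELETE 1']
```

The point of the design is that no replica waits for acknowledgements from its peers: agreement on the order alone is enough for every replica to reach the same state.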
Darren ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) From pgsql-hackers-owner+M18617=candle.pha.pa.us=pgman@postgresql.org Fri Feb 8 06:20:44 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18BKhP06132 for ; Fri, 8 Feb 2002 06:20:43 -0500 (EST) Received: (qmail 90815 invoked by alias); 8 Feb 2002 11:20:40 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 8 Feb 2002 11:20:40 -0000 Received: from laptop.kieser.demon.co.uk (kieser.demon.co.uk [62.49.6.72]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g18B9ZE89589 for ; Fri, 8 Feb 2002 06:09:36 -0500 (EST) (envelope-from brad@kieser.net) Received: from laptop.kieser.demon.co.uk (localhost.localdomain [127.0.0.1]) by laptop.kieser.demon.co.uk (Postfix) with SMTP id 598393A132; Fri, 8 Feb 2002 11:09:36 +0000 (GMT) From: Bradley Kieser Date: Fri, 08 Feb 2002 11:09:36 GMT Message-ID: <20020208.11093600@laptop.kieser.demon.co.uk> Subject: Re: [HACKERS] Replication To: Darren Johnson cc: PostgreSQL-development In-Reply-To: <3C636232.6060206@cox.net> References: <3C636232.6060206@cox.net> X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux) X-Priority: 3 (Normal) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g18BJoF90352 Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR Darren, Given that different replication strategies will probably be developed for PG, do you envisage DBAs to be able to select the type of replication for their installation? I.e., replication being selectable rather like storage structures?
It would be a killer bit of flexibility, given how enormous the impact of replication will be on corporate adoption of PG. Brad >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<< On 2/8/02, 5:29:22 AM, Darren Johnson wrote regarding Re: [HACKERS] Replication: > > > > The problems with two stage commit type approaches to replication are > IMHO the biggest problem with two-phase commit is that it doesn't scale. > The more servers > you add to the replica, the slower it goes. Also, there's the potential > for deadlocks across > server boundaries. > > > > 2) All of the databases must be able to communicate with each other at > > all times in order for any edits to work. If the servers are > > connected over some sort of WAN that periodically has short outages this > > is a problem. Also if you're using replication because you want to be > able > > to take down one of the databases for short periods of time without > > bringing down the others you're in trouble. > All true for the two-phase commit protocol. To have multi-master > replication, you must have all > systems communicating, but you can use a multicast group communication > system instead of > 2PC. Using total order messaging, you can ensure all changes are > delivered to all servers in the > replica in the same order. This group communication system also allows > failures to be detected > while other servers in the replica continue processing. > A few of us are working with this theory, and trying to integrate it with > 7.2. There is a working > model for 6.4, but it's very limited (inserts, updates, and deletes). We > are currently hosted at > http://gborg.postgresql.org/project/pgreplication/projdisplay.php > But the site has been down for the last 2 days. I've contacted the > webmaster, but haven't seen > any results yet. If anyone knows what's going on with gborg, I'd > appreciate a status.
> Darren > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org From pgsql-hackers-owner+M18642=candle.pha.pa.us=pgman@postgresql.org Fri Feb 8 12:40:36 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18HeZP08450 for ; Fri, 8 Feb 2002 12:40:35 -0500 (EST) Received: (qmail 74089 invoked by alias); 8 Feb 2002 17:40:30 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 8 Feb 2002 17:40:30 -0000 Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g18HbwE73437 for ; Fri, 8 Feb 2002 12:37:58 -0500 (EST) (envelope-from darren.johnson@cox.net) Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP id <20020208173804.DKQS6710.lakemtao03.mgt.cox.net@cox.net>; Fri, 8 Feb 2002 12:38:04 -0500 Message-ID: <3C63FB71.206@cox.net> Date: Fri, 08 Feb 2002 11:23:13 -0500 From: Darren Johnson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01 X-Accept-Language: en MIME-Version: 1.0 To: Bradley Kieser cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Replication References: <3C636232.6060206@cox.net> <20020208.11093600@laptop.kieser.demon.co.uk> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > > Given that different replication strategies will probably be developed > for PG, do you envisage DBAs to be able to select the type of replication > for 
their installation? I.e., replication being selectable rather like > storage structures? I can't speak for other replication solutions, but we are using the --with-replication or -r parameter when starting the postmaster. Someday I hope there will be parameters for master/slave, partial/full, and sync/async, but it will be some time before we cross those bridges. Darren ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org From pgsql-hackers-owner+M18658=candle.pha.pa.us=pgman@postgresql.org Fri Feb 8 14:42:40 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18JgdP28166 for ; Fri, 8 Feb 2002 14:42:39 -0500 (EST) Received: (qmail 18650 invoked by alias); 8 Feb 2002 19:42:39 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 8 Feb 2002 19:42:39 -0000 Received: from enigma.trueimpact.net (enigma.trueimpact.net [209.82.45.201]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g18JYBE17341 for ; Fri, 8 Feb 2002 14:34:11 -0500 (EST) (envelope-from rjonasz@trueimpact.com) Received: from nietzsche.trueimpact.net (unknown [209.82.45.200]) by enigma.trueimpact.net (Postfix) with ESMTP id A785066B04 for ; Fri, 8 Feb 2002 14:33:28 -0500 (EST) Date: Fri, 8 Feb 2002 14:34:34 -0500 (EST) From: Randall Jonasz X-X-Sender: To: PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: <3C627887.CC9FF837@mohawksoft.com> Message-ID: <20020208142932.H6545-100000@nietzsche.trueimpact.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR I've been looking into database replication theory lately and have found some interesting papers discussing various approaches.
(Here's one paper that struck me as being very helpful, http://citeseer.nj.nec.com/460405.html ) So far I favour an eager replication system which is predicated on read local/write all available. The system should not depend on two-phase commit or primary copy algorithms. The former makes the whole system only as fast as its slowest machine. In addition, two-phase commit involves 2n messages for each transaction, which does not scale well at all. It would also have to take into account a crashed node which did not ack a transaction. The primary copy algorithms I've seen suffer from a single point of failure and potential bottlenecks at the primary node. Instead I like the master-to-master or peer-to-peer algorithm discussed in the above paper. This approach accounts for network partitions and for nodes leaving and joining a cluster, and it can commit a transaction once the communication module has determined the total order of that transaction, i.e. with no need to wait for acks. This scales well, and research has shown it to increase the number of transactions/second a database cluster can handle compared with a single node. Postgres-R is another interesting approach which I think should be taken seriously. Anyone interested can read a paper on this at http://citeseer.nj.nec.com/330257.html Anyways, my two cents Randall Jonasz Software Engineer Click2net Inc. On Thu, 7 Feb 2002, mlw wrote: > Gavin Sherry wrote: > > Naturally, this would slow down writes to the system (possibly a lot > > depending on the performance difference between the executing machine and > > the least powerful machine in the cluster), but most usages of postgresql > > are read intensive, not write. > > > > Any reason this model would not work? > > What, then, is the purpose of replication to multiple masters? > > I can think of only two reasons why you want replication. (1) Redundancy: make > sure that if one server dies, then another server has the same data and is used > seamlessly.
(2) Increase performance over one system. > > In reason (1) I submit that a server load balancer which sits on top of > PostgreSQL, executing writes on both servers while distributing reads, would > be best. This is a HUGE project. The load balancer must know EXACTLY how the > system is configured, which includes all functions and everything. > > In reason (2) your system would fail to provide the scalability that would be > needed. If writes take a long time, but reads are fine, how is that different > from the trigger-based replicator? > > I have in the back of my mind, an idea of patching into the WAL stuff, and > using that mechanism to push changes out to the slaves. > > Where one machine is still the master, but no trigger stuff, just a WAL patch. > Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure > exactly, the idea hasn't completely formed yet. > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > > ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html From pgsql-hackers-owner+M18660=candle.pha.pa.us=pgman@postgresql.org Fri Feb 8 15:20:32 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18KKSP03731 for ; Fri, 8 Feb 2002 15:20:29 -0500 (EST) Received: (qmail 28961 invoked by alias); 8 Feb 2002 20:20:27 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 8 Feb 2002 20:20:27 -0000 Received: from inflicted.crimelabs.net (crimelabs.net [66.92.101.112]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g18KC7E27667 for ; Fri, 8 Feb 2002 15:12:07 -0500 (EST) (envelope-from bpalmer@crimelabs.net) Received: from mizer.crimelabs.net (mizer.crimelabs.net [192.168.88.10]) by inflicted.crimelabs.net (Postfix) with ESMTP id 1066F8787; Fri, 8 Feb 2002 15:12:08 -0500 (EST) Date: Fri, 8 Feb 2002 15:12:00 -0500 (EST) From: bpalmer To: Randall Jonasz cc: PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: <20020208142932.H6545-100000@nietzsche.trueimpact.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR I've not looked at the first paper, but I will. > Postgres-R is another interesting approach which I think should be taken > seriously. Anyone interested can read a paper on this at > http://citeseer.nj.nec.com/330257.html I would point you to the info on gborg, but it seems to be down at the moment. - Brandon ---------------------------------------------------------------------------- c: 646-456-5455 h: 201-798-4983 b.
palmer, bpalmer@crimelabs.net pgp:crimelabs.net/bpalmer.pgp5 ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From pgsql-hackers-owner+M18666=candle.pha.pa.us=pgman@postgresql.org Fri Feb 8 17:41:03 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18Mf2P18046 for ; Fri, 8 Feb 2002 17:41:03 -0500 (EST) Received: (qmail 63057 invoked by alias); 8 Feb 2002 22:41:02 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 8 Feb 2002 22:41:02 -0000 Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g18MR9E60361 for ; Fri, 8 Feb 2002 17:27:11 -0500 (EST) (envelope-from darren.johnson@cox.net) Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP id <20020208222634.GTRG6710.lakemtao03.mgt.cox.net@cox.net>; Fri, 8 Feb 2002 17:26:34 -0500 Message-ID: <3C643F0F.70303@cox.net> Date: Fri, 08 Feb 2002 16:11:43 -0500 From: Darren Johnson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01 X-Accept-Language: en MIME-Version: 1.0 To: Randall Jonasz cc: PostgreSQL-development Subject: Re: [HACKERS] Replication References: <20020208142932.H6545-100000@nietzsche.trueimpact.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > I've been looking into database replication theory lately and have found > some interesting papers discussing various approaches. 
(Here's > one paper that struck me as being very helpful, > http://citeseer.nj.nec.com/460405.html ) Here is another one from that same group that addresses the WAN issues. > http://www.cnds.jhu.edu/pub/papers/cnds-2002-1.pdf enjoy, Darren ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org From pgsql-hackers-owner+M18674=candle.pha.pa.us=pgman@postgresql.org Fri Feb 8 19:20:30 2002 Return-path: Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9]) by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g190KTP26980 for ; Fri, 8 Feb 2002 19:20:29 -0500 (EST) Received: (qmail 88124 invoked by alias); 9 Feb 2002 00:20:27 -0000 Received: from unknown (HELO postgresql.org) (64.49.215.8) by www.postgresql.org with SMTP; 9 Feb 2002 00:20:27 -0000 Received: from localhost.localdomain (bgp01077650bgs.wanarb01.mi.comcast.net [68.40.135.112]) by postgresql.org (8.11.3/8.11.4) with ESMTP id g190H3E87489 for ; Fri, 8 Feb 2002 19:17:03 -0500 (EST) (envelope-from camber@ais.org) Received: from localhost (camber@localhost) by localhost.localdomain (8.11.6/8.11.6) with ESMTP id g190H0P18427; Fri, 8 Feb 2002 19:17:00 -0500 X-Authentication-Warning: localhost.localdomain: camber owned process doing -bs Date: Fri, 8 Feb 2002 19:17:00 -0500 (EST) From: Brian Bruns X-X-Sender: To: Randall Jonasz cc: PostgreSQL-development Subject: Re: [HACKERS] Replication In-Reply-To: <20020208142932.H6545-100000@nietzsche.trueimpact.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Status: OR > > I have in the back of my mind, an idea of patching into the WAL stuff, and > > using that mechanism to push changes out to the slaves. > > > > Where one machine is still the master, but no trigger stuff, just a WAL patch. > > Perhaps some shared memory paradigm to manage WAL visibility?
I'm not sure > > exactly, the idea hasn't completely formed yet. > > FWIW, Sybase Replication Server does just such a thing. They have a secondary log marker (which prevents the log from being truncated past the oldest unreplicated transaction). A thread within the system called the "rep agent" (it used to be a separate process called the LTM) reads the log and forwards it to the rep server. Once the rep server has the whole transaction and it is written to a stable device (aka synced to disk), the rep server responds to the LTM, telling it that it's OK to move the log marker forward. Anyway, once the replication server proper has the transaction, it uses a publish/subscribe methodology to see who wants to get the update. Bidirectional replication is done by setting up two one-way replications. The whole thing is table-based: it marks the tables as replicated or not in the database, to save the trip to the repserver for unreplicated tables. Plus, you can replicate parts of a database (replicate all rows where the country is "us" to this server and all the rows with "uk" to that server). Or, conversely, you can roll up smaller regional databases into bigger ones; it's very flexible. Cheers, Brian ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
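The secondary log marker described above, which is also close to the journal-playback and WAL-shipping ideas raised earlier in the thread, can be modeled in a few lines. This is a toy sketch under stated assumptions, not how Sybase or PostgreSQL actually implement it: `PrimaryLog`, `RepAgent`, `Replica`, `checkpoint_lsn`, and `replication_marker` are hypothetical names, there is no networking or fsync, and the replica is simply assumed to have durably stored whatever it received. The key rule is in `truncate()`: the log can only be cut back to the smaller of the local recovery point and the replication marker.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int          # log sequence number (position in the log)
    xid: int          # transaction id
    op: str           # "write" or "commit"
    payload: str = ""


class PrimaryLog:
    """Primary's log with a second truncation point owned by replication."""

    def __init__(self):
        self.records = []
        self.next_lsn = 0
        self.checkpoint_lsn = 0       # hypothetical: local recovery needs records >= this
        self.replication_marker = 0   # hypothetical: replica still needs records >= this

    def append(self, xid, op, payload=""):
        self.records.append(LogRecord(self.next_lsn, xid, op, payload))
        self.next_lsn += 1

    def truncate(self):
        # A record may be discarded only when BOTH local recovery and the
        # replication reader are past it.
        cutoff = min(self.checkpoint_lsn, self.replication_marker)
        self.records = [r for r in self.records if r.lsn >= cutoff]


class Replica:
    """Buffers writes per transaction and applies them when the commit arrives."""

    def __init__(self):
        self.open_txns = {}   # xid -> list of (lsn, payload)
        self.rows = []        # applied changes, in commit order
        self.last_lsn = -1

    def receive(self, rec):
        self.last_lsn = rec.lsn
        if rec.op == "write":
            self.open_txns.setdefault(rec.xid, []).append((rec.lsn, rec.payload))
        elif rec.op == "commit":
            for _, payload in self.open_txns.pop(rec.xid, []):
                self.rows.append(payload)

    def safe_lsn(self):
        # Conservative: the oldest LSN still needed is the first write of any
        # transaction whose commit the replica has not seen yet.
        firsts = [changes[0][0] for changes in self.open_txns.values()]
        return min(firsts) if firsts else self.last_lsn + 1


class RepAgent:
    """Reads new log records, forwards them, then advances the marker."""

    def __init__(self, log, replica):
        self.log = log
        self.replica = replica
        self.next_to_send = 0

    def ship(self):
        for rec in self.log.records:
            if rec.lsn >= self.next_to_send:
                self.replica.receive(rec)
                self.next_to_send = rec.lsn + 1
        # Assumes the replica has durably stored everything it received.
        self.log.replication_marker = self.replica.safe_lsn()


def demo():
    log, replica = PrimaryLog(), Replica()
    agent = RepAgent(log, replica)
    log.append(1, "write", "row A")   # lsn 0; transaction 1 stays open for now
    log.append(2, "write", "row B")   # lsn 1
    log.append(2, "commit")           # lsn 2
    agent.ship()
    log.checkpoint_lsn = 3            # local recovery no longer needs lsn 0-2...
    log.truncate()                    # ...but transaction 1 is unreplicated: keep all
    kept_first = len(log.records)
    log.append(1, "commit")           # lsn 3
    agent.ship()
    log.checkpoint_lsn = 4
    log.truncate()
    return replica.rows, kept_first, log.replication_marker, len(log.records)


print(demo())  # (['row B', 'row A'], 3, 4, 0)
```

Transaction 2 commits first in the log and is therefore applied first on the replica, even though transaction 1 wrote first; applying whole transactions in commit order is one common way to keep a log-driven replica consistent. The publish/subscribe routing and per-table filtering described above are not modeled.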