diff --git a/doc/TODO b/doc/TODO index c9f7b7e9b6..84c3128529 100644 --- a/doc/TODO +++ b/doc/TODO @@ -155,7 +155,7 @@ EXOTIC FEATURES * Add sql3 recursive unions * Add the concept of dataspaces -* Add replication of distributed databases +* Add replication of distributed databases [replication] * Allow queries across multiple databases * Allow nested transactions (Vadim) @@ -198,7 +198,7 @@ FSYNC INDEXES -* Use indexes in ORDER BY for min(), max() +* Use indexes to find min() and max() * Use index to restrict rows returned by multi-key index when used with non-consecutive keys or OR clauses, so fewer heap accesses * Allow SELECT * FROM tab WHERE int2col = 4 use int2col index, int8, diff --git a/doc/TODO.detail/replication b/doc/TODO.detail/replication new file mode 100644 index 0000000000..d18f7db52d --- /dev/null +++ b/doc/TODO.detail/replication @@ -0,0 +1,907 @@ +From goran@kirra.net Mon Dec 20 14:30:54 1999 +Received: from villa.bildbasen.se (villa.bildbasen.se [193.45.225.97]) + by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id PAA29058 + for ; Mon, 20 Dec 1999 15:30:17 -0500 (EST) +Received: (qmail 2485 invoked from network); 20 Dec 1999 20:29:53 -0000 +Received: from a112.dial.kiruna.se (HELO kirra.net) (193.45.238.12) + by villa.bildbasen.se with SMTP; 20 Dec 1999 20:29:53 -0000 +Sender: goran +Message-ID: <385E9192.226CC37D@kirra.net> +Date: Mon, 20 Dec 1999 21:29:06 +0100 +From: Goran Thyni +Organization: kirra.net +X-Mailer: Mozilla 4.6 [en] (X11; U; Linux 2.2.13 i586) +X-Accept-Language: sv, en +MIME-Version: 1.0 +To: Bruce Momjian +CC: "neil d. quiogue" , + PostgreSQL-development +Subject: Re: [HACKERS] Re: QUESTION: Replication +References: <199912201508.KAA20572@candle.pha.pa.us> +Content-Type: text/plain; charset=iso-8859-1 +Content-Transfer-Encoding: 8bit +Status: OR + +Bruce Momjian wrote: +> We need major work in this area, or at least a plan and an FAQ item. +> We are getting major questions on this, and I don't know enough even to +> make an FAQ item telling people their options. + +My 2 cents, or 2 ören since I'm a Swede, on this: + +It is pretty simple to build a replication with pg_dump, transfer, +empty replic and reload. +But if we want "live replicas" we better base our efforts on a +mechanism using WAL-logs to rollforward the replicas. + +regards, +----------------- +Göran Thyni +On quiet nights you can hear Windows NT reboot! 
+ +From owner-pgsql-hackers@hub.org Fri Dec 24 10:01:18 1999 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA11295 + for ; Fri, 24 Dec 1999 11:01:17 -0500 (EST) +Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id KAA20310 for ; Fri, 24 Dec 1999 10:39:18 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id KAA61760; + Fri, 24 Dec 1999 10:31:13 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 10:30:48 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id KAA58879 + for pgsql-hackers-outgoing; Fri, 24 Dec 1999 10:29:51 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from bocs170n.black-oak.COM ([38.149.137.131]) + by hub.org (8.9.3/8.9.3) with ESMTP id KAA58795 + for ; Fri, 24 Dec 1999 10:29:00 -0500 (EST) + (envelope-from DWalker@black-oak.com) +From: DWalker@black-oak.com +To: pgsql-hackers@postgreSQL.org +Subject: [HACKERS] database replication +Date: Fri, 24 Dec 1999 10:27:59 -0500 +Message-ID: +X-Priority: 3 (Normal) +X-MIMETrack: Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99 + 10:28:01 AM +MIME-Version: 1.0 +MIME-Version: 1.0 +Content-Type: text/html; charset=ISO-8859-1 +Content-Transfer-Encoding: quoted-printable +Sender: owner-pgsql-hackers@postgreSQL.org +Status: OR + +

+I've been toying with the idea of implementing database replication for
+the last few days.  The system I'm proposing will be a separate program
+which can be run on any machine and will most likely be implemented in
+Python.  What I'm looking for at this point are gaping holes in my
+thinking/logic/etc.  Here's what I'm thinking...
+
+1) I want to make this program an additional layer over PostgreSQL.  I
+really don't want to hack server code if I can get away with it.  At this
+point I don't feel I need to.
+
+2) The replication system will need to add at least one field to each
+table in each database that needs to be replicated.  This field will be a
+date/time stamp which identifies the "last update" of the record.  This
+field will be called PGR_TIME for lack of a better name.  Because this
+field will be used from within programs and triggers it can be longer so
+as to not mistake it for a user field.
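+
+In SQL terms the schema change would be something like this (a rough
+sketch; the table name is illustrative):
+
+    ALTER TABLE mytable ADD COLUMN pgr_time timestamp;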

+3) For each table to be replicated the replication system will
+programatically add one plpgsql function and trigger to modify the
+PGR_TIME field on both UPDATEs and INSERTs.  The name of this function
+and trigger will be along the lines of
+<table_name>_replication_update_trigger and
+<table_name>_replication_update_function.  The function is a simple
+two-line chunk of code to set the field PGR_TIME equal to NOW.  The
+trigger is called before each insert/update.  When looking at the Docs I
+see that times are stored in Zulu (GT) time.  Because of this I don't
+have to worry about time zones and the like.  I need direction on this
+part (such as "hey dummy, look at page N of file X.").
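+
+A rough sketch of what such a generated pair might look like (the table
+name is illustrative, and RETURNS opaque was the trigger-function
+convention of this era -- later releases spell it RETURNS trigger):
+
+    CREATE FUNCTION mytable_replication_update_function() RETURNS opaque
+    AS '
+    BEGIN
+        -- stamp the row with the time of this INSERT or UPDATE
+        NEW.pgr_time := now();
+        RETURN NEW;
+    END;
+    ' LANGUAGE 'plpgsql';
+
+    CREATE TRIGGER mytable_replication_update_trigger
+        BEFORE INSERT OR UPDATE ON mytable
+        FOR EACH ROW
+        EXECUTE PROCEDURE mytable_replication_update_function();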

+4) At this point we have tables which can, at a basic level, tell the
+replication system when they were last updated.
+
+5) The replication system will have a database of its own to record the
+last replication event, hold configuration, logs, etc.  I'd prefer to
+store the configuration in a PostgreSQL table but it could just as easily
+be stored in a text file on the filesystem somewhere.
+
+6) To handle replication I basically check the local "last replication
+time" and compare it against the remote PGR_TIME fields.  If the remote
+PGR_TIME is greater than the last replication time then change the local
+copy of the database, otherwise, change the remote end of the database.
+At this point I don't have a way to know WHICH field changed between the
+two replicas so either I do ROW level replication or I check each field.
+I check PGR_TIME to determine which field is the most current.  Some fine
+tuning of this process will have to occur no doubt.
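+
+The heart of that check could be a query along these lines, run against
+each side (assuming the replication system keeps the last sync time per
+table in its own database; all names here are illustrative):
+
+    -- rows changed here since the last successful sync
+    SELECT * FROM mytable
+     WHERE pgr_time > (SELECT last_sync
+                         FROM replication_status
+                        WHERE tablename = 'mytable');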

+7) The commandline utility, fired off by something like cron, could run
+several times during the day -- command line parameters can be
+implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
+FROM SERVER B.
+
+Questions/Concerns:

+1) How far do I go with this?  Do I start manhandling the system
+catalogs (pg_* tables)?
+
+2) As to #2 and #3 above, I really don't like tools automagically
+changing my tables but at this point I don't see a way around it.  I
+guess this is where the testing comes into play.
+
+3) Security: the replication app will have to have pretty good rights
+to the database so it can add the nessecary functions and triggers,
+modify table schema, etc.

 

+So, any "you're insane and should run home to momma" comments?

 

+              Damond

= + +************ + +From owner-pgsql-hackers@hub.org Fri Dec 24 18:31:03 1999 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA26244 + for ; Fri, 24 Dec 1999 19:31:02 -0500 (EST) +Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id TAA12730 for ; Fri, 24 Dec 1999 19:30:05 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id TAA57851; + Fri, 24 Dec 1999 19:23:31 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 19:22:54 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id TAA57710 + for pgsql-hackers-outgoing; Fri, 24 Dec 1999 19:21:56 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from Mail.austin.rr.com (sm2.texas.rr.com [24.93.35.55]) + by hub.org (8.9.3/8.9.3) with ESMTP id TAA57680 + for ; Fri, 24 Dec 1999 19:21:25 -0500 (EST) + (envelope-from ELOEHR@austin.rr.com) +Received: from austin.rr.com ([24.93.40.248]) by Mail.austin.rr.com with Microsoft SMTPSVC(5.5.1877.197.19); + Fri, 24 Dec 1999 18:12:50 -0600 +Message-ID: <38640E2D.75136600@austin.rr.com> +Date: Fri, 24 Dec 1999 18:22:05 -0600 +From: Ed Loehr +X-Mailer: Mozilla 4.7 [en] (X11; U; Linux 2.2.12-20smp i686) +X-Accept-Language: en +MIME-Version: 1.0 +To: DWalker@black-oak.com +CC: pgsql-hackers@postgreSQL.org +Subject: Re: [HACKERS] database replication +References: +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@postgreSQL.org +Status: OR + +DWalker@black-oak.com wrote: + +> 6) To handle replication I basically check the local "last +> replication time" and compare it against the remote PGR_TIME +> fields. If the remote PGR_TIME is greater than the last replication +> time then change the local copy of the database, otherwise, change +> the remote end of the database. At this point I don't have a way to +> know WHICH field changed between the two replicas so either I do ROW +> level replication or I check each field. I check PGR_TIME to +> determine which field is the most current. Some fine tuning of this +> process will have to occur no doubt. + +Interesting idea. I can see how this might sync up two databases +somehow. For true replication, however, I would always want every +replicated database to be, at the very least, internally consistent +(i.e., referential integrity), even if it was a little behind on +processing transactions. In this method, its not clear how +consistency is every achieved/guaranteed at any point in time if the +input stream of changes is continuous. If the input stream ceased, +then I can see how this approach might eventually catch up and totally +resync everything, but it looks *very* computationally expensive. + +But I might have missed something. How would internal consistency be +maintained? + + +> 7) The commandline utility, fired off by something like cron, could +> run several times during the day -- command line parameters can be +> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES +> FROM SERVER B. + +My two cents is that, while I can see this kind of database syncing as +valuable, this is not the kind of "replication" I had in mind. This +may already possible by simply copying the database. 
What replication +means to me is a live, continuously streaming sequence of updates from +one database to another where the replicated database is always +internally consistent, available for read-only queries, and never "too +far" out of sync with the source/primary database. + +What does replication mean to others? + +Cheers, +Ed Loehr + + + +************ + +From owner-pgsql-hackers@hub.org Fri Dec 24 21:31:10 1999 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA02578 + for ; Fri, 24 Dec 1999 22:31:09 -0500 (EST) +Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id WAA16641 for ; Fri, 24 Dec 1999 22:18:56 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id WAA89135; + Fri, 24 Dec 1999 22:11:12 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 22:10:56 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id WAA89019 + for pgsql-hackers-outgoing; Fri, 24 Dec 1999 22:09:59 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from bocs170n.black-oak.COM ([38.149.137.131]) + by hub.org (8.9.3/8.9.3) with ESMTP id WAA88957; + Fri, 24 Dec 1999 22:09:11 -0500 (EST) + (envelope-from dwalker@black-oak.com) +Received: from gcx80 ([151.196.99.113]) + by bocs170n.black-oak.COM (Lotus Domino Release 5.0.1) + with SMTP id 1999122422080835:6 ; + Fri, 24 Dec 1999 22:08:08 -0500 +Message-ID: <001b01bf4e9e$647287d0$af63a8c0@walkers.org> +From: "Damond Walker" +To: +Cc: +References: <38640E2D.75136600@austin.rr.com> +Subject: Re: [HACKERS] database replication +Date: Fri, 24 Dec 1999 22:07:55 -0800 +MIME-Version: 1.0 +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook Express 5.00.2314.1300 +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +X-MIMETrack: Itemize by SMTP Server on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99 + 10:08:09 PM, + Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99 + 10:08:11 PM, + Serialize complete at 12/24/99 10:08:11 PM +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; + charset="iso-8859-1" +Sender: owner-pgsql-hackers@postgreSQL.org +Status: OR + +> +> Interesting idea. I can see how this might sync up two databases +> somehow. For true replication, however, I would always want every +> replicated database to be, at the very least, internally consistent +> (i.e., referential integrity), even if it was a little behind on +> processing transactions. In this method, its not clear how +> consistency is every achieved/guaranteed at any point in time if the +> input stream of changes is continuous. If the input stream ceased, +> then I can see how this approach might eventually catch up and totally +> resync everything, but it looks *very* computationally expensive. +> + + What's the typical unit of work for the database? Are we talking about +update transactions which span the entire DB? Or are we talking about +updating maybe 1% or less of the database everyday? I'd think it would be +more towards the latter than the former. So, yes, this process would be +computationally expensive but how many records would actually have to be +sent back and forth? + +> But I might have missed something. How would internal consistency be +> maintained? +> + + Updates that occur at site A will be moved to site B and vice versa. 
+Consistency would be maintained. The only problem that I can see right off +the bat would be what if site A and site B made changes to a row and then +site C was brought into the picture? Which one wins? + + Someone *has* to win when it comes to this type of thing. You really +DON'T want to start merging row changes... + +> +> My two cents is that, while I can see this kind of database syncing as +> valuable, this is not the kind of "replication" I had in mind. This +> may already possible by simply copying the database. What replication +> means to me is a live, continuously streaming sequence of updates from +> one database to another where the replicated database is always +> internally consistent, available for read-only queries, and never "too +> far" out of sync with the source/primary database. +> + + Sounds like you're talking about distributed transactions to me. That's +an entirely different subject all-together. What you describe can be done +by copying a database...but as you say, this would only work in a read-only +situation. + + + Damond + + +************ + +From owner-pgsql-hackers@hub.org Sat Dec 25 16:35:07 1999 +Received: from hub.org (hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA28890 + for ; Sat, 25 Dec 1999 17:35:05 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id RAA86997; + Sat, 25 Dec 1999 17:29:10 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Sat, 25 Dec 1999 17:28:09 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id RAA86863 + for pgsql-hackers-outgoing; Sat, 25 Dec 1999 17:27:11 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from mtiwmhc08.worldnet.att.net (mtiwmhc08.worldnet.att.net [204.127.131.19]) + by hub.org (8.9.3/8.9.3) with ESMTP id RAA86798 + for ; Sat, 25 Dec 1999 17:26:34 -0500 (EST) + (envelope-from pgsql@rkirkpat.net) +Received: from [192.168.3.100] ([12.74.72.219]) + by mtiwmhc08.worldnet.att.net (InterMail v03.02.07.07 118-134) + with ESMTP id <19991225222554.VIOL28505@[12.74.72.219]>; + Sat, 25 Dec 1999 22:25:54 +0000 +Date: Sat, 25 Dec 1999 15:25:47 -0700 (MST) +From: Ryan Kirkpatrick +X-Sender: rkirkpat@excelsior.rkirkpat.net +To: DWalker@black-oak.com +cc: pgsql-hackers@postgreSQL.org +Subject: Re: [HACKERS] database replication +In-Reply-To: +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Sender: owner-pgsql-hackers@postgreSQL.org +Status: OR + +On Fri, 24 Dec 1999 DWalker@black-oak.com wrote: + +> I've been toying with the idea of implementing database replication +> for the last few days. + + I too have been thinking about this some over the last year or +two, just trying to find a quick and easy way to do it. I am not so +interested in replication, as in synchronization, as in between a desktop +machine and a laptop, so I can keep the databases on each in sync with +each other. For this sort of purpose, both the local and remote databases +would be "idle" at the time of syncing. + +> 2) The replication system will need to add at least one field to each +> table in each database that needs to be replicated. This field will be +> a date/time stamp which identifies the "last update" of the record. +> This field will be called PGR_TIME for lack of a better name. +> Because this field will be used from within programs and triggers it +> can be longer so as to not mistake it for a user field. 
+ + How about a single, seperate table with the fields of 'database', +'tablename', 'oid', 'last_changed', that would store the same data as your +PGR_TIME field. It would be seperated from the actually data tables, and +therefore would be totally transparent to any database interface +applications. The 'oid' field would hold each row's OID, a nice, unique +identification number for the row, while the other fields would tell which +table and database the oid is in. Then this table can be compared with the +this table on a remote machine to quickly find updates and changes, then +each differences can be dealt with in turn. + +> 3) For each table to be replicated the replication system will +> programatically add one plpgsql function and trigger to modify the +> PGR_TIME field on both UPDATEs and INSERTs. The name of this function +> and trigger will be along the lines of +> _replication_update_trigger and +> _replication_update_function. The function is a simple +> two-line chunk of code to set the field PGR_TIME equal to NOW. The +> trigger is called before each insert/update. When looking at the Docs +> I see that times are stored in Zulu (GT) time. Because of this I +> don't have to worry about time zones and the like. I need direction +> on this part (such as "hey dummy, look at page N of file X."). + + I like this idea, better than any I have come up with yet. Though, +how are you going to handle DELETEs? + +> 6) To handle replication I basically check the local "last replication +> time" and compare it against the remote PGR_TIME fields. If the +> remote PGR_TIME is greater than the last replication time then change +> the local copy of the database, otherwise, change the remote end of +> the database. At this point I don't have a way to know WHICH field +> changed between the two replicas so either I do ROW level replication +> or I check each field. I check PGR_TIME to determine which field is +> the most current. Some fine tuning of this process will have to occur +> no doubt. + + Yea, this is indeed the sticky part, and would indeed require some +fine-tunning. Basically, the way I see it, is if the two timestamps for a +single row do not match (or even if the row and therefore timestamp is +missing on one side or the other altogether): + local ts > remote ts => Local row is exported to remote. + remote ts > local ts => Remote row is exported to local. + local ts > last sync time && no remote ts => + Local row is inserted on remote. + local ts < last sync time && no remote ts => + Local row is deleted. + remote ts > last sync time && no local ts => + Remote row is inserted on local. + remote ts < last sync time && no local ts => + Remote row is deleted. +where the synchronization process is running on the local machine. By +exported, I mean the local values are sent to the remote machine, and the +row on that remote machine is updated to the local values. How does this +sound? + +> 7) The commandline utility, fired off by something like cron, could +> run several times during the day -- command line parameters can be +> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES +> FROM SERVER B. + + Or run manually for my purposes. Also, maybe follow it +with a vacuum run on both sides for all databases, as this is going to +potenitally cause lots of table changes that could stand with a cleanup. + +> 1) How far do I go with this? Do I start manhandling the system catalogs (pg_* tables)? + + Initially, I would just stick to user table data... 
If you have +changes in triggers and other meta-data/executable code, you are going to +want to make syncs of that stuff manually anyway. At least I would want +to. + +> 2) As to #2 and #3 above, I really don't like tools automagically +> changing my tables but at this point I don't see a way around it. I +> guess this is where the testing comes into play. + + Hence the reason for the seperate table with just a row's +identification and last update time. Only modifications to the synced +database is the update trigger, which should be pretty harmless. + +> 3) Security: the replication app will have to have pretty good rights +> to the database so it can add the nessecary functions and triggers, +> modify table schema, etc. + + Just run the sync program as the postgres super user, and there +are no problems. :) + +> So, any "you're insane and should run home to momma" comments? + + No, not at all. Though it probably should be remaned from +replication to synchronization. The former is usually associated with a +continuous stream of updates between the local and remote databases, so +they are almost always in sync, and have a queuing ability if their +connection is loss for span of time as well. Very complex and difficult to +implement, and would require hacking server code. :( Something only Sybase +and Oracle have (as far as I know), and from what I have seen of Sybase's +replication server support (dated by 5yrs) it was a pain to setup and get +running correctly. + The latter, synchronization, is much more managable, and can still +be useful, especially when you have a large database you want in two +places, mainly for read only purposes at one end or the other, but don't +want to waste the time/bandwidth to move and load the entire database each +time it changes on one end or the other. Same idea as mirroring software +for FTP sites, just transfers the changes, and nothing more. + I also like the idea of using Python. I have been using it +recently for some database interfaces (to PostgreSQL of course :), and it +is a very nice language to work with. Some worries about performance of +the program though, as python is only an interpreted lanuage, and I have +yet to really be impressed with the speed of execution of my database +interfaces yet. + Anyway, it sound like a good project, and finally one where I +actually have a clue of what is going on, and the skills to help. So, if +you are interested in pursing this project, I would be more than glad to +help. TTYL. + +--------------------------------------------------------------------------- +| "For to me to live is Christ, and to die is gain." 
| +| --- Philippians 1:21 (KJV) | +--------------------------------------------------------------------------- +| Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | +--------------------------------------------------------------------------- + + + +************ + +From owner-pgsql-hackers@hub.org Sun Dec 26 08:31:09 1999 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA17976 + for ; Sun, 26 Dec 1999 09:31:07 -0500 (EST) +Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id JAA23337 for ; Sun, 26 Dec 1999 09:28:36 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id JAA90738; + Sun, 26 Dec 1999 09:21:58 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Sun, 26 Dec 1999 09:19:19 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id JAA90498 + for pgsql-hackers-outgoing; Sun, 26 Dec 1999 09:18:21 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from bocs170n.black-oak.COM ([38.149.137.131]) + by hub.org (8.9.3/8.9.3) with ESMTP id JAA90452 + for ; Sun, 26 Dec 1999 09:17:54 -0500 (EST) + (envelope-from dwalker@black-oak.com) +Received: from vmware98 ([151.196.99.113]) + by bocs170n.black-oak.COM (Lotus Domino Release 5.0.1) + with SMTP id 1999122609164808:7 ; + Sun, 26 Dec 1999 09:16:48 -0500 +Message-ID: <002201bf4fb3$623f0220$b263a8c0@vmware98.walkers.org> +From: "Damond Walker" +To: "Ryan Kirkpatrick" +Cc: +Subject: Re: [HACKERS] database replication +Date: Sun, 26 Dec 1999 10:10:41 -0500 +MIME-Version: 1.0 +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook Express 4.72.3110.1 +X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3 +X-MIMETrack: Itemize by SMTP Server on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/26/99 + 09:16:51 AM, + Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/26/99 + 09:16:54 AM, + Serialize complete at 12/26/99 09:16:54 AM +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; + charset="iso-8859-1" +Sender: owner-pgsql-hackers@postgreSQL.org +Status: OR + +> +> I too have been thinking about this some over the last year or +>two, just trying to find a quick and easy way to do it. I am not so +>interested in replication, as in synchronization, as in between a desktop +>machine and a laptop, so I can keep the databases on each in sync with +>each other. For this sort of purpose, both the local and remote databases +>would be "idle" at the time of syncing. +> + + I don't think it would matter if the databases are idle or not to be +honest with you. At any single point in time when you replicate I'd figure +that the database would be in a consistent state. So, you should be able to +replicate (or sync) a remote database that is in use. After all, you're +getting a snapshot of the database as it stands at 8:45 PM. At 8:46 PM it +may be totally different...but the next time syncing takes place those +changes would appear in your local copy. + + The one problem you may run into is if the remote host is running a +large batch process. It's very likely that you will get 50% of their +changes when you replicate...but then again, that's why you can schedule the +event to work around such things. 
+ +> How about a single, seperate table with the fields of 'database', +>'tablename', 'oid', 'last_changed', that would store the same data as your +>PGR_TIME field. It would be seperated from the actually data tables, and +>therefore would be totally transparent to any database interface +>applications. The 'oid' field would hold each row's OID, a nice, unique +>identification number for the row, while the other fields would tell which +>table and database the oid is in. Then this table can be compared with the +>this table on a remote machine to quickly find updates and changes, then +>each differences can be dealt with in turn. +> + + The problem with OID's is that they are unique at the local level but if +you try and use them between servers you can run into overlap. Also, if a +database is under heavy use this table could quickly become VERY large. Add +indexes to this table to help performance and you're taking up even more +disk space. + + Using the PGR_TIME field with an index will allow us to find rows which +have changed VERY quickly. All we need to do now is somehow programatically +find the primary key for a table so the person setting up replication (or +syncing) doesn't have to have an indepth knowledge of the schema in order to +setup a syncing schedule. + +> +> I like this idea, better than any I have come up with yet. Though, +>how are you going to handle DELETEs? +> + + Oops...how about defining a trigger for this? With deletion I guess we +would have to move a flag into another table saying we deleted record 'X' +with this primary key from this table. + +> +> Yea, this is indeed the sticky part, and would indeed require some +>fine-tunning. Basically, the way I see it, is if the two timestamps for a +>single row do not match (or even if the row and therefore timestamp is +>missing on one side or the other altogether): +> local ts > remote ts => Local row is exported to remote. +> remote ts > local ts => Remote row is exported to local. +> local ts > last sync time && no remote ts => +> Local row is inserted on remote. +> local ts < last sync time && no remote ts => +> Local row is deleted. +> remote ts > last sync time && no local ts => +> Remote row is inserted on local. +> remote ts < last sync time && no local ts => +> Remote row is deleted. +>where the synchronization process is running on the local machine. By +>exported, I mean the local values are sent to the remote machine, and the +>row on that remote machine is updated to the local values. How does this +>sound? +> + + The replication part will be the most complex...that much is for +certain... + + I've been writing systems in Lotus Notes/Domino for the last year or so +and I've grown quite spoiled with what it can do in regards to replication. +It's not real-time but you have to gear your applications to this type of +thing (it's possible to create documents, fire off email to notify people of +changes and have the email arrive before the replicated documents do). +Replicating large Notes/Domino databases takes quite a while....I don't see +any kind of replication or syncing running in a blink of an eye. + + Having said that, a good algo will have to be written to cut down on +network traffic and to keep database conversations down to a minimum. This +will be appreciated by people with low bandwidth connections I'm sure +(dial-ups, fractional T1's, etc). + +> Or run manually for my purposes. 
Also, maybe follow it +>with a vacuum run on both sides for all databases, as this is going to +>potenitally cause lots of table changes that could stand with a cleanup. +> + + What would a vacuum do to a system being used by many people? + +> No, not at all. Though it probably should be remaned from +>replication to synchronization. The former is usually associated with a +>continuous stream of updates between the local and remote databases, so +>they are almost always in sync, and have a queuing ability if their +>connection is loss for span of time as well. Very complex and difficult to +>implement, and would require hacking server code. :( Something only Sybase +>and Oracle have (as far as I know), and from what I have seen of Sybase's +>replication server support (dated by 5yrs) it was a pain to setup and get +>running correctly. + + It could probably be named either way...but the one thing I really don't +want to do is start hacking server code. The PostgreSQL people have enough +to do without worrying about trying to meld anything I've done to their +server. :) + + Besides, I like the idea of having it operate as a stand-alone product. +The only PostgreSQL feature we would require would be triggers and +plpgsql...what was the earliest version of PostgreSQL that supported +plpgsql? Even then I don't see the triggers being that complex to boot. + +> I also like the idea of using Python. I have been using it +>recently for some database interfaces (to PostgreSQL of course :), and it +>is a very nice language to work with. Some worries about performance of +>the program though, as python is only an interpreted lanuage, and I have +>yet to really be impressed with the speed of execution of my database +>interfaces yet. + + The only thing we'd need for Python is the Python extensions for +PostgreSQL...which in turn requires libpq and that's about it. So, it +should be able to run on any platform supported by Python and libpq. Using +TK for the interface components will require NT people to get additional +software from the 'net. At least it did with older version of Windows +Python. Unix folks should be happy....assuming they have X running on the +machine doing the replication or syncing. Even then I wrote a curses based +Python interface awhile back which allows buttons, progress bars, input +fields, etc (I called it tinter and it's available at +http://iximd.com/~dwalker). It's a simple interface and could probably be +cleaned up a bit but it works. :) + +> Anyway, it sound like a good project, and finally one where I +>actually have a clue of what is going on, and the skills to help. So, if +>you are interested in pursing this project, I would be more than glad to +>help. TTYL. +> + + + That would be a Good Thing. Have webspace somewhere? If I can get +permission from the "powers that be" at the office I could host a website on +our (Domino) webserver. 
+ + Damond + + +************ + +From owner-pgsql-hackers@hub.org Sun Dec 26 19:11:48 1999 +Received: from hub.org (hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA26661 + for ; Sun, 26 Dec 1999 20:11:46 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id UAA14959; + Sun, 26 Dec 1999 20:08:15 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Sun, 26 Dec 1999 20:07:27 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id UAA14820 + for pgsql-hackers-outgoing; Sun, 26 Dec 1999 20:06:28 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from mtiwmhc02.worldnet.att.net (mtiwmhc02.worldnet.att.net [204.127.131.37]) + by hub.org (8.9.3/8.9.3) with ESMTP id UAA14749 + for ; Sun, 26 Dec 1999 20:05:39 -0500 (EST) + (envelope-from rkirkpat@rkirkpat.net) +Received: from [192.168.3.100] ([12.74.72.56]) + by mtiwmhc02.worldnet.att.net (InterMail v03.02.07.07 118-134) + with ESMTP id <19991227010506.WJVW1914@[12.74.72.56]>; + Mon, 27 Dec 1999 01:05:06 +0000 +Date: Sun, 26 Dec 1999 18:05:02 -0700 (MST) +From: Ryan Kirkpatrick +X-Sender: rkirkpat@excelsior.rkirkpat.net +To: Damond Walker +cc: pgsql-hackers@postgreSQL.org +Subject: Re: [HACKERS] database replication +In-Reply-To: <002201bf4fb3$623f0220$b263a8c0@vmware98.walkers.org> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Sender: owner-pgsql-hackers@postgreSQL.org +Status: OR + +On Sun, 26 Dec 1999, Damond Walker wrote: + +> > How about a single, seperate table with the fields of 'database', +> >'tablename', 'oid', 'last_changed', that would store the same data as your +> >PGR_TIME field. It would be seperated from the actually data tables, and +... +> The problem with OID's is that they are unique at the local level but if +> you try and use them between servers you can run into overlap. + + Yea, forgot about that point, but became dead obvious once you +mentioned it. Boy, I feel stupid now. :) + +> Using the PGR_TIME field with an index will allow us to find rows which +> have changed VERY quickly. All we need to do now is somehow programatically +> find the primary key for a table so the person setting up replication (or +> syncing) doesn't have to have an indepth knowledge of the schema in order to +> setup a syncing schedule. + + Hmm... Yea, maybe look to see which field(s) has a primary, unique +index on it? Then use those field(s) as a primary key. Just require that +any table to be synchronized to have some set of fields that uniquely +identify each row. Either that, or add another field to each table with +our own, cross system consistent, identification system. Don't know which +would be more efficient and easier to work with. + The former could potentially get sticky if it takes a lots of +fields to generate a unique key value, but has the smallest effect on the +table to be synced. The latter could be difficult to keep straight between +systems (local vs. remote), and would require a trigger on inserts to +generate a new, unique id number, that does not exist locally or +remotely (nasty issue there), but would remove the uniqueness +requirement. + +> Oops...how about defining a trigger for this? With deletion I guess we +> would have to move a flag into another table saying we deleted record 'X' +> with this primary key from this table. 
+ + Or, according to my logic below, if a row is missing on one side +or the other, then just compare the remaining row's timestamp to the last +synchronization time (stored in a seperate table/db elsewhere). The +results of the comparsion and the state of row existences tell one if the +row was inserted or deleted since the last sync, and what should be done +to perform the sync. + +> > Yea, this is indeed the sticky part, and would indeed require some +> >fine-tunning. Basically, the way I see it, is if the two timestamps for a +> >single row do not match (or even if the row and therefore timestamp is +> >missing on one side or the other altogether): +> > local ts > remote ts => Local row is exported to remote. +> > remote ts > local ts => Remote row is exported to local. +> > local ts > last sync time && no remote ts => +> > Local row is inserted on remote. +> > local ts < last sync time && no remote ts => +> > Local row is deleted. +> > remote ts > last sync time && no local ts => +> > Remote row is inserted on local. +> > remote ts < last sync time && no local ts => +> > Remote row is deleted. +> >where the synchronization process is running on the local machine. By +> >exported, I mean the local values are sent to the remote machine, and the +> >row on that remote machine is updated to the local values. How does this +> >sound? + +> Having said that, a good algo will have to be written to cut down on +> network traffic and to keep database conversations down to a minimum. This +> will be appreciated by people with low bandwidth connections I'm sure +> (dial-ups, fractional T1's, etc). + + Of course! In reflection, the assigned identification number I +mentioned above might be the best then, instead of having to transfer the +entire set of key fields back and forth. + +> What would a vacuum do to a system being used by many people? + + Probably lock them out of tables while they are vacuumed... Maybe +not really required in the end, possibly optional? + +> It could probably be named either way...but the one thing I really don't +> want to do is start hacking server code. The PostgreSQL people have enough +> to do without worrying about trying to meld anything I've done to their +> server. :) + + Yea, they probably would appreciate that. They already have enough +on thier plate for 7.x as it is! :) + +> Besides, I like the idea of having it operate as a stand-alone product. +> The only PostgreSQL feature we would require would be triggers and +> plpgsql...what was the earliest version of PostgreSQL that supported +> plpgsql? Even then I don't see the triggers being that complex to boot. + + No, provided that we don't do the identification number idea +(which the more I think about it, probably will not work). As for what +version support plpgsql, I don't know, one of the more hard-core pgsql +hackers can probably tell us that. + +> The only thing we'd need for Python is the Python extensions for +> PostgreSQL...which in turn requires libpq and that's about it. So, it +> should be able to run on any platform supported by Python and libpq. + + Of course. If it ran on NT as well as Linux/Unix, that would be +even better. :) + +> Unix folks should be happy....assuming they have X running on the +> machine doing the replication or syncing. Even then I wrote a curses +> based Python interface awhile back which allows buttons, progress +> bars, input fields, etc (I called it tinter and it's available at +> http://iximd.com/~dwalker). 
It's a simple interface and could +> probably be cleaned up a bit but it works. :) + + Why would we want any type of GUI (X11 or curses) for this sync +program. I imagine just a command line program with a few options (local +machine, remote machine, db name, etc...), and nothing else. + Though I will take a look at your curses interface, as I have been +wanting to make a curses interface to a few db interfaces I have, in a +simple as manner as possible. + +> That would be a Good Thing. Have webspace somewhere? If I can get +> permission from the "powers that be" at the office I could host a website on +> our (Domino) webserver. + + Yea, I got my own web server (www.rkirkpat.net) with 1GB+ of disk +space available, sitting on a decent speed DSL. Even can setup of a +virtual server if we want (i.e. pgsync.rkirkpat.net :). CVS repository, +email lists, etc... possible with some effort (and time). + So, where should we start? TTYL. + + PS. The current pages on my web site are very out of date at the +moment (save for the pgsql information). I hope to have updated ones up +within the week. + +--------------------------------------------------------------------------- +| "For to me to live is Christ, and to die is gain." | +| --- Philippians 1:21 (KJV) | +--------------------------------------------------------------------------- +| Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | +--------------------------------------------------------------------------- + + +************ + +From owner-pgsql-hackers@hub.org Mon Dec 27 12:33:32 1999 +Received: from hub.org (hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA24817 + for ; Mon, 27 Dec 1999 13:33:29 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id NAA53391; + Mon, 27 Dec 1999 13:29:02 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Mon, 27 Dec 1999 13:28:38 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id NAA53248 + for pgsql-hackers-outgoing; Mon, 27 Dec 1999 13:27:40 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from gtv.ca (h139-142-238-17.cg.fiberone.net [139.142.238.17]) + by hub.org (8.9.3/8.9.3) with ESMTP id NAA53170 + for ; Mon, 27 Dec 1999 13:26:40 -0500 (EST) + (envelope-from aaron@genisys.ca) +Received: from stilborne (24.67.90.252.ab.wave.home.com [24.67.90.252]) + by gtv.ca (8.9.3/8.8.7) with SMTP id MAA01200 + for ; Mon, 27 Dec 1999 12:36:39 -0700 +From: "Aaron J. Seigo" +To: pgsql-hackers@hub.org +Subject: Re: [HACKERS] database replication +Date: Mon, 27 Dec 1999 11:23:19 -0700 +X-Mailer: KMail [version 1.0.28] +Content-Type: text/plain +References: <199912271135.TAA10184@netrinsics.com> +In-Reply-To: <199912271135.TAA10184@netrinsics.com> +MIME-Version: 1.0 +Message-Id: <99122711245600.07929@stilborne> +Content-Transfer-Encoding: 8bit +Sender: owner-pgsql-hackers@postgreSQL.org +Status: OR + +hi.. + +> Before anyone starts implementing any database replication, I'd strongly +> suggest doing some research, first: +> +> http://sybooks.sybase.com:80/onlinebooks/group-rs/rsg1150e/rs_admin/@Generic__BookView;cs=default;ts=default + +good idea, but perhaps sybase isn't the best study case.. here's some extremely +detailed online coverage of Oracle 8i's replication, from the oracle online +library: + +http://bach.towson.edu/oracledocs/DOC/server803/A54651_01/toc.htm + +-- +Aaron J. 
Seigo +Sys Admin + +************ + diff --git a/doc/src/FAQ.html b/doc/src/FAQ.html index c29d4c0dff..13e76176c2 100644 --- a/doc/src/FAQ.html +++ b/doc/src/FAQ.html @@ -628,7 +628,7 @@ support configured in your kernel at all.

accessing my PostgreSQL database?

By default, PostgreSQL only allows connections from the local machine -using unix domain sockets. Other machines will not be able to connect +using Unix domain sockets. Other machines will not be able to connect unless you add the -i flag to the postmaster, and enable host-based authentication by modifying the file $PGDATA/pg_hba.conf accordingly. This will allow TCP/IP connections. @@ -852,9 +852,12 @@ Maximum size for a table? unlimited on all operating systems Maximum size for a row? 8k, configurable to 32k Maximum number of rows in a table? unlimited Maximum number of columns table? unlimited -Maximun number of indexes on a table? unlimited +Maximum number of indexes on a table? unlimited +Of course, these are not actually unlimited, but limited to available +disk space.

+ To change the maximum row size, edit include/config.h and change BLCKSZ. To use attributes larger than 8K, you can also use the large object interface.