postgresql/doc/TODO.detail/replication

From goran@kirra.net Mon Dec 20 14:30:54 1999
Received: from villa.bildbasen.se (villa.bildbasen.se [193.45.225.97])
	by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id PAA29058
	for <pgman@candle.pha.pa.us>; Mon, 20 Dec 1999 15:30:17 -0500 (EST)
Received: (qmail 2485 invoked from network); 20 Dec 1999 20:29:53 -0000
Received: from a112.dial.kiruna.se (HELO kirra.net) (193.45.238.12)
  by villa.bildbasen.se with SMTP; 20 Dec 1999 20:29:53 -0000
Sender: goran
Message-ID: <385E9192.226CC37D@kirra.net>
Date: Mon, 20 Dec 1999 21:29:06 +0100
From: Goran Thyni <goran@kirra.net>
Organization: kirra.net
X-Mailer: Mozilla 4.6 [en] (X11; U; Linux 2.2.13 i586)
X-Accept-Language: sv, en
MIME-Version: 1.0
To: Bruce Momjian <pgman@candle.pha.pa.us>
CC: "neil d. quiogue" <nquiogue@ieee.org>,
        PostgreSQL-development <pgsql-hackers@postgreSQL.org>
Subject: Re: [HACKERS] Re: QUESTION: Replication
References: <199912201508.KAA20572@candle.pha.pa.us>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Status: OR

Bruce Momjian wrote:
> We need major work in this area, or at least a plan and an FAQ item.
> We are getting major questions on this, and I don't know enough even to
> make an FAQ item telling people their options.

My 2 cents, or 2 ören since I'm a Swede, on this:

It is pretty simple to build replication with pg_dump: dump, transfer,
empty the replica, and reload.
But if we want "live replicas" we had better base our efforts on a
mechanism that uses WAL logs to roll the replicas forward.
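
A minimal sketch of the dump-and-reload approach, assuming the standard
dropdb, createdb, pg_dump and psql client programs are on the PATH. The
host names, the database name and the helper build_reload_commands are
hypothetical; the function only builds the command lines so they can be
reviewed before anything is run.

```python
import shlex


def build_reload_commands(src_host, dst_host, dbname):
    """Return shell command lines that rebuild dbname on dst_host from src_host."""
    q = shlex.quote
    return [
        # Drop and recreate the replica so the reload starts from an empty database.
        f"dropdb -h {q(dst_host)} {q(dbname)}",
        f"createdb -h {q(dst_host)} {q(dbname)}",
        # Stream a plain-SQL dump from the source straight into the replica.
        f"pg_dump -h {q(src_host)} {q(dbname)} | psql -h {q(dst_host)} {q(dbname)}",
    ]


if __name__ == "__main__":
    # Hypothetical hosts and database, for illustration only.
    for line in build_reload_commands("primary.example", "replica.example", "sales"):
        print(line)
```

The replica is unavailable while it is being reloaded, which is one
reason this only suits occasional snapshots rather than live replicas.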

regards,
-----------------
Göran Thyni
On quiet nights you can hear Windows NT reboot!

From owner-pgsql-hackers@hub.org Fri Dec 24 10:01:18 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA11295
	for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 11:01:17 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id KAA20310 for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 10:39:18 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id KAA61760;
	Fri, 24 Dec 1999 10:31:13 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 10:30:48 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id KAA58879
	for pgsql-hackers-outgoing; Fri, 24 Dec 1999 10:29:51 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from bocs170n.black-oak.COM ([38.149.137.131])
	by hub.org (8.9.3/8.9.3) with ESMTP id KAA58795
	for <pgsql-hackers@postgreSQL.org>; Fri, 24 Dec 1999 10:29:00 -0500 (EST)
	(envelope-from DWalker@black-oak.com)
From: DWalker@black-oak.com
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] database replication
Date: Fri, 24 Dec 1999 10:27:59 -0500
Message-ID: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM>
X-Priority: 3 (Normal)
X-MIMETrack: Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99
	10:28:01 AM
MIME-Version: 1.0
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

I've been toying with the idea of implementing database replication for
the last few days.  The system I'm proposing will be a separate program
which can be run on any machine and will most likely be implemented in
Python.  What I'm looking for at this point are gaping holes in my
thinking/logic/etc.  Here's what I'm thinking...

1) I want to make this program an additional layer over PostgreSQL.  I
really don't want to hack server code if I can get away with it.  At this
point I don't feel I need to.

2) The replication system will need to add at least one field to each
table in each database that needs to be replicated.  This field will be a
date/time stamp which identifies the "last update" of the record.  This
field will be called PGR_TIME for lack of a better name.  Because this
field will be used from within programs and triggers it can be longer so
as to not mistake it for a user field.

3) For each table to be replicated the replication system will
programmatically add one plpgsql function and trigger to modify the
PGR_TIME field on both UPDATEs and INSERTs.  The name of this function
and trigger will be along the lines of
<table_name>_replication_update_trigger and
<table_name>_replication_update_function.  The function is a simple
two-line chunk of code to set the field PGR_TIME equal to NOW.  The
trigger is called before each insert/update.  When looking at the Docs I
see that times are stored in Zulu (GMT) time.  Because of this I don't
have to worry about time zones and the like.  I need direction on this
part (such as "hey dummy, look at page N of file X.").

4) At this point we have tables which can, at a basic level, tell the
replication system when they were last updated.

5) The replication system will have a database of its own to record the
last replication event, hold configuration, logs, etc.  I'd prefer to
store the configuration in a PostgreSQL table but it could just as easily
be stored in a text file on the filesystem somewhere.

6) To handle replication I basically check the local "last replication
time" and compare it against the remote PGR_TIME fields.  If the remote
PGR_TIME is greater than the last replication time then change the local
copy of the database, otherwise, change the remote end of the database.
At this point I don't have a way to know WHICH field changed between the
two replicas so either I do ROW level replication or I check each field.
I check PGR_TIME to determine which field is the most current.  Some fine
tuning of this process will have to occur no doubt.

7) The commandline utility, fired off by something like cron, could run
several times during the day -- command line parameters can be
implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES FROM
SERVER B.

Questions/Concerns:

1) How far do I go with this?  Do I start manhandling the system catalogs
(pg_* tables)?

2) As to #2 and #3 above, I really don't like tools automagically
changing my tables but at this point I don't see a way around it.  I
guess this is where the testing comes into play.

3) Security: the replication app will have to have pretty good rights to
the database so it can add the necessary functions and triggers, modify
table schema, etc.

  So, any "you're insane and should run home to momma" comments?

              Damond
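
The per-table function and trigger from point 3 could be generated along
these lines. This is only a sketch: it uses present-day PostgreSQL syntax
(dollar quoting, "timestamp with time zone"), which is an assumption and
not what a 1999 server accepted, and the helper name
replication_trigger_ddl is hypothetical. The table name is pasted into
the DDL unquoted, so it is assumed to be a trusted, plain identifier.

```python
def replication_trigger_ddl(table):
    """Return DDL that stamps pgr_time on every INSERT or UPDATE of table."""
    # Names follow the <table_name>_replication_update_* pattern from the proposal.
    func = f"{table}_replication_update_function"
    trig = f"{table}_replication_update_trigger"
    return f"""
ALTER TABLE {table} ADD COLUMN pgr_time timestamp with time zone;

CREATE FUNCTION {func}() RETURNS trigger AS $$
BEGIN
    NEW.pgr_time := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER {trig}
    BEFORE INSERT OR UPDATE ON {table}
    FOR EACH ROW EXECUTE PROCEDURE {func}();
""".strip()


if __name__ == "__main__":
    # "orders" is a made-up table name.
    print(replication_trigger_ddl("orders"))
```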

************

From owner-pgsql-hackers@hub.org Fri Dec 24 18:31:03 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA26244
	for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 19:31:02 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id TAA12730 for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 19:30:05 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id TAA57851;
	Fri, 24 Dec 1999 19:23:31 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 19:22:54 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id TAA57710
	for pgsql-hackers-outgoing; Fri, 24 Dec 1999 19:21:56 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from Mail.austin.rr.com (sm2.texas.rr.com [24.93.35.55])
	by hub.org (8.9.3/8.9.3) with ESMTP id TAA57680
	for <pgsql-hackers@postgresql.org>; Fri, 24 Dec 1999 19:21:25 -0500 (EST)
	(envelope-from ELOEHR@austin.rr.com)
Received: from austin.rr.com ([24.93.40.248]) by Mail.austin.rr.com  with Microsoft SMTPSVC(5.5.1877.197.19);
  Fri, 24 Dec 1999 18:12:50 -0600
Message-ID: <38640E2D.75136600@austin.rr.com>
Date: Fri, 24 Dec 1999 18:22:05 -0600
From: Ed Loehr <ELOEHR@austin.rr.com>
X-Mailer: Mozilla 4.7 [en] (X11; U; Linux 2.2.12-20smp i686)
X-Accept-Language: en
MIME-Version: 1.0
To: DWalker@black-oak.com
CC: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] database replication
References: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

DWalker@black-oak.com wrote:

> 6) To handle replication I basically check the local "last
> replication time" and compare it against the remote PGR_TIME
> fields.  If the remote PGR_TIME is greater than the last replication
> time then change the local copy of the database, otherwise, change
> the remote end of the database.  At this point I don't have a way to
> know WHICH field changed between the two replicas so either I do ROW
> level replication or I check each field.  I check PGR_TIME to
> determine which field is the most current.  Some fine tuning of this
> process will have to occur no doubt.

Interesting idea.  I can see how this might sync up two databases
somehow.  For true replication, however, I would always want every
replicated database to be, at the very least, internally consistent
(i.e., referential integrity), even if it was a little behind on
processing transactions.  In this method, it's not clear how
consistency is ever achieved/guaranteed at any point in time if the
input stream of changes is continuous.  If the input stream ceased,
then I can see how this approach might eventually catch up and totally
resync everything, but it looks *very* computationally  expensive.

But I might have missed something.  How would internal consistency be
maintained?


> 7) The commandline utility, fired off by something like cron, could
> run several times during the day -- command line parameters can be
> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
> FROM SERVER B.

My two cents is that, while I can see this kind of database syncing as
valuable, this is not the kind of "replication" I had in mind.  This
may already be possible by simply copying the database.  What replication
means to me is a live, continuously streaming sequence of updates from
one database to another where the replicated database is always
internally consistent, available for read-only queries, and never "too
far" out of sync with the source/primary database.

What does replication mean to others?

Cheers,
Ed Loehr


************

From owner-pgsql-hackers@hub.org Fri Dec 24 21:31:10 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA02578
	for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 22:31:09 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id WAA16641 for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 22:18:56 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id WAA89135;
	Fri, 24 Dec 1999 22:11:12 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 22:10:56 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id WAA89019
	for pgsql-hackers-outgoing; Fri, 24 Dec 1999 22:09:59 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from bocs170n.black-oak.COM ([38.149.137.131])
	by hub.org (8.9.3/8.9.3) with ESMTP id WAA88957;
	Fri, 24 Dec 1999 22:09:11 -0500 (EST)
	(envelope-from dwalker@black-oak.com)
Received: from gcx80 ([151.196.99.113])
          by bocs170n.black-oak.COM (Lotus Domino Release 5.0.1)
          with SMTP id 1999122422080835:6 ;
          Fri, 24 Dec 1999 22:08:08 -0500
Message-ID: <001b01bf4e9e$647287d0$af63a8c0@walkers.org>
From: "Damond Walker" <dwalker@black-oak.com>
To: <owner-pgsql-hackers@postgreSQL.org>
Cc: <pgsql-hackers@postgreSQL.org>
References: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM> <38640E2D.75136600@austin.rr.com>
Subject: Re: [HACKERS] database replication
Date: Fri, 24 Dec 1999 22:07:55 -0800
MIME-Version: 1.0
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-MIMETrack: Itemize by SMTP Server on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99
	10:08:09 PM,
	Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99
	10:08:11 PM,
	Serialize complete at 12/24/99 10:08:11 PM
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

>
> Interesting idea.  I can see how this might sync up two databases
> somehow.  For true replication, however, I would always want every
> replicated database to be, at the very least, internally consistent
> (i.e., referential integrity), even if it was a little behind on
> processing transactions.  In this method, it's not clear how
> consistency is ever achieved/guaranteed at any point in time if the
> input stream of changes is continuous.  If the input stream ceased,
> then I can see how this approach might eventually catch up and totally
> resync everything, but it looks *very* computationally  expensive.
>

    What's the typical unit of work for the database?  Are we talking about
update transactions which span the entire DB?  Or are we talking about
updating maybe 1% or less of the database every day?  I'd think it would be
more towards the latter than the former.  So, yes, this process would be
computationally expensive but how many records would actually have to be
sent back and forth?

> But I might have missed something.  How would internal consistency be
> maintained?
>

    Updates that occur at site A will be moved to site B and vice versa.
Consistency would be maintained.  The only problem that I can see right off
the bat would be what if site A and site B made changes to a row and then
site C was brought into the picture?  Which one wins?

    Someone *has* to win when it comes to this type of thing.  You really
DON'T want to start merging row changes...
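
One way to make sure "someone wins" is a last-writer-wins rule on the
PGR_TIME stamp. The sketch below is an illustration under assumptions,
not a settled design: it assumes the sites' clocks are comparable, and
the tie-break on site name is an invented convention so that every site
picks the same winner. The helper pick_winner is hypothetical.

```python
from datetime import datetime


def pick_winner(versions):
    """versions maps site name -> (pgr_time, row). Return (site, row) of the winner."""
    # Newest timestamp wins; the site name breaks exact ties deterministically.
    site = max(versions, key=lambda s: (versions[s][0], s))
    return site, versions[site][1]


if __name__ == "__main__":
    # Made-up rows from three sites that all changed the same record.
    versions = {
        "A": (datetime(1999, 12, 24, 10, 0), {"qty": 5}),
        "B": (datetime(1999, 12, 24, 10, 5), {"qty": 7}),
        "C": (datetime(1999, 12, 24, 9, 0), {"qty": 1}),
    }
    print(pick_winner(versions))
```

Whole rows are replaced by the winning copy; nothing is merged column by
column.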

>
> My two cents is that, while I can see this kind of database syncing as
> valuable, this is not the kind of "replication" I had in mind.  This
> may already be possible by simply copying the database.  What replication
> means to me is a live, continuously streaming sequence of updates from
> one database to another where the replicated database is always
> internally consistent, available for read-only queries, and never "too
> far" out of sync with the source/primary database.
>

    Sounds like you're talking about distributed transactions to me.  That's
an entirely different subject altogether.  What you describe can be done
by copying a database...but as you say, this would only work in a read-only
situation.


                Damond


************

From owner-pgsql-hackers@hub.org Sat Dec 25 16:35:07 1999
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA28890
	for <pgman@candle.pha.pa.us>; Sat, 25 Dec 1999 17:35:05 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id RAA86997;
	Sat, 25 Dec 1999 17:29:10 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Sat, 25 Dec 1999 17:28:09 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id RAA86863
	for pgsql-hackers-outgoing; Sat, 25 Dec 1999 17:27:11 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from mtiwmhc08.worldnet.att.net (mtiwmhc08.worldnet.att.net [204.127.131.19])
	by hub.org (8.9.3/8.9.3) with ESMTP id RAA86798
	for <pgsql-hackers@postgreSQL.org>; Sat, 25 Dec 1999 17:26:34 -0500 (EST)
	(envelope-from pgsql@rkirkpat.net)
Received: from [192.168.3.100] ([12.74.72.219])
          by mtiwmhc08.worldnet.att.net (InterMail v03.02.07.07 118-134)
          with ESMTP id <19991225222554.VIOL28505@[12.74.72.219]>;
          Sat, 25 Dec 1999 22:25:54 +0000
Date: Sat, 25 Dec 1999 15:25:47 -0700 (MST)
From: Ryan Kirkpatrick <pgsql@rkirkpat.net>
X-Sender: rkirkpat@excelsior.rkirkpat.net
To: DWalker@black-oak.com
cc: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] database replication
In-Reply-To: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM>
Message-ID: <Pine.LNX.4.10.9912251433310.1551-100000@excelsior.rkirkpat.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

On Fri, 24 Dec 1999 DWalker@black-oak.com wrote:

> I've been toying with the idea of implementing database replication
> for the last few days.

	I too have been thinking about this some over the last year or
two, just trying to find a quick and easy way to do it. I am not so much
interested in replication as in synchronization, such as between a desktop
machine and a laptop, so I can keep the databases on each in sync with
each other. For this sort of purpose, both the local and remote databases
would be "idle" at the time of syncing.

> 2) The replication system will need to add at least one field to each
> table in each database that needs to be replicated. This field will be
> a date/time stamp which identifies the "last update" of the record.
> This field will be called PGR_TIME for lack of a better name.
> Because this field will be used from within programs and triggers it
> can be longer so as to not mistake it for a user field.

	How about a single, separate table with the fields 'database',
'tablename', 'oid', 'last_changed', that would store the same data as your
PGR_TIME field. It would be separated from the actual data tables, and
therefore would be totally transparent to any database interface
applications. The 'oid' field would hold each row's OID, a nice, unique
identification number for the row, while the other fields would tell which
table and database the oid is in. Then this table can be compared with the
same table on a remote machine to quickly find updates and changes, then
each difference can be dealt with in turn.
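
Comparing the two tracking tables might look like the sketch below. It
assumes each table has been read into a dict keyed by
(database, tablename, oid) with last_changed as the value, and that a key
identifies the same row on both machines. The helper name changed_keys
and the sample values are hypothetical.

```python
def changed_keys(local, remote):
    """Return the sorted keys whose last_changed differs or that exist on one side only.

    local and remote map (database, tablename, oid) -> last_changed.
    """
    return sorted(
        key
        for key in local.keys() | remote.keys()
        if local.get(key) != remote.get(key)
    )


if __name__ == "__main__":
    # Made-up tracking data; timestamps are plain integers for brevity.
    local = {("db", "t", 1): 100, ("db", "t", 2): 200}
    remote = {("db", "t", 1): 100, ("db", "t", 2): 250, ("db", "t", 3): 300}
    print(changed_keys(local, remote))
```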

> 3) For each table to be replicated the replication system will
> programmatically add one plpgsql function and trigger to modify the
> PGR_TIME field on both UPDATEs and INSERTs.  The name of this function
> and trigger will be along the lines of
> <table_name>_replication_update_trigger and
> <table_name>_replication_update_function.  The function is a simple
> two-line chunk of code to set the field PGR_TIME equal to NOW.  The
> trigger is called before each insert/update.  When looking at the Docs
> I see that times are stored in Zulu (GMT) time.  Because of this I
> don't have to worry about time zones and the like.  I need direction
> on this part (such as "hey dummy, look at page N of file X.").

	I like this idea, better than any I have come up with yet. Though,
how are you going to handle DELETEs?

> 6) To handle replication I basically check the local "last replication
> time" and compare it against the remote PGR_TIME fields.  If the
> remote PGR_TIME is greater than the last replication time then change
> the local copy of the database, otherwise, change the remote end of
> the database.  At this point I don't have a way to know WHICH field
> changed between the two replicas so either I do ROW level replication
> or I check each field.  I check PGR_TIME to determine which field is
> the most current.  Some fine tuning of this process will have to occur
> no doubt.

	Yea, this is indeed the sticky part, and would indeed require some
fine-tuning. Basically, the way I see it, is if the two timestamps for a
single row do not match (or even if the row and therefore timestamp is
missing on one side or the other altogether):
	local ts > remote ts => Local row is exported to remote.
	remote ts > local ts => Remote row is exported to local.
	local ts > last sync time && no remote ts =>
		Local row is inserted on remote.
	local ts < last sync time && no remote ts =>
		Local row is deleted.
	remote ts > last sync time && no local ts =>
		Remote row is inserted on local.
	remote ts < last sync time && no local ts =>
		Remote row is deleted.
where the synchronization process is running on the local machine. By
exported, I mean the local values are sent to the remote machine, and the
row on that remote machine is updated to the local values. How does this
sound?
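
The rules above can be written as one small decision function. This is a
sketch, with None standing for "row is missing on that side" (an
assumption of the sketch). The rules do not say what happens when a
timestamp equals the last sync time; here that case is treated like
"older", which is also an assumption. The name sync_action is
hypothetical.

```python
def sync_action(local_ts, remote_ts, last_sync):
    """Return the action for one row; a timestamp of None means the row is absent."""
    if local_ts is not None and remote_ts is not None:
        if local_ts > remote_ts:
            return "export local to remote"
        if remote_ts > local_ts:
            return "export remote to local"
        return "in sync"
    if local_ts is not None:
        # Row exists only locally: new since the last sync, or deleted remotely.
        return "insert on remote" if local_ts > last_sync else "delete local"
    if remote_ts is not None:
        # Row exists only remotely: new since the last sync, or deleted locally.
        return "insert on local" if remote_ts > last_sync else "delete remote"
    return "in sync"


if __name__ == "__main__":
    # Integers stand in for timestamps; 50 is the last sync time.
    print(sync_action(60, 40, 50))
    print(sync_action(None, 30, 50))
```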

> 7) The commandline utility, fired off by something like cron, could
> run several times during the day -- command line parameters can be
> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
> FROM SERVER B.

	Or run manually for my purposes. Also, maybe follow it
with a vacuum run on both sides for all databases, as this is going to
potentially cause lots of table changes that could do with a cleanup.

> 1) How far do I go with this?  Do I start manhandling the system catalogs (pg_* tables)?

	Initially, I would just stick to user table data... If you have
changes in triggers and other meta-data/executable code, you are going to
want to make syncs of that stuff manually anyway. At least I would want
to.

> 2) As to #2 and #3 above, I really don't like tools automagically
> changing my tables but at this point I don't see a way around it.  I
> guess this is where the testing comes into play.

	Hence the reason for the separate table with just a row's
identification and last update time. The only modification to the synced
database is the update trigger, which should be pretty harmless.

> 3) Security: the replication app will have to have pretty good rights
> to the database so it can add the necessary functions and triggers,
> modify table schema, etc.

	Just run the sync program as the postgres super user, and there
are no problems. :)

>   So, any "you're insane and should run home to momma" comments?

	No, not at all. Though it probably should be renamed from
replication to synchronization. The former is usually associated with a
continuous stream of updates between the local and remote databases, so
they are almost always in sync, and have a queuing ability if their
connection is lost for a span of time as well. Very complex and difficult to
implement, and would require hacking server code. :( Something only Sybase
and Oracle have (as far as I know), and from what I have seen of Sybase's
replication server support (dated by 5yrs) it was a pain to set up and get
running correctly.
	The latter, synchronization, is much more manageable, and can still
be useful, especially when you have a large database you want in two
places, mainly for read only purposes at one end or the other, but don't
want to waste the time/bandwidth to move and load the entire database each
time it changes on one end or the other. Same idea as mirroring software
for FTP sites, just transfers the changes, and nothing more.
	I also like the idea of using Python. I have been using it
recently for some database interfaces (to PostgreSQL of course :), and it
is a very nice language to work with. Some worries about performance of
the program though, as Python is only an interpreted language, and I have
yet to really be impressed with the speed of execution of my database
interfaces.
	Anyway, it sounds like a good project, and finally one where I
actually have a clue of what is going on, and the skills to help. So, if
you are interested in pursuing this project, I would be more than glad to
help. TTYL.

---------------------------------------------------------------------------
|   "For to me to live is Christ, and to die is gain."                    |
|                                            --- Philippians 1:21 (KJV)   |
---------------------------------------------------------------------------
|   Ryan Kirkpatrick  |  Boulder, Colorado  |  http://www.rkirkpat.net/   |
---------------------------------------------------------------------------


************

From owner-pgsql-hackers@hub.org Sun Dec 26 08:31:09 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA17976
	for <pgman@candle.pha.pa.us>; Sun, 26 Dec 1999 09:31:07 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id JAA23337 for <pgman@candle.pha.pa.us>; Sun, 26 Dec 1999 09:28:36 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id JAA90738;
	Sun, 26 Dec 1999 09:21:58 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Sun, 26 Dec 1999 09:19:19 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id JAA90498
	for pgsql-hackers-outgoing; Sun, 26 Dec 1999 09:18:21 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from bocs170n.black-oak.COM ([38.149.137.131])
	by hub.org (8.9.3/8.9.3) with ESMTP id JAA90452
	for <pgsql-hackers@postgreSQL.org>; Sun, 26 Dec 1999 09:17:54 -0500 (EST)
	(envelope-from dwalker@black-oak.com)
Received: from vmware98 ([151.196.99.113])
          by bocs170n.black-oak.COM (Lotus Domino Release 5.0.1)
          with SMTP id 1999122609164808:7 ;
          Sun, 26 Dec 1999 09:16:48 -0500
Message-ID: <002201bf4fb3$623f0220$b263a8c0@vmware98.walkers.org>
From: "Damond Walker" <dwalker@black-oak.com>
To: "Ryan Kirkpatrick" <pgsql@rkirkpat.net>
Cc: <pgsql-hackers@postgreSQL.org>
Subject: Re: [HACKERS] database replication
Date: Sun, 26 Dec 1999 10:10:41 -0500
MIME-Version: 1.0
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 4.72.3110.1
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
X-MIMETrack: Itemize by SMTP Server on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/26/99
	09:16:51 AM,
	Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/26/99
	09:16:54 AM,
	Serialize complete at 12/26/99 09:16:54 AM
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

>
>     I too have been thinking about this some over the last year or
>two, just trying to find a quick and easy way to do it. I am not so much
>interested in replication as in synchronization, such as between a desktop
>machine and a laptop, so I can keep the databases on each in sync with
>each other. For this sort of purpose, both the local and remote databases
>would be "idle" at the time of syncing.
>

    I don't think it would matter if the databases are idle or not to be
honest with you.  At any single point in time when you replicate I'd figure
that the database would be in a consistent state.  So, you should be able to
replicate (or sync) a remote database that is in use.  After all, you're
getting a snapshot of the database as it stands at 8:45 PM.  At 8:46 PM it
may be totally different...but the next time syncing takes place those
changes would appear in your local copy.

    The one problem you may run into is if the remote host is running a
large batch process.  It's very likely that you will get 50% of their
changes when you replicate...but then again, that's why you can schedule the
event to work around such things.

>     How about a single, separate table with the fields 'database',
>'tablename', 'oid', 'last_changed', that would store the same data as your
>PGR_TIME field. It would be separated from the actual data tables, and
>therefore would be totally transparent to any database interface
>applications. The 'oid' field would hold each row's OID, a nice, unique
>identification number for the row, while the other fields would tell which
>table and database the oid is in. Then this table can be compared with the
>same table on a remote machine to quickly find updates and changes, then
>each difference can be dealt with in turn.
>

    The problem with OIDs is that they are unique at the local level but if
you try and use them between servers you can run into overlap.  Also, if a
database is under heavy use this table could quickly become VERY large.  Add
indexes to this table to help performance and you're taking up even more
disk space.

    Using the PGR_TIME field with an index will allow us to find rows which
have changed VERY quickly.  All we need to do now is somehow programmatically
find the primary key for a table so the person setting up replication (or
syncing) doesn't have to have an in-depth knowledge of the schema in order to
set up a syncing schedule.
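
A rough sketch of both lookups follows. The catalog query relies on
pg_index.indisprimary and pg_attribute as they exist in present-day
PostgreSQL, which is an assumption about the server version. The cursor
is assumed to be a Python DB-API cursor that uses the %s parameter style,
and the helper names are hypothetical. The table name is pasted into the
changed-rows query unquoted, so it must be a trusted identifier.

```python
PRIMARY_KEY_SQL = """
SELECT a.attname
FROM pg_index i
JOIN pg_attribute a
  ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey)
WHERE i.indrelid = %s::regclass AND i.indisprimary
"""


def changed_rows_sql(table):
    """Return a query for rows stamped after a given time; an index on pgr_time helps here."""
    return f"SELECT * FROM {table} WHERE pgr_time > %s ORDER BY pgr_time"


def primary_key_columns(cursor, table):
    """Return the primary key column names of table, read from the system catalogs."""
    cursor.execute(PRIMARY_KEY_SQL, (table,))
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    # "orders" is a made-up table name.
    print(changed_rows_sql("orders"))
```

The last replication time is passed as the single parameter of the
changed-rows query when it is executed.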

>
>     I like this idea, better than any I have come up with yet. Though,
>how are you going to handle DELETEs?
>

    Oops...how about defining a trigger for this?  With deletion I guess we
would have to move a flag into another table saying we deleted record 'X'
with this primary key from this table.
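
A delete trigger of that kind could be generated roughly as follows. The
log table pgr_deleted, its columns, and the function and trigger names
are all invented for this sketch, and the DDL uses present-day PostgreSQL
syntax. A single-column primary key is assumed, and both identifiers are
pasted in unquoted, so they must be trusted.

```python
def delete_log_ddl(table, key_column):
    """Return DDL that records the key of every row deleted from table."""
    func = f"{table}_replication_delete_function"
    trig = f"{table}_replication_delete_trigger"
    return f"""
CREATE TABLE IF NOT EXISTS pgr_deleted (
    tablename  text NOT NULL,
    key_value  text NOT NULL,
    deleted_at timestamp with time zone NOT NULL DEFAULT now()
);

CREATE FUNCTION {func}() RETURNS trigger AS $$
BEGIN
    INSERT INTO pgr_deleted (tablename, key_value)
    VALUES ('{table}', OLD.{key_column}::text);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER {trig}
    BEFORE DELETE ON {table}
    FOR EACH ROW EXECUTE PROCEDURE {func}();
""".strip()


if __name__ == "__main__":
    # "orders" and "id" are made-up names.
    print(delete_log_ddl("orders", "id"))
```

The sync program would then read pgr_deleted on each side and apply the
recorded deletions to the other side.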

>
>     Yea, this is indeed the sticky part, and would indeed require some
>fine-tuning. Basically, the way I see it, is if the two timestamps for a
>single row do not match (or even if the row and therefore timestamp is
>missing on one side or the other altogether):
>     local ts > remote ts => Local row is exported to remote.
>     remote ts > local ts => Remote row is exported to local.
>     local ts > last sync time && no remote ts =>
>          Local row is inserted on remote.
>     local ts < last sync time && no remote ts =>
>          Local row is deleted.
>     remote ts > last sync time && no local ts =>
>          Remote row is inserted on local.
>     remote ts < last sync time && no local ts =>
>          Remote row is deleted.
>where the synchronization process is running on the local machine. By
>exported, I mean the local values are sent to the remote machine, and the
>row on that remote machine is updated to the local values. How does this
>sound?
>

    The replication part will be the most complex...that much is for
certain...

    I've been writing systems in Lotus Notes/Domino for the last year or so
and I've grown quite spoiled with what it can do in regards to replication.
It's not real-time but you have to gear your applications to this type of
thing (it's possible to create documents, fire off email to notify people of
changes and have the email arrive before the replicated documents do).
Replicating large Notes/Domino databases takes quite a while....I don't see
any kind of replication or syncing running in a blink of an eye.

    Having said that, a good algo will have to be written to cut down on
network traffic and to keep database conversations down to a minimum.  This
will be appreciated by people with low bandwidth connections I'm sure
(dial-ups, fractional T1's, etc).

>     Or run manually for my purposes. Also, maybe follow it
>with a vacuum run on both sides for all databases, as this is going to
>potentially cause lots of table changes that could do with a cleanup.
>

    What would a vacuum do to a system being used by many people?

>     No, not at all. Though it probably should be renamed from
>replication to synchronization. The former is usually associated with a
>continuous stream of updates between the local and remote databases, so
>they are almost always in sync, and have a queuing ability if their
>connection is lost for a span of time as well. Very complex and difficult to
>implement, and would require hacking server code. :( Something only Sybase
>and Oracle have (as far as I know), and from what I have seen of Sybase's
>replication server support (dated by 5yrs) it was a pain to set up and get
>running correctly.

    It could probably be named either way...but the one thing I really don't
want to do is start hacking server code.  The PostgreSQL people have enough
to do without worrying about trying to meld anything I've done to their
server.   :)

    Besides, I like the idea of having it operate as a stand-alone product.
The only PostgreSQL feature we would require would be triggers and
plpgsql...what was the earliest version of PostgreSQL that supported
plpgsql?  Even then I don't see the triggers being that complex to boot.

>     I also like the idea of using Python. I have been using it
>recently for some database interfaces (to PostgreSQL of course :), and it
>is a very nice language to work with. Some worries about performance of
>the program though, as Python is only an interpreted language, and I have
>yet to really be impressed with the speed of execution of my database
>interfaces.

    The only thing we'd need for Python is the Python extensions for
PostgreSQL...which in turn requires libpq and that's about it.  So, it
should be able to run on any platform supported by Python and libpq.  Using
TK for the interface components will require NT people to get additional
software from the 'net.  At least it did with older versions of Windows
Python.  Unix folks should be happy....assuming they have X running on the
machine doing the replication or syncing.  Even then I wrote a curses based
Python interface awhile back which allows buttons, progress bars, input
fields, etc (I called it tinter and it's available at
http://iximd.com/~dwalker).  It's a simple interface and could probably be
cleaned up a bit but it works.  :)

>     Anyway, it sounds like a good project, and finally one where I
>actually have a clue of what is going on, and the skills to help. So, if
>you are interested in pursuing this project, I would be more than glad to
>help. TTYL.
>


    That would be a Good Thing.  Have webspace somewhere?  If I can get
permission from the "powers that be" at the office I could host a website on
our (Domino) webserver.

                Damond


************

From owner-pgsql-hackers@hub.org Sun Dec 26 19:11:48 1999
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA26661
	for <pgman@candle.pha.pa.us>; Sun, 26 Dec 1999 20:11:46 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id UAA14959;
	Sun, 26 Dec 1999 20:08:15 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Sun, 26 Dec 1999 20:07:27 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id UAA14820
	for pgsql-hackers-outgoing; Sun, 26 Dec 1999 20:06:28 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from mtiwmhc02.worldnet.att.net (mtiwmhc02.worldnet.att.net [204.127.131.37])
	by hub.org (8.9.3/8.9.3) with ESMTP id UAA14749
	for <pgsql-hackers@postgreSQL.org>; Sun, 26 Dec 1999 20:05:39 -0500 (EST)
	(envelope-from rkirkpat@rkirkpat.net)
Received: from [192.168.3.100] ([12.74.72.56])
          by mtiwmhc02.worldnet.att.net (InterMail v03.02.07.07 118-134)
          with ESMTP id <19991227010506.WJVW1914@[12.74.72.56]>;
          Mon, 27 Dec 1999 01:05:06 +0000
Date: Sun, 26 Dec 1999 18:05:02 -0700 (MST)
From: Ryan Kirkpatrick <pgsql@rkirkpat.net>
X-Sender: rkirkpat@excelsior.rkirkpat.net
To: Damond Walker <dwalker@black-oak.com>
cc: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] database replication
In-Reply-To: <002201bf4fb3$623f0220$b263a8c0@vmware98.walkers.org>
Message-ID: <Pine.LNX.4.10.9912261742550.7666-100000@excelsior.rkirkpat.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

On Sun, 26 Dec 1999, Damond Walker wrote:

> >     How about a single, separate table with the fields of 'database',
> >'tablename', 'oid', 'last_changed', that would store the same data as your
> >PGR_TIME field. It would be separated from the actual data tables, and
...
>     The problem with OID's is that they are unique at the local level but if
> you try and use them between servers you can run into overlap.

	Yea, forgot about that point, but it became dead obvious once you
mentioned it. Boy, I feel stupid now. :)

>     Using the PGR_TIME field with an index will allow us to VERY quickly
> find rows which have changed.  All we need to do now is somehow
> programmatically find the primary key for a table so the person setting up
> replication (or syncing) doesn't have to have an in-depth knowledge of the
> schema in order to set up a syncing schedule.
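
As a rough illustration of the indexed-timestamp lookup described above,
here is a minimal Python sketch. It uses the standard library's sqlite3
module purely as a stand-in for a PostgreSQL connection; the table
`customer`, its columns, the integer timestamps, and the helper
`changed_since` are all hypothetical names chosen for the example.

```python
import sqlite3


def changed_since(conn, table, last_sync):
    """Return (id, pgr_time) for rows whose PGR_TIME is newer than last_sync.

    The table name is interpolated directly, so this sketch assumes it
    comes from trusted configuration, not from user input.
    """
    cur = conn.execute(
        f"SELECT id, pgr_time FROM {table} "
        "WHERE pgr_time > ? ORDER BY pgr_time",
        (last_sync,),
    )
    return cur.fetchall()


# Hypothetical schema: one extra timestamp column plus an index on it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, pgr_time INTEGER)"
)
conn.execute("CREATE INDEX customer_pgr_time_idx ON customer (pgr_time)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "b", 250), (3, "c", 300)],
)

# Only rows touched after the last sync (time 200) are candidates.
rows = changed_since(conn, "customer", 200)
```

With the index in place, the range condition on `pgr_time` can be
answered without scanning the whole table, which is the point of the
proposal.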

	Hmm... Yea, maybe look to see which field(s) have a primary, unique
index on them? Then use those field(s) as a primary key. Just require that
any table to be synchronized have some set of fields that uniquely
identify each row. Either that, or add another field to each table with
our own, cross-system consistent, identification system. Don't know which
would be more efficient and easier to work with.
	The former could potentially get sticky if it takes a lot of
fields to generate a unique key value, but has the smallest effect on the
table to be synced. The latter could be difficult to keep straight between
systems (local vs. remote), and would require a trigger on inserts to
generate a new, unique id number that does not exist locally or
remotely (nasty issue there), but would remove the uniqueness
requirement.

>     Oops...how about defining a trigger for this?  With deletion I guess we
> would have to move a flag into another table saying we deleted record 'X'
> with this primary key from this table.

	Or, according to my logic below, if a row is missing on one side
or the other, then just compare the remaining row's timestamp to the last
synchronization time (stored in a separate table/db elsewhere). The
result of the comparison and whether the row exists on each side tell one
if the row was inserted or deleted since the last sync, and what should be
done to perform the sync.

> >     Yea, this is indeed the sticky part, and would indeed require some
> >fine-tuning. Basically, the way I see it, is if the two timestamps for a
> >single row do not match (or even if the row and therefore timestamp is
> >missing on one side or the other altogether):
> >     local ts > remote ts => Local row is exported to remote.
> >     remote ts > local ts => Remote row is exported to local.
> >     local ts > last sync time && no remote ts =>
> >          Local row is inserted on remote.
> >     local ts < last sync time && no remote ts =>
> >          Local row is deleted.
> >     remote ts > last sync time && no local ts =>
> >          Remote row is inserted on local.
> >     remote ts < last sync time && no local ts =>
> >          Remote row is deleted.
> >where the synchronization process is running on the local machine. By
> >exported, I mean the local values are sent to the remote machine, and the
> >row on that remote machine is updated to the local values. How does this
> >sound?
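
The six rules quoted above can be written down as a small decision
function. This is only a sketch in Python: the function name
`sync_action` and the returned action labels are made up for
illustration, `None` stands for "row missing on that side", and the
case where a timestamp exactly equals the last sync time is not covered
by the rules, so it is reported as ambiguous rather than guessed.

```python
def sync_action(local_ts, remote_ts, last_sync):
    """Decide what to do with one row, seen from the local machine.

    local_ts / remote_ts are the row's timestamps on each side, or None
    if the row does not exist there. last_sync is the time of the
    previous synchronization run.
    """
    if local_ts is None and remote_ts is None:
        return "nothing"  # row exists on neither side

    if local_ts is not None and remote_ts is not None:
        if local_ts > remote_ts:
            return "export_local_to_remote"
        if remote_ts > local_ts:
            return "export_remote_to_local"
        return "in_sync"

    if remote_ts is None:
        # Row exists only locally: new since last sync, or deleted remotely.
        if local_ts > last_sync:
            return "insert_on_remote"
        if local_ts < last_sync:
            return "delete_local"
        return "ambiguous"  # equality is not specified by the rules

    # Row exists only remotely: new there, or deleted locally.
    if remote_ts > last_sync:
        return "insert_on_local"
    if remote_ts < last_sync:
        return "delete_remote"
    return "ambiguous"
```

Note that every branch depends on the two machines' clocks being
comparable, both with each other and with the stored last sync time.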

>     Having said that, a good algo will have to be written to cut down on
> network traffic and to keep database conversations down to a minimum.  This
> will be appreciated by people with low bandwidth connections I'm sure
> (dial-ups, fractional T1's, etc).

	Of course! On reflection, the assigned identification number I
mentioned above might be the best then, instead of having to transfer the
entire set of key fields back and forth.

>     What would a vacuum do to a system being used by many people?

	Probably lock them out of tables while they are vacuumed... Maybe
not really required in the end, possibly optional?

>     It could probably be named either way...but the one thing I really don't
> want to do is start hacking server code.  The PostgreSQL people have enough
> to do without worrying about trying to meld anything I've done to their
> server.   :)

	Yea, they probably would appreciate that. They already have enough
on their plate for 7.x as it is! :)

>     Besides, I like the idea of having it operate as a stand-alone product.
> The only PostgreSQL feature we would require would be triggers and
> plpgsql...what was the earliest version of PostgreSQL that supported
> plpgsql?  Even then I don't see the triggers being that complex to boot.

	No, provided that we don't do the identification number idea
(which, the more I think about it, probably will not work). As for which
version first supported plpgsql, I don't know; one of the more hard-core
pgsql hackers can probably tell us that.

>     The only thing we'd need for Python is the Python extensions for
> PostgreSQL...which in turn requires libpq and that's about it.  So, it
> should be able to run on any platform supported by Python and libpq.

	Of course. If it ran on NT as well as Linux/Unix, that would be
even better. :)

> Unix folks should be happy....assuming they have X running on the
> machine doing the replication or syncing.  Even then I wrote a curses
> based Python interface awhile back which allows buttons, progress
> bars, input fields, etc (I called it tinter and it's available at
> http://iximd.com/~dwalker).  It's a simple interface and could
> probably be cleaned up a bit but it works.  :)

	Why would we want any type of GUI (X11 or curses) for this sync
program? I imagine just a command line program with a few options (local
machine, remote machine, db name, etc...), and nothing else.
	Though I will take a look at your curses interface, as I have been
wanting to make a curses interface to a few db interfaces I have, in as
simple a manner as possible.

>     That would be a Good Thing.  Have webspace somewhere?  If I can get
> permission from the "powers that be" at the office I could host a website on
> our (Domino) webserver.

	Yea, I got my own web server (www.rkirkpat.net) with 1GB+ of disk
space available, sitting on a decent speed DSL. I can even set up a
virtual server if we want (i.e. pgsync.rkirkpat.net :). CVS repository,
email lists, etc... possible with some effort (and time).
	So, where should we start? TTYL.

	PS. The current pages on my web site are very out of date at the
moment (save for the pgsql information). I hope to have updated ones up
within the week.

---------------------------------------------------------------------------
|   "For to me to live is Christ, and to die is gain."                    |
|                                            --- Philippians 1:21 (KJV)   |
---------------------------------------------------------------------------
|   Ryan Kirkpatrick  |  Boulder, Colorado  |  http://www.rkirkpat.net/   |
---------------------------------------------------------------------------


************

From owner-pgsql-hackers@hub.org Mon Dec 27 12:33:32 1999
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA24817
	for <pgman@candle.pha.pa.us>; Mon, 27 Dec 1999 13:33:29 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id NAA53391;
	Mon, 27 Dec 1999 13:29:02 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Mon, 27 Dec 1999 13:28:38 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id NAA53248
	for pgsql-hackers-outgoing; Mon, 27 Dec 1999 13:27:40 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from gtv.ca (h139-142-238-17.cg.fiberone.net [139.142.238.17])
	by hub.org (8.9.3/8.9.3) with ESMTP id NAA53170
	for <pgsql-hackers@hub.org>; Mon, 27 Dec 1999 13:26:40 -0500 (EST)
	(envelope-from aaron@genisys.ca)
Received: from stilborne (24.67.90.252.ab.wave.home.com [24.67.90.252])
	by gtv.ca (8.9.3/8.8.7) with SMTP id MAA01200
	for <pgsql-hackers@hub.org>; Mon, 27 Dec 1999 12:36:39 -0700
From: "Aaron J. Seigo" <aaron@gtv.ca>
To: pgsql-hackers@hub.org
Subject: Re: [HACKERS] database replication
Date: Mon, 27 Dec 1999 11:23:19 -0700
X-Mailer: KMail [version 1.0.28]
Content-Type: text/plain
References: <199912271135.TAA10184@netrinsics.com>
In-Reply-To: <199912271135.TAA10184@netrinsics.com>
MIME-Version: 1.0
Message-Id: <99122711245600.07929@stilborne>
Content-Transfer-Encoding: 8bit
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

hi..

> Before anyone starts implementing any database replication, I'd strongly
> suggest doing some research, first:
>
> http://sybooks.sybase.com:80/onlinebooks/group-rs/rsg1150e/rs_admin/@Generic__BookView;cs=default;ts=default

good idea, but perhaps sybase isn't the best study case.. here's some extremely
detailed online coverage of Oracle 8i's replication, from the oracle online
library:

http://bach.towson.edu/oracledocs/DOC/server803/A54651_01/toc.htm

--
Aaron J. Seigo
Sys Admin

************

From owner-pgsql-hackers@hub.org Thu Dec 30 08:01:09 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA10317
	for <pgman@candle.pha.pa.us>; Thu, 30 Dec 1999 09:01:08 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id IAA02365 for <pgman@candle.pha.pa.us>; Thu, 30 Dec 1999 08:37:10 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id IAA87902;
	Thu, 30 Dec 1999 08:34:22 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Thu, 30 Dec 1999 08:32:24 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id IAA85771
	for pgsql-hackers-outgoing; Thu, 30 Dec 1999 08:31:27 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sandman.acadiau.ca (dcurrie@sandman.acadiau.ca [131.162.129.111])
	by hub.org (8.9.3/8.9.3) with ESMTP id IAA85234
	for <pgsql-hackers@postgresql.org>; Thu, 30 Dec 1999 08:31:10 -0500 (EST)
	(envelope-from dcurrie@sandman.acadiau.ca)
Received: (from dcurrie@localhost)
	by sandman.acadiau.ca (8.8.8/8.8.8/Debian/GNU) id GAA18698;
	Thu, 30 Dec 1999 06:30:58 -0400
From: Duane Currie <dcurrie@sandman.acadiau.ca>
Message-Id: <199912301030.GAA18698@sandman.acadiau.ca>
Subject: Re: [HACKERS] database replication
In-Reply-To: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM> from "DWalker@black-oak.com" at "Dec 24, 99 10:27:59 am"
To: DWalker@black-oak.com
Date: Thu, 30 Dec 1999 10:30:58 +0000 (AST)
Cc: pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL39 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgresql.org
Status: OR

Hi Guys,

Now for one of my REALLY rare posts.
Having done a little bit of distributed data systems, I figured I'd
pitch in a couple cents worth.

> 2) The replication system will need to add at least one field to each
>    table in each database that needs to be replicated.  This
>    field will be a date/time stamp which identifies the "last
>    update" of the record.  This field will be called PGR_TIME
>    for lack of a better name.  Because this field will be used
>    from within programs and triggers it can be longer so as to not
>    mistake it for a user field.

I just started reading this thread, but I figured I'd throw in a couple
suggestions for distributed data control  (a few idioms I've had to
deal with b4):
	- Never use time (not reliable from system to system).  Use
	  a version number of some sort that can stay consistent across
	  all replicas

	  This way, if a system's time is or goes out of whack, it doesn't
	  cause your database to disintegrate, and it's easier to track
	  conflicts (see below.  If using time, the algorithm gets
	  nightmarish)

	- On an insert, set to version 1

	- On an update, version++

	- On a delete, mark deleted, and add a delete stub somewhere for the
	  replicator process to deal with in sync'ing the databases.

	- If two records have the same version but different data, there's
	  a conflict.  A few choices:
	  	1.  Pick one as the correct one (yuck!! invisible data loss)
		2.  Store both copies, pick one as current, and alert
		    database owner of the conflict, so they can deal with
		    it "manually."
		3.  If possible, some conflicts can be merged.  If a disjoint
		    set of fields were changed in each instance, these changes
		    may both be applied and the record merged.  (Problem:
		    takes a lot more space.  Requires a version number for
		    every field, or persistent storage of some old records.
		    However, this might help the "which fields changed" issue
		    you were talking about in #6)

	- A unique id across all systems should exist (or something that
	  effectively simulates a unique id).  Maybe a composition of the
	  originating oid (from the insert) and the originating database
	  (oid of the database's record?) might do it.  Store this as
	  an extra field in every record.

	  (Two extra fields so far: 'unique id' and 'version')
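
A minimal Python sketch of the bookkeeping listed above, under stated
assumptions: the `Record` class, the function names, and the returned
labels are invented for illustration, the unique id is modelled as an
(originating database, originating oid) pair, and `merge_disjoint`
assumes an old copy of the record (`base`) has been kept, as option 3
requires.

```python
from dataclasses import dataclass

_MISSING = object()  # marks a field absent from the base copy


@dataclass
class Record:
    uid: tuple             # (originating database id, originating oid)
    data: dict
    version: int = 1       # an insert starts at version 1
    deleted: bool = False


def apply_update(rec, **changes):
    """An update changes the data and increments the version."""
    rec.data.update(changes)
    rec.version += 1


def apply_delete(rec, delete_stubs):
    """A delete marks the record and leaves a stub for the replicator."""
    rec.deleted = True
    delete_stubs.append(rec.uid)


def compare(a, b):
    """Classify two replicas of the same uid by version number."""
    if a.version > b.version:
        return "take_a"
    if b.version > a.version:
        return "take_b"
    if a.data == b.data and a.deleted == b.deleted:
        return "in_sync"
    return "conflict"      # same version, different contents


def merge_disjoint(base, a, b):
    """Option 3: merge if each side changed a different set of fields.

    Returns the merged dict, or None when both sides touched the same
    field and a human (or option 1 or 2) has to decide.
    """
    changed_a = {k for k in a if a[k] != base.get(k, _MISSING)}
    changed_b = {k for k in b if b[k] != base.get(k, _MISSING)}
    if changed_a & changed_b:
        return None
    merged = dict(base)
    merged.update({k: a[k] for k in changed_a})
    merged.update({k: b[k] for k in changed_b})
    return merged
```

The sketch applies only the rule as stated: equal versions with
different data are flagged. If one replica is updated twice and the
other once from the same starting point, the versions differ and the
higher one simply wins, so a real replicator would probably need more
than a single counter to catch that case.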

I do like your approach:  triggers and a separate process. (Maintainable!! :)

Anyway, just figured I'd throw in a few suggestions,
Duane

************

From owner-pgsql-patches@hub.org Sun Jan  2 23:01:38 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA16274
	for <pgman@candle.pha.pa.us>; Mon, 3 Jan 2000 00:01:28 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id XAA02655 for <pgman@candle.pha.pa.us>; Sun, 2 Jan 2000 23:45:55 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id XAA13828;
	Sun, 2 Jan 2000 23:40:47 -0500 (EST)
	(envelope-from owner-pgsql-patches@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 02 Jan 2000 23:38:34 +0000 (EST)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id XAA13624
	for pgsql-patches-outgoing; Sun, 2 Jan 2000 23:37:36 -0500 (EST)
	(envelope-from owner-pgsql-patches@postgreSQL.org)
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.org (8.9.3/8.9.3) with ESMTP id XAA13560
	for <pgsql-patches@postgresql.org>; Sun, 2 Jan 2000 23:37:02 -0500 (EST)
	(envelope-from P.Marchesso@Videotron.ca)
Received: from Videotron.ca ([207.253.210.234])
	by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.07.30.00.05.p8)
	with ESMTP id <0FNQ000TEST8VI@falla.videotron.net> for pgsql-patches@postgresql.org; Sun,
	2 Jan 2000 23:37:01 -0500 (EST)
Date: Sun, 02 Jan 2000 23:39:23 -0500
From: Philippe Marchesseault <P.Marchesso@Videotron.ca>
Subject: [PATCHES] Distributed PostgreSQL!
To: pgsql-patches@postgreSQL.org
Message-id: <387027FB.EB88D757@Videotron.ca>
MIME-version: 1.0
X-Mailer: Mozilla 4.51 [en] (X11; I; Linux 2.2.11 i586)
Content-type: MULTIPART/MIXED; BOUNDARY="Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)"
X-Accept-Language: en
Sender: owner-pgsql-patches@postgreSQL.org
Precedence: bulk
Status: ORr

This is a multi-part message in MIME format.

--Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7bit

Hi all!

Here is a small patch to make postgres a distributed database. By
distributed I mean that you can have the same copy of the database on N
different machines and keep them all in sync.
It does not improve performance unless you distribute your clients in a
sensible manner. It does not allow you to do parallel selects.

The support page is pages.infinit.net/daemon and will soon be available in
English.

The patch was tested with RedHat Linux 6.0 on Intel with kernel 2.2.11.
Only two machines were used, so I'm not completely sure that it works
with more than two. -But it should-

I would like to know if somebody else is interested in this; otherwise
I'm probably not going to keep developing it. So please reply to my e-mail
address (P.Marchesso@videotron.ca) to give me an idea of the number of people
interested in this.

Thanks all.

Philippe Marchesseault

--
It's not the size of the dog in the fight,
but the size of the fight in the dog.
                        -Archie Griffen


--Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)
Content-type: application/octet-stream; name=replicator-0.1.tgz
Content-disposition: attachment; filename=replicator-0.1.tgz
Content-transfer-encoding: base64

H4sIAOeZbzgAA+w8a3PayLL7Nara/9Dx3iRAME/brO11tjDGDjc2OIBvTm5OihLSAFoLiUgi
LGfX//12z0MaAX7UOU72nLpW7cagmenp6enp9xCwmetYZuQHxR++1QM7pVq5DD8AQG2vxv+W
d3b4X/mUsKVUrdbKO7XKLrZWdkqVH2D3m2GkPfMwMgOAH2Z+GI0DFt7ejwW3Nv7nPkGy/xfm
NRs5Lnv0Ocql0l6pdOv+l8u1XbH/e7W90m6NWsu7pR+g9OiYbHj+n+9/q904vzppwhFst6A4
D4NiGFjF2Tj84vJPjme5c5vFra5vma5sl21FwzBdFw7AGwb+NQtgKv8mvAXYZWw6HnayXGZ6
d3cVXQ7AeBZMYXukOquv0/TXZKhhGLJrjEvBAvlhYjwbW5b2/r8ycu1Z2B6P7SFs+8lMYRQ4
s/irodA8UJOvwZ3eCXeahqu+GtqyD7SFrEFPNW2aQAO07c6+wPb52m65ztB49pNAQKfZzIys
CR7DA+OZ7YxGm5hgaFrXzLOLkeXPiuqgFHzXfnhnC96A+rJX2C3wWf9q5n96dPn/zbaH5L+U
95vkf6VUS/T/Xq2Crbt7ldqT/P8eD+6NWS6V8+VSzXgDxdzlBfTnDE6YBZWfcSMOqtWDnX0o
7+/v54rY4yelELaKE3/KYqYpaoykZNeW3v+XcBkWo+UMZcHkzVqDM7M2vZ6GY/G6mHuOmHEM
jL3arrn3cxU/PnO8SJmQan72fs6C5SH22sNeu/m9/T3qWcwBAuiZEfw36hYoQ7l6UKrg6pD9
SiXqATlojSCaMECOiNiUeVFIHwP8s3CiCZjw6vWrPPXw0t3AQkqEMAr8qQREzZpIngW+xcKw
IFppEepxRpD5QggPSDB740+lz3B0RBNl/+C9Ya0VXsGrQ9GWXjM2ndbPe03eeEP/MDdkf9zS
td+90npy+iJme7V9a29/3/gFnskH4CVHIQoYG7hOGOXBZvSvabmdrywIHJtlD43t7W0x0V3P
gyDVSjsmegH5WnWfID7j/6wxZvWgXDso7SSMSX0APuC2oPKRG4DThRHR2IleheD5Ee5hyFxm
Rcnm8aEgQcTbQThuv8F9nZqe3UeuhedH0Lg4GfSa581GH16+XKGo2C5cf69/0R9c9M4AefdQ
viM2nhLvI93L6mU4wy2NRhnRyH5HQmy9fhFu5VNbnlXd5R+HDwg9m/6MWZRpd06ag3fNj3lo
XTYGjW6z3oc/obS3t5fNw0vslAfEZtBr/W8zD6Us/FKKUQWYIdX9ILN1EY57SLVTE81/+wC2
4lk5cwgWURshGOWvlltPz+M8mtiOTdjHnuNu/69cre0I/69Sq5XLpSq1Vit7T/r/ezzFnP4Y
/YkTQivk8nOGhrw5ZvgZtSZ6eP4ijIWe43vKnO+9PwfbjMyhGTJAF880bIdk13AeMRuY99UJ
fI8kbcEwGv5sGTjjSQSZRpbLboDLieM6M5SNF2ZgTVBNMnPuRoZABTXnODCn4JB6ZSjc/VG0
MAN2CEt/Dhaq8oAl06GcB5TXRXRrpj46M0t6Mfds9NxoQRELpiH4Qsmfta/gjHksMF24nA9x
VXDuWMwLmWHivPQmnOAChkve/ZRm78nZ4dRHqJwKh8DQNsAJUIGFRJWKISeQ0PKABkAGKYgI
B+DPaFAWsVyCi7ojHldYX7BORkeYHBN/JvcDF7Zw0OseMkDOHM3dvIE94UOr/7Zz1Yd6+yN8
qHe79Xb/4yE3X3xsZV+l6eJMcRsRLC4mML1oiUQxLprdxlvsXz9unbf6Hwnt01a/3ez14LTT
hTpc1rv9VuPqvN6Fy6vuZafXLKC6Y4QQM24nJ4wQ0tRHqtksQgWDZpDxETcvRJTQg5yYX0lV
W8z5igiZaEvNlg/ZI9f3xsIuizS6HZK2R0Wfh0XgIENE/vruGcnuodL0rEIedvehz5AmDC5d
02KwDb05Da9WS3k4Rj6nrhd1gFKlXC5vl6ulWh6uevWCkTo+RcNAIyQJBRxAf0J8HdJGyxCC
Iw+XMAlxBeh8R7gJ4RS3lO+NhwSbiiNGJiXSxPNtZqBZqT8CXKhW6HMmpI6qpUCTI9Vxwvbr
8goWwdzz0LoA31uFS8Cmoi9KPOROvn9tDlrABIQmV4FNBI0vwrQsNkNb2fI9Dy0sRB93GnIE
/wOZzAsmrGkSJfTF9gnKwsRl0wwEThuKJ5060QmnSQiKJBla2pmLeq/f7MJxt/MO/1x2Ow1k
02Yva5Ahp7kQke345ECkXrnOcPUdWVrpd2gYeatD0SEJfeuaRen3Hosc/L/oeOv9aXn01rjF
D9roBW30gZK3W4l/ZRhffQeF7O9ONPDwZLjLDLlEhOWpjXYeWnjCyNxquH5IG56QmIw8AAvf
s4wcQC8IVqaEH2+ImXsT349WvB3JcZp/E02CuWAc3B9SGWjBzhluBceuPpu5y54anrEmZpBD
cFM8pN3mWavX79b7rU4bN3gQsDHHmhbxxbHzgIZul4WoDwi1tGmNL4q5fhoRjzGbH4lrz1+I
07NANzWF2wLFO/oIEaKXtsz59NtvWic0F/KENVvqxjlhzClEExdPjq/OFG0z8sRnIV7lAbxA
Nn1hZ//uoUGfghK4zEvgZhVIXC7iII16wd+3mPWHdyDw/qp1cnAEL2w+L8JU4BO/gdP1NsdA
dwq0ZZFLNeLOgeAaDpID5eeEXNZm5bh1lo35jX9VnVP96g08rFpH/r23setp/eq8r3Xl3zf2
bJ10L7SO9HVzv3a/q/fDr7f0+5/6eaojft/Ys91BKmo9+XfVkx+mMv9yIw6UMLNQmnFtfzr3
+FnkTqsQqa0TMmHAM6dMqUEu1ddOHXq05ERzkcfPWZeNHZLbbeyeuedkrR8nvqX6cUAIg/7H
yyY1Tdl05TRwiMjOzj+YP0rN9k9y9L/Ko2L1gdCbTwz7CAyLNOMsC2/JjOD6GY1G5B+p/Ikt
j4UqSpiwIfRL32/HrasqKeY7cjAYl+w6/yCLJzt5KFnWm0+PlxELlfzNdYXFmOrMJTquNaN6
I/uhZflVqjfkKL33ZuYlNruTzxrcaqVYUrABhY0HH9JHM4WF4v0FekEMR4hp9RWmI5J8jXJP
b1+noGy8QkXx1dU9cIVqUSsLo+fGkCeLMG6gsYD2/oL8ExRXaOBJ+5zJrY6RD7HBmmTEWyFx
NJQs8iTPO413g8t6412zf4BWKzOvD1c7nMXtsgPi0PFc9Kz8cXrWZ3wEJ4MaYjyDVcskRodL
uNVtegYb0Wj+rZUA1Q0xZVMhUk18jQRFq95dHnC6MImjE/4aEwXQQRqR73uQkGKzpJPO0tC0
5RIpQqnvEmzEJG6+0XYP/yczihzsWBkJz8xjX3lmNpoHdLQS9bX5KRp0UKem42X4Xl6z5QDP
PVp7zH7HlofSWsPXSpeJNlIufrBEFTieCrNMnXm2uERNBZufYu4qFO45+QeZrDj8ZJ6awwYe
Jjtg3qdy6fOhWp6UWqQzSbGSN27JfvFYLx6KR6q0OnUaEB6+IdIHQXEwtKfclw1n5sJjNoeJ
7DO3hPAzbTsYoBvPU7J1/KJEGzWOUCELg7aH5J6tOEO4yejaFnJFgYYUcmIc4il8kkz9dNBq
N/t56NHJQYHWrF/I8w7Jgb/ruFvI3hGT8BJ2Sh35G6VM43UU0K8YjMyp41JaQWJxuNZj5geU
qzmCCTrTYeYcRW6zPbjsdPvZ9c4m/8D/xCNa7frJSXdQb3/kZ+qcBqApwek8/AcL/MzLTBoK
vUSR93NiASDdhg7aF4J2ecisbFAuCy9jGLH4XOn0UP1AM+n6QD+OI1vqBoEUMSV6BQqtYxQn
A5Rv+tbdNZMY/pC5ijkZBzj3/VlIR4cOOadhWgNt4Fyh/RXrxme0x6VL8gop30O6EWNuJB+C
ynLvjbD5oNx/buNqnlzC/VI+cn2nZkPgItxw504mCKM0l2il1d9dRBUTrDnMd1CY20xSHxZz
ZzzdlAx/LheiVi6iJxTdYYs4HCS7o2CTMjIxstJ0uOTGtRB9Dz/h1P8hy9ARbY1ItJnKBOSy
TqKboKQwOtIQ4fLeU0uiBZHM1UJU4HsrsatCoRCrw02WZMwA2cM1raWrq8RM2kSNNbixxuM2
DwF4/vxuMpGkNkcsWuK0UYK1IJxIuyoifCCdx2YU5LOuVzVPgqiuthI19Po1ZXylJkzvzQ0Z
FujmIMQPdHQ5pI1benNH/H9D2v7Rcwz31P9VK3s7qv6vWqtWeP3fTvUp//M9nqf8z1P+5yn/
83j5H+MnZ4QMN5K5icHbH42f8KvjMe1N/Oqi/rdBo9NuNxsUfOhRLVLcplnIXEzu7lSrCSxp
ISrXpFJKA20fJ63VSnm/koBFt/ld86MaCLs/7yRDVTGJatwraQglITw5Eo9v0ijDdGpkuVRB
uCLWf3Lcrl80j/7Ymi7t4dbNITmSypMKtYyCTDvRN4asTSzOQwkUjyS9ibpcSB6HDKUppTMc
Mg+pA5GcoUsGf2iBg7wWJMjr7n9ed9vzcajzBujfQeeyKaJBiKmCLczLP7ingvPykAXPn+AK
gUcMPikSfBbRMxV0wW+x4xwHfvG/eZhIl8TK48HfdaLIxsiX0SYWGLLMjKxHbjrBBZ0xPXtI
4hHlA5rDOJOQNsrz5quZB4xH7mDDKosS0dYJODai44wcFgIzrQmX+3KrkjQintwpNhIrZBx+
egOmEJiitPs1cdBFgoUTboKg2uaUfdpFR924ScUBkfh4dpiHchyxUYcnV/zxqfboER/N/vP+
mvof2N2t7Kj6H5Rxu7z+p7z7ZP99j+fJ/nuy/57sv0et/7mj2sXxnMgxXQqP3RoA/5E0O1kB
vDRGi55oJpKQ1NQT1eG/Y71Lqre9Mn/kTNm3LIrR8hA/bg7CCzKqUKbsQMYIGV453Lcx43aJ
nibPqTz5vclKFZN0WfxKptyFYXVpBiFTMf6RM55Ls5YuX8a1UEJSiPiQIeNIVN4iA4ZyY2Tk
6RHyAd6/kg8oFvudkw78Znrz0BDRLEVCRAg/EW2HSypqyGzxXgpcshO35hHWOt2TSljrz9MI
1D+XUWFi+TaX1TDdfjPhLyUQlVNYAbaeVNictFH7It/cGahOprgr57CeCti0Z4m8EFBvyUhL
vCl+OA/Ws+hyy/iG6Y0FZbFz7halSDleFIiOgvI3vCWKZJTOc3QbuPAzw+vVkjFicx+dBREV
D1EL2f40Q5Ih0746PxdFJKmZcYIjkP14KxKXymZiyj4wu/9AQvJrJA9I7ce05NKeRtHSabVk
FKXlNl9rqizmjtsbh2s5mdWCnY2lFEmCQaVWzLj0TZaTKlP3C5oP4rJmEsEWNTiB9fWWGpx0
Gc4mEqq6BVF+A4nc0LLnWoJBrwFAumhO+qFKRFHpUTo3n5TPpYt70vxwVxXEg4UggbxD9tH/
SQ3lveH2f7tH8//0S8aPOsd9/l9V+n90/3Nnl+5/7JbK5Sf/73s838gES16i9Tn7sj1i/HUx
d+b6Q9Mls+byjHRVDsYDSn3hx08UM/x8aKRLF3mzXtynuol6DUrihTyiCOl0oFSEFGcjISys
JinuVOIxQMM7EtG3cK2A7Vj0lfXKswGOp8m40ODBKzlDT1zN3C1XPitrz7nXTDTimsoVrRXP
c0dtJVU5kd9IMTkqtUZVnJoAvFd073GGfqzNfjPR0QRunl3URbUEKoWMc1RCX+mXmILw+rUj
5WEx1/ahzUQOWoUbxbVXzxc2KDbI1cdqAzLrO+V8pnJqygqvaPKsFKcy26rp0nUgEkPKg6ar
WrOoVV3XtzK3EkqS+HaYDzIaOKjXr2NekwpH3d9MsUEetshCOnqBHilaqUe7O9UK2EOyo474
3c5bjCkRIt9Q2v0i5JXcqUl4t+TYOESay/eyiz3MrHc2xA5dvqdCwnmY0QdnaX+SJMTguH4i
GUHEtsPxp3Ll589SGyeXVnGpf/cyidKIM+vEgcgg8vzAi/AAqSGLcR9CgFgf4yR31lHH0fT4
0JszimuE6xcWeOBann4aFqLVazHun6hyLq3gWomJxcQXsfj5Csik3pWXEMbCIswk3uJsQPgT
JS/PAn6HIYeLD1My4iGH8dQJ8DBbE2Zdg7NyQR1N7qk063gUJEacAg7eqwgWplg/t2adSNWg
FIsOdzGs6Uz5Y2SBEjccHaFxhByEPv6cxeXDfDHbb6SVdrThmIqzvjpUzpbiaHE3QYcodzng
layX79nvzEqxqNY7sfmIoZ/TkD//xDGCwj3B3vglSxe2L8+6zd6g0bm4qLdPBp13scEXc3al
tKNY+x7mjstEgY6kNG7JvCYeBnFMdSzzt9BIsXxsUq4xO9BhdpkZ8HXoBrNeWLK5zw0SWxaF
IG4rNZv8qlcH3b3f/CFnexLvps1DVNoNmVAxVBxu4gqSx6QUK/EYFQ7z6W69cnKUCaXYMOR5
IQk8uT8m7qT5VKiLHiBBcjzkaM+S50+Lgmr5pQQdylmlq01/TEdZ5DECPFXGP3ET4T6n6y/x
r1a48R4nS3xedbUc9WMD8hCrtKdWsfVBFrGmNT/uiG49pau/YN1kWnXPbimLEjPp8qwg5Knk
KV1oJ3VhqxL3ZXJyhEP2H+V//dWP5v/hOTi5aH6DOe75/Z+9cq0W+3/lCv3+G358uv//XZ5u
EpL7WiqUDcM4b120+tzy7cmUlMwD0sVJLrUj9C4oeUTJFxd1/e+Abp/HXKgUKoVymaR1l9lv
zUi27hVKBaPuhn4qrUjgdEgC9II8i+lsHtHdZin2PYavg+sC/T5P6E8ZaY0JjqYK3ZCrMZ5e
kvUHiDAqxlD+EAyaRzzxRj/lQvdnCpChlFX9qv+208Uu3G40hsz1F1nDaIEZhvMpaSuyRma+
F5pDx3UiMiqTZJPKQhbgCueJc3sLDwInvMY5WrCgAJLhOtdSlgpx7Xi4skS9cp5aVXp5uvqs
sptkTZkWKjpmmOIermkHjJSx+MaErufjbd+aCyHa15EEDpptm+5sYj7nqTeOHIVqIz8ykeoX
9ROO5dwTkCiLS+jYc2FW60lcIABccg8Z8wycwWN2gX7/Rz2G8eEtatRWD1r9X9Ms5JAVQQUp
28JHSFLISLM4yUz1IiKlzHeRLqGHwEOlvayBo4gZcPR4wsgaoIEhMnEkbPTIxB0Ac8qrdAiQ
MF5wmDPFJSHeqFJ5jQwaHGEhjZ/tM/E7QUuGQOcznmmg2xeui+zNf5cHKM3KE3NEQwph2+YS
9TtZNgbn4YW5jPdvw5x5osJwmaR3ycARPMrr1YntDfolPjSUKMQh7Ka6NKPk7w9pP1FFW6yu
xseRDX6JQZ2IgsFvRdH7dqe9fSuImTk2tdLzpCybUrbG284HOOk0aVfhQ6f77leDJzrp1wm8
NZi0RpXUzSeXWMmhinO9aY+KZ7P5kU3ngFNuHP8dJ7E2vY48idvym+ho1dG+slQfHr9Irtmu
XWfXTF2+l15ee21wSPEvS8WG8vpVeFXIv2K5huSnye0wMglgDWPQadx/2wTyW67arYaIINUv
Ou0zXqDXk+Tvo9xzPGeqsslobwUmUsfkXMCNJ48Hktbse9pvrcbMSGWV11eP7hJd7OPL5dfp
nWidFgU4UTxrqHKQUJ6SuUx3ciqIWfJJMQJ/q1yAAjQV4kAyPvKN2BDFI87cUXKpMLVlHQHM
JikcOrTsib+gKwF55TXrW2mjQBdLxAW5hB1tVhj5MzprkAkZo/pCd5mFse/bVANnamKTV2CM
jERLMJ3hN/pPYqkyPSQYk4nfjdM8mwIdNC491Ulrd+izsU1JYyGFxA+YhbLjqx78X3vX2tzG
dWQ/79T+iAlSFYlVEEyKlmRbqexCJCQhoUiZDzv6tBkSQ3BsAIOdAUhjf/32Od195w4ethMn
dtUuUXFsAjP32befp/ueDN5eppdn8th/yHOXEZAy+z4v71WDn43lt2+9dJ09IGP+Xo4d8CUL
2Rd5qNYFDKadHLeQqiAN9MGg00tznd5nfEoZrBH67WTVCw8Ko8dshd6XdbqcA3MxG+9ZQZOe
DTa7Lu+1WFsB2xECXv6vYpLepByPeYoc+omXPoihXzyTcwPblSJxHkBJpFFKJ7cvXazizeEm
N3Y2s6VaX/omv8mWTGjNZ2sPweAUTQFnqc1KMjtzsqBTUS3b9U1aT97K2a3vxJhNP+TZjEZn
d53v8qSOSiUZ7qIJ9xsnxbqc3CtKSWaNXU7Xp8OF2CiKF6UXyUHIyHbrnMvcFKdBdZiGkRj9
Z7TQ1sw2DmqKsr3ZTMfvW+ioQll+1CCxc0BIAaZRyaFZgGaS5M3VuwsnUlPzfJE7VonvtS2O
/9lBa3Xuf+t6jblewjhSLBCcSYRQLR4KsJchZVPGCiEzEeZjTn74ZEq5/9+E/dTLClirB92m
H7CZohD04hMU0RxfUm6yAs6HYXifalH5ZHvpRankkKXXxUJdstATqijR2z0JHSQPutM0dIyn
R6CxDoBEo+sOv6HkbKz2abnWFouxQBdayrFe9Ri+gDNR8Qq2cJ1iflN31MeDrvB3Ne3YIa00
AQzrIntPtYobUf8OUkuV6WQnlE9+6Pk35X/eCy8tF6JO9m6yRyjvb/2J7P/l/F/Uh9j/r37E
/k+fvzww+3//4OWrA9j/+4fPH+3/X+Pz+999Jqzms/ouSX5vduNNVcyDOrCcwxosq1FefSVP
HOzFCoar355bKQ881wdOt2nn8vPhnmkPQTzFbs2k50Uo/yD/OWv+s3nmD7/1gv0f+0TnH5rf
v6SPnzr/6SvD/7/Yf7XP+t+Hh4eP+P9f5dOcf1gC0Pv9cgL/e7b2d0Mxv/XYHz+//BOd/6Oz
j5+Gp+/++X3I+f989/n//HD/1cs0ffn54YsXL/afHwL/dfji5WP9/1/lo3XAmWgxOB2c90/S
j1dvToZHqfwzOL0YJP/mNd2/8QSXbvrnpZhmB19+eZAk6XpKzxdfdvnTzowZzblI0p2fl69e
oFppnfbvxSI+yoQhFaNxzgSM/ecHh18y9SJJB/d5tVIjERb9tFgsHJo0X9GSiTKD5Nlr6X6K
H4u8ToLXfGJZKu4979LBe3OXzeh5KGgush4IXBpwdSf/pmvyscplbJMctUwuWVmILdWW8FIv
Ghc8zfi8LsYzc69m38uX5iiuEqQ2jeAyKjUlhIPnEJDS1EvTNytGAqoM5fK3Z8Yknm5D//1C
7H3tarzMkOKTWxzkx7rCb4mP+dkzQsi/N2tYrfYmosBsLEa/malbayyjl1KHTHZkAgV4Tqnr
Y76XXdk5T6IghoEPguUZIAIPd7C0s+XirqyYFsw6umWyrHX7ZEhPL+BT0td2UWVrcjfwmtGr
kvhinxTXVVatduU4wX2ZZ6PeXsr4CFz/WRPJTrj0NmJ4CsqyB6oJmULzPGNpkFaOW9frrlT5
LW5HoFPDN7ALmkzmFX0aBHVsH1m9QXvxnmroKgmB+Ig6orOjR2ZjfOlTo51qTFJINDaWV/fS
tTs3Hor6bq8bugrODgVgSdNiJ6DcjyzYOEdgJfEXgVwqFtGrjIcppbaoUV6Hl07GeKOjRCMz
IhY4Xl/31+bdsOZYItbbHZXmSaKTrebuXJZ4dYFsCu4fuVzNXXEQJNdShIaslJUtZvOyGNfF
KBFiBXvCYuYzjQlpJ9oSBg6Srr/Xn0rsSpWHdEV9ipGMer0XFIxG/iDZXV4tcLeWuaKLELDE
8UTLuqLJ1h2NV5Kpirb8IX+SS/FWfsh/yJAh1/UntjZXL1EJugllPtzlOHbJGO5bzlgRM7e5
NMR+4CQdu/tLqKOYaxDMgxa2VlhXHCM6unp6yvjuGjnDCccD1g2kFpEXgh0R5Uk7fSGJMI6a
Pr+7fOrEwIRRjVuvlGDo0kt8axLeOLKFSixI8SB7usjn9Vfp04M9yiUVle1VF7JMnorhXCJ+
YmQSSaaHO5TXxhrV/HGSj+WYU+LVtYEt0XQ33mFNg/VtjPvjqBGN73Iv6N5V9vmk9qkQS6m5
QUrwjiPkahvBJVzw3KUw82wRxBnVYSuUnc7KJtFUXeEmQJImkB7u3onYMAdfqHfbHcIAEXFo
88zinRhfYtyijinIy0BzMA9OHOopNZlOd7tsSTHLJoi765QgZGQhRLRPKUsZCNdhqNdTXa5a
LeqW18l4JStrKzF59ATW0nKRWTV0nCT8PFl12UnMnhSHyow7MGoR91jLhYgQzt6E4xw/w5cM
ugNvJQchElVTjoQ7VjrjKDG6VJRBposeJCcmUcxGxX0xoms4La/JSLSToM904QHKhTZveNos
vSk0g8BxVeSiRa96xjQR2ltwm7uhLOI0GzE/m6jF1NfZJqTH7zroUIqsdNJ6YuoGuDxLxS2a
5zQ1vec62Bz7H04u5VMpM1Suecu6cvMVcdpN+F1pXTPG3aV/W0Lb6yX/nvykgiy/Xg7OP1yk
/dNj4KiPh1rLBUnTZlJ102PAyYdvrjSUKw9+ODsevrXYLga/bzGULaqSkSMXG/Ec6jHEQChn
IEZEFJAEsacFZO8cScwhdb1hO3flBMKlzlam2k5FA73O40zzZGu6vAxsu3rR02XvfNTxdUR7
BpSmm1BnCcOnWIjmgNF71KnDqSBcHNAL3lqCiFLtGfbRL2iDIJG8Ku5lx+5zXRAdfDPhSfbw
lZ5phbvKzKVbfdaWzbP14paJ3QcZUJnoJk2Kv9oQmAH4e0wytbPcIJuRtc/5c8eSiZzNZTbG
kj1FRWVhBLcLhKf9BatsY5kzo5C1PylEpbWfZ4nvTNqJe+9A82Tk3E6GYm1Go0phIVmddkR2
dOSg9IW936uCUNq6Emi041y0JkllEopnoyErdRg5vFYWS61suUAYnky5ltadVIBmKm8Tx0fE
S29M2TUdIEcsHoXWWsgpeyWJlHXGOXnXF8sokk8qGy0WlIjpBqEl3vNTYYP5HKrXjFaJxlKJ
cFLGJfPcMuK9HjAwvohKZNUS6vbc4c8ud8IkATEiuzpQKBWO4M8wWF1Xs2ae1LEeg+2NlWuo
zcBwyQmZihRYiiKGwHdBk9CVfizNvLhZlst6or0LzyEvF9qVb6wERsCS2CDjp5LmpBnnsUnc
TLJiqsVxXfK/1iqIheLZTLtL9LXaJdatlyiKOeEsIJIACLOYPeYWmk7wDJXIxj6MFIH20rXw
RU0/7YoS4WkrVsFdUkuHymsI6c/vVnUBPJLStR5mN9e0J1XwVtZKuxKI6XxBPYr0LwjdH9wy
d6WZlPO8oRzT7wziiFlV2wnGOaZxtkQ5W0pkBrAQOtydrLhrslTpNFY0ydrbjNAY/LbKKxc2
uYOEsI8tdCmkIQr3NM8XjkLxEL/L8a+0eGy21xgBitNg0Nt1RhRTIM+/kbXlwsocFVBHkrPk
YkJSIhuT6608R1twDjSCtWWEp0/1dBzXG+MgbRKj6M1G6wWAiZ4sM22LGZtRHBIMMNZ7XgSx
zu9qFXUxtqi9sZZLXZnaXd7CCGppVADMWi8ZVsHpGSKKp7GoRqEVENAuTcBFv07/Zs9V97D0
Lui14rmwEiB16gZrAPdUlUEMCZ+xyQujFQYb2YS6lKBR/shiKtK2c2GcCJCeJjY0DVJJtGo5
6mOqRiJpK8IyuSZM8sGqyaaIogSCVnqazcqlcBeFL1II81C0OF66leNlbMC+2G37PIVOO0FJ
INPAAn3YKdBxhBf2GoeFIoVx4lv1jsiBbLW5XWxh/cCYGM0nE5dfaM4BLvdF/rDGE9lKo+E9
HfyAUsjS1FeOpQsiWyF5RWtT4UFlEwFOGyhBF1+9BLPWkneVibU4kM9mU0MgWGzUbOdaY709
0dzdb8JnDeiq2KWyTa/sszkeNEYTr3qXiRnoOHE7AzAn+YoqQzuPZpdyyapiyBmoy5m0Rlcu
VKOKGmKjd2jlP6CNF6rO1qbvTWWN7w2vOGudQd1ZQp9xRLsOP4rmWYpoC8N/CCiiFg1NU96J
FHcNp/NyEV5I1oiOmM3QLApmZQr3chajpgmRVs2WJutChYw1VjhNaGkbbhTaW86FkvYKqAO4
cYeonadKgCvDNcCxeVAkEmxtZd24kql54uoOkS9ofOq0qnycVaMJEXG3xDA/QEyrc+xSXuxG
YQKMlP73RWCYdZMUS8Uo8v9RUa0XSew6coQ7KlhJO6kOVh0B8tzrVHbpjoZD0xXNmyT/Ia/U
/HXHmWVqL6pysnWxIwOqrESdm8Cb4eZUvVUVkDkPieEvNJozJTpsPMYqebMOAuQ8WAt/S0PJ
uq5FBmlXLe/URPa02OR9OVkyYTcBALOsAHRTnt7MT3XfhgtdV87/otEp2yRNw0rZKuUOf1xV
X5/C+uhhQqowdfXn+R5RNdffwafiPnBFKpLfQCPbIn+TCz9xBxzDcwPP7lCihBnAZWZnSl0a
sgKN/tS/Qa4N1BWWk7PdwHeTnLKuUp8yBaGh159BmGOQqkA1RkjXzryf2rh83W5NUGVNezpa
TVQ370ZaK6dZhapQS3cMNU5CCB3Vxl7LEnaDRrY5syycJ6rc3fQ+mxTaHBKVhDsv6H/Tea3y
rGKgpjErqCCRIay6ppCbBjWzBCQa0gzoUTGyCJdbCJB+ijsm1FwXLqbXLqWwrj1bWF/xVl5E
e3Na+0DFTwXwz9uD3euvM/kH9uBmF3XFBWojm5X6qSeeYINU9q/FoXZMGToKvWfZRMYyU35m
aoyFbdU9oDdNzJgMJpxSzLYNd4e7ESD08H4rXSX4an/y8HK+QUHNAtXBLJd1qSzp7WJ57dLh
WldfVBfmrsTm/W3DVNQjpmNhWFC3YxokJx5CMM48tW3LjCk0PY3nZLN40OqRC0dfe0/Yu3bp
8ZiNcQEyVI6WsJWKxmoRy26yrGmZZHVd3hTuEJMjgAqLLBtdhLJ3/rzyYQAQrUCySDWXXxhc
YX4yqj3wkE8mWaw4NDOSWb73BBDodkk9z7njuSuz3Y35xMeFIT5IDfPHJaGsmbt6glIbv/YU
Zru6C61lWaNrWiAJ9mkvTl/5jhrAVCia2ulTnSFGrAmVqprUYON7NsPEUiFwAFb1QlQ3Opl4
i2xr/rCUZFWXM+otHHPoKjG1PbMT6iUzotUTIX+7oS1ErUPFik4AojXmJyOhIxOhSc8k37Lk
eA1Fkxropjat1t9Koa4La8Yo1xrYoD5Xt6mMsjHkRVDPr5NtamWLS1r+Ubkc30W8vbCIuTo5
p/Oc6TFbhrDmLooWA1GDNP280Rm0Lh8cQequEfuPTvRQHna7LpEopYJ68x/mcOTSgDJR7+w8
UlUQzYSDSahivkio4zxQGyx3dr+7d/BPxJWUBhkrypYQAwsTZpAiBTayFffcMqwknENfYKjQ
7bqv6rPiYniYndsLCeEaWuQTDPE3Ry4UVQO/CQPj0eE2wbwBL/YBiD2IQJf873Y5Uc4yKZh+
Bun1QrfOzbvY2rRLgdoWSF3AKenBaZKOwS3IbMP0LYVHY5hjmPjqtm2Hcs2lJyx8x8aUzIRb
j30o9gYWb+ZWWcUg3V1xXSzUVT/JHkL0vowvOVojo0t4L3A3VBcQGB0Q07CqdtbqhuLrjq9t
TvY9de4g4HgTqEb7b9X+8D3WGwQRpobH0WFGf09gT0cchp+sLeKaiWNQh5eWgYhSlKag/Jiq
/xMzXsSghrUDZMQPE9lPo7O0xAPJ9osiRfQQt32JUYDfxyWn+zstAZosIhhDe80cQmHsqRDJ
YJ7L22XFeFULcBIKprlT/UkajE3PoFMGQLqWpbhjiKuXtE+SIVSszm0OXfBGr6X2E2ghpYgd
cx5rFtkrFANQwU53CrLaPDIAISBW+3fLEespp6qkRNapxpwT0UQhcXJ/6Nb20+MH8NekTzXa
PC0MW2jxas1L2+smERVSGeY6khBAO08N/4JJ6aiYSSEDF3PZO2449Z7LaUD95JgsTNMPXayd
kVbOGcQFnJ/oN4jG3e8q5MLwT3g99umXpo3XQO0IedXFdDmRY5prsEgDGCJDxqZXNlw/icM2
EVovrxbqfo9eM9G/sYlQvZ0wd5w9C/tvIpMy392AngmVPBUjmlblSsyE1TNCCqLDHekJ3osw
P1V79VrTpo6DhVhGhWYvqts+/CVmJLUKmYdOkZynXRuAo/LlvZZFstr1bR4YFWpARL2C0Aru
IG7yjwxfdbgo6LPhkJL/vMsn0KTVGAaSbqaHMqeWp6KXTeAw3iwnqHtYVDfLqaZrK4e7ziZx
Lm3UfIRETdQp6fEUfygKS6whVw1AOVMSSuJuEUEdtlxu82VFDrbF5yY7szT5zL/01Efok7qB
VcDRL6S6Mu8Z3XUO1DNfnToOChbNZyP0ZuuTr9udM4GdKuOkNUKP8hmSBpMeV9biwmCYjYHd
2mJV+rvBv5qglD85iYr4ucIznPrndMljwdL0A/cxL1HmPkBykjFwHXKsletYN8EUf0AIv2IM
Eui+jSHlo8SpnazLbBKiEY2fay3OutDi6nHerDnTE33ptTlRUUnbwr0EUX02Kme6ASORPiMi
Swm1Sus70gyUQYr3lrMgjNXH1zAjG6TCTwJewtigSUJlxHdlQZ3wcu3UxGRKSBwGil7g3SfA
6cGMxGtZhvxeDwBqY69LK5Wq9WKDPdOI+KLnwbV1P8Vnhnpd41isQuDwCYQPHBxKw6jiHT6F
S6SI+q9XTWQrttOVRzfqyAaWCFyRplfdGsemGUCOno1G6ncAEch2j3M8Pr9jBL01xQj0InJN
Y3GJMuIwla5CM7NF+9VWOoC6c2ZUAlATJmkWQlnHsrYO8hFE4kyDU7jaudvmxaLkl3KCESKp
ydCjIco5F6p0B6OFH6/L0QbKgMrLlyyxsRuKjpVy9EWV3xeM3uqWA9RsN4bUftHIrms3qANA
i8VxwpULaXqBucVt8PCAMEXCF2DuMvZ6XlRRUT+hJxxce0PTIzBCLVGDF/ReD7J4BRz5laaK
oNQwhxAiIZBUrv2yFFkY+Ffhb8QWyh4vZdLgi/6E3rLc4EPdNqY3h3dSZWvPbhgSyiojQJ1J
2g6Yd+sSlk63seIosh2j0TjPIwdqW6F2kJhHCH1QZeWogVZX22+YSbaQw8bcm4CGLsJq2xKs
BclWAcNSup7vr/Da1B+/7ybKyVDo0n7PlUfHoEang7rCBv6EWDjlvzEKtbb4XesErynVoRjS
2q0uKh8Sw9BDfW8saVMNgxQI8ciYzf3Eyv/YJTKtK4CQwlFOcxyyOqE8CE7GOiCeLU0DQozr
7tf3CMmPmrEAMj4uswlPN89ede9kp2oBS5ySpuT9xglQ+9WybeUBtrNqHNMy2OzI/FFsA644
MDESXhkrP2FRDkt1Oj0Ltwlx/w966ZvBUf/qYsBKRR/Pz96d9z+g5JehYo/Tt+eDQXr2Nj16
3z9/N+jiufMBnojbAkY2aqCLMjb4e/DXy8HpZfpxcP5heHkprb35lPY/fpTG+29OBulJ/1tZ
zcFfjwYfL9Nv3w9OkzM0/+1QxnNx2ccLw9P02/Ph5fD0ndVS+vjpfPju/WX6/uzkeHBOtO5n
0jtf1KuNBheJjOOb4XF7Up3+hQy7E65W8sFjcrhm6S/D0+NuOhiyocFfP54PLmT+ibQ9/CAj
HsiPw9Ojk6tjAoHfXGlNH1bZk3FennFp/FlvXQYj7W/cyQTk8M+4lIlLKI3Igp8PL/6S9i8S
W9ivr/qhIVldaeND//SIG7W2kZhu+unsClJD5n1yjAcSfwALNUiPB29RNPob2V55Urq5uPow
sPW+uOQCnZykp4MjGW///FN6MTj/ZniEdUjOBx/7Q1l+YKTPz7X0tPKW5z1snlDJ4BvQwNXp
CWZ7Pvj6SuazhRLQRv+dUBsWM9r35NuhdI4dWt/8Ll+RH5rN/yRkhALpnxSY/cnIQ4YZkNtt
qhCiaKiz/+YMa/BGxjPksGQgWBBs0XH/Q//d4KKbBCJg1wYm76YXHwdHQ/yH/C6kJ3t9oqsi
p+jrK+yifGGNpH3ZTkwNdGhbhjMIWjt1GpG+18/l06bvNfoDXZycXYDYpJPLfsoRy7/fDPD0
+eBU1ovHqX90dHUuRwtP4A0ZzcWVHLbhKTclwXx5mofnx36euM7p2/7w5Op8g8akZ71oc6C0
FjbEiexir0saSIdvpauj97Z7aevUfkrfy1a8Gchj/eNvhuA82k8iZ+FiaGtyZi3YOpKxMddU
5sfntwD4gf3vzwHOKX74Ck5cyAEtT6t+1ktqAfLlJ7DdU1F5TNbVoGOTjyMRr5Ny3lzz3qAp
oyw3w+qZyBwzC6ReJGKJqLNsWQcppAae2d05K0yt1DN9B0NDVR9Fu1MSFYukLRFUEoa0nY07
9KKE0BAydiei58W5Y3axyCzw1ChIAdLr+qM6I1KrvFRnt5gaRhzenjb3khqMiDAci7TwWixP
GdU8FEUOippwn68sciUqfG3KWgM5JpAHTbGN+MY5j/lTk+8EpaCDkqXmvErnJe0gLXuXWxIs
AwYG9UMaE9QAg0L+EevJ9x03EC3Ak1pLzGvT12KB3GpVOUKK9EY/YsP/xLbW06pXK2nfa9Sr
4vMn7fUX3ZNo/mNFUf4DdyWma3clBsTez78v0TuJ7ktkKz/rzsRtC/D33ptIyMgvuzsRTfyy
+xNDktHPvkMRb/zyexQVIvGP36WI9zfvU/x5KfxIRgFOCV6BGBYCz5myWy9/CyIWBZmVD6ty
JuPXFEDR91EDfqKuzhZCo4VI7Tov9ESSDMtWBRCv1umF+5qAR9S3XNCIYRZFC9sqByY3BNW7
mSjV96rNOzm/lKltObvtk7vxthaC5A7031ycnYi2cfIp1pRfkwJs8/VG7L8xW/XhSa85BOun
v5EzZPz5BP1olbgWM2ALljsV/EVugr2Ou7t5Eg+kp1CVu9Uchh3jWg3K28fHMYS3jVo907aV
TdKyG3fmm53dMpRi0Y+mP4aKa3g1V3BoIMbGCLDYZfQoRMlOW4dmuUvqmedpv86TaSlNPruR
EXxPR8Y0ny1lwfJp/ewZuDaN5xr1/9I4x799xSnBeEg/5iO4Z7RcoS6WZ7oH+LG9PUVdWM3d
rpIaJvtEYxszRbAjuIzEucYZ16TcdJrMFNc1UKgUqfG1Zmi+N2R6BtzEfCIigqgpvgMy1fyK
T+WqHK1muZ9oyD8tWmw+8WwSewN5QqCNGMO1zqWhv0V0/gQBMWIE5TTWmsLLUtYOfKn3ghNN
OvszRpO+R6H/igzvjwodQbK3UMnlSk5aOftTNz0QvawqJqw+AgVFf+iiQkddeE7XN0JB5snd
wWSDX8UiRY1PI5RonkfejCTKfA1FBkJYrYpZUYagbFUiJg1mw1ISwSmTOB6cGZlg8iqZGG7U
kYhSQTRX3GOr6qnjUBJr3J1GyhQeHBbqadwjUd48Y2ZLdYtke3WLTWfmb1295vHzSz9R/afh
qZhzJyf//D5+4v6H588P9v3+h8NXuAvw4PDF4eP9f7/KZ+P+h6+vhkd/SY0WNi+AiG9siC6A
D+pP7wWFqvy7dwhFbiiCYWLXQSThOohu+8YIYe4He2l6NUNPUYG5ZzKi3mL8P+o8Vqba9Gnm
4aJiWuRzaeBDA4ahwed7itHMs8XNXagOrY9UdYB3qrEXCrpDqBmCGkH9EhyVuup0rYvm0kNZ
kygZn0/opWahloNDmsF5P51dnadeJ51yt5ce57eolwuO3ZmuRtedXnoUKvV+EJGJHnvJ4Z5d
gqOS1ufF5bHQZpSu3NEHnn3cT/+4uSCd5HNp7Tw3ARwe6KVPj3mBmGgTLJxSsjg664QTGbaX
vJAXj5pyPPPxf91dZz3caG0phaJVBS+Dgrn4p5WxfvpJhO3QK3QYWLteip6117NyWvLeUPP5
6jUfgzT5FV/g5Vv+gUoizONLIamXXwjl7IO5vHjR83/2+dSikiOfvOS01d8Q7q1SRO+qQXU+
KwDoBaY8X3gd5730dpKNheJeSRNUjTs0LqT3jtosupSRGlQnX+BZSw1O14C6pKNZKJbaVE7d
CwlJ/NIFuJORQjIKRHFd7fKbUpIvMb2lftlZzjsGqpZzts+rzDDrCZMQUXn70nZX94couspf
RlVQf113revbxvLwSxTJlzZ21H7eWfn5QlSSP4sunh6mz/f3fxVW//h5/Dx+Hj+Pn8fP4+fx
8/h5/Dx+Hj+Pn8fP/9fP/wKykq3cAMgAAA==

--Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)--

************


From owner-pgsql-hackers@hub.org Mon Jan  3 13:47:07 2000
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA23987
	for <pgman@candle.pha.pa.us>; Mon, 3 Jan 2000 14:47:06 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id OAA03234;
	Mon, 3 Jan 2000 14:39:56 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Mon, 3 Jan 2000 14:39:49 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id OAA03050
	for pgsql-hackers-outgoing; Mon, 3 Jan 2000 14:38:50 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from ara.zf.jcu.cz (zakkr@ara.zf.jcu.cz [160.217.161.4])
	by hub.org (8.9.3/8.9.3) with ESMTP id OAA02975
	for <pgsql-hackers@postgreSQL.org>; Mon, 3 Jan 2000 14:38:05 -0500 (EST)
	(envelope-from zakkr@zf.jcu.cz)
Received: from localhost (zakkr@localhost)
	by ara.zf.jcu.cz (8.9.3/8.9.3/Debian/GNU) with SMTP id UAA19297;
	Mon, 3 Jan 2000 20:23:35 +0100
Date: Mon, 3 Jan 2000 20:23:35 +0100 (CET)
From: Karel Zak - Zakkr <zakkr@zf.jcu.cz>
To: P.Marchesso@videotron.ca
cc: pgsql-hackers <pgsql-hackers@postgresql.org>
Subject: [HACKERS] replicator
Message-ID: <Pine.LNX.3.96.1000103194931.19115A-100000@ara.zf.jcu.cz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-hackers@postgresql.org
Status: OR


Hi,

I looked at your (Philippe's) replicator, but I don't fully understand
your replication concept.


    node1:  SQL --IPC--> node-broker
                       |
                      TCP/IP
                       |
                    master-node --IPC--> replicator
                                         |   |   |
                                           libpq
                                         |   |   |
                                       node2 node..n

(Is this the right picture?)

If I understand correctly, all nodes connect to the master node, and the
"replicator" on this master node replicates the data. But the master node
is a single point of failure in this concept: if the master node goes
down, replication is lost for *all* nodes. Hmm.. but I want to use
replication for highly available applications...

IMHO there is also a problem with node registration / authentication on the
master node. Why isn't the concept more straightforward? Such as:

	SQL --IPC--> node-replicator
			|  |  |
		     via libpq send data to all nodes with
                     current client/backend auth.

	(there is no master node; every node has a connection to every other node)


Using the replicator as an external process and copying data from SQL to this
replicator via IPC is a very good idea of yours.

							Karel


----------------------------------------------------------------------
Karel Zak <zakkr@zf.jcu.cz>              http://home.zf.jcu.cz/~zakkr/

Docs:        http://docs.linux.cz                    (big docs archive)
Kim Project: http://home.zf.jcu.cz/~zakkr/kim/        (process manager)
FTP:         ftp://ftp2.zf.jcu.cz/users/zakkr/        (C/ncurses/PgSQL)
-----------------------------------------------------------------------


************

From owner-pgsql-hackers@hub.org Tue Jan  4 10:31:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA17522
	for <pgman@candle.pha.pa.us>; Tue, 4 Jan 2000 11:31:00 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id LAA01541 for <pgman@candle.pha.pa.us>; Tue, 4 Jan 2000 11:27:30 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id LAA09992;
	Tue, 4 Jan 2000 11:18:07 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Tue, 4 Jan 2000 11:17:58 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id LAA09856
	for pgsql-hackers-outgoing; Tue, 4 Jan 2000 11:17:17 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from ara.zf.jcu.cz (zakkr@ara.zf.jcu.cz [160.217.161.4])
	by hub.org (8.9.3/8.9.3) with ESMTP id LAA09763
	for <pgsql-hackers@postgreSQL.org>; Tue, 4 Jan 2000 11:16:43 -0500 (EST)
	(envelope-from zakkr@zf.jcu.cz)
Received: from localhost (zakkr@localhost)
	by ara.zf.jcu.cz (8.9.3/8.9.3/Debian/GNU) with SMTP id RAA31673;
	Tue, 4 Jan 2000 17:02:06 +0100
Date: Tue, 4 Jan 2000 17:02:06 +0100 (CET)
From: Karel Zak - Zakkr <zakkr@zf.jcu.cz>
To: Philippe Marchesseault <P.Marchesso@Videotron.ca>
cc: pgsql-hackers <pgsql-hackers@postgreSQL.org>
Subject: Re: [HACKERS] replicator
In-Reply-To: <38714B6F.2DECAEC0@Videotron.ca>
Message-ID: <Pine.LNX.3.96.1000104162226.27234D-100000@ara.zf.jcu.cz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR


On Mon, 3 Jan 2000, Philippe Marchesseault wrote:

> So it could become:
>
> SQL --IPC--> node-replicator
>                            |   |   |
>       via TCP send statements to each node
>                       replicator (on local node)
>                            |
>          via libpq send data to
>         current (local) backend.
>
> >  (there is no master node; every node has a connection to every other node)
>
> Exactly, if the replicator dies only the node dies, everything else keeps
> working.


 Hi,

 I have explored a little how replication is designed in Oracle and Sybase
(in the manuals). (Does anyone know of interesting links or publications
about it?)

 First, I am sure it is premature to write replication for PgSQL now, while
we don't have an exact design for it. It needs more suggestions from more
developers. We first need answers to the following questions:

	1/ Which replication concept should we choose for PG?
	2/ How do we manage transactions across nodes? (We also need to
           define a replication protocol for this.)
	3/ How do we integrate replication into the current PG transaction code?

My idea (dream :-) is replication that allows full read-write access on all
nodes and that uses PG's current transaction method, so that there is no
difference between several backends on one host and several backends on
several hosts. That gives "global transaction consistency".

Transactions are now managed via IPC (on one host); my dream is to manage
them in the same way, but between several hosts via TCP. (And to optimize
this by transferring only committed data/commands.)


Any suggestions?


-------------------
Note:

(transaction oriented replication)

 Sybase - I. model (only one node is read-write)

	 primary SQL data (READ-WRITE)
                |
	 replication agent (transaction log monitoring)
		|
	 primary distribution server (one or more repl. servers)
	        |               /  |  \
                |            nodes (READ-ONLY)
                |
         secondary dist. server
                          /  |  \
                       nodes (READ-ONLY)


       If the primary SQL node is read-write and the other nodes are *read-only*
       => the system works well even when a connection is down (data are
          saved to the replication log, and when the connection is available
          again the log is written to the node).


 Sybase - II. model (all nodes read-write)

     	    SQL data 1 --->--+                        NODE I.
                |            |
                ^            |
	        |     replication agent 1 (transaction log monitoring)
                V        |
		|        V
                |        |
         replication server 1
                |
		^
                V
                |
         replication server 2                        NODE II.
                |         |
                ^         +-<-->--- SQL data 2
                |                    |
              replication agent 2 -<--


Sorry, I am not sure I have redrawn the picture above entirely correctly..

								Karel


************

From pgsql-hackers-owner+M3133@hub.org Fri Jun  9 15:02:25 2000
Received: from hub.org (root@hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA22319
	for <pgman@candle.pha.pa.us>; Fri, 9 Jun 2000 15:02:24 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
	by hub.org (8.10.1/8.10.1) with SMTP id e59IsET81137;
	Fri, 9 Jun 2000 14:54:14 -0400 (EDT)
Received: from ultra2.quiknet.com (ultra2.quiknet.com [207.183.249.4])
	by hub.org (8.10.1/8.10.1) with SMTP id e59IrQT80458
	for <pgsql-hackers@postgresql.org>; Fri, 9 Jun 2000 14:53:26 -0400 (EDT)
Received: (qmail 13302 invoked from network); 9 Jun 2000 18:53:21 -0000
Received: from 18.67.tc1.oro.pmpool.quiknet.com (HELO quiknet.com) (pecondon@207.231.67.18)
  by ultra2.quiknet.com with SMTP; 9 Jun 2000 18:53:21 -0000
Message-ID: <39413D08.A6BDC664@quiknet.com>
Date: Fri, 09 Jun 2000 11:52:57 -0700
From: Paul Condon <pecondon@quiknet.com>
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: ohp@pyrenet.fr, pgsql-hackers@postgresql.org
Subject: [HACKERS] Re: Big project, please help
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR

Two-way replication on a single "table" is available in Lotus Notes. In
Notes, every record has a timestamp, which contains the time of the
last update. (It also has a creation timestamp.) During replication,
timestamps are compared at the row/record level, and compared with the
timestamp of the last replication. If, for corresponding rows in two
replicas, the timestamp of one row is newer than the last replication,
the contents of this newer row are copied to the other replica. But if
both of the corresponding rows have newer timestamps, there is a
problem. The Lotus Notes solution is to:
  1. send a replication conflict message to the Notes Administrator,
which message contains full copies of both rows.
  2. copy the newest row over the less new row in the replicas.
  3. there is a mechanism for the Administrator to reverse the default
decision in 2, if the semantics of the message history, or off-line
investigation indicates that the wrong decision was made.
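
The comparison described above can be sketched roughly as follows. This is
only an illustration of the rule as stated in this message, not Notes'
actual implementation; the names Row and replicate_row are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str
    data: str
    modified: float  # time of the last update to this row


def replicate_row(a, b, last_replication):
    """Decide what to do with two corresponding rows from two replicas.

    Returns (winner, conflict): winner is the row to copy over the other
    one (None if neither changed); conflict is True when both rows changed
    since the last replication and the administrator should be notified.
    """
    a_changed = a.modified > last_replication
    b_changed = b.modified > last_replication
    if not a_changed and not b_changed:
        return None, False
    if a_changed and not b_changed:
        return a, False
    if b_changed and not a_changed:
        return b, False
    # Both changed: the newest row wins by default, but it is a conflict
    # that the administrator may later reverse.
    winner = a if a.modified >= b.modified else b
    return winner, True
```

The second return value is what would trigger the conflict message in
step 1; reversing the default decision (step 3) stays a manual action.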

In practice, the Administrator is not overwhelmed with replication
conflict messages because updates usually only originate at the site
that originally created the row. Or updates fill only fields that were
originally 'TBD'. The full logic is perhaps more complicated than I have
described here, but it is already complicated enough to give you an idea
of what you're really being asked to do. I am not aware of a relational
database supplier who really supports two-way replication at the level
that Notes supports it, but Notes isn't a relational database.

The difficulty of the position that you appear to be in is that
management might believe that the full problem is solved in brand X
RDBMS, and you will have trouble convincing management that this is not
really true.


From pgsql-hackers-owner+M2401@hub.org Tue May 23 12:19:54 2000
Received: from news.tht.net (news.hub.org [216.126.91.242])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA28410
	for <pgman@candle.pha.pa.us>; Tue, 23 May 2000 12:19:53 -0400 (EDT)
Received: from hub.org (majordom@hub.org [216.126.84.1])
	by news.tht.net (8.9.3/8.9.3) with ESMTP id MAB53304;
	Tue, 23 May 2000 12:00:08 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M2401@hub.org)
Received: from gwineta.repas.de (gwineta.repas.de [193.101.49.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id LAA39896
	for <pgsql-hackers@postgresql.org>; Tue, 23 May 2000 11:57:31 -0400 (EDT)
	(envelope-from kardos@repas-aeg.de)
Received: (from smap@localhost)
	by gwineta.repas.de (8.8.8/8.8.8) id RAA27154
	for <pgsql-hackers@postgresql.org>; Tue, 23 May 2000 17:57:23 +0200
Received: from dragon.dr.repas.de(172.30.48.206) by gwineta.repas.de via smap (V2.1)
	id xma027101; Tue, 23 May 00 17:56:20 +0200
Received: from kardos.dr.repas.de ([172.30.48.153])
  by dragon.dr.repas.de (UCX V4.2-21C, OpenVMS V6.2 Alpha);
	Tue, 23 May 2000 17:57:24 +0200
Message-ID: <010201bfc4cf$7334d5a0$99301eac@Dr.repas.de>
From: "Kardos, Dr. Andreas" <kardos@repas-aeg.de>
To: "Todd M. Shrider" <tshrider@varesearch.com>,
        <pgsql-hackers@postgresql.org>
References: <Pine.LNX.4.04.10005180846290.15739-100000@silicon.su.valinux.com>
Subject: Re: [HACKERS] failing over with postgresql
Date: Tue, 23 May 2000 17:56:20 +0200
Organization: repas AEG Automation GmbH
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR

For a SCADA system (Supervisory Control and Data Acquisition) which consists
of one master and one hot-standby server I have implemented such a
solution. Client workstations (NT and/or UNIX) are connected to these UNIX
servers. The database client programs run on both the client and the server
side.

When developing this approach I had two goals in mind:
1) Not to become dependent on the PostgreSQL sources, since they change very
dynamically.
2) Not to become dependent on the fe/be protocol, since there are discussions
about changing it.

So the approach is quite simple: Forward all database requests to the
standby server at the TCP/IP level.

On both servers the postmaster listens on port 5433 and not on 5432. My
program listens on the standard port 5432 instead. This program forks twice
for every incoming connection. The first instance forwards all packets from
the frontend to both backends. The second instance receives the packets from
all backends and forwards only the packets from the master backend to the
frontend. So a frontend running on a server machine connects to port 5432 of
localhost.
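
A minimal sketch of this forwarding scheme, assuming Python threads in place
of forked processes; it is not the actual program described here, and the
function names (fan_out, drain, serve_connection, run_proxy) are
hypothetical:

```python
import socket
import threading


def fan_out(frontend, backends):
    """First instance: copy every chunk from the frontend to all backends."""
    while True:
        chunk = frontend.recv(4096)
        if not chunk:
            break
        for backend in backends:
            backend.sendall(chunk)


def drain(backend, frontend=None):
    """Second instance: read one backend's replies.

    Replies are forwarded only when a frontend is given (the master);
    replies from the standby are read and discarded.
    """
    while True:
        chunk = backend.recv(4096)
        if not chunk:
            break
        if frontend is not None:
            frontend.sendall(chunk)


def serve_connection(frontend, master, standby):
    """Start the forwarding threads for one frontend connection."""
    threads = [
        threading.Thread(target=fan_out, args=(frontend, [master, standby]), daemon=True),
        threading.Thread(target=drain, args=(master, frontend), daemon=True),
        threading.Thread(target=drain, args=(standby, None), daemon=True),
    ]
    for t in threads:
        t.start()
    return threads


def run_proxy(master_host, standby_host, listen_port=5432, backend_port=5433):
    """Listen on the standard port and forward to both postmasters."""
    listener = socket.create_server(("", listen_port))
    while True:
        frontend, _ = listener.accept()
        master = socket.create_connection((master_host, backend_port))
        standby = socket.create_connection((standby_host, backend_port))
        serve_connection(frontend, master, standby)
```

Because the proxy only copies bytes, it stays independent of both the
PostgreSQL sources and the fe/be protocol, which matches the two goals above.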

On the client machine another program runs (on NT as a service). This
program also forks twice for every incoming connection. The first instance
forwards all packets to port 5432 of the current master server and the
second instance forwards the packets from the master server to the frontend.

During standby computer startup the database of the master computer is
dumped, zipped, copied to the standby computer, unzipped and loaded into
the standby database.
When a standby startup takes place, all client connections are aborted to
allow a login into the standby database. The frontends need to reconnect in
this case. So the database of the standby computer is always in sync.

The disadvantage of this method is that a query cannot be canceled on the
standby server, since the request key of this connection gets lost. But we
can live with that.

Both programs are able to run on Unix and on (native!) NT. On NT threads
are created instead of forked processes.

This approach is simple, but it is effective and it works.

We hope to survive this way until real replication is implemented in
PostgreSQL.

Andreas Kardos

-----Original Message-----
From: Todd M. Shrider <tshrider@varesearch.com>
To: <pgsql-hackers@postgresql.org>
Sent: Thursday, 18 May 2000 17:48
Subject: [HACKERS] failing over with postgresql


>
> is anyone working on or have working a fail-over implementation for the
> postgresql stuff. i'd be interested in seeing if and how any might be
> dealing with just general issues as well as the database syncing issues.
>
> we are looking to do this with heartbeat and lvs in mind. also if anyone
> is load balancing their databases that would be cool to talk about too.
>
> ---
> Todd M. Shrider VA Linux Systems
> Systems Engineer
> tshrider@valinux.com www.valinux.com
>


From pgsql-hackers-owner+M3662@postgresql.org Tue Jan 23 16:23:34 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA04456
	for <pgman@candle.pha.pa.us>; Tue, 23 Jan 2001 16:23:34 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLKf004705;
	Tue, 23 Jan 2001 16:20:41 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M3662@postgresql.org)
Received: from sectorbase2.sectorbase.com ([208.48.122.131])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLAe003753
	for <pgsql-hackers@postgresql.org>; Tue, 23 Jan 2001 16:10:40 -0500 (EST)
	(envelope-from vmikheev@SECTORBASE.COM)
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
	id <DG1W4Q8F>; Tue, 23 Jan 2001 12:49:07 -0800
Message-ID: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com>
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
To: "'dom@idealx.com'" <dom@idealx.com>, pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)
Date: Tue, 23 Jan 2001 13:10:34 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

>   I had thought that the pre-commit information could be stored in an
> auxiliary table by the middleware program ; we would then have
> to re-implement some sort of higher-level WAL (I thought of the list
> of the commands performed in the current transaction, with a sequence
> number for each of them that would guarantee correct ordering between
> concurrent transactions in case of a REDO). But I fear I am missing

This wouldn't work for READ COMMITTED isolation level.
But why do you want to log commands into WAL where each modification
is already logged in, hm, correct order?
Well, it makes sense if you're looking for async replication, but
you don't need two-phase commit for this and should be aware of the
problems with READ COMMITTED isolevel.

Back to two-phase commit - it's the easiest part of the work required for
distributed transaction processing.
Currently we place a single commit record in the log, and the transaction
is committed when this record (and so all other transaction records)
is on disk.
Two-phase commit:

1. For the 1st phase we'll place a "prepared-to-commit" record into the
   log, and this phase will be complete after the record is flushed to disk.
   At this point the transaction may be committed at any time because
   all its modifications are logged. But it may still be rolled back
   if this phase failed on other sites of the distributed system.

2. When all sites are prepared to commit we'll place a "committed"
   record into the log. No need to flush it because, in the event of a
   crash, the recoverer will have to communicate with the other sites to
   learn the status of all "prepared" transactions anyway.

That's all! It is really hard to implement distributed lock and
communication managers, but there is no problem with logging two
records instead of one. Period.
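
The two log records can be illustrated with a toy model. This is only a
sketch under simplifying assumptions: a "log" is two in-memory lists, there
is no real disk I/O or network, and the names SiteLog and two_phase_commit
are hypothetical, not PostgreSQL code.

```python
class SiteLog:
    """Toy transaction log for one site of a distributed system."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.durable = []   # records that were flushed to disk
        self.buffered = []  # records written but not necessarily flushed

    def prepare(self, xid):
        # Phase 1: the "prepared-to-commit" record must be flushed
        # before the site may report success.
        if not self.can_prepare:
            return False
        self.durable.append(("prepared", xid))
        return True

    def commit(self, xid):
        # Phase 2: the "committed" record does not need a flush.
        self.buffered.append(("committed", xid))

    def abort(self, xid):
        self.buffered.append(("aborted", xid))


def two_phase_commit(xid, sites):
    """Commit xid on all sites, or roll it back everywhere if one site fails."""
    prepared = []
    for site in sites:
        if site.prepare(xid):
            prepared.append(site)
        else:
            for p in prepared:
                p.abort(xid)
            return False
    for site in sites:
        site.commit(xid)
    return True
```

The hard parts named above (distributed lock and communication managers,
and recovery after a crash) are deliberately left out of this sketch.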

Vadim

From pgsql-hackers-owner+M3665@postgresql.org Tue Jan 23 17:05:26 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05972
	for <pgman@candle.pha.pa.us>; Tue, 23 Jan 2001 17:05:24 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NM31008120;
	Tue, 23 Jan 2001 17:03:01 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M3665@postgresql.org)
Received: from candle.pha.pa.us (candle.navpoint.com [162.33.245.46])
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f0NLsU007188
	for <pgsql-hackers@postgresql.org>; Tue, 23 Jan 2001 16:54:30 -0500 (EST)
	(envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
	by candle.pha.pa.us (8.9.0/8.9.0) id QAA05300;
	Tue, 23 Jan 2001 16:53:53 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-Id: <200101232153.QAA05300@candle.pha.pa.us>
Subject: Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)
In-Reply-To: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com>
	"from Mikheev, Vadim at Jan 23, 2001 01:10:34 pm"
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
Date: Tue, 23 Jan 2001 16:53:53 -0500 (EST)
CC: "'dom@idealx.com'" <dom@idealx.com>, pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL77 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

[ Charset ISO-8859-1 unsupported, converting... ]
> >   I had thought that the pre-commit information could be stored in an
> > auxiliary table by the middleware program ; we would then have
> > to re-implement some sort of higher-level WAL (I thought of the list
> > of the commands performed in the current transaction, with a sequence
> > number for each of them that would guarantee correct ordering between
> > concurrent transactions in case of a REDO). But I fear I am missing
>
> This wouldn't work for READ COMMITTED isolation level.
> But why do you want to log commands into WAL where each modification
> is already logged in, hm, correct order?
> Well, it makes sense if you're looking for async replication, but
> you don't need two-phase commit for this and should be aware of the
> problems with READ COMMITTED isolevel.
>

I believe the issue here is that while SERIALIZABLE ISOLATION means all
queries can be run serially, our default is READ COMMITTED, meaning that
open transactions see committed transactions, even if the transaction
committed after our transaction started.  (FYI, see my chapter on
transactions for help,  http://www.postgresql.org/docs/awbook.html.)

To do higher-level WAL, you would have to record not only the queries,
but the other queries that were committed at the start of each command
in your transaction.

Ideally, you could number every commit by its XID in your log, and then
when processing the query, pass the "committed" transaction ids that
were visible at the time each command began.

In other words, you can replay the queries in transaction commit order,
except that you have to have some transactions committed at specific
points while other transactions are open, i.e.:

XID	Open XIDS	Query
500			UPDATE t SET col = 3;
501	500		BEGIN;
501	500		UPDATE t SET col = 4;
501			UPDATE t SET col = 5;
501			COMMIT;

This is a silly example, but it shows that 500 must commit after the
first command in transaction 501, but before the second command in the
transaction.  This is because UPDATE t SET col = 5 actually sees the
changes made by transaction 500 in READ COMMITTED isolation level.
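
As a rough sketch of the idea (not an existing PostgreSQL facility), the
table above can be treated as a time-ordered log in which every entry
carries the set of XIDs still open when the command began. The function
name replay_schedule and the "commit-visible" marker are hypothetical.

```python
# (xid, open_xids, query), in the order the commands were issued
LOG = [
    (500, set(), "UPDATE t SET col = 3"),
    (501, {500}, "BEGIN"),
    (501, {500}, "UPDATE t SET col = 4"),
    (501, set(), "UPDATE t SET col = 5"),
    (501, set(), "COMMIT"),
]


def replay_schedule(log):
    """Insert markers showing where another transaction's commit must
    already be visible before the next command may be replayed."""
    steps = []
    previously_open = set()
    for xid, open_xids, query in log:
        # Any XID that was open for the previous command but is no longer
        # open must have committed in between.
        for done in sorted(previously_open - open_xids):
            steps.append(("commit-visible", done))
        steps.append((xid, query))
        previously_open = set(open_xids)
    return steps
```

For the log above, the marker for transaction 500 lands between the two
UPDATEs of transaction 501, which is exactly the ordering constraint that
replaying in plain commit order would miss.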

I am not advocating this.  I think WAL is a better choice.  I just
wanted to outline how replaying the queries in commit order is
insufficient.

> Back to two-phase commit - it's the easiest part of the work required for
> distributed transaction processing.
> Currently we place a single commit record in the log, and the transaction
> is committed when this record (and so all other transaction records)
> is on disk.
> Two-phase commit:
>
> 1. For the 1st phase we'll place a "prepared-to-commit" record into the
>    log, and this phase will be complete after the record is flushed to disk.
>    At this point the transaction may be committed at any time because
>    all its modifications are logged. But it may still be rolled back
>    if this phase failed on other sites of the distributed system.
>
> 2. When all sites are prepared to commit we'll place a "committed"
>    record into the log. No need to flush it because, in the event of a
>    crash, the recoverer will have to communicate with the other sites to
>    learn the status of all "prepared" transactions anyway.
>
> That's all! It is really hard to implement distributed lock and
> communication managers, but there is no problem with logging two
> records instead of one. Period.

Great.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From pgsql-general-owner+M805@postgresql.org Tue Nov 21 23:53:04 2000
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA19262
	for <pgman@candle.pha.pa.us>; Wed, 22 Nov 2000 00:53:03 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAM5qYs47249;
	Wed, 22 Nov 2000 00:52:34 -0500 (EST)
	(envelope-from pgsql-general-owner+M805@postgresql.org)
Received: from racerx.cabrion.com (racerx.cabrion.com [166.82.231.4])
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eAM5lJs46653
	for <pgsql-general@postgresql.org>; Wed, 22 Nov 2000 00:47:19 -0500 (EST)
	(envelope-from rob@cabrion.com)
Received: from cabrionhome (gso163-25-211.triad.rr.com [24.163.25.211])
	by racerx.cabrion.com (8.8.7/8.8.7) with SMTP id AAA13731
	for <pgsql-general@postgresql.org>; Wed, 22 Nov 2000 00:45:20 -0500
Message-ID: <006501c05447$fb9aa0c0$4100fd0a@cabrion.org>
From: "rob" <rob@cabrion.com>
To: <pgsql-general@postgresql.org>
Subject: [GENERAL] Synchronization Toolkit
Date: Wed, 22 Nov 2000 00:49:29 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_0062_01C0541E.125CAF30"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4133.2400
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

This is a multi-part message in MIME format.

------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Not to be confused with replication, my concept of synchronization is to
manage changes between a server table (or tables) and one or more mobile,
disconnected databases (e.g. PalmPilot, laptop, etc.).

I read through the notes in the TODO for this topic and devised a tool kit
for doing synchronization.  I hope that the PostgreSQL development community
will find this useful and will help me refine this concept by offering
insight, experience and some good old-fashioned hacking if you are so
inclined.

The bottom of this message describes how to use the attached files.

I look forward to your feedback.

--rob


Methodology:

I devised a concept that I call "session versioning".  This means that a
row does NOT get a new version of its own every time it changes.  Rather,
it gets stamped with the current session version common to all published
tables.  Clients, when they connect for synchronization, will immediately
increment this common version number, reserve the result as a "post
version" and then increment the session version again.  This version
number, implemented as a sequence, is common to all synchronized tables and
rows.

Any time the server changes a row, the row gets stamped with the current
session version.  When the client posts its changes, it stamps the changed
rows with its reserved "post version" rather than the current version.  The
reason why is explained later.  It is important that the client post all its
own changes first so that it does not end up receiving records, changed
since its last session, that it is about to update anyway.

Reserving the post version is a two-step process.  First, the number is
simply stored in a variable for later use.  Second, the value is added to a
lock table (last_stable) to indicate to any concurrent sessions that rows
with higher version numbers are to be considered "unstable" at the moment
and they should not attempt to retrieve them at this time.  Each client,
upon connection, will use the lowest value in this lock table (max_version)
to determine the upper boundary for versions it should retrieve.  The lower
boundary is simply the previous session's "max_version" plus one.  Thus
when the client retrieves changes it uses the following SQL "where"
expression:

WHERE row_version >= max_version and row_version <= last_stable_version and
row_version <> this_post_version

The point of reserving and locking a post version is important in that it
allows concurrent synchronization by multiple clients.  The first, of many,
clients to connect basically dictates to all future clients that they must
not take any rows equal to or greater than the one which it just reserved
and locked.  The reason the session version is incremented a second time is
so that the server may continue to post changes concurrent with any client
changes and be certain that these concurrent server changes will not taint
rows the client is about to retrieve.  Once the client is finished with its
session it removes the lock on its post version.
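
As a rough illustration of the handshake described above (not part of the
attached tool kit), the following Python sketch models the shared sequence
and the last_stable lock table in memory.  The class and function names are
hypothetical, and treating the lowest locked version as an inclusive upper
bound is an assumption taken from the "where" expression above.

```python
# Hypothetical in-memory model of the version handshake described above.
# The real tool kit uses a PostgreSQL sequence and a lock table instead.

class VersionServer:
    def __init__(self):
        self.session_version = 1   # stands in for the shared sequence
        self.last_stable = {}      # client name -> reserved post version

    def start_session(self, client):
        """Reserve a post version and return (post_ver, max_ver)."""
        self.last_stable.pop(client, None)      # drop a stale lock, if any
        self.session_version += 1               # first increment
        post_ver = self.session_version         # reserved for client posts
        # Upper bound: lowest version locked by another session, else the
        # version that was current when this session started.
        max_ver = min(self.last_stable.values(), default=post_ver - 1)
        self.last_stable[client] = post_ver     # lock the post version
        self.session_version += 1               # second increment
        return post_ver, max_ver

    def end_session(self, client):
        self.last_stable.pop(client, None)      # release the lock


def rows_to_fetch(rows, prev_max_ver, max_ver, post_ver):
    """Select rows with prev_max_ver < ver <= max_ver, skipping rows that
    this session stamped with its own post version."""
    return [r for r in rows
            if prev_max_ver < r["ver"] <= max_ver and r["ver"] != post_ver]
```

A second client that connects while the first is still in session gets an
upper bound equal to the first client's reserved version, so it does not
read past rows that are still being posted.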

Partitioning data for use by each node is the next challenge we face.  How
can we control which "slice" of data each client receives?  A slice can be
horizontal or vertical within a table.  Horizontal slices are easy: it's
just the where clause of an SQL statement that says "give me the rows that
match X criteria".  We handle this by storing a where clause and appending
it to each client's retrieval statement, in addition to the where clause
described above.  Actually, two where clauses are stored and appended.  One
is per client and one is per publication (table).
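
A minimal sketch of how the version filter and the two stored clauses could
be combined is shown below.  The function name and the sample clauses are
hypothetical, and the stored clauses are assumed to be trusted SQL fragments
entered by an administrator.

```python
def build_retrieval_where(version_filter, client_clause=None, pub_clause=None):
    """AND together the version filter and any non-empty slice clauses.

    Each part is wrapped in parentheses so that an "or" inside one stored
    clause cannot change the meaning of the others.
    """
    parts = [version_filter, client_clause, pub_clause]
    return " and ".join(f"({p})" for p in parts if p)


# Example: one client-level clause, no publication-level clause.
where = build_retrieval_where(
    "row_version <= 7",
    client_clause="region = 'west' or region = 'north'",
)
print("WHERE " + where)
```

Skipping empty clauses avoids producing a dangling "and" when only one of
the two stored clauses is set.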

We defined horizontal slices by filtering rows.  Vertical slices are limited
by column.  The tool kit does provide a mechanism for pseudo vertical
partitioning.  When a client is "subscribed" to a publication, the toolkit
stores which columns that node is to receive during a session.  These are
stored in the subscribed_cols table.  While this does limit the number of
columns transmitted, the insert/update/delete triggers do not recognize
changes based on columns.   The "pseudo" nature of our vertical partitioning
is evident by example:

Say you have a table with name, address and phone number as columns.  You
restrict a client to see only name and address.  This means that phone
number information will not be sent to the client during synchronization,
and the client can't attempt to alter the phone number of a given entry.
Great, but . . . if, on the server, the phone number (but not the name or
address) is changed, the entire row gets marked with a new version.  This
means that the name and address will get sent to the client even though they
didn't change.

Well, there's the flaw in vertical partitioning.  Other than wasting
bandwidth, the extra row does no harm to the process.  The workaround for
this is to highly normalize your schema when possible.

Collisions are the next crux one encounters with synchronization.  When two
clients retrieve the same row and both make (different) changes, which one
is correct?  So far the system operates totally independent of time.  This
is good because it doesn't rely on the server or client to keep accurate
time.  We can just ignore time altogether, but then we force our clients to
synchronize on a strict schedule in order to avoid (or reduce) collisions.
If every node synchronized immediately after making changes we could just
stop here.  Unfortunately this isn't reality.  Consider two clients, A and
B, that each pick up the same record on Monday.  A makes changes on Monday,
then leaves for vacation.  B makes changes on Wednesday because new
information was gathered in A's absence, and posts those changes the same
day.  Meanwhile, client A returns from vacation on Friday and synchronizes
his changes.  A overwrites B's changes even though A made changes before
the most recent information was posted by B.

It is clear that we need some form of time stamp to cope with the above
example.  While clocks aren't the most reliable, they are the only common
version control available to solve this problem.  The system is set up to
accept (but not require) timestamps from clients, and changes on the server
are time stamped.  The system, when presented a time stamp with a row, will
compare it with the server row's time stamp to figure out who wins in a
tie.  The system makes certain "sanity" checks with regard to these time
stamps.  A client may not attempt to post a change with a timestamp that is
more than one hour in the future (according to what the server thinks "now"
is) nor more than one hour before its last synchronization date/time.  The
client row will be immediately placed into the collision table if the
timestamp is that far out of whack.  Implementations of the tool kit should
take care to ensure that client & server agree on what "now" is before
attempting to submit changes with timestamps.

Time stamps are not required.  Should a client be incapable of tracking
timestamps, the system will assume that any server row which has been
changed since the client's last session will win a tie.  This is quite error
prone, so timestamps are encouraged where possible.
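
The timestamp rules above can be sketched roughly as follows.  This is an
illustration only: the function names are hypothetical, times are plain
seconds since 1970, and "the later change wins" is an assumption drawn from
the vacation example rather than a statement of what the attached module
does.

```python
ONE_HOUR = 3600  # seconds


def check_timestamp(client_ts, now, last_session):
    """Return None if the client timestamp is plausible, else a reason.

    client_ts may be None because timestamps are optional.
    """
    if client_ts is None:
        return None
    if client_ts > now + ONE_HOUR:
        return "future"      # more than one hour ahead of the server clock
    if client_ts < last_session - ONE_HOUR:
        return "past"        # more than one hour before the last session
    return None


def client_wins(client_ts, server_ts, server_changed_since_last_session):
    """Decide a tie between a client change and the current server row."""
    if client_ts is None:
        # No timestamp: a server row changed since the client's last
        # session wins the tie.
        return not server_changed_since_last_session
    # Assumption: the later change wins, and equal times favor the server.
    return client_ts > server_ts
```

A change that fails check_timestamp would go to the collision table rather
than being compared at all.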

Inserts pose an interesting challenge.  Since multiple clients cannot share
a sequence (often used as a primary key) while disconnected, they will be
responsible for their own unique "row_id" when inserting records.   Inserts
accept any arbitrary key, and write back to the client a special kind of
update that gives the server's row_id.  The client is responsible for making
sure that this update takes place locally.
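
The row_id write-back can be pictured with the small sketch below.  The
names (Server, apply_rowid_update, the "tmp-1" key) are hypothetical, and
the server-side sequence is replaced by a simple counter.

```python
import itertools


class Server:
    """Toy stand-in for the server side of an insert."""

    def __init__(self):
        self._row_ids = itertools.count(1)   # stands in for a sequence
        self.rows = {}

    def insert(self, client_key, data):
        """Store the row and return a (client_key, server_row_id) pair,
        which plays the role of the special update sent to the client."""
        row_id = next(self._row_ids)
        self.rows[row_id] = data
        return client_key, row_id


def apply_rowid_update(local_rows, client_key, server_row_id):
    """Re-key a local row from the client's temporary key to the server's."""
    local_rows[server_row_id] = local_rows.pop(client_key)


# The client inserted a row offline under its own temporary key.
local_rows = {"tmp-1": {"name": "ted"}}
server = Server()
client_key, row_id = server.insert("tmp-1", local_rows["tmp-1"])
apply_rowid_update(local_rows, client_key, row_id)
```

After the update the client refers to the row by the server's row_id, so
later updates and deletes address the same row on both sides.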

Deletes are the last portion of the process.  When deletes occur, the
row_id, version, etc. are stored in a "deleted" table.  These entries are
retrieved by the client using the same version filter as described above.
The table is pruned at the end of each session by deleting all records with
versions that are less than the lowest 'last_version' stored for each
client.
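
As an illustration of the pruning rule, the sketch below keeps only the
delete records that some client may still need.  The function name and the
record layout are hypothetical.

```python
def prune_deleted(deleted, last_versions):
    """Drop delete records older than the lowest client 'last_version'.

    deleted: list of dicts with 'row_id' and 'ver' keys.
    last_versions: mapping of client name -> last version it retrieved.
    """
    if not last_versions:
        return list(deleted)      # no clients known: prune nothing
    floor = min(last_versions.values())
    return [d for d in deleted if d["ver"] >= floor]


deleted = [
    {"row_id": 1, "ver": 3},
    {"row_id": 2, "ver": 5},
    {"row_id": 3, "ver": 8},
]
remaining = prune_deleted(deleted, {"JOE": 5, "KEN": 9})
```

Here the record with version 3 is dropped because every client has already
synchronized past it, while the other two are kept for JOE.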

Having wrapped up the synchronization process, I'll move on to describe some
points about managing clients, publications and the like.

The tool kit is split into two objects: SyncManager and Synchronize.
The Synchronize object exposes an API that client implementations use to
communicate and receive changes.  The SyncManager functions handle system
install and uninstall in addition to publication of tables and client
subscriptions.

Installation and uninstallation are handled by their corresponding functions
in the API.  All system tables are prefixed and suffixed with four
underscores, in hopes that this avoids conflict with any existing tables.
Calling the install function more than once will generate an error message.
Uninstall will remove all related tables, sequences,  functions and triggers
from the system.

The first step, after installing the system, is to publish a table.  A table
can be published more than once under different names.  Simply provide a
unique name as the second argument to the publish function.  Since object
names are restricted to 32 characters in Postgres, each table is given a
unique id and this id is used to create the trigger and sequence names.
Since one table can be published multiple times, but only needs one set of
triggers and one sequence for change management, a reference count is kept
so that we know when to add/drop triggers and functions.  By default, all
columns are published, but the third argument to the publish function
accepts an array reference of column names that allows you to specify a
limited set.  Information about the table is stored in the "tables" table,
info about the publication is in the "publications" table and column names
are stored in "subscribed_cols" table.
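
The reference counting described above might look roughly like this.  The
class is a hypothetical in-memory model; the "log" list only records when
triggers would be added or dropped, and defaulting the publication name to
the table name is an assumption based on the sample scripts.

```python
class PublicationRegistry:
    """Toy model of publish/unpublish with a per-table reference count."""

    def __init__(self):
        self.refcount = {}       # table name -> number of publications
        self.publications = {}   # publication name -> table name
        self.log = []            # when triggers would be added/dropped

    def publish(self, table, pubname=None):
        pubname = pubname or table
        if pubname in self.publications:
            return f"Publication '{pubname}' already exists"
        if self.refcount.get(table, 0) == 0:
            self.log.append(("add_triggers", table))    # first publication
        self.refcount[table] = self.refcount.get(table, 0) + 1
        self.publications[pubname] = table
        return None   # mirrors the "undef on success" convention

    def unpublish(self, pubname):
        table = self.publications.pop(pubname, None)
        if table is None:
            return f"Publication '{pubname}' does not exist"
        self.refcount[table] -= 1
        if self.refcount[table] == 0:
            self.log.append(("drop_triggers", table))   # last one removed
            del self.refcount[table]
        return None
```

Publishing the same table twice under different names adds the triggers
once, and they are dropped only when the last publication is removed.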

The next step is to subscribe a client to a table.  A client is identified
by a user name and a node name.  The subscribe function takes three
arguments: user, node & publication.  The subscription process writes an
entry into the "subscribed" table with default values.  Of note, the
"RefreshOnce" attribute is set to true whenever a table is published.  This
indicates to the system that a full table refresh should be sent the next
time the client connects even if the client requests synchronization rather
than refresh.

The toolkit does not, yet, provide a way to manage the whereclause stored at
either the publication or client level.  To use or test this feature, you
will need to set the whereclause attributes manually.

Tables and users can be unpublished and unsubscribed using the corresponding
functions within the tool kit's management interface.  Because postgres
lacks an "ALTER TABLE DROP COLUMN" function, the unpublish function only
removes default values and indexes for those columns.

The API isn't the most robust thing in the world right now.  All functions
return undef on success and an error string otherwise (like DBD).  I hope to
clean up the API considerably over the next month.  The code has not been
field tested at this time.


The files attached are:

1) SynKit.pm (A perl module that contains install/uninstall functions and a
simple api for synchronization & management)

2) sync_install.pl (Sample code to demonstrate the installation, publishing
and subscribe process)

3) sync_uninstall.pl (Sample code to demonstrate the uninstallation,
unpublishing and unsubscribe process)

4) sync.pl (Sample code to demonstrate the synchronization process for two
users)


To use them on Linux (don't know about Win32 but should work fine):

 - set up a test database and make SURE plpgsql is installed

 - install perl 5.05 along with Date::Parse(TimeDate-1.1) , DBI and DBD::Pg
modules [www.cpan.org]

 - copy all four attached files to a test directory

 - cd to your test directory

 - edit the three sample scripts (sync_install.pl, sync.pl and
sync_uninstall.pl) and change the three DBI variables to suit your system
(they are clearly marked)

 - % perl sync_install.pl

 - check out the tables, functions & triggers installed

 - % perl sync.pl

 - check out the 'sync_test' table, do some updates/inserts/deletes and run
sync.pl again
        NOTE: Sanity checks default to allow no more than 50% of the table
        to be changed by the client in a single session.  If you delete all
        (or most of) the rows you will get errors when you run sync.pl
        again! (by design)

 - % perl sync_uninstall.pl  (when you are done)

 - check out  the sample scripts and the perl module code (commented, but
not documented)


------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="sync.pl"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="sync.pl"


# This script depicts the synchronization process for two users.


##  CHANGE THESE THREE VARIABLES TO MATCH YOUR SYSTEM  ##########
my $dbi_connect_string =3D 'dbi:Pg:dbname=3Dtest;host=3Dsnoopy';	#
my $db_user =3D 'test';						#
my $db_pass =3D 'test';						#
#################################################################

my $ret; #holds return value

use SynKit;

#create a synchronization object (pass dbi connection info)
my $s =3D Synchronize->new($dbi_connect_string,$db_user,$db_pass);

#start a session by passing a user name, "node" identifier and a collision =
queue name (client or server)
$ret =3D $s->start_session('JOE','REMOTE_NODE_NAME','server');
print "Handle this error: $ret\n\n" if $ret;

#call this once before attempting to apply individual changes
$ret =3D $s->start_changes('sync_test',['name']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
$ret =3D  $s->apply_change(CLIENTROWID,'insert',undef,['ted']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
$ret =3D  $s->apply_change(CLIENTROWID,'insert','1973-11-10 11:25:00 AM -05=
',['tim']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
$ret =3D  $s->apply_change(999,'update',undef,['tom']);
print "Handle this error: $ret\n\n" if $ret;

#call this for each change the client wants to make to the database
$ret =3D  $s->apply_change(1,'update',undef,['tom']);
print "Handle this error: $ret\n\n" if $ret;

#call this once after all changes have been submitted
$ret =3D $s->end_changes();
print "Handle this error: $ret\n\n" if $ret;

#call this to get updates from all subscribed tables
$ret =3D $s->get_all_updates();
print "Handle this error: $ret\n\n" if $ret;

print "\n\nSynchronization session is complete. (JOE) \n\n";


# make some changes to the database (server perspective)

print "\n\nMaking changes to the database. (server side) \n\n";

use DBI;
my $dbh =3D DBI->connect($dbi_connect_string,$db_user,$db_pass);

$dbh->do("insert into sync_test values ('roger')");
$dbh->do("insert into sync_test values ('john')");
$dbh->do("insert into sync_test values ('harry')");
$dbh->do("delete from sync_test where name =3D 'roger'");
$dbh->do("update sync_test set name =3D 'tom' where name =3D 'harry'");

$dbh->disconnect;


#now do another session for a different user

#start a session by passing a user name, "node" identifier and a collision =
queue name (client or server)
$ret =3D $s->start_session('KEN','ANOTHER_REMOTE_NODE_NAME','server');
print "Handle this error: $ret\n\n" if $ret;

#call this to get updates from all subscribed tables
$ret =3D $s->get_all_updates();
print "Handle this error: $ret\n\n" if $ret;

print "\n\nSynchronization session is complete. (KEN)\n\n";

print "Now look at your database and see what happened, make changes to =
the test table, etc. and run this again.\n\n";

------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="sync_uninstall.pl"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="sync_uninstall.pl"


# this script uninstalls the synchronization system using the SyncManager o=
bject;

use SynKit;

###  CHANGE THESE TO MATCH YOUR SYSTEM   ########################
my $dbi_connect_string =3D 'dbi:Pg:dbname=3Dtest;host=3Dsnoopy';	#
my $db_user =3D 'test';						#
my $db_pass =3D 'test';						#
#################################################################


my $ret; #holds return value

#create an instance of the SyncManager object
my $m =3D SyncManager->new($dbi_connect_string,$db_user,$db_pass);

# call this to unsubscribe a user/node (not necessary if you are uninstalli=
ng)
print $m->unsubscribe('KEN','ANOTHER_REMOTE_NODE_NAME','sync_test');

#call this to unpublish a table (not necessary if you are uninstalling)
print $m->unpublish('sync_test');

#call this to uninstall the synchronization system
#  NOTE: this will automatically unpublish & unsubscribe all users
print $m->UNINSTALL;

# now let's drop our little test table
use DBI;
my $dbh =3D DBI->connect($dbi_connect_string,$db_user,$db_pass);
$dbh->do("drop table sync_test");
$dbh->disconnect;

print "\n\nI hope you enjoyed this little demonstration\n\n";


------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="sync_install.pl"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="sync_install.pl"


# This script shows how to install the synchronization system=20
# using the SyncManager object

use SynKit;

### CHANGE THESE TO MATCH YOUR SYSTEM  ##########################
my $dbi_connect_string =3D 'dbi:Pg:dbname=3Dtest;host=3Dsnoopy';	#
my $db_user =3D 'test';						#
my $db_pass =3D 'test';						#
#################################################################
my $ret; #holds return value


#create an instance of the sync manager object
my $m =3D SyncManager->new($dbi_connect_string,$db_user,$db_pass);

#Call this to install the synchronization management tables, etc.
$ret =3D $m->INSTALL;
die "Handle this error: $ret\n\n" if $ret;


#create a test table for us to demonstrate with
use DBI;
my $dbh =3D DBI->connect($dbi_connect_string,$db_user,$db_pass);
$dbh->do("create table sync_test (name text)");
$dbh->do("insert into sync_test values ('rob')");
$dbh->do("insert into sync_test values ('rob')");
$dbh->do("insert into sync_test values ('rob')");
$dbh->do("insert into sync_test values ('ted')");
$dbh->do("insert into sync_test values ('ted')");
$dbh->do("insert into sync_test values ('ted')");
$dbh->disconnect;


#call this to "publish" a table
$ret =3D $m->publish('sync_test');
print "Handle this error: $ret\n\n" if $ret;

#call this to "subscribe" a user/node to a publication (table)
$ret =3D $m->subscribe('JOE','REMOTE_NODE_NAME','sync_test');
print "Handle this error: $ret\n\n" if $ret;

#call this to "subscribe" a user/node to a publication (table)
$ret =3D $m->subscribe('KEN','ANOTHER_REMOTE_NODE_NAME','sync_test');
print "Handle this error: $ret\n\n" if $ret;


print "Now you can do: 'perl sync.pl' a few times to play\n\n";
print "Do 'perl sync_uninstall.pl' to uninstall the system\n";


------=_NextPart_000_0062_01C0541E.125CAF30
Content-Type: application/octet-stream; name="SynKit.pm"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="SynKit.pm"

# Perl DB synchronization toolkit

#created for postgres 7.0.2 +
use strict;

BEGIN {
        use vars       qw($VERSION);
        # set the version for version checking
        $VERSION     =3D 1.00;
}


package Synchronize;

use DBI;

use Date::Parse;

# new requires 3 arguments: dbi connection string, plus the corresponding u=
sername and password to get connected to the database
sub new {
	my $proto =3D shift;
	my $class =3D ref($proto) || $proto;
	my $self =3D {};

	my $dbi =3D shift;
	my $user =3D shift;
	my $pass =3D shift;

	$self->{DBH} =3D DBI->connect($dbi,$user,$pass) || die "Failed to connect =
to database: ".DBI->errstr();

	$self->{user} =3D undef;
	$self->{node} =3D undef;
	$self->{status} =3D undef; # holds status of table update portion of sessi=
on
	$self->{pubs} =3D {}; #holds hash of pubs available to session with val =
=3D 1 if ok to request sync
	$self->{orderpubs} =3D undef; #holds array ref of subscribed pubs ordered =
by sync_order
	$self->{this_post_ver} =3D undef; #holds the version number under which th=
is session will post changes
	$self->{max_ver} =3D undef; #holds the maximum safe version for getting up=
dates
	$self->{current} =3D {}; #holds the current publication info to which chan=
ges are being applied
	$self->{queue} =3D 'server'; # tells collide function what to do with coll=
isions. (default is to hold on server)

	$self->{DBLOG}=3D DBI->connect($dbi,$user,$pass) || die "cannot log to DB:=
 ".DBI->errstr();=20


	return bless ($self, $class);
}

sub dblog {=20
	my $self =3D shift;
	my $msg =3D $self->{DBLOG}->quote($_[0]);
	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});
	$self->{DBLOG}->do("insert into ____sync_log____ (username, nodename,stamp=
, message) values($quser, $qnode, now(), $msg)");
}


#start_session establishes session wide information and other housekeeping =
chores
	# Accepts username, nodename and queue (client or server) as arguments;

sub start_session {
	my $self =3D shift;
	$self->{user} =3D shift || die 'Username is required';
	$self->{node} =3D shift || die 'Nodename is required';
	$self->{queue} =3D shift;


	if ($self->{queue} ne 'server' && $self->{queue} ne 'client') {
		die "You must provide a queue argument of either 'server' or 'client'";
	}

	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});

	my $sql =3D "select pubname from ____subscribed____ where username =3D $qu=
ser and nodename =3D $qnode";
	my @pubs =3D $self->GetColList($sql);

	return 'User/Node has no subscriptions!' if !@pubs;

	# go though the list and check permissions and rules for each
	foreach my $pub (@pubs) {
		my $qpub =3D $self->{DBH}->quote($pub);
		my $sql =3D "select disabled, pubname, fullrefreshonly, refreshonce,post_=
ver from ____subscribed____ where username =3D $quser and pubname =3D $qpub=
 and nodename =3D $qnode";
		my $sth =3D $self->{DBH}->prepare($sql) || die $self->{DBH}->errstr;
		$sth->execute || die $self->{DBH}->errstr;
		my @row;
		while (@row =3D $sth->fetchrow_array) {
			next if $row[0]; #publication is disabled
			next if !defined($row[1]); #publication does not exist (should never occ=
ur)
			if ($row[2] || $row[3]) { #refresh or refresh once flag is set
				$self->{pubs}->{$pub} =3D 0; #refresh only
				next;
			}
			if (!defined($row[4])) { #no previous session exists, must refresh
				$self->{pubs}->{$pub} =3D 0; #refresh only
				next;
			}
			$self->{pubs}->{$pub} =3D 1; #OK for sync
		}
		$sth->finish;
	}


	$sql =3D "select pubname from ____publications____ order by sync_order";
	my @op =3D $self->GetColList($sql);
	my @orderpubs;

	#loop through ordered pubs and remove non subscribed publications
	foreach my $pub (@op) {
		push @orderpubs, $pub if defined($self->{pubs}->{$pub});
	}
=09
	$self->{orderpubs} =3D \@orderpubs;

# Now we obtain a session version number, etc.

	$self->{DBH}->{AutoCommit} =3D 0; #allows "transactions"
	$self->{DBH}->{RaiseError} =3D 1; #script [or eval] will automatically die=
 on errors

	eval { #start DB transaction

	#lock the version sequence until we determine that we have gotten
	#a good  value.  Lock will be released on commit.
		$self->{DBH}->do('lock ____version_seq____ in access exclusive mode');

	# remove stale locks if they exist
		my $sql =3D "delete from ____last_stable____ where username =3D $quser an=
d nodename =3D $qnode";
		$self->{DBH}->do($sql);

	# increment version sequence & grab the next val as post_ver
		$sql =3D "select nextval('____version_seq____')";
		my $sth =3D $self->{DBH}->prepare($sql);
		$sth->execute;
		($self->{this_post_ver}) =3D $sth->fetchrow_array();
		$sth->finish;
	# grab max_ver from last_stable

		$sql =3D "select min(version) from ____last_stable____";=20
		$sth =3D $self->{DBH}->prepare($sql);
		$sth->execute;
		($self->{max_ver}) =3D $sth->fetchrow_array();
		$sth->finish;

	# if there was no version in lock table, then take the ID that was in use
	# when we started the session (this_post_ver - 1)

		$self->{max_ver} =3D $self->{this_post_ver} -1 if (!defined($self->{max_v=
er}));

	# lock post_ver by placing it in last_stable
		$self->{DBH}->do("insert into ____last_stable____ (version, username, nod=
ename) values ($self->{this_post_ver}, $quser,$qnode)");

	# increment version sequence again (discard result)
		$sql =3D "select nextval('____version_seq____')";
		$sth =3D $self->{DBH}->prepare($sql);
		$sth->execute;
		$sth->fetchrow_array();
		$sth->finish;

	}; #end eval/transaction

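	my $failed =3D $@; # save the error before rollback can clobber it
	if ($failed) { # part of transaction failed, undo it
		$self->{DBH}->{RaiseError} =3D 0; # don't die on a failed rollback
		$self->{DBH}->rollback;
	} else { # all's well commit block
		$self->{DBH}->commit;
	}
	$self->{DBH}->{AutoCommit} =3D 1;
	$self->{DBH}->{RaiseError} =3D 0;

	return 'Start session failed' if $failed;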

	return undef;

}

#start changes should be called once before applying individual change requ=
ests
	# Requires publication and ref to columns that will be updated as arguments
sub start_changes {
	my $self =3D shift;
	my $pub =3D shift || die 'Publication is required';
	my $colref =3D shift || die 'Reference to column array is required';

	$self->{status} =3D 'starting';

	my $qpub =3D $self->{DBH}->quote($pub);
	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});

	my @cols =3D @{$colref};
	my @subcols =3D $self->GetColList("select col_name from ____subscribed_col=
s____ where username =3D $quser and nodename =3D $qnode and pubname =3D $qp=
ub");
	my %subcols;
	foreach my $col (@subcols) {
		$subcols{$col} =3D 1;
	}
	foreach my $col (@cols) {=09
		return "User/node is not subscribed to column '$col'" if !$subcols{$col};
	}

	my $sql =3D "select pubname, readonly, last_session, post_ver, last_ver, w=
hereclause, sanity_limit,=20
sanity_delete, sanity_update, sanity_insert from ____subscribed____ where u=
sername =3D $quser and pubname =3D $qpub and nodename =3D $qnode";
	my ($junk, $readonly, $last_session, $post_ver, $last_ver, $whereclause, $=
sanity_limit,=20
$sanity_delete, $sanity_update, $sanity_insert) =3D $self->GetOneRow($sql);
=09
	return 'Publication is read only' if $readonly;

	$sql =3D "select whereclause from ____publications____ where pubname =3D $=
qpub";
	my ($wc) =3D $self->GetOneRow($sql);
	#combine client and publication filters, skipping empty ones
	$whereclause =3D join(' and ',
		map { "($_)" } grep { $_ } ($whereclause, $wc));

	my ($table) =3D $self->GetOneRow("select tablename from ____publications__=
__ where pubname =3D $qpub");

	return 'Publication is not registered correctly' if !defined($table);

	my %info;
	$info{pub} =3D $pub;
	$info{whereclause} =3D $whereclause;
	$info{post_ver} =3D $post_ver;
	$last_session =3D~ s/([+|-]\d\d?)$/ $1/;	#put a space before timezone=09
	$last_session =3D str2time ($last_session); #convert to perltime (seconds =
since 1970)
	$info{last_session} =3D $last_session;
	$info{last_ver} =3D $last_ver;
	$info{table}  =3D $table;
	$info{cols} =3D \@cols;

	$sql =3D "select count(oid) from $table";
	$sql =3D $sql .' where '.$whereclause if $whereclause;
	my ($rowcount) =3D $self->GetOneRow($sql);

	#calculate sanity levels (convert from % to number of rows)
	# limits defined as less than 1 mean no limit
	$info{sanitylimit} =3D $rowcount * ($sanity_limit / 100) if $sanity_limit =
> 0;
	$info{insertlimit} =3D $rowcount * ($sanity_insert / 100) if $sanity_inser=
t > 0;
	$info{updatelimit} =3D $rowcount * ($sanity_update / 100) if $sanity_updat=
e > 0;
	$info{deletelimit} =3D $rowcount * ($sanity_delete / 100) if $sanity_delet=
e > 0;

	$self->{sanitycount} =3D 0;
	$self->{updatecount} =3D 0;
	$self->{insertcount} =3D 0;
	$self->{deletecount} =3D 0;

	$self->{current} =3D \%info;

	$self->{DBH}->{AutoCommit} =3D 0; #turn on transaction behavior so we can =
roll back on sanity limits, etc.

	$self->{status} =3D 'ready';

	return undef;
}

#call this once all changes are submitted to commit them;
sub end_changes {
	my $self =3D shift;
	return undef if $self->{status} ne 'ready';
	$self->{DBH}->commit;
	$self->{DBH}->{AutoCommit} =3D 1;
	$self->{status} =3D 'success';
	return undef;
}

#call apply_change once for each row level client update
	# Accepts 4 params: rowid, action, timestamp and reference to data array
	#	Note: timestamp can be undef, data can be undef
	#		timestamp is a date string that Date::Parse's str2time() can
	#		parse; it is converted to perl time (secs since 1970) below

#this routine checks basic timestamp info and sanity limits, then passes th=
e info along to do_action() for processing
sub apply_change {
	my $self =3D shift;
	my $rowid =3D shift || return 'Row ID is required'; #don't die just for on=
e bad row
	my $action =3D shift || return 'Action is required'; #don't die just for o=
ne bad row
	my $timestamp =3D shift;
	my $dataref =3D shift;
	$action =3D lc($action);

	$timestamp =3D str2time($timestamp) if $timestamp;

	return 'Status failure, cannot accept changes: '.$self->{status} if $self-=
>{status} ne 'ready';

	my %info =3D %{$self->{current}};

	$self->{sanitycount}++;
	if ($info{sanitylimit} && $self->{sanitycount} > $info{sanitylimit}) {
		# too many changes from client
		my $ret =3D $self->sanity('limit');
		return $ret if $ret;
	}

=09
	if ($timestamp && $timestamp > time() + 3600) { # current time + one hour
		#client's clock is way off, cannot submit changes in future
		my $ret =3D $self->collide('future', $info{table}, $rowid, $action, undef=
, $timestamp, $dataref, $self->{queue});
		return $ret if $ret;
	}

	if ($timestamp && $timestamp < $info{last_session} - 3600) { # last sessio=
n time less one hour
		#client's clock is way off, cannot submit changes that occured before las=
t sync date
		my $ret =3D $self->collide('past', $info{table}, $rowid, $action, undef, =
$timestamp, $dataref , $self->{queue});
		return $ret if $ret;
	}

	my ($crow, $cver, $ctime); #current row,ver,time
	if ($action ne 'insert') {
		my $sql =3D "select ____rowid____, ____rowver____, ____stamp____ from $in=
fo{table} where ____rowid____ =3D $rowid";
		($crow, $cver, $ctime) =3D $self->GetOneRow($sql);
		if (!defined($crow)) {
			my $ret =3D $self->collide('norow', $info{table}, $rowid, $action, undef=
, $timestamp, $dataref , $self->{queue});
			return $ret if $ret;=09=09
		}

		$ctime =3D~ s/([+|-]\d\d?)$/ $1/; #put space between timezone
		$ctime =3D str2time($ctime) if $ctime; #convert to perl time

		if ($timestamp) {
			if ($ctime < $timestamp) {
				my $ret =3D $self->collide('time', $info{table}, $rowid, $action, undef=
, $timestamp, $dataref, $self->{queue} );=09=09
				return $ret if $ret;
			}

		} else {
			if ($cver > $self->{this_post_ver}) {
				my $ret =3D $self->collide('version', $info{table}, $rowid, $action, un=
def, $timestamp, $dataref, $self->{queue} );
				return $ret if $ret;
			}
		}
=09
	}

	if ($action eq 'insert') {
		$self->{insertcount}++;
		if ($info{insertlimit} && $self->{insertcount} > $info{insertlimit}) {
			# too many changes from client
			my $ret =3D $self->sanity('insert');
			return $ret if $ret;
		}

		my $qtable =3D $self->{DBH}->quote($info{table});
		my ($rowidsequence) =3D '_'.$self->GetOneRow("select table_id from ____ta=
bles____ where tablename =3D $qtable").'__rowid_seq';
		return 'Table incorrectly registered, cannot get rowid sequence name: '.$=
self->{DBH}->errstr() if not defined $rowidsequence;

		my @data;
		foreach my $val (@{$dataref}) {
			push @data, $self->{DBH}->quote($val);
		}
		my $sql =3D "insert into $info{table} (";
		if ($timestamp) {
			$sql =3D $sql . join(',',@{$info{cols}}) . ',____rowver____, ____stamp__=
__) values (';
			$sql =3D $sql . join (',',@data) .','.$self->{this_post_ver}.',\''.local=
time($timestamp).'\')';
		} else {
			$sql =3D $sql . join(',',@{$info{cols}}) . ',____rowver____) values (';
			$sql =3D $sql . join (',',@data) .','.$self->{this_post_ver}.')';
		}
		my $ret =3D $self->{DBH}->do($sql);
		if (!$ret) {
			my $ret =3D $self->collide($self->{DBH}->errstr(), $info{table}, $rowid,=
 $action, undef, $timestamp, $dataref , $self->{queue});
			return $ret if $ret;=09=09
		}
		my ($newrowid) =3D $self->GetOneRow("select currval('$rowidsequence')");
		return 'Failed to get current rowid on inserted row'.$self->{DBH}->errstr=
 if not defined $newrowid;
		$self->changerowid($rowid, $newrowid);
	}

	if ($action eq 'update') {
		$self->{updatecount}++;
		if ($info{updatelimit} && $self->{updatecount} > $info{updatelimit}) {
			# too many changes from client
			my $ret =3D $self->sanity('update');
			return $ret if $ret;
		}
		my @data;
		foreach my $val (@{$dataref}) {
			push @data, $self->{DBH}->quote($val);
		}=09

		my $sql =3D "update $info{table} set ";
		my @cols =3D @{$info{cols}};
		foreach my $col (@cols) {
			my $val =3D shift @data;
			$sql =3D $sql . "$col =3D $val,";
		}
		$sql =3D $sql." ____rowver____ =3D $self->{this_post_ver}";
		$sql =3D $sql.", ____stamp____ =3D '".localtime($timestamp)."'" if $times=
tamp;
		$sql =3D $sql." where ____rowid____ =3D $rowid";
		$sql =3D $sql." and $info{whereclause}" if $info{whereclause};
		my $ret =3D $self->{DBH}->do($sql);
		if (!$ret) {
			my $ret =3D $self->collide($self->{DBH}->errstr(), $info{table}, $rowid,=
 $action, undef, $timestamp, $dataref , $self->{queue});
			return $ret if $ret;=09=09
		}

	}

	if ($action eq 'delete') {
		$self->{deletecount}++;
		if ($info{deletelimit} && $self->{deletecount} > $info{deletelimit}) {
			# too many changes from client
			my $ret =3D $self->sanity('delete');
			return $ret if $ret;
		}
		if ($timestamp) {
			my $sql =3D "update $info{table} set ____rowver____ =3D $self->{this_pos=
t_ver}, ____stamp____ =3D '".localtime($timestamp)."'  where ____rowid____ =
=3D $rowid";
			$sql =3D $sql . " and $info{whereclause}" if $info{whereclause};
			$self->{DBH}->do($sql) || return 'Predelete update failed: '.$self->{DBH=
}->errstr;
		} else {
			my $sql =3D "update $info{table} set ____rowver____ =3D $self->{this_pos=
t_ver} where ____rowid____ =3D $rowid";
			$sql =3D $sql . " and $info{whereclause}" if $info{whereclause};
			$self->{DBH}->do($sql) || return 'Predelete update failed: '.$self->{DBH=
}->errstr;
		}
		my $sql =3D "delete from $info{table} where ____rowid____ =3D $rowid";
		$sql =3D $sql . " and $info{whereclause}" if $info{whereclause};
		my $ret =3D $self->{DBH}->do($sql);
		if (!$ret) {
			my $ret =3D $self->collide($self->{DBH}->errstr(), $info{table}, $rowid,=
 $action, undef, $timestamp, $dataref , $self->{queue});
			return $ret if $ret;=09=09
		}
	}
=09
=09
	return undef;
}

sub changerowid {
	my $self =3D shift;
	my $oldid =3D shift;
	my $newid =3D shift;
	$self->writeclient('changeid',"$oldid\t$newid");
}

#writes info to client
sub writeclient {
	my $self =3D shift;
	my $type =3D shift;
	my @info =3D @_;
	print "$type: ",join("\t",@info),"\n";
	return undef;
}

# Override this for custom behavior.  Default is to echo back the sanity fa=
ilure reason.=20=20
# If you want to override a collision, you can do so by returning undef.
sub sanity {
	my $self =3D shift;
	my $reason =3D shift;
	$self->{status} =3D 'sanity exceeded';
	$self->{DBH}->rollback;
	return $reason;
}

# Override this for custom behavior.  Default is to echo back the failure r=
eason.=20=20
# If you want to override a collision, you can do so by returning undef.
sub collide {
	my $self =3D shift;
	my ($reason,$table,$rowid,$action,$rowver,$timestamp,$data, $queue) =3D @_;

	my @data;
	foreach my $val (@{$data}) {
		push @data, $self->{DBH}->quote($val);
	}=09

	if ($reason =3D~ /integrity/i || $reason =3D~ /constraint/i) {
		$self->{status} =3D 'intergrity violation';
		$self->{DBH}->rollback;
	}

	my $datastring;
	my @cols =3D @{$self->{current}->{cols}};
	foreach my $col (@cols) {
		my $val =3D shift @data;
		$datastring =3D $datastring . "$col =3D $val,";
	}
	chop $datastring; #remove trailing comma

	if ($queue eq 'server') {
		$timestamp =3D localtime($timestamp) if defined($timestamp);
		$rowid =3D $self->{DBH}->quote($rowid);
		$rowid =3D 'null' if !defined($rowid);
		$rowver =3D 'null' if !defined($rowver);
		$timestamp =3D $self->{DBH}->quote($timestamp);
		$data =3D $self->{DBH}->quote($data);
		my $qtable =3D $self->{DBH}->quote($table);
		my $qreason =3D $self->{DBH}->quote($reason);
		my $qaction =3D $self->{DBH}->quote($action);
		my $quser =3D $self->{DBH}->quote($self->{user});
		my $qnode =3D $self->{DBH}->quote($self->{node});
		$datastring =3D $self->{DBH}->quote($datastring);
		my $qqueue =3D $self->{DBH}->quote($queue);


		my $sql =3D "insert into ____collision____ (rowid,
tablename, rowver, stamp, data, reason, action, username,
nodename, queue) values($rowid,$qtable, $rowver, $timestamp,$datastring,
$qreason, $qaction,$quser, $qnode, $qqueue)";
		$self->{DBH}->do($sql) || die 'Failed to write to collision table: '.$sel=
f->{DBH}->errstr;

	} else {

		$self->writeclient('collision',$rowid,$table, $rowver, $timestamp,$reason=
, $action,$self->{user}, $self->{node}, $data);

	}
	return $reason;
}

#calls get_updates once for each publication the user/node is subscribed to=
 in correct sync_order
sub get_all_updates {
	my $self =3D shift;
	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});

	foreach my $pub (@{$self->{orderpubs}}) {
		$self->get_updates($pub, 1); #request update as sync unless overridden by=
 flags
	}

}

# Call this once for each table the client needs refreshed or sync'ed AFTER=
 all inbound client changes have been posted
#	Accepts publication and sync flag as arguments
sub get_updates {
	my $self =3D shift;
	my $pub =3D shift || die 'Publication is required';
	my $sync =3D shift;

	my $qpub =3D $self->{DBH}->quote($pub);
	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});

	#enforce refresh and refreshonce flags
	undef $sync if !$self->{pubs}->{$pub};=20


	my %info =3D %{$self->{current} || {}};

	my @cols =3D $self->GetColList("select col_name from ____subscribed_cols__=
__ where username =3D $quser and nodename =3D $qnode and pubname =3D $qpub"=
);

	my ($table) =3D $self->GetOneRow("select tablename from ____publications__=
__ where pubname =3D $qpub");
	return 'Table incorrectly registered for read' if !defined($table);
	my $qtable =3D $self->{DBH}->quote($table);=09


	my $sql =3D "select pubname, last_session, post_ver, last_ver, whereclause=
 from ____subscribed____ where username =3D $quser and pubname =3D $qpub an=
d nodename =3D $qnode";
	my ($junk, $last_session, $post_ver, $last_ver, $whereclause) =3D $self->G=
etOneRow($sql);

	my ($wc) =3D $self->GetOneRow("select whereclause from ____publications___=
_ where pubname =3D $qpub");

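	# Combine the subscription and publication filters; either may be empty.
	$whereclause =3D '('.$whereclause.')' if $whereclause;

	if ($wc) {
		$wc =3D '('.$wc.')';
		$whereclause =3D $whereclause ? $whereclause.' and '.$wc : $wc;
	}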


	if ($sync) {
		$self->writeclient('start synchronize', $pub);
	} else {
		$self->writeclient('start refresh', $pub);
		$self->{DBH}->do("update ____subscribed____ set refreshonce =3D false whe=
re pubname =3D $qpub and username =3D $quser and nodename =3D $qnode") || r=
eturn 'Failed to clear RefreshOnce flag: '.$self->{DBH}->errstr;
	}

	$self->writeclient('columns',@cols);


	my $sql =3D "select ____rowid____, ".join(',', @cols)." from $table";
	if ($sync) {
		$sql =3D $sql." where (____rowver____ <=3D $self->{max_ver} and ____rowve=
r____ > $last_ver)";
		if (defined($self->{this_post_ver})) {
			$sql =3D $sql . " and (____rowver____ <> $post_ver)";
		}
	} else {
		$sql =3D $sql." where (____rowver____ <=3D $self->{max_ver})";
	}
	$sql =3D $sql." and $whereclause" if $whereclause;
=09
	my $sth =3D $self->{DBH}->prepare($sql) || return 'Failed to get prepare S=
QL for updates: '.$self->{DBH}->errstr;
	$sth->execute || return 'Failed to execute SQL for updates: '.$self->{DBH}=
->errstr;
	my @row;
	while (@row =3D $sth->fetchrow_array) {
		$self->writeclient('update/insert',@row);
	}

	$sth->finish;

	# now get deleted rows
	if ($sync) {
		$sql =3D "select rowid from ____deleted____ where (tablename =3D $qtable)=
";
		$sql =3D $sql." and (rowver <=3D $self->{max_ver} and rowver > $last_ver)=
";
		if (defined($self->{this_post_ver})) {
			$sql =3D $sql . " and (rowver <> $self->{this_post_ver})";
		}
		$sql =3D $sql." and $whereclause" if $whereclause;

		$sth =3D $self->{DBH}->prepare($sql) || return 'Failed to get prepare SQL=
 for deletes: '.$self->{DBH}->errstr;
		$sth->execute || return 'Failed to execute SQL for deletes: '.$self->{DBH=
}->errstr;
		my @row;
		while (@row =3D $sth->fetchrow_array) {
			$self->writeclient('delete',@row);
		}

		$sth->finish;
	}

	if ($sync) {
		$self->writeclient('end synchronize', $pub);
	} else {
		$self->writeclient('end refresh', $pub);
	}

	my $qpub =3D $self->{DBH}->quote($pub);
	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});

	$self->{DBH}->do("update ____subscribed____ set last_ver =3D $self->{max_v=
er}, last_session =3D now(), post_ver =3D $self->{this_post_ver} where user=
name =3D $quser and nodename =3D $qnode and pubname =3D $qpub");
	return undef;
}


# Call this once when everything else is done.  Does housekeeping.=20
# (MAKE THIS AN OBJECT DESTRUCTOR?)
sub DESTROY {
	my $self =3D shift;

#release version from lock table (including old ones)
	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});
	my $sql =3D "delete from ____last_stable____ where username =3D $quser and=
 nodename =3D $qnode";
	$self->{DBH}->do($sql);

#clean up deleted table
	my ($version) =3D $self->GetOneRow("select min(last_ver) from ____subscrib=
ed____");
	return undef if not defined $version;
	$self->{DBH}->do("delete from ____deleted____ where rowver < $version") ||=
 return 'Failed to prune deleted table: '.$self->{DBH}->errstr;


#disconnect from DBD sessions
	$self->{DBH}->disconnect;
	$self->{DBLOG}->disconnect;
	return undef;
}

############# Helper Subs ############
sub GetColList {
	my $self =3D shift;
	my $sql =3D shift || die 'Must provide sql select statement';
	my $sth =3D $self->{DBH}->prepare($sql) || return undef;
	$sth->execute || return undef;
	my $val;
	my @col;
	while (($val) =3D $sth->fetchrow_array) {
		push @col, $val;
	}
	$sth->finish;
	return @col;
}

sub GetOneRow {
	my $self =3D shift;
	my $sql =3D shift || die 'Must provide sql select statement';
	my $sth =3D $self->{DBH}->prepare($sql) || return undef;
	$sth->execute || return undef;
	my @row =3D $sth->fetchrow_array;
	$sth->finish;
	return @row;
}

=20


package SyncManager;

use DBI;
# new requires 3 arguments: dbi connection string, plus the corresponding u=
sername and password

sub new {
	my $proto =3D shift;
	my $class =3D ref($proto) || $proto;
	my $self =3D {};

	my $dbi =3D shift;
	my $user =3D shift;
	my $pass =3D shift;

	$self->{DBH} =3D DBI->connect($dbi,$user,$pass) || die "Failed to connect =
to database: ".DBI->errstr();

	$self->{DBLOG}=3D DBI->connect($dbi,$user,$pass) || die "cannot log to DB:=
 ".DBI->errstr();
=09
	return bless ($self, $class);
}

sub dblog {=20
	my $self =3D shift;
	my $msg =3D $self->{DBLOG}->quote($_[0]);
	my $quser =3D $self->{DBH}->quote($self->{user});
	my $qnode =3D $self->{DBH}->quote($self->{node});
	$self->{DBLOG}->do("insert into ____sync_log____ (username, nodename,stamp=
, message) values($quser, $qnode, now(), $msg)");
}

#this should never need to be called, but it might if a node bails without =
releasing their locks
sub ReleaseAllLocks {
	my $self =3D shift;
	$self->{DBH}->do("delete from ____last_stable____");
}
# Adds a publication to the system.  Also adds triggers, sequences, etc ass=
ociated with the table if appropriate.
	# accepts two arguments: the name of a physical table and the name under w=
hich to publish it=20
	# 	NOTE: the publication name is optional and will default to the table na=
me if not supplied
	# returns undef if ok, else error string;
sub publish {
	my $self =3D shift;
	my $table =3D shift || die 'You must provide a table name (and optionally =
a unique publication name)';
	my $pub =3D shift;
	$pub =3D $table if not defined($pub);

	my $qpub =3D $self->{DBH}->quote($pub);
	my $sql =3D "select tablename from ____publications____ where pubname =3D =
$qpub";
	my ($junk) =3D $self->GetOneRow($sql);
	return 'Publication already exists' if defined($junk);

	my $qtable =3D $self->{DBH}->quote($table);

	$sql =3D "select table_id, refcount from ____tables____ where tablename =
=3D $qtable";
	my ($id, $refcount) =3D $self->GetOneRow($sql);

	if(!defined($id)) {
		$self->{DBH}->do("insert into ____tables____ (tablename, refcount) values=
 ($qtable,1)") || return 'Failed to register table: ' . $self->{DBH}->errst=
r;
		my $sql =3D "select table_id from ____tables____ where tablename =3D $qta=
ble";
		($id) =3D $self->GetOneRow($sql);
	}

	if (defined($refcount)) {
		$self->{DBH}->do("update ____tables____ set refcount =3D refcount+1 where=
 table_id =3D $id") || return 'Failed to update reference count: ' . $self-=
>{DBH}->errstr;
	} else {
=09=09
		$id =3D '_'.$id.'_';=20

		my @cols =3D $self->GetTableCols($table, 1); # 1 =3D get hidden cols too
		my %skip;
		foreach my $col (@cols) {
			$skip{$col} =3D 1;
		}
=09=09
		if (!$skip{____rowver____}) {
			$self->{DBH}->do("alter table $table add column ____rowver____ int4"); #=
don't fail here in case table is being republished, just accept the error s=
ilently
		}
		$self->{DBH}->do("update $table set ____rowver____ =3D ____version_seq___=
_.last_value - 1") || return 'Failed to initialize rowver: ' . $self->{DBH}=
->errstr;

		if (!$skip{____rowid____}) {
			$self->{DBH}->do("alter table $table add column ____rowid____ int4"); #d=
on't fail here in case table is being republished, just accept the error si=
lently
		}

		my $index =3D $id.'____rowid____idx';
		$self->{DBH}->do("create index $index on $table(____rowid____)") || retur=
n 'Failed to create rowid index: ' . $self->{DBH}->errstr;

		my $sequence =3D $id.'_rowid_seq';
		$self->{DBH}->do("create sequence $sequence") || return 'Failed to create=
 rowid sequence: ' . $self->{DBH}->errstr;

		$self->{DBH}->do("alter table $table alter column ____rowid____ set defau=
lt nextval('$sequence')"); #don't fail here in case table is being republis=
hed, just accept the error silently

		$self->{DBH}->do("update $table set ____rowid____ =3D  nextval('$sequence=
')") || return 'Failed to initialize rowid: ' . $self->{DBH}->errstr;

		if (!$skip{____stamp____}) {
			$self->{DBH}->do("alter table $table add column ____stamp____ timestamp"=
); #don't fail here in case table is being republished, just accept the err=
or silently
		}

		$self->{DBH}->do("update $table set ____stamp____ =3D  now()") || return =
'Failed to initialize stamp: ' . $self->{DBH}->errstr;

		my $trigger =3D $id.'_ver_ins';
		$self->{DBH}->do("create trigger $trigger before insert on $table for eac=
h row execute procedure sync_insert_ver()") || return 'Failed to create tri=
gger: ' . $self->{DBH}->errstr;

		$trigger =3D $id.'_ver_upd';
		$self->{DBH}->do("create trigger $trigger before update on $table for eac=
h row execute procedure sync_update_ver()") || return 'Failed to create tri=
gger: ' . $self->{DBH}->errstr;

		$trigger =3D $id.'_del_row';
		$self->{DBH}->do("create trigger $trigger after delete on $table for each=
 row execute procedure sync_delete_row()") || return 'Failed to create trig=
ger: ' . $self->{DBH}->errstr;
	}

	$self->{DBH}->do("insert into ____publications____ (pubname, tablename) va=
lues ($qpub,$qtable)") || return 'Failed to create publication entry: '.$=
self->{DBH}->errstr;

	return undef;
}


# Removes a publication from the system.  Also drops triggers, sequences, e=
tc associated with the table if appropriate.
	# accepts one argument: the name of a publication
	# returns undef if ok, else error string;
sub unpublish {
	my $self =3D shift;
	my $pub =3D shift || return 'You must provide a publication name';
	my $qpub =3D $self->{DBH}->quote($pub);
	my $sql =3D "select tablename from ____publications____ where pubname =3D =
$qpub";
	my ($table) =3D $self->GetOneRow($sql);
	return 'Publication does not exist' if !defined($table);

	my $qtable =3D $self->{DBH}->quote($table);

	$sql =3D "select table_id, refcount from ____tables____ where tablename =
=3D $qtable";
	my ($id, $refcount) =3D $self->GetOneRow($sql);
	return "Table: $table is not correctly registered!" if not defined($id);

	$self->{DBH}->do("update ____tables____ set refcount =3D refcount -1 where=
 tablename =3D $qtable") || return 'Failed to decrement reference count: ' =
. $self->{DBH}->errstr;

	$self->{DBH}->do("delete from ____subscribed____ where pubname =3D $qpub")=
 || return 'Failed to delete user subscriptions: ' . $self->{DBH}->errstr;
	$self->{DBH}->do("delete from ____subscribed_cols____ where pubname =3D $q=
pub") || return 'Failed to delete subscribed columns: ' . $self->{DBH}->err=
str;
	$self->{DBH}->do("delete from ____publications____ where tablename =3D $qt=
able and pubname =3D $qpub") || return 'Failed to delete from publications:=
 ' . $self->{DBH}->errstr;

	#if this is the last reference, we want to drop triggers, etc;
	if ($refcount <=3D 1) {
		$id =3D "_".$id."_";

		$self->{DBH}->do("alter table $table alter column ____rowver____ drop def=
ault") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;
		$self->{DBH}->do("alter table $table alter column ____rowid____ drop defa=
ult") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;
		$self->{DBH}->do("alter table $table alter column ____stamp____ drop defa=
ult") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;

		my $trigger =3D $id.'_ver_upd';
		$self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to =
drop trigger: ' . $self->{DBH}->errstr;

		$trigger =3D $id.'_ver_ins';
		$self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to =
drop trigger: ' . $self->{DBH}->errstr;

		$trigger =3D $id.'_del_row';
		$self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to =
drop trigger: ' . $self->{DBH}->errstr;

		my $sequence =3D $id.'_rowid_seq';
		$self->{DBH}->do("drop sequence $sequence") || return 'Failed to drop seq=
uence: ' . $self->{DBH}->errstr;

		my $index =3D $id.'____rowid____idx';
		$self->{DBH}->do("drop index $index") || return 'Failed to drop index: ' =
. $self->{DBH}->errstr;
		$self->{DBH}->do("delete from ____tables____ where tablename =3D $qtable"=
) || return 'remove entry from tables: ' . $self->{DBH}->errstr;
	}
return undef;
}


#Subscribe user/node to a publication
	# Accepts 3 arguments: Username, Nodename, Publication
	# 	NOTE: the remaining arguments can be supplied as column names to which =
the user/node should be subscribed
	# Return undef if ok, else returns an error string

sub subscribe {
	my $self =3D shift;
	my $user =3D shift || die 'You must provide user, node and publication as =
arguments';
	my $node =3D shift || die 'You must provide user, node and publication as =
arguments';
	my $pub =3D shift || die 'You must provide user, node and publication as a=
rguments';
	my @cols =3D @_;

	my $quser =3D $self->{DBH}->quote($user);
	my $qnode =3D $self->{DBH}->quote($node);
	my $qpub =3D $self->{DBH}->quote($pub);

	my $sql =3D "select tablename from ____publications____ where pubname =3D =
$qpub";
	my ($table) =3D $self->GetOneRow($sql);
	return "Publication $pub does not exist." if not defined $table;
	my $qtable =3D $self->{DBH}->quote($table);

	@cols =3D $self->GetTableCols($table) if !@cols; # get defaults if cols we=
re not specified by caller

	$self->{DBH}->do("insert into ____subscribed____ (username, nodename,pubna=
me,last_ver,refreshonce) values($quser, $qnode,$qpub,0, true)") || return '=
Failed to create subscription: ' . $self->{DBH}->errstr;=09

	foreach my $col (@cols) {
		my $qcol =3D $self->{DBH}->quote($col);
		$self->{DBH}->do("insert into ____subscribed_cols____ (username, nodename=
, pubname, col_name) values ($quser,$qnode,$qpub,$qcol)") || return 'Failed=
 to subscribe column: ' . $self->{DBH}->errstr;=09
	}
	}

	return undef;
}


#Unsubscribe user/node from a publication
	# Accepts 3 arguments: Username, Nodename, Publication
	# Return undef if ok, else returns an error string

sub unsubscribe {
	my $self =3D shift;
	my $user =3D shift || die 'You must provide user, node and publication as =
arguments';
	my $node =3D shift || die 'You must provide user, node and publication as =
arguments';
	my $pub =3D shift || die 'You must provide user, node and publication as a=
rguments';
	my @cols =3D @_;

	my $quser =3D $self->{DBH}->quote($user);
	my $qnode =3D $self->{DBH}->quote($node);
	my $qpub =3D $self->{DBH}->quote($pub);

	my $sql =3D "select tablename from ____publications____ where pubname =3D =
$qpub";
	my ($table) =3D $self->GetOneRow($sql); # list context: first column
	return "Publication $pub does not exist." if not defined $table;

	$self->{DBH}->do("delete from ____subscribed_cols____ where pubname =3D $q=
pub and username =3D $quser and nodename =3D $qnode") || return 'Failed to =
remove column subscription: '. $self->{DBH}->errstr;
	$self->{DBH}->do("delete from ____subscribed____ where pubname =3D $qpub a=
nd username =3D $quser and nodename =3D $qnode") || return 'Failed to remov=
e subscription: '. $self->{DBH}->errstr;


	return undef;
}


#INSTALL creates the necessary management tables.=20=20
	#returns undef if everything is ok, else returns a string describing the e=
rror;
sub INSTALL {
my $self =3D shift;

#check to see if management tables are already installed

my ($test) =3D $self->GetOneRow("select * from pg_class where relname =3D '=
____publications____'");
if (defined($test)) {
	return 'It appears that synchronization management tables are already ins=
talled here.  Please uninstall before reinstalling.';
};


#install the management tables, etc.

$self->{DBH}->do("create table ____publications____ (pubname text primary k=
ey,description text, tablename text, sync_order int4, whereclause text)") |=
| return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____subscribed_cols____ (nodename text, user=
name text, pubname text, col_name text, description text, primary key(noden=
ame, username, pubname,col_name))") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____subscribed____ (nodename text, username =
text, pubname text, last_session timestamp, post_ver int4, last_ver int4, w=
hereclause text, sanity_limit int4 default 0, sanity_delete int4 default 0,=
 sanity_update int4 default 0, sanity_insert int4 default 50, readonly bool=
ean, disabled boolean, fullrefreshonly boolean, refreshonce boolean, primar=
y key(nodename, username, pubname))") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____last_stable____ (version int4, username =
text, nodename text, primary key(version, username, nodename))") || return =
$self->{DBH}->errstr();

$self->{DBH}->do("create table ____tables____ (tablename text, table_id int=
4, refcount int4, primary key(tablename, table_id))") || return $self->{DBH=
}->errstr();

$self->{DBH}->do("create sequence ____table_id_seq____") || return $self->{=
DBH}->errstr();

$self->{DBH}->do("alter table ____tables____ alter column table_id set defa=
ult nextval('____table_id_seq____')") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____deleted____ (rowid int4, tablename text,=
 rowver int4, stamp timestamp, primary key (rowid, tablename))") || return =
$self->{DBH}->errstr();

$self->{DBH}->do("create table ____collision____ (rowid text, tablename tex=
t, rowver int4, stamp timestamp, faildate timestamp default now(),data text=
,reason text, action text, username text, nodename text,queue text)") || re=
turn $self->{DBH}->errstr();

$self->{DBH}->do("create sequence ____version_seq____") || return $self->{D=
BH}->errstr();

$self->{DBH}->do("create table ____sync_log____ (username text, nodename te=
xt, stamp timestamp, message text)") || return $self->{DBH}->errstr();

$self->{DBH}->do("create function sync_insert_ver() returns opaque as
'begin
if new.____rowver____ isnull then
new.____rowver____ :=3D ____version_seq____.last_value;
end if;
if new.____stamp____ isnull then
new.____stamp____ :=3D now();
end if;
return NEW;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();

$self->{DBH}->do("create function sync_update_ver() returns opaque as
'begin
if new.____rowver____ =3D old.____rowver____ then
new.____rowver____ :=3D ____version_seq____.last_value;
end if;
if new.____stamp____ =3D old.____stamp____ then
new.____stamp____ :=3D now();
end if;
return NEW;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();


$self->{DBH}->do("create function sync_delete_row() returns opaque as=20
'begin=20
insert into ____deleted____ (rowid,tablename,rowver,stamp) values
(old.____rowid____, TG_RELNAME, old.____rowver____,old.____stamp____);=20
return old;=20
end;' language 'plpgsql'") || return $self->{DBH}->errstr();

return undef;
}

#removes all management tables & related stuff
	#returns undef if ok, else returns an error message as a string
sub UNINSTALL {
my $self =3D shift;

#Make sure all tables are unpublished first
my $sth =3D $self->{DBH}->prepare("select pubname from ____publications____=
");
$sth->execute;
my $pub;
while (($pub) =3D $sth->fetchrow_array) {
	$self->unpublish($pub);=09
}
$sth->finish;

$self->{DBH}->do("drop table ____publications____") || return $self->{DBH}-=
>errstr();
$self->{DBH}->do("drop table ____subscribed_cols____") || return $self->{DB=
H}->errstr();
$self->{DBH}->do("drop table ____subscribed____") || return $self->{DBH}->e=
rrstr();
$self->{DBH}->do("drop table ____last_stable____") || return $self->{DBH}->=
errstr();
$self->{DBH}->do("drop table ____deleted____") || return $self->{DBH}->errs=
tr();
$self->{DBH}->do("drop table ____collision____") || return $self->{DBH}->er=
rstr();
$self->{DBH}->do("drop table ____tables____") || return $self->{DBH}->errst=
r();
$self->{DBH}->do("drop table ____sync_log____") || return $self->{DBH}->err=
str();

$self->{DBH}->do("drop sequence ____table_id_seq____") || return $self->{DB=
H}->errstr();
$self->{DBH}->do("drop sequence ____version_seq____") || return $self->{DBH=
}->errstr();

$self->{DBH}->do("drop function sync_insert_ver()") || return $self->{DBH}-=
>errstr();
$self->{DBH}->do("drop function sync_update_ver()") || return $self->{DBH}-=
>errstr();
$self->{DBH}->do("drop function sync_delete_row()") || return $self->{DBH}-=
>errstr();

return undef;

}

sub DESTROY {
	my $self =3D shift;

	$self->{DBH}->disconnect;
	$self->{DBLOG}->disconnect;
	return undef;
}

############# Helper Subs ############

sub GetOneRow {
	my $self =3D shift;
	my $sql =3D shift || die 'Must provide sql select statement';
	my $sth =3D $self->{DBH}->prepare($sql) || return undef;
	$sth->execute || return undef;
	my @row =3D $sth->fetchrow_array;
	$sth->finish;
	return @row;
}

#call this with second non-zero value to get hidden columns
sub GetTableCols {
	my $self =3D shift;
	my $table =3D shift || die 'Must provide table name';
	my $wanthidden =3D shift;
	my $sql =3D "select * from $table where 0 =3D 1";
	my $sth =3D $self->{DBH}->prepare($sql) || return undef;
	$sth->execute || return undef;
	my @row =3D @{$sth->{NAME}};
	$sth->finish;
	return @row if $wanthidden;
	my @cols;
	foreach my $col (@row) {
		next if $col eq '____rowver____';
		next if $col eq '____stamp____';
		next if $col eq '____rowid____';
		push @cols, $col;=09
	}
	return @cols;
}


1; #happy require

------=_NextPart_000_0062_01C0541E.125CAF30--


From pgsql-hackers-owner+M9917@postgresql.org Mon Jun 11 15:53:25 2001
Return-path: <pgsql-hackers-owner+M9917@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BJrPL01206
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 15:53:25 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5BJrPE67753;
	Mon, 11 Jun 2001 15:53:25 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9917@postgresql.org)
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BJmLE65620
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 15:48:21 -0400 (EDT)
	(envelope-from djohnson@greatbridge.com)
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5BJm2Q28847
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 15:48:02 -0400
From: Darren Johnson <djohnson@greatbridge.com>
Date: Mon, 11 Jun 2001 19:46:44 GMT
Message-ID: <20010611.19464400@j2.us.greatbridge.com>
Subject: [HACKERS] Postgres Replication
To: pgsql-hackers@postgresql.org
Reply-To: Darren Johnson <djohnson@greatbridge.com>
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
X-Priority: 3 (Normal)
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5BJmLE65621
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

We have been researching replication for several months now, and
I have some opinions to share with the community for feedback,
discussion, and/or participation. Our goal is to get a replication
solution for PostgreSQL that will meet most needs of users
and applications alike (mission impossible theme here :).

My research work, along with that of other contributors, has been collected
and presented here: http://www.greatbridge.org/genpage?replication_top
If there is something missing, especially PostgreSQL-related
work, I would like to know about it, and my apologies to anyone
who got left off the list. This work is ongoing and doesn't
draw a conclusion, which IMHO should be left up to the user,
but I'm offering my opinions to spur discussion and/or feedback
from this list, and I will try not to offend anyone.

Here's my opinion: of the approaches we've surveyed, the most
promising one is the Postgres-R project from the Information and
Communication Systems Group, ETH in Zurich, Switzerland, originally
produced by Bettina Kemme, Gustavo Alonso, and others.  Although
Postgres-R is a synchronous approach, I believe it is the closest to
the goal mentioned above. Here is a summary of its advantages.

1) Postgres-R is built on the PostgreSQL-6.4.2 code base.  The
replication functionality is an optional parameter, so there will be
insignificant overhead for non-replication situations. The replication
and communication managers are the two new modules added to the
PostgreSQL code base.

2) The replication manager's main function is controlling the
replication protocol via a message handling process. It receives
messages from the local and remote backends and forwards write
sets and decision messages via the communication manager to the
other servers. The replication manager controls all the transactions
running on the local server by keeping track of the states, including
which protocol phase (read, send, lock, or write) the transaction is
in. The replication manager maintains a two way channel
implemented as buffered sockets to each backend.

3) The main task of the communication manager is to provide a simple
socket-based interface between the replication manager and the
group communication system (currently Ensemble). The
communication system is a cluster of servers connected via
the communication manager.  The replication manager also maintains
three one-way channels to the communication system: a broadcast
channel to send messages, a total-order channel to receive
totally ordered write sets, and a no-order channel to listen for
decision messages from the communication system. Decision
messages can be received at any time, whereas the reception of
totally ordered write sets can be blocked in certain phases.

4) Based on a two-phase locking approach, all deadlock situations
are local, detectable by the Postgres-R code base, and resolved by
an abort.

5) The write set messages, used to send database changes to other
servers, can use either the SQL statements or the actual tuples
changed. The choice is a parameter based on the number of tuples
changed by a transaction. While sending the tuple changes reduces
overhead in query parse, plan and execution, there is a negative
effect in sending a large write set across the network.

6) Postgres-R uses a synchronous approach that keeps the data on
all sites consistent and provides serializability. The user does not
have to bother with conflict resolution, and receives the same
correctness and consistency as with a centralized system.

7) Postgres-R could be part of a good fault-resilient and load
distribution solution.  It is peer-to-peer based and incurs low
overhead propagating updates to the other cluster members.  All
replicated databases locally process queries.

8) Compared to other synchronous replication strategies (e.g., standard
distributed 2-phase-locking + 2-phase-commit), Postgres-R has much
better performance using 2-phase-locking.
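
As an illustration of point 5, the choice between shipping tuples and
shipping SQL could be sketched as below. This is a minimal sketch in
Python, not Postgres-R code: the names WriteSet and build_write_set
and the threshold value are hypothetical, and the direction of the
rule (small transactions ship tuples, large ones ship SQL) is an
assumption drawn from the trade-off described in that point.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical cutoff; the post only says the choice is a parameter
# based on the number of tuples a transaction changes.
TUPLE_THRESHOLD = 100


@dataclass
class WriteSet:
    kind: str      # "tuples" or "sql"
    payload: list  # changed tuples, or the SQL statements to replay


def build_write_set(statements: Sequence[str],
                    changed_tuples: Sequence[Tuple],
                    threshold: int = TUPLE_THRESHOLD) -> WriteSet:
    """Pick a representation for one transaction's changes."""
    if len(changed_tuples) <= threshold:
        # Few tuples: send them directly, so the remote servers can
        # skip parse, plan and execution.
        return WriteSet("tuples", list(changed_tuples))
    # Many tuples: the statements are much smaller on the wire.
    return WriteSet("sql", list(statements))
```

A single-row UPDATE would travel as one tuple, while a bulk UPDATE
touching thousands of rows would travel as its SQL text.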


There are some issues that are not currently addressed by
Postgres-R, but some enhancements made to PostgreSQL since the
6.4.2 tree are very favorable to addressing these shortcomings.

1) The addition of WAL in 7.1 provides the information for recovering
failed/off-line servers. Currently, all the servers would have to be
stopped, and a copy would be used to get all the servers synchronized
before starting again.

2) Being synchronous, Postgres-R would not be a good solution
for off-line/WAN scenarios where asynchronous replication is
required.  There are some theories on this issue which involve servers
connecting and disconnecting from the cluster.

3) As in any serialized synchronous approach, there is a change in the
flow of execution of a transaction. While most of these changes can
be solved by calling newly developed functions at certain time points,
synchronous replica control is tightly coupled with the concurrency
control. Hence, especially in PostgreSQL 7.2, some parts of the
concurrency control (MVCC) might have to be adjusted. This can lead to
slightly more complicated maintenance than a system that does not
change the backend.

4) Partial replication is not addressed.


Any feedback on this post will be appreciated.

Thanks,

Darren

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-hackers-owner+M9923@postgresql.org Mon Jun 11 18:14:23 2001
Return-path: <pgsql-hackers-owner+M9923@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BMENL18644
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 18:14:23 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5BMEQE14877;
	Mon, 11 Jun 2001 18:14:26 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9923@postgresql.org)
Received: from spoetnik.xs4all.nl (spoetnik.xs4all.nl [194.109.249.226])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BM6ME12270
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 18:06:23 -0400 (EDT)
	(envelope-from reinoud@xs4all.nl)
Received: from KAYAK (kayak [192.168.1.20])
	by spoetnik.xs4all.nl (Postfix) with SMTP id 865A33E1B
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 00:06:16 +0200 (CEST)
From: reinoud@xs4all.nl (Reinoud van Leeuwen)
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Postgres Replication
Date: Mon, 11 Jun 2001 22:06:07 GMT
Organization: Not organized in any way
Reply-To: reinoud@xs4all.nl
Message-ID: <3b403d96.562404297@192.168.1.10>
References: <20010611.19464400@j2.us.greatbridge.com>
In-Reply-To: <20010611.19464400@j2.us.greatbridge.com>
X-Mailer: Forte Agent 1.5/32.451
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5BM6PE12276
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:

>We have been researching replication for several months now, and
>I have some opinions to share to the community for feedback,
>discussion, and/or participation. Our goal is to get a replication
>solution for PostgreSQL that will meet most needs of users
>and applications alike (mission impossible theme here :).
>
>My research work along with others contributors has been collected
>and presented here http://www.greatbridge.org/genpage?replication_top
>If there is something missing, especially PostgreSQL related
>work, I would like to know about it, and my apologies to any
>one who got left off the list. This work is ongoing and doesn't
>draw a conclusion, which IMHO should be left up to the user,
>but I'm offering my opinions to spur discussion and/or feed back
>from this list, and try not to offend any one.
>
>Here's my opinion: of the approaches we've surveyed, the most
>promising one is the Postgres-R project from the Information and
>Communication Systems Group, ETH  in Zurich, Switzerland, originally
>produced by Bettina Kemme, Gustavo Alonso, and others.  Although
>Postgres-R is a synchronous approach, I believe it is the closest to
>the goal mentioned above. Here is an abstract of the advantages.
>
>1) Postgres-R is built on the PostgreSQL-6.4.2 code base.  The
>replication
>functionality is an optional parameter, so there will be insignificant
>overhead for non replication situations. The replication and
>communication
>managers are the two new modules added to the PostgreSQL code base.
>
>2) The replication manager's main function is controlling the
>replication protocol via a message handling process. It receives
>messages from the local and remote backends and forwards write
>sets and decision messages via the communication manager to the
>other servers. The replication manager controls all the transactions
>running on the local server by keeping track of the states, including
>which protocol phase (read, send, lock, or write) the transaction is
>in. The replication manager maintains a two way channel
>implemented as buffered sockets to each backend.

what does "manager controls all the transactions" mean? I hope it does
*not* mean that a bug in the manager would cause transactions not to
commit...

>
>3) The main task of the communication manager is to provide simple
>socket based interface between the replication manager and the
>group communication system (currently Ensemble). The
>communication system is a cluster of servers connected via
>the communication manager.  The replication manager also maintains
>three one-way channels to the communication system: a broadcast
>channel to send messages, a total-order channel to receive
>totally orders write sets, and a no-order channel to listen for
>decision messages from the communication system. Decision
>messages can be received at any time where the reception of
>totally ordered write sets can be blocked in certain phases.
>
>4) Based on a two phase locking approach, all dead lock situations
>are local and detectable by Postgres-R code base, and aborted.

Does this imply locking over different servers? That would mean a
grinding halt when a network outage occurs...

>5) The write set messages used to send database changes to other
>servers, can use either the SQL statements or the actual tuples
>changed. This is a parameter based on number of tuples changed
>by a transaction. While sending the tuple changes reduces
>overhead in query parse, plan and execution, there is a negative
>effect in sending a large write set across the network.
>
>6) Postgres-R uses a synchronous approach that keeps the data on
>all sites consistent and provides serializability. The user does not
>have to bother with conflict resolution, and receives the same
>correctness and consistency of a centralized system.
>
>7) Postgres-R could be part of a good fault-resilient and load
>distribution
>solution.  It is peer-to-peer based and incurs low overhead propagating
>updates to the other cluster members.  All replicated databases locally
>process queries.
>
>8) Compared to other synchronous replication strategies (e.g., standard
>distributed 2-phase-locking + 2-phase-commit), Postgres-R has much
>better performance using 2-phase-locking.

Coming from a Sybase background I have some experience with
replication. The way it works in Sybase Replication server is as
follows:
- for each replicated database, there is a "log reader" process that
reads the WAL and forwards only *committed transactions* to the
replication server. (it does not make much sense to replicate other
things IMHO :-).
- the replication server stores incoming data in a queue ("stable
device"), until it is sure it has reached its final destination

- a replication server can send data to another replication server in
a compact (read: WAN friendly) way. A chain of replication servers can
be made, depending on network architecture.

- the final replication server makes an almost standard client
connection to the target database and translates the compact
transactions back to SQL statements. By using masks, extra
functionality can be built in.

This kind of architecture has several advantages:
- only committed transactions are replicated which saves overhead
- it does not have very much impact on performance of the source
server (apart from reading the WAL)
- since every replication server has a stable device, data is stored
when the network is down and nothing gets lost (nor stops performing)
- because only the log reader and the connection from the final
replication server are RDBMS specific, it is possible to replicate
from MS to Oracle using a Sybase replication server (or different
versions etc).
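
The store-and-forward behaviour of the "stable device" can be
sketched as follows. This is a minimal illustration in Python, not
Sybase's implementation: StableQueue and the send callback are
hypothetical names, and the queue lives in memory here, whereas a
real stable device would persist to disk.

```python
from collections import deque
from typing import Callable, Deque


class StableQueue:
    """In-memory stand-in for a replication server's stable device."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def enqueue(self, txn: str) -> None:
        # The log reader hands over committed transactions only.
        self._pending.append(txn)

    def forward(self, send: Callable[[str], bool]) -> int:
        """Send queued transactions in commit order.

        `send` returns True once the next hop has acknowledged the
        transaction.  On the first failure we stop and keep the rest,
        so nothing is lost while the network is down.
        """
        delivered = 0
        while self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

A transaction leaves the queue only after the next hop acknowledges
it, which is what lets the source server keep running through an
outage.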

I do not know how much of this is patented or copyrighted, but the
architecture seems elegant and robust to me. I have done
implementations of bi-directional replication too. It *is* possible
but does require some funky setup and maintenance. (but it is better
than letting offices on different continents work on the same
database :-)

just my 2 EURO cts  :-)


--
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen       reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

From pgsql-hackers-owner+M9924@postgresql.org Mon Jun 11 18:41:51 2001
Return-path: <pgsql-hackers-owner+M9924@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BMfpL28917
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 18:41:51 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5BMfsE25092;
	Mon, 11 Jun 2001 18:41:54 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9924@postgresql.org)
Received: from spider.pilosoft.com (p55-222.acedsl.com [160.79.55.222])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BMalE23024
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 18:36:47 -0400 (EDT)
	(envelope-from alex@pilosoft.com)
Received: from localhost (alexmail@localhost)
	by spider.pilosoft.com (8.9.3/8.9.3) with ESMTP id SAA06092;
	Mon, 11 Jun 2001 18:46:05 -0400 (EDT)
Date: Mon, 11 Jun 2001 18:46:05 -0400 (EDT)
From: Alex Pilosov <alex@pilosoft.com>
To: Reinoud van Leeuwen <reinoud@xs4all.nl>
cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Postgres Replication
In-Reply-To: <3b403d96.562404297@192.168.1.10>
Message-ID: <Pine.BSO.4.10.10106111828450.9902-100000@spider.pilosoft.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

On Mon, 11 Jun 2001, Reinoud van Leeuwen wrote:

> On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:

> what does "manager controls all the transactions" mean? I hope it does
> *not* mean that a bug in the manager would cause transactions not to
> commit...
Well yeah it does. Bugs are a fact of life. :)

> >4) Based on a two phase locking approach, all dead lock situations
> >are local and detectable by Postgres-R code base, and aborted.
>
> Does this imply locking over different servers? That would mean a
> grinding halt when a network outage occurs...
Don't know, but see below.

> Coming from a Sybase background I have some experience with
> replication. The way it works in Sybase Replication server is as
> follows:
> - for each replicated database, there is a "log reader" process that
> reads the WAL and captures only *committed transactions* to the
> replication server. (it does not make much sense to replicate other
> things IMHO :-).
> - the replication server stores incoming data in a que ("stable
> device"), until it is sure it has reached its final destination
>
> - a replication server can send data to another replication server in
> a compact (read: WAN friendly) way. A chain of replication servers can
> be made, depending on network architecture)
>
> - the final replication server makes a almost standard client
> connection to the target database and translates the compact
> transactions back to SQL statements. By using masks, extra
> functionality can be built in.
>
> This kind of architecture has several advantages:
> - only committed transactions are replicated which saves overhead
> - it does not have very much impact on performance of the source
> server (apart from reading the WAL)
> - since every replication server has a stable device, data is stored
> when the network is down and nothing gets lost (nor stops performing)
> - because only the log reader and the connection from the final
> replication server are RDBMS specific, it is possible to replicate
> from MS to Oracle using a Sybase replication server (or different
> versions etc).
>
> I do not know how much of this is patented or copyrighted, but the
> architecture seems elegant and robust to me. I have done
> implementations of bi-directional replication too. It *is* possible
> but does require some funky setup and maintenance. (but it is better
> that letting offices on different continents working on the same
> database :-)
Yes, the above architecture is what almost every vendor of replication
software uses. And I'm sure if you worked much with Sybase, you hate the
garbage that their repserver is :).

The architectures of postgres-r and repserver are fundamentally different
for a good reason: repserver only wants to replicate committed
transactions, while postgres-r is more of a 'clustering' solution (albeit
they don't say this word), and is capable of doing much more than a
simple rep server.

I.e. you can safely put half of your clients on a second server in a
replicated postgres-r cluster without being worried that a conflict (or a
weird locking situation) may occur.

Try that with sybase: it is fundamentally designed for one-way
replication, and the fact that you can do one-way replication in both
directions doesn't mean it's safe to do that!

I'm not sure how postgres-r handles network problems. To be useful, a good
replication solution must have an option of "no network->no updates" as
well as "no network->queue updates and send them later". However, it is
far easier to add queuing to a correct 'eager locking' database than it is
to add proper locking to a queue-based replicator.

-alex


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

From pgsql-hackers-owner+M9932@postgresql.org Mon Jun 11 22:17:54 2001
Return-path: <pgsql-hackers-owner+M9932@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C2HsL15803
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 22:17:54 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5C2HtE86836;
	Mon, 11 Jun 2001 22:17:55 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9932@postgresql.org)
Received: from femail15.sdc1.sfba.home.com (femail15.sdc1.sfba.home.com [24.0.95.142])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C2BXE85020
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 22:11:33 -0400 (EDT)
	(envelope-from djohnson@greatbridge.com)
Received: from greatbridge.com ([65.2.95.27])
          by femail15.sdc1.sfba.home.com
          (InterMail vM.4.01.03.20 201-229-121-120-20010223) with ESMTP
          id <20010612021124.OZRG17243.femail15.sdc1.sfba.home.com@greatbridge.com>;
          Mon, 11 Jun 2001 19:11:24 -0700
Message-ID: <3B257969.6050405@greatbridge.com>
Date: Mon, 11 Jun 2001 22:07:37 -0400
From: Darren Johnson <djohnson@greatbridge.com>
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001108 Netscape6/6.0
X-Accept-Language: en
MIME-Version: 1.0
To: Alex Pilosov <alex@pilosoft.com>, Reinoud van Leeuwen <reinoud@xs4all.nl>
cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Postgres Replication
References: <Pine.BSO.4.10.10106111828450.9902-100000@spider.pilosoft.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


Thanks for the feedback.  I'll try to address both your issues here.

>> what does "manager controls all the transactions" mean?
>
The replication manager controls the transactions by serializing the
write set messages.  This ensures all transactions are committed in the
same order on each server, so bugs here are not allowed  ;-)
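
The effect of serializing write sets can be sketched like this. It is
a minimal Python illustration under stated assumptions, not the
Postgres-R code: the Replica class is hypothetical, and the global
sequence number attached to each write set stands in for the total
order that the group communication system provides.

```python
import heapq


class Replica:
    """Applies write sets strictly in their global sequence order."""

    def __init__(self):
        self.next_seq = 0    # next sequence number allowed to commit
        self.pending = []    # min-heap of (seq, write_set)
        self.committed = []  # write sets in the order they committed

    def deliver(self, seq, write_set):
        # Messages may arrive in any order; hold them until every
        # earlier write set has been committed.
        heapq.heappush(self.pending, (seq, write_set))
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            self.committed.append(ready)
            self.next_seq += 1
```

Two replicas that receive the same write sets in different arrival
orders end up with identical commit sequences.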

>> I hope it does
>> *not* mean that a bug in the manager would cause transactions not to
>> commit...
>
> Well yeah it does. Bugs are a fact of life. :

>
>>> 4) Based on a two phase locking approach, all dead lock situations
>>> are local and detectable by Postgres-R code base, and aborted.
>>
>> Does this imply locking over different servers? That would mean a
>> grinding halt when a network outage occurs...
>
> Don't know, but see below.

There is a branch of the Postgres-R code that has some failure detection
implemented, so we will have to merge this functionality with the
version of Postgres-R we have, and test this issue.  I'll let you know
the results.

>>
>> - the replication server stores incoming data in a que ("stable
>> device"), until it is sure it has reached its final destination
>
I like this idea for recovering servers that have been down a short
period of time, using WAL to recover transactions missed during the
outage.

>>
>> This kind of architecture has several advantages:
>> - only committed transactions are replicated which saves overhead
>> - it does not have very much impact on performance of the source
>> server (apart from reading the WAL)
>> - since every replication server has a stable device, data is stored
>> when the network is down and nothing gets lost (nor stops performing)
>> - because only the log reader and the connection from the final
>> replication server are RDBMS specific, it is possible to replicate
>> from MS to Oracle using a Sybase replication server (or different
>> versions etc).
>
There are some issues with the "log reader" approach:
1) The databases are not synchronized until the log reader completes its
processing.
2) I'm not sure about Sybase, but the log reader sends SQL statements to
the other servers, which are then parsed, planned and executed.  This
overhead could be avoided if only the tuple changes are replicated.
3) Works fine for read-only situations, but peer-to-peer applications
using this approach must be designed with a conflict resolution scheme.

Don't get me wrong, I believe we can learn from the replication
techniques used by commercial databases like Sybase, and try to
implement the good ones in PostgreSQL.  Postgres-R is a synchronous
approach which outperforms the traditional approaches to synchronous
replication.  Being based on PostgreSQL-6.4.2, getting this approach
into the 7.2 tree might be better than reinventing the wheel.

Thanks again,

Darren



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

From pgsql-hackers-owner+M9936@postgresql.org Tue Jun 12 03:22:51 2001
Return-path: <pgsql-hackers-owner+M9936@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C7MoL11061
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 03:22:50 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5C7MPE35441;
	Tue, 12 Jun 2001 03:22:25 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9936@postgresql.org)
Received: from reorxrsm.server.lan.at (zep3.it-austria.net [213.150.1.73])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C72ZE25009
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 03:02:36 -0400 (EDT)
	(envelope-from ZeugswetterA@wien.spardat.at)
Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149])
	by reorxrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5C72Qu27966
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 09:02:26 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21)
	id <M3L15341>; Tue, 12 Jun 2001 09:02:21 +0200
Message-ID: <11C1E6749A55D411A9670001FA68796336831B@sdexcsrv1.f000.d0188.sd.spardat.at>
From: Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at>
To: "'Darren Johnson'" <djohnson@greatbridge.com>,
   pgsql-hackers@postgresql.org
Subject: AW: [HACKERS] Postgres Replication
Date: Tue, 12 Jun 2001 09:02:20 +0200
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> Although
> Postgres-R is a synchronous approach, I believe it is the closest to
> the goal mentioned above. Here is an abstract of the advantages.

If you only want synchronous replication, why not simply use triggers?
All you would then need is remote query access and two phase commit,
and maybe a little script that helps create the appropriate triggers.
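
The two-phase commit step mentioned above can be sketched as follows.
This is a minimal Python illustration, assuming each server can vote
on a prepare request; the Participant class and two_phase_commit
function are hypothetical names, and no real database calls are made.

```python
class Participant:
    """One server taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulated vote for the sketch
        self.state = "idle"

    def prepare(self):
        # Phase 1: promise to commit if asked, or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere, or nowhere. Returns True on commit."""
    # Phase 1: collect votes; all() stops at the first "no".
    if all(p.prepare() for p in participants):
        # Phase 2: everyone voted yes, so make it durable everywhere.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the transaction on every server.
    for p in participants:
        p.rollback()
    return False
```

A trigger on the source table would issue the remote update, and the
commit of the local transaction would then go through a step like
two_phase_commit so that both servers agree on the outcome.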

Doing a replicate-all-or-nothing approach that only works synchronously
is imho not flexible enough.

Andreas


From pgsql-hackers-owner+M9945@postgresql.org Tue Jun 12 10:18:29 2001
Return-path: <pgsql-hackers-owner+M9945@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CEISL06372
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 10:18:28 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CEIQE77517;
	Tue, 12 Jun 2001 10:18:26 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9945@postgresql.org)
Received: from krypton.netropolis.org ([208.222.215.99])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CEDuE75514
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 10:13:56 -0400 (EDT)
	(envelope-from root@generalogic.com)
Received: from [132.216.183.103] (helo=localhost)
	by krypton.netropolis.org with esmtp (Exim 3.12 #1 (Debian))
	id 159ouq-0003MU-00
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 10:13:08 -0400
To: pgsql-hackers@postgresql.org
Subject: Re: AW: [HACKERS] Postgres Replication
In-Reply-To: <20010612.13321600@j2.us.greatbridge.com>
References: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
	<20010612.13321600@j2.us.greatbridge.com>
X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.0 (HANANOEN)
MIME-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <20010612123623O.root@generalogic.com>
Date: Tue, 12 Jun 2001 12:36:23 +0530
From: root <root@generalogic.com>
X-Dispatcher: imput version 20000414(IM141)
Lines: 47
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


Hello

I have hacked up a replication layer for Perl code accessing a
database through the DBI interface. It works pretty well with MySQL
(I can run pre-bender slashcode replicated, haven't tried the more
recent releases).

Potentially this hack should also work with Pg but I haven't tried
yet. If someone would like to test it out with a complex Pg app and
let me know how it went that would be cool.

The replication layer is based on Eric Newton's Recall replication
library (www.fault-tolerant.org/recall), and requires that all
database accesses be through the DBI interface.

The replicas are live, in that every operation affects all the
replicas in real time. Replica outages are invisible to the user, so
long as a majority of the replicas are functioning. Disconnected
replicas can be used for read-only access.

The only code modification that should be required to use the
replication layer is to change the DSN in connect():

  my $replicas = '192.168.1.1:7000,192.168.1.2:7000,192.168.1.3:7000';
  my $dbh = DBI->connect("DBI:Recall:database=$replicas");

You should be able to install the replication modules with:

perl -MCPAN -eshell
cpan> install Replication::Recall::DBServer

and then install DBD::Recall (which doesn't seem to be accessible from
the CPAN shell yet, for some reason), by:

wget http://www.cpan.org/authors/id/AGUL/DBD-Recall-1.10.tar.gz
tar xzvf DBD-Recall-1.10.tar.gz
cd DBD-Recall-1.10
perl Makefile.PL
make install

I would be very interested in hearing about your experiences with
this...

Thanks

#!


From pgsql-hackers-owner+M9938@postgresql.org Tue Jun 12 05:12:54 2001
Return-path: <pgsql-hackers-owner+M9938@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C9CrL15228
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 05:12:53 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5C9CnE91297;
	Tue, 12 Jun 2001 05:12:49 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9938@postgresql.org)
Received: from mobile.hub.org (SHW39-29.accesscable.net [24.138.39.29])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C98DE89175
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 05:08:13 -0400 (EDT)
	(envelope-from scrappy@hub.org)
Received: from localhost (scrappy@localhost)
	by mobile.hub.org (8.11.3/8.11.1) with ESMTP id f5C97f361630;
	Tue, 12 Jun 2001 06:07:46 -0300 (ADT)
	(envelope-from scrappy@hub.org)
X-Authentication-Warning: mobile.hub.org: scrappy owned process doing -bs
Date: Tue, 12 Jun 2001 06:07:41 -0300 (ADT)
From: The Hermit Hacker <scrappy@hub.org>
To: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
cc: "'Darren Johnson'" <djohnson@greatbridge.com>,
   <pgsql-hackers@postgresql.org>
Subject: Re: AW: [HACKERS] Postgres Replication
In-Reply-To: <11C1E6749A55D411A9670001FA68796336831B@sdexcsrv1.f000.d0188.sd.spardat.at>
Message-ID: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


which I believe is what the rserv implementation in contrib currently does
... no?

It's funny ... what is in contrib right now was developed in a weekend by
Vadim, put in contrib, yet nobody has either used it *or* seen fit to
submit patches to improve it ... ?

On Tue, 12 Jun 2001, Zeugswetter Andreas SB wrote:

>
> > Although
> > Postgres-R is a synchronous approach, I believe it is the closest to
> > the goal mentioned above. Here is an abstract of the advantages.
>
> If you only want synchronous replication, why not simply use triggers ?
> All you would then need is remote query access and two phase commit,
> and maybe a little script that helps create the appropriate triggers.
>
> Doing a replicate all or nothing approach that only works synchronous
> is imho not flexible enough.
>
> Andreas

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org



From pgsql-hackers-owner+M9940@postgresql.org Tue Jun 12 09:39:08 2001
Return-path: <pgsql-hackers-owner+M9940@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CDd8L03200
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 09:39:08 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CDcmE58175;
	Tue, 12 Jun 2001 09:38:48 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9940@postgresql.org)
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CDYAE56164
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 09:34:10 -0400 (EDT)
	(envelope-from djohnson@greatbridge.com)
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CDXeQ03585;
	Tue, 12 Jun 2001 09:33:40 -0400
From: Darren Johnson <djohnson@greatbridge.com>
Date: Tue, 12 Jun 2001 13:32:16 GMT
Message-ID: <20010612.13321600@j2.us.greatbridge.com>
Subject: Re: AW: [HACKERS] Postgres Replication
To: The Hermit Hacker <scrappy@hub.org>
cc: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>,
   <pgsql-hackers@postgresql.org>
Reply-To: Darren Johnson <djohnson@greatbridge.com>
In-Reply-To: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
References: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
X-Priority: 3 (Normal)
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CDYAE56166
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> which I believe is what the rserv implementation in contrib currently
> does ... no?

We tried rserv, PG Link (Joseph Conway), and PostgreSQL Replicator.  All
these projects are trigger-based asynchronous replication.  They all have
some advantages over the current functionality of Postgres-R, some of
which I believe can be addressed:

1) Partial replication - being able to replicate just one table or
part of a table
2) They make no changes to the PostgreSQL code base. (Postgres-R can't
address this one ;)
3) PostgreSQL Replicator has some very nice conflict resolution schemes.


Here are some disadvantages to using a "trigger based" approach:

1) Triggers simply transfer individual data items when they are modified;
they do not keep track of transactions.
2) The execution of triggers within a database imposes a performance
overhead to that database.
3) Triggers require careful management by database administrators.
Someone needs to keep track of all the "alarms" going off.
4) The activation of triggers in a database cannot be easily
rolled back or undone.


> On Tue, 12 Jun 2001, Zeugswetter Andreas SB wrote:

> > Doing a replicate all or nothing approach that only works synchronous
> > is imho not flexible enough.
> >


I agree.  Partial and asynchronous replication need to be addressed,
and some of the common functionality of Postgres-R could possibly
be used to meet those needs.


Thanks for your feedback,

Darren

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

From pgsql-hackers-owner+M9969@postgresql.org Tue Jun 12 16:53:45 2001
Return-path: <pgsql-hackers-owner+M9969@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CKriL23104
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 16:53:44 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CKrlE87423;
	Tue, 12 Jun 2001 16:53:47 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9969@postgresql.org)
Received: from sectorbase2.sectorbase.com (sectorbase2.sectorbase.com [63.88.121.62] (may be forged))
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CHWkE69562
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 13:32:46 -0400 (EDT)
	(envelope-from vmikheev@SECTORBASE.COM)
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
	id <MX6MWMV8>; Tue, 12 Jun 2001 10:30:29 -0700
Message-ID: <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com>
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
To: "'Darren Johnson'" <djohnson@greatbridge.com>,
   The Hermit Hacker
  <scrappy@hub.org>
cc: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>,
   pgsql-hackers@postgresql.org
Subject: RE: AW: [HACKERS] Postgres Replication
Date: Tue, 12 Jun 2001 10:30:27 -0700
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

> Here are some disadvantages to using a "trigger based" approach:
>
> 1) Triggers simply transfer individual data items when they
> are modified, they do not keep track of transactions.

I don't know about other *async* replication engines, but Rserv
keeps track of transactions (if I understood you correctly).
Rserv transfers not individual modified data items but a
*consistent* snapshot of changes, to move the slave database from
one *consistent* state (when all RI constraints are satisfied)
to another *consistent* state.
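
A minimal Python sketch of the idea described above, not Rserv's actual
code: logged row changes are grouped by transaction id and each group is
applied to the replica all-or-nothing, so the replica only ever moves from
one consistent state to the next.  The log layout, the table names and the
function names are assumptions made for illustration, and the log is
assumed to be in commit order.

```python
def group_by_transaction(change_log):
    """Group logged row changes by transaction id, keeping first-seen order."""
    batches = {}
    for xid, table, key, value in change_log:
        batches.setdefault(xid, []).append((table, key, value))
    return list(batches.values())


def apply_batches(replica, batches):
    """Apply each batch to a staged copy, then swap it in as a whole."""
    for batch in batches:
        staged = {table: dict(rows) for table, rows in replica.items()}
        for table, key, value in batch:
            staged.setdefault(table, {})[key] = value
        # The replica is only replaced once the whole batch has been staged.
        replica.clear()
        replica.update(staged)
    return replica


# Hypothetical change log: (transaction id, table, primary key, new value).
log = [
    (101, "orders", 1, "widget"),
    (101, "order_lines", 1, "widget x 2"),
    (102, "orders", 2, "gadget"),
]
replica = apply_batches({}, group_by_transaction(log))
```

Both rows written by transaction 101 reach the replica together, which is
the property that keeps referential integrity intact on the slave.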

> 4) The activation of triggers in a database cannot be easily
> rolled back or undone.

What do you mean?

Vadim

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-hackers-owner+M9967@postgresql.org Tue Jun 12 16:42:11 2001
Return-path: <pgsql-hackers-owner+M9967@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CKgBL17982
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 16:42:11 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CKgDE80566;
	Tue, 12 Jun 2001 16:42:13 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9967@postgresql.org)
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CIVdE07561
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 14:31:39 -0400 (EDT)
	(envelope-from djohnson@greatbridge.com)
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CIUfQ10080;
	Tue, 12 Jun 2001 14:30:41 -0400
From: Darren Johnson <djohnson@greatbridge.com>
Date: Tue, 12 Jun 2001 18:29:20 GMT
Message-ID: <20010612.18292000@j2.us.greatbridge.com>
Subject: RE: AW: [HACKERS] Postgres Replication
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
cc: The Hermit Hacker <scrappy@hub.org>,
   Zeugswetter Andreas SB
	<ZeugswetterA@wien.spardat.at>,
   pgsql-hackers@postgresql.org
Reply-To: Darren Johnson <djohnson@greatbridge.com>
	<3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com>
References: <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com>
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
X-Priority: 3 (Normal)
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CIVdE07562
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> > Here are some disadvantages to using a "trigger based" approach:
> >
> > 1) Triggers simply transfer individual data items when they
> > are modified, they do not keep track of transactions.

> I don't know about other *async* replication engines, but Rserv
> keeps track of transactions (if I understood you correctly).
> Rserv transfers not individual modified data items but a
> *consistent* snapshot of changes, to move the slave database from
> one *consistent* state (when all RI constraints are satisfied)
> to another *consistent* state.

I thought Andreas did a good job of correcting me here.  Points 1 and 4
do not apply to transaction-based replication with triggers.  I
should have made a distinction between non-transaction-based and
transaction-based replication with triggers.  I was not trying to
single out rserv or any other project, and I can see how my wording
invited this misinterpretation (my apologies).


> > 4) The activation of triggers in a database cannot be easily
> > rolled back or undone.

> What do you mean?

Once the trigger fires, it is not an easy task to abort that
execution via rollback or undo.  Again, this is not an issue
with a transaction-based trigger approach.


Sincerely,

Darren

From pgsql-hackers-owner+M9943@postgresql.org Tue Jun 12 10:03:02 2001
Return-path: <pgsql-hackers-owner+M9943@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CE32L04619
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 10:03:02 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CE31E70430;
	Tue, 12 Jun 2001 10:03:01 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9943@postgresql.org)
Received: from fizbanrsm.server.lan.at (zep4.it-austria.net [213.150.1.74])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CDoQE64062
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 09:50:26 -0400 (EDT)
	(envelope-from ZeugswetterA@wien.spardat.at)
Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149])
	by fizbanrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5CDoJe11224
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 15:50:19 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21)
	id <M3L15S4T>; Tue, 12 Jun 2001 15:50:15 +0200
Message-ID: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
From: Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at>
To: "'Darren Johnson'" <djohnson@greatbridge.com>,
   The Hermit Hacker
  <scrappy@hub.org>
cc: pgsql-hackers@postgresql.org
Subject: AW: AW: [HACKERS] Postgres Replication
Date: Tue, 12 Jun 2001 15:50:09 +0200
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> Here are some disadvantages to using a "trigger based" approach:
>
> 1) Triggers simply transfer individual data items when they
> are modified, they do not keep track of transactions.
> 2) The execution of triggers within a database imposes a performance
> overhead to that database.
> 3) Triggers require careful management by database administrators.
> Someone needs to keep track of all the "alarms" going off.
> 4) The activation of triggers in a database cannot be easily
> rolled back or undone.

Yes, points 2 and 3 are a given, although point 2 buys you the functionality
of transparent locking across all involved db servers.
Points 1 and 4 are only the case for a trigger mechanism that does
not use remote connection and 2-phase commit.

Imho an implementation that opens a separate client connection to the
replication target is only suited for async replication, and for that a WAL
based solution would probably impose less overhead.
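
To make the 2-phase commit point concrete, here is a small in-memory
Python sketch.  It is an illustration only, not code from any of the
projects discussed; the Participant class and its method names are
invented for the example.  Every target must acknowledge the prepare
step before anyone commits, so an unreachable target blocks the update
everywhere.

```python
class Participant:
    """Hypothetical replication target holding committed and pending data."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.committed = {}
        self.pending = None

    def prepare(self, changes):
        # Phase 1: stage the changes and vote yes, or vote no if unreachable.
        if not self.reachable:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        # Phase 2: make the staged changes visible.
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None


def two_phase_commit(participants, changes):
    """Commit on all participants, or on none of them."""
    prepared = []
    for p in participants:
        if p.prepare(changes):
            prepared.append(p)
        else:
            # One "no" vote aborts the transaction on every prepared node.
            for q in prepared:
                q.rollback()
            return False
    for p in participants:
        p.commit()
    return True
```

With two reachable participants the call returns True and both hold the
new data; if one participant is constructed with reachable=False, the
call returns False and nothing is changed on the other.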

Andreas

From pgsql-hackers-owner+M9946@postgresql.org Tue Jun 12 10:47:09 2001
Return-path: <pgsql-hackers-owner+M9946@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CEl9L08144
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 10:47:09 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CEihE88714;
	Tue, 12 Jun 2001 10:44:43 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9946@postgresql.org)
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CEd6E85859
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 10:39:06 -0400 (EDT)
	(envelope-from djohnson@greatbridge.com)
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CEcgQ04905;
	Tue, 12 Jun 2001 10:38:42 -0400
From: Darren Johnson <djohnson@greatbridge.com>
Date: Tue, 12 Jun 2001 14:37:18 GMT
Message-ID: <20010612.14371800@j2.us.greatbridge.com>
Subject: Re: AW: AW: [HACKERS] Postgres Replication
To: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
cc: pgsql-hackers@postgresql.org
Reply-To: Darren Johnson <djohnson@greatbridge.com>
	<11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
X-Priority: 3 (Normal)
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CEd6E85860
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> Imho an implementation that opens a separate client connection to the
> replication target is only suited for async replication, and for that a WAL
> based solution would probably impose less overhead.


Yes, there is significant overhead in opening a connection to a
client, so Postgres-R creates a pool of backends at start-up;
coupled with the group communication system (Ensemble), that
significantly reduces this issue.
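
A rough Python sketch of the pooling idea, under the assumption that
"backend" simply means an expensive-to-open connection.  This is not how
Postgres-R is implemented; BackendPool and fake_connect are made-up names.
The connection cost is paid once at start-up and the same backends are
reused afterwards.

```python
import queue


class BackendPool:
    """Open a fixed number of backends up front and hand them out on demand."""

    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # connection cost is paid here, once

    def acquire(self):
        # Blocks if every backend is currently in use.
        return self._idle.get()

    def release(self, backend):
        self._idle.put(backend)


opened = []


def fake_connect():
    """Stand-in for an expensive connection setup; records each call."""
    opened.append(object())
    return opened[-1]


pool = BackendPool(3, fake_connect)
for _ in range(10):
    backend = pool.acquire()
    pool.release(backend)
```

Ten units of work are served here while only three connections are ever
opened.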


Very good points,

Darren


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

From pgsql-hackers-owner+M9982@postgresql.org Tue Jun 12 19:04:06 2001
Return-path: <pgsql-hackers-owner+M9982@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CN46E10043
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 19:04:06 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CN4AE62160;
	Tue, 12 Jun 2001 19:04:10 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9982@postgresql.org)
Received: from spoetnik.xs4all.nl (spoetnik.xs4all.nl [194.109.249.226])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CMxaE60194
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 18:59:36 -0400 (EDT)
	(envelope-from reinoud@xs4all.nl)
Received: from KAYAK (kayak [192.168.1.20])
	by spoetnik.xs4all.nl (Postfix) with SMTP id 435353E1B
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 00:59:28 +0200 (CEST)
From: reinoud@xs4all.nl (Reinoud van Leeuwen)
To: pgsql-hackers@postgresql.org
Subject: Re: AW: AW: [HACKERS] Postgres Replication
Date: Tue, 12 Jun 2001 22:59:23 GMT
Organization: Not organized in any way
Reply-To: reinoud@xs4all.nl
Message-ID: <3b499c5b.652202125@192.168.1.10>
References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
In-Reply-To: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
X-Mailer: Forte Agent 1.5/32.451
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CMxcE60196
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

On Tue, 12 Jun 2001 15:50:09 +0200, you wrote:

>
>> Here are some disadvantages to using a "trigger based" approach:
>>
>> 1) Triggers simply transfer individual data items when they
>> are modified, they do not keep track of transactions.
>> 2) The execution of triggers within a database imposes a performance
>> overhead to that database.
>> 3) Triggers require careful management by database administrators.
>> Someone needs to keep track of all the "alarms" going off.
>> 4) The activation of triggers in a database cannot be easily
>> rolled back or undone.
>
>Yes, points 2 and 3 are a given, although point 2 buys you the functionality
>of transparent locking across all involved db servers.
>Points 1 and 4 are only the case for a trigger mechanism that does
>not use remote connection and 2-phase commit.
>
>Imho an implementation that opens a separate client connection to the
>replication target is only suited for async replication, and for that a WAL
>based solution would probably impose less overhead.

Well as I read back the thread I see 2 different approaches to
replication:

1: tightly integrated replication.
pro:
- bi-directional (or multidirectional): updates are possible
everywhere
- A cluster of servers always has the same state.
- it does not matter to which server you connect
con:
- network between servers will be a bottleneck, especially if it is a
WAN connection
- only full replication possible
- what happens if one server is down? (or the network between) are
commits still possible

2: async replication
pro:
- long distance possible
- no problems with network outages
- only changes are replicated, selects do not have impact
- no locking issues across servers
- partial replication possible (many->one (datawarehouse), or one->many
(queries possible everywhere, updates only central))
- good for failover situations (backup server is standing by)
con:
- bidirectional replication hard to set up (you'll have to implement
conflict resolution according to your business rules)
- different servers are not guaranteed to be in the same state.

I can think of some scenarios where I would definitely want to
*choose* one of the options. A load-balanced web environment would
likely want the first option, but synchronizing offices in different
continents might not work with 2-phase commit over the network....

And we have not even started talking about *managing* replicated
environments. A lot of fail-over scenarios stop planning after the
backup host has taken control. But how do you get back?
--
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen       reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

From pgsql-hackers-owner+M9986@postgresql.org Tue Jun 12 19:48:48 2001
Return-path: <pgsql-hackers-owner+M9986@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CNmmE13125
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 19:48:48 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CNmqE76673;
	Tue, 12 Jun 2001 19:48:52 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9986@postgresql.org)
Received: from sss.pgh.pa.us ([192.204.191.242])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CNdQE73923
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 19:39:26 -0400 (EDT)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss.pgh.pa.us (8.11.3/8.11.3) with ESMTP id f5CNdI016442;
	Tue, 12 Jun 2001 19:39:18 -0400 (EDT)
To: reinoud@xs4all.nl
cc: pgsql-hackers@postgresql.org
Subject: Re: AW: AW: [HACKERS] Postgres Replication
In-Reply-To: <3b499c5b.652202125@192.168.1.10>
References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> <3b499c5b.652202125@192.168.1.10>
Comments: In-reply-to reinoud@xs4all.nl (Reinoud van Leeuwen)
	message dated "Tue, 12 Jun 2001 22:59:23 +0000"
Date: Tue, 12 Jun 2001 19:39:18 -0400
Message-ID: <16439.992389158@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

reinoud@xs4all.nl (Reinoud van Leeuwen) writes:
> Well as I read back the thread I see 2 different approaches to
> replication:
> ...
> I can think of some scenarios where I would definitely want to
> *choose* one of the options.

Yes.  IIRC, it looks to be possible to support a form of async
replication using the Postgres-R approach: you allow the cluster
to break apart when communications fail, and then rejoin when
your link comes back to life.  (This can work in principle, how
close it is to reality is another question; but the rejoin operation
is the same as crash recovery, so you have to have it anyway.)

So this seems to me to allow getting most of the benefits of the async
approach.  OTOH it is difficult to see how to go the other way: getting
the benefits of a synchronous solution atop a basically-async
implementation doesn't seem like it can work.
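
The "rejoin is the same as crash recovery" remark above can be sketched in
a few lines of Python.  This is a deliberately simplified, single-writer
model with invented names (Node, catch_up, write); Postgres-R itself is
multi-master and works quite differently.  The point is only that a node
which remembers how far it has applied a shared log can use one routine
both after a crash and after a network outage.

```python
class Node:
    """Hypothetical server that applies entries from a shared change log."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, log):
        """Replay every log entry this node has not applied yet."""
        for key, value in log[self.applied:]:
            self.state[key] = value
        self.applied = len(log)


log = []
primary, replica = Node(), Node()


def write(key, value, link_up):
    """Record a change; the replica only sees it while the link is up."""
    log.append((key, value))
    primary.catch_up(log)
    if link_up:
        replica.catch_up(log)


write("a", 1, link_up=True)
write("b", 2, link_up=False)  # link down: the replica falls behind
write("a", 3, link_up=False)
behind = dict(replica.state)  # what the replica knows during the outage
replica.catch_up(log)         # rejoin: the same routine as crash recovery
```

During the outage the replica behaves like an async copy that lags the
primary; after the rejoin both nodes hold the same state again.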

			regards, tom lane

From pgsql-hackers-owner+M9997@postgresql.org Wed Jun 13 09:05:56 2001
Return-path: <pgsql-hackers-owner+M9997@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5DD5tE28260
	for <pgman@candle.pha.pa.us>; Wed, 13 Jun 2001 09:05:55 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5DD5xE12437;
	Wed, 13 Jun 2001 09:05:59 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M9997@postgresql.org)
Received: from fizbanrsm.server.lan.at (zep4.it-austria.net [213.150.1.74])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5DD19E00635
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 09:01:10 -0400 (EDT)
	(envelope-from ZeugswetterA@wien.spardat.at)
Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149])
	by fizbanrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5DD13m08153
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 15:01:03 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21)
	id <M6AB97MY>; Wed, 13 Jun 2001 15:00:02 +0200
Message-ID: <11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at>
From: Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at>
To: "'reinoud@xs4all.nl'" <reinoud@xs4all.nl>, pgsql-hackers@postgresql.org
Subject: AW: AW: AW: [HACKERS] Postgres Replication
Date: Wed, 13 Jun 2001 11:55:48 +0200
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> Well as I read back the thread I see 2 different approaches to
> replication:
>
> 1: tightly integrated replication.
> pro:
> - bi-directional (or multidirectional): updates are possible everywhere
> - A cluster of servers always has the same state.
> - it does not matter to which server you connect
> con:
> - network between servers will be a bottleneck, especially if it is a
> WAN connection
> - only full replication possible

I do not understand that point, if it is trigger based, you
have all the flexibility you need. (only some tables, only some rows,
different rows to different targets ....),
(or do you mean not all targets, that could also be achieved with triggers)

> - what happens if one server is down? (or the network between) are
> commits still possible

No, updates are not possible if one target is not reachable,
that would not be synchronous and would again need business rules
to resolve conflicts.

Allowing updates when a target is not reachable would require admin
intervention.

Andreas

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M10005@postgresql.org Wed Jun 13 11:15:48 2001
Return-path: <pgsql-hackers-owner+M10005@postgresql.org>
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5DFFmE08382
	for <pgman@candle.pha.pa.us>; Wed, 13 Jun 2001 11:15:48 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5DFFoE53621;
	Wed, 13 Jun 2001 11:15:50 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M10005@postgresql.org)
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5DEk7E38930
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 10:46:07 -0400 (EDT)
	(envelope-from djohnson@greatbridge.com)
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5DEhfQ22566;
	Wed, 13 Jun 2001 10:43:41 -0400
From: Darren Johnson <djohnson@greatbridge.com>
Date: Wed, 13 Jun 2001 14:44:11 GMT
Message-ID: <20010613.14441100@j2.us.greatbridge.com>
Subject: Re: AW: AW: AW: [HACKERS] Postgres Replication
To: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
cc: "'reinoud@xs4all.nl'" <reinoud@xs4all.nl>, pgsql-hackers@postgresql.org
Reply-To: Darren Johnson <djohnson@greatbridge.com>
	<11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at>
References: <11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at>
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
X-Priority: 3 (Normal)
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5DEk8E38931
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> > - only full replication possible

> I do not understand that point, if it is trigger based, you
> have all the flexibility you need. (only some tables, only some rows,
> different rows to different targets ....),
> (or do you mean not all targets, that could also be achieved with
> triggers)

Currently with Postgres-R, it is one database replicating all tables to
all servers in the group communication system.  There are some ways
around this by invoking the -r option when a SQL statement should be
replicated, and leaving the -r option off for non-replicated scenarios.
IMHO this is not a good solution.

A better solution will need to be implemented, which involves a
subscription table(s) with relation/server information.  There are two
ideas for subscribing and receiving replicated data.

1) Receiver driven propagation - A simple solution where all
transactions are propagated and the receiving servers will reference
the subscription information before applying updates.

2) Sender driven propagation - A more optimal and complex solution
where servers do not receive any messages regarding data items for
which they have not subscribed.
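
The difference between the two propagation ideas can be shown with a short
Python sketch.  Nothing here is Postgres-R code; the subscription table
layout, the server names and the function names are assumptions made for
the example.  Each function returns the servers that receive a message and
the servers that actually apply it.

```python
# Hypothetical subscription table: server name -> set of subscribed relations.
subscriptions = {
    "server_a": {"orders", "customers"},
    "server_b": {"orders"},
}


def receiver_driven(update, subscriptions):
    """Send to every server; each receiver filters on its own subscriptions."""
    table = update["table"]
    delivered = list(subscriptions)
    applied = [s for s in delivered if table in subscriptions[s]]
    return delivered, applied


def sender_driven(update, subscriptions):
    """The sender consults the subscription table and skips non-subscribers."""
    table = update["table"]
    delivered = [s for s, tables in subscriptions.items() if table in tables]
    return delivered, list(delivered)


update = {"table": "customers", "row": 1}
broadcast = receiver_driven(update, subscriptions)
targeted = sender_driven(update, subscriptions)
```

In both cases only server_a applies the change to "customers", but in the
receiver-driven variant server_b still has to receive and discard the
message, which is the network cost that sender-driven propagation avoids.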


> > - what happens if one server is down? (or the network between) are
> > commits still possible

> No, updates are not possible if one target is not reachable,

AFAIK, Postgres-R can still replicate if one target is not reachable,
but only to the remaining servers ;).

There is a scenario that could arise if a server issues a lock
request then fails or goes off line.  There is code that checks
for this condition, which needs to be merged with the branch we have.

> that would not be synchronous and would again need business rules
> to resolve conflicts.

Yes the failed server would not be synchronized, and getting this
failed server back in sync needs to be addressed.

> Allowing updates when a target is not reachable would require admin
> intervention.

In its current state yes, but our goal would be to eliminate this
requirement as well.


Darren

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

From pgsql-hackers-owner+M18443=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 19:16:17 2002
Return-path: <pgsql-hackers-owner+M18443=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g150GGP03822
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 19:16:16 -0500 (EST)
Received: (qmail 77444 invoked by alias); 5 Feb 2002 00:16:11 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 5 Feb 2002 00:16:11 -0000
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g150Esl77040
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:14:54 -0500 (EST)
	(envelope-from markw@mohawksoft.com)
Received: from mohawksoft.com (localhost [127.0.0.1])
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g150AWh08676
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:10:33 -0500
Message-ID: <3C5F22F8.C9B958F0@mohawksoft.com>
Date: Mon, 04 Feb 2002 19:10:32 -0500
From: mlw <markw@mohawksoft.com>
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: [HACKERS] Replication
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it
works like the whole rserv project. I don't like it.

OK, what the hell do we need to do to get PostgreSQL replicating?

From pgsql-hackers-owner+M18445=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 19:57:01 2002
Return-path: <pgsql-hackers-owner+M18445=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g150v0P06518
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 19:57:00 -0500 (EST)
Received: (qmail 90440 invoked by alias); 5 Feb 2002 00:56:59 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 5 Feb 2002 00:56:59 -0000
Received: from www1.navtechinc.com ([192.234.226.140])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g150rMl89885
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:53:22 -0500 (EST)
	(envelope-from ssinger@navtechinc.com)
Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190])
	by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id AAA06047;
	Tue, 5 Feb 2002 00:53:22 GMT
Received: from localhost (ssinger@localhost)
	by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id AAA10675;
	Tue, 5 Feb 2002 00:52:43 GMT
Date: Tue, 5 Feb 2002 00:52:43 +0000 (GMT)
From: Steven <ssinger@navtechinc.com>
X-X-Sender: <ssinger@pcNavYkfAdm1.ykf.navtechinc.com>
To: mlw <markw@mohawksoft.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com>
Message-ID: <Pine.LNX.4.33.0202050040190.24027-100000@pcNavYkfAdm1.ykf.navtechinc.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

On Mon, 4 Feb 2002, mlw wrote:

I've developed a replacement for Rserv and we are planning on releasing
it as open source (i.e. as a contrib module).

Like Rserv it's trigger based, but it's much more flexible.
The key advantages it has over Rserv are:
-Support for multiple slaves
-It preserves transactions while doing the mirroring, i.e. if rows A and B
are originally added in the same transaction, they will be mirrored in the
same transaction.

We have plans to add filtering based on data (selective mirroring) as
well (i.e. only rows with COUNTRY='Canada' go to
slave A, and rows with COUNTRY='China' go to slave B).
But I'm not sure when I'll get to that.
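
As a rough illustration of the selective mirroring idea, here is a Python
sketch.  It is not taken from the module described in this message; the
filter table, the slave names and the route function are hypothetical.
Each slave has a predicate, and a row is forwarded only to the slaves
whose predicate accepts it.

```python
# Hypothetical per-slave filters, mirroring the COUNTRY example above.
filters = {
    "slave_a": lambda row: row["COUNTRY"] == "Canada",
    "slave_b": lambda row: row["COUNTRY"] == "China",
}


def route(rows, filters):
    """Return, for each slave, the rows its filter accepts."""
    routed = {slave: [] for slave in filters}
    for row in rows:
        for slave, wanted in filters.items():
            if wanted(row):
                routed[slave].append(row)
    return routed


rows = [
    {"id": 1, "COUNTRY": "Canada"},
    {"id": 2, "COUNTRY": "China"},
    {"id": 3, "COUNTRY": "Sweden"},
]
routed = route(rows, filters)
```

Row 3 matches no filter and is mirrored nowhere, so a real setup would
need to decide whether unmatched rows are an error or simply stay on the
master.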

Support for conflict resolution (if we allow edits to be made on the slaves)
would be nice.

I hope to be able to send a tarball with the source to the pgpatches list
within the next few days.

We've been using the system operationally for a number of months and have
been happy with it.

> I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it
> works like the whole rserv project. I don't like it.
> OK, what the hell do we need to do to get PostgreSQL replicating?

--
Steven Singer                                       ssinger@navtechinc.com
Aircraft Performance Systems                Phone:  519-747-1170 ext 282
Navtech Systems Support Inc.                AFTN:   CYYZXNSX SITA: YYZNSCR
Waterloo, Ontario                           ARINC:  YKFNSCR


From pgsql-hackers-owner+M18447=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 20:06:57 2002
Return-path: <pgsql-hackers-owner+M18447=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g1516vP07508
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 20:06:57 -0500 (EST)
Received: (qmail 92753 invoked by alias); 5 Feb 2002 01:06:55 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 5 Feb 2002 01:06:55 -0000
Received: from inflicted.crimelabs.net (crimelabs.net [66.92.101.112])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g150vhl91978
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:57:44 -0500 (EST)
	(envelope-from bpalmer@crimelabs.net)
Received: from mizer.crimelabs.net (mizer.crimelabs.net [192.168.88.10])
	by inflicted.crimelabs.net (Postfix) with ESMTP
	id 9D6EE8779; Mon,  4 Feb 2002 19:57:46 -0500 (EST)
Date: Mon, 4 Feb 2002 19:57:34 -0500 (EST)
From: bpalmer <bpalmer@crimelabs.net>
To: mlw <markw@mohawksoft.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com>
Message-ID: <Pine.BSO.4.43.0202041955420.17121-100000@mizer.crimelabs.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

>
> OK, what the hell do we need to do to get PostgreSQL replicating?

I hope you understand that replication, done right, is a massive
project.  I know that Darren and I (and the rest of the pg-repl
folks) have been waiting until 7.2 went gold before doing any more work.
I think we hope to have master/slave replication working for 7.3 and then
target multimaster for 7.4.  At least that's the hope.

- Brandon

----------------------------------------------------------------------------
 c: 646-456-5455                                            h: 201-798-4983
 b. palmer,  bpalmer@crimelabs.net           pgp:crimelabs.net/bpalmer.pgp5



From pgsql-hackers-owner+M18449=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 21:16:56 2002
Return-path: <pgsql-hackers-owner+M18449=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g152GtP10503
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 21:16:55 -0500 (EST)
Received: (qmail 6711 invoked by alias); 5 Feb 2002 02:16:53 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 5 Feb 2002 02:16:53 -0000
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g151qSl99469
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 20:52:28 -0500 (EST)
	(envelope-from markw@mohawksoft.com)
Received: from mohawksoft.com (localhost [127.0.0.1])
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g151lph09147;
	Mon, 4 Feb 2002 20:47:51 -0500
Message-ID: <3C5F39C7.970F4549@mohawksoft.com>
Date: Mon, 04 Feb 2002 20:47:51 -0500
From: mlw <markw@mohawksoft.com>
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Steven <ssinger@navtechinc.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
References: <Pine.LNX.4.33.0202050040190.24027-100000@pcNavYkfAdm1.ykf.navtechinc.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

Steven wrote:
>
> On Mon, 4 Feb 2002, mlw wrote:
>
> I've developed a replacement for Rserv and we are planning on releasing
> it as open source (i.e. as a contrib module).
>
> Like Rserv it's trigger based, but it's much more flexible.
> The key advantages it has over Rserv are:
> - Support for multiple slaves
> - It preserves transactions while doing the mirroring, i.e. if rows A and B
> are originally added in the same transaction, they will be mirrored in the
> same transaction.

I did a similar thing. I took the rserv trigger "as is," but rewrote the
replication support code. What I eventually did was write a "snapshot daemon"
which created snapshot files. Then a "slave daemon" which would check the last
snapshot applied and apply all the snapshots, in order, as needed. One would
run one of these daemons per slave server.
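
The slave-daemon loop described above could look roughly like the
following. This is only a minimal sketch, assuming snapshots carry
increasing sequence numbers; the function names and the apply callback
are hypothetical and are not taken from the actual daemon.

```python
def pending_snapshots(available, last_applied):
    """Return snapshot sequence numbers newer than last_applied, oldest first."""
    return sorted(seq for seq in available if seq > last_applied)


def catch_up(available, last_applied, apply):
    """Apply every outstanding snapshot in order; return the new high-water mark."""
    for seq in pending_snapshots(available, last_applied):
        apply(seq)           # hypothetical callback that loads one snapshot file
        last_applied = seq   # advance only after the snapshot has been applied
    return last_applied
```

Recording the high-water mark only after a snapshot has been applied
means a slave that stops part-way simply resumes from the last snapshot
it finished.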

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

From pgsql-hackers-owner+M18448=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 20:57:25 2002
Return-path: <pgsql-hackers-owner+M18448=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g151vOP09239
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 20:57:24 -0500 (EST)
Received: (qmail 99828 invoked by alias); 5 Feb 2002 01:57:19 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 5 Feb 2002 01:57:19 -0000
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g151s0l99529
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 20:54:00 -0500 (EST)
	(envelope-from markw@mohawksoft.com)
Received: from mohawksoft.com (localhost [127.0.0.1])
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g151nah09156;
	Mon, 4 Feb 2002 20:49:37 -0500
Message-ID: <3C5F3A30.A4C46FB8@mohawksoft.com>
Date: Mon, 04 Feb 2002 20:49:36 -0500
From: mlw <markw@mohawksoft.com>
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: bpalmer <bpalmer@crimelabs.net>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
References: <Pine.BSO.4.43.0202041955420.17121-100000@mizer.crimelabs.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

bpalmer wrote:
>
> >
> > OK, what the hell do we need to do to get PostgreSQL replicating?
>
> I hope you understand that replication, done right, is a massive
> project.  I know that Darren and I (and the rest of the pg-repl
> folks) have been waiting until 7.2 went gold before doing any more work.
> I think we hope to have master/slave replication working for 7.3 and then
> target multimaster for 7.4.  At least that's the hope.

I do know how hard replication is. I also understand how important it is.

If you guys have a project going, and need developers, I am more than willing.


From pgsql-hackers-owner+M18450=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 21:42:13 2002
Return-path: <pgsql-hackers-owner+M18450=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g152gCP11957
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 21:42:13 -0500 (EST)
Received: (qmail 14229 invoked by alias); 5 Feb 2002 02:42:09 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 5 Feb 2002 02:42:09 -0000
Received: from www1.navtechinc.com ([192.234.226.140])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g152SBl10682
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 21:28:11 -0500 (EST)
	(envelope-from ssinger@navtechinc.com)
Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190])
	by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id CAA06384;
	Tue, 5 Feb 2002 02:28:13 GMT
Received: from localhost (ssinger@localhost)
	by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id CAA10682;
	Tue, 5 Feb 2002 02:27:35 GMT
Date: Tue, 5 Feb 2002 02:27:35 +0000 (GMT)
From: Steven <ssinger@navtechinc.com>
X-X-Sender: <ssinger@pcNavYkfAdm1.ykf.navtechinc.com>
To: mlw <markw@mohawksoft.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <3C5F39C7.970F4549@mohawksoft.com>
Message-ID: <Pine.LNX.4.33.0202050159591.26756-100000@pcNavYkfAdm1.ykf.navtechinc.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


DBMirror doesn't use snapshots; instead it records a log of the transactions
that are committed to the database in a pair of tables.
In the case of an INSERT, this is the row that is being added.
In the case of a DELETE, it is the primary key of the row being deleted.

And in the case of an UPDATE, it is the primary key before the update along
with all of the data the row should have after the update.

Then, for each slave database, a Perl script walks through the transactions
that are pending for that host and reconstructs SQL to send the row edits
to that host.  A record of the fact that transaction Y has been sent to
host X is also kept.

When a transaction has been sent to all of the hosts that are in the
system, it is deleted from the Pending tables.
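
The bookkeeping described above can be sketched as follows. This is a
minimal in-memory illustration, not DBMirror's code: the PendingLog
class, the dictionary layout of a change, and the "%s" placeholder style
are all assumptions made for the example.

```python
def to_sql(change):
    """Rebuild one parameterized statement from a logged row change."""
    op, table = change["op"], change["table"]
    if op == "INSERT":
        row = change["row"]                  # the full row being added
        cols = ", ".join(row)
        marks = ", ".join(["%s"] * len(row))
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values())
    if op not in ("UPDATE", "DELETE"):
        raise ValueError(f"unknown operation: {op}")
    key = change["key"]                      # primary key before the change
    where = " AND ".join(f"{col} = %s" for col in key)
    if op == "DELETE":
        return f"DELETE FROM {table} WHERE {where}", list(key.values())
    row = change["row"]                      # data the row should have afterwards
    sets = ", ".join(f"{col} = %s" for col in row)
    return (f"UPDATE {table} SET {sets} WHERE {where}",
            list(row.values()) + list(key.values()))


class PendingLog:
    """Hypothetical stand-in for the pair of Pending tables."""

    def __init__(self, hosts):
        self.hosts = set(hosts)
        self.pending = {}   # transaction id -> list of logged changes
        self.sent = {}      # transaction id -> hosts that already received it

    def record(self, xid, change):
        self.pending.setdefault(xid, []).append(change)
        self.sent.setdefault(xid, set())

    def statements_for(self, host):
        """Return [(xid, statements)] not yet sent to host, oldest first."""
        return [(xid, [to_sql(c) for c in self.pending[xid]])
                for xid in sorted(self.pending)
                if host not in self.sent[xid]]

    def mark_sent(self, xid, host):
        self.sent[xid].add(host)
        if self.sent[xid] >= self.hosts:     # every host has it: drop the entry
            del self.pending[xid]
            del self.sent[xid]
```

Grouping the statements by transaction id is what would let a sender
replay rows A and B inside a single transaction on the slave.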

I suspect that all of the information I'm storing in the Pending tables is
also being stored by Postgres in its log, but I haven't investigated how
the information could be extracted (or how long it is kept).  Using the
log would reduce the extra storage overhead that the replication system
imposes.

As I remember (it's been a while since I've looked at it), RServ uses OIDs
in its tables to point to the data that needs to be replicated.  We tried
a similar approach but found difficulties with doing partial updates.


On Mon, 4 Feb 2002, mlw wrote:

> I did a similar thing. I took the rserv trigger "as is," but rewrote the
> replication support code. What I eventually did was write a "snapshot daemon"
> which created snapshot files. Then a "slave daemon" which would check the last
> snapshot applied and apply all the snapshots, in order, as needed. One would
> run one of these daemons per slave server.


--
Steven Singer                                       ssinger@navtechinc.com
Aircraft Performance Systems                Phone:  519-747-1170 ext 282
Navtech Systems Support Inc.                AFTN:   CYYZXNSX SITA: YYZNSCR
Waterloo, Ontario                           ARINC:  YKFNSCR



From pgsql-hackers-owner+M18554=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 02:49:48 2002
Return-path: <pgsql-hackers-owner+M18554=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g177nlP04347
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 02:49:47 -0500 (EST)
Received: (qmail 22556 invoked by alias); 7 Feb 2002 07:49:49 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 7 Feb 2002 07:49:49 -0000
Received: from linuxworld.com.au (www.linuxworld.com.au [203.34.46.50])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g177QfE19572
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 02:26:42 -0500 (EST)
	(envelope-from swm@linuxworld.com.au)
Received: from localhost (swm@localhost)
	by linuxworld.com.au (8.11.4/8.11.4) with ESMTP id g177RiU06086;
	Thu, 7 Feb 2002 18:27:45 +1100
Date: Thu, 7 Feb 2002 18:27:44 +1100 (EST)
From: Gavin Sherry <swm@linuxworld.com.au>
To: mlw <markw@mohawksoft.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com>
Message-ID: <Pine.LNX.4.21.0202071751240.5160-100000@linuxworld.com.au>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

On Mon, 4 Feb 2002, mlw wrote:

> I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it
> works like the whole rserv project. I don't like it.
>
> OK, what the hell do we need to do to get PostgreSQL replicating?

The trigger model is not a very sophisticated one. I think I have a better
-- though more complicated -- one. This model would be able to handle
multiple masters and master->slave.

First of all, all machines in the cluster would have to be aware of all the
machines in the cluster. This would have to be stored in a new system
table.

The FE/BE protocol would need to be modified to accept parsed node trees
generated by pg_analyze_and_rewrite(). These could then be dispatched by
the executing server, inside of pg_exec_query_string(), to all other servers
in the cluster (excluding itself). Naturally, this dispatch would need to
be non-blocking.

pg_exec_query_string() would need to check the nodetags to make sure
selects and perhaps some other commands are not dispatched.

Before the executing server runs finish_xact_command(), it would check
that the query was successfully executed on all machines, and otherwise
abort. Such a system would need a few configuration options: whether or
not you abort on failed replication to slaves, the ability to replicate
only certain tables, etc.
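
The commit rule proposed above can be illustrated with a small in-memory
model. This is only a sketch under stated assumptions: Node and
replicated_write are hypothetical names, the dispatch here is sequential
rather than non-blocking, and nothing in it touches the real backend
functions named in the text.

```python
class Node:
    """Hypothetical stand-in for one server in the cluster."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.staged = []      # work done inside the open transaction
        self.committed = []   # work that has been made permanent

    def execute(self, statement):
        if not self.healthy:
            return False
        self.staged.append(statement)
        return True

    def commit(self):
        self.committed.extend(self.staged)
        self.staged.clear()

    def abort(self):
        self.staged.clear()


def replicated_write(statement, executing, peers, abort_on_failure=True):
    """Run a write everywhere; commit only if the configured rule allows it."""
    nodes = [executing] + list(peers)
    results = {node.name: node.execute(statement) for node in nodes}
    # The executing server must succeed; peers must all succeed unless the
    # (assumed) abort_on_failure option has been switched off.
    ok = results[executing.name] and (all(results.values()) or not abort_on_failure)
    for node in nodes:
        if ok and results[node.name]:
            node.commit()
        else:
            node.abort()
    return ok
```

With abort_on_failure left on, one unreachable peer aborts the write on
every machine, which is the behaviour the configuration option above is
meant to control.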

Naturally, this would slow down writes to the system (possibly a lot
depending on the performance difference between the executing machine and
the least powerful machine in the cluster), but most usages of postgresql
are read intensive, not write.

Any reason this model would not work?

Gavin


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M18558=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 08:31:00 2002
Return-path: <pgsql-hackers-owner+M18558=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17DUxP13923
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 08:30:59 -0500 (EST)
Received: (qmail 91796 invoked by alias); 7 Feb 2002 13:30:55 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 7 Feb 2002 13:30:55 -0000
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g17Cw0E87782
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 07:58:01 -0500 (EST)
	(envelope-from markw@mohawksoft.com)
Received: from mohawksoft.com (localhost [127.0.0.1])
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g17CqNt16887;
	Thu, 7 Feb 2002 07:52:24 -0500
Message-ID: <3C627887.CC9FF837@mohawksoft.com>
Date: Thu, 07 Feb 2002 07:52:23 -0500
From: mlw <markw@mohawksoft.com>
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Gavin Sherry <swm@linuxworld.com.au>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
References: <Pine.LNX.4.21.0202071751240.5160-100000@linuxworld.com.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

Gavin Sherry wrote:
> Naturally, this would slow down writes to the system (possibly a lot
> depending on the performance difference between the executing machine and
> the least powerful machine in the cluster), but most usages of postgresql
> are read intensive, not write.
>
> Any reason this model would not work?

What, then, is the purpose of replication to multiple masters?

I can think of only two reasons why you want replication: (1) redundancy,
to make sure that if one server dies, another server has the same data and
is used seamlessly; (2) increased performance over one system.

For reason (1), I submit that a server load balancer which sits on top of
PostgreSQL, and executes writes on both servers while distributing reads,
would be best. This is a HUGE project. The load balancer must know EXACTLY
how the system is configured, which includes all functions and everything.

For reason (2), your system would fail to provide the scalability that would
be needed. If writes take a long time but reads are fine, what is the
difference between it and the trigger-based replicator?

I have in the back of my mind, an idea of patching into the WAL stuff, and
using that mechanism to push changes out to the slaves.

Where one machine is still the master, but no trigger stuff, just a WAL patch.
Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
exactly, the idea hasn't completely formed yet.


From pgsql-hackers-owner+M18574=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 12:51:42 2002
Return-path: <pgsql-hackers-owner+M18574=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17HpfP16661
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 12:51:41 -0500 (EST)
Received: (qmail 62955 invoked by alias); 7 Feb 2002 17:50:42 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 7 Feb 2002 17:50:42 -0000
Received: from www1.navtechinc.com ([192.234.226.140])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g17HnTE62256
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 12:49:29 -0500 (EST)
	(envelope-from ssinger@navtechinc.com)
Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190])
	by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id RAA07908;
	Thu, 7 Feb 2002 17:49:31 GMT
Received: from localhost (ssinger@localhost)
	by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id RAA05687;
	Thu, 7 Feb 2002 17:48:52 GMT
Date: Thu, 7 Feb 2002 17:48:51 +0000 (GMT)
From: Steven Singer <ssinger@navtechinc.com>
X-X-Sender: <ssinger@pcNavYkfAdm1.ykf.navtechinc.com>
To: Gavin Sherry <swm@linuxworld.com.au>
cc: mlw <markw@mohawksoft.com>,
   PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <Pine.LNX.4.21.0202071751240.5160-100000@linuxworld.com.au>
Message-ID: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


What you describe sounds like a form of two-phase commit protocol.

If the command worked on two of the replicated databases but failed on a
third then the executing server would have to be able to undo the command
on the replicated databases as well as itself.

The problems with two-phase commit type approaches to replication are
1) Speed, as you mentioned.  Write speed isn't a concern for some
applications but it is very important in others.

and
2) All of the databases must be able to communicate with each other at
all times in order for any edits to work.   If the servers are
connected over some sort of WAN that periodically has short outages, this
is a problem.   Also, if you're using replication because you want to be
able to take down one of the databases for short periods of time without
bringing down the others, you're in trouble.


btw: I posted the alternative to Rserv that I mentioned the other day to
the pg-patches mailing list.  If anyone is interested, you should be able
to grab it from the archives.

On Thu, 7 Feb 2002, Gavin Sherry wrote:

>
> First of all, all machines in the cluster would have to be aware of all the
> machines in the cluster. This would have to be stored in a new system
> table.
>
> The FE/BE protocol would need to be modified to accept parsed node trees
> generated by pg_analyze_and_rewrite(). These could then be dispatched by
> the executing server, inside of pg_exec_query_string(), to all other servers
> in the cluster (excluding itself). Naturally, this dispatch would need to
> be non-blocking.
>
> pg_exec_query_string() would need to check the nodetags to make sure
> selects and perhaps some other commands are not dispatched.
>
> Before the executing server runs finish_xact_command(), it would check
> that the query was successfully executed on all machines, and otherwise
> abort. Such a system would need a few configuration options: whether or
> not you abort on failed replication to slaves, the ability to replicate
> only certain tables, etc.
>
> Naturally, this would slow down writes to the system (possibly a lot
> depending on the performance difference between the executing machine and
> the least powerful machine in the cluster), but most usages of postgresql
> are read intensive, not write.
>
> Any reason this model would not work?
>
> Gavin
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
Steven Singer                                       ssinger@navtechinc.com
Aircraft Performance Systems                Phone:  519-747-1170 ext 282
Navtech Systems Support Inc.                AFTN:   CYYZXNSX SITA: YYZNSCR
Waterloo, Ontario                           ARINC:  YKFNSCR


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

From pgsql-hackers-owner+M18590=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 17:50:42 2002
Return-path: <pgsql-hackers-owner+M18590=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17MoeP27121
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 17:50:40 -0500 (EST)
Received: (qmail 39930 invoked by alias); 7 Feb 2002 22:50:17 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 7 Feb 2002 22:50:17 -0000
Received: from odin.fts.net (wall.icgate.net [209.26.177.2])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g17Ma4E38041
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 17:36:04 -0500 (EST)
	(envelope-from fharvell@odin.fts.net)
Received: from odin.fts.net (fharvell@localhost)
	by odin.fts.net (8.11.6/8.11.6) with ESMTP id g17MZhR17707;
	Thu, 7 Feb 2002 17:35:43 -0500
Message-ID: <200202072235.g17MZhR17707@odin.fts.net>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
From: F Harvell <fharvell@fts.net>
To: mlw <markw@mohawksoft.com>
cc: Gavin Sherry <swm@linuxworld.com.au>,
   PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: Message from mlw
    of "Thu, 07 Feb 2002 07:52:23 EST."
    <3C627887.CC9FF837@mohawksoft.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 07 Feb 2002 17:35:43 -0500
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

I'm not that familiar with the replication issues in PostgreSQL;
however, I would be partial to replication that was based upon the
playback of the (a?) journal file.  (I believe that the WAL is a
journal file.)

By being based upon a journal file, it would be possible to accomplish
two significant items.  First, it would be possible to "restore" a
database to an exact state just before a failure.  Most commercial
databases provide the ability to do this.  Banks, etc. log the journal
files directly to tape to provide a complete transaction history such
that they can rebuild their database from any given snapshot.  (Note
that the journal file needs to be "editable" as a failure may be
"delete from x" with a missing where clause.)

This leads directly into the second advantage, the ability to have a
replicated database operating anywhere, over any connection on any
server.  Speed of writes would not be a factor.  In essence, as long
as the replicated database had a snapshot of the database and then was
provided with all journal files since the snapshot, it would be
possible to build a current database.  If the replicant got behind in
the processing, it would catch up when things slowed down.
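
As a rough illustration of the idea above, the sketch below rebuilds a
current state from a snapshot plus the journal entries written after it.
The journal format (sequence number, operation, key, value) and the
stop_before parameter are assumptions made for the example; they are not
the WAL's actual record layout.

```python
def replay(snapshot, snapshot_seq, journal, stop_before=None):
    """Roll a snapshot forward through the journal.

    snapshot      -- dict holding the state as of snapshot_seq
    journal       -- iterable of (seq, op, key, value) entries
    stop_before   -- optional sequence number to stop just ahead of, e.g.
                     the entry that recorded a mistaken mass delete
    Returns (state, last_applied_seq).
    """
    state = dict(snapshot)
    applied = snapshot_seq
    for seq, op, key, value in sorted(journal, key=lambda entry: entry[0]):
        if seq <= snapshot_seq:
            continue                      # already reflected in the snapshot
        if stop_before is not None and seq >= stop_before:
            break                         # restore to just before this entry
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {op}")
        applied = seq
    return state, applied
```

A replica that has fallen behind only needs to call replay again with
its own last applied sequence number once more journal entries arrive.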

In my opinion, the first advantage is in many ways the most important.
Replication becomes simply the restoration of the database in realtime
on a second server.  The "replication" task becomes the definition of
a protocol for distributing the journal file.  At least one major
database vendor does replication (shadowing) in exactly this manner.

Maybe I'm all wet and the journal file and journal playback already
exists.  If so, IMHO, basing replication off of this would be the
right direction.


On Thu, 07 Feb 2002 07:52:23 EST, mlw wrote:
>
> I have in the back of my mind, an idea of patching into the WAL stuff, and
> using that mechanism to push changes out to the slaves.
>
> Where one machine is still the master, but no trigger stuff, just a WAL patch.
> Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
> exactly, the idea hasn't completely formed yet.
>



From pgsql-hackers-owner+M18605=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 00:50:08 2002
Return-path: <pgsql-hackers-owner+M18605=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g185o7P27878
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 00:50:07 -0500 (EST)
Received: (qmail 17348 invoked by alias); 8 Feb 2002 05:50:03 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 8 Feb 2002 05:50:03 -0000
Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g185cTE15241
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 00:38:29 -0500 (EST)
	(envelope-from darren.johnson@cox.net)
Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net
          (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP
          id <20020208053833.YKTV6710.lakemtao03.mgt.cox.net@cox.net>
          for <pgsql-hackers@postgresql.org>;
          Fri, 8 Feb 2002 00:38:33 -0500
Message-ID: <3C636232.6060206@cox.net>
Date: Fri, 08 Feb 2002 00:29:22 -0500
From: Darren Johnson <darren.johnson@cox.net>
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001108 Netscape6/6.0
X-Accept-Language: en
MIME-Version: 1.0
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
References: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


 >
 > The problems with two stage commit type approaches to replication are

IMHO the biggest problem with two-phase commit is that it doesn't scale.
The more servers you add to the replica, the slower it goes.  Also, there's
the potential for deadlocks across server boundaries.

 >
 > 2) All of the databases must be able to communicate with each other at
 > all times in order for any edits to work.   If the servers are
 > connected over some sort of WAN that periodically has short outages this
 > is a problem.   Also if you're using replication because you want to be
 > able to take down one of the databases for short periods of time without
 > bringing down the others you're in trouble.

All true for the two-phase commit protocol.  To have multi-master
replication, you must have all systems communicating, but you can use a
multicast group communication system instead of 2PC.  Using total order
messaging, you can ensure all changes are delivered to all servers in the
replica in the same order.  This group communication system also allows
failures to be detected while other servers in the replica continue
processing.
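
One simple way to get total order delivery is a single sequencer that
stamps every change, with each replica holding back anything that
arrives early. The sketch below shows that idea only; Sequencer and
Replica are hypothetical names, and a real group communication system
may reach the same ordering guarantee by quite different means.

```python
import itertools


class Sequencer:
    """Assigns one global, gap-free sequence number to each change."""

    def __init__(self):
        self._next = itertools.count(1)

    def stamp(self, change):
        return next(self._next), change


class Replica:
    """Delivers changes strictly in sequence order, buffering early arrivals."""

    def __init__(self):
        self.expected = 1   # next sequence number this replica may deliver
        self.held = {}      # out-of-order changes waiting for their turn
        self.log = []       # changes delivered so far, in delivery order

    def receive(self, seq, change):
        self.held[seq] = change
        while self.expected in self.held:
            self.log.append(self.held.pop(self.expected))
            self.expected += 1
```

Two replicas that receive the same stamped changes in different network
orders end up with identical logs, which is the property the replication
scheme relies on.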

A few of us are working with this theory, and trying to integrate with
7.2.  There is a working model for 6.4, but it's very limited (inserts,
updates, and deletes).  We are currently hosted at

http://gborg.postgresql.org/project/pgreplication/projdisplay.php

But the site has been down the last 2 days.  I've contacted the
webmaster, but haven't seen any results yet.  If anyone knows what's going
on with gborg, I'd appreciate a status.

Darren



From pgsql-hackers-owner+M18617=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 06:20:44 2002
Return-path: <pgsql-hackers-owner+M18617=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18BKhP06132
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 06:20:43 -0500 (EST)
Received: (qmail 90815 invoked by alias); 8 Feb 2002 11:20:40 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 8 Feb 2002 11:20:40 -0000
Received: from laptop.kieser.demon.co.uk (kieser.demon.co.uk [62.49.6.72])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18B9ZE89589
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 06:09:36 -0500 (EST)
	(envelope-from brad@kieser.net)
Received: from laptop.kieser.demon.co.uk (localhost.localdomain [127.0.0.1])
	by laptop.kieser.demon.co.uk (Postfix) with SMTP
	id 598393A132; Fri,  8 Feb 2002 11:09:36 +0000 (GMT)
From: Bradley Kieser <brad@kieser.net>
Date: Fri, 08 Feb 2002 11:09:36 GMT
Message-ID: <20020208.11093600@laptop.kieser.demon.co.uk>
Subject: Re: [HACKERS] Replication
To: Darren Johnson <darren.johnson@cox.net>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
In-Reply-To: <3C636232.6060206@cox.net>
References: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com> <3C636232.6060206@cox.net>
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
X-Priority: 3 (Normal)
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g18BJoF90352
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

Darren,
Given that different replication strategies will probably be developed
for PG, do you envisage DBAs being able to select the type of replication
for their installation? I.e., replication being selectable, rather like
storage structures?

Would be a killer bit of flexibility, given how enormous the impact of
replication will be on corporate adoption of PG.

Brad


>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 2/8/02, 5:29:22 AM, Darren Johnson <darren.johnson@cox.net> wrote
regarding Re: [HACKERS] Replication:


>  >
>  > The problems with two stage commit type approches to replication are

> IMHO the biggest problem with two phased commit is it doesn't scale.
> The more servers
> you add to the replica the slower it goes.  Also there's the potential
> for dead locks across
> server boundaries.

>  >
>  > 2) All of the databases must be able to communicate with each other at
>  > all times in order for any edits to work.   If the servers are
>  > connected over some sort of WAN that periodically has short outages this
>  > is a problem.   Also if your using replication because you want to be
> able
>  > to take down one of the databases for short periods of time without
>  > bringing down the others your in trouble.

> All true for two phased commit protocol.  To have multi master
> replication, you must have all
> systems communicating, but you can use a multicast group communication
> system instead of
> 2PC.  Using total order messaging, you can ensure all changes are
> delivered to all servers in the
> replica in the same order.   This group communication system also allows
> failures to be detected
> while other servers in the replica continue processing.

> A few of us are working with this theory, and trying to integrate it
> with 7.2.  There is a working model for 6.4, but it's very limited
> (insert, update, and delete).  We are currently hosted at

> http://gborg.postgresql.org/project/pgreplication/projdisplay.php
> But the site has been down for the last 2 days.  I've contacted the
> webmaster, but haven't seen any results yet.  If anyone knows what's
> going on with gborg, I'd appreciate a status.

> Darren



---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

From pgsql-hackers-owner+M18642=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 12:40:36 2002
Return-path: <pgsql-hackers-owner+M18642=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18HeZP08450
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 12:40:35 -0500 (EST)
Received: (qmail 74089 invoked by alias); 8 Feb 2002 17:40:30 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 8 Feb 2002 17:40:30 -0000
Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18HbwE73437
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 12:37:58 -0500 (EST)
	(envelope-from darren.johnson@cox.net)
Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net
          (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP
          id <20020208173804.DKQS6710.lakemtao03.mgt.cox.net@cox.net>;
          Fri, 8 Feb 2002 12:38:04 -0500
Message-ID: <3C63FB71.206@cox.net>
Date: Fri, 08 Feb 2002 11:23:13 -0500
From: Darren Johnson <darren.johnson@cox.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01
X-Accept-Language: en
MIME-Version: 1.0
To: Bradley Kieser <brad@kieser.net>
cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replication
References: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com> <3C636232.6060206@cox.net> <20020208.11093600@laptop.kieser.demon.co.uk>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

>
> Given that different replication strategies will probably be developed
> for PG, do you envisage DBAs to be able to select the type of replication
> for their installation? I.e. Replication being selectable rather like
> storage structures?

I can't speak for other replication solutions, but we are using the
--with-replication or -r parameter when starting postmaster.  Some day I
hope there will be parameters for master/slave, partial/full and
sync/async, but it will be some time before we cross those bridges.

Darren


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

From pgsql-hackers-owner+M18658=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 14:42:40 2002
Return-path: <pgsql-hackers-owner+M18658=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18JgdP28166
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 14:42:39 -0500 (EST)
Received: (qmail 18650 invoked by alias); 8 Feb 2002 19:42:39 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 8 Feb 2002 19:42:39 -0000
Received: from enigma.trueimpact.net (enigma.trueimpact.net [209.82.45.201])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18JYBE17341
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 14:34:11 -0500 (EST)
	(envelope-from rjonasz@trueimpact.com)
Received: from nietzsche.trueimpact.net (unknown [209.82.45.200])
	by enigma.trueimpact.net (Postfix) with ESMTP id A785066B04
	for <pgsql-hackers@postgresql.org>; Fri,  8 Feb 2002 14:33:28 -0500 (EST)
Date: Fri, 8 Feb 2002 14:34:34 -0500 (EST)
From: Randall Jonasz <rjonasz@trueimpact.com>
X-X-Sender: <rjonasz@nietzsche.trueimpact.net>
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <3C627887.CC9FF837@mohawksoft.com>
Message-ID: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

I've been looking into database replication theory lately and have found
some interesting papers discussing various approaches.  (Here's
one paper that struck me as being very helpful,
http://citeseer.nj.nec.com/460405.html )  So far I favour an eager
replication system which is predicated on a read-local/write-all-available
scheme. The system should not depend on two phase commit or primary
copy algorithms.  The former leads to the whole system being as quick as
the slowest machine.  In addition, 2 phase commit involves 2n messages for
each transaction which does not scale well at all.  This idea will also
have to take into account a crashed node which did not ack a transaction.
The primary copy algorithms I've seen suffer from a single point of
failure and potential bottlenecks at the primary node.

Instead I like the master-to-master or peer-to-peer algorithm as discussed
in the above paper.  This approach accounts for network partitions and for
nodes leaving and joining a cluster, and it can commit a transaction as
soon as the communication module has determined the total order of the
said transaction, i.e. there is no need to wait for acks.  This scales
well, and research has shown it to increase the number of
transactions/second a database cluster can handle over a single node.
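
As a rough illustration of the total order idea (a minimal sketch, not
code from the paper or from Postgres-R), the Python below assumes a
single in-process Sequencer standing in for the group communication
layer.  The names Sequencer, Replica, broadcast and deliver are
hypothetical.  Each replica applies writesets strictly by sequence
number, so every copy converges on the same state without any replica
waiting for acks from the others.

```python
import itertools


class Replica:
    """One database copy; applies writesets strictly in sequence order."""

    def __init__(self, name):
        self.name = name
        self.data = {}      # toy key/value "database"
        self.applied = 0    # highest sequence number applied so far
        self.pending = {}   # writesets delivered ahead of their turn

    def deliver(self, seq, writeset):
        # Buffer the writeset, then apply everything that is now contiguous.
        self.pending[seq] = writeset
        while self.applied + 1 in self.pending:
            self.applied += 1
            self.data.update(self.pending.pop(self.applied))


class Sequencer:
    """Hypothetical stand-in for the group communication system."""

    def __init__(self):
        self._next = itertools.count(1)
        self.members = []

    def join(self, replica):
        self.members.append(replica)

    def broadcast(self, writeset):
        # Assign the global position once; the order is then fixed for all.
        seq = next(self._next)
        for replica in self.members:
            replica.deliver(seq, writeset)
        return seq


bus = Sequencer()
a, b = Replica("a"), Replica("b")
bus.join(a)
bus.join(b)
bus.broadcast({"x": 1})
bus.broadcast({"x": 2, "y": 5})
```

A real group communication system would also have to handle membership
changes and network partitions, which this sketch leaves out entirely.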

Postgres-R is another interesting approach which I think should be taken
seriously. Anyone interested can read a paper on this at
http://citeseer.nj.nec.com/330257.html

Anyways, my two cents

Randall Jonasz
Software Engineer
Click2net Inc.


On Thu, 7 Feb 2002, mlw wrote:

> Gavin Sherry wrote:
> > Naturally, this would slow down writes to the system (possibly a lot
> > depending on the performance difference between the executing machine and
> > the least powerful machine in the cluster), but most usages of postgresql
> > are read intensive, not write.
> >
> > Any reason this model would not work?
>
> What, then is the purpose of replication to multiple masters?
>
> I can think of only two reasons why you want replication. (1) Redundancy, make
> sure that if one server dies, then another server has the same data and is used
> seamlessly. (2) Increase performance over one system.
>
> In reason (1) I submit that a server load balance which sits on top of
> PostgreSQL, and executes writes on both servers while distributing reads would
> be best. This is a HUGE project. The load balancer must know EXACTLY how the
> system is configured, which includes all functions and everything.
>
> In reason (2) your system would fail to provide the scalability that would be
> needed. If writes take a long time, but reads are fine, what is the difference
> between this and the trigger based replicator?
>
> I have in the back of my mind, an idea of patching into the WAL stuff, and
> using that mechanism to push changes out to the slaves.
>
> Where one machine is still the master, but no trigger stuff, just a WAL patch.
> Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
> exactly, the idea hasn't completely formed yet.


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

From pgsql-hackers-owner+M18660=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 15:20:32 2002
Return-path: <pgsql-hackers-owner+M18660=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18KKSP03731
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 15:20:29 -0500 (EST)
Received: (qmail 28961 invoked by alias); 8 Feb 2002 20:20:27 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 8 Feb 2002 20:20:27 -0000
Received: from inflicted.crimelabs.net (crimelabs.net [66.92.101.112])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18KC7E27667
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 15:12:07 -0500 (EST)
	(envelope-from bpalmer@crimelabs.net)
Received: from mizer.crimelabs.net (mizer.crimelabs.net [192.168.88.10])
	by inflicted.crimelabs.net (Postfix) with ESMTP
	id 1066F8787; Fri,  8 Feb 2002 15:12:08 -0500 (EST)
Date: Fri, 8 Feb 2002 15:12:00 -0500 (EST)
From: bpalmer <bpalmer@crimelabs.net>
To: Randall Jonasz <rjonasz@trueimpact.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
Message-ID: <Pine.BSO.4.43.0202081510130.21860-100000@mizer.crimelabs.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

I've not looked at the first paper, but I will.

> Postgres-R is another interesting approach which I think should be taken
> seriously. Anyone interested can read a paper on this at
> http://citeseer.nj.nec.com/330257.html

I would point you to the info on gborg,  but it seems to be down at the
moment.

- Brandon

----------------------------------------------------------------------------
 c: 646-456-5455                                            h: 201-798-4983
 b. palmer,  bpalmer@crimelabs.net           pgp:crimelabs.net/bpalmer.pgp5


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

From pgsql-hackers-owner+M18666=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 17:41:03 2002
Return-path: <pgsql-hackers-owner+M18666=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18Mf2P18046
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 17:41:03 -0500 (EST)
Received: (qmail 63057 invoked by alias); 8 Feb 2002 22:41:02 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 8 Feb 2002 22:41:02 -0000
Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18MR9E60361
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 17:27:11 -0500 (EST)
	(envelope-from darren.johnson@cox.net)
Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net
          (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP
          id <20020208222634.GTRG6710.lakemtao03.mgt.cox.net@cox.net>;
          Fri, 8 Feb 2002 17:26:34 -0500
Message-ID: <3C643F0F.70303@cox.net>
Date: Fri, 08 Feb 2002 16:11:43 -0500
From: Darren Johnson <darren.johnson@cox.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01
X-Accept-Language: en
MIME-Version: 1.0
To: Randall Jonasz <rjonasz@trueimpact.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
References: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR


> I've been looking into database replication theory lately and have found
> some interesting papers discussing various approaches.  (Here's
> one paper that struck me as being very helpful,
> http://citeseer.nj.nec.com/460405.html )


Here is another one from that same group, that addresses  the WAN issues.

> http://www.cnds.jhu.edu/pub/papers/cnds-2002-1.pdf


enjoy,

Darren


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

From pgsql-hackers-owner+M18674=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 19:20:30 2002
Return-path: <pgsql-hackers-owner+M18674=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g190KTP26980
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 19:20:29 -0500 (EST)
Received: (qmail 88124 invoked by alias); 9 Feb 2002 00:20:27 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
  by www.postgresql.org with SMTP; 9 Feb 2002 00:20:27 -0000
Received: from localhost.localdomain (bgp01077650bgs.wanarb01.mi.comcast.net [68.40.135.112])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g190H3E87489
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 19:17:03 -0500 (EST)
	(envelope-from camber@ais.org)
Received: from localhost (camber@localhost)
	by localhost.localdomain (8.11.6/8.11.6) with ESMTP id g190H0P18427;
	Fri, 8 Feb 2002 19:17:00 -0500
X-Authentication-Warning: localhost.localdomain: camber owned process doing -bs
Date: Fri, 8 Feb 2002 19:17:00 -0500 (EST)
From: Brian Bruns <camber@ais.org>
X-X-Sender: <camber@localhost.localdomain>
To: Randall Jonasz <rjonasz@trueimpact.com>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replication
In-Reply-To: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
Message-ID: <Pine.LNX.4.33.0202081904190.18420-100000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

> > I have in the back of my mind, an idea of patching into the WAL stuff, and
> > using that mechanism to push changes out to the slaves.
> >
> > Where one machine is still the master, but no trigger stuff, just a WAL patch.
> > Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
> > exactly, the idea hasn't completely formed yet.
> >

FWIW, Sybase Replication Server does just such a thing.

They have a secondary log marker (which prevents the log from being
truncated past the oldest unreplicated transaction).  A thread within the
system called the "rep agent" (it used to be a separate process called
the LTM) reads the log and forwards it to the rep server.  Once the rep
server has the whole transaction and it is written to a stable device
(aka synced to disk), the rep server responds to the LTM, telling it that
it's OK to move the log marker forward.
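
The log marker handshake described above could be sketched roughly as
follows.  This is only an illustration in Python, not Sybase's actual
interface: TransactionLog, RepServer, rep_agent_pass and
replicated_upto are hypothetical names, the "stable device" is just a
list, and transaction boundaries are ignored for brevity.

```python
class TransactionLog:
    """Toy log with a secondary marker that limits truncation."""

    def __init__(self):
        self.records = []          # (lsn, txn_id, payload) tuples
        self.next_lsn = 1
        self.replicated_upto = 0   # secondary marker: highest acked LSN

    def append(self, txn_id, payload):
        lsn = self.next_lsn
        self.records.append((lsn, txn_id, payload))
        self.next_lsn += 1
        return lsn

    def truncate(self, checkpoint_lsn):
        # Never discard records the rep server has not yet acknowledged.
        limit = min(checkpoint_lsn, self.replicated_upto)
        self.records = [r for r in self.records if r[0] > limit]
        return limit


class RepServer:
    """Stores forwarded records and acknowledges the highest LSN kept."""

    def __init__(self):
        self.stable = []

    def receive(self, records):
        self.stable.extend(records)   # stand-in for write plus sync to disk
        return records[-1][0]


def rep_agent_pass(log, server):
    """Forward unreplicated records, then advance the marker on the ack."""
    unsent = [r for r in log.records if r[0] > log.replicated_upto]
    if unsent:
        log.replicated_upto = server.receive(unsent)


log = TransactionLog()
log.append(1, "insert a")
log.append(1, "insert b")
log.append(2, "update c")
kept_before = log.truncate(3)   # marker is still 0, so nothing is dropped
server = RepServer()
rep_agent_pass(log, server)     # marker moves forward only after the ack
```

The point of the design is that the marker moves only on an
acknowledgement, so a slow or unreachable rep server makes the log grow
rather than lose changes.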

Anyway, once the replication server proper has the transaction it uses a
publish/subscribe methodology to see who wants to get the update.

Bidirectional replication is done by setting up two one-way replications.
The whole thing is table based: it marks the tables as replicated or not
in the database, to save the trip to the rep server for unreplicated
tables.

Plus you can split up a database (replicate all rows where the country
is "us" to this server and all the rows with "uk" to that server).  Or,
conversely, you can roll up smaller regional databases into bigger ones;
it's very flexible.
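
A small sketch of that publish/subscribe routing, assuming subscriptions
are expressed as a table name plus a row predicate.  All names here
(SubscriptionRouter, replicated_tables, on_commit) are made up for the
example and are not Sybase's API; the subscriber "servers" are plain
Python lists.

```python
class SubscriptionRouter:
    """Routes each published row to the subscribers whose predicate matches."""

    def __init__(self):
        self.subscriptions = []   # (table, predicate, target) triples

    def subscribe(self, table, predicate, target):
        self.subscriptions.append((table, predicate, target))

    def publish(self, table, row):
        for sub_table, predicate, target in self.subscriptions:
            if sub_table == table and predicate(row):
                target.append(row)


# Tables marked as replicated in the source database; changes to any
# other table never make the trip to the router.
replicated_tables = {"customers"}


def on_commit(router, table, row):
    if table in replicated_tables:
        router.publish(table, row)


router = SubscriptionRouter()
us_server, uk_server = [], []
router.subscribe("customers", lambda r: r["country"] == "us", us_server)
router.subscribe("customers", lambda r: r["country"] == "uk", uk_server)

on_commit(router, "customers", {"id": 1, "country": "us"})
on_commit(router, "customers", {"id": 2, "country": "uk"})
on_commit(router, "audit_log", {"id": 3, "country": "us"})   # not replicated
```

Rolling regional databases up into a bigger one would be the same thing
in reverse: several sources publishing into one target.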


Cheers,

Brian


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster