From owner-pgsql-hackers@hub.org Fri Nov 13 13:24:37 1998
Received: from hub.org (majordom@hub.org [209.47.148.200])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA13457
    for ; Fri, 13 Nov 1998 13:24:35 -0500 (EST)
Received: from localhost (majordom@localhost)
    by hub.org (8.9.1/8.9.1) with SMTP id NAA02464;
    Fri, 13 Nov 1998 13:22:52 -0500 (EST)
    (envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32));
    Fri, 13 Nov 1998 13:21:14 +0000 (EST)
Received: (from majordom@localhost)
    by hub.org (8.9.1/8.9.1) id NAA02331
    for pgsql-hackers-outgoing; Fri, 13 Nov 1998 13:21:12 -0500 (EST)
    (envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
    by hub.org (8.9.1/8.9.1) with SMTP id NAA02316
    for ; Fri, 13 Nov 1998 13:21:06 -0500 (EST)
    (envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
    for pgsql-hackers@postgreSQL.org
    id m0zeOEf-000EBPC; Fri, 13 Nov 98 19:46 MET
Message-Id:
From: jwieck@debis.com (Jan Wieck)
Subject: [HACKERS] shmem limits and redolog
To: pgsql-hackers@postgreSQL.org (PostgreSQL HACKERS)
Date: Fri, 13 Nov 1998 19:46:20 +0100 (MET)
Reply-To: jwieck@debis.com (Jan Wieck)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr

Hi,

    I'm currently hacking around on a solution for logging all
    database operations at query level that can recover a crashed
    database from the last successful backup by redoing all the
    commands.

    Well, I wanted it to be as flexible as possible. So I decided
    to make it configurable per database. One can say which
    databases are logged and, for each logged database, whether it
    is logged sync or async (in sync mode, every COMMIT forces an
    fsync of the current logfile and controlfiles).
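
    The difference between the two modes can be sketched as follows.
    This is only an illustration in Python, not the backend code: the
    names LOG_MODE, append_log and commit, the database names and the
    one-logfile-per-database layout are all assumptions made for the
    example, and the controlfiles are left out.

```python
import os

# Hypothetical per-database settings: None (not logged), "sync" or "async".
LOG_MODE = {"sales": "sync", "scratch": "async", "test": None}

def append_log(path, record, sync):
    """Append one committed query to the logfile; fsync only in sync mode."""
    with open(path, "ab") as f:
        f.write(record.encode() + b"\n")
        f.flush()                  # hand the bytes to the operating system
        if sync:
            os.fsync(f.fileno())   # sync mode: force them to disk at COMMIT

def commit(logdir, dbname, query):
    """Log a query at COMMIT; returns False if the database is not logged."""
    mode = LOG_MODE.get(dbname)
    if mode is None:
        return False
    path = os.path.join(logdir, dbname + ".log")
    append_log(path, query, sync=(mode == "sync"))
    return True
```

    In async mode the write may still sit in the OS cache when COMMIT
    returns, which is what makes it cheaper and less safe than sync.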

    To make async mode as fast as possible, I'm using a shared
    memory segment of 32K per database (not per backend) that is
    used as a wrap-around buffer in which the backends place their
    query information. So the log writer can fall a little behind
    if there are many backends doing different things that don't
    lock each other.

    Now I'm a little in doubt because of the shared memory limits
    reported. Was it a good decision to use shared memory? Am I
    better off using sockets?

    The bad thing in what I have up to now (it's far from
    complete) is that even if a database isn't currently logged, a
    redolog writer is started and creates the 32K shmem segment
    (plus a semaphore set with 5 semaphores). This is because I
    plan to create commands like

        ALTER DATABASE LOG MODE=ASYNC LOGDIR='/somewhere/dbname';

    and the like that can be used at runtime (while more than one
    backend is connected to the database) to turn logging on/off,
    switch to/from backup mode (all other activity is stopped),
    etc.

    So every 32 databases will require another megabyte of shared
    memory. The logging master controls which databases have
    activity and kills redolog writers after some time of
    inactivity, and the shmem is freed then. But it can hurt if
    someone really has many many databases that are all used at
    the same time.

    What do the others say?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

From owner-pgsql-hackers@hub.org Wed Dec 16 15:46:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA00521
    for ; Wed, 16 Dec 1998 15:46:40 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100])
    by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id PAA08772
    for ; Wed, 16 Dec 1998 15:10:01 -0500 (EST)
Received: from localhost (majordom@localhost)
    by hub.org (8.9.1/8.9.1) with SMTP id PAA01254;
    Wed, 16 Dec 1998 15:06:56 -0500 (EST)
    (envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32));
    Wed, 16 Dec 1998 14:58:11 +0000 (EST)
Received: (from majordom@localhost)
    by hub.org (8.9.1/8.9.1) id OAA00660
    for pgsql-hackers-outgoing; Wed, 16 Dec 1998 14:58:10 -0500 (EST)
    (envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
    by hub.org (8.9.1/8.9.1) with SMTP id OAA00643
    for ; Wed, 16 Dec 1998 14:58:05 -0500 (EST)
    (envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
    for pgsql-hackers@postgreSQL.org
    id m0zqNDo-000EBTC; Wed, 16 Dec 98 21:07 MET
Message-Id:
From: jwieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] redolog - for discussion
To: vadim@krs.ru (Vadim Mikheev)
Date: Wed, 16 Dec 1998 21:07:00 +0100 (MET)
Cc: jwieck@debis.com, pgsql-hackers@postgreSQL.org
Reply-To: jwieck@debis.com (Jan Wieck)
In-Reply-To: <3677B71D.C67462B3@krs.ru> from "Vadim Mikheev" at Dec 16, 98 08:35:25 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

Vadim wrote:
>
> Jan Wieck wrote:
> >
> >     RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
> >
> ...
> >
> >     For the others, the backend starts the recovery program
> >     which reads the redolog files, establishes database
> >     connections as required and reruns all the commands in
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
> >     them. If a required logfile isn't found, it tells the
>       ^^^^^
>
> I foresee problems with using _commands_ logging for
> recovery/replication -:((
>
> Let's consider two concurrent updates in READ COMMITTED mode:
>
> update test set x = 2 where y = 1;
>
> and
>
> update test set x = 3 where y = 1;
>
> The result of both committed transaction will be x = 2
> if the 1st transaction updated row _after_ 2nd transaction
> and x = 3 if the 2nd transaction gets row after 1st one.
> Order of updates is not defined by order in which commands
> begun and so order in which commands should be rerun
> will be unknown...

    Yepp, the order in which commands began is absolutely not of
    interest. Locking could already delay the execution of one
    command until another one, started later, has finished and
    released the lock. It's a classic race condition.

    Thus, my plan was to log the queries just before the call to
    CommitTransactionCommand() in tcop. This has the advantage
    that queries which bail out with errors don't get into the log
    at all and need not be rerun. And I can set a static flag to
    false before starting the command, which is set to true in the
    buffer manager when a buffer is written (marked dirty), so
    filtering out queries that do no updates at all is easy.

    Unfortunately, query level logging gets hit by the current
    implementation of sequence numbers. If a query that gets
    aborted somewhere in the middle (maybe by a trigger) called
    nextval() for rows processed earlier, the sequence number
    isn't advanced at recovery time, because the query is
    suppressed entirely. And sequences aren't locked, so for
    concurrently running queries getting numbers from the same
    sequence, the results aren't reproducible.
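
    The sequence problem can be reproduced with a small model. This
    is a simplified sketch in Python, not backend code: the Sequence
    class, the run() helper and the three-query workload are made up
    for the illustration, queries run one after another, and an
    aborted query is modelled as drawing all its numbers first and
    failing afterwards.

```python
class Sequence:
    """Stand-in for a sequence: numbers handed out are never rolled back."""
    def __init__(self):
        self.last = 0

    def nextval(self):
        self.last += 1
        return self.last

def run(queries, seq, redolog=None):
    """Run (name, rows, aborts) queries; return the numbers each committed one got."""
    committed = {}
    for name, rows, aborts in queries:
        ids = [seq.nextval() for _ in range(rows)]   # consumed even on abort
        if aborts:
            continue                  # aborted query never reaches the log
        committed[name] = ids
        if redolog is not None:
            redolog.append((name, rows, False))      # logged just before commit
    return committed

workload = [("q1", 2, False), ("q2", 3, True), ("q3", 2, False)]

redolog = []
original = run(workload, Sequence(), redolog)   # normal operation
replayed = run(redolog, Sequence())             # recovery from the log
```

    Here q3 gets 6 and 7 in normal operation but 3 and 4 on replay,
    because the three numbers burned by the aborted q2 are never
    drawn again.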

    If some application selects a value resulting from a sequence
    and uses that later in another query, how could the redolog
    know that this has changed? It's a Const in the query logged,
    and all that corrupts the whole thing.

    All that is painful, and I don't yet see any solution other
    than to hook into nextval(), log the numbers generated in
    normal operation, and get the same numbers back in redo mode.

    The whole thing gets more and more complicated :-(


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
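
The nextval() hook proposed in the message above might look roughly like this. It is a sketch in Python under stated assumptions, not the actual implementation: run_logged(), redo() and the (name, rows, numbers) record layout are hypothetical, there is a single sequence, and queries run serially.

```python
import itertools

def run_logged(queries):
    """Normal operation. Each query is (name, rows, aborts)."""
    seq = itertools.count(1)          # stand-in for one sequence
    results, redolog = {}, []
    for name, rows, aborts in queries:
        # The hooked nextval() remembers every number it returns.
        drawn = [next(seq) for _ in range(rows)]
        if aborts:
            continue                  # numbers are consumed, nothing is logged
        results[name] = drawn
        redolog.append((name, rows, drawn))   # numbers travel with the query
    return results, redolog

def redo(redolog):
    """Redo mode: nextval() hands back the logged numbers, not fresh ones."""
    results = {}
    for name, rows, drawn in redolog:
        recorded = iter(drawn)
        results[name] = [next(recorded) for _ in range(rows)]
    return results

workload = [("q1", 2, False), ("q2", 3, True), ("q3", 2, False)]
original, redolog = run_logged(workload)
recovered = redo(redolog)
```

With the numbers stored next to each logged query, the aborted q2 no longer matters: q3 gets 6 and 7 in both runs. A sequence value that an application selected and later sent back as a Const would then presumably still match the recovered rows, though this sketch does not model that case.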