postgresql/doc/TODO.detail/logging
1999-09-20 15:40:12 +00:00

From owner-pgsql-hackers@hub.org Fri Nov 13 13:24:37 1998
Received: from hub.org (majordom@hub.org [209.47.148.200])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA13457
for <maillist@candle.pha.pa.us>; Fri, 13 Nov 1998 13:24:35 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.1/8.9.1) with SMTP id NAA02464;
Fri, 13 Nov 1998 13:22:52 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Nov 1998 13:21:14 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.1/8.9.1) id NAA02331
for pgsql-hackers-outgoing; Fri, 13 Nov 1998 13:21:12 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.1/8.9.1) with SMTP id NAA02316
for <pgsql-hackers@postgreSQL.org>; Fri, 13 Nov 1998 13:21:06 -0500 (EST)
(envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
for pgsql-hackers@postgreSQL.org
id m0zeOEf-000EBPC; Fri, 13 Nov 98 19:46 MET
Message-Id: <m0zeOEf-000EBPC@orion.SAPserv.Hamburg.dsh.de>
From: jwieck@debis.com (Jan Wieck)
Subject: [HACKERS] shmem limits and redolog
To: pgsql-hackers@postgreSQL.org (PostgreSQL HACKERS)
Date: Fri, 13 Nov 1998 19:46:20 +0100 (MET)
Reply-To: jwieck@debis.com (Jan Wieck)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr
Hi,

I'm currently hacking around on a solution for logging all
database operations at query level that can recover a crashed
database from the last successful backup by redoing all the
commands.

Well, I wanted it to be as flexible as possible, so I decided
to make it configurable per database. One can specify which
databases are logged and, for each logged database, whether
it is logged sync or async (in sync mode, every COMMIT forces
an fsync of the current logfile and control files).

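A minimal C sketch of the per-database sync/async choice, assuming
POSIX write()/fsync(). The names LogMode, DbLogConfig and
redolog_append() are hypothetical illustrations, not code from the
patch, and only the logfile is fsync()ed here (the control files
mentioned above are left out):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-database logging configuration. */
typedef enum { LOG_OFF, LOG_ASYNC, LOG_SYNC } LogMode;

typedef struct {
    const char *dbname;
    LogMode     mode;
    int         logfd;      /* open redolog file of this database */
} DbLogConfig;

/*
 * Append one committed query record to the database's redolog.
 * Returns 0 on success, -1 on I/O error (errno is set).
 * In LOG_SYNC mode the file is fsync()ed before returning, so the
 * COMMIT does not complete until the record has been forced out.
 */
int redolog_append(const DbLogConfig *cfg, const char *rec, size_t len)
{
    assert(cfg != NULL && (rec != NULL || len == 0));

    if (cfg->mode == LOG_OFF)
        return 0;                       /* database is not logged */

    while (len > 0) {                   /* cope with short writes */
        ssize_t n = write(cfg->logfd, rec, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: just retry */
            return -1;
        }
        rec += n;
        len -= (size_t) n;
    }

    if (cfg->mode == LOG_SYNC && fsync(cfg->logfd) != 0)
        return -1;                      /* durability not guaranteed */
    return 0;
}
```

In async mode the record would normally reach the file through the
redolog writer rather than the committing backend, which is what
makes that mode cheaper.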
To make async mode as fast as possible, I'm using a 32K
shared memory segment per database (not per backend) as a
wrap-around buffer into which the backends place their query
information. That way the log writer can fall a little behind
when many backends are doing different things that don't
lock each other.

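The wrap-around buffer could look roughly like the sketch below. It
assumes length-prefixed records and keeps everything in one process
without locking; in the design described above the RedoRing struct
(a hypothetical name) would live in the 32K shared memory segment
and the semaphores would guard head and tail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE (32 * 1024)   /* 32K per database, as in the mail */

typedef struct {
    size_t head;                /* total bytes ever written by backends */
    size_t tail;                /* total bytes ever consumed by the writer */
    unsigned char data[RING_SIZE];
} RedoRing;

/* Copy len bytes into the ring at logical position pos, wrapping around. */
static void ring_copy_in(RedoRing *r, size_t pos, const void *src, size_t len)
{
    size_t off = pos % RING_SIZE;
    size_t first = (RING_SIZE - off < len) ? RING_SIZE - off : len;
    memcpy(r->data + off, src, first);
    if (len > first)
        memcpy(r->data, (const unsigned char *) src + first, len - first);
}

/* Copy len bytes out of the ring at logical position pos, wrapping around. */
static void ring_copy_out(const RedoRing *r, size_t pos, void *dst, size_t len)
{
    size_t off = pos % RING_SIZE;
    size_t first = (RING_SIZE - off < len) ? RING_SIZE - off : len;
    memcpy(dst, r->data + off, first);
    if (len > first)
        memcpy((unsigned char *) dst + first, r->data, len - first);
}

/* Backend side: 0 on success, -1 if the writer has fallen too far behind. */
int ring_put(RedoRing *r, const void *rec, uint32_t len)
{
    size_t need = sizeof len + len;
    if (need > RING_SIZE - (r->head - r->tail))
        return -1;              /* full: a real backend would wait here */
    ring_copy_in(r, r->head, &len, sizeof len);
    ring_copy_in(r, r->head + sizeof len, rec, len);
    r->head += need;
    return 0;
}

/* Log writer side: record length, -1 if empty, -2 if buf is too small. */
long ring_get(RedoRing *r, void *buf, size_t bufsize)
{
    uint32_t len;
    assert(r->head - r->tail <= RING_SIZE);
    if (r->head == r->tail)
        return -1;
    ring_copy_out(r, r->tail, &len, sizeof len);
    if (len > bufsize)
        return -2;
    ring_copy_out(r, r->tail + sizeof len, buf, len);
    r->tail += sizeof len + len;
    return (long) len;
}
```

Because head and tail only ever grow, head - tail is the number of
unread bytes even across the wrap, and a backend only has to block
when the writer has fallen more than 32K behind.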
Now I'm having some doubts because of the shared memory
limits that have been reported. Was it a good decision to use
shared memory? Would I be better off using sockets?

The bad thing about what I have so far (it's far from
complete) is that even if a database isn't currently logged,
a redolog writer is started and creates the 32K shmem segment
(plus a semaphore set with 5 semaphores). This is because I
plan to create commands like

    ALTER DATABASE LOG MODE=ASYNC LOGDIR='/somewhere/dbname';

and the like, which can be used at runtime (while more than
one backend is connected to the database) to turn logging on
or off, switch into or out of backup mode (in which all other
activity is stopped), etc.

So every 32 databases require another megabyte of shared
memory. The logging master keeps track of which databases
have activity and kills a redolog writer after some period of
inactivity, at which point its shmem is freed. But it can
hurt if someone really has many, many databases that are all
in use at the same time.

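The megabyte figure is the per-database segment size multiplied
out (the semaphore sets are not counted); the 200-database case is
just an illustrative number:

```latex
\[
  32 \times 32\,\mathrm{KiB} = 1024\,\mathrm{KiB} = 1\,\mathrm{MiB},
  \qquad
  200 \times 32\,\mathrm{KiB} = 6400\,\mathrm{KiB} = 6.25\,\mathrm{MiB}.
\]
```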
What do the others say?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
From owner-pgsql-hackers@hub.org Wed Dec 16 15:46:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA00521
for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:46:40 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id PAA08772 for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:10:01 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.1/8.9.1) with SMTP id PAA01254;
Wed, 16 Dec 1998 15:06:56 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 16 Dec 1998 14:58:11 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.1/8.9.1) id OAA00660
for pgsql-hackers-outgoing; Wed, 16 Dec 1998 14:58:10 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.1/8.9.1) with SMTP id OAA00643
for <pgsql-hackers@postgreSQL.org>; Wed, 16 Dec 1998 14:58:05 -0500 (EST)
(envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
for pgsql-hackers@postgreSQL.org
id m0zqNDo-000EBTC; Wed, 16 Dec 98 21:07 MET
Message-Id: <m0zqNDo-000EBTC@orion.SAPserv.Hamburg.dsh.de>
From: jwieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] redolog - for discussion
To: vadim@krs.ru (Vadim Mikheev)
Date: Wed, 16 Dec 1998 21:07:00 +0100 (MET)
Cc: jwieck@debis.com, pgsql-hackers@postgreSQL.org
Reply-To: jwieck@debis.com (Jan Wieck)
In-Reply-To: <3677B71D.C67462B3@krs.ru> from "Vadim Mikheev" at Dec 16, 98 08:35:25 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
Vadim wrote:
>
> Jan Wieck wrote:
> >
> > RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
> >
> ...
> >
> > For the others, the backend starts the recovery program
> > which reads the redolog files, establishes database
> > connections as required and reruns all the commands in
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > them. If a required logfile isn't found, it tells the
> ^^^^^
>
> I foresee problems with using _commands_ logging for
> recovery/replication -:((
>
> Let's consider two concurrent updates in READ COMMITTED mode:
>
> update test set x = 2 where y = 1;
>
> and
>
> update test set x = 3 where y = 1;
>
> The result of both committed transactions will be x = 2
> if the 1st transaction updated the row _after_ the 2nd
> transaction, and x = 3 if the 2nd transaction got the row
> after the 1st one. The order of updates is not defined by
> the order in which the commands began, so the order in which
> the commands should be rerun will be unknown...

Yep, the order in which commands began is of no interest at
all. Locking can already delay the execution of one command
until another one, started later, has finished and released
its lock. It's a classic race condition.

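A toy C model of Vadim's example makes the point concrete. It
assumes both UPDATEs are blind writes to the same row, so replaying
a log simply means "last writer wins"; LoggedUpdate and replay() are
hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* One logged "update test set x = <new_x> where y = 1". */
typedef struct {
    int xid;        /* transaction number, assigned in start order */
    int new_x;      /* value the UPDATE assigns to x */
} LoggedUpdate;

/* Replay the updates in log order against a single row; return final x. */
int replay(const LoggedUpdate *log, size_t n, int initial_x)
{
    assert(log != NULL || n == 0);
    int x = initial_x;
    for (size_t i = 0; i < n; i++)
        x = log[i].new_x;           /* last writer wins */
    return x;
}
```

If T1 (x = 2) starts first but T2 (x = 3) locks the row first, the
real result is x = 2. A log kept in start order {T1, T2} replays to
3, while a log kept in commit order {T2, T1} replays to the correct 2.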
Thus my plan was to log the queries just before the call to
CommitTransactionCommand() in tcop. This has the advantage
that queries which bail out with errors never get into the
log and need not be rerun. I can also set a static flag to
false before starting the command and have the buffer manager
set it to true whenever a buffer is written (marked dirty),
which makes it easy to filter out queries that do no updates
at all.

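A self-contained simulation of the commit-time logging and the
dirty-buffer filter. start_command(), mark_buffer_dirty() and
log_query_before_commit() are hypothetical stand-ins, not
PostgreSQL's real buffer manager or tcop functions, and the
in-memory array stands in for the redolog file:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LOG_CAPACITY  16
#define MAX_QUERY_LEN 128

/* The static flag: reset before each command, set when a buffer is dirtied. */
static bool command_dirtied_buffers = false;

/* In-memory stand-in for the redolog file. */
static char redolog[LOG_CAPACITY][MAX_QUERY_LEN];
static int  redolog_len = 0;

/* Called before a command starts executing. */
void start_command(void)
{
    command_dirtied_buffers = false;
}

/* Called by the (simulated) buffer manager when a buffer is marked dirty. */
void mark_buffer_dirty(void)
{
    command_dirtied_buffers = true;
}

/*
 * Called just before the commit. A command that raised an error never
 * reaches this point, so it is never logged. Returns 1 if logged,
 * 0 if skipped as read-only, -1 if the in-memory log is full.
 */
int log_query_before_commit(const char *query)
{
    assert(query != NULL);
    if (!command_dirtied_buffers)
        return 0;                   /* read-only: nothing to redo */
    if (redolog_len >= LOG_CAPACITY)
        return -1;                  /* a real log would flush, not fail */
    snprintf(redolog[redolog_len], sizeof redolog[0], "%s", query);
    redolog_len++;
    return 1;
}
```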
Unfortunately, query level logging gets hit by the current
implementation of sequence numbers. Suppose a query calls
nextval() for the rows it processes and then gets aborted
somewhere in the middle (maybe by a trigger). The numbers it
consumed stay consumed in normal operation, but at recovery
time the sequence isn't advanced, because the query was
suppressed from the log entirely. And sequences aren't
locked, so when concurrently running queries fetch numbers
from the same sequence, the results aren't reproducible. If
some application selects a value obtained from a sequence and
uses it later in another query, how could the redolog know
that this value has changed? It's a Const in the logged
query, and that corrupts the whole thing.

All that is painful, and so far I see no other solution than
to hook into nextval(): log the numbers generated in normal
operation and hand back the same numbers in redo mode.

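One way to hook nextval(), sketched under the assumption that each
logged query record carries the sequence values that query obtained.
Sequence, QueryRecord and seq_nextval() are hypothetical, locking is
omitted, and -1 is used as an error value, so the sketch only covers
ascending positive sequences:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VALS_PER_QUERY 64

typedef enum { MODE_NORMAL, MODE_REDO } RunMode;

/* A sequence shared by all backends (locking omitted in this sketch). */
typedef struct {
    long last_value;
} Sequence;

/* Per-query redolog record: the query plus every nextval() result it got. */
typedef struct {
    const char *query;
    long        vals[MAX_VALS_PER_QUERY];
    size_t      nvals;       /* normal mode: how many were generated */
    size_t      replay_pos;  /* redo mode: next value to hand back */
} QueryRecord;

/*
 * Normal mode: advance the sequence and remember the number in the
 * calling query's record. Redo mode: return the numbers that query
 * originally received, in the same order. Returns -1 on error.
 */
long seq_nextval(Sequence *seq, QueryRecord *rec, RunMode mode)
{
    assert(seq != NULL && rec != NULL);
    if (mode == MODE_NORMAL) {
        if (rec->nvals >= MAX_VALS_PER_QUERY)
            return -1;
        long v = ++seq->last_value;
        rec->vals[rec->nvals++] = v;
        return v;
    }
    if (rec->replay_pos >= rec->nvals)
        return -1;                      /* redo asked for more than logged */
    long v = rec->vals[rec->replay_pos++];
    if (v > seq->last_value)
        seq->last_value = v;            /* keep the sequence moving forward */
    return v;
}
```

Keying the numbers to the query rather than to the sequence keeps an
aborted query's numbers from being handed to a later query during
redo, and the replayed queries see exactly the values any later
Consts were derived from. A complete version would also have to
restore the sequence's final value, because numbers consumed by
aborted queries are never replayed.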
The whole thing gets more and more complicated :-(
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #