From pgsql-hackers-owner+M77861=pgman=candle.pha.pa.us@postgresql.org Fri Dec 23 05:19:20 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Stephen Frost cc: Martijn van Oosterhout , Jim C. Nasby , bizgres-general , pgsql-hackers@postgresql.org In-Reply-To: <20051222223625.GC6026@ns.snowman.net> References: <1135261893.2964.502.camel@localhost.localdomain> <20051222183751.GG72143@pervasive.com> <20051222201826.GH21783@svana.org> <1135289583.2964.536.camel@localhost.localdomain> <20051222223625.GC6026@ns.snowman.net> Date: Fri, 23 Dec 2005 10:18:43 +0000 Message-ID: <1135333123.2964.589.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.1 required=5 tests=[AWL=0.100] X-Spam-Score: 0.1 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 4728 On Thu, 2005-12-22 at 17:36 -0500, Stephen Frost wrote: > * Simon Riggs (simon@2ndquadrant.com) wrote: > > On Thu, 2005-12-22 at 21:18 +0100, Martijn van Oosterhout wrote: > > > Considering "WAL bypass" is code for "breaks PITR" > > > > No it isn't. All of the WAL bypass logic does *not* operate when PITR is > > active. The WAL bypass logic is aimed at Data Warehouses, which > > typically never operate in PITR mode for performance reasons, however > > the choice is yours. OK, thanks for saying all of that; you probably speak for many in raising these concerns. I'll answer each bit as we come to it. Suffice to say, your concerns are good and so are the answers: > Eh? PITR mode is bad for performance? Maybe I missed something but I > wouldn't have thought PITR would degrade regular performance all that > badly. PITR mode is *not* bad for performance. On a very heavily loaded write-intensive test system, the general PITR overhead on regular performance was around 1% - so almost negligible. We have been discussing a number of optimizations to specific commands that would allow them to avoid writing WAL and thus speed up their performance. If archive_command is set then WAL will always be written; if it is not set then these commands will (or could) go faster: - CREATE TABLE AS SELECT (in 8.1) - COPY LOCK (patch submitted) - COPY in same transaction as CREATE TABLE (patch submitted) - INSERT SELECT in same transaction as CREATE TABLE (this discussion) (There are a number of other conditions also, such as there must be no indexes on a table. All of which now documented with the patch) > So long as it doesn't take 15 minutes or some such to move the > WAL to somewhere else (and I'm not sure that'd even slow things down..). > For a Data Warehouse, have you got a better way of doing backups such > that you don't lose at minimum most of a day's work? Yes. Don't just use the backup facilities on their own. Think about how the architecture of your systems will work and see if there is a better way when you look at very large systems. > I'm not exactly a > big fan do doing a pg_dump every night either given that the database is > 360GB. Much nicer to take a weekly dump of the database and then do > PITR for a week or two before taking another dump of the db. e.g. Keep your reference data (low volume) in an Operational Data Store (ODS) database, protected by archiving. Keep your main fact data (high volume) in the Data Warehouse, but save the data in slices as you load it, so that a recovery is simply a reload of the database: no PITR or pg_dump required, so high performance data transformation and load work is possible. This is a commonly used architectural design pattern. > I like the idea of making COPY go faster, but please don't break my > backup system while you're at it. On a personal note, I would only add that I spent a long time working on PITR and I would never design anything that would intentionally break it (nor would patches be accepted that did that). That probably gives me the confidence to approach designs that might look like I'm doing that, but without actually straying over the edge. > I'm honestly kind of nervous about > what you mean by checking it PITR is active- how is that done, exactly? > Check if you have a script set to rotate the logs elsewhere? Or is it > checking if you're in the taking-a-full-database-backup stage? Or what? Internally, we use XLogArchivingActive(). Externally this will be set when the admin sets archive_command to a particular value. My original preference was for a parameter called archive_mode= ON | OFF which would allow us to more easily discuss this, but this does not currently exist. > What's the performance decrease when using PITR, and what's it from? Is > it just that COPY isn't as fast? Honestly, I could live with COPY being > not as fast as it could be if my backups work. :) These commands will not be optimized for speed when archive_command is set: - CREATE TABLE AS SELECT (in 8.1) - COPY LOCK (patch submitted) > Sorry for sounding concerned but, well, backups are very important and > so is performance and I'm afraid either I've not read all the > documentation about the issues being discussed here or there isn't > enough out there to make sense of it all yet. :) If you choose PITR, then you are safe. If you do not, the crash recovery of the database is not endangered by these optimizations. Hope that covers all of your concerns? I'm just writing a course that explains many of these techniques, available in the New Year. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M78004=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 20:59:03 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512290158.jBT1wEK28785@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <20051226122206.GA12934@svana.org> To: Martijn van Oosterhout Date: Wed, 28 Dec 2005 20:58:14 -0500 (EST) cc: Simon Riggs , Tom Lane , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122] X-Spam-Score: 0.122 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 3461 Having read through this thread, I would like to propose a syntax/behavior. I think we all now agree that the logging is more part of the table than the command itself. Right now we have a COPY LOCK patch, but people are going to want to control logging for INSERT INTO ... SELECT, and UPDATE, and all sorts of other things, so I think we are best adding an ALTER TABLE capability. I am thinking of this syntax: ALTER TABLE name RELIABILITY option where "option" is: DROP [ TABLE ON CRASH ] DELETE [ ROWS ON CRASH ] EXCLUSIVE SHARE Let me explain each option. DROP would drop the table on a restart after a non-clean shutdown. It would do _no_ logging on the table and allow concurrent access, plus index access. DELETE is the same as DROP, but it just truncates the table (perhaps TRUNCATE is a better word). EXCLUSIVE would allow only a single session to modify the table, and would do all changes by appending to the table, similar to COPY LOCK. EXCLUSIVE would also not allow indexes because those can not be isolated like appending to the heap. EXCLUSIVE would write all dirty shared buffers for the table and fsync them before committing. SHARE is the functionality we have now, with full logging. Does this get us any closer to a TODO item? It isn't great, but I think it is pretty clear, and I assume pg_dump would use ALTER to load each table. The advanage is that the COPY statements themselves are unchanged so they would work in loading into older versions of PostgreSQL. --------------------------------------------------------------------------- Martijn van Oosterhout wrote: -- Start of PGP signed section. > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote: > > I would not be against such a table-level switch, but the exact > > behaviour would need to be specified more closely before this became a > > TODO item, IMHO. > > Well, I think at a per table level is the only sensible level. If a > table isn't logged, neither are the indexes. After an unclean shutdown > the data could be anywhere between OK and rubbish, with no way of > finding out which way. > > > If someone has a 100 GB table, they would not appreciate the table being > > truncated if a transaction to load 1 GB of data aborts, forcing recovery > > of the 100 GB table. > > Ah, but wouldn't such a large table be partitioned in such a way that > you could have the most recent partition having the loaded data. > Personally, I think these "shared temp tables" have more applications > than meet the eye. I've had systems with cache tables which could be > wiped on boot. Though I think my preference would be to TRUNCATE rather > than DROP on unclean shutdown. > > Have a nice day, > -- > Martijn van Oosterhout http://svana.org/kleptog/ > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > > tool for doing 5% of the work and then sitting around waiting for someone > > else to do the other 95% so you can sue them. -- End of PGP section, PGP failed! -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq From pgsql-hackers-owner+M78007=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 22:06:13 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Message-ID: <43B3527A.4040709@commandprompt.com> Date: Wed, 28 Dec 2005 19:05:30 -0800 From: Joshua D. Drake Organization: Command Prompt, Inc. User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317) X-Accept-Language: en-us, en To: Bruce Momjian cc: Martijn van Oosterhout , Simon Riggs , Tom Lane , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and References: <200512290158.jBT1wEK28785@candle.pha.pa.us> In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us> X-Greylist: Sender succeded SMTP AUTH authentication, not delayed by milter-greylist-1.6 (hosting.commandprompt.com [192.168.1.101]); Wed, 28 Dec 2005 18:57:25 -0800 (PST) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.05 required=5 tests=[AWL=0.050, UPPERCASE_25_50=0] X-Spam-Score: 0.05 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 725 now agree that the logging is more part of the table than > the command itself. Right now we have a COPY LOCK patch, but people are > going to want to control logging for INSERT INTO ... SELECT, and UPDATE, > and all sorts of other things, so I think we are best adding an ALTER > TABLE capability. I am thinking of this syntax: > > ALTER TABLE name RELIABILITY option > > where "option" is: > > DROP [ TABLE ON CRASH ] > DELETE [ ROWS ON CRASH ] > EXCLUSIVE > SHARE I would say ON FAILURE (Crash just seems way to scary :)) Joshua D. Drake ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq From pgsql-hackers-owner+M78008=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 23:09:58 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512290409.jBT49LD13611@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <43B3527A.4040709@commandprompt.com> To: Joshua D. Drake Date: Wed, 28 Dec 2005 23:09:21 -0500 (EST) cc: Martijn van Oosterhout , Simon Riggs , Tom Lane , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122, UPPERCASE_25_50=0] X-Spam-Score: 0.122 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1111 Joshua D. Drake wrote: > now agree that the logging is more part of the table than > > the command itself. Right now we have a COPY LOCK patch, but people are > > going to want to control logging for INSERT INTO ... SELECT, and UPDATE, > > and all sorts of other things, so I think we are best adding an ALTER > > TABLE capability. I am thinking of this syntax: > > > > ALTER TABLE name RELIABILITY option > > > > where "option" is: > > > > DROP [ TABLE ON CRASH ] > > DELETE [ ROWS ON CRASH ] > > EXCLUSIVE > > SHARE > > I would say ON FAILURE (Crash just seems way to scary :)) Agreed, maybe ON RECOVERY. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match From simon@2ndquadrant.com Thu Dec 29 08:19:47 2005 Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Martijn van Oosterhout , Tom Lane , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> Date: Thu, 29 Dec 2005 13:19:45 +0000 Message-ID: <1135862385.2964.804.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) Content-Length: 7026 On Wed, 2005-12-28 at 20:58 -0500, Bruce Momjian wrote: > Having read through this thread, I would like to propose a > syntax/behavior. > > I think we all now agree that the logging is more part of the table than > the command itself. Right now we have a COPY LOCK patch, but people are > going to want to control logging for INSERT INTO ... SELECT, and UPDATE, > and all sorts of other things, so I think we are best adding an ALTER > TABLE capability. I am thinking of this syntax: > > ALTER TABLE name RELIABILITY option > > where "option" is: > > DROP [ TABLE ON CRASH ] > DELETE [ ROWS ON CRASH ] > EXCLUSIVE > SHARE > > Let me explain each option. DROP would drop the table on a restart > after a non-clean shutdown. It would do _no_ logging on the table and > allow concurrent access, plus index access. DELETE is the same as DROP, > but it just truncates the table (perhaps TRUNCATE is a better word). > > EXCLUSIVE would allow only a single session to modify the table, and > would do all changes by appending to the table, similar to COPY LOCK. > EXCLUSIVE would also not allow indexes because those can not be isolated > like appending to the heap. EXCLUSIVE would write all dirty shared > buffers for the table and fsync them before committing. SHARE is the > functionality we have now, with full logging. > > Does this get us any closer to a TODO item? It isn't great, but I think > it is pretty clear, and I assume pg_dump would use ALTER to load each > table. The advanage is that the COPY statements themselves are > unchanged so they would work in loading into older versions of > PostgreSQL. First off, thanks for summarising a complex thread. My view would be that this thread has been complex because everybody has expressed a somewhat different requirement, which could be broken down as: 1. The need for a multi-user-accessible yet temporary table 2. Loading data into a table immediately after it is created (i.e. in same transaction), including but not limited to a reload from pg_dump 3. How to load data quickly into an existing table (COPY) 4. How to add/modify data quickly in an existing table (INSERT SELECT, UPDATE) I can see the need for all of those individually; my existing patch submission covers (2) and (3) only. I very much like your thought to coalesce these various requirements into a single coherent model. For requirement (1), table level options make sense. We would: - CREATE TABLE ALLTHINGS - ALTER TABLE ALLTHINGS RELIABILITY DELETE ROWS ON RECOVERY - lots of SQL, all fast because not logged (2) is catered for adequately by the existing COPY patch i.e. it will detect whether a table has just been created and then avoid writing WAL. In the patch, pg_dump has *not* been altered to use COPY LOCK, so a pg_dump *will* work with any other version of PostgreSQL, which *would not* be the case if we added ALTER TABLE ... RELIABILITY statements into it. Also, a pg_dump created at an earlier version could also be loaded faster using the patch. The only requirement is to issue all SQL as part of the same transaction - which is catered for by the --single-transaction option on pg_restore and psql. So (2) is catered for fully without the need for an ALTER TABLE ... RELIABILITY statement or COPY LOCK. For requirement (3), I would use table level options like this: (the table already exists and is reasonably big; we should not assume that everybody can and does use partitioning) - ALTER TABLE RELIABILITY ALLTHINGS2 EXCLUSIVE - COPY - ALTER TABLE RELIABILITY ALLTHINGS2 SHARE For a load into an existing table I would always do all three actions together. COPY LOCK does exactly that *and* does it atomically. The two ways of doing (3) have a few pros/cons either way: Pro for ALTER TABLE: - same syntax as req (1) - doesn't need the keyword LOCK - allows INSERT SELECT, UPDATE operations also (req 4) Cons: - existing programs have to add additional statements to take advantage of this; with COPY LOCK we would add just a single keyword - operation is not atomic, which might lead to some operations waiting for a lock to operate as unlogged, since they would execute before the second ALTER TABLE gets there - operation will be understood by some, but not others. They will forget to switch the RELIABILITY back on and then lose their whole table when the database crashes. (watch...) ...but would it be a problem to have both? So, my thinking would be to separate things into two: a) Add a TODO item "shared temp tables" that caters for (1) and (4) ALTER TABLE name RELIABILITY {DELETE ROWS AT RECOVERY | FULL RECOVERY} (syntax TBD) which would - truncate all rows and remove all index entries during recovery - use shared_buffers, not temp_buffers - never write xlog records, even when in PITR mode - would avoid writing WAL for both heap *and* index tuples b) Leave the COPY patch as is, since it caters for reqs (2) and (3) as *separate* optimizations (but using a common infrastructure in code). [This work was based upon discussions on -hackers only 6 months ago, so its not like its been snuck in or anything http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php ] These two thoughts are separable. There is no need to have-both-or-neither within PostgreSQL. Eventually, I'd like all of these options, as a database designer. Best Regards, Simon Riggs > -------------------------------------------------------------------------- > > Martijn van Oosterhout wrote: > -- Start of PGP signed section. > > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote: > > > I would not be against such a table-level switch, but the exact > > > behaviour would need to be specified more closely before this became a > > > TODO item, IMHO. > > > > Well, I think at a per table level is the only sensible level. If a > > table isn't logged, neither are the indexes. After an unclean shutdown > > the data could be anywhere between OK and rubbish, with no way of > > finding out which way. > > > > > If someone has a 100 GB table, they would not appreciate the table being > > > truncated if a transaction to load 1 GB of data aborts, forcing recovery > > > of the 100 GB table. > > > > Ah, but wouldn't such a large table be partitioned in such a way that > > you could have the most recent partition having the loaded data. > > Personally, I think these "shared temp tables" have more applications > > than meet the eye. I've had systems with cache tables which could be > > wiped on boot. Though I think my preference would be to TRUNCATE rather > > than DROP on unclean shutdown. > > > > Have a nice day, > > -- > > Martijn van Oosterhout http://svana.org/kleptog/ > > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > > > tool for doing 5% of the work and then sitting around waiting for someone > > > else to do the other 95% so you can sue them. > -- End of PGP section, PGP failed! > From pgsql-hackers-owner+M78019=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 08:20:11 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Martijn van Oosterhout , Tom Lane , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> Date: Thu, 29 Dec 2005 13:19:45 +0000 Message-ID: <1135862385.2964.804.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.112 required=5 tests=[AWL=0.112] X-Spam-Score: 0.112 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 7139 On Wed, 2005-12-28 at 20:58 -0500, Bruce Momjian wrote: > Having read through this thread, I would like to propose a > syntax/behavior. > > I think we all now agree that the logging is more part of the table than > the command itself. Right now we have a COPY LOCK patch, but people are > going to want to control logging for INSERT INTO ... SELECT, and UPDATE, > and all sorts of other things, so I think we are best adding an ALTER > TABLE capability. I am thinking of this syntax: > > ALTER TABLE name RELIABILITY option > > where "option" is: > > DROP [ TABLE ON CRASH ] > DELETE [ ROWS ON CRASH ] > EXCLUSIVE > SHARE > > Let me explain each option. DROP would drop the table on a restart > after a non-clean shutdown. It would do _no_ logging on the table and > allow concurrent access, plus index access. DELETE is the same as DROP, > but it just truncates the table (perhaps TRUNCATE is a better word). > > EXCLUSIVE would allow only a single session to modify the table, and > would do all changes by appending to the table, similar to COPY LOCK. > EXCLUSIVE would also not allow indexes because those can not be isolated > like appending to the heap. EXCLUSIVE would write all dirty shared > buffers for the table and fsync them before committing. SHARE is the > functionality we have now, with full logging. > > Does this get us any closer to a TODO item? It isn't great, but I think > it is pretty clear, and I assume pg_dump would use ALTER to load each > table. The advanage is that the COPY statements themselves are > unchanged so they would work in loading into older versions of > PostgreSQL. First off, thanks for summarising a complex thread. My view would be that this thread has been complex because everybody has expressed a somewhat different requirement, which could be broken down as: 1. The need for a multi-user-accessible yet temporary table 2. Loading data into a table immediately after it is created (i.e. in same transaction), including but not limited to a reload from pg_dump 3. How to load data quickly into an existing table (COPY) 4. How to add/modify data quickly in an existing table (INSERT SELECT, UPDATE) I can see the need for all of those individually; my existing patch submission covers (2) and (3) only. I very much like your thought to coalesce these various requirements into a single coherent model. For requirement (1), table level options make sense. We would: - CREATE TABLE ALLTHINGS - ALTER TABLE ALLTHINGS RELIABILITY DELETE ROWS ON RECOVERY - lots of SQL, all fast because not logged (2) is catered for adequately by the existing COPY patch i.e. it will detect whether a table has just been created and then avoid writing WAL. In the patch, pg_dump has *not* been altered to use COPY LOCK, so a pg_dump *will* work with any other version of PostgreSQL, which *would not* be the case if we added ALTER TABLE ... RELIABILITY statements into it. Also, a pg_dump created at an earlier version could also be loaded faster using the patch. The only requirement is to issue all SQL as part of the same transaction - which is catered for by the --single-transaction option on pg_restore and psql. So (2) is catered for fully without the need for an ALTER TABLE ... RELIABILITY statement or COPY LOCK. For requirement (3), I would use table level options like this: (the table already exists and is reasonably big; we should not assume that everybody can and does use partitioning) - ALTER TABLE RELIABILITY ALLTHINGS2 EXCLUSIVE - COPY - ALTER TABLE RELIABILITY ALLTHINGS2 SHARE For a load into an existing table I would always do all three actions together. COPY LOCK does exactly that *and* does it atomically. The two ways of doing (3) have a few pros/cons either way: Pro for ALTER TABLE: - same syntax as req (1) - doesn't need the keyword LOCK - allows INSERT SELECT, UPDATE operations also (req 4) Cons: - existing programs have to add additional statements to take advantage of this; with COPY LOCK we would add just a single keyword - operation is not atomic, which might lead to some operations waiting for a lock to operate as unlogged, since they would execute before the second ALTER TABLE gets there - operation will be understood by some, but not others. They will forget to switch the RELIABILITY back on and then lose their whole table when the database crashes. (watch...) ...but would it be a problem to have both? So, my thinking would be to separate things into two: a) Add a TODO item "shared temp tables" that caters for (1) and (4) ALTER TABLE name RELIABILITY {DELETE ROWS AT RECOVERY | FULL RECOVERY} (syntax TBD) which would - truncate all rows and remove all index entries during recovery - use shared_buffers, not temp_buffers - never write xlog records, even when in PITR mode - would avoid writing WAL for both heap *and* index tuples b) Leave the COPY patch as is, since it caters for reqs (2) and (3) as *separate* optimizations (but using a common infrastructure in code). [This work was based upon discussions on -hackers only 6 months ago, so its not like its been snuck in or anything http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php ] These two thoughts are separable. There is no need to have-both-or-neither within PostgreSQL. Eventually, I'd like all of these options, as a database designer. Best Regards, Simon Riggs > -------------------------------------------------------------------------- > > Martijn van Oosterhout wrote: > -- Start of PGP signed section. > > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote: > > > I would not be against such a table-level switch, but the exact > > > behaviour would need to be specified more closely before this became a > > > TODO item, IMHO. > > > > Well, I think at a per table level is the only sensible level. If a > > table isn't logged, neither are the indexes. After an unclean shutdown > > the data could be anywhere between OK and rubbish, with no way of > > finding out which way. > > > > > If someone has a 100 GB table, they would not appreciate the table being > > > truncated if a transaction to load 1 GB of data aborts, forcing recovery > > > of the 100 GB table. > > > > Ah, but wouldn't such a large table be partitioned in such a way that > > you could have the most recent partition having the loaded data. > > Personally, I think these "shared temp tables" have more applications > > than meet the eye. I've had systems with cache tables which could be > > wiped on boot. Though I think my preference would be to TRUNCATE rather > > than DROP on unclean shutdown. > > > > Have a nice day, > > -- > > Martijn van Oosterhout http://svana.org/kleptog/ > > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > > > tool for doing 5% of the work and then sitting around waiting for someone > > > else to do the other 95% so you can sue them. > -- End of PGP section, PGP failed! > ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend From pgsql-hackers-owner+M78021=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 09:35:58 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Rod Taylor To: Simon Riggs cc: Bruce Momjian , Martijn van Oosterhout , Tom Lane , Greg Stark , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <1135862385.2964.804.camel@localhost.localdomain> Date: Thu, 29 Dec 2005 09:35:27 -0500 Message-ID: <1135866927.61038.13.camel@home> X-Mailer: Evolution 2.4.2.1 FreeBSD GNOME Team Port X-SA-Exim-Mail-From: pg@rbt.ca Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and X-SA-Exim-Version: 3.1 (built Tue Feb 24 05:09:27 GMT 2004) X-SA-Exim-Scanned: Yes X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.024 required=5 tests=[AWL=0.024, UPPERCASE_25_50=0] X-Spam-Score: 0.024 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 506 > So, my thinking would be to separate things into two: > a) Add a TODO item "shared temp tables" that caters for (1) and (4) > > ALTER TABLE name RELIABILITY > {DELETE ROWS AT RECOVERY | FULL RECOVERY} > (syntax TBD) DELETE ROWS AT RECOVERY would need to be careful or disallowed when referenced via a foreign key to ensure the database is not restored in an inconsistent state. -- ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster From pg@rbt.ca Thu Dec 29 09:35:35 2005 From: Rod Taylor To: Simon Riggs cc: Bruce Momjian , Martijn van Oosterhout , Tom Lane , Greg Stark , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <1135862385.2964.804.camel@localhost.localdomain> Date: Thu, 29 Dec 2005 09:35:27 -0500 Message-ID: <1135866927.61038.13.camel@home> X-Mailer: Evolution 2.4.2.1 FreeBSD GNOME Team Port X-SA-Exim-Mail-From: pg@rbt.ca Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on psi.look.ca X-Spam-Level: X-Spam-Status: No, hits=0.7 required=9.0 tests=UPPERCASE_25_50 autolearn=no version=2.63 X-SA-Exim-Version: 3.1 (built Tue Feb 24 05:09:27 GMT 2004) X-SA-Exim-Scanned: Yes Content-Length: 393 > So, my thinking would be to separate things into two: > a) Add a TODO item "shared temp tables" that caters for (1) and (4) > > ALTER TABLE name RELIABILITY > {DELETE ROWS AT RECOVERY | FULL RECOVERY} > (syntax TBD) DELETE ROWS AT RECOVERY would need to be careful or disallowed when referenced via a foreign key to ensure the database is not restored in an inconsistent state. -- From pgsql-hackers-owner+M78022=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 10:10:57 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Rod Taylor cc: Bruce Momjian , Martijn van Oosterhout , Tom Lane , Greg Stark , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <1135866927.61038.13.camel@home> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <1135862385.2964.804.camel@localhost.localdomain> <1135866927.61038.13.camel@home> Date: Thu, 29 Dec 2005 15:10:40 +0000 Message-ID: <1135869040.2964.824.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113] X-Spam-Score: 0.113 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 888 On Thu, 2005-12-29 at 09:35 -0500, Rod Taylor wrote: > > So, my thinking would be to separate things into two: > > a) Add a TODO item "shared temp tables" that caters for (1) and (4) > > > > ALTER TABLE name RELIABILITY > > {DELETE ROWS AT RECOVERY | FULL RECOVERY} > > (syntax TBD) > > DELETE ROWS AT RECOVERY would need to be careful or disallowed when > referenced via a foreign key to ensure the database is not restored in > an inconsistent state. I think we'd need to apply the same rule as we do for temp tables: they cannot be referenced by a permanent table. There are possibly some other restrictions also. Anyone? Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match From tgl@sss.pgh.pa.us Thu Dec 29 11:12:13 2005 To: Simon Riggs cc: Bruce Momjian , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <1135862385.2964.804.camel@localhost.localdomain> Comments: In-reply-to Simon Riggs message dated "Thu, 29 Dec 2005 13:19:45 +0000" Date: Thu, 29 Dec 2005 11:12:11 -0500 Message-ID: <7273.1135872731@sss.pgh.pa.us> From: Tom Lane Content-Length: 1963 Simon Riggs writes: > My view would be that this thread has been complex because everybody has > expressed a somewhat different requirement, which could be broken down > as: > 1. The need for a multi-user-accessible yet temporary table > 2. Loading data into a table immediately after it is created (i.e. in > same transaction), including but not limited to a reload from pg_dump > 3. How to load data quickly into an existing table (COPY) > 4. How to add/modify data quickly in an existing table (INSERT SELECT, > UPDATE) > I can see the need for all of those individually; my existing patch > submission covers (2) and (3) only. I very much like your thought to > coalesce these various requirements into a single coherent model. However, you then seem to be arguing for still using the COPY LOCK syntax, which I think Bruce intended would go away in favor of using these ALTER commands. Certainly that's what I'd prefer --- COPY has got too darn many options already. > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a > pg_dump *will* work with any other version of PostgreSQL, which *would > not* be the case if we added ALTER TABLE ... RELIABILITY statements into > it. Wrong --- the good thing about ALTER TABLE is that an old version of Postgres would simply reject it and keep going. Therefore we could get the speedup in dumps without losing compatibility, which is not true of COPY LOCK. BTW, this is a perfect example of the use-case for not abandoning a dump-file load simply because one command fails. (We have relied on this sort of reasoning many times before, too, for example by using "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) I don't think that "wrap the whole load into begin/end" is really a very workable answer, because there are far too many scenarios where you can't do that. Another one where it doesn't help is a data-only dump. regards, tom lane From pgsql-hackers-owner+M78028=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 11:12:41 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- To: Simon Riggs cc: Bruce Momjian , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <1135862385.2964.804.camel@localhost.localdomain> Comments: In-reply-to Simon Riggs message dated "Thu, 29 Dec 2005 13:19:45 +0000" Date: Thu, 29 Dec 2005 11:12:11 -0500 Message-ID: <7273.1135872731@sss.pgh.pa.us> From: Tom Lane X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.053 required=5 tests=[AWL=0.053] X-Spam-Score: 0.053 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 2075 Simon Riggs writes: > My view would be that this thread has been complex because everybody has > expressed a somewhat different requirement, which could be broken down > as: > 1. The need for a multi-user-accessible yet temporary table > 2. Loading data into a table immediately after it is created (i.e. in > same transaction), including but not limited to a reload from pg_dump > 3. How to load data quickly into an existing table (COPY) > 4. How to add/modify data quickly in an existing table (INSERT SELECT, > UPDATE) > I can see the need for all of those individually; my existing patch > submission covers (2) and (3) only. I very much like your thought to > coalesce these various requirements into a single coherent model. However, you then seem to be arguing for still using the COPY LOCK syntax, which I think Bruce intended would go away in favor of using these ALTER commands. Certainly that's what I'd prefer --- COPY has got too darn many options already. > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a > pg_dump *will* work with any other version of PostgreSQL, which *would > not* be the case if we added ALTER TABLE ... RELIABILITY statements into > it. Wrong --- the good thing about ALTER TABLE is that an old version of Postgres would simply reject it and keep going. Therefore we could get the speedup in dumps without losing compatibility, which is not true of COPY LOCK. BTW, this is a perfect example of the use-case for not abandoning a dump-file load simply because one command fails. (We have relied on this sort of reasoning many times before, too, for example by using "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) I don't think that "wrap the whole load into begin/end" is really a very workable answer, because there are far too many scenarios where you can't do that. Another one where it doesn't help is a data-only dump. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M78025=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 10:57:46 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Message-ID: <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net> Date: Thu, 29 Dec 2005 10:14:37 -0600 (CST) Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Andrew Dunstan To: In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us> References: <200512290158.jBT1wEK28785@candle.pha.pa.us> X-Priority: 3 Importance: Normal X-MSMail-Priority: Normal cc: , , , , , , X-Mailer: SquirrelMail (version 1.2.5) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.082 required=5 tests=[AWL=0.082] X-Spam-Score: 0.082 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1185 Bruce Momjian said: > DROP would drop the table on a restart > after a non-clean shutdown. It would do _no_ logging on the table and > allow concurrent access, plus index access. DELETE is the same as > DROP, but it just truncates the table (perhaps TRUNCATE is a better > word). > > EXCLUSIVE would allow only a single session to modify the table, and > would do all changes by appending to the table, similar to COPY LOCK. > EXCLUSIVE would also not allow indexes because those can not be > isolated like appending to the heap. EXCLUSIVE would write all dirty > shared buffers for the table and fsync them before committing. SHARE > is the functionality we have now, with full logging. I an horribly scared that this will be used as a "performance boost" for normal use. I would at least like to see some restrictions that make it harder to mis-use. Perhaps restrict to superuser? cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From tgl@sss.pgh.pa.us Thu Dec 29 11:24:30 2005 To: Bruce Momjian cc: Andrew Dunstan , kleptog@svana.org, simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <200512291605.jBTG5gi00396@candle.pha.pa.us> References: <200512291605.jBTG5gi00396@candle.pha.pa.us> Comments: In-reply-to Bruce Momjian message dated "Thu, 29 Dec 2005 11:05:42 -0500" Date: Thu, 29 Dec 2005 11:24:28 -0500 Message-ID: <7966.1135873468@sss.pgh.pa.us> From: Tom Lane Content-Length: 612 Bruce Momjian writes: > Andrew Dunstan wrote: >> I an horribly scared that this will be used as a "performance boost" for >> normal use. I would at least like to see some restrictions that make it >> harder to mis-use. Perhaps restrict to superuser? > Certainly restrict to table owner. I can see the argument for superuser-only: decisions about data integrity tradeoffs should be reserved to the DBA, who is the one who will get blamed if the database loses data, no matter how stupid his users are. But I'm not wedded to that. I could live with table-owner. regards, tom lane From pgsql-hackers-owner+M78031=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 11:38:17 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512291637.jBTGbdC03848@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <7273.1135872731@sss.pgh.pa.us> To: Tom Lane Date: Thu, 29 Dec 2005 11:37:39 -0500 (EST) cc: Simon Riggs , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122] X-Spam-Score: 0.122 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 3932 Tom Lane wrote: > Simon Riggs writes: > > My view would be that this thread has been complex because everybody has > > expressed a somewhat different requirement, which could be broken down > > as: > > 1. The need for a multi-user-accessible yet temporary table > > 2. Loading data into a table immediately after it is created (i.e. in > > same transaction), including but not limited to a reload from pg_dump > > 3. How to load data quickly into an existing table (COPY) > > 4. How to add/modify data quickly in an existing table (INSERT SELECT, > > UPDATE) > > > I can see the need for all of those individually; my existing patch > > submission covers (2) and (3) only. I very much like your thought to > > coalesce these various requirements into a single coherent model. > > However, you then seem to be arguing for still using the COPY LOCK > syntax, which I think Bruce intended would go away in favor of using > these ALTER commands. Certainly that's what I'd prefer --- COPY has > got too darn many options already. > > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a > > pg_dump *will* work with any other version of PostgreSQL, which *would > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into > > it. > > Wrong --- the good thing about ALTER TABLE is that an old version of > Postgres would simply reject it and keep going. Therefore we could get > the speedup in dumps without losing compatibility, which is not true > of COPY LOCK. > > BTW, this is a perfect example of the use-case for not abandoning a > dump-file load simply because one command fails. (We have relied on > this sort of reasoning many times before, too, for example by using > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) > I don't think that "wrap the whole load into begin/end" is really a very > workable answer, because there are far too many scenarios where you > can't do that. Another one where it doesn't help is a data-only dump. Yep, Tom is echoing my reaction. There is a temptation to add things up onto existing commands, e.g. LOCK, and while it works, it makes for some very complex user API's. Having COPY behave differently because it is in a transaction is fine as long as it is user-invisible, but once you require users to do that to get the speedup, it isn't user-invisible anymore. (I can see it now, "Why is pg_dump putting things in transactions?", "Because it prevents it from being logged." "Oh, should I be doing that in my code?" "Perhaps, if you want ..." You can see where that discussion is going. Having them see "ATER TABLE ... RELIBILITY TRUNCATE" is very clear, and very clear on how it can be used in user code.) I think there is great utility in giving users one API, namely RELIABILITY (or some other keyword), and telling them that is where they control logging. I realize adding one keyword, LOCK, to an existing command isn't a big deal, but once you decentralize your API enough times, you end up with a terribly complex database system. It is this design rigidity that helps make PostgreSQL so much easier to use than other database systems. I do think it is valid concern about someone use the table between the CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow the RELIABILITY as part of the CREATE TABLE, another is to tell users to create the table inside a transaction. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match From pgsql-hackers-owner+M78036=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 12:21:12 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- To: Andrew Dunstan cc: , , , , , , , Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net> In-Reply-To: <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net> From: Greg Stark Organization: The Emacs Conspiracy; member since 1992 Date: 29 Dec 2005 12:20:32 -0500 Message-ID: <87vex74y73.fsf@stark.xeocode.com> Lines: 42 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4 X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.112 required=5 tests=[AWL=0.112] X-Spam-Score: 0.112 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1983 "Andrew Dunstan" writes: > Bruce Momjian said: > > DROP would drop the table on a restart > > after a non-clean shutdown. It would do _no_ logging on the table and > > allow concurrent access, plus index access. DELETE is the same as > > DROP, but it just truncates the table (perhaps TRUNCATE is a better > > word). > > > > EXCLUSIVE would allow only a single session to modify the table, and > > would do all changes by appending to the table, similar to COPY LOCK. > > EXCLUSIVE would also not allow indexes because those can not be > > isolated like appending to the heap. EXCLUSIVE would write all dirty > > shared buffers for the table and fsync them before committing. SHARE > > is the functionality we have now, with full logging. > > I an horribly scared that this will be used as a "performance boost" for > normal use. I would at least like to see some restrictions that make it > harder to mis-use. Perhaps restrict to superuser? Well that's its whole purpose. At least you can hardly argue that you didn't realize the consequences of "DELETE ROWS ON RECOVERY"... :) Some thoughts: a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to use it instead of DELETE ROWS? b) It seems like the other feature people were talking about of not logging for a table created within the same transaction should be handled by having this flag implicitly set for any such newly created table. Ie, the test for whether to log would look like: if (!table->logged && table->xid != myxid) ... c) Every option in ALTER TABLE should be in CREATE TABLE as well. d) Yes as someone else mentioned, this should only be allowable on a table with no foreign keys referencing it. -- greg ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match From pgsql-hackers-owner+M78037=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 12:31:40 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512291730.jBTHUnn09840@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <87vex74y73.fsf@stark.xeocode.com> To: Greg Stark Date: Thu, 29 Dec 2005 12:30:49 -0500 (EST) cc: Andrew Dunstan , kleptog@svana.org, simon@2ndquadrant.com, tgl@sss.pgh.pa.us, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122] X-Spam-Score: 0.122 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 3304 Greg Stark wrote: > "Andrew Dunstan" writes: > > > Bruce Momjian said: > > > DROP would drop the table on a restart > > > after a non-clean shutdown. It would do _no_ logging on the table and > > > allow concurrent access, plus index access. DELETE is the same as > > > DROP, but it just truncates the table (perhaps TRUNCATE is a better > > > word). > > > > > > EXCLUSIVE would allow only a single session to modify the table, and > > > would do all changes by appending to the table, similar to COPY LOCK. > > > EXCLUSIVE would also not allow indexes because those can not be > > > isolated like appending to the heap. EXCLUSIVE would write all dirty > > > shared buffers for the table and fsync them before committing. SHARE > > > is the functionality we have now, with full logging. > > > > I an horribly scared that this will be used as a "performance boost" for > > normal use. I would at least like to see some restrictions that make it > > harder to mis-use. Perhaps restrict to superuser? > > Well that's its whole purpose. At least you can hardly argue that you didn't > realize the consequences of "DELETE ROWS ON RECOVERY"... :) True. I think we are worried about non-owners using it, but the owner had to grant permissions for others to modify it, so we might be OK. > Some thoughts: > > a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to > use it instead of DELETE ROWS? Good question. The use case is doing COPY into a table that already had data. EXCLUSIVE allows additions to the table but preserves the existing data on a crash. > b) It seems like the other feature people were talking about of not logging > for a table created within the same transaction should be handled by > having this flag implicitly set for any such newly created table. > Ie, the test for whether to log would look like: > > if (!table->logged && table->xid != myxid) ... Yes, the question is whether we want to limit users to having this optimization _only_ when they have created the table in the same transaction, and the short answer is we don't. > c) Every option in ALTER TABLE should be in CREATE TABLE as well. I looked into that and see that things like: ALTER [ COLUMN ] column SET STATISTICS integer ALTER [ COLUMN ] column SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } are not supported by CREATE TABLE, and probably shouldn't be because the value can be changed after the table is created. I think the only things we usually support in CREATE TABLE are those that cannot be altered. > d) Yes as someone else mentioned, this should only be allowable on a table > with no foreign keys referencing it. Right, and EXCLUSIVE can not have an index either. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From simon@2ndquadrant.com Fri Dec 30 08:10:53 2005 Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <200512291637.jBTGbdC03848@candle.pha.pa.us> References: <200512291637.jBTGbdC03848@candle.pha.pa.us> Date: Fri, 30 Dec 2005 13:09:12 +0000 Message-ID: <1135948152.2862.113.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) Content-Length: 6343 On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote: > Tom Lane wrote: > > Simon Riggs writes: > > > My view would be that this thread has been complex because everybody has > > > expressed a somewhat different requirement, which could be broken down > > > as: > > > 1. The need for a multi-user-accessible yet temporary table > > > 2. Loading data into a table immediately after it is created (i.e. in > > > same transaction), including but not limited to a reload from pg_dump > > > 3. How to load data quickly into an existing table (COPY) > > > 4. How to add/modify data quickly in an existing table (INSERT SELECT, > > > UPDATE) > > However, you then seem to be arguing for still using the COPY LOCK > > syntax, which I think Bruce intended would go away in favor of using > > these ALTER commands. Certainly that's what I'd prefer --- COPY has > > got too darn many options already. COPY LOCK was Tom's suggestion at the end of a long discussion thread on this precise issue. Nobody objected to it at that point; I implemented it *exactly* that way because I wanted to very visibly follow the consensus of the community, after informed debate. http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php Please re-read the links to previous discussions. http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php There are points there, not made by me, that still apply and need to be considered here, yet have not been. Just to restate my current thinking: - agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS - we should have COPY LOCK rather than ALTER TABLE .... RELIABILITY EXCLUSIVE (Though I welcome better wording and syntax in either case; it is the behaviour only that I discuss). It seems now that we have agreed approaches for (1), (2) and (4). Please note that I have listened to the needs of others with regard to requirement (1), as espoused by earlier by Hannu and again now by Martijn. Some of the points about requirement (3) I made in my previous post have not yet been addressed, IMHO. My mind is not fixed. AFAICS there are valid points remaining on both sides of the discussion about loading data quickly into an existing table. > I do think it is valid concern about someone use the table between the > CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow > the RELIABILITY as part of the CREATE TABLE, another is to tell users to > create the table inside a transaction. Neither solution works for this use case: > > 3. How to load data quickly into an existing table (COPY) This is the only use case for which ALTER TABLE ... EXCLUSIVE makes sense. That option means that any write lock held upon the table would be an EXCLUSIVE table lock, so would never be a performance gain with single row INSERT, UPDATE or DELETEs. Following Andrew's concerns, I'd also note that ALTER TABLE requires a much higher level of privilege to operate than does COPY. That sounds like it will make things more secure, but all it does is open up the administrative rights, since full ownership rights must be obtained merely to load data. > Having COPY behave differently because it is > in a transaction is fine as long as it is user-invisible Good > I think there is great utility in giving users one API, namely > RELIABILITY (or some other keyword), and telling them that is where they > control logging. I realize adding one keyword, LOCK, to an existing > command isn't a big deal, but once you decentralize your API enough > times, you end up with a terribly complex database system. It is this > design rigidity that helps make PostgreSQL so much easier to use than > other database systems. I do see the appeal of your suggestion... TRUNCATE is a special command to delete quickly. There is no requirement to do an ALTER TABLE statement before that command executes. Balance would suggest that a special command to load data quickly would be reasonably accepted by users. Minor points below: > > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a > > > pg_dump *will* work with any other version of PostgreSQL, which *would > > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into > > > it. > > > > Wrong --- the good thing about ALTER TABLE is that an old version of > > Postgres would simply reject it and keep going. Therefore we could get > > the speedup in dumps without losing compatibility, which is not true > > of COPY LOCK. That was pointing out one of Bruce's objections was not relevant because it assumed COPY LOCK was required to make pg_restore go faster; that was not the case - so there is no valid objection either way now. > > BTW, this is a perfect example of the use-case for not abandoning a > > dump-file load simply because one command fails. (We have relied on > > this sort of reasoning many times before, too, for example by using > > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) > > I don't think that "wrap the whole load into begin/end" is really a very > > workable answer, because there are far too many scenarios where you > > can't do that. Another one where it doesn't help is a data-only dump. Which is why --single-transaction is not the default, per the earlier discussion on that point (on -patches). > Yep, Tom is echoing my reaction. There is a temptation to add things up > onto existing commands, e.g. LOCK, and while it works, it makes for some > very complex user API's. Having COPY behave differently because it is > in a transaction is fine as long as it is user-invisible, but once you > require users to do that to get the speedup, it isn't user-invisible > anymore. > > (I can see it now, "Why is pg_dump putting things in transactions?", > "Because it prevents it from being logged." "Oh, should I be doing that > in my code?" "Perhaps, if you want ..." You can see where that > discussion is going. Having them see "ATER TABLE ... RELIBILITY > TRUNCATE" is very clear, and very clear on how it can be used in user > code.) The above case is not an argument against COPY LOCK. Exactly what you say above would still occur even when we have ALTER TABLE ... RELIABILITY statement, since COPY LOCK and COPY-optimized-within-same-transaction are different things. Best Regards, Simon Riggs From pgsql-hackers-owner+M78064=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 11:50:49 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512301649.jBUGnxn21488@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <1135948152.2862.113.camel@localhost.localdomain> To: Simon Riggs Date: Fri, 30 Dec 2005 11:49:59 -0500 (EST) cc: Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] X-Spam-Score: 0.12 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 8888 Simon Riggs wrote: > On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote: > > Tom Lane wrote: > > > Simon Riggs writes: > > > > My view would be that this thread has been complex because everybody has > > > > expressed a somewhat different requirement, which could be broken down > > > > as: > > > > 1. The need for a multi-user-accessible yet temporary table > > > > 2. Loading data into a table immediately after it is created (i.e. in > > > > same transaction), including but not limited to a reload from pg_dump > > > > 3. How to load data quickly into an existing table (COPY) > > > > 4. How to add/modify data quickly in an existing table (INSERT SELECT, > > > > UPDATE) > > > > However, you then seem to be arguing for still using the COPY LOCK > > > syntax, which I think Bruce intended would go away in favor of using > > > these ALTER commands. Certainly that's what I'd prefer --- COPY has > > > got too darn many options already. > > COPY LOCK was Tom's suggestion at the end of a long discussion thread on > this precise issue. Nobody objected to it at that point; I implemented > it *exactly* that way because I wanted to very visibly follow the > consensus of the community, after informed debate. > http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php > > Please re-read the links to previous discussions. > http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php > There are points there, not made by me, that still apply and need to be > considered here, yet have not been. Yes, I know we agreed to the COPY LOCK, but new features now being requested, so we have to re-evaluate where we are going with COPY LOCK to get a more consistent solution. > Just to restate my current thinking: > - agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS > - we should have COPY LOCK rather than > ALTER TABLE .... RELIABILITY EXCLUSIVE > (Though I welcome better wording and syntax in either case; it is the > behaviour only that I discuss). > > It seems now that we have agreed approaches for (1), (2) and (4). Please > note that I have listened to the needs of others with regard to > requirement (1), as espoused by earlier by Hannu and again now by > Martijn. Some of the points about requirement (3) I made in my previous > post have not yet been addressed, IMHO. > > My mind is not fixed. AFAICS there are valid points remaining on both > sides of the discussion about loading data quickly into an existing > table. > > > I do think it is valid concern about someone use the table between the > > CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow > > the RELIABILITY as part of the CREATE TABLE, another is to tell users to > > create the table inside a transaction. > > Neither solution works for this use case: > > > > 3. How to load data quickly into an existing table (COPY) > > This is the only use case for which ALTER TABLE ... EXCLUSIVE makes > sense. That option means that any write lock held upon the table would > be an EXCLUSIVE table lock, so would never be a performance gain with > single row INSERT, UPDATE or DELETEs. Ah, but people wanted fast INSERT INTO ... SELECT, and that would use EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use EXCLUSIVE? We don't want to add "LOCK" to every command that might use EXCLUSIVE. ALTER is much better for this. I agree if we thought EXCLUSIVE would only be used for COPY, we could use LOCK, but I am thinking it will be used for other commands as well. > Following Andrew's concerns, I'd also note that ALTER TABLE requires a > much higher level of privilege to operate than does COPY. That sounds > like it will make things more secure, but all it does is open up the > administrative rights, since full ownership rights must be obtained > merely to load data. True, but as pointed out by others, I don't see that happening too often. > > Having COPY behave differently because it is > > in a transaction is fine as long as it is user-invisible > > Good > > > I think there is great utility in giving users one API, namely > > RELIABILITY (or some other keyword), and telling them that is where they > > control logging. I realize adding one keyword, LOCK, to an existing > > command isn't a big deal, but once you decentralize your API enough > > times, you end up with a terribly complex database system. It is this > > design rigidity that helps make PostgreSQL so much easier to use than > > other database systems. > > I do see the appeal of your suggestion... > > TRUNCATE is a special command to delete quickly. There is no requirement > to do an ALTER TABLE statement before that command executes. The TRUNCATE happens during recovery. There is no user interaction. It happens because we can't restore the contents of the table in a consistent state because no logging was used. Basically, a table marked RELIABILITY TRUNCATE would be truncated on a recovery start of the postmaster. > Balance would suggest that a special command to load data quickly would > be reasonably accepted by users. > > > > > Minor points below: > > > > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a > > > > pg_dump *will* work with any other version of PostgreSQL, which *would > > > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into > > > > it. > > > > > > Wrong --- the good thing about ALTER TABLE is that an old version of > > > Postgres would simply reject it and keep going. Therefore we could get > > > the speedup in dumps without losing compatibility, which is not true > > > of COPY LOCK. > > That was pointing out one of Bruce's objections was not relevant because > it assumed COPY LOCK was required to make pg_restore go faster; that was > not the case - so there is no valid objection either way now. I don't consider the single-transaction to be a no-cost solution. You are adding flags to commands, and you are using a dump layout for performance where the purpose for the layout is not clear. The ALTER is clear to the user, and it allows nologging operations to happen after the table is created. In fact, for use in pg_dump, I think DROP is the proper operation for loading, not your transaction wrapping solution. We already agree we need DROP (or TRUNCATE), so why not use that rather than the transaction wrap idea? > > > BTW, this is a perfect example of the use-case for not abandoning a > > > dump-file load simply because one command fails. (We have relied on > > > this sort of reasoning many times before, too, for example by using > > > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) > > > I don't think that "wrap the whole load into begin/end" is really a very > > > workable answer, because there are far too many scenarios where you > > > can't do that. Another one where it doesn't help is a data-only dump. > > Which is why --single-transaction is not the default, per the earlier > discussion on that point (on -patches). Right, but why not use DROP/TRUNCATE? That works for old dumps too, and has no downsides, meaning it can be always on. > > Yep, Tom is echoing my reaction. There is a temptation to add things up > > onto existing commands, e.g. LOCK, and while it works, it makes for some > > very complex user API's. Having COPY behave differently because it is > > in a transaction is fine as long as it is user-invisible, but once you > > require users to do that to get the speedup, it isn't user-invisible > > anymore. > > > > (I can see it now, "Why is pg_dump putting things in transactions?", > > "Because it prevents it from being logged." "Oh, should I be doing that > > in my code?" "Perhaps, if you want ..." You can see where that > > discussion is going. Having them see "ATER TABLE ... RELIBILITY > > TRUNCATE" is very clear, and very clear on how it can be used in user > > code.) > > The above case is not an argument against COPY LOCK. Exactly what you > say above would still occur even when we have ALTER TABLE ... > RELIABILITY statement, since COPY LOCK and > COPY-optimized-within-same-transaction are different things. See my posting above that we might want EXCLUSIVE for other commands, meaning ALTER makes more sense. So, to summarize, I think we should add DROP/TRUNCATE, and use that by default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings From pgsql-hackers-owner+M78065=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 12:40:48 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Message-ID: <43B570C9.6060406@dunslane.net> Date: Fri, 30 Dec 2005 12:39:21 -0500 From: Andrew Dunstan User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.7.12-1.3.1 X-Accept-Language: en-us, en To: Tom Lane cc: simon@2ndquadrant.com, pgman@candle.pha.pa.us, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and References: <1135948152.2862.113.camel@localhost.localdomain> <56737.68.143.134.146.1135954413.squirrel@www.dunslane.net> <11876.1135954626@sss.pgh.pa.us> In-Reply-To: <11876.1135954626@sss.pgh.pa.us> X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.041 required=5 tests=[AWL=0.041] X-Spam-Score: 0.041 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1815 Tom Lane wrote: >"Andrew Dunstan" writes: > > >>Simon Riggs said: >> >> >>>Following Andrew's concerns, I'd also note that ALTER TABLE requires a >>>much higher level of privilege to operate than does COPY. That sounds >>>like it will make things more secure, but all it does is open up the >>>administrative rights, since full ownership rights must be obtained >>>merely to load data. >>> >>> > > > >>My concern is more about making plain that this is for special operations, >>not normal operations. Or maybe I have misunderstood the purpose. >> >> > >Rephrase that as "full ownership rights must be obtained to load data in >a way that requires dropping any existing indexes and locking out other >users of the table". I don't think the use-case for this will be very >large for non-owners, or indeed even for owners except during initial >table creation; and so I don't think the above argument is strong. > > > > Those restrictions aren't true of Bruce's proposed drop and delete/truncate recovery modes, are they? People do crazy things in pursuit of performance. Illustration: a few months ago I was instrumenting an app (based on MySQL/ISAM) and I noticed that under load it simply didn't update the inventory properly - of 1000 orders placed within a few seconds it might reduce inventory by 3 or 4. I reported this and they shrugged their shoulders and said "well, we'd have to lock the table and that would slow everything down ...". I just want to be sure we aren't providing a footgun. "Oh, just set recovery mode to delete. It won't make any difference unless you crash and you'll run faster." cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings From pgsql-hackers-owner+M78066=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 12:58:52 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512301758.jBUHwFv03107@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <43B570C9.6060406@dunslane.net> To: Andrew Dunstan Date: Fri, 30 Dec 2005 12:58:15 -0500 (EST) cc: Tom Lane , simon@2ndquadrant.com, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] X-Spam-Score: 0.12 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1996 Andrew Dunstan wrote: > >>My concern is more about making plain that this is for special operations, > >>not normal operations. Or maybe I have misunderstood the purpose. > >> > >> > > > >Rephrase that as "full ownership rights must be obtained to load data in > >a way that requires dropping any existing indexes and locking out other > >users of the table". I don't think the use-case for this will be very > >large for non-owners, or indeed even for owners except during initial > >table creation; and so I don't think the above argument is strong. > > > > > > > > > > Those restrictions aren't true of Bruce's proposed drop and > delete/truncate recovery modes, are they? Only the owner could do the ALTER, for sure, but once the owner sets it, any user with permission to write to the table would have those characteristics. > People do crazy things in pursuit of performance. Illustration: a few > months ago I was instrumenting an app (based on MySQL/ISAM) and I > noticed that under load it simply didn't update the inventory properly - > of 1000 orders placed within a few seconds it might reduce inventory by > 3 or 4. I reported this and they shrugged their shoulders and said > "well, we'd have to lock the table and that would slow everything down > ...". > > I just want to be sure we aren't providing a footgun. "Oh, just set > recovery mode to delete. It won't make any difference unless you crash > and you'll run faster." I think we have to trust the object owner in this case. I don't know of any super-user-only ALTER commands, but I suppose we could set it up that way if we wanted. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M78070=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 14:29:06 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <200512301649.jBUGnxn21488@candle.pha.pa.us> References: <200512301649.jBUGnxn21488@candle.pha.pa.us> Date: Fri, 30 Dec 2005 19:28:41 +0000 Message-ID: <1135970921.5052.68.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.034 required=5 tests=[AWL=0.034] X-Spam-Score: 0.034 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 3112 On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote: > Yes, I know we agreed to the COPY LOCK, but new features now being > requested, so we have to re-evaluate where we are going with COPY LOCK > to get a more consistent solution. Thank you. > Ah, but people wanted fast INSERT INTO ... SELECT, and that would use > EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use > EXCLUSIVE? We don't want to add "LOCK" to every command that might use > EXCLUSIVE. ALTER is much better for this. > I agree if we thought EXCLUSIVE would only be used for COPY, we could > use LOCK, but I am thinking it will be used for other commands as well. Agreed, I will look to implement this. Could the internals of my recent patch be reviewed? Changing the user interface is less of a problem than changing the internals, which is where the hard work takes place. I do not want to extend this work further only to have that part rejected later. The implications of EXCLUSIVE are: - there will be a check on each and every I, U, D to check the state of the relation - *every* operation that attempts a write lock will attempt to acquire an EXCLUSIVE full table lock instead - following successful completion of *each* DML statement, the relation will be heap_sync'd involving a full scan of the buffer cache Can I clarify the wording of the syntax? Is EXCLUSIVE the right word? How about FASTLOAD or BULKLOAD? Those words seem less likely to be misused in the future - i.e. we are invoking a special mode, rather than invoking a special "go faster" option. > I don't consider the single-transaction to be a no-cost solution. You > are adding flags to commands, and you are using a dump layout for > performance where the purpose for the layout is not clear. The ALTER is > clear to the user, and it allows nologging operations to happen after > the table is created. > > In fact, for use in pg_dump, I think DROP is the proper operation for > loading, not your transaction wrapping solution. We already agree we > need DROP (or TRUNCATE), so why not use that rather than the transaction > wrap idea? This was discussed on-list by 2 core team members, a committer and myself, but I see no requirements change here. You even accepted the invisible COPY optimization in your last post - why unpick that now? Please forgive my tone, but I am lost for reasonable yet expressive words. The --single-transaction mode would apply even if the dump was created using an earlier version of pg_dump. pg_dump has *not* been altered at all. (And I would again add that the idea was not my own) > So, to summarize, I think we should add DROP/TRUNCATE, and use that by > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. Would you mind stating again what you mean, just so I can understand this? Your summary isn't enough. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq From pgsql-hackers-owner+M78072=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 16:15:30 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512302114.jBULEno02301@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <1135970921.5052.68.camel@localhost.localdomain> To: Simon Riggs Date: Fri, 30 Dec 2005 16:14:49 -0500 (EST) cc: Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] X-Spam-Score: 0.12 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 7285 Simon Riggs wrote: > On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote: > > > Yes, I know we agreed to the COPY LOCK, but new features now being > > requested, so we have to re-evaluate where we are going with COPY LOCK > > to get a more consistent solution. > > Thank you. Good. I think we can be happy that COPY LOCK didn't get into a release, so we don't have to support it forever. When we are adding features, we have to consider not only the current release, but future releases and what people will ask for in the future so the syntax can be expanded without breaking previous usage. > > Ah, but people wanted fast INSERT INTO ... SELECT, and that would use > > EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use > > EXCLUSIVE? We don't want to add "LOCK" to every command that might use > > EXCLUSIVE. ALTER is much better for this. > > > I agree if we thought EXCLUSIVE would only be used for COPY, we could > > use LOCK, but I am thinking it will be used for other commands as well. > > Agreed, I will look to implement this. > > Could the internals of my recent patch be reviewed? Changing the user > interface is less of a problem than changing the internals, which is > where the hard work takes place. I do not want to extend this work > further only to have that part rejected later. OK, I will look it over this week or next. > The implications of EXCLUSIVE are: > - there will be a check on each and every I, U, D to check the state of > the relation > - *every* operation that attempts a write lock will attempt to acquire > an EXCLUSIVE full table lock instead > - following successful completion of *each* DML statement, the relation > will be heap_sync'd involving a full scan of the buffer cache Yes, I think that is it. What we can do is implement EXCLUSIVE to affect only COPY at this point, and document that, and later add other commands. > Can I clarify the wording of the syntax? Is EXCLUSIVE the right word? > How about FASTLOAD or BULKLOAD? Those words seem less likely to be > misused in the future - i.e. we are invoking a special mode, rather than > invoking a special "go faster" option. The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is probably not the best for loading. I would think TRUNCATE would be a better option. In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be the same, mostly. You would create the table, set its RELIABILITY to TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or DEFAULT. The second ALTER has to sync all the dirty data blocks, which the same thing EXCLUSIVE does at the conclusion of COPY. So, we need a name for EXCLUSIVE mode that suggests how it is different from TRUNCATE, and in this case, the difference is that EXCLUSIVE preserves the previous contents of the table on recovery, while TRUNCATE does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER? Anyway, the keywords are easy to modify, even after the patch is submitted. FYI, I usually go through keywords.c looking for a keyword we already use. > > I don't consider the single-transaction to be a no-cost solution. You > > are adding flags to commands, and you are using a dump layout for > > performance where the purpose for the layout is not clear. The ALTER is > > clear to the user, and it allows nologging operations to happen after > > the table is created. > > > > In fact, for use in pg_dump, I think DROP is the proper operation for > > loading, not your transaction wrapping solution. We already agree we > > need DROP (or TRUNCATE), so why not use that rather than the transaction > > wrap idea? > > This was discussed on-list by 2 core team members, a committer and > myself, but I see no requirements change here. You even accepted the > invisible COPY optimization in your last post - why unpick that now? > Please forgive my tone, but I am lost for reasonable yet expressive > words. Do you think you are the only one who has rewritten a patch multiple times? We all have. The goal is to get the functionality into the system in the most seamless way possible. Considering the number of people who use PostgreSQL, if it takes use 10 tries, it is worth it considering the thousands of people who will use it. Would you have us include a sub-optimal patch and have thousands of people adjust to its non-optimal functionality? I am sure you would not. Perhaps a company would say, "Oh, just ship it", but we don't. > The --single-transaction mode would apply even if the dump was created > using an earlier version of pg_dump. pg_dump has *not* been altered at > all. (And I would again add that the idea was not my own) I assume you mean this: http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php I guess with the ALTER commands I don't see much value in the --single-transaction flag. I am sure others suggested it, but would they suggest it now given our current direction. The fact that the patch was submitted does not give it any more weight --- the question is does this feature make sense for 8.2. The goal is not to cram as many optimizations into PostgreSQL as possible, the goal is to present a consistent usable system to users. > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. > > Would you mind stating again what you mean, just so I can understand > this? Your summary isn't enough. New ALTER TABLE mode, perhaps call it PERSISTENCE: ALTER TABLE tab PERSISTENCE DROP ON RECOVERY ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY These would drop or truncate all tables with this flag on a non-clean start of the postmaster, and write something in the server logs. However, I don't know that we have the code in place to DROP/TRUNCATE in recovery mode, and it would affect all databases, so it could be quite complex to implement. In this mode, no WAL logs would be written for table modifications, though DDL commands would have to be logged. ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?) Table contents are preserved across recoveries, but data modifications can happen only one at a time. I don't think we have a lock mode that does this, so I am worried a new lock mode will have to be created. A simplified solution at this stage would be to take an exclusive lock on the table, but really we just need a single-writer table lock, which I don't think we have. initially this can implemented to only affect COPY but later can be done for other commands. ALTER TABLE tab PERSISTENCE DEFAULT This would be our current default mode, which is full concurrency and persistence. It took me over an hour to write this, but I feel the time is worth it because of the number of users who use our software. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M78076=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 17:37:00 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- To: Bruce Momjian cc: Simon Riggs , Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and References: <200512302114.jBULEno02301@candle.pha.pa.us> In-Reply-To: <200512302114.jBULEno02301@candle.pha.pa.us> From: Greg Stark Organization: The Emacs Conspiracy; member since 1992 Date: 30 Dec 2005 17:36:24 -0500 Message-ID: <87mzii8b6f.fsf@stark.xeocode.com> Lines: 28 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4 X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113] X-Spam-Score: 0.113 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1424 As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful functionality but perhaps there doesn't have to be any proprietary user interface to it at all. Why not just check if the conditions are already present to allow the optimization and if so go ahead. That is, if the current transaction already has an exclusive lock on the table and there are no indexes (and PITR isn't active) then Postgres could go ahead and use the same WAL skipping logic as the other operations that already so so. This would work for inserts whether coming from COPY or plain SQL INSERTs. The nice thing about this is that the user's SQL wouldn't need any proprietary extensions at all. Just tell people to do BEGIN; LOCK TABLE foo; COPY foo from ... COMMIT; There could be a COPY LOCK option to obtain a lock, but it would be purely for user convenience so they don't have to bother with BEGIN and COMMIt. The only downside is a check to see if an exclusive table lock is present on every copy and insert. That might be significant but perhaps there are ways to finess that. If not perhaps only doing it on COPY would be a good compromise. -- greg ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From pgsql-hackers-owner+M78077=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 17:47:18 2005 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200512302246.jBUMkjF25196@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <87mzii8b6f.fsf@stark.xeocode.com> To: Greg Stark Date: Fri, 30 Dec 2005 17:46:45 -0500 (EST) cc: Simon Riggs , Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] X-Spam-Score: 0.12 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 2135 Greg Stark wrote: > > As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful > functionality but perhaps there doesn't have to be any proprietary user > interface to it at all. Why not just check if the conditions are already > present to allow the optimization and if so go ahead. > > That is, if the current transaction already has an exclusive lock on the table > and there are no indexes (and PITR isn't active) then Postgres could go ahead > and use the same WAL skipping logic as the other operations that already so > so. This would work for inserts whether coming from COPY or plain SQL INSERTs. > > The nice thing about this is that the user's SQL wouldn't need any proprietary > extensions at all. Just tell people to do > > BEGIN; > LOCK TABLE foo; > COPY foo from ... > COMMIT; > > There could be a COPY LOCK option to obtain a lock, but it would be purely for > user convenience so they don't have to bother with BEGIN and COMMIt. > > The only downside is a check to see if an exclusive table lock is present on > every copy and insert. That might be significant but perhaps there are ways to > finess that. If not perhaps only doing it on COPY would be a good compromise. Well, again, if we wanted to use EXCLUSIVE only for COPY, this might make sense. However, also consider that the idea for EXCLUSIVE was that users could continue read-only queries on the table while it is being loaded (like COPY allows now), and that in EXCLUSIVE mode, we are only going to write into new pages. If someone has an exclusive lock on the table and does a COPY or SELECT INTO do we want to assume we are only going to write into new pages, and do we want to force an exclusive lock rather than a single-writer lock? I don't think so. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend From mpaesold@gmx.at Sat Dec 31 06:59:51 2005 Date: Sat, 31 Dec 2005 12:59:44 +0100 (MET) From: Michael Paesold To: Bruce Momjian cc: simon@2ndquadrant.com, andrew@dunslane.net, tgl@sss.pgh.pa.us, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org References: <200512302114.jBULEno02301@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and X-Priority: 3 (Normal) X-Authenticated: #1946847 Message-ID: <14969.1136030384@www6.gmx.net> X-Mailer: WWW-Mail 1.6 (Global Message Exchange) X-Flags: 0001 Content-Length: 1305 Bruce Momjian wrote: > > The --single-transaction mode would apply even if the dump was created > > using an earlier version of pg_dump. pg_dump has *not* been altered at > > all. (And I would again add that the idea was not my own) > > I assume you mean this: > > http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php > > I guess with the ALTER commands I don't see much value in the > --single-transaction flag. I am sure others suggested it, but would > they suggest it now given our current direction. I just want to add that --single-transaction has a value of it's own. There were times when I wanted to restore parts of a dump all-or-nothing. This is possible with PostgreSQL, unlike many other DBM systems, because people like Tom Lane have invested in ensuring that all DDL is working without implicitly committing an enclosing transaction. Using pg_restore directly into a database, it is not possible to get a single transaction right now. One has to restore to a file and manually added BEGIN/COMMIT. Just for that I think --single-transaction is a great addition and a missing feature. I think more people have a use-case for that. Best Regards, Michael Paesold -- Telefonieren Sie schon oder sparen Sie noch? NEU: GMX Phone_Flat http://www.gmx.net/de/go/telefonie From pgsql-hackers-owner+M78213=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 12:08:43 2006 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200601031708.k03H85j27170@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <17173.1136306881@sss.pgh.pa.us> To: Tom Lane Date: Tue, 3 Jan 2006 12:08:05 -0500 (EST) cc: Jim C. Nasby , Andrew Dunstan , simon@2ndquadrant.com, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.121 required=5 tests=[AWL=0.121] X-Spam-Score: 0.121 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1125 Tom Lane wrote: > "Jim C. Nasby" writes: > > On Tue, Jan 03, 2006 at 11:26:51AM -0500, Tom Lane wrote: > >> Such an ALTER would certainly require exclusive lock on the table, > >> so I'm not sure that I see much use-case for doing it like that. > >> You'd want to do the ALTER and commit so as not to lock other people > >> out of the table entirely while doing the bulk data-pushing. > > > Maybe this just isn't clear, but would EXCLUSIVE block writes from all > > other sessions then? > > I don't think it should (which implies that EXCLUSIVE is a bad name). Agreed, EXCLUSIVE was used to mean an _exclusive_ writer. The new words I proposed were PRESERVE or STABLE. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org From tgl@sss.pgh.pa.us Tue Jan 3 12:37:34 2006 To: Stephen Frost cc: Jim C. Nasby , Bruce Momjian , Andrew Dunstan , kleptog@svana.org, simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <20060103165359.GP6026@ns.snowman.net> References: <200512291605.jBTG5gi00396@candle.pha.pa.us> <7966.1135873468@sss.pgh.pa.us> <20060103154521.GC82560@pervasive.com> <20060103162137.GO6026@ns.snowman.net> <16856.1136305742@sss.pgh.pa.us> <20060103165359.GP6026@ns.snowman.net> Comments: In-reply-to Stephen Frost message dated "Tue, 03 Jan 2006 11:54:01 -0500" Date: Tue, 03 Jan 2006 12:37:32 -0500 Message-ID: <17841.1136309852@sss.pgh.pa.us> From: Tom Lane Content-Length: 976 Stephen Frost writes: > The problem is that you might want to grant 'truncate' to people who > *aren't* particularly trusted. For truncate, at least I have a > real-world use-case for it. I don't find this use-case particularly convincing. If the users are allowed to delete all data in a given table, then that table must be dedicated to them anyway; so it's not that easy to see why you can't risk giving them ownership rights on it. The worst they can do is screw up their own data, no? In any case, I don't see what's so wrong with the model of using SECURITY DEFINER interface functions when you want a security restriction that's finer-grain than the system provides. I really *don't* want to see us trying to, say, categorize every variety of ALTER TABLE as a separately grantable privilege. I could live with something like a catchall "ADMIN" privilege ... except it's not clear how that would differ from ownership. regards, tom lane From pgsql-hackers-owner+M78221=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 13:30:34 2006 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Date: Tue, 3 Jan 2006 13:30:56 -0500 From: Stephen Frost To: Tom Lane cc: Jim C. Nasby , Bruce Momjian , Andrew Dunstan , kleptog@svana.org, simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and Message-ID: <20060103183056.GR6026@ns.snowman.net> Mail-Followup-To: Tom Lane , "Jim C. Nasby" , Bruce Momjian , Andrew Dunstan , kleptog@svana.org, simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org References: <200512291605.jBTG5gi00396@candle.pha.pa.us> <7966.1135873468@sss.pgh.pa.us> <20060103154521.GC82560@pervasive.com> <20060103162137.GO6026@ns.snowman.net> <16856.1136305742@sss.pgh.pa.us> <20060103165359.GP6026@ns.snowman.net> <17841.1136309852@sss.pgh.pa.us> Content-Disposition: inline In-Reply-To: <17841.1136309852@sss.pgh.pa.us> X-Editor: Vim http://www.vim.org/ X-Info: http://www.snowman.net X-Operating-System: Linux/2.4.24ns.3.0 (i686) X-Uptime: 12:39:16 up 206 days, 9:50, 11 users, load average: 0.02, 0.05, 0.05 User-Agent: Mutt/1.5.9i X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.105 required=5 tests=[AWL=0.105] X-Spam-Score: 0.105 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 2666 -- Start of PGP signed section. * Tom Lane (tgl@sss.pgh.pa.us) wrote: > I don't find this use-case particularly convincing. If the users are > allowed to delete all data in a given table, then that table must be > dedicated to them anyway; so it's not that easy to see why you can't > risk giving them ownership rights on it. The worst they can do is > screw up their own data, no? Being able to delete all data in a given table in no way implies ownership rights. The tables are part of a specification which the users are being asked to respond to. Being able to change the table types or remove the constraints put on the tables would allow the users to upload garbage which would then affect downstream processing. We can't guarentee this won't happen anyway but we try to confine the things they can mess up to a reasonable set which we can check for (and do, through a rather involved error checking system). There are *alot* of things built on top of the table structures and having them change would basically break the whole system (without the appropriate changes being made to the other parts of the system). > In any case, I don't see what's so wrong with the model of using > SECURITY DEFINER interface functions when you want a security > restriction that's finer-grain than the system provides. I really > *don't* want to see us trying to, say, categorize every variety of > ALTER TABLE as a separately grantable privilege. I could live with > something like a catchall "ADMIN" privilege ... except it's not > clear how that would differ from ownership. I don't think anyone's asked for 'ALTER TABLE' privileges to be seperately grantable. It seems to me that the privileges which *need* to be grantable are ones associated with DML statements. I would classify TRUNCATE, VACUUM and ANALYZE as DML statements (along with select, insert, update, and delete). They're PostgreSQL-specific DML statements but they still fall into that category. I don't think it's a coincidence that the SQL-defined DML statements are all, individually, grantable. That doesn't mean I think we should get rid of RULE, REFERENCES or TRIGGER, though honestly I've very rarely needed to grant any of them (I don't think I've ever granted RULE or TRIGGER...). References is DDL-oriented, but for *other* tables; RULE and TRIGGER are DDL and I can't really justify why someone other than the owner would need them but I'm guessing someone's using them. I don't think their existance should imply that if we ever change the grants again we have to include all types of 'ALTER TABLE', etc, though. Thanks, Stephen -- End of PGP section, PGP failed! From sfrost@snowman.net Tue Jan 3 13:30:13 2006 Date: Tue, 3 Jan 2006 13:30:56 -0500 From: Stephen Frost To: Tom Lane cc: Jim C. Nasby , Bruce Momjian , Andrew Dunstan , kleptog@svana.org, simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and Message-ID: <20060103183056.GR6026@ns.snowman.net> Mail-Followup-To: Tom Lane , "Jim C. Nasby" , Bruce Momjian , Andrew Dunstan , kleptog@svana.org, simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org References: <200512291605.jBTG5gi00396@candle.pha.pa.us> <7966.1135873468@sss.pgh.pa.us> <20060103154521.GC82560@pervasive.com> <20060103162137.GO6026@ns.snowman.net> <16856.1136305742@sss.pgh.pa.us> <20060103165359.GP6026@ns.snowman.net> <17841.1136309852@sss.pgh.pa.us> Content-Disposition: inline In-Reply-To: <17841.1136309852@sss.pgh.pa.us> X-Editor: Vim http://www.vim.org/ X-Info: http://www.snowman.net X-Operating-System: Linux/2.4.24ns.3.0 (i686) X-Uptime: 12:39:16 up 206 days, 9:50, 11 users, load average: 0.02, 0.05, 0.05 User-Agent: Mutt/1.5.9i Content-Length: 2666 -- Start of PGP signed section. * Tom Lane (tgl@sss.pgh.pa.us) wrote: > I don't find this use-case particularly convincing. If the users are > allowed to delete all data in a given table, then that table must be > dedicated to them anyway; so it's not that easy to see why you can't > risk giving them ownership rights on it. The worst they can do is > screw up their own data, no? Being able to delete all data in a given table in no way implies ownership rights. The tables are part of a specification which the users are being asked to respond to. Being able to change the table types or remove the constraints put on the tables would allow the users to upload garbage which would then affect downstream processing. We can't guarentee this won't happen anyway but we try to confine the things they can mess up to a reasonable set which we can check for (and do, through a rather involved error checking system). There are *alot* of things built on top of the table structures and having them change would basically break the whole system (without the appropriate changes being made to the other parts of the system). > In any case, I don't see what's so wrong with the model of using > SECURITY DEFINER interface functions when you want a security > restriction that's finer-grain than the system provides. I really > *don't* want to see us trying to, say, categorize every variety of > ALTER TABLE as a separately grantable privilege. I could live with > something like a catchall "ADMIN" privilege ... except it's not > clear how that would differ from ownership. I don't think anyone's asked for 'ALTER TABLE' privileges to be seperately grantable. It seems to me that the privileges which *need* to be grantable are ones associated with DML statements. I would classify TRUNCATE, VACUUM and ANALYZE as DML statements (along with select, insert, update, and delete). They're PostgreSQL-specific DML statements but they still fall into that category. I don't think it's a coincidence that the SQL-defined DML statements are all, individually, grantable. That doesn't mean I think we should get rid of RULE, REFERENCES or TRIGGER, though honestly I've very rarely needed to grant any of them (I don't think I've ever granted RULE or TRIGGER...). References is DDL-oriented, but for *other* tables; RULE and TRIGGER are DDL and I can't really justify why someone other than the owner would need them but I'm guessing someone's using them. I don't think their existance should imply that if we ever change the grants again we have to include all types of 'ALTER TABLE', etc, though. Thanks, Stephen -- End of PGP section, PGP failed! From pgsql-hackers-owner+M78233=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 17:39:06 2006 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200601032238.k03McP804163@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <20060103212750.GT82560@pervasive.com> To: Jim C. Nasby Date: Tue, 3 Jan 2006 17:38:25 -0500 (EST) cc: Tom Lane , Andrew Dunstan , simon@2ndquadrant.com, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.121 required=5 tests=[AWL=0.121] X-Spam-Score: 0.121 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1714 Jim C. Nasby wrote: > > We would be creating a new lock type for this. > > Sorry if I've just missed this in the thread, but what would the new > lock type do? My impression is that as it stands you can either do: > > BEGIN; > ALTER TABLE EXCLUSIVE; > ... > ALTER TABLE SHARE; --fsync > COMMIT; > > Which would block all other access to the table as soon as the first > ALTER TABLE happens. Or you can: > > ALTER TABLE EXCLUSIVE; > ... > ALTER TABLE SHARE; > > Which means that between the two ALTER TABLES every backend that does > DML on that table will not have that DML logged, but because there's no > exclusive lock that DML would be allowed to occur. Right, the DML will be single-threaded and fsync of all dirty pages will happen before commit of each transaction. > BTW, there might be some usecase for the second scenario, in which case > it would probably be better to tell the user to aquire a table-lock on > their own rather than do it automatically as part of the update... > > Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is > > happening is never going to be implemented because it is just too hard > > to do, and too prone to error. > > What I figured. Never hurts to ask though. :) Actually, it does hurt because it generates discussion volume for no purpose. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings From pgsql-hackers-owner+M78234=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 17:54:16 2006 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <200512302114.jBULEno02301@candle.pha.pa.us> References: <200512302114.jBULEno02301@candle.pha.pa.us> Date: Tue, 03 Jan 2006 22:53:53 +0000 Message-ID: <1136328833.5052.223.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.04 required=5 tests=[AWL=0.040] X-Spam-Score: 0.04 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 5373 On Fri, 2005-12-30 at 16:14 -0500, Bruce Momjian wrote: > Simon Riggs wrote: > > The implications of EXCLUSIVE are: > > - there will be a check on each and every I, U, D to check the state of > > the relation > > - *every* operation that attempts a write lock will attempt to acquire > > an EXCLUSIVE full table lock instead > > - following successful completion of *each* DML statement, the relation > > will be heap_sync'd involving a full scan of the buffer cache > > Yes, I think that is it. What we can do is implement EXCLUSIVE to > affect only COPY at this point, and document that, and later add other > commands. > > > Can I clarify the wording of the syntax? Is EXCLUSIVE the right word? > > How about FASTLOAD or BULKLOAD? Those words seem less likely to be > > misused in the future - i.e. we are invoking a special mode, rather than > > invoking a special "go faster" option. > > The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is > probably not the best for loading. I would think TRUNCATE would be a > better option. > > In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be > the same, mostly. You would create the table, set its RELIABILITY to > TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or > DEFAULT. The second ALTER has to sync all the dirty data blocks, which > the same thing EXCLUSIVE does at the conclusion of COPY. > > So, we need a name for EXCLUSIVE mode that suggests how it is different > from TRUNCATE, and in this case, the difference is that EXCLUSIVE > preserves the previous contents of the table on recovery, while TRUNCATE > does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER? > Anyway, the keywords are easy to modify, even after the patch is > submitted. FYI, I usually go through keywords.c looking for a keyword > we already use. I'm very happy for suggestions on what these new modes are called. > > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by > > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE > > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. > > > > Would you mind stating again what you mean, just so I can understand > > this? Your summary isn't enough. > > New ALTER TABLE mode, perhaps call it PERSISTENCE: > > ALTER TABLE tab PERSISTENCE DROP ON RECOVERY > ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY > > These would drop or truncate all tables with this flag on a non-clean > start of the postmaster, and write something in the server logs. > However, I don't know that we have the code in place to DROP/TRUNCATE in > recovery mode, and it would affect all databases, so it could be quite > complex to implement. In this mode, no WAL logs would be written for > table modifications, though DDL commands would have to be logged. Right now, this will be a TODO item... it looks like it will take some thought to implement correctly. > ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?) > > Table contents are preserved across recoveries, but data modifications > can happen only one at a time. I don't think we have a lock mode that > does this, so I am worried a new lock mode will have to be created. A > simplified solution at this stage would be to take an exclusive lock on > the table, but really we just need a single-writer table lock, which I > don't think we have. initially this can implemented to only affect COPY > but later can be done for other commands. ExclusiveLock locks out everything apart from readers, no new lock mode AFAICS. Implementing that is little additional work for COPY. Tom had a concern about setting this for I, U, D commands via the executor. Not sure what the details of that are, as yet. We can use either of the unlogged modes for pg_dump, so I'd suggest its this one. Everybody happy with this being the new default in pg_dump, or should it be an option? > ALTER TABLE tab PERSISTENCE DEFAULT > > This would be our current default mode, which is full concurrency and > persistence. I'm thinking whether the ALTER TABLE statement might be better with two bool flags rather than a 3-state char. flag 1: ENABLE LOGGING | DISABLE LOGGING flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY Giving 3 possible sets of options: -- the default ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default) -- EXCLUSIVE mode ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY; ...which would be used like this ALTER TABLE mytable DISABLE LOGGING; COPY or other bulk data manipulation SQL ALTER TABLE mytable ENABLE LOGGING; ...since FULL RECOVERY is the default. -- multiuser temp table mode ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY; ...which would usually be left on all the time which only uses one new keyword LOGGING and yet all the modes are fairly explicit as to what they do. An alternative might be the slightly more verbose: ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK; which would be turned off by ALTER TABLE mytable ENABLE LOGGING; Comments? Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly From simon@2ndquadrant.com Tue Jan 3 18:10:32 2006 Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Jim C. Nasby , Bruce Momjian cc: Tom Lane , Andrew Dunstan , kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org In-Reply-To: <200601032120.k03LKl609990@candle.pha.pa.us> References: <200601032120.k03LKl609990@candle.pha.pa.us> Date: Tue, 03 Jan 2006 23:10:16 +0000 Message-ID: <1136329816.5052.239.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) Content-Length: 2118 On Tue, 2006-01-03 at 16:20 -0500, Bruce Momjian wrote: > Jim C. Nasby wrote: > > Idealistically, if EXCLUSIVE/PRESERVE/STABLE does it's thing by only > > appending new pages, it would be nice if other backends could continue > > performing updates at the same time, assuming there's free space > > available elsewhere within the table (and that you'd be able to recover > > those logged changes regardless of the non-logged operations). But > > that's a pretty lofty goal... > > "Idealistically", yep. It would be great if we could put a helmet on > and the computer would read your mind. :-) > > Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is > happening is never going to be implemented because it is just too hard > to do, and too prone to error. The reason for locking the whole table was to ensure that we do not have a mixture of logged and non-logged writers writing to the same data blocks, since that could damage blocks unrecoverably in the event of a crash. (Though perhaps only if full_block_writes is on) The ALTER TABLE .. EXCLUSIVE/(insert name) mode would mean that *any* backend who took a write lock on the table, would lock out the whole table. So this new mode is not restricted to the job/user who ran the ALTER TABLE command. (I would note that that is how Oracle and Teradata do this for pre-load utility table locking, but why should we follow them on that?) Currently, when we add a new row when the FSM is empty, we check the last block of the table. That would cause multiple writers to access the same blocks and so we would be in danger. The only way to avoid that would be for logged writers (who would use the FSM if it were not empty) to notify back to the FSM that they have just added a block - and remove the behaviour to look for the last block. Anyway, one step at a time. *Maybe* we can do that in the future, but right now I'd like to add the basic fast write/load functionality. Also, I think I will do the docs first this time, just so everyone can read what we're getting ahead of time, to ensure we all agree. Best Regards, Simon Riggs From pgsql-hackers-owner+M78236=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 18:24:20 2006 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Jim C. Nasby , Tom Lane , Andrew Dunstan , kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org In-Reply-To: <200601032238.k03McP804163@candle.pha.pa.us> References: <200601032238.k03McP804163@candle.pha.pa.us> Date: Tue, 03 Jan 2006 23:23:54 +0000 Message-ID: <1136330634.5052.247.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.043 required=5 tests=[AWL=0.043] X-Spam-Score: 0.043 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 725 On Tue, 2006-01-03 at 17:38 -0500, Bruce Momjian wrote: > Right, the DML will be single-threaded and fsync of all dirty pages will > happen before commit of each transaction. heap_sync() would occur at end of statement, as it does with CTAS. We could delay until EOT but I'm not sure I see why; in most cases they'd be the same point anyway. I'd been toying with the idea of making the freshly added blocks live only in temp_buffers to avoid the shared_buffers overhead, but that was starting to sounds too wierd for my liking. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org From simon@2ndquadrant.com Tue Jan 3 18:58:13 2006 Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Michael Paesold cc: Bruce Momjian , andrew@dunslane.net, tgl@sss.pgh.pa.us, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org In-Reply-To: <14969.1136030384@www6.gmx.net> References: <200512302114.jBULEno02301@candle.pha.pa.us> <14969.1136030384@www6.gmx.net> Date: Tue, 03 Jan 2006 23:58:09 +0000 Message-ID: <1136332689.5052.263.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) Content-Length: 1493 On Sat, 2005-12-31 at 12:59 +0100, Michael Paesold wrote: > Bruce Momjian wrote: > > > > The --single-transaction mode would apply even if the dump was created > > > using an earlier version of pg_dump. pg_dump has *not* been altered at > > > all. (And I would again add that the idea was not my own) > > > > I assume you mean this: > > > > http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php > > > > I guess with the ALTER commands I don't see much value in the > > --single-transaction flag. I am sure others suggested it, but would > > they suggest it now given our current direction. > > I just want to add that --single-transaction has a value of it's own. There > were times when I wanted to restore parts of a dump all-or-nothing. > > This is possible with PostgreSQL, unlike many other DBM systems, because > people like Tom Lane have invested in ensuring that all DDL is working > without implicitly committing an enclosing transaction. > > Using pg_restore directly into a database, it is not possible to get a > single transaction right now. One has to restore to a file and manually > added BEGIN/COMMIT. Just for that I think --single-transaction is a great > addition and a missing feature. > > I think more people have a use-case for that. I did originally separate the --single-transaction patch for this reason. I think its a valid patch on its own and its wrapped and ready to go, with some deletions from the doc patch. Best Regards, Simon Riggs From pgsql-hackers-owner+M78239=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 19:12:18 2006 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <200512291637.jBTGbdC03848@candle.pha.pa.us> References: <200512291637.jBTGbdC03848@candle.pha.pa.us> Date: Wed, 04 Jan 2006 00:11:55 +0000 Message-ID: <1136333515.5052.273.camel@localhost.localdomain> X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.045 required=5 tests=[AWL=0.045] X-Spam-Score: 0.045 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 1200 On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote: > Having COPY behave differently because it is > in a transaction is fine as long as it is user-invisible, but once you > require users to do that to get the speedup, it isn't user-invisible > anymore. Since we're agreed on adding ALTER TABLE rather than COPY LOCK, we have our explicit mechanism for speedup. However, it costs a single line of code and very very little execution time to add in the optimization to COPY to make it bypass WAL when executed in the same transaction that created the table. Everything else is already there. As part of the use_wal test: + if (resultRelInfo->ri_NumIndices == 0 && + !XLogArchivingActive() && >> (cstate->rel->rd_createSubid != InvalidSubTransactionId )) + use_wal = false; the value is already retrieved from cache... Can anyone see a reason *not* to put that change in also? We just don't advertise it as the "suggested" route to gaining performance, nor would we rely on it for pg_dump/restore performance. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster From pgsql-hackers-owner+M78303=pgman=candle.pha.pa.us@postgresql.org Thu Jan 5 12:23:39 2006 X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org X-Greylist: from auto-whitelisted by SQLgrey- From: Bruce Momjian Message-ID: <200601051722.k05HMSM02052@candle.pha.pa.us> Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and In-Reply-To: <1136328833.5052.223.camel@localhost.localdomain> To: Simon Riggs Date: Thu, 5 Jan 2006 12:22:28 -0500 (EST) cc: Andrew Dunstan , Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org X-Mailer: ELM [version 2.4ME+ PL121 (25)] X-Virus-Scanned: by amavisd-new at hub.org X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] X-Spam-Score: 0.12 X-Spam-Level: X-Mailing-List: pgsql-hackers List-Archive: List-Help: List-Id: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org Content-Length: 6020 Simon Riggs wrote: > > So, we need a name for EXCLUSIVE mode that suggests how it is different > > from TRUNCATE, and in this case, the difference is that EXCLUSIVE > > preserves the previous contents of the table on recovery, while TRUNCATE > > does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER? > > Anyway, the keywords are easy to modify, even after the patch is > > submitted. FYI, I usually go through keywords.c looking for a keyword > > we already use. > > I'm very happy for suggestions on what these new modes are called. > > > > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by > > > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE > > > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. > > > > > > Would you mind stating again what you mean, just so I can understand > > > this? Your summary isn't enough. > > > > New ALTER TABLE mode, perhaps call it PERSISTENCE: > > > > ALTER TABLE tab PERSISTENCE DROP ON RECOVERY > > ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY > > > > These would drop or truncate all tables with this flag on a non-clean > > start of the postmaster, and write something in the server logs. > > However, I don't know that we have the code in place to DROP/TRUNCATE in > > recovery mode, and it would affect all databases, so it could be quite > > complex to implement. In this mode, no WAL logs would be written for > > table modifications, though DDL commands would have to be logged. > > Right now, this will be a TODO item... it looks like it will take some > thought to implement correctly. OK, I know my suggestions have made it more complicated. TODO added: * Allow control over which tables are WAL-logged Allow tables to bypass WAL writes and just fsync() dirty pages on commit. To do this, only a single writer can modify the table, and writes must happen only on new pages. Readers can continue accessing the table. This would affect COPY, and perhaps INSERT/UPDATE too. Another option is to avoid transaction logging entirely and truncate or drop the table on crash recovery. These should be implemented using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE | STABLE | DEFAULT ]. Tables using non-default logging should not use referential integrity with default-logging tables, and tables using stable logging probably can not have indexes. [walcontrol] > > ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?) > > > > Table contents are preserved across recoveries, but data modifications > > can happen only one at a time. I don't think we have a lock mode that > > does this, so I am worried a new lock mode will have to be created. A > > simplified solution at this stage would be to take an exclusive lock on > > the table, but really we just need a single-writer table lock, which I > > don't think we have. initially this can implemented to only affect COPY > > but later can be done for other commands. > > ExclusiveLock locks out everything apart from readers, no new lock mode > AFAICS. Implementing that is little additional work for COPY. Nice. > Tom had a concern about setting this for I, U, D commands via the > executor. Not sure what the details of that are, as yet. That is much more complicated than the COPY-only idea, for sure. I am thinking we could add the ALTER syntax and just do COPY at this stage, meaning that I/U/D still do full logging until we get to improving them. The big benefit is that the user API doesn't need to change when we improve the code. In fact I think we could do the TRUNCATE/DROP easily for I/U/D, but the STABLE option would require work and we don't need to implement it in the first patch. > We can use either of the unlogged modes for pg_dump, so I'd suggest its > this one. Everybody happy with this being the new default in pg_dump, or > should it be an option? > > > ALTER TABLE tab PERSISTENCE DEFAULT > > > > This would be our current default mode, which is full concurrency and > > persistence. > > I'm thinking whether the ALTER TABLE statement might be better with two > bool flags rather than a 3-state char. > > flag 1: ENABLE LOGGING | DISABLE LOGGING > > flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY > > Giving 3 possible sets of options: > > -- the default > ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default) > > -- EXCLUSIVE mode > ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY; > ...which would be used like this > ALTER TABLE mytable DISABLE LOGGING; > COPY or other bulk data manipulation SQL > ALTER TABLE mytable ENABLE LOGGING; > ...since FULL RECOVERY is the default. > > -- multiuser temp table mode > ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY; > ...which would usually be left on all the time > > which only uses one new keyword LOGGING and yet all the modes are fairly > explicit as to what they do. > > An alternative might be the slightly more verbose: > ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK; > which would be turned off by > ALTER TABLE mytable ENABLE LOGGING; > > Comments? I had the same idea originally, but avoided it because the logging really does affect what other options you can use. For example, if you want truncate on recovery, you certainly do not want logging, so it seems the options are not really independent. In fact if someone asks for truncate on recovery, do we automatically turn off logging for them, or throw an error, or a warning. It just seemed too error-prone and confusing, though perhaps more logical. Of course, if others like the above, we can do it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster From simon@2ndquadrant.com Thu Jan 5 16:56:25 2006 Return-path: Received: from smtp.nildram.co.uk (smtp.nildram.co.uk [195.112.4.54]) by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k05LuPb02246 for ; Thu, 5 Jan 2006 16:56:25 -0500 (EST) Received: from [192.168.0.3] (unknown [84.12.184.6]) by smtp.nildram.co.uk (Postfix) with ESMTP id A9F0F268C4E; Thu, 5 Jan 2006 21:56:18 +0000 (GMT) Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and From: Simon Riggs To: Bruce Momjian cc: Tom Lane , Martijn van Oosterhout , Greg Stark , Rod Taylor , Qingqing Zhou , pgsql-hackers@postgresql.org In-Reply-To: <200601051727.k05HR5p02803@candle.pha.pa.us> References: <200601051727.k05HR5p02803@candle.pha.pa.us> Content-Type: text/plain Date: Thu, 05 Jan 2006 21:56:21 +0000 Message-ID: <1136498181.21025.285.camel@localhost.localdomain> MIME-Version: 1.0 X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) Content-Transfer-Encoding: 7bit Status: OR On Thu, 2006-01-05 at 12:27 -0500, Bruce Momjian wrote: > Seems like a nice optimization. Negative thoughts: Toast tables have a toast index on them, yes? We have agreed that we cannot use the optimization if we have indexes on the main table. It follows that we cannot use the optimization if we have *any* toasted data, since that would require a pointer between two blocks, which would not be correctly recovered following a crash. If we log the toast table then there could be a mismatch between heap and toast table; if we don't log the toast table there could be a mismatch between toast table and toast index. We can test to see if the toast table is empty when we do ALTER TABLE, but loading operations may try to create toasted data rows. Presumably that means we must either: i) abort a COPY if we get a toastable value ii) if we get a toastable value, insert the row into a new block, which we do logging of, then also log the toast insert and the toast index insert - i.e. some blocks we log, others not This is still useful for many applications, IMHO, but the list of restrictions seems to be growing. Worse, we wouldn't know that the toast tables were empty until after we did the COPY TO for a pg_dump, so we wouldn't be able to retrospectively add an ALTER TABLE command ahead of the COPY. Thoughts? Hopefully there are some flaws in my thinking here, Best Regards, Simon Riggs