From: Christopher Browne
Newsgroups: comp.databases.postgresql.performance
Subject: Re: [PERFORM] count(*) slow on large tables
Date: Sat, 04 Oct 2003 19:33:46 -0400
Organization: cbbrowne Computing Inc
To: pgsql-performance@postgresql.org

Quoth tgl@sss.pgh.pa.us (Tom Lane):
> Bruce Momjian writes:
>> We do have a TODO item:
>>	* Consider using MVCC to cache count(*) queries with no WHERE clause
>
>> The idea is to cache a recent count of the table, then have
>> insert/delete add +/- records to the count.  A COUNT(*) would get the
>> main cached record plus any visible +/- records.  This would allow
>> the count to return the proper value depending on the visibility of
>> the requesting transaction, and it would require _no_ heap or index
>> scan.
>
> ... and it would give the wrong answers.  Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should
> be included in your count.

[That's an excellent summary that Bruce did of what came out of the
previous discussion...]

If this "cache" were a table itself, the visibility of its records
would be identical to the visibility of the "real" records: the +/-
records would become visible when the transaction COMMITed, at the
very same time the source records became visible.

I thought, at one point, that it would be a slick idea for "record
compression" to take place automatically: when you do a COUNT(*), the
process would include compressing multiple records down to one.
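To make the table-based version concrete, here is a minimal sketch of
the "+/- records" idea; the names (row_count_deltas, count_rows,
my_table) are made up for illustration, and only the no-WHERE-clause
case is handled:

  -- Illustrative sketch: each insert/delete appends a delta row, and
  -- MVCC makes the delta visible exactly when the underlying change
  -- becomes visible.
  CREATE TABLE row_count_deltas (
      tablename  name   NOT NULL,
      delta      bigint NOT NULL   -- +1 per INSERT, -1 per DELETE
  );

  CREATE FUNCTION count_rows() RETURNS trigger AS '
  BEGIN
      IF TG_OP = ''INSERT'' THEN
          INSERT INTO row_count_deltas VALUES (TG_RELNAME, +1);
      ELSE
          INSERT INTO row_count_deltas VALUES (TG_RELNAME, -1);
      END IF;
      RETURN NULL;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER my_table_count AFTER INSERT OR DELETE ON my_table
      FOR EACH ROW EXECUTE PROCEDURE count_rows();

  -- The "fast" count is then a sum over the delta rows:
  SELECT sum(delta) FROM row_count_deltas
   WHERE tablename = 'my_table';

The sum is only fast while the number of delta rows stays small, which
is exactly why compressing them down is so tempting.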
Unfortunately, compressing on the fly turns out to be Tremendously
Evil if the same COUNT(*) is being concurrently processed in multiple
transactions.  Both would repeat much the same work, and this would
ultimately lead to one of the transactions aborting.  [I recently saw
this effect occur, um, a few times...]

For this not to have Evil Effects on unsuspecting transactions, we
would instead require some process analogous to VACUUM, where a single
transaction would be used to compress the "counts table" down to one
record per table.  Being independent of "user transactions," it could
safely compress the data without injuring unsuspecting transactions.

But in most cases, the cost of this would be pretty prohibitive.
Every transaction that adds a record to a table leads to a record
being added to table "pg_exact_row_counts".  If transactions typically
involve adding ONE row to any given table, this effectively doubles
the update traffic.  Ouch.  That means that in a _real_
implementation, it would make sense to pick and choose the tables that
would be so managed.

In my earlier argument of "You don't really want that!", while I may
have been guilty of engaging in a _little_ hyperbole, I was certainly
_not_ being facetious overall.  At work, we tell the developers "avoid
doing COUNT(*) inside ordinary transactions!", and that is certainly
NOT a facetious comment.  I recall a case a while back where system
performance was getting brutalized by a lurking COUNT(*).  (Combined
with some other pathological behaviour, of course!)  And note that
this wasn't a query that the TODO item could address; it was of the
form "SELECT COUNT(*) FROM SOME_TABLE WHERE OWNER = VALUE;"

As you have commented elsewhere in the thread, much of the time the
point of asking for COUNT(*) is to get some idea of table size, where
the precise number isn't terribly important in comparison with getting
the general magnitude.  Improving the ability to get approximate
values would be of some value.

I would further argue that "SELECT COUNT(*) FROM TABLE" isn't
particularly useful even when precision _is_ important.  If I'm
working on reports that would be used to reconcile things, the queries
I use are a whole lot more involved than that simple form.  It is far
more likely that I'm using a GROUP BY.

It is legitimate to get wishful and imagine that it would be nice if
we could get the value of that query "instantaneously."  It is also
legitimate to think that the effort required to implement that might
be better used on improving other things.
--
(reverse (concatenate 'string "ac.notelrac.teneerf" "@" "454aa"))
http://www3.sympatico.ca/cbbrowne/
"very few people approach me in real life and insist on proving they
are drooling idiots."  -- Erik Naggum, comp.lang.lisp
From: Josh Berkus <josh@agliodbs.com>
Organization: Aglio Database Solutions
To: Bruce Momjian, Tom Lane
cc: Christopher Browne, pgsql-performance@postgresql.org
Subject: Re: [PERFORM] count(*) slow on large tables
Date: Sun, 5 Oct 2003 11:57:21 -0700

Bruce,

> OK, I beefed up the TODO:
>
>	* Use a fixed row count and a +/- count with MVCC visibility rules
>	  to allow fast COUNT(*) queries with no WHERE clause(?)
>
> I can always give the details if someone asks.  It doesn't seem
> complex enough for a separate TODO.detail item.

Hmmm ... this doesn't seem effort-worthy to me.  How often does anyone
do COUNT with no WHERE clause, except GUIs that give you a record
count?  (Of course, as always, if someone wants to code it, feel
free ...)  And for those GUIs, wouldn't it be 97% as good to run an
ANALYZE and give the approximate record counts for large tables?

As for counts with a WHERE clause, this is obviously up to the user.
Joe Conway and I tested using a C trigger to track some
COUNT ... GROUP BY values for large tables based on additive numbers.
It worked fairly well for accuracy, but the performance penalty on
data writes was significant: an 8% to 25% penalty for UPDATEs,
depending on the frequency and batch size (higher frequency and larger
batch size meant a higher penalty).  It's possible that this could be
improved through some mechanism more tightly integrated with the
source code.  However, the coding effort would be significant (12-20
hours), and it's possible that there would be no improvement, which is
why we didn't do it.

We also discussed an asynchronous aggregates collector that would work
something like the statistics collector, and keep pre-programmed
aggregate data, updating during "low-activity" periods.  This would
significantly reduce the performance penalty, but at the cost of
accuracy ... that is, a 1%-5% variance on high-activity tables would
be unavoidable, and all cached aggregates would have to be
recalculated on database restart, significantly slowing down startup.
Again, we felt that the effort-result payoff was not worthwhile.

Overall, I think the stuff we already have planned ... the hash
aggregates in 7.4 and Tom's suggestion of adding an indexable flag to
pg_aggs ... are far more likely to yield useful fruit than any caching
plan.
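For the record, the approximate count those GUIs would need is already
sitting in the catalogs once ANALYZE has run; something like this
(table name illustrative) is essentially free:

  -- reltuples is refreshed by VACUUM and ANALYZE; it is a float4, so
  -- cast it for display.  Reading it costs one catalog lookup.
  SELECT reltuples::bigint AS approx_rows
    FROM pg_class
   WHERE relname = 'some_large_table';

Of course, reltuples is only as fresh as the last VACUUM or ANALYZE,
which is exactly the 97%-as-good tradeoff.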
--
Josh Berkus
Aglio Database Solutions
San Francisco

From: Shridhar Daithankar
Organization: Persistent Systems Pvt. Ltd.
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] count(*) slow on large tables
Date: Mon, 06 Oct 2003 11:36:36 +0530

Bruce Momjian wrote:
> OK, I beefed up the TODO:
>
>	* Use a fixed row count and a +/- count with MVCC visibility rules
>	  to allow fast COUNT(*) queries with no WHERE clause(?)
>
> I can always give the details if someone asks.  It doesn't seem
> complex enough for a separate TODO.detail item.

May I propose an alternate approach for this optimisation?

- PostgreSQL would let users maintain user-defined variables in shared
  memory.
- These variables would obey transaction semantics but would not be
  written to disk at all.
- There would be a facility to detect whether such a variable has been
  initialized.

How would it help?  This is in addition to the trigger proposal that
came up earlier.  With triggers alone it's not possible to make values
visible across backends unless the trigger updates a table, which
eventually leads back to the vacuum/dead-tuples problem.

1. The user creates a trigger to check updates/inserts for certain
   conditions.
2. It updates the count as and when required.
3. If the trigger detects that the count is not initialized, it issues
   the real query the first time.  There is no avoiding that cost.

(A sketch of what such a trigger could look like follows at the end of
this message.)

Besides, a facility for resident variables could be used imaginatively
in other ways as well.

Does this make sense?  IMO this is a more generalised approach
overall.

Just a thought.

Shridhar
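In pseudo-SQL, the proposed scheme might look like this;
get_shared_var() and set_shared_var() are purely hypothetical
stand-ins for the shared-memory variable facility and do not exist in
any PostgreSQL release:

  -- HYPOTHETICAL: get_shared_var()/set_shared_var() stand in for the
  -- proposed transactional shared-memory variables; off-by-one
  -- details around trigger timing are glossed over.
  CREATE FUNCTION track_count() RETURNS trigger AS '
  DECLARE
      c bigint;
  BEGIN
      c := get_shared_var(''foo_count'');
      IF c IS NULL THEN
          -- Variable not initialized: pay for the real count once.
          SELECT count(*) INTO c FROM foo;
          PERFORM set_shared_var(''foo_count'', c);
      ELSIF TG_OP = ''INSERT'' THEN
          PERFORM set_shared_var(''foo_count'', c + 1);
      ELSE  -- DELETE
          PERFORM set_shared_var(''foo_count'', c - 1);
      END IF;
      RETURN NULL;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER foo_count_trig AFTER INSERT OR DELETE ON foo
      FOR EACH ROW EXECUTE PROCEDURE track_count();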
From: Harald Fuchs
Reply-To: hf99@protecting.net
Organization: Linux Private Site
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] count(*) slow on large tables
Date: 06 Oct 2003 17:08:36 +0200

In article <3F7D172E.3060107@persistent.co.in>,
Shridhar Daithankar writes:

> Dror Matalon wrote:
>> I smell a religious war in the air :-).  Can you go several days in
>> a row without doing select count(*) on any of your tables?  I
>> suspect that this is somewhat a domain-specific issue.  In some
>> areas you don't need to know the total number of rows in your
>> tables, in others you do.

> If I were you, I would have an autovacuum daemon running and, rather
> than doing select count(*), I would look at the stats generated by
> the vacuums.  They give an approximate number of tuples, and that
> should be good enough; it is accurate to within a percent.

The stats might indeed be a good estimate, provided there were not
many changes since the last VACUUM.

But how about a variant of COUNT(*) using an index?  It would not be
quite exact, since it might count tuples not visible in the current
transaction, but it might be a much better estimate than the stats.
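For what it's worth, the catalogs already carry an entry count per
index, gathered by the same VACUUM/ANALYZE machinery; a live
index-based count would in effect be an up-to-the-minute version of
this, with the over-counting caveat just mentioned (table name
illustrative):

  -- Index entry counts as of the last VACUUM/ANALYZE.
  -- Dead-but-unreclaimed heap tuples still have index entries, which
  -- is why a live index count would tend to over-count.
  SELECT i.relname            AS index_name,
         i.reltuples::bigint  AS approx_entries
    FROM pg_class t
    JOIN pg_index ix ON ix.indrelid = t.oid
    JOIN pg_class i  ON i.oid = ix.indexrelid
   WHERE t.relname = 'some_large_table';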
From: Sean Chittenden
To: Shridhar Daithankar
cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] count(*) slow on large tables
Date: Mon, 6 Oct 2003 10:01:36 -0700

> How would it help?  This is in addition to the trigger proposal that
> came up earlier.  With triggers alone it's not possible to make
> values visible across backends unless the trigger updates a table,
> which eventually leads back to the vacuum/dead-tuples problem.
>
> 1. The user creates a trigger to check updates/inserts for certain
>    conditions.
> 2. It updates the count as and when required.
> 3. If the trigger detects that the count is not initialized, it
>    issues the real query the first time.  There is no avoiding that
>    cost.
>
> Besides, a facility for resident variables could be used
> imaginatively in other ways as well.
>
> Does this make sense?  IMO this is a more generalised approach
> overall.

I do this _VERY_ frequently in my databases, only I have my stored
procs do the aggregate in a predefined MVCC table that's always
there.  Here's a denormalized version for public consumption/thought:

CREATE TABLE global.dba_aggregate_cache (
  dbl TEXT NOT NULL,  -- The database location; doesn't need to be
                      -- fully qualified (ex: schema.table.col)
  op  TEXT NOT NULL,  -- The operation: SUM, COUNT, etc.
  qual       TEXT,    -- Any kind of conditional, such as a WHERE clause
  val_int    INT,     -- Whatever the value is, of type INT
  val_bigint BIGINT,  -- Whatever the value is, of type BIGINT
  val_text   TEXT,    -- Whatever the value is, of type TEXT
  val_bytea  BYTEA    -- Whatever the value is, of type BYTEA
);
CREATE UNIQUE INDEX dba_aggregate_cache_dbl_op_udx
    ON global.dba_aggregate_cache(dbl, op);

Then, I use a function to retrieve this value instead of a SELECT
COUNT(*):

SELECT public.cache_count('dbl', 'qual'); -- in this case, the op is COUNT
SELECT public.cache_count('dbl');         -- returns the COUNT for the
                                          -- table listed in dbl

Then, I create 4 or 5 functions (depending on the op I'm performing):

1) A private function that _doesn't_ run as security definer, which
   populates the global.dba_aggregate_cache row if it's empty.

2) A STABLE function for SELECTs; if the row doesn't exist, it calls
   function #1 to populate it.

3) A function for INSERTs; if the row doesn't exist, it calls function
   #1 to populate it, then adds the necessary bits to keep it
   accurate.

4) A function for DELETEs; if the row doesn't exist, it calls function
   #1 to populate it, then subtracts the necessary bits to keep it
   accurate.

5) A function for UPDATEs; if the row doesn't exist, it calls function
   #1 to populate it, then updates the necessary bits to keep it
   accurate.  It's not uncommon for me to not have an UPDATE
   function/trigger.

Create triggers for functions 3-5, and test away.  It's MVCC, and
searching an INDEXed table for a single row is obviously vastly faster
than a seqscan plus aggregate.  If I need any kind of aggregate to be
fast, I use this system with a derivation of the above table.  The
problem with it is that I have to retrain others to use cache_count(),
or some other function, instead of COUNT(*).

That said, it'd be nice if there were a way to tell PostgreSQL to do
the above for you and teach COUNT(*), SUM(*), or other aggregates to
use an MVCC-backed cache similar to the above.  If people want their
COUNTs to be fast, then they have to live with the INSERT, UPDATE,
DELETE cost.

The above doesn't work with anything complex such as joins, but it's
certainly a start, and I think it satisfies everyone's gripes other
than the tuple churn that _does_ happen (*nudge nudge*, pg_autovacuum
could be integrated into the backend to handle this).  For those
worried about performance: the pages that are constantly being
recycled would likely stay in disk cache (PG's or the OS's).  There's
still some commit overhead, but still ... no need to over-optimize by
requiring the table to be stored in the outdated, slow, and overused
shm (also, *nudge nudge*).

Anyway, let me throw that out there as a solution that I use, and it
works quite well.  I didn't explain the use of the qual column, but I
think those who grasp the above way of handling things probably grok
how to use the qual column in a dynamically executed query.

CREATE AGGREGATE CACHE, anyone?

-sc

--
Sean Chittenden
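Filling in the piece described only in prose above, a minimal sketch
of the SELECT-side lookup (function #2) might look like the following.
cache_count() and global.dba_aggregate_cache are Sean's names;
populate_aggregate_cache() and the function body are guesses at one
plausible implementation, not his actual code:

  -- Sketch only.  populate_aggregate_cache() is a hypothetical
  -- stand-in for "function #1", which runs the expensive aggregate
  -- once and stores the result.  No volatility marker is given,
  -- since the populate path writes to the cache table.
  CREATE FUNCTION public.cache_count(TEXT) RETURNS BIGINT AS '
  DECLARE
      v BIGINT;
  BEGIN
      SELECT val_bigint INTO v
        FROM global.dba_aggregate_cache
       WHERE dbl = $1 AND op = ''COUNT'';
      IF NOT FOUND THEN
          PERFORM global.populate_aggregate_cache($1, ''COUNT'');
          SELECT val_bigint INTO v
            FROM global.dba_aggregate_cache
           WHERE dbl = $1 AND op = ''COUNT'';
      END IF;
      RETURN v;
  END;
  ' LANGUAGE plpgsql;

The INSERT-side trigger (function #3) then reduces to a single UPDATE
of val_bigint = val_bigint + 1 against the same cache row.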