postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	039680affb	Don't assume that a tuple's header size is unchanged during toasting. This assumption can be wrong when the toaster is passed a raw on-disk tuple, because the tuple might pre-date an ALTER TABLE ADD COLUMN operation that added columns without rewriting the table. In such a case the tuple's natts value is smaller than what we expect from the tuple descriptor, and so its t_hoff value could be smaller too. In fact, the tuple might not have a null bitmap at all, and yet our current opinion of it is that it contains some trailing nulls. In such a situation, toast_insert_or_update did the wrong thing, because to save a few lines of code it would use the old t_hoff value as the offset where heap_fill_tuple should start filling data. This did not leave enough room for the new nulls bitmap, with the result that the first few bytes of data could be overwritten with null flag bits, as in a recent report from Hubert Depesz Lubaczewski. The particular case reported requires ALTER TABLE ADD COLUMN followed by CREATE TABLE AS SELECT * FROM ... or INSERT ... SELECT * FROM ..., and further requires that there be some out-of-line toasted fields in one of the tuples to be copied; else we'll not reach the troublesome code. The problem can only manifest in this form in 8.4 and later, because before commit `a77eaa6a95`, CREATE TABLE AS or INSERT/SELECT wouldn't result in raw disk tuples getting passed directly to heap_insert --- there would always have been at least a junkfilter in between, and that would reconstitute the tuple header with an up-to-date t_natts and hence t_hoff. But I'm backpatching the tuptoaster change all the way anyway, because I'm not convinced there are no older code paths that present a similar risk.	2011-11-04 23:22:50 -04:00
Simon Riggs	a030bfa6e4	Move user functions related to WAL into xlogfuncs.c	2011-11-04 09:37:17 +00:00
Tom Lane	515e813543	Fix inline_set_returning_function() to allow multiple OUT parameters. inline_set_returning_function failed to distinguish functions returning generic RECORD (which require a column list in the RTE, as well as run-time type checking) from those with multiple OUT parameters (which do not). This prevented inlining from happening. Per complaint from Jay Levitt. Back-patch to 8.4 where this capability was introduced.	2011-11-03 17:54:11 -04:00
Andrew Dunstan	94cd0f1ad8	Do not treat a superuser as a member of every role for HBA purposes. This makes it possible to use reject lines with group roles. Andrew Dunstan, reviewed by Robert Haas.	2011-11-03 12:45:02 -04:00
Heikki Linnakangas	4429f6a9e3	Support range data types. Selectivity estimation functions are missing for some range type operators, which is a TODO. Jeff Davis	2011-11-03 13:42:15 +02:00
Tom Lane	7e3bf99baa	Fix handling of PlaceHolderVars in nestloop parameter management. If we use a PlaceHolderVar from the outer relation in an inner indexscan, we need to reference the PlaceHolderVar as such as the value to be passed in from the outer relation. The previous code effectively tried to reconstruct the PHV from its component expression, which doesn't work since (a) the Vars therein aren't necessarily bubbled up far enough, and (b) it would be the wrong semantics anyway because of the possibility that the PHV is supposed to have gone to null at some point before the current join. Point (a) led to "variable not found in subplan target list" planner errors, but point (b) would have led to silently wrong answers. Per report from Roger Niederland.	2011-11-03 00:50:58 -04:00
Tom Lane	1a77f8b63d	Avoid scanning nulls at the beginning of a btree index scan. If we have an inequality key that constrains the other end of the index, it doesn't directly help us in doing the initial positioning ... but it does imply a NOT NULL constraint on the index column. If the index stores nulls at this end, we can use the implied NOT NULL condition for initial positioning, just as if it had been stated explicitly. This avoids wasting time when there are a lot of nulls in the column. This is the reverse of the examples given in bugs #6278 and #6283, which were about failing to stop early when we encounter nulls at the end of the indexscan.	2011-11-02 19:35:48 -04:00
Tom Lane	882368e854	Fix btree stop-at-nulls logic properly. As pointed out by Naoya Anzai, my previous try at this was a few bricks shy of a load, because I had forgotten that the initial-positioning logic might not try to skip over nulls at the end of the index the scan will start from. We ought to fix that, because it represents an unnecessary inefficiency, but first let's get the scan-stop logic back to a safe state. With this patch, we preserve the performance benefit requested in bug #6278 for the case of scanning forward into NULLs (in a NULLS LAST index), but the reverse case of scanning backward across NULLs when there's no suitable initial-positioning qual is still inefficient.	2011-11-02 17:53:49 -04:00
Simon Riggs	750f70b0fe	Update more comments about checkpoints being done by bgwriter	2011-11-02 17:15:35 +00:00
Simon Riggs	18fb9d8d21	Reduce checkpoints and WAL traffic on low activity database server. Previously, we skipped a checkpoint if no WAL had been written since the last checkpoint, though this does not appear in user documentation. As of now, we skip a checkpoint until we have written at least enough WAL to switch to the next WAL file. This greatly reduces the level of activity and number of WAL messages generated by a very low activity server. This is safe because the purpose of a checkpoint is to act as a starting place for a recovery, in case of crash. This patch maintains minimal WAL volume for replay in case of crash, thus maintaining very low crash recovery time.	2011-11-02 15:26:33 +00:00
Simon Riggs	9aceb6ab3c	Refactor xlog.c to create src/backend/postmaster/startup.c. Startup process now has its own dedicated file, just like all other special/background processes. Reduces role and size of xlog.c.	2011-11-02 14:25:01 +00:00
Simon Riggs	86e3364899	Derive oldestActiveXid at correct time for Hot Standby. There was a timing window between when oldestActiveXid was derived and when it should have been derived that only shows itself under heavy load. Move code around to ensure correct timing of derivation. No change to StartupSUBTRANS() code, which is where this failed. Bug report by Chris Redekop	2011-11-02 08:54:56 +00:00
Simon Riggs	10b7c686e5	Start Hot Standby faster when initial snapshot is incomplete. If the initial snapshot had overflowed then we can start whenever the latest snapshot is empty, not overflowed or as we did already, start when the xmin on primary was higher than xmax of our starting snapshot, which proves we have full snapshot data. Bug report by Chris Redekop	2011-11-02 08:47:43 +00:00
Simon Riggs	2296e62a32	Remove spurious entry from missed catch while patch juggling	2011-11-02 08:37:52 +00:00
Simon Riggs	f8409b39d1	Fix timing of Startup CLOG and MultiXact during Hot Standby. Patch by me, bug report by Chris Redekop, analysis by Florian Pflug.	2011-11-02 08:07:44 +00:00
Robert Haas	c2891b46a4	Initialize myProcLocks queues just once, at postmaster startup. In assert-enabled builds, we assert during the shutdown sequence that the queues have been properly emptied, and during process startup that we are inheriting empty queues. In non-assert enabled builds, we just save a few cycles.	2011-11-01 22:44:54 -04:00
Tom Lane	391af9f784	Preserve Var location information during flatten_join_alias_vars. This allows us to give correct syntax error pointers when complaining about ungrouped variables in a join query with aggregates or GROUP BY. It's pretty much irrelevant for the planner's use of the function, though perhaps it might aid debugging sometimes.	2011-11-01 22:13:11 -04:00
Tom Lane	08e261cbc9	Fix race condition with toast table access from a stale syscache entry. If a tuple in a syscache contains an out-of-line toasted field, and we try to fetch that field shortly after some other transaction has committed an update or deletion of the tuple, there is a race condition: vacuum could come along and remove the toast tuples before we can fetch them. This leads to transient failures like "missing chunk number 0 for toast value NNNNN in pg_toast_2619", as seen in recent reports from Andrew Hammond and Tim Uckun. The design idea of syscache is that access to stale syscache entries should be prevented by relation-level locks, but that fails for at least two cases where toasted fields are possible: ANALYZE updates pg_statistic rows without locking out sessions that might want to plan queries on the same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without any meaningful lock at all. The least risky fix seems to be an idea that Heikki suggested when we were dealing with a related problem back in August: forcibly detoast any out-of-line fields before putting a tuple into syscache in the first place. This avoids the problem because at the time we fetch the parent tuple from the catalog, we should be holding an MVCC snapshot that will prevent removal of the toast tuples, even if the parent tuple is outdated immediately after we fetch it. (Note: I'm not convinced that this statement holds true at every instant where we could be fetching a syscache entry at all, but it does appear to hold true at the times where we could fetch an entry that could have a toasted field. We will need to be a bit wary of adding toast tables to low-level catalogs that don't have them already.) An additional benefit is that subsequent uses of the syscache entry should be faster, since they won't have to detoast the field. Back-patch to all supported versions. The problem is significantly harder to reproduce in pre-9.0 releases, because of their willingness to flush every entry in a syscache whenever the underlying catalog is vacuumed (cf CatalogCacheFlushRelation); but there is still a window for trouble.	2011-11-01 19:49:58 -04:00
Peter Eisentraut	654e1f96b0	Clean up whitespace and indentation in parser and scanner files. These are not touched by pgindent, so clean them up a bit manually.	2011-11-01 21:51:30 +02:00
Simon Riggs	f3ebaad45b	Comment changes to show bgwriter no longer performs checkpoints.	2011-11-01 18:48:47 +00:00
Simon Riggs	3ba182056f	Have checkpointer send stats once each processing loop. Noted by Fujii Masao	2011-11-01 18:38:27 +00:00
Simon Riggs	bf405ba8e4	Add new file for checkpointer.c	2011-11-01 18:07:29 +00:00
Simon Riggs	806a2aee37	Split work of bgwriter between 2 processes: bgwriter and checkpointer. bgwriter is now a much less important process, responsible for page cleaning duties only. checkpointer is now responsible for checkpoints and so has a key role in shutdown. Later patches will correct doc references to the now old idea that bgwriter performs checkpoints. This has a beneficial effect on performance at high write rates, but it is mainly a refactoring to more easily allow changes for power reduction, by simplifying the previously tortuous code that was required to allow page cleaning and checkpointing to time-slice in the same process. Patch by me, review by Dickson Guedes.	2011-11-01 17:14:47 +00:00
Tom Lane	6980f817e8	Stop btree indexscans upon reaching nulls in either direction. The existing scan-direction-sensitive tests were overly complex, and failed to stop the scan in cases where it's perfectly legitimate to do so. Per bug #6278 from Maksym Boguk. Back-patch to 8.3, which is as far back as the patch applies easily. Doesn't seem worth sweating over a relatively minor performance issue in 8.2 at this late date. (But note that this was a performance regression from 8.1 and before, so 8.2 is being left as an outlier.)	2011-10-31 16:40:04 -04:00
Tom Lane	6743a878a4	Support more locale-specific formatting options in cash_out(). The POSIX spec defines locale fields for controlling the ordering of the value, sign, and currency symbol in monetary output, but cash_out only supported a small subset of these options. Fully implement p/n_sign_posn, p/n_cs_precedes, and p/n_sep_by_space per spec. Fix up cash_in so that it will accept all these format variants. Also, make sure that thousands_sep is only inserted to the left of the decimal point, as required by spec. Per bug #6144 from Eduard Kracmar and discussion of bug #6277. This patch includes some ideas from Alexander Lakhin's proposed patch, though it is very different in detail.	2011-10-30 15:02:58 -04:00
Tom Lane	eb5834d5af	Further improvement of make_greater_string. Make sure that it considers all the possibilities that the old code did, instead of trying only one possibility per character position. To keep the runtime in bounds, instead tweak the character incrementers to not try every possible multibyte character code. Remove unnecessary logic to restore the old character value on failure. Additional comment and formatting cleanup.	2011-10-30 12:22:11 -04:00
Robert Haas	fae54e4a16	Update visibilitymap.c header comments. Recent work on index-only scans left this somewhat out of date.	2011-10-29 14:46:59 -04:00
Tom Lane	7609239f3e	Fix assorted bogosities in cash_in() and cash_out(). cash_out failed to handle multiple-byte thousands separators, as per bug #6277 from Alexander Law. In addition, cash_in didn't handle that either, nor could it handle multiple-byte positive_sign. Both routines failed to support multiple-byte mon_decimal_point, which I did not think was worth changing, but at least now they check for the possibility and fall back to using '.' rather than emitting invalid output. Also, make cash_in handle trailing negative signs, which formerly it would reject. Since cash_out generates trailing negative signs whenever the locale tells it to, this last omission represents a fail-to-reload-dumped-data bug. IMO that justifies patching this all the way back.	2011-10-29 14:32:06 -04:00
Robert Haas	78d523b633	Improve make_greater_string() with encoding-specific incrementers. This infrastructure doesn't in any way guarantee that the character we produce will sort before the one we incremented; but it does at least make it much more likely that we'll end up with something that is a valid character, which improves our chances. Kyotaro Horiguchi, with various adjustments by me.	2011-10-29 14:22:20 -04:00
Robert Haas	53f1ca59b5	Allow hint bits to be set sooner for temporary and unlogged tables. We need not wait until the commit record is durably on disk, because in the event of a crash the page we're updating with hint bits will be gone anyway. Per off-list report from Heikki Linnakangas, this can significantly degrade the performance of unlogged tables; I was able to show a 2x speedup from this patch on a pgbench run with scale factor 15. In practice, this will mostly help small, heavily updated tables, because on larger tables you're unlikely to run into the same row again before the commit record makes it out to disk.	2011-10-28 17:08:09 -04:00
Heikki Linnakangas	cbf65509bb	Fix the number of lwlocks needed by the "fast path" lock patch. It needs one lock per backend or auxiliary process - the need for a lock for each aux process was not accounted for in NumLWLocks(). No one noticed, because the three locks needed for the three aux processes fit into the few extra lwlocks we allocate for 3rd party modules that don't call RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS, 4 by default).	2011-10-27 22:39:58 +03:00
Tom Lane	3e4b3465b6	Improve planner's ability to recognize cases where an IN's RHS is unique. If the right-hand side of a semijoin is unique, then we can treat it like a normal join (or another way to say that is: we don't need to explicitly unique-ify the data before doing it as a normal join). We were recognizing such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP BY decoration, but there's another way: if the RHS is a plain relation with unique indexes, we can check if any of the indexes prove the output is unique. Most of the infrastructure for that was there already in the join removal code, though I had to rearrange it a bit. Per reflection about a recent example in pgsql-performance.	2011-10-26 17:52:29 -04:00
Tom Lane	1e3b21dd5e	Change FK trigger naming convention to fix self-referential FKs. Use names like "RI_ConstraintTrigger_a_NNNN" for FK action triggers and "RI_ConstraintTrigger_c_NNNN" for FK check triggers. This ensures the action trigger fires first in self-referential cases where the very same row update fires both an action and a check trigger. This change provides a non-probabilistic solution for bug #6268, at the risk that it could break client code that is making assumptions about the exact names assigned to auto-generated FK triggers. Hence, change this in HEAD only. No need for forced initdb since old triggers continue to work fine.	2011-10-26 13:19:42 -04:00
Tom Lane	58958726ff	Change FK trigger creation order to better support self-referential FKs. When a foreign-key constraint references another column of the same table, row updates will queue both the PK's ON UPDATE action and the FK's CHECK action in the same event. The ON UPDATE action must execute first, else the CHECK will check a non-final state of the row and possibly throw an inappropriate error, as seen in bug #6268 from Roman Lytovchenko. Now, the firing order of multiple triggers for the same event is determined by the sort order of their pg_trigger.tgnames, and the auto-generated names we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the trigger OID. So most of the time the firing order is the same as creation order, and so rearranging the creation order fixes it. This patch will fail to fix the problem if the OID counter wraps around or adds a decimal digit (eg, from 99999 to 100000) while we are creating the triggers for an FK constraint. Given the small odds of that, and the low usage of self-referential FKs, we'll live with that solution in the back branches. A better fix is to change the auto-generated names for FK triggers, but it seems unwise to do that in stable branches because there may be client code that depends on the naming convention. We'll fix it that way in HEAD in a separate patch. Back-patch to all supported branches, since this bug has existed for a long time.	2011-10-26 13:02:28 -04:00
Magnus Hagander	a87b9ae161	Make event_source visible on all platforms. On non-Windows platforms, we just ignore any value set there. Noted by Jaime Casanova.	2011-10-25 22:40:58 +02:00
Magnus Hagander	d8ea33f2c0	Support configurable eventlog application names on Windows. This allows different instances to use the eventlog with different identifiers, by setting the event_source GUC, similar to how syslog_ident works. Original patch by MauMau, heavily modified by Magnus Hagander.	2011-10-25 20:02:55 +02:00
Tom Lane	0f39d5050d	Don't trust deferred-unique indexes for join removal. The uniqueness condition might fail to hold intra-transaction, and assuming it does can give incorrect query results. Per report from Marti Raudsepp, though this is not his proposed patch. Back-patch to 9.0, where both these features were introduced. In the released branches, add the new IndexOptInfo field to the end of the struct, to try to minimize ABI breakage for third-party code that may be examining that struct.	2011-10-23 00:43:39 -04:00
Tom Lane	bb446b689b	Support synchronization of snapshots through an export/import procedure. A transaction can export a snapshot with pg_export_snapshot(), and then others can import it with SET TRANSACTION SNAPSHOT. The data does not leave the server so there are no security issues. A snapshot can only be imported while the exporting transaction is still running, and there are some other restrictions. I'm not totally convinced that we've covered all the bases for SSI (true serializable) mode, but it works fine for lesser isolation modes. Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified by Tom Lane.	2011-10-22 18:23:30 -04:00
Heikki Linnakangas	b436c72f61	Fix overly-complicated usage of errcode_for_file_access(). No need to do "errcode(errcode_for_file_access())", just "errcode_for_file_access()" is enough. The extra errcode() call is useless but harmless, so there's no user-visible bug here. Nevertheless, backpatch to 9.1 where this code was added.	2011-10-22 20:19:50 +03:00
Tom Lane	f9c92a5a3e	Code review for pgstat_get_crashed_backend_activity patch. Avoid possibly dumping core when pgstat_track_activity_query_size has a less-than-default value; avoid uselessly searching for the query string of a successfully-exited backend; don't bother putting out an ERRDETAIL if we don't have a query to show; some other minor stylistic improvements.	2011-10-21 16:36:04 -04:00
Tom Lane	5ac5980744	More cleanup after failed reduced-lock-levels-for-DDL feature. Turns out that use of ShareUpdateExclusiveLock or ShareRowExclusiveLock to protect DDL changes had gotten copied into several places that were not touched by either of Simon's original patches for the feature, and thus neither he nor I thought to revert them. (Indeed, it appears that two of these uses were committed after the reversion, which just goes to show that git merging is no panacea.) Change these places to use AccessExclusiveLock again. If we ever manage to resurrect that feature, we're going to have to think a bit harder about how to keep lock level usage in sync for DDL operations that aren't within the AlterTable infrastructure. Two of these bugs are only in HEAD, but one is in the 9.1 branch too. Alvaro found one of them, I found the other two.	2011-10-21 13:50:30 -04:00
Robert Haas	c8e8b5a6e2	Try to log the current query string when a backend crashes. To minimize risk inside the postmaster, we subject this feature to a number of significant limitations. We very much wish to avoid doing any complex processing inside the postmaster, due to the possibility that the crashed backend has completely corrupted shared memory. To that end, no encoding conversion is done; instead, we just replace anything that doesn't look like an ASCII character with a question mark. We limit the amount of data copied to 1024 characters, and carefully sanity check the source of that data. While these restrictions would doubtless be unacceptable in a general-purpose logging facility, even this limited facility seems like an improvement over the status quo ante. Marti Raudsepp, reviewed by PDXPUG and myself.	2011-10-21 13:26:40 -04:00
Robert Haas	980261929f	Fix DROP OPERATOR FAMILY IF EXISTS. Essentially, the "IF EXISTS" portion was being ignored, and an error thrown anyway if the opfamily did not exist. I broke this in commit fd1843ff8979c0461fb3f1a9eab61140c977e32d; so backpatch to 9.1.X. Report and diagnosis by KaiGai Kohei.	2011-10-21 09:12:23 -04:00
Tom Lane	b4a0223d00	Simplify and improve ProcessStandbyHSFeedbackMessage logic. There's no need to clamp the standby's xmin to be greater than GetOldestXmin's result; if there were any such need this logic would be hopelessly inadequate anyway, because it fails to account for within-database versus cluster-wide values of GetOldestXmin. So get rid of that, and just rely on sanity-checking that the xmin is not wrapped around relative to the nextXid counter. Also, don't reset the walsender's xmin if the current feedback xmin is indeed out of range; that just creates more problems than we already had. Lastly, don't bother to take the ProcArrayLock; there's no need to do that to set xmin. Also improve the comments about this in GetOldestXmin itself.	2011-10-20 19:43:31 -04:00
Robert Haas	8f3362d4b7	Fix get_object_namespace() not to think extensions are "in" a schema. extnamespace means something altogether different in this context. Mostly by accident, this coding error (introduced in my commit `82a4a777d9`) broke the buildfarm instead of just silently doing the wrong thing.	2011-10-20 00:07:41 -04:00
Robert Haas	1d751018d8	Add "skipping" to the NOTICE produced by DROP OPERATOR CLASS IF EXISTS. This makes this message consistent with all the other similar notices produced by other DROP IF EXISTS commands. Noted by KaiGai Kohei	2011-10-19 23:45:31 -04:00
Robert Haas	82a4a777d9	Consolidate DROP handling for some object types. This gets rid of a significant amount of duplicative code. KaiGai Kohei, reviewed in earlier versions by Dimitri Fontaine, with further review and cleanup by me.	2011-10-19 23:27:19 -04:00
Tom Lane	aa90e148ca	Suppress -Wunused-result warnings about write() and fwrite(). This is merely an exercise in satisfying pedants, not a bug fix, because in every case we were checking for failure later with ferror(), or else there was nothing useful to be done about a failure anyway. Document the latter cases.	2011-10-18 21:37:51 -04:00
Tom Lane	e27f52f3a1	Reject empty pg_hba.conf files. An empty HBA file is surely an error, since it means there is no way to connect to the server. We've not heard identifiable reports of people actually doing that, but this will also close off the case Thom Brown just complained of, namely pointing hba_file at a directory. (On at least some platforms with some directories, it will read as an empty file.) Perhaps this should be back-patched, but given the lack of previous complaints, I won't add extra work for the translators.	2011-10-18 20:09:18 -04:00
Magnus Hagander	d1e25b78f9	Exclude postmaster.opts from base backups. Noted by Fujii Masao.	2011-10-18 15:58:37 +02:00
Tom Lane	336c1d7a51	Avoid assuming that index-only scan data matches the index's rowtype. In general the data returned by an index-only scan should have the datatypes originally computed by FormIndexDatum. If the index opclasses use "storage" datatypes different from their input datatypes, the scan tuple will not have the same rowtype attributed to the index; but we had a hard-wired assumption that that was true in nodeIndexonlyscan.c. We'd already hacked around the issue for the one case where the types are different in btree indexes (btree name_ops), but this would definitely come back to bite us if we ever implement index-only scans in GiST. To fix, require the index AM to explicitly provide the tupdesc for the tuple it is returning. btree can just pass back the index's tupdesc, but GiST will have to work harder when and if it supports index-only scans. I had previously proposed fixing this by allowing the index AM to fill the scan tuple slot directly; but on reflection that seemed like a module layering violation, since TupleTableSlots are creatures of the executor. At least in the btree case, it would also be less efficient, since the tuple deconstruction work would occur even for rows later found to be invisible to the scan's snapshot.	2011-10-16 19:15:04 -04:00
Tom Lane	9e8da0f757	Teach btree to handle ScalarArrayOpExpr quals natively. This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain indexscans, and particularly in index-only scans.	2011-10-16 15:39:24 -04:00
Tom Lane	d26e1ebaf5	Fix bugs in information_schema.referential_constraints view. This view was being insufficiently careful about matching the FK constraint to the depended-on primary or unique key constraint. That could result in failure to show an FK constraint at all, or showing it multiple times, or claiming that it depended on a different constraint than the one it really does. Fix by joining via pg_depend to ensure that we find only the correct dependency. Back-patch, but don't bump catversion because we can't force initdb in back branches. The next minor-version release notes should explain that if you need to fix this in an existing installation, you can drop the information_schema schema then re-create it by sourcing $SHAREDIR/information_schema.sql in each database (as a superuser of course).	2011-10-14 20:24:17 -04:00
Tom Lane	e6858e6657	Measure the number of all-visible pages for use in index-only scan costing. Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. This is pretty primitive and will no doubt see refinements once we've acquired more field experience with the index-only scan mechanism, but it's way better than using a constant. Note: I had to adjust an underspecified query in the window.sql regression test, because it was changing answers when the plan changed to use an index-only scan. Some of the adjacent tests perhaps should be adjusted as well, but I didn't do that here.	2011-10-14 17:23:46 -04:00
Robert Haas	393e828e31	Avoid potential relcache leak in objectaddress.c. Nobody is using the missing_ok flag yet, but let's speculate that this will be a better interface for future callers. KaiGai Kohei, with some adjustments by me.	2011-10-14 11:35:40 -04:00
Bruce Momjian	0180bd6180	Remove all "traces" of trace_userlocks, because userlocks were removed in PG 8.2.	2011-10-13 19:59:57 -04:00
Tom Lane	7b96519fe2	Don't mark auto-generated types as extension members. Relation rowtypes and automatically-generated array types do not need to have their own extension membership dependency entries. If we create such then it becomes more difficult to remove items from an extension, and it's also harder for an extension upgrade script to make sure it duplicates the dependencies created by the extension's regular installation script. I changed the code in such a way that this happened in commit `988cccc620`, I think because of worries about the shell-type-replacement case; but that cure was worse than the disease. It would only matter if one extension created a shell type that was replaced with an auto-generated type in another extension, which seems pretty far-fetched. Better to make this work unsurprisingly in normal cases. Report and patch by Robert Haas, comment adjustments by me.	2011-10-12 18:41:49 -04:00
Bruce Momjian	484af9b376	Modify RelationGetBufferForTuple() to use a typedef, rather than a struct, to help pgindent.	2011-10-12 16:53:54 -04:00
Tom Lane	458857cc9d	Throw a useful error message if an extension script file is fed to psql. We have seen one too many reports of people trying to use 9.1 extension files in the old-fashioned way of sourcing them in psql. Not only does that usually not work (due to failure to substitute for MODULE_PATHNAME and/or @extschema@), but if it did work they'd get a collection of loose objects not an extension. To prevent this, insert an \echo ... \quit line that prints a suitable error message into each extension script file, and teach commands/extension.c to ignore lines starting with \echo. That should not only prevent any adverse consequences of loading a script file the wrong way, but make it crystal clear to users that they need to do it differently now. Tom Lane, following an idea of Andrew Dunstan's. Back-patch into 9.1 ... there is not going to be much value in this if we wait till 9.2.	2011-10-12 15:45:03 -04:00
Tom Lane	8c8ba6d11b	Add comment on why pulling data from a "name" index column can't crash. It's been bothering me for several days that pretending that the cstring data stored in a btree name_ops column is really a "name" Datum could lead to reading past the end of memory. However, given the current memory layout used for index-only scans in the btree code, a crash is in fact not possible. Document that so we don't break it. I have not thought of any other solutions that aren't fairly ugly too, and most of them lose the functionality of index-only scans on name columns altogether, so this seems like the way to go.	2011-10-11 18:40:53 -04:00
Tom Lane	cb6771fb32	Generate index-only scan tuple descriptor from the plan node's indextlist. Dept. of second thoughts: as long as we've got that tlist hanging around anyway, we can apply ExecTypeFromTL to it to get a suitable descriptor for the ScanTupleSlot. This is a nicer solution than the previous one because it eliminates some hard-wired knowledge about btree name_ops, and because it avoids the somewhat shaky assumption that we needn't set up the scan tuple descriptor in EXPLAIN_ONLY mode. It doesn't change what actually happens at run-time though, and I'm still a bit nervous about that.	2011-10-11 18:12:57 -04:00
Tom Lane	600d3206d1	Consider index-only scans even when there is no matching qual or ORDER BY. By popular demand.	2011-10-11 15:00:30 -04:00
Tom Lane	a0185461dd	Rearrange the implementation of index-only scans. This commit changes index-only scans so that data is read directly from the index tuple without first generating a faux heap tuple. The only immediate benefit is that indexes on system columns (such as OID) can be used in index-only scans, but this is necessary infrastructure if we are ever to support index-only scans on expression indexes. The executor is now ready for that, though the planner still needs substantial work to recognize the possibility. To do this, Vars in index-only plan nodes have to refer to index columns not heap columns. I introduced a new special varno, INDEX_VAR, to mark such Vars to avoid confusion. (In passing, this commit renames the two existing special varnos to OUTER_VAR and INNER_VAR.) This allows ruleutils.c to handle them with logic similar to what we use for subplan reference Vars. Since index-only scans are now fundamentally different from regular indexscans so far as their expression subtrees are concerned, I also chose to change them to have their own plan node type (and hence, their own executor source file).	2011-10-11 14:21:30 -04:00
Robert Haas	fa351d5a0d	Replace hardcoded switch in object_exists() with a lookup table. There's no particular advantage to this change on its face; indeed, it's possible that this might be slightly slower than the old way. But it makes this information more easily accessible to other functions, and therefore paves the way for future code consolidation. Performance isn't critical here, so there's no need to be smart about how we do the search. This is a heavily cut-down version of a patch from KaiGai Kohei, with several fixes by me. Additional review from Dimitri Fontaine.	2011-10-11 09:14:30 -04:00
Robert Haas	e76bcaba9c	Repair breakage in VirtualXactLock. I broke this in commit `84e3712677`. Report and fix by Fujii Masao.	2011-10-11 07:39:09 -04:00
Bruce Momjian	e26d5fcd94	Mark GUC external_pid_file's default as '' in postgresql.conf, rather than '(none)'.	2011-10-10 08:17:10 -04:00
Robert Haas	c0f03aae04	Fix ALTER TABLE ONLY .. DROP CONSTRAINT. When I consolidated two copies of the HOT-chain search logic in commit `4da99ea423`, I introduced a behavior change: the old code wouldn't necessarily traverse the entire chain, if the most recently returned tuple were updated while the HOT chain traversal is in progress. The new behavior seems more correct, but unfortunately, the code here relies on a scan with SnapshotNow failing to see its own updates. That seems pretty shaky even with the old HOT chain traversal behavior, since there's no guarantee that these updates will always be HOT, but it's trivial to provoke a failure with the new HOT search logic. Fix by updating just the first matching pg_constraint tuple, rather than all of them, since there should be only one anyway. But since nobody has reproduced this failure on older versions, no back-patch for now. Report and test case by Alex Hunsaker; tablecmds.c changes by me.	2011-10-09 23:39:52 -04:00
Heikki Linnakangas	d50e125194	Clean up a couple of box gist helper functions. The original idea of this patch was to make box picksplit run faster, by eliminating unnecessary palloc() overhead, but that was obsoleted by the new double-sorting split algorithm that doesn't call these functions so heavily anymore. Nevertheless, the code looks better this way. Original patch by me, reviewed and tidied up after the double-sorting patch by Kevin Grittner.	2011-10-09 18:59:34 +03:00
Tom Lane	cbfa92c23c	Improve index-only scans to avoid repeated access to the index page. We copy all the matched tuples off the page during _bt_readpage, instead of expensively re-locking the page during each subsequent tuple fetch. This costs a bit more local storage, but not more than 2*BLCKSZ worth, and the reduction in LWLock traffic is certainly worth that. What's more, this lets us get rid of the API wart in the original patch that said an index AM could randomly decline to supply an index tuple despite having asserted pg_am.amcanreturn. That will be important for future improvements in the index-only-scan feature, since the executor will now be able to rely on having the index data available.	2011-10-09 00:21:08 -04:00
Tom Lane	b324384f6b	Fix brain fade in cost estimation for index-only scans. visibility_fraction should not be applied to regular indexscans. Noted by Cédric Villemain.	2011-10-08 10:41:17 -04:00
Heikki Linnakangas	1ef60dab70	Don't let transform_null_equals=on affect CASE foo WHEN NULL ... constructs. transform_null_equals is only supposed to affect "foo = NULL" expressions given directly by the user, not the internal "foo = NULL" expression generated from CASE-WHEN. This fixes bug #6242, reported by Sergey. Backpatch to all supported branches.	2011-10-08 11:17:40 +03:00
Tom Lane	a2822fb933	Support index-only scans using the visibility map to avoid heap fetches. When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.	2011-10-07 20:14:13 -04:00
Magnus Hagander	7aeff9f4a4	Ensure walsenders can be SIGTERMed while in non-walsender code. In order to exit on SIGTERM when in non-walsender code, such as do_pg_stop_backup(), we need to set the interrupt variables that are used there, and not just the walsender local ones.	2011-10-06 21:43:14 +02:00
Bruce Momjian	aaa6e1def2	Add postmaster -C option to query configuration parameters, and have pg_ctl use that to query the data directory for config-only installs. This fixes awkward or impossible pg_ctl operation for config-only installs.	2011-10-06 09:38:39 -04:00
Heikki Linnakangas	7f3bd86843	Replace the "New Linear" GiST split algorithm for boxes and points with a new double-sorting algorithm. The new algorithm produces better quality trees, making searches faster. Alexander Korotkov	2011-10-06 10:03:46 +03:00
Tom Lane	ba6f629326	Improve and simplify CREATE EXTENSION's management of GUC variables. CREATE EXTENSION needs to transiently set search_path, as well as client_min_messages and log_min_messages. We were doing this by the expedient of saving the current string value of each variable, doing a SET LOCAL, and then doing another SET LOCAL with the previous value at the end of the command. This is a bit expensive though, and it also fails badly if there is anything funny about the existing search_path value, as seen in a recent report from Roger Niederland. Fortunately, there's a much better way, which is to piggyback on the GUC infrastructure previously developed for functions with SET options. We just open a new GUC nesting level, do our assignments with GUC_ACTION_SAVE, and then close the nesting level when done. This automatically restores the prior settings without a re-parsing pass, so (in principle anyway) there can't be an error. And guc.c still takes care of cleanup in event of an error abort. The CREATE EXTENSION code for this was modeled on some much older code in ri_triggers.c, which I also changed to use the better method, even though there wasn't really much risk of failure there. Also improve the comments in guc.c to reflect this additional usage.	2011-10-05 20:44:16 -04:00
Tom Lane	41e461d36f	Improve define_custom_variable's handling of pre-existing settings. Arrange for any problems with pre-existing settings to be reported as WARNING not ERROR, so that we don't undesirably abort the loading of the incoming add-on module. The bad setting is just discarded, as though it had never been applied at all. (This requires a change in the API of set_config_option. After some thought I decided the most potentially useful addition was to allow callers to just pass in a desired elevel.) Arrange to restore the complete stacked state of the variable, rather than cheesily reinstalling only the active value. This ensures that custom GUCs will behave unsurprisingly even when the module loading operation occurs within nested subtransactions that have changed the active value. Since a module load could occur as a result of, eg, a PL function call, this is not an unlikely scenario.	2011-10-04 19:57:21 -04:00
Tom Lane	fa56a0c3e0	Fix uninitialized-variable bug.	2011-10-04 17:08:18 -04:00
Tom Lane	4bcb82a7d5	Add sourcefile/sourceline data to EXEC_BACKEND GUC transmission files. This oversight meant that on Windows, the pg_settings view would not display source file or line number information for values coming from postgresql.conf, unless the backend had received a SIGHUP since starting. In passing, also make the error detection in read_nondefault_variables a tad more thorough, and fix it to not lose precision on float GUCs (these changes are already in HEAD as of my previous commit).	2011-10-04 16:47:48 -04:00
Tom Lane	9f5836d224	Remember the source GucContext for each GUC parameter. We used to just remember the GucSource, but saving GucContext too provides a little more information --- notably, whether a SET was done by a superuser or regular user. This allows us to rip out the fairly dodgy code that define_custom_variable used to use to try to infer the context to re-install a pre-existing setting with. In particular, it now works for a superuser to SET an extension's SUSET custom variable before loading the associated extension, because GUC can remember whether the SET was done as a superuser or not. The plperl regression tests contain an example where this is useful.	2011-10-04 16:13:50 -04:00
Alvaro Herrera	09e196e453	Use callbacks in SlruScanDirectory for the actual action. Previously, the code assumed that the only possible action to take was to delete files behind a certain cutoff point. The async notify code was already a crock: it used a different "pagePrecedes" function for truncation than for regular operation. By allowing it to pass a callback to SlruScanDirectory it can do cleanly exactly what it needs to do. The clog.c code also had its own use for SlruScanDirectory, which is made a bit simpler with this.	2011-10-04 14:03:23 -03:00
Tom Lane	1a00c0ef53	Remove the custom_variable_classes parameter. This variable provides only marginal error-prevention capability (since it can only check the prefix of a qualified GUC name), and the consensus is that that isn't worth the amount of hassle that maintaining the setting creates for DBAs. So, let's just remove it. With this commit, the system will silently accept a value for any qualified GUC name at all, whether it has anything to do with any known extension or not. (Unqualified names still have to match known built-in settings, though; and you will get a WARNING at extension load time if there's an unrecognized setting with that extension's prefix.) There's still some discussion ongoing about whether to tighten that up and if so how; but if we do come up with a solution, it's not likely to look anything like custom_variable_classes.	2011-10-04 12:36:55 -04:00
Tom Lane	76074fcaa0	ProcedureCreate neglected to record dependencies on default expressions. Thus, an object referenced in a default expression could be dropped while the function remained present. This was unaccountably missed in the original patch to add default parameters for functions. Reported by Pavel Stehule.	2011-10-03 12:13:15 -04:00
Tom Lane	d56b3afc03	Restructure error handling in reading of postgresql.conf. This patch has two distinct purposes: to report multiple problems in postgresql.conf rather than always bailing out after the first one, and to change the policy for whether changes are applied when there are unrelated errors in postgresql.conf. Formerly the policy was to apply no changes if any errors could be detected, but that had a significant consistency problem, because in some cases specific values might be seen as valid by some processes but invalid by others. This meant that the latter processes would fail to adopt changes in other parameters even though the former processes had done so. The new policy is that during SIGHUP, the file is rejected as a whole if there are any errors in the "name = value" syntax, or if any lines attempt to set nonexistent built-in parameters, or if any lines attempt to set custom parameters whose prefix is not listed in (the new value of) custom_variable_classes. These tests should always give the same results in all processes, and provide what seems a reasonably robust defense against loading values from badly corrupted config files. If these tests pass, all processes will apply all settings that they individually see as good, ignoring (but logging) any they don't. In addition, the postmaster does not abandon reading a configuration file after the first syntax error, but continues to read the file and report syntax errors (up to a maximum of 100 syntax errors per file). The postmaster will still refuse to start up if the configuration file contains any errors at startup time, but these changes allow multiple errors to be detected and reported before quitting. Alexey Klyukin, reviewed by Andy Colson and av (Alexander ?) with some additional hacking by Tom Lane	2011-10-02 16:50:04 -04:00
Tom Lane	5ec6b7f1b8	Improve generated column names for cases involving sub-SELECTs. We'll now use "exists" for EXISTS(SELECT ...), "array" for ARRAY(SELECT ...), or the sub-select's own result column name for a simple expression sub-select. Previously, you usually got "?column?" in such cases. Marti Raudsepp, reviewed by Kyotaro Horiguchi	2011-10-01 14:01:46 -04:00
Tom Lane	d22a09dc70	Support GiST index support functions that want to cache data across calls. pg_trgm was already doing this unofficially, but the implementation hadn't been thought through very well and leaked memory. Restructure the core GiST code so that it actually works, and document it. Ordinarily this would have required an extra memory context creation/destruction for each GiST index search, but I was able to avoid that in the normal case of a non-rescanned search by finessing the handling of the RBTree. It used to have its own context always, but now shares a context with the scan-lifespan data structures, unless there is more than one rescan call. This should make the added overhead unnoticeable in typical cases.	2011-09-30 19:48:57 -04:00
Tom Lane	79edb2b1dc	Fix recursion into previously planned sub-query in examine_simple_variable. This code was looking at the sub-Query tree as seen in the parent query's RangeTblEntry; but that's the pristine parser output, and what we need to look at is the tree as it stands at the completion of planning. Otherwise we might pick up a Var that references a subquery that got flattened and hence has no RelOptInfo in the subroot. Per report from Peter Geoghegan.	2011-09-29 18:13:16 -04:00
Bruce Momjian	054219c907	Fix pg_upgrade for EXEC_BACKEND builds (e.g. Windows) by properly passing the -b/binary-upgrade flag. Backpatch to 9.1.X.	2011-09-29 17:21:34 -04:00
Tom Lane	cb37c29106	Fix index matching for operators with mixed collatable/noncollatable inputs. If an indexable operator for a non-collatable indexed datatype has a collatable right-hand input type, any OpExpr for it will be marked with a nonzero inputcollid (since having one collatable input is sufficient to make that happen). However, an index on a non-collatable column certainly doesn't have any collation. This caused us to fail to match such operators to their indexes, because indxpath.c required an exact match of index collation and clause collation. It seems correct to allow a match when the index is collation-less regardless of the clause's inputcollid: an operator with both noncollatable and collatable inputs could perhaps depend on the collation of the collatable input, but it could hardly expect the index for the noncollatable input to have that same collation. Per bug #6232 from Pierre Ducroquet. His example is specifically about "hstore ? text" but the problem seems quite generic.	2011-09-29 00:43:42 -04:00
Robert Haas	f70648d5a1	Update comments related to the crash-safety of the visibility map. In hio.c, document how we avoid deadlock with respect to visibility map buffer locks. In visibilitymap.c, update the LOCKING section of the file header comment. Both oversights noted by Heikki Linnakangas.	2011-09-27 09:30:23 -04:00
Robert Haas	624f155ffa	heap_update() must recheck tuple after unlocking and relocking buffer. Bug found by Alvaro Herrera, fix suggested by Heikki Linnakangas and reviewed by Tom Lane.	2011-09-27 08:24:18 -04:00
Tom Lane	269c5dd2f4	Fix window functions that sort by expressions involving aggregates. In commit `c1d9579dd8`, I changed things so that the output of the Agg node that feeds the window functions would not list any ungrouped Vars directly. Formerly, for example, the Agg tlist might have included both "x" and "sum(x)", which is not really valid if "x" isn't a grouping column. If we then had a window function ordering on something like "sum(x) + 1", prepare_sort_from_pathkeys would find no exact match for this in the Agg tlist, and would conclude that it must recompute the expression. But it would break the expression down to just the Var "x", which it would find in the tlist, and then rebuild the ORDER BY expression using a reference to the subplan's "x" output. Now, after the above-referenced changes, "x" isn't in the Agg tlist if it's not a grouping column, so that prepare_sort_from_pathkeys fails with "could not find pathkey item to sort", as reported by Bricklen Anderson. The fix is to not break down Aggrefs into their component parts, but just treat them as irreducible expressions to be sought in the subplan tlist. This is definitely OK for the use with respect to window functions in grouping_planner, since it just built the tlist being used on the same basis. AFAICT it is safe for other uses too; most of the other call sites couldn't encounter Aggrefs anyway.	2011-09-26 23:48:39 -04:00
Tom Lane	57eb009092	Allow snapshot references to still work during transaction abort. In REPEATABLE READ (nee SERIALIZABLE) mode, an attempt to do GetTransactionSnapshot() between AbortTransaction and CleanupTransaction failed, because GetTransactionSnapshot would recompute the transaction snapshot (which is already wrong, given the isolation mode) and then re-register it in the TopTransactionResourceOwner, leading to an Assert because the TopTransactionResourceOwner should be empty of resources after AbortTransaction. This is the root cause of bug #6218 from Yamamoto Takashi. While changing plancache.c to avoid requesting a snapshot when handling a ROLLBACK masks the problem, I think this is really a snapmgr.c bug: it's lower-level than the resource manager mechanism and should not be shutting itself down before we unwind resource manager resources. However, just postponing the release of the transaction snapshot until cleanup time didn't work because of the circular dependency with TopTransactionResourceOwner. Fix by managing the internal reference to that snapshot manually instead of depending on TopTransactionResourceOwner. This saves a few cycles as well as making the module layering more straightforward. predicate.c's dependencies on TopTransactionResourceOwner go away too. I think this is a longstanding bug, but there's no evidence that it's more than a latent bug, so it doesn't seem worth any risk of back-patching.	2011-09-26 22:25:28 -04:00
Robert Haas	821fd903f9	Update obsolete comments. This was partially fixed by `57fdb2b0d8`, back in 2005, but it missed a couple of spots. YAMAMOTO Takashi	2011-09-26 13:12:22 -04:00
Tom Lane	21fb95da46	Use a fresh copy of query_list when making a second plan in GetCachedPlan. The code path that tried a generic plan, didn't like it, and then made a custom plan was mistakenly passing the same copy of the query_list to the planner both times. This doesn't work too well for nontrivial queries, since the planner tends to scribble on its input. Diagnosis and fix by Yamamoto Takashi.	2011-09-26 12:44:17 -04:00
Tom Lane	d5aa7a9fe6	Avoid unnecessary snapshot-acquisitions in BuildCachedPlan. I had copied-and-pasted a claim that we couldn't reach this point when dealing with utility statements, but that was a leftover from when the caller was required to supply a plan to start with. We now will go through here at least once when handling a utility statement, so it seems worth a check to see whether a snapshot is actually needed. (Note that analyze_requires_snapshot is quite a cheap test.) Per suggestion from Yamamoto Takashi. I don't think I believe that this resolves his reported assertion failure; but it's worth changing anyway, just to save a cycle or two.	2011-09-25 17:34:20 -04:00
Tom Lane	7741dd6590	Recognize self-contradictory restriction clauses for non-table relations. The constraint exclusion feature checks for contradictions among scan restriction clauses, as well as contradictions between those clauses and a table's CHECK constraints. The first aspect of this testing can be useful for non-table relations (such as subqueries or functions-in-FROM), but the feature was coded with only the CHECK case in mind so we were applying it only to plain-table RTEs. Move the relation_excluded_by_constraints call so that it is applied to all RTEs not just plain tables. With the default setting of constraint_exclusion this results in no extra work, but with constraint_exclusion = ON we will detect optimizations that we missed before (at the cost of more planner cycles than we expended before). Per a gripe from Gunnlaugur Þór Briem. Experimentation with his example also showed we were not being very bright about the case where constraint exclusion is proven within a subquery within UNION ALL, so tweak the code to allow set_append_rel_pathlist to recognize such cases.	2011-09-24 19:33:16 -04:00
Robert Haas	0c8eda6258	Memory barrier support for PostgreSQL. This is not actually used anywhere yet, but it gets the basic infrastructure in place. It is fairly likely that there are bugs, and support for some important platforms may be missing, so we'll need to refine this as we go along.	2011-09-23 17:52:43 -04:00
Tom Lane	f197272365	Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps. This provides information about the numbers of tuples that were visited but not returned by table scans, as well as the numbers of join tuples that were considered and discarded within a join plan node. There is still some discussion going on about the best way to report counts for outer-join situations, but I think most of what's in the patch would not change if we revise that, so I'm going to go ahead and commit it as-is. Documentation changes to follow (they weren't in the submitted patch either). Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom	2011-09-22 11:30:11 -04:00
Robert Haas	4893552e21	Fix another bit of unlogged-table-induced breakage. Per bug #6205, reported by Abel Abraham Camarillo Ojeda. This isn't a particularly elegant fix, but I'm trying to minimize the chances of causing yet another round of breakage. Adjust regression tests to exercise this case.	2011-09-21 10:48:31 -04:00
Tom Lane	2562dcea81	Suppress "unused function" warning when not HAVE_LOCALE_T. Forgot to consider this case ...	2011-09-20 17:47:21 -04:00
Tom Lane	37d4fd2b9d	Improve reporting of newlocale() failures in CREATE COLLATION. The standardized errno code for "no such locale" failures is ENOENT, which we were just reporting at face value, viz "No such file or directory". Per gripe from Thom Brown, this might confuse users, so add an errdetail message to clarify what it means. Also, report newlocale() failures as ERRCODE_INVALID_PARAMETER_VALUE rather than using errcode_for_file_access(), since newlocale()'s errno values aren't necessarily tied directly to file access failures.	2011-09-20 13:23:40 -04:00
Tom Lane	c4ae968633	Fix Assert failure in new plancache code. The regression tests were failing with CLOBBER_CACHE_ALWAYS enabled, as reported by buildfarm member jaguar. There was an Assert in BuildCachedPlan that asserted that the CachedPlanSource hadn't been invalidated since we called RevalidateCachedQuery, which in theory can't happen because we are holding locks on all the relevant database objects. However, CLOBBER_CACHE_ALWAYS generates a false positive by making an invalidation happen anyway; and on reflection, that could also occur as a result of a badly-timed sinval reset due to queue overflow. We could just remove the Assert and forge ahead with the not-really-stale querytree, but it seems safer to do another RevalidateCachedQuery call just to make real sure everything's OK.	2011-09-17 01:47:33 -04:00
Tom Lane	99b5454167	Remove debug logging for pgstat wait timeout. This reverts commit `79b2ee20c8`, which proved to not be very informative; it looks like the "pgstat wait timeout" warnings in the buildfarm are just a symptom of running on heavily loaded machines, and there isn't any weird mechanism causing them to appear. To try to reduce the frequency of buildfarm failures from this effect, increase PGSTAT_MAX_WAIT_TIME from 5 seconds to 10. Also, arrange to not send a fresh inquiry message every single time through the loop, as that seems more likely to cause problems (by swamping the collector) than fix them. We'll now send an inquiry the first time through the delay loop, and every 640 msec thereafter.	2011-09-16 18:25:27 -04:00
Tom Lane	9d306c66e6	Avoid unnecessary page-level SSI lock check in heap_insert(). As observed by Heikki, we need not conflict on heap page locks during an insert; heap page locks are only aggregated tuple locks, they don't imply locking "gaps" as index page locks do. So we can avoid some unnecessary conflicts, and also do the SSI check while not holding exclusive lock on the target buffer. Kevin Grittner, reviewed by Jeff Davis. Back-patch to 9.1.	2011-09-16 14:47:20 -04:00
Tom Lane	0a6cc28500	gistendscan() forgot to free so->giststate. This oversight led to a massive memory leak --- upwards of 10KB per tuple --- during creation-time verification of an exclusion constraint based on a GIST index. In most other scenarios it'd just be a leak of 10KB that would be recovered at end of query, so not too significant; though perhaps the leak would be noticeable in a situation where a GIST index was being used in a nestloop inner indexscan. In any case, it's a real leak of long standing, so patch all supported branches. Per report from Harald Fuchs.	2011-09-16 04:27:49 -04:00
Tom Lane	e6faf910d7	Redesign the plancache mechanism for more flexibility and efficiency. Rewrite plancache.c so that a "cached plan" (which is rather a misnomer at this point) can support generation of custom, parameter-value-dependent plans, and can make an intelligent choice between using custom plans and the traditional generic-plan approach. The specific choice algorithm implemented here can probably be improved in future, but this commit is all about getting the mechanism in place, not the policy. In addition, restructure the API to greatly reduce the amount of extraneous data copying needed. The main compromise needed to make that possible was to split the initial creation of a CachedPlanSource into two steps. It's worth noting in particular that SPI_saveplan is now deprecated in favor of SPI_keepplan, which accomplishes the same end result with zero data copying, and no need to then spend even more cycles throwing away the original SPIPlan. The risk of long-term memory leaks while manipulating SPIPlans has also been greatly reduced. Most of this improvement is based on use of the recently-added MemoryContextSetParent primitive.	2011-09-16 00:43:52 -04:00
Alvaro Herrera	86822df9b5	Split walsender.h in public/private headers. This dramatically cuts short the number of headers the public one brings into whatever includes it.	2011-09-13 21:42:49 -03:00
Tom Lane	6693c9a5ed	deflist_to_tuplestore dumped core on an option with no value. Make it return NULL for the option_value, instead. Per report from Frank van Vugt. Back-patch to 8.4 where this code was added.	2011-09-13 11:36:49 -04:00
Heikki Linnakangas	8caf6132c7	In the final emptying phase of the new GiST buffering build, set the queuedForEmptying flag correctly on buffer when adding it to the queue. Also, don't add buffer to the queue if it's there already. These were harmless oversights; failing to set the flag just means that a buffer might get added to the queue twice if more tuples are added to it (although that can't actually happen at this point because all the upper buffers have already been emptied), and having the same buffer twice in the emptying queue is harmless. But better be tidy.	2011-09-12 13:06:06 +03:00
Tom Lane	b0025bd957	Invent a new memory context primitive, MemoryContextSetParent. This function will be useful for altering the lifespan of a context after creation (for example, by creating it under a transient context and later reparenting it to belong to a long-lived context). It costs almost no new code, since we can refactor what was there. Per my proposal of yesterday.	2011-09-11 16:29:42 -04:00
Peter Eisentraut	1b81c2fe6e	Remove many -Wcast-qual warnings. This addresses only those cases that are easy to fix by adding or moving a const qualifier or removing an unnecessary cast. There are many more complicated cases remaining.	2011-09-11 21:54:32 +03:00
Tom Lane	ca4af308c3	Simplify handling of the timezone GUC by making initdb choose the default. We were doing some amazingly complicated things in order to avoid running the very expensive identify_system_timezone() procedure during GUC initialization. But there is an obvious fix for that, which is to do it once during initdb and have initdb install the system-specific default into postgresql.conf, as it already does for most other GUC variables that need system-environment-dependent defaults. This means that the timezone (and log_timezone) settings no longer have any magic behavior in the server. Per discussion.	2011-09-09 17:59:11 -04:00
Tom Lane	a7801b62f2	Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.	2011-09-09 13:23:41 -04:00
Tom Lane	d63de337f3	round() is not portable. Use rint().	2011-09-08 16:38:24 -04:00
Alvaro Herrera	295e7dc929	Tweak string for uniformity	2011-09-08 16:39:58 -03:00
Heikki Linnakangas	5edb24a898	Buffering GiST index build algorithm. When building a GiST index that doesn't fit in cache, buffers are attached to some internal nodes in the index. This speeds up the build by avoiding random I/O that would otherwise be needed to traverse all the way down the tree to find the right leaf page for a tuple. Alexander Korotkov	2011-09-08 17:51:23 +03:00
Tom Lane	f0bedf3e45	Fix corner case bug in numeric to_char(). Trailing-zero stripping applied by the FM specifier could strip zeroes to the left of the decimal point, for a format with no digit positions after the decimal point (such as "FM999."). Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.	2011-09-07 17:07:20 -04:00
Tom Lane	99155aaa33	Fix typo in error message. Per Euler Taveira de Oliveira.	2011-09-07 13:29:26 -04:00
Tom Lane	a7d9203cc4	Fix get_name_for_var_field() to deal with RECORD Params. With 9.1's use of Params to pass down values from NestLoop join nodes to their inner plans, it is possible for a Param to have type RECORD, in which case the set of fields comprising the value isn't determinable by inspection of the Param alone. However, just as with a Var of type RECORD, we can find out what we need to know if we can locate the expression that the Param represents. We already knew how to do this in get_parameter(), but I'd overlooked the need to be able to cope in get_name_for_var_field(), which led to EXPLAIN failing with "record type has not been registered". To fix, refactor the search code in get_parameter() so it can be used by both functions. Per report from Marti Raudsepp.	2011-09-07 13:01:36 -04:00
Bruce Momjian	029dfdf115	Fix to_date() and to_timestamp() to handle year masks of length < 4 so they wrap toward year 2020, rather than the inconsistent behavior we had before.	2011-09-07 09:47:51 -04:00
Simon Riggs	df383b03e6	Partially revoke attempt to improve performance with many savepoints. Maintain difference between subtransaction release and commit introduced by earlier patch.	2011-09-07 12:11:26 +01:00
Simon Riggs	dde70cc313	Emit cascaded standby message on shutdown only when appropriate. Adds additional test for active walsenders and closes a race condition for when we failover when a new walsender was connecting. Reported and fixed by Fujii Masao. Review by Heikki Linnakangas	2011-09-07 09:09:47 +01:00
Tom Lane	db10f01baa	Improve comment about handling of temp tables in shared-inval code.	2011-09-06 17:06:54 -04:00
Peter Eisentraut	e6d800981e	Correct ancient logic mistake in assertion. Found by gcc -Wlogical-op	2011-09-06 23:05:02 +03:00
Tom Lane	623f77e9d1	Avoid possibly accessing off the end of memory in SJIS2004 conversion. The code in shift_jis_20042euc_jis_2004() would fetch two bytes even when only one remained in the string. Since conversion functions aren't supposed to assume null-terminated input, this poses a small risk of fetching past the end of memory and incurring SIGSEGV. No such crash has been identified in the field, but we've certainly seen the equivalent happen in other code paths, so patch this one all the way back. Report and patch by Noah Misch.	2011-09-06 14:50:28 -04:00
Tom Lane	780a342c90	Avoid possibly accessing off the end of memory in examine_attribute(). Since the last couple of columns of pg_type are often NULL, sizeof(FormData_pg_type) can be an overestimate of the actual size of the tuple data part. Therefore memcpy'ing that much out of the catalog cache, as analyze.c was doing, poses a small risk of copying past the end of memory and incurring SIGSEGV. No such crash has been identified in the field, but we've certainly seen the equivalent happen in other code paths, so patch this one all the way back. Per valgrind testing by Noah Misch, though this is not his proposed patch. I chose to use SearchSysCacheCopy1 rather than inventing special-purpose infrastructure for copying only the minimal part of a pg_type tuple.	2011-09-06 14:37:22 -04:00
Bruce Momjian	f458c90bff	Add C comment about why we send cache invalidation messages for session-local objects.	2011-09-05 22:09:02 -04:00
Alvaro Herrera	56a9ed92b6	Adjust translator comment format to xgettext expectations	2011-09-05 19:04:30 -03:00
Alvaro Herrera	b64f18c583	Mark some untranslatable messages with errmsg_internal	2011-09-05 17:48:07 -03:00
Peter Eisentraut	a2a5ce6826	Improve "invalid byte sequence for encoding" message. It used to say: ERROR: invalid byte sequence for encoding "UTF8": 0xdb24 Change this to: ERROR: invalid byte sequence for encoding "UTF8": 0xdb 0x24 to make it clear that this is a byte sequence and not a code point. Also fix the adjacent "character has no equivalent" message that has the same issue.	2011-09-05 23:38:27 +03:00
Tom Lane	4c2777d0b7	Change get_variable_numdistinct's API to flag default estimates explicitly. Formerly, callers tested for DEFAULT_NUM_DISTINCT, which had the problem that a perfectly solid estimate might be mistaken for a content-free default.	2011-09-04 15:41:49 -04:00
Tom Lane	1cb108efb0	Dig down into sub-selects to look for column statistics. If a sub-select's output column is a simple Var, recursively look for statistics applying to that Var, and use them if available. The need for this was foreseen ages ago, but we didn't have enough infrastructure to do it with reasonable speed until just now. We punt and stick with default estimates if the subquery uses set operations, GROUP BY, or DISTINCT, since those operations would change the underlying column statistics (particularly, the relative frequencies of different values) beyond recognition. This means that the types of sub-selects for which this improvement applies are fairly limited, since most subqueries satisfying those restrictions would have gotten flattened into the parent query anyway. But it does help for some cases, such as subqueries with ORDER BY or LIMIT.	2011-09-04 15:13:46 -04:00
Tom Lane	698df3350d	Can't print PlannerGlobal's subroots list in outfuncs. Since the subroots will surely link back to the same glob struct, this necessarily leads to infinite recursion. Doh. Found while trying to debug some other code.	2011-09-04 14:43:52 -04:00
Tom Lane	1609797c25	Clean up the #include mess a little. walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.	2011-09-04 01:13:16 -04:00
Tom Lane	b3aaf9081a	Rearrange planner to save the whole PlannerInfo (subroot) for a subquery. Formerly, set_subquery_pathlist and other creators of plans for subqueries saved only the rangetable and rowMarks lists from the lower-level PlannerInfo. But there's no reason not to remember the whole PlannerInfo, and indeed this turns out to simplify matters in a number of places. The immediate reason for doing this was so that the subroot will still be accessible when we're trying to extract column statistics out of an already-planned subquery. But now that I've done it, it seems like a good code-beautification effort in its own right. I also chose to get rid of the transient subrtable and subrowmark fields in SubqueryScan nodes, in favor of having setrefs.c look up the subquery's RelOptInfo. That required changing all the APIs in setrefs.c to pass PlannerInfo not PlannerGlobal, which was a large but quite mechanical transformation. One side-effect not foreseen at the beginning is that this finally broke inheritance_planner's assumption that replanning the same subquery RTE N times would necessarily give interchangeable results each time. That assumption was always pretty risky, but now we really have to make a separate RTE for each instance so that there's a place to carry the separate subroots.	2011-09-03 15:36:24 -04:00
Peter Eisentraut	42ad992fdc	Add archive_command example	2011-09-03 01:29:09 +03:00
Peter Eisentraut	f1e4f3d44f	Whitespace adjustment for consistency in the file	2011-09-03 01:28:05 +03:00
Tom Lane	5b562644fe	Teach ANALYZE to clear pg_class.relhassubclass when appropriate. In the past, relhassubclass always remained true if a relation had ever had child relations, even if the last subclass was long gone. While this had only marginal performance implications in most cases, it was annoying, and I'm now considering some planner changes that would raise the cost of a false positive. It was previously impractical to fix this because of race condition concerns. However, given the recent change that made tablecmds.c take ShareExclusiveLock on relations that are gaining a child (commit `fbcf4b92aa`), we can now allow ANALYZE to clear the flag when it's no longer relevant. There is no additional locking cost to do so, since ANALYZE takes ShareExclusiveLock anyway.	2011-09-02 14:29:31 -04:00
Bruce Momjian	10af3ab2b2	Add C comment about needed include.	2011-09-01 12:53:45 -04:00
Tom Lane	e5b012b788	Put back improperly removed #include.	2011-09-01 11:57:46 -04:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Heikki Linnakangas	a88b6e4cfb	setlocale() on Windows doesn't work correctly if the locale name contains dots. I previously worked around this in initdb, mapping the known problematic locale names to aliases that work, but Hiroshi Inoue pointed out that that's not enough because even if you use one of the aliases, like "Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie. "Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by passing that value back to setlocale(), it fails. Note that you are affected by this bug also if you use one of those short-form names manually, so just reverting the hack in initdb won't fix it. To work around that, move the locale name mapping from initdb to a wrapper around setlocale(), so that the mapping is invoked on every setlocale() call. Also, add a few checks for failed setlocale() calls in the backend. These calls shouldn't fail, and if they do there isn't much we can do about it, but at least you'll get a warning. Backpatch to 9.1, where the initdb hack was introduced. The Windows bug affects older versions too if you set locale manually to one of the aliases, but given the lack of complaints from the field, I'm hesitant to backpatch.	2011-09-01 11:08:32 +03:00
Tom Lane	0d3b231eeb	Further repair of eqjoinsel ndistinct-clamping logic. Examination of examples provided by Mark Kirkwood and others has convinced me that actually commit `7f3eba30c9` was quite a few bricks shy of a load. The useful part of that patch was clamping ndistinct for the inner side of a semi or anti join, and the reason why that's needed is that it's the only way that restriction clauses eliminating rows from the inner relation can affect the estimated size of the join result. I had not clearly understood why the clamping was appropriate, and so mis-extrapolated to conclude that we should clamp ndistinct for the outer side too, as well as for both sides of regular joins. These latter actions were all wrong, and are reverted with this patch. In addition, the clamping logic is now made to affect the behavior of both paths in eqjoinsel_semi, with or without MCV lists to compare. When we have MCVs, we suppose that the most common values are the ones that are most likely to survive the decimation resulting from a lower restriction clause, so we think of the clamping as eliminating non-MCV values, or potentially even the least-common MCVs for the inner relation. Back-patch to 8.4, same as previous fixes in this area.	2011-09-01 00:19:38 -04:00
Tom Lane	97930cf578	Improve eqjoinsel's ndistinct clamping to work for multiple levels of join. This patch fixes an oversight in my commit `7f3eba30c9` of 2008-10-23. That patch accounted for baserel restriction clauses that reduced the number of rows coming out of a table (and hence the number of possibly-distinct values of a join variable), but not for join restriction clauses that might have been applied at a lower level of join. To account for the latter, look up the sizes of the min_lefthand and min_righthand inputs of the current join, and clamp with those in the same way as for the base relations. Noted while investigating a complaint from Ben Chobot, although this in itself doesn't seem to explain his report. Back-patch to 8.4; previous versions used different estimation methods for which this heuristic isn't relevant.	2011-08-31 16:05:43 -04:00
Tom Lane	5bba65de94	Fix a missed case in code for "moving average" estimate of reltuples. It is possible for VACUUM to scan no pages at all, if the visibility map shows that all pages are all-visible. In this situation VACUUM has no new information to report about the relation's tuple density, so it wasn't changing pg_class.reltuples ... but it updated pg_class.relpages anyway. That's wrong in general, since there is no evidence to justify changing the density ratio reltuples/relpages, but it's particularly bad if the previous state was relpages=reltuples=0, which means "unknown tuple density". We just replaced "unknown" with "zero". ANALYZE would eventually recover from this, but it could take a lot of repetitions of ANALYZE to do so if the relation size is much larger than the maximum number of pages ANALYZE will scan, because of the moving-average behavior introduced by commit `b4b6923e03`. The only known situation where we could have relpages=reltuples=0 and yet the visibility map asserts everything's visible is immediately following a pg_upgrade. It might be advisable for pg_upgrade to try to preserve the relpages/reltuples statistics; but in any case this code is wrong on its own terms, so fix it. Per report from Sergey Koposov. Back-patch to 8.4, where the visibility map was introduced, same as the previous change.	2011-08-30 14:51:38 -04:00
Robert Haas	8a3d33c8e6	Fix parsing of time string followed by yesterday/today/tomorrow. Previously, 'yesterday 04:00:00'::timestamp didn't do the same thing as '04:00:00 yesterday'::timestamp, and the return value from the latter was midnight rather than the specified time. Dean Rasheed, with some stylistic changes	2011-08-30 11:38:42 -04:00
Robert Haas	eab2ef6164	Remove some tabs from README file. Some of the ASCII art expected 8-space tab stops, and some of it expected 4-space tab stops. Per report from YAMAMOTO Takashi.	2011-08-29 22:26:29 -04:00
Tom Lane	a5b7640ba0	Fix concat_ws() to not insert a separator after leading NULL argument(s). Per bug #6181 from Itagaki Takahiro. Also do some marginal code cleanup and improve error handling.	2011-08-29 15:20:57 -04:00
Robert Haas	c01c25fbe5	Improve spinlock performance for HP-UX, ia64, non-gcc. At least on this architecture, it's very important to spin on a non-atomic instruction and only retry the atomic once it appears that it will succeed. To fix this, split TAS() into two macros: TAS(), for trying to grab the lock the first time, and TAS_SPIN(), for spinning until we get it. TAS_SPIN() defaults to same as TAS(), but we can override it when we know there's a better way. It's likely that some of the other cases in s_lock.h require similar treatment, but this is the only one we've got conclusive evidence for at present.	2011-08-29 10:05:48 -04:00

12362 Commits