postgresql

Commit Graph

Author	SHA1	Message	Date
Joe Conway	5eb15c9942	SERIALIZABLE transactions are actually implemented beneath the covers with transaction snapshots, i.e. a snapshot registered at the beginning of a transaction. Change variable naming and comments to reflect this reality in preparation for a future, truly serializable mode, e.g. Serializable Snapshot Isolation (SSI). For the moment transaction snapshots are still used to implement SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin Grittner and Dan Ports with review and some minor wording changes by me.	2010-09-11 18:38:58 +00:00
Tom Lane	db2d9c602c	Fix ExecMakeTableFunctionResult to verify that all rows returned by a SRF returning "record" actually do have the same rowtype. This is needed because the parser can't realistically enforce that they will all have the same typmod, as seen in a recent example from David Wheeler. Back-patch to 8.0, which is as far back as we have the notion of RECORD subtypes being distinguished by typmod. Wheeler's example depends on 8.4-and-up features, but I suspect there may be ways to provoke similar failures before 8.4.	2010-08-26 18:54:37 +00:00
Tom Lane	3573c8346d	Reset the per-output-tuple exprcontext each time through the main loop in ExecModifyTable(). This avoids memory leakage when trigger functions leave junk behind in that context (as they more or less must). Problem and solution identified by Dean Rasheed. I'm a bit concerned about the longevity of this solution --- once a plan can have multiple ModifyTable nodes, we are very possibly going to have to do something different. But it should hold up for 9.0.	2010-08-18 21:52:24 +00:00
Robert Haas	2a6ef3445c	Standardize get_whatever_oid functions for object types with unqualified names. - Add a missing_ok parameter to get_tablespace_oid. - Avoid duplicating get_tablespace_od guts in objectNamesToOids. - Add a missing_ok parameter to get_database_oid. - Replace get_roleid and get_role_checked with get_role_oid. - Add get_namespace_oid, get_language_oid, get_am_oid. - Refactor existing code to use new interfaces. Thanks to KaiGai Kohei for the review.	2010-08-05 14:45:09 +00:00
Tom Lane	77c75076f3	Fix oversight in new EvalPlanQual logic: the second loop over the ExecRowMark list in ExecLockRows() forgot to allow for the possibility that some of the rowmarks are for child tables that aren't relevant to the current row. Per report from Kenichiro Tanaka.	2010-07-28 17:21:56 +00:00
Tom Lane	133924e13e	Fix potential failure when hashing the output of a subplan that produces a pass-by-reference datatype with a nontrivial projection step. We were using the same memory context for the projection operation as for the temporary context used by the hashtable routines in execGrouping.c. However, the hashtable routines feel free to reset their temp context at any time, which'd lead to destroying input data that was still needed. Report and diagnosis by Tao Ma. Back-patch to 8.1, where the problem was introduced by the changes that allowed us to work with "virtual" tuples instead of materializing intermediate tuple values everywhere. The earlier code looks quite similar, but it doesn't suffer the problem because the data gets copied into another context as a result of having to materialize ExecProject's output tuple.	2010-07-28 04:50:50 +00:00
Robert Haas	a3b012b560	CREATE TABLE IF NOT EXISTS. Reviewed by Bernd Helmle.	2010-07-25 23:21:22 +00:00
Robert Haas	b8c6c71d1c	Centralize DML permissions-checking logic. Remove bespoke code in DoCopy and RI_Initial_Check, which now instead fabricate call ExecCheckRTPerms with a manufactured RangeTblEntry. This is intended to make it feasible for an enhanced security provider to actually make use of ExecutorCheckPerms_hook, but also has the advantage that RI_Initial_Check can allow use of the fast-path when column-level but not table-level permissions are present. KaiGai Kohei. Reviewed (in an earlier version) by Stephen Frost, and by me. Some further changes to the comments by me.	2010-07-22 00:47:59 +00:00
Tom Lane	e11cfa87be	Remove a sanity check in the exclusion-constraint code that prevented users from defining non-self-conflicting constraints. Jeff Davis Note: I (tgl) objected to removing this check in 9.0 on the grounds that it was an important sanity check in new, poorly tested code. However, it should be all right to remove it for 9.1, since we'll get field testing from the 9.0 branch.	2010-07-16 00:45:30 +00:00
Tom Lane	53e757689c	Make NestLoop plan nodes pass outer-relation variables into their inner relation using the general PARAM_EXEC executor parameter mechanism, rather than the ad-hoc kluge of passing the outer tuple down through ExecReScan. The previous method was hard to understand and could never be extended to handle parameters coming from multiple join levels. This patch doesn't change the set of possible plans nor have any significant performance effect, but it's necessary infrastructure for future generalization of the concept of an inner indexscan plan. ExecReScan's second parameter is now unused, so it's removed.	2010-07-12 17:01:06 +00:00
Robert Haas	f4122a8d50	Add a hook in ExecCheckRTPerms(). This hook allows a loadable module to gain control when table permissions are checked. It is expected to be used by an eventual SE-PostgreSQL implementation, but there are other possible applications as well. A sample contrib module can be found in the archives at: http://archives.postgresql.org/pgsql-hackers/2010-05/msg01095.php Robert Haas and Stephen Frost	2010-07-09 14:06:01 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Bruce Momjian	7190220555	Add C comment that we will have to remove an exclusion constraint check if we ever implement '<>' index opclasses. Jeff Davis	2010-05-29 02:32:08 +00:00
Tom Lane	f39d57b83c	Rejigger mergejoin logic so that a tuple with a null in the first merge column is treated like end-of-input, if nulls sort last in that column and we are not doing outer-join filling for that input. In such a case, the tuple cannot join to anything from the other input (because we assume mergejoinable operators are strict), and neither can any tuple following it in the sort order. If we're not interested in doing outer-join filling we can just pretend the tuple and its successors aren't there at all. This can save a great deal of time in situations where there are many nulls in the join column, as in a recent example from Scott Marlowe. Also, since the planner tends to not count nulls in its mergejoin scan selectivity estimates, this is an important fix to make the runtime behavior more like the estimate. I regard this as an omission in the patch I wrote years ago to teach mergejoin that tuples containing nulls aren't joinable, so I'm back-patching it. But only to 8.3 --- in older versions, we didn't have a solid notion of whether nulls sort high or low, so attempting to apply this optimization could break things.	2010-05-28 01:14:03 +00:00
Heikki Linnakangas	9b8a73326e	Introduce wal_level GUC to explicitly control if information needed for archival or hot standby should be WAL-logged, instead of deducing that from other options like archive_mode. This replaces recovery_connections GUC in the primary, where it now has no effect, but it's still used in the standby to enable/disable hot standby. Remove the WAL-logging of "unlogged operations", like creating an index without WAL-logging and fsyncing it at the end. Instead, we keep a copy of the wal_mode setting and the settings that affect how much shared memory a hot standby server needs to track master transactions (max_connections, max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings change, at server restart, write a WAL record noting the new settings and update pg_control. This allows us to notice the change in those settings in the standby at the right moment, they used to be included in checkpoint records, but that meant that a changed value was not reflected in the standby until the first checkpoint after the change. Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to the sequence it used to follow, before hot standby and subsequent patches changed it to 0x9003.	2010-04-28 16:10:43 +00:00
Peter Eisentraut	c248d17120	Message tuning	2010-03-21 00:17:59 +00:00
Tom Lane	a836abe9f6	Modify error context callback functions to not assume that they can fetch catalog entries via SearchSysCache and related operations. Although, at the time that these callbacks are called by elog.c, we have not officially aborted the current transaction, it still seems rather risky to initiate any new catalog fetches. In all these cases the needed information is readily available in the caller and so it's just a matter of a bit of extra notation to pass it to the callback. Per crash report from Dennis Koegel. I've concluded that the real fix for his problem is to clear the error context stack at entry to proc_exit, but it still seems like a good idea to make the callbacks a bit less fragile for other cases. Backpatch to 8.4. We could go further back, but the patch doesn't apply cleanly. In the absence of proof that this fixes something and isn't just paranoia, I'm not going to expend the effort.	2010-03-19 22:54:41 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Tom Lane	05d8a561ff	Clean up handling of XactReadOnly and RecoveryInProgress checks. Add some checks that seem logically necessary, in particular let's make real sure that HS slave sessions cannot create temp tables. (If they did they would think that temp tables belonging to the master's session with the same BackendId were theirs. We must not allow myTempNamespace to become set in a slave session.) Change setval() and nextval() so that they are only allowed on temp sequences in a read-only transaction. This seems consistent with what we allow for table modifications in read-only transactions. Since an HS slave can't have a temp sequence, this also provides a nicer cure for the setval PANIC reported by Erik Rijkers. Make the error messages more uniform, and have them mention the specific command being complained of. This seems worth the trifling amount of extra code, since people are likely to see such messages a lot more than before.	2010-02-20 21:24:02 +00:00
Tom Lane	11d5ba97f8	Fix ExecEvalArrayRef to pass down the old value of the array element or slice being assigned to, in case the expression to be assigned is a FieldStore that would need to modify that value. The need for this was foreseen some time ago, but not implemented then because we did not have arrays of composites. Now we do, but the point evidently got overlooked in that patch. Net result is that updating a field of an array element doesn't work right, as illustrated if you try the new regression test on an unpatched backend. Noted while experimenting with EXPLAIN VERBOSE, which has also got some issues in this area. Backpatch to 8.3, where arrays of composites were introduced.	2010-02-18 18:41:47 +00:00
Robert Haas	e26c539e9f	Wrap calls to SearchSysCache and related functions using macros. The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.	2010-02-14 18:42:19 +00:00
Tom Lane	ec4be2ee68	Extend the set of frame options supported for window functions. This patch allows the frame to start from CURRENT ROW (in either RANGE or ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING start and end points. (RANGE value PRECEDING/FOLLOWING isn't there yet --- the grammar works, but that's all.) Hitoshi Harada, reviewed by Pavel Stehule	2010-02-12 17:33:21 +00:00
Tom Lane	cbe9d6beb4	Fix up rickety handling of relation-truncation interlocks. Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.	2010-02-09 21:43:30 +00:00
Tom Lane	d5768dce10	Create an official API function for C functions to use to check if they are being called as aggregates, and to get the aggregate transition state memory context if needed. Use it instead of poking directly into AggState and WindowAggState in places that shouldn't know so much. We should have done this in 8.4, probably, but better late than never. Revised version of a patch by Hitoshi Harada.	2010-02-08 20:39:52 +00:00
Tom Lane	0a469c8769	Remove old-style VACUUM FULL (which was known for a little while as VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.	2010-02-08 04:33:55 +00:00
Tom Lane	b9b8831ad6	Create a "relation mapping" infrastructure to support changing the relfilenodes of shared or nailed system catalogs. This has two key benefits: * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs. * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs. CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database. Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared. This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests.	2010-02-07 20:48:13 +00:00
Heikki Linnakangas	9de778b24b	Move the responsibility of writing a "unlogged WAL operation" record from heap_sync() to the callers, because heap_sync() is sometimes called even if the operation itself is WAL-logged. This eliminates the bogus unlogged records from CLUSTER that Simon Riggs reported, patch by Fujii Masao.	2010-02-03 10:01:30 +00:00
Robert Haas	42a8ab0a14	Augment EXPLAIN output with more details on Hash nodes. We show the number of buckets, the number of batches (and also the original number if it has changed), and the peak space used by the hash table. Minor executor changes to track peak space used.	2010-02-01 15:43:36 +00:00
Tom Lane	034fffbf31	Fix memory leak created by deferrable-index-constraints patches. We need to free the OID list returned by ExecInsertIndexTuples to avoid a query-lifespan memory leak. When many rows require rechecking, this can be a significant leak --- it's even more than the space used for the queued trigger events. Dean Rasheed	2010-01-31 18:15:39 +00:00
Peter Eisentraut	e7b3349a8a	Type table feature This adds the CREATE TABLE name OF type command, per SQL standard.	2010-01-28 23:21:13 +00:00
Heikki Linnakangas	40f908bdcd	Introduce Streaming Replication. This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client. Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceivxer. Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication. Fujii Masao, with additional hacking by me	2010-01-15 09:19:10 +00:00
Tom Lane	292176a118	Improve ExecEvalVar's handling of whole-row variables in cases where the rowtype contains dropped columns. Sometimes the input tuple will be formed from a select targetlist in which dropped columns are filled with a NULL of an arbitrary type (the planner typically uses INT4, since it can't tell what type the dropped column really was). So we need to relax the rowtype compatibility check to not insist on physical compatibility if the actual column value is NULL. In principle we might need to do this for functions returning composite types, too (see tupledesc_match()). In practice there doesn't seem to be a bug there, probably because the function will be using the same cached rowtype descriptor as the caller. Fixing that code path would require significant rearrangement, so I left it alone for now. Per complaint from Filip Rembialkowski.	2010-01-11 15:31:04 +00:00
Tom Lane	85113bcf5a	Make ExecEvalFieldSelect throw a more intelligible error if it's asked to extract a system column, and remove a couple of lines that are useless in light of the fact that we aren't ever going to support this case. There isn't much point in trying to make this work because a tuple Datum does not carry many of the system columns. Per experimentation with a case reported by Dean Rasheed; we'll have to fix his problem somewhere else.	2010-01-09 20:46:19 +00:00
Tom Lane	217dc525c0	Fix oversight in EvalPlanQualFetch: after failing to lock a tuple because someone else has just updated it, we have to set priorXmax to that tuple's xmax (ie, the XID of the other xact that updated it) before looping back to examine the next tuple. Obviously, the next tuple in the update chain should have that XID as its xmin, not the same xmin as the preceding tuple that we had been trying to lock. The mismatch would cause the EvalPlanQual logic to decide that the tuple chain ended in a deletion, when actually there was a live tuple that should have been found. I inserted this error when recently adding logic to EvalPlanQual to make it lock tuples before returning them (as opposed to the old method in which the lock would occur much later, causing a great deal of work to be wasted if we only then discover someone else updated it). Sigh. Per today's report from Takahiro Itagaki of inconsistent results during pgbench runs.	2010-01-08 02:44:00 +00:00
Bruce Momjian	f98fbc78c3	Preserve relfilenodes: Add support to pg_dump --binary-upgrade to preserve all relfilenodes, for use by pg_migrator.	2010-01-06 03:04:03 +00:00
Tom Lane	90f4c2d960	Add support for doing FULL JOIN ON FALSE. While this is really a rather peculiar variant of UNION ALL, and so wouldn't likely get written directly as-is, it's possible for it to arise as a result of simplification of less-obviously-silly queries. In particular, now that we can do flattening of subqueries that have constant outputs and are underneath an outer join, it's possible for the case to result from simplification of queries of the type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality regression for this type of query.	2010-01-05 23:25:36 +00:00
Tom Lane	40608e7f94	When estimating the selectivity of an inequality "column > constant" or "column < constant", and the comparison value is in the first or last histogram bin or outside the histogram entirely, try to fetch the actual column min or max value using an index scan (if there is an index on the column). If successful, replace the lower or upper histogram bound with that value before carrying on with the estimate. This limits the estimation error caused by moving min/max values when the comparison value is close to the min or max. Per a complaint from Josh Berkus. It is tempting to consider using this mechanism for mergejoinscansel as well, but that would inject index fetches into main-line join estimation not just endpoint cases. I'm refraining from that until we can get a better handle on the costs of doing this type of lookup.	2010-01-04 02:44:40 +00:00
Tom Lane	2b59274c09	check_exclusion_constraint didn't actually work correctly for index expressions: FormIndexDatum requires the estate's scantuple to already point at the tuple the values are supposedly being extracted from. Adjust test case so that this type of confusion will be exposed. Per report from hubert depesz lubaczewski.	2010-01-02 17:53:57 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Tom Lane	7839d35991	Add an "argisrow" field to NullTest nodes, following a plan made way back in 8.2beta but never carried out. This avoids repetitive tests of whether the argument is of scalar or composite type. Also, be a bit more paranoid about composite arguments in some places where we previously weren't checking.	2010-01-01 23:03:10 +00:00
Tom Lane	29c4ad9829	Support "x IS NOT NULL" clauses as indexscan conditions. This turns out to be just a minor extension of the previous patch that made "x IS NULL" indexable, because we can treat the IS NOT NULL condition as if it were "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option), just like IS NULL is treated like "x = NULL". Aside from any possible usefulness in its own right, this is an important improvement for index-optimized MAX/MIN aggregates: it is now reliably possible to get a column's min or max value cheaply, even when there are a lot of nulls cluttering the interesting end of the index.	2010-01-01 21:53:49 +00:00
Tom Lane	649b5ec7c8	Add the ability to store inheritance-tree statistics in pg_statistic, and teach ANALYZE to compute such stats for tables that have subclasses. Per my proposal of yesterday. autovacuum still needs to be taught about running ANALYZE on parent tables when their subclasses change, but the feature is useful even without that.	2009-12-29 20:11:45 +00:00
Heikki Linnakangas	84d723b6ce	Previous fix for temporary file management broke returning a set from PL/pgSQL function within an exception handler. Make sure we use the right resource owner when we create the tuplestore to hold returned tuples. Simplify tuplestore API so that the caller doesn't need to be in the right memory context when calling tuplestore_put* functions. tuplestore.c automatically switches to the memory context used when the tuplestore was created. Tuplesort was already modified like this earlier. This patch also removes the now useless MemoryContextSwitch calls from callers. Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like the previous patch that broke this.	2009-12-29 17:40:59 +00:00
Tom Lane	34d26872ed	Support ORDER BY within aggregate function calls, at long last providing a non-kluge method for controlling the order in which values are fed to an aggregate function. At the same time eliminate the old implementation restriction that DISTINCT was only supported for single-argument aggregates. Possibly release-notable behavioral change: formerly, agg(DISTINCT x) dropped null values of x unconditionally. Now, it does so only if the agg transition function is strict; otherwise nulls are treated as DISTINCT normally would, ie, you get one copy. Andrew Gierth, reviewed by Hitoshi Harada	2009-12-15 17:57:48 +00:00
Robert Haas	cddca5ec13	Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics. This patch also removes buffer-usage statistics from the track_counts output, since this (or the global server statistics) is deemed to be a better interface to this information. Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.	2009-12-15 04:57:48 +00:00
Tom Lane	a620d5005d	Fix a bug introduced when set-returning SQL functions were made inline-able: we have to cope with the possibility that the declared result rowtype contains dropped columns. This fails in 8.4, as per bug #5240. While at it, be more paranoid about inserting binary coercions when inlining. The pre-8.4 code did not really need to worry about that because it could not inline at all in any case where an added coercion could change the behavior of the function's statement. However, when inlining a SRF we allow sorting, grouping, and set-ops such as UNION. In these cases, modifying one of the targetlist entries that the sort/group/setop depends on could conceivably change the behavior of the function's statement --- so don't inline when such a case applies.	2009-12-14 02:15:54 +00:00
Tom Lane	d8e511fabb	Ensure that the result tuple of an EvalPlanQual cycle gets materialized before we zap the input tuple. Otherwise, pass-by-reference columns of the result slot are likely to contain just references to the input tuple, leading to big trouble if the pfree'd space is reused. Per trouble report from Jaime Casanova. This is a new bug in the recent rewrite of EvalPlanQual, so nothing to back-patch.	2009-12-11 18:14:43 +00:00
Tom Lane	62aba76568	Prevent indirect security attacks via changing session-local state within an allegedly immutable index function. It was previously recognized that we had to prevent such a function from executing SET/RESET ROLE/SESSION AUTHORIZATION, or it could trivially obtain the privileges of the session user. However, since there is in general no privilege checking for changes of session-local state, it is also possible for such a function to change settings in a way that might subvert later operations in the same session. Examples include changing search_path to cause an unexpected function to be called, or replacing an existing prepared statement with another one that will execute a function of the attacker's choosing. The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against these threats, which are the same places previously deemed to need protection against the SET ROLE issue. GUC changes are still allowed, since there are many useful cases for that, but we prevent security problems by forcing a rollback of any GUC change after completing the operation. Other cases are handled by throwing an error if any change is attempted; these include temp table creation, closing a cursor, and creating or deleting a prepared statement. (In 7.4, the infrastructure to roll back GUC changes doesn't exist, so we settle for rejecting changes of "search_path" in these contexts.) Original report and patch by Gurjeet Singh, additional analysis by Tom Lane. Security: CVE-2009-4136	2009-12-09 21:57:51 +00:00
Tom Lane	0cb65564e5	Add exclusion constraints, which generalize the concept of uniqueness to support any indexable commutative operator, not just equality. Two rows violate the exclusion constraint if "row1.col OP row2.col" is TRUE for each of the columns in the constraint. Jeff Davis, reviewed by Robert Haas	2009-12-07 05:22:23 +00:00
Tom Lane	7fc0f06221	Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be checked to determine whether the trigger should be fired. For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER triggers it can provide a noticeable performance improvement, since queuing of a deferred trigger event and re-fetching of the row(s) at end of statement can be short-circuited if the trigger does not need to be fired. Takahiro Itagaki, reviewed by KaiGai Kohei.	2009-11-20 20:38:12 +00:00
Tom Lane	2ace38d226	Fix WHERE CURRENT OF to work as designed within plpgsql. The argument can be the name of a plpgsql cursor variable, which formerly was converted to $N before the core parser saw it, but that's no longer the case. Deal with plain name references to plpgsql variables, and add a regression test case that exposes the failure.	2009-11-09 02:36:59 +00:00
Tom Lane	9bedd128d6	Add support for invoking parser callback hooks via SPI and in cached plans. As proof of concept, modify plpgsql to use the hooks. plpgsql is still inserting $n symbols textually, but the "back end" of the parsing process now goes through the ParamRef hook instead of using a fixed parameter-type array, and then execution only fetches actually-referenced parameters, using a hook added to ParamListInfo. Although there's a lot left to be done in plpgsql, this already cures the "if (TG_OP = 'INSERT' and NEW.foo ...)" problem, as illustrated by the changed regression test.	2009-11-04 22:26:08 +00:00
Tom Lane	8442317beb	Make the overflow guards in ExecChooseHashTableSize be more protective. The original coding ensured nbuckets and nbatch didn't exceed INT_MAX, which while not insane on its own terms did nothing to protect subsequent code like "palloc(nbatch * sizeof(BufFile *))". Since enormous join size estimates might well be planner error rather than reality, it seems best to constrain the initial sizes to be not more than work_mem/sizeof(pointer), thus ensuring the allocated arrays don't exceed work_mem. We will allow nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches calls, but we should still guard against integer overflow in those palloc requests. Per bug #5145 from Bernt Marius Johnsen. Although the given test case only seems to fail back to 8.2, previous releases have variants of this issue, so patch all supported branches.	2009-10-30 20:58:45 +00:00
Tom Lane	9f2ee8f287	Re-implement EvalPlanQual processing to improve its performance and eliminate a lot of strange behaviors that occurred in join cases. We now identify the "current" row for every joined relation in UPDATE, DELETE, and SELECT FOR UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the appropriate row into each scan node in the rechecking plan, forcing it to emit only that one row. The former behavior could rescan the whole of each joined relation for each recheck, which was terrible for performance, and what's much worse could result in duplicated output tuples. Also, the original implementation of EvalPlanQual could not re-use the recheck execution tree --- it had to go through a full executor init and shutdown for every row to be tested. To avoid this overhead, I've associated a special runtime Param with each LockRows or ModifyTable plan node, and arranged to make every scan node below such a node depend on that Param. Thus, by signaling a change in that Param, the EPQ machinery can just rescan the already-built test plan. This patch also adds a prohibition on set-returning functions in the targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the duplicate-output-tuple problem. It seems fairly reasonable since the other restrictions on SELECT FOR UPDATE are meant to ensure that there is a unique correspondence between source tuples and result tuples, which an output SRF destroys as much as anything else does.	2009-10-26 02:26:45 +00:00
Tom Lane	0adaf4cb31	Move the handling of SELECT FOR UPDATE locking and rechecking out of execMain.c and into a new plan node type LockRows. Like the recent change to put table updating into a ModifyTable plan node, this increases planning flexibility by allowing the operations to occur below the top level of the plan tree. It's necessary in any case to restore the previous behavior of having FOR UPDATE locking occur before ModifyTable does. This partially refactors EvalPlanQual to allow multiple rows-under-test to be inserted into the EPQ machinery before starting an EPQ test query. That isn't sufficient to fix EPQ's general bogosity in the face of plans that return multiple rows per test row, though. Since this patch is mostly about getting some plan node infrastructure in place and not about fixing ten-year-old bugs, I will leave EPQ improvements for another day. Another behavioral change that we could now think about is doing FOR UPDATE before LIMIT, but that too seems like it should be treated as a followon patch.	2009-10-12 18:10:51 +00:00
Tom Lane	8a5849b7ff	Split the processing of INSERT/UPDATE/DELETE operations out of execMain.c. They are now handled by a new plan node type called ModifyTable, which is placed at the top of the plan tree. In itself this change doesn't do much, except perhaps make the handling of RETURNING lists and inherited UPDATEs a tad less klugy. But it is necessary preparation for the intended extension of allowing RETURNING queries inside WITH. Marko Tiikkaja	2009-10-10 01:43:50 +00:00
Tom Lane	c970292a94	Remove very ancient tuple-counting infrastructure (IncrRetrieved() and friends). This code has all been ifdef'd out for many years, and doesn't seem to have any prospect of becoming any more useful in the future. EXPLAIN ANALYZE is what people use in practice, and I think if we did want process-wide counters we'd be more likely to put in dtrace events for that than try to resurrect this code. Get rid of it so as to have one less detail to worry about while refactoring execMain.c.	2009-10-08 22:34:57 +00:00
Tom Lane	249724cb01	Create an ALTER DEFAULT PRIVILEGES command, which allows users to adjust the privileges that will be applied to subsequently-created objects. Such adjustments are always per owning role, and can be restricted to objects created in particular schemas too. A notable benefit is that users can override the traditional default privilege settings, eg, the PUBLIC EXECUTE privilege traditionally granted by default for functions. Petr Jelinek	2009-10-05 19:24:49 +00:00
Alvaro Herrera	caa4cfa369	Ensure that a cursor has an immutable snapshot throughout its lifespan. The old coding was using a regular snapshot, referenced elsewhere, that was subject to having its command counter updated. Fix by creating a private copy of the snapshot exclusively for the cursor. Backpatch to 8.4, which is when the bug was introduced during the snapshot management rewrite.	2009-10-02 17:57:30 +00:00
Tom Lane	421d7d8edb	Remove no-longer-needed ExecCountSlots infrastructure.	2009-09-27 21:10:53 +00:00
Tom Lane	f92e8a4b5e	Replace the array-style TupleTable data structure with a simple List of TupleTableSlot nodes. This eliminates the need to count in advance how many Slots will be needed, which seems more than worth the small increase in the amount of palloc traffic during executor startup. The ExecCountSlots infrastructure is now all dead code, but I'll remove it in a separate commit for clarity. Per a comment from Robert Haas.	2009-09-27 20:09:58 +00:00
Tom Lane	4985635230	Extend the BKI infrastructure to allow system catalogs to be given hand-assigned rowtype OIDs, even when they are not "bootstrapped" catalogs that have handmade type rows in pg_type.h. Give pg_database such an OID. Restore the availability of C macros for the rowtype OIDs of the bootstrapped catalogs. (These macros are now in the individual catalogs' .h files, though, not in pg_type.h.) This commit doesn't do anything especially useful by itself, but it's necessary infrastructure for reverting some ill-considered changes in relcache.c.	2009-09-26 22:42:03 +00:00
Tom Lane	9bb342811b	Rewrite the planner's handling of materialized plan types so that there is an explicit model of rescan costs being different from first-time costs. The costing of Material nodes in particular now has some visible relationship to the actual runtime behavior, where before it was essentially fantasy. This also fixes up a couple of places where different materialized plan types were treated differently for no very good reason (probably just oversights). A couple of the regression tests are affected, because the planner now chooses to put the other relation on the inside of a nestloop-with-materialize. So far as I can see both changes are sane, and the planner is now more consistently following the expectation that it should prefer to materialize the smaller of two relations. Per a recent discussion with Robert Haas.	2009-09-12 22:12:09 +00:00
Tom Lane	c38b75947e	Tweak ExecIndexEvalRuntimeKeys to forcibly detoast any toasted comparison values before they get passed to the index access method. This avoids repeated detoastings that will otherwise ensue as the comparison value is examined by various index support functions. We have seen a couple of reports of cases where repeated detoastings result in an order-of-magnitude slowdown, so it seems worth adding a bit of extra logic to prevent this. I had previously proposed trying to avoid duplicate detoastings in general, but this fix takes care of what seems the most important case in practice with very little effort or risk. Back-patch to 8.4 so that the PostGIS folk won't have to wait a year to have this fix in a production release. (The issue exists further back, of course, but the code's diverged enough to make backpatching further a higher-risk action. Also it appears that the possible gains may be limited in prior releases because of different handling of lossy operators.)	2009-08-23 18:26:08 +00:00
Tom Lane	dcb2bda9b7	Improve plpgsql's ability to cope with rowtypes containing dropped columns, by supporting conversions in places that used to demand exact rowtype match. Since this issue is certain to come up elsewhere (in fact, already has, in ExecEvalConvertRowtype), factor out the support code into new core functions for tuple conversion. I chose to put these in a new source file since heaptuple.c is already overly long. Heavily revised version of a patch by Pavel Stehule.	2009-08-06 20:44:32 +00:00
Tom Lane	25d9bf2e3e	Support deferrable uniqueness constraints. The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed	2009-07-29 20:56:21 +00:00
Tom Lane	adfa04293b	Save a few cycles in EXPLAIN and related commands by not bothering to form a physical tuple in do_tup_output(). A virtual tuple is easier to set up and also easier for most tuple receivers to process. Per my comment on Robert Haas' recent patch in this code.	2009-07-23 21:27:10 +00:00
Tom Lane	6a0865e4bb	In a non-hashed Agg node, reset the "aggcontext" at group boundaries, instead of individually pfree'ing pass-by-reference transition values. This should be at least as fast as the prior coding, and it has the major advantage of clearing out any working data an aggregate function may have stored in or underneath the aggcontext. This avoids memory leakage when an aggregate such as array_agg() is used in GROUP BY mode. Per report from Chris Spotts. Back-patch to 8.4. In principle the problem could arise in prior versions, but since they didn't have array_agg the issue seems not critical.	2009-07-23 20:45:27 +00:00
Tom Lane	846c364dd4	Change do_tup_output() to take Datum/isnull arrays instead of a char * array, so it doesn't go through BuildTupleFromCStrings. This is more or less a wash for current uses, but will avoid inefficiency for planned changes to EXPLAIN. Robert Haas	2009-07-22 17:00:23 +00:00
Tom Lane	011eae60ef	Fix error cleanup failure caused by 8.4 changes in plpgsql to try to avoid memory leakage in error recovery. We were calling FreeExprContext, and therefore invoking ExprContextCallback callbacks, in both normal and error exits from subtransactions. However this isn't very safe, as shown in recent trouble report from Frank van Vugt, in which releasing a tupledesc refcount failed. It's also unnecessary, since the resources that callbacks might wish to release should be cleaned up by other error recovery mechanisms (ie the resource owners). We only really want FreeExprContext to release memory attached to the exprcontext in the error-exit case. So, add a bool parameter to FreeExprContext to tell it not to call the callbacks. A more general solution would be to pass the isCommit bool parameter on to the callbacks, so they could do only safe things during error exit. But that would make the patch significantly more invasive and possibly break third-party code that registers ExprContextCallback callbacks. We might want to do that later in HEAD, but for now I'll just do what seems reasonable to back-patch.	2009-07-18 19:15:42 +00:00
Tom Lane	82480e28f5	Fix things so that array_agg_finalfn does not modify or free its input ArrayBuildState, per trouble report from Merlin Moncure. By adopting this fix, we are essentially deciding that aggregate final-functions should not modify their inputs ever. Adjust documentation and comments to match that conclusion.	2009-06-20 18:45:28 +00:00
Tom Lane	e8d78d35f4	ExecAgg() failed to finish running out set-returning functions in the last aggregated tuple of a run. Per report from Laurenz Albe. This is a new bug in 8.4, but only because prior versions rejected SRFs in an Agg plan node altogether.	2009-06-17 16:05:34 +00:00
Tom Lane	44aa60fa7c	Revisit AlterTableCreateToastTable's API once again, hoping to make it what pg_migrator actually needs and not just a partial solution. We have to be able to specify the OID that the new toast table should be created with.	2009-06-11 20:46:11 +00:00
Tom Lane	0c19f05803	Fix things so that you can still do "select foo()" where foo is a SQL function returning setof record. This used to work, more or less accidentally, but I had broken it while extending the code to allow materialize-mode functions to be called in select lists. Add a regression test case so it doesn't get broken again. Per gripe from Greg Davidson.	2009-06-11 17:25:39 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Peter Eisentraut	9b7304bc25	Fix xmlattribute escaping XML special characters twice (bug #4822 ). Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>	2009-06-09 22:00:57 +00:00
Tom Lane	76d4abf2d9	Improve the recently-added support for properly pluralized error messages by extending the ereport() API to cater for pluralization directly. This is better than the original method of calling ngettext outside the elog.c code because (1) it avoids double translation, which wastes cycles and in the worst case could give a wrong result; and (2) it avoids having to use a different coding method in PL code than in the core backend. The client-side uses of ngettext are not touched since neither of these concerns is very pressing in the client environment. Per my proposal of yesterday.	2009-06-04 18:33:08 +00:00
Tom Lane	1e06ed1abe	Add an option to AlterTableCreateToastTable() to allow its caller to force a toast table to be built, even if the sum-of-column-widths calculation indicates one isn't needed. This is needed by pg_migrator because if the old table has a toast table, we have to migrate over the toast table since it might contain some live data, even though subsequent column drops could mean that no recently-added rows could require toasting.	2009-05-07 22:58:28 +00:00
Peter Eisentraut	77d67a4a3b	XMLATTRIBUTES() should send the attribute values through map_sql_value_to_xml_value() instead of directly through the data type output function. This is per SQL standard, and consistent with XMLELEMENT().	2009-04-08 21:51:38 +00:00
Tom Lane	eb4c723e56	Make ExecInitExpr build the list of SubPlans found in a plan tree in order of discovery, rather than reverse order. This doesn't matter functionally (I suppose the previous coding dates from the time when lcons was markedly cheaper than lappend). However now that EXPLAIN is labeling subplans with IDs that are based on order of creation, this may help produce a slightly less surprising printout.	2009-04-05 20:32:06 +00:00
Tom Lane	85369f888e	Refactor ExecProject and associated routines so that fast-path code is used for simple Var targetlist entries all the time, even when there are other entries that are not simple Vars. Also, ensure that we prefetch attributes (with slot_getsomeattrs) for all Vars in the targetlist, even those buried within expressions. In combination these changes seem to significantly reduce the runtime for cases where tlists are mostly but not exclusively Vars. Per my proposal of yesterday.	2009-04-02 22:39:30 +00:00
Bruce Momjian	0e550ff617	Revert DTrace patch from Robert Lor	2009-04-02 20:59:10 +00:00
Bruce Momjian	227f817c1f	Add support for additional DTrace probes. Robert Lor	2009-04-02 19:14:34 +00:00
Tom Lane	793d5662e8	Fix an oversight in the support for storing/retrieving "minimal tuples" in TupleTableSlots. We have functions for retrieving a minimal tuple from a slot after storing a regular tuple in it, or vice versa; but these were implemented by converting the internal storage from one format to the other. The problem with that is it invalidates any pass-by-reference Datums that were already fetched from the slot, since they'll be pointing into the just-freed version of the tuple. The known problem cases involve fetching both a whole-row variable and a pass-by-reference value from a slot that is fed from a tuplestore or tuplesort object. The added regression tests illustrate some simple cases, but there may be other failure scenarios traceable to the same bug. Note that the added tests probably only fail on unpatched code if it's built with --enable-cassert; otherwise the bug leads to fetching from freed memory, which will not have been overwritten without additional conditions. Fix by allowing a slot to contain both formats simultaneously; which turns out not to complicate the logic much at all, if anything it seems less contorted than before. Back-patch to 8.2, where minimal tuples were introduced.	2009-03-30 04:08:43 +00:00
Tom Lane	25bf7f8b9b	Fix possible failures when a tuplestore switches from in-memory to on-disk mode while callers hold pointers to in-memory tuples. I reported this for the case of nodeWindowAgg's primary scan tuple, but inspection of the code shows that all of the calls in nodeWindowAgg and nodeCtescan are at risk. For the moment, fix it with a rather brute-force approach of copying whenever one of the at-risk callers requests a tuple. Later we might think of some sort of reference-count approach to reduce tuple copying.	2009-03-27 18:30:21 +00:00
Peter Eisentraut	8032d76b5b	Gettext plural support In the backend, I changed only a handful of exemplary or important-looking instances to make use of the plural support; there is probably more work there. For the rest of the source, this should cover all relevant cases.	2009-03-26 22:26:08 +00:00
Tom Lane	596efd27ed	Optimize multi-batch hash joins when the outer relation has a nonuniform distribution, by creating a special fast path for the (first few) most common values of the outer relation. Tuples having hashvalues matching the MCVs are effectively forced to be in the first batch, so that we never write them out to the batch temp files. Bryce Cutt and Ramon Lawrence, with some editorialization by me.	2009-03-21 00:04:40 +00:00
Peter Eisentraut	12f87b2c82	Add new SQL:2008 error codes for invalid LIMIT and OFFSET values. Remove unused nonstandard error code that was perhaps intended for this but never used.	2009-03-04 10:55:00 +00:00
Tom Lane	3d02cae310	Ensure that INSERT ... SELECT into a table with OIDs never copies row OIDs from the source table. This could never happen anyway before 8.4 because the executor invariably applied a "junk filter" to rows due to be inserted; but now that we skip doing that when it's not necessary, the case can occur. Problem noted 2008-11-27 by KaiGai Kohei, though I misunderstood what he was on about at the time (the opacity of the patch he proposed didn't help).	2009-02-08 18:02:27 +00:00
Alvaro Herrera	3a5b773715	Allow reloption names to have qualifiers, initially supporting a TOAST qualifier, and add support for this in pg_dump. This allows TOAST tables to have user-defined fillfactor, and will also enable us to move the autovacuum parameters to reloptions without taking away the possibility of setting values for TOAST tables.	2009-02-02 19:31:40 +00:00
Tom Lane	3cb5d6580a	Support column-level privileges, as required by SQL standard. Stephen Frost, with help from KaiGai Kohei and others	2009-01-22 20:16:10 +00:00
Heikki Linnakangas	94136d5a18	Add new SPI_OK_REWRITTEN return code to SPI_execute and friends, for the case that the command is rewritten into another type of command. The old behavior to return the command tag of the last executed command was pretty surprising. In PL/pgSQL, for example, it meant that if a command was rewritten to a utility statement, FOUND wasn't set at all.	2009-01-21 11:02:40 +00:00
Tom Lane	8a4505013d	Tweak order of operations in BitmapHeapNext() to avoid the case of prefetching the same page we are nanoseconds away from reading for real. There should be something left to do on the current page before we consider issuing a prefetch.	2009-01-12 16:00:41 +00:00
Tom Lane	b7b8f0b609	Implement prefetching via posix_fadvise() for bitmap index scans. A new GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued. (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl) Greg Stark	2009-01-12 05:10:45 +00:00
Tom Lane	43a57cf365	Revise the TIDBitmap API to support multiple concurrent iterations over a bitmap. This is extracted from Greg Stark's posix_fadvise patch; it seems worth committing separately, since it's potentially useful independently of posix_fadvise.	2009-01-10 21:08:36 +00:00
Tom Lane	d04db37072	Arrange for function default arguments to be processed properly in expressions that are set up for execution with ExecPrepareExpr rather than going through the full planner process. By introducing an explicit notion of "expression planning", this patch also lays a bit of groundwork for maybe someday allowing sub-selects in standalone expressions.	2009-01-09 15:46:11 +00:00
Tom Lane	deac9488d3	Insert conditional SPI_push/SPI_pop calls into InputFunctionCall, OutputFunctionCall, and friends. This allows SPI-using functions to invoke datatype I/O without concern for the possibility that a SPI-using function will be called (which could be either the I/O function itself, or a function used in a domain check constraint). It's a tad ugly, but not nearly as ugly as what'd be needed to make this work via retail insertion of push/pop operations in all the PLs. This reverts my patch of 2007-01-30 that inserted some retail SPI_push/pop calls into plpgsql; that approach only fixed plpgsql, and not any other PLs. But the other PLs have the issue too, as illustrated by a recent gripe from Christian Schröder. Back-patch to 8.2, which is as far back as this solution will work. It's also as far back as we need to worry about the domain-constraint case, since earlier versions did not attempt to check domain constraints within datatype input. I'm not aware of any old I/O functions that use SPI themselves, so this should be sufficient for a back-patch.	2009-01-07 20:38:56 +00:00
Tom Lane	1cfd9e8834	Fix executor/spi.h to follow our usual conventions for include files, ie, not include postgres.h nor anything else it doesn't directly need. Add #includes to calling files as needed to compensate. Per my proposal of yesterday. This should be noted as a source code change in the 8.4 release notes, since it's likely to require changes in add-on modules.	2009-01-07 13:44:37 +00:00
Tom Lane	bbeb0bbf6b	Include a pointer to the query's source text in QueryDesc structs. This is practically free given prior 8.4 changes in plancache and portal management, and it makes it a lot easier for ExecutorStart/Run/End hooks to get at the query text. Extracted from Itagaki Takahiro's pg_stat_statements patch, with minor editorialization.	2009-01-02 20:42:00 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00

1 2 3 4 5 ...

1178 Commits