postgresql

Commit Graph

Author	SHA1	Message	Date
Jeff Davis	b1892aaeaa	Revert WAL posix_fallocate() patches. This reverts commit `269e780822` and commit `5b571bb8c8`. Unfortunately, the initial patch had insufficient performance testing, and resulted in a regression. Per report by Thom Brown.	2013-09-04 23:43:41 -07:00
Bruce Momjian	f5c2f5a8f6	Add GUC descriptions for compile-time postgresql.conf settings Previous text was "No description available". Tianyin Xu	2013-09-04 17:44:04 -04:00
Heikki Linnakangas	375d8526f2	Keep heavily-contended fields in XLogCtlInsert on different cache lines. Performance testing shows that if the insertpos_lck spinlock and the fields that it protects are on the same cache line with other variables that are frequently accessed, the false sharing can hurt performance a lot. Keep them apart by adding some padding.	2013-09-04 23:14:33 +03:00
Robert Haas	cc52d5b33f	Expose fsync_fname as a public API. Andres Freund	2013-09-04 11:15:00 -04:00
Tom Lane	0c66a22377	Update comments concerning PGC_S_TEST. This GUC context value was once only used by ALTER DATABASE SET and ALTER USER SET. That's not true anymore, though, so rewrite the comments to be a bit more general. Patch in HEAD only, since this is just an internal documentation issue.	2013-09-03 18:56:22 -04:00
Tom Lane	546f7c2e38	Don't fail for bad GUCs in CREATE FUNCTION with check_function_bodies off. The previous coding attempted to activate all the GUC settings specified in SET clauses, so that the function validator could operate in the GUC environment expected by the function body. However, this is problematic when restoring a dump, since the SET clauses might refer to database objects that don't exist yet. We already have the parameter check_function_bodies that's meant to prevent forward references in function definitions from breaking dumps, so let's change CREATE FUNCTION to not install the SET values if check_function_bodies is off. Authors of function validators were already advised not to make any "context sensitive" checks when check_function_bodies is off, if indeed they're checking anything at all in that mode. But extend the documentation to point out the GUC issue in particular. (Note that we still check the SET clauses to some extent; the behavior with !check_function_bodies is now approximately equivalent to what ALTER DATABASE/ROLE have been doing for awhile with context-dependent GUCs.) This problem can be demonstrated in all active branches, so back-patch all the way.	2013-09-03 18:32:20 -04:00
Tom Lane	0d3f4406df	Allow aggregate functions to be VARIADIC. There's no inherent reason why an aggregate function can't be variadic (even VARIADIC ANY) if its transition function can handle the case. Indeed, this patch to add the feature touches none of the planner or executor, and little of the parser; the main missing stuff was DDL and pg_dump support. It is true that variadic aggregates can create the same sort of ambiguity about parameters versus ORDER BY keys that was complained of when we (briefly) had both one- and two-argument forms of string_agg(). However, the policy formed in response to that discussion only said that we'd not create any built-in aggregates with varying numbers of arguments, not that we shouldn't allow users to do it. So the logical extension of that is we can allow users to make variadic aggregates as long as we're wary about shipping any such in core. In passing, this patch allows aggregate function arguments to be named, to the extent of remembering the names in pg_proc and dumping them in pg_dump. You can't yet call an aggregate using named-parameter notation. That seems like a likely future extension, but it'll take some work, and it's not what this patch is really about. Likewise, there's still some work needed to make window functions handle VARIADIC fully, but I left that for another day. initdb forced because of new aggvariadic field in Aggref parse nodes.	2013-09-03 17:08:46 -04:00
Heikki Linnakangas	a93bdfc711	Fix typo in comment. Also line-wrap an over-wide line in a comment that's ignored by pgindent.	2013-09-03 13:17:09 +03:00
Peter Eisentraut	6a007fa1eb	Translation updates	2013-09-02 02:43:18 -04:00
Tom Lane	8e2b71d2d0	Reset the binary heap in MergeAppend rescans. Failing to do so can cause queries to return wrong data, error out or crash. This requires adding a new binaryheap_reset() method to binaryheap.c, but that probably should have been there anyway. Per bug #8410 from Terje Elde. Diagnosis and patch by Andres Freund.	2013-08-30 19:15:21 -04:00
Alvaro Herrera	9381cb5229	Make error wording more consistent	2013-08-29 12:42:28 -04:00
Robert Haas	090d0f2050	Allow discovery of whether a dynamic background worker is running. Using the infrastructure provided by this patch, it's possible either to wait for the startup of a dynamically-registered background worker, or to poll the status of such a worker without waiting. In either case, the current PID of the worker process can also be obtained. As usual, worker_spi is updated to demonstrate the new functionality. Patch by me. Review by Andres Freund.	2013-08-28 14:08:13 -04:00
Robert Haas	c9e2e2db5c	Partially restore comments discussing enum renumbering hazards. As noted by Tom Lane, commit `813fb03155` was overly optimistic about how safe it is to concurrently change enumsortorder values under MVCC catalog scan semantics. Restore some of the previous text, with hopefully-correct adjustments for the new state of play.	2013-08-28 13:21:08 -04:00
Alvaro Herrera	e246cfc95f	Initialize cached OID to Invalid in new hash entries Andres Freund; bug detected by valgrind	2013-08-27 14:53:17 -04:00
Tom Lane	2aac3399ae	Account better for planning cost when choosing whether to use custom plans. The previous coding in plancache.c essentially used 10% of the estimated runtime as its cost estimate for planning. This can be pretty bogus, especially when the estimated runtime is very small, such as in a simple expression plan created by plpgsql, or a simple INSERT ... VALUES. While we don't have a really good handle on how planning time compares to runtime, it seems reasonable to use an estimate based on the number of relations referenced in the query, with a rather large multiplier. This patch uses 1000 * cpu_operator_cost * (nrelations + 1), so that even a trivial query will be charged 1000 * cpu_operator_cost for planning. This should address the problem reported by Marc Cousin and others that 9.2 and up prefer custom plans in cases where the planning time greatly exceeds what can be saved.	2013-08-24 15:14:17 -04:00
Magnus Hagander	db4ef73760	Don't crash when pg_xlog is empty and pg_basebackup -x is used The backup will not work (without a logarchive, and that's the whole point of -x) in this case, this patch just changes it to throw an error instead of crashing when this happens. Noticed and diagnosed by TAKATSUKA Haruka	2013-08-24 17:13:49 +02:00
Tom Lane	fcf9ecad57	In locate_grouping_columns(), don't expect an exact match of Var typmods. It's possible that inlining of SQL functions (or perhaps other changes?) has exposed typmod information not known at parse time. In such cases, Vars generated by query_planner might have valid typmod values while the original grouping columns only have typmod -1. This isn't a semantic problem since the behavior of grouping only depends on type not typmod, but it breaks locate_grouping_columns' use of tlist_member to locate the matching entry in query_planner's result tlist. We can fix this without an excessive amount of new code or complexity by relying on the fact that locate_grouping_columns only gets called when make_subplanTargetList has set need_tlist_eval == false, and that can only happen if all the grouping columns are simple Vars. Therefore we only need to search the sub_tlist for a matching Var, and we can reasonably define a "match" as being a match of the Var identity fields varno/varattno/varlevelsup. The code still Asserts that vartype matches, but ignores vartypmod. Per bug #8393 from Evan Martin. The added regression test case is basically the same as his example. This has been broken for a very long time, so back-patch to all supported branches.	2013-08-23 17:30:53 -04:00
Tom Lane	3454876314	Fix hash table size estimation error in choose_hashed_distinct(). We should account for the per-group hashtable entry overhead when considering whether to use a hash aggregate to implement DISTINCT. The comparable logic in choose_hashed_grouping() gets this right, but I think I omitted it here in the mistaken belief that there would be no overhead if there were no aggregate functions to be evaluated. This can result in more than 2X underestimate of the hash table size, if the tuples being aggregated aren't very wide. Per report from Tomas Vondra. This bug is of long standing, but per discussion we'll only back-patch into 9.3. Changing the estimation behavior in stable branches seems to carry too much risk of destabilizing plan choices for already-tuned applications.	2013-08-21 13:38:34 -04:00
Tom Lane	20fe870753	Be more wary of unwanted whitespace in pgstat_reset_remove_files(). sscanf isn't the easiest thing to use for exact pattern checks ... also, don't use strncmp where strcmp will do.	2013-08-19 19:36:04 -04:00
Alvaro Herrera	f9b50b7c18	Fix removal of files in pgstats directories Instead of deleting all files in stats_temp_directory and the permanent directory on a crash, only remove those files that match the pattern of files we actually write in them, to avoid possibly clobbering existing unrelated contents of the temporary directory. Per complaint from Jeff Janes, and subsequent discussion, starting at message CAMkU=1z9+7RsDODnT4=cDFBRBp8wYQbd_qsLcMtKEf-oFwuOdQ@mail.gmail.com Also, fix a bug in the same routine to avoid removing files from the permanent directory twice (instead of once from that directory and then from the temporary directory), also per report from Jeff Janes, in message CAMkU=1wbk947=-pAosDMX5VC+sQw9W4ttq6RM9rXu=MjNeEQKA@mail.gmail.com	2013-08-19 17:48:17 -04:00
Heikki Linnakangas	3619a20d33	Rename the "fast_promote" file to just "promote". This keeps the usual trigger file name unchanged from 9.2, avoiding nasty issues if you use a pre-9.3 pg_ctl binary with a 9.3 server or vice versa. The fallback behavior of creating a full checkpoint before starting up is now triggered by a file called "fallback_promote". That can be useful for debugging purposes, but we don't expect any users to have to resort to that and we might want to remove that in the future, which is why the fallback mechanism is undocumented.	2013-08-19 20:59:51 +03:00
Tom Lane	c64de21e96	Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries. In an example such as SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true; it is safe to pull up the LATERAL subquery into its parent, but we must then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN. The previous coding in deconstruct_recurse mistakenly labeled the clause as "is_pushed_down", resulting in wrong semantics if the clause were applied at the join node, as per an example submitted awhile ago by Jeremy Evans. To fix, postpone processing of such clauses until we return back up to the appropriate recursion depth in deconstruct_recurse. In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery; we previously missed the possibility that the LATERAL subquery might itself contain an outer join that makes lateral references in lower quals unsafe. A regression test case equivalent to Jeremy's example was already in my commit of yesterday, but was giving the wrong results because of this bug. This patch fixes the expected output for that, and also adds a test case for the second problem.	2013-08-19 13:19:41 -04:00
Alvaro Herrera	78e1220104	Fix pg_upgrade failure from servers older than 9.3 When upgrading from servers of versions 9.2 and older, and MultiXactIds have been used in the old server beyond the first page (that is, 2048 multis or more in the default 8kB-page build), pg_upgrade would set the next multixact offset to use beyond what has been allocated in the new cluster. This would cause a failure the first time the new cluster needs to use this value, because the pg_multixact/offsets/ file wouldn't exist or wouldn't be large enough. To fix, ensure that the transient server instances launched by pg_upgrade extend the file as necessary. Per report from Jesse Denardo in CANiVXAj4c88YqipsyFQPboqMudnjcNTdB3pqe8ReXqAFQ=HXyA@mail.gmail.com	2013-08-19 12:56:18 -04:00
Peter Eisentraut	a2f2e902b8	Translation updates	2013-08-18 23:41:03 -04:00
Kevin Grittner	28154bb23b	Remove relcache entry invalidation in REFRESH MATERIALIZED VIEW. This was added as part of the attempt to support unlogged matviews along with a populated status. It got missed when unlogged support was removed pre-commit. Noticed by Noah Misch. Back-patched to 9.3 branch.	2013-08-18 16:19:22 -05:00
Tom Lane	f1d5fce7cf	Fix thinko in comment.	2013-08-17 20:36:29 -04:00
Tom Lane	9e7e29c75a	Fix planner problems with LATERAL references in PlaceHolderVars. The planner largely failed to consider the possibility that a PlaceHolderVar's expression might contain a lateral reference to a Var coming from somewhere outside the PHV's syntactic scope. We had a previous report of a problem in this area, which I tried to fix in a quick-hack way in commit `4da6439bd8`, but Antonin Houska pointed out that there were still some problems, and investigation turned up other issues. This patch largely reverts that commit in favor of a more thoroughly thought-through solution. The new theory is that a PHV's ph_eval_at level cannot be higher than its original syntactic level. If it contains lateral references, those don't change the ph_eval_at level, but rather they create a lateral-reference requirement for the ph_eval_at join relation. The code in joinpath.c needs to handle that. Another issue is that createplan.c wasn't handling nested PlaceHolderVars properly. In passing, push knowledge of lateral-reference checks for join clauses into join_clause_is_movable_to. This is mainly so that FDWs don't need to deal with it. This patch doesn't fix the original join-qual-placement problem reported by Jeremy Evans (and indeed, one of the new regression test cases shows the wrong answer because of that). But the PlaceHolderVar problems need to be fixed before that issue can be addressed, so committing this separately seems reasonable.	2013-08-17 20:22:37 -04:00
Robert Haas	2dee7998f9	Move more bgworker code to bgworker.c; also, some renaming. Per discussion on pgsql-hackers. Michael Paquier, slightly modified by me. Original suggestion from Amit Kapila.	2013-08-16 15:31:28 -04:00
Heikki Linnakangas	05cbce6f30	Fix typo in comment.	2013-08-16 16:26:22 +03:00
Kevin Grittner	3f78b1715c	Don't allow ALTER MATERIALIZED VIEW ADD UNIQUE. Was accidentally allowed, but not documented and lacked support for rename or drop once created. Per report from Noah Misch.	2013-08-15 13:14:48 -05:00
Peter Eisentraut	229fb58d4f	Treat timeline IDs as unsigned in replication parser Timeline IDs are unsigned ints everywhere, except the replication parser treated them as signed ints.	2013-08-14 23:18:49 -04:00
Peter Eisentraut	32f7c0ae17	Improve error message when view is not updatable Avoid using the term "updatable" in confusing ways. Suggest a trigger first, before a rule.	2013-08-14 23:02:59 -04:00
Tom Lane	1b1d3d92c3	Remove ph_may_need from PlaceHolderInfo, with attendant simplifications. The planner logic that attempted to make a preliminary estimate of the ph_needed levels for PlaceHolderVars seems to be completely broken by lateral references. Fortunately, the potential join order optimization that this code supported seems to be of relatively little value in practice; so let's just get rid of it rather than trying to fix it. Getting rid of this allows fairly substantial simplifications in placeholder.c, too, so planning in such cases should be a bit faster. Issue noted while pursuing bugs reported by Jeremy Evans and Antonin Houska, though this doesn't in itself fix either of their reported cases. What this does do is prevent an Assert crash in the kind of query illustrated by the added regression test. (I'm not sure that the plan for that query is stable enough across platforms to be usable as a regression test output ... but we'll soon find out from the buildfarm.) Back-patch to 9.3. The problem case can't arise without LATERAL, so no need to touch older branches.	2013-08-14 18:38:47 -04:00
Kevin Grittner	e2cd368678	Remove Assert that matview is not in system schema from REFRESH. We don't want to prevent an extension which creates a matview from being installed in pg_catalog. Issue was raised by Hitoshi Harada. Backpatched to 9.3.	2013-08-14 12:36:55 -05:00
Tom Lane	3d5282c6f0	Emit a log message if output is about to be redirected away from stderr. We've seen multiple cases of people looking at the postmaster's original stderr output to try to diagnose problems, not realizing/remembering that their logging configuration is set up to send log messages somewhere else. This seems particularly likely to happen in prepackaged distributions, since many packagers patch the code to change the factory-standard logging configuration to something more in line with their platform conventions. In hopes of reducing confusion, emit a LOG message about this at the point in startup where we are about to switch log output away from the original stderr, providing a pointer to where to look instead. This message will appear as the last thing in the original stderr output. (We might later also try to emit such link messages when logging parameters are changed on-the-fly; but that case seems to be both noticeably harder to do nicely, and much less frequently a problem in practice.) Per discussion, back-patch to 9.3 but not further.	2013-08-13 15:24:52 -04:00
Peter Eisentraut	072457b360	Message punctuation and pluralization fixes	2013-08-09 08:02:44 -04:00
Peter Eisentraut	9d775d8894	Message style improvements	2013-08-07 22:48:40 -04:00
Fujii Masao	91c3613d37	Fix assertion failure by an immediate shutdown. In PM_WAIT_DEAD_END state, checkpointer process must be dead already. But an immediate shutdown could make postmaster's state machine transition to PM_WAIT_DEAD_END state even if checkpointer process is still running, and which caused assertion failure. This bug was introduced in commit `457d6cf049`. This patch ensures that postmaster's state machine doesn't transition to PM_WAIT_DEAD_END state in an immediate shutdown while checkpointer process is running.	2013-08-08 02:48:53 +09:00
Tom Lane	3ced8837db	Simplify query_planner's API by having it return the top-level RelOptInfo. Formerly, query_planner returned one or possibly two Paths for the topmost join relation, so that grouping_planner didn't see the join RelOptInfo (at least not directly; it didn't have any hesitation about examining cheapest_path->parent, though). However, correct selection of the Paths involved a significant amount of coupling between query_planner and grouping_planner, a problem which has gotten worse over time. It seems best to give up on this API choice and instead return the topmost RelOptInfo explicitly. Then grouping_planner can pull out the Paths it wants from the rel's path list. In this way we can remove all knowledge of grouping behaviors from query_planner. The only real benefit of the old way is that in the case of an empty FROM clause, we never made any RelOptInfos at all, just a Path. Now we have to gin up a dummy RelOptInfo to represent the empty FROM clause. That's not a very big deal though. While at it, simplify query_planner's API a bit more by having the caller set up root->tuple_fraction and root->limit_tuples, rather than passing those values as separate parameters. Since query_planner no longer does anything with either value, requiring it to fill the PlannerInfo fields seemed pretty arbitrary. This patch just rearranges code; it doesn't (intentionally) change any behaviors. Followup patches will do more interesting things.	2013-08-05 15:01:09 -04:00
Kevin Grittner	841c29c8b3	Various cleanups for REFRESH MATERIALIZED VIEW CONCURRENTLY. Open and lock each index before checking definition in RMVC. The ExclusiveLock on the related table is not viewed as sufficient to ensure that no changes are made to the index definition, and invalidation messages from other backends might have been missed. Additionally, use RelationGetIndexExpressions() and check for NIL rather than doing our own loop. Protect against redefinition of tid and rowvar operators in RMVC. While working on this, noticed that the fixes for bugs found during the CF made the UPDATE statement useless, since no rows could qualify for that treatment any more. Ripping out code to support the UPDATE statement simplified the operator cleanups. Change slightly confusing local field name. Use meaningful alias names on queries in refresh_by_match_merge(). Per concerns raised by Andres Freund and comments and suggestions from Noah Misch. Some additional issues remain, which will be addressed separately.	2013-08-05 09:57:56 -05:00
Tom Lane	221e92f64c	Make sure float4in/float8in accept all standard spellings of "infinity". The C99 and POSIX standards require strtod() to accept all these spellings (case-insensitively): "inf", "+inf", "-inf", "infinity", "+infinity", "-infinity". However, pre-C99 systems might accept only some or none of these, and apparently Windows still doesn't accept "inf". To avoid surprising cross-platform behavioral differences, manually check for each of these spellings if strtod() fails. We were previously handling just "infinity" and "-infinity" that way, but since C99 is most of the world now, it seems likely that applications are expecting all these spellings to work. Per bug #8355 from Basil Peace. It turns out this fix won't actually resolve his problem, because Python isn't being this careful; but that doesn't mean we shouldn't be.	2013-08-03 12:40:27 -04:00
Alvaro Herrera	706f9dd914	Fix old visibility bug in HeapTupleSatisfiesDirty If a tuple is locked but not updated by a concurrent transaction, HeapTupleSatisfiesDirty would return that transaction's Xid in xmax, causing callers to wait on it, when it is not necessary (in fact, if the other transaction had used a multixact instead of a plain Xid to mark the tuple, HeapTupleSatisfiesDirty would have behaved differently and not returned the Xmax). This bug was introduced in commit `3f7fbf85dc`, dated December 1998, so it's almost 15 years old now. However, it's hard to see this misbehave, because before we had NOWAIT the only consequence of this is that transactions would wait for slightly more time than necessary; so it's not surprising that this hasn't been reported yet. Craig Ringer and Andres Freund	2013-08-02 17:02:36 -04:00
Alvaro Herrera	88c556680c	Fix crash in error report of invalid tuple lock My tweak of these error messages in commit `c359a1b082` contained the thinko that a query would always have rowMarks set for a query containing a locking clause. Not so: when declaring a cursor, for instance, rowMarks isn't set at the point we're checking, so we'd be dereferencing a NULL pointer. The fix is to pass the lock strength to the function raising the error, instead of trying to reverse-engineer it. The result not only is more robust, but it also seems cleaner overall. Per report from Robert Haas.	2013-08-02 13:18:37 -04:00
Robert Haas	05ee328d66	Fix typo in comment. Etsuro Fujita	2013-08-02 09:15:42 -04:00
Kevin Grittner	f31c149f13	Improve comments for IncrementalMaintenance DML enabling functions. Move the static functions after the comment and expand the comment. Per complaint from Andres Freund, although using different comment text.	2013-08-01 14:31:09 -05:00
Robert Haas	149e38e5ee	Assorted bgworker-related comment fixes. Per gripes by Amit Kapila.	2013-08-01 12:20:31 -04:00
Robert Haas	813fb03155	Remove SnapshotNow and HeapTupleSatisfiesNow. We now use MVCC catalog scans, and, per discussion, have eliminated all other remaining uses of SnapshotNow, so that we can now get rid of it. This will break third-party code which is still using it, which is intentional, as we want such code to be updated to do things the new way.	2013-08-01 10:46:19 -04:00
Stephen Frost	ddef1a39c6	Allow a context to be passed in for error handling As pointed out by Tom Lane, we can allow other users of the error handler callbacks to provide their own memory context by adding the context to use to ErrorData and using that instead of explicitly using ErrorContext. This then allows GetErrorContextStack() to be called from inside exception handlers, so modify plpgsql to take advantage of that and add an associated regression test for it.	2013-08-01 01:07:20 -04:00
Alvaro Herrera	a59516b631	Fix mis-indented lines Per Coverity	2013-07-31 17:57:15 -04:00
Tom Lane	d074b4e50d	Fix regexp_matches() handling of zero-length matches. We'd find the same match twice if it was of zero length and not immediately adjacent to the previous match. replace_text_regexp() got similar cases right, so adjust this search logic to match that. Note that even though the regexp_split_to_xxx() functions share this code, they did not display equivalent misbehavior, because the second match would be considered degenerate and ignored. Jeevan Chalke, with some cosmetic changes by me.	2013-07-31 11:31:22 -04:00
Fujii Masao	c876fb4241	Fix typo in comment. Hitoshi Harada	2013-07-31 22:53:20 +09:00
Noah Misch	16f38f72ab	Restore REINDEX constraint validation. Refactoring as part of commit `8ceb245680` had the unintended effect of making REINDEX TABLE and REINDEX DATABASE no longer validate constraints enforced by the indexes in question; REINDEX INDEX still did so. Indexes marked invalid remained so, and constraint violations arising from data corruption went undetected. Back-patch to 9.0, like the causative commit.	2013-07-30 18:36:52 -04:00
Greg Stark	c62736cc37	Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF) Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost	2013-07-29 16:38:01 +01:00
Peter Eisentraut	626092a2e1	Message style improvements	2013-07-28 07:01:13 -04:00
Tom Lane	3d13623d75	Prevent leakage of SPI tuple tables during subtransaction abort. plpgsql often just remembers SPI-result tuple tables in local variables, and has no mechanism for freeing them if an ereport(ERROR) causes an escape out of the execution function whose local variable it is. In the original coding, that wasn't a problem because the tuple table would be cleaned up when the function's SPI context went away during transaction abort. However, once plpgsql grew the ability to trap exceptions, repeated trapping of errors within a function could result in significant intra-function-call memory leakage, as illustrated in bug #8279 from Chad Wagner. We could fix this locally in plpgsql with a bunch of PG_TRY/PG_CATCH coding, but that would be tedious, probably slow, and prone to bugs of omission; moreover it would do nothing for similar risks elsewhere. What seems like a better plan is to make SPI itself responsible for freeing tuple tables at subtransaction abort. This patch attacks the problem that way, keeping a list of live tuple tables within each SPI function context. Currently, such freeing is automatic for tuple tables made within the failed subtransaction. We might later add a SPI call to mark a tuple table as not to be freed this way, allowing callers to opt out; but until someone exhibits a clear use-case for such behavior, it doesn't seem worth bothering. A very useful side-effect of this change is that SPI_freetuptable() can now defend itself against bad calls, such as duplicate free requests; this should make things more robust in many places. (In particular, this reduces the risks involved if a third-party extension contains now-redundant SPI_freetuptable() calls in error cleanup code.) Even though the leakage problem is of long standing, it seems imprudent to back-patch this into stable branches, since it does represent an API semantics change for SPI users. We'll patch this in 9.3, but live with the leakage in older branches.	2013-07-25 16:46:14 -04:00
Robert Haas	ed93feb808	Change currtid functions to use an MVCC snapshot, not SnapshotNow. This has a slight performance cost, but the only known consumers of these functions, known at the SQL level as currtid and currtid2, is pgsql-odbc; whose usage, we hope, is not sufficiently intensive to make this a problem. Per discussion.	2013-07-25 16:32:02 -04:00
Robert Haas	3483f4332d	Don't use SnapshotNow in get_actual_variable_range. Instead, use the active snapshot. Per Tom Lane, this function is most interested in knowing the range of tuples our scan will actually see. This is another step towards full removal of SnapshotNow.	2013-07-25 14:30:00 -04:00
Stephen Frost	9bd0feeba8	Improvements to GetErrorContextStack() As GetErrorContextStack() borrowed setup and tear-down code from other places, it was less than clear that it must only be called as a top-level entry point into the error system and can't be called by an exception handler (unlike the rest of the error system, which is set up to be reentrant-safe). Being called from an exception handler is outside the charter of GetErrorContextStack(), so add a bit more protection against it, improve the comments addressing why we have to set up an errordata stack for this function at all, and add a few more regression tests. Lack of clarity pointed out by Tom Lane; all bugs are mine.	2013-07-25 09:41:55 -04:00
Stephen Frost	8312832567	Add GET DIAGNOSTICS ... PG_CONTEXT in PL/PgSQL This adds the ability to get the call stack as a string from within a PL/PgSQL function, which can be handy for logging to a table, or to include in a useful message to an end-user. Pavel Stehule, reviewed by Rushabh Lathia and rather heavily whacked around by Stephen Frost.	2013-07-24 18:53:27 -04:00
Tom Lane	fa2fad3c06	Improve ilist.h's support for deletion of slist elements during iteration. Previously one had to use slist_delete(), implying an additional scan of the list, making this infrastructure considerably less efficient than traditional Lists when deletion of element(s) in a long list is needed. Modify the slist_foreach_modify() macro to support deleting the current element in O(1) time, by keeping a "prev" pointer in addition to "cur" and "next". Although this makes iteration with this macro a bit slower, no real harm is done, since in any scenario where you're not going to delete the current list element you might as well just use slist_foreach instead. Improve the comments about when to use each macro. Back-patch to 9.3 so that we'll have consistent semantics in all branches that provide ilist.h. Note this is an ABI break for callers of slist_foreach_modify(). Andres Freund and Tom Lane	2013-07-24 17:42:34 -04:00
Tom Lane	b32a25c3d5	Fix booltestsel() for case where we have NULL stats but not MCV stats. In a boolean column that contains mostly nulls, ANALYZE might not find enough non-null values to populate the most-common-values stats, but it would still create a pg_statistic entry with stanullfrac set. The logic in booltestsel() for this situation did the wrong thing for "col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null values would satisfy these tests (so that the true selectivity would be close to one, not close to zero). Per bug #8274. Fix by Andrew Gierth, some comment-smithing by me.	2013-07-24 00:44:09 -04:00
Tom Lane	10a509d829	Move strip_implicit_coercions() from optimizer to nodeFuncs.c. Use of this function has spread into the parser and rewriter, so it seems like time to pull it out of the optimizer and put it into the more central nodeFuncs module. This eliminates the need to #include optimizer/clauses.h in most of the calling files, demonstrating that this function was indeed a bit outside the normal code reference patterns.	2013-07-23 18:21:19 -04:00
Tom Lane	ef655663c5	Further hacking on ruleutils' new column-alias-assignment code. After further thought about implicit coercions appearing in a joinaliasvars list, I realized that they represent an additional reason why we might need to reference the join output column directly instead of referencing an underlying column. Consider SELECT x FROM t1 LEFT JOIN t2 USING (x) where t1.x is of type date while t2.x is of type timestamptz. The merged output variable is of type timestamptz, but it won't go to null when t2 does, therefore neither t1.x nor t2.x is a valid substitute reference. The code in get_variable() actually gets this case right, since it knows it shouldn't look through a coercion, but we failed to ensure that the unqualified output column name would be globally unique. To fix, modify the code that trawls for a dangerous situation so that it actually scans through an unnamed join's joinaliasvars list to see if there are any non-simple-Var entries.	2013-07-23 17:55:04 -04:00
Tom Lane	a7cd853b75	Change post-rewriter representation of dropped columns in joinaliasvars. It's possible to drop a column from an input table of a JOIN clause in a view, if that column is nowhere actually referenced in the view. But it will still be there in the JOIN clause's joinaliasvars list. We used to replace such entries with NULL Const nodes, which is handy for generation of RowExpr expansion of a whole-row reference to the view. The trouble with that is that it can't be distinguished from the situation after subquery pull-up of a constant subquery output expression below the JOIN. Instead, replace such joinaliasvars with null pointers (empty expression trees), which can't be confused with pulled-up expressions. expandRTE() still emits the old convention, though, for convenience of RowExpr generation and to reduce the risk of breaking extension code. In HEAD and 9.3, this patch also fixes a problem with some new code in ruleutils.c that was failing to cope with implicitly-casted joinaliasvars entries, as per recent report from Feike Steenbergen. That oversight was because of an inadequate description of the data structure in parsenodes.h, which I've now corrected. There were some pre-existing oversights of the same ilk elsewhere, which I believe are now all fixed.	2013-07-23 16:23:45 -04:00
Alvaro Herrera	c359a1b082	Tweak FOR UPDATE/SHARE error message wording (again) In commit `0ac5ad5134` I changed some error messages from "FOR UPDATE/SHARE" to a rather long gobbledygook which nobody liked. Then, in commit `cb9b66d31` I changed them again, but the alternative chosen there was deemed suboptimal by Peter Eisentraut, who in message 1373937980.20441.8.camel@vanquo.pezone.net proposed an alternative involving a dynamically-constructed string based on the actual locking strength specified in the SQL command. This patch implements that suggestion.	2013-07-23 14:03:09 -04:00
Robert Haas	765ad89be3	Use InvalidSnapshot, not SnapshotNow, as the default snapshot. As far as I can determine, there's no code in the core distribution that fails to explicitly set the snapshot of a scan or executor state. If there is any such code, this will probably cause it to seg fault; friendlier suggestions were discussed on pgsql-hackers, but there was no consensus that anything more than this was needed. This is another step towards the hoped-for complete removal of SnapshotNow.	2013-07-23 10:58:32 -04:00
Robert Haas	21e28e4531	Fix cache flush hazard in ExecRefreshMatView. Andres Freund	2013-07-22 18:10:05 -04:00
Robert Haas	f40a318eea	Remove bgw_sighup and bgw_sigterm. Per discussion on pgsql-hackers, these aren't really needed. Interim versions of the background worker patch had the worker starting with signals already unblocked, which would have made this necessary. But the final version does not, so we don't really need it; and it doesn't work well with the new facility for starting dynamic background workers, so just rip it out. Also per discussion on pgsql-hackers, back-patch this change to 9.3. It's best to get the API break out of the way before we do an official release of this facility, to avoid more pain for extension authors later.	2013-07-22 14:13:00 -04:00
Robert Haas	0518eceec3	Adjust HeapTupleSatisfies* routines to take a HeapTuple. Previously, these functions took a HeapTupleHeader, but upcoming patches for logical replication will introduce a new snapshot type under which the tuple's TID will be used to look up (CMIN, CMAX) for visibility determination purposes. This makes that information available. Code churn is minimal since HeapTupleSatisfiesVisibility took the HeapTuple anyway, and dereferenced it before calling the satisfies function. Independently of logical replication, this allows t_tableOid and t_self to be cross-checked via assertions in tqual.c. This seems like a useful way to make sure that all callers are setting these values properly, which has been previously put forward as desirable. Andres Freund, reviewed by Álvaro Herrera	2013-07-22 13:38:44 -04:00
Alvaro Herrera	0aeb5ae204	Silence compiler warning on an unused variable Also, tweak wording in comments (per Andres) and documentation (myself) to point out that it's the database's default tablespace that can be passed as 0, not DEFAULTTABLESPACE_OID. Robert Haas noticed the bug in the code, but didn't update the accompanying prose.	2013-07-22 13:15:13 -04:00
Robert Haas	f01d1ae3a1	Add infrastructure for mapping relfilenodes to relation OIDs. Future patches are expected to introduce logical replication that works by decoding WAL. WAL contains relfilenodes rather than relation OIDs, so this infrastructure will be needed to find the relation OID based on WAL contents. If logical replication does not make it into this release, we probably should consider reverting this, since it will add some overhead to DDL operations that create new relations. One additional index insert per pg_class row is not a large overhead, but it's more than zero. Another way of meeting the needs of logical replication would be to add the relation OID to WAL, but that would burden DML operations, not only DDL. Andres Freund, with some changes by me. Design review, in earlier versions, by Álvaro Herrera.	2013-07-22 11:09:10 -04:00
Peter Eisentraut	ff41a5de09	Clean up new JSON API typedefs The new JSON API uses a bit of an unusual typedef scheme, where for example OkeysState is a pointer to okeysState. And that's not applied consistently either. Change that to the more usual PostgreSQL style where struct typedefs are upper case, and use pointers explicitly.	2013-07-20 06:38:31 -04:00
Alvaro Herrera	6737aa72ba	Fix HeapTupleSatisfiesVacuum on aborted updater xacts By using only the macro that checks infomask bits HEAP_XMAX_IS_LOCKED_ONLY to verify whether a multixact is not an updater, and not the full HeapTupleHeaderIsOnlyLocked, it would come to the wrong result in case of a multixact containing an aborted update; therefore returning the wrong result code. This would cause predicate.c to break completely (as in bug report #8273 from David Leverton), and certain index builds would misbehave. As far as I can tell, other callers of the bogus routine would make harmless mistakes or not be affected by the difference at all; so this was a pretty narrow case. Also, no other user of the HEAP_XMAX_IS_LOCKED_ONLY macro is as careless; they all check specifically for the HEAP_XMAX_IS_MULTI case, and they all verify whether the updater is InvalidXid before concluding that it's a valid updater. So there doesn't seem to be any similar bug.	2013-07-19 18:47:37 -04:00
Tom Lane	d9f37e6661	Add checks for valid multibyte character length in UtfToLocal, LocalToUtf. This is mainly to suppress "uninitialized variable" warnings from very recent versions of gcc. But it seems like a good robustness thing anyway, not to mention that we might someday decide to support 6-byte UTF8. Per report from Karol Trzcionka. No back-patch since there's no reason at the moment to think this is more than cosmetic.	2013-07-18 21:55:38 -04:00
Tom Lane	e2bd904955	Fix regex match failures for backrefs combined with non-greedy quantifiers. An ancient logic error in cfindloop() could cause the regex engine to fail to find matches that begin later than the start of the string. This function is only used when the regex pattern contains a back reference, and so far as we can tell the error is only reachable if the pattern is non-greedy (i.e. its first quantifier uses the ? modifier). Furthermore, the actual match must begin after some potential match that satisfies the DFA but then fails the back-reference's match test. Reported and fixed by Jeevan Chalke, with cosmetic adjustments by me.	2013-07-18 21:22:37 -04:00
Stephen Frost	4cbe3ac3e8	WITH CHECK OPTION support for auto-updatable VIEWs For simple views which are automatically updatable, this patch allows the user to specify what level of checking should be done on records being inserted or updated. For 'LOCAL CHECK', new tuples are validated against the conditionals of the view they are being inserted into, while for 'CASCADED CHECK' the new tuples are validated against the conditionals for all views involved (from the top down). This option is part of the SQL specification. Dean Rasheed, reviewed by Pavel Stehule	2013-07-18 17:10:16 -04:00
Andrew Dunstan	d26888bc4d	Move checking an explicit VARIADIC "any" argument into the parser. This is more efficient and simpler. It does mean that an untyped NULL can no longer be used in such cases, which should be mentioned in Release Notes, but doesn't seem a terrible loss. The workaround is to cast the NULL to some array type. Pavel Stehule, reviewed by Jeevan Chalke.	2013-07-18 11:52:12 -04:00
Tom Lane	405a468b02	Fix direct access to Relation->rd_indpred. Should use RelationGetIndexPredicate(), since rd_indpred is just a cache that is not computed until/unless demanded. Per buildfarm failure on CLOBBER_CACHE_ALWAYS animals; diagnosis and fix by Hitoshi Harada.	2013-07-18 01:02:18 -04:00
Heikki Linnakangas	107cbc90a7	Fix variable names mentioned in comment to match the code. Also, in another comment, explain why holding an insertion slot is a critical section. Per review by Amit Kapila.	2013-07-17 23:32:32 +03:00
Heikki Linnakangas	59c02a36f0	Fix assert failure at end of recovery, broken by XLogInsert scaling patch. Initialization of the first XLOG buffer at end-of-recovery was broken for the case that the last read WAL record ended at a page boundary. Instead of trying to copy the last full xlog page to the buffer cache in that case, just set shared state so that the next page is initialized when the first WAL record after startup is inserted. (That's what we did in earlier versions, too.) To make the shared state required for that case less surprising, replace the XLogCtl->curridx variable, which was the index of the latest initialized buffer, with an XLogRecPtr of how far the buffers have been initialized. That also allows us to get rid of the XLogRecEndPtrToBufIdx macro. While we're at it, make a similar change for XLogCtl->Write.curridx, getting rid of that variable and calculating the next buffer to write from XLogCtl->LogwrtResult instead.	2013-07-17 23:12:22 +03:00
Heikki Linnakangas	3f2adace1e	Fix end-of-loop optimization in pglz_find_match() function. After the recent pglz optimization patch, the next/prev pointers in the hash table are never NULL, INVALID_ENTRY_PTR is used to represent invalid entries instead. The end-of-loop check in pglz_find_match() function didn't get the memo. The result was the same from a correctness point of view, but because the NULL-check would never fail, the tiny optimization turned into a pessimization. Reported by Stephen Frost, using Coverity scanner.	2013-07-17 20:37:09 +03:00
Noah Misch	ffcf654547	Fix systable_recheck_tuple() for MVCC scan snapshots. Since this function assumed non-MVCC snapshots, it broke when commit `568d4138c6` switched its one caller from SnapshotNow scans to MVCC-snapshot scans. Reviewed by Robert Haas, Tom Lane and Andres Freund.	2013-07-16 20:16:32 -04:00
Noah Misch	b560ec1b0d	Implement the FILTER clause for aggregate function calls. This is SQL-standard with a few extensions, namely support for subqueries and outer references in clause expressions. catversion bump due to change in Aggref and WindowFunc. David Fetter, reviewed by Dean Rasheed.	2013-07-16 20:15:36 -04:00
Noah Misch	7a8e9f298e	Comment on why planagg.c punts "MIN(x ORDER BY y)".	2013-07-16 20:14:37 -04:00
Kevin Grittner	cc1965a99b	Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. This allows reads to continue without any blocking while a REFRESH runs. The new data appears atomically as part of transaction commit. Review questioned the Assert that a matview was not a system relation. This will be addressed separately. Reviewed by Hitoshi Harada, Robert Haas, Andres Freund. Merged after review with security patch `f3ab5d4`.	2013-07-16 12:55:44 -05:00
Robert Haas	7f7485a0cd	Allow background workers to be started dynamically. There is a new API, RegisterDynamicBackgroundWorker, which allows an ordinary user backend to register a new background worker during normal running. This means that it's no longer necessary for all background workers to be registered during processing of shared_preload_libraries, although the option of registering workers at that time remains available. When a background worker exits and will not be restarted, the slot previously used by that background worker is automatically released and becomes available for reuse. Slots used by background workers that are configured for automatic restart can't (yet) be released without shutting down the system. This commit adds a new source file, bgworker.c, and moves some of the existing control logic for background workers there. Previously, there was little enough logic that it made sense to keep everything in postmaster.c, but not any more. This commit also makes the worker_spi contrib module into an extension and adds a new function, worker_spi_launch, which can be used to demonstrate the new facility.	2013-07-16 13:02:15 -04:00
Stephen Frost	4ed22e891f	Check get_tle_by_resno() result before deref When creating a sort to support a group by, we need to look up the target entry in the target list by the resno using get_tle_by_resno(). This particular code-path didn't check the result prior to attempting to dereference it, while all other callers did. While I can't see a way for this usage of get_tle_by_resno() to fail (you can't ask for a column to be sorted on which isn't included in the group by), it's probably best to check that we didn't end up with a NULL somehow anyway than risk the segfault. I'm willing to back-patch this if others feel it's necessary, but my guess is new features are what might tickle this rather than anything existing. Missing check spotted by the Coverity scanner.	2013-07-15 15:04:19 -04:00
Robert Haas	42c80c696e	Assert that syscache lookups don't happen outside transactions. Andres Freund	2013-07-15 13:31:36 -04:00
Stephen Frost	273dcd1628	Ensure 64bit arithmetic when calculating tapeSpace In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring out how many 'tapes' we can use (maxTapes) and then multiplying the result by the tape buffer overhead for each. Unfortunately, when we are on a system with an 8-byte long, we allow work_mem to be larger than 2GB and that allows maxTapes to be large enough that the 32bit arithmetic can overflow when multiplied against the buffer overhead. When this overflow happens, we end up adding the overflow to the amount of space available, causing the amount of memory allocated to be larger than work_mem. Note that to reach this point, you have to set work_mem to at least 24GB and be sorting a set which is at least that size. Given that a user who can set work_mem to 24GB could also set it even higher, if they were looking to run the system out of memory, this isn't considered a security issue. This overflow risk was found by the Coverity scanner. Back-patch to all supported branches, as this issue has existed since before 8.4.	2013-07-14 16:26:16 -04:00
Peter Eisentraut	070518ddab	Add session_preload_libraries configuration parameter This is like shared_preload_libraries except that it takes effect at backend start and can be changed without a full postmaster restart. It is like local_preload_libraries except that it is still only settable by a superuser. This can be a better way to load modules such as auto_explain. Since there are now three preload parameters, regroup the documentation a bit. Put all parameters into one section, explain common functionality only once, update the descriptions to reflect current and future realities. Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>	2013-07-12 21:23:50 -04:00
Noah Misch	f3ab5d4696	Switch user ID to the object owner when populating a materialized view. This makes superuser-issued REFRESH MATERIALIZED VIEW safe regardless of the object's provenance. REINDEX is an earlier example of this pattern. As a downside, functions called from materialized views must tolerate running in a security-restricted operation. CREATE MATERIALIZED VIEW need not change user ID. Nonetheless, avoid creation of materialized views that will invariably fail REFRESH by making it, too, start a security-restricted operation. Back-patch to 9.3 so materialized views have this from the beginning. Reviewed by Kevin Grittner.	2013-07-12 18:21:22 -04:00
Noah Misch	448fee2e23	Make comments reflect that omission of SPI_gettypmod() is intentional.	2013-07-12 18:07:46 -04:00
Peter Eisentraut	8dead08c54	Fix lack of message pluralization	2013-07-09 20:49:44 -04:00
Peter Eisentraut	7888c61238	Fix bool abuse path_encode's "closed" argument used to take three values: TRUE, FALSE, or -1, while being of type bool. Replace that with a three-valued enum for more clarity.	2013-07-08 22:42:39 -04:00
Heikki Linnakangas	f489470f8a	Fix Windows build. Was broken by my xloginsert scaling patch. XLogCtl global variable needs to be initialized in each process, as it's not inherited by fork() on Windows.	2013-07-08 17:28:48 +03:00
Heikki Linnakangas	9a20a9b21b	Improve scalability of WAL insertions. This patch replaces WALInsertLock with a number of WAL insertion slots, allowing multiple backends to insert WAL records to the WAL buffers concurrently. This is particularly useful for parallel loading large amounts of data on a system with many CPUs. This has one user-visible change: switching to a new WAL segment with pg_switch_xlog() now fills the remaining unused portion of the segment with zeros. This potentially adds some overhead, but it has been a very common practice by DBAs to clear the "tail" of the segment with an external pg_clearxlogtail utility anyway, to make the WAL files compress better. With this patch, it's no longer necessary to do that. This patch adds a new GUC, xloginsert_slots, to tune the number of WAL insertion slots. Performance testing suggests that the default, 8, works pretty well for all kinds of workloads, but I left the GUC in place to allow others with different hardware to test that easily. We might want to remove that before release. Reviewed by Andres Freund.	2013-07-08 11:23:56 +03:00
Tom Lane	5372275b4b	Fix planning of parameterized appendrel paths with expensive join quals. The code in set_append_rel_pathlist() for building parameterized paths for append relations (inheritance and UNION ALL combinations) supposed that the cheapest regular path for a child relation would still be cheapest when reparameterized. Which might not be the case, particularly if the added join conditions are expensive to compute, as in a recent example from Jeff Janes. Fix it to compare child path costs after reparameterizing. We can short-circuit that if the cheapest pre-existing path is already parameterized correctly, which seems likely to be true often enough to be worth checking for. Back-patch to 9.2 where parameterized paths were introduced.	2013-07-07 22:37:24 -04:00
Jeff Davis	5b571bb8c8	Handle posix_fallocate() errors. On some platforms, posix_fallocate() is available but may still return EINVAL if the underlying filesystem does not support it. So, in case of an error, fall through to the alternate implementation that just writes zeros. Per buildfarm failure and analysis by Tom Lane.	2013-07-06 13:46:04 -07:00
Noah Misch	02d2b694ee	Update messages, comments and documentation for materialized views. All instances of the verbiage lagging the code. Back-patch to 9.3, where materialized views were introduced.	2013-07-05 15:37:51 -04:00
Jeff Davis	269e780822	Use posix_fallocate() for new WAL files, where available. This function is more efficient than actually writing out zeroes to the new file, per microbenchmarks by Jon Nelson. Also, it may reduce the likelihood of WAL file fragmentation. Jon Nelson, with review by Andres Freund, Greg Smith and me.	2013-07-05 12:30:29 -07:00
Magnus Hagander	c87ff71f37	Expose the estimation of number of changed tuples since last analyze This value, now pg_stat_all_tables.n_mod_since_analyze, was already tracked and used by autovacuum, but not exposed to the user. Mark Kirkwood, review by Laurenz Albe	2013-07-05 15:10:15 +02:00
Noah Misch	79e0f87a15	Use type "int64" for memory accounting in tuplesort.c/tuplestore.c. Commit `263865a489` switched tuplesort.c and tuplestore.c variables representing memory usage from type "long" to type "Size". This was unnecessary; I thought doing so avoided overflow scenarios on 64-bit Windows, but guc.c already limited work_mem so as to prevent the overflow. It was also incomplete, not touching the logic that assumed a signed data type. Change the affected variables to "int64". This is perfect for 64-bit platforms, and it reduces the need to contemplate platform-specific overflow scenarios. It also puts us close to being able to support work_mem over 2 GiB on 64-bit Windows. Per report from Andres Freund.	2013-07-04 23:13:54 -04:00
Fujii Masao	7842d41df5	Fix typo in comment. Michael Paquier	2013-07-05 02:47:49 +09:00
Robert Haas	6bc8ef0b7f	Add new GUC, max_worker_processes, limiting number of bgworkers. In 9.3, there's no particular limit on the number of bgworkers; instead, we just count up the number that are actually registered, and use that to set MaxBackends. However, that approach causes problems for Hot Standby, which needs both MaxBackends and the size of the lock table to be the same on the standby as on the master, yet it may not be desirable to run the same bgworkers in both places. 9.3 handles that by failing to notice the problem, which will probably work fine in nearly all cases anyway, but is not theoretically sound. A further problem with simply counting the number of registered workers is that new workers can't be registered without a postmaster restart. This is inconvenient for administrators, since bouncing the postmaster causes an interruption of service. Moreover, there are a number of applications for background processes where, by necessity, the background process must be started on the fly (e.g. parallel query). While this patch doesn't actually make it possible to register new background workers after startup time, it's a necessary prerequisite. Patch by me. Review by Michael Paquier.	2013-07-04 11:24:24 -04:00
Fujii Masao	2ef085d0e6	Get rid of pg_class.reltoastidxid. Treat a TOAST index just the same as a normal one and get the OID of the TOAST index from pg_index rather than pg_class.reltoastidxid. This change allows us to handle multiple TOAST indexes, which is required infrastructure for the upcoming REINDEX CONCURRENTLY feature. Patch by Michael Paquier, reviewed by Andres Freund and me.	2013-07-04 03:24:09 +09:00
Tom Lane	5530a82643	Fix handling of auto-updatable views on inherited tables. An INSERT into such a view should work just like an INSERT into its base table, ie the insertion should go directly into that table ... not be duplicated into each child table, as was happening before, per bug #8275 from Rushabh Lathia. On the other hand, the current behavior for UPDATE/DELETE seems reasonable: the update/delete traverses the child tables, or not, depending on whether the view specifies ONLY or not. Add some regression tests covering this area. Dean Rasheed	2013-07-03 12:26:52 -04:00
Alvaro Herrera	620935ad08	Unbreak postmaster restart-after-crash sequence In patch `82233ce7ea`, AbortStartTime wasn't being reset appropriately after the restart sequence, causing subsequent iterations through ServerLoop to malfunction.	2013-07-03 11:08:52 -04:00
Robert Haas	3682025015	Add support for multiple kinds of external toast datums. To that end, support tags rather than lengths for external datums. As an example of how this can be used, add support for "indirect" tuples which point to some externally allocated memory containing a toast tuple. Similar infrastructure could be used for other purposes, including, perhaps, support for alternative compression algorithms. Andres Freund, reviewed by Hitoshi Harada and myself	2013-07-02 13:38:55 -04:00
Robert Haas	568d4138c6	Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue that has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.	2013-07-02 09:47:01 -04:00
Robert Haas	0d22987ae9	Add a convenience routine makeFuncCall to reduce duplication. David Fetter and Andrew Gierth, reviewed by Jeevan Chalke	2013-07-01 14:46:54 -04:00
Bruce Momjian	7408c5d29b	Add timezone offset output option to to_char() Add ability for to_char() to output the timezone's UTC offset (OF). We already have the ability to return the timezone abbreviation (TZ/tz). Per request from Andrew Dunstan	2013-07-01 13:40:32 -04:00
Heikki Linnakangas	031cc55bbe	Optimize pglz compressor for small inputs. The pglz compressor has a significant startup cost, because it has to initialize to zeros the history-tracking hash table. On a 64-bit system, the hash table was 64kB in size. While clearing memory is pretty fast, for very short inputs the relative cost of that was quite large. This patch alleviates that in two ways. First, instead of storing pointers in the hash table, store 16-bit indexes into the hist_entries array. That slashes the size of the hash table to 1/2 or 1/4 of the original, depending on the pointer width. Secondly, adjust the size of the hash table based on input size. For very small inputs, you don't need a large hash table to avoid collisions. Review by Amit Kapila.	2013-07-01 11:00:14 +03:00
Heikki Linnakangas	79ce29c734	Retry short writes when flushing WAL. We don't normally bother retrying when the number of bytes written by write() is short of what was requested. It is generally assumed that a write() to disk doesn't return short, unless you run out of disk space. While writing the WAL, however, it seems prudent to try a bit harder, because a failure leads to PANIC. The write() is also much larger than most write()s in the backend (up to wal_buffers), so there's more room for surprises. Also retry on EINTR. All signals used in the backend are flagged SA_RESTART nowadays, so it shouldn't happen, but better to be defensive.	2013-07-01 09:36:00 +03:00
Heikki Linnakangas	ee6556555b	Inline ginCompareItemPointers function for speed. The ginCompareItemPointers function is called heavily in gin index scans - inlining it speeds up some kinds of queries a lot.	2013-06-29 12:55:34 +03:00
Simon Riggs	d51b271059	Change errcode for lock_timeout to match NOWAIT Set errcode to ERRCODE_LOCK_NOT_AVAILABLE Zoltán Böszörményi	2013-06-29 00:57:25 +01:00
Simon Riggs	f177cbfe67	ALTER TABLE ... ALTER CONSTRAINT for FKs Allow constraint attributes to be altered, so the default setting of NOT DEFERRABLE can be altered to DEFERRABLE and back. Review by Abhijit Menon-Sen	2013-06-29 00:27:30 +01:00
Simon Riggs	2f74e4ec50	Assert that ALTER TABLE subcommands have pass set	2013-06-29 00:26:46 +01:00
Alvaro Herrera	82233ce7ea	Send SIGKILL to children if they don't die quickly in immediate shutdown On immediate shutdown, or during a restart-after-crash sequence, postmaster used to send SIGQUIT (and then abandon ship if shutdown); but this is not a good strategy if backends don't die because of that signal. (This might happen, for example, if a backend gets tangled trying to malloc() due to gettext(), as in an example illustrated by MauMau.) This causes problems when later trying to restart the server, because some processes are still attached to the shared memory segment. Instead of just abandoning such backends to their fates, we now have postmaster hang around for a little while longer, send a SIGKILL after some reasonable waiting period, and then exit. This makes immediate shutdown more reliable. There is disagreement on whether it's best for postmaster to exit after sending SIGKILL, or to stick around until all children have reported death. If this controversy is resolved differently than what this patch implements, it's an easy change to make. Bug reported by MauMau in message 20DAEA8949EC4E2289C6E8E58560DEC0@maumau MauMau and Álvaro Herrera	2013-06-28 17:49:46 -04:00
Robert Haas	5893ffa79c	Make the OVER keyword unreserved. This results in a slightly less specific error message when OVER is used in a context where we don't accept window functions, but per discussion, it's worth it to get the benefit of not needing to reserve this keyword any more. This same refactoring will also let us avoid reserving some other keywords that we expect to add in upcoming patches (specifically, IGNORE, RESPECT, and FILTER). Troels Nielsen, with minor changes by me	2013-06-28 11:11:00 -04:00
Heikki Linnakangas	9e0bc7c1e8	Track spinlock delay in microsecond granularity. On many platforms the OS will round the sleep time to millisecond resolution, but there is no reason for us to pre-emptively round the argument to pg_usleep. When the delay was measured in milliseconds and started from 1 ms, it sometimes took many attempts until the logic that increases the delay by multiplying with a random value between 1 and 2 actually managed to bump it from 1 ms to 2 ms. That led to a sequence of 1 ms waits until the delay started to increase. This wasn't really a problem but it looked odd if you observed the waits. There is no measurable difference in performance, but it's more readable this way. Jeff Janes	2013-06-28 12:39:55 +03:00
Noah Misch	263865a489	Permit super-MaxAllocSize allocations with MemoryContextAllocHuge(). The MaxAllocSize guard is convenient for most callers, because it reduces the need for careful attention to overflow, data type selection, and the SET_VARSIZE() limit. A handful of callers are happy to navigate those hazards in exchange for the ability to allocate a larger chunk. Introduce MemoryContextAllocHuge() and repalloc_huge(). Use this in tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX tuples, a factor-of-48 increase. In particular, B-tree index builds can now benefit from much-larger maintenance_work_mem settings. Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.	2013-06-27 14:53:57 -04:00
Noah Misch	19085116ee	Cooperate with the Valgrind instrumentation framework. Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its Memcheck tool about the PostgreSQL allocator. This makes Valgrind roughly as sensitive to memory errors involving palloc chunks as it is to memory errors involving malloc chunks. Further client requests in PageAddItem() and printtup() verify that all bits being added to a buffer page or furnished to an output function are predictably-defined. Those tests catch failures of C-language functions to fully initialize the bits of a Datum, which in turn stymie optimizations that rely on _equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to enable these additions. An included "suppression file" silences nominal errors we don't plan to fix. Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.	2013-06-26 20:22:25 -04:00
Noah Misch	a855148a29	Refactor aset.c and mcxt.c in preparation for Valgrind cooperation. Move some repeated debugging code into functions and store intermediates in variables where not presently necessary. No code-generation changes in a production build, and no functional changes. This simplifies and focuses the main patch.	2013-06-26 19:56:03 -04:00
Noah Misch	1d96bb9602	Initialize pad bytes in GinFormTuple(). Every other core buffer page consumer initializes the bytes it furnishes to PageAddItem(). For consistency, do the same here. No back-patch; regardless, we couldn't count on the fix so long as binary upgrade can carry forward affected index builds.	2013-06-26 19:55:15 -04:00
Noah Misch	5f538ad004	Renovate display of non-ASCII messages on Windows. GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters. elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8. Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles. Alexander Law and Noah Misch	2013-06-26 11:17:33 -04:00
Alvaro Herrera	4ca50e0710	Avoid inconsistent type declaration Clang 3.3 correctly complains that a variable of type enum MultiXactStatus cannot hold a value of -1, which makes sense. Change the declared type of the variable to int instead, and apply casting as necessary to avoid the warning. Per notice from Andres Freund	2013-06-25 16:41:47 -04:00
Fujii Masao	985bd7d497	Support clean switchover. In replication, when we shut down the master, walsender tries to send all the outstanding WAL records to the standby, and then to exit. This basically means that all the WAL records are fully synced between two servers after the clean shutdown of the master. So, after promoting the standby to new master, we can restart the stopped master as new standby without the need for a fresh backup from new master. But there was one problem so far: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. Then, before receiving all the WAL records, walreceiver can detect the closure of connection and exit. We cannot guarantee that there is no missing WAL in the standby after clean shutdown of the master. In this case, backup from new master is required when restarting the stopped master as new standby. This patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. Per discussion, this is a fix that needs to get backpatched rather than a new feature. So, back-patch to 9.1 where enough infrastructure for this exists. Patch by me, reviewed by Andres Freund.	2013-06-26 02:14:37 +09:00
Simon Riggs	4f14c86d74	Reverting previous commit, pending investigation of sporadic seg faults from various build farm members.	2013-06-24 21:21:18 +01:00
Simon Riggs	b577a57d41	ALTER TABLE ... ALTER CONSTRAINT for FKs Allow constraint attributes to be altered, so the default setting of NOT DEFERRABLE can be altered to DEFERRABLE and back. Review by Abhijit Menon-Sen	2013-06-24 20:07:41 +01:00
Peter Eisentraut	ce18b01159	Translation updates	2013-06-24 14:16:44 -04:00
Simon Riggs	1f09121b4e	Ensure no xid gaps during Hot Standby startup In some cases with higher numbers of subtransactions it was possible for us to incorrectly initialize subtrans leading to complaints of missing pages. Bug report by Sergey Konoplev Analysis and fix by Andres Freund	2013-06-23 11:05:02 +01:00
Peter Eisentraut	7dfd5cd21c	Clarify terminology standalone backend vs. single-user mode Most of the documentation uses "single-user mode", so use that in the code as well. Adjust the documentation to match the new error message wording. Also add a documentation index entry for "single-user mode". Based-on-patch-by: Jeff Janes <jeff.janes@gmail.com>	2013-06-20 23:03:18 -04:00
Fujii Masao	bab54e383d	Support TB (terabyte) memory unit in GUC variables. Patch by Simon Riggs, reviewed by Jeff Janes and me.	2013-06-20 08:17:14 +09:00
Jeff Davis	b8fd1a09f3	Add buffer_std flag to MarkBufferDirtyHint(). MarkBufferDirtyHint() writes WAL, and should know if it's got a standard buffer or not. Currently, the only callers where buffer_std is false are related to the FSM. In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive. Back-patch to 9.3.	2013-06-17 08:02:12 -07:00
Tom Lane	a64ca63e59	Use WaitLatch, not pg_usleep, for delaying in pg_sleep(). This avoids platform-dependent behavior wherein pg_sleep() might fail to be interrupted by statement timeout, query cancel, SIGTERM, etc. Also, since there's no reason to wake up once a second any more, we can reduce the power consumption of a sleeping backend a tad. Back-patch to 9.3, since use of SA_RESTART for SIGALRM makes this a bigger issue than it used to be.	2013-06-15 16:23:24 -04:00
Tom Lane	873ab97219	Use SA_RESTART for all signals, including SIGALRM. The exclusion of SIGALRM dates back to Berkeley days, when Postgres used SIGALRM in only one very short stretch of code. Nowadays, allowing it to interrupt kernel calls doesn't seem like a very good idea, since its use for statement_timeout means SIGALRM could occur anyplace in the code, and there are far too many call sites where we aren't prepared to deal with EINTR failures. When third-party code is taken into consideration, it seems impossible that we ever could be fully EINTR-proof, so better to use SA_RESTART always and deal with the implications of that. One such implication is that we should not assume pg_usleep() will be terminated early by a signal. Therefore, long sleeps should probably be replaced by WaitLatch operations where practical. Back-patch to 9.3 so we can get some beta testing on this change.	2013-06-15 15:39:51 -04:00
Tom Lane	e472b92140	Avoid deadlocks during insertion into SP-GiST indexes. SP-GiST's original scheme for avoiding deadlocks during concurrent index insertions doesn't work, as per report from Hailong Li, and there isn't any evident way to make it work completely. We could possibly lock individual inner tuples instead of their whole pages, but preliminary experimentation suggests that the performance penalty would be huge. Instead, if we fail to get a buffer lock while descending the tree, just restart the tree descent altogether. We keep the old tuple positioning rules, though, in hopes of reducing the number of cases where this can happen. Teodor Sigaev, somewhat edited by Tom Lane	2013-06-14 14:26:43 -04:00
Tom Lane	c62866eeaf	Remove special-case treatment of LOG severity level in standalone mode. elog.c has historically treated LOG messages as low-priority during bootstrap and standalone operation. This has led to confusion and even masked a bug, because the normal expectation of code authors is that elog(LOG) will put something into the postmaster log, and that wasn't happening during initdb. So get rid of the special-case rule and make the priority order the same as it is in normal operation. To keep from cluttering initdb's output and the behavior of a standalone backend, tweak the severity level of three messages routinely issued by xlog.c during startup and shutdown so that they won't appear in these cases. Per my proposal back in December.	2013-06-13 23:15:15 -04:00
Tom Lane	f04216341d	Refactor checksumming code to make it easier to use externally. pg_filedump and other external utility programs are likely to want to be able to check Postgres page checksums. To avoid messy duplication of code, move the checksumming functionality into an exported header file, much as we did awhile back for the CRC code. In passing, get rid of an unportable assumption that a static char[] array will be word-aligned, and do some other minor code beautification.	2013-06-13 22:35:56 -04:00
Tom Lane	629b3e96dd	Only install a portal's ResourceOwner if it actually has one. In most scenarios a portal without a ResourceOwner is dead and not subject to any further execution, but a portal for a cursor WITH HOLD remains in existence with no ResourceOwner after the creating transaction is over. In this situation, if we attempt to "execute" the portal directly to fetch data from it, we were setting CurrentResourceOwner to NULL, leading to a segfault if the datatype output code did anything that required a resource owner (such as trying to fetch system catalog entries that weren't already cached). The case appears to be impossible to provoke with stock libpq, but psqlODBC at least is able to cause it when working with held cursors. Simplest fix is to just skip the assignment to CurrentResourceOwner, so that any resources used by the data output operations will be managed by the transaction-level resource owner instead. For consistency I changed all the places that install a portal's resowner as current, even though some of them are probably not reachable with a held cursor's portal. Per report from Joshua Berry (with thanks to Hiroshi Inoue for developing a self-contained test case). Back-patch to all supported versions.	2013-06-13 13:12:49 -04:00
Noah Misch	66008564f8	Avoid reading past datum end when parsing JSON. Several loops in the JSON parser examined a byte in memory just before checking whether its address was in-bounds, so they could read one byte beyond the datum's allocation. A SIGSEGV is possible. New in 9.3, so no back-patch.	2013-06-12 19:51:12 -04:00
Noah Misch	3a5d0c5533	Avoid reading below the start of a stack variable in tokenize_file(). We would wrongly overwrite the prior stack byte if it happened to contain '\n' or '\r'. New in 9.3, so no back-patch.	2013-06-12 19:50:52 -04:00
Noah Misch	813895e4ac	Don't pass oidvector by value. Since the structure ends with a flexible array, doing so truncates any vector having more than one element. New in 9.3, so no back-patch.	2013-06-12 19:50:37 -04:00
Noah Misch	fb435f40d5	Observe array length in HaveVirtualXIDsDelayingChkpt(). Since commit `f21bb9cfb5`, this function ignores the caller-provided length and loops until it finds a terminator, which GetVirtualXIDsDelayingChkpt() never adds. Restore the previous loop control logic. In passing, revert the addition of an unused variable by the same commit, presumably a debugging relic.	2013-06-12 19:50:14 -04:00
Noah Misch	ff53890f68	Don't use ordinary NULL-terminated strings as Name datums. Consumers are entitled to read the full 64 bytes pertaining to a Name; using a shorter NULL-terminated string leads to reading beyond the end of its allocation; a SIGSEGV is possible. Use the frequent idiom of copying to a NameData on the stack. New in 9.3, so no back-patch.	2013-06-12 19:49:50 -04:00
Tom Lane	dc3eb56383	Improve updatability checking for views and foreign tables. Extend the FDW API (which we already changed for 9.3) so that an FDW can report whether specific foreign tables are insertable/updatable/deletable. The default assumption continues to be that they're updatable if the relevant executor callback function is supplied by the FDW, but finer granularity is now possible. As a test case, add an "updatable" option to contrib/postgres_fdw. This patch also fixes the information_schema views, which previously did not think that foreign tables were ever updatable, and fixes view_is_auto_updatable() so that a view on a foreign table can be auto-updatable. initdb forced due to changes in information_schema views and the functions they rely on. This is a bit unfortunate to do post-beta1, but if we don't change this now then we'll have another API break for FDWs when we do change it. Dean Rasheed, somewhat editorialized on by Tom Lane	2013-06-12 17:53:33 -04:00
Andrew Dunstan	78ed8e03c6	Fix unescaping of JSON Unicode escapes, especially for non-UTF8. Per discussion on -hackers. We treat Unicode escapes when unescaping them similarly to the way we treat them in PostgreSQL string literals. Escapes in the ASCII range are always accepted, no matter what the database encoding. Escapes for higher code points are only processed in UTF8 databases, and attempts to process them in other databases will result in an error. \u0000 is never unescaped, since it would result in an impermissible null byte.	2013-06-12 13:35:24 -04:00
Tom Lane	e262755bfc	Fix cache flush hazard in cache_record_field_properties(). We need to increment the refcount on the composite type's cached tuple descriptor while we do lookups of its column types. Otherwise a cache flush could occur and release the tuple descriptor before we're done with it. This fails reliably with -DCLOBBER_CACHE_ALWAYS, but the odds of a failure in a production build seem rather low (since the pfree'd descriptor typically wouldn't get scribbled on immediately). That may explain the lack of any previous reports. Buildfarm issue noted by Christian Ullrich. Back-patch to 9.1 where the bogus code was added.	2013-06-11 17:26:42 -04:00
Tom Lane	a4424c57c3	Remove unnecessary restrictions about RowExprs in transformAExprIn(). When the existing code here was written, it made sense to special-case RowExprs because that was the only way that we could handle row comparisons at all. Now that we have record_eq() and arrays of composites, the generic logic for "scalar" types will in fact work on RowExprs too, so there's no reason to throw error for combinations of RowExprs and other ways of forming composite values, nor to ignore the possibility of using a ScalarArrayOpExpr. But keep using the old logic when comparing two RowExprs, for consistency with the main transformAExprOp() logic. (This allows some cases with not-quite-identical rowtypes to succeed, so we might get push-back if we removed it.) Per bug #8198 from Rafal Rzepecki. Back-patch to all supported branches, since this works fine as far back as 8.4. Rafal Rzepecki and Tom Lane	2013-06-09 18:39:20 -04:00
Tom Lane	f3839ea117	Remove ALTER DEFAULT PRIVILEGES' requirement of schema CREATE permissions. Per discussion, this restriction isn't needed for any real security reason, and it seems to confuse people more often than it helps them. It could also result in some database states being unrestorable. So just drop it. Back-patch to 9.0, where ALTER DEFAULT PRIVILEGES was introduced.	2013-06-09 15:26:40 -04:00
Tom Lane	007556bf08	Remove fixed limit on the number of concurrent AllocateFile() requests. AllocateFile(), AllocateDir(), and some sister routines share a small array for remembering requests, so that the files can be closed on transaction failure. Previously that array had a fixed size, MAX_ALLOCATED_DESCS (32). While historically that had seemed sufficient, Steve Toutant pointed out that this meant you couldn't scan more than 32 file_fdw foreign tables in one query, because file_fdw depends on the COPY code which uses AllocateFile(). There are probably other cases, or will be in the future, where this nonconfigurable limit impedes users. We can't completely remove any such limit, at least not without a lot of work, since each such request requires a kernel file descriptor and most platforms limit the number we can have. (In principle we could "virtualize" these descriptors, as fd.c already does for the main VFD pool, but not without an additional layer of overhead and a lot of notational impact on the calling code.) But we can at least let the array size be configurable. Hence, change the code to allow up to max_safe_fds/2 allocated file requests. On modern platforms this should allow several hundred concurrent file_fdw scans, or more if one increases the value of max_files_per_process. To go much further than that, we'd need to do some more work on the data structure, since the current code for closing requests has potentially O(N^2) runtime; but it should still be all right for request counts in this range. Back-patch to 9.1 where contrib/file_fdw was introduced.	2013-06-09 13:46:54 -04:00
Andrew Dunstan	d535136b5d	Don't downcase non-ascii identifier chars in multi-byte encodings. Long-standing code has called tolower() on identifier character bytes with the high bit set. This is clearly an error and produces junk output when the encoding is multi-byte. This patch therefore restricts this activity to cases where there is a character with the high bit set AND the encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases.	2013-06-08 10:00:09 -04:00
Andrew Dunstan	94e3311b97	Handle Unicode surrogate pairs correctly when processing JSON. In 9.2, Unicode escape sequences are not analysed at all other than to make sure that they are in the form \uXXXX. But in 9.3 many of the new operators and functions try to turn JSON text values into text in the server encoding, and this includes de-escaping Unicode escape sequences. This processing had not taken into account the possibility that this might contain a surrogate pair to designate a character outside the BMP. That is now handled correctly. This also enforces correct use of surrogate pairs, something that is not done by the type's input routines. This fact is noted in the docs.	2013-06-08 09:12:48 -04:00
Heikki Linnakangas	f73cb5567c	Fix typo in comment.	2013-06-06 18:27:01 +03:00
Robert Haas	a6370fd9ed	Ensure that XLOG_HEAP2_VISIBLE always targets an initialized page. Andres Freund	2013-06-06 10:21:47 -04:00
Tom Lane	964c0d0f80	Prevent pushing down WHERE clauses into unsafe UNION/INTERSECT nests. The planner is aware that it mustn't push down upper-level quals into subqueries if the quals reference subquery output columns that contain set-returning functions or volatile functions, or are non-DISTINCT outputs of a DISTINCT ON subquery. However, it missed making this check when there were one or more levels of UNION or INTERSECT above the dangerous expression. This could lead to "set-valued function called in context that cannot accept a set" errors, as seen in bug #8213 from Eric Soroos, or to silently wrong answers in the other cases. To fix, refactor the checks so that we make the column-is-unsafe checks during subquery_is_pushdown_safe(), which already has to recursively inspect all arms of a set-operation tree. This makes qual_is_pushdown_safe() considerably simpler, at the cost that we will spend some cycles checking output columns that possibly aren't referenced in any upper qual. But the cases where this code gets executed at all are already nontrivial queries, so it's unlikely anybody will notice any slowdown of planning. This has been broken since commit `05f916e6ad`, which makes the bug over ten years old. A bit surprising nobody noticed it before now.	2013-06-05 23:45:11 -04:00
Peter Eisentraut	a3bd6096bd	Update SQL features list	2013-06-05 22:05:18 -04:00
Tom Lane	3f783c8827	Put analyze_keyword back in explain_option_name production. In commit `2c92edad48`, I broke "EXPLAIN (ANALYZE)" syntax, because I mistakenly thought that ANALYZE/ANALYSE were only partially reserved and thus would be included in NonReservedWord; but actually they're fully reserved so they still need to be called out here. A nicer solution would be to demote these words to type_func_name_keyword status (they can't be less than that because of "VACUUM [ANALYZE] ColId"). While that works fine so far as the core grammar is concerned, it breaks ECPG's grammar for reasons I don't have time to isolate at the moment. So do this for the time being. Per report from Kevin Grittner. Back-patch to 9.0, like the previous commit.	2013-06-05 13:32:53 -04:00
Tom Lane	530acda4da	Provide better message when CREATE EXTENSION can't find a target schema. The new message (and SQLSTATE) matches the corresponding error cases in namespace.c. This was thought to be a "can't happen" case when extension.c was written, so we didn't think hard about how to report it. But it definitely can happen in 9.2 and later, since we no longer require search_path to contain any valid schema names. It's probably also possible in 9.1 if search_path came from a noninteractive source. So, back-patch to all releases containing this code. Per report from Sean Chittenden, though this isn't exactly his patch.	2013-06-04 17:22:29 -04:00
Tom Lane	dbc6eb1f4b	Fix memory leak in LogStandbySnapshot(). The array allocated by GetRunningTransactionLocks() needs to be pfree'd when we're done with it. Otherwise we leak some memory during each checkpoint, if wal_level = hot_standby. This manifests as memory bloat in the checkpointer process, or in bgwriter in versions before we made the checkpointer separate. Reported and fixed by Naoya Anzai. Back-patch to 9.0 where the issue was introduced. In passing, improve comments for GetRunningTransactionLocks(), and add an Assert that we didn't overrun the palloc'd array.	2013-06-04 14:58:46 -04:00
Heikki Linnakangas	15386281a6	Put back allow_system_table_mods check in heap_create(). This reverts commit `a475c60367`. Erik Rijkers reported back in January 2013 that after the patch, if you do "pg_dump -t myschema.mytable" to dump a single table, and restore that in a database where myschema does not exist, the table is silently created in pg_catalog instead. That is because pg_dump uses "SET search_path=myschema, pg_catalog" to set schema the table is created in. While allow_system_table_mods is not a very elegant solution to this, we can't leave it as it is, so for now, revert it back to the way it was previously.	2013-06-03 17:22:31 +03:00
Stephen Frost	f129615fe7	Additional spelling corrections A few more minor spelling corrections, no functional changes. Thom Brown	2013-06-03 08:40:27 -04:00
Heikki Linnakangas	e1e2bb34f1	Code review of recycling WAL segments in a restartpoint. Seems cleaner to get the currently-replayed TLI in the same call to GetXLogReplayRecPtr that we get the WAL position. Make it more clear in the comment what the code does when recovery has already ended (RecoveryInProgress() will set ThisTimeLineID in that case). Finally, make resetting ThisTimeLineID afterwards more explicit.	2013-06-03 09:25:12 +03:00
Tom Lane	2c92edad48	Allow type_func_name_keywords in some places where they weren't before. This change makes type_func_name_keywords less reserved than they were before, by allowing them for role names, language names, EXPLAIN and COPY options, and SET values for GUCs; which are all places where few if any actual keywords could appear instead, so no new ambiguities are introduced. The main driver for this change is to allow "COPY ... (FORMAT BINARY)" to work without quoting the word "binary". That is an inconsistency that has been complained of repeatedly over the years (at least by Pavel Golub, Kurt Lidl, and Simon Riggs); but we hadn't thought of any non-ugly solution until now. Back-patch to 9.0 where the COPY (FORMAT BINARY) syntax was introduced.	2013-06-02 20:09:20 -04:00
Stephen Frost	c9fc28a7f1	Minor spelling fixes Fix a few spelling mistakes. Per bug report #8193 from Lajos Veres.	2013-06-01 10:18:59 -04:00
Stephen Frost	551938ae22	Post-pgindent cleanup Make slightly better decisions about indentation than what pgindent is capable of. Mostly breaking out long function calls into one line per argument, with a few other minor adjustments. No functional changes- all whitespace. pgindent ran cleanly (didn't change anything) after. Passes all regressions.	2013-06-01 09:38:15 -04:00
Noah Misch	97c4d9b7c7	Don't emit non-canonical empty arrays in array_remove(). Dean Rasheed	2013-05-31 21:50:59 -04:00
Peter Eisentraut	8b5a3998a1	Remove whitespace from end of lines	2013-05-30 21:05:07 -04:00
Peter Eisentraut	d7eb6f46de	Minor spell checking	2013-05-30 20:56:58 -04:00
Peter Eisentraut	97a11fd0e3	postgresql.conf.sample: Improve whitespace	2013-05-29 22:00:13 -04:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3 This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Heikki Linnakangas	e2ef289363	Print line number correctly in COPY. When COPY uses the multi-insert method to insert a batch of tuples into the heap at a time, incorrect line number was printed if something went wrong in inserting the index tuples (primary key failure, for example), or processing after row triggers. Fixes bug #8173 reported by Lloyd Albin. Backpatch to 9.2, where the multi-insert code was added.	2013-05-23 07:49:59 -04:00
Simon Riggs	22a27ef113	After fast promotion use CHECKPOINT_FORCE Not necessary for correctness, just to make log_checkpoints output look less singular. Requested by Fujii Masao	2013-05-21 21:27:12 +01:00
Simon Riggs	75a192638f	Maintain ThisTimeLineID correctly in checkpointer checkpointer needs to reset ThisTimeLineID after a restartpoint to allow installing/recycling new WAL files. If recovery has already ended this would leave ThisTimeLineID set incorrectly and so we must reset it otherwise later checkpoints do not have the correct timeline. Bug report by Heikki Linnakangas. Further investigation by Heikki and myself.	2013-05-21 21:17:04 +01:00
Tom Lane	2af0971f35	Clarify documentation of EXPLAIN (TIMING OFF) option. Clarify that this option doesn't suppress measurement of the statement's total runtime. Greg Smith	2013-05-19 22:03:32 -04:00
Simon Riggs	d4337a0dcb	Init crash recovery using the latest available TLI This simplifies the handling of crashes after fast promotion and various minor cases that can exist in short timing windows around that case. Broad fix to bug reported by Michael Paquier on -hackers, approach prompted by Heikki Linnakangas	2013-05-19 17:31:07 +01:00
Simon Riggs	1781744cfc	Emit msg correctly for timeline-crossing crash	2013-05-19 17:00:18 +01:00
Simon Riggs	c94dff4c3c	Remove single space on end of a line in xlog.c Michael Paquier	2013-05-19 15:38:47 +01:00
Tom Lane	403bd6a18b	Fix crash when trying to display a NOTIFY rule action. Fixes oversight in commit `2ffa740be9`. Per report from Josh Kupershmidt. I think we've broken this case before, so let's add a regression test this time.	2013-05-16 16:47:26 -04:00
Tom Lane	6563fb2b45	Fix fd.c to preserve errno where needed. PathNameOpenFile failed to ensure that the correct value of errno was returned to its caller after a failure (because it incorrectly supposed that free() can never change errno). In some cases this would result in a user-visible failure because an expected ENOENT errno was replaced with something else. Bogus EINVAL failures have been observed on OS X, for example. There were also a couple of places that could mangle an important value of errno if FDDEBUG was defined. While the usefulness of that debug support is highly debatable, we might as well make it safe to use, so add errno save/restore logic to the DO_DB macro. Per bug #8167 from Nelson Minar, diagnosed by RhodiumToad. Back-patch to all supported branches.	2013-05-16 15:04:31 -04:00
Tom Lane	b142068622	Allow CREATE FOREIGN TABLE to include SERIAL columns. The behavior is that the required sequence is created locally, which is appropriate because the default expression will be evaluated locally. Per gripe from Brad Nicholson that this case was refused with a confusing error message. We could have improved the error message but it seems better to just allow the case. Also, remove ALTER TABLE's arbitrary prohibition against being applied to foreign tables, which was pretty inconsistent considering we allow it for views, sequences, and other relation types that aren't even called tables. This is needed to avoid breaking pg_dump, which sometimes emits column defaults using separate ALTER TABLE commands. (I think this can happen even when the default is not associated with a sequence, so that was a pre-existing bug once we allowed column defaults for foreign tables.)	2013-05-15 19:03:29 -04:00
Tom Lane	e9c336c786	Fix handling of OID wraparound while in standalone mode. If OID wraparound should occur while in standalone mode (unlikely but possible), we want to advance the counter to FirstNormalObjectId not FirstBootstrapObjectId. Otherwise, user objects might be created with OIDs in the system-reserved range. That isn't immediately harmful but it poses a risk of conflicts during future pg_upgrade operations. Noted by Andres Freund. Back-patch to all supported branches, since all of them are supported sources for pg_upgrade operations.	2013-05-13 15:40:16 -04:00
Tom Lane	904af8db8a	Fix handling of strict non-set functions with NULLs in set-valued inputs. In a construct like "select plain_function(set_returning_function(...))", the plain function is applied to each output row of the SRF successively. If some of the SRF outputs are NULL, and the plain function is strict, you'd expect to get NULL results for such rows ... but what actually happened was that such rows were omitted entirely from the result set. This was due to confusion of this case with what should happen for nested set-returning functions; a strict SRF is indeed supposed to yield an empty set for null input. Per bug #8150 from Erwin Brandstetter. Although this has been broken forever, we're not back-patching because of the possibility that some apps out there expect the incorrect behavior. This change should be listed as a possible incompatibility in the 9.3 release notes.	2013-05-12 13:08:12 -04:00
Tom Lane	35d50b527a	Fix to_number() to correctly ignore thousands separator when it's '.'. The existing code in NUM_numpart_from_char has hard-wired logic to treat '.' as decimal point, even when we're using a locale-aware format string and the locale says that '.' is the thousands separator. This results in clearly wrong answers in FM mode (where we must be able to identify the decimal point location), as per bug report from Patryk Kordylewski. Since the initialization code in NUM_prepare_locale already sets up Np->decimal as either the locale decimal-point string or "." depending on which decimal-point format code was used, there's really no need to have any extra logic at all in NUM_numpart_from_char: we only need to test for a match to Np->decimal. (Note: AFAICS there's nothing in here that explicitly checks for thousands separators --- rather, any unmatched character is silently skipped over. That's pretty bogus IMO but it's not the issue being complained of.) This is a longstanding bug, but it's possible that some existing apps are depending on '.' being recognized as decimal point even when using a D format code. Hence, no back-patch. We should probably list this as a potential incompatibility in the 9.3 release notes.	2013-05-11 16:35:03 -04:00
Tom Lane	69cc60dcfd	Guard against input_rows == 0 in estimate_num_groups(). This case doesn't normally happen, because the planner usually clamps all row estimates to at least one row; but I found that it can arise when dealing with relations excluded by constraints. Without a defense, estimate_num_groups() can return zero, which leads to divisions by zero inside the planner as well as assertion failures in the executor. An alternative fix would be to change set_dummy_rel_pathlist() to make the size estimate for a dummy relation 1 row instead of 0, but that seemed pretty ugly; and probably someday we'll want to drop the convention that the minimum rowcount estimate is 1 row. Back-patch to 8.4, as the problem can be demonstrated that far back.	2013-05-10 17:15:30 -04:00
Tom Lane	91715e8293	Fix management of fn_extra caching during repeated GiST index scans. Commit `d22a09dc70` introduced official support for GiST consistentFns that want to cache data using the FmgrInfo fn_extra pointer: the idea was to preserve the cached values across gistrescan(), whereas formerly they'd been leaked. However, there was an oversight in that, namely that multiple scan keys might reference the same column's consistentFn; the code would result in propagating the same cache value into multiple scan keys, resulting in crashes or wrong answers. Use a separate array instead to ensure that each scan key keeps its own state. Per bug #8143 from Joel Roller. Back-patch to 9.2 where the bug was introduced.	2013-05-09 23:09:04 -04:00
Tom Lane	a7b965382c	Better fix for permissions tests in excluded subqueries. This reverts the code changes in `50c137487c`, which turned out to induce crashes and not completely fix the problem anyway. That commit only considered single subqueries that were excluded by constraint-exclusion logic, but actually the problem also exists for subqueries that are appendrel members (ie part of a UNION ALL list). In such cases we can't add a dummy subpath to the appendrel's AppendPath list without defeating the logic that recognizes when an appendrel is completely excluded. Instead, fix the problem by having setrefs.c scan the rangetable an extra time looking for subqueries that didn't get into the plan tree. (This approach depends on the 9.2 change that made set_subquery_pathlist generate dummy paths for excluded single subqueries, so that the exclusion behavior is the same for single subqueries and appendrel members.) Note: it turns out that the appendrel form of the missed-permissions-checks bug exists as far back as 8.4. However, since the practical effect of that bug seems pretty minimal, consensus is to not attempt to fix it in the back branches, at least not yet. Possibly we could back-port this patch once it's gotten a reasonable amount of testing in HEAD. For the moment I'm just going to revert the previous patch in 9.2.	2013-05-08 16:59:58 -04:00
Heikki Linnakangas	2ffa66f497	Fix walsender failure at promotion. If a standby server has a cascading standby server connected to it, it's possible that WAL has already been sent up to the next WAL page boundary, splitting a WAL record in the middle, when the first standby server is promoted. Don't throw an assertion failure or error in walsender if that happens. Also, fix a variant of the same bug in pg_receivexlog: if it had already received WAL on previous timeline up to a segment boundary, when the upstream standby server is promoted so that the timeline switch record falls on the previous segment, pg_receivexlog would miss the segment containing the timeline switch. To fix that, have walsender send the position of the timeline switch at end-of-streaming, in addition to the next timeline's ID. It was previously assumed that the switch happened exactly where the streaming stopped. Note: this is an incompatible change in the streaming protocol. You might get an error if you try to stream over timeline switches, if the client is running 9.3beta1 and the server is more recent. It should be fine after a reconnect, however. Reported by Fujii Masao.	2013-05-08 20:30:17 +03:00
Heikki Linnakangas	cb953d8b1b	Use the term "radix tree" instead of "suffix tree" for SP-GiST text opclass. What we have implemented is a radix tree (or a radix trie or a patricia trie), but the docs and code comments incorrectly called it a "suffix tree". Alexander Korotkov	2013-05-08 14:34:26 +03:00
Tom Lane	1d6c72a55b	Move materialized views' is-populated status into their pg_class entries. Previously this state was represented by whether the view's disk file had zero or nonzero size, which is problematic for numerous reasons, since it's breaking a fundamental assumption about heap storage. This was done to allow unlogged matviews to revert to unpopulated status after a crash despite our lack of any ability to update catalog entries post-crash. However, this poses enough risk of future problems that it seems better to not support unlogged matviews until we can find another way. Accordingly, revert that choice as well as a number of existing kluges forced by it in favor of creating a pg_class.relispopulated flag column.	2013-05-06 13:27:22 -04:00
Tom Lane	5da5798004	Back out some recent translation updates. Very old versions of msgfmt choke on these specific messages, for reasons that are unclear at the moment. Remove them so that we can ship a beta release and not get complaints from testers (these messages will just go untranslated, instead, and we're hardly at 100% coverage anyway). Peter Eisentraut will look for a better fix later.	2013-05-06 12:28:13 -04:00
Tom Lane	3223b25ff7	Disallow unlogged materialized views. The initial implementation of this feature was really unsupportable, because it's relying on the physical size of an on-disk file to carry the relation's populated/unpopulated state, which is at least a modularity violation and could have serious long-term consequences. We could say that an unlogged matview goes to empty on crash, but not everybody likes that definition, so let's just remove the feature for 9.3. We can add it back when we have a less klugy implementation. I left the grammar and tab-completion support for CREATE UNLOGGED MATERIALIZED VIEW in place, since it's harmless and allows delivering a more specific error message about the unsupported feature. I'm committing this separately to ease identification of what should be reverted when/if we are able to re-enable the feature.	2013-05-06 12:00:06 -04:00
Bruce Momjian	8b06e6aba8	Revert idea of zero-padding session id in log_line_prefix. Removal of doc adjustment and release note mention as well.	2013-05-06 08:59:39 -04:00
Peter Eisentraut	539ecc9241	Translation updates	2013-05-05 22:34:23 -04:00
Kevin Grittner	b69ec7cc99	Prevent (auto)vacuum from truncating first page of populated matview. Per report from Fujii Masao, with regression test using his example.	2013-05-02 17:33:03 -05:00
Andrew Dunstan	5f8b4319b9	Use correct length to convert json unicode escapes. Bug reported on IRC - fix due to Andrew Gierth.	2013-05-01 18:47:18 -04:00
Tom Lane	50c137487c	Fix permission tests for views/tables proven empty by constraint exclusion. A view defined as "select <something> where false" had the curious property that the system wouldn't check whether users had the privileges necessary to select from it. More generally, permissions checks could be skipped for tables referenced in sub-selects or views that were proven empty by constraint exclusion (although some quick testing suggests this seldom happens in cases of practical interest). This happened because the planner failed to include rangetable entries for such tables in the finished plan. This was noticed in connection with erroneous handling of materialized views, but actually the issue is quite unrelated to matviews. Therefore, revert commit `200ba1667b` in favor of a more direct test for the real problem. Back-patch to 9.2 where the bug was introduced (by commit `7741dd6590`).	2013-05-01 18:26:50 -04:00
Simon Riggs	443951748c	Record data_checksum_version in control file. The value is not used anywhere in code, but will allow future changes to the checksum version should that become necessary in the future.	2013-04-30 12:27:12 +01:00
Simon Riggs	730924397c	Ensure we MarkBufferDirty before visibilitymap_set() logs the heap page and sets the LSN. Otherwise a checkpoint could occur between those actions and leave us in an inconsistent state. Jeff Davis	2013-04-30 08:15:49 +01:00
Simon Riggs	fdea2530bd	Compiler optimizations for page checksum code. Ants Aasma and Jeff Davis	2013-04-30 06:59:26 +01:00
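
The NULL-handling change described in commit 904af8db8a above can be illustrated with a small sketch. This is a Python model, not PostgreSQL source: `apply_strict` and `apply_strict_dropping_nulls` are hypothetical names, Python's `None` stands in for SQL NULL, and a plain list stands in for the output rows of a set-returning function (SRF).

```python
from typing import Callable, Iterable, Iterator, Optional


def apply_strict(fn: Callable[[int], int],
                 srf_rows: Iterable[Optional[int]]) -> Iterator[Optional[int]]:
    """Model of the behavior the commit describes as expected: a strict
    plain function yields NULL (None) for a NULL input row, so every
    SRF output row still produces one result row."""
    for value in srf_rows:
        yield None if value is None else fn(value)


def apply_strict_dropping_nulls(fn: Callable[[int], int],
                                srf_rows: Iterable[Optional[int]]) -> Iterator[int]:
    """Model of the old behavior the commit describes as wrong: rows with
    a NULL input are omitted from the result set entirely."""
    for value in srf_rows:
        if value is not None:
            yield fn(value)


rows = [-1, None, 2]  # hypothetical SRF output containing a NULL
fixed = list(apply_strict(abs, rows))
old = list(apply_strict_dropping_nulls(abs, rows))
print(fixed)  # [1, None, 2]
print(old)    # [1, 2]
```

The difference in row count between the two results is what makes the change a possible incompatibility: code that relied on NULL rows disappearing would see extra rows after the fix.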

13667 Commits