postgresql

Commit Graph

Author	SHA1	Message	Date
Heikki Linnakangas	fe546f3da6	Don't wait for the commit record to be replicated if we wrote no WAL. When using synchronous replication, we waited for the commit record to be replicated, but if we our transaction didn't write any other WAL records, that's not required because we don't even flush the WAL locally to disk in that case. This lead to long waits when committing a transaction that only modified a temporary table. Bug spotted by Thom Brown.	2012-04-17 16:28:31 +03:00
Peter Eisentraut	a33fcd7e79	Fix typo Kyotaro HORIGUCHI	2012-04-16 15:36:40 +03:00
Robert Haas	ea6a2d8d47	Rename synchronous_commit='write' to 'remote_write'. Fujii Masao, per discussion on pgsql-hackers	2012-04-14 10:53:22 -04:00
Robert Haas	4a2d7ad76f	pg_size_pretty(numeric) The output of the new pg_xlog_location_diff function is of type numeric, since it could theoretically overflow an int8 due to signedness; this provides a convenient way to format such values. Fujii Masao, with some beautification by me.	2012-04-14 08:07:25 -04:00
Tom Lane	e54b10a62d	Remove the "last ditch" code path in join_search_one_level(). So far as I can tell, it is no longer possible for this heuristic to do anything useful, because the new weaker definition of have_relevant_joinclause means that any relation with a joinclause must be considered joinable to at least one other relation. It would still be possible for the code block to be entered, for example if there are join order restrictions that prevent any join of the current level from being formed; but in that case it's just a waste of cycles to attempt to form cartesian joins, since the restrictions will still apply. Furthermore, IMO the existence of this code path can mask bugs elsewhere; we would have noticed the problem with cartesian joins a lot sooner if this code hadn't compensated for it in the simplest case. Accordingly, let's remove it and see what happens. I'm committing this separately from the prerequisite changes in have_relevant_joinclause, just to make the question easier to revisit if there is some fault in my logic.	2012-04-13 16:07:18 -04:00
Tom Lane	e3ffd05b02	Weaken the planner's tests for relevant joinclauses. We should be willing to cross-join two small relations if that allows us to use an inner indexscan on a large relation (that is, the potential indexqual for the large table requires both smaller relations). This worked in simple cases but fell apart as soon as there was a join clause to a fourth relation, because the existence of any two-relation join clause caused the planner to not consider clauseless joins between other base relations. The added regression test shows an example case adapted from a recent complaint from Benoit Delbosc. Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and has_relevant_eclass_joinclause to consider that a join clause mentioning three or more relations is sufficient grounds for joining any subset of those relations, even if we have to do so via a cartesian join. Since such clauses are relatively uncommon, this shouldn't affect planning speed on typical queries; in fact it should help a bit, because the latter two functions in particular get significantly simpler. Although this is arguably a bug fix, I'm not going to risk back-patching it, since it might have currently-unforeseen consequences.	2012-04-13 16:07:17 -04:00
Peter Eisentraut	c0cc526e8b	Rename bytea_agg to string_agg and add delimiter argument Per mailing list discussion, we would like to keep the bytea functions parallel to the text functions, so rename bytea_agg to string_agg, which already exists for text. Also, to satisfy the rule that we don't want aggregate functions of the same name with a different number of arguments, add a delimiter argument, just like string_agg for text already has.	2012-04-13 21:36:59 +03:00
Peter Eisentraut	64e1309c76	Consistently quote encoding and locale names in messages	2012-04-13 20:37:07 +03:00
Robert Haas	61167bfaf2	Fix typo in comment.	2012-04-13 08:54:13 -04:00
Robert Haas	5630eddf1e	Update lazy_scan_heap header comment. The previous comment described how things worked in PostgreSQL 8.2 and prior.	2012-04-13 08:51:19 -04:00
Tom Lane	732bfa2448	Fix cost estimation for indexscan filter conditions. cost_index's method for estimating per-tuple costs of evaluating filter conditions (a/k/a qpquals) was completely wrong in the presence of derived indexable conditions, such as range conditions derived from a LIKE clause. This was largely masked in common cases as a result of all simple operator clauses having about the same costs, but it could show up in a big way when dealing with functional indexes containing expensive functions, as seen for example in bug #6579 from Istvan Endredy. Rejigger the calculation to give sane answers when the indexquals aren't a subset of the baserestrictinfo list. As a side benefit, we now do the calculation properly for cases involving join clauses (ie, parameterized indexscans), which we always overestimated before. There are still cases where this is an oversimplification, such as clauses that can be dropped because they are implied by a partial index's predicate. But we've never accounted for that in cost estimates before, and I'm not convinced it's worth the cycles to try to do so.	2012-04-11 20:24:17 -04:00
Tom Lane	880bfc3287	Silently ignore any nonexistent schemas that are listed in search_path. Previously we attempted to throw an error or at least warning for missing schemas, but this was done inconsistently because of implementation restrictions (in many cases, GUC settings are applied outside transactions so that we can't do system catalog lookups). Furthermore, there were exceptions to the rule even in the beginning, and we'd been poking more and more holes in it as time went on, because it turns out that there are lots of use-cases for having some irrelevant items in a common search_path value. It seems better to just adopt a philosophy similar to what's always been done with Unix PATH settings, wherein nonexistent or unreadable directories are silently ignored. This commit also fixes the documentation to point out that schemas for which the user lacks USAGE privilege are silently ignored. That's always been true but was previously not documented. This is mostly in response to Robert Haas' complaint that 9.1 started to throw errors or warnings for missing schemas in cases where prior releases had not. We won't adopt such a significant behavioral change in a back branch, so something different will be needed in 9.1.	2012-04-11 12:02:50 -04:00
Tom Lane	3769fa5fc6	Make pg_tablespace_location(0) return the database's default tablespace. This definition is convenient when applying the function to the reltablespace column of pg_class, since that's what zero means there; and it doesn't interfere with any other plausible use of the function. Per gripe from Bruce Momjian.	2012-04-10 21:43:14 -04:00
Tom Lane	0d9819f7e3	Measure epoch of timestamp-without-time-zone from local not UTC midnight. This patch reverts commit `191ef2b407` and thereby restores the pre-7.3 behavior of EXTRACT(EPOCH FROM timestamp-without-tz). Per discussion, the more recent behavior was misguided on a couple of grounds: it makes it hard to get a non-timezone-aware epoch value for a timestamp, and it makes this one case dependent on the value of the timezone GUC, which is incompatible with having timestamp_part() labeled as immutable. The other behavior is still available (in all releases) by explicitly casting the timestamp to timestamp with time zone before applying EXTRACT. This will need to be called out as an incompatible change in the 9.2 release notes. Although having mutable behavior in a function marked immutable is clearly a bug, we're not going to back-patch such a change.	2012-04-10 12:04:42 -04:00
Tom Lane	65fd91333e	Fix an Assert that turns out to be reachable after all. estimate_num_groups() gets unhappy with create table empty(); select * from empty except select * from empty e2; I can't see any actual use-case for such a query (and the table is illegal per SQL spec), but it seems like a good idea that it not cause an assert failure.	2012-04-09 11:58:24 -04:00
Tom Lane	d515365a61	Don't bother copying empty support arrays in a zero-column MergeJoin. The case could not arise when this code was originally written, but it can now (since we made zero-column MergeJoins work for the benefit of FULL JOIN ON TRUE). I don't think there is any actual bug here, but we might as well treat it consistently with other uses of COPY_POINTER_FIELD(). Per comment from Ashutosh Bapat.	2012-04-09 11:41:54 -04:00
Robert Haas	3ae5133b1c	Teach SLRU code to avoid replacing I/O-busy pages. Patch by me; review by Tom Lane and others.	2012-04-08 23:05:55 -04:00
Heikki Linnakangas	03529a3ff9	set_stack_base() no longer needs to be called in PostgresMain. This was a thinko in previous commit. Now that stack base pointer is now set in PostmasterMain and SubPostmasterMain, it doesn't need to be set in PostgresMain anymore.	2012-04-08 19:39:12 +03:00
Heikki Linnakangas	ef3883d130	Do stack-depth checking in all postmaster children. We used to only initialize the stack base pointer when starting up a regular backend, not in other processes. In particular, autovacuum workers can run arbitrary user code, and without stack-depth checking, infinite recursion in e.g an index expression will bring down the whole cluster. The comment about PL/Java using set_stack_base() is not yet true. As the code stands, PL/java still modifies the stack_base_ptr variable directly. However, it's been discussed in the PL/Java mailing list that it should be changed to use the function, because PL/Java is currently oblivious to the register stack used on Itanium. There's another issues with PL/Java, namely that the stack base pointer it sets is not really the base of the stack, it could be something close to the bottom of the stack. That's a separate issue that might need some further changes to this code, but that's a different story. Backpatch to all supported releases.	2012-04-08 19:07:55 +03:00
Tom Lane	7feecedcce	Fix incorrect make maintainer-clean rule.	2012-04-07 18:16:50 -04:00
Tom Lane	95b9c333b2	Further adjustment of comment about qsort_tuple.	2012-04-07 17:48:40 -04:00
Tom Lane	a25ef7a5f6	Remove useless variable to suppress compiler warning.	2012-04-07 16:44:43 -04:00
Tom Lane	0ab4db52c0	Fix misleading output from gin_desc(). XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.	2012-04-06 18:10:21 -04:00
Tom Lane	17b985b1a0	Fix broken comparetup_datum code. Commit `337b6f5ecf` contained the entirely fanciful assumption that it had made comparetup_datum unreachable. Reported and patched by Takashi Yamamoto. Fix up some not terribly accurate/useful comments from that commit, too.	2012-04-06 16:58:50 -04:00
Tom Lane	cea49fe82f	Dept of second thoughts: improve the API for AnalyzeForeignTable. If we make the initially-called function return the table physical-size estimate, acquire_inherited_sample_rows will be able to use that to allocate numbers of samples among child tables, when the day comes that we want to support foreign tables in inheritance trees.	2012-04-06 16:04:10 -04:00
Tom Lane	263d9de66b	Allow statistics to be collected for foreign tables. ANALYZE now accepts foreign tables and allows the table's FDW to control how the sample rows are collected. (But only manual ANALYZEs will touch foreign tables, for the moment, since among other things it's not very clear how to handle remote permissions checks in an auto-analyze.) contrib/file_fdw is extended to support this. Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.	2012-04-06 15:02:35 -04:00
Simon Riggs	8cb53654db	Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock	2012-04-06 10:21:40 +01:00
Robert Haas	21cc529698	checkopint -> checkpoint Report by Guillaume Lelarge.	2012-04-05 21:37:33 -04:00
Robert Haas	b736aef2ec	Publish checkpoint timing information to pg_stat_bgwriter. Greg Smith, Peter Geoghegan, and Robert Haas	2012-04-05 14:04:37 -04:00
Robert Haas	644828908f	Expose track_iotiming data via the statistics collector. Ants Aasma's original patch to add timing information for buffer I/O requests exposed this data at the relation level, which was judged too costly. I've here exposed it at the database level instead.	2012-04-05 11:40:24 -04:00
Tom Lane	c17e863bc7	Fix syslogger to not lose log coherency under high load. The original coding of the syslogger had an arbitrary limit of 20 large messages concurrently in progress, after which it would just punt and dump message fragments to the output file separately. Our ambitions are a bit higher than that now, so allow the data structure to expand as necessary. Reported and patched by Andrew Dunstan; some editing by Tom	2012-04-04 15:05:10 -04:00
Peter Eisentraut	38b9693fd9	Add support for renaming domain constraints	2012-04-03 08:11:51 +03:00
Simon Riggs	68219aaf6b	Correct epoch of txid_current() when executed on a Hot Standby server. Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user. Bug report from Daniel Farina	2012-03-29 14:55:30 +01:00
Andrew Dunstan	aeca650226	Unbreak Windows builds broken by pgpipe removal.	2012-03-29 04:11:57 -04:00
Heikki Linnakangas	5762a4d909	Inherit max_safe_fds to child processes in EXEC_BACKEND mode. Postmaster sets max_safe_fds by testing how many open file descriptors it can open, and that is normally inherited by all child processes at fork(). Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we effectively ignored max_files_per_process on Windows, and always assumed a conservative default of 32 simultaneous open files. That could have an impact on performance, if you need to access a lot of different files in a query. After this patch, the value is passed to child processes by save/restore_backend_variables() among many other global variables. It has been like this forever, but given the lack of complaints about it, I'm not backpatching this.	2012-03-29 08:19:11 +03:00
Andrew Dunstan	d2c1740dc2	Remove now redundant pgpipe code.	2012-03-28 23:24:07 -04:00
Tom Lane	5d3fcc4c2e	Bend parse location rules for the convenience of pg_stat_statements. Generally, the parse location assigned to a multiple-token construct is the location of its leftmost token. This commit breaks that rule for the syntaxes TYPENAME 'LITERAL' and CAST(CONSTANT AS TYPENAME) --- the resulting Const will have the location of the literal string, not the typename or CAST keyword. The cases where this matters are pretty thin on the ground (no error messages in the regression tests change, for example), and it's unlikely that any user would be confused anyway by an error cursor pointing at the literal. But still it's less than consistent. The reason for changing it is that contrib/pg_stat_statements wants to know the parse location of the original literal, and it was agreed that this is the least unpleasant way to preserve that information through parse analysis. Peter Geoghegan	2012-03-27 15:17:41 -04:00
Tom Lane	a40fa613b5	Add some infrastructure for contrib/pg_stat_statements. Add a queryId field to Query and PlannedStmt. This is not used by the core backend, except for being copied around at appropriate times. It's meant to allow plug-ins to track a particular query forward from parse analysis to execution. The queryId is intentionally not dumped into stored rules (and hence this commit doesn't bump catversion). You could argue that choice either way, but it seems better that stored rule strings not have any dependency on plug-ins that might or might not be present. Also, add a post_parse_analyze_hook that gets invoked at the end of parse analysis (but only for top-level analysis of complete queries, not cases such as analyzing a domain's default-value expression). This is mainly meant to be used to compute and assign a queryId, but it could have other applications. Peter Geoghegan	2012-03-27 15:17:40 -04:00
Robert Haas	40b9b95769	New GUC, track_iotiming, to track I/O timings. Currently, the only way to see the numbers this gathers is via EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through the stats collector and pg_stat_statements in subsequent patches. Ants Aasma, reviewed by Greg Smith, with some further changes by me.	2012-03-27 14:55:02 -04:00
Peter Eisentraut	dcb33b1c64	Remove dead assignment found by Coverity	2012-03-26 21:03:10 +03:00
Robert Haas	7386089d23	Code cleanup for heap_freeze_tuple. It used to be case that lazy vacuum could call this function with only a shared lock on the buffer, but neither lazy vacuum nor any other code path does that any more. Simplify the code accordingly and clean up some related, obsolete comments.	2012-03-26 11:03:06 -04:00
Tom Lane	e8476f46fc	Fix COPY FROM for null marker strings that correspond to invalid encoding. The COPY documentation says "COPY FROM matches the input against the null string before removing backslashes". It is therefore reasonable to presume that null markers like E'\\0' will work ... and they did, until someone put the tests in the wrong order during microoptimization-driven rewrites. Since then, we've been failing if the null marker is something that would de-escape to an invalidly-encoded string. Since null markers generally need to be something that can't appear in the data, this represents a nontrivial loss of functionality; surprising nobody noticed it earlier. Per report from Jeff Davis. Backpatch to 8.4 where this got broken.	2012-03-25 23:17:22 -04:00
Tom Lane	c7cea267de	Replace empty locale name with implied value in CREATE DATABASE and initdb. setlocale() accepts locale name "" as meaning "the locale specified by the process's environment variables". Historically we've accepted that for Postgres' locale settings, too. However, it's fairly unsafe to store an empty string in a new database's pg_database.datcollate or datctype fields, because then the interpretation could vary across postmaster restarts, possibly resulting in index corruption and other unpleasantness. Instead, we should expand "" to whatever it means at the moment of calling CREATE DATABASE, which we can do by saving the value returned by setlocale(). For consistency, make initdb set up the initial lc_xxx parameter values the same way. initdb was already doing the right thing for empty locale names, but it did not replace non-empty names with setlocale results. On a platform where setlocale chooses to canonicalize the spellings of locale names, this would result in annoying inconsistency. (It seems that popular implementations of setlocale don't do such canonicalization, which is a pity, but the POSIX spec certainly allows it to be done.) The same risk of inconsistency leads me to not venture back-patching this, although it could certainly be seen as a longstanding bug. Per report from Jeff Davis, though this is not his proposed patch.	2012-03-25 21:47:22 -04:00
Tom Lane	8279eb4191	Fix planner's handling of outer PlaceHolderVars within subqueries. For some reason, in the original coding of the PlaceHolderVar mechanism I had supposed that PlaceHolderVars couldn't propagate into subqueries. That is of course entirely possible. When it happens, we need to treat an outer-level PlaceHolderVar much like an outer Var or Aggref, that is SS_replace_correlation_vars() needs to replace the PlaceHolderVar with a Param, and then when building the finished SubPlan we have to provide the PlaceHolderVar expression as an actual parameter for the SubPlan. The handling of the contained expression is a bit delicate but it can be treated exactly like an Aggref's expression. In addition to the missing logic in subselect.c, prepjointree.c was failing to search subqueries for PlaceHolderVars that need their relids adjusted during subquery pullup. It looks like everyplace else that touches PlaceHolderVars got it right, though. Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this oversight would fail with "ERROR: Upper-level PlaceHolderVar found where not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong answers, since the value transmitted into the subquery wouldn't go to null when it should.	2012-03-24 16:21:39 -04:00
Tom Lane	ed61127be4	Cast some printf arguments to avoid possibly-nonportable behavior. Per compiler warnings on buildfarm member black_firefly.	2012-03-23 20:18:04 -04:00
Tom Lane	81a646febe	Refactor simplify_function et al to centralize argument simplification. We were doing the recursive simplification of function/operator arguments in half a dozen different places, with rather baroque logic to ensure it didn't get done multiple times on some arguments. This patch improves that by postponing argument simplification until after we've dealt with named parameters and added any needed default expressions. Marti Raudsepp, somewhat hacked on by me	2012-03-23 19:15:58 -04:00
Tom Lane	0339047bc9	Code review for protransform patches. Fix loss of previous expression-simplification work when a transform function fires: we must not simply revert to untransformed input tree. Instead build a dummy FuncExpr node to pass to the transform function. This has the additional advantage of providing a simpler, more uniform API for transform functions. Move documentation to a somewhat less buried spot, relocate some poorly-placed code, be more wary of null constants and invalid typmod values, add an opr_sanity check on protransform function signatures, and some other minor cosmetic adjustments. Note: although this patch touches pg_proc.h, no need for catversion bump, because the changes are cosmetic and don't actually change the intended catalog contents.	2012-03-23 17:29:57 -04:00
Peter Eisentraut	0e85abd658	Clean up compiler warnings from unused variables with asserts disabled For those variables only used when asserts are enabled, use a new macro PG_USED_FOR_ASSERTS_ONLY, which expands to __attribute__((unused)) when asserts are not enabled.	2012-03-21 23:33:10 +02:00
Tom Lane	f70f095c90	Allow new relmapper entries when allow_system_table_mods is true. This restores the pre-9.0 situation that it's possible to add new indexes on pg_class and other mapped-but-not-shared catalogs, so long as you broke the glass and flipped the big red Dont-Touch-Me switch. As before, there are a lot of gotchas, and you'd have to be pretty desperate to try this on a production database; but there doesn't seem to be a reason for relmapper.c to be preventing such things all by itself. Per experimentation with a case suggested by Cody Cutrer.	2012-03-21 14:09:39 -04:00
Robert Haas	aefa6d163e	Add some CHECK_FOR_INTERRUPTS() calls to the heap-sort call path. I broke this in commit `337b6f5ecf`, which among other things arranged for quicksorts to CHECK_FOR_INTERRUPTS() slightly less frequently. Sadly, it also arranged for heapsorts to CHECK_FOR_INTERRUPTS() much less frequently. Repair.	2012-03-20 21:26:39 -04:00
Tom Lane	9dbf2b7d75	Restructure SELECT INTO's parsetree representation into CreateTableAsStmt. Making this operation look like a utility statement seems generally a good idea, and particularly so in light of the desire to provide command triggers for utility statements. The original choice of representing it as SELECT with an IntoClause appendage had metastasized into rather a lot of places, unfortunately, so that this patch is a great deal more complicated than one might at first expect. In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS subcommands required restructuring some EXPLAIN-related APIs. Add-on code that calls ExplainOnePlan or ExplainOneUtility, or uses ExplainOneQuery_hook, will need adjustment. Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO, which formerly were accepted though undocumented, are no longer accepted. The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE. The CREATE RULE case doesn't seem to have much real-world use (since the rule would work only once before failing with "table already exists"), so we'll not bother with that one. Both SELECT INTO and CREATE TABLE AS still return a command tag of "SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn", but for the moment backwards compatibility wins the day. Andres Freund and Tom Lane	2012-03-19 21:38:12 -04:00
Peter Eisentraut	693ff85d47	backend: Fix minor memory leak in configuration file processing Just for consistency with the other code paths. found by Coverity	2012-03-16 20:34:59 +02:00
Tom Lane	b67ad046e6	Improve commentary in match_pathkeys_to_index(). For a little while there I thought match_pathkeys_to_index() was broken because it wasn't trying to match index columns to pathkeys in order. Actually that's correct, because GiST can support ordering operators on any random collection of index columns, but it sure needs a comment.	2012-03-16 14:07:21 -04:00
Tom Lane	dd4134ea56	Revisit handling of UNION ALL subqueries with non-Var output columns. In commit `57664ed25e` I tried to fix a bug reported by Teodor Sigaev by making non-simple-Var output columns distinct (by wrapping their expressions with dummy PlaceHolderVar nodes). This did not work too well. Commit `b28ffd0fcc` fixed some ensuing problems with matching to child indexes, but per a recent report from Claus Stadler, constraint exclusion of UNION ALL subqueries was still broken, because constant-simplification didn't handle the injected PlaceHolderVars well either. On reflection, the original patch was quite misguided: there is no reason to expect that EquivalenceClass child members will be distinct. So instead of trying to make them so, we should ensure that we can cope with the situation when they're not. Accordingly, this patch reverts the code changes in the above-mentioned commits (though the regression test cases they added stay). Instead, I've added assorted defenses to make sure that duplicate EC child members don't cause any problems. Teodor's original problem ("MergeAppend child's targetlist doesn't match MergeAppend") is addressed more directly by revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort list guide creation of each child's sort list. In passing, get rid of add_sort_column; as far as I can tell, testing for duplicate sort keys at this stage is dead code. Certainly it doesn't trigger often enough to be worth expending cycles on in ordinary queries. And keeping the test would've greatly complicated the new logic in prepare_sort_from_pathkeys, because comparing pathkey list entries against a previous output array requires that we not skip any entries in the list. Back-patch to 9.1, like the previous patches. The only known issue in this area that wasn't caused by the ill-advised previous patches was the MergeAppend planning failure, which of course is not relevant before 9.1. It's possible that we need some of the new defenses against duplicate child EC entries in older branches, but until there's some clear evidence of that I'm going to refrain from back-patching further.	2012-03-16 13:11:55 -04:00
Peter Eisentraut	eb990a2b9e	Add const qualifier to tzn returned by timestamp2tm() The tzn value might come from tm->tm_zone, which libc declares as const, so it's prudent that the upper layers know about this as well.	2012-03-15 21:17:19 +02:00
Peter Eisentraut	531e60aec0	Remove unused tzn arguments for timestamp2tm()	2012-03-15 21:13:35 +02:00
Peter Eisentraut	ad4fb0d0d2	Improve EncodeDateTime and EncodeTimeOnly APIs Use an explicit argument to tell whether to include the time zone in the output, rather than using some undocumented pointer magic.	2012-03-14 23:03:34 +02:00
Peter Eisentraut	6f018c6dda	COPY: Add an assertion This is for tools such as Coverity that don't know that the grammar enforces that the case of not having a relation (but instead a query) cannot happen in the FROM case.	2012-03-14 22:44:40 +02:00
Peter Eisentraut	e684ab5e1e	Add additional safety check against invalid backup label file It was already checking for invalid data after "BACKUP FROM", but would possibly crash if "BACKUP FROM" was missing altogether. found by Coverity	2012-03-14 22:41:50 +02:00
Tom Lane	b4af1c25bb	Fix SPGiST vacuum algorithm to handle concurrent tuple motion properly. A leaf tuple that we need to delete could get moved as a consequence of an insertion happening concurrently with the VACUUM scan. If it moves from a page past the current scan point to a page before, we'll miss it, which is not acceptable. Hence, when we see a leaf-page REDIRECT that could have been made since our scan started, chase down the redirection pointer much as if we were doing a normal index search, and be sure to vacuum every page it leads to. This fixes the issue because, if the tuple was on page N at the instant we start our scan, we will surely find it as a consequence of chasing the redirect from page N, no matter how much it moves around in between. Problem noted by Takashi Yamamoto.	2012-03-12 16:10:28 -04:00
Peter Eisentraut	bad250f4f3	Use correct sizeof operand in qsort call Probably no practical impact, since all pointers ought to have the same size, but it was wrong nonetheless. Found by Coverity.	2012-03-12 20:56:13 +02:00
Peter Eisentraut	c9f310d377	Add comment for missing break in switch For clarity, following other sites, and to silence Coverity.	2012-03-12 20:55:09 +02:00
Tom Lane	c6be1f43ab	Make INSERT/UPDATE queries depend on their specific target columns. We have always created a whole-table dependency for the target relation, but that's not really good enough, as it doesn't prevent scenarios such as dropping an individual target column or altering its type. So we have to create an individual dependency for each target column, as well. Per report from Bill MacArthur of a rule containing UPDATE breaking after such an alteration. Note that this patch doesn't try to make such cases work, only to ensure that the attempted ALTER TABLE throws an error telling you it can't cope with adjusting the rule. This is a long-standing bug, but given the lack of prior reports I'm not going to risk back-patching it. A back-patch wouldn't do anything to fix existing rules' dependency lists, anyway.	2012-03-11 18:14:23 -04:00
Tom Lane	c6a11b89e4	Teach SPGiST to store nulls and do whole-index scans. This patch fixes the other major compatibility-breaking limitation of SPGiST, that it didn't store anything for null values of the indexed column, and so could not support whole-index scans or "x IS NULL" tests. The approach is to create a wholly separate search tree for the null entries, and use fixed "allTheSame" insertion and search rules when processing this tree, instead of calling the index opclass methods. This way the opclass methods do not need to worry about dealing with nulls. Catversion bump is for pg_am updates as well as the change in on-disk format of SPGiST indexes; there are some tweaks in SPGiST WAL records as well. Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev. (The original also stored nulls separately, but it reused GIN code to do so; which required undesirable compromises in the on-disk format, and would likely lead to bugs due to the GIN code being required to work in two very different contexts.)	2012-03-11 16:29:59 -04:00
Peter Eisentraut	86947e666d	Add more detail to error message for invalid arguments for server process It now prints the argument that was at fault. Also fix a small misbehavior where the error message issued by getopt() would complain about a program named "--single", because that's what argv[0] is in the server process.	2012-03-11 02:03:52 +02:00
Tom Lane	03e56f798e	Restructure SPGiST opclass interface API to support whole-index scans. The original API definition was incapable of supporting whole-index scans because there was no way to invoke leaf-value reconstruction without checking any qual conditions. Also, it was inefficient for multiple-qual-condition scans because value reconstruction got done over again for each qual condition, and because other internal work in the consistent functions likewise had to be done for each qual. To fix these issues, pass the whole scankey array to the opclass consistent functions, instead of only letting them see one item at a time. (Essentially, the loop over scankey entries is now inside the consistent functions not outside them. This makes the consistent functions a bit more complicated, but not unreasonably so.) In itself this commit does nothing except save a few cycles in multiple-qual-condition index scans, since we can't support whole-index scans on SPGiST indexes until nulls are included in the index. However, I consider this a must-fix for 9.2 because once we release it will get very much harder to change the opclass API definition.	2012-03-10 18:36:49 -05:00
Peter Eisentraut	39d74e346c	Add support for renaming constraints reviewed by Josh Berkus and Dimitri Fontaine	2012-03-10 20:19:13 +02:00
Robert Haas	07d1edb954	Extend object access hook framework to support arguments, and DROP. This allows loadable modules to get control at drop time, perhaps for the purpose of performing additional security checks or to log the event. The initial purpose of this code is to support sepgsql, but other applications should be possible as well. KaiGai Kohei, reviewed by me.	2012-03-09 14:34:56 -05:00
Tom Lane	b14953932d	Revise FDW planning API, again. Further reflection shows that a single callback isn't very workable if we desire to let FDWs generate multiple Paths, because that forces the FDW to do all work necessary to generate a valid Plan node for each Path. Instead split the former PlanForeignScan API into three steps: GetForeignRelSize, GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain, and it's substantially more flexible for complex FDWs. Add an fdw_private field to RelOptInfo so that the new functions can save state there rather than possibly having to recalculate information two or three times. In addition, we'd not thought through what would be needed to allow an FDW to set up subexpressions of its choice for runtime execution. We could treat ForeignScan.fdw_private as an executable expression but that seems likely to break existing FDWs unnecessarily (in particular, it would restrict the set of node types allowable in fdw_private to those supported by expression_tree_walker). Instead, invent a separate field fdw_exprs which will receive the postprocessing appropriate for expression trees. (One field is enough since it can be a list of expressions; also, we assume the corresponding expression state tree(s) will be held within fdw_state, so we don't need to add anything to ForeignScanState.) Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this further as we continue to work on that patch, but to me it feels a lot closer to being right now.	2012-03-09 12:49:25 -05:00
Heikki Linnakangas	342baf4ce6	Update outdated comment. HeapTupleHeader.t_natts field doesn't exist anymore. Kevin Grittner	2012-03-09 08:07:56 +02:00
Tom Lane	08dd23cec7	Fix some issues with temp/transient tables in extension scripts. Phil Sorber reported that a rewriting ALTER TABLE within an extension update script failed, because it creates and then drops a placeholder table; the drop was being disallowed because the table was marked as an extension member. We could hack that specific case but it seems likely that there might be related cases now or in the future, so the most practical solution seems to be to create an exception to the general rule that extension member objects can only be dropped by dropping the owning extension. To wit: if the DROP is issued within the extension's own creation or update scripts, we'll allow it, implicitly performing an "ALTER EXTENSION DROP object" first. This will simplify cases such as extension downgrade scripts anyway. No docs change since we don't seem to have documented the idea that you would need ALTER EXTENSION DROP for such an action to begin with. Also, arrange for explicitly temporary tables to not get linked as extension members in the first place, and the same for the magic pg_temp_nnn schemas that are created to hold them. This prevents assorted unpleasant results if an extension script creates a temp table: the forced drop at session end would either fail or remove the entire extension, and neither of those outcomes is desirable. Note that this doesn't fix the ALTER TABLE scenario, since the placeholder table is not temp (unless the table being rewritten is). Back-patch to 9.1.	2012-03-08 15:53:09 -05:00
Heikki Linnakangas	d93f209f48	Silence warning about unused variable, when building without assertions.	2012-03-08 11:10:02 +02:00
Tom Lane	66a7e6bae9	Improve estimation of IN/NOT IN by assuming array elements are distinct. In constructs such as "x IN (1,2,3,4)" and "x <> ALL(ARRAY[1,2,3,4])", we formerly always used a general-purpose assumption that the probability of success is independent for each comparison of "x" to an array element. But in real-world usage of these constructs, that's a pretty poor assumption; it's much saner to assume that the array elements are distinct and so the match probabilities are disjoint. Apply that assumption if the operator appears to behave as equality (for ANY) or inequality (for ALL). But fall back to the normal independent-probabilities calculation if this yields an impossible result, ie probability > 1 or < 0. We could protect ourselves against bad estimates even more by explicitly checking for equal array elements, but that is expensive and doesn't seem worthwhile: doing it would amount to optimizing for poorly-written queries at the expense of well-written ones. Daniele Varrazzo and Tom Lane, after a suggestion by Ants Aasma	2012-03-07 22:59:49 -05:00
Tom Lane	9088d1b965	Add GetForeignColumnOptions() to foreign.c, and add some documentation. GetForeignColumnOptions provides some abstraction for accessing column-specific FDW options, on a par with the access functions that were already provided here for other FDW-related information. Adjust file_fdw.c to use GetForeignColumnOptions instead of equivalent hand-rolled code. In addition, add some SGML documentation for the functions exported by foreign.c that are meant for use by FDW authors. (This is the fdw_helper portion of the proposed pgsql_fdw patch.) Hanada Shigeru, reviewed by KaiGai Kohei	2012-03-07 18:20:58 -05:00
Tom Lane	d4bf3c9c94	Expose an API for calculating catcache hash values. Now that cache invalidation callbacks get only a hash value, and not a tuple TID (per commits `632ae6829f` and `b5282aa893`), the only way they can restrict what they invalidate is to know what the hash values mean. setrefs.c was doing this via a hard-wired assumption but that seems pretty grotty, and it'll only get worse as more cases come up. So let's expose a calculation function that takes the same parameters as SearchSysCache. Per complaint from Marko Kreen.	2012-03-07 14:51:13 -05:00
Tom Lane	19dbc34631	Add a hook for processing messages due to be sent to the server log. Use-cases for this include custom log filtering rules and custom log message transmission mechanisms (for instance, lossy log message collection, which has been discussed several times recently). As is our common practice for hooks, there's no regression test nor user-facing documentation for this, though the author did exhibit a sample module using the hook. Martin Pihlak, reviewed by Marti Raudsepp	2012-03-06 15:35:41 -05:00
Robert Haas	bc97c38115	Typo fix. Fujii Masao	2012-03-06 08:23:51 -05:00
Heikki Linnakangas	e587e2e3e3	Make the comments more clear on the fact that UpdateFullPageWrites() is not safe to call concurrently from multiple processes.	2012-03-06 10:45:58 +02:00
Heikki Linnakangas	7714c63829	Remove extra copies of LogwrtResult. This simplifies the code a little bit. The new rule is that to update XLogCtl->LogwrtResult, you must hold both WALWriteLock and info_lck, whereas before we had two copies, one that was protected by WALWriteLock and another protected by info_lck. The code that updates them was already holding both locks, so merging the two is trivial. The third copy, XLogCtl->Insert.LogwrtResult, was not totally redundant, it was used in AdvanceXLInsertBuffer to update the backend-local copy, before acquiring the info_lck to read the up-to-date value. But the value of that seems dubious; at best it's saving one spinlock acquisition per completed WAL page, which is not significant compared to all the other work involved. And in practice, it's probably not saving even that much.	2012-03-06 10:18:33 +02:00
Heikki Linnakangas	3b682df326	Simplify the way changes to full_page_writes are logged. It's harmless to do full page writes even when not strictly necessary, so when turning full_page_writes on, we can set the global flag first, and then call XLogInsert. Likewise, when turning it off, we can write the WAL record first, and then clear the flag. This way XLogInsert doesn't need any special handling of the XLOG_FPW_CHANGE record type. XLogInsert is complicated enough already, so anything we can keep away from there is a good thing. Actually I don't think the atomicity of the shared memory flag matters, anyway, because we only write the XLOG_FPW_CHANGE at the end of recovery, when there are no concurrent WAL insertions going on. But might as well make it safe, in case we allow changing full_page_writes on the fly in the future.	2012-03-06 09:48:30 +02:00
Tom Lane	6b289942bf	Redesign PlanForeignScan API to allow multiple paths for a foreign table. The original API specification only allowed an FDW to create a single access path, which doesn't seem like a terribly good idea in hindsight. Instead, move the responsibility for building the Path node and calling add_path() into the FDW's PlanForeignScan function. Now, it can do that more than once if appropriate. There is no longer any need for the transient FdwPlan struct, so get rid of that. Etsuro Fujita, Shigeru Hanada, Tom Lane	2012-03-05 16:15:59 -05:00
Tom Lane	80da9e68fd	Rewrite GiST support code for rangetypes. This patch installs significantly smarter penalty and picksplit functions for ranges, making GiST indexes for them smaller and faster to search. There is no on-disk format change, so no catversion bump, but you'd need to REINDEX to get the benefits for any existing index. Alexander Korotkov, reviewed by Jeff Davis	2012-03-04 22:50:06 -05:00
Tom Lane	e2eed78910	Remove useless "rough estimate" path from mcelem_array_contained_selec. The code in this function that tried to cope with a missing count histogram was quite ineffective for anything except a perfectly flat distribution. Furthermore, since we were already punting for missing MCELEM slot, it's rather useless to sweat over missing DECHIST: there are no cases where ANALYZE will create the first but not the second. So just simplify the code by punting rather than pretending we can do something useful.	2012-03-04 16:03:38 -05:00
Tom Lane	4fb694aebc	Improve histogram-filling loop in new compute_array_stats() code. Do "frac" arithmetic in int64 to prevent overflow with large statistics targets, and improve the comments so people have some chance of understanding how it works. Alexander Korotkov and Tom Lane	2012-03-04 15:40:16 -05:00
Magnus Hagander	141b89826d	More carefully validate xlog location string inputs Now that we have validate_xlog_location, call it from the previously existing functions taking xlog locatoins as a string input. Suggested by Fujii Masao	2012-03-04 12:25:47 +01:00
Magnus Hagander	bc5ac36865	Add function pg_xlog_location_diff to help comparisons Comparing two xlog locations are useful for example when calculating replication lag. Euler Taveira de Oliveira, reviewed by Fujii Masao, and some cleanups from me	2012-03-04 12:22:38 +01:00
Tom Lane	0e5e167aae	Collect and use element-frequency statistics for arrays. This patch improves selectivity estimation for the array <@, &&, and @> (containment and overlaps) operators. It enables collection of statistics about individual array element values by ANALYZE, and introduces operator-specific estimators that use these stats. In addition, ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)" and "const <> ANY/ALL (array_column)" are estimated by treating them as variants of the containment operators. Since we still collect scalar-style stats about the array values as a whole, the pg_stats view is expanded to show both these stats and the array-style stats in separate columns. This creates an incompatible change in how stats for tsvector columns are displayed in pg_stats: the stats about lexemes are now displayed in the array-related columns instead of the original scalar-related columns. There are a few loose ends here, notably that it'd be nice to be able to suppress either the scalar-style stats or the array-element stats for columns for which they're not useful. But the patch is in good enough shape to commit for wider testing. Alexander Korotkov, reviewed by Noah Misch and Nathan Boley	2012-03-03 20:20:57 -05:00
Peter Eisentraut	b59ca98209	Allow CREATE TABLE (LIKE ...) from composite type The only reason this didn't work before was that parserOpenTable() rejects composite types. So use relation_openrv() directly and manually do the errposition() setup that parserOpenTable() does.	2012-03-03 16:03:05 +02:00
Tom Lane	44634e474f	Allow child-relation entries to be made in ec_has_const EquivalenceClasses. This fixes an oversight in commit `11cad29c91`, which introduced MergeAppend plans. Before that happened, we never particularly cared about the sort ordering of scans of inheritance child relations, since appending their outputs together would destroy any ordering anyway. But now it's important to be able to match child relation sort orderings to those of the surrounding query. The original coding of add_child_rel_equivalences skipped ec_has_const EquivalenceClasses, on the originally-correct grounds that adding child expressions to them was useless. The effect of this is that when a parent variable is equated to a constant, we can't recognize that index columns on the equivalent child variables are not sort-significant; that is, we can't recognize that a child index on, say, (x, y) is able to generate output in "ORDER BY y" order when there is a clause "WHERE x = constant". Adding child expressions to the (x, constant) EquivalenceClass fixes this, without any downside that I can see other than a few more planner cycles expended on such queries. Per recent gripe from Robert McGehee. Back-patch to 9.1 where MergeAppend was introduced.	2012-03-02 14:29:07 -05:00
Peter Eisentraut	6688d2878e	Add COLLATION FOR expression reviewed by Jaime Casanova	2012-03-02 21:12:16 +02:00
Heikki Linnakangas	2502f45979	When a GiST page is split during index build, it might not have a buffer. Previously it was thought that it's impossible as the code stands, because insertions create buffers as tuples are cascaded downwards, and index split also creaters buffers eagerly for all halves. But the example from Jay Levitt demonstrates that it can happen, when the root page is split. It's in fact OK if the buffer doesn't exist, so we just need to remove the sanity check. In fact, we've been discussing the possibility of destroying empty buffers to conserve memory, which would render the sanity check completely useless anyway. Fix by Alexander Korotkov	2012-03-02 13:16:09 +02:00
Alvaro Herrera	3433c6ba00	Remove TOAST table from pg_database The only toastable column now is datacl, but we don't really support long ACLs anyway. The TOAST table should have been removed when the pg_db_role_setting catalog was introduced in commit `2eda8dfb52`, but I forgot to do that. Per -hackers discussion on March 2011.	2012-03-01 12:50:52 -03:00
Heikki Linnakangas	d6a7271958	Correctly detect SSI conflicts of prepared transactions after crash. A prepared transaction can get new conflicts in and out after preparing, so we cannot rely on the in- and out-flags stored in the statefile at prepare- time. As a quick fix, make the conservative assumption that after a restart, all prepared transactions are considered to have both in- and out-conflicts. That can lead to unnecessary rollbacks after a crash, but that shouldn't be a big problem in practice; you don't want prepared transactions to hang around for a long time anyway. Dan Ports	2012-02-29 15:42:36 +02:00
Tom Lane	5c02a00d44	Move CRC tables to libpgport, and provide them in a separate include file. This makes it much more convenient to build tools for Postgres that are separately compiled and require a matching CRC implementation. To prevent multiple copies of the CRC polynomial tables being introduced into the postgres binaries, they are now included in the static library libpgport that is mainly meant for replacement system functions. That seems like a bit of a kludge, but there's no better place. This cleans up building of the tools pg_controldata and pg_resetxlog, which previously had to build their own copies of pg_crc.o. In the future, external programs that need access to the CRC tables can include the tables directly from the new header file pg_crc_tables.h. Daniel Farina, reviewed by Abhijit Menon-Sen and Tom Lane	2012-02-28 19:53:39 -05:00
Tom Lane	0140a11b9b	Fix thinko in new match_join_clauses_to_index() logic. We don't need to constrain the other side of an indexable join clause to not be below an outer join; an example here is SELECT FROM t1 LEFT JOIN t2 ON t1.a = t2.b LEFT JOIN t3 ON t2.c = t3.d; We can consider an inner indexscan on t3.d using c = d as indexqual, even though t2.c is potentially nulled by a previous outer join. The comparable logic in orindxpath.c has always worked that way, but I was being overly cautious here.	2012-02-28 18:10:40 -05:00
Peter Eisentraut	973e9fb294	Add const qualifiers where they are accidentally cast away This only produces warnings under -Wcast-qual, but it's more correct and consistent in any case.	2012-02-28 12:42:08 +02:00
Alvaro Herrera	cb3a7c2b95	ALTER TABLE: skip FK validation when it's safe to do so We already skip rewriting the table in these cases, but we still force a whole table scan to validate the data. This can be skipped, and thus we can make the whole ALTER TABLE operation just do some catalog touches instead of scanning the table, when these two conditions hold: (a) Old and new pg_constraint.conpfeqop match exactly. This is actually stronger than needed; we could loosen things by way of operator families, but it'd require a lot more effort. (b) The functions, if any, implementing a cast from the foreign type to the primary opcintype are the same. For this purpose, we can consider a binary coercion equivalent to an exact type match. When the opcintype is polymorphic, require that the old and new foreign types match exactly. (Since ri_triggers.c does use the executor, the stronger check for polymorphic types is no mere future-proofing. However, no core type exercises its necessity.) Author: Noah Misch Committer's note: catalog version bumped due to change of the Constraint node. I can't actually find any way to have such a node in a stored rule, but given that we have "out" support for them, better be safe.	2012-02-27 19:10:24 -03:00
Peter Eisentraut	9bf8603c7a	Call check_keywords.pl in maintainer-check For that purpose, have check_keywords.pl print errors to stderr and return a useful exit status.	2012-02-27 13:53:12 +02:00
Tom Lane	1b630751d0	Fix some more bugs in GIN's WAL replay logic. In commit `4016bdef8a` I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.	2012-02-26 15:12:17 -05:00
Peter Eisentraut	b5c077c368	Remove useless cast	2012-02-26 15:31:16 +02:00
Peter Eisentraut	66f0cf7da8	Remove useless const qualifier Claiming that the typevar argument to DefineCompositeType() is const was a plain lie. A similar case in DefineVirtualRelation() was already changed in passing in commit `1575fbcb`. Also clean up the now unnecessary casts that used to cast away the const.	2012-02-26 15:22:27 +02:00
Tom Lane	4dd78bf37a	Merge dissect() into cdissect() to remove a pile of near-duplicate code. The "uncomplicated" case isn't materially less complicated than the full case, certainly not enough so to justify duplicating nearly 500 lines of code. The only extra work being done in the full path is zaptreesubs, which is very cheap compared to everything else being done here, and besides that I'm less than convinced that it's not needed in some cases even without backrefs.	2012-02-24 18:40:31 -05:00
Tom Lane	587359479a	Avoid repeated creation/freeing of per-subre DFAs during regex search. In nested sub-regex trees, lower-level nodes created DFAs and then destroyed them again before exiting, which is a bit dumb considering that the recursive search is likely to call those nodes again later. Instead cache each created DFA until the end of pg_regexec(). This is basically a space for time tradeoff, in that it might increase the maximum memory usage. However, in most regex patterns there are not all that many subre nodes, so not that many DFAs --- and in any case, the peak usage occurs when reaching the bottom recursion level, and except for alternation cases that's going to be the same anyway.	2012-02-24 18:40:30 -05:00
Tom Lane	3cbfe485e4	Remove useless "retry memory" logic within regex engine. Apparently some primordial version of Spencer's engine needed cdissect() and child functions to be able to continue matching from a previous position when re-called. That is dead code, though, since trivial inspection shows that cdissect can never be entered without having previously done zapmem which resets the relevant retry counter. I have also verified experimentally that no case in the Tcl regression tests reaches cdissect with a nonzero retry value. Accordingly, remove that logic. This doesn't really save any noticeable number of cycles in itself, but it is one step towards making dissect() and cdissect() equivalent, which will allow removing hundreds of lines of near-duplicated code. Since struct subre's "retry" field is no longer particularly related to any kind of retry, rename it to "id". As of this commit it's only used for identifying a subre node in debug printouts, so you might think we should get rid of the field entirely; but I have a plan for another use.	2012-02-24 18:40:28 -05:00
Peter Eisentraut	9cfd800aab	Add some enumeration commas, for consistency	2012-02-24 11:04:45 +02:00
Tom Lane	173e29aa5d	Fix the general case of quantified regex back-references. Cases where a back-reference is part of a larger subexpression that is quantified have never worked in Spencer's regex engine, because he used a compile-time transformation that neglected the need to check the back-reference match in iterations before the last one. (That was okay for capturing parens, and we still do it if the regex has only capturing parens ... but it's not okay for backrefs.) To make this work properly, we have to add an "iteration" node type to the regex engine's vocabulary of sub-regex nodes. Since this is a moderately large change with a fair risk of introducing new bugs of its own, apply to HEAD only, even though it's a fix for a longstanding bug.	2012-02-24 01:41:03 -05:00
Andrew Dunstan	0c9e5d5e0d	Correctly handle NULLs in JSON output. Error reported by David Wheeler.	2012-02-23 23:44:16 -05:00
Tom Lane	077711c2e3	Remove arbitrary limitation on length of common name in SSL certificates. Both libpq and the backend would truncate a common name extracted from a certificate at 32 bytes. Replace that fixed-size buffer with dynamically allocated string so that there is no hard limit. While at it, remove the code for extracting peer_dn, which we weren't using for anything; and don't bother to store peer_cn longer than we need it in libpq. This limit was not so terribly unreasonable when the code was written, because we weren't using the result for anything critical, just logging it. But now that there are options for checking the common name against the server host name (in libpq) or using it as the user's name (in the server), this could result in undesirable failures. In the worst case it even seems possible to spoof a server name or user name, if the correct name is exactly 32 bytes and the attacker can persuade a trusted CA to issue a certificate in which that string is a prefix of the certificate's common name. (To exploit this for a server name, he'd also have to send the connection astray via phony DNS data or some such.) The case that this is a realistic security threat is a bit thin, but nonetheless we'll treat it as one. Back-patch to 8.4. Older releases contain the faulty code, but it's not a security problem because the common name wasn't used for anything interesting. Reported and patched by Heikki Linnakangas Security: CVE-2012-0867	2012-02-23 15:48:04 -05:00
Tom Lane	891e6e7bfd	Require execute permission on the trigger function for CREATE TRIGGER. This check was overlooked when we added function execute permissions to the system years ago. For an ordinary trigger function it's not a big deal, since trigger functions execute with the permissions of the table owner, so they couldn't do anything the user issuing the CREATE TRIGGER couldn't have done anyway. However, if a trigger function is SECURITY DEFINER, that is not the case. The lack of checking would allow another user to install it on his own table and then invoke it with, essentially, forged input data; which the trigger function is unlikely to realize, so it might do something undesirable, for instance insert false entries in an audit log table. Reported by Dinesh Kumar, patch by Robert Haas Security: CVE-2012-0866	2012-02-23 15:38:56 -05:00
Peter Eisentraut	c9d7004440	Remove inappropriate quotes And adjust wording for consistency.	2012-02-23 12:52:17 +02:00
Peter Eisentraut	8251670cb3	Fix build without OpenSSL This is a fixup for commit `a445cb92ef`.	2012-02-23 10:20:25 +02:00
Robert Haas	2254367435	Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written. Also expose the new counters through pg_stat_statements. Patch by me. Review by Fujii Masao and Greg Smith.	2012-02-22 20:33:05 -05:00
Robert Haas	f74f9a277c	Fix typo in comment. Sandro Santilli	2012-02-22 19:46:12 -05:00
Peter Eisentraut	a445cb92ef	Add parameters for controlling locations of server-side SSL files This allows changing the location of the files that were previously hard-coded to server.crt, server.key, root.crt, root.crl. server.crt and server.key continue to be the default settings and are thus required to be present by default if SSL is enabled. But the settings for the server-side CA and CRL are now empty by default, and if they are set, the files are required to be present. This replaces the previous behavior of ignoring the functionality if the files were not found.	2012-02-22 23:40:46 +02:00
Alvaro Herrera	a417f85e1d	REASSIGN OWNED: Support foreign data wrappers and servers This was overlooked when implementing those kinds of objects, in commit `cae565e503`. Per report from Pawel Casperek.	2012-02-22 17:33:12 -03:00
Tom Lane	593a9631a7	Don't clear btpo_cycleid during _bt_vacuum_one_page. When "vacuuming" a single btree page by removing LP_DEAD tuples, we are not actually within a vacuum operation, but rather in an ordinary insertion process that could well be running concurrently with a vacuum. So clearing the cycleid is incorrect, and could cause the concurrent vacuum to miss removing tuples that it needs to remove. This is a longstanding bug introduced by commit `e6284649b9` of 2006-07-25. I believe it explains Maxim Boguk's recent report of index corruption, and probably some other previously unexplained reports. In 9.0 and up this is a one-line fix; before that we need to introduce a flag to tell _bt_delitems what to do.	2012-02-21 15:03:36 -05:00
Tom Lane	9789c99d01	Cosmetic cleanup for commit `a760893dbd`. Mostly, fixing overlooked comments.	2012-02-21 14:14:16 -05:00
Magnus Hagander	c2a2f7516b	Avoid double close of file handle in syslogger on win32 This causes an exception when running under a debugger or in particular when running on a debug version of Windows. Patch from MauMau	2012-02-21 17:12:25 +01:00
Andrew Dunstan	6b044cb810	Fix typo, noticed by Will Crawford.	2012-02-21 11:03:51 -05:00
Andrew Dunstan	83fcaffea2	Fix a couple of cases of JSON output. First, as noted by Itagaki Takahiro, a datum of type JSON doesn't need to be escaped. Second, ensure that numeric output not in the form of a legal JSON number is quoted and escaped.	2012-02-20 15:01:03 -05:00
Tom Lane	5223f96d92	Fix regex back-references that are directly quantified with . The syntax "\n", that is a backref with a * quantifier directly applied to it, has never worked correctly in Spencer's library. This has been an open bug in the Tcl bug tracker since 2005: https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894 The core of the problem is in parseqatom(), which first changes "\n" to "\n+\|" and then applies repeat() to the NFA representing the backref atom. repeat() thinks that any arc leading into its "rp" argument is part of the sub-NFA to be repeated. Unfortunately, since parseqatom() already created the arc that was intended to represent the empty bypass around "\n+", this arc gets moved too, so that it now leads into the state loop created by repeat(). Thus, what was supposed to be an "empty" bypass gets turned into something that represents zero or more repetitions of the NFA representing the backref atom. In the original example, in place of ^([bc])\1$ we now have something that acts like ^([bc])(\1+\|[bc])$ At runtime, the branch involving the actual backref fails, as it's supposed to, but then the other branch succeeds anyway. We could no doubt fix this by some rearrangement of the operations in parseqatom(), but that code is plenty ugly already, and what's more the whole business of converting "x" to "x+\|" probably needs to go away to fix another problem I'll mention in a moment. Instead, this patch suppresses the *-conversion when the target is a simple backref atom, leaving the case of m == 0 to be handled at runtime. This makes the patch in regcomp.c a one-liner, at the cost of having to tweak cbrdissect() a little. In the event I went a bit further than that and rewrote cbrdissect() to check all the string-length-related conditions before it starts comparing characters. It seems a bit stupid to possibly iterate through many copies of an n-character backreference, only to fail at the end because the target string's length isn't a multiple of n --- we could have found that out before starting. The existing coding could only be a win if integer division is hugely expensive compared to character comparison, but I don't know of any modern machine where that might be true. This does not fix all the problems with quantified back-references. In particular, the code is still broken for back-references that appear within a larger expression that is quantified (so that direct insertion of the quantification limits into the BACKREF node doesn't apply). I think fixing that will take some major surgery on the NFA code, specifically introducing an explicit iteration node type instead of trying to transform iteration into concatenation of modified regexps. Back-patch to all supported branches. In HEAD, also add a regression test case for this. (It may seem a bit silly to create a regression test file for just one test case; but I'm expecting that we will soon import a whole bunch of regex regression tests from Tcl, so might as well create the infrastructure now.)	2012-02-20 00:52:33 -05:00
Tom Lane	e00f68e49c	Add caching of ctype.h/wctype.h results in regc_locale.c. While this doesn't save a huge amount of runtime, it still seems worth doing, especially since I realized that the data copying I did in my first draft was quite unnecessary. In this version, once we have the results cached, getting them back for re-use is really very cheap. Also, remove the hard-wired limitation to not consider wctype.h results for character codes above 255. It turns out that we can't push the limit as far up as I'd originally hoped, because the regex colormap code is not efficient enough to cope very well with character classes containing many thousand letters, which a Unicode locale is entirely capable of producing. Still, we can push it up to U+7FF (which I chose as the limit of 2-byte UTF8 characters), which will at least make Eastern Europeans happy pending a better solution. Thus, this commit resolves the specific complaint in bug #6457, but not the more general issue that letters of non-western alphabets are mostly not recognized as matching [[:alpha:]].	2012-02-19 21:01:13 -05:00
Tom Lane	27af91438b	Create the beginnings of internals documentation for the regex code. Create src/backend/regex/README to hold an implementation overview of the regex package, and fill it in with some preliminary notes about the code's DFA/NFA processing and colormap management. Much more to do there of course. Also, improve some code comments around the colormap and cvec code. No functional changes except to add one missing assert.	2012-02-19 18:58:23 -05:00
Andrew Dunstan	2f582f76b1	Improve pretty printing of viewdefs. Some line feeds are added to target lists and from lists to make them more readable. By default they wrap at 80 columns if possible, but the wrap column is also selectable - if 0 it wraps after every item. Andrew Dunstan, reviewed by Hitoshi Harada.	2012-02-19 11:43:46 -05:00
Tom Lane	08fd6ff37f	Sync regex code with Tcl 8.5.11. Sync our regex code with upstream changes since last time we did this, which was Tcl 8.5.0 (see commit `df1e965e12`). There are no functional changes here; the main point is just to lay down a commit-log marker that somebody has looked at this recently, and to do what we can to keep the two codebases comparable.	2012-02-17 19:44:26 -05:00
Tom Lane	4767bc8ff2	Improve statistics estimation to make some use of DISTINCT in sub-queries. Formerly, we just punted when trying to estimate stats for variables coming out of sub-queries using DISTINCT, on the grounds that whatever stats we might have for underlying table columns would be inapplicable. But if the sub-query has only one DISTINCT column, we can consider its output variable as being unique, which is useful information all by itself. The scope of this improvement is pretty narrow, but it costs nearly nothing, so we might as well do it. Per discussion with Andres Freund. This patch differs from the draft I submitted yesterday in updating various comments about vardata.isunique (to reflect its extended meaning) and in tweaking the interaction with security_barrier views. There does not seem to be a reason why we can't use this sort of knowledge even when the sub-query is such a view.	2012-02-16 17:34:00 -05:00
Tom Lane	4bfe68dfab	Run a portal's cleanup hook immediately when pushing it to FAILED state. This extends the changes of commit `6252c4f9e2` so that we run the cleanup hook earlier for failure cases as well as success cases. As before, the point is to avoid an assertion failure from an Assert I added in commit `a874fe7b4c`, which was meant to check that no user-written code can be called during portal cleanup. This fixes a case reported by Pavan Deolasee in which the Assert could be triggered during backend exit (see the new regression test case), and also prevents the possibility that the cleanup hook is run after portions of the portal's state have already been recycled. That doesn't really matter in current usage, but it foreseeably could matter in the future. Back-patch to 9.1 where the Assert in question was added.	2012-02-15 16:19:01 -05:00
Robert Haas	edec8c8e00	Fix VPATH builds, broken by my recent commit to speed up tuplesorting. The relevant commit is `337b6f5ecf`.	2012-02-15 15:53:53 -05:00
Robert Haas	337b6f5ecf	Speed up in-memory tuplesorting. Per recent work by Peter Geoghegan, it's significantly faster to tuplesort on a single sortkey if ApplySortComparator is inlined into quicksort rather reached via a function pointer. It's also faster in general to have a version of quicksort which is specialized for sorting SortTuple objects rather than objects of arbitrary size and type. This requires a couple of additional copies of the quicksort logic, which in this patch are generate using a Perl script. There might be some benefit in adding further specializations here too, but thus far it's not clear that those gains are worth their weight in code footprint.	2012-02-15 12:13:32 -05:00
Robert Haas	73a4b994a6	Make CREATE/ALTER FUNCTION support NOT LEAKPROOF. Because it isn't good to be able to turn things on, and not off again.	2012-02-15 10:45:08 -05:00
Tom Lane	398f70ec07	Preserve column names in the execution-time tupledesc for a RowExpr. The hstore and json datatypes both have record-conversion functions that pay attention to column names in the composite values they're handed. We used to not worry about inserting correct field names into tuple descriptors generated at runtime, but given these examples it seems useful to do so. Observe the nicer-looking results in the regression tests whose results changed. catversion bump because there is a subtle change in requirements for stored rule parsetrees: RowExprs from ROW() constructs now have to include field names. Andrew Dunstan and Tom Lane	2012-02-14 17:34:56 -05:00
Robert Haas	cd30728fb2	Allow LEAKPROOF functions for better performance of security views. We don't normally allow quals to be pushed down into a view created with the security_barrier option, but functions without side effects are an exception: they're OK. This allows much better performance in common cases, such as when using an equality operator (that might even be indexable). There is an outstanding issue here with the CREATE FUNCTION / ALTER FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the leakproof flag. But I'm committing this as-is so that it doesn't have to be rebased again; we can fix up the grammar in a future commit. KaiGai Kohei, with some wordsmithing by me.	2012-02-13 22:21:14 -05:00
Heikki Linnakangas	21b1634275	Fix heap_multi_insert to set t_self field in the caller's tuples. If tuples were toasted, heap_multi_insert didn't update the ctid on the original tuples. This caused a failure if there was an after trigger (including a foreign key), on the table, and a tuple got toasted. Per off-list report and test case from Ted Phelps	2012-02-13 10:20:50 +02:00
Robert Haas	d429ebe347	Add a comment to AdjustIntervalForTypmod to reduce chance of future bugs. It's not entirely evident how the logic here relates to the interval_transform function, so let's clue people in that they need to check that if the rules change.	2012-02-09 12:24:36 -05:00
Robert Haas	6656588575	Improve interval_transform function to detect a few more cases. Noah Misch, per a review comment from me.	2012-02-09 12:24:22 -05:00
Heikki Linnakangas	82e73ba0d1	Add new keywords SNAPSHOT and TYPES to the keyword list in gram.y These were added to kwlist.h as unreserved keywords in separate patches, but authors forgot to add them to the corresponding list in gram.y. Because of that, even though they were supposed to be unreserved keywords, they could not be used as identifiers. src/tools/check_keywords.pl is your friend.	2012-02-09 11:37:54 +02:00
Tom Lane	331bf6712c	Throw error sooner for unlogged GiST indexes. Throwing an error only after we've built the main index fork is pretty unfriendly when the table already contains data. Per gripe from Jay Levitt.	2012-02-08 16:19:27 -05:00
Tom Lane	cb7c84fae8	Check misplaced window functions before checking aggregate/group by sanity. If somebody puts a window function in WHERE, we should complain about that in so many words. The previous coding tended to complain about the window function's arguments instead, which is likely to be misleading to users who are unclear on the semantics of window functions; as seen for example in bug #6440 from Matyas Novak. Just another example of how "add new code at the end" is frequently a bad heuristic.	2012-02-08 13:15:02 -05:00
Robert Haas	c13897983a	Add transform functions for various temporal typmod coercisions. This enables ALTER TABLE to skip table and index rebuilds in some cases. Noah Misch, with trivial changes by me.	2012-02-08 09:33:37 -05:00
Heikki Linnakangas	1a01560cbb	Rename LWLockWaitUntilFree to LWLockAcquireOrWait. LWLockAcquireOrWait makes it more clear that the lock is acquired if it's free.	2012-02-08 09:17:13 +02:00
Robert Haas	af7dd696b0	Fix typos pointed out by Noah Misch.	2012-02-07 21:40:49 -05:00
Robert Haas	f7d7dade8a	Add a transform function for varbit typmod coercisions. This enables ALTER TABLE to skip table and index rebuilds when the new type is unconstraint varbit, or when the allowable number of bits is not decreasing. Noah Misch, with review and a fix for an OID collision by me.	2012-02-07 12:42:50 -05:00
Robert Haas	3cc0800829	Add a transform function for numeric typmod coercisions. This enables ALTER TABLE to skip table and index rebuilds when a column is changed to an unconstrained numeric, or when the scale is unchanged and the precision does not decrease. Noah Misch, with a few stylistic changes and a fix for an OID collision by me.	2012-02-07 12:08:26 -05:00
Robert Haas	af7914c662	Add TIMING option to EXPLAIN, to allow eliminating of timing overhead. Sometimes it may be useful to get actual row counts out of EXPLAIN (ANALYZE) without paying the cost of timing every node entry/exit. With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that. Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.	2012-02-07 11:23:04 -05:00
Heikki Linnakangas	15ad6f1510	When building with LWLOCK_STATS, initialize the stats in LWLockWaitUntilFree. If LWLockWaitUntilFree was called before the first LWLockAcquire call, you would either crash because of access to uninitialized array or account the acquisition incorrectly. LWLockConditionalAcquire doesn't have this problem because it doesn't update the lwlock stats. In practice, this never happens because there is no codepath where you would call LWLockWaitUntilfree before LWLockAcquire after a new process is launched. But that's just accidental, there's no guarantee that that's always going to be true in the future. Spotted by Jeff Janes.	2012-02-07 10:11:54 +02:00
Tom Lane	442231d7f7	Fix postmaster to attempt restart after a hot-standby crash. The postmaster was coded to treat any unexpected exit of the startup process (i.e., the WAL replay process) as a catastrophic crash, and not try to restart it. This was OK so long as the startup process could not have any sibling postmaster children. However, if a hot-standby backend crashes, we SIGQUIT the startup process along with everything else, and the resulting exit is hardly "unexpected". Treating it as such meant we failed to restart a standby server after any child crash at all, not only a crash of the WAL replay process as intended. Adjust that. Back-patch to 9.0 where hot standby was introduced.	2012-02-06 15:30:21 -05:00
Tom Lane	5fc78efcec	Avoid throwing ERROR during WAL replay of DROP TABLESPACE. Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless removal of the tablespace's directories succeeds, that does not guarantee that the same operation will succeed during WAL replay. Foreseeable reasons for it to fail include temp files created in the tablespace by Hot Standby backends, wrong directory permissions on a standby server, etc etc. The original coding threw ERROR if replay failed to remove the directories, but that is a serious overreaction. Throwing an error aborts recovery, and worse means that manual intervention will be needed to get the database to start again, since otherwise the same error will recur on subsequent attempts to replay the same WAL record. And the consequence of failing to remove the directories is only that some probably-small amount of disk space is wasted, so it hardly seems justified to throw an error. Accordingly, arrange to report such failures as LOG messages and keep going when a failure occurs during replay. Back-patch to 9.0 where Hot Standby was introduced. In principle such problems can occur in earlier releases, but Hot Standby increases the odds of trouble significantly. Given the lack of field reports of such issues, I'm satisfied with patching back as far as the patch applies easily.	2012-02-06 14:44:41 -05:00
Tom Lane	c6d76d7c82	Add locking around WAL-replay modification of shared-memory variables. Originally, most of this code assumed that no Postgres backends could be running concurrently with it, and so no locking could be needed. That assumption fails in Hot Standby. While it's still true that Hot Standby backends should never change values like nextXid, they can examine them, and consistency is important in some cases such as when computing a snapshot. Therefore, prudence requires that WAL replay code obtain the relevant locks when modifying such variables, even though it can examine them without taking a lock. We were following that coding rule in some places but not all. This commit applies the coding rule uniformly to all updates of ShmemVariableCache and MultiXactState fields; a search of the replay routines did not find any other cases that seemed to be at risk. In addition, this commit fixes a longstanding thinko in replay of NEXTOID and checkpoint records: we tried to advance nextOid only if it was behind the value in the WAL record, but the comparison would draw the wrong conclusion if OID wraparound had occurred since the previous value. Better to just unconditionally assign the new value, since OID assignment shouldn't be happening during replay anyway. The additional locking seems to be more in the nature of future-proofing than fixing any live bug, so I am not going to back-patch it. The NEXTOID fix will be back-patched separately.	2012-02-06 12:34:10 -05:00
Tom Lane	17118825b8	Fix transient clobbering of shared buffers during WAL replay. RestoreBkpBlocks was in the habit of zeroing and refilling the target buffer; which was perfectly safe when the code was written, but is unsafe during Hot Standby operation. The reason is that we have coding rules that allow backends to continue accessing a tuple in a heap relation while holding only a pin on its buffer. Such a backend could see transiently zeroed data, if WAL replay had occasion to change other data on the page. This has been shown to be the cause of bug #6425 from Duncan Rance (who deserves kudos for developing a sufficiently-reproducible test case) as well as Bridget Frey's re-report of bug #6200. It most likely explains the original report as well, though we don't yet have confirmation of that. To fix, change the code so that only bytes that are supposed to change will change, even transiently. This actually saves cycles in RestoreBkpBlocks, since it's not writing the same bytes twice. Also fix seq_redo, which has the same disease, though it has to work a bit harder to meet the requirement. So far as I can tell, no other WAL replay routines have this type of bug. In particular, the index-related replay routines, which would certainly be broken if they had to meet the same standard, are not at risk because we do not have coding rules that allow access to an index page when not holding a buffer lock on it. Back-patch to 9.0 where Hot Standby was added.	2012-02-05 15:49:17 -05:00
Tom Lane	ee68a44106	Improve comment.	2012-02-04 22:37:34 -05:00
Tom Lane	2af72cefea	Add missing Assert and fix inaccurate elog message in standby_redo(). All other WAL redo routines either call RestoreBkpBlocks() or Assert that they haven't been passed any backup blocks. Make this one do likewise. Also, fix incorrect routine name in its failure message.	2012-02-04 22:32:35 -05:00
Tom Lane	9bff0780cf	Allow SQL-language functions to reference parameters by name. Matthew Draper, reviewed by Hitoshi Harada	2012-02-04 19:23:49 -05:00
Andrew Dunstan	39909d1d39	Add array_to_json and row_to_json functions. Also move the escape_json function from explain.c to json.c where it seems to belong. Andrew Dunstan, Reviewd by Abhijit Menon-Sen.	2012-02-03 12:11:16 -05:00
Robert Haas	0ed7445d73	Allow spgist's text_ops to handle pattern-matching operators. This was presumably intended to work this way all along, but a few key bits of indxpath.c didn't get the memo. Robert Haas and Tom Lane	2012-02-02 13:10:56 -05:00
Robert Haas	b4e0741727	Avoid re-checking for visibility map extension too frequently. When testing bits (but not when setting or clearing them), we now won't check whether the map has been extended. This significantly improves performance in the case where the visibility map doesn't exist yet, by avoiding an extra system call per tuple. To make sure backends notice eventually, send an smgr inval on VM extension. Dean Rasheed, with minor modifications by me.	2012-02-01 20:35:42 -05:00
Peter Eisentraut	8a02339e9b	initdb: Add options --auth-local and --auth-host reviewed by Robert Haas and Pavel Stehule	2012-02-01 21:18:55 +02:00
Tom Lane	c318aeed84	Try to be more consistent about accepting denormalized float8 numbers. On some platforms, strtod() reports ERANGE for a denormalized value (ie, one that can be represented as distinct from zero, but is too small to have full precision). On others, it doesn't. It seems better to try to accept these values consistently, so add a test to see if the result value indicates a true out-of-range condition. This should be okay per Single Unix Spec. On machines where the underlying math isn't IEEE standard, the behavior for such small numbers may not be very consistent, but then it wouldn't be anyway. Marti Raudsepp, after a proposal by Jeroen Vermeulen	2012-02-01 13:11:16 -05:00
Robert Haas	5384a73f98	Built-in JSON data type. Like the XML data type, we simply store JSON data as text, after checking that it is valid. More complex operations such as canonicalization and comparison may come later, but this is enough for not. There are a few open issues here, such as whether we should attempt to detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets the basic framework in place.	2012-01-31 11:48:23 -05:00
Heikki Linnakangas	82d4b262d9	Fix bug in the new wait-until-lwlock-is-free mechanism. If there was a wait-until-free process in the head of the wait queue, followed by an exclusive locker, the exclusive locker was not be woken up as it should.	2012-01-31 00:09:30 +02:00
Peter Eisentraut	82e83f46a2	Add sequence USAGE privileges to information schema The sequence USAGE privilege is sufficiently similar to the SQL standard that it seems reasonable to show in the information schema. Also add some compatibility notes about it on the GRANT reference page.	2012-01-30 21:45:42 +02:00
Heikki Linnakangas	9b38d46d9f	Make group commit more effective. When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and the can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion. This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that.	2012-01-30 16:53:48 +02:00
Simon Riggs	ba1868ba31	Minor bug fix and cleanup from self-review of sync rep queues patch.	2012-01-30 14:36:17 +00:00
Simon Riggs	73f617f13f	Various minor comments changes from bgwriter to checkpointer.	2012-01-30 14:34:25 +00:00
Heikki Linnakangas	a578257040	Accept a non-existent value in "ALTER USER/DATABASE SET ..." command. When default_text_search_config, default_tablespace, or temp_tablespaces setting is set per-user or per-database, with an "ALTER USER/DATABASE SET ..." statement, don't throw an error if the text search configuration or tablespace does not exist. In case of text search configuration, even if it doesn't exist in the current database, it might exist in another database, where the setting is intended to have its effect. This behavior is now the same as search_path's. Tablespaces are cluster-wide, so the same argument doesn't hold for tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER SET ..." statements before the "CREATE TABLESPACE" statements. Arguably that's pg_dumpall's fault - it should dump the statements in such an order that the tablespace is created first and then the "ALTER USER SET default_tablespace ..." statements after that - but it seems better to be consistent with search_path and default_text_search_config anyway. Besides, you could still create a dump that throws an error, by creating the tablespace, running "ALTER USER SET default_tablespace", then dropping the tablespace and running pg_dumpall on that. Backpatch to all supported versions.	2012-01-30 11:13:36 +02:00
Tom Lane	ad10853b30	Assorted comment fixes, mostly just typos, but some obsolete statements. YAMAMOTO Takashi	2012-01-29 19:23:56 -05:00
Tom Lane	21a39de580	Tweak index costing for problems with partial indexes. btcostestimate() makes an estimate of the number of index tuples that will be visited based on knowledge of which index clauses can actually bound the scan within nbtree. However, it forgot to account for partial indexes in this calculation, with the result that the cost of the index scan could be significantly overestimated for a partial index. Fix that by merging the predicate with the abbreviated indexclause list, in the same way as we do with the full list to estimate how many heap tuples will be visited. Also, slightly increase the "fudge factor" that's meant to give preference to smaller indexes over larger ones. While this is applied to all indexes, it's most important for partial indexes since it can be the only factor that makes a partial index look cheaper than a similar full index. Experimentation shows that the existing value is so small as to easily get swamped by noise such as page-boundary-roundoff behavior. I'm tempted to kick it up more than this, but will refrain for now. Per report from Ruben Blanco. These are long-standing issues, but given the lack of prior complaints I'm not going to risk changing planner behavior in back branches by back-patching.	2012-01-29 18:37:14 -05:00
Tom Lane	b28ffd0fcc	Fix pushing of index-expression qualifications through UNION ALL. In commit `57664ed25e`, I made the planner wrap non-simple-variable outputs of appendrel children (IOW, child SELECTs of UNION ALL subqueries) inside PlaceHolderVars, in order to solve some issues with EquivalenceClass processing. However, this means that any upper-level WHERE clauses mentioning such outputs will now contain PlaceHolderVars after they're pushed down into the appendrel child, and that prevents indxpath.c from recognizing that they could be matched to index expressions. To fix, add explicit stripping of PlaceHolderVars from index operands, same as we have long done for RelabelType nodes. Add a regression test covering both this and the plain-UNION case (which is a totally different code path, but should also be able to do it). Per bug #6416 from Matteo Beccati. Back-patch to 9.1, same as the previous change.	2012-01-29 16:31:23 -05:00
Tom Lane	4ec6581c0c	Fix handling of init_plans list in inheritance_planner(). Formerly we passed an empty list to each per-child-table invocation of grouping_planner, and then merged the results into the global list. However, that fails if there's a CTE attached to the statement, because create_ctescan_plan uses the list to find the plan referenced by a CTE reference; so it was unable to find any CTEs attached to the outer UPDATE or DELETE. But there's no real reason not to use the same list throughout the process, and doing so is simpler and faster anyway. Per report from Josh Berkus of "could not find plan for CTE" failures. Back-patch to 9.1 where we added support for WITH attached to UPDATE or DELETE. Add some regression test cases, too.	2012-01-28 20:24:42 -05:00
Tom Lane	7c1719bc68	Fix handling of data-modifying CTE subplans in EvalPlanQual. We can't just skip initializing such subplans, because the referencing CTE node will expect to find the subplan available when it initializes. That in turn means that ExecInitModifyTable must allow the case (which actually it needed to do anyway, since there's no guarantee that ModifyTable is exactly at the top of the CTE plan tree). So move the complaint about not being allowed in EvalPlanQual mode to execution instead of initialization. Testing turned up yet another problem, which is that we'd try to re-initialize the result relation's index list, leading to leaks and dangling pointers. Per report from Phil Sorber. Back-patch to 9.1 where data-modifying CTEs were introduced.	2012-01-28 17:43:57 -05:00
Magnus Hagander	672614cf21	Prevent logging "failed to stat file: success" for temp files This was broken in commit `bc3347484a`, the addition of statistics counters for temp files. Reported by Thom Brown	2012-01-28 10:03:26 +01:00
Tom Lane	0816fad6ee	Undo 8.4-era lobotomization of subquery pullup rules. After the planner was fixed to convert some IN/EXISTS subqueries into semijoins or antijoins, we had to prevent it from doing that in some cases where the plans risked getting much worse. The reason the plans got worse was that in the unoptimized implementation, subqueries could reference parameters from the outer query at any join level, and so full table scans could be avoided even if they were one or more levels of join below where the semi/anti join would be. Now that we have sufficient mechanism in the planner to handle such cases properly, it should no longer be necessary to play dumb here. This reverts commits `07b9936a0f` and `cd1f0d04bf`. The latter was a stopgap fix that wasn't really sufficiently analyzed at the time. Rather than just restricting ourselves to cases where the new join can be stacked on the right-hand input, we should also consider whether it can be stacked on the left-hand input.	2012-01-27 19:46:41 -05:00
Tom Lane	e2fa76d80b	Use parameterized paths to generate inner indexscans more flexibly. This patch fixes the planner so that it can generate nestloop-with- inner-indexscan plans even with one or more levels of joining between the indexscan and the nestloop join that is supplying the parameter. The executor was fixed to handle such cases some time ago, but the planner was not ready. This should improve our plans in many situations where join ordering restrictions formerly forced complete table scans. There is probably a fair amount of tuning work yet to be done, because of various heuristics that have been added to limit the number of parameterized paths considered. However, we are not going to find out what needs to be adjusted until the code gets some real-world use, so it's time to get it in there where it can be tested easily. Note API change for index AM amcostestimate functions. I'm not aware of any non-core index AMs, but if there are any, they will need minor adjustments.	2012-01-27 19:26:38 -05:00
Peter Eisentraut	b376ec6fa5	Show default privileges in information schema Hitherto, the information schema only showed explicitly granted privileges that were visible in the *acl catalog columns. If no privileges had been granted, the implicit privileges were not shown. To fix that, add an SQL-accessible version of the acldefault() function, and use that inside the aclexplode() calls to substitute the catalog-specific default privilege set for null values. reviewed by Abhijit Menon-Sen	2012-01-27 21:58:51 +02:00
Peter Eisentraut	bf90562aa4	Revert unfortunate whitespace change In `e5e2fc842c`, blank lines were removed after a comment block, which now looks as though the comment refers to the immediately following code, but it actually refers to the preceding code. So put the blank lines back.	2012-01-27 21:39:38 +02:00
Peter Eisentraut	2787458362	Disallow ALTER DOMAIN on non-domain type everywhere This has been the behavior already in most cases, but through omission, ALTER DOMAIN / OWNER TO and ALTER DOMAIN / SET SCHEMA would silently work on non-domain types as well.	2012-01-27 21:20:34 +02:00
Peter Eisentraut	8137f2c323	Hide most variable-length fields from Form_pg_* structs Those fields only appear in the structs so that genbki.pl can create the BKI bootstrap files for the catalogs. But they are not actually usable from C. So hiding them can prevent coding mistakes, saves stack space, and can help the compiler. In certain catalogs, the first variable-length field has been kept visible after manual inspection. These exceptions are noted in C comments. reviewed by Tom Lane	2012-01-27 20:16:17 +02:00
Peter Eisentraut	8a3f745f16	Do not access indclass through Form_pg_index Normally, accessing variable-length members of catalog structures past the first one doesn't work at all. Here, it happened to work because indnatts was checked to be 1, and so the defined FormData_pg_index layout, using int2vector[1] and oidvector[1] for variable-length arrays, happened to match the actual memory layout. But it's a very fragile assumption, and it's not in a performance-critical path, so code it properly using heap_getattr() instead. bug analysis by Tom Lane	2012-01-27 20:08:34 +02:00
Heikki Linnakangas	cf3fff6326	Initialize the new bgwriterLatch field properly. Peter Geoghegan	2012-01-27 18:25:32 +02:00
Robert Haas	c5a03256c7	Adjust tuplesort.c based on the fact that we never use the OS's qsort(). Our own qsort_arg() implementation doesn't have the defect previously observed to affect only QNX 4, so it seems sufficiently to assert that it isn't broken rather than retesting. Also, update a few comments to clarify why it's valuable to retain a tie-break rule based on CTID during index builds. Peter Geoghegan, with slight tweaks by me.	2012-01-26 14:43:28 -05:00
Robert Haas	2d1371d3ee	Be more clear when a new column name collides with a system column name. We now use the same error message for ALTER TABLE .. ADD COLUMN or ALTER TABLE .. RENAME COLUMN that we do for CREATE TABLE. The old message was accurate, but might be confusing to users not aware of our system columns. Vik Reykja, with some changes by me, and further proofreading by Tom Lane	2012-01-26 12:44:30 -05:00
Heikki Linnakangas	6d90eaaa89	Make bgwriter sleep longer when it has no work to do, to save electricity. To make it wake up promptly when activity starts again, backends nudge it by setting a latch in MarkBufferDirty(). The latch is kept set while bgwriter is active, so there is very little overhead from that when the system is busy. It is only armed before going into longer sleep. Peter Geoghegan, with some changes by me.	2012-01-26 18:39:13 +02:00
Robert Haas	467ff207f5	Add missing #include, to suppress compiler warning.	2012-01-26 10:16:26 -05:00
Magnus Hagander	7729e22d83	Fix a copy/pasted typo in several comments	2012-01-26 16:02:33 +01:00
Magnus Hagander	61cb8c5abb	Add deadlock counter to pg_stat_database Adds a counter that tracks number of deadlocks that occurred in each database to pg_stat_database. Magnus Hagander, reviewed by Jaime Casanova	2012-01-26 15:58:19 +01:00
Robert Haas	0e549697d1	Classify DROP operations by whether or not they are user-initiated. This doesn't do anything useful just yet, but is intended as supporting infrastructure for allowing sepgsql to sensibly check DROP permissions. KaiGai Kohei and Robert Haas	2012-01-26 09:30:27 -05:00
Magnus Hagander	bc3347484a	Track temporary file count and size in pg_stat_database Add counters for number and size of temporary files used for spill-to-disk queries for each database to the pg_stat_database view. Tomas Vondra, review by Magnus Hagander	2012-01-26 14:41:19 +01:00
Robert Haas	9d35116611	Damage control for yesterday's CheckIndexCompatible changes. Rip out a regression test that doesn't play well with settings put in place by the build farm, and rewrite the code in CheckIndexCompatible in a hopefully more transparent style.	2012-01-26 08:21:31 -05:00
Robert Haas	9f9135d129	Instrument index-only scans to count heap fetches performed. Patch by me; review by Tom Lane, Jeff Davis, and Peter Geoghegan.	2012-01-25 20:41:52 -05:00
Robert Haas	6eb71ac552	Make CheckIndexCompatible simpler and more bullet-proof. This gives up the "don't rewrite the index" behavior in a couple of relatively unimportant cases, such as changing between an array type and an unconstrained domain over that array type, in return for making this code more future-proof. Noah Misch	2012-01-25 15:28:07 -05:00
Simon Riggs	8366c7803e	Allow pg_basebackup from standby node with safety checking. Base backup follows recommended procedure, plus goes to great lengths to ensure that partial page writes are avoided. Jun Ishizuka and Fujii Masao, with minor modifications	2012-01-25 18:02:04 +00:00
Alvaro Herrera	74ab96a45e	Add pg_trigger_depth() function This reports the depth level of triggers currently in execution, or zero if not called from inside a trigger. No catversion bump in this patch, but you have to initdb if you want access to the new function. Author: Kevin Grittner	2012-01-25 13:22:54 -03:00
Simon Riggs	443b4821f1	Add new replication mode synchronous_commit = 'write'. Replication occurs only to memory on standby, not to disk, so provides additional performance if user wishes to reduce durability level slightly. Adds concept of multiple independent sync rep queues. Fujii Masao and Simon Riggs	2012-01-24 20:22:37 +00:00
Peter Eisentraut	89dda5f297	Remove quotes around format_type_be() output format_type_be() takes care of any needed quoting itself.	2012-01-24 21:49:27 +02:00
Tom Lane	f26c9896b3	Suppress variable-clobbered-by-longjmp warning seen with older gcc versions.	2012-01-24 13:44:07 -05:00
Tom Lane	beef89567e	Suppress possibly-uninitialized-variable warning seen with older gcc versions.	2012-01-24 13:40:26 -05:00
Bruce Momjian	890a9992ce	Reduce tab outdent of "error handling" GUC comments in postgresql.conf, to match surrounding outdenting.	2012-01-24 10:41:00 -05:00
Simon Riggs	c172b7b02e	Resolve timing issue with logging locks for Hot Standby. We log AccessExclusiveLocks for replay onto standby nodes, but because of timing issues on ProcArray it is possible to log a lock that is still held by a just committed transaction that is very soon to be removed. To avoid any timing issue we avoid applying locks made by transactions with InvalidXid. Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee	2012-01-23 23:37:32 +00:00
Simon Riggs	b8a91d9d1c	ALTER <thing> [IF EXISTS] ... allows silent DDL if required, e.g. ALTER FOREIGN TABLE IF EXISTS foo RENAME TO bar Pavel Stehule	2012-01-23 23:25:04 +00:00
Magnus Hagander	a65023e7de	Further doc cleanups from the pg_stat_activity changes Fujii Masao	2012-01-20 12:23:26 +01:00
Robert Haas	cc53a1e7cc	Add bitwise AND, OR, and NOT operators for macaddr data type. Brendan Jurd, reviewed by Fujii Masao	2012-01-19 15:25:14 -05:00

... 2 3 4 5 6 ...

12742 Commits