postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	6308ba05a7	Improve control logic for bgwriter hibernation mode. Commit `6d90eaaa89` added a hibernation mode to the bgwriter to reduce the server's idle-power consumption. However, its interaction with the detailed behavior of BgBufferSync's feedback control loop wasn't very well thought out. That control loop depends primarily on the rate of buffer allocation, not the rate of buffer dirtying, so the hibernation mode has to be designed to operate only when no new buffer allocations are happening. Also, the check for whether the system is effectively idle was not quite right and would fail to detect a constant low level of activity, thus allowing the bgwriter to go into hibernation mode in a way that would let the cycle time vary quite a bit, possibly further confusing the feedback loop. To fix, move the wakeup support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode unless no buffer allocations have happened recently. In addition, fix the delaying logic to remove the problem of possibly not responding to signals promptly, which was basically caused by trying to use the process latch's is_set flag for multiple purposes. I can't prove it but I'm suspicious that that hack was responsible for the intermittent "postmaster does not shut down" failures we've been seeing in the buildfarm lately. In any case it did nothing to improve the readability or robustness of the code. In passing, express the hibernation sleep time as a multiplier on BgWriterDelay, not a constant. I'm not sure whether there's any value in exposing the longer sleep time as an independently configurable setting, but we can at least make it act like this for little extra code.	2012-05-09 23:37:10 -04:00
Peter Eisentraut	5d39807a00	Add make dependency so that postgres.bki is rebuilt in major version change. Every time since the current rule for postgres.bki was put in place when we change the major version, people complain that their tests fail in strange ways. This is because the version number in postgres.bki is not updated, because it has no dependency for that. And you can't even force the rebuild manually if you don't happen to know which file has the problem. Fix that now before it will happen again. The only remaining problem with switching major versions, as far as the regression tests are concerned, is that contrib needs to be rebuilt. But that's easily invoked, and in any case the failure modes are more friendly if you forget that.	2012-05-09 20:45:56 +03:00
Simon Riggs	8f28789bff	Rename BgWriterShmem/Request to CheckpointerShmem/Request	2012-05-09 14:23:45 +01:00
Simon Riggs	bbd3ec9dce	Rename BgWriterCommLock to CheckpointerCommLock	2012-05-09 14:11:48 +01:00
Simon Riggs	5829387381	Avoid xid error from age() function when run on Hot Standby	2012-05-09 13:56:24 +01:00
Tom Lane	acd4c7d58b	Fix an issue in recent walwriter hibernation patch. Users of asynchronous-commit mode expect there to be a guaranteed maximum delay before an async commit's WAL records get flushed to disk. The original version of the walwriter hibernation patch broke that. Add an extra shared-memory flag to allow async commits to kick the walwriter out of hibernation mode, without adding any noticeable overhead in cases where no action is needed.	2012-05-08 23:06:40 -04:00
Tom Lane	49340037ee	Reduce idle power consumption of stats collector process. Latch-ify the stats collector, so that it does not need an arbitrary wakeup cycle to check for postmaster death. The incremental savings in idle power is pretty marginal, since we only had it waking every two seconds; but I believe that this patch may also improve the collector's performance under load, by reducing the number of kernel calls made per message when messages are arriving constantly (we now avoid a select/poll call except when we need to sleep). The change also reduces the time needed for a normal database shutdown on platforms where signals don't interrupt select().	2012-05-08 21:26:46 -04:00
Tom Lane	5461564a9d	Reduce idle power consumption of walwriter and checkpointer processes. This patch modifies the walwriter process so that, when it has not found anything useful to do for many consecutive wakeup cycles, it extends its sleep time to reduce the server's idle power consumption. It reverts to normal as soon as it's done any successful flushes. It's still true that during any async commit, backends check for completed, unflushed pages of WAL and signal the walwriter if there are any; so that in practice the walwriter can get awakened and returned to normal operation sooner than the sleep time might suggest. Also, improve the checkpointer so that it uses a latch and a computed delay time to not wake up at all except when it has something to do, replacing a previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for reducing the server's power consumption when idle. In passing, get rid of the dedicated latch for signaling the walwriter in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Also, fix a pre-existing bug with failure to save/restore errno in walwriter's signal handlers. Peter Geoghegan, somewhat simplified by Tom	2012-05-08 20:03:26 -04:00
Magnus Hagander	916d589a10	Make "unexpected EOF" messages DEBUG1 unless in an open transaction. "Unexpected EOF on client connection" without an open transaction is mostly noise, so turn it into DEBUG1. With an open transaction it's still indicating a problem, so keep those as ERROR, and change the message to indicate that it happened in a transaction.	2012-05-07 18:50:44 +02:00
Tom Lane	71b9549d05	Overdue code review for transaction-level advisory locks patch. Commit `62c7bd31c8` had assorted problems, most visibly that it broke PREPARE TRANSACTION in the presence of session-level advisory locks (which should be ignored by PREPARE), as per a recent complaint from Stephen Rees. More abstractly, the patch made the LockMethodData.transactional flag not merely useless but outright dangerous, because in point of fact that flag no longer tells you anything at all about whether a lock is held transactionally. This fix therefore removes that flag altogether. We now rely entirely on the convention already in use in lock.c that transactional lock holds must be owned by some ResourceOwner, while session holds are never so owned. Setting the locallock struct's owner link to NULL thus denotes a session hold, and there is no redundant marker for that. PREPARE TRANSACTION now works again when there are session-level advisory locks, and it is also able to transfer transactional advisory locks to the prepared transaction, but for implementation reasons it throws an error if we hold both types of lock on a single lockable object. Perhaps it will be worth improving that someday. Assorted other minor cleanup and documentation editing, as well. Back-patch to 9.1, except that in the 9.1 branch I did not remove the LockMethodData.transactional flag for fear of causing an ABI break for any external code that might be examining those structs.	2012-05-04 17:44:31 -04:00
Bruce Momjian	ebcaa5fcde	Remove BSD/OS (BSDi) port. There are no known users upgrading to Postgres 9.2, and perhaps no existing users either.	2012-05-03 10:58:44 -04:00
Peter Eisentraut	e9605a039b	Even more duplicate word removal, in the spirit of the season	2012-05-02 20:56:03 +03:00
Robert Haas	0038110421	Avoid repeated CLOG access from heap_hot_search_buffer. At the time we check whether the tuple is dead to all running transactions, we've already verified that it isn't visible to our scan, setting hint bits if appropriate. So there's no need to recheck CLOG for the all-dead test we do just a moment later. So, add HeapTupleIsSurelyDead() to test the appropriate condition under the assumption that all relevant hint bits are already set. Review by Tom Lane.	2012-05-02 12:40:07 -04:00
Robert Haas	1b4998fd44	Further corrections from the department of redundancy department. Thom Brown	2012-05-02 11:11:25 -04:00
Robert Haas	e01e66f808	More duplicate word removal.	2012-05-02 09:28:16 -04:00
Heikki Linnakangas	f291ccd43e	Remove duplicate words in comments. Found these with grep -r "for for ".	2012-05-02 10:20:27 +03:00
Tom Lane	50c2d6a1a6	Kill some remaining references to SVR4 and univel. Both terms still appear in a few places, but I thought it best to leave those alone in context.	2012-05-02 00:29:17 -04:00
Peter Eisentraut	f2f9439fbf	Remove dead ports. Remove the following ports: dgux, nextstep, sunos4, svr4, ultrix4, univel. These are obsolete and not worth rescuing. In most cases, there is circumstantial evidence that they wouldn't work anymore anyway.	2012-05-01 22:11:12 +03:00
Tom Lane	809e7e21af	Converge all SQL-level statistics timing values to float8 milliseconds. This patch adjusts the core statistics views to match the decision already taken for pg_stat_statements, that values representing elapsed time should be represented as float8 and measured in milliseconds. By using float8, we are no longer tied to a specific maximum precision of timing data. (Internally, it's still microseconds, but we could now change that without needing changes at the SQL level.) The columns affected are pg_stat_bgwriter.checkpoint_write_time, pg_stat_bgwriter.checkpoint_sync_time, pg_stat_database.blk_read_time, pg_stat_database.blk_write_time, pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, and pg_stat_xact_user_functions.self_time. The first four of these are new in 9.2, so there is no compatibility issue from changing them. The others require a release note comment that they are now double precision (and can show a fractional part) rather than bigint as before; also their underlying statistics functions now match the column definitions, instead of returning bigint microseconds.	2012-04-30 14:03:33 -04:00
Robert Haas	0d2235a25b	Remove duplicate word in comment. Noted by Peter Geoghegan.	2012-04-30 13:14:46 -04:00
Tom Lane	1dd89eadcd	Rename I/O timing statistics columns to blk_read_time and blk_write_time. This seems more consistent with the pre-existing choices for names of other statistics columns. Rename assorted internal identifiers to match.	2012-04-29 18:13:33 -04:00
Tom Lane	309c64745e	Rename track_iotiming GUC to track_io_timing. This spelling seems significantly more readable to me.	2012-04-29 16:23:54 -04:00
Peter Eisentraut	81107282a5	Change return type of ExceptionalCondition to void and mark it noreturn. In ancient times, it was thought that this wouldn't work because of TrapMacro/AssertMacro, but changing those to use a comma operator appears to work without compiler warnings.	2012-04-29 21:20:14 +03:00
Tom Lane	cdbad241f4	Clear I/O timing counters after sending them to the stats collector. This oversight caused the reported times to accumulate in an O(N^2) fashion the longer a backend runs.	2012-04-28 15:11:13 -04:00
Tom Lane	d6f7d4fdc5	Fix printing of whole-row Vars at top level of a SELECT targetlist. Normally whole-row Vars are printed as "tabname.*". However, that does not work at top level of a targetlist, because per SQL standard the parser will think that the "*" should result in column-by-column expansion; which is not at all what a whole-row Var implies. We used to just print the table name in such cases, which works most of the time; but it fails if the table name matches a column name available anywhere in the FROM clause. This could lead for instance to a view being interpreted differently after dump and reload. Adding parentheses doesn't fix it, but there is a reasonably simple kluge we can use instead: attach a no-op cast, so that the "*" isn't syntactically at top level anymore. This makes the printing of such whole-row Vars a lot more consistent with other Vars, and may indeed fix more cases than just the reported one; I'm suspicious that cases involving schema qualification probably didn't work properly before, either. Per bug report and fix proposal from Abbas Butt, though this patch is quite different in detail from his. Back-patch to all supported versions.	2012-04-27 19:49:18 -04:00
Tom Lane	537b266953	Fix syslogger's rotation disable/re-enable logic. If it fails to open a new log file, the syslogger assumes there's something wrong with its parameters (such as log_directory), and stops attempting automatic time-based or size-based log file rotations. Sending it SIGHUP is supposed to start that up again. However, the original coding for that was really bogus, involving clobbering a couple of GUC variables and hoping that SIGHUP processing would restore them. Get rid of that technique in favor of maintaining a separate flag showing we've turned rotation off. Per report from Mark Kirkwood. Also, the syslogger will automatically attempt to create the log_directory directory if it doesn't exist, but that was only happening at startup. For consistency and ease of use, it should do the same whenever the value of log_directory is changed by SIGHUP. Back-patch to all supported branches.	2012-04-27 00:12:42 -04:00
Robert Haas	3424bff90f	Prevent index-only scans from returning wrong answers under Hot Standby. The alternative of disallowing index-only scans in HS operation was discussed, but the consensus was that it was better to treat marking a page all-visible as a recovery conflict for snapshots that could still fail to see XIDs on that page. We may in the future try to soften this, so that we simply force index scans to do heap fetches in cases where this may be an issue, rather than throwing a hard conflict.	2012-04-26 20:00:21 -04:00
Tom Lane	7c85aa39fc	Fix oversight in recent parameterized-path patch. bitmap_scan_cost_est() has to be able to cope with a BitmapOrPath, but I'd taken a shortcut that didn't work for that case. Noted by Heikki. Add some regression tests since this area is evidently under-covered.	2012-04-26 14:17:44 -04:00
Tom Lane	9fa82c9809	Fix planner's handling of RETURNING lists in writable CTEs. setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists, which meant they were left with the wrong varnos when the RETURNING list was in a subquery. That was never possible before writable CTEs, of course, but now it's broken. The executor fails to notice any problem because ExecEvalVar just references the ecxt_scantuple for any normal varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a recent complaint from Bartosz Dmytrak. Since the eventual rtoffset of the subquery is not known at the time we are preparing its plan node, the previous scheme of executing set_returning_clause_references() at that time cannot handle this adjustment. Fortunately, it turns out that we don't really need to do it that way, because all the needed information is available during normal setrefs.c execution; we just have to dig it out of the ModifyTable node. So, do that, and get rid of the kluge of early setrefs processing of RETURNING lists. (This is a little bit of a cheat in the case of inherited UPDATE/DELETE, because we are not passing a "root" struct that corresponds exactly to what the subplan was built with. But that doesn't matter, and anyway this is less ugly than early setrefs processing was.) Back-patch to 9.1, where the problem became possible to hit.	2012-04-25 20:20:33 -04:00
Tom Lane	9873001e6d	Another trivial comment-typo fix.	2012-04-25 14:28:58 -04:00
Robert Haas	3ce7f18e92	Casts to or from a domain type are ignored; warn and document. Prohibiting this outright would break dumps taken from older versions that contain such casts, which would create far more pain than is justified here. Per report by Jaime Casanova and subsequent discussion.	2012-04-24 09:20:53 -04:00
Robert Haas	5d4b60f2f2	Lots of doc corrections. Josh Kupershmidt	2012-04-23 22:43:09 -04:00
Robert Haas	7ab9b2f3b7	Rearrange lazy_scan_heap to avoid visibility map race conditions. We must set the visibility map bit before releasing our exclusive lock on the heap page; otherwise, someone might clear the heap page bit before we set the visibility map bit, leading to a situation where the visibility map thinks the page is all-visible but it's really not. This problem has existed since 8.4, but it wasn't critical before we had index-only scans, since the worst case scenario was that the page wouldn't get vacuumed until the next scan_all vacuum. Along the way, a couple of minor, related improvements: (1) if we pause the heap scan to do an index vac cycle, release any visibility map page we're holding, since really long-running pins are not good for a variety of reasons; and (2) warn if we see a page that's marked all-visible in the visibility map but not on the page level, since that should never happen any more (it was allowed in previous releases, but not in 9.2).	2012-04-23 22:08:06 -04:00
Robert Haas	85efd5f065	Reduce hash size for compute_array_stats, compute_tsvector_stats. The size is only a hint, but a big hint chews up a lot of memory without apparently improving performance much. Analysis and patch by Noah Misch.	2012-04-23 22:05:41 -04:00
Peter Eisentraut	48658a1b81	Fix some typos. Josh Kupershmidt	2012-04-22 19:23:47 +03:00
Tom Lane	33e99153e9	Use fuzzy not exact cost comparison for the final tie-breaker in add_path. Instead of an exact cost comparison, use a fuzzy comparison with 1e-10 delta after all other path metrics have proved equal. This is to avoid having platform-specific roundoff behaviors determine the choice when two paths are really the same to our cost estimators. Adjust the recently-added test case that made it obvious we had a problem here.	2012-04-21 00:51:14 -04:00
Alvaro Herrera	09ff76fcdb	Recast "ONLY" column CHECK constraints as NO INHERIT. The original syntax wasn't universally loved, and it didn't allow its usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and it also allows using ALTER TABLE ONLY to add an uninherited CHECK constraint, per discussion. The pg_constraint column has accordingly been renamed connoinherit. This commit partly reverts some of the changes in `61d81bd28d`, particularly some pg_dump and psql bits, because now pg_get_constraintdef includes the necessary NO INHERIT within the constraint definition. Author: Nikhil Sontakke. Some tweaks by me.	2012-04-20 23:56:57 -03:00
Tom Lane	1f03630011	Adjust join_search_one_level's handling of clauseless joins. For an initial relation that lacks any join clauses (that is, it has to be cartesian-product-joined to the rest of the query), we considered only cartesian joins with initial rels appearing later in the initial-relations list. This creates an undesirable dependency on FROM-list order. We would never fail to find a plan, but perhaps we might not find the best available plan. Noted while discussing the logic with Amit Kapila. Improve the comments a bit in this area, too. Arguably this is a bug fix, but given the lack of complaints from the field I'll refrain from back-patching.	2012-04-20 20:10:46 -04:00
Tom Lane	5b7b5518d0	Revise parameterized-path mechanism to fix assorted issues. This patch adjusts the treatment of parameterized paths so that all paths with the same parameterization (same set of required outer rels) for the same relation will have the same rowcount estimate. We cache the rowcount estimates to ensure that property, and hopefully save a few cycles too. Doing this makes it practical for add_path_precheck to operate without a rowcount estimate: it need only assume that paths with different parameterizations never dominate each other, which is close enough to true anyway for coarse filtering, because normally a more-parameterized path should yield fewer rows thanks to having more join clauses to apply. In add_path, we do the full nine yards of comparing rowcount estimates along with everything else, so that we can discard parameterized paths that don't actually have an advantage. This fixes some issues I'd found with add_path rejecting parameterized paths on the grounds that they were more expensive than not-parameterized ones, even though they yielded many fewer rows and hence would be cheaper once subsequent joining was considered. To make the same-rowcounts assumption valid, we have to require that any parameterized path enforce all join clauses that could be obtained from the particular set of outer rels, even if not all of them are useful for indexing. This is required at both base scans and joins. It's a good thing anyway since the net impact is that join quals are checked at the lowest practical level in the join tree. Hence, discard the original rather ad-hoc mechanism for choosing parameterization joinquals, and build a better one that has a more principled rule for when clauses can be moved. The original rule was actually buggy anyway for lack of knowledge about which relations are part of an outer join's outer side; getting this right requires adding an outer_relids field to RestrictInfo.	2012-04-19 15:53:47 -04:00
Robert Haas	293ec33c32	Remove bogus comment from HeapTupleSatisfiesNow. This has been wrong for a really long time. We don't use two-phase locking to protect against serialization anomalies. Per discussion on pgsql-hackers about 2011-03-07; original report by Dan Ports.	2012-04-18 11:50:45 -04:00
Robert Haas	4a6fab03f2	Finish rename of FastPathStrongLocks to FastPathStrongRelationLocks. Commit `8e5ac74c12` tried to do this renaming, but I relied on gcc to tell me where I needed to make changes, instead of grep. Noted by Jeff Davis.	2012-04-18 11:29:34 -04:00
Robert Haas	53c5b869b4	Tighten up error recovery for fast-path locking. The previous code could cause a backend crash after BEGIN; SAVEPOINT a; LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts in other situations. Report by Zoltán Böszörményi; patch review by Jeff Davis.	2012-04-18 11:17:30 -04:00
Robert Haas	ab77b2da8b	Fix incorrect comment in SetBufferCommitInfoNeedsSave(). Noah Misch spotted the fact that the old comment is in fact incorrect, due to memory ordering hazards.	2012-04-18 10:55:40 -04:00
Robert Haas	e93c0b820f	After PageSetAllVisible, use MarkBufferDirty. Previously, we used SetBufferCommitInfoNeedsSave, but that's really intended for dirty-marks we can theoretically afford to lose, such as hint bits. As of 9.2, the PD_ALL_VISIBLE bit mustn't be lost in this way, since we could then end up with a heap page that isn't all-visible and a visibility map page that is all-visible, causing index-only scans to return wrong answers.	2012-04-18 10:49:37 -04:00
Robert Haas	b5eccaef2c	Fix copyfuncs/equalfuncs support for ReassignOwnedStmt. Noah Misch	2012-04-18 10:45:18 -04:00
Robert Haas	53bbc681ca	Fix various infelicities in node functions. Mostly, this consists of adding support for fields which exist in the structure but aren't handled by copy/equal/outfuncs; but the create foreign table case can actually produce garbage output. Noah Misch	2012-04-18 10:43:16 -04:00
Heikki Linnakangas	fe546f3da6	Don't wait for the commit record to be replicated if we wrote no WAL. When using synchronous replication, we waited for the commit record to be replicated, but if our transaction didn't write any other WAL records, that's not required because we don't even flush the WAL locally to disk in that case. This led to long waits when committing a transaction that only modified a temporary table. Bug spotted by Thom Brown.	2012-04-17 16:28:31 +03:00
Peter Eisentraut	a33fcd7e79	Fix typo. Kyotaro HORIGUCHI	2012-04-16 15:36:40 +03:00
Robert Haas	ea6a2d8d47	Rename synchronous_commit='write' to 'remote_write'. Fujii Masao, per discussion on pgsql-hackers	2012-04-14 10:53:22 -04:00
Robert Haas	4a2d7ad76f	pg_size_pretty(numeric). The output of the new pg_xlog_location_diff function is of type numeric, since it could theoretically overflow an int8 due to signedness; this provides a convenient way to format such values. Fujii Masao, with some beautification by me.	2012-04-14 08:07:25 -04:00
Tom Lane	e54b10a62d	Remove the "last ditch" code path in join_search_one_level(). So far as I can tell, it is no longer possible for this heuristic to do anything useful, because the new weaker definition of have_relevant_joinclause means that any relation with a joinclause must be considered joinable to at least one other relation. It would still be possible for the code block to be entered, for example if there are join order restrictions that prevent any join of the current level from being formed; but in that case it's just a waste of cycles to attempt to form cartesian joins, since the restrictions will still apply. Furthermore, IMO the existence of this code path can mask bugs elsewhere; we would have noticed the problem with cartesian joins a lot sooner if this code hadn't compensated for it in the simplest case. Accordingly, let's remove it and see what happens. I'm committing this separately from the prerequisite changes in have_relevant_joinclause, just to make the question easier to revisit if there is some fault in my logic.	2012-04-13 16:07:18 -04:00
Tom Lane	e3ffd05b02	Weaken the planner's tests for relevant joinclauses. We should be willing to cross-join two small relations if that allows us to use an inner indexscan on a large relation (that is, the potential indexqual for the large table requires both smaller relations). This worked in simple cases but fell apart as soon as there was a join clause to a fourth relation, because the existence of any two-relation join clause caused the planner to not consider clauseless joins between other base relations. The added regression test shows an example case adapted from a recent complaint from Benoit Delbosc. Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and has_relevant_eclass_joinclause to consider that a join clause mentioning three or more relations is sufficient grounds for joining any subset of those relations, even if we have to do so via a cartesian join. Since such clauses are relatively uncommon, this shouldn't affect planning speed on typical queries; in fact it should help a bit, because the latter two functions in particular get significantly simpler. Although this is arguably a bug fix, I'm not going to risk back-patching it, since it might have currently-unforeseen consequences.	2012-04-13 16:07:17 -04:00
Peter Eisentraut	c0cc526e8b	Rename bytea_agg to string_agg and add delimiter argument. Per mailing list discussion, we would like to keep the bytea functions parallel to the text functions, so rename bytea_agg to string_agg, which already exists for text. Also, to satisfy the rule that we don't want aggregate functions of the same name with a different number of arguments, add a delimiter argument, just like string_agg for text already has.	2012-04-13 21:36:59 +03:00
Peter Eisentraut	64e1309c76	Consistently quote encoding and locale names in messages	2012-04-13 20:37:07 +03:00
Robert Haas	61167bfaf2	Fix typo in comment.	2012-04-13 08:54:13 -04:00
Robert Haas	5630eddf1e	Update lazy_scan_heap header comment. The previous comment described how things worked in PostgreSQL 8.2 and prior.	2012-04-13 08:51:19 -04:00
Tom Lane	732bfa2448	Fix cost estimation for indexscan filter conditions. cost_index's method for estimating per-tuple costs of evaluating filter conditions (a/k/a qpquals) was completely wrong in the presence of derived indexable conditions, such as range conditions derived from a LIKE clause. This was largely masked in common cases as a result of all simple operator clauses having about the same costs, but it could show up in a big way when dealing with functional indexes containing expensive functions, as seen for example in bug #6579 from Istvan Endredy. Rejigger the calculation to give sane answers when the indexquals aren't a subset of the baserestrictinfo list. As a side benefit, we now do the calculation properly for cases involving join clauses (ie, parameterized indexscans), which we always overestimated before. There are still cases where this is an oversimplification, such as clauses that can be dropped because they are implied by a partial index's predicate. But we've never accounted for that in cost estimates before, and I'm not convinced it's worth the cycles to try to do so.	2012-04-11 20:24:17 -04:00
Tom Lane	880bfc3287	Silently ignore any nonexistent schemas that are listed in search_path. Previously we attempted to throw an error or at least warning for missing schemas, but this was done inconsistently because of implementation restrictions (in many cases, GUC settings are applied outside transactions so that we can't do system catalog lookups). Furthermore, there were exceptions to the rule even in the beginning, and we'd been poking more and more holes in it as time went on, because it turns out that there are lots of use-cases for having some irrelevant items in a common search_path value. It seems better to just adopt a philosophy similar to what's always been done with Unix PATH settings, wherein nonexistent or unreadable directories are silently ignored. This commit also fixes the documentation to point out that schemas for which the user lacks USAGE privilege are silently ignored. That's always been true but was previously not documented. This is mostly in response to Robert Haas' complaint that 9.1 started to throw errors or warnings for missing schemas in cases where prior releases had not. We won't adopt such a significant behavioral change in a back branch, so something different will be needed in 9.1.	2012-04-11 12:02:50 -04:00
Tom Lane	3769fa5fc6	Make pg_tablespace_location(0) return the database's default tablespace. This definition is convenient when applying the function to the reltablespace column of pg_class, since that's what zero means there; and it doesn't interfere with any other plausible use of the function. Per gripe from Bruce Momjian.	2012-04-10 21:43:14 -04:00
Tom Lane	0d9819f7e3	Measure epoch of timestamp-without-time-zone from local not UTC midnight. This patch reverts commit `191ef2b407` and thereby restores the pre-7.3 behavior of EXTRACT(EPOCH FROM timestamp-without-tz). Per discussion, the more recent behavior was misguided on a couple of grounds: it makes it hard to get a non-timezone-aware epoch value for a timestamp, and it makes this one case dependent on the value of the timezone GUC, which is incompatible with having timestamp_part() labeled as immutable. The other behavior is still available (in all releases) by explicitly casting the timestamp to timestamp with time zone before applying EXTRACT. This will need to be called out as an incompatible change in the 9.2 release notes. Although having mutable behavior in a function marked immutable is clearly a bug, we're not going to back-patch such a change.	2012-04-10 12:04:42 -04:00
Tom Lane	65fd91333e	Fix an Assert that turns out to be reachable after all. estimate_num_groups() gets unhappy with "create table empty(); select * from empty except select * from empty e2;". I can't see any actual use-case for such a query (and the table is illegal per SQL spec), but it seems like a good idea that it not cause an assert failure.	2012-04-09 11:58:24 -04:00
Tom Lane	d515365a61	Don't bother copying empty support arrays in a zero-column MergeJoin. The case could not arise when this code was originally written, but it can now (since we made zero-column MergeJoins work for the benefit of FULL JOIN ON TRUE). I don't think there is any actual bug here, but we might as well treat it consistently with other uses of COPY_POINTER_FIELD(). Per comment from Ashutosh Bapat.	2012-04-09 11:41:54 -04:00
Robert Haas	3ae5133b1c	Teach SLRU code to avoid replacing I/O-busy pages. Patch by me; review by Tom Lane and others.	2012-04-08 23:05:55 -04:00
Heikki Linnakangas	03529a3ff9	set_stack_base() no longer needs to be called in PostgresMain. This was a thinko in previous commit. Now that the stack base pointer is set in PostmasterMain and SubPostmasterMain, it doesn't need to be set in PostgresMain anymore.	2012-04-08 19:39:12 +03:00
Heikki Linnakangas	ef3883d130	Do stack-depth checking in all postmaster children. We used to only initialize the stack base pointer when starting up a regular backend, not in other processes. In particular, autovacuum workers can run arbitrary user code, and without stack-depth checking, infinite recursion in e.g. an index expression will bring down the whole cluster. The comment about PL/Java using set_stack_base() is not yet true. As the code stands, PL/Java still modifies the stack_base_ptr variable directly. However, it's been discussed in the PL/Java mailing list that it should be changed to use the function, because PL/Java is currently oblivious to the register stack used on Itanium. There's another issue with PL/Java, namely that the stack base pointer it sets is not really the base of the stack; it could be something close to the bottom of the stack. That's a separate issue that might need some further changes to this code, but that's a different story. Backpatch to all supported releases.	2012-04-08 19:07:55 +03:00
Tom Lane	7feecedcce	Fix incorrect make maintainer-clean rule.	2012-04-07 18:16:50 -04:00
Tom Lane	95b9c333b2	Further adjustment of comment about qsort_tuple.	2012-04-07 17:48:40 -04:00
Tom Lane	a25ef7a5f6	Remove useless variable to suppress compiler warning.	2012-04-07 16:44:43 -04:00
Tom Lane	0ab4db52c0	Fix misleading output from gin_desc(). XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.	2012-04-06 18:10:21 -04:00
Tom Lane	17b985b1a0	Fix broken comparetup_datum code. Commit `337b6f5ecf` contained the entirely fanciful assumption that it had made comparetup_datum unreachable. Reported and patched by Takashi Yamamoto. Fix up some not terribly accurate/useful comments from that commit, too.	2012-04-06 16:58:50 -04:00
Tom Lane	cea49fe82f	Dept of second thoughts: improve the API for AnalyzeForeignTable. If we make the initially-called function return the table physical-size estimate, acquire_inherited_sample_rows will be able to use that to allocate numbers of samples among child tables, when the day comes that we want to support foreign tables in inheritance trees.	2012-04-06 16:04:10 -04:00
Tom Lane	263d9de66b	Allow statistics to be collected for foreign tables. ANALYZE now accepts foreign tables and allows the table's FDW to control how the sample rows are collected. (But only manual ANALYZEs will touch foreign tables, for the moment, since among other things it's not very clear how to handle remote permissions checks in an auto-analyze.) contrib/file_fdw is extended to support this. Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.	2012-04-06 15:02:35 -04:00
Simon Riggs	8cb53654db	Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock	2012-04-06 10:21:40 +01:00
Robert Haas	21cc529698	checkopint -> checkpoint Report by Guillaume Lelarge.	2012-04-05 21:37:33 -04:00
Robert Haas	b736aef2ec	Publish checkpoint timing information to pg_stat_bgwriter. Greg Smith, Peter Geoghegan, and Robert Haas	2012-04-05 14:04:37 -04:00
Robert Haas	644828908f	Expose track_iotiming data via the statistics collector. Ants Aasma's original patch to add timing information for buffer I/O requests exposed this data at the relation level, which was judged too costly. I've here exposed it at the database level instead.	2012-04-05 11:40:24 -04:00
Tom Lane	c17e863bc7	Fix syslogger to not lose log coherency under high load. The original coding of the syslogger had an arbitrary limit of 20 large messages concurrently in progress, after which it would just punt and dump message fragments to the output file separately. Our ambitions are a bit higher than that now, so allow the data structure to expand as necessary. Reported and patched by Andrew Dunstan; some editing by Tom	2012-04-04 15:05:10 -04:00
Peter Eisentraut	38b9693fd9	Add support for renaming domain constraints	2012-04-03 08:11:51 +03:00
Simon Riggs	68219aaf6b	Correct epoch of txid_current() when executed on a Hot Standby server. Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user. Bug report from Daniel Farina	2012-03-29 14:55:30 +01:00
Andrew Dunstan	aeca650226	Unbreak Windows builds broken by pgpipe removal.	2012-03-29 04:11:57 -04:00
Heikki Linnakangas	5762a4d909	Inherit max_safe_fds to child processes in EXEC_BACKEND mode. Postmaster sets max_safe_fds by testing how many open file descriptors it can open, and that is normally inherited by all child processes at fork(). Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we effectively ignored max_files_per_process on Windows, and always assumed a conservative default of 32 simultaneous open files. That could have an impact on performance, if you need to access a lot of different files in a query. After this patch, the value is passed to child processes by save/restore_backend_variables() among many other global variables. It has been like this forever, but given the lack of complaints about it, I'm not backpatching this.	2012-03-29 08:19:11 +03:00
Andrew Dunstan	d2c1740dc2	Remove now redundant pgpipe code.	2012-03-28 23:24:07 -04:00
Tom Lane	5d3fcc4c2e	Bend parse location rules for the convenience of pg_stat_statements. Generally, the parse location assigned to a multiple-token construct is the location of its leftmost token. This commit breaks that rule for the syntaxes TYPENAME 'LITERAL' and CAST(CONSTANT AS TYPENAME) --- the resulting Const will have the location of the literal string, not the typename or CAST keyword. The cases where this matters are pretty thin on the ground (no error messages in the regression tests change, for example), and it's unlikely that any user would be confused anyway by an error cursor pointing at the literal. But still it's less than consistent. The reason for changing it is that contrib/pg_stat_statements wants to know the parse location of the original literal, and it was agreed that this is the least unpleasant way to preserve that information through parse analysis. Peter Geoghegan	2012-03-27 15:17:41 -04:00
Tom Lane	a40fa613b5	Add some infrastructure for contrib/pg_stat_statements. Add a queryId field to Query and PlannedStmt. This is not used by the core backend, except for being copied around at appropriate times. It's meant to allow plug-ins to track a particular query forward from parse analysis to execution. The queryId is intentionally not dumped into stored rules (and hence this commit doesn't bump catversion). You could argue that choice either way, but it seems better that stored rule strings not have any dependency on plug-ins that might or might not be present. Also, add a post_parse_analyze_hook that gets invoked at the end of parse analysis (but only for top-level analysis of complete queries, not cases such as analyzing a domain's default-value expression). This is mainly meant to be used to compute and assign a queryId, but it could have other applications. Peter Geoghegan	2012-03-27 15:17:40 -04:00
Robert Haas	40b9b95769	New GUC, track_iotiming, to track I/O timings. Currently, the only way to see the numbers this gathers is via EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through the stats collector and pg_stat_statements in subsequent patches. Ants Aasma, reviewed by Greg Smith, with some further changes by me.	2012-03-27 14:55:02 -04:00
Peter Eisentraut	dcb33b1c64	Remove dead assignment found by Coverity	2012-03-26 21:03:10 +03:00
Robert Haas	7386089d23	Code cleanup for heap_freeze_tuple. It used to be the case that lazy vacuum could call this function with only a shared lock on the buffer, but neither lazy vacuum nor any other code path does that any more. Simplify the code accordingly and clean up some related, obsolete comments.	2012-03-26 11:03:06 -04:00
Tom Lane	e8476f46fc	Fix COPY FROM for null marker strings that correspond to invalid encoding. The COPY documentation says "COPY FROM matches the input against the null string before removing backslashes". It is therefore reasonable to presume that null markers like E'\\0' will work ... and they did, until someone put the tests in the wrong order during microoptimization-driven rewrites. Since then, we've been failing if the null marker is something that would de-escape to an invalidly-encoded string. Since null markers generally need to be something that can't appear in the data, this represents a nontrivial loss of functionality; surprising nobody noticed it earlier. Per report from Jeff Davis. Backpatch to 8.4 where this got broken.	2012-03-25 23:17:22 -04:00
Tom Lane	c7cea267de	Replace empty locale name with implied value in CREATE DATABASE and initdb. setlocale() accepts locale name "" as meaning "the locale specified by the process's environment variables". Historically we've accepted that for Postgres' locale settings, too. However, it's fairly unsafe to store an empty string in a new database's pg_database.datcollate or datctype fields, because then the interpretation could vary across postmaster restarts, possibly resulting in index corruption and other unpleasantness. Instead, we should expand "" to whatever it means at the moment of calling CREATE DATABASE, which we can do by saving the value returned by setlocale(). For consistency, make initdb set up the initial lc_xxx parameter values the same way. initdb was already doing the right thing for empty locale names, but it did not replace non-empty names with setlocale results. On a platform where setlocale chooses to canonicalize the spellings of locale names, this would result in annoying inconsistency. (It seems that popular implementations of setlocale don't do such canonicalization, which is a pity, but the POSIX spec certainly allows it to be done.) The same risk of inconsistency leads me to not venture back-patching this, although it could certainly be seen as a longstanding bug. Per report from Jeff Davis, though this is not his proposed patch.	2012-03-25 21:47:22 -04:00
Tom Lane	8279eb4191	Fix planner's handling of outer PlaceHolderVars within subqueries. For some reason, in the original coding of the PlaceHolderVar mechanism I had supposed that PlaceHolderVars couldn't propagate into subqueries. That is of course entirely possible. When it happens, we need to treat an outer-level PlaceHolderVar much like an outer Var or Aggref, that is SS_replace_correlation_vars() needs to replace the PlaceHolderVar with a Param, and then when building the finished SubPlan we have to provide the PlaceHolderVar expression as an actual parameter for the SubPlan. The handling of the contained expression is a bit delicate but it can be treated exactly like an Aggref's expression. In addition to the missing logic in subselect.c, prepjointree.c was failing to search subqueries for PlaceHolderVars that need their relids adjusted during subquery pullup. It looks like everyplace else that touches PlaceHolderVars got it right, though. Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this oversight would fail with "ERROR: Upper-level PlaceHolderVar found where not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong answers, since the value transmitted into the subquery wouldn't go to null when it should.	2012-03-24 16:21:39 -04:00
Tom Lane	ed61127be4	Cast some printf arguments to avoid possibly-nonportable behavior. Per compiler warnings on buildfarm member black_firefly.	2012-03-23 20:18:04 -04:00
Tom Lane	81a646febe	Refactor simplify_function et al to centralize argument simplification. We were doing the recursive simplification of function/operator arguments in half a dozen different places, with rather baroque logic to ensure it didn't get done multiple times on some arguments. This patch improves that by postponing argument simplification until after we've dealt with named parameters and added any needed default expressions. Marti Raudsepp, somewhat hacked on by me	2012-03-23 19:15:58 -04:00
Tom Lane	0339047bc9	Code review for protransform patches. Fix loss of previous expression-simplification work when a transform function fires: we must not simply revert to untransformed input tree. Instead build a dummy FuncExpr node to pass to the transform function. This has the additional advantage of providing a simpler, more uniform API for transform functions. Move documentation to a somewhat less buried spot, relocate some poorly-placed code, be more wary of null constants and invalid typmod values, add an opr_sanity check on protransform function signatures, and some other minor cosmetic adjustments. Note: although this patch touches pg_proc.h, no need for catversion bump, because the changes are cosmetic and don't actually change the intended catalog contents.	2012-03-23 17:29:57 -04:00
Peter Eisentraut	0e85abd658	Clean up compiler warnings from unused variables with asserts disabled For those variables only used when asserts are enabled, use a new macro PG_USED_FOR_ASSERTS_ONLY, which expands to __attribute__((unused)) when asserts are not enabled.	2012-03-21 23:33:10 +02:00
Tom Lane	f70f095c90	Allow new relmapper entries when allow_system_table_mods is true. This restores the pre-9.0 situation that it's possible to add new indexes on pg_class and other mapped-but-not-shared catalogs, so long as you broke the glass and flipped the big red Dont-Touch-Me switch. As before, there are a lot of gotchas, and you'd have to be pretty desperate to try this on a production database; but there doesn't seem to be a reason for relmapper.c to be preventing such things all by itself. Per experimentation with a case suggested by Cody Cutrer.	2012-03-21 14:09:39 -04:00
Robert Haas	aefa6d163e	Add some CHECK_FOR_INTERRUPTS() calls to the heap-sort call path. I broke this in commit `337b6f5ecf`, which among other things arranged for quicksorts to CHECK_FOR_INTERRUPTS() slightly less frequently. Sadly, it also arranged for heapsorts to CHECK_FOR_INTERRUPTS() much less frequently. Repair.	2012-03-20 21:26:39 -04:00
Tom Lane	9dbf2b7d75	Restructure SELECT INTO's parsetree representation into CreateTableAsStmt. Making this operation look like a utility statement seems generally a good idea, and particularly so in light of the desire to provide command triggers for utility statements. The original choice of representing it as SELECT with an IntoClause appendage had metastasized into rather a lot of places, unfortunately, so that this patch is a great deal more complicated than one might at first expect. In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS subcommands required restructuring some EXPLAIN-related APIs. Add-on code that calls ExplainOnePlan or ExplainOneUtility, or uses ExplainOneQuery_hook, will need adjustment. Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO, which formerly were accepted though undocumented, are no longer accepted. The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE. The CREATE RULE case doesn't seem to have much real-world use (since the rule would work only once before failing with "table already exists"), so we'll not bother with that one. Both SELECT INTO and CREATE TABLE AS still return a command tag of "SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn", but for the moment backwards compatibility wins the day. Andres Freund and Tom Lane	2012-03-19 21:38:12 -04:00
Peter Eisentraut	693ff85d47	backend: Fix minor memory leak in configuration file processing Just for consistency with the other code paths. found by Coverity	2012-03-16 20:34:59 +02:00
Tom Lane	b67ad046e6	Improve commentary in match_pathkeys_to_index(). For a little while there I thought match_pathkeys_to_index() was broken because it wasn't trying to match index columns to pathkeys in order. Actually that's correct, because GiST can support ordering operators on any random collection of index columns, but it sure needs a comment.	2012-03-16 14:07:21 -04:00
Tom Lane	dd4134ea56	Revisit handling of UNION ALL subqueries with non-Var output columns. In commit `57664ed25e` I tried to fix a bug reported by Teodor Sigaev by making non-simple-Var output columns distinct (by wrapping their expressions with dummy PlaceHolderVar nodes). This did not work too well. Commit `b28ffd0fcc` fixed some ensuing problems with matching to child indexes, but per a recent report from Claus Stadler, constraint exclusion of UNION ALL subqueries was still broken, because constant-simplification didn't handle the injected PlaceHolderVars well either. On reflection, the original patch was quite misguided: there is no reason to expect that EquivalenceClass child members will be distinct. So instead of trying to make them so, we should ensure that we can cope with the situation when they're not. Accordingly, this patch reverts the code changes in the above-mentioned commits (though the regression test cases they added stay). Instead, I've added assorted defenses to make sure that duplicate EC child members don't cause any problems. Teodor's original problem ("MergeAppend child's targetlist doesn't match MergeAppend") is addressed more directly by revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort list guide creation of each child's sort list. In passing, get rid of add_sort_column; as far as I can tell, testing for duplicate sort keys at this stage is dead code. Certainly it doesn't trigger often enough to be worth expending cycles on in ordinary queries. And keeping the test would've greatly complicated the new logic in prepare_sort_from_pathkeys, because comparing pathkey list entries against a previous output array requires that we not skip any entries in the list. Back-patch to 9.1, like the previous patches. The only known issue in this area that wasn't caused by the ill-advised previous patches was the MergeAppend planning failure, which of course is not relevant before 9.1. It's possible that we need some of the new defenses against duplicate child EC entries in older branches, but until there's some clear evidence of that I'm going to refrain from back-patching further.	2012-03-16 13:11:55 -04:00
Peter Eisentraut	eb990a2b9e	Add const qualifier to tzn returned by timestamp2tm() The tzn value might come from tm->tm_zone, which libc declares as const, so it's prudent that the upper layers know about this as well.	2012-03-15 21:17:19 +02:00
Peter Eisentraut	531e60aec0	Remove unused tzn arguments for timestamp2tm()	2012-03-15 21:13:35 +02:00
Peter Eisentraut	ad4fb0d0d2	Improve EncodeDateTime and EncodeTimeOnly APIs Use an explicit argument to tell whether to include the time zone in the output, rather than using some undocumented pointer magic.	2012-03-14 23:03:34 +02:00
Peter Eisentraut	6f018c6dda	COPY: Add an assertion This is for tools such as Coverity that don't know that the grammar enforces that the case of not having a relation (but instead a query) cannot happen in the FROM case.	2012-03-14 22:44:40 +02:00
Peter Eisentraut	e684ab5e1e	Add additional safety check against invalid backup label file It was already checking for invalid data after "BACKUP FROM", but would possibly crash if "BACKUP FROM" was missing altogether. found by Coverity	2012-03-14 22:41:50 +02:00
Tom Lane	b4af1c25bb	Fix SPGiST vacuum algorithm to handle concurrent tuple motion properly. A leaf tuple that we need to delete could get moved as a consequence of an insertion happening concurrently with the VACUUM scan. If it moves from a page past the current scan point to a page before, we'll miss it, which is not acceptable. Hence, when we see a leaf-page REDIRECT that could have been made since our scan started, chase down the redirection pointer much as if we were doing a normal index search, and be sure to vacuum every page it leads to. This fixes the issue because, if the tuple was on page N at the instant we start our scan, we will surely find it as a consequence of chasing the redirect from page N, no matter how much it moves around in between. Problem noted by Takashi Yamamoto.	2012-03-12 16:10:28 -04:00
Peter Eisentraut	bad250f4f3	Use correct sizeof operand in qsort call Probably no practical impact, since all pointers ought to have the same size, but it was wrong nonetheless. Found by Coverity.	2012-03-12 20:56:13 +02:00
Peter Eisentraut	c9f310d377	Add comment for missing break in switch For clarity, following other sites, and to silence Coverity.	2012-03-12 20:55:09 +02:00
Tom Lane	c6be1f43ab	Make INSERT/UPDATE queries depend on their specific target columns. We have always created a whole-table dependency for the target relation, but that's not really good enough, as it doesn't prevent scenarios such as dropping an individual target column or altering its type. So we have to create an individual dependency for each target column, as well. Per report from Bill MacArthur of a rule containing UPDATE breaking after such an alteration. Note that this patch doesn't try to make such cases work, only to ensure that the attempted ALTER TABLE throws an error telling you it can't cope with adjusting the rule. This is a long-standing bug, but given the lack of prior reports I'm not going to risk back-patching it. A back-patch wouldn't do anything to fix existing rules' dependency lists, anyway.	2012-03-11 18:14:23 -04:00
Tom Lane	c6a11b89e4	Teach SPGiST to store nulls and do whole-index scans. This patch fixes the other major compatibility-breaking limitation of SPGiST, that it didn't store anything for null values of the indexed column, and so could not support whole-index scans or "x IS NULL" tests. The approach is to create a wholly separate search tree for the null entries, and use fixed "allTheSame" insertion and search rules when processing this tree, instead of calling the index opclass methods. This way the opclass methods do not need to worry about dealing with nulls. Catversion bump is for pg_am updates as well as the change in on-disk format of SPGiST indexes; there are some tweaks in SPGiST WAL records as well. Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev. (The original also stored nulls separately, but it reused GIN code to do so; which required undesirable compromises in the on-disk format, and would likely lead to bugs due to the GIN code being required to work in two very different contexts.)	2012-03-11 16:29:59 -04:00
Peter Eisentraut	86947e666d	Add more detail to error message for invalid arguments for server process It now prints the argument that was at fault. Also fix a small misbehavior where the error message issued by getopt() would complain about a program named "--single", because that's what argv[0] is in the server process.	2012-03-11 02:03:52 +02:00
Tom Lane	03e56f798e	Restructure SPGiST opclass interface API to support whole-index scans. The original API definition was incapable of supporting whole-index scans because there was no way to invoke leaf-value reconstruction without checking any qual conditions. Also, it was inefficient for multiple-qual-condition scans because value reconstruction got done over again for each qual condition, and because other internal work in the consistent functions likewise had to be done for each qual. To fix these issues, pass the whole scankey array to the opclass consistent functions, instead of only letting them see one item at a time. (Essentially, the loop over scankey entries is now inside the consistent functions not outside them. This makes the consistent functions a bit more complicated, but not unreasonably so.) In itself this commit does nothing except save a few cycles in multiple-qual-condition index scans, since we can't support whole-index scans on SPGiST indexes until nulls are included in the index. However, I consider this a must-fix for 9.2 because once we release it will get very much harder to change the opclass API definition.	2012-03-10 18:36:49 -05:00
Peter Eisentraut	39d74e346c	Add support for renaming constraints reviewed by Josh Berkus and Dimitri Fontaine	2012-03-10 20:19:13 +02:00
Robert Haas	07d1edb954	Extend object access hook framework to support arguments, and DROP. This allows loadable modules to get control at drop time, perhaps for the purpose of performing additional security checks or to log the event. The initial purpose of this code is to support sepgsql, but other applications should be possible as well. KaiGai Kohei, reviewed by me.	2012-03-09 14:34:56 -05:00
Tom Lane	b14953932d	Revise FDW planning API, again. Further reflection shows that a single callback isn't very workable if we desire to let FDWs generate multiple Paths, because that forces the FDW to do all work necessary to generate a valid Plan node for each Path. Instead split the former PlanForeignScan API into three steps: GetForeignRelSize, GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain, and it's substantially more flexible for complex FDWs. Add an fdw_private field to RelOptInfo so that the new functions can save state there rather than possibly having to recalculate information two or three times. In addition, we'd not thought through what would be needed to allow an FDW to set up subexpressions of its choice for runtime execution. We could treat ForeignScan.fdw_private as an executable expression but that seems likely to break existing FDWs unnecessarily (in particular, it would restrict the set of node types allowable in fdw_private to those supported by expression_tree_walker). Instead, invent a separate field fdw_exprs which will receive the postprocessing appropriate for expression trees. (One field is enough since it can be a list of expressions; also, we assume the corresponding expression state tree(s) will be held within fdw_state, so we don't need to add anything to ForeignScanState.) Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this further as we continue to work on that patch, but to me it feels a lot closer to being right now.	2012-03-09 12:49:25 -05:00
Heikki Linnakangas	342baf4ce6	Update outdated comment. HeapTupleHeader.t_natts field doesn't exist anymore. Kevin Grittner	2012-03-09 08:07:56 +02:00
Tom Lane	08dd23cec7	Fix some issues with temp/transient tables in extension scripts. Phil Sorber reported that a rewriting ALTER TABLE within an extension update script failed, because it creates and then drops a placeholder table; the drop was being disallowed because the table was marked as an extension member. We could hack that specific case but it seems likely that there might be related cases now or in the future, so the most practical solution seems to be to create an exception to the general rule that extension member objects can only be dropped by dropping the owning extension. To wit: if the DROP is issued within the extension's own creation or update scripts, we'll allow it, implicitly performing an "ALTER EXTENSION DROP object" first. This will simplify cases such as extension downgrade scripts anyway. No docs change since we don't seem to have documented the idea that you would need ALTER EXTENSION DROP for such an action to begin with. Also, arrange for explicitly temporary tables to not get linked as extension members in the first place, and the same for the magic pg_temp_nnn schemas that are created to hold them. This prevents assorted unpleasant results if an extension script creates a temp table: the forced drop at session end would either fail or remove the entire extension, and neither of those outcomes is desirable. Note that this doesn't fix the ALTER TABLE scenario, since the placeholder table is not temp (unless the table being rewritten is). Back-patch to 9.1.	2012-03-08 15:53:09 -05:00
Heikki Linnakangas	d93f209f48	Silence warning about unused variable, when building without assertions.	2012-03-08 11:10:02 +02:00
Tom Lane	66a7e6bae9	Improve estimation of IN/NOT IN by assuming array elements are distinct. In constructs such as "x IN (1,2,3,4)" and "x <> ALL(ARRAY[1,2,3,4])", we formerly always used a general-purpose assumption that the probability of success is independent for each comparison of "x" to an array element. But in real-world usage of these constructs, that's a pretty poor assumption; it's much saner to assume that the array elements are distinct and so the match probabilities are disjoint. Apply that assumption if the operator appears to behave as equality (for ANY) or inequality (for ALL). But fall back to the normal independent-probabilities calculation if this yields an impossible result, ie probability > 1 or < 0. We could protect ourselves against bad estimates even more by explicitly checking for equal array elements, but that is expensive and doesn't seem worthwhile: doing it would amount to optimizing for poorly-written queries at the expense of well-written ones. Daniele Varrazzo and Tom Lane, after a suggestion by Ants Aasma	2012-03-07 22:59:49 -05:00
Tom Lane	9088d1b965	Add GetForeignColumnOptions() to foreign.c, and add some documentation. GetForeignColumnOptions provides some abstraction for accessing column-specific FDW options, on a par with the access functions that were already provided here for other FDW-related information. Adjust file_fdw.c to use GetForeignColumnOptions instead of equivalent hand-rolled code. In addition, add some SGML documentation for the functions exported by foreign.c that are meant for use by FDW authors. (This is the fdw_helper portion of the proposed pgsql_fdw patch.) Hanada Shigeru, reviewed by KaiGai Kohei	2012-03-07 18:20:58 -05:00
Tom Lane	d4bf3c9c94	Expose an API for calculating catcache hash values. Now that cache invalidation callbacks get only a hash value, and not a tuple TID (per commits `632ae6829f` and `b5282aa893`), the only way they can restrict what they invalidate is to know what the hash values mean. setrefs.c was doing this via a hard-wired assumption but that seems pretty grotty, and it'll only get worse as more cases come up. So let's expose a calculation function that takes the same parameters as SearchSysCache. Per complaint from Marko Kreen.	2012-03-07 14:51:13 -05:00
Tom Lane	19dbc34631	Add a hook for processing messages due to be sent to the server log. Use-cases for this include custom log filtering rules and custom log message transmission mechanisms (for instance, lossy log message collection, which has been discussed several times recently). As is our common practice for hooks, there's no regression test nor user-facing documentation for this, though the author did exhibit a sample module using the hook. Martin Pihlak, reviewed by Marti Raudsepp	2012-03-06 15:35:41 -05:00
Robert Haas	bc97c38115	Typo fix. Fujii Masao	2012-03-06 08:23:51 -05:00
Heikki Linnakangas	e587e2e3e3	Make the comments more clear on the fact that UpdateFullPageWrites() is not safe to call concurrently from multiple processes.	2012-03-06 10:45:58 +02:00
Heikki Linnakangas	7714c63829	Remove extra copies of LogwrtResult. This simplifies the code a little bit. The new rule is that to update XLogCtl->LogwrtResult, you must hold both WALWriteLock and info_lck, whereas before we had two copies, one that was protected by WALWriteLock and another protected by info_lck. The code that updates them was already holding both locks, so merging the two is trivial. The third copy, XLogCtl->Insert.LogwrtResult, was not totally redundant, it was used in AdvanceXLInsertBuffer to update the backend-local copy, before acquiring the info_lck to read the up-to-date value. But the value of that seems dubious; at best it's saving one spinlock acquisition per completed WAL page, which is not significant compared to all the other work involved. And in practice, it's probably not saving even that much.	2012-03-06 10:18:33 +02:00
Heikki Linnakangas	3b682df326	Simplify the way changes to full_page_writes are logged. It's harmless to do full page writes even when not strictly necessary, so when turning full_page_writes on, we can set the global flag first, and then call XLogInsert. Likewise, when turning it off, we can write the WAL record first, and then clear the flag. This way XLogInsert doesn't need any special handling of the XLOG_FPW_CHANGE record type. XLogInsert is complicated enough already, so anything we can keep away from there is a good thing. Actually I don't think the atomicity of the shared memory flag matters, anyway, because we only write the XLOG_FPW_CHANGE at the end of recovery, when there are no concurrent WAL insertions going on. But might as well make it safe, in case we allow changing full_page_writes on the fly in the future.	2012-03-06 09:48:30 +02:00
Tom Lane	6b289942bf	Redesign PlanForeignScan API to allow multiple paths for a foreign table. The original API specification only allowed an FDW to create a single access path, which doesn't seem like a terribly good idea in hindsight. Instead, move the responsibility for building the Path node and calling add_path() into the FDW's PlanForeignScan function. Now, it can do that more than once if appropriate. There is no longer any need for the transient FdwPlan struct, so get rid of that. Etsuro Fujita, Shigeru Hanada, Tom Lane	2012-03-05 16:15:59 -05:00
Tom Lane	80da9e68fd	Rewrite GiST support code for rangetypes. This patch installs significantly smarter penalty and picksplit functions for ranges, making GiST indexes for them smaller and faster to search. There is no on-disk format change, so no catversion bump, but you'd need to REINDEX to get the benefits for any existing index. Alexander Korotkov, reviewed by Jeff Davis	2012-03-04 22:50:06 -05:00
Tom Lane	e2eed78910	Remove useless "rough estimate" path from mcelem_array_contained_selec. The code in this function that tried to cope with a missing count histogram was quite ineffective for anything except a perfectly flat distribution. Furthermore, since we were already punting for missing MCELEM slot, it's rather useless to sweat over missing DECHIST: there are no cases where ANALYZE will create the first but not the second. So just simplify the code by punting rather than pretending we can do something useful.	2012-03-04 16:03:38 -05:00
Tom Lane	4fb694aebc	Improve histogram-filling loop in new compute_array_stats() code. Do "frac" arithmetic in int64 to prevent overflow with large statistics targets, and improve the comments so people have some chance of understanding how it works. Alexander Korotkov and Tom Lane	2012-03-04 15:40:16 -05:00
Magnus Hagander	141b89826d	More carefully validate xlog location string inputs Now that we have validate_xlog_location, call it from the previously existing functions taking xlog locations as a string input. Suggested by Fujii Masao	2012-03-04 12:25:47 +01:00
Magnus Hagander	bc5ac36865	Add function pg_xlog_location_diff to help comparisons Comparing two xlog locations is useful, for example when calculating replication lag. Euler Taveira de Oliveira, reviewed by Fujii Masao, and some cleanups from me	2012-03-04 12:22:38 +01:00
Tom Lane	0e5e167aae	Collect and use element-frequency statistics for arrays. This patch improves selectivity estimation for the array <@, &&, and @> (containment and overlaps) operators. It enables collection of statistics about individual array element values by ANALYZE, and introduces operator-specific estimators that use these stats. In addition, ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)" and "const <> ANY/ALL (array_column)" are estimated by treating them as variants of the containment operators. Since we still collect scalar-style stats about the array values as a whole, the pg_stats view is expanded to show both these stats and the array-style stats in separate columns. This creates an incompatible change in how stats for tsvector columns are displayed in pg_stats: the stats about lexemes are now displayed in the array-related columns instead of the original scalar-related columns. There are a few loose ends here, notably that it'd be nice to be able to suppress either the scalar-style stats or the array-element stats for columns for which they're not useful. But the patch is in good enough shape to commit for wider testing. Alexander Korotkov, reviewed by Noah Misch and Nathan Boley	2012-03-03 20:20:57 -05:00
Peter Eisentraut	b59ca98209	Allow CREATE TABLE (LIKE ...) from composite type The only reason this didn't work before was that parserOpenTable() rejects composite types. So use relation_openrv() directly and manually do the errposition() setup that parserOpenTable() does.	2012-03-03 16:03:05 +02:00
Tom Lane	44634e474f	Allow child-relation entries to be made in ec_has_const EquivalenceClasses. This fixes an oversight in commit `11cad29c91`, which introduced MergeAppend plans. Before that happened, we never particularly cared about the sort ordering of scans of inheritance child relations, since appending their outputs together would destroy any ordering anyway. But now it's important to be able to match child relation sort orderings to those of the surrounding query. The original coding of add_child_rel_equivalences skipped ec_has_const EquivalenceClasses, on the originally-correct grounds that adding child expressions to them was useless. The effect of this is that when a parent variable is equated to a constant, we can't recognize that index columns on the equivalent child variables are not sort-significant; that is, we can't recognize that a child index on, say, (x, y) is able to generate output in "ORDER BY y" order when there is a clause "WHERE x = constant". Adding child expressions to the (x, constant) EquivalenceClass fixes this, without any downside that I can see other than a few more planner cycles expended on such queries. Per recent gripe from Robert McGehee. Back-patch to 9.1 where MergeAppend was introduced.	2012-03-02 14:29:07 -05:00
Peter Eisentraut	6688d2878e	Add COLLATION FOR expression reviewed by Jaime Casanova	2012-03-02 21:12:16 +02:00
Heikki Linnakangas	2502f45979	When a GiST page is split during index build, it might not have a buffer. Previously it was thought that it's impossible as the code stands, because insertions create buffers as tuples are cascaded downwards, and index split also creates buffers eagerly for all halves. But the example from Jay Levitt demonstrates that it can happen, when the root page is split. It's in fact OK if the buffer doesn't exist, so we just need to remove the sanity check. In fact, we've been discussing the possibility of destroying empty buffers to conserve memory, which would render the sanity check completely useless anyway. Fix by Alexander Korotkov	2012-03-02 13:16:09 +02:00
Alvaro Herrera	3433c6ba00	Remove TOAST table from pg_database The only toastable column now is datacl, but we don't really support long ACLs anyway. The TOAST table should have been removed when the pg_db_role_setting catalog was introduced in commit `2eda8dfb52`, but I forgot to do that. Per -hackers discussion on March 2011.	2012-03-01 12:50:52 -03:00
Heikki Linnakangas	d6a7271958	Correctly detect SSI conflicts of prepared transactions after crash. A prepared transaction can get new conflicts in and out after preparing, so we cannot rely on the in- and out-flags stored in the statefile at prepare- time. As a quick fix, make the conservative assumption that after a restart, all prepared transactions are considered to have both in- and out-conflicts. That can lead to unnecessary rollbacks after a crash, but that shouldn't be a big problem in practice; you don't want prepared transactions to hang around for a long time anyway. Dan Ports	2012-02-29 15:42:36 +02:00
Tom Lane	5c02a00d44	Move CRC tables to libpgport, and provide them in a separate include file. This makes it much more convenient to build tools for Postgres that are separately compiled and require a matching CRC implementation. To prevent multiple copies of the CRC polynomial tables being introduced into the postgres binaries, they are now included in the static library libpgport that is mainly meant for replacement system functions. That seems like a bit of a kludge, but there's no better place. This cleans up building of the tools pg_controldata and pg_resetxlog, which previously had to build their own copies of pg_crc.o. In the future, external programs that need access to the CRC tables can include the tables directly from the new header file pg_crc_tables.h. Daniel Farina, reviewed by Abhijit Menon-Sen and Tom Lane	2012-02-28 19:53:39 -05:00
Tom Lane	0140a11b9b	Fix thinko in new match_join_clauses_to_index() logic. We don't need to constrain the other side of an indexable join clause to not be below an outer join; an example here is SELECT FROM t1 LEFT JOIN t2 ON t1.a = t2.b LEFT JOIN t3 ON t2.c = t3.d; We can consider an inner indexscan on t3.d using c = d as indexqual, even though t2.c is potentially nulled by a previous outer join. The comparable logic in orindxpath.c has always worked that way, but I was being overly cautious here.	2012-02-28 18:10:40 -05:00
Peter Eisentraut	973e9fb294	Add const qualifiers where they are accidentally cast away This only produces warnings under -Wcast-qual, but it's more correct and consistent in any case.	2012-02-28 12:42:08 +02:00
Alvaro Herrera	cb3a7c2b95	ALTER TABLE: skip FK validation when it's safe to do so We already skip rewriting the table in these cases, but we still force a whole table scan to validate the data. This can be skipped, and thus we can make the whole ALTER TABLE operation just do some catalog touches instead of scanning the table, when these two conditions hold: (a) Old and new pg_constraint.conpfeqop match exactly. This is actually stronger than needed; we could loosen things by way of operator families, but it'd require a lot more effort. (b) The functions, if any, implementing a cast from the foreign type to the primary opcintype are the same. For this purpose, we can consider a binary coercion equivalent to an exact type match. When the opcintype is polymorphic, require that the old and new foreign types match exactly. (Since ri_triggers.c does use the executor, the stronger check for polymorphic types is no mere future-proofing. However, no core type exercises its necessity.) Author: Noah Misch Committer's note: catalog version bumped due to change of the Constraint node. I can't actually find any way to have such a node in a stored rule, but given that we have "out" support for them, better be safe.	2012-02-27 19:10:24 -03:00
Peter Eisentraut	9bf8603c7a	Call check_keywords.pl in maintainer-check For that purpose, have check_keywords.pl print errors to stderr and return a useful exit status.	2012-02-27 13:53:12 +02:00
Tom Lane	1b630751d0	Fix some more bugs in GIN's WAL replay logic. In commit `4016bdef8a` I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.	2012-02-26 15:12:17 -05:00
Peter Eisentraut	b5c077c368	Remove useless cast	2012-02-26 15:31:16 +02:00
Peter Eisentraut	66f0cf7da8	Remove useless const qualifier Claiming that the typevar argument to DefineCompositeType() is const was a plain lie. A similar case in DefineVirtualRelation() was already changed in passing in commit `1575fbcb`. Also clean up the now unnecessary casts that used to cast away the const.	2012-02-26 15:22:27 +02:00
Tom Lane	4dd78bf37a	Merge dissect() into cdissect() to remove a pile of near-duplicate code. The "uncomplicated" case isn't materially less complicated than the full case, certainly not enough so to justify duplicating nearly 500 lines of code. The only extra work being done in the full path is zaptreesubs, which is very cheap compared to everything else being done here, and besides that I'm less than convinced that it's not needed in some cases even without backrefs.	2012-02-24 18:40:31 -05:00
Tom Lane	587359479a	Avoid repeated creation/freeing of per-subre DFAs during regex search. In nested sub-regex trees, lower-level nodes created DFAs and then destroyed them again before exiting, which is a bit dumb considering that the recursive search is likely to call those nodes again later. Instead cache each created DFA until the end of pg_regexec(). This is basically a space for time tradeoff, in that it might increase the maximum memory usage. However, in most regex patterns there are not all that many subre nodes, so not that many DFAs --- and in any case, the peak usage occurs when reaching the bottom recursion level, and except for alternation cases that's going to be the same anyway.	2012-02-24 18:40:30 -05:00
Tom Lane	3cbfe485e4	Remove useless "retry memory" logic within regex engine. Apparently some primordial version of Spencer's engine needed cdissect() and child functions to be able to continue matching from a previous position when re-called. That is dead code, though, since trivial inspection shows that cdissect can never be entered without having previously done zapmem which resets the relevant retry counter. I have also verified experimentally that no case in the Tcl regression tests reaches cdissect with a nonzero retry value. Accordingly, remove that logic. This doesn't really save any noticeable number of cycles in itself, but it is one step towards making dissect() and cdissect() equivalent, which will allow removing hundreds of lines of near-duplicated code. Since struct subre's "retry" field is no longer particularly related to any kind of retry, rename it to "id". As of this commit it's only used for identifying a subre node in debug printouts, so you might think we should get rid of the field entirely; but I have a plan for another use.	2012-02-24 18:40:28 -05:00
Peter Eisentraut	9cfd800aab	Add some enumeration commas, for consistency	2012-02-24 11:04:45 +02:00
Tom Lane	173e29aa5d	Fix the general case of quantified regex back-references. Cases where a back-reference is part of a larger subexpression that is quantified have never worked in Spencer's regex engine, because he used a compile-time transformation that neglected the need to check the back-reference match in iterations before the last one. (That was okay for capturing parens, and we still do it if the regex has only capturing parens ... but it's not okay for backrefs.) To make this work properly, we have to add an "iteration" node type to the regex engine's vocabulary of sub-regex nodes. Since this is a moderately large change with a fair risk of introducing new bugs of its own, apply to HEAD only, even though it's a fix for a longstanding bug.	2012-02-24 01:41:03 -05:00
Andrew Dunstan	0c9e5d5e0d	Correctly handle NULLs in JSON output. Error reported by David Wheeler.	2012-02-23 23:44:16 -05:00
Tom Lane	077711c2e3	Remove arbitrary limitation on length of common name in SSL certificates. Both libpq and the backend would truncate a common name extracted from a certificate at 32 bytes. Replace that fixed-size buffer with dynamically allocated string so that there is no hard limit. While at it, remove the code for extracting peer_dn, which we weren't using for anything; and don't bother to store peer_cn longer than we need it in libpq. This limit was not so terribly unreasonable when the code was written, because we weren't using the result for anything critical, just logging it. But now that there are options for checking the common name against the server host name (in libpq) or using it as the user's name (in the server), this could result in undesirable failures. In the worst case it even seems possible to spoof a server name or user name, if the correct name is exactly 32 bytes and the attacker can persuade a trusted CA to issue a certificate in which that string is a prefix of the certificate's common name. (To exploit this for a server name, he'd also have to send the connection astray via phony DNS data or some such.) The case that this is a realistic security threat is a bit thin, but nonetheless we'll treat it as one. Back-patch to 8.4. Older releases contain the faulty code, but it's not a security problem because the common name wasn't used for anything interesting. Reported and patched by Heikki Linnakangas Security: CVE-2012-0867	2012-02-23 15:48:04 -05:00
Tom Lane	891e6e7bfd	Require execute permission on the trigger function for CREATE TRIGGER. This check was overlooked when we added function execute permissions to the system years ago. For an ordinary trigger function it's not a big deal, since trigger functions execute with the permissions of the table owner, so they couldn't do anything the user issuing the CREATE TRIGGER couldn't have done anyway. However, if a trigger function is SECURITY DEFINER, that is not the case. The lack of checking would allow another user to install it on his own table and then invoke it with, essentially, forged input data; which the trigger function is unlikely to realize, so it might do something undesirable, for instance insert false entries in an audit log table. Reported by Dinesh Kumar, patch by Robert Haas Security: CVE-2012-0866	2012-02-23 15:38:56 -05:00
Peter Eisentraut	c9d7004440	Remove inappropriate quotes And adjust wording for consistency.	2012-02-23 12:52:17 +02:00
Peter Eisentraut	8251670cb3	Fix build without OpenSSL This is a fixup for commit `a445cb92ef`.	2012-02-23 10:20:25 +02:00
Robert Haas	2254367435	Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written. Also expose the new counters through pg_stat_statements. Patch by me. Review by Fujii Masao and Greg Smith.	2012-02-22 20:33:05 -05:00
Robert Haas	f74f9a277c	Fix typo in comment. Sandro Santilli	2012-02-22 19:46:12 -05:00
Peter Eisentraut	a445cb92ef	Add parameters for controlling locations of server-side SSL files This allows changing the location of the files that were previously hard-coded to server.crt, server.key, root.crt, root.crl. server.crt and server.key continue to be the default settings and are thus required to be present by default if SSL is enabled. But the settings for the server-side CA and CRL are now empty by default, and if they are set, the files are required to be present. This replaces the previous behavior of ignoring the functionality if the files were not found.	2012-02-22 23:40:46 +02:00
Alvaro Herrera	a417f85e1d	REASSIGN OWNED: Support foreign data wrappers and servers This was overlooked when implementing those kinds of objects, in commit `cae565e503`. Per report from Pawel Casperek.	2012-02-22 17:33:12 -03:00
Tom Lane	593a9631a7	Don't clear btpo_cycleid during _bt_vacuum_one_page. When "vacuuming" a single btree page by removing LP_DEAD tuples, we are not actually within a vacuum operation, but rather in an ordinary insertion process that could well be running concurrently with a vacuum. So clearing the cycleid is incorrect, and could cause the concurrent vacuum to miss removing tuples that it needs to remove. This is a longstanding bug introduced by commit `e6284649b9` of 2006-07-25. I believe it explains Maxim Boguk's recent report of index corruption, and probably some other previously unexplained reports. In 9.0 and up this is a one-line fix; before that we need to introduce a flag to tell _bt_delitems what to do.	2012-02-21 15:03:36 -05:00
Tom Lane	9789c99d01	Cosmetic cleanup for commit `a760893dbd`. Mostly, fixing overlooked comments.	2012-02-21 14:14:16 -05:00
Magnus Hagander	c2a2f7516b	Avoid double close of file handle in syslogger on win32 This causes an exception when running under a debugger or in particular when running on a debug version of Windows. Patch from MauMau	2012-02-21 17:12:25 +01:00
Andrew Dunstan	6b044cb810	Fix typo, noticed by Will Crawford.	2012-02-21 11:03:51 -05:00
Andrew Dunstan	83fcaffea2	Fix a couple of cases of JSON output. First, as noted by Itagaki Takahiro, a datum of type JSON doesn't need to be escaped. Second, ensure that numeric output not in the form of a legal JSON number is quoted and escaped.	2012-02-20 15:01:03 -05:00
Tom Lane	5223f96d92	Fix regex back-references that are directly quantified with *. The syntax "\n*", that is a backref with a * quantifier directly applied to it, has never worked correctly in Spencer's library. This has been an open bug in the Tcl bug tracker since 2005: https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894 The core of the problem is in parseqatom(), which first changes "\n*" to "\n+|" and then applies repeat() to the NFA representing the backref atom. repeat() thinks that any arc leading into its "rp" argument is part of the sub-NFA to be repeated. Unfortunately, since parseqatom() already created the arc that was intended to represent the empty bypass around "\n+", this arc gets moved too, so that it now leads into the state loop created by repeat(). Thus, what was supposed to be an "empty" bypass gets turned into something that represents zero or more repetitions of the NFA representing the backref atom. In the original example, in place of ^([bc])\1*$ we now have something that acts like ^([bc])(\1+|[bc]*)$ At runtime, the branch involving the actual backref fails, as it's supposed to, but then the other branch succeeds anyway. We could no doubt fix this by some rearrangement of the operations in parseqatom(), but that code is plenty ugly already, and what's more the whole business of converting "x*" to "x+|" probably needs to go away to fix another problem I'll mention in a moment. Instead, this patch suppresses the *-conversion when the target is a simple backref atom, leaving the case of m == 0 to be handled at runtime. This makes the patch in regcomp.c a one-liner, at the cost of having to tweak cbrdissect() a little. In the event I went a bit further than that and rewrote cbrdissect() to check all the string-length-related conditions before it starts comparing characters. It seems a bit stupid to possibly iterate through many copies of an n-character backreference, only to fail at the end because the target string's length isn't a multiple of n --- we could have found that out before starting. The existing coding could only be a win if integer division is hugely expensive compared to character comparison, but I don't know of any modern machine where that might be true. This does not fix all the problems with quantified back-references. In particular, the code is still broken for back-references that appear within a larger expression that is quantified (so that direct insertion of the quantification limits into the BACKREF node doesn't apply). I think fixing that will take some major surgery on the NFA code, specifically introducing an explicit iteration node type instead of trying to transform iteration into concatenation of modified regexps. Back-patch to all supported branches. In HEAD, also add a regression test case for this. (It may seem a bit silly to create a regression test file for just one test case; but I'm expecting that we will soon import a whole bunch of regex regression tests from Tcl, so might as well create the infrastructure now.)	2012-02-20 00:52:33 -05:00
Tom Lane	e00f68e49c	Add caching of ctype.h/wctype.h results in regc_locale.c. While this doesn't save a huge amount of runtime, it still seems worth doing, especially since I realized that the data copying I did in my first draft was quite unnecessary. In this version, once we have the results cached, getting them back for re-use is really very cheap. Also, remove the hard-wired limitation to not consider wctype.h results for character codes above 255. It turns out that we can't push the limit as far up as I'd originally hoped, because the regex colormap code is not efficient enough to cope very well with character classes containing many thousand letters, which a Unicode locale is entirely capable of producing. Still, we can push it up to U+7FF (which I chose as the limit of 2-byte UTF8 characters), which will at least make Eastern Europeans happy pending a better solution. Thus, this commit resolves the specific complaint in bug #6457, but not the more general issue that letters of non-western alphabets are mostly not recognized as matching [[:alpha:]].	2012-02-19 21:01:13 -05:00
Tom Lane	27af91438b	Create the beginnings of internals documentation for the regex code. Create src/backend/regex/README to hold an implementation overview of the regex package, and fill it in with some preliminary notes about the code's DFA/NFA processing and colormap management. Much more to do there of course. Also, improve some code comments around the colormap and cvec code. No functional changes except to add one missing assert.	2012-02-19 18:58:23 -05:00
Andrew Dunstan	2f582f76b1	Improve pretty printing of viewdefs. Some line feeds are added to target lists and from lists to make them more readable. By default they wrap at 80 columns if possible, but the wrap column is also selectable - if 0 it wraps after every item. Andrew Dunstan, reviewed by Hitoshi Harada.	2012-02-19 11:43:46 -05:00
Tom Lane	08fd6ff37f	Sync regex code with Tcl 8.5.11. Sync our regex code with upstream changes since last time we did this, which was Tcl 8.5.0 (see commit `df1e965e12`). There are no functional changes here; the main point is just to lay down a commit-log marker that somebody has looked at this recently, and to do what we can to keep the two codebases comparable.	2012-02-17 19:44:26 -05:00
Tom Lane	4767bc8ff2	Improve statistics estimation to make some use of DISTINCT in sub-queries. Formerly, we just punted when trying to estimate stats for variables coming out of sub-queries using DISTINCT, on the grounds that whatever stats we might have for underlying table columns would be inapplicable. But if the sub-query has only one DISTINCT column, we can consider its output variable as being unique, which is useful information all by itself. The scope of this improvement is pretty narrow, but it costs nearly nothing, so we might as well do it. Per discussion with Andres Freund. This patch differs from the draft I submitted yesterday in updating various comments about vardata.isunique (to reflect its extended meaning) and in tweaking the interaction with security_barrier views. There does not seem to be a reason why we can't use this sort of knowledge even when the sub-query is such a view.	2012-02-16 17:34:00 -05:00
Tom Lane	4bfe68dfab	Run a portal's cleanup hook immediately when pushing it to FAILED state. This extends the changes of commit `6252c4f9e2` so that we run the cleanup hook earlier for failure cases as well as success cases. As before, the point is to avoid an assertion failure from an Assert I added in commit `a874fe7b4c`, which was meant to check that no user-written code can be called during portal cleanup. This fixes a case reported by Pavan Deolasee in which the Assert could be triggered during backend exit (see the new regression test case), and also prevents the possibility that the cleanup hook is run after portions of the portal's state have already been recycled. That doesn't really matter in current usage, but it foreseeably could matter in the future. Back-patch to 9.1 where the Assert in question was added.	2012-02-15 16:19:01 -05:00
Robert Haas	edec8c8e00	Fix VPATH builds, broken by my recent commit to speed up tuplesorting. The relevant commit is `337b6f5ecf`.	2012-02-15 15:53:53 -05:00
Robert Haas	337b6f5ecf	Speed up in-memory tuplesorting. Per recent work by Peter Geoghegan, it's significantly faster to tuplesort on a single sortkey if ApplySortComparator is inlined into quicksort rather than reached via a function pointer. It's also faster in general to have a version of quicksort which is specialized for sorting SortTuple objects rather than objects of arbitrary size and type. This requires a couple of additional copies of the quicksort logic, which in this patch are generated using a Perl script. There might be some benefit in adding further specializations here too, but thus far it's not clear that those gains are worth their weight in code footprint.	2012-02-15 12:13:32 -05:00
Robert Haas	73a4b994a6	Make CREATE/ALTER FUNCTION support NOT LEAKPROOF. Because it isn't good to be able to turn things on, and not off again.	2012-02-15 10:45:08 -05:00
Tom Lane	398f70ec07	Preserve column names in the execution-time tupledesc for a RowExpr. The hstore and json datatypes both have record-conversion functions that pay attention to column names in the composite values they're handed. We used to not worry about inserting correct field names into tuple descriptors generated at runtime, but given these examples it seems useful to do so. Observe the nicer-looking results in the regression tests whose results changed. catversion bump because there is a subtle change in requirements for stored rule parsetrees: RowExprs from ROW() constructs now have to include field names. Andrew Dunstan and Tom Lane	2012-02-14 17:34:56 -05:00
Robert Haas	cd30728fb2	Allow LEAKPROOF functions for better performance of security views. We don't normally allow quals to be pushed down into a view created with the security_barrier option, but functions without side effects are an exception: they're OK. This allows much better performance in common cases, such as when using an equality operator (that might even be indexable). There is an outstanding issue here with the CREATE FUNCTION / ALTER FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the leakproof flag. But I'm committing this as-is so that it doesn't have to be rebased again; we can fix up the grammar in a future commit. KaiGai Kohei, with some wordsmithing by me.	2012-02-13 22:21:14 -05:00
Heikki Linnakangas	21b1634275	Fix heap_multi_insert to set t_self field in the caller's tuples. If tuples were toasted, heap_multi_insert didn't update the ctid on the original tuples. This caused a failure if there was an after trigger (including a foreign key), on the table, and a tuple got toasted. Per off-list report and test case from Ted Phelps	2012-02-13 10:20:50 +02:00
Robert Haas	d429ebe347	Add a comment to AdjustIntervalForTypmod to reduce chance of future bugs. It's not entirely evident how the logic here relates to the interval_transform function, so let's clue people in that they need to check that if the rules change.	2012-02-09 12:24:36 -05:00
Robert Haas	6656588575	Improve interval_transform function to detect a few more cases. Noah Misch, per a review comment from me.	2012-02-09 12:24:22 -05:00
Heikki Linnakangas	82e73ba0d1	Add new keywords SNAPSHOT and TYPES to the keyword list in gram.y These were added to kwlist.h as unreserved keywords in separate patches, but authors forgot to add them to the corresponding list in gram.y. Because of that, even though they were supposed to be unreserved keywords, they could not be used as identifiers. src/tools/check_keywords.pl is your friend.	2012-02-09 11:37:54 +02:00
Tom Lane	331bf6712c	Throw error sooner for unlogged GiST indexes. Throwing an error only after we've built the main index fork is pretty unfriendly when the table already contains data. Per gripe from Jay Levitt.	2012-02-08 16:19:27 -05:00
Tom Lane	cb7c84fae8	Check misplaced window functions before checking aggregate/group by sanity. If somebody puts a window function in WHERE, we should complain about that in so many words. The previous coding tended to complain about the window function's arguments instead, which is likely to be misleading to users who are unclear on the semantics of window functions; as seen for example in bug #6440 from Matyas Novak. Just another example of how "add new code at the end" is frequently a bad heuristic.	2012-02-08 13:15:02 -05:00
Robert Haas	c13897983a	Add transform functions for various temporal typmod coercions. This enables ALTER TABLE to skip table and index rebuilds in some cases. Noah Misch, with trivial changes by me.	2012-02-08 09:33:37 -05:00
Heikki Linnakangas	1a01560cbb	Rename LWLockWaitUntilFree to LWLockAcquireOrWait. LWLockAcquireOrWait makes it more clear that the lock is acquired if it's free.	2012-02-08 09:17:13 +02:00
Robert Haas	af7dd696b0	Fix typos pointed out by Noah Misch.	2012-02-07 21:40:49 -05:00
Robert Haas	f7d7dade8a	Add a transform function for varbit typmod coercions. This enables ALTER TABLE to skip table and index rebuilds when the new type is unconstrained varbit, or when the allowable number of bits is not decreasing. Noah Misch, with review and a fix for an OID collision by me.	2012-02-07 12:42:50 -05:00
Robert Haas	3cc0800829	Add a transform function for numeric typmod coercions. This enables ALTER TABLE to skip table and index rebuilds when a column is changed to an unconstrained numeric, or when the scale is unchanged and the precision does not decrease. Noah Misch, with a few stylistic changes and a fix for an OID collision by me.	2012-02-07 12:08:26 -05:00
Robert Haas	af7914c662	Add TIMING option to EXPLAIN, to allow elimination of timing overhead. Sometimes it may be useful to get actual row counts out of EXPLAIN (ANALYZE) without paying the cost of timing every node entry/exit. With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that. Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.	2012-02-07 11:23:04 -05:00
Heikki Linnakangas	15ad6f1510	When building with LWLOCK_STATS, initialize the stats in LWLockWaitUntilFree. If LWLockWaitUntilFree was called before the first LWLockAcquire call, you would either crash because of access to an uninitialized array or account the acquisition incorrectly. LWLockConditionalAcquire doesn't have this problem because it doesn't update the lwlock stats. In practice, this never happens because there is no codepath where you would call LWLockWaitUntilFree before LWLockAcquire after a new process is launched. But that's just accidental; there's no guarantee that that's always going to be true in the future. Spotted by Jeff Janes.	2012-02-07 10:11:54 +02:00
Tom Lane	442231d7f7	Fix postmaster to attempt restart after a hot-standby crash. The postmaster was coded to treat any unexpected exit of the startup process (i.e., the WAL replay process) as a catastrophic crash, and not try to restart it. This was OK so long as the startup process could not have any sibling postmaster children. However, if a hot-standby backend crashes, we SIGQUIT the startup process along with everything else, and the resulting exit is hardly "unexpected". Treating it as such meant we failed to restart a standby server after any child crash at all, not only a crash of the WAL replay process as intended. Adjust that. Back-patch to 9.0 where hot standby was introduced.	2012-02-06 15:30:21 -05:00
Tom Lane	5fc78efcec	Avoid throwing ERROR during WAL replay of DROP TABLESPACE. Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless removal of the tablespace's directories succeeds, that does not guarantee that the same operation will succeed during WAL replay. Foreseeable reasons for it to fail include temp files created in the tablespace by Hot Standby backends, wrong directory permissions on a standby server, etc etc. The original coding threw ERROR if replay failed to remove the directories, but that is a serious overreaction. Throwing an error aborts recovery, and worse means that manual intervention will be needed to get the database to start again, since otherwise the same error will recur on subsequent attempts to replay the same WAL record. And the consequence of failing to remove the directories is only that some probably-small amount of disk space is wasted, so it hardly seems justified to throw an error. Accordingly, arrange to report such failures as LOG messages and keep going when a failure occurs during replay. Back-patch to 9.0 where Hot Standby was introduced. In principle such problems can occur in earlier releases, but Hot Standby increases the odds of trouble significantly. Given the lack of field reports of such issues, I'm satisfied with patching back as far as the patch applies easily.	2012-02-06 14:44:41 -05:00
Tom Lane	c6d76d7c82	Add locking around WAL-replay modification of shared-memory variables. Originally, most of this code assumed that no Postgres backends could be running concurrently with it, and so no locking could be needed. That assumption fails in Hot Standby. While it's still true that Hot Standby backends should never change values like nextXid, they can examine them, and consistency is important in some cases such as when computing a snapshot. Therefore, prudence requires that WAL replay code obtain the relevant locks when modifying such variables, even though it can examine them without taking a lock. We were following that coding rule in some places but not all. This commit applies the coding rule uniformly to all updates of ShmemVariableCache and MultiXactState fields; a search of the replay routines did not find any other cases that seemed to be at risk. In addition, this commit fixes a longstanding thinko in replay of NEXTOID and checkpoint records: we tried to advance nextOid only if it was behind the value in the WAL record, but the comparison would draw the wrong conclusion if OID wraparound had occurred since the previous value. Better to just unconditionally assign the new value, since OID assignment shouldn't be happening during replay anyway. The additional locking seems to be more in the nature of future-proofing than fixing any live bug, so I am not going to back-patch it. The NEXTOID fix will be back-patched separately.	2012-02-06 12:34:10 -05:00
Tom Lane	17118825b8	Fix transient clobbering of shared buffers during WAL replay. RestoreBkpBlocks was in the habit of zeroing and refilling the target buffer; which was perfectly safe when the code was written, but is unsafe during Hot Standby operation. The reason is that we have coding rules that allow backends to continue accessing a tuple in a heap relation while holding only a pin on its buffer. Such a backend could see transiently zeroed data, if WAL replay had occasion to change other data on the page. This has been shown to be the cause of bug #6425 from Duncan Rance (who deserves kudos for developing a sufficiently-reproducible test case) as well as Bridget Frey's re-report of bug #6200. It most likely explains the original report as well, though we don't yet have confirmation of that. To fix, change the code so that only bytes that are supposed to change will change, even transiently. This actually saves cycles in RestoreBkpBlocks, since it's not writing the same bytes twice. Also fix seq_redo, which has the same disease, though it has to work a bit harder to meet the requirement. So far as I can tell, no other WAL replay routines have this type of bug. In particular, the index-related replay routines, which would certainly be broken if they had to meet the same standard, are not at risk because we do not have coding rules that allow access to an index page when not holding a buffer lock on it. Back-patch to 9.0 where Hot Standby was added.	2012-02-05 15:49:17 -05:00
Tom Lane	ee68a44106	Improve comment.	2012-02-04 22:37:34 -05:00
Tom Lane	2af72cefea	Add missing Assert and fix inaccurate elog message in standby_redo(). All other WAL redo routines either call RestoreBkpBlocks() or Assert that they haven't been passed any backup blocks. Make this one do likewise. Also, fix incorrect routine name in its failure message.	2012-02-04 22:32:35 -05:00
Tom Lane	9bff0780cf	Allow SQL-language functions to reference parameters by name. Matthew Draper, reviewed by Hitoshi Harada	2012-02-04 19:23:49 -05:00
Andrew Dunstan	39909d1d39	Add array_to_json and row_to_json functions. Also move the escape_json function from explain.c to json.c where it seems to belong. Andrew Dunstan, reviewed by Abhijit Menon-Sen.	2012-02-03 12:11:16 -05:00
Robert Haas	0ed7445d73	Allow spgist's text_ops to handle pattern-matching operators. This was presumably intended to work this way all along, but a few key bits of indxpath.c didn't get the memo. Robert Haas and Tom Lane	2012-02-02 13:10:56 -05:00

12788 Commits