postgresql

Commit Graph

Author	SHA1	Message	Date
Simon Riggs	443951748c	Record data_checksum_version in control file. The value is not used anywhere in code, but will allow future changes to the checksum version should that become necessary in the future.	2013-04-30 12:27:12 +01:00
Tom Lane	db9f0e1d9a	Postpone creation of pathkeys lists to fix bug #8049 . This patch gets rid of the concept of, and infrastructure for, non-canonical PathKeys; we now only ever create canonical pathkey lists. The need for non-canonical pathkeys came from the desire to have grouping_planner initialize query_pathkeys and related pathkey lists before calling query_planner. However, since query_planner didn't actually do anything with those lists before they'd been made canonical, we can get rid of the whole mess by just not creating the lists at all until the point where we formerly canonicalized them. There are several ways in which we could implement that without making query_planner itself deal with grouping/sorting features (which are supposed to be the province of grouping_planner). I chose to add a callback function to query_planner's API; other alternatives would have required adding more fields to PlannerInfo, which while not bad in itself would create an ABI break for planner-related plugins in the 9.2 release series. This still breaks ABI for anything that calls query_planner directly, but it seems somewhat unlikely that there are any such plugins. I had originally conceived of this change as merely a step on the way to fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes that bug all by itself, as per the added regression test. The reason is that now get_eclass_for_sort_expr is adding the ORDER BY expression at the end of EquivalenceClass creation not the start, and so anything that is in a multi-member EquivalenceClass has already been created with correct em_nullable_relids. I am suspicious that there are related scenarios in which we still need to teach get_eclass_for_sort_expr to compute correct nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3 to fix bugs that are only hypothetical. So for the moment, do this and stop here. Back-patch to 9.2 but not to earlier branches, since they don't exhibit this bug for lack of join-clause-movement logic that depends on em_nullable_relids being correct. (We might have to revisit that choice if any related bugs turn up.) In 9.2, don't change the signature of make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as not to risk more plugin breakage than we have to.	2013-04-29 14:50:03 -04:00
Simon Riggs	43e7a66849	Introduce new page checksum algorithm and module. Isolate checksum calculation to its own module, so that bufpage knows little if anything about the details of the calculation. This implementation is a modified FNV-1a hash checksum, details of which are given in the new checksum.c header comments. Basic implementation only, so we fix the output value. Later related commits will add version numbers to pg_control, compiler optimization flags and memory barriers. Ants Aasma, reviewed by Jeff Davis and Simon Riggs	2013-04-29 09:05:27 +01:00
Tom Lane	f8db76e875	Editorialize a bit on new ProcessUtility() API. Choose a saner ordering of parameters (adding a new input param after the output params seemed a bit random), update the function's header comment to match reality (cmon folks, is this really that hard?), get rid of useless and sloppily-defined distinction between PROCESS_UTILITY_SUBCOMMAND and PROCESS_UTILITY_GENERATED.	2013-04-28 00:18:45 -04:00
Tom Lane	5194024d72	Incidental cleanup of matviews code. Move checking for unscannable matviews into ExecOpenScanRelation, which is a better place for it first because the open relation is already available (saving a relcache lookup cycle), and second because this eliminates the problem of telling the difference between rangetable entries that will or will not be scanned by the query. In particular we can get rid of the not-terribly-well-thought-out-or-implemented isResultRel field that the initial matviews patch added to RangeTblEntry. Also get rid of entirely unnecessary scannability check in the rewriter, and a bogus decision about whether RefreshMatViewStmt requires a parse-time snapshot. catversion bump due to removal of a RangeTblEntry field, which changes stored rules.	2013-04-27 17:48:57 -04:00
Peter Eisentraut	cc26ea9fe2	Clean up references to SQL92 In most cases, these were just references to the SQL standard in general. In a few cases, a contrast was made between SQL92 and later standards -- those have been kept unchanged.	2013-04-20 11:04:41 -04:00
Heikki Linnakangas	87ae9e7265	Remove some unused and seldom used fields from RelationAmInfo. This saves some memory from each index relcache entry. At least on a 64-bit machine, it saves just enough to shrink a typical relcache entry's memory usage from 2k to 1k. That's nice if you have a lot of backends and a lot of indexes.	2013-04-16 15:07:58 +03:00
Andrew Dunstan	d788121aba	Mark json IO and extraction functions immutable. Per complaint from Hubert Depesz Lubaczewski. Catalog version bumped.	2013-04-15 21:46:25 -04:00
Tom Lane	0b33790421	Clean up the mess around EXPLAIN and materialized views. Revert the matview-related changes in explain.c's API, as per recent complaint from Robert Haas. The reason for these appears to have been principally some ill-considered choices around having intorel_startup do what ought to be parse-time checking, plus a poor arrangement for passing it the view parsetree it needs to store into pg_rewrite when creating a materialized view. Do the latter by having parse analysis stick a copy into the IntoClause, instead of doing it at runtime. (On the whole, I seriously question the choice to represent CREATE MATERIALIZED VIEW as a variant of SELECT INTO/CREATE TABLE AS, because that means injecting even more complexity into what was already a horrid legacy kluge. However, I didn't go so far as to rethink that choice ... yet.) I also moved several error checks into matview parse analysis, and made the check for external Params in a matview more accurate. In passing, clean things up a bit more around interpretOidsOption(), and fix things so that we can use that to force no-oids for views, sequences, etc, thereby eliminating the need to cons up "oids = false" options when creating them. catversion bump due to change in IntoClause. (I wonder though if we really need readfuncs/outfuncs support for IntoClause anymore.)	2013-04-12 19:25:31 -04:00
Robert Haas	f8a54e936b	sepgsql: Enforce db_procedure:{execute} permission. To do this, we add an additional object access hook type, OAT_FUNCTION_EXECUTE. KaiGai Kohei	2013-04-12 08:58:01 -04:00
Robert Haas	d017bf41a3	Minor wording corrections for object-access hook stuff. KaiGai Kohei	2013-04-12 08:40:02 -04:00
Alvaro Herrera	6a76edb188	Fix confusion between ObjectType and ObjectClass Per report by Will Leinweber and Peter Eisentraut	2013-04-11 11:59:47 -03:00
Kevin Grittner	52e6e33ab4	Create a distinction between a populated matview and a scannable one. The intent was that being populated would, long term, be just one of the conditions which could affect whether a matview was scannable; being populated should be necessary but not always sufficient to scan the relation. Since only CREATE and REFRESH currently determine the scannability, names and comments accidentally conflated these concepts, leading to confusion. Also add missing locking for the SQL function which allows a test for scannability, and fix a modularity violatiion. Per complaints from Tom Lane, although its not clear that these will satisfy his concerns. Hopefully this will at least better frame the discussion.	2013-04-09 13:02:49 -05:00
Robert Haas	0bf42a5f3b	Adjust ExplainOneQuery_hook_type to take a DestReceiver argument. The materialized views patch adjusted ExplainOneQuery to take an additional DestReceiver argument, but failed to add a matching argument to the definition of ExplainOneQuery_hook. This is a problem for users of the hook that want to call ExplainOnePlan. Fix by adding the missing argument.	2013-04-09 10:25:08 -04:00
Tom Lane	3ccae48f44	Support indexing of regular-expression searches in contrib/pg_trgm. This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated. Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later. The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen. Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane	2013-04-09 01:06:54 -04:00
Simon Riggs	47c4333189	Avoid tricky race condition recording XLOG_HINT We copy the buffer before inserting an XLOG_HINT to avoid WAL CRC errors caused by concurrent hint writes to buffer while share locked. To make this work we refactor RestoreBackupBlock() to allow an XLOG_HINT to avoid the normal path for backup blocks, which assumes the underlying buffer is exclusive locked. Resulting code completely changes layout of XLOG_HINT WAL records, but this isn't even beta code, so this is a low impact change. In passing, avoid taking WALInsertLock for full page writes on checksummed hints, remove related cruft from XLogInsert() and improve xlog_desc record for XLOG_HINT. Andres Freund Bug report by Fujii Masao, testing by Jeff Janes and Jaime Casanova, review by Jeff Davis and Simon Riggs. Applied with changes from review and some comment editing.	2013-04-08 08:52:39 +01:00
Robert Haas	e965e6344c	sepgsql: Enforce db_schema:search permission. KaiGai Kohei, with comment and doc wordsmithing by me	2013-04-05 08:51:31 -04:00
Heikki Linnakangas	bf2b0a1478	Fix crash on compiling a regular expression with more than 32k colors. Throw an error instead. Backpatch to all supported branches.	2013-04-04 19:48:11 +03:00
Tom Lane	17fe2793ea	Fix insecure parsing of server command-line switches. An oversight in commit `e710b65c1c` allowed database names beginning with "-" to be treated as though they were secure command-line switches; and this switch processing occurs before client authentication, so that even an unprivileged remote attacker could exploit the bug, needing only connectivity to the postmaster's port. Assorted exploits for this are possible, some requiring a valid database login, some not. The worst known problem is that the "-r" switch can be invoked to redirect the process's stderr output, so that subsequent error messages will be appended to any file the server can write. This can for example be used to corrupt the server's configuration files, so that it will fail when next restarted. Complete destruction of database tables is also possible. Fix by keeping the database name extracted from a startup packet fully separate from command-line switches, as had already been done with the user name field. The Postgres project thanks Mitsumasa Kondo for discovering this bug, Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing the full extent of the danger. Security: CVE-2013-1899	2013-04-01 14:00:51 -04:00
Tom Lane	ce9ab88981	Make REPLICATION privilege checks test current user not authenticated user. The pg_start_backup() and pg_stop_backup() functions checked the privileges of the initially-authenticated user rather than the current user, which is wrong. For example, a user-defined index function could successfully call these functions when executed by ANALYZE within autovacuum. This could allow an attacker with valid but low-privilege database access to interfere with creation of routine backups. Reported and fixed by Noah Misch. Security: CVE-2013-1901	2013-04-01 13:09:24 -04:00
Andrew Dunstan	a570c98d7f	Add new JSON processing functions and parser API. The JSON parser is converted into a recursive descent parser, and exposed for use by other modules such as extensions. The API provides hooks for all the significant parser event such as the beginning and end of objects and arrays, and providing functions to handle these hooks allows for fairly simple construction of a wide variety of JSON processing functions. A set of new basic processing functions and operators is also added, which use this API, including operations to extract array elements, object fields, get the length of arrays and the set of keys of a field, deconstruct an object into a set of key/value pairs, and create records from JSON objects and arrays of objects. Catalog version bumped. Andrew Dunstan, with some documentation assistance from Merlin Moncure.	2013-03-29 14:12:13 -04:00
Alvaro Herrera	473ab40c8b	Add sql_drop event for event triggers This event takes place just before ddl_command_end, and is fired if and only if at least one object has been dropped by the command. (For instance, DROP TABLE IF EXISTS of a table that does not in fact exist will not lead to such a trigger firing). Commands that drop multiple objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event to fire. Some firings might be surprising, such as ALTER TABLE DROP COLUMN. The trigger is fired after the drop has taken place, because that has been deemed the safest design, to avoid exposing possibly-inconsistent internal state (system catalogs as well as current transaction) to the user function code. This means that careful tracking of object identification is required during the object removal phase. Like other currently existing events, there is support for tag filtering. To support the new event, add a new pg_event_trigger_dropped_objects() set-returning function, which returns a set of rows comprising the objects affected by the command. This is to be used within the user function code, and is mostly modelled after the recently introduced pg_identify_object() function. Catalog version bumped due to the new function. Dimitri Fontaine and Álvaro Herrera Review by Robert Haas, Tom Lane	2013-03-28 13:05:48 -03:00
Simon Riggs	593c39d156	Revoke `bc5334d867`	2013-03-28 09:18:02 +00:00
Simon Riggs	bc5334d867	Allow external recovery_config_directory If required, recovery.conf can now be located outside of the data directory. Server needs read/write permissions on this directory.	2013-03-27 11:45:42 +00:00
Kevin Grittner	549dae0352	Fix problems with incomplete attempt to prohibit OIDS with MVs. Problem with assertion failure in restoring from pg_dump output reported by Joachim Wieland. Review and suggestions by Tom Lane and Robert Haas.	2013-03-22 13:27:34 -05:00
Simon Riggs	96ef3b8ff1	Allow I/O reliability checks using 16-bit checksums Checksums are set immediately prior to flush out of shared buffers and checked when pages are read in again. Hint bit setting will require full page write when block is dirtied, which causes various infrastructure changes. Extensive comments, docs and README. WARNING message thrown if checksum fails on non-all zeroes page; ERROR thrown but can be disabled with ignore_checksum_failure = on. Feature enabled by an initdb option, since transition from option off to option on is long and complex and has not yet been implemented. Default is not to use checksums. Checksum used is WAL CRC-32 truncated to 16-bits. Simon Riggs, Jeff Davis, Greg Smith Wide input and assistance from many community members. Thank you.	2013-03-22 13:54:07 +00:00
Tom Lane	9cbc4b80dd	Redo postgres_fdw's planner code so it can handle parameterized paths. I wasn't going to ship this without having at least some example of how to do that. This version isn't terribly bright; in particular it won't consider any combinations of multiple join clauses. Given the cost of executing a remote EXPLAIN, I'm not sure we want to be very aggressive about doing that, anyway. In support of this, refactor generate_implied_equalities_for_indexcol so that it can be used to extract equivalence clauses that aren't necessarily tied to an index.	2013-03-21 19:44:32 -04:00
Alvaro Herrera	f8348ea32e	Allow extracting machine-readable object identity Introduce pg_identify_object(oid,oid,int4), which is similar in spirit to pg_describe_object but instead produces a row of machine-readable information to uniquely identify the given object, without resorting to OIDs or other internal representation. This is intended to be used in the event trigger implementation, to report objects being operated on; but it has usefulness of its own. Catalog version bumped because of the new function.	2013-03-20 18:19:19 -03:00
Simon Riggs	bb7cc2623f	Remove PageSetTLI and rename pd_tli to pd_checksum Remove use of PageSetTLI() from all page manipulation functions and adjust README to indicate change in the way we make changes to pages. Repurpose those bytes into the pd_checksum field and explain how that works in comments about page header. Refactoring ahead of actual feature patch which would make use of the checksum field, arriving later. Jeff Davis, with comments and doc changes by Simon Riggs Direction suggested by Robert Haas; many others providing review comments.	2013-03-18 13:46:42 +00:00
Robert Haas	05f3f9c7b2	Extend object-access hook machinery to support post-alter events. This also slightly widens the scope of what we support in terms of post-create events. KaiGai Kohei, with a few changes, mostly to the comments, by me	2013-03-17 22:57:26 -04:00
Tom Lane	e2a203a190	initdb needs pqsignal() even on Windows. I had thought we weren't using this version of pqsignal() at all on Windows, but that's wrong --- initdb is using it (and coping with the POSIX-ish semantics of bare signal() :-(). So allow the file to be built in WIN32+FRONTEND case, and add it to the MSVC build logic.	2013-03-17 15:19:47 -04:00
Tom Lane	da5aeccf64	Move pqsignal() to libpgport. We had two copies of this function in the backend and libpq, which was already pretty bogus, but it turns out that we need it in some other programs that don't use libpq (such as pg_test_fsync). So put it where it probably should have been all along. The signal-mask-initialization support in src/backend/libpq/pqsignal.c stays where it is, though, since we only need that in the backend.	2013-03-17 12:06:42 -04:00
Tom Lane	d43837d030	Add lock_timeout configuration parameter. This GUC allows limiting the time spent waiting to acquire any one heavyweight lock. In support of this, improve the recently-added timeout infrastructure to permit efficiently enabling or disabling multiple timeouts at once. That reduces the performance hit from turning on lock_timeout, though it's still not zero. Zoltán Böszörményi, reviewed by Tom Lane, Stephen Frost, and Hari Babu	2013-03-16 23:22:57 -04:00
Tom Lane	4387cf956b	Avoid inserting Result nodes that only compute identity projections. The planner sometimes inserts Result nodes to perform column projections (ie, arbitrary scalar calculations) above plan nodes that lack projection logic of their own. However, we did that even if the lower plan node was in fact producing the required column set already; which is a pretty common case given the popularity of "SELECT * FROM ...". Measurements show that the useless plan node adds non-negligible overhead, especially when there are many columns in the result. So add a check to avoid inserting a Result node unless there's something useful for it to do. There are a couple of remaining places where unnecessary Result nodes could get inserted, but they are (a) much less performance-critical, and (b) coded in such a way that it's hard to avoid inserting a Result, because the desired tlist is changed on-the-fly in subsequent logic. We'll leave those alone for now. Kyotaro Horiguchi; reviewed and further hacked on by Amit Kapila and Tom Lane.	2013-03-14 13:43:18 -04:00
Heikki Linnakangas	59d0bf9dca	Add cost estimation of range @> and <@ operators. The estimates are based on the existing lower bound histogram, and a new histogram of range lengths. Bump catversion, because the range length histogram now needs to be present in statistic slot kind 6, or you get an error on @> and <@ queries. (A re-ANALYZE would be enough to fix that, though) Alexander Korotkov, with some refactoring by me.	2013-03-14 15:36:56 +02:00
Andrew Dunstan	38fb4d978c	JSON generation improvements. This adds the following: json_agg(anyrecord) -> json to_json(any) -> json hstore_to_json(hstore) -> json (also used as a cast) hstore_to_json_loose(hstore) -> json The last provides heuristic treatment of numbers and booleans. Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function. Andrew Dunstan, reviewed by Steve Singer. Catalog version bumped.	2013-03-10 17:35:36 -04:00
Tom Lane	21734d2fb8	Support writable foreign tables. This patch adds the core-system infrastructure needed to support updates on foreign tables, and extends contrib/postgres_fdw to allow updates against remote Postgres servers. There's still a great deal of room for improvement in optimization of remote updates, but at least there's basic functionality there now. KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather heavily revised by Tom Lane.	2013-03-10 14:16:02 -04:00
Magnus Hagander	7f49a67f95	Report pg_hba line number and contents when users fail to log in Instead of just reporting which user failed to log in, log both the line number in the active pg_hba.conf file (which may not match reality in case the file has been edited and not reloaded) and the contents of the matching line (which will always be correct), to make it easier to debug incorrect pg_hba.conf files. The message to the client remains unchanged and does not include this information, to prevent leaking security sensitive information. Reviewed by Tom Lane and Dean Rasheed	2013-03-10 15:54:37 +01:00
Heikki Linnakangas	96443d1420	Forgot catversion bump in the SP-GiST adjacent support patch.	2013-03-08 17:12:38 +02:00
Heikki Linnakangas	23f10b6473	SP-GiST support of the range adjacent operator -\|- Alexander Korotkov, reviewed by Jeff Davis.	2013-03-08 15:03:19 +02:00
Heikki Linnakangas	7ccefe8610	Fix tli history file fetching, broken by the archive after crash recevery patch. If we were about to enter archive recovery after crash recovery, we scanned the archive for the latest tli history file, and set the recovery target timeline to that. However, when we actually tried to read the history file, we would not fetch the file from the archive, because we were not in archive recovery yet. To fix, make readTimeLineHistory and existsTimeLineHistory to always fetch the file from archive if archive recovery is requested, even if we're not in archive recovery yet. Backpatch to 9.2. Mitsumasa KONDO	2013-03-07 12:33:24 +02:00
Tom Lane	1908abc4a3	Arrange to cache FdwRoutine structs in foreign tables' relcache entries. This saves several catalog lookups per reference. It's not all that exciting right now, because we'd managed to minimize the number of places that need to fetch the data; but the upcoming writable-foreign-tables patch needs this info in a lot more places.	2013-03-06 23:48:09 -05:00
Robert Haas	f90cc26982	Code beautification for object-access hook machinery. KaiGai Kohei	2013-03-06 20:53:25 -05:00
Tom Lane	e11cb8ba2c	Fix missing #include in commands/matview.h. It needs parsenodes.h to be compilable regardless of previous headers.	2013-03-06 18:21:05 -05:00
Tom Lane	80b011ef0a	Fix to_char() to use ASCII-only case-folding rules where appropriate. formatting.c used locale-dependent case folding rules in some code paths where the result isn't supposed to be locale-dependent, for example to_char(timestamp, 'DAY'). Since the source data is always just ASCII in these cases, that usually didn't matter ... but it does matter in Turkish locales, which have unusual treatment of "i" and "I". To confuse matters even more, the misbehavior was only visible in UTF8 encoding, because in single-byte encodings we used pg_toupper/pg_tolower which don't have locale-specific behavior for ASCII characters. Fix by providing intentionally ASCII-only case-folding functions and using these where appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active branches, since it's been like this for a long time.	2013-03-05 13:02:30 -05:00
Kevin Grittner	c8056592bc	Bump catversion because of new function in the materialized view patch.	2013-03-05 05:32:03 -06:00
Kevin Grittner	3bf3ab8c56	Add a materialized view relations. A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.	2013-03-03 18:23:31 -06:00
Tom Lane	b15a6da292	Get rid of any toast table when converting a table to a view. Also make sure other fields of the view's pg_class entry are appropriate for a view; it shouldn't have relfrozenxid set for instance. This ancient omission isn't believed to have any serious consequences in versions 8.4-9.2, so no backpatch. But let's fix it before it does bite us in some serious way. It's just luck that the case doesn't cause problems for autovacuum. (It did cause problems in 8.3, but that's out of support.) Andres Freund	2013-03-03 19:05:47 -05:00
Tom Lane	2b78d101d1	Fix SQL function execution to be safe with long-lived FmgrInfos. fmgr_sql had been designed on the assumption that the FmgrInfo it's called with has only query lifespan. This is demonstrably unsafe in connection with range types, as shown in bug #7881 from Andrew Gierth. Fix things so that we re-generate the function's cache data if the (sub)transaction it was made in is no longer active. Back-patch to 9.2. This might be needed further back, but it's not clear whether the case can realistically arise without range types, so for now I'll desist from back-patching further.	2013-03-03 17:39:58 -05:00
Heikki Linnakangas	3d009e45bd	Add support for piping COPY to/from an external program. This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding psql \copy syntax. Like with reading/writing files, the backend version is superuser-only, and in the psql version, the program is run in the client. In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you the stdin/stdout is quoted, it's now interpreted as a filename. For example, "\copy foo from 'stdin'" now reads from a file called 'stdin', not from standard input. Before this, there was no way to specify a filename called stdin, stdout, pstdin or pstdout. This creates a new function in pgport, wait_result_to_str(), which can be used to convert the exit status of a process, as returned by wait(3), to a human-readable string. Etsuro Fujita, reviewed by Amit Kapila.	2013-02-27 18:22:31 +02:00
Tom Lane	c153530dc1	Install headers from the new src/include/common subdirectory. This got missed in commit `8396447cdb`. Andres Freund	2013-02-26 15:27:30 -05:00
Alvaro Herrera	639ed4e84b	Add pg_xlogdump contrib program This program relies on rm_desc backend routines and the xlogreader infrastructure to emit human-readable rendering of WAL records. Author: Andres Freund, with many reworks by Álvaro Reviewed (in a much earlier version) by Peter Eisentraut	2013-02-22 16:56:55 -03:00
Alvaro Herrera	a730183926	Move relpath() to libpgcommon This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.	2013-02-21 22:46:17 -03:00
Tom Lane	54a2786835	Need to decorate XactIsoLevel as PGDLLIMPORT for postgres_fdw. Per buildfarm.	2013-02-21 09:28:42 -05:00
Alvaro Herrera	a40d09e27f	Move ExceptionalCondition back to postgres.h It needs to be defined in the backend even when assertions are not enabled. It's cleaner to put it back, than create a separate #ifdef section in c.h. Per trouble report from Jeff Janes	2013-02-18 18:53:32 -03:00
Alvaro Herrera	187492b6c2	Split pgstat file in smaller pieces We now write one file per database and one global file, instead of having the whole thing in a single huge file. This reduces the I/O that must be done when partial data is required -- which is all the time, because each process only needs information on its own database anyway. Also, the autovacuum launcher does not need data about tables and functions in each database; having the global stats for all DBs is enough. Catalog version bumped because we have a new subdir under PGDATA. Author: Tomas Vondra. Some rework by Álvaro Testing by Jeff Janes Other discussion by Heikki Linnakangas, Tom Lane.	2013-02-18 18:12:52 -03:00
Peter Eisentraut	9475db3a4e	Add ALTER ROLE ALL SET command This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE ... SET functionality to allow creating settings that apply to all users in all databases. reviewed by Pavel Stehule	2013-02-17 23:45:36 -05:00
Simon Riggs	c2f79ba269	Force archive_status of .done for xlogs created by dearchival/replication. This is a forward-patch of commit `6f4b8a4f4f`, applied to 9.2 back in August. The plan was to do something else in master, but it looks like it's not going to happen, so let's just apply the 9.2 solution to master as well. Fujii Masao	2013-02-15 19:28:06 +02:00
Tom Lane	fdaf44862b	Invent pre-commit/pre-prepare/pre-subcommit events for xact callbacks. Currently it's only possible for loadable modules to get control during post-commit cleanup of a transaction. That doesn't work too well if they want to do something that could throw an error; for example, an FDW might need to issue a remote commit, which could well fail. To improve matters, extend the existing APIs for XactCallback and SubXactCallback functions to provide new pre-commit events for this purpose. The release notes will need to mention that existing callback functions should be checked to make sure they don't do something unwanted when one of the new event types occurs. In the examples within our source tree, contrib/sepgsql was fine but plpgsql had been a bit too cute.	2013-02-14 20:35:08 -05:00
Tom Lane	71627f3d19	Fix CVE-2013-0255 properly. Revert commit `ab0f7b6089` (in HEAD only) in favor of the proper solution, which is to declare enum_recv() correctly in the system catalogs. It should be declared to take type "internal" not "cstring". Also improve the type_sanity regression test, which should have caught this typo, so that it actually would. Most of the relevant checks on the signature of type I/O functions should not have been restricted to basetypes/pseudotypes, as they should apply to any type's I/O functions.	2013-02-13 16:20:01 -05:00
Alvaro Herrera	0e81ddde2c	Rename "string" pstrdup argument to "in" The former name collides with a symbol also used in the isolation test's parser, causing assorted failures in certain platforms.	2013-02-12 12:43:09 -03:00
Alvaro Herrera	8396447cdb	Create libpgcommon, and move pg_malloc et al to it libpgcommon is a new static library to allow sharing code among the various frontend programs and backend; this lets us eliminate duplicate implementations of common routines. We avoid libpgport, because that's intended as a place for porting issues; per discussion, it seems better to keep them separate. The first use case, and the only implemented by this patch, is pg_malloc and friends, which many frontend programs were already using. At the same time, we can use this to provide palloc emulation functions for the frontend; this way, some palloc-using files in the backend can also be used by the frontend cleanly. To do this, we change palloc() in the backend to be a function instead of a macro on top of MemoryContextAlloc(). This was previously believed to cause loss of performance, but this implementation has been tweaked by Tom and Andres so that on modern compilers it provides a slight improvement over the previous one. This lets us clean up some places that were already with localized hacks. Most of the pg_malloc/palloc changes in this patch were authored by Andres Freund. Zoltán Böszörményi also independently provided a form of that. libpgcommon infrastructure was authored by Álvaro.	2013-02-12 11:21:05 -03:00
Peter Eisentraut	0cb1fac3b1	Add noreturn attributes to some error reporting functions	2013-02-12 07:13:22 -05:00
Heikki Linnakangas	62401db45c	Support unlogged GiST index. The reason this wasn't supported before was that GiST indexes need an increasing sequence to detect concurrent page-splits. In a regular WAL- logged GiST index, the LSN of the page-split record is used for that purpose, and in a temporary index, we can get away with a backend-local counter. Neither of those methods works for an unlogged relation. To provide such an increasing sequence of numbers, create a "fake LSN" counter that is saved and restored across shutdowns. On recovery, unlogged relations are blown away, so the counter doesn't need to survive that either. Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.	2013-02-11 23:07:09 +02:00
Heikki Linnakangas	7803e9327d	Include previous TLI in end-of-recovery and shutdown checkpoint records. This isn't used for anything but a sanity check at the moment, but it could be highly valuable for debugging purposes. It could also be used to recreate timeline history by traversing WAL, which seems useful.	2013-02-11 18:16:25 +02:00
Tom Lane	0fd0f3688b	Document and clean up gistsplit.c. Improve comments, rename some variables and functions, slightly simplify a couple of APIs, in an attempt to make this code readable by people other than its original author. Even though this is essentially just cosmetic, back-patch to all active branches, because otherwise it's going to make back-patching future fixes in this file very painful.	2013-02-10 11:58:15 -05:00
Tom Lane	c61e26ee3e	Add support for ALTER RULE ... RENAME TO. Ali Dar, reviewed by Dean Rasheed.	2013-02-08 23:58:40 -05:00
Alvaro Herrera	381d4b70a9	Clean up c.h / postgres.h after Assert() move Per Tom	2013-02-08 12:50:58 -03:00
Tom Lane	166d534fcd	Repair bugs in GiST page splitting code for multi-column indexes. When considering a non-last column in a multi-column GiST index, gistsplit.c tries to improve on the split chosen by the opclass-specific pickSplit function by considering penalties for the next column. However, there were two bugs in this code: it failed to recompute the union keys for the leftmost index columns, even though these might well change after reassigning tuples; and it included the old union keys in the recomputation for the columns it did recompute, so that those keys couldn't get smaller even if they should. The first problem could result in an invalid index in which searches wouldn't find index entries that are in fact present; the second would make the index less efficient to search. Both of these errors were caused by misuse of gistMakeUnionItVec, whose API was designed in a way that just begged such errors to be made. There is no situation in which it's safe or useful to compute the union keys for a subset of the index columns, and there is no caller that wants any previous union keys to be included in the computation; so the undocumented choice to treat the union keys as in/out rather than pure output parameters is a waste of code as well as being dangerous. Hence, rather than just making a minimal patch, I've changed the API of gistMakeUnionItVec to remove the "startkey" parameter (it now always processes all index columns) and treat the attr/isnull arrays as purely output parameters. In passing, also get rid of a couple of unnecessary and dangerous uses of static variables in gistutil.c. It's remarkable that the one in gistMakeUnionKey hasn't given us portability troubles before now, because in addition to posing a re-entrancy hazard, it was unsafely assuming that a static char[] array would have at least Datum alignment. Per investigation of a trouble report from Tomas Vondra. (There are also some bugs in contrib/btree_gist to be fixed, but that seems like material for a separate patch.) Back-patch to all supported branches.	2013-02-07 17:44:02 -05:00
Alvaro Herrera	5a1cd89f8f	Split out list of XLog resource managers The new rmgrlist.h header, containing all necessary data about built-in resource managers, allows other pieces of code to access them. In particular, this allows a future pg_xlogdump program to extract rm_desc function pointers, without having to keep a duplicate list of them.	2013-02-06 08:47:28 -03:00
Alvaro Herrera	e1d25de35a	Move Assert() definitions to c.h This way, they can be used by frontend and backend code. We already supported that, but doing it this way allows us to mix true frontend files with backend files compiled in frontend environment. Author: Andres Freund	2013-02-01 17:50:04 -03:00
Tom Lane	0900ac2d0d	Fix plpgsql's reporting of plan-time errors in possibly-simple expressions. exec_simple_check_plan and exec_eval_simple_expr attempted to call GetCachedPlan directly. This meant that if an error was thrown during planning, the resulting context traceback would not include the line normally contributed by _SPI_error_callback. This is already inconsistent, but just to be really odd, a re-execution of the very same expression would show the additional context line, because we'd already have cached the plan and marked the expression as non-simple. The problem is easy to demonstrate in 9.2 and HEAD because planning of a cached plan doesn't occur at all until GetCachedPlan is done. In earlier versions, it could only be an issue if initial planning had succeeded, then a replan was forced (already somewhat improbable for a simple expression), and the replan attempt failed. Since the issue is mainly cosmetic in older branches anyway, it doesn't seem worth the risk of trying to fix it there. It is worth fixing in 9.2 since the instability of the context printout can affect the results of GET STACKED DIAGNOSTICS, as per a recent discussion on pgsql-novice. To fix, introduce a SPI function that wraps GetCachedPlan while installing the correct callback function. Use this instead of calling GetCachedPlan directly from plpgsql. Also introduce a wrapper function for extracting a SPI plan's CachedPlanSource list. This lets us stop including spi_priv.h in pl_exec.c, which was never a very good idea from a modularity standpoint. In passing, fix a similar inconsistency that could occur in SPI_cursor_open, which was also calling GetCachedPlan without setting up a context callback.	2013-01-30 20:02:23 -05:00
Tom Lane	991f3e5ab3	Provide database object names as separate fields in error messages. This patch addresses the problem that applications currently have to extract object names from possibly-localized textual error messages, if they want to know for example which index caused a UNIQUE_VIOLATION failure. It adds new error message fields to the wire protocol, which can carry the name of a table, table column, data type, or constraint associated with the error. (Since the protocol spec has always instructed clients to ignore unrecognized field types, this should not create any compatibility problem.) Support for providing these new fields has been added to just a limited set of error reports (mainly, those in the "integrity constraint violation" SQLSTATE class), but we will doubtless add them to more calls in future. Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with additional hacking by Tom Lane.	2013-01-29 17:08:26 -05:00
Simon Riggs	fd4ced5230	Fast promote mode skips checkpoint at end of recovery. pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we can achieve very fast failover when the apply delay is low. Write new WAL record XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log readers. If we skip synchronous end of recovery checkpoint we request a normal spread checkpoint so that the window of re-recovery is low. Simon Riggs and Kyotaro Horiguchi, with input from Fujii Masao. Review by Heikki Linnakangas	2013-01-29 00:06:15 +00:00
Bruce Momjian	7e2322dff3	Allow CREATE TABLE IF EXIST so succeed if the schema is nonexistent Previously, CREATE TABLE IF EXIST threw an error if the schema was nonexistent. This was done by passing 'missing_ok' to the function that looks up the schema oid.	2013-01-26 13:24:50 -05:00
Tom Lane	0d5fbdc157	Change plan caching to honor, not resist, changes in search_path. In the initial implementation of plan caching, we saved the active search_path when a plan was first cached, then reinstalled that path anytime we needed to reparse or replan. The idea of that was to try to reselect the same referenced objects, in somewhat the same way that views continue to refer to the same objects in the face of schema or name changes. Of course, that analogy doesn't bear close inspection, since holding the search_path fixed doesn't cope with object drops or renames. Moreover sticking with the old path seems to create more surprises than it avoids. So instead of doing that, consider that the cached plan depends on search_path, and force reparse/replan if the active search_path is different than it was when we last saved the plan. This gets us fairly close to having "transparency" of plan caching, in the sense that the cached statement acts the same as if you'd just resubmitted the original query text for another execution. There are still some corner cases where this fails though: a new object added in the search path schema(s) might capture a reference in the query text, but we'd not realize that and force a reparse. We might try to fix that in the future, but for the moment it looks too expensive and complicated.	2013-01-25 14:14:41 -05:00
Andrew Dunstan	1068771abf	Use correct output device for Windows prompts. This ensures that mapping of non-ascii prompts to the correct code page occurs. Bug report and original patch from Alexander Law, reviewed and reworked by Noah Misch. Backpatch to all live branches.	2013-01-24 16:01:31 -05:00
Alvaro Herrera	74ebba84ae	Redefine HEAP_XMAX_IS_LOCKED_ONLY Tuples marked SELECT FOR UPDATE in a cluster that's later processed by pg_upgrade would have a different infomask bit pattern than those produced by 9.3dev; that bit pattern was being seen as "dead" by HEAD (because they would fail the "is this tuple locked" test, and so the visibility rules would thing they're updated, even though there's no HEAP_UPDATED version of them). In other words, some rows could silently disappear after pg_upgrade. With this new definition, those tuples become visible again. This is breakage resulting from my commit `0ac5ad5134`.	2013-01-24 16:10:02 -03:00
Alvaro Herrera	0ac5ad5134	Improve concurrency of foreign key locking This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov	2013-01-23 12:04:59 -03:00
Heikki Linnakangas	52906f175a	Implement pg_unreachable() on MSVC.	2013-01-23 12:53:55 +02:00
Heikki Linnakangas	990fe3c4ed	Fix more issues with cascading replication and timeline switches. When a standby server follows the master using WAL archive, and it chooses a new timeline (recovery_target_timeline='latest'), it only fetches the timeline history file for the chosen target timeline, not any other history files that might be missing from pg_xlog. For example, if the current timeline is 2, and we choose 4 as the new recovery target timeline, the history file for timeline 3 is not fetched, even if it's part of this server's history. That's enough for the standby itself - the history file for timeline 4 includes timeline 3 as well - but if a cascading standby server wants to recover to timeline 3, it needs the history file. To fix, when a new recovery target timeline is chosen, try to copy any missing history files from the archive to pg_xlog between the old and new target timeline. A second similar issue was with the WAL files. When a standby recovers from archive, and it reaches a segment that contains a switch to a new timeline, recovery fetches only the WAL file labelled with the new timeline's ID. The file from the new timeline contains a copy of the WAL from the old timeline up to the point where the switch happened, and recovery recovers it from the new file. But in streaming replication, walsender only tries to read it from the old timeline's file. To fix, change walsender to read it from the new file, so that it behaves the same as recovery in that sense, and doesn't try to open the possibly nonexistent file with the old timeline's ID.	2013-01-23 10:19:20 +02:00
Tom Lane	75b39e7909	Add infrastructure for storing a VARIADIC ANY function's VARIADIC flag. Originally we didn't bother to mark FuncExprs with any indication whether VARIADIC had been given in the source text, because there didn't seem to be any need for it at runtime. However, because we cannot fold a VARIADIC ANY function's arguments into an array (since they're not necessarily all the same type), we do actually need that information at runtime if VARIADIC ANY functions are to respond unsurprisingly to use of the VARIADIC keyword. Add the missing field, and also fix ruleutils.c so that VARIADIC ANY function calls are dumped properly. Extracted from a larger patch that also fixes concat() and format() (the only two extant VARIADIC ANY functions) to behave properly when VARIADIC is specified. This portion seems appropriate to review and commit separately. Pavel Stehule	2013-01-21 20:26:15 -05:00
Robert Haas	841a5150c5	Add ddl_command_end support for event triggers. Dimitri Fontaine, with slight changes by me	2013-01-21 18:00:24 -05:00
Alvaro Herrera	765cbfdc92	Refactor ALTER some-obj RENAME implementation Remove duplicate implementations of catalog munging and miscellaneous privilege checks. Instead rely on already existing data in objectaddress.c to do the work. Author: KaiGai Kohei, changes by me Reviewed by: Robert Haas, Álvaro Herrera, Dimitri Fontaine	2013-01-21 12:06:41 -03:00
Heikki Linnakangas	6f7cddc7ae	Now that START_REPLICATION returns the next timeline's ID after reaching end of timeline, take advantage of that in walreceiver. Startup process is still in control of choosign the target timeline, by scanning the timeline history files present in pg_xlog, but walreceiver now uses the next timeline's ID to fetch its history file immediately after it has finished streaming the old timeline. Before, the standby would first try to restart streaming on the old timeline, which fetches the missing timeline history file as a side-effect, and only then restart from the new timeline. This patch eliminates the extra iteration, which speeds up the timeline switch and reduces the noise in the log caused by the extra restart on the old timeline.	2013-01-18 11:59:34 +02:00
Heikki Linnakangas	2ff6555313	Use the right timeline when beginning to stream from master. The xlogreader refactoring broke the logic to decide which timeline to start streaming from. XLogPageRead() uses the timeline history to check which timeline the requested WAL position falls into. However, after the refactoring, XLogPageRead() is always first called with the first page in the segment, to verify the segment header, and only then with the actual WAL position we're interested in. That first read of the segment's header made XLogPageRead() to always start streaming from the old timeline containing the segment header, not the timeline containing the actual record, if there was a timeline switch within the segment. I thought I fixed this yesterday, but that fix was too narrow and only fixed this for the corner-case that the timeline switch happened in the first page of the segment. To fix this more robustly, pass explicitly the position of the record we're actually interested in to XLogPageRead, and use that to decide which timeline to read from, rather than deduce it from the page and offset. Per report from Fujii Masao.	2013-01-18 11:46:49 +02:00
Alvaro Herrera	279628a0a7	Accelerate end-of-transaction dropping of relations When relations are dropped, at end of transaction we need to remove the files and clean the buffer pool of buffers containing pages of those relations. Previously we would scan the buffer pool once per relation to clean up buffers. When there are many relations to drop, the repeated scans make this process slow; so we now instead pass a list of relations to drop and scan the pool once, checking each buffer against the passed list. When the number of relations is larger than a threshold (which as of this patch is being set to 20 relations) we sort the array before starting, and bsearch the array; when it's smaller, we simply scan the array linearly each time, because that's faster. The exact optimal threshold value depends on many factors, but the difference is not likely to be significant enough to justify making it user-settable. This has been measured to be a significant win (a 15x win when dropping 100,000 relations; an extreme case, but reportedly a real one). Author: Tomas Vondra, some tweaks by me Reviewed by: Robert Haas, Shigeru Hanada, Andres Freund, Álvaro Herrera	2013-01-17 16:13:17 -03:00
Heikki Linnakangas	0b6329130e	Make pg_receivexlog and pg_basebackup -X stream work across timeline switches. This mirrors the changes done earlier to the server in standby mode. When receivelog reaches the end of a timeline, as reported by the server, it fetches the timeline history file of the next timeline, and restarts streaming from the new timeline by issuing a new START_STREAMING command. When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the last segment on the old timeline. This helps you to tell apart a partial segment left in the directory because of a timeline switch, and a completed segment. If you just follow a single server, it won't make a difference, but it can be significant in more complicated scenarios where new WAL is still generated on the old timeline. This includes two small changes to the streaming replication protocol: First, when you reach the end of timeline while streaming, the server now sends the TLI of the next timeline in the server's history to the client. pg_receivexlog uses that as the next timeline, so that it doesn't need to parse the timeline history file like a standby server does. Second, when BASE_BACKUP command sends the begin and end WAL positions, it now also sends the timeline IDs corresponding the positions.	2013-01-17 20:23:00 +02:00
Heikki Linnakangas	9ee4d06f3f	Make GiST indexes on-disk compatible with 9.2 again. The patch that turned XLogRecPtr into a uint64 inadvertently changed the on-disk format of GiST indexes, because the NSN field in the GiST page opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format of that field back to the two-field struct that XLogRecPtr was before. This is the same we did to LSNs in the page header to avoid changing on-disk format. Bump catversion, as this invalidates any existing GiST indexes built on 9.3devel.	2013-01-17 16:46:16 +02:00
Alvaro Herrera	7fcbf6a405	Split out XLog reading as an independent facility This new facility can not only be used by xlog.c to carry out crash recovery, but also by external programs. By supplying a function to read XLog pages from somewhere, all the WAL reading can be used for completely different purposes. For the standard backend use, the behavior should be pretty much the same as previously. As for non-backend programs, an hypothetical pg_xlogdump program is now closer to reality, but some more backend support is still necessary. This patch was originally submitted by Andres Freund in a different form, but Heikki Linnakangas opted for and authored another design of the concept. Andres has advanced the patch since Heikki's initial version. Review and some (mostly cosmetics) changes by me.	2013-01-16 16:12:53 -03:00
Alvaro Herrera	7ac5760fa2	Rework order of checks in ALTER / SET SCHEMA When attempting to move an object into the schema in which it already was, for most objects classes we were correctly complaining about exactly that ("object is already in schema"); but for some other object classes, such as functions, we were instead complaining of a name collision ("object already exists in schema"). The latter is wrong and misleading, per complaint from Robert Haas in CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com To fix, refactor the way these checks are done. As a bonus, the resulting code is smaller and can also share some code with Rename cases. While at it, remove use of getObjectDescriptionOids() in error messages. These are normally disallowed because of translatability considerations, but this one had slipped through since 9.1. (Not sure that this is worth backpatching, though, as it would create some untranslated messages in back branches.) This is loosely based on a patch by KaiGai Kohei, heavily reworked by me.	2013-01-15 13:23:43 -03:00
Tom Lane	2065dd2834	Prevent very-low-probability PANIC during PREPARE TRANSACTION. The code in PostPrepare_Locks supposed that it could reassign locks to the prepared transaction's dummy PGPROC by deleting the PROCLOCK table entries and immediately creating new ones. This was safe when that code was written, but since we invented partitioning of the shared lock table, it's not safe --- another process could steal away the PROCLOCK entry in the short interval when it's on the freelist. Then, if we were otherwise out of shared memory, PostPrepare_Locks would have to PANIC, since it's too late to back out of the PREPARE at that point. Fix by inventing a dynahash.c function to atomically update a hashtable entry's key. (This might possibly have other uses in future.) This is an ancient bug that in principle we ought to back-patch, but the odds of someone hitting it in the field seem really tiny, because (a) the risk window is small, and (b) nobody runs servers with maxed-out lock tables for long, because they'll be getting non-PANIC out-of-memory errors anyway. So fixing it in HEAD seems sufficient, at least until the new code has gotten some testing.	2013-01-13 22:20:22 -05:00
Tom Lane	b853eb9718	Improve handling of ereport(ERROR) and elog(ERROR). In commit `71450d7fd6`, we added code to inform suitably-intelligent compilers that ereport() doesn't return if the elevel is ERROR or higher. This patch extends that to elog(), and also fixes a double-evaluation hazard that the previous commit created in ereport(), as well as reducing the emitted code size. The elog() improvement requires the compiler to support __VA_ARGS__, which should be available in just about anything nowadays since it's required by C99. But our minimum language baseline is still C89, so add a configure test for that. The previous commit assumed that ereport's elevel could be evaluated twice, which isn't terribly safe --- there are already counterexamples in xlog.c. On compilers that have __builtin_constant_p, we can use that to protect the second test, since there's no possible optimization gain if the compiler doesn't know the value of elevel. Otherwise, use a local variable inside the macros to prevent double evaluation. The local-variable solution is inferior because (a) it leads to useless code being emitted when elevel isn't constant, and (b) it increases the optimization level needed for the compiler to recognize that subsequent code is unreachable. But it seems better than not teaching non-gcc compilers about unreachability at all. Lastly, if the compiler has __builtin_unreachable(), we can use that instead of abort(), resulting in a noticeable code savings since no function call is actually emitted. However, it seems wise to do this only in non-assert builds. In an assert build, continue to use abort(), so that the behavior will be predictable and debuggable if the "impossible" happens. These changes involve making the ereport and elog macros emit do-while statement blocks not just expressions, which forces small changes in a few call sites. Andres Freund, Tom Lane, Heikki Linnakangas	2013-01-13 18:40:09 -05:00
Tom Lane	31f38f28b0	Redesign the planner's handling of index-descent cost estimation. Historically we've used a couple of very ad-hoc fudge factors to try to get the right results when indexes of different sizes would satisfy a query with the same number of index leaf tuples being visited. In commit `21a39de580` I tweaked one of these fudge factors, with results that proved disastrous for larger indexes. Commit `bf01e34b55` fudged it some more, but still with not a lot of principle behind it. What seems like a better way to address these issues is to explicitly model index-descent costs, since that's what's really at stake when considering diferent indexes with similar leaf-page-level costs. We tried that once long ago, and found that charging random_page_cost per page descended through was way too much, because upper btree levels tend to stay in cache in real-world workloads. However, there's still CPU costs to think about, and the previous fudge factors can be seen as a crude attempt to account for those costs. So this patch replaces those fudge factors with explicit charges for the number of tuple comparisons needed to descend the index tree, plus a small charge per page touched in the descent. The cost multipliers are chosen so that the resulting charges are in the vicinity of the historical (pre-9.2) fudge factors for indexes of up to about a million tuples, while not ballooning unreasonably beyond that, as the old fudge factor did (even more so in 9.2). To make this work accurately for btree indexes, add some code that allows extraction of the known root-page height from a btree. There's no equivalent number readily available for other index types, but we can use the log of the number of index pages as an approximate substitute. This seems like too much of a behavioral change to risk back-patching, but it should improve matters going forward. In 9.2 I'll just revert the fudge-factor change.	2013-01-11 12:56:58 -05:00
Magnus Hagander	940d136661	Centralize single quote escaping in src/port/quotes.c For code-reuse in upcoming functionality in pg_basebackup. Zoltan Boszormenyi	2013-01-05 15:40:19 +01:00
Tom Lane	94afbd5831	Invent a "one-shot" variant of CachedPlans for better performance. SPI_execute() and related functions create a CachedPlan, execute it once, and immediately discard it, so that the functionality offered by plancache.c is of no value in this code path. And performance measurements show that the extra data copying and invalidation checking done by plancache.c slows down simple queries by 10% or more compared to 9.1. However, enough of the SPI code is shared with functions that do need plan caching that it seems impractical to bypass plancache.c altogether. Instead, let's invent a variant version of cached plans that preserves 99% of the API but doesn't offer any of the actual functionality, nor the overhead. This puts SPI_execute() performance back on par, or maybe even slightly better, than it was before. This change should resolve recent complaints of performance degradation from Dong Ye, Pavel Stehule, and others. By avoiding data copying, this change also reduces the amount of memory needed to execute many-statement SPI_execute() strings, as for instance in a recent complaint from Tomas Vondra. An additional benefit of this change is that multi-statement SPI_execute() query strings are now processed fully serially, that is we complete execution of earlier statements before running parse analysis and planning on following ones. This eliminates a long-standing POLA violation, in that DDL that affects the behavior of a later statement will now behave as expected. Back-patch to 9.2, since this was a performance regression compared to 9.1. (In 9.2, place the added struct fields so as to avoid changing the offsets of existing fields.) Heikki Linnakangas and Tom Lane	2013-01-04 17:42:19 -05:00
Heikki Linnakangas	b0daba57bb	Tolerate timeline switches while "pg_basebackup -X fetch" is running. If you take a base backup from a standby server with "pg_basebackup -X fetch", and the timeline switches while the backup is being taken, the backup used to fail with an error "requested WAL segment %s has already been removed". This is because the server-side code that sends over the required WAL files would not construct the WAL filename with the correct timeline after a switch. Fix that by using readdir() to scan pg_xlog for all the WAL segments in the range, regardless of timeline. Also, include all timeline history files in the backup, if taken with "-X fetch". That fixes another related bug: If a timeline switch happened just before the backup was initiated in a standby, the WAL segment containing the initial checkpoint record contains WAL from the older timeline too. Recovery will not accept that without a timeline history file that lists the older timeline. Backpatch to 9.2. Versions prior to that were not affected as you could not take a base backup from a standby before 9.2.	2013-01-03 19:51:00 +02:00
Magnus Hagander	794397ae1d	Move tar function headers to pgtar.h This makes it possible to include them only where they are used, so we can avoid the conflict of the uid_t and gid_t datatypes that happened in plperl (since plperl doesn't need the tar functions)	2013-01-02 20:34:08 +01:00
Alvaro Herrera	dfbba2c86c	Make sure MaxBackends is always set Auxiliary and bootstrap processes weren't getting it, causing initdb to fail completely.	2013-01-02 14:39:11 -03:00
Alvaro Herrera	cdbc0ca48c	Fix background workers for EXEC_BACKEND Commit `da07a1e8` was broken for EXEC_BACKEND because I failed to realize that the MaxBackends recomputation needed to be duplicated by subprocesses in SubPostmasterMain. However, instead of having the value be recomputed at all, it's better to assign the correct value at postmaster initialization time, and have it be propagated to exec'ed backends via BackendParameters. MaxBackends stays as zero until after modules in shared_preload_libraries have had a chance to register bgworkers, since the value is going to be untrustworthy till that's finished. Heikki Linnakangas and Álvaro Herrera	2013-01-02 12:01:14 -03:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013 Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Magnus Hagander	f5d4bdd3a5	Unify some tar functionality across different parts Move some of the tar functionality that existed mostly duplicated in both pg_dump and the walsender basebackup functionality into port/tar.c instead, so it can be used from both. It will also be used by pg_basebackup in the future, which would've caused a third copy of it around. Zoltan Boszormenyi and Magnus Hagander	2013-01-01 18:15:57 +01:00
Heikki Linnakangas	60df192aea	Keep timeline history files restored from archive in pg_xlog. The cascading standby patch in 9.2 changed the way WAL files are treated when restored from the archive. Before, they were restored under a temporary filename, and not kept in pg_xlog, but after the patch, they were copied under pg_xlog. This is necessary for a cascading standby to find them, but it also means that if the archive goes offline and a standby is restarted, it can recover back to where it was using the files in pg_xlog. It also means that if you take an offline backup from a standby server, it includes all the required WAL files in pg_xlog. However, the same change was not made to timeline history files, so if the WAL segment containing the checkpoint record contains a timeline switch, you will still get an error if you try to restart recovery without the archive, or recover from an offline backup taken from the standby. With this patch, timeline history files restored from archive are copied into pg_xlog like WAL files are, so that pg_xlog contains all the files required to recover. This is a corner-case pre-existing issue in 9.2, but even more important in master where it's possible for a standby to follow a timeline switch through streaming replication. To make that possible, the timeline history files must be present in pg_xlog.	2012-12-30 14:29:45 +02:00
Robert Haas	82b1b213ca	Adjust more backend functions to return OID rather than void. This is again intended to support extensions to the event trigger functionality. This may go a bit further than we need for that purpose, but there's some value in being consistent, and the OID may be useful for other purposes also. Dimitri Fontaine	2012-12-29 07:55:37 -05:00
Alvaro Herrera	5ab3af46dd	Remove obsolete XLogRecPtr macros This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera	2012-12-28 13:06:15 -03:00
Alvaro Herrera	eaa1f7220a	Remove unused NextLogPage macro Commit `061e7efb1b` did away with its last caller, but neglected to remove the actual definition. Author: Andres Freund	2012-12-27 18:23:23 -03:00
Simon Riggs	c2b3218064	Update comments on rd_newRelfilenodeSubid. Ensure comments accurately reflect state of code given new understanding, and recent changes. Include example code from Noah Misch to illustrate how rd_newRelfilenodeSubid can be reset deterministically. No code changes.	2012-12-24 17:07:06 +00:00
Simon Riggs	42fa810c14	Fix more weird compiler messages caused by unmatched function prototypes. Andres Freund	2012-12-24 16:25:26 +00:00
Simon Riggs	2dcb2ebee2	Add function prototype from previous commit.	2012-12-24 09:18:42 +00:00
Robert Haas	c504513f83	Adjust many backend functions to return OID rather than void. Extracted from a larger patch by Dimitri Fontaine. It is hoped that this will provide infrastructure for enriching the new event trigger functionality, but it seems possibly useful for other purposes as well.	2012-12-23 18:37:58 -05:00
Tom Lane	31bc839724	Prevent failure when RowExpr or XmlExpr is parse-analyzed twice. transformExpr() is required to cope with already-transformed expression trees, for various ugly-but-not-quite-worth-cleaning-up reasons. However, some of its newer subroutines hadn't gotten the memo. This accounts for bug #7763 from Norbert Buchmuller: transformRowExpr() was overwriting the previously determined type of a RowExpr during CREATE TABLE LIKE INCLUDING INDEXES. Additional investigation showed that transformXmlExpr had the same kind of problem, but all the other cases seem to be safe. Andres Freund and Tom Lane	2012-12-23 14:07:24 -05:00
Heikki Linnakangas	d57a97343e	Forgot to remove extern declaration of GetRecoveryTargetTLI() Fujii Masao	2012-12-21 09:29:03 +02:00
Heikki Linnakangas	af275a12df	Follow TLI of last replayed record, not recovery target TLI, in walsenders. Most of the time, the last replayed record comes from the recovery target timeline, but there is a corner case where it makes a difference. When the startup process scans for a new timeline, and decides to change recovery target timeline, there is a window where the recovery target TLI has already been bumped, but there are no WAL segments from the new timeline in pg_xlog yet. For example, if we have just replayed up to point 0/30002D8, on timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog that contains the WAL up to that point. When recovery switches recovery target timeline to 2, a walsender can immediately try to read WAL from 0/30002D8, from timeline 2, so it will try to open WAL file 000000020000000000000003. However, that doesn't exist yet - the startup process hasn't copied that file from the archive yet nor has the walreceiver streamed it yet, so walsender fails with error "requested WAL segment 000000020000000000000003 has already been removed". That's harmless, in that the standby will try to reconnect later and by that time the segment is already created, but error messages that should be ignored are not good. To fix that, have walsender track the TLI of the last replayed record, instead of the recovery target timeline. That way walsender will not try to read anything from timeline 2, until the WAL segment has been created and at least one record has been replayed from it. The recovery target timeline is now xlog.c's internal affair, it doesn't need to be exposed in shared memory anymore. This fixes the error reported by Thom Brown. depesz the same error message, but I'm not sure if this fixes his scenario.	2012-12-20 14:39:04 +02:00
Tom Lane	6919b7e329	Fix failure to ignore leftover temp tables after a server crash. During crash recovery, we remove disk files belonging to temporary tables, but the system catalog entries for such tables are intentionally not cleaned up right away. Instead, the first backend that uses a temp schema is expected to clean out any leftover objects therein. This approach requires that we be careful to ignore leftover temp tables (since any actual access attempt would fail), even if their BackendId matches our session, if we have not yet established use of the session's corresponding temp schema. That worked fine in the past, but was broken by commit `debcec7dc3` which incorrectly removed the rd_islocaltemp relcache flag. Put it back, and undo various changes that substituted tests like "rel->rd_backend == MyBackendId" for use of a state-aware flag. Per trouble report from Heikki Linnakangas. Back-patch to 9.1 where the erroneous change was made. In the back branches, be careful to add rd_islocaltemp in a spot in the struct that was alignment padding before, so as not to break existing add-on code.	2012-12-17 20:15:32 -05:00
Andrew Dunstan	1c382655ad	Provide Assert() for frontend code. Per discussion on-hackers. psql is converted to use the new code. Follows a suggestion from Heikki Linnakangas.	2012-12-14 18:03:07 -05:00
Heikki Linnakangas	abfd192b1b	Allow a streaming replication standby to follow a timeline switch. Before this patch, streaming replication would refuse to start replicating if the timeline in the primary doesn't exactly match the standby. The situation where it doesn't match is when you have a master, and two standbys, and you promote one of the standbys to become new master. Promoting bumps up the timeline ID, and after that bump, the other standby would refuse to continue. There's significantly more timeline related logic in streaming replication now. First of all, when a standby connects to primary, it will ask the primary for any timeline history files that are missing from the standby. The missing files are sent using a new replication command TIMELINE_HISTORY, and stored in standby's pg_xlog directory. Using the timeline history files, the standby can follow the latest timeline present in the primary (recovery_target_timeline='latest'), just as it can follow new timelines appearing in an archive directory. START_REPLICATION now takes a TIMELINE parameter, to specify exactly which timeline to stream WAL from. This allows the standby to request the primary to send over WAL that precedes the promotion. The replication protocol is changed slightly (in a backwards-compatible way although there's little hope of streaming replication working across major versions anyway), to allow replication to stop when the end of timeline reached, putting the walsender back into accepting a replication command. Many thanks to Amit Kapila for testing and reviewing various versions of this patch.	2012-12-13 19:17:32 +02:00
Heikki Linnakangas	527668717a	Make xlog_internal.h includable in frontend context. This makes unnecessary the ugly hack used to #include postgres.h in pg_basebackup. Based on Alvaro Herrera's patch	2012-12-13 14:59:13 +02:00
Kevin Grittner	b19e4250b4	Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck	2012-12-11 14:33:08 -06:00
Tom Lane	a99c42f291	Support automatically-updatable views. This patch makes "simple" views automatically updatable, without the need to create either INSTEAD OF triggers or INSTEAD rules. "Simple" views are those classified as updatable according to SQL-92 rules. The rewriter transforms INSERT/UPDATE/DELETE commands on such views directly into an equivalent command on the underlying table, which will generally have noticeably better performance than is possible with either triggers or user-written rules. A view that has INSTEAD OF triggers or INSTEAD rules continues to operate the same as before. For the moment, security_barrier views are not considered simple. Also, we do not support WITH CHECK OPTION. These features may be added in future. Dean Rasheed, reviewed by Amit Kapila	2012-12-08 18:26:21 -05:00
Alvaro Herrera	da07a1e856	Background worker processes Background workers are postmaster subprocesses that run arbitrary user-specified code. They can request shared memory access as well as backend database connections; or they can just use plain libpq frontend database connections. Modules listed in shared_preload_libraries can register background workers in their _PG_init() function; this is early enough that it's not necessary to provide an extra GUC option, because the necessary extra resources can be allocated early on. Modules can install more than one bgworker, if necessary. Care is taken that these extra processes do not interfere with other postmaster tasks: only one such process is started on each ServerLoop iteration. This means a large number of them could be waiting to be started up and postmaster is still able to quickly service external connection requests. Also, shutdown sequence should not be impacted by a worker process that's reasonably well behaved (i.e. promptly responds to termination signals.) The current implementation lets worker processes specify their start time, i.e. at what point in the server startup process they are to be started: right after postmaster start (in which case they mustn't ask for shared memory access), when consistent state has been reached (useful during recovery in a HOT standby server), or when recovery has terminated (i.e. when normal backends are allowed). In case of a bgworker crash, actions to take depend on registration data: if shared memory was requested, then all other connections are taken down (as well as other bgworkers), just like it were a regular backend crashing. The bgworker itself is restarted, too, within a configurable timeframe (which can be configured to be never). More features to add to this framework can be imagined without much effort, and have been discussed, but this seems good enough as a useful unit already. An elementary sample module is supplied. Author: Álvaro Herrera This patch is loosely based on prior patches submitted by KaiGai Kohei, and unsubmitted code by Simon Riggs. Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund, Heikki Linnakangas, Simon Riggs, Amit Kapila	2012-12-06 17:47:30 -03:00
Heikki Linnakangas	32f4de0adf	Write exact xlog position of timeline switch in the timeline history file. This allows us to do some more rigorous sanity checking for various incorrect point-in-time recovery scenarios, and provides more information for debugging purposes. It will also come handy in the upcoming patch to allow timeline switches to be replicated by streaming replication.	2012-12-04 17:29:07 +02:00
Heikki Linnakangas	5ce108bf32	Track the timeline associated with minRecoveryPoint, for more sanity checks. This allows recovery to notice certain incorrect recovery scenarios. If a server has recovered to point X on timeline 5, and you restart recovery, it better be on timeline 5 when it reaches point X again, not on some timeline with a higher ID. This can happen e.g if you a standby server is shut down, a new timeline appears in the WAL archive, and the standby server is restarted. It will try to follow the new timeline, which is wrong because some WAL on the old timeline was already replayed before shutdown. Requires an initdb (or at least pg_resetxlog), because this adds a field to the control file.	2012-12-04 11:31:00 +02:00
Peter Eisentraut	aa2fec0a18	Add support for LDAP URLs Allow specifying LDAP authentication parameters as RFC 4516 LDAP URLs.	2012-12-03 23:31:02 -05:00
Simon Riggs	f21bb9cfb5	Refactor inCommit flag into generic delayChkpt flag. Rename PGXACT->inCommit flag into delayChkpt flag, and generalise comments to allow use in other situations, such as the forthcoming potential use in checksum patch. Replace wait loop to look for VXIDs with delayChkpt set. No user visible changes, not behaviour changes at present. Simon Riggs, reviewed and rebased by Jeff Davis	2012-12-03 13:13:53 +00:00
Simon Riggs	5457a130d3	Reduce scope of changes for COPY FREEZE. Allow support only for freezing tuples by explicit command. Previous coding mistakenly extended slightly beyond what was agreed as correct on -hackers. So essentially a partial revoke of earlier work, leaving just the COPY FREEZE command.	2012-12-02 20:52:52 +00:00
Tom Lane	3114cb60a1	Don't advance checkPoint.nextXid near the end of a checkpoint sequence. This reverts commit `c11130690d` in favor of actually fixing the problem: namely, that we should never have been modifying the checkpoint record's nextXid at this point to begin with. The nextXid should match the state as of the checkpoint's logical WAL position (ie the redo point), not the state as of its physical position. It's especially bogus to advance it in some wal_levels and not others. In any case there is no need for the checkpoint record to carry the same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by LogStandbySnapshot, as any replay operation will already have adopted that value as current. This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug #6291 from Daniel Farina, in that if a checkpoint were in progress at the instant of XID wraparound, the epoch bump would be lost as reported. (And, of course, these days there's at least a 50-50 chance of a checkpoint being in progress at any given instant.) Diagnosed by me and independently by Andres Freund. Back-patch to all branches supporting hot standby.	2012-12-02 15:20:41 -05:00
Simon Riggs	5c11725867	Rearrange storage of data in xl_running_xacts. Previously we stored all xids mixed together. Now we store top-level xids first, followed by all subxids. Also skip logging any subxids if the snapshot is suboverflowed, since there are potentially large numbers of them and they are not useful in that case anyway. Has value in the envisaged design for decoding of WAL. No planned effect on Hot Standby. Andres Freund, reviewed by me	2012-12-02 19:39:37 +00:00
Tom Lane	7b90469b71	Allow adding values to an enum type created in the current transaction. Normally it is unsafe to allow ALTER TYPE ADD VALUE in a transaction block, because instances of the value could be added to indexes later in the same transaction, and then they would still be accessible even if the transaction rolls back. However, we can allow this if the enum type itself was created in the current transaction, because then any such indexes would have to go away entirely on rollback. The reason for allowing this is to support pg_upgrade's new usage of pg_restore --single-transaction: in --binary-upgrade mode, pg_dump emits enum types as a succession of ALTER TYPE ADD VALUE commands so that it can preserve the values' OIDs. The support is a bit limited, so we'll leave it undocumented. Andres Freund	2012-12-01 14:27:30 -05:00
Simon Riggs	8de72b66a2	COPY FREEZE and mark committed on fresh tables. When a relfilenode is created in this subtransaction or a committed child transaction and it cannot otherwise be seen by our own process, mark tuples committed ahead of transaction commit for all COPY commands in same transaction. If FREEZE specified on COPY and pre-conditions met then rows will also be frozen. Both options designed to avoid revisiting rows after commit, increasing performance of subsequent commands after data load and upgrade. pg_restore changes later. Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design input from Tom Lane, Robert Haas and Kevin Grittner	2012-12-01 12:54:20 +00:00
Tom Lane	4af446e7cd	Produce a more useful error message for over-length Unix socket paths. The length of a socket path name is constrained by the size of struct sockaddr_un, and there's not a lot we can do about it since that is a kernel API. However, it would be a good thing if we produced an intelligible error message when the user specifies a socket path that's too long --- and getaddrinfo's standard API is too impoverished to do this in the natural way. So insert explicit tests at the places where we construct a socket path name. Now you'll get an error that makes sense and even tells you what the limit is, rather than something generic like "Non-recoverable failure in name resolution". Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.	2012-11-29 19:57:01 -05:00
Simon Riggs	f1e57a4ec9	Cleanup VirtualXact at end of Hot Standby.	2012-11-29 21:59:11 +00:00
Robert Haas	7a2fe9bd03	Basic binary heap implementation. There are probably other places where this can be used, but for now, this just makes MergeAppend use it, so that this code will have test coverage. There is other work in the queue that will use this, as well. Abhijit Menon-Sen, reviewed by Andres Freund, Robert Haas, Álvaro Herrera, Tom Lane, and others.	2012-11-29 11:16:59 -05:00
Tom Lane	3c84046490	Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit `8cb53654db`, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.	2012-11-28 21:26:01 -05:00
Alvaro Herrera	1577b46b7c	Split out rmgr rm_desc functions into their own files This is necessary (but not sufficient) to have them compilable outside of a backend environment.	2012-11-28 13:01:15 -03:00
Tom Lane	e78d288c89	Add explicit casts in ilist.h's inline functions. Needed to silence C++ errors, per report from Peter Eisentraut. Andres Freund	2012-11-27 10:58:37 -05:00
Heikki Linnakangas	1f67078ea3	Add OpenTransientFile, with automatic cleanup at end-of-xact. Files opened with BasicOpenFile or PathNameOpenFile are not automatically cleaned up on error. That puts unnecessary burden on callers that only want to keep the file open for a short time. There is AllocateFile, but that returns a buffered FILE * stream, which in many cases is not the nicest API to work with. So add function called OpenTransientFile, which returns a unbuffered fd that's cleaned up like the FILE* returned by AllocateFile(). This plugs a few rare fd leaks in error cases: 1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile 2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't use OpenTransientFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenTransientFile instead of PathNameOpenFile. In addition to plugging those leaks, this replaces many BasicOpenFile() calls with OpenTransientFile() that were not leaking, because the code meticulously closed the file on error. That wasn't strictly necessary, but IMHO it's good for robustness. The same leaks exist in older versions, but given the rarity of the issues, I'm not backpatching this. Not yet, anyway - it might be good to backpatch later, after this mechanism has had some more testing in master branch.	2012-11-27 10:25:50 +02:00
Tom Lane	532994299e	Revert patch for taking fewer snapshots. This reverts commit `d573e239f0`, "Take fewer snapshots". While that seemed like a good idea at the time, it caused execution to use a snapshot that had been acquired before locking any of the tables mentioned in the query. This created user-visible anomalies that were not present in any prior release of Postgres, as reported by Tomas Vondra. While this whole area could do with a redesign (since there are related cases that have anomalies anyway), it doesn't seem likely that any future patch would be reasonably back-patchable; and we don't want 9.2 to exhibit a behavior that's subtly unlike either past or future releases. Hence, revert to prior code while we rethink the problem.	2012-11-26 15:55:43 -05:00
Tom Lane	d3237e04ca	Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees. In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is pretty useless (there being only one output row), but nonetheless it shouldn't fail. But it could fail if "tab" is an inheritance parent, because planagg.c's code for fixing up equivalence classes after making the index-optimized MIN/MAX transformation wasn't prepared to find child-table versions of the aggregate expression. The least ugly fix seems to be to add an option to mutate_eclass_expressions() to skip child-table equivalence class members, which aren't used anymore at this stage of planning so it's not really necessary to fix them. Since child members are ignored in many cases already, it seems plausible for mutate_eclass_expressions() to have an option to ignore them too. Per bug #7703 from Maxim Boguk. Back-patch to 9.1. Although the same code exists before that, it cannot encounter child-table aggregates AFAICS, because the index optimization transformation cannot succeed on inheritance trees before 9.1 (for lack of MergeAppend).	2012-11-26 12:57:58 -05:00
Heikki Linnakangas	644a0a6379	Fix archive_cleanup_command. When I moved ExecuteRecoveryCommand() from xlog.c to xlogarchive.c, I didn't realize that it's called from the checkpoint process, not the startup process. I tried to use InRedo variable to decide whether or not to attempt cleaning up the archive (must not do so before we have read the initial checkpoint record), but that variable is only valid within the startup process. Instead, let ExecuteRecoveryCommand() always clean up the archive, and add an explicit argument to RestoreArchivedFile() to say whether that's allowed or not. The caller knows better. Reported by Erik Rijkers, diagnosis by Fujii Masao. Only 9.3devel is affected.	2012-11-19 10:14:20 +02:00
Tom Lane	3bbf668de9	Fix multiple problems in WAL replay. Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.	2012-11-12 22:05:53 -05:00
Heikki Linnakangas	dbdf9679d7	Use correct text domain for translating errcontext() messages. errcontext() is typically used in an error context callback function, not within an ereport() invocation like e.g errmsg and errdetail are. That means that the message domain that the TEXTDOMAIN magic in ereport() determines is not the right one for the errcontext() calls. The message domain needs to be determined by the C file containing the errcontext() call, not the file containing the ereport() call. Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use for the errcontext message. "errcontext" was used in a few places as a variable or struct field name, I had to rename those out of the way, now that errcontext is a macro. We've had this problem all along, but this isn't doesn't seem worth backporting. It's a fairly minor issue, and turning errcontext from a function to a macro requires at least a recompile of any external code that calls errcontext().	2012-11-12 17:07:29 +02:00
Heikki Linnakangas	c9d44a75d4	Silence "expression result unused" warnings in AssertVariableIsOfTypeMacro At least clang 3.1 generates those warnings. Prepend (void) to avoid them, like we have in AssertMacro.	2012-11-12 15:02:40 +02:00
Tom Lane	3e7fdcffd6	Fix WaitLatch() to return promptly when the requested timeout expires. If the sleep is interrupted by a signal, we must recompute the remaining time to wait; otherwise, a steady stream of non-wait-terminating interrupts could delay return from WaitLatch indefinitely. This has been shown to be a problem for the autovacuum launcher, and there may well be other places now or in the future with similar issues. So we'd better make the function robust, even though this'll add at least one gettimeofday call per wait. Back-patch to 9.2. We might eventually need to fix 9.1 as well, but the code is quite different there, and the usage of WaitLatch in 9.1 is so limited that it's not clearly important to do so. Reported and diagnosed by Jeff Janes, though I rewrote his patch rather heavily.	2012-11-08 20:04:48 -05:00
Tom Lane	dcc55dd21a	Rename ResolveNew() to ReplaceVarsFromTargetList(), and tweak its API. This function currently lacks the option to throw error if the provided targetlist doesn't have any matching entry for a Var to be replaced. Two of the four existing call sites would be better off with an error, as would the usage in the pending auto-updatable-views patch, so it seems past time to extend the API to support that. To do so, replace the "event" parameter (historically of type CmdType, though it was declared plain int) with a special-purpose enum type. It's unclear whether this function might be called by third-party code. Since many C compilers wouldn't warn about a call site continuing to use the old calling convention, rename the function to forcibly break any such code that hasn't been updated. The old name was none too well chosen anyhow.	2012-11-08 16:52:49 -05:00
Bruce Momjian	aa69670e42	Add URLs to document why DLLIMPORT is needed on Windows. Per email from Craig Ringer	2012-11-07 15:01:25 -05:00
Heikki Linnakangas	add6c3179a	Make the streaming replication protocol messages architecture-independent. We used to send structs wrapped in CopyData messages, which works as long as the client and server agree on things like endianess, timestamp format and alignment. That's good enough for running a standby server, which has to run on the same platform anyway, but it's useful for tools like pg_receivexlog to work across platforms. This breaks protocol compatibility of streaming replication, but we never promised that to be compatible across versions, anyway.	2012-11-07 19:09:13 +02:00
Tom Lane	5ed6546cf7	Fix handling of inherited check constraints in ALTER COLUMN TYPE. This case got broken in 8.4 by the addition of an error check that complains if ALTER TABLE ONLY is used on a table that has children. We do use ONLY for this situation, but it's okay because the necessary recursion occurs at a higher level. So we need to have a separate flag to suppress recursion without making the error check. Reported and patched by Pavan Deolasee, with some editorial adjustments by me. Back-patch to 8.4, since this is a regression of functionality that worked in earlier branches.	2012-11-05 13:36:16 -05:00
Alvaro Herrera	04f28bdb84	Fix ALTER EXTENSION / SET SCHEMA In its original conception, it was leaving some objects into the old schema, but without their proper pg_depend entries; this meant that the old schema could be dropped, causing future pg_dump calls to fail on the affected database. This was originally reported by Jeff Frost as #6704; there have been other complaints elsewhere that can probably be traced to this bug. To fix, be more consistent about altering a table's subsidiary objects along the table itself; this requires some restructuring in how tables are relocated when altering an extension -- hence the new AlterTableNamespaceInternal routine which encapsulates it for both the ALTER TABLE and the ALTER EXTENSION cases. There was another bug lurking here, which was unmasked after fixing the previous one: certain objects would be reached twice via the dependency graph, and the second attempt to move them would cause the entire operation to fail. Per discussion, it seems the best fix for this is to do more careful tracking of objects already moved: we now maintain a list of moved objects, to avoid attempting to do it twice for the same object. Authors: Alvaro Herrera, Dimitri Fontaine Reviewed by Tom Lane	2012-10-31 10:52:55 -03:00
Kevin Grittner	6868ed7491	Throw error if expiring tuple is again updated or deleted. This prevents surprising behavior when a FOR EACH ROW trigger BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or deletes the the old row. Prior to this patch the requested action on the row could be silently ignored while all triggered actions based on the occurence of the requested action could be committed. One example of how this could happen is if the BEFORE DELETE trigger for a "parent" row deleted "children" which had trigger functions to update summary or status data on the parent. This also prevents similar surprising problems if the query has a volatile function which updates a target row while it is already being updated. There are related issues present in FOR UPDATE cursors and READ COMMITTED queries which are not handled by this patch. These issues need further evalution to determine what change, if any, is needed. Where the new error messages are generated, in most cases the best fix will be to move code from the BEFORE trigger to an AFTER trigger. Where this is not feasible, the trigger can avoid the error by re-issuing the triggering statement and returning NULL. Documentation changes will be submitted in a separate patch. Kevin Grittner and Tom Lane with input from Florian Pflug and Robert Haas, based on problems encountered during conversion of Wisconsin Circuit Court trigger logic to plpgsql triggers.	2012-10-26 14:55:36 -05:00
Tom Lane	a4e8680a6c	When converting a table to a view, remove its system columns. Views should not have any pg_attribute entries for system columns. However, we forgot to remove such entries when converting a table to a view. This could lead to crashes later on, if someone attempted to reference such a column, as reported by Kohei KaiGai. Patch in HEAD only. This bug has been there forever, but in the back branches we will have to defend against existing mis-converted views, so it doesn't seem worthwhile to change the conversion code too.	2012-10-24 13:39:37 -04:00
Alvaro Herrera	f4c4335a4a	Add context info to OAT_POST_CREATE security hook ... and have sepgsql use it to determine whether to check permissions during certain operations. Indexes that are being created as a result of REINDEX, for instance, do not need to have their permissions checked; they were already checked when the index was created. Author: KaiGai Kohei, slightly revised by me	2012-10-23 18:24:24 -03:00
Tom Lane	dc5aeca168	Remove unnecessary "head" arguments from some dlist/slist functions. dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after do not need access to the list header, and indeed insisting on that negates one of the main advantages of a doubly-linked list. In consequence, revert addition of "cache_bucket" field to CatCTup.	2012-10-18 19:04:20 -04:00
Tom Lane	8f8d746478	Code review for inline-list patch. Make foreach macros less syntactically dangerous, and fix some typos in evidently-never-tested ones. Add missing slist_next_node and slist_head_node functions. Fix broken dlist_check code. Assorted comment improvements.	2012-10-18 16:47:07 -04:00
Tom Lane	72a4231f0c	Fix planning of non-strict equivalence clauses above outer joins. If a potential equivalence clause references a variable from the nullable side of an outer join, the planner needs to take care that derived clauses are not pushed to below the outer join; else they may use the wrong value for the variable. (The problem arises only with non-strict clauses, since if an upper clause can be proven strict then the outer join will get simplified to a plain join.) The planner attempted to prevent this type of error by checking that potential equivalence clauses aren't outerjoin-delayed as a whole, but actually we have to check each side separately, since the two sides of the clause will get moved around separately if it's treated as an equivalence. Bugs of this type can be demonstrated as far back as 7.4, even though releases before 8.3 had only a very ad-hoc notion of equivalence clauses. In addition, we neglected to account for the possibility that such clauses might have nonempty nullable_relids even when not outerjoin-delayed; so the equivalence-class machinery lacked logic to compute correct nullable_relids values for clauses it constructs. This oversight was harmless before 9.2 because we were only using RestrictInfo.nullable_relids for OR clauses; but as of 9.2 it could result in pushing constructed equivalence clauses to incorrect places. (This accounts for bug #7604 from Bill MacArthur.) Fix the first problem by adding a new test check_equivalence_delay() in distribute_qual_to_rels, and fix the second one by adding code in equivclass.c and called functions to set correct nullable_relids for generated clauses. Although I believe the second part of this is not currently necessary before 9.2, I chose to back-patch it anyway, partly to keep the logic similar across branches and partly because it seems possible we might find other reasons why we need valid values of nullable_relids in the older branches. Add regression tests illustrating these problems. In 9.0 and up, also add test cases checking that we can push constants through outer joins, since we've broken that optimization before and I nearly broke it again with an overly simplistic patch for this problem.	2012-10-18 12:30:10 -04:00
Tom Lane	ff3f9c8de5	Close un-owned SMgrRelations at transaction end. If an SMgrRelation is not "owned" by a relcache entry, don't allow it to live past transaction end. This design allows the same SMgrRelation to be used for blind writes of multiple blocks during a transaction, but ensures that we don't hold onto such an SMgrRelation indefinitely. Because an SMgrRelation typically corresponds to open file descriptors at the fd.c level, leaving it open when there's no corresponding relcache entry can mean that we prevent the kernel from reclaiming deleted disk space. (While CacheInvalidateSmgr messages usually fix that, there are cases where they're not issued, such as DROP DATABASE. We might want to add some more sinval messaging for that, but I'd be inclined to keep this type of logic anyway, since allowing VFDs to accumulate indefinitely for blind-written relations doesn't seem like a good idea.) This code replaces a previous attempt towards the same goal that proved to be unreliable. Back-patch to 9.1 where the previous patch was added.	2012-10-17 12:38:21 -04:00
Tom Lane	9bacf0e373	Revert "Use "transient" files for blind writes, take 2". This reverts commit `fba105b109`. That approach had problems with the smgr-level state not tracking what we really want to happen, and with the VFD-level state not tracking the smgr-level state very well either. In consequence, it was still possible to hold kernel file descriptors open for long-gone tables (as in recent report from Tore Halset), and yet there were also cases of FDs being closed undesirably soon. A replacement implementation will follow.	2012-10-17 12:37:08 -04:00
Alvaro Herrera	a66ee69add	Embedded list interface Provide a common implementation of embedded singly-linked and doubly-linked lists. "Embedded" in the sense that the nodes' next/previous pointers exist within some larger struct; this design choice reduces memory allocation overhead. Most of the implementation uses inlineable functions (where supported), for performance. Some existing uses of both types of lists have been converted to the new code, for demonstration purposes. Other uses can (and probably will) be converted in the future. Since dllist.c is unused after this conversion, it has been removed. Author: Andres Freund Some tweaks by me Reviewed by Tom Lane, Peter Geoghegan	2012-10-17 11:31:20 -03:00
Heikki Linnakangas	ff6c78c480	Remove comment that is no longer true. AddToDataDirLockFile() supports out-of-order updates of the lockfile nowadays.	2012-10-15 11:03:39 +03:00
Tom Lane	e81e8f9342	Split up process latch initialization for more-fail-soft behavior. In the previous coding, new backend processes would attempt to create their self-pipe during the OwnLatch call in InitProcess. However, pipe creation could fail if the kernel is short of resources; and the system does not recover gracefully from a FATAL error right there, since we have armed the dead-man switch for this process and not yet set up the on_shmem_exit callback that would disarm it. The postmaster then forces an unnecessary database-wide crash and restart, as reported by Sean Chittenden. There are various ways we could rearrange the code to fix this, but the simplest and sanest seems to be to split out creation of the self-pipe into a new function InitializeLatchSupport, which must be called from a place where failure is allowed. For most processes that gets called in InitProcess or InitAuxiliaryProcess, but processes that don't call either but still use latches need their own calls. Back-patch to 9.1, which has only a part of the latch logic that 9.2 and HEAD have, but nonetheless includes this bug.	2012-10-14 22:59:56 -04:00
Tom Lane	a29f7ed554	Get rid of COERCE_DONTCARE. We don't need this hack any more.	2012-10-12 13:35:00 -04:00
Tom Lane	71e58dcfb9	Make equal() ignore CoercionForm fields for better planning with casts. This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it. Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.) Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.	2012-10-12 12:11:22 -04:00
Heikki Linnakangas	6f60fdd701	Improve replication connection timeouts. Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. Now with the two settings, both sides of a replication connection will detect a broken connection similarly. It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end. Amit Kapila, heavily edited by me.	2012-10-11 17:48:08 +03:00
Tom Lane	a80889a735	Set procost to 10 for each of the pg_foo_is_visible() functions. The idea here is to make sure the planner will evaluate these functions last not first among the filter conditions in psql pattern search and tab-completion queries. We've discussed this several times, and there was consensus to do it back in August, but we didn't want to do it just before a release. Now seems like a safer time. No catversion bump, since this catalog change doesn't create a backend incompatibility nor any regression test result changes.	2012-10-10 12:19:25 -04:00
Tom Lane	7e0cce0265	Remove unnecessary overhead in backend's large-object operations. Do read/write permissions checks at most once per large object descriptor, not once per lo_read or lo_write call as before. The repeated tests were quite useless in the read case since the snapshot-based tests were guaranteed to produce the same answer every time. In the write case, the extra tests could in principle detect revocation of write privileges after a series of writes has started --- but there's a race condition there anyway, since we'd check privileges before performing and certainly before committing the write. So there's no real advantage to checking every single time, and we might as well redefine it as "only check the first time". On the same reasoning, remove the LargeObjectExists checks in inv_write and inv_truncate. We already checked existence when the descriptor was opened, and checking again doesn't provide any real increment of safety that would justify the cost.	2012-10-09 16:38:00 -04:00
Alvaro Herrera	f46baf601d	Rename USE_INLINE to PG_USE_INLINE The former name was too likely to conflict with symbols from external headers; and, as seen in recent buildfarm failures in member spoonbill, it has now happened at least in plpython.	2012-10-09 11:17:33 -03:00
Tom Lane	26fe56481c	Code review for 64-bit-large-object patch. Fix broken-on-bigendian-machines byte-swapping functions, add missed update of alternate regression expected file, improve error reporting, remove some unnecessary code, sync testlo64.c with current testlo.c (it seems to have been cloned from a very old copy of that), assorted cosmetic improvements.	2012-10-08 18:24:32 -04:00
Alvaro Herrera	976fa10d20	Add support for easily declaring static inline functions We already had those, but they forced modules to spell out the function bodies twice. Eliminate some duplicates we had already grown. Extracted from a somewhat larger patch from Andres Freund.	2012-10-08 16:28:01 -03:00
Robert Haas	08c8058ce9	Add #define for UUIDOID. Phil Sorber and Thom Brown. Reviewed by Albe Laurenz.	2012-10-08 10:15:15 -04:00
Tom Lane	95d035e66d	Autoconfiscate selection of 64-bit int type for 64-bit large object API. Get rid of the fundamentally indefensible assumption that "long long int" exists and is exactly 64 bits wide on every platform Postgres runs on. Instead let the configure script select the type to use for "pg_int64". This is a bit of a pain in the rear since we do not want to pollute client namespace with all the random symbols that pg_config.h defines; instead we have to create a separate generated header file, "pg_config_ext.h". But now that the infrastructure is there, we might have the ability to add some other stuff that's long been wanting in this area.	2012-10-07 21:52:43 -04:00
Tatsuo Ishii	7e2f8ed2b0	Fix compiling errors on Windows platform. Fix wrong usage of INT64CONST macro. Fix lo_hton64 and lo_ntoh64 not to use int32_t and uint32_t.	2012-10-07 23:30:31 +09:00
Tatsuo Ishii	b51a65f5bf	Bump up catalog vesion due to 64-bit large object API functions addition.	2012-10-07 09:36:20 +09:00
Tatsuo Ishii	461ef73f09	Add API for 64-bit large object access. Now users can access up to 4TB large objects (standard 8KB BLCKSZ case). For this purpose new libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added. Also corresponding new backend functions lo_lseek64, lo_tell64 and lo_truncate64 are added. inv_api.c is changed to handle 64-bit offsets. Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata (frontend side, docs, regression tests and example program). Reviewed by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.	2012-10-07 08:36:48 +09:00
Heikki Linnakangas	fd5942c18f	Use the regular main processing loop also in walsenders. The regular backend's main loop handles signal handling and error recovery better than the current WAL sender command loop does. For example, if the client hangs and a SIGTERM is received before starting streaming, the walsender will now terminate immediately, rather than hang until the connection times out.	2012-10-05 17:21:12 +03:00
Tom Lane	fb34e94d21	Support CREATE SCHEMA IF NOT EXISTS. Per discussion, schema-element subcommands are not allowed together with this option, since it's not very obvious what should happen to the element objects. Fabrízio de Royes Mello	2012-10-03 19:47:11 -04:00
Alvaro Herrera	994c36e01d	refactor ALTER some-obj SET OWNER implementation Remove duplicate implementation of catalog munging and miscellaneous privilege and consistency checks. Instead rely on already existing data in objectaddress.c to do the work. Author: KaiGai Kohei Tweaked by me Reviewed by Robert Haas	2012-10-03 18:07:46 -03:00
Alvaro Herrera	2164f9a125	Refactor "ALTER some-obj SET SCHEMA" implementation Instead of having each object type implement the catalog munging independently, centralize knowledge about how to do it and expand the existing table in objectaddress.c with enough data about each object type to support this operation. Author: KaiGai Kohei Tweaks by me Reviewed by Robert Haas	2012-10-02 18:13:54 -03:00
Heikki Linnakangas	d5497b95f3	Split off functions related to timeline history files and XLOG archiving. This is just refactoring, to make the functions accessible outside xlog.c. A followup patch will make use of that, to allow fetching timeline history files over streaming replication.	2012-10-02 13:37:19 +03:00
Tom Lane	0d0aa5d291	Provide some static-assertion functionality on all compilers. On reflection (especially after noticing how many buildfarm critters have __builtin_types_compatible_p but not _Static_assert), it seems like we ought to try a bit harder to make these macros do something everywhere. The initial cut at it would have been no help to code that is compiled only on platforms without _Static_assert, for instance; and in any case not all our contributors do their initial coding on the latest gcc version. Some googling about static assertions turns up quite a bit of prior art for making it work in compilers that lack _Static_assert. The method that seems closest to our needs involves defining a struct with a bit-field that has negative width if the assertion condition fails. There seems no reliable way to get the error message string to be output, but throwing a compile error with a confusing message is better than missing the problem altogether. In the same spirit, if we don't have __builtin_types_compatible_p we can at least insist that the variable have the same width as the type. This won't catch errors such as "wrong pointer type", but it's far better than nothing. In addition to changing the macro definitions, adjust a compile-time-constant Assert in contrib/hstore to use StaticAssertStmt, so we can get some buildfarm coverage on whether that macro behaves sanely or not. There's surely more places that could be converted, but this is the first one I came across.	2012-09-30 22:46:29 -04:00
Tom Lane	ea473fb2de	Add infrastructure for compile-time assertions about variable types. Currently, the macros only work with fairly recent gcc versions, but there is room to expand them to other compilers that have comparable features. Heavily revised and autoconfiscated version of a patch by Andres Freund.	2012-09-30 14:38:31 -04:00
Andrew Dunstan	6e9876dc32	Remove checks for now long outdated compilers.	2012-09-28 19:43:50 -04:00
Tom Lane	70bc583319	Fix btmarkpos/btrestrpos to handle array keys. This fixes another error in commit `9e8da0f757`. I neglected to make the mark/restore functionality save and restore the current set of array key values, which led to strange behavior if an IndexScan with ScalarArrayOpExpr quals was used as the inner side of a mergejoin. Per bug #7570 from Melese Tesfaye.	2012-09-27 17:01:02 -04:00
Heikki Linnakangas	2a0c81a12c	Add support for include_dir in config file. This allows easily splitting configuration into many files, deployed in a directory. Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.	2012-09-24 18:07:53 +03:00
Tom Lane	31510194cc	Minor corrections for ALTER TYPE ADD VALUE IF NOT EXISTS patch. Produce a NOTICE when the label already exists, for consistency with other CREATE IF NOT EXISTS commands. Also, fix the code so it produces something more user-friendly than an index violation when the label already exists. This not incidentally enables making a regression test that the previous patch didn't make for fear of exposing an unpredictable OID in the results. Also some wordsmithing on the documentation.	2012-09-22 18:35:22 -04:00
Andrew Dunstan	6d12b68cd7	Allow IF NOT EXISTS when add a new enum label. If the label is already in the enum the statement becomes a no-op. This will reduce the pain that comes from our not allowing this operation inside a transaction block. Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.	2012-09-22 12:53:31 -04:00
Tom Lane	11e131854f	Improve ruleutils.c's heuristics for dealing with rangetable aliases. The previous scheme had bugs in some corner cases involving tables that had been renamed since a view was made. This could result in dumped views that failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd Albin, as well as in some pgsql-hackers discussion back in January. Also, its behavior for printing EXPLAIN plans was sometimes confusing because of willingness to use the same alias for multiple RTEs (it was Ashutosh Bapat's complaint about that aspect that started the January thread). To fix, ensure that each RTE in the query has a unique unqualified alias, by modifying the alias if necessary (we add "_" and digits as needed to create a non-conflicting name). Then we can just print its variables with that alias, avoiding the confusing and bug-prone scheme of sometimes schema-qualifying variable names. In EXPLAIN, it proves to be expedient to take the further step of only assigning such aliases to RTEs that are actually referenced in the query, since the planner has a habit of generating extra RTEs with the same alias in situations such as inheritance-tree expansion. Although this fixes a bug of very long standing, I'm hesitant to back-patch such a noticeable behavioral change. My experiments while creating a regression test convinced me that actually incorrect output (as opposed to confusing output) occurs only in very narrow cases, which is backed up by the lack of previous complaints from the field. So we may be better off living with it in released branches; and in any case it'd be smart to let this ripen awhile in HEAD before we consider back-patching it.	2012-09-21 19:03:10 -04:00
Heikki Linnakangas	7c45e3a3c6	Parse pg_ident.conf when it's loaded, keeping it in memory in parsed format. Similar changes were done to pg_hba.conf earlier already, this commit makes pg_ident.conf to behave the same as pg_hba.conf. This has two user-visible effects. First, if pg_ident.conf contains multiple errors, the whole file is parsed at postmaster startup time and all the errors are immediately reported. Before this patch, the file was parsed and the errors were reported only when someone tries to connect using an authentication method that uses the file, and the parsing stopped on first error. Second, if you SIGHUP to reload the config files, and the new pg_ident.conf file contains an error, the error is logged but the old file stays in effect. Also, regular expressions in pg_ident.conf are now compiled only once when the file is loaded, rather than every time the a user is authenticated. That should speed up authentication if you have a lot of regexps in the file. Amit Kapila	2012-09-21 17:54:39 +03:00
Alvaro Herrera	22c734fcdb	Remove execdesc.h inclusion from tcopprot.h	2012-09-20 11:07:59 -03:00
Tom Lane	9a93e71008	Fix a couple other leftover uses of 'conisonly' terminology.	2012-09-12 15:12:24 -04:00
Tom Lane	46c508fbcf	Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH. The planner previously assumed that parameter Vars having the same absolute query level, varno, and varattno could safely be assigned the same runtime PARAM_EXEC slot, even though they might be different Vars appearing in different subqueries. This was (probably) safe before the introduction of CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can be executed during execution of some other subquery, causing the lifespan of Params at the same syntactic nesting level as the CTE to overlap with use of the same slots inside the CTE. In 9.1 we created additional hazards by using the same parameter-assignment technology for nestloop inner scan parameters, but it was broken before that, as illustrated by the added regression test. To fix, restructure the planner's management of PlannerParamItems so that items having different semantic lifespans are kept rigorously separated. This will probably result in complex queries using more runtime PARAM_EXEC slots than before, but the slots are cheap enough that this hardly matters. Also, stop generating PlannerParamItems containing Params for subquery outputs: all we really need to do is reserve the PARAM_EXEC slot number, and that now only takes incrementing a counter. The planning code is simpler and probably faster than before, as well as being more correct. Per report from Vik Reykja. These changes will mostly also need to be made in the back branches, but I'm going to hold off on that until after 9.2.0 wraps.	2012-09-05 12:55:01 -04:00
Alvaro Herrera	e20a90e188	Trim spgist_private.h inclusion It doesn't really need rel.h; relcache.h is enough.	2012-09-05 11:06:51 -03:00
Heikki Linnakangas	c4c227477b	Fix bugs in cascading replication with recovery_target_timeline='latest' The cascading replication code assumed that the current RecoveryTargetTLI never changes, but that's not true with recovery_target_timeline='latest'. The obvious upshot of that is that RecoveryTargetTLI in shared memory needs to be protected by a lock. A less obvious consequence is that when a cascading standby is connected, and the standby switches to a new target timeline after scanning the archive, it will continue to stream WAL to the cascading standby, but from a wrong file, ie. the file of the previous timeline. For example, if the standby is currently streaming from the middle of file 000000010000000000000005, and the timeline changes, the standby will continue to stream from that file. However, the WAL on the new timeline is in file 000000020000000000000005, so the standby sends garbage from 000000010000000000000005 to the cascading standby, instead of the correct WAL from file 000000020000000000000005. This also fixes a related bug where a partial WAL segment is restored from the archive and streamed to a cascading standby. The code assumed that when a WAL segment is copied from the archive, it can immediately be fully streamed to a cascading standby. However, if the segment is only partially filled, ie. has the right size, but only N first bytes contain valid WAL, that's not safe. That can happen if a partial WAL segment is manually copied to the archive, or if a partial WAL segment is archived because a server is started up on a new timeline within that segment. The cascading standby will get confused if the WAL it received is not valid, and will get stuck until it's restarted. This patch fixes that problem by not allowing WAL restored from the archive to be streamed to a cascading standby until it's been replayed, and thus validated.	2012-09-04 19:33:21 -07:00
Bruce Momjian	015722fb36	Fix to_date() and to_timestamp() to allow specification of the day of the week via ISO or Gregorian designations. The fix is to store the day-of-week consistently as 1-7, Sunday = 1. Fixes bug reported by Marc Munro	2012-09-03 22:52:44 -04:00
Tom Lane	6d2c8c0e2a	Drop cheap-startup-cost paths during add_path() if we don't need them. We can detect whether the planner top level is going to care at all about cheap startup cost (it will only do so if query_planner's tuple_fraction argument is greater than zero). If it isn't, we might as well discard paths immediately whose only advantage over others is cheap startup cost. This turns out to get rid of quite a lot of paths in complex queries --- I saw planner runtime reduction of more than a third on one large query. Since add_path isn't currently passed the PlannerInfo "root", the easiest way to tell it whether to do this was to add a bool flag to RelOptInfo. That's a bit redundant, since all relations in a given query level will have the same setting. But in the future it's possible that we'd refine the control decision to work on a per-relation basis, so this seems like a good arrangement anyway. Per my suggestion of a few months ago.	2012-09-01 18:16:24 -04:00
Tom Lane	58a031f920	Make configure probe for mbstowcs_l as well as wcstombs_l. We previously supposed that any given platform would supply both or neither of these functions, so that one configure test would be sufficient. It now appears that at least on AIX this is not the case ... which is likely an AIX bug, but nonetheless we need to cope with it. So use separate tests. Per bug #6758; thanks to Andrew Hastie for doing the followup testing needed to confirm what was happening. Backpatch to 9.1, where we began using these functions.	2012-08-31 14:17:56 -04:00
Alvaro Herrera	c219d9b0a5	Split tuple struct defs from htup.h to htup_details.h This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.	2012-08-30 16:52:35 -04:00
Tom Lane	77387f0ac8	Suppress creation of backwardly-indexed paths for LATERAL join clauses. Given a query such as SELECT * FROM foo JOIN LATERAL (SELECT foo.var1) ss(x) ON ss.x = foo.var2 the existence of the join clause "ss.x = foo.var2" encourages indxpath.c to build a parameterized path for foo using any index available for foo.var2. This is completely useless activity, though, since foo has got to be on the outside not the inside of any nestloop join with ss. It's reasonably inexpensive to add tests that prevent creation of such paths, so let's do that.	2012-08-30 14:33:00 -04:00
Tom Lane	e83bb10d6d	Adjust definition of cheapest_total_path to work better with LATERAL. In the initial cut at LATERAL, I kept the rule that cheapest_total_path was always unparameterized, which meant it had to be NULL if the relation has no unparameterized paths. It turns out to work much more nicely if we always have some path nominated as cheapest-total for each relation. In particular, let's still say it's the cheapest unparameterized path if there is one; if not, take the cheapest-total-cost path among those of the minimum available parameterization. (The first rule is actually a special case of the second.) This allows reversion of some temporary lobotomizations I'd put in place. In particular, the planner can now consider hash and merge joins for joins below a parameter-supplying nestloop, even if there aren't any unparameterized paths available. This should bring planning of LATERAL-containing queries to the same level as queries not using that feature. Along the way, simplify management of parameterized paths in add_path() and friends. In the original coding for parameterized paths in 9.2, I tried to minimize the logic changes in add_path(), so it just treated parameterization as yet another dimension of comparison for paths. We later made it ignore pathkeys (sort ordering) of parameterized paths, on the grounds that ordering isn't a useful property for the path on the inside of a nestloop, so we might as well get rid of useless parameterized paths as quickly as possible. But we didn't take that reasoning as far as we should have. Startup cost isn't a useful property inside a nestloop either, so add_path() ought to discount startup cost of parameterized paths as well. Having done that, the secondary sorting I'd implemented (in add_parameterized_path) is no longer needed --- any parameterized path that survives add_path() at all is worth considering at higher levels. So this should be a bit faster as well as simpler.	2012-08-29 22:06:07 -04:00
Alvaro Herrera	21c09e99dc	Split heapam_xlog.h from heapam.h The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.	2012-08-28 19:02:00 -04:00
Alvaro Herrera	fda0594fc2	remove catcache.h from syscache.h Instead, place a forward struct declaration for struct catclist in syscache.h. This reduces header proliferation somewhat.	2012-08-28 18:36:39 -04:00
Alvaro Herrera	45326c5a11	Split resowner.h This lets files that are mere users of ResourceOwner not automatically include the headers for stuff that is managed by the resowner mechanism.	2012-08-28 18:02:07 -04:00

... 2 3 4 5 6 ...

6229 Commits