postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-09-28 06:11:49 +02:00

Author	SHA1	Message	Date
Heikki Linnakangas	c79c570bd8	Small comment fixes and enhancements.	2011-06-10 17:22:46 +03:00
Tom Lane	829ae4bf83	Tag 9.1beta2.	2011-06-09 19:40:42 -04:00
Alvaro Herrera	9261557eb1	Revert "Use "transient" files for blind writes". This reverts commit `54d9e8c6c1`, which caused a failure on the buildfarm. Not a good thing to have just before a beta release.	2011-06-09 16:41:44 -04:00
Alvaro Herrera	54d9e8c6c1	Use "transient" files for blind writes. "Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.)	2011-06-09 16:25:49 -04:00
Bruce Momjian	6560407c7d	Pgindent run before 9.1 beta2.	2011-06-09 14:32:50 -04:00
Heikki Linnakangas	8f9622bbb3	Make DDL operations play nicely with Serializable Snapshot Isolation. Truncating or dropping a table is treated like deletion of all tuples, and conflicts are checked accordingly. If a table is clustered or rewritten by ALTER TABLE, all predicate locks on the heap are promoted to relation-level locks, because the tuple or page ids of any existing tuples will change and won't be valid after rewriting the table. Arguably ALTER TABLE should be treated like a mass-UPDATE of every row, but if you e.g. change the datatype of a column, you could also argue that it's just a change to the physical layout, not a logical change. Reindexing promotes all locks on the index to relation-level lock on the heap. Kevin Grittner, with a lot of cosmetic changes by me.	2011-06-08 14:02:43 +03:00
Tom Lane	ea8e42f3a0	Fix failure to check whether a rowtype's component types are sortable. The existence of a btree opclass accepting composite types caused us to assume that every composite type is sortable. This isn't true of course; we need to check if the column types are all sortable. There was logic for this for the case of array comparison (ie, check that the element type is sortable), but we missed the point for rowtypes. Per Teodor's report of an ANALYZE failure for an unsortable composite type. Rather than just add some more ad-hoc logic for this, I moved knowledge of the issue into typcache.c. The typcache will now only report out array_eq, record_cmp, and friends as usable operators if the array or composite type will work with those functions. Unfortunately we don't have enough info to do this for anonymous RECORD types; in that case, just assume it will work, and take the runtime failure as before if it doesn't. This patch might be a candidate for back-patching at some point, but given the lack of complaints from the field, I'd rather just test it in HEAD for now. Note: most of the places touched in this patch will need further work when we get around to supporting hashing of record types.	2011-06-03 15:39:17 -04:00
Heikki Linnakangas	c8630919e0	SSI comment fixes and enhancements. Notably, document that the conflict-out flag actually means that the transaction has a conflict out to a transaction that committed before the flagged transaction. Kevin Grittner	2011-06-03 12:45:42 +03:00
Tom Lane	680ea6a6df	Looks like we can't declare getpeereid on Windows anyway. ... for lack of the uid_t and gid_t typedefs. Per buildfarm.	2011-06-02 17:27:30 -04:00
Tom Lane	3980f7fc6e	Implement getpeereid() as a src/port compatibility function. This unifies a bunch of ugly #ifdef's in one place. Per discussion, we only need this where HAVE_UNIX_SOCKETS, so no need to cover Windows. Marko Kreen, some adjustment by Tom Lane	2011-06-02 13:05:01 -04:00
Tom Lane	be4585b1c2	Replace use of credential control messages with getsockopt(LOCAL_PEERCRED). It turns out the reason we hadn't found out about the portability issues with our credential-control-message code is that almost no modern platforms use that code at all; the ones that used to need it now offer getpeereid(), which we choose first. The last holdout was NetBSD, and they added getpeereid() as of 5.0. So far as I can tell, the only live platform on which that code was being exercised was Debian/kFreeBSD, ie, FreeBSD kernel with Linux userland --- since glibc doesn't provide getpeereid(), we fell back to the control message code. However, the FreeBSD kernel provides a LOCAL_PEERCRED socket parameter that's functionally equivalent to Linux's SO_PEERCRED. That is both much simpler to use than control messages, and superior because it doesn't require receiving a message from the other end at just the right time. Therefore, add code to use LOCAL_PEERCRED when necessary, and rip out all the credential-control-message code in the backend. (libpq still has such code so that it can still talk to pre-9.1 servers ... but eventually we can get rid of it there too.) Clean up related autoconf probes, too. This means that libpq's requirepeer parameter now works on exactly the same platforms where the backend supports peer authentication, so adjust the documentation accordingly.	2011-05-31 16:10:46 -04:00
Tom Lane	b4b6923e03	Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.	2011-05-30 17:06:52 -04:00
Heikki Linnakangas	3103f9a77d	The row-version chaining in Serializable Snapshot Isolation was still wrong. On further analysis, it turns out that it is not needed to duplicate predicate locks to the new row version at update, the lock on the version that the transaction saw as visible is enough. However, there was a different bug in the code that checks for dangerous structures when a new rw-conflict happens. Fix that bug, and remove all the row-version chaining related code. Kevin Grittner & Dan Ports, with some comment editorialization by me.	2011-05-30 20:47:17 +03:00
Robert Haas	7149b128dc	Improve hash_array() logic for combining hash values. The new logic is less vulnerable to transpositions. This invalidates the contents of hash indexes built with the old functions; hence, bump catversion. Dean Rasheed	2011-05-23 15:17:18 -04:00
Tom Lane	299d171652	Install defenses against overflow in BuildTupleHashTable(). The planner can sometimes compute very large values for numGroups, and in cases where we have no alternative to building a hashtable, such a value will get fed directly to BuildTupleHashTable as its nbuckets parameter. There were two ways in which that could go bad. First, BuildTupleHashTable declared the parameter as "int" but most callers were passing "long"s, so on 64-bit machines undetected overflow could occur leading to a bogus negative value. The obvious fix for that is to change the parameter to "long", which is what I've done in HEAD. In the back branches that seems a bit risky, though, since third-party code might be calling this function. So for them, just put in a kluge to treat negative inputs as INT_MAX. Second, hash_create can go nuts with extremely large requested table sizes (notably, my_log2 becomes an infinite loop for inputs larger than LONG_MAX/2). What seems most appropriate to avoid that is to bound the initial table size request to work_mem. This fixes bug #6035 reported by Daniel Schreiber. Although the reported case only occurs back to 8.4 since it involves WITH RECURSIVE, I think it's a good idea to install the defenses in all supported branches.	2011-05-23 12:52:46 -04:00
Heikki Linnakangas	30e98a7e6e	Pull up isReset flag from AllocSetContext to MemoryContext struct. This avoids the overhead of one function call when calling MemoryContextReset(), and it seems like the isReset optimization would be applicable to any new memory context we might invent in the future anyway. This buys back the overhead I just added in previous patch to always call MemoryContextReset() in ExecScan, even when there's no quals or projections.	2011-05-21 14:47:19 -04:00
Robert Haas	9bb6d97952	More cleanup of FOREIGN TABLE permissions handling. This commit fixes psql, pg_dump, and the information schema to be consistent with the backend changes which I made as part of commit `be90032e0d`, and also includes a related documentation tweak. Shigeru Hanada, with slight adjustment.	2011-05-13 15:51:03 -04:00
Tom Lane	e05b866447	Split PGC_S_DEFAULT into two values, for true boot_val vs computed default. Failure to distinguish these cases is the real cause behind the recent reports of Windows builds crashing on 'infinity'::timestamp, which was directly due to failure to establish a value of timezone_abbreviations in postmaster child processes. The postmaster had the desired value, but write_one_nondefault_variable() didn't transmit it to backends. To fix that, invent a new value PGC_S_DYNAMIC_DEFAULT, and be sure to use that or PGC_S_ENV_VAR (as appropriate) for "default" settings that are computed during initialization. (We need both because there's at least one variable that could receive a value from either source.) This commit also fixes ProcessConfigFile's failure to restore the correct default value for certain GUC variables if they are set in postgresql.conf and then removed/commented out of the file. We have to recompute and reinstall the value for any GUC variable that could have received a value from PGC_S_DYNAMIC_DEFAULT or PGC_S_ENV_VAR sources, and there were a number of oversights. (That whole thing is a crock that needs to be redesigned, but not today.) However, I intentionally didn't make it work "exactly right" for the cases of timezone and log_timezone. The exactly right behavior would involve running select_default_timezone, which we'd have to do independently in each postgres process, causing the whole database to become entirely unresponsive for as much as several seconds. That didn't seem like a good idea, especially since the variable's removal from postgresql.conf might be just an accidental edit. Instead the behavior is to adopt the previously active setting as if it were default. Note that this patch creates an ABI break for extensions that use any of the PGC_S_XXX constants; they'll need to be recompiled.	2011-05-11 19:57:38 -04:00
Andrew Dunstan	c02d5b7c27	Use a macro variable PG_PRINTF_ATTRIBUTE for the style used for checking printf type functions. The style is set to "printf" for backwards compatibility everywhere except on Windows, where it is set to "gnu_printf", which eliminates hundreds of false error messages from modern versions of gcc arising from %m and %ll{d,u} formats.	2011-04-28 10:56:14 -04:00
Tom Lane	993c5e5904	Tag 9.1beta1.	2011-04-27 17:17:22 -04:00
Andrew Dunstan	6693fec0e8	Revert "Force use of "%I64d" format for 64 bit ints on MinGW." This reverts commit `52d01c2f52`. The UINT64_FORMAT bit broke the buildfarm, so I'm reverting the whole thing pending further investigation.	2011-04-27 14:55:18 -04:00
Andrew Dunstan	52d01c2f52	Force use of "%I64d" format for 64 bit ints on MinGW. Both this and "%lld" work, but the compiler's format checking doesn't like "%lld", so we get all sorts of spurious warnings.	2011-04-27 10:09:23 -04:00
Robert Haas	68ef051f5c	Refactor broken CREATE TABLE IF NOT EXISTS support. Per bug #5988, reported by Marko Tiikkaja, and further analyzed by Tom Lane, the previous coding was broken in several respects: even if the target table already existed, a subsequent CREATE TABLE IF NOT EXISTS might try to add additional constraints or sequences-for-serial specified in the new CREATE TABLE statement. In passing, this also fixes a minor information leak: it's no longer possible to figure out whether a schema to which you don't have CREATE access contains a sequence named like "x_y_seq" by attempting to create a table in that schema called "x" with a serial column called "y". Some more refactoring of this code in the future might be warranted, but that will need to wait for a later major release.	2011-04-25 16:55:11 -04:00
Robert Haas	be90032e0d	Remove partial and undocumented GRANT .. FOREIGN TABLE support. Instead, foreign tables are treated just like views: permissions can be granted using GRANT privilege ON [TABLE] foreign_table_name TO role, and revoked similarly. GRANT/REVOKE .. FOREIGN TABLE is no longer supported, just as we don't support GRANT/REVOKE .. VIEW. The set of accepted permissions for foreign tables is now identical to the set for regular tables, and views. Per report from Thom Brown, and subsequent discussion.	2011-04-25 16:39:18 -04:00
Andrew Dunstan	860be17ec3	Assorted minor changes to silence Windows compiler warnings. Mostly to do with macro redefinitions or object signedness.	2011-04-25 12:56:53 -04:00
Bruce Momjian	76dd09bbec	Add postmaster/postgres undocumented -b option for binary upgrades. This option turns off autovacuum, prevents non-super-user connections, and enables oid setting hooks in the backend. The code continues to use the old autovacuum disable settings for servers with earlier catalog versions. This includes a catalog version bump to identify servers that support the -b option.	2011-04-25 12:00:21 -04:00
Tom Lane	e6a30a8c3c	Improve cost estimation for aggregates and window functions. The previous coding failed to account properly for the costs of evaluating the input expressions of aggregates and window functions, as seen in a recent gripe from Claudio Freire. (I said at the time that it wasn't counting these costs at all; but on closer inspection, it was effectively charging these costs once per output tuple. That is completely wrong for aggregates, and not exactly right for window functions either.) There was also a hard-wired assumption that aggregates and window functions had procost 1.0, which is now fixed to respect the actual cataloged costs. The costing of WindowAgg is still pretty bogus, since it doesn't try to estimate the effects of spilling data to disk, but that seems like a separate issue.	2011-04-24 16:55:20 -04:00
Tom Lane	2ab0796d7a	Fix char2wchar/wchar2char to support collations properly. These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale(). This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8. I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.	2011-04-23 12:35:41 -04:00
Tom Lane	ae20bf1740	Make GIN and GIST pass the index collation to all their support functions. Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.	2011-04-22 20:13:12 -04:00
Robert Haas	68739ba856	Allow ALTER TABLE name {OF type | NOT OF}. This syntax allows a standalone table to be made into a typed table, or a typed table to be made standalone. This is possibly a mildly useful feature in its own right, but the real motivation for this change is that we need it to make pg_upgrade work with typed tables. This doesn't actually fix that problem, but it's necessary infrastructure. Noah Misch	2011-04-20 21:38:47 -04:00
Tom Lane	8c19977e9c	Avoid changing an index's indcheckxmin horizon during REINDEX. There can never be a need to push the indcheckxmin horizon forward, since any HOT chains that are actually broken with respect to the index must pre-date its original creation. So we can just avoid changing pg_index altogether during a REINDEX operation. This offers a cleaner solution than my previous patch for the problem found a few days ago that we mustn't try to update pg_index while we are reindexing it. System catalog indexes will always be created with indcheckxmin = false during initdb, and with this modified code we should never try to change their pg_index entries. This avoids special-casing system catalogs as the former patch did, and should provide a performance benefit for many cases where REINDEX formerly caused an index to be considered unusable for a short time. Back-patch to 8.3 to cover all versions containing HOT. Note that this patch changes the API for index_build(), but I believe it is unlikely that any add-on code is calling that directly.	2011-04-19 18:50:56 -04:00
Tom Lane	918854cc08	Fix handling of collations in multi-row VALUES constructs. Per spec we ought to apply select_common_collation() across the expressions in each column of the VALUES table. The original coding was just taking the first row and assuming it was representative. This patch adds a field to struct RangeTblEntry to carry the resolved collations, so initdb is forced for changes in stored rule representation.	2011-04-18 15:31:52 -04:00
Tom Lane	2d3320d3d2	Simplify reindex_relation's API. For what seem entirely historical reasons, a bitmask "flags" argument was recently added to reindex_relation without subsuming its existing boolean argument into that bitmask. This seems a bit bizarre, so fold them together.	2011-04-16 17:26:41 -04:00
Tom Lane	121f49a00e	Clean up collation processing in prepunion.c. This area was a few bricks shy of a load, and badly under-commented too. We have to ensure that the generated targetlist entries for a set-operation node expose the correct collation for each entry, since higher-level processing expects the tlist to reflect the true ordering of the plan's output. This hackery wouldn't be necessary if SortGroupClause carried collation info ... but making it do so would inject more pain in the parser than would be saved here. Still, we might want to rethink that sometime.	2011-04-16 16:40:42 -04:00
Tom Lane	d64713df7e	Pass collations to functions in FunctionCallInfoData, not FmgrInfo. Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...	2011-04-12 19:19:24 -04:00
Tom Lane	921b993677	Fix RI_Initial_Check to use a COLLATE clause when needed in its query. If the referencing and referenced columns have different collations, the parser will be unable to resolve which collation to use unless it's helped out in this way. The effects are sometimes masked, if we end up using a non-collation-sensitive plan; but if we do use a mergejoin we'll see a failure, as recently noted by Robert Haas. The SQL spec states that the referenced column's collation should be used to resolve RI checks, so that's what we do. Note however that we currently don't append a COLLATE clause when writing a query that examines only the referencing column. If we ever support collations that have varying notions of equality, that will have to be changed. For the moment, though, it's preferable to leave it off so that we can use a normal index on the referencing column.	2011-04-11 21:32:53 -04:00
Tom Lane	3c381a55b0	Teach pattern_fixed_prefix() about collations. This is necessary, not optional, now that ILIKE and regexes are collation aware --- else we might derive a wrong comparison constant for index optimized pattern matches.	2011-04-11 12:28:28 -04:00
Heikki Linnakangas	7c797e7194	Fix the size of predicate lock manager's shared memory hash tables at creation. This way they don't compete with the regular lock manager for the slack shared memory, making the behavior more predictable.	2011-04-11 13:43:31 +03:00
Tom Lane	f510fc1d90	Add some more mapping macros for Microsoft wide-character API. Per buildfarm.	2011-04-10 19:37:24 -04:00
Tom Lane	1e16a8107d	Teach regular expression operators to honor collations. This involves getting the character classification and case-folding functions in the regex library to use the collations infrastructure. Most of this work had been done already in connection with the upper/lower and LIKE logic, so it was a simple matter of transposition. While at it, split out these functions into a separate source file regc_pg_locale.c, so that they can be correctly labeled with the Postgres project's license rather than the Scriptics license. These functions are 100% Postgres-written code whereas what remains in regc_locale.c is still mostly not ours, so lumping them both under the same copyright notice was getting more and more misleading.	2011-04-10 18:03:09 -04:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Peter Eisentraut	11745364d0	Add collation support on Windows (MSVC build). There is not yet support in initdb to populate the pg_collation catalog, but if that is done manually, the rest should work.	2011-04-10 00:15:41 +03:00
Tom Lane	c5ff3ff492	Avoid an unnecessary syscache lookup in parse_coerce.c. All the other fields of the constant are being extracted from the syscache entry we already have, so handle collation similarly. (There don't seem to be any other uses for the new function at the moment.)	2011-04-08 16:11:41 -04:00
Tom Lane	2594cf0e8c	Revise the API for GUC variable assign hooks. The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment. And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead). This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".	2011-04-07 00:12:02 -04:00
Robert Haas	f5e524d92b	Add casts from int4 and int8 to numeric. Joey Adams, per gripe from Ramanujam. Review by myself and Tom Lane.	2011-04-05 09:35:43 -04:00
Simon Riggs	88f32b7ca2	Avoid assuming there will be only 3 states for synchronous_commit. Also avoid hardcoding the current default state by giving it the name "on" and replace with a meaningful name that reflects its behaviour. Coding only, no change in behaviour.	2011-04-04 23:23:13 +01:00
Robert Haas	240067b3b0	Merge synchronous_replication setting into synchronous_commit. This means one less thing to configure when setting up synchronous replication, and also avoids some ambiguity around what the behavior should be when the settings of these variables conflict. Fujii Masao, with additional hacking by me.	2011-04-04 16:25:52 -04:00
Robert Haas	6c57239985	Rearrange "add column" logic to merge columns at exec time. The previous coding set attinhcount too high in some cases, resulting in an undumpable, undroppable column. Per bug #5856, reported by Naoya Anzai. See also commit `31b6fc06d8`, which fixes a similar bug in ALTER TABLE .. ADD CONSTRAINT. Patch by Noah Misch.	2011-04-03 21:53:32 -04:00
Robert Haas	38b27792ea	Avoid possible hang during smart shutdown. If a smart shutdown occurs just as a child is starting up, and the child subsequently becomes a walsender, there is a race condition: the postmaster might count the extant backends, determine that there is one normal backend, and wait for it to die off. Had the walsender transition already occurred before the postmaster counted, it would have proceeded with the shutdown. To fix this, have each child that transforms into a walsender kick the postmaster just after doing so, so that the state machine is certain to advance. Fujii Masao	2011-04-03 19:42:00 -04:00
Robert Haas	50533a6dc5	Support comments on FOREIGN DATA WRAPPER and SERVER objects. This mostly involves making it work with the objectaddress.c framework, which does most of the heavy lifting. In that vein, change GetForeignDataWrapperOidByName to get_foreign_data_wrapper_oid and GetForeignServerOidByName to get_foreign_server_oid, to match the pattern we use for other object types. Robert Haas and Shigeru Hanada	2011-04-01 11:28:28 -04:00
Heikki Linnakangas	c8ae318cbe	Increase SHMEM_INDEX_SIZE from 32 to 64. We're currently at 40 entries in ShmemIndex, so 64 leaves some headroom. Kevin Grittner	2011-03-31 13:37:01 +03:00
Heikki Linnakangas	754baa21f7	Automatically terminate replication connections that are idle for more than replication_timeout (a new GUC) milliseconds. The TCP timeout is often too long, you want the master to notice a dead connection much sooner. People complained about that in 9.0 too, but with synchronous replication it's even more important to notice dead connections promptly. Fujii Masao and Heikki Linnakangas	2011-03-30 10:20:37 +03:00
Peter Eisentraut	6c0dfc0356	Add maintainer-check target. This can do various source code checks that are not appropriate for either the build or the regression tests. Currently: duplicate_oids, SGML syntax and tabs check, NLS syntax check.	2011-03-28 22:56:52 +03:00
Peter Eisentraut	aa6fdd186c	Make duplicate_oids return nonzero exit status if duplicates were found. Automatic detection of errors is easier that way.	2011-03-28 22:56:52 +03:00
Tom Lane	eb51af71f2	Prevent a rowtype from being included in itself. Eventually we might be able to allow that, but it's not clear how many places need to be fixed to prevent infinite recursion when there's a direct or indirect inclusion of a rowtype in itself. One such place is CheckAttributeType(), which will recurse to stack overflow in cases such as those exhibited in bug #5950 from Alex Perepelica. If we were sure it was the only such place, we could easily modify the code added by this patch to stop the recursion without a complaint ... but it probably isn't the only such place. Hence, throw error until such time as someone is excited enough about this type of usage to put work into making it safe. Back-patch as far as 8.3. 8.2 doesn't have the recursive call in CheckAttributeType in the first place, so I see no need to add code there in the absence of clear evidence of a problem elsewhere.	2011-03-28 15:46:04 -04:00
Tom Lane	7208fae18f	Clean up cruft around collation initialization for tupdescs and scankeys. I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.	2011-03-26 18:28:40 -04:00
Tom Lane	0c9d9e8dd6	More collations cleanup, from trawling for missed collation assignments. Mostly cosmetic, though I did find that generateClonedIndexStmt failed to clone the index's collations.	2011-03-26 16:35:25 -04:00
Tom Lane	b23c9fa929	Clean up a few failures to set collation fields in expression nodes. I'm not sure these have any non-cosmetic implications, but I'm not sure they don't, either. In particular, ensure the CaseTestExpr generated by transformAssignmentIndirection to represent the base target column carries the correct collation, because parse_collate.c won't fix that. Tweak lsyscache.c API so that we can get the appropriate collation without an extra syscache lookup.	2011-03-26 14:25:48 -04:00
Tom Lane	bfa4440ca5	Pass collation to makeConst() instead of looking it up internally. In nearly all cases, the caller already knows the correct collation, and in a number of places, the value the caller has handy is more correct than the default for the type would be. (In particular, this patch makes it significantly less likely that eval_const_expressions will result in changing the exposed collation of an expression.) So an internal lookup is both expensive and wrong.	2011-03-25 20:10:42 -04:00
Tom Lane	27dc7e240b	Fix handling of collation in SQL-language functions. Ensure that parameter symbols receive collation from the function's resolved input collation, and fix inlining to behave properly. BTW, this commit lays about 90% of the infrastructure needed to support use of argument names in SQL functions. Parsing of parameters is now done via the parser-hook infrastructure ... we'd just need to supply a column-ref hook ...	2011-03-24 20:30:23 -04:00
Simon Riggs	ec497a5ad6	Make FKs valid at creation when added as column constraints. Bug report from Alvaro Herrera	2011-03-22 23:10:35 +00:00
Tom Lane	8df08c8489	Reimplement planner's handling of MIN/MAX aggregate optimization (again). Instead of playing cute games with pathkeys, just build a direct representation of the intended sub-select, and feed it through query_planner to get a Path for the index access. This is a bit slower than 9.1's previous method, since we'll duplicate most of the overhead of query_planner; but since the whole optimization only applies to rather simple single-table queries, that probably won't be much of a problem in practice. The advantage is that we get to do the right thing when there's a partial index that needs the implicit IS NOT NULL clause to be usable. Also, although this makes planagg.c be a bit more closely tied to the ordering of operations in grouping_planner, we can get rid of some coupling to lower-level parts of the planner. Per complaint from Marti Raudsepp.	2011-03-22 00:34:31 -04:00
Tom Lane	176d5bae1d	Fix up handling of C/POSIX collations. Install just one instance of the "C" and "POSIX" collations into pg_collation, rather than one per encoding. Make these instances exist and do something useful even in machines without locale_t support: to wit, it's now possible to force comparisons and case-folding functions to use C locale in an otherwise non-C database, whether or not the platform has support for using any additional collations. Fix up severely broken upper/lower/initcap functions, too: the C/POSIX fastpath now does what it is supposed to, and non-default collations are handled correctly in single-byte database encodings. Merge the two separate collation hashtables that were being maintained in pg_locale.c, and be more wary of the possibility that we fail partway through filling a cache entry.	2011-03-20 12:44:13 -04:00
Tom Lane	b310b6e31c	Revise collation derivation method and expression-tree representation. All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.	2011-03-19 20:30:08 -04:00
Magnus Hagander	6f9192df61	Rename ident authentication over local connections to peer. This removes an overloading of two authentication options where one is very secure (peer) and one is often insecure (ident). Peer is also the name used in libpq from 9.1 to specify the same type of authentication. Also make initdb select peer for local connections when ident is chosen, and ident for TCP connections when peer is chosen. ident keyword in pg_hba.conf is still accepted and maps to peer authentication.	2011-03-19 18:44:35 +01:00
Robert Haas	9a56dc3389	Fix various possible problems with synchronous replication. 1. Don't ignore query cancel interrupts. Instead, if the user asks to cancel the query after we've already committed it, but before it's on the standby, just emit a warning and let the COMMIT finish. 2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown). Instead, emit a warning message and close the connection without acknowledging the commit. Other backends will still see the effect of the commit, but there's no getting around that; it's too late to abort at this point, and ignoring die interrupts altogether doesn't seem like a good idea. 3. If synchronous_standby_names becomes empty, wake up all backends waiting for synchronous replication to complete. Without this, someone attempting to shut synchronous replication off could easily wedge the entire system instead. 4. Avoid depending on the assumption that if a walsender updates MyProc->syncRepState, we'll see the change even if we read it without holding the lock. The window for this appears to be quite narrow (and probably doesn't exist at all on machines with strong memory ordering) but protecting against it is practically free, so do that. 5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and doesn't actually do anything. There's still some further work needed here to make the behavior of fast shutdown plausible, but that looks complex, so I'm leaving it for a separate commit. Review by Fujii Masao.	2011-03-17 13:12:21 -04:00
Bruce Momjian	ddd6ff289f	Add database comments to template0 and postgres databases, and improve the comments on the template1 database. No catalog version bump because they are just comments.	2011-03-15 11:26:57 -04:00
Robert Haas	5ca4dfc79f	Remove 13 keywords that are used only for ROLE options. Review by Tom Lane.	2011-03-15 10:22:58 -04:00
Bruce Momjian	b051a34fd8	Remove duplicate time-based macros recently added.	2011-03-14 10:40:14 -04:00
Tom Lane	696d1f7f06	Make all comparisons done for/with statistics use the default collation. While this will give wrong answers when estimating selectivity for a comparison operator that's using a non-default collation, the estimation error probably won't be large; and anyway the former approach created estimation errors of its own by trying to use a histogram that might have been computed with some other collation. So we'll adopt this simplified approach for now and perhaps improve it sometime in the future. This patch incorporates changes from Andres Freund to make sure that selfuncs.c passes a valid collation OID to any datatype-specific function it calls, in case that function wants collation information. Said OID will now always be DEFAULT_COLLATION_OID, but at least we won't get errors.	2011-03-12 16:30:36 -05:00
Bruce Momjian	3a3f39fdc0	Use macros for time-based constants, rather than constants.	2011-03-12 09:35:56 -05:00
Tom Lane	8acdb8bf9c	Split CollateClause into separate raw and analyzed node types. CollateClause is now used only in raw grammar output, and CollateExpr after parse analysis. This is for clarity and to avoid carrying collation names in post-analysis parse trees: that's both wasteful and possibly misleading, since the collation's name could be changed while the parsetree still exists. Also, clean up assorted infelicities and omissions in processing of the node type.	2011-03-11 16:28:18 -05:00
Tom Lane	e3c732a85c	Create an explicit concept of collations that work for any encoding. Use collencoding = -1 to represent such a collation in pg_collation. We need this to make the "default" entry work sanely, and a later patch will fix the C/POSIX entries to be represented this way instead of duplicating them across all encodings. All lookup operations now search first for an entry that's database-encoding-specific, and then for the same name with collencoding = -1. Also some incidental code cleanup in collationcmds.c and pg_collation.c.	2011-03-11 13:20:11 -05:00
Bruce Momjian	7d23e0f803	Update C comment about O_DIRECT and fsync().	2011-03-11 06:46:44 -05:00
Tom Lane	7564654adf	Revert addition of third argument to format_type(). Including collation in the behavior of that function promotes a world view we do not want. Moreover, it was producing the wrong behavior for pg_dump anyway: what we want is to dump a COLLATE clause on attributes whose attcollation is different from the underlying type, and likewise for domains, and the function cannot do that for us. Doing it the hard way in pg_dump is a bit more tedious but produces more correct output. In passing, fix initdb so that the initial entry in pg_collation is properly pinned. It was droppable before :-(	2011-03-10 17:30:46 -05:00
Robert Haas	2e019c8611	More synchronous replication typo fixes. Fujii Masao	2011-03-10 15:56:18 -05:00
Robert Haas	b8bb8dbf20	More synchronous replication tweaks. SyncRepRequested() must check not only the value of the synchronous_replication GUC but also whether max_wal_senders > 0. Otherwise, we might end up waiting for sync rep even when there's no possibility of a standby ever managing to connect. There are some existing cross-checks to prevent this, but they're not quite sufficient: the user can start the server with max_wal_senders=0, synchronous_standby_names='', and synchronous_replication=off and then subsequently make synchronous_standby_names not empty using pg_ctl reload, and then SET synchronous_standby=on, leading to an indefinite hang. Along the way, rename the global variable for the synchronous_replication GUC to match the name of the GUC itself, for clarity. Report by Fujii Masao, though I didn't use his patch.	2011-03-10 15:43:37 -05:00
Robert Haas	e397d2ee64	Remove obsolete comment. In earlier versions of the sync rep patch, waiters removed themselves from the queue, but now walsender removes them before doing the wakeup. Report by Fujii Masao.	2011-03-10 15:00:20 -05:00
Robert Haas	6436098795	Minor sync rep corrections. Fujii Masao, with a bit of additional wordsmithing by me.	2011-03-10 14:57:02 -05:00
Itagaki Takahiro	2d8de0a50b	Cleanup copyright years and file names in the header comments of some files.	2011-03-10 15:05:33 +09:00
Tom Lane	a051ef699c	Remove collation information from TypeName, where it does not belong. The initial collations patch treated a COLLATE spec as part of a TypeName, following what can only be described as brain fade on the part of the SQL committee. It's a lot more reasonable to treat COLLATE as a syntactically separate object, so that it can be added in only the productions where it actually belongs, rather than needing to reject it in a boatload of places where it doesn't belong (something the original patch mostly failed to do). In addition this change lets us meet the spec's requirement to allow COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly behavior for constructs such as "foo::type COLLATE collation". To do this, pull collation information out of TypeName and put it in ColumnDef instead, thus reverting most of the collation-related changes in parse_type.c's API. I made one additional structural change, which was to use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd nodes. This provides enough room to get rid of the "transform" wart in AlterTableCmd too, since the ColumnDef can carry the USING expression easily enough. Also fix some other minor bugs that have crept in in the same areas, like failure to copy recently-added fields of ColumnDef in copyfuncs.c. While at it, document the formerly secret ability to specify a collation in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about what the default collation selection will be when COLLATE is omitted. BTW, the three-parameter form of format_type() should go away too, since it just contributes to the confusion in this area; but I'll do that in a separate patch.	2011-03-09 22:39:20 -05:00
Tom Lane	49a08ca1e9	Adjust the permissions required for COMMENT ON ROLE. Formerly, any member of a role could change the role's comment, as of course could superusers; but holders of CREATEROLE privilege could not, unless they were also members. This led to the odd situation that a CREATEROLE holder could create a role but then could not comment on it. It also seems a bit dubious to let an unprivileged user change his own comment, let alone those of group roles he belongs to. So, change the rule to be "you must be superuser to comment on a superuser role, or hold CREATEROLE to comment on non-superuser roles". This is the same as the privilege check for creating/dropping roles, and thus fits much better with the rule for other object types, namely that only the owner of an object can comment on it. In passing, clean up the documentation for COMMENT a little bit. Per complaint from Owen Jacobson and subsequent discussion.	2011-03-09 11:28:34 -05:00
Heikki Linnakangas	4cd3fb6e12	Truncate predicate lock manager's SLRU lazily at checkpoint. That's safer than doing it aggressively whenever the tail-XID pointer is advanced, because this way we don't need to do it while holding SerializableXactHashLock. This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an obsolete comment spotted by Kevin Grittner.	2011-03-08 12:12:54 +02:00
Simon Riggs	dcfe3f60c1	Catversion increment for pg_stat_replication changes for syncrep	2011-03-06 23:44:44 +00:00
Simon Riggs	966fb05b58	Add new files for syncrep missed in previous commit	2011-03-06 23:39:14 +00:00
Simon Riggs	a8a8a3e096	Efficient transaction-controlled synchronous replication. If a standby is broadcasting reply messages and we have named one or more standbys in synchronous_standby_names then allow users who set synchronous_replication to wait for commit, which then provides strict data integrity guarantees. Design avoids sending and receiving transaction state information so minimises bookkeeping overheads. We synchronize with the highest priority standby that is connected and ready to synchronize. Other standbys can be defined to take over in case of standby failure. This version has very strict behaviour; more relaxed options may be added at a later date. Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime Casanova, Heikki Linnakangas and Robert Haas, plus the assistance of many other design reviewers.	2011-03-06 22:49:16 +00:00
Tom Lane	bfd7f8cbb2	Make plpythonu language use plpython2 shared library directly. The original scheme for this was to symlink plpython.$DLSUFFIX to plpython2.$DLSUFFIX, but that doesn't work on Windows, and only accidentally failed to fail because of the way that CREATE LANGUAGE created or didn't create new C functions. My changes of yesterday exposed the weakness of that approach. To fix, get rid of the symlink and make pg_pltemplate show what's really going on.	2011-03-05 15:13:15 -05:00
Tom Lane	63b656b7bf	Create extension infrastructure for the core procedural languages. This mostly just involves creating control, install, and update-from-unpackaged scripts for them. However, I had to adjust plperl and plpython to not share the same support functions between variants, because we can't put the same function into multiple extensions. catversion bump forced due to new contents of pg_pltemplate, and because initdb now installs plpgsql as an extension not a bare language. Add support for regression testing these as extensions not bare languages. Fix a couple of other issues that popped up while testing this: my initial hack at pg_dump binary-upgrade support didn't work right, and we don't want an extra schema permissions test after all. Documentation changes still to come, but I'm committing now to see whether the MSVC build scripts need work (likely they do).	2011-03-04 21:51:14 -05:00
Peter Eisentraut	b9cff97fdf	Don't allow CREATE TABLE AS to create a column with invalid collation. It is possible that an expression ends up with a collatable type but without a collation. CREATE TABLE AS could then create a table based on that. But such a column cannot be dumped with valid SQL syntax, so we disallow creating such a column. Per test report from Noah Misch.	2011-03-04 23:42:07 +02:00
Tom Lane	8d3b421f5f	Allow non-superusers to create (some) extensions. Remove the unconditional superuser permissions check in CREATE EXTENSION, and instead define a "superuser" extension property, which when false (not the default) skips the superuser permissions check. In this case the calling user only needs enough permissions to execute the commands in the extension's installation script. The superuser property is also enforced in the same way for ALTER EXTENSION UPDATE cases. In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of the extension rather than superuserness. ALTER EXTENSION ADD/DROP needs to insist on ownership of the target object as well; to do that without duplicating code, refactor comment.c's big switch for permissions checks into a separate function in objectaddress.c. I also removed the superuserness checks in pg_available_extensions and related functions; there's no strong reason why everybody shouldn't be able to see that info. Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that in pg_dump, so that dumps won't fail for installed-by-default extensions. We don't have any of those yet, but we will soon. This is all per discussion of wrapping the standard procedural languages into extensions. I'll make those changes in a separate commit; this is just putting the core infrastructure in place.	2011-03-04 16:08:53 -05:00
Tom Lane	908ab80286	Further refine patch for commenting operator implementation functions. Instead of manually maintaining the "implementation of XXX operator" comments in pg_proc.h, delete all those entries and let initdb create them via a join. To let initdb figure out which name to use when there is a conflict, change the comments for deprecated operators to say they are deprecated --- which seems like a good thing to do anyway.	2011-03-03 15:55:47 -05:00
Tom Lane	6252c4f9e2	Run a portal's cleanup hook immediately when pushing it to DONE state. This works around the problem noted by Yamamoto Takashi in bug #5906, that there were code paths whereby we could reach AtCleanup_Portals with a portal's cleanup hook still unexecuted. The changes I made a few days ago were intended to prevent that from happening, and I think that on balance it's still a good thing to avoid, so I don't want to remove the Assert in AtCleanup_Portals. Hence do this instead.	2011-03-03 13:04:06 -05:00
Tom Lane	94133a9354	Mark operator implementation functions as such in their comments. Historically, we've not had separate comments for built-in pg_operator entries, but relied on the comments for the underlying functions. The trouble with this approach is that there isn't much of anything to suggest to users that they'd be better off using the operators instead. So, move all the relevant comments into pg_operator, and give each underlying function a comment that just says "implementation of XXX operator". There are only about half a dozen cases where it seems reasonable to use the underlying function interchangeably with the operator; in these cases I left the same comment in place on the function as on the operator. While at it, establish a policy that every built-in function and operator entry should have a comment: there are now queries in the opr_sanity regression test that will complain if one doesn't. This only required adding a dozen or two more entries than would have been there anyway. I also spent some time trying to eliminate gratuitous inconsistencies in the style of the comments, though it's hopeless to suppose that more won't creep in soon enough. Per my proposal of 2010-10-15.	2011-03-03 01:34:17 -05:00
Heikki Linnakangas	6eba5a7c57	Change pg_last_xlog_receive_location() not to move backwards. That makes it a lot more useful for determining which standby is most up-to-date, for example. There were long discussions on whether overwriting existing WAL makes sense to begin with, and whether we should do some more extensive variable renaming, but this change nevertheless seems quite uncontroversial. Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.	2011-03-01 20:54:35 +02:00
Heikki Linnakangas	47ad79122b	Fix bugs in Serializable Snapshot Isolation. Change the way UPDATEs are handled. Instead of maintaining a chain of tuple-level locks in shared memory, copy any existing locks on the old tuple to the new tuple at UPDATE. Any existing page-level lock needs to be duplicated too, as a lock on the new tuple. That was neglected previously. Store xmin on tuple-level predicate locks, to distinguish a lock on an old already-recycled tuple from a new tuple at the same physical location. Failure to distinguish them caused loops in the tuple-lock chains, as reported by YAMAMOTO Takashi. Although we don't use the chain representation of UPDATEs anymore, it seems like a good idea to store the xmin to avoid some false positives, if for no other reason. CheckSingleTargetForConflictsIn now correctly handles the case where a lock that's being held is not reflected in the local lock table. That happens if another backend acquires a lock on our behalf due to an UPDATE or a page split. PredicateLockPageCombine now retains locks for the page that is being removed, rather than removing them. This prevents a potentially dangerous false-positive inconsistency where the local lock table believes that a lock is held, but it is actually not. Dan Ports and Kevin Grittner	2011-03-01 19:05:16 +02:00
Tom Lane	c0b0076036	Rearrange snapshot handling to make rule expansion more consistent. With this patch, portals, SQL functions, and SPI all agree that there should be only a CommandCounterIncrement between the queries that are generated from a single SQL command by rule expansion. Fetching a whole new snapshot now happens only between original queries. This is equivalent to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the best choice since it eliminates one source of concurrency hazards for rules. The patch should also make things marginally faster by reducing the number of snapshot push/pop operations. The patch removes pg_parse_and_rewrite(), which is no longer used anywhere. There was considerable discussion about more aggressive refactoring of the query-processing functions exported by postgres.c, but for the moment nothing more has been done there. I also took the opportunity to refactor snapmgr.c's API slightly: the former PushUpdatedSnapshot() has been split into two functions. Marko Tiikkaja, reviewed by Steve Singer and Tom Lane	2011-02-28 23:28:06 -05:00
Robert Haas	92c30fd2ed	Rename pg_stat_replication.apply_location to replay_location. For consistency with pg_last_xlog_replay_location. Per discussion.	2011-02-28 12:49:57 -05:00
Tom Lane	a874fe7b4c	Refactor the executor's API to support data-modifying CTEs better. The originally committed patch for modifying CTEs didn't interact well with EXPLAIN, as noted by myself, and also had corner-case problems with triggers, as noted by Dean Rasheed. Those problems show it is really not practical for ExecutorEnd to call any user-defined code; so split the cleanup duties out into a new function ExecutorFinish, which must be called between the last ExecutorRun call and ExecutorEnd. Some Asserts have been added to these functions to help verify correct usage. It is no longer necessary for callers of the executor to call AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now done by ExecutorStart/ExecutorFinish respectively. If you really need to suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to ExecutorStart. Also, refactor portal commit processing to allow for the possibility that PortalDrop will invoke user-defined code. I think this is not actually necessary just yet, since the portal-execution-strategy logic forces any non-pure-SELECT query to be run to completion before we will consider committing. But it seems like good future-proofing.	2011-02-27 13:44:12 -05:00
Tom Lane	389af95155	Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH. This patch implements data-modifying WITH queries according to the semantics that the updates all happen with the same command counter value, and in an unspecified order. Therefore one WITH clause can't see the effects of another, nor can the outer query see the effects other than through the RETURNING values. And attempts to do conflicting updates will have unpredictable results. We'll need to document all that. This commit just fixes the code; documentation updates are waiting on author. Marko Tiikkaja and Hitoshi Harada	2011-02-25 18:58:02 -05:00
Tom Lane	bdca82f44d	Add a relkind field to RangeTblEntry to avoid some syscache lookups. The recent additions for FDW support required checking foreign-table-ness in several places in the parse/plan chain. While it's not clear whether that would really result in a noticeable slowdown, it seems best to avoid any performance risk by keeping a copy of the relation's relkind in RangeTblEntry. That might have some other uses later, anyway. Per discussion.	2011-02-22 19:24:40 -05:00
Peter Eisentraut	1c51c7d5ff	Add PL/Python functions for quoting strings. Add functions plpy.quote_ident, plpy.quote_literal, plpy.quote_nullable, which wrap the equivalent SQL functions. To be able to propagate char * constness properly, make the argument of quote_literal_cstr() const char *. This also makes it more consistent with quote_identifier(). Jan Urbański, reviewed by Hitoshi Harada, some refinements by Peter Eisentraut	2011-02-22 23:41:23 +02:00
Tom Lane	1ab9b012bd	Allow binary I/O of type "void". void_send is useful for the same reason that void_out doesn't throw error, namely that someone might do "select void_returning_func(...)" from a client that prefers to operate in binary mode. The void_recv function may or may not have any practical use, but we provide it for symmetry. Radosław Smogura	2011-02-22 13:08:22 -05:00
Tom Lane	2e852e541c	Remove ExecRemoveJunk(), which is no longer used anywhere. This was a leftover from the pre-8.1 design of junkfilters. It doesn't seem to have any reason to live, since it's merely a combination of two easy function calls, and not a well-designed combination at that (it encourages callers to leak the result tuple).	2011-02-21 21:41:08 -05:00
Tom Lane	a210be7720	Fix dangling-pointer problem in before-row update trigger processing. ExecUpdate checked for whether ExecBRUpdateTriggers had returned a new tuple value by seeing if the returned tuple was pointer-equal to the old one. But the "old one" was in estate->es_junkFilter's result slot, which would be scribbled on if we had done an EvalPlanQual update in response to a concurrent update of the target tuple; therefore we were comparing a dangling pointer to a live one. Given the right set of circumstances we could get a false match, resulting in not forcing the tuple to be stored in the slot we thought it was stored in. In the case reported by Maxim Boguk in bug #5798, this led to "cannot extract system attribute from virtual tuple" failures when trying to do "RETURNING ctid". I believe there is a very-low-probability chance of more serious errors, such as generating incorrect index entries based on the original rather than the trigger-modified version of the row. In HEAD, change all of ExecBRInsertTriggers, ExecIRInsertTriggers, ExecBRUpdateTriggers, and ExecIRUpdateTriggers so that they continue to have similar APIs. In the back branches I just changed ExecBRUpdateTriggers, since there is no bug in the ExecBRInsertTriggers case.	2011-02-21 21:19:50 -05:00
Itagaki Takahiro	3cba8240a1	Add ENCODING option to COPY TO/FROM and file_fdw. File encodings can be specified separately from client encoding. If not specified, client encoding is used for backward compatibility. Cases when the encoding doesn't match client encoding are slower than matched cases because we don't have conversion procs for other encodings. Performance improvement would be future work. Original patch by Hitoshi Harada, and modified by me.	2011-02-21 14:32:40 +09:00
Tom Lane	7c5d0ae707	Add contrib/file_fdw foreign-data wrapper for reading files via COPY. This is both very useful in its own right, and an important test case for the core FDW support. This commit includes a small refactoring of copy.c to expose its option checking code as a separately callable function. The original patch submission duplicated hundreds of lines of that code, which seemed pretty unmaintainable. Shigeru Hanada, reviewed by Itagaki Takahiro and Tom Lane	2011-02-20 14:06:59 -05:00
Tom Lane	bb74240794	Implement an API to let foreign-data wrappers actually be functional. This commit provides the core code and documentation needed. A contrib module test case will follow shortly. Shigeru Hanada, Jan Urbanski, Heikki Linnakangas	2011-02-20 00:18:14 -05:00
Tom Lane	327e025071	Create the catalog infrastructure for foreign-data-wrapper handlers. Add a fdwhandler column to pg_foreign_data_wrapper, plus HANDLER options in the CREATE FOREIGN DATA WRAPPER and ALTER FOREIGN DATA WRAPPER commands, plus pg_dump support for same. Also invent a new pseudotype fdw_handler with properties similar to language_handler. This is split out of the "FDW API" patch for ease of review; it's all stuff we will certainly need, regardless of any other details of the FDW API. FDW handler functions will not actually get called yet. In passing, fix some omissions and infelicities in foreigncmds.c. Shigeru Hanada, Jan Urbanski, Heikki Linnakangas	2011-02-19 00:07:15 -05:00
Simon Riggs	06828c5feb	Separate messages for standby replies and hot standby feedback. Allow messages to be sent at different times, and greatly reduce the frequency of hot standby feedback. Refactor to allow additional message types.	2011-02-18 11:31:49 +00:00
Itagaki Takahiro	62c7bd31c8	Add transaction-level advisory locks. They share the same locking namespace with the existing session-level advisory locks, but they are automatically released at the end of the current transaction and cannot be released explicitly via unlock functions. Marko Tiikkaja, reviewed by me.	2011-02-18 14:05:12 +09:00
Tom Lane	52b60530f2	Fix tsmatchsel() to account properly for null rows. ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents. Back-patch to 8.4 where this stuff was introduced. Jesper Krogh	2011-02-17 19:00:49 -05:00
Robert Haas	4a25bc145a	Add client_hostname field to pg_stat_activity. Peter Eisentraut, reviewed by Steve Singer, Alvaro Herrera, and me.	2011-02-17 16:03:28 -05:00
Tom Lane	a2095f7fb5	Fix bogus test for hypothetical indexes in get_actual_variable_range(). That function was supposing that indexoid == 0 for a hypothetical index, but that is not likely to be true in any non-toy implementation of an index adviser, since assigning a fake OID is the only way to know at EXPLAIN time which hypothetical index got selected. Fix by adding a flag to IndexOptInfo to mark hypothetical indexes. Back-patch to 9.0 where get_actual_variable_range() was added. Gurjeet Singh	2011-02-16 19:24:45 -05:00
Tom Lane	6595dd04d1	Add backwards-compatible declarations of some core GIN support functions. These are needed to support reloading dumps of 9.0 installations containing contrib/intarray or contrib/tsearch2. Since not only regular dump/reload but binary upgrade would fail, it seems worth the trouble to carry these stubs for a while. Note that the contrib opclasses referencing these functions will still work fine, since GIN doesn't actually pay any attention to the declared signature of a support function.	2011-02-16 17:24:46 -05:00
Simon Riggs	bca8b7f16a	Hot Standby feedback for avoidance of cleanup conflicts on standby. Standby optionally sends back information about oldestXmin of queries which is then checked and applied to the WALSender's proc->xmin. GetOldestXmin() is modified slightly to agree with GetSnapshotData(), so that all backends on primary include WALSender within their snapshots. Note this does nothing to change the snapshot xmin on either master or standby. Feedback piggybacks on the standby reply message. vacuum_defer_cleanup_age is no longer used on standby, though parameter still exists on primary, since some use cases still exist. Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas	2011-02-16 19:29:37 +00:00
Tom Lane	6e02755b22	Add FOREACH IN ARRAY looping to plpgsql. (I'm not entirely sure that we've finished bikeshedding the syntax details, but the functionality seems OK.) Pavel Stehule, reviewed by Stephen Frost and Tom Lane	2011-02-16 01:53:03 -05:00
Robert Haas	4695da5ae9	pg_ctl promote. Fujii Masao, reviewed by Robert Haas, Stephen Frost, and Magnus Hagander.	2011-02-15 21:30:23 -05:00
Itagaki Takahiro	8ddc05fb01	Export the external file reader used in COPY FROM as APIs. They are expected to be used by extension modules like file_fdw. There are no user-visible changes. Itagaki Takahiro Reviewed and tested by Kevin Grittner and Noah Misch.	2011-02-16 11:19:11 +09:00
Tom Lane	887dd041a6	Fix obsolete comment. Comment about MaxAllocSize was not updated when the TOAST-header macros were replaced in 8.3 "varvarlena" changes. Per report from Frederik Ramm.	2011-02-15 13:27:54 -05:00
Tom Lane	555353c0c5	Rearrange extension-related views as per recent discussion. The original design of pg_available_extensions did not consider the possibility of version-specific control files. Split it into two views: pg_available_extensions shows information that is generic about an extension, while pg_available_extension_versions shows all available versions together with information that could be version-dependent. Also, add an SRF pg_extension_update_paths() to assist in checking that a collection of update scripts provide sane update path sequences.	2011-02-14 19:22:36 -05:00
Bruce Momjian	0de0cc150a	Properly handle Win32 paths of 'E:abc', which can be either absolute or relative, by creating a function path_is_relative_and_below_cwd() to check for specific requirements. It is unclear if this fixes a security problem or not but the new code is more robust.	2011-02-12 09:47:51 -05:00
Peter Eisentraut	b313bca0af	DDL support for collations - collowner field - CREATE COLLATION - ALTER COLLATION - DROP COLLATION - COMMENT ON COLLATION - integration with extensions - pg_dump support for the above - dependency management - psql tab completion - psql \dO command	2011-02-12 15:55:18 +02:00
Tom Lane	1214749901	Add support for multiple versions of an extension and ALTER EXTENSION UPDATE. This follows recent discussions, so it's quite a bit different from Dimitri's original. There will probably be more changes once we get a bit of experience with it, but let's get it in and start playing with it. This is still just core code. I'll start converting contrib modules shortly. Dimitri Fontaine and Tom Lane	2011-02-11 21:25:57 -05:00
Robert Haas	2c20ba1fd2	Tweak find_composite_type_dependencies API a bit more. Per discussion with Noah Misch, the previous coding, introduced by my commit `65377e0b9c` on 2011-02-06, was really an abuse of RELKIND_COMPOSITE_TYPE, since the caller in typecmds.c is actually passing the name of a domain. So go back to having a type name argument, but make the first argument a Relation rather than just a string so we can tell whether it's a table or a foreign table and emit the proper error message.	2011-02-11 08:47:38 -05:00
Tom Lane	01467d3e4f	Extend "ALTER EXTENSION ADD object" to permit "DROP object" as well. Per discussion, this is something we should have sooner rather than later, and it doesn't take much additional code to support it.	2011-02-10 17:37:22 -05:00
Heikki Linnakangas	b186523fd9	Send status updates back from standby server to master, indicating how far the standby has written, flushed, and applied the WAL. At the moment, this is for informational purposes only, the values are only shown in pg_stat_replication system view, but in the future they will also be needed for synchronous replication. Extracted from Simon Riggs' synchronous replication patch by Robert Haas, with some tweaking by me.	2011-02-10 21:04:02 +02:00
Magnus Hagander	4c468b37a2	Track last time for statistics reset on databases and bgwriter Tracks one counter for each database, which is reset whenever the statistics for any individual object inside the database is reset, and one counter for the background writer. Tomas Vondra, reviewed by Greg Smith	2011-02-10 15:14:04 +01:00
Tom Lane	e617f0d7e4	Fix improper matching of resjunk column names for FOR UPDATE in subselect. Flattening of subquery range tables during setrefs.c could lead to the rangetable indexes in PlanRowMark nodes not matching up with the column names previously assigned to the corresponding resjunk ctid (resp. tableoid or wholerow) columns. Typical symptom would be either a "cannot extract system attribute from virtual tuple" error or an Assert failure. This wasn't a problem before 9.0 because we didn't support FOR UPDATE below the top query level, and so the final flattening could never renumber an RTE that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique number for each PlanRowMark to label the associated resjunk columns, so that the number need not change during flattening. Per report from David Johnston (though I'm darned if I can see how this got past initial testing of the relevant code). Back-patch to 9.0.	2011-02-09 23:27:42 -05:00
Tom Lane	caddcb8f4b	Fix pg_upgrade to handle extensions. This follows my proposal of yesterday, namely that we try to recreate the previous state of the extension exactly, instead of allowing CREATE EXTENSION to run a SQL script that might create some entirely-incompatible on-disk state. In --binary-upgrade mode, pg_dump won't issue CREATE EXTENSION at all, but instead uses a kluge function provided by pg_upgrade_support to recreate the pg_extension row (and extension-level pg_depend entries) without creating any member objects. The member objects are then restored in the same way as if they weren't members, in particular using pg_upgrade's normal hacks to preserve OIDs that need to be preserved. Then, for each member object, ALTER EXTENSION ADD is issued to recreate the pg_depend entry that marks it as an extension member. In passing, fix breakage in pg_upgrade's enum-type support: somebody didn't fix it when the noise word VALUE got added to ALTER TYPE ADD. Also, rationalize parsetree representation of COMMENT ON DOMAIN and fix get_object_address() to allow OBJECT_DOMAIN.	2011-02-09 19:18:08 -05:00
Peter Eisentraut	2e2d56fea9	Information schema views for collation support Add the views character_sets, collations, and collation_character_set_applicability.	2011-02-09 23:26:48 +02:00
Tom Lane	5bc178b89f	Implement "ALTER EXTENSION ADD object". This is an essential component of making the extension feature usable; first because it's needed in the process of converting an existing installation containing "loose" objects of an old contrib module into the extension-based world, and second because we'll have to use it in pg_dump --binary-upgrade, as per recent discussion. Loosely based on part of Dimitri Fontaine's ALTER EXTENSION UPGRADE patch.	2011-02-09 11:56:37 -05:00
Magnus Hagander	3144c33a2f	Implement NOWAIT option for BASE_BACKUP command Specifying this option makes the server not wait for the xlog to be archived, or emit a warning that it can't, instead leaving the responsibility with the client. This is useful when the log is being streamed using the streaming protocol in parallel with the backup, without having log archiving enabled.	2011-02-09 10:59:53 +01:00
Tom Lane	d9572c4e3b	Core support for "extensions", which are packages of SQL objects. This patch adds the server infrastructure to support extensions. There is still one significant loose end, namely how to make it play nice with pg_upgrade, so I am not yet committing the changes that would make all the contrib modules depend on this feature. In passing, fix a disturbingly large amount of breakage in AlterObjectNamespace() and callers. Dimitri Fontaine, reviewed by Anssi Kääriäinen, Itagaki Takahiro, Tom Lane, and numerous others	2011-02-08 16:13:22 -05:00
Peter Eisentraut	414c5a2ea6	Per-column collation support This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch	2011-02-08 23:04:18 +02:00
Simon Riggs	7a7d36ec33	Continue long tradition of bumping the catalog version a little late.	2011-02-08 19:44:50 +00:00
Simon Riggs	c016ce7281	Named restore points in recovery. Users can record named points, then new recovery.conf parameter recovery_target_name allows PITR to specify named points as recovery targets. Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits	2011-02-08 19:39:08 +00:00
Simon Riggs	8c6e3adbf7	Basic Recovery Control functions for use in Hot Standby. Pause, Resume, Status check functions only. Also, new recovery.conf parameter pause_at_recovery_target, default on. Simon Riggs, reviewed by Fujii Masao	2011-02-08 18:30:22 +00:00
Heikki Linnakangas	f9f9d696a9	UINT64_MAX isn't defined on MSVC.	2011-02-08 18:15:53 +02:00
Simon Riggs	722bf7017b	Extend ALTER TABLE to allow Foreign Keys to be added without initial validation. FK constraints that are marked NOT VALID may later be VALIDATED, which uses a ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced table. Significantly reduces lock strength and duration when adding FKs. New state visible from psql. Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas	2011-02-08 12:23:20 +00:00
Robert Haas	32896c40ca	Avoid having autovacuum workers wait for relation locks. Waiting for relation locks can lead to starvation - it pins down an autovacuum worker for as long as the lock is held. But if we're doing an anti-wraparound vacuum, then we still wait; maintenance can no longer be put off. To assist with troubleshooting, if log_autovacuum_min_duration >= 0, we log whenever an autovacuum or autoanalyze is skipped for this reason. Per a gripe by Josh Berkus, and ensuing discussion.	2011-02-07 22:04:29 -05:00
Heikki Linnakangas	47082fa875	Oops, forgot to bump catversion in the Serializable Snapshot Isolation patch. I thought we didn't need that, but then I remembered that it added a new SLRU subdirectory, pg_serial. While we're at it, document what pg_serial is.	2011-02-08 00:24:23 +02:00
Heikki Linnakangas	dafaa3efb7	Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, i.e. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with other predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen	2011-02-08 00:09:08 +02:00
Robert Haas	65377e0b9c	Tighten ALTER FOREIGN TABLE .. SET DATA TYPE checks. If the foreign table's rowtype is being used as the type of a column in another table, we can't just up and change its data type. This was already checked for composite types and ordinary tables, but we previously failed to enforce it for foreign tables.	2011-02-06 00:26:27 -05:00
Robert Haas	356f2cbbb4	Make handling of errcodes.h more consistent with other generated headers. This fixes make distprep, and seems more robust in other ways as well. Some special handling is required because errcodes.txt is needed by some stuff in src/port, not just by src/backend as is the case for the other generated headers. While I'm at it, fix a few other things that were overlooked in the original patch.	2011-02-04 09:29:10 -05:00
Robert Haas	ddfe26f644	Avoid maintaining three separate copies of the error codes list. src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a big chunk of errcodes.sgml are now automatically generated from a single file, src/backend/utils/errcodes.txt. Jan Urbański, reviewed by Tom Lane.	2011-02-03 22:32:49 -05:00
Bruce Momjian	35b0a6b205	Simplify code used in is_absolute_path() macro; also add comment about 'E:abc' Win32 path handling.	2011-02-03 10:47:06 -05:00
Bruce Momjian	426227850b	Rename function to first_path_var_separator() to clarify it works with path variables, not directory paths.	2011-02-02 22:49:54 -05:00
Peter Eisentraut	15f55cc38a	Add validator to PL/Python Jan Urbański, reviewed by Hitoshi Harada	2011-02-01 22:55:04 +02:00
Magnus Hagander	5273f21434	Undefine setlocale() macro on Win32 New versions of libintl redefine setlocale() to a macro which causes problems when the backend and libintl are linked against different versions of the runtime, which is often the case in msvc builds. Hiroshi Inoue, slightly updated comment by me	2011-02-01 13:19:18 +01:00
Simon Riggs	56b21b7ae3	Re-classify ERRCODE_DATABASE_DROPPED to 57P04	2011-02-01 08:44:01 +00:00
Simon Riggs	9e95c9ad55	Create new errcode for recovery conflict caused by db drop on master. Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change. Unlikely to happen on most servers, so low impact change to allow session poolers to correctly handle this situation. Tatsuo Ishii, edits by me, review by Robert Haas	2011-02-01 00:20:53 +00:00
Heikki Linnakangas	997b48ed96	Support multiple concurrent pg_basebackup backups. With this patch, pg_basebackup doesn't write a backup_label file in the data directory, so it doesn't interfere with a pg_start/stop_backup() based backup anymore. backup_label is still included in the backup, but it is injected directly into the tar stream. Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.	2011-01-31 18:25:39 +02:00
Andrew Dunstan	48c9de8028	Fix typo	2011-01-30 20:34:05 -05:00
Andrew Dunstan	91812df4ed	Enable building with the Mingw64 compiler. This can be used to build 64 bit Windows binaries, not only on 64 bit Windows but on supported cross-compiling hosts including 32 bit Windows, Cygwin, Darwin and Linux.	2011-01-30 19:56:46 -05:00
Magnus Hagander	507069de6d	Add option to include WAL in base backup When included, this makes the base backup a complete working "clone" of the initial database, ready to have a postmaster started against it without the need to set up any log archiving or similar. Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas	2011-01-30 21:30:09 +01:00
Peter Eisentraut	6fe5e4e63e	autoreconf Synchronize pg_config.h.in with configure.in (someone must have forgotten to run autoheader or autoreconf), and clean up some spurious change in configure introduced by the last commit there.	2011-01-27 01:19:45 +02:00
Tom Lane	bd1ad1b019	Replace pg_class.relhasexclusion with pg_index.indisexclusion. There isn't any need to track this state on a table-wide basis, and trying to do so introduces undesirable semantic fuzziness. Move the flag to pg_index, where it clearly describes just a single index and can be immutable after index creation.	2011-01-25 17:51:59 -05:00
Tom Lane	88452d5ba6	Implement ALTER TABLE ADD UNIQUE/PRIMARY KEY USING INDEX. This feature allows a unique or pkey constraint to be created using an already-existing unique index. While the constraint isn't very functionally different from the bare index, it's nice to be able to do that for documentation purposes. The main advantage over just issuing a plain ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with CREATE INDEX CONCURRENTLY, so that there is not a long interval where the table is locked against updates. On the way, refactor some of the code in DefineIndex() and index_create() so that we don't have to pass through those functions in order to create the index constraint's catalog entries. Also, in parse_utilcmd.c, pass around the ParseState pointer in struct CreateStmtContext to save on notation, and add error location pointers to some error reports that didn't have one before. Gurjeet Singh, reviewed by Steve Singer and Tom Lane	2011-01-25 15:43:05 -05:00
Magnus Hagander	e5487f65fd	Make walsender options order-independent While doing this, also move base backup options into a struct instead of increasing the number of parameters to multiple functions for each new option.	2011-01-23 23:39:18 +01:00
Magnus Hagander	048d148fe6	Add pg_basebackup tool for streaming base backups This tool makes it possible to do the pg_start_backup/copy files/pg_stop_backup step in a single command. There are still some steps to be done before this is a complete backup solution, such as the ability to stream the required WAL logs, but it's still usable, and could do with some buildfarm coverage. In passing, make the checkpoint request optionally fast instead of hardcoding it. Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine	2011-01-23 12:21:23 +01:00
Robert Haas	6f59777c65	Code cleanup for assign_transaction_read_only. As in commit `fb4c5d2798` on 2011-01-21, this avoids spurious debug messages and allows idempotent changes at any time. Along the way, make assign_XactIsoLevel allow idempotent changes even when not within a subtransaction, to be consistent with the new coding of assign_transaction_read_only and because there's no compelling reason to do otherwise. Kevin Grittner, with some adjustments.	2011-01-22 20:55:50 -05:00
Robert Haas	fb4c5d2798	Code cleanup for assign_XactIsoLevel. The new coding avoids a spurious debug message when a transaction that has changed the isolation level has been rolled back. It also allows the property to be freely changed to the current value within a subtransaction. Kevin Grittner, with one small change by me.	2011-01-21 21:49:19 -05:00
Robert Haas	8ceb245680	Make ALTER TABLE revalidate uniqueness and exclusion constraints. Failure to do so can lead to constraint violations. This was broken by commit `1ddc2703a9` on 2010-02-07, so back-patch to 9.0. Noah Misch. Regression test by me.	2011-01-20 22:44:10 -05:00
Tom Lane	6ca452ba7f	Move a couple of declarations to reflect where the routines really are.	2011-01-15 16:09:05 -05:00
Heikki Linnakangas	8f5d65e916	Treat a WAL sender process that hasn't started streaming yet as a regular backend, as far as the postmaster shutdown logic is concerned. That means, fast shutdown will wait for WAL sender processes to exit before signaling bgwriter to finish. This avoids race conditions between a base backup stopping or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't want e.g. the end-of-backup WAL record to be written after the shutdown checkpoint.	2011-01-15 16:38:21 +02:00
Magnus Hagander	fcd810c69a	Use a lexer and grammar for parsing walsender commands Makes it easier to parse mainly the BASE_BACKUP command with its options, and avoids having to manually deal with quoted identifiers in the label (previously broken), and makes it easier to add new commands and options in the future. In passing, refactor the case statement in the walsender to put each command in its own function.	2011-01-14 16:30:33 +01:00
Magnus Hagander	688423d004	Exit from base backups when shutdown is requested When the exit waits until the whole backup completes, it may take a very long time. In passing, add back an error check in the main loop so we detect clients that disconnect much earlier if the backup is large.	2011-01-14 12:36:45 +01:00
Tom Lane	52948169bc	Code review for postmaster.pid contents changes. Fix broken test for pre-existing postmaster, caused by wrong code for appending lines to the lockfile; don't write a failed listen_address setting into the lockfile; don't arbitrarily change the location of the data directory in the lockfile compared to previous releases; provide more consistent and useful definitions of the socket path and listen_address entries; avoid assuming that pg_ctl has the same DEFAULT_PGSOCKET_DIR as the postmaster; assorted code style improvements.	2011-01-13 19:01:28 -05:00
Tom Lane	d487afbb81	Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly. In an inherited UPDATE/DELETE, each target table has its own subplan, because it might have a column set different from other targets. This means that the resjunk columns we add to support EvalPlanQual might be at different physical column numbers in each subplan. The EvalPlanQual rewrite I did for 9.0 failed to account for this, resulting in possible misbehavior or even crashes during concurrent updates to the same row, as seen in a recent report from Gordon Shannon. Revise the data structure so that we track resjunk column numbers separately for each subplan. I also chose to move responsibility for identifying the physical column numbers back to executor startup, instead of assuming that numbers derived during preprocess_targetlist would stay valid throughout subsequent massaging of the plan. That's a bit slower, so we might want to consider undoing it someday; but it would complicate the patch considerably and didn't seem justifiable in a bug fix that has to be back-patched to 9.0.	2011-01-12 20:47:02 -05:00
Magnus Hagander	4c8e20f815	Track walsender state in shared memory and expose in pg_stat_replication	2011-01-11 21:25:28 +01:00
Magnus Hagander	0eb59c4591	Backend support for streaming base backups Add BASE_BACKUP command to walsender, allowing it to stream a base backup to the client (in tar format). The syntax is still far from ideal, that will be fixed in the switch to use a proper grammar for walsender. No client included yet, will come as a separate commit. Magnus Hagander and Heikki Linnakangas	2011-01-10 14:04:19 +01:00
Magnus Hagander	4448917d51	Split pg_start_backup() and pg_stop_backup() into two pieces Move the actual functionality into a separate function that's easier to call internally, and change the SQL-callable function to be a wrapper calling this. Also create a pg_abort_backup() function, only callable internally, that does only the most vital parts of pg_stop_backup(), making it safe(r) to call from error handlers.	2011-01-09 21:00:28 +01:00
Tom Lane	52fd2d65a3	Fix up core tsquery GIN support for new extractQuery API. No need for the empty-prefix-match kluge to force a full scan anymore.	2011-01-09 14:34:50 -05:00
Magnus Hagander	db4d22d0ef	Add pgreadlink() on Windows to read junction points Add support for reading back information about the symbolic links we've created with pgsymlink(), which are actually Junction Points. Just like pgsymlink() can only create directory symlinks, pgreadlink() can only read directory symlinks.	2011-01-09 15:09:19 +01:00
Tom Lane	adf328c0e1	Add array_contains_nulls() function in arrayfuncs.c. This will support fixing contrib/intarray (and probably other places) so that they don't have to fail on arrays that contain a null bitmap but no live null entries.	2011-01-08 20:26:14 -05:00
Tom Lane	7e2f906201	Remove pg_am.amindexnulls. The only use we have had for amindexnulls is in determining whether an index is safe to cluster on; but since the addition of the amclusterable flag, that usage is pretty redundant. In passing, clean up assorted sloppiness from the last patch that touched pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.	2011-01-08 16:08:05 -05:00
Tom Lane	56a57473a9	Refactor GIN's handling of duplicate search entries. The original coding could combine duplicate entries only when they originated from the same qual condition. In particular it could not combine cases where multiple qual conditions all give rise to full-index scan requests, which is an expensive case well worth optimizing. Refactor so that duplicates are recognized across all the quals.	2011-01-08 14:48:08 -05:00
Tom Lane	a032d50128	Fix the built-in GIN support procedure declarations in pg_proc.h. Add more "internal" arguments so that these pg_proc entries reflect the current preferred API. This is purely a cosmetic change, since GIN doesn't actually consult the pg_proc entry when calling a support function. Accordingly, no catversion bump.	2011-01-07 20:40:48 -05:00
Tom Lane	73912e7fbd	Fix GIN to support null keys, empty and null items, and full index scans. Per my recent proposal(s). Null key datums can now be returned by extractValue and extractQuery functions, and will be stored in the index. Also, placeholder entries are made for indexable items that are NULL or contain no keys according to extractValue. This means that the index is now always complete, having at least one entry for every indexed heap TID, and so we can get rid of the prohibition on full-index scans. A full-index scan is implemented much the same way as partial-match scans were already: we build a bitmap representing all the TIDs found in the index, and then drive the results off that. Also, introduce a concept of a "search mode" that can be requested by extractQuery when the operator requires matching to empty items (this is just as cheap as matching to a single key) or requires a full index scan (which is not so cheap, but it sure beats failing or giving wrong answers). The behavior remains backward compatible for opclasses that don't return any null keys or request a non-default search mode. Using these features, we can now make the GIN index opclass for anyarray behave in a way that matches the actual anyarray operators for &&, <@, @>, and = ... which it failed to do before in assorted corner cases. This commit fixes the core GIN code and ginarrayprocs.c, updates the documentation, and adds some simple regression test cases for the new behaviors using the array operators. The tsearch and contrib GIN opclass support functions still need to be looked over and probably fixed. Another thing I intend to fix separately is that this is pretty inefficient for cases where more than one scan condition needs a full-index search: we'll run duplicate GinScanEntrys, each one of which builds a large bitmap. There is some existing logic to merge duplicate GinScanEntrys but it needs refactoring to make it work for entries belonging to different scan keys. Note that most of gin.h has been split out into a new file gin_private.h, so that gin.h doesn't export anything that's not supposed to be used by GIN opclasses or the rest of the backend. I did quite a bit of other code beautification work as well, mostly fixing comments and choosing more appropriate names for things.	2011-01-07 19:16:24 -05:00
Robert Haas	9b4271deb9	Document pg_stat_replication, bump catversion since that was overlooked. Itagaki Takahiro, edited by me.	2011-01-07 11:06:55 -05:00
Itagaki Takahiro	a755ea33ae	New system view pg_stat_replication displays activity of wal sender processes. Itagaki Takahiro and Simon Riggs.	2011-01-07 20:35:38 +09:00
Magnus Hagander	66a8a0428d	Give superusers REPLICATION permission by default This can be overridden by using NOREPLICATION on the CREATE ROLE statement, but by default they will have it, making it backwards compatible and "less surprising" (given that superusers normally override all checks).	2011-01-05 14:24:17 +01:00
Robert Haas	7f60be72b0	Fix crash in ALTER OPERATOR CLASS/FAMILY .. SET SCHEMA. In the previous coding, the parser emitted a List containing a C string, which is no good, because copyObject() can't handle it. Dimitri Fontaine	2011-01-03 22:08:55 -05:00
Magnus Hagander	77745cc7f1	Bump catversion, forgot in previous commit.	2011-01-03 12:50:30 +01:00
Magnus Hagander	40d9e94bd7	Add views and functions to monitor hot standby query conflicts Add the view pg_stat_database_conflicts and a column to pg_stat_database, and the underlying functions to provide the information.	2011-01-03 12:46:03 +01:00
Peter Eisentraut	39b8843296	Implement remaining fields of information_schema.sequences view Add new function pg_sequence_parameters that returns a sequence's start, minimum, maximum, increment, and cycle values, and use that in the view. (bug #5662; design suggestion by Tom Lane) Also slightly adjust the view's column order and permissions after review of SQL standard.	2011-01-02 15:15:21 +02:00
Robert Haas	0d692a0dc9	Basic foreign table support. Foreign tables are a core component of SQL/MED. This commit does not provide a working SQL/MED infrastructure, because foreign tables cannot yet be queried. Support for foreign table scans will need to be added in a future patch. However, this patch creates the necessary system catalog structure, syntax support, and support for ancillary operations such as COMMENT and SECURITY LABEL. Shigeru Hanada, heavily revised by Robert Haas	2011-01-01 23:48:11 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Bruce Momjian	30aeda4394	Include the first valid listen address in pg_ctl to improve server start "wait" detection and add postmaster start time to help determine if the postmaster is actually using the specified data directory.	2010-12-31 17:25:02 -05:00
Tom Lane	7b46401557	Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c. There's no reason for these values to be known anywhere else. After doing this, executor/execdefs.h is vestigial and can be removed.	2010-12-30 22:12:40 -05:00
Tom Lane	f4e4b32743	Support RIGHT and FULL OUTER JOIN in hash joins. This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.	2010-12-30 20:26:08 -05:00
Alvaro Herrera	55573990ca	Avoid unnecessary public struct declaration in slru.h Instead, declare a public wrapper of the sole function using it for external callers, so that they don't have to always pass a NULL argument. Author: Kevin Grittner	2010-12-30 12:09:17 -03:00
Robert Haas	d2bc1c9907	Bump XLOG_PAGE_MAGIC. The unlogged tables patch (commit `53dbc27c62`, 2010-12-29) should have done this, since it changes the format of an XLOG_SMGR_CREATE record.	2010-12-29 07:19:21 -05:00
Robert Haas	53dbc27c62	Support unlogged tables. The contents of an unlogged table are not WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.	2010-12-29 06:48:53 -05:00
Magnus Hagander	9b8aff8c19	Add REPLICATION privilege for ROLEs This privilege is required to do Streaming Replication, instead of superuser, making it possible to set up a SR slave that doesn't have write permissions on the master. Superuser privileges do NOT override this check, so in order to use the default superuser account for replication it must be explicitly granted the REPLICATION permissions. This is a backwards-incompatible change, in the interest of higher default security.	2010-12-29 11:05:03 +01:00
Tom Lane	f2ba1e994c	Avoid unexpected conversion overflow in planner for distant date values. The "date" type supports a wider range of dates than int64 timestamps do. However, there is pre-int64-timestamp code in the planner that assumes that all date values can be converted to timestamp with impunity. Fortunately, what we really need out of the conversion is always a double (float8) value; so even when the date is out of timestamp's range it's possible to produce a sane answer. All we need is a code path that doesn't try to force the result into int64. Per trouble report from David Rericha. Back-patch to all supported versions. Although this is surely a corner case, there's not much point in advertising a date range wider than timestamp's if we will choke on such values in unexpected places.	2010-12-28 22:49:57 -05:00
Tom Lane	84fc571395	Rename the C functions bitand(), bitor() to bit_and(), bit_or(). This is to avoid use of the C++ keywords "bitand" and "bitor" in the header file utils/varbit.h. Note the functions' SQL-level names are not changed, only their C-level names. In passing, make some comments in varbit.c conform to project-standard layout.	2010-12-27 14:57:41 -05:00
Tom Lane	37b61a69f3	Fix failure of executor/hashjoin.h to compile standalone. Noted while experimenting with cpluspluscheck.	2010-12-27 12:20:09 -05:00
Tom Lane	275411912d	Fix ill-chosen use of "private" as an argument and struct field name. "private" is a keyword in C++, so this breaks the poorly-enforced policy that header files should be include-able in C++ code. Per report from Craig Ringer and some investigation with cpluspluscheck.	2010-12-27 11:26:19 -05:00
Robert Haas	63676ebff4	Corrections to patch adding SQL/MED error codes. My previous commit, `85cff3ce7f` on 2010-12-25, failed to update errcodes.sgml or plerrcodes.h. This patch corrects that oversight, per a gripe from Tom Lane, and also corrects a typographical error.	2010-12-26 21:35:25 -05:00
Andrew Dunstan	a534728afb	Only build in crashdump support on Windows if there's a working dbghelp.h.	2010-12-26 10:34:47 -05:00
Robert Haas	85cff3ce7f	Add foreign data wrapper error code values for SQL/MED. Extracted from a much larger patch by Shigeru Hanada.	2010-12-25 13:57:39 -05:00
Heikki Linnakangas	9de3aa65f0	Rewrite the GiST insertion logic so that we don't need the post-recovery cleanup stage to finish incomplete inserts or splits anymore. There were two reasons for the cleanup step: 1. When a new tuple was inserted to a leaf page, the downlink in the parent needed to be updated to contain (ie. to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step. 2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else. These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finishes quickly so that there is no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. IOW, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with it in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.	2010-12-23 16:21:47 +02:00
Magnus Hagander	dcb09b595f	Support for collecting crash dumps on Windows Add support for collecting "minidump" style crash dumps on Windows, by setting up an exception handling filter. Crash dumps will be generated in PGDATA/crashdumps if the directory is created (the existence of the directory is used as on/off switch for the generation of the dumps). Craig Ringer and Magnus Hagander	2010-12-19 16:45:28 +01:00
Tom Lane	61b53695fb	Remove optreset from src/port/ implementations of getopt and getopt_long. We don't actually need optreset, because we can easily fix the code to ensure that it's cleanly restartable after having completed a scan over the argv array; which is the only case we need to restart in. Getting rid of it avoids a class of interactions with the system libraries and allows reversion of my change of yesterday in postmaster.c and postgres.c. Back-patch to 8.4. Before that the getopt code was a bit different anyway.	2010-12-16 16:23:05 -05:00
Itagaki Takahiro	03db44eae3	Add pg_read_binary_file() and whole-file-at-once versions of pg_read_file(). One of the usages of the binary version is to read files in a different encoding from the server encoding. Dimitri Fontaine and Itagaki Takahiro.	2010-12-16 06:56:28 +09:00
Robert Haas	34c70c7ac4	Instrument checkpoint sync calls. Greg Smith, reviewed by Jeff Janes	2010-12-14 09:26:19 -05:00
Robert Haas	d368e1a2a7	Allow plugins to suppress inlining and hook function entry/exit/abort. This is intended as infrastructure to allow an eventual SE-Linux plugin to support trusted procedures. KaiGai Kohei	2010-12-13 19:15:53 -05:00
Robert Haas	5f7b58fad8	Generalize concept of temporary relations to "relation persistence". This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previously depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.	2010-12-13 12:34:26 -05:00
Tom Lane	5132ad8bdf	Make S_IRGRP etc available in mingw builds as well as MSVC. (Hm, I wonder whether BCC defines them either...) Also label dangling endifs a bit better in this area.	2010-12-12 13:43:44 -05:00
Tom Lane	1319002e2e	Provide a complete set of file-permission-bit macros in win32.h. My previous patch exposed the fact that we didn't have these. Those hard-wired octal constants were actually wrong on Windows, not just inconsistent.	2010-12-11 13:11:18 -05:00
Robert Haas	d3d414696f	Allow bidirectional copy messages in streaming replication mode. Fujii Masao. Review by Alvaro Herrera, Tom Lane, and myself.	2010-12-11 09:27:37 -05:00
Tom Lane	671199929d	Move a couple of initdb's subroutines into src/port/. mkdir_p and check_data_dir will be useful in CREATE TABLESPACE, since we have agreed that that command should handle subdirectory creation just like initdb creates the PGDATA directory. Push them into src/port/ so that they are available to both initdb and the backend. Rename to pg_mkdir_p and pg_check_dir, just to be on the safe side. Add FreeBSD's copyright notice to pgmkdirp.c, since that's where the code came from originally (this really should have been in initdb.c). Very marginal code/comment cleanup.	2010-12-10 19:42:44 -05:00
Tom Lane	576477e73c	Force default wal_sync_method to be fdatasync on Linux. Recent versions of the Linux system header files cause xlogdefs.h to believe that open_datasync should be the default sync method, whereas formerly fdatasync was the default on Linux. open_datasync is a bad choice, first because it doesn't actually outperform fdatasync (in fact the reverse), and second because we try to use O_DIRECT with it, causing failures on certain filesystems (e.g., ext4 with data=journal option). This part of the patch is largely per a proposal from Marti Raudsepp. More extensive changes are likely to follow in HEAD, but this is as much change as we want to back-patch. Also clean up confusing code and incorrect documentation surrounding the fsync_writethrough option. Those changes shouldn't result in any actual behavioral change, but I chose to back-patch them anyway to keep the branches looking similar in this area. In 9.0 and HEAD, also do some copy-editing on the WAL Reliability documentation section. Back-patch to all supported branches, since any of them might get used on modern Linux versions.	2010-12-08 20:01:09 -05:00
Simon Riggs	e620ee35b2	Optimize commit_siblings in two ways to improve group commit. First, avoid scanning the whole ProcArray once we know there are at least commit_siblings active; second, skip the check altogether if commit_siblings = 0. Greg Smith	2010-12-08 18:48:03 +00:00
Heikki Linnakangas	5a031a5556	Fix bugs in the hot standby known-assigned-xids tracking logic. If there's an old transaction running in the master, and a lot of transactions have started and finished since, and a WAL-record is written in the gap between creating the running-xacts snapshot and WAL-logging it, recovery will fail with "too many KnownAssignedXids" error. This bug was reported by Joachim Wieland on Nov 19th. In the same scenario, when fewer transactions have started so that all the xids fit in KnownAssignedXids despite the first bug, a more serious bug arises. We incorrectly initialize the clog code with the oldest still running transaction, and when we see the WAL record belonging to a transaction with an XID larger than one that committed already before the checkpoint we're recovering from, we zero the clog page containing the already committed transaction, leading to data loss. In hindsight, trying to track xids in the known-assigned-xids array before seeing the running-xacts record was too complicated. To fix that, hold XidGenLock while the running-xacts snapshot is taken and WAL-logged. That ensures that no transaction can begin or end in that gap, so that in recovery we know that the snapshot contains all transactions running at that point in WAL.	2010-12-07 09:23:30 +01:00
Tom Lane	e194a942f9	Update comment to match later code changes.	2010-12-04 03:21:49 -05:00
Tom Lane	554506871b	KNNGIST, otherwise known as order-by-operator support for GIST. This commit represents a rather heavily editorialized version of Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1 patches. I redid the opclass API to add a separate Distance method instead of turning the Consistent method into an illogical mess, fixed some bit-rot in the rbtree interfaces, and generally worked over the code style and comments. There's still no non-code documentation to speak of, but I'll work on that separately. Some contrib-module changes are also yet to come (right now, point <-> point is the only KNN-ified operator). Teodor Sigaev and Tom Lane	2010-12-03 20:53:29 -05:00
Robert Haas	970a18687f	Use GUC lexer for recovery.conf parsing. This eliminates some crufty, special-purpose code and, as a non-trivial side benefit, allows recovery.conf parameters to be unquoted. Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki Takahiro, and me.	2010-12-03 08:56:44 -05:00
Tom Lane	d583f10b7e	Create core infrastructure for KNNGIST. This is a heavily revised version of builtin_knngist_core-0.9. The ordering operators are no longer mixed in with actual quals, which would have confused not only humans but significant parts of the planner. Instead, ordering operators are carried separately throughout planning and execution. Since the API for ambeginscan and amrescan functions had to be changed anyway, this commit takes the opportunity to rationalize that a bit. RelationGetIndexScan no longer forces a premature index_rescan call; instead, callers of index_beginscan must call index_rescan too. Aside from making the AM-side initialization logic a bit less peculiar, this has the advantage that we do not make a useless extra am_rescan call when there are runtime key values. AMs formerly could not assume that the key values passed to amrescan were actually valid; now they can. Teodor Sigaev and Tom Lane	2010-12-02 20:51:37 -05:00
Tom Lane	c0b5fac701	Simplify and speed up mapping of index opfamilies to pathkeys. Formerly we looked up the operators associated with each index (caching them in relcache) and then the planner looked up the btree opfamily containing such operators in order to build the btree-centric pathkey representation that describes the index's sort order. This is quite pointless for btree indexes: we might as well just use the index's opfamily information directly. That saves syscache lookup cycles during planning, and furthermore allows us to eliminate the relcache's caching of operators altogether, which may help in reducing backend startup time. I added code to plancat.c to perform the same type of double lookup on-the-fly if it's ever faced with a non-btree amcanorder index AM. If such a thing actually becomes interesting for production, we should replace that logic with some more-direct method for identifying the corresponding btree opfamily; but it's not worth spending effort on now. There is considerably more to do pursuant to my recent proposal to get rid of sort-operator-based representations of sort orderings, but this patch grabs some of the low-hanging fruit. I'll look at the remainder of that work after the current commitfest.	2010-11-29 12:30:43 -05:00
Simon Riggs	ed78384acd	Move call to GetTopTransactionId() earlier in LockAcquire(), removing an infrequently occurring race condition in Hot Standby. An xid must be assigned before a lock appears in shared memory, rather than immediately after, else GetRunningTransactionLocks() may see InvalidTransactionId, causing assertion failures during lock processing on standby. Bug report and diagnosis by Fujii Masao, fix by me.	2010-11-29 01:08:02 +00:00
Robert Haas	55109313f9	Add more ALTER <object> .. SET SCHEMA commands. This adds support for changing the schema of a conversion, operator, operator class, operator family, text search configuration, text search dictionary, text search parser, or text search template. Dimitri Fontaine, with assorted corrections and other kibitzing.	2010-11-26 17:31:54 -05:00
Robert Haas	cc1ed40d57	Object access hook framework, with post-creation hook. After a SQL object is created, we provide an opportunity for security or logging plugins to get control; for example, a security label provider could use this to assign an initial security label to newly created objects. The basic infrastructure is (hopefully) reusable for other types of events that might require similar treatment. KaiGai Kohei, with minor adjustments.	2010-11-25 11:50:13 -05:00
Bruce Momjian	ba11258ccb	When reporting the server as not responding, if the hostname was supplied, also print the IP address. This allows IPv4 and IPv6 failures to be distinguished. Also useful when a hostname resolves to multiple IP addresses. Also, remove use of inet_ntoa() and use our own inet_net_ntop() in all places, including in libpq, because it is thread-safe.	2010-11-24 17:04:19 -05:00
Tom Lane	725d52d0c2	Create the system catalog infrastructure needed for KNNGIST. This commit adds columns amoppurpose and amopsortfamily to pg_amop, and column amcanorderbyop to pg_am. For the moment all the entries in amcanorderbyop are "false", since the underlying support isn't there yet. Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with [ FOR SEARCH \| FOR ORDER BY sort_operator_family ] clauses to allow the new columns of pg_amop to be populated, and create pg_dump support for dumping that information. I also added some documentation, although it's perhaps a bit premature given that the feature doesn't do anything useful yet. Teodor Sigaev, Robert Haas, Tom Lane	2010-11-24 14:22:17 -05:00
Peter Eisentraut	f2a4278330	Propagate ALTER TYPE operations to typed tables This adds RESTRICT/CASCADE flags to ALTER TYPE ... ADD/DROP/ALTER/ RENAME ATTRIBUTE to control whether to alter typed tables as well.	2010-11-23 22:50:17 +02:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Robert Haas	44475e782f	Centralize some ALTER <whatever> .. SET SCHEMA checks. Any flavor of ALTER <whatever> .. SET SCHEMA fails if (1) the object is already in the new schema, (2) either the old or new schema is a temp schema, or (3) either the old or new schema is the TOAST schema. Extracted from a patch by Dimitri Fontaine, with additional hacking by me.	2010-11-22 19:53:34 -05:00
Robert Haas	506070be34	Bump catversion. Should have done this as part of format(text) patch.	2010-11-21 06:34:42 -05:00
Robert Haas	7504870778	Add new SQL function, format(text). Currently, three conversion format specifiers are supported: %s for a string, %L for an SQL literal, and %I for an SQL identifier. The latter two are deliberately designed not to overlap with what sprintf() already supports, in case we want to add more of sprintf()'s functionality here later. Patch by Pavel Stehule, heavily revised by me. Reviewed by Jeff Janes and, in earlier versions, by Itagaki Takahiro and Tom Lane.	2010-11-20 22:33:27 -05:00
Tom Lane	89a368418c	Further cleanup of indxpath logic related to IndexOptInfo.opfamily array. We no longer need the terminating zero entry in opfamily[], so get rid of it. Also replace assorted ad-hoc looping logic with simple for and foreach constructs. This code is now noticeably more readable than it was an hour ago; credit to Robert for seeing that it could be simplified.	2010-11-20 15:07:16 -05:00
Robert Haas	4343c0e546	Expose quote_literal_cstr() from core. This eliminates the need for inefficient implementations of this functionality in both contrib/dblink and contrib/tablefunc, so remove them. The upcoming patch implementing an in-core format() function will also require this functionality. In passing, add some regression tests.	2010-11-20 10:04:48 -05:00
Robert Haas	4fc115b2e9	Speed up conversion of signed integers to C strings. A hand-coded implementation turns out to be much faster than calling printf(). In passing, add a few more regression tests. Andres Freund, with assorted, mostly cosmetic changes.	2010-11-19 22:13:11 -05:00
Tom Lane	0f61d4dd1b	Improve relation width estimation for subqueries. As per the ancient comment for set_rel_width, it really wasn't much good for relations that aren't plain tables: it would never find any stats and would always fall back on datatype-based estimates, which are often pretty silly. Fix that by copying up width estimates from the subquery planning process. At some point we might want to do this for CTEs too, but that would be a significantly more invasive patch because the sub-PlannerInfo is no longer accessible by the time it's needed. I refrained from doing anything about that, partly for fear of breaking the unmerged CTE-related patches. In passing, also generate less bogus width estimates for whole-row Vars. Per a gripe from Jon Nelson.	2010-11-19 17:31:50 -05:00
Alvaro Herrera	6cc2deb86e	Add pg_describe_object function This function is useful to obtain textual descriptions of objects as stored in pg_depend.	2010-11-18 17:06:19 -03:00
Tom Lane	6fbc323c80	Further fallout from the MergeAppend patch. Fix things so that top-N sorting can be used in child Sort nodes of a MergeAppend node, when there is a LIMIT and no intervening joins or grouping. Actually doing this on the executor side isn't too bad, but it's a bit messier to get the planner to cost it properly. Per gripe from Robert Haas. In passing, fix an oversight in the original top-N-sorting patch: query_planner should not assume that a LIMIT can be used to make an explicit sort cheaper when there will be grouping or aggregation in between. Possibly this should be back-patched, but I'm not sure the mistake is serious enough to be a real problem in practice.	2010-11-18 00:30:10 -05:00
Tom Lane	511e902b51	Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally. In the previous coding, we simply issued ALTER SEQUENCE RESTART commands, which do not roll back on error. This meant that an error between truncating and committing left the sequences out of sync with the table contents, with potentially bad consequences as were noted in a Warning on the TRUNCATE man page. To fix, create a new storage file (relfilenode) for a sequence that is to be reset due to RESTART IDENTITY. If the transaction aborts, we'll automatically revert to the old storage file. This acts just like a rewriting ALTER TABLE operation. A penalty is that we have to take exclusive lock on the sequence, but since we've already got exclusive lock on its owning table, that seems unlikely to be much of a problem. The interaction of this with usual nontransactional behaviors of sequence operations is a bit weird, but it's hard to see what would be completely consistent. Our choice is to discard cached-but-unissued sequence values both when the RESTART is executed, and at rollback if any; but to not touch the currval() state either time. In passing, move the sequence reset operations to happen before not after any AFTER TRUNCATE triggers are fired. The previous ordering was not logically sensible, but was forced by the need to minimize inconsistency if the triggers caused an error. Transactional rollback is a much better solution to that. Patch by Steve Singer, rather heavily adjusted by me.	2010-11-17 16:42:18 -05:00
Heikki Linnakangas	2edc5cd493	The GiST scan algorithm uses LSNs to detect concurrent page splits, but temporary indexes are not WAL-logged. We used a constant LSN for temporary indexes, on the assumption that we don't need to worry about concurrent page splits in temporary indexes because they're only visible to the current session. But that assumption is wrong, it's possible to insert rows and split pages in the same session, while a scan is in progress. For example, by opening a cursor and fetching some rows, and INSERTing new rows before fetching some more. Fix by generating fake increasing LSNs, used in place of real LSNs in temporary GiST indexes.	2010-11-16 11:32:21 +02:00
Robert Haas	3134d8863e	Add new buffers_backend_fsync field to pg_stat_bgwriter. This new field counts the number of times that a backend which writes a buffer out to the OS must also fsync() it. This happens when the bgwriter fsync request queue is full, and is generally detrimental to performance, so it's good to know when it's happening. Along the way, log a new message at level DEBUG1 whenever we fail to hand off an fsync, so that the problem can also be seen in examination of log files (if the logging level is cranked up high enough). Greg Smith, with minor tweaks by me.	2010-11-15 12:42:59 -05:00
Robert Haas	20cf8ae478	Fix copy-and-pasteo a little more completely. copydir.c is no longer in src/port	2010-11-15 10:10:58 -05:00
Alvaro Herrera	ae4b17edee	Fix copy-and-pasteo.	2010-11-15 11:52:56 -03:00
Robert Haas	11e482c350	Move copydir() prototype into its own header file. Having this in src/include/port.h makes no sense, now that copydir.c lives in src/backend/storage rather than src/port. Along the way, remove an obsolete comment from contrib/pg_upgrade that makes reference to the old location.	2010-11-12 16:39:53 -05:00
Robert Haas	7ba6e4f0e0	Add monitoring function pg_last_xact_replay_timestamp. Fujii Masao, with a little wordsmithing by me.	2010-11-09 22:52:19 -05:00
Itagaki Takahiro	844ed5dc97	Don't use __declspec (dllimport) for PGDLLEXPORT to reduce warnings by gcc version 4 on mingw and cygwin. We don't use dllexport here because dllexport and dllwrap don't work well together.	2010-11-10 12:17:43 +09:00
Tom Lane	947d0c862c	Use appendrel planning logic for top-level UNION ALL structures. Formerly, we could convert a UNION ALL structure inside a subquery-in-FROM into an appendrel, as a side effect of pulling up the subquery into its parent; but top-level UNION ALL always caused use of plan_set_operations(). That didn't matter too much because you got an Append-based plan either way. However, now that the appendrel code can do things with MergeAppend, it's worthwhile to hack up the top-level case so it also uses appendrels. This is a bit of a stopgap; but going much further than this will require a major rewrite of the planner's set-operations support, which I'm not prepared to undertake now. For the moment let's grab the low-hanging fruit.	2010-11-08 15:15:02 -05:00
Tom Lane	034967bdcb	Reimplement planner's handling of MIN/MAX aggregate optimization. Per my recent proposal, get rid of all the direct inspection of indexes and manual generation of paths in planagg.c. Instead, set up EquivalenceClasses for the aggregate argument expressions, and let the regular path generation logic deal with creating paths that can satisfy those sort orders. This makes planagg.c a bit more visible to the rest of the planner than it was originally, but the approach is basically a lot cleaner than before. A major advantage of doing it this way is that we get MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans) practically for free, whereas in the old way we'd have had to add a whole lot more duplicative logic. One small disadvantage of this approach is that MIN/MAX aggregates can no longer exploit partial indexes having an "x IS NOT NULL" predicate, unless that restriction or something that implies it is specified in the query. The previous implementation was able to use the added "x IS NOT NULL" condition as an extra predicate proof condition, but in this version we rely entirely on indexes that are considered usable by the main planning process. That seems a fair tradeoff for the simplicity and functionality gained.	2010-11-04 12:01:17 -04:00
Tom Lane	0811ff2063	Avoid using a local FunctionCallInfoData struct in ExecMakeFunctionResult and related routines. We already had a redundant FunctionCallInfoData struct in FuncExprState, but were using that copy only in set-returning-function cases, to avoid keeping function evaluation state in the expression tree for the benefit of plpgsql's "simple expression" logic. But of course that didn't work anyway. Given the recent fixes in plpgsql there is no need to have two separate behaviors here. Getting rid of the local FunctionCallInfoData structs should make things a little faster (because we don't need to do InitFunctionCallInfoData each time), and it also makes for a noticeable reduction in stack space consumption during recursive calls.	2010-11-01 13:54:21 -04:00
Tom Lane	186cbbda8f	Provide hashing support for arrays. The core of this patch is hash_array() and associated typcache infrastructure, which works just about exactly like the existing support for array comparison. In addition I did some work to ensure that the planner won't think that an array type is hashable unless its element type is hashable, and similarly for sorting. This includes adding a datatype parameter to op_hashjoinable and op_mergejoinable, and adding an explicit "hashable" flag to SortGroupClause. The lack of a cross-check on the element type was a pre-existing bug in mergejoin support --- but it didn't matter so much before, because if you couldn't sort the element type there wasn't any good alternative to failing anyhow. Now that we have the alternative of hashing the array type, there are cases where we can avoid a failure by being picky at the planner stage, so it's time to be picky. The issue of exactly how to combine the per-element hash values to produce an array hash is still open for discussion, but the rest of this is pretty solid, so I'll commit it as-is.	2010-10-30 21:56:11 -04:00
Tom Lane	14231a41a9	Avoid creation of useless EquivalenceClasses during planning. Zoltan Boszormenyi exhibited a test case in which planning time was dominated by construction of EquivalenceClasses and PathKeys that had no actual relevance to the query (and in fact got discarded immediately). This happened because we generated PathKeys describing the sort ordering of every index on every table in the query, and only after that checked to see if the sort ordering was relevant. The EC/PK construction code is O(N^2) in the number of ECs, which is all right for the intended number of such objects, but it gets out of hand if there are ECs for lots of irrelevant indexes. To fix, twiddle the handling of mergeclauses a little bit to ensure that every interesting EC is created before we begin path generation. (This doesn't cost anything --- in fact I think it's a bit cheaper than before --- since we always eventually created those ECs anyway.) Then, if an index column can't be found in any pre-existing EC, we know that that sort ordering is irrelevant for the query. Instead of creating a useless EC, we can just not build a pathkey for the index column in the first place. The index will still be considered if it's useful for non-order-related reasons, but we will think of its output as unsorted.	2010-10-29 11:52:50 -04:00

5664 Commits