postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-09-30 17:21:14 +02:00

Author	SHA1	Message	Date
Simon Riggs	21d6a6a128	Tune GetSnapshotData() during Hot Standby by avoiding loop through normal backends. Makes code clearer also, since we avoid various Assert()s. Performance of snapshots taken during recovery no longer depends upon number of read-only backends.	2010-04-18 18:06:07 +00:00
Heikki Linnakangas	361bd1662e	Allow Hot Standby to begin from a shutdown checkpoint. Patch by Simon Riggs & me	2010-04-13 14:17:46 +00:00
Heikki Linnakangas	30556568f5	Update the location of last removed WAL segment in shared memory only after actually removing one, so that if we can't remove segments because WAL archiving is lagging behind, we don't unnecessarily forbid streaming the old not-yet-archived segments that are still perfectly valid. Per suggestion from Fujii Masao.	2010-04-12 10:40:43 +00:00
Heikki Linnakangas	e57cd7f0a1	Change the logic to decide when to delete old WAL segments, so that it doesn't take into account how far the WAL senders are. This way a hung WAL sender doesn't prevent old WAL segments from being recycled/removed in the primary, ultimately causing the disk to fill up. Instead add standby_keep_segments setting to control how many old WAL segments are kept in the primary. This also makes it more reliable to use streaming replication without WAL archiving, assuming that you set standby_keep_segments high enough.	2010-04-12 09:52:29 +00:00
Tom Lane	9029df17c4	Fix updateAclDependencies() to not assume that ACL role dependencies can only be added during GRANT and can only be removed during REVOKE; and fix its callers to not lie to it about the existing set of dependencies when instantiating a formerly-default ACL. The previous coding accidentally failed to malfunction so long as default ACLs contain only references to the object's owning role, because that role is ignored by updateAclDependencies. However this is obviously pretty fragile, as well as being an undocumented assumption. The new coding is a few lines longer but IMO much clearer.	2010-04-05 01:09:53 +00:00
Magnus Hagander	4c10623306	Update a number of broken links in comments. Josh Kupershmidt	2010-04-02 15:21:20 +00:00
Robert Haas	54943734f8	Refer to max_wal_senders in a more consistent fashion. The error message now makes explicit reference to the GUC that must be changed to fix the problem, using wording suggested by Tom Lane. Along the way, rename the GUC from MaxWalSenders to max_wal_senders for consistency and grep-ability.	2010-04-01 00:43:29 +00:00
Tom Lane	d174a4adbb	Fix "constraint_exclusion = partition" logic so that it will also attempt constraint exclusion on an inheritance set that is the target of an UPDATE or DELETE query. Per gripe from Marc Cousin. Back-patch to 8.4 where the feature was introduced.	2010-03-30 21:58:11 +00:00
Tom Lane	b78f6264eb	Rework join-removal logic as per recent discussion. In particular this fixes things so that it works for cases where nested removals are possible. The overhead of the optimization should be significantly less, as well.	2010-03-28 22:59:34 +00:00
Simon Riggs	a760893dbd	Derive latestRemovedXid for btree deletes by reading heap pages. The WAL record for btree delete contains a list of tids, even when backup blocks are present. We follow the tids to their heap tuples, taking care to follow LP_REDIRECT tuples. We ignore LP_DEAD tuples on the understanding that they will always have xmin/xmax earlier than any LP_NORMAL tuples referred to by killed index tuples. Iff all tuples are LP_DEAD we return InvalidTransactionId. The heap relfilenode is added to the WAL record, requiring API changes to pass down the heap Relation. XLOG_PAGE_MAGIC updated.	2010-03-28 09:27:02 +00:00
Alvaro Herrera	be8cebc717	Prevent ALTER USER f RESET ALL from removing the settings that were put there by a superuser -- "ALTER USER f RESET setting" already disallows removing such a setting. Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database owner that's not superuser.	2010-03-25 14:44:34 +00:00
Simon Riggs	bf6285b3a7	Further corrections of mismatching struct and btree SizeOf macros. In this case, correction is to remove now unused fields from struct. Since these were unused and full of garbage anyway, no version change.	2010-03-20 07:49:48 +00:00
Tom Lane	865b29540e	Fix oversight in btpo.xact patch; it was in fact installing garbage in the xact field on replay, due to not writing out all the data in the wal log struct.	2010-03-19 20:51:30 +00:00
Simon Riggs	aa36bd2039	Update XLOG_PAGE_MAGIC to recognise WAL format changes.	2010-03-19 17:42:10 +00:00
Simon Riggs	3cdafe40e7	Adjust comment in .history file to match recovery target specified. Comment present since 8.0 was never fully meaningful, since two recovery targets cannot be specified. Refactor recovery target type to make this change and associated code easier to understand. No change in function. Bug report arising from internal support question.	2010-03-19 11:05:15 +00:00
Simon Riggs	5c73ae17d1	Reset btpo.xact following recovery of btree delete page. Add btpo_xact field into WAL record and reset it from there, rather than using FrozenTransactionId which can lead to some corner case bugs. Problem report and suggested route to a fix from Heikki, details by me.	2010-03-19 10:41:22 +00:00
Tom Lane	93324355eb	Pass incompletely-transformed aggregate argument lists as separate parameters to transformAggregateCall, instead of abusing fields in Aggref to carry them temporarily. No change in functionality but hopefully the code is a bit clearer now. Per gripe from Gokulakannan Somasundaram.	2010-03-17 16:52:38 +00:00
Bruce Momjian	a6c1cea2b7	Add libpq warning message if the .pgpass-retrieved password fails. Add ERRCODE_INVALID_PASSWORD sqlstate error code.	2010-03-13 14:55:57 +00:00
Tom Lane	4df5c6c719	Update comment for pg_constraint.conindid to mention that it's used for exclusion constraints. Not sure how we managed to update the comment for it in catalogs.sgml but miss this one.	2010-03-11 03:36:22 +00:00
Tom Lane	8bf14182cf	Export xml.c's libxml-error-handling support so that contrib/xml2 can use it too, instead of duplicating the functionality (badly). I renamed xml_init to pg_xml_init, because the former seemed just a bit too generic to be safe as a global symbol. I considered likewise renaming xml_ereport to pg_xml_ereport, but felt that the reference to ereport probably made it sufficiently PG-centric already.	2010-03-03 17:29:45 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Tom Lane	11b5847058	Add an OR REPLACE option to CREATE LANGUAGE. This operates in the same way as other CREATE OR REPLACE commands, ie, it replaces everything but the ownership and ACL lists of an existing entry, and requires the caller to have owner privileges for that entry. While modifying an existing language has some use in development scenarios, in typical usage all the "replaced" values come from pg_pltemplate so there will be no actual change in the language definition. The reason for adding this is mainly to allow programs to ensure that a language exists without triggering an error if it already does exist. This commit just adds and documents the new option. A followon patch will use it to clean up some unpleasant cases in pg_dump and pg_regress.	2010-02-23 22:51:43 +00:00
Tom Lane	05d8a561ff	Clean up handling of XactReadOnly and RecoveryInProgress checks. Add some checks that seem logically necessary, in particular let's make real sure that HS slave sessions cannot create temp tables. (If they did they would think that temp tables belonging to the master's session with the same BackendId were theirs. We must not allow myTempNamespace to become set in a slave session.) Change setval() and nextval() so that they are only allowed on temp sequences in a read-only transaction. This seems consistent with what we allow for table modifications in read-only transactions. Since an HS slave can't have a temp sequence, this also provides a nicer cure for the setval PANIC reported by Erik Rijkers. Make the error messages more uniform, and have them mention the specific command being complained of. This seems worth the trifling amount of extra code, since people are likely to see such messages a lot more than before.	2010-02-20 21:24:02 +00:00
Peter Eisentraut	2f6cf9192c	Revert version stamping in wrong branch	2010-02-19 18:42:30 +00:00
Peter Eisentraut	a779afb40c	Version stamp 9.0alpha4	2010-02-19 16:03:22 +00:00
Heikki Linnakangas	ad458cfe81	Don't use O_DIRECT when writing WAL files if archiving or streaming is enabled. Bypassing the kernel cache is counter-productive in that case, because the archiver/walsender process will read from the WAL file soon after it's written, and if it's not cached the read will cause a physical read, eating I/O bandwidth available on the WAL drive. Also, walreceiver process does unaligned writes, so disable O_DIRECT in walreceiver process for that reason too.	2010-02-19 10:51:04 +00:00
Tom Lane	50a90fac40	Stamp HEAD as 9.0devel, and update various places that were referring to 8.5 (hope I got 'em all). Per discussion, this release will be 9.0 not 8.5.	2010-02-17 04:19:41 +00:00
Tom Lane	d1e027221d	Replace the pg_listener-based LISTEN/NOTIFY mechanism with an in-memory queue. In addition, add support for a "payload" string to be passed along with each notify event. This implementation should be significantly more efficient than the old one, and is also more compatible with Hot Standby usage. There is not yet any facility for HS slaves to receive notifications generated on the master, although such a thing is possible in future. Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.	2010-02-16 22:34:57 +00:00
Andrew Dunstan	fc5173ad51	Add query text to auto_explain output. Still to be done: fix docs and fix regression failures under auto_explain.	2010-02-16 22:19:59 +00:00
Magnus Hagander	215cbc90f8	Add emulation of non-blocking sockets to the win32 socket/signal layer, and use this in pq_getbyte_if_available. It's only a limited implementation which swithes the whole emulation layer no non-blocking mode, but that's enough as long as non-blocking is only used during a short period of time, and only one socket is accessed during this time.	2010-02-16 19:26:02 +00:00
Greg Stark	f8c183a1ac	Speed up CREATE DATABASE by deferring the fsyncs until after copying all the data and using posix_fadvise to nudge the OS into flushing it earlier. This also hopefully makes CREATE DATABASE avoid spamming the cache. Tests show a big speedup on Linux at least on some filesystems. Idea and patch from Andres Freund.	2010-02-15 00:50:57 +00:00
Robert Haas	e26c539e9f	Wrap calls to SearchSysCache and related functions using macros. The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.	2010-02-14 18:42:19 +00:00
Tom Lane	7507b193bc	Don't expose the inline definition of MemoryContextSwitchTo when FRONTEND is defined. Its reference to CurrentMemoryContext causes link failures on some platforms, evidently because the inline function gets compiled despite lack of use. Per buildfarm member warthog.	2010-02-13 20:46:52 +00:00
Simon Riggs	dd428c79a4	Fix relcache init file invalidation during Hot Standby for the case where a database has a non-default tablespaceid. Pass thru MyDatabaseId and MyDatabaseTableSpace to allow file path to be re-created in standby and correct invalidation to take place in all cases. Update and rework xact_commit_desc() debug messages. Bug report from Tom by code inspection. Fix by me.	2010-02-13 16:15:48 +00:00
Tom Lane	e08ab7c312	Support inlining various small performance-critical functions on non-GCC compilers, by applying a configure check to see if the compiler will accept an unreferenced "static inline foo ..." function without warnings. It is believed that such warnings are the only reason not to declare inlined functions in headers, if the compiler understands "inline" at all. Kurt Harriman	2010-02-13 02:34:16 +00:00
Simon Riggs	b95a720a48	Re-enable max_standby_delay = -1 using deadlock detection on startup process. If startup waits on a buffer pin we send a request to all backends to cancel themselves if they are holding the buffer pin required and they are also waiting on a lock. If not, startup waits until max_standby_delay before cancelling any backend waiting for the requested buffer pin.	2010-02-13 01:32:20 +00:00
Simon Riggs	fafa374f2d	Introduce WAL records to log reuse of btree pages, allowing conflict resolution during Hot Standby. Page reuse interlock requested by Tom. Analysis and patch by me.	2010-02-13 00:59:58 +00:00
Tom Lane	ec4be2ee68	Extend the set of frame options supported for window functions. This patch allows the frame to start from CURRENT ROW (in either RANGE or ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING start and end points. (RANGE value PRECEDING/FOLLOWING isn't there yet --- the grammar works, but that's all.) Hitoshi Harada, reviewed by Pavel Stehule	2010-02-12 17:33:21 +00:00
Teodor Sigaev	5209c084a6	Generic implementation of red-black binary tree. It's planned to use in several places, but for now only GIN uses it during index creation. Using self-balanced tree greatly speeds up index creation in corner cases with preordered data.	2010-02-11 14:29:50 +00:00
Tom Lane	cbe9d6beb4	Fix up rickety handling of relation-truncation interlocks. Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.	2010-02-09 21:43:30 +00:00
Tom Lane	d5768dce10	Create an official API function for C functions to use to check if they are being called as aggregates, and to get the aggregate transition state memory context if needed. Use it instead of poking directly into AggState and WindowAggState in places that shouldn't know so much. We should have done this in 8.4, probably, but better late than never. Revised version of a patch by Hitoshi Harada.	2010-02-08 20:39:52 +00:00
Bruce Momjian	dfc902854a	Add C comments that HEAP_MOVED_* define usage is only for pre-9.0 binary upgrades.	2010-02-08 14:10:21 +00:00
Tom Lane	9a75803b1a	Remove CatalogCacheFlushRelation, and the reloidattr infrastructure that was needed by nothing else. The restructuring I just finished doing on cache management exposed to me how silly this routine was. Its function was to go into the catcache and blow away all entries related to a given relation when there was a relcache flush on that relation. However, there is no point in removing a catcache entry if the catalog row it represents is still valid --- and if it isn't valid, there must have been a catcache entry flush on it, because that's triggered directly by heap_update or heap_delete on the catalog row. So this routine accomplished nothing except to blow away valid cache entries that we'd very likely be wanting in the near future to help reconstruct the relcache entry. Dumb. On top of which, it required a subtle and easy-to-get-wrong attribute in syscache definitions, ie, the column containing the OID of the related relation if any. Removing that is a very useful maintenance simplification.	2010-02-08 05:53:55 +00:00
Tom Lane	0a469c8769	Remove old-style VACUUM FULL (which was known for a little while as VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.	2010-02-08 04:33:55 +00:00
Tom Lane	1ddc2703a9	Work around deadlock problems with VACUUM FULL/CLUSTER on system catalogs, as per my recent proposal. First, teach IndexBuildHeapScan to not wait for INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples to commit unless the index build is checking uniqueness/exclusion constraints. If it isn't, there's no harm in just indexing the in-doubt tuple. Second, modify VACUUM FULL/CLUSTER to suppress reverifying uniqueness/exclusion constraint properties while rebuilding indexes of the target relation. This is reasonable because these commands aren't meant to deal with corrupted-data situations. Constraint properties will still be rechecked when an index is rebuilt by a REINDEX command. This gets us out of the problem that new-style VACUUM FULL would often wait for other transactions while holding exclusive lock on a system catalog, leading to probable deadlock because those other transactions need to look at the catalogs too. Although the real ultimate cause of the problem is a debatable choice to release locks early after modifying system catalogs, changing that choice would require pretty serious analysis and is not something to be undertaken lightly or on a tight schedule. The present patch fixes the problem in a fairly reasonable way and should also improve the speed of VACUUM FULL/CLUSTER a little bit.	2010-02-07 22:40:33 +00:00
Tom Lane	b9b8831ad6	Create a "relation mapping" infrastructure to support changing the relfilenodes of shared or nailed system catalogs. This has two key benefits: * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs. * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs. CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database. Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared. This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests.	2010-02-07 20:48:13 +00:00
Tom Lane	9727c583fe	Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping of old and new toast tables can be done either at the logical level (by swapping the heaps' reltoastrelid links) or at the physical level (by swapping the relfilenodes of the toast tables and their indexes). This is necessary infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared system catalogs, where we cannot change reltoastrelid. The physical swap saves a few catalog updates too. We unfortunately have to keep the logical-level swap logic because in some cases we will be adding or deleting a toast table, so there's no possibility of a physical swap. However, that only happens as a consequence of schema changes in the table, which we do not need to support for system catalogs, so such cases aren't an obstacle for that. In passing, refactor the cluster support functions a little bit to eliminate unnecessarily-duplicated code; and fix the problem that while CLUSTER had been taught to rename the final toast table at need, ALTER TABLE had not.	2010-02-04 00:09:14 +00:00
Heikki Linnakangas	808969d0e7	Add a message type header to the CopyData messages sent from primary to standby in streaming replication. While we only have one message type at the moment, adding a message type header makes this easier to extend.	2010-02-03 09:47:19 +00:00
Tom Lane	70a2b05a59	Assorted cleanups in preparation for using a map file to support altering the relfilenode of currently-not-relocatable system catalogs. 1. Get rid of inval.c's dependency on relfilenode, by not having it emit smgr invalidations as a result of relcache flushes. Instead, smgr sinval messages are sent directly from smgr.c when an actual relation delete or truncate is done. This makes considerably more structural sense and allows elimination of a large number of useless smgr inval messages that were formerly sent even in cases where nothing was changing at the physical-relation level. Note that this reintroduces the concept of nontransactional inval messages, but that's okay --- because the messages are sent by smgr.c, they will be sent in Hot Standby slaves, just from a lower logical level than before. 2. Move setNewRelfilenode out of catalog/index.c, where it never logically belonged, into relcache.c; which is a somewhat debatable choice as well but better than before. (I considered catalog/storage.c, but that seemed too low level.) Rename to RelationSetNewRelfilenode. 3. Cosmetic cleanups of some other relfilenode manipulations.	2010-02-03 01:14:17 +00:00
Magnus Hagander	0a27347141	Make RADIUS authentication use pg_getaddrinfo_all() to get address of the server. Gets rid of a fairly ugly hack for Solaris, and also provides hostname and IPV6 support.	2010-02-02 19:09:37 +00:00
Robert Haas	d8db6a6096	Fold FindConversion() into FindConversionByName() and remove ACL check. All callers of FindConversionByName() already do suitable permissions checking already apart from this function, but this is not just dead code removal: the unnecessary permissions check can actually lead to spurious failures - there's no reason why inability to execute the underlying function should prohibit renaming the conversion, for example. (The error messages in these cases were also rather poor: FindConversion would return InvalidOid, eventually leading to a complaint that the conversion "did not exist", which was not correct.) KaiGai Kohei	2010-02-02 18:52:33 +00:00
Robert Haas	63f9282f6e	Tighten integrity checks on ALTER TABLE ... ALTER COLUMN ... RENAME. When a column is renamed, we recursively rename the same column in all descendent tables. But if one of those tables also inherits that column from a table outside the inheritance hierarchy rooted at the named table, we must throw an error. The previous coding correctly prohibited the rename when the parent had inherited the column from elsewhere, but overlooked the case where the parent was OK but a child table also inherited the same column from a second, unrelated parent. For now, not backpatched due to lack of complaints from the field. KaiGai Kohei, with further changes by me. Reviewed by Bernd Helme and Tom Lane.	2010-02-01 19:28:56 +00:00
Robert Haas	42a8ab0a14	Augment EXPLAIN output with more details on Hash nodes. We show the number of buckets, the number of batches (and also the original number if it has changed), and the peak space used by the hash table. Minor executor changes to track peak space used.	2010-02-01 15:43:36 +00:00
Simon Riggs	296578feb4	Revoke augmentation of WAL records for btree delete, per discussion.	2010-02-01 13:40:28 +00:00
Itagaki Takahiro	9ea9918e37	Add string_agg aggregate functions. The one argument version concatenates the input values into a string. The two argument version also does the same thing, but inserts delimiters between elements. Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.	2010-02-01 03:14:45 +00:00
Simon Riggs	c85c941470	Detect early deadlock in Hot Standby when Startup is already waiting. First stage of required deadlock detection to allow re-enabling max_standby_delay setting of -1, which is now essential in the absence of improved relation- specific conflict resoluton. Requested by Greg Stark et al.	2010-01-31 19:01:11 +00:00
Tom Lane	0c0203d0a9	Parenthesize this macro, just in case.	2010-01-31 17:35:46 +00:00
Simon Riggs	6d2bc0a6cf	Augment WAL records for btree delete with GetOldestXmin() to reduce false positives during Hot Standby conflict processing. Simple patch to enhance conflict processing, following previous discussions. Controlled by parameter minimize_standby_conflicts = on \| off, with default off allows measurement of performance impact to see whether it should be set on all the time.	2010-01-29 18:39:05 +00:00
Simon Riggs	76be0c81cc	Filter recovery conflicts based upon dboid from relfilenode of WAL records for heap and btree. Minor change, mostly API changes to pass through the required values. This is a simple change though also provides the refactoring required for further enhancements to conflict processing using the relOid. Changes only have effect during Hot Standby.	2010-01-29 17:10:05 +00:00
Peter Eisentraut	e7b3349a8a	Type table feature This adds the CREATE TABLE name OF type command, per SQL standard.	2010-01-28 23:21:13 +00:00
Magnus Hagander	083e1b0f27	Add functions to reset the statistics counter for a single table/index or a single function.	2010-01-28 14:25:41 +00:00
Magnus Hagander	21d3ae09da	Define INADDR_NONE on Solaris when it's missing. Per a couple of buildfarm members complaining.	2010-01-28 11:36:14 +00:00
Heikki Linnakangas	e0e8b96345	Change a few remaining calls of XLogArchivingActive() to use XLogIsNeeded() instead, to determine if an otherwise non-logged operation needs to be logged in WAL for standby servers. Fujii Masao	2010-01-28 07:31:42 +00:00
Heikki Linnakangas	1bb2558046	Make standby server continuously retry restoring the next WAL segment with restore_command, if the connection to the primary server is lost. This ensures that the standby can recover automatically, if the connection is lost for a long time and standby falls behind so much that the required WAL segments have been archived and deleted in the master. This also makes standby_mode useful without streaming replication; the server will keep retrying restore_command every few seconds until the trigger file is found. That's the same basic functionality pg_standby offers, but without the bells and whistles. To implement that, refactor the ReadRecord/FetchRecord functions. The FetchRecord() function introduced in the original streaming replication patch is removed, and all the retry logic is now in a new function called XLogReadPage(). XLogReadPage() is now responsible for executing restore_command, launching walreceiver, and waiting for new WAL to arrive from primary, as required. This also changes the life cycle of walreceiver. When launched, it now only tries to connect to the master once, and exits if the connection fails, or is lost during streaming for any reason. The startup process detects the death, and re-launches walreceiver if necessary.	2010-01-27 15:27:51 +00:00
Magnus Hagander	b3daac5a9c	Add support for RADIUS authentication.	2010-01-27 12:12:00 +00:00
Tom Lane	d879697cd2	Remove the default_do_language parameter, instead making DO use a hardwired default of "plpgsql". This is more reasonable than it was when the DO patch was written, because we have since decided that plpgsql should be installed by default. Per discussion, having a parameter for this doesn't seem useful enough to justify the risk of application breakage if the value is changed unexpectedly.	2010-01-26 16:33:40 +00:00
Tom Lane	9507c8a1db	Add get_bit/set_bit functions for bit strings, paralleling those for bytea, and implement OVERLAY() for bit strings and bytea. In passing also convert text OVERLAY() to a true built-in, instead of relying on a SQL function. Leonardo F, reviewed by Kevin Grittner	2010-01-25 20:55:32 +00:00
Simon Riggs	959ac58c04	In HS, Startup process sets SIGALRM when waiting for buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock.	2010-01-23 16:37:12 +00:00
Robert Haas	d779199175	Fix several oversights in previous commit - attribute options patch. I failed to 'cvs add' the new files and also neglected to bump catversion.	2010-01-22 16:42:31 +00:00
Robert Haas	76a47c0e74	Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism. Attributes can now have options, just as relations and tablespaces do, and the reloptions code is used to parse, validate, and store them. For simplicity and because these options are not performance critical, we store them in a separate cache rather than the main relcache. Thanks to Alex Hunsaker for the review.	2010-01-22 16:40:19 +00:00
Peter Eisentraut	adb7764030	PL/Python DO handler Also cleaned up some redundancies between the primary error messages and the error context in PL/Python. Hannu Valtonen	2010-01-22 15:45:15 +00:00
Heikki Linnakangas	09b115f706	Write a WAL record whenever we perform an operation without WAL-logging that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case. Original patch by Fujii Masao, with changes by me.	2010-01-20 19:43:40 +00:00
Heikki Linnakangas	47ce95a7b9	Now that much of walreceiver has been pulled back into the postgres binary, revert PGDLLIMPORT decoration of global variables. I'm not sure if there's any real harm from unnecessary PGDLLIMPORTs, but these are all internal variables that external modules really shouldn't be messing with. ThisTimeLineID still needs PGDLLIMPORT.	2010-01-20 18:54:27 +00:00
Heikki Linnakangas	32bc08b1d4	Rethink the way walreceiver is linked into the backend. Instead than shoving walreceiver as whole into a dynamically loaded module, split the libpq-specific parts of it into dynamically loaded module and keep the rest in the main backend binary. Although Tom fixed the Windows compilation problems with the old walreceiver module already, this is a cleaner division of labour and makes the code more readable. There's also the prospect of adding new transport methods as pluggable modules in the future, which this patch makes easier, though for now the API between libpqwalreceiver and walreceiver process should be considered private. The libpq-specific module is now in src/backend/replication/libpqwalreceiver, and the part linked with postgres binary is in src/backend/replication/walreceiver.c.	2010-01-20 09:16:24 +00:00
Magnus Hagander	7e40cdc075	Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide shared statistics of the bgwriter. Greg Smith	2010-01-19 14:11:32 +00:00
Tom Lane	4f15699d70	Add pg_table_size() and pg_indexes_size() to provide more user-friendly wrappers around the pg_relation_size() function. Bernd Helmle, reviewed by Greg Smith	2010-01-19 05:50:18 +00:00
Tom Lane	9a915e596f	Improve the handling of SET CONSTRAINTS commands by having them search pg_constraint before searching pg_trigger. This allows saner handling of corner cases; in particular we now say "constraint is not deferrable" rather than "constraint does not exist" when the command is applied to a constraint that's inherently non-deferrable. Per a gripe several months ago from hubert depesz lubaczewski. To make this work without breaking user-defined constraint triggers, we have to add entries for them to pg_constraint. However, in return we can remove the pgconstrname column from pg_constraint, which represents a fairly sizable space savings. I also replaced the tgisconstraint column with tgisinternal; the old meaning of tgisconstraint can now be had by testing for nonzero tgconstraint, while there is no other way to get the old meaning of nonzero tgconstraint, namely that the trigger was internally generated rather than being user-created. In passing, fix an old misstatement in the docs and comments, namely that pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable. Actually, we mark RI action triggers as nondeferrable even when they belong to a nominally deferrable FK constraint. The SET CONSTRAINTS code now relies on that instead of hard-coding a list of exception OIDs.	2010-01-17 22:56:23 +00:00
Simon Riggs	a8ce974cdd	Teach standby conflict resolution to use SIGUSR1 Conflict reason is passed through directly to the backend, so we can take decisions about the effect of the conflict based upon the local state. No specific changes, as yet, though this prepares for later work. CancelVirtualTransaction() sends signals while holding ProcArrayLock. Introduce errdetail_abort() to give message detail explaining that the abort was caused by conflict processing. Remove CONFLICT_MODE states in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.	2010-01-16 10:05:59 +00:00
Tom Lane	c9dc53be77	Huh, apparently on cygwin we HAVE_SIGPROCMASK, so both variants of the BlockSig/UnBlockSig declaration have to be PGDLLIMPORT'ified. Per buildfarm results.	2010-01-16 05:52:29 +00:00
Tom Lane	47a09eda89	PGDLLIMPORT-ize the remaining variables needed by walreceiver.	2010-01-16 00:04:41 +00:00
Tom Lane	08f8d478eb	Do parse analysis of an EXPLAIN's contained statement during the normal parse analysis phase, rather than at execution time. This makes parameter handling work the same as it does in ordinary plannable queries, and in particular fixes the incompatibility that Pavel pointed out with plpgsql's new handling of variable references. plancache.c gets a little bit grottier, but the alternatives seem worse.	2010-01-15 22:36:35 +00:00
Heikki Linnakangas	40f908bdcd	Introduce Streaming Replication. This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client. Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceivxer. Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication. Fujii Masao, with additional hacking by me	2010-01-15 09:19:10 +00:00
Teodor Sigaev	4cbe473938	Add point_ops opclass for GiST.	2010-01-14 16:31:09 +00:00
Simon Riggs	e99767bc28	First part of refactoring of code for ResolveRecoveryConflict. Purposes of this are to centralise the conflict code to allow further change, as well as to allow passing through the full reason for the conflict through to the conflicting backends. Backend state alters how we can handle different types of conflict so this is now required. As originally suggested by Heikki, no longer optional.	2010-01-14 11:08:02 +00:00
Bruce Momjian	228170410d	Please tablespace directories in their own subdirectory so pg_migrator can upgrade clusters without renaming the tablespace directories. New directory structure format is, e.g.: $PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814	2010-01-12 02:42:52 +00:00
Tom Lane	6164b4bc19	Some trivial adjustments in comments for struct RelationData.	2010-01-10 22:19:17 +00:00
Simon Riggs	3bfcccc295	During Hot Standby, fix drop database when sessions idle. Previously we only cancelled sessions that were in-transaction. Simple fix is to just cancel all sessions without waiting. Doing it this way avoids complicating common code paths, which would not be worth the trouble to cover this rare case. Problem report and fix by Andres Freund, edited somewhat by me	2010-01-10 15:44:28 +00:00
Magnus Hagander	87091cb1f1	Create typedef pgsocket for storing socket descriptors. This silences some warnings on Win64. Not using the proper SOCKET datatype was actually wrong on Win32 as well, but didn't cause any warnings there. Also create define PGINVALID_SOCKET to indicate an invalid/non-existing socket, instead of using a hardcoded -1 value.	2010-01-10 14:16:08 +00:00
Robert Haas	84b6d5f359	Remove partial, broken support for NULL pointers when fetching attributes. Previously, fastgetattr() and heap_getattr() tested their fourth argument against a null pointer, but any attempt to use them with a literal-NULL fourth argument evaluated to (void )0, resulting in a compiler error. Remove these NULL tests to avoid leading future readers of this code to believe that this has a chance of working. Also clean up related legacy code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr(). The new coding standard is that any code which calls a getattr-type function or macro which takes an isnull argument MUST pass a valid boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro Herrera.	2010-01-10 04:26:36 +00:00
Simon Riggs	42edbd16fb	During Hot Standby, set DatabasePath correctly during relcache init file deletion, so that we attempt to unlink the correct filepath. unlink() errors are ignorable there, so lack of a DatabasePath initialization step did not cause visible problems until a related bug showed up on Solaris. Code refactored from xact_redo_commit() to ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay shared invalidation messages for many databases, so we cannot SetDatabasePath() once as we do in normal backends. Read the databaseid from the shared invalidation messages, then set DatabasePath temporarily before calling RelationCacheInitFileInvalidate(). Problem report by Robert Treat, analysis and fix by me.	2010-01-09 16:49:27 +00:00
Itagaki Takahiro	83a5a338bf	pgBufferUsage needs PGDLLIMPORT for pg_stat_statements on Windows.	2010-01-08 00:48:56 +00:00
Tom Lane	50626efe0a	Fix 3-parameter form of bit substring() to throw error for negative length, as required by SQL standard.	2010-01-07 20:17:44 +00:00
Tom Lane	901be0fad4	Remove all the special-case code for INT64_IS_BUSTED, per decision that we're not going to support that anymore. I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a performance excuse to live. It's a bit moot since that's all ifdef'd out, of course.	2010-01-07 04:53:35 +00:00
Robert Haas	814c8a03ba	Further fixes for per-tablespace options patch. Add missing varlena header to TableSpaceOpts structure. And, per Tom Lane, instead of calling tablespace_reloptions in CacheMemoryContext, call it in the caller's memory context and copy the value over afterwards, to reduce the chances of a session-lifetime memory leak.	2010-01-07 03:53:08 +00:00
Tom Lane	d15cb38dec	Alter the configure script to fail immediately if the C compiler does not provide a working 64-bit integer datatype. As recently noted, we've been broken on such platforms since early in the 8.4 development cycle. Since it took nearly two years for anyone to even notice, it seems that the rationale for continuing to support such platforms has reached the point of non-existence. Rather than thrashing around to try to make it work again, we'll just admit up front that this no longer works. Back-patch to 8.4 since that branch is also broken. We should go around to remove INT64_IS_BUSTED support, but just in HEAD, so that seems like material for a separate commit.	2010-01-07 00:25:05 +00:00
Itagaki Takahiro	946cf229e8	Support rewritten-based full vacuum as VACUUM FULL. Traditional VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming. Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot use it for system tables. VACUUM FULL for system tables always fall back into VACUUM FULL INPLACE silently. Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.	2010-01-06 05:31:14 +00:00
Bruce Momjian	28f6cab61a	binary upgrade: Preserve relfilenodes for views and composite types --- even though we don't store data in, them, they do consume relfilenodes. Bump catalog version.	2010-01-06 05:18:18 +00:00
Bruce Momjian	2391396b3d	Update catalog version for recent relfilenode patch, so pg_migrator can identify the new API.	2010-01-06 03:07:24 +00:00
Bruce Momjian	f98fbc78c3	Preserve relfilenodes: Add support to pg_dump --binary-upgrade to preserve all relfilenodes, for use by pg_migrator.	2010-01-06 03:04:03 +00:00
Bruce Momjian	8cdb85b512	Remove tabs in SGML. Move OIDCHARS to proper include file.	2010-01-06 02:41:37 +00:00
Tom Lane	90f4c2d960	Add support for doing FULL JOIN ON FALSE. While this is really a rather peculiar variant of UNION ALL, and so wouldn't likely get written directly as-is, it's possible for it to arise as a result of simplification of less-obviously-silly queries. In particular, now that we can do flattening of subqueries that have constant outputs and are underneath an outer join, it's possible for the case to result from simplification of queries of the type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality regression for this type of query.	2010-01-05 23:25:36 +00:00
Robert Haas	d86d51a958	Support ALTER TABLESPACE name SET/RESET ( tablespace_options ). This patch only supports seq_page_cost and random_page_cost as parameters, but it provides the infrastructure to scalably support many more. In particular, we may want to add support for effective_io_concurrency, but I'm leaving that as future work for now. Thanks to Tom Lane for design help and Alvaro Herrera for the review.	2010-01-05 21:54:00 +00:00
Magnus Hagander	ce92f8b463	Use _mm_pause() for win64 spin_delay(), per note from Tsutomu Yamada.	2010-01-05 11:06:28 +00:00
Tom Lane	64737e9313	Get rid of the need for manual maintenance of the initial contents of pg_attribute, by having genbki.pl derive the information from the various catalog header files. This greatly simplifies modification of the "bootstrapped" catalogs. This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on Perl scripts for those build steps. To avoid creating a Perl build dependency where there was not one before, the output files generated by these scripts are now treated as distprep targets, ie, they will be built and shipped in tarballs. But you will need a reasonably modern Perl (probably at least 5.6) if you want to build from a CVS pull. The changes to the MSVC build process are untested, and may well break --- we'll soon find out from the buildfarm. John Naylor, based on ideas from Robert Haas and others	2010-01-05 01:06:57 +00:00
Magnus Hagander	305e85b098	Add a Win64-specific spin_delay() function. We can't use the same as before, since MSVC on Win64 doesn't support inline assembly.	2010-01-04 17:10:24 +00:00
Heikki Linnakangas	06f82b2961	Write an end-of-backup WAL record at pg_stop_backup(), and wait for it at recovery instead of reading the backup history file. This is more robust, as it stops you from prematurely starting up an inconsisten cluster if the backup history file is lost for some reason, or if the base backup was never finished with pg_stop_backup(). This also paves the way for a simpler streaming replication patch, which doesn't need to care about backup history files anymore. The backup history file is still created and archived as before, but it's not used by the system anymore. It's just for informational purposes now. Bump PG_CONTROL_VERSION as the location of the backup startpoint is now written to a new field in pg_control, and catversion because initdb is required Original patch by Fujii Masao per Simon's idea, with further fixes by me.	2010-01-04 12:50:50 +00:00
Tom Lane	40608e7f94	When estimating the selectivity of an inequality "column > constant" or "column < constant", and the comparison value is in the first or last histogram bin or outside the histogram entirely, try to fetch the actual column min or max value using an index scan (if there is an index on the column). If successful, replace the lower or upper histogram bound with that value before carrying on with the estimate. This limits the estimation error caused by moving min/max values when the comparison value is close to the min or max. Per a complaint from Josh Berkus. It is tempting to consider using this mechanism for mergejoinscansel as well, but that would inject index fetches into main-line join estimation not just endpoint cases. I'm refraining from that until we can get a better handle on the costs of doing this type of lookup.	2010-01-04 02:44:40 +00:00
Magnus Hagander	c3705d8cc5	Make ssize_t 64-bit on Win64, for compatibility with for example plpython.	2010-01-02 22:47:37 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Magnus Hagander	8491998d3d	Set proper sizes for size_t and void* on 64-bit Windows builds. Tsutomu Yamada	2010-01-02 13:56:37 +00:00
Magnus Hagander	2de9a463ff	Support 64-bit shared memory when building on 64-bit Windows. Tsutomu Yamada	2010-01-02 12:18:45 +00:00
Tom Lane	7839d35991	Add an "argisrow" field to NullTest nodes, following a plan made way back in 8.2beta but never carried out. This avoids repetitive tests of whether the argument is of scalar or composite type. Also, be a bit more paranoid about composite arguments in some places where we previously weren't checking.	2010-01-01 23:03:10 +00:00
Tom Lane	29c4ad9829	Support "x IS NOT NULL" clauses as indexscan conditions. This turns out to be just a minor extension of the previous patch that made "x IS NULL" indexable, because we can treat the IS NOT NULL condition as if it were "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option), just like IS NULL is treated like "x = NULL". Aside from any possible usefulness in its own right, this is an important improvement for index-optimized MAX/MIN aggregates: it is now reliably possible to get a column's min or max value cheaply, even when there are a lot of nulls cluttering the interesting end of the index.	2010-01-01 21:53:49 +00:00
Tom Lane	85d02a6586	Redefine Datum as uintptr_t, instead of unsigned long. This is more in keeping with modern practice, and is a first step towards porting to Win64 (which has sizeof(pointer) > sizeof(long)). Tsutomu Yamada, Magnus Hagander, Tom Lane	2009-12-31 19:41:37 +00:00
Tom Lane	48c192c15e	Revise pgstat's tracking of tuple changes to improve the reliability of decisions about when to auto-analyze. The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples, where all three of these numbers could be bad estimates from ANALYZE itself. Even worse, in the presence of a steady flow of HOT updates and matching HOT-tuple reclamations, auto-analyze might never trigger at all, even if all three numbers are exactly right, because n_dead_tuples could hold steady. To fix, replace last_anl_tuples with an accurately tracked count of the total number of committed tuple inserts + updates + deletes since the last ANALYZE on the table. This can still be compared to the same threshold as before, but it's much more trustworthy than the old computation. Tracking this requires one more intra-transaction counter per modified table within backends, but no additional memory space in the stats collector. There probably isn't any measurable speed difference; if anything it might be a bit faster than before, since I was able to eliminate some per-tuple arithmetic operations in favor of adding sums once per (sub)transaction. Also, simplify the logic around pgstat vacuum and analyze reporting messages by not trying to fold VACUUM ANALYZE into a single pgstat message. The original thought behind this patch was to allow scheduling of analyzes on parent tables by artificially inflating their changes_since_analyze count. I've left that for a separate patch since this change seems to stand on its own merit.	2009-12-30 20:32:14 +00:00
Tom Lane	540e69a061	Add an index on pg_inherits.inhparent, and use it to avoid seqscans in find_inheritance_children(). This is a complete no-op in databases without any inheritance. In databases where there are just a few entries in pg_inherits, it could conceivably be a small loss. However, in databases with many inheritance parents, it can be a big win.	2009-12-29 22:00:14 +00:00
Tom Lane	649b5ec7c8	Add the ability to store inheritance-tree statistics in pg_statistic, and teach ANALYZE to compute such stats for tables that have subclasses. Per my proposal of yesterday. autovacuum still needs to be taught about running ANALYZE on parent tables when their subclasses change, but the feature is useful even without that.	2009-12-29 20:11:45 +00:00
Bruce Momjian	e5b457c2ac	Add backend and pg_dump code to allow preservation of pg_enum oids, for use in binary upgrades. Bump catalog version for detection by pg_migrator of new backend API.	2009-12-27 14:50:46 +00:00
Bruce Momjian	c44327afa4	Binary upgrade: Modify pg_dump --binary-upgrade and add backend support routines to support the preservation of pg_type oids when doing a binary upgrade. This allows user-defined composite types and arrays to be binary upgraded.	2009-12-24 22:09:24 +00:00
Tom Lane	d68e08d1fe	Allow the index name to be omitted in CREATE INDEX, causing the system to choose an index name the same as it would do for an unnamed index constraint. (My recent changes to the index naming logic have helped to ensure that this will be a reasonable choice.) Per a suggestion from Peter. A necessary side-effect is to promote CONCURRENTLY to type_func_name_keyword status, ie, it can't be a table/column/index name anymore unless quoted. This is not all bad, since we have heard more than once of people typing CREATE INDEX CONCURRENTLY ON foo (...) and getting a normal index build of an index named "concurrently", which was not what they wanted. Now this syntax will result in a concurrent build of an index with system-chosen name; which they can rename afterwards if they want something else.	2009-12-23 17:41:45 +00:00
Tom Lane	cfc5008a51	Adjust naming of indexes and their columns per recent discussion. Index expression columns are now named after the FigureColname result for their expressions, rather than always being "pg_expression_N". Digits are appended to this name if needed to make the column name unique within the index. (That happens for regular columns too, thus fixing the old problem that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes there was no real reason to do such a thing, but now maybe there is.) Default names for indexes and associated constraints now include the column names of all their columns, not only the first one as in previous practice. (Of course, this will be truncated as needed to fit in NAMEDATALEN. Also, pkey indexes retain the historical behavior of not naming specific columns at all.) An example of the results: regression=# create table foo (f1 int, f2 text, regression(# exclude (f1 with =, lower(f2) with =)); NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo" CREATE TABLE regression=# \d foo_f1_lower_exclusion Index "public.foo_f1_lower_exclusion" Column \| Type \| Definition --------+---------+------------ f1 \| integer \| f1 lower \| text \| lower(f2) btree, for table "public.foo"	2009-12-23 02:35:25 +00:00
Tom Lane	4fca795de4	Bump catversion to reflect the fact that HS patch changed pg_proc contents, and PG_CONTROL_VERSION to reflect the fact that it changed pg_control contents. (I see we did at least remember to change XLOG_PAGE_MAGIC for the WAL contents changes.)	2009-12-19 04:08:32 +00:00
Simon Riggs	efc16ea520	Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.	2009-12-19 01:32:45 +00:00
Peter Eisentraut	d6de43099a	Don't unblock SIGQUIT in the SIGQUIT handler This was possibly linked to a deadlock-like situation in glibc syslog code invoked by the ereport call in quickdie(). In any case, a signal handler should not unblock its own signal unless there is a specific reason to.	2009-12-16 23:05:00 +00:00
Peter Eisentraut	b63b967a7e	If there is no sigdelset(), define it as a macro. This removes some duplicate code that recreated the identical workaround when the newer signal API is missing.	2009-12-16 22:55:34 +00:00
Peter Eisentraut	dd4cd55c15	Python 3 support in PL/Python Behaves more or less unchanged compared to Python 2, but the new language variant is called plpython3u. Documentation describing the naming scheme is included.	2009-12-15 22:59:55 +00:00
Tom Lane	a5495cd841	Add a hook to let loadable modules get control at ProcessUtility execution, and use it to extend contrib/pg_stat_statements to track utility commands. Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.	2009-12-15 20:04:49 +00:00
Tom Lane	34d26872ed	Support ORDER BY within aggregate function calls, at long last providing a non-kluge method for controlling the order in which values are fed to an aggregate function. At the same time eliminate the old implementation restriction that DISTINCT was only supported for single-argument aggregates. Possibly release-notable behavioral change: formerly, agg(DISTINCT x) dropped null values of x unconditionally. Now, it does so only if the agg transition function is strict; otherwise nulls are treated as DISTINCT normally would, ie, you get one copy. Andrew Gierth, reviewed by Hitoshi Harada	2009-12-15 17:57:48 +00:00
Robert Haas	cddca5ec13	Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics. This patch also removes buffer-usage statistics from the track_counts output, since this (or the global server statistics) is deemed to be a better interface to this information. Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.	2009-12-15 04:57:48 +00:00
Tom Lane	a620d5005d	Fix a bug introduced when set-returning SQL functions were made inline-able: we have to cope with the possibility that the declared result rowtype contains dropped columns. This fails in 8.4, as per bug #5240. While at it, be more paranoid about inserting binary coercions when inlining. The pre-8.4 code did not really need to worry about that because it could not inline at all in any case where an added coercion could change the behavior of the function's statement. However, when inlining a SRF we allow sorting, grouping, and set-ops such as UNION. In these cases, modifying one of the targetlist entries that the sort/group/setop depends on could conceivably change the behavior of the function's statement --- so don't inline when such a case applies.	2009-12-14 02:15:54 +00:00
Magnus Hagander	0182d6f646	Allow LDAP authentication to operate in search+bind mode, meaning it does a search for the user in the directory first, and then binds with the DN found for this user. This allows for LDAP logins in scenarios where the DN of the user cannot be determined simply by prefix and suffix, such as the case where different users are located in different containers. The old way of authentication can be significantly faster, so it's kept as an option. Robert Fleming and Magnus Hagander	2009-12-12 21:35:21 +00:00
Robert Haas	02490d4692	Export ExplainBeginOutput() and ExplainEndOutput() for auto_explain. Without these functions, anyone outside of explain.c can't actually use ExplainPrintPlan, because the ExplainState won't be initialized properly. The user-visible result of this was a crash when using auto_explain with the JSON output format. Report by Euler Taveira de Oliveira. Analysis by Tom Lane. Patch by me.	2009-12-12 00:35:34 +00:00
Itagaki Takahiro	f1325ce213	Add large object access control. A new system catalog pg_largeobject_metadata manages ownership and access privileges of large objects. KaiGai Kohei, reviewed by Jaime Casanova.	2009-12-11 03:34:57 +00:00
Andrew Dunstan	324385d67f	Add YAML to list of EXPLAIN formats. Greg Sabino Mullane, reviewed by Takahiro Itagaki.	2009-12-11 01:33:35 +00:00
Tom Lane	62aba76568	Prevent indirect security attacks via changing session-local state within an allegedly immutable index function. It was previously recognized that we had to prevent such a function from executing SET/RESET ROLE/SESSION AUTHORIZATION, or it could trivially obtain the privileges of the session user. However, since there is in general no privilege checking for changes of session-local state, it is also possible for such a function to change settings in a way that might subvert later operations in the same session. Examples include changing search_path to cause an unexpected function to be called, or replacing an existing prepared statement with another one that will execute a function of the attacker's choosing. The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against these threats, which are the same places previously deemed to need protection against the SET ROLE issue. GUC changes are still allowed, since there are many useful cases for that, but we prevent security problems by forcing a rollback of any GUC change after completing the operation. Other cases are handled by throwing an error if any change is attempted; these include temp table creation, closing a cursor, and creating or deleting a prepared statement. (In 7.4, the infrastructure to roll back GUC changes doesn't exist, so we settle for rejecting changes of "search_path" in these contexts.) Original report and patch by Gurjeet Singh, additional analysis by Tom Lane. Security: CVE-2009-4136	2009-12-09 21:57:51 +00:00
Tom Lane	0cb65564e5	Add exclusion constraints, which generalize the concept of uniqueness to support any indexable commutative operator, not just equality. Two rows violate the exclusion constraint if "row1.col OP row2.col" is TRUE for each of the columns in the constraint. Jeff Davis, reviewed by Robert Haas	2009-12-07 05:22:23 +00:00
Tom Lane	8de7472b45	Don't use a duplicate OID for aclexplode().	2009-12-06 02:55:54 +00:00
Peter Eisentraut	36f887c41c	Speed up information schema privilege views Instead of expensive cross joins to resolve the ACL, add table-returning function aclexplode() that expands the ACL into a useful form, and join against that. Also, implement the role__grants views as a thin layer over the respective _privileges views instead of essentially repeating the same code twice. fixes bug #4596 by Joachim Wieland, with cleanup by me	2009-12-05 21:43:36 +00:00
Heikki Linnakangas	ab3148b712	Fix bug in temporary file management with subtransactions. A cursor opened in a subtransaction stays open even if the subtransaction is aborted, so any temporary files related to it must stay alive as well. With the patch, we use ResourceOwners to track open temporary files and don't automatically close them at subtransaction end (though in the normal case temporary files are registered with the subtransaction resource owner and will therefore be closed). At end of top transaction, we still check that there's no temporary files marked as close-at-end-of-transaction open, but that's now just a debugging cross-check as the resource owner cleanup should've closed them already.	2009-12-03 11:03:29 +00:00
Tom Lane	0d32342501	Teach the regular expression functions to do case-insensitive matching and locale-dependent character classification properly when the database encoding is UTF8. The previous coding worked okay in single-byte encodings, or in any case for ASCII characters, but failed entirely on multibyte characters. The fix assumes that the <wctype.h> functions use Unicode code points as the wchar representation for Unicode, ie, wchar matches pg_wchar. This is only a partial solution, since we're still stupid about non-ASCII characters in multibyte encodings other than UTF8. The practical effect of that is limited, however, since those cases are generally Far Eastern glyphs for which concepts like case-folding don't apply anyway. Certainly all or nearly all of the field reports of problems have been about UTF8. A more general solution would require switching to the platform's wchar representation for all regex operations; which is possible but would have substantial disadvantages. Let's try this and see if it's sufficient in practice.	2009-12-01 21:00:24 +00:00
Bruce Momjian	ef51395e24	Revert due to Tom's concerns: Add ProcessUtility_hook() to handle all DDL to contrib/pg_stat_statements.	2009-12-01 02:31:13 +00:00
Bruce Momjian	d85cb27293	ProcessUtility_hook: Add ProcessUtility_hook() to handle all DDL to contrib/pg_stat_statements. Itagaki Takahiro	2009-12-01 01:08:46 +00:00
Tom Lane	0c61cff57a	Make pg_stat_activity.application_name visible to all users, rather than being hidden when current_query is. Relocate it to a column position more consistent with that behavior. Per discussion.	2009-11-29 18:14:32 +00:00
Tom Lane	42b2907d12	Add support for anonymous code blocks (DO blocks) to PL/Perl. Joshua Tolley, reviewed by Brendan Jurd and Tim Bunce	2009-11-29 03:02:27 +00:00
Tom Lane	8217cfbd99	Add support for an application_name parameter, which is displayed in pg_stat_activity and recorded in log entries. Dave Page, reviewed by Andres Freund	2009-11-28 23:38:08 +00:00
Tom Lane	1a95f12702	Eliminate a lot of list-management overhead within join_search_one_level by adding a requirement that build_join_rel add new join RelOptInfos to the appropriate list immediately at creation. Per report from Robert Haas, the list_concat_unique_ptr() calls that this change eliminates were taking the lion's share of the runtime in larger join problems. This doesn't do anything to fix the fundamental combinatorial explosion in large join problems, but it should push out the threshold of pain a bit further. Note: because this changes the order in which joinrel lists are built, it might result in changes in selected plans in cases where different alternatives have exactly the same costs. There is one example in the regression tests.	2009-11-28 00:46:19 +00:00
Heikki Linnakangas	cd87b6f8a5	Fix an old bug in multixact and two-phase commit. Prepared transactions can be part of multixacts, so allocate a slot for each prepared transaction in the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer the oldest member value from the current backends slot to the prepared xact slot. Also save and recover the value from the 2pc state file. The symptom of the bug was that after a transaction prepared, a shared lock still held by the prepared transaction was sometimes ignored by other transactions. Fix back to 8.1, where both 2PC and multixact were introduced.	2009-11-23 09:58:36 +00:00
Tom Lane	7fc0f06221	Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be checked to determine whether the trigger should be fired. For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER triggers it can provide a noticeable performance improvement, since queuing of a deferred trigger event and re-fetching of the row(s) at end of statement can be short-circuited if the trigger does not need to be fired. Takahiro Itagaki, reviewed by KaiGai Kohei.	2009-11-20 20:38:12 +00:00
Tom Lane	c742b795dd	Add a hook to CREATE/ALTER ROLE to allow an external module to check the strength of database passwords, and create a sample implementation of such a hook as a new contrib module "passwordcheck". Laurenz Albe, reviewed by Takahiro Itagaki	2009-11-18 21:57:56 +00:00
Tom Lane	5e66a51c2e	Provide a parenthesized-options syntax for VACUUM, analogous to that recently adopted for EXPLAIN. This will allow additional options to be implemented in future without having to make them fully-reserved keywords. The old syntax remains available for existing options, however. Itagaki Takahiro	2009-11-16 21:32:07 +00:00

1 2 3 4 5 ...

5190 Commits