postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	c504513f83	Adjust many backend functions to return OID rather than void. Extracted from a larger patch by Dimitri Fontaine. It is hoped that this will provide infrastructure for enriching the new event trigger functionality, but it seems possibly useful for other purposes as well.	2012-12-23 18:37:58 -05:00
Tom Lane	31bc839724	Prevent failure when RowExpr or XmlExpr is parse-analyzed twice. transformExpr() is required to cope with already-transformed expression trees, for various ugly-but-not-quite-worth-cleaning-up reasons. However, some of its newer subroutines hadn't gotten the memo. This accounts for bug #7763 from Norbert Buchmuller: transformRowExpr() was overwriting the previously determined type of a RowExpr during CREATE TABLE LIKE INCLUDING INDEXES. Additional investigation showed that transformXmlExpr had the same kind of problem, but all the other cases seem to be safe. Andres Freund and Tom Lane	2012-12-23 14:07:24 -05:00
Heikki Linnakangas	1ff92eea14	Fix sloppiness in the timeline switch over streaming replication patch. Here's another attempt at fixing the logic that decides how far the WAL can be streamed, which was still broken if the timeline changed while streaming. You would get an assertion failure. The way the logic is now written is more readable, too. Thom Brown reported the assertion failure.	2012-12-21 20:08:12 +02:00
Heikki Linnakangas	36e4456d78	Fix race condition if a file is removed while pg_basebackup is running. If a relation file was removed when the server-side counterpart of pg_basebackup was just about to open it to send it to the client, you'd get a "could not open file" error. Fix that. Backpatch to 9.1, this goes back to when pg_basebackup was introduced.	2012-12-21 15:34:15 +02:00
Heikki Linnakangas	d57a97343e	Forgot to remove extern declaration of GetRecoveryTargetTLI() Fujii Masao	2012-12-21 09:29:03 +02:00
Peter Eisentraut	740ee42da5	Make some messages more consistent in style	2012-12-21 00:10:46 -05:00
Peter Eisentraut	a0bfb7b36e	Fix grammatical mistake in error message	2012-12-20 23:36:13 -05:00
Tom Lane	343c2a865b	Fix pg_extension_config_dump() to handle update cases more sanely. If pg_extension_config_dump() is executed again for a table already listed in the extension's extconfig, the code was blindly making a new array entry. This does not seem useful. Fix it to replace the existing array entry instead, so that it's possible for extension update scripts to alter the filter conditions for configuration tables. In addition, teach ALTER EXTENSION DROP TABLE to check for an extconfig entry for the target table, and remove it if present. This is not a 100% solution because it's allowed for an extension update script to just summarily DROP a member table, and that code path doesn't go through ExecAlterExtensionContentsStmt. We could probably make that case clean things up if we had to, but it would involve sticking a very ugly wart somewhere in the guts of dependency.c. Since on the whole it seems quite unlikely that extension updates would want to remove pre-existing configuration tables, making the case possible with an explicit command seems sufficient. Per bug #7756 from Regina Obe. Back-patch to 9.1 where extensions were introduced.	2012-12-20 16:31:42 -05:00
Heikki Linnakangas	343ee00b73	Fix recycling of WAL segments after switching timeline during recovery. This was broken before, we would recycle old WAL segments on wrong timeline after the recovery target timeline had changed, but my recent commit to not initialize ThisTimeLineID at all in a standby's checkpointer process broke this completely. The problem is that when installing a recycled WAL segment as a future one, ThisTimeLineID is used to construct the filename. To fix, always update ThisTimeLineID to the current timeline being recovered, before recycling WAL segments at a restartpoint. This still leaves a small window where we might install WAL segments under wrong timeline ID, if the timeline is changed just as we're about to start recycling. Also, even if we're replaying timeline X at the moment, there's no guarantee that we'll need as many WAL segments on that timeline as we recycle. We might be just about to reach the point where we switch to next timeline, so might only need one more WAL segment on the current timeline. We'll live with the waste in that situation. Bug pointed out by Fujii Masao. 9.1 and 9.2 had the same issue, when recovery target timeline was changed, but I committed a slightly different version of this patch on those branches.	2012-12-20 22:00:58 +02:00
Bruce Momjian	dc9896a245	Avoid using NAMEDATALEN in pg_upgrade Because the client encoding might not match the server encoding, pg_upgrade can't allocate NAMEDATALEN bytes for storage of database, relation, and namespace identifiers. Instead pg_strdup() the memory and free it. Also add C comment in initdb.c about safe NAMEDATALEN usage.	2012-12-20 13:56:31 -05:00
Heikki Linnakangas	af275a12df	Follow TLI of last replayed record, not recovery target TLI, in walsenders. Most of the time, the last replayed record comes from the recovery target timeline, but there is a corner case where it makes a difference. When the startup process scans for a new timeline, and decides to change recovery target timeline, there is a window where the recovery target TLI has already been bumped, but there are no WAL segments from the new timeline in pg_xlog yet. For example, if we have just replayed up to point 0/30002D8, on timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog that contains the WAL up to that point. When recovery switches recovery target timeline to 2, a walsender can immediately try to read WAL from 0/30002D8, from timeline 2, so it will try to open WAL file 000000020000000000000003. However, that doesn't exist yet - the startup process hasn't copied that file from the archive yet nor has the walreceiver streamed it yet, so walsender fails with error "requested WAL segment 000000020000000000000003 has already been removed". That's harmless, in that the standby will try to reconnect later and by that time the segment is already created, but error messages that should be ignored are not good. To fix that, have walsender track the TLI of the last replayed record, instead of the recovery target timeline. That way walsender will not try to read anything from timeline 2, until the WAL segment has been created and at least one record has been replayed from it. The recovery target timeline is now xlog.c's internal affair, it doesn't need to be exposed in shared memory anymore. This fixes the error reported by Thom Brown. depesz reported the same error message, but I'm not sure if this fixes his scenario.	2012-12-20 14:39:04 +02:00
Heikki Linnakangas	1a11d4609e	Don't set ThisTimeLineID in checkpointer & bgwriter during recovery. We used to set it to the current recovery target timeline, but the recovery target timeline can change during recovery, leaving ThisTimeLineID at an old value. That seems worse than always leaving it at zero to begin with. AFAICS there was no good reason to set it in the first place. ThisTimeLineID is not needed in checkpointer or bgwriter process, until it's time to write the end-of-recovery checkpoint, and at that point ThisTimeLineID is updated anyway.	2012-12-20 14:39:04 +02:00
Heikki Linnakangas	e43f947bf3	Check if we've reached end-of-backup point also if no redo is required. If you restored from a backup taken from a standby, and the last record in the backup is the checkpoint record, ie. there is no redo required except for the checkpoint record, we would fail to notice that we've reached the end-of-backup point, and the database is consistent. The result was an error "WAL ends before end of online backup". To fix, move the have-we-reached-end-of-backup check into CheckRecoveryConsistency(), which is already responsible for similar checks with minRecoveryPoint, and is called in the right places. Backpatch to 9.2, this check and bug did not exist before that.	2012-12-19 14:22:00 +02:00
Peter Eisentraut	f2b88080db	Rename SQL feature S403 to ARRAY_MAX_CARDINALITY In an earlier version of the standard, this was called just "MAX_CARDINALITY".	2012-12-19 07:14:27 -05:00
Peter Eisentraut	6925e38dad	pg_basebackup: Small message punctuation improvements	2012-12-19 07:01:11 -05:00
Andrew Dunstan	9ac749ceb5	Don't include postgres.h in postgres_fe.h for cpluspluscheck. Error exposed by recent Assert changes. Complaint from Peter Eisentraut.	2012-12-18 16:30:14 -05:00
Peter Eisentraut	1a5f04dd7e	Remove allow_nonpic_in_shlib This was used in a time when a shared libperl or libpython was difficult to come by. That is obsolete, and the idea behind the flag was never fully portable anyway and will likely fail on more modern CPU architectures.	2012-12-18 01:13:59 -05:00
Tom Lane	6919b7e329	Fix failure to ignore leftover temp tables after a server crash. During crash recovery, we remove disk files belonging to temporary tables, but the system catalog entries for such tables are intentionally not cleaned up right away. Instead, the first backend that uses a temp schema is expected to clean out any leftover objects therein. This approach requires that we be careful to ignore leftover temp tables (since any actual access attempt would fail), even if their BackendId matches our session, if we have not yet established use of the session's corresponding temp schema. That worked fine in the past, but was broken by commit `debcec7dc3` which incorrectly removed the rd_islocaltemp relcache flag. Put it back, and undo various changes that substituted tests like "rel->rd_backend == MyBackendId" for use of a state-aware flag. Per trouble report from Heikki Linnakangas. Back-patch to 9.1 where the erroneous change was made. In the back branches, be careful to add rd_islocaltemp in a spot in the struct that was alignment padding before, so as not to break existing add-on code.	2012-12-17 20:15:32 -05:00
Tom Lane	c299477229	Fix filling of postmaster.pid in bootstrap/standalone mode. We failed to ever fill the sixth line (LISTEN_ADDR), which caused the attempt to fill the seventh line (SHMEM_KEY) to fail, so that the shared memory key never got added to the file in standalone mode. This has been broken since we added more content to our lock files in 9.1. To fix, tweak the logic in CreateLockFile to add an empty LISTEN_ADDR line in standalone mode. This is a tad grotty, but since that function already knows almost everything there is to know about the contents of lock files, it doesn't seem that it's any better to hack it elsewhere. It's not clear how significant this bug really is, since a standalone backend should never have any children and thus it seems not critical to be able to check the nattch count of the shmem segment externally. But I'm going to back-patch the fix anyway. This problem had escaped notice because of an ancient (and in hindsight pretty dubious) decision to suppress LOG-level messages by default in standalone mode; so that the elog(LOG) complaint in AddToDataDirLockFile that should have warned of the problem didn't do anything. Fixing that is material for a separate patch though.	2012-12-16 15:02:49 -05:00
Andrew Dunstan	3717f0837b	Tidy up from frontend Assert change. Quiet compiler warnings noted by Peter Eisentraut.	2012-12-16 12:22:57 -05:00
Magnus Hagander	c1f856a17f	Properly copy fmgroids.h after clean on Win32 Craig Ringer	2012-12-16 14:56:51 +01:00
Andrew Dunstan	1c382655ad	Provide Assert() for frontend code. Per discussion on -hackers. psql is converted to use the new code. Follows a suggestion from Heikki Linnakangas.	2012-12-14 18:03:07 -05:00
Robert Haas	75758a6ff0	Update comment in heapgetpage() regarding PD_ALL_VISIBLE vs. Hot Standby. Pavan Deolasee, slightly modified by me	2012-12-14 15:44:38 -05:00
Peter Eisentraut	fdb67eb2b6	NLS: Use msgmerge --previous option It provides some additional help to translators.	2012-12-13 23:12:12 -05:00
Heikki Linnakangas	abfd192b1b	Allow a streaming replication standby to follow a timeline switch. Before this patch, streaming replication would refuse to start replicating if the timeline in the primary doesn't exactly match the standby. The situation where it doesn't match is when you have a master, and two standbys, and you promote one of the standbys to become new master. Promoting bumps up the timeline ID, and after that bump, the other standby would refuse to continue. There's significantly more timeline related logic in streaming replication now. First of all, when a standby connects to primary, it will ask the primary for any timeline history files that are missing from the standby. The missing files are sent using a new replication command TIMELINE_HISTORY, and stored in standby's pg_xlog directory. Using the timeline history files, the standby can follow the latest timeline present in the primary (recovery_target_timeline='latest'), just as it can follow new timelines appearing in an archive directory. START_REPLICATION now takes a TIMELINE parameter, to specify exactly which timeline to stream WAL from. This allows the standby to request the primary to send over WAL that precedes the promotion. The replication protocol is changed slightly (in a backwards-compatible way although there's little hope of streaming replication working across major versions anyway), to allow replication to stop when the end of timeline is reached, putting the walsender back into accepting a replication command. Many thanks to Amit Kapila for testing and reviewing various versions of this patch.	2012-12-13 19:17:32 +02:00
Heikki Linnakangas	527668717a	Make xlog_internal.h includable in frontend context. This makes unnecessary the ugly hack used to #include postgres.h in pg_basebackup. Based on Alvaro Herrera's patch	2012-12-13 14:59:13 +02:00
Heikki Linnakangas	6264cd3d69	In multi-insert, don't go into infinite loop on a huge tuple and fillfactor. If a tuple is larger than page size minus space reserved for fillfactor, heap_multi_insert would never find a page that it fits in and repeatedly ask for a new page from RelationGetBufferForTuple. If a tuple is too large to fit on any page, taking fillfactor into account, RelationGetBufferForTuple will always expand the relation. In a normal insert, heap_insert will accept that and put the tuple on the new page. heap_multi_insert, however, does a fillfactor check of its own, and doesn't accept the newly-extended page RelationGetBufferForTuple returns, even though there is no other choice to make the tuple fit. Fix that by making the logic in heap_multi_insert more like the heap_insert logic. The first tuple is always put on the page RelationGetBufferForTuple gives us, and the fillfactor check is only applied to the subsequent tuples. Report from David Gould, although I didn't use his patch.	2012-12-12 13:54:42 +02:00
Tom Lane	691c5ebf79	Add defenses against integer overflow in dynahash numbuckets calculations. The dynahash code requires the number of buckets in a hash table to fit in an int; but since we calculate the desired hash table size dynamically, there are various scenarios where we might calculate too large a value. The resulting overflow can lead to infinite loops, division-by-zero crashes, etc. I (tgl) had previously installed some defenses against that in commit `299d171652`, but that covered only one call path. Moreover it worked by limiting the request size to work_mem, but in a 64-bit machine it's possible to set work_mem high enough that the problem appears anyway. So let's fix the problem at the root by installing limits in the dynahash.c functions themselves. Trouble report and patch by Jeff Davis.	2012-12-11 22:09:05 -05:00
Tom Lane	cd3413ec36	Disable event triggers in standalone mode. Per discussion, this seems necessary to allow recovery from broken event triggers, or broken indexes on pg_event_trigger. Dimitri Fontaine	2012-12-11 19:28:31 -05:00
Kevin Grittner	b19e4250b4	Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up to that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck	2012-12-11 14:33:08 -06:00
Heikki Linnakangas	970fb12de1	Consistency check should compare last record replayed, not last record read. EndRecPtr is the last record that we've read, but not necessarily yet replayed. CheckRecoveryConsistency should compare minRecoveryPoint with the last replayed record instead. This caused recovery to think it's reached consistency too early. Now that we do the check in CheckRecoveryConsistency correctly, we have to move the call of that function to after redoing a record. The current place, after reading a record but before replaying it, is wrong. In particular, if there are no more records after the one ending at minRecoveryPoint, we don't enter hot standby until one extra record is generated and read by the standby, and CheckRecoveryConsistency is called. These two bugs conspired to make the code appear to work correctly, except for the small window between reading the last record that reaches minRecoveryPoint, and replaying it. In the passing, rename recoveryLastRecPtr, which is the last record replayed, to lastReplayedEndRecPtr. This makes it slightly less confusing with replayEndRecPtr, which is the last record read that we're about to replay. Original report from Kyotaro HORIGUCHI, further diagnosis by Fujii Masao. Backpatch to 9.0, where Hot Standby subtly changed the test from "minRecoveryPoint < EndRecPtr" to "minRecoveryPoint <= EndRecPtr". The former works because where the test is performed, we have always read one more record than we've replayed.	2012-12-11 18:54:02 +02:00
Andrew Dunstan	ad69bd052f	Add mode where contrib installcheck runs each module in a separately named database. Normally each module is tested in a database named contrib_regression, which is dropped and recreated at the beginning of each pg_regress run. This new mode, enabled by adding USE_MODULE_DB=1 to the make command line, runs most modules in a database with the module name embedded in it. This will make testing pg_upgrade on clusters with the contrib modules a lot easier. Second attempt at this, this time accommodating make versions older than 3.82. Still to be done: adapt to the MSVC build system. Backpatch to 9.0, which is the earliest version it is reasonably possible to test upgrading from.	2012-12-11 11:52:45 -05:00
Heikki Linnakangas	7bffc9b7bf	Update minimum recovery point on truncation. If a file is truncated, we must update minRecoveryPoint. Once a file is truncated, there's no going back; it would not be safe to stop recovery at a point earlier than that anymore. Per report from Kyotaro HORIGUCHI. Backpatch to 8.4. Before that, minRecoveryPoint was not updated during recovery at all.	2012-12-10 16:57:16 +02:00
Heikki Linnakangas	6be799664a	Fix the tracking of min recovery point timeline. Forgot to update it at the right place. Also, consider checkpoint record that switches to new timeline to be on the new timeline. This fixes erroneous "requested timeline 2 does not contain minimum recovery point" errors, pointed out by Amit Kapila while testing another patch.	2012-12-10 16:04:26 +02:00
Tom Lane	b46c92112b	Fix assorted bugs in privileges-for-types patch. Commit `729205571e` added privileges on data types, but there were a number of oversights. The implementation of default privileges for types missed a few places, and pg_dump was utterly innocent of the whole concept. Per bug #7741 from Nathan Alden, and subsequent wider investigation.	2012-12-09 00:08:23 -05:00
Tom Lane	a99c42f291	Support automatically-updatable views. This patch makes "simple" views automatically updatable, without the need to create either INSTEAD OF triggers or INSTEAD rules. "Simple" views are those classified as updatable according to SQL-92 rules. The rewriter transforms INSERT/UPDATE/DELETE commands on such views directly into an equivalent command on the underlying table, which will generally have noticeably better performance than is possible with either triggers or user-written rules. A view that has INSTEAD OF triggers or INSTEAD rules continues to operate the same as before. For the moment, security_barrier views are not considered simple. Also, we do not support WITH CHECK OPTION. These features may be added in future. Dean Rasheed, reviewed by Amit Kapila	2012-12-08 18:26:21 -05:00
Simon Riggs	ef754fb51b	Correct xmax test for COPY FREEZE	2012-12-07 14:18:47 +00:00
Simon Riggs	1f023f9297	Optimize COPY FREEZE with CREATE TABLE also. Jeff Davis, additional test by me	2012-12-07 13:26:52 +00:00
Simon Riggs	1eb6cee499	Clarify that COPY FREEZE is not a hard rule. Remove message when FREEZE not honoured, clarify reasons in comments and docs.	2012-12-07 12:59:05 +00:00
Tom Lane	31a891857a	Improve pl/pgsql to support composite-type expressions in RETURN. For some reason lost in the mists of prehistory, RETURN was only coded to allow a simple reference to a composite variable when the function's return type is composite. Allow an expression instead, while preserving the efficiency of the original code path in the case where the expression is indeed just a composite variable's name. Likewise for RETURN NEXT. As is true in various other places, the supplied expression must yield exactly the number and data types of the required columns. There was some discussion of relaxing that for pl/pgsql, but no consensus yet, so this patch doesn't address that. Asif Rehman, reviewed by Pavel Stehule	2012-12-06 23:09:52 -05:00
Alvaro Herrera	da07a1e856	Background worker processes Background workers are postmaster subprocesses that run arbitrary user-specified code. They can request shared memory access as well as backend database connections; or they can just use plain libpq frontend database connections. Modules listed in shared_preload_libraries can register background workers in their _PG_init() function; this is early enough that it's not necessary to provide an extra GUC option, because the necessary extra resources can be allocated early on. Modules can install more than one bgworker, if necessary. Care is taken that these extra processes do not interfere with other postmaster tasks: only one such process is started on each ServerLoop iteration. This means a large number of them could be waiting to be started up and postmaster is still able to quickly service external connection requests. Also, shutdown sequence should not be impacted by a worker process that's reasonably well behaved (i.e. promptly responds to termination signals.) The current implementation lets worker processes specify their start time, i.e. at what point in the server startup process they are to be started: right after postmaster start (in which case they mustn't ask for shared memory access), when consistent state has been reached (useful during recovery in a hot standby server), or when recovery has terminated (i.e. when normal backends are allowed). In case of a bgworker crash, actions to take depend on registration data: if shared memory was requested, then all other connections are taken down (as well as other bgworkers), just as if it were a regular backend crashing. The bgworker itself is restarted, too, within a configurable timeframe (which can be configured to be never). More features to add to this framework can be imagined without much effort, and have been discussed, but this seems good enough as a useful unit already. An elementary sample module is supplied. Author: Álvaro Herrera. This patch is loosely based on prior patches submitted by KaiGai Kohei, and unsubmitted code by Simon Riggs. Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund, Heikki Linnakangas, Simon Riggs, Amit Kapila	2012-12-06 17:47:30 -03:00
Tom Lane	e31d524867	Fix intermittent crash in DROP INDEX CONCURRENTLY. When deleteOneObject closes and reopens the pg_depend relation, we must see to it that the relcache pointer held by the calling function (typically performMultipleDeletions) is updated. Usually the relcache entry is retained so that the pointer value doesn't change, which is why the problem had escaped notice ... but after a cache flush event there's no guarantee that the same memory will be reassigned. To fix, change the recursive functions' APIs so that we pass around a "Relation *" not just "Relation". Per investigation of occasional buildfarm failures. This is trivial to reproduce with -DCLOBBER_CACHE_ALWAYS, which points up the sad lack of any buildfarm member running that way on a regular basis.	2012-12-05 23:42:51 -05:00
Alvaro Herrera	5e15cdb2ae	Update comment at top of index_create I neglected to update it in commit `f4c4335`. Michael Paquier	2012-12-05 23:09:46 -03:00
Tom Lane	af4aba2f05	Ensure recovery pause feature doesn't pause unless users can connect. If we're not in hot standby mode, then there's no way for users to connect to reset the recoveryPause flag, so we shouldn't pause. The code was aware of this but the test to see if pausing was safe was seriously inadequate: it wasn't paying attention to reachedConsistency, and besides what it was testing was that we could legally enter hot standby, not that we have done so. Get rid of that in favor of checking LocalHotStandbyActive, which because of the coding in CheckRecoveryConsistency is tantamount to checking that we have told the postmaster to enter hot standby. Also, move the recoveryPausesHere() call that reacts to asynchronous recoveryPause requests so that it's not in the middle of application of a WAL record. I put it next to the recoveryStopsHere() call --- in future those are going to need to interact significantly, so this seems like a good waystation. Also, don't bother trying to read another WAL record if we've already decided not to continue recovery. This was no big deal when the code was written originally, but now that reading a record might entail actions like fetching an archive file, it seems a bit silly to do it like that. Per report from Jeff Janes and subsequent discussion. The pause feature needs quite a lot more work, but this gets rid of some indisputable bugs, and seems safe enough to back-patch.	2012-12-05 18:27:50 -05:00
Heikki Linnakangas	d67b06fe3e	Oops, meant to change the comment in writeTimeLineHistory.	2012-12-05 21:00:59 +02:00
Simon Riggs	6aa2e49a87	Must not reach consistency before XLOG_BACKUP_RECORD When waiting for an XLOG_BACKUP_RECORD the minRecoveryPoint will be incorrect, so we must not declare recovery as consistent before we have seen the record. Major bug allowing recovery to end too early in some cases, allowing people to see inconsistent db. This patch to HEAD and 9.2, other fix required for 9.1 and 9.0 Simon Riggs and Andres Freund, bug report by Jeff Janes	2012-12-05 13:28:03 +00:00
Tom Lane	cdf498c5d7	Attempt to un-break Windows builds with USE_LDAP. The buildfarm shows this case is entirely broken, and I'm betting the reason is lack of any include file.	2012-12-04 17:25:51 -05:00
Michael Meskes	ac99ca68d7	Include isinf.o in libecpg if isinf() is not available on the system. Patch done by Jiang Guiqing <jianggq@cn.fujitsu.com>.	2012-12-04 16:44:22 +01:00
Heikki Linnakangas	90991c40eb	Downgrade a status message from LOG to DEBUG2. I never intended this to be anything other than a debugging aid, but forgot to change the level before committing.	2012-12-04 17:29:44 +02:00
Heikki Linnakangas	32f4de0adf	Write exact xlog position of timeline switch in the timeline history file. This allows us to do some more rigorous sanity checking for various incorrect point-in-time recovery scenarios, and provides more information for debugging purposes. It will also come handy in the upcoming patch to allow timeline switches to be replicated by streaming replication.	2012-12-04 17:29:07 +02:00
Bruce Momjian	a84c30dda5	In initdb.c, move auth warning code into main() from secondary function.	2012-12-04 09:52:00 -05:00
Peter Eisentraut	ec8d1e32dd	Fix build of LDAP URL feature Some code was not ifdef'ed out for non-LDAP builds. patch from Bruce Momjian	2012-12-04 06:42:25 -05:00
Heikki Linnakangas	5ce108bf32	Track the timeline associated with minRecoveryPoint, for more sanity checks. This allows recovery to notice certain incorrect recovery scenarios. If a server has recovered to point X on timeline 5, and you restart recovery, it better be on timeline 5 when it reaches point X again, not on some timeline with a higher ID. This can happen e.g. if a standby server is shut down, a new timeline appears in the WAL archive, and the standby server is restarted. It will try to follow the new timeline, which is wrong because some WAL on the old timeline was already replayed before shutdown. Requires an initdb (or at least pg_resetxlog), because this adds a field to the control file.	2012-12-04 11:31:00 +02:00
Peter Eisentraut	aa2fec0a18	Add support for LDAP URLs Allow specifying LDAP authentication parameters as RFC 4516 LDAP URLs.	2012-12-03 23:31:02 -05:00
Bruce Momjian	26374f2a0f	In initdb.c, rename some newly created functions, and move the directory creation and xlog symlink creation to separate functions. Per suggestions from Andrew Dunstan.	2012-12-03 23:22:56 -05:00
Bruce Momjian	630cd14426	Add initdb --sync-only option to sync the data directory to durable storage. Have pg_upgrade use it, and enable server options fsync=off and full_page_writes=off. Document that users turning fsync from off to on should run initdb --sync-only. [ Previous commit was incorrectly applied as a git merge. ]	2012-12-03 22:47:59 -05:00
Bruce Momjian	25d1ed04a2	Revert initdb --sync-only patch that had incorrect commit messages.	2012-12-03 22:46:51 -05:00
Bruce Momjian	cd7569a546	dummy commit	2012-12-03 22:45:02 -05:00
Bruce Momjian	db00d837c1	In pg_upgrade, fix bug where no users were dumped in pg_dumpall binary-upgrade mode; instead only skip dumping the current user. This bug was introduced during the removal of split_old_dump(). Bug discovered during local testing.	2012-12-03 19:43:02 -05:00
Andrew Dunstan	fc5c1bbbeb	Revert "Add mode where contrib installcheck runs each module in a separately named database." This reverts commit `e2b3c21b05`.	2012-12-03 15:00:51 -05:00
Simon Riggs	62656617db	Avoid holding vmbuffer pin after VACUUM. During VACUUM if we pause to perform a cycle of index cleanup we drop the vmbuffer pin, so we should do the same thing when heap scan completes. This avoids holding vmbuffer pin across the main index cleanup in VACUUM, which could be minutes or hours longer than necessary for correctness. Bug report and suggested fix from Pavan Deolasee	2012-12-03 18:53:31 +00:00
Andrew Dunstan	d5652e50d5	Attempt to unbreak MSVC builds broken by `f21bb9cfb5`. We can't use type uint, so use uint32.	2012-12-03 10:23:22 -05:00
Simon Riggs	f21bb9cfb5	Refactor inCommit flag into generic delayChkpt flag. Rename PGXACT->inCommit flag into delayChkpt flag, and generalise comments to allow use in other situations, such as the forthcoming potential use in checksum patch. Replace wait loop to look for VXIDs with delayChkpt set. No user-visible changes, no behaviour changes at present. Simon Riggs, reviewed and rebased by Jeff Davis	2012-12-03 13:13:53 +00:00
Simon Riggs	7a764990d8	Clarify locking for PageGetLSN() in XLogCheckBuffer()	2012-12-03 12:20:31 +00:00
Simon Riggs	1c563a2ae1	Clarify when to use PageSetLSN/PageGetLSN(). Update README to explain prerequisites for correct access to LSN fields of a page. Independent chunk removed from checksums patch to reduce size of patch.	2012-12-03 11:59:25 +00:00
Heikki Linnakangas	a068c391ab	Refactor the code implementing standby-mode logic. It is now easier to see that it's a state machine, making the code easier to understand overall.	2012-12-03 12:32:44 +02:00
Andrew Dunstan	e2b3c21b05	Add mode where contrib installcheck runs each module in a separately named database. Normally each module is tested in a database named contrib_regression, which is dropped and recreated at the beginning of each pg_regress run. This mode, enabled by adding USE_MODULE_DB=1 to the make command line, runs most modules in a database with the module name embedded in it. This will make testing pg_upgrade on clusters with the contrib modules a lot easier. Still to be done: adapt to the MSVC build system. Backpatch to 9.0, which is the earliest version it is reasonably possible to test upgrading from.	2012-12-02 17:20:38 -05:00
Tom Lane	fc75d4f81c	Update time zone data files to tzdata release 2012j. DST law changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa, and portions of Brazil.	2012-12-02 16:35:23 -05:00
Simon Riggs	5457a130d3	Reduce scope of changes for COPY FREEZE. Allow support only for freezing tuples by explicit command. Previous coding mistakenly extended slightly beyond what was agreed as correct on -hackers. So essentially a partial revoke of earlier work, leaving just the COPY FREEZE command.	2012-12-02 20:52:52 +00:00
Tom Lane	3114cb60a1	Don't advance checkPoint.nextXid near the end of a checkpoint sequence. This reverts commit `c11130690d` in favor of actually fixing the problem: namely, that we should never have been modifying the checkpoint record's nextXid at this point to begin with. The nextXid should match the state as of the checkpoint's logical WAL position (ie the redo point), not the state as of its physical position. It's especially bogus to advance it in some wal_levels and not others. In any case there is no need for the checkpoint record to carry the same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by LogStandbySnapshot, as any replay operation will already have adopted that value as current. This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug #6291 from Daniel Farina, in that if a checkpoint were in progress at the instant of XID wraparound, the epoch bump would be lost as reported. (And, of course, these days there's at least a 50-50 chance of a checkpoint being in progress at any given instant.) Diagnosed by me and independently by Andres Freund. Back-patch to all branches supporting hot standby.	2012-12-02 15:20:41 -05:00
Simon Riggs	5c11725867	Rearrange storage of data in xl_running_xacts. Previously we stored all xids mixed together. Now we store top-level xids first, followed by all subxids. Also skip logging any subxids if the snapshot is suboverflowed, since there are potentially large numbers of them and they are not useful in that case anyway. Has value in the envisaged design for decoding of WAL. No planned effect on Hot Standby. Andres Freund, reviewed by me	2012-12-02 19:39:37 +00:00
Simon Riggs	c11130690d	XidEpoch++ if wraparound during checkpoint. If wal_level = hot_standby we update the checkpoint nextxid, though in the case where a wraparound occurred half-way through a checkpoint we would neglect updating the epoch also. Updating the nextxid is arguably the wrong thing to do, but changing that may introduce subtle bugs into hot standby startup, while updating the value doesn't cause any known bugs yet. Minimal fix now to HEAD and backbranches, wider fix later in HEAD. Bug reported in #6291 by Daniel Farina and slightly differently in Cause analysis and recommended fixes from Tom Lane and Andres Freund. Applied patch is minimal version of Andres Freund's work.	2012-12-02 14:57:44 +00:00
Simon Riggs	9f98704b82	Clarify operation of online checkpoints. Previous comments left, but were too obscure for such an important aspect of the system.	2012-12-02 13:09:55 +00:00
Tatsuo Ishii	53edb8dc02	Fix psql crash while parsing an SQL file whose encoding is different from the client encoding when the client encoding is not a safe one. For example, the file encoding is UTF-8 and the client encoding is SJIS. Patch contributed by Jiang Guiqing.	2012-12-02 21:11:15 +09:00
Tom Lane	c35fea1026	Prevent passing gmake's environment variables down through pg_regress. When we do "make install" to create a temp installation, we don't want that instance of make to try to communicate with any instance of make that might be calling us. This is known to cause problems if the upper make has a -jN flag, and in principle could cause problems even without that. Unset the relevant environment variables to prevent such issues. Andres Freund	2012-12-01 17:23:49 -05:00
Tom Lane	b1346822f3	Make sure sharedir/extension/ directory is created when needed. The previous coding worked as long as MODULEDIR wasn't set explicitly, because we create sharedir/$(datamoduledir) and the default value of that is "extension". But if some other value is specified for MODULEDIR then the installation directory needed for the control file wasn't made. Cédric Villemain	2012-12-01 16:04:39 -05:00
Tom Lane	7b90469b71	Allow adding values to an enum type created in the current transaction. Normally it is unsafe to allow ALTER TYPE ADD VALUE in a transaction block, because instances of the value could be added to indexes later in the same transaction, and then they would still be accessible even if the transaction rolls back. However, we can allow this if the enum type itself was created in the current transaction, because then any such indexes would have to go away entirely on rollback. The reason for allowing this is to support pg_upgrade's new usage of pg_restore --single-transaction: in --binary-upgrade mode, pg_dump emits enum types as a succession of ALTER TYPE ADD VALUE commands so that it can preserve the values' OIDs. The support is a bit limited, so we'll leave it undocumented. Andres Freund	2012-12-01 14:27:30 -05:00
Simon Riggs	02aea36414	Second tweak of COPY FREEZE	2012-12-01 14:55:35 +00:00
Simon Riggs	ddf509eb4a	Tweak tests in COPY FREEZE	2012-12-01 13:46:41 +00:00
Simon Riggs	8de72b66a2	COPY FREEZE and mark committed on fresh tables. When a relfilenode is created in this subtransaction or a committed child transaction and it cannot otherwise be seen by our own process, mark tuples committed ahead of transaction commit for all COPY commands in same transaction. If FREEZE specified on COPY and pre-conditions met then rows will also be frozen. Both options designed to avoid revisiting rows after commit, increasing performance of subsequent commands after data load and upgrade. pg_restore changes later. Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design input from Tom Lane, Robert Haas and Kevin Grittner	2012-12-01 12:54:20 +00:00
Alvaro Herrera	113d25c4e6	Change test ExceptionalCondition to return void Commit `81107282a` changed it in assert.c, but overlooked this other file.	2012-11-30 19:24:21 -03:00
Bruce Momjian	b86327c1c5	Split initdb.c main() code into multiple functions, for easier maintenance.	2012-11-30 16:45:08 -05:00
Bruce Momjian	12ee6ec71f	In pg_upgrade, dump each database separately and use --single-transaction to restore each database schema. This yields performance improvements for databases with many tables. Also, remove split_old_dump() as it is no longer needed.	2012-11-30 16:30:13 -05:00
Bruce Momjian	bd9c8e741b	Move long_options structures to the top of main() functions, for consistency. Per suggestion from Tom.	2012-11-30 14:49:55 -05:00
Tom Lane	da63fec7db	Add missing buffer lock acquisition in GetTupleForTrigger(). If we had not been holding buffer pin continuously since the tuple was initially fetched by the UPDATE or DELETE query, it would be possible for VACUUM or a page-prune operation to move the tuple while we're trying to copy it. This would result in a garbage "old" tuple value being passed to an AFTER ROW UPDATE or AFTER ROW DELETE trigger. The preconditions for this are somewhat improbable, and the timing constraints are very tight; so it's not so surprising that this hasn't been reported from the field, even though the bug has been there a long time. Problem found by Andres Freund. Back-patch to all active branches.	2012-11-30 13:55:55 -05:00
Magnus Hagander	65c3bf19fd	Add libpq function PQconninfo() This allows a caller to get back the exact conninfo array that was used to create a connection, including parameters read from the environment. In doing this, restructure how options are copied from the conninfo to the actual connection. Zoltan Boszormenyi and Magnus Hagander	2012-11-30 15:11:08 +09:00
Tom Lane	4af446e7cd	Produce a more useful error message for over-length Unix socket paths. The length of a socket path name is constrained by the size of struct sockaddr_un, and there's not a lot we can do about it since that is a kernel API. However, it would be a good thing if we produced an intelligible error message when the user specifies a socket path that's too long --- and getaddrinfo's standard API is too impoverished to do this in the natural way. So insert explicit tests at the places where we construct a socket path name. Now you'll get an error that makes sense and even tells you what the limit is, rather than something generic like "Non-recoverable failure in name resolution". Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.	2012-11-29 19:57:01 -05:00
Simon Riggs	d3fe59939c	Correctly init fast path fields on PGPROC	2012-11-29 22:15:52 +00:00
Simon Riggs	f1e57a4ec9	Cleanup VirtualXact at end of Hot Standby.	2012-11-29 21:59:11 +00:00
Robert Haas	7a2fe9bd03	Basic binary heap implementation. There are probably other places where this can be used, but for now, this just makes MergeAppend use it, so that this code will have test coverage. There is other work in the queue that will use this, as well. Abhijit Menon-Sen, reviewed by Andres Freund, Robert Haas, Álvaro Herrera, Tom Lane, and others.	2012-11-29 11:16:59 -05:00
Michael Meskes	086cf1458c	When processing nested structure pointer variables ecpg always expected an array datatype which of course is wrong. Applied patch by Muhammad Usama <m.usama@gmail.com> to fix this.	2012-11-29 17:12:00 +01:00
Tom Lane	1fc698cf14	Suppress parallel build in interfaces/ecpg/preproc/. This is to see if it will stop intermittent build failures on buildfarm member okapi. We know that gmake 3.82 has some problems with sometimes not honoring dependencies in parallel builds, and it seems likely that this is more of the same. Since the vast bulk of the work in the preproc directory is associated with creating preproc.c and then preproc.o, parallelism buys us hardly anything here anyway. Also, make both this .NOTPARALLEL and the one previously added in interfaces/ecpg/Makefile be conditional on "ifeq ($(MAKE_VERSION),3.82)". The known bug in gmake is fixed upstream and should not be present in 3.83 and up, and there's no reason to think it affects older releases.	2012-11-28 22:19:46 -05:00
Tom Lane	3c84046490	Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit `8cb53654db`, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. 
Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolasee, fix by Tom Lane and Andres Freund.	2012-11-28 21:26:01 -05:00
Alvaro Herrera	1577b46b7c	Split out rmgr rm_desc functions into their own files This is necessary (but not sufficient) to have them compilable outside of a backend environment.	2012-11-28 13:01:15 -03:00
Heikki Linnakangas	dd7353dde8	If we don't have a backup-end-location, don't claim we've reached it. This was apparently a typo, which caused recovery to think that it immediately reached the end of backup, and allowed the database to start up too early. Reported by Jeff Janes. Backpatch to 9.2, where this code was introduced.	2012-11-28 15:14:27 +02:00
Tom Lane	e78d288c89	Add explicit casts in ilist.h's inline functions. Needed to silence C++ errors, per report from Peter Eisentraut. Andres Freund	2012-11-27 10:58:37 -05:00
Heikki Linnakangas	1f67078ea3	Add OpenTransientFile, with automatic cleanup at end-of-xact. Files opened with BasicOpenFile or PathNameOpenFile are not automatically cleaned up on error. That puts unnecessary burden on callers that only want to keep the file open for a short time. There is AllocateFile, but that returns a buffered FILE * stream, which in many cases is not the nicest API to work with. So add function called OpenTransientFile, which returns an unbuffered fd that's cleaned up like the FILE* returned by AllocateFile(). This plugs a few rare fd leaks in error cases: 1. copy_file() - fixed by using OpenTransientFile instead of BasicOpenFile 2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't use OpenTransientFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenTransientFile instead of PathNameOpenFile. In addition to plugging those leaks, this replaces many BasicOpenFile() calls with OpenTransientFile() that were not leaking, because the code meticulously closed the file on error. That wasn't strictly necessary, but IMHO it's good for robustness. The same leaks exist in older versions, but given the rarity of the issues, I'm not backpatching this. Not yet, anyway - it might be good to backpatch later, after this mechanism has had some more testing in master branch.	2012-11-27 10:25:50 +02:00
Tom Lane	532994299e	Revert patch for taking fewer snapshots. This reverts commit `d573e239f0`, "Take fewer snapshots". While that seemed like a good idea at the time, it caused execution to use a snapshot that had been acquired before locking any of the tables mentioned in the query. This created user-visible anomalies that were not present in any prior release of Postgres, as reported by Tomas Vondra. While this whole area could do with a redesign (since there are related cases that have anomalies anyway), it doesn't seem likely that any future patch would be reasonably back-patchable; and we don't want 9.2 to exhibit a behavior that's subtly unlike either past or future releases. Hence, revert to prior code while we rethink the problem.	2012-11-26 15:55:43 -05:00
Tom Lane	d3237e04ca	Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees. In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is pretty useless (there being only one output row), but nonetheless it shouldn't fail. But it could fail if "tab" is an inheritance parent, because planagg.c's code for fixing up equivalence classes after making the index-optimized MIN/MAX transformation wasn't prepared to find child-table versions of the aggregate expression. The least ugly fix seems to be to add an option to mutate_eclass_expressions() to skip child-table equivalence class members, which aren't used anymore at this stage of planning so it's not really necessary to fix them. Since child members are ignored in many cases already, it seems plausible for mutate_eclass_expressions() to have an option to ignore them too. Per bug #7703 from Maxim Boguk. Back-patch to 9.1. Although the same code exists before that, it cannot encounter child-table aggregates AFAICS, because the index optimization transformation cannot succeed on inheritance trees before 9.1 (for lack of MergeAppend).	2012-11-26 12:57:58 -05:00
Michael Meskes	c50b8a4637	Applied patch by Chen Huajun <chenhj@cn.fujitsu.com> to make ecpg able to cope with very long structs.	2012-11-23 14:39:27 +01:00
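
Commit 4af446e7cd above describes testing a Unix socket path against the size of struct sockaddr_un before handing it to the kernel API, so that the user gets an error that names the limit. The following is only a minimal sketch of that idea, not the code from the commit: the helper name check_socket_path, its return convention, and the message wording are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper (not PostgreSQL's actual function): returns 0 if
 * "path" fits in sockaddr_un.sun_path including its terminating NUL,
 * otherwise writes a descriptive message into errbuf and returns -1.
 */
static int
check_socket_path(const char *path, char *errbuf, size_t errlen)
{
    /* Size of the kernel-defined buffer; one byte is reserved for the NUL. */
    size_t maxlen = sizeof(((struct sockaddr_un *) 0)->sun_path) - 1;

    if (strlen(path) > maxlen)
    {
        /* Illustrative wording only; the real message text may differ. */
        snprintf(errbuf, errlen,
                 "Unix-domain socket path \"%s\" is too long (maximum %zu bytes)",
                 path, maxlen);
        return -1;
    }
    return 0;
}
```

A caller would run this check at the point where the socket path name is constructed and report errbuf on failure, instead of relying on the generic getaddrinfo error the commit message mentions. The limit is computed from the struct at compile time because it differs between platforms.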

23861 Commits