postgresql

Commit Graph

Author	SHA1	Message	Date
Simon Riggs	8693559cac	New autovacuum_work_mem parameter If autovacuum_work_mem is set, autovacuum workers now use this parameter in preference to maintenance_work_mem. Peter Geoghegan	2013-12-12 11:42:39 +00:00
Simon Riggs	36da3cfb45	Allow time delayed standbys and recovery Set min_recovery_apply_delay to force a delay in recovery apply for commit and restore point WAL records. Other records are replayed immediately. Delay is measured between WAL record time and local standby time. Robert Haas, Fabrízio de Royes Mello and Simon Riggs Detailed review by Mitsumasa Kondo	2013-12-12 10:53:20 +00:00
Heikki Linnakangas	108e3992cd	Display old and new values in pg_resetxlog -n output. For extra clarity. Rajeev Rastogi, reviewed by Amit Kapila	2013-12-12 11:57:18 +02:00
Tom Lane	22310b808d	Remove bogus executable permissions on xlog.c. Apparently fat-fingered in `1a3d104475`. Noted by Peter Geoghegan.	2013-12-11 22:12:25 -05:00
Tom Lane	6bff0e7d92	Add a regression test case for plpython function returning setof RECORD. We had coverage for functions returning setof a named composite type, but not for anonymous records, which is a somewhat different code path. In view of recent crash report from Sergey Konoplev, this seems worth testing, though I doubt there's any deterministic bug here today.	2013-12-11 17:22:55 -05:00
Simon Riggs	cf589c9c1f	Regression tests for SCHEMA commands Hari Babu Kommi reviewed by David Rowley	2013-12-11 20:45:15 +00:00
Simon Riggs	b921a26fb8	Regression tests for ALTER TABLESPACE RENAME,OWNER Hari Babu Kommi reviewed by David Rowley	2013-12-11 20:42:58 +00:00
Tom Lane	b5e0a2a384	Tweak placement of explicit ANALYZE commands in the regression tests. Make the COPY test, which loads most of the large static tables used in the tests, also explicitly ANALYZE those tables. This allows us to get rid of various ad-hoc, and rather redundant, ANALYZE commands that had gotten stuck into various test scripts over time to ensure we got consistent plan choices. (We could have done a database-wide ANALYZE, but that would cause stats to get attached to the small static tables too, which results in plan changes compared to the historical behavior. I'm not sure that's a good idea, so not going that far for now.) Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing regression tests for lack of an "ANALYZE tenk1" in the subselect test. There's no need for this in 8.4 since we didn't print any plans back then.	2013-12-11 15:09:15 -05:00
Robert Haas	60dd40bbda	Under wal_level=logical, when saving old tuples, always save OID. There's no real point in not doing this. It doesn't cost anything in performance or space. So let's go wild. Andres Freund, with substantial editing as to style by me.	2013-12-11 13:19:31 -05:00
Kevin Grittner	09df854b8a	Add table name to VACUUM statement in matview.c. The test only needs the one table to be vacuumed. Vacuuming the database may affect other tests. Per gripe from Tom Lane. Back-patch to 9.3, where the test was was added.	2013-12-11 08:53:03 -06:00
Peter Eisentraut	e5dc4cc24d	PL/Perl: Add event trigger support From: Dimitri Fontaine <dimitri@2ndQuadrant.fr>	2013-12-11 08:11:59 -05:00
Robert Haas	6bea96dd49	Add a new option, -g, to createuser, to add membership in a role. Chistopher Browne, reviewed by Sameer Thakur, Amit Kapila, and Peter Eisentraut.	2013-12-11 07:50:36 -05:00
Robert Haas	66abc2608c	Add a new reloption, user_catalog_table. When this reloption is set and wal_level=logical is configured, we'll record the CIDs stamped by inserts, updates, and deletes to the table just as we would for an actual catalog table. This will allow logical decoding to use historical MVCC snapshots to access such tables just as they access ordinary catalog tables. Replication solutions built around the logical decoding machinery will likely need to set this operation for their configuration tables; it might also be needed by extensions which perform table access in their output functions. Andres Freund, reviewed by myself and others.	2013-12-10 19:17:34 -05:00
Robert Haas	e55704d8b2	Add new wal_level, logical, sufficient for logical decoding. When wal_level=logical, we'll log columns from the old tuple as configured by the REPLICA IDENTITY facility added in commit `07cacba983`. This makes it possible a properly-configured logical replication solution to correctly follow table updates even if they change the chosen key columns, or, with REPLICA IDENTITY FULL, even if the table has no key at all. Note that updates which do not modify the replica identity column won't log anything extra, making the choice of a good key (i.e. one that will rarely be changed) important to performance when wal_level=logical is configured. Each insert, update, or delete to a catalog table will also log the CMIN and/or CMAX values of stamped by the current transaction. This is necessary because logical decoding will require access to historical snapshots of the catalog in order to decode some data types, and the CMIN/CMAX values that we may need in order to judge row visibility may have been overwritten by the time we need them. Andres Freund, reviewed in various versions by myself, Heikki Linnakangas, KONDO Mitsumasa, and many others.	2013-12-10 19:01:40 -05:00
Tom Lane	9ec6199d18	Fix possible crash with nested SubLinks. An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...) could produce an invalid plan that results in a crash at execution time, if the planner attempts to flatten the outer IN into a semi-join. This happens because convert_testexpr() was not expecting any nested SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging to the inner SubLink. (I think the comment denying that this case could happen was wrong when written; it's certainly been wrong for quite a long time, since very early versions of the semijoin flattening logic.) Per report from Teodor Sigaev. Back-patch to all supported branches.	2013-12-10 16:10:17 -05:00
Noah Misch	53685d7981	Rename TABLE() to ROWS FROM(). SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and other collection types. This feature, however, deals with set-returning functions. Use a different syntax for this feature to keep open the possibility of implementing the standard TABLE().	2013-12-10 09:34:37 -05:00
Robert Haas	d9250da032	Fixups for dsm.c's file descriptor handling. Per complaint from Tom Lane.	2013-12-09 11:15:19 -05:00
Peter Eisentraut	3164721462	SSL: Support ECDH key exchange This sets up ECDH key exchange, when compiling against OpenSSL that supports EC. Then the ECDHE-RSA and ECDHE-ECDSA cipher suites can be used for SSL connections. The latter one means that EC keys are now usable. The reason for EC key exchange is that it's faster than DHE and it allows to go to higher security levels where RSA will be horribly slow. There is also new GUC option ssl_ecdh_curve that specifies the curve name used for ECDH. It defaults to "prime256v1", which is the most common curve in use in HTTPS. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>	2013-12-07 15:11:44 -05:00
Peter Eisentraut	ef3267523d	SSL: Add configuration option to prefer server cipher order By default, OpenSSL (and SSL/TLS in general) lets the client cipher order take priority. This is OK for browsers where the ciphers were tuned, but few PostgreSQL client libraries make the cipher order configurable. So it makes sense to have the cipher order in postgresql.conf take priority over client defaults. This patch adds the setting "ssl_prefer_server_ciphers" that can be turned on so that server cipher order is preferred. Per discussion, this now defaults to on. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>	2013-12-07 08:13:50 -05:00
Alvaro Herrera	312bde3d40	Fix improper abort during update chain locking In `247c76a989`, I added some code to do fine-grained checking of MultiXact status of locking/updating transactions when traversing an update chain. There was a thinko in that patch which would have the traversing abort, that is return HeapTupleUpdated, when the other transaction is a committed lock-only. In this case we should ignore it and return success instead. Of course, in the case where there is a committed update, HeapTupleUpdated is the correct return value. A user-visible symptom of this bug is that in REPEATABLE READ and SERIALIZABLE transaction isolation modes spurious serializability errors can occur: ERROR: could not serialize access due to concurrent update In order for this to happen, there needs to be a tuple that's key-share- locked and also updated, and the update must abort; a subsequent transaction trying to acquire a new lock on that tuple would abort with the above error. The reason is that the initial FOR KEY SHARE is seen as committed by the new locking transaction, which triggers this bug. (If the UPDATE commits, then the serialization error is correctly reported.) When running a query in READ COMMITTED mode, what happens is that the locking is aborted by the HeapTupleUpdated return value, then EvalPlanQual fetches the newest version of the tuple, which is then the only version that gets locked. (The second time the tuple is checked there is no misbehavior on the committed lock-only, because it's not checked by the code that traverses update chains; so no bug.) Only the newest version of the tuple is locked, not older ones, but this is harmless. The isolation test added by this commit illustrates the desired behavior, including the proper serialization errors that get thrown. Backpatch to 9.3.	2013-12-05 17:47:51 -03:00
Tom Lane	74242c23c1	Clear retry flags properly in replacement OpenSSL sock_write function. Current OpenSSL code includes a BIO_clear_retry_flags() step in the sock_write() function. Either we failed to copy the code correctly, or they added this since we copied it. In any case, lack of the clear step appears to be the cause of the server lockup after connection loss reported in bug #8647 from Valentine Gogichashvili. Assume that this is correct coding for all OpenSSL versions, and hence back-patch to all supported branches. Diagnosis and patch by Alexander Kukushkin.	2013-12-05 12:48:28 -05:00
Alvaro Herrera	07aeb1fec5	Avoid resetting Xmax when it's a multi with an aborted update HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while checking the contents of a multixact and finding it contains an aborted update, by setting the HEAP_XMAX_INVALID bit. This would lead to concurrent transactions not noticing any previous locks held by transactions that might still be running, and thus being able to acquire subsequent locks they wouldn't be normally able to acquire. This bug was introduced in commit 1ce150b7bb; backpatch this fix to 9.3, like that commit. This change reverts the change to the delete-abort-savept isolation test in `1ce150b7bb`, because that behavior change was caused by this bug. Noticed by Andres Freund while investigating a different issue reported by Noah Misch.	2013-12-05 12:21:55 -03:00
Bruce Momjian	86ef4796f5	build: pass EXTRA_REGRESS_OPTS to secondary regression tests Christoph Berg	2013-12-04 10:14:45 -05:00
Heikki Linnakangas	9e857436ef	Don't include unused space in LOG_NEWPAGE records. This is the same trick we use when taking a full page image of a buffer passed to XLogInsert.	2013-12-04 00:10:47 +02:00
Heikki Linnakangas	22122c83f1	Fix full-page writes of internal GIN pages. Insertion to a non-leaf GIN page didn't make a full-page image of the page, which is wrong. The code used to do it correctly, but was changed (commit `853d1c3103`) because the redo-routine didn't track incomplete splits correctly when the page was restored from a full page image. Of course, that was not right way to fix it, the redo routine should've been fixed instead. The redo-routine was surreptitiously fixed in 2010 (commit `4016bdef8a`), so all we need to do now is revert the code that creates the record to its original form. This doesn't change the format of the WAL record. Backpatch to all supported versions.	2013-12-03 23:16:01 +02:00
Bruce Momjian	4a8adfd4d0	C comment: again update comment for pg_fe_sendauth for error cases	2013-12-03 11:42:18 -05:00
Bruce Momjian	6a6b7bbb81	Update C comment for pg_fe_getauthname This function no longer takes an argument.	2013-12-03 11:33:46 -05:00
Bruce Momjian	9e0a97f1c8	libpq: change PQconndefaults() to ignore invalid service files Previously missing or invalid service files returned NULL. Also fix pg_upgrade to report "out of memory" for a null return from PQconndefaults(). Patch by Steve Singer, rewritten by me	2013-12-03 11:12:25 -05:00
Peter Eisentraut	fef88b3fda	Report exit code from external recovery commands properly When an external recovery command such as restore_command or archive_cleanup_command fails, report the exit code properly, distinguishing signals and normal exists, using the existing wait_result_to_str() facility, instead of just reporting the return value from system(). Reviewed-by: Peter Geoghegan <pg@heroku.com>	2013-12-02 22:31:05 -05:00
Tom Lane	7ab321404c	Fix crash in assign_collations_walker for EXISTS with empty SELECT list. We (I think I, actually) forgot about this corner case while coding collation resolution. Per bug #8648 from Arjen Nienhuis.	2013-12-02 20:28:45 -05:00
Tom Lane	7a1e34d371	Increase git_changelog's timestamp_slop from 10 min to 1 day. Many committers seem to now be using a work flow in which back-patched commits are timestamped minutes or even hours apart in different branches (most likely because they commit in one branch before starting work on the next one). git_changelog was failing to merge its reports in such cases, so increase the max time it's willing to merge commits across. I considered getting rid of the limit altogether, but that produces some odd results in terms of how the merged commit gets sorted relative to unrelated commits.	2013-12-02 11:33:49 -05:00
Robert Haas	c6d4b1dd3e	Flag mmap implemenation of dynamic shared memory as resize-capable. Error noted by Heikki Linnakangas	2013-12-02 11:18:54 -05:00
Robert Haas	a8656a3ab0	Make NUM_TOCHAR_prepare and NUM_TOCHAR_finish macros declare "len". Remove the variable from the enclosing scopes so that nothing can be relying on it. The net result of this refactoring is that we get rid of a few unnecessary strlen() calls. Original patch from Greg Jaskiewicz, substantially expanded by me.	2013-12-02 10:51:06 -05:00
Robert Haas	9d140f7be2	Avoid out-of-bounds read in errfinish if error_stack_depth < 0. If errordata_stack_depth < 0, we won't find that out and correct the problem until CHECK_STACK_DEPTH() is invoked. In the meantime, elevel will be set based on an invalid read. This is probably harmless in practice, but it seems cleaner this way. Xi Wang	2013-12-02 10:42:01 -05:00
Peter Eisentraut	3e3520cf7a	Translation updates	2013-12-02 00:17:07 -05:00
Tom Lane	335470251d	Update time zone data files to tzdata release 2013h. DST law changes in Argentina, Brazil, Jordan, Libya, Liechtenstein, Morocco, Palestine. New timezone abbreviations WIB, WIT, WITA for Indonesia.	2013-12-01 14:11:44 -05:00
Kevin Grittner	4bd371f6f8	Fix pg_dumpall to work for databases flagged as read-only. pg_dumpall's charter is to be able to recreate a database cluster's contents in a virgin installation, but it was failing to honor that contract if the cluster had any ALTER DATABASE SET default_transaction_read_only settings. By including a SET command for the connection for each connection opened by pg_dumpall output, errors are avoided and the source cluster is successfully recreated. There was discussion of whether to also set this for the connection applying pg_dump output, but it was felt that it was both less appropriate in that context, and far easier to work around. Backpatch to all supported branches.	2013-11-30 11:24:56 -06:00
Peter Eisentraut	34fa72ec9c	Remove use of obsolescent Autoconf macros Remove the use of the following macros, which are obsolescent according to the Autoconf documentation: - AC_C_CONST - AC_C_STRINGIZE - AC_C_VOLATILE - AC_FUNC_MEMCMP	2013-11-30 09:17:08 -05:00
Alvaro Herrera	2393c7d102	Fix a couple of bugs in MultiXactId freezing Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look into a multixact to check the members against cutoff_xid. This means that a very old Xid could survive hidden within a multi, possibly outliving its CLOG storage. In the distant future, this would cause clog lookup failures: ERROR: could not access status of transaction 3883960912 DETAIL: Could not open file "pg_clog/0E78": No such file or directory. This mostly was problematic when the updating transaction aborted, since in that case the row wouldn't get pruned away earlier in vacuum and the multixact could possibly survive for a long time. In many cases, data that is inaccessible for this reason way can be brought back heuristically. As a second bug, heap_freeze_tuple() didn't properly handle multixacts that need to be frozen according to cutoff_multi, but whose updater xid is still alive. Instead of preserving the update Xid, it just set Xmax invalid, which leads to both old and new tuple versions becoming visible. This is pretty rare in practice, but a real threat nonetheless. Existing corrupted rows, unfortunately, cannot be repaired in an automated fashion. Existing physical replicas might have already incorrectly frozen tuples because of different behavior than in master, which might only become apparent in the future once pg_multixact/ is truncated; it is recommended that all clones be rebuilt after upgrading. Following code analysis caused by bug report by J Smith in message CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com and privately by F-Secure. Backpatch to 9.3, where freezing of MultiXactIds was introduced. Analysis and patch by Andres Freund, with some tweaks by Álvaro.	2013-11-29 21:47:25 -03:00
Alvaro Herrera	1ce150b7bb	Don't TransactionIdDidAbort in HeapTupleGetUpdateXid It is dangerous to do so, because some code expects to be able to see what's the true Xmax even if it is aborted (particularly while traversing HOT chains). So don't do it, and instead rely on the callers to verify for abortedness, if necessary. Several race conditions and bugs fixed in the process. One isolation test changes the expected output due to these. This also reverts commit `c235a6a589`, which is no longer necessary. Backpatch to 9.3, where this function was introduced. Andres Freund	2013-11-29 21:47:21 -03:00
Alvaro Herrera	1df0122daa	Truncate pg_multixact/'s contents during crash recovery Commit `9dc842f08` of 8.2 era prevented MultiXact truncation during crash recovery, because there was no guarantee that enough state had been setup, and because it wasn't deemed to be a good idea to remove data during crash recovery anyway. Since then, due to Hot-Standby, streaming replication and PITR, the amount of time a cluster can spend doing crash recovery has increased significantly, to the point that a cluster may even never come out of it. This has made not truncating the content of pg_multixact/ not defensible anymore. To fix, take care to setup enough state for multixact truncation before crash recovery starts (easy since checkpoints contain the required information), and move the current end-of-recovery actions to a new TrimMultiXact() function, analogous to TrimCLOG(). At some later point, this should probably done similarly to the way clog.c is doing it, which is to just WAL log truncations, but we can't do that for the back branches. Back-patch to 9.0. 8.4 also has the problem, but since there's no hot standby there, it's much less pressing. In 9.2 and earlier, this patch is simpler than in newer branches, because multixact access during recovery isn't required. Add appropriate checks to make sure that's not happening. Andres Freund	2013-11-29 21:47:15 -03:00
Alvaro Herrera	f54106f77e	Fix full-table-vacuum request mechanism for MultiXactIds While autovacuum dutifully launched anti-multixact-wraparound vacuums when the multixact "age" was reached, the vacuum code was not aware that it needed to make them be full table vacuums. As the resulting partial-table vacuums aren't capable of actually increasing relminmxid, autovacuum continued to launch anti-wraparound vacuums that didn't have the intended effect, until age of relfrozenxid caused the vacuum to finally be a full table one via vacuum_freeze_table_age. To fix, introduce logic for multixacts similar to that for plain TransactionIds, using the same GUCs. Backpatch to 9.3, where permanent MultiXactIds were introduced. Andres Freund, some cleanup by Álvaro	2013-11-29 21:47:13 -03:00
Alvaro Herrera	76a31c689c	Replace hardcoded 200000000 with autovacuum_freeze_max_age Parts of the code used autovacuum_freeze_max_age to determine whether anti-multixact-wraparound vacuums are necessary, while others used a hardcoded 200000000 value. This leads to problems when autovacuum_freeze_max_age is set to a non-default value. Use the latter everywhere. Backpatch to 9.3, where vacuuming of multixacts was introduced. Andres Freund	2013-11-29 21:47:09 -03:00
Tom Lane	79193c75f8	Fix assorted issues in pg_ctl's pgwin32_CommandLine(). Ensure that the invocation command for postgres or pg_ctl runservice double-quotes the executable's pathname; failure to do this leads to trouble when the path contains spaces. Also, ensure that the path ends in ".exe" in both cases and uses backslashes rather than slashes as directory separators. The latter issue is reported to confuse some third-party tools such as Symantec Backup Exec. Also, rewrite the function to avoid buffer overrun issues by using a PQExpBuffer instead of a fixed-size static buffer. Combinations of very long executable pathnames and very long data directory pathnames could have caused trouble before, for example. Back-patch to all active branches, since this code has been like this for a long while. Naoya Anzai and Tom Lane, reviewed by Rajeev Rastogi	2013-11-29 18:34:07 -05:00
Tom Lane	8b151558c8	Be sure to release proc->backendLock after SetupLockInTable() failure. The various places that transferred fast-path locks to the main lock table neglected to release the PGPROC's backendLock if SetupLockInTable failed due to being out of shared memory. In most cases this is no big deal since ensuing error cleanup would release all held LWLocks anyway. But there are some hot-standby functions that don't consider failure of FastPathTransferRelationLocks to be a hard error, and in those cases this oversight could lead to system lockup. For consistency, make all of these places look the same as FastPathTransferRelationLocks. Noted while looking for the cause of Dan Wood's bugs --- this wasn't it, but it's a bug anyway.	2013-11-29 17:35:09 -05:00
Tom Lane	16e1b7a1b7	Fix assorted race conditions in the new timeout infrastructure. Prevent handle_sig_alarm from losing control partway through due to a query cancel (either an asynchronous SIGINT, or a cancel triggered by one of the timeout handler functions). That would at least result in failure to schedule any required future interrupt, and might result in actual corruption of timeout.c's data structures, if the interrupt happened while we were updating those. We could still lose control if an asynchronous SIGINT arrives just as the function is entered. This wouldn't break any data structures, but it would have the same effect as if the SIGALRM interrupt had been silently lost: we'd not fire any currently-due handlers, nor schedule any new interrupt. To forestall that scenario, forcibly reschedule any pending timer interrupt during AbortTransaction and AbortSubTransaction. We can avoid any extra kernel call in most cases by not doing that until we've allowed LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events. Another hazard is that some platforms (at least Linux and *BSD) block a signal before calling its handler and then unblock it on return. When we longjmp out of the handler, the unblock doesn't happen, and the signal is left blocked indefinitely. Again, we can fix that by forcibly unblocking signals during AbortTransaction and AbortSubTransaction. These latter two problems do not manifest when the longjmp reaches postgres.c, because the error recovery code there kills all pending timeout events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal mask is restored. So errors thrown outside any transaction should be OK already, and cleaning up in AbortTransaction and AbortSubTransaction should be enough to fix these issues. (We're assuming that any code that catches a query cancel error and doesn't re-throw it will do at least a subtransaction abort to clean up; but that was pretty much required already by other subsystems.) Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when disabling that event: if a lock timeout interrupt happened after the lock was granted, the ensuing query cancel is still going to happen at the next CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user cancel. Per reports from Dan Wood. Back-patch to 9.3 where the new timeout handling infrastructure was introduced. We may at some point decide to back-patch the signal unblocking changes further, but I'll desist from that until we hear actual field complaints about it.	2013-11-29 16:41:00 -05:00
Robert Haas	8e18d04d4d	Refine our definition of what constitutes a system relation. Although user-defined relations can't be directly created in pg_catalog, it's possible for them to end up there, because you can create them in some other schema and then use ALTER TABLE .. SET SCHEMA to move them there. Previously, such relations couldn't afterwards be manipulated, because IsSystemRelation()/IsSystemClass() rejected all attempts to modify objects in the pg_catalog schema, regardless of their origin. With this patch, they now reject only those objects in pg_catalog which were created at initdb-time, allowing most operations on user-created tables in pg_catalog to proceed normally. This patch also adds new functions IsCatalogRelation() and IsCatalogClass(), which is similar to IsSystemRelation() and IsSystemClass() but with a slightly narrower definition: only TOAST tables of system catalogs are included, rather than all TOAST tables. This is currently used only for making decisions about when invalidation messages need to be sent, but upcoming logical decoding patches will find other uses for this information. Andres Freund, with some modifications by me.	2013-11-28 20:57:20 -05:00
Heikki Linnakangas	2fe69cacff	Another gin_desc fix. The number of items inserted was incorrectly printed as if it was a boolean.	2013-11-28 23:35:50 +02:00
Heikki Linnakangas	97c19e6c38	Fix gin_desc routine to match the WAL format. In the GIN incomplete-splits patch, I used BlockIdDatas to store the block number of left and right children, when inserting a downlink after a split to an internal page posting list page. But gin_desc thought they were stored as BlockNumbers.	2013-11-28 21:57:42 +02:00
Tom Lane	da8a716089	Fix latent(?) race condition in LockReleaseAll. We have for a long time checked the head pointer of each of the backend's proclock lists and skipped acquiring the corresponding locktable partition lock if the head pointer was NULL. This was safe enough in the days when proclock lists were changed only by the owning backend, but it is pretty questionable now that the fast-path patch added cases where backends add entries to other backends' proclock lists. However, we don't really wish to revert to locking each partition lock every time, because in simple transactions that would add a lot of useless lock/unlock cycles on already-heavily-contended LWLocks. Fortunately, the only way that another backend could be modifying our proclock list at this point would be if it was promoting a formerly fast-path lock of ours; and any such lock must be one that we'd decided not to delete in the previous loop over the locallock table. So it's okay if we miss seeing it in this loop; we'd just decide not to delete it again. However, once we've detected a non-empty list, we'd better re-fetch the list head pointer after acquiring the partition lock. This guards against possibly fetching a corrupt-but-non-null pointer if pointer fetch/store isn't atomic. It's not clear if any practical architectures are like that, but we've never assumed that before and don't wish to start here. In any case, the situation certainly deserves a code comment. While at it, refactor the partition traversal loop to use a for() construct instead of a while() loop with goto's. Back-patch, just in case the risk is real and not hypothetical.	2013-11-28 12:17:46 -05:00

1 2 3 4 5 ...

24758 Commits