postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-07-18 20:51:09 +02:00

Author	SHA1	Message	Date
Tom Lane	8de3e410fa	In RelationClearRelation, postpone cache reload if !IsTransactionState(). We may process relcache flush requests during transaction startup or shutdown. In general it's not terribly safe to do catalog access at those times, so the code's habit of trying to immediately revalidate unflushable relcache entries is risky. Although there are no field trouble reports that are positively traceable to this, we have been able to demonstrate failure of the assertions recently added in RelationIdGetRelation() and SearchCatCache(). On the other hand, it seems safe to just postpone revalidation of the cache entry until we're inside a valid transaction. The one case where this is questionable is where we're exiting a subtransaction and the outer transaction is holding the relcache entry open --- but if we made any significant changes to the rel inside such a subtransaction, we've got problems anyway. There are mechanisms in place to prevent that (to wit, locks for cross-session cases and CheckTableNotInUse() for intra-session cases), so let's trust to those mechanisms to keep us out of trouble.	2014-02-06 19:38:06 -05:00
Andrew Dunstan	45e1b6c4c4	Alphabeticize list in OBJS definition in utils/adt Makefile.	2014-02-06 12:11:49 -05:00
Tom Lane	ddfc9cb054	Assert(IsTransactionState()) in RelationIdGetRelation(). Commit `42c80c696e` added an Assert(IsTransactionState()) in SearchCatCache(), to catch any code that thought it could do a catcache lookup outside transactions. Extend the same idea to relcache lookups.	2014-02-06 11:28:13 -05:00
Peter Eisentraut	f65233755c	Fix whitespace	2014-02-05 23:12:51 -05:00
Fujii Masao	489e6ac5a1	Fix comparison of an array of characters with zero to compare with '\0' instead. Report from Andres Freund.	2014-02-04 10:59:39 +09:00
Andrew Dunstan	d3ee45152b	In json code, clean up temp memory contexts after processing. Craig Ringer.	2014-02-03 10:40:12 -05:00
Fujii Masao	3e8554a54a	Make pg_basebackup skip temporary statistics files. The temporary statistics files don't need to be included in the backup because they are always reset at the beginning of the archive recovery. This patch changes pg_basebackup so that it skips all files located in $PGDATA/pg_stat_tmp or the directory specified by stats_temp_directory parameter.	2014-02-03 23:19:49 +09:00
Bruce Momjian	d0ee93797d	arrays: tighten checks for multi-dimensional input Previously an input array string that started with a single-element array dimension would then later accept a multi-dimensional segment. BACKWARD INCOMPATIBILITY	2014-02-01 10:49:17 -05:00
Robert Haas	858ec11858	Introduce replication slots. Replication slots are a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation. In a few places, we refer to the type of replication slots introduced by this patch as "physical" slots, because forthcoming patches for logical decoding will also have slots, but with somewhat different properties. Andres Freund and Robert Haas	2014-01-31 22:45:36 -05:00
Bruce Momjian	146604ec43	Add checks for interval overflow/underflow New checks include input, month/day/time internal adjustments, addition, subtraction, multiplication, and negation. Also adjust docs to correctly specify interval size in bytes. Report from Rok Kralj	2014-01-30 09:41:43 -05:00
Andrew Dunstan	120c5cc761	Silence compiler warnings about possibly unset variables. They are in fact set in every case where they are needed, but the compiler doesn't know that. Per gripe from Tom Lane.	2014-01-29 18:54:14 -05:00
Andrew Dunstan	5264d91541	Add json_array_elements_text function. This was a notable omission from the json functions added in 9.3 and there have been numerous complaints about its absence. Laurence Rowe.	2014-01-29 15:39:01 -05:00
Heikki Linnakangas	1a3458b6d8	Allow using huge TLB pages on Linux (MAP_HUGETLB) This patch adds an option, huge_tlb_pages, which allows requesting the shared memory segment to be allocated using huge pages, by using the MAP_HUGETLB flag in mmap(). This can improve performance. The default is 'try', which means that we will attempt using huge pages, and fall back to non-huge pages if it doesn't work. Currently, only Linux has MAP_HUGETLB. On other platforms, the default 'try' behaves the same as 'off'. In passing, don't try to round the mmap() size to a multiple of pagesize. mmap() doesn't require that, and there's no particular reason for PostgreSQL to do that either. When using MAP_HUGETLB, however, round the request size up to the nearest 2MB boundary. This is to work around a bug in some Linux kernel versions, but also to avoid wasting memory, because the kernel will round the size up anyway. Many people were involved in writing this patch, including Christian Kruse, Richard Poole, Abhijit Menon-Sen, reviewed by Peter Geoghegan, Andres Freund and me.	2014-01-29 14:08:30 +02:00
Andrew Dunstan	105639900b	New json functions. json_build_array() and json_build_object allow for the construction of arbitrarily complex json trees. json_object() turns a one or two dimensional array, or two separate arrays, into a json_object of name/value pairs, similarly to the hstore() function. json_object_agg() aggregates its two arguments into a single json object as name value pairs. Catalog version bumped. Andrew Dunstan, reviewed by Marko Tiikkaja.	2014-01-28 17:48:21 -05:00
Fujii Masao	9132b189bf	Add pg_stat_archiver statistics view. This view shows the statistics about the WAL archiver process's activity. Gabriele Bartolini, reviewed by Michael Paquier, refactored a bit by me.	2014-01-29 02:58:22 +09:00
Robert Haas	ea9df812d8	Relax the requirement that all lwlocks be stored in a single array. This makes it possible to store lwlocks as part of some other data structure in the main shared memory segment, or in a dynamic shared memory segment. There is still a main LWLock array and this patch does not move anything out of it, but it provides necessary infrastructure for doing that in the future. This change is likely to increase the size of LWLockPadded on some platforms, especially 32-bit platforms where it was previously only 16 bytes. Patch by me. Review by Andres Freund and KaiGai Kohei.	2014-01-27 11:07:44 -05:00
Tom Lane	2850896961	Code review for auto-tuned effective_cache_size. Fix integer overflow issue noted by Magnus Hagander, as well as a bunch of other infelicities in commit `ee1e5662d8` and its unreasonably large number of followups.	2014-01-27 00:05:56 -05:00
Fujii Masao	dd515d4082	Change the suffix of auto conf temporary file from "temp" to "tmp". Michael Paquier	2014-01-27 12:39:11 +09:00
Fujii Masao	7c619be623	Fix typos in comments for ALTER SYSTEM. Michael Paquier	2014-01-27 12:23:20 +09:00
Andrew Dunstan	cec8394b5c	Enable building with Visual Studio 2013. Backpatch to 9.3. Brar Piening.	2014-01-26 09:49:10 -05:00
Tom Lane	ac4ef637ad	Allow use of "z" flag in our printf calls, and use it where appropriate. Since C99, it's been standard for printf and friends to accept a "z" size modifier, meaning "whatever size size_t has". Up to now we've generally dealt with printing size_t values by explicitly casting them to unsigned long and using the "l" modifier; but this is really the wrong thing on platforms where pointers are wider than longs (such as Win64). So let's start using "z" instead. To ensure we can do that on all platforms, teach src/port/snprintf.c to understand "z", and add a configure test to force use of that implementation when the platform's version doesn't handle "z". Having done that, modify a bunch of places that were using the unsigned-long hack to use "z" instead. This patch doesn't pretend to have gotten everyplace that could benefit, but it catches many of them. I made an effort in particular to ensure that all uses of the same error message text were updated together, so as not to increase the number of translatable strings. It's possible that this change will result in format-string warnings from pre-C99 compilers. We might have to reconsider if there are any popular compilers that will warn about this; but let's start by seeing what the buildfarm thinks. Andres Freund, with a little additional work by me	2014-01-23 17:18:33 -05:00
Alvaro Herrera	b152c6cd0d	Make DROP IF EXISTS more consistently not fail Some cases were still reporting errors and aborting, instead of a NOTICE that the object was being skipped. This makes it more difficult to cleanly handle pg_dump --clean, so change that to instead skip missing objects properly. Per bug #7873 reported by Dave Rolsky; apparently this affects a large number of users. Authors: Pavel Stehule and Dean Rasheed. Some tweaks by Álvaro Herrera	2014-01-23 14:40:29 -03:00
Andrew Dunstan	243ee26633	Reindent json.c and jsonfuncs.c. This will help in preparation of clean patches for upcoming json work.	2014-01-22 08:46:51 -05:00
Robert Haas	01f7808b3e	Add a cardinality function for arrays. Unlike our other array functions, this considers the total number of elements across all dimensions, and returns 0 rather than NULL when the array has no elements. But it seems that both of those behaviors are almost universally disliked, so hopefully that's OK. Marko Tiikkaja, reviewed by Dean Rasheed and Pavel Stehule	2014-01-21 12:38:53 -05:00
Robert Haas	033b2343fa	Fix inadvertent semantics change in last patch to plug memory leaks. Commit `a5bca4ef03` accidentally changed the semantics when the "skipping missing configuration file" is emitted, because it forced OK to true instead of leaving the value untouched. Spotted by Tom Lane.	2014-01-21 11:42:37 -05:00
Robert Haas	a5bca4ef03	Plug more memory leaks when reloading config file. Commit `138184adc5` plugged some but not all of the leaks from commit `2a0c81a12c`. This tightens things up some more. Amit Kapila, per an observation by Tom Lane	2014-01-21 09:41:40 -05:00
Tom Lane	9a8f5729b4	Fix to_timestamp/to_date's handling of consecutive spaces in format string. When there are consecutive spaces (or other non-format-code characters) in the format, we should advance over exactly that many characters of input. The previous coding mistakenly did a "skip whitespace" action between such characters, possibly allowing more input to be skipped than the user intended. We only need to skip whitespace just before an actual field. This is really a bug fix, but given the minimal number of field complaints and the risk of breaking applications coded to expect the old behavior, let's not back-patch it. Jeevan Chalke	2014-01-20 13:45:51 -05:00
Magnus Hagander	4b8f2859cc	Adjust the SSL connection notification message Suggested by Tom	2014-01-19 13:27:22 +01:00
Tom Lane	0d79c0a8cc	Make various variables const (read-only). These changes should generally improve correctness/maintainability. A nice side benefit is that several kilobytes move from initialized data to text segment, allowing them to be shared across processes and probably reducing copy-on-write overhead while forking a new backend. Unfortunately this doesn't seem to help libpq in the same way (at least not when it's compiled with -fpic on x86_64), but we can hope the linker at least collects all nominally-const data together even if it's not actually part of the text segment. Also, make pg_encname_tbl[] static in encnames.c, since there seems no very good reason for any other code to use it; per a suggestion from Wim Lewis, who independently submitted a patch that was mostly a subset of this one. Oskari Saarenmaa, with some editorialization by me	2014-01-18 16:04:32 -05:00
Magnus Hagander	4cba1f6bbf	Show SSL encryption information when logging connections Expand the messages when log_connections is enabled to include the fact that SSL is used and the SSL cipher information. Dr. Andreas Kunert, review by Marko Kreen	2014-01-17 13:32:31 +01:00
Robert Haas	05ff5062da	Code improvements for ALTER SYSTEM .. SET. Move FreeConfigVariables() later to make sure ErrorConfFile is valid when we use it, and get rid of an unnecessary string copy operation. Amit Kapila, kibitzed by me.	2014-01-13 14:54:00 -05:00
Tom Lane	910bac5953	Fix possible crashes due to using elog/ereport too early in startup. Per reports from Andres Freund and Luke Campbell, a server failure during set_pglocale_pgservice results in a segfault rather than a useful error message, because the infrastructure needed to use ereport hasn't been initialized; specifically, MemoryContextInit hasn't been called. One known cause of this is starting the server in a directory it doesn't have permission to read. We could try to prevent set_pglocale_pgservice from using anything that depends on palloc or elog, but that would be messy, and the odds of future breakage seem high. Moreover there are other things being called in main.c that look likely to use palloc or elog too --- perhaps those things shouldn't be there, but they are there today. The best solution seems to be to move the call of MemoryContextInit to very early in the backend's real main() function. I've verified that an elog or ereport occurring immediately after that is now capable of sending something useful to stderr. I also added code to elog.c to print something intelligible rather than just crashing if MemoryContextInit hasn't created the ErrorContext. This could happen if MemoryContextInit itself fails (due to malloc failure), and provides some future-proofing against someone trying to sneak in new code even earlier in server startup. Back-patch to all supported branches. Since we've only heard reports of this type of failure recently, it may be that some recent change has made it more likely to see a crash of this kind; but it sure looks like it's broken all the way back.	2014-01-11 16:36:07 -05:00
Tom Lane	faab7a957d	Remove unnecessary local variables to work around an icc optimization bug. Buildfarm member dunlin has been crashing since commit `8b49a60`, but other machines seem fine with that code. It turns out that removing the local variables in ordered_set_startup() that are copies of fields in "qstate" dodges the problem. This might cost a few cycles on register-rich machines, but it's probably a wash on others, and in any case this code isn't performance-critical. Thanks to Jeremy Drake for off-list investigation.	2014-01-09 12:59:55 -05:00
Tom Lane	847e46abc9	Avoid extra AggCheckCallContext() checks in ordered-set aggregates. In the transition functions, we don't really need to recheck this after the first call. I had been feeling paranoid about possibly getting a non-null argument value in some other context; but it's probably game over anyway if we have a non-null "internal" value that's not what we are expecting. In the final functions, the general convention in pre-existing final functions seems to be that an Assert() is good enough, so do it like that here too. This seems to save a few tenths of a percent of overall query runtime, which isn't much, but still it's just overhead if there's not a plausible case where the checks would fire.	2014-01-08 14:33:52 -05:00
Bruce Momjian	7e04792a1c	Update copyright for 2014 Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Heikki Linnakangas	f68220df92	Silence compiler warning on MSVC. MSVC doesn't know that elog(ERROR) doesn't return, and gives a warning about missing return. Silence that. Amit Kapila	2014-01-07 21:49:15 +02:00
Peter Eisentraut	edc43458d7	Add more use of psprintf()	2014-01-06 21:30:26 -05:00
Tom Lane	8b49a6044d	Cache catalog lookup data across groups in ordered-set aggregates. The initial commit of ordered-set aggregates just did all the setup work afresh each time the aggregate function is started up. But in a GROUP BY query, the catalog lookups need not be repeated for each group, since the column datatypes and sort information won't change. When there are many small groups, this makes for a useful, though not huge, performance improvement. Per suggestion from Andrew Gierth. Profiling of these cases suggests that it might be profitable to avoid duplicate lookups within tuplesort startup as well; but changing the tuplesort APIs would have much broader impact, so I left that for another day.	2014-01-05 12:28:39 -05:00
Tom Lane	5858cf8ab2	Fix header comment for bitncmp(). The result is an int less than, equal to, or greater than zero, in the style of memcmp (and, in fact, exactly the output of memcmp in some cases). This comment previously said -1, 1, or 0, which was an overspecification, as noted by Emre Hasegeli. All of the existing callers appear to be fine with the actual behavior, so just fix the comment. In passing, improve infelicitous formatting of some call sites.	2014-01-04 14:01:51 -05:00
Robert Haas	4b351841fa	Rename walLogHints to wal_log_hints for easier grepping. Michael Paquier	2014-01-01 20:17:00 -05:00
Peter Eisentraut	71812a98cb	Update grammar From: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>	2013-12-28 20:54:23 -05:00
Andrew Dunstan	29dcf7ded5	Properly detect invalid JSON numbers when generating JSON. Instead of looking for characters that aren't valid in JSON numbers, we simply pass the output string through the JSON number parser, and if it fails the string is quoted. This means among other things that money and domains over money will be quoted correctly and generate valid JSON. Fixes bug #8676 reported by Anderson Cristian da Silva. Backpatched to 9.2 where JSON generation was introduced.	2013-12-27 17:04:00 -05:00
Kevin Grittner	a133bf7031	Fix misplaced right paren bugs in pgstatfuncs.c. The bug would only show up if the C sockaddr structure contained zero in the first byte for a valid address; otherwise it would fail to fail, which is probably why it went unnoticed for so long. Patch submitted by Joel Jacobson after seeing an article by Andrey Karpov in which he reports finding this through static code analysis using PVS-Studio. While I was at it I moved a definition of a local variable referenced in the buggy code to a more local context. Backpatch to all supported branches.	2013-12-27 15:26:24 -06:00
Tom Lane	1def747db6	Fix inadequately-tested code path in tuplesort_skiptuples(). Per report from Jeff Davis.	2013-12-24 17:13:02 -05:00
Tom Lane	4eeda92d86	Fix ANALYZE failure on a column that's a domain over a range. Most other range operations seem to work all right on domains, but this one not so much, at least not since commit `918eee0c`. Per bug #8684 from Brett Neumeier.	2013-12-23 22:18:48 -05:00
Tom Lane	8d65da1f01	Support ordered-set (WITHIN GROUP) aggregates. This patch introduces generic support for ordered-set and hypothetical-set aggregate functions, as well as implementations of the instances defined in SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(), percent_rank(), cume_dist()). We also added mode() though it is not in the spec, as well as versions of percentile_cont() and percentile_disc() that can compute multiple percentile values in one pass over the data. Unlike the original submission, this patch puts full control of the sorting process in the hands of the aggregate's support functions. To allow the support functions to find out how they're supposed to sort, a new API function AggGetAggref() is added to nodeAgg.c. This allows retrieval of the aggregate call's Aggref node, which may have other uses beyond the immediate need. There is also support for ordered-set aggregates to install cleanup callback functions, so that they can be sure that infrastructure such as tuplesort objects gets cleaned up. In passing, make some fixes in the recently-added support for variadic aggregates, and make some editorial adjustments in the recent FILTER additions for aggregates. Also, simplify use of IsBinaryCoercible() by allowing it to succeed whenever the target type is ANY or ANYELEMENT. It was inconsistent that it dealt with other polymorphic target types but not these. Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing, and rather heavily editorialized upon by Tom Lane	2013-12-23 16:11:35 -05:00
Robert Haas	37484ad2aa	Change the way we mark tuples as frozen. Instead of changing the tuple xmin to FrozenTransactionId, the combination of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous proposals to freeze tuples opportunistically before vacuum_freeze_min_age is reached have foundered on the objection that replacing xmin by FrozenTransactionId might hinder debugging efforts when things in this area go awry; this patch is intended to solve that problem by keeping the XID around (but largely ignoring the value to which it is set). Third-party code that checks for HEAP_XMIN_INVALID on tuples where HEAP_XMIN_COMMITTED might be set will be broken by this change. To fix, use the new accessor macros in htup_details.h rather than consulting the bits directly. HeapTupleHeaderGetXmin has been modified to return FrozenTransactionId when the infomask bits indicate that the tuple is frozen; use HeapTupleHeaderGetRawXmin when you already know that the tuple isn't marked committed or frozen, or want the raw value anyway. We currently do this in routines that display the xmin for user consumption, in tqual.c where it's known to be safe and important for the avoidance of extra cycles, and in the function-caching code for various procedural languages, which shouldn't invalidate the cache just because the tuple gets frozen. Robert Haas and Andres Freund	2013-12-22 15:49:09 -05:00
Fujii Masao	961bf59fb7	Rename wal_log_hintbits to wal_log_hints, per discussion on pgsql-hackers. Sawada Masahiko	2013-12-21 03:33:16 +09:00
Robert Haas	001a573a20	Allow on-detach callbacks for dynamic shared memory segments. Just as backends must clean up their shared memory state (releasing lwlocks, buffer pins, etc.) before exiting, they must also perform any similar cleanups related to dynamic shared memory segments they have mapped before unmapping those segments. So add a mechanism to ensure that. Existing on_shmem_exit hooks include both "user level" cleanup such as transaction abort and removal of leftover temporary relations and also "low level" cleanup that forcibly released leftover shared memory resources. On-detach callbacks should run after the first group but before the second group, so create a new before_shmem_exit function for registering the early callbacks and keep on_shmem_exit for the regular callbacks. (An earlier draft of this patch added an additional argument to on_shmem_exit, but that had a much larger footprint and probably a substantially higher risk of breaking third party code for no real gain.) Patch by me, reviewed by KaiGai Kohei and Andres Freund.	2013-12-18 13:09:09 -05:00
Alvaro Herrera	11ac4c73cb	Don't ignore tuple locks propagated by our updates If a tuple was locked by transaction A, and transaction B updated it, the new version of the tuple created by B would be locked by A, yet visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the lock held by A wouldn't get checked if transaction B later deleted (or key-updated) the new version of the tuple. This might cause referential integrity checks to give false positives (that is, allow deletes that should have been rejected). This is an easy oversight to have made, because prior to improved tuple locks in commit `0ac5ad5134` it wasn't possible to have tuples created by our own transaction that were also locked by remote transactions, and so locks weren't even considered in that code path. It is recommended that foreign keys be rechecked manually in bulk after installing this update, in case some referenced rows are missing with some referencing row remaining. Per bug reported by Daniel Wood in CAPweHKe5QQ1747X2c0tA=5zf4YnS2xcvGf13Opd-1Mq24rF1cQ@mail.gmail.com	2013-12-18 13:45:51 -03:00
Tatsuo Ishii	65d6e4cb5c	Add ALTER SYSTEM command to edit the server configuration file. Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii, Boszormenyi Zoltan, Andres Freund, Greg Smith and others.	2013-12-18 23:42:44 +09:00
Heikki Linnakangas	30b96549ab	Mark variables 'static' where possible. Move GinFuzzySearchLimit to ginget.c Per "clang -Wmissing-variable-declarations" output, posted by Andres Freund. I didn't silence all those warnings, though, only the most obvious cases.	2013-12-16 11:41:17 +02:00
Heikki Linnakangas	dde6282500	Fix more instances of "the the" in comments. Plus one instance of "to to" in the docs.	2013-12-13 20:02:01 +02:00
Tom Lane	e8312b4f03	Don't let timeout interrupts happen unless ImmediateInterruptOK is set. Serious oversight in commit `16e1b7a1b7`: we should not allow an interrupt to take control away from mainline code except when ImmediateInterruptOK is set. Just to be safe, let's adopt the same save-clear-restore dance that's been used for many years in HandleCatchupInterrupt and HandleNotifyInterrupt, so that nothing bad happens if a timeout handler invokes code that tests or even manipulates ImmediateInterruptOK. Per report of "stuck spinlock" failures from Christophe Pettus, though many other symptoms are possible. Diagnosis by Andres Freund.	2013-12-13 11:50:15 -05:00
Heikki Linnakangas	50e547096c	Add GUC to enable WAL-logging of hint bits, even with checksums disabled. WAL records of hint bit updates are useful to tools that want to examine which pages have been modified. In particular, this is required to make the pg_rewind tool safe (without checksums). This can also be used to test how much extra WAL-logging would occur if you enabled checksums, without actually enabling them (which you can't currently do without re-initdb'ing). Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with further changes by me.	2013-12-13 16:26:14 +02:00
Simon Riggs	8693559cac	New autovacuum_work_mem parameter If autovacuum_work_mem is set, autovacuum workers now use this parameter in preference to maintenance_work_mem. Peter Geoghegan	2013-12-12 11:42:39 +00:00
Robert Haas	e55704d8b2	Add new wal_level, logical, sufficient for logical decoding. When wal_level=logical, we'll log columns from the old tuple as configured by the REPLICA IDENTITY facility added in commit `07cacba983`. This makes it possible for a properly-configured logical replication solution to correctly follow table updates even if they change the chosen key columns, or, with REPLICA IDENTITY FULL, even if the table has no key at all. Note that updates which do not modify the replica identity column won't log anything extra, making the choice of a good key (i.e. one that will rarely be changed) important to performance when wal_level=logical is configured. Each insert, update, or delete to a catalog table will also log the CMIN and/or CMAX values stamped by the current transaction. This is necessary because logical decoding will require access to historical snapshots of the catalog in order to decode some data types, and the CMIN/CMAX values that we may need in order to judge row visibility may have been overwritten by the time we need them. Andres Freund, reviewed in various versions by myself, Heikki Linnakangas, KONDO Mitsumasa, and many others.	2013-12-10 19:01:40 -05:00
Noah Misch	53685d7981	Rename TABLE() to ROWS FROM(). SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and other collection types. This feature, however, deals with set-returning functions. Use a different syntax for this feature to keep open the possibility of implementing the standard TABLE().	2013-12-10 09:34:37 -05:00
Peter Eisentraut	3164721462	SSL: Support ECDH key exchange This sets up ECDH key exchange, when compiling against OpenSSL that supports EC. Then the ECDHE-RSA and ECDHE-ECDSA cipher suites can be used for SSL connections. The latter one means that EC keys are now usable. The reason for EC key exchange is that it's faster than DHE and it allows going to higher security levels where RSA will be horribly slow. There is also a new GUC option ssl_ecdh_curve that specifies the curve name used for ECDH. It defaults to "prime256v1", which is the most common curve in use in HTTPS. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>	2013-12-07 15:11:44 -05:00
Peter Eisentraut	ef3267523d	SSL: Add configuration option to prefer server cipher order By default, OpenSSL (and SSL/TLS in general) lets the client cipher order take priority. This is OK for browsers where the ciphers were tuned, but few PostgreSQL client libraries make the cipher order configurable. So it makes sense to have the cipher order in postgresql.conf take priority over client defaults. This patch adds the setting "ssl_prefer_server_ciphers" that can be turned on so that server cipher order is preferred. Per discussion, this now defaults to on. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>	2013-12-07 08:13:50 -05:00
Alvaro Herrera	07aeb1fec5	Avoid resetting Xmax when it's a multi with an aborted update HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while checking the contents of a multixact and finding it contains an aborted update, by setting the HEAP_XMAX_INVALID bit. This would lead to concurrent transactions not noticing any previous locks held by transactions that might still be running, and thus being able to acquire subsequent locks they wouldn't be normally able to acquire. This bug was introduced in commit 1ce150b7bb; backpatch this fix to 9.3, like that commit. This change reverts the change to the delete-abort-savept isolation test in `1ce150b7bb`, because that behavior change was caused by this bug. Noticed by Andres Freund while investigating a different issue reported by Noah Misch.	2013-12-05 12:21:55 -03:00
Robert Haas	a8656a3ab0	Make NUM_TOCHAR_prepare and NUM_TOCHAR_finish macros declare "len". Remove the variable from the enclosing scopes so that nothing can be relying on it. The net result of this refactoring is that we get rid of a few unnecessary strlen() calls. Original patch from Greg Jaskiewicz, substantially expanded by me.	2013-12-02 10:51:06 -05:00
Robert Haas	9d140f7be2	Avoid out-of-bounds read in errfinish if errordata_stack_depth < 0. If errordata_stack_depth < 0, we won't find that out and correct the problem until CHECK_STACK_DEPTH() is invoked. In the meantime, elevel will be set based on an invalid read. This is probably harmless in practice, but it seems cleaner this way. Xi Wang	2013-12-02 10:42:01 -05:00
Alvaro Herrera	1ce150b7bb	Don't TransactionIdDidAbort in HeapTupleGetUpdateXid It is dangerous to do so, because some code expects to be able to see what's the true Xmax even if it is aborted (particularly while traversing HOT chains). So don't do it, and instead rely on the callers to verify for abortedness, if necessary. Several race conditions and bugs fixed in the process. One isolation test changes the expected output due to these. This also reverts commit `c235a6a589`, which is no longer necessary. Backpatch to 9.3, where this function was introduced. Andres Freund	2013-11-29 21:47:21 -03:00
Tom Lane	16e1b7a1b7	Fix assorted race conditions in the new timeout infrastructure. Prevent handle_sig_alarm from losing control partway through due to a query cancel (either an asynchronous SIGINT, or a cancel triggered by one of the timeout handler functions). That would at least result in failure to schedule any required future interrupt, and might result in actual corruption of timeout.c's data structures, if the interrupt happened while we were updating those. We could still lose control if an asynchronous SIGINT arrives just as the function is entered. This wouldn't break any data structures, but it would have the same effect as if the SIGALRM interrupt had been silently lost: we'd not fire any currently-due handlers, nor schedule any new interrupt. To forestall that scenario, forcibly reschedule any pending timer interrupt during AbortTransaction and AbortSubTransaction. We can avoid any extra kernel call in most cases by not doing that until we've allowed LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events. Another hazard is that some platforms (at least Linux and *BSD) block a signal before calling its handler and then unblock it on return. When we longjmp out of the handler, the unblock doesn't happen, and the signal is left blocked indefinitely. Again, we can fix that by forcibly unblocking signals during AbortTransaction and AbortSubTransaction. These latter two problems do not manifest when the longjmp reaches postgres.c, because the error recovery code there kills all pending timeout events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal mask is restored. So errors thrown outside any transaction should be OK already, and cleaning up in AbortTransaction and AbortSubTransaction should be enough to fix these issues. (We're assuming that any code that catches a query cancel error and doesn't re-throw it will do at least a subtransaction abort to clean up; but that was pretty much required already by other subsystems.) Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when disabling that event: if a lock timeout interrupt happened after the lock was granted, the ensuing query cancel is still going to happen at the next CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user cancel. Per reports from Dan Wood. Back-patch to 9.3 where the new timeout handling infrastructure was introduced. We may at some point decide to back-patch the signal unblocking changes further, but I'll desist from that until we hear actual field complaints about it.	2013-11-29 16:41:00 -05:00
Robert Haas	8e18d04d4d	Refine our definition of what constitutes a system relation. Although user-defined relations can't be directly created in pg_catalog, it's possible for them to end up there, because you can create them in some other schema and then use ALTER TABLE .. SET SCHEMA to move them there. Previously, such relations couldn't afterwards be manipulated, because IsSystemRelation()/IsSystemClass() rejected all attempts to modify objects in the pg_catalog schema, regardless of their origin. With this patch, they now reject only those objects in pg_catalog which were created at initdb-time, allowing most operations on user-created tables in pg_catalog to proceed normally. This patch also adds new functions IsCatalogRelation() and IsCatalogClass(), which are similar to IsSystemRelation() and IsSystemClass() but with a slightly narrower definition: only TOAST tables of system catalogs are included, rather than all TOAST tables. This is currently used only for making decisions about when invalidation messages need to be sent, but upcoming logical decoding patches will find other uses for this information. Andres Freund, with some modifications by me.	2013-11-28 20:57:20 -05:00
Peter Eisentraut	85ed91ee7d	Implement information_schema.parameters.parameter_default column. Reviewed-by: Ali Dar <ali.munir.dar@gmail.com> Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com> Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>	2013-11-26 23:21:35 -05:00
Jeff Davis	7cc0ba9f17	Add missing entry for session_preload_libraries in sample config. The omission was apparently an oversight in the original patch.	2013-11-25 21:03:07 -08:00
Bruce Momjian	a6542a4b68	Change SET LOCAL/CONSTRAINTS/TRANSACTION and ABORT behavior. Change SET LOCAL/CONSTRAINTS/TRANSACTION behavior outside of a transaction block from error (post-9.3) to warning. (Was nothing in <= 9.3.) Also change ABORT outside of a transaction block from notice to warning.	2013-11-25 19:19:40 -05:00
Jeff Davis	559d535819	Lessen library-loading log level. Previously, messages were emitted at the LOG level every time a backend preloaded a library. That was acceptable (though unnecessary) for shared_preload_libraries; but it was excessive for local_preload_libraries and session_preload_libraries. Reduce to DEBUG1. Also, there was logic in the EXEC_BACKEND case to avoid repeated messages for shared_preload_libraries by demoting them to DEBUG2. DEBUG1 seems more appropriate there, as well, so eliminate that special case. Peter Geoghegan.	2013-11-24 10:50:54 -08:00
Peter Eisentraut	b7212c9726	Fix thinko in SPI_execute_plan() calls. Two call sites were apparently thinking that the last argument of SPI_execute_plan() is the number of query parameters, but it is actually the row limit. Change the calls to 0, since we don't care about the limit there. The previous code didn't break anything, but it was still wrong.	2013-11-23 09:34:57 -05:00
Peter Eisentraut	4053189d59	Avoid potential buffer overflow crash. A pointer to a C string was treated as a pointer to a "name" datum and passed to SPI_execute_plan(). This pointer would then end up being passed through datumCopy(), which would try to copy the entire 64 bytes of name data, thus running past the end of the C string. Fix by converting the string to a proper name structure. Found by LLVM AddressSanitizer.	2013-11-23 07:25:37 -05:00
Tom Lane	784e762e88	Support multi-argument UNNEST(), and TABLE() syntax for multiple functions. This patch adds the ability to write TABLE( function1(), function2(), ...) as a single FROM-clause entry. The result is the concatenation of the first row from each function, followed by the second row from each function, etc; with NULLs inserted if any function produces fewer rows than others. This is believed to be a much more useful behavior than what Postgres currently does with multiple SRFs in a SELECT list. This syntax also provides a reasonable way to combine use of column definition lists with WITH ORDINALITY: put the column definition list inside TABLE(), where it's clear that it doesn't control the ordinality column as well. Also implement SQL-compliant multiple-argument UNNEST(), by turning UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)). The SQL standard specifies TABLE() with only a single function, not multiple functions, and it seems to require an implicit UNNEST() which is not what this patch does. There may be something wrong with that reading of the spec, though, because if it's right then the spec's TABLE() is just a pointless alternative spelling of UNNEST(). After further review of that, we might choose to adopt a different syntax for what this patch does, but in any case this functionality seems clearly worthwhile. Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and significantly revised by me	2013-11-21 19:37:20 -05:00
Robert Haas	f1df4731ee	Use cstring_to_text_with_len when length is known. This avoids a potentially-expensive extra call to strlen(). David Rowley	2013-11-18 10:19:00 -05:00
Tom Lane	f901bb50e3	Add make_date() and make_time() functions. Pavel Stehule, reviewed by Jeevan Chalke and Atri Sharma	2013-11-17 15:06:50 -05:00
Tom Lane	69c8fbac20	Improve performance of numeric sum(), avg(), stddev(), variance(), etc. This patch improves performance of most built-in aggregates that formerly used a NUMERIC or NUMERIC array as their transition type; this includes not only aggregates on numeric inputs, but some aggregates on integer inputs where overflow of an int8 value is a possibility. The code now uses a special-purpose data structure to avoid array construction and deconstruction overhead, as well as packing and unpacking overhead for numeric values. These aggregates' transition type is now declared as INTERNAL, since it doesn't correspond to any SQL data type. To keep the planner from thinking that that means a lot of storage will be used, we make use of the just-added pg_aggregate.aggtransspace feature. The space estimate is set to 128 bytes, which is at least in the right ballpark. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra	2013-11-16 18:46:34 -05:00
Robert Haas	c46c803f8a	Fix relfilenodemap.c's handling of cache invalidations. The old code entered a new hash table entry first, then scanned pg_class to determine what value to fill in, and then populated the entry. This fails to work properly if a cache invalidation happens as a result of opening pg_class. Repair. Along the way, get rid of the idea of blowing away the entire hash table as a method of processing invalidations. Instead, just delete all the entries one by one. This is probably not quite as cheap but it's simpler, and shouldn't happen often. Andres Freund	2013-11-13 10:52:59 -05:00
Peter Eisentraut	aa04b323c3	Move variable closer to where it is used. This avoids an unused variable warning on Windows when building without asserts. From: David Rowley <dgrowleyml@gmail.com>	2013-11-13 06:26:27 -05:00
Tom Lane	ebefbb5fde	Fix failure with whole-row reference to a subquery. Simple oversight in commit `1cb108efb0` --- recursively examining a subquery output column is only sane if the original Var refers to a single output column. Found by Kevin Grittner.	2013-11-11 16:36:27 -05:00
Tom Lane	0b7e660d6c	Fix ruleutils pretty-printing to not generate trailing whitespace. The pretty-printing logic in ruleutils.c operates by inserting a newline and some indentation whitespace into strings that are already valid SQL. This naturally results in leaving some trailing whitespace before the newline in many cases; which can be annoying when processing the output with other tools, as complained of by Joe Abbate. We can fix that in a pretty localized fashion by deleting any trailing whitespace before we append a pretty-printing newline. In addition, we have to modify the code inserted by commit `2f582f76b1` so that we also delete trailing whitespace when transposing items from temporary buffers into the main result string, when a temporary item starts with a newline. This results in rather voluminous changes to the regression test results, but it's easily verified that they are only removal of trailing whitespace. Back-patch to 9.3, because the aforementioned commit resulted in many more cases of trailing whitespace than had occurred in earlier branches.	2013-11-11 13:36:38 -05:00
Peter Eisentraut	001e114b8d	Fix whitespace issues found by git diff --check, add gitattributes. Set per file type attributes in .gitattributes to fine-tune whitespace checks. With the associated cleanups, the tree is now clean for git	2013-11-10 14:48:29 -05:00
Robert Haas	07cacba983	Add the notion of REPLICA IDENTITY for a table. Pending patches for logical replication will use this to determine which columns of a tuple ought to be considered as its candidate key. Andres Freund, with minor, mostly cosmetic adjustments by me	2013-11-08 12:30:43 -05:00
Kevin Grittner	b64b5ccb6a	Silence benign warnings from clang version 3.0-6ubuntu3.	2013-11-07 16:35:43 -06:00
Tom Lane	8dace66e07	Add #ifdef guards for some POSIX error symbols that Windows doesn't like. Per buildfarm results. It looks like the older the Windows version, the more errno codes it hasn't got ...	2013-11-06 20:22:42 -05:00
Tom Lane	8e68816cc2	Be more robust when strerror() doesn't give a useful result. glibc, at least, is capable of returning "???" instead of anything useful if it doesn't like the setting of LC_CTYPE. If this happens, or in the previously-known case of strerror() returning an empty string, try to print the C macro name for the error code ("EACCES" etc). Only if we don't have the error code in our compiled-in list of popular error codes (which covers most though not quite all of what's called out in the POSIX spec) will we fall back to printing a numeric error code. This should simplify debugging. Note that this functionality is currently only provided for %m in backend ereport/elog messages. That may be sufficient, since we don't fool with the locale environment in frontend clients, but it's foreseeable that we might want similar code in libpq for instance. There was some talk of back-patching this, but let's see how the buildfarm likes it first. It seems likely that at least some of the POSIX-defined error code symbols don't exist on all platforms. I don't want to clutter the entire list with #ifdefs, but we may need more than are here now. MauMau, edited by me	2013-11-06 15:50:17 -05:00
Tom Lane	bb45c64041	Support default arguments and named-argument notation for window functions. These things didn't work because the planner omitted to do the necessary preprocessing of a WindowFunc's argument list. Add the few dozen lines of code needed to handle that. Although this sounds like a feature addition, it's really a bug fix because the default-argument case was likely to crash previously, due to lack of checking of the number of supplied arguments in the built-in window functions. It's not a security issue because there's no way for a non-superuser to create a window function definition with defaults that refers to a built-in C function, but nonetheless people might be annoyed that it crashes rather than producing a useful error message. So back-patch as far as the patch applies easily, which turns out to be 9.2. I'll put a band-aid in earlier versions as a separate patch. (Note that these features still don't work for aggregates, and fixing that case will be harder since we represent aggregate arg lists as target lists not bare expression lists. There's no crash risk though because CREATE AGGREGATE doesn't accept defaults, and we reject named-argument notation when parsing an aggregate call.)	2013-11-06 13:33:09 -05:00
Tom Lane	e36ce0c7f7	Get rid of more cases of the "must detoast before output function" meme. I missed that json.c was doing this too, because for some bizarre reason it wasn't doing it adjacent to the output function call.	2013-11-03 11:55:37 -05:00
Tom Lane	b006f4ddb9	Prevent memory leaks from accumulating across printtup() calls. Historically, printtup() has assumed that it could prevent memory leakage by pfree'ing the string result of each output function and manually managing detoasting of toasted values. This amounts to assuming that datatype output functions never leak any memory internally; an assumption we've already decided to be bogus elsewhere, for example in COPY OUT. range_out in particular is known to leak multiple kilobytes per call, as noted in bug #8573 from Godfried Vanluffelen. While we could go in and fix that leak, it wouldn't be very notationally convenient, and in any case there have been and undoubtedly will again be other leaks in other output functions. So what seems like the best solution is to run the output functions in a temporary memory context that can be reset after each row, as we're doing in COPY OUT. Some quick experimentation suggests this is actually a tad faster than the retail pfree's anyway. This patch fixes all the variants of printtup, except for debugtup() which is used in standalone mode. It doesn't seem worth worrying about query-lifespan leaks in standalone mode, and fixing that case would be a bit tedious since debugtup() doesn't currently have any startup or shutdown functions. While at it, remove manual detoast management from several other output-function call sites that had copied it from printtup(). This doesn't make a lot of difference right now, but in view of recent discussions about supporting "non-flattened" Datums, we're going to want that code gone eventually anyway. Back-patch to 9.2 where range_out was introduced. We might eventually decide to back-patch this further, but in the absence of known major leaks in older output functions, I'll refrain for now.	2013-11-03 11:33:05 -05:00
Tom Lane	45f64f1bbf	Remove CTimeZone/HasCTZSet, root and branch. These variables no longer have any useful purpose, since there's no reason to special-case brute force timezones now that we have a valid session_timezone setting for them. Remove the variables, and remove the SET/SHOW TIME ZONE code that deals with them. The user-visible impact of this is that SHOW TIME ZONE will now show a POSIX-style zone specification, in the form "<+-offset>-+offset", rather than an interval value when a brute-force zone has been set. While perhaps less intuitive, this is a better definition than before because it's actually possible to give that string back to SET TIME ZONE and get the same behavior, unlike what used to happen. We did not previously mention the angle-bracket syntax when describing POSIX timezone specifications; add some documentation so that people can figure out what these strings do. (There's still quite a lot of undocumented functionality there, but anybody who really cares can go read the POSIX spec to find out about it. In practice most people seem to prefer Olson-style city names anyway.)	2013-11-01 13:57:31 -04:00
Tom Lane	1c8a7f617f	Remove internal uses of CTimeZone/HasCTZSet. The only remaining places where we actually look at CTimeZone/HasCTZSet are abstime2tm() and timestamp2tm(). Now that session_timezone is always valid, we can remove these special cases. The caller-visible impact of this is that these functions now always return a valid zone abbreviation if requested, whereas before they'd return a NULL pointer if a brute-force timezone was in use. In the existing code, the only place I can find that changes behavior is to_char(), whose TZ format code will now print something useful rather than nothing for such zones. (In the places where the returned zone abbreviation is passed to EncodeDateTime, the lack of visible change is because we've chosen the abbreviation used for these zones to match what EncodeTimezone would have printed.) It's likely that there is now a fair amount of removable dead code around the call sites, namely anything that's meant to cope with getting a NULL timezone abbreviation, but I've not made an effort to root that out. This could be back-patched if we decide we'd like to fix to_char()'s behavior in the back branches, but there doesn't seem to be much enthusiasm for that at present.	2013-11-01 12:51:27 -04:00
Tom Lane	631dc390f4	Fix some odd behaviors when using a SQL-style simple GMT offset timezone. Formerly, when using a SQL-spec timezone setting with a fixed GMT offset (called a "brute force" timezone in the code), the session_timezone variable was not updated to match the nominal timezone; rather, all code was expected to ignore session_timezone if HasCTZSet was true. This is of course obviously fragile, though a search of the code finds only timeofday() failing to honor the rule. A bigger problem was that DetermineTimeZoneOffset() supposed that if its pg_tz parameter was pointer-equal to session_timezone, then HasCTZSet should override the parameter. This would cause datetime input containing an explicit zone name to be treated as referencing the brute-force zone instead, if the zone name happened to match the session timezone that had prevailed before installing the brute-force zone setting (as reported in bug #8572). The same malady could affect AT TIME ZONE operators. To fix, set up session_timezone so that it matches the brute-force zone specification, which we can do using the POSIX timezone definition syntax "<abbrev>offset", and get rid of the bogus lookaside check in DetermineTimeZoneOffset(). Aside from fixing the erroneous behavior in datetime parsing and AT TIME ZONE, this will cause the timeofday() function to print its result in the user-requested time zone rather than some previously-set zone. It might also affect results in third-party extensions, if there are any that make use of session_timezone without considering HasCTZSet, but in all cases the new behavior should be saner than before. Back-patch to all supported branches.	2013-11-01 12:13:18 -04:00
Robert Haas	cacbdd7810	Use appendStringInfoString instead of appendStringInfo where possible. This shaves a few cycles, and generally seems like good programming practice. David Rowley	2013-10-31 10:55:59 -04:00
Tom Lane	43fe90f66a	Suppress -0 in the C field of lines computed by line_construct_pts(). It's not entirely clear why some PPC machines are generating -0 here, since the underlying computation should be exactly 0 - 0. Perhaps there's some wider-than-nominal-precision calculations happening? Anyway, the best way to avoid platform-dependent results seems to be to explicitly reset -0 to regular zero.	2013-10-25 15:55:15 -04:00
Tom Lane	3147acd63e	Use improved vsnprintf calling logic in more places. When we are using a C99-compliant vsnprintf implementation (which should be most places, these days) it is worth the trouble to make use of its report of how large the buffer needs to be to succeed. This patch adjusts stringinfo.c and some miscellaneous usages in pg_dump to do that, relying on the logic recently added in libpgcommon's psprintf.c. Since these places want to know the number of bytes written once we succeed, modify the API of pvsnprintf() to report that. There remains near-duplicate logic in pqexpbuffer.c, but since that code is in libpq, psprintf.c's approach of exit()-on-error isn't appropriate for use there. Also note that I didn't bother touching the multitude of places that call (v)snprintf without any attempt to provide a resizable buffer. Release-note-worthy incompatibility: the API of appendStringInfoVA() changed. If there's any third-party code that's calling that directly, it will need tweaking along the same lines as in this patch. David Rowley and Tom Lane	2013-10-24 21:43:57 -04:00
Heikki Linnakangas	138184adc5	Plug memory leak when reloading config file. The absolute path to config file was not pfreed. There are probably more small leaks here and there in the config file reload code and assign hooks, and in practice no-one reloads the config files frequently enough for it to be a problem, but this one is trivial enough that might as well fix it. Backpatch to 9.3 where the leak was introduced.	2013-10-24 15:27:40 +03:00
Tom Lane	2c66f9924c	Replace pg_asprintf() with psprintf(). This eliminates an awkward coding pattern that's also unnecessarily inconsistent with backend coding. psprintf() is now the thing to use everywhere.	2013-10-22 19:40:26 -04:00
Tom Lane	09a89cb5fc	Get rid of use of asprintf() in favor of a more portable implementation. asprintf(), aside from not being particularly portable, has a fundamentally badly-designed API; the psprintf() function that was added in passing in the previous patch has a much better API choice. Moreover, the NetBSD implementation that was borrowed for the previous patch doesn't work with non-C99-compliant vsnprintf, which is something we still have to cope with on some platforms; and it depends on va_copy which isn't all that portable either. Get rid of that code in favor of an implementation similar to what we've used for many years in stringinfo.c. Also, move it into libpgcommon since it's not really libpgport material. I think this patch will be enough to turn the buildfarm green again, but there's still cosmetic work left to do, namely get rid of pg_asprintf() in favor of using psprintf(). That will come in a followon patch.	2013-10-22 18:42:13 -04:00
Peter Eisentraut	586a8fc75b	Make use of psprintf() in recent changes	2013-10-22 07:04:41 -04:00
Tom Lane	2885881147	Fix blatantly broken record_image_cmp() logic for pass-by-value fields. Doesn't anybody here pay attention to compiler warnings?	2013-10-22 00:38:53 -04:00
Robert Haas	cab5dc5daf	Allow only some columns of a view to be auto-updateable. Previously, unless all columns were auto-updateable, we wouldn't allow inserts, updates, or deletes, or at least not without a rule or trigger; now, we'll allow inserts and updates that target only the auto-updateable columns, and deletes even if there are no auto-updateable columns at all provided the view definition is otherwise suitable. Dean Rasheed, reviewed by Marko Tiikkaja	2013-10-18 10:35:36 -04:00
Robert Haas	ea91a6be89	Remove IRIX port. Development of IRIX has been discontinued, and support is scheduled to end in December of 2013. Therefore, there will be no supported versions of this operating system by the time PostgreSQL 9.4 is released. Furthermore, we have no maintainer for this platform.	2013-10-18 08:14:21 -04:00
Bruce Momjian	7778ddc7a2	Allow 5+ digit years for non-ISO timestamp/date strings, where appropriate Report from Haribabu Kommi	2013-10-16 13:22:55 -04:00
Robert Haas	05a0283e7a	Fix details missed by dynamic shared memory patch. Additional documentation update, and a comment fix. Both issues reported by Amit Kapila.	2013-10-14 08:00:26 -04:00
Peter Eisentraut	5b6d08cd29	Add use of asprintf(). Add asprintf(), pg_asprintf(), and psprintf() to simplify string allocation and composition. Replacement implementations taken from NetBSD. Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Asif Naeem <anaeem.it@gmail.com>	2013-10-13 00:09:18 -04:00
Kevin Grittner	4cbb646334	Fix several possibly non-portable gaffes in record_image_ops. Sparc machines in the buildfarm were made happy by the previous fix, but PowerPC machines are still failing. Hopefully this will cure that.	2013-10-11 13:02:52 -05:00
Alvaro Herrera	31cf1a1a43	Rework SSL renegotiation code. The existing renegotiation code was home for several bugs: it might erroneously report that renegotiation had failed; it might try to execute another renegotiation while the previous one was pending; it failed to terminate the connection if the renegotiation never actually took place; if a renegotiation was started, the byte count was reset, even if the renegotiation wasn't completed (this isn't good from a security perspective because it means continuing to use a session that should be considered compromised due to volume of data transferred.) The new code is structured to avoid these pitfalls: renegotiation is started a little before the limit expires; the handshake sequence is retried until it has actually returned successfully, and no more than that, but if it fails too many times, the connection is closed. The byte count is reset only when the renegotiation has succeeded, and if the renegotiation byte count limit expires, the connection is terminated. This commit only touches the master branch, because some of the changes are controversial. If everything goes well, a back-patch might be considered. Per discussion started by message 20130710212017.GB4941@eldon.alvh.no-ip.org	2013-10-10 23:45:20 -03:00
Kevin Grittner	15e46fd1dd	Fix bug in record_image_ops on big endian machines. The buildfarm pointed out the problem. Fix based on suggestion by Robert Haas.	2013-10-10 11:25:30 -05:00
Andrew Dunstan	4d212bac17	json_typeof function. Andrew Tipton.	2013-10-10 12:21:59 -04:00
Peter Eisentraut	261c7d4b65	Revive line type. Change the input/output format to {A,B,C}, to match the internal representation. Complete the implementations of line_in, line_out, line_recv, line_send. Remove comments and error messages about the line type not being implemented. Add regression tests for existing line operators and functions. Reviewed-by: rui hua <365507506hua@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>	2013-10-09 22:34:38 -04:00
Robert Haas	0ac5e5a7e1	Allow dynamic allocation of shared memory segments. Patch by myself and Amit Kapila. Design help from Noah Misch. Review by Andres Freund.	2013-10-09 21:05:02 -04:00
Kevin Grittner	f566515192	Add record_image_ops opclass for matview concurrent refresh. REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview containing a column of a type without a default btree operator class. It also did not produce results consistent with a non- concurrent REFRESH or a normal view if any column was of a type which allowed user-visible differences between values which compared as equal according to the type's default btree opclass. Concurrent matview refresh was modified to use the new operators to solve these problems. Documentation was added for record comparison, both for the default btree operator class for record, and the newly added operators. Regression tests now check for proper behavior both for a matview with a box column and a matview containing a citext column. Reviewed by Steve Singer, who suggested some of the doc language.	2013-10-09 14:26:09 -05:00
Bruce Momjian	0c6b675076	Centralize effective_cache_size default setting	2013-10-09 08:33:12 -04:00
Bruce Momjian	6648775028	Update postgresql.conf.sample for effective_cache_size's new default	2013-10-08 12:50:05 -04:00
Bruce Momjian	ee1e5662d8	Auto-tune effective_cache_size to be 4x shared buffers	2013-10-08 12:12:24 -04:00
Bruce Momjian	a54141aebc	Issue error on SET outside transaction block in some cases. Issue error for SET LOCAL/CONSTRAINTS/TRANSACTION outside a transaction block, as they have no effect. Per suggestion from Morten Hustveit	2013-10-04 13:50:28 -04:00
Bruce Momjian	d50f281210	Adjust C comments that would be wrap-able.	2013-10-01 19:45:01 -04:00
Robert Haas	4334639f4b	Allow printf-style padding specifications in log_line_prefix. David Rowley, after a suggestion from Heikki Linnakangas. Reviewed by Albe Laurenz, and further edited by me.	2013-09-26 17:56:31 -04:00
Heikki Linnakangas	77ae7f7c35	Plug memory leak in range_cmp function. B-tree operators are not allowed to leak memory into the current memory context. Range_cmp leaked detoasted copies of the arguments. That caused a quick out-of-memory error when creating an index on a range column. Reported by Marian Krucina, bug #8468.	2013-09-25 16:02:00 +03:00
Heikki Linnakangas	0892ecbc01	Add a GUC to report whether data page checksums are enabled. Bernd Helmle	2013-09-16 14:36:01 +03:00
Kevin Grittner	277607d600	Eliminate pg_rewrite.ev_attr column and related dead code. Commit `95ef6a3448` removed the ability to create rules on an individual column as of 7.3, but left some residual code which has since been useless. This cleans up that dead code without any change in behavior other than dropping the useless column from the catalog.	2013-09-05 14:03:43 -05:00
Heikki Linnakangas	20cb18db46	Make catalog cache hash tables resizeable. If the hash table backing a catalog cache becomes too full (fillfactor > 2), enlarge it. A new buckets array, double the size of the old, is allocated, and all entries in the old hash are moved to the right bucket in the new hash. This has two benefits. First, cache lookups don't get so expensive when there are lots of entries in a cache, like if you access hundreds of thousands of tables. Second, we can make the (initial) sizes of the caches much smaller, which saves memory. This patch dials down the initial sizes of the catcaches. The new sizes are chosen so that a backend that only runs a few basic queries still won't need to enlarge any of them.	2013-09-05 20:20:03 +03:00
Bruce Momjian	f5c2f5a8f6	Add GUC descriptions for compile-time postgresql.conf settings. Previous text was "No description available". Tianyin Xu	2013-09-04 17:44:04 -04:00
Tom Lane	0c66a22377	Update comments concerning PGC_S_TEST. This GUC context value was once only used by ALTER DATABASE SET and ALTER USER SET. That's not true anymore, though, so rewrite the comments to be a bit more general. Patch in HEAD only, since this is just an internal documentation issue.	2013-09-03 18:56:22 -04:00
Tom Lane	0d3f4406df	Allow aggregate functions to be VARIADIC. There's no inherent reason why an aggregate function can't be variadic (even VARIADIC ANY) if its transition function can handle the case. Indeed, this patch to add the feature touches none of the planner or executor, and little of the parser; the main missing stuff was DDL and pg_dump support. It is true that variadic aggregates can create the same sort of ambiguity about parameters versus ORDER BY keys that was complained of when we (briefly) had both one- and two-argument forms of string_agg(). However, the policy formed in response to that discussion only said that we'd not create any built-in aggregates with varying numbers of arguments, not that we shouldn't allow users to do it. So the logical extension of that is we can allow users to make variadic aggregates as long as we're wary about shipping any such in core. In passing, this patch allows aggregate function arguments to be named, to the extent of remembering the names in pg_proc and dumping them in pg_dump. You can't yet call an aggregate using named-parameter notation. That seems like a likely future extension, but it'll take some work, and it's not what this patch is really about. Likewise, there's still some work needed to make window functions handle VARIADIC fully, but I left that for another day. initdb forced because of new aggvariadic field in Aggref parse nodes.	2013-09-03 17:08:46 -04:00
Alvaro Herrera	e246cfc95f	Initialize cached OID to Invalid in new hash entries. Andres Freund; bug detected by valgrind	2013-08-27 14:53:17 -04:00
Tom Lane	2aac3399ae	Account better for planning cost when choosing whether to use custom plans. The previous coding in plancache.c essentially used 10% of the estimated runtime as its cost estimate for planning. This can be pretty bogus, especially when the estimated runtime is very small, such as in a simple expression plan created by plpgsql, or a simple INSERT ... VALUES. While we don't have a really good handle on how planning time compares to runtime, it seems reasonable to use an estimate based on the number of relations referenced in the query, with a rather large multiplier. This patch uses 1000 * cpu_operator_cost * (nrelations + 1), so that even a trivial query will be charged 1000 * cpu_operator_cost for planning. This should address the problem reported by Marc Cousin and others that 9.2 and up prefer custom plans in cases where the planning time greatly exceeds what can be saved.	2013-08-24 15:14:17 -04:00
Tom Lane	3d5282c6f0	Emit a log message if output is about to be redirected away from stderr. We've seen multiple cases of people looking at the postmaster's original stderr output to try to diagnose problems, not realizing/remembering that their logging configuration is set up to send log messages somewhere else. This seems particularly likely to happen in prepackaged distributions, since many packagers patch the code to change the factory-standard logging configuration to something more in line with their platform conventions. In hopes of reducing confusion, emit a LOG message about this at the point in startup where we are about to switch log output away from the original stderr, providing a pointer to where to look instead. This message will appear as the last thing in the original stderr output. (We might later also try to emit such link messages when logging parameters are changed on-the-fly; but that case seems to be both noticeably harder to do nicely, and much less frequently a problem in practice.) Per discussion, back-patch to 9.3 but not further.	2013-08-13 15:24:52 -04:00
Peter Eisentraut	072457b360	Message punctuation and pluralization fixes	2013-08-09 08:02:44 -04:00
Peter Eisentraut	9d775d8894	Message style improvements	2013-08-07 22:48:40 -04:00
Tom Lane	221e92f64c	Make sure float4in/float8in accept all standard spellings of "infinity". The C99 and POSIX standards require strtod() to accept all these spellings (case-insensitively): "inf", "+inf", "-inf", "infinity", "+infinity", "-infinity". However, pre-C99 systems might accept only some or none of these, and apparently Windows still doesn't accept "inf". To avoid surprising cross-platform behavioral differences, manually check for each of these spellings if strtod() fails. We were previously handling just "infinity" and "-infinity" that way, but since C99 is most of the world now, it seems likely that applications are expecting all these spellings to work. Per bug #8355 from Basil Peace. It turns out this fix won't actually resolve his problem, because Python isn't being this careful; but that doesn't mean we shouldn't be.	2013-08-03 12:40:27 -04:00
Alvaro Herrera	706f9dd914	Fix old visibility bug in HeapTupleSatisfiesDirty If a tuple is locked but not updated by a concurrent transaction, HeapTupleSatisfiesDirty would return that transaction's Xid in xmax, causing callers to wait on it, when it is not necessary (in fact, if the other transaction had used a multixact instead of a plain Xid to mark the tuple, HeapTupleSatisfiesDirty would have behaved differently and not returned the Xmax). This bug was introduced in commit `3f7fbf85dc`, dated December 1998, so it's almost 15 years old now. However, it's hard to see this misbehave, because before we had NOWAIT the only consequence of this is that transactions would wait for slightly more time than necessary; so it's not surprising that this hasn't been reported yet. Craig Ringer and Andres Freund	2013-08-02 17:02:36 -04:00
Robert Haas	813fb03155	Remove SnapshotNow and HeapTupleSatisfiesNow. We now use MVCC catalog scans, and, per discussion, have eliminated all other remaining uses of SnapshotNow, so that we can now get rid of it. This will break third-party code which is still using it, which is intentional, as we want such code to be updated to do things the new way.	2013-08-01 10:46:19 -04:00
Stephen Frost	ddef1a39c6	Allow a context to be passed in for error handling As pointed out by Tom Lane, we can allow other users of the error handler callbacks to provide their own memory context by adding the context to use to ErrorData and using that instead of explicitly using ErrorContext. This then allows GetErrorContextStack() to be called from inside exception handlers, so modify plpgsql to take advantage of that and add an associated regression test for it.	2013-08-01 01:07:20 -04:00
Alvaro Herrera	a59516b631	Fix mis-indented lines Per Coverity	2013-07-31 17:57:15 -04:00
Tom Lane	d074b4e50d	Fix regexp_matches() handling of zero-length matches. We'd find the same match twice if it was of zero length and not immediately adjacent to the previous match. replace_text_regexp() got similar cases right, so adjust this search logic to match that. Note that even though the regexp_split_to_xxx() functions share this code, they did not display equivalent misbehavior, because the second match would be considered degenerate and ignored. Jeevan Chalke, with some cosmetic changes by me.	2013-07-31 11:31:22 -04:00
Greg Stark	c62736cc37	Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF) Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost	2013-07-29 16:38:01 +01:00
Robert Haas	ed93feb808	Change currtid functions to use an MVCC snapshot, not SnapshotNow. This has a slight performance cost, but the only known consumer of these functions, known at the SQL level as currtid and currtid2, is pgsql-odbc, whose usage, we hope, is not sufficiently intensive to make this a problem. Per discussion.	2013-07-25 16:32:02 -04:00
Robert Haas	3483f4332d	Don't use SnapshotNow in get_actual_variable_range. Instead, use the active snapshot. Per Tom Lane, this function is most interested in knowing the range of tuples our scan will actually see. This is another step towards full removal of SnapshotNow.	2013-07-25 14:30:00 -04:00
Stephen Frost	9bd0feeba8	Improvements to GetErrorContextStack() As GetErrorContextStack() borrowed setup and tear-down code from other places, it was less than clear that it must only be called as a top-level entry point into the error system and can't be called by an exception handler (unlike the rest of the error system, which is set up to be reentrant-safe). Being called from an exception handler is outside the charter of GetErrorContextStack(), so add a bit more protection against it, improve the comments addressing why we have to set up an errordata stack for this function at all, and add a few more regression tests. Lack of clarity pointed out by Tom Lane; all bugs are mine.	2013-07-25 09:41:55 -04:00
Stephen Frost	8312832567	Add GET DIAGNOSTICS ... PG_CONTEXT in PL/PgSQL This adds the ability to get the call stack as a string from within a PL/PgSQL function, which can be handy for logging to a table, or to include in a useful message to an end-user. Pavel Stehule, reviewed by Rushabh Lathia and rather heavily whacked around by Stephen Frost.	2013-07-24 18:53:27 -04:00
Tom Lane	b32a25c3d5	Fix booltestsel() for case where we have NULL stats but not MCV stats. In a boolean column that contains mostly nulls, ANALYZE might not find enough non-null values to populate the most-common-values stats, but it would still create a pg_statistic entry with stanullfrac set. The logic in booltestsel() for this situation did the wrong thing for "col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null values would satisfy these tests (so that the true selectivity would be close to one, not close to zero). Per bug #8274. Fix by Andrew Gierth, some comment-smithing by me.	2013-07-24 00:44:09 -04:00
Tom Lane	10a509d829	Move strip_implicit_coercions() from optimizer to nodeFuncs.c. Use of this function has spread into the parser and rewriter, so it seems like time to pull it out of the optimizer and put it into the more central nodeFuncs module. This eliminates the need to #include optimizer/clauses.h in most of the calling files, demonstrating that this function was indeed a bit outside the normal code reference patterns.	2013-07-23 18:21:19 -04:00
Tom Lane	ef655663c5	Further hacking on ruleutils' new column-alias-assignment code. After further thought about implicit coercions appearing in a joinaliasvars list, I realized that they represent an additional reason why we might need to reference the join output column directly instead of referencing an underlying column. Consider SELECT x FROM t1 LEFT JOIN t2 USING (x) where t1.x is of type date while t2.x is of type timestamptz. The merged output variable is of type timestamptz, but it won't go to null when t2 does, therefore neither t1.x nor t2.x is a valid substitute reference. The code in get_variable() actually gets this case right, since it knows it shouldn't look through a coercion, but we failed to ensure that the unqualified output column name would be globally unique. To fix, modify the code that trawls for a dangerous situation so that it actually scans through an unnamed join's joinaliasvars list to see if there are any non-simple-Var entries.	2013-07-23 17:55:04 -04:00
Tom Lane	a7cd853b75	Change post-rewriter representation of dropped columns in joinaliasvars. It's possible to drop a column from an input table of a JOIN clause in a view, if that column is nowhere actually referenced in the view. But it will still be there in the JOIN clause's joinaliasvars list. We used to replace such entries with NULL Const nodes, which is handy for generation of RowExpr expansion of a whole-row reference to the view. The trouble with that is that it can't be distinguished from the situation after subquery pull-up of a constant subquery output expression below the JOIN. Instead, replace such joinaliasvars with null pointers (empty expression trees), which can't be confused with pulled-up expressions. expandRTE() still emits the old convention, though, for convenience of RowExpr generation and to reduce the risk of breaking extension code. In HEAD and 9.3, this patch also fixes a problem with some new code in ruleutils.c that was failing to cope with implicitly-casted joinaliasvars entries, as per recent report from Feike Steenbergen. That oversight was because of an inadequate description of the data structure in parsenodes.h, which I've now corrected. There were some pre-existing oversights of the same ilk elsewhere, which I believe are now all fixed.	2013-07-23 16:23:45 -04:00
Robert Haas	0518eceec3	Adjust HeapTupleSatisfies* routines to take a HeapTuple. Previously, these functions took a HeapTupleHeader, but upcoming patches for logical replication will introduce a new snapshot type under which the tuple's TID will be used to look up (CMIN, CMAX) for visibility determination purposes. This makes that information available. Code churn is minimal since HeapTupleSatisfiesVisibility took the HeapTuple anyway, and dereferenced it before calling the satisfies function. Independently of logical replication, this allows t_tableOid and t_self to be cross-checked via assertions in tqual.c. This seems like a useful way to make sure that all callers are setting these values properly, which has been previously put forward as desirable. Andres Freund, reviewed by Álvaro Herrera	2013-07-22 13:38:44 -04:00
Alvaro Herrera	0aeb5ae204	Silence compiler warning on an unused variable Also, tweak wording in comments (per Andres) and documentation (myself) to point out that it's the database's default tablespace that can be passed as 0, not DEFAULTTABLESPACE_OID. Robert Haas noticed the bug in the code, but didn't update the accompanying prose.	2013-07-22 13:15:13 -04:00
Robert Haas	f01d1ae3a1	Add infrastructure for mapping relfilenodes to relation OIDs. Future patches are expected to introduce logical replication that works by decoding WAL. WAL contains relfilenodes rather than relation OIDs, so this infrastructure will be needed to find the relation OID based on WAL contents. If logical replication does not make it into this release, we probably should consider reverting this, since it will add some overhead to DDL operations that create new relations. One additional index insert per pg_class row is not a large overhead, but it's more than zero. Another way of meeting the needs of logical replication would be to add the relation OID to WAL, but that would burden DML operations, not only DDL. Andres Freund, with some changes by me. Design review, in earlier versions, by Álvaro Herrera.	2013-07-22 11:09:10 -04:00
Peter Eisentraut	ff41a5de09	Clean up new JSON API typedefs The new JSON API uses a bit of an unusual typedef scheme, where for example OkeysState is a pointer to okeysState. And that's not applied consistently either. Change that to the more usual PostgreSQL style where struct typedefs are upper case, and use pointers explicitly.	2013-07-20 06:38:31 -04:00
Alvaro Herrera	6737aa72ba	Fix HeapTupleSatisfiesVacuum on aborted updater xacts By using only the macro that checks infomask bits HEAP_XMAX_IS_LOCKED_ONLY to verify whether a multixact is not an updater, and not the full HeapTupleHeaderIsOnlyLocked, it would come to the wrong result in case of a multixact containing an aborted update; therefore returning the wrong result code. This would cause predicate.c to break completely (as in bug report #8273 from David Leverton), and certain index builds would misbehave. As far as I can tell, other callers of the bogus routine would make harmless mistakes or not be affected by the difference at all; so this was a pretty narrow case. Also, no other user of the HEAP_XMAX_IS_LOCKED_ONLY macro is as careless; they all check specifically for the HEAP_XMAX_IS_MULTI case, and they all verify whether the updater is InvalidXid before concluding that it's a valid updater. So there doesn't seem to be any similar bug.	2013-07-19 18:47:37 -04:00
Tom Lane	d9f37e6661	Add checks for valid multibyte character length in UtfToLocal, LocalToUtf. This is mainly to suppress "uninitialized variable" warnings from very recent versions of gcc. But it seems like a good robustness thing anyway, not to mention that we might someday decide to support 6-byte UTF8. Per report from Karol Trzcionka. No back-patch since there's no reason at the moment to think this is more than cosmetic.	2013-07-18 21:55:38 -04:00
Andrew Dunstan	d26888bc4d	Move checking an explicit VARIADIC "any" argument into the parser. This is more efficient and simpler. It does mean that an untyped NULL can no longer be used in such cases, which should be mentioned in Release Notes, but doesn't seem a terrible loss. The workaround is to cast the NULL to some array type. Pavel Stehule, reviewed by Jeevan Chalke.	2013-07-18 11:52:12 -04:00
Heikki Linnakangas	3f2adace1e	Fix end-of-loop optimization in pglz_find_match() function. After the recent pglz optimization patch, the next/prev pointers in the hash table are never NULL, INVALID_ENTRY_PTR is used to represent invalid entries instead. The end-of-loop check in pglz_find_match() function didn't get the memo. The result was the same from a correctness point of view, but because the NULL-check would never fail, the tiny optimization turned into a pessimization. Reported by Stephen Frost, using Coverity scanner.	2013-07-17 20:37:09 +03:00
Noah Misch	b560ec1b0d	Implement the FILTER clause for aggregate function calls. This is SQL-standard with a few extensions, namely support for subqueries and outer references in clause expressions. catversion bump due to change in Aggref and WindowFunc. David Fetter, reviewed by Dean Rasheed.	2013-07-16 20:15:36 -04:00
Robert Haas	42c80c696e	Assert that syscache lookups don't happen outside transactions. Andres Freund	2013-07-15 13:31:36 -04:00
Stephen Frost	273dcd1628	Ensure 64bit arithmetic when calculating tapeSpace In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring out how many 'tapes' we can use (maxTapes) and then multiplying the result by the tape buffer overhead for each. Unfortunately, when we are on a system with an 8-byte long, we allow work_mem to be larger than 2GB and that allows maxTapes to be large enough that the 32bit arithmetic can overflow when multiplied against the buffer overhead. When this overflow happens, we end up adding the overflow to the amount of space available, causing the amount of memory allocated to be larger than work_mem. Note that to reach this point, you have to set work_mem to at least 24GB and be sorting a set which is at least that size. Given that a user who can set work_mem to 24GB could also set it even higher, if they were looking to run the system out of memory, this isn't considered a security issue. This overflow risk was found by the Coverity scanner. Back-patch to all supported branches, as this issue has existed since before 8.4.	2013-07-14 16:26:16 -04:00
Peter Eisentraut	070518ddab	Add session_preload_libraries configuration parameter This is like shared_preload_libraries except that it takes effect at backend start and can be changed without a full postmaster restart. It is like local_preload_libraries except that it is still only settable by a superuser. This can be a better way to load modules such as auto_explain. Since there are now three preload parameters, regroup the documentation a bit. Put all parameters into one section, explain common functionality only once, update the descriptions to reflect current and future realities. Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>	2013-07-12 21:23:50 -04:00
Peter Eisentraut	7888c61238	Fix bool abuse path_encode's "closed" argument used to take three values: TRUE, FALSE, or -1, while being of type bool. Replace that with a three-valued enum for more clarity.	2013-07-08 22:42:39 -04:00
Heikki Linnakangas	9a20a9b21b	Improve scalability of WAL insertions. This patch replaces WALInsertLock with a number of WAL insertion slots, allowing multiple backends to insert WAL records to the WAL buffers concurrently. This is particularly useful for parallel loading large amounts of data on a system with many CPUs. This has one user-visible change: switching to a new WAL segment with pg_switch_xlog() now fills the remaining unused portion of the segment with zeros. This potentially adds some overhead, but it has been a very common practice by DBAs to clear the "tail" of the segment with an external pg_clearxlogtail utility anyway, to make the WAL files compress better. With this patch, it's no longer necessary to do that. This patch adds a new GUC, xloginsert_slots, to tune the number of WAL insertion slots. Performance testing suggests that the default, 8, works pretty well for all kinds of workloads, but I left the GUC in place to allow others with different hardware to test that easily. We might want to remove that before release. Reviewed by Andres Freund.	2013-07-08 11:23:56 +03:00
Magnus Hagander	c87ff71f37	Expose the estimation of number of changed tuples since last analyze This value, now pg_stat_all_tables.n_mod_since_analyze, was already tracked and used by autovacuum, but not exposed to the user. Mark Kirkwood, review by Laurenz Albe	2013-07-05 15:10:15 +02:00
Noah Misch	79e0f87a15	Use type "int64" for memory accounting in tuplesort.c/tuplestore.c. Commit `263865a489` switched tuplesort.c and tuplestore.c variables representing memory usage from type "long" to type "Size". This was unnecessary; I thought doing so avoided overflow scenarios on 64-bit Windows, but guc.c already limited work_mem so as to prevent the overflow. It was also incomplete, not touching the logic that assumed a signed data type. Change the affected variables to "int64". This is perfect for 64-bit platforms, and it reduces the need to contemplate platform-specific overflow scenarios. It also puts us close to being able to support work_mem over 2 GiB on 64-bit Windows. Per report from Andres Freund.	2013-07-04 23:13:54 -04:00
Robert Haas	6bc8ef0b7f	Add new GUC, max_worker_processes, limiting number of bgworkers. In 9.3, there's no particular limit on the number of bgworkers; instead, we just count up the number that are actually registered, and use that to set MaxBackends. However, that approach causes problems for Hot Standby, which needs both MaxBackends and the size of the lock table to be the same on the standby as on the master, yet it may not be desirable to run the same bgworkers in both places. 9.3 handles that by failing to notice the problem, which will probably work fine in nearly all cases anyway, but is not theoretically sound. A further problem with simply counting the number of registered workers is that new workers can't be registered without a postmaster restart. This is inconvenient for administrators, since bouncing the postmaster causes an interruption of service. Moreover, there are a number of applications for background processes where, by necessity, the background process must be started on the fly (e.g. parallel query). While this patch doesn't actually make it possible to register new background workers after startup time, it's a necessary prerequisite. Patch by me. Review by Michael Paquier.	2013-07-04 11:24:24 -04:00
Fujii Masao	2ef085d0e6	Get rid of pg_class.reltoastidxid. Treat TOAST index just the same as normal one and get the OID of TOAST index from pg_index but not pg_class.reltoastidxid. This change allows us to handle multiple TOAST indexes, which is required infrastructure for upcoming REINDEX CONCURRENTLY feature. Patch by Michael Paquier, reviewed by Andres Freund and me.	2013-07-04 03:24:09 +09:00
Robert Haas	568d4138c6	Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue that has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.	2013-07-02 09:47:01 -04:00
Bruce Momjian	7408c5d29b	Add timezone offset output option to to_char() Add ability for to_char() to output the timezone's UTC offset (OF). We already have the ability to return the timezone abbreviation (TZ/tz). Per request from Andrew Dunstan	2013-07-01 13:40:32 -04:00
Heikki Linnakangas	031cc55bbe	Optimize pglz compressor for small inputs. The pglz compressor has a significant startup cost, because it has to initialize to zeros the history-tracking hash table. On a 64-bit system, the hash table was 64kB in size. While clearing memory is pretty fast, for very short inputs the relative cost of that was quite large. This patch alleviates that in two ways. First, instead of storing pointers in the hash table, store 16-bit indexes into the hist_entries array. That slashes the size of the hash table to 1/2 or 1/4 of the original, depending on the pointer width. Secondly, adjust the size of the hash table based on input size. For very small inputs, you don't need a large hash table to avoid collisions. Review by Amit Kapila.	2013-07-01 11:00:14 +03:00
Noah Misch	263865a489	Permit super-MaxAllocSize allocations with MemoryContextAllocHuge(). The MaxAllocSize guard is convenient for most callers, because it reduces the need for careful attention to overflow, data type selection, and the SET_VARSIZE() limit. A handful of callers are happy to navigate those hazards in exchange for the ability to allocate a larger chunk. Introduce MemoryContextAllocHuge() and repalloc_huge(). Use this in tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX tuples, a factor-of-48 increase. In particular, B-tree index builds can now benefit from much-larger maintenance_work_mem settings. Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.	2013-06-27 14:53:57 -04:00
Noah Misch	19085116ee	Cooperate with the Valgrind instrumentation framework. Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its Memcheck tool about the PostgreSQL allocator. This makes Valgrind roughly as sensitive to memory errors involving palloc chunks as it is to memory errors involving malloc chunks. Further client requests in PageAddItem() and printtup() verify that all bits being added to a buffer page or furnished to an output function are predictably-defined. Those tests catch failures of C-language functions to fully initialize the bits of a Datum, which in turn stymie optimizations that rely on _equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to enable these additions. An included "suppression file" silences nominal errors we don't plan to fix. Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.	2013-06-26 20:22:25 -04:00
Noah Misch	a855148a29	Refactor aset.c and mcxt.c in preparation for Valgrind cooperation. Move some repeated debugging code into functions and store intermediates in variables where not presently necessary. No code-generation changes in a production build, and no functional changes. This simplifies and focuses the main patch.	2013-06-26 19:56:03 -04:00
Noah Misch	5f538ad004	Renovate display of non-ASCII messages on Windows. GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters. elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8. Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. 
Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles. Alexander Law and Noah Misch	2013-06-26 11:17:33 -04:00
Fujii Masao	bab54e383d	Support TB (terabyte) memory unit in GUC variables. Patch by Simon Riggs, reviewed by Jeff Janes and me.	2013-06-20 08:17:14 +09:00
Jeff Davis	b8fd1a09f3	Add buffer_std flag to MarkBufferDirtyHint(). MarkBufferDirtyHint() writes WAL, and should know if it's got a standard buffer or not. Currently, the only callers where buffer_std is false are related to the FSM. In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive. Back-patch to 9.3.	2013-06-17 08:02:12 -07:00
Tom Lane	a64ca63e59	Use WaitLatch, not pg_usleep, for delaying in pg_sleep(). This avoids platform-dependent behavior wherein pg_sleep() might fail to be interrupted by statement timeout, query cancel, SIGTERM, etc. Also, since there's no reason to wake up once a second any more, we can reduce the power consumption of a sleeping backend a tad. Back-patch to 9.3, since use of SA_RESTART for SIGALRM makes this a bigger issue than it used to be.	2013-06-15 16:23:24 -04:00
Tom Lane	c62866eeaf	Remove special-case treatment of LOG severity level in standalone mode. elog.c has historically treated LOG messages as low-priority during bootstrap and standalone operation. This has led to confusion and even masked a bug, because the normal expectation of code authors is that elog(LOG) will put something into the postmaster log, and that wasn't happening during initdb. So get rid of the special-case rule and make the priority order the same as it is in normal operation. To keep from cluttering initdb's output and the behavior of a standalone backend, tweak the severity level of three messages routinely issued by xlog.c during startup and shutdown so that they won't appear in these cases. Per my proposal back in December.	2013-06-13 23:15:15 -04:00
Noah Misch	66008564f8	Avoid reading past datum end when parsing JSON. Several loops in the JSON parser examined a byte in memory just before checking whether its address was in-bounds, so they could read one byte beyond the datum's allocation. A SIGSEGV is possible. New in 9.3, so no back-patch.	2013-06-12 19:51:12 -04:00
Tom Lane	dc3eb56383	Improve updatability checking for views and foreign tables. Extend the FDW API (which we already changed for 9.3) so that an FDW can report whether specific foreign tables are insertable/updatable/deletable. The default assumption continues to be that they're updatable if the relevant executor callback function is supplied by the FDW, but finer granularity is now possible. As a test case, add an "updatable" option to contrib/postgres_fdw. This patch also fixes the information_schema views, which previously did not think that foreign tables were ever updatable, and fixes view_is_auto_updatable() so that a view on a foreign table can be auto-updatable. initdb forced due to changes in information_schema views and the functions they rely on. This is a bit unfortunate to do post-beta1, but if we don't change this now then we'll have another API break for FDWs when we do change it. Dean Rasheed, somewhat editorialized on by Tom Lane	2013-06-12 17:53:33 -04:00
Andrew Dunstan	78ed8e03c6	Fix unescaping of JSON Unicode escapes, especially for non-UTF8. Per discussion on -hackers. We treat Unicode escapes when unescaping them similarly to the way we treat them in PostgreSQL string literals. Escapes in the ASCII range are always accepted, no matter what the database encoding. Escapes for higher code points are only processed in UTF8 databases, and attempts to process them in other databases will result in an error. \u0000 is never unescaped, since it would result in an impermissible null byte.	2013-06-12 13:35:24 -04:00
Tom Lane	e262755bfc	Fix cache flush hazard in cache_record_field_properties(). We need to increment the refcount on the composite type's cached tuple descriptor while we do lookups of its column types. Otherwise a cache flush could occur and release the tuple descriptor before we're done with it. This fails reliably with -DCLOBBER_CACHE_ALWAYS, but the odds of a failure in a production build seem rather low (since the pfree'd descriptor typically wouldn't get scribbled on immediately). That may explain the lack of any previous reports. Buildfarm issue noted by Christian Ullrich. Back-patch to 9.1 where the bogus code was added.	2013-06-11 17:26:42 -04:00
Andrew Dunstan	94e3311b97	Handle Unicode surrogate pairs correctly when processing JSON. In 9.2, Unicode escape sequences are not analysed at all other than to make sure that they are in the form \uXXXX. But in 9.3 many of the new operators and functions try to turn JSON text values into text in the server encoding, and this includes de-escaping Unicode escape sequences. This processing had not taken into account the possibility that this might contain a surrogate pair to designate a character outside the BMP. That is now handled correctly. This also enforces correct use of surrogate pairs, something that is not done by the type's input routines. This fact is noted in the docs.	2013-06-08 09:12:48 -04:00
Stephen Frost	f129615fe7	Additional spelling corrections. A few more minor spelling corrections, no functional changes. Thom Brown	2013-06-03 08:40:27 -04:00
Stephen Frost	c9fc28a7f1	Minor spelling fixes. Fix a few spelling mistakes. Per bug report #8193 from Lajos Veres.	2013-06-01 10:18:59 -04:00
Noah Misch	97c4d9b7c7	Don't emit non-canonical empty arrays in array_remove(). Dean Rasheed	2013-05-31 21:50:59 -04:00
Peter Eisentraut	97a11fd0e3	postgresql.conf.sample: Improve whitespace	2013-05-29 22:00:13 -04:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3. This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Tom Lane	403bd6a18b	Fix crash when trying to display a NOTIFY rule action. Fixes oversight in commit `2ffa740be9`. Per report from Josh Kupershmidt. I think we've broken this case before, so let's add a regression test this time.	2013-05-16 16:47:26 -04:00
Tom Lane	35d50b527a	Fix to_number() to correctly ignore thousands separator when it's '.'. The existing code in NUM_numpart_from_char has hard-wired logic to treat '.' as decimal point, even when we're using a locale-aware format string and the locale says that '.' is the thousands separator. This results in clearly wrong answers in FM mode (where we must be able to identify the decimal point location), as per bug report from Patryk Kordylewski. Since the initialization code in NUM_prepare_locale already sets up Np->decimal as either the locale decimal-point string or "." depending on which decimal-point format code was used, there's really no need to have any extra logic at all in NUM_numpart_from_char: we only need to test for a match to Np->decimal. (Note: AFAICS there's nothing in here that explicitly checks for thousands separators --- rather, any unmatched character is silently skipped over. That's pretty bogus IMO but it's not the issue being complained of.) This is a longstanding bug, but it's possible that some existing apps are depending on '.' being recognized as decimal point even when using a D format code. Hence, no back-patch. We should probably list this as a potential incompatibility in the 9.3 release notes.	2013-05-11 16:35:03 -04:00
Tom Lane	69cc60dcfd	Guard against input_rows == 0 in estimate_num_groups(). This case doesn't normally happen, because the planner usually clamps all row estimates to at least one row; but I found that it can arise when dealing with relations excluded by constraints. Without a defense, estimate_num_groups() can return zero, which leads to divisions by zero inside the planner as well as assertion failures in the executor. An alternative fix would be to change set_dummy_rel_pathlist() to make the size estimate for a dummy relation 1 row instead of 0, but that seemed pretty ugly; and probably someday we'll want to drop the convention that the minimum rowcount estimate is 1 row. Back-patch to 8.4, as the problem can be demonstrated that far back.	2013-05-10 17:15:30 -04:00
Tom Lane	1d6c72a55b	Move materialized views' is-populated status into their pg_class entries. Previously this state was represented by whether the view's disk file had zero or nonzero size, which is problematic for numerous reasons, since it's breaking a fundamental assumption about heap storage. This was done to allow unlogged matviews to revert to unpopulated status after a crash despite our lack of any ability to update catalog entries post-crash. However, this poses enough risk of future problems that it seems better to not support unlogged matviews until we can find another way. Accordingly, revert that choice as well as a number of existing kluges forced by it in favor of creating a pg_class.relispopulated flag column.	2013-05-06 13:27:22 -04:00
Bruce Momjian	8b06e6aba8	Revert idea of zero-padding session id in log_line_prefix. Removal of doc adjustment and release note mention as well.	2013-05-06 08:59:39 -04:00
Andrew Dunstan	5f8b4319b9	Use correct length to convert json unicode escapes. Bug reported on IRC - fix due to Andrew Gierth.	2013-05-01 18:47:18 -04:00
Tom Lane	ac63dca607	Fix longstanding race condition in plancache.c. When creating or manipulating a cached plan for a transaction control command (particularly ROLLBACK), we must not perform any catalog accesses, since we might be in an aborted transaction. However, plancache.c busily saved or examined the search_path for every cached plan. If we were unlucky enough to do this at a moment where the path's expansion into schema OIDs wasn't already cached, we'd do some catalog accesses; and with some more bad luck such as an ill-timed signal arrival, that could lead to crashes or Assert failures, as exhibited in bug #8095 from Nachiket Vaidya. Fortunately, there's no real need to consider the search path for such commands, so we can just skip the relevant steps when the subject statement is a TransactionStmt. This is somewhat related to bug #5269, though the failure happens during initial cached-plan creation rather than revalidation. This bug has been there since the plan cache was invented, so back-patch to all supported branches.	2013-04-20 17:00:23 -04:00
Peter Eisentraut	cc26ea9fe2	Clean up references to SQL92. In most cases, these were just references to the SQL standard in general. In a few cases, a contrast was made between SQL92 and later standards -- those have been kept unchanged.	2013-04-20 11:04:41 -04:00
Andrew Dunstan	728ec9731f	Correct handling of NULL arguments in json funcs. Per gripe from Tom Lane.	2013-04-15 16:20:21 -04:00
Kevin Grittner	52e6e33ab4	Create a distinction between a populated matview and a scannable one. The intent was that being populated would, long term, be just one of the conditions which could affect whether a matview was scannable; being populated should be necessary but not always sufficient to scan the relation. Since only CREATE and REFRESH currently determine the scannability, names and comments accidentally conflated these concepts, leading to confusion. Also add missing locking for the SQL function which allows a test for scannability, and fix a modularity violation. Per complaints from Tom Lane, although it's not clear that these will satisfy his concerns. Hopefully this will at least better frame the discussion.	2013-04-09 13:02:49 -05:00
Tom Lane	3ccae48f44	Support indexing of regular-expression searches in contrib/pg_trgm. This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated. Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later. The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen. Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane	2013-04-09 01:06:54 -04:00
Andrew Dunstan	e75feb2834	Fix off by one error in JSON extract path code. Bug report by David Wheeler, diagnosis assistance from Tom Lane.	2013-04-04 18:26:52 -04:00
Tom Lane	f7b0006f42	Avoid updating our PgBackendStatus entry when track_activities is off. The point of turning off track_activities is to avoid this reporting overhead, but a thinko in commit `4f42b546fd` caused pgstat_report_activity() to perform half of its updates anyway. Fix that, and also make sure that we clear all the now-disabled fields when transitioning to the non-reporting state.	2013-04-03 14:13:28 -04:00
Tom Lane	17fe2793ea	Fix insecure parsing of server command-line switches. An oversight in commit `e710b65c1c` allowed database names beginning with "-" to be treated as though they were secure command-line switches; and this switch processing occurs before client authentication, so that even an unprivileged remote attacker could exploit the bug, needing only connectivity to the postmaster's port. Assorted exploits for this are possible, some requiring a valid database login, some not. The worst known problem is that the "-r" switch can be invoked to redirect the process's stderr output, so that subsequent error messages will be appended to any file the server can write. This can for example be used to corrupt the server's configuration files, so that it will fail when next restarted. Complete destruction of database tables is also possible. Fix by keeping the database name extracted from a startup packet fully separate from command-line switches, as had already been done with the user name field. The Postgres project thanks Mitsumasa Kondo for discovering this bug, Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing the full extent of the danger. Security: CVE-2013-1899	2013-04-01 14:00:51 -04:00
Tom Lane	ce9ab88981	Make REPLICATION privilege checks test current user not authenticated user. The pg_start_backup() and pg_stop_backup() functions checked the privileges of the initially-authenticated user rather than the current user, which is wrong. For example, a user-defined index function could successfully call these functions when executed by ANALYZE within autovacuum. This could allow an attacker with valid but low-privilege database access to interfere with creation of routine backups. Reported and fixed by Noah Misch. Security: CVE-2013-1901	2013-04-01 13:09:24 -04:00
Andrew Dunstan	a570c98d7f	Add new JSON processing functions and parser API. The JSON parser is converted into a recursive descent parser, and exposed for use by other modules such as extensions. The API provides hooks for all the significant parser events such as the beginning and end of objects and arrays, and providing functions to handle these hooks allows for fairly simple construction of a wide variety of JSON processing functions. A set of new basic processing functions and operators is also added, which use this API, including operations to extract array elements, object fields, get the length of arrays and the set of keys of a field, deconstruct an object into a set of key/value pairs, and create records from JSON objects and arrays of objects. Catalog version bumped. Andrew Dunstan, with some documentation assistance from Merlin Moncure.	2013-03-29 14:12:13 -04:00
Alvaro Herrera	473ab40c8b	Add sql_drop event for event triggers. This event takes place just before ddl_command_end, and is fired if and only if at least one object has been dropped by the command. (For instance, DROP TABLE IF EXISTS of a table that does not in fact exist will not lead to such a trigger firing). Commands that drop multiple objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event to fire. Some firings might be surprising, such as ALTER TABLE DROP COLUMN. The trigger is fired after the drop has taken place, because that has been deemed the safest design, to avoid exposing possibly-inconsistent internal state (system catalogs as well as current transaction) to the user function code. This means that careful tracking of object identification is required during the object removal phase. Like other currently existing events, there is support for tag filtering. To support the new event, add a new pg_event_trigger_dropped_objects() set-returning function, which returns a set of rows comprising the objects affected by the command. This is to be used within the user function code, and is mostly modelled after the recently introduced pg_identify_object() function. Catalog version bumped due to the new function. Dimitri Fontaine and Álvaro Herrera. Review by Robert Haas, Tom Lane	2013-03-28 13:05:48 -03:00
Simon Riggs	593c39d156	Revoke `bc5334d867`	2013-03-28 09:18:02 +00:00
Simon Riggs	d139a5e26b	Revoke `7a5a59d378`	2013-03-28 09:12:55 +00:00
Simon Riggs	7a5a59d378	Set recovery_config_directory for EXEC_BACKEND. Remove comment questioning whether this is necessary for DataDir. From buildfarm failures on Windows.	2013-03-27 16:35:38 +00:00
Simon Riggs	bc5334d867	Allow external recovery_config_directory. If required, recovery.conf can now be located outside of the data directory. Server needs read/write permissions on this directory.	2013-03-27 11:45:42 +00:00
Simon Riggs	96ef3b8ff1	Allow I/O reliability checks using 16-bit checksums. Checksums are set immediately prior to flush out of shared buffers and checked when pages are read in again. Hint bit setting will require full page write when block is dirtied, which causes various infrastructure changes. Extensive comments, docs and README. WARNING message thrown if checksum fails on non-all zeroes page; ERROR thrown but can be disabled with ignore_checksum_failure = on. Feature enabled by an initdb option, since transition from option off to option on is long and complex and has not yet been implemented. Default is not to use checksums. Checksum used is WAL CRC-32 truncated to 16-bits. Simon Riggs, Jeff Davis, Greg Smith. Wide input and assistance from many community members. Thank you.	2013-03-22 13:54:07 +00:00
Simon Riggs	13fe298ca0	Change commit_delay to be SUSET for 9.3+. Prior to 9.3 the commit_delay affected only the current user, whereas now only the group leader waits while holding the WALWriteLock. Deliberate or accidental settings to a poor value could seriously degrade performance for all users. Privileges may be delegated by SECURITY DEFINER functions for anyone that needs per-user settings in real situations. Request for change from Peter Geoghegan	2013-03-22 12:01:16 +00:00
Heikki Linnakangas	f897c4744f	Fix "element <@ range" cost estimation. The statistics-based cost estimation patch for range types broke that, by incorrectly assuming that the left operand of all range operators is a range. That led to a "type x is not a range type" error. Because it took so long for anyone to notice, add a regression test for that case. We still don't do proper statistics-based cost estimation for that, so you just get a default constant estimate. We should look into implementing that, but this patch at least fixes the regression. Spotted by Tom Lane, when testing query from Josh Berkus.	2013-03-21 11:21:51 +02:00
Alvaro Herrera	f8348ea32e	Allow extracting machine-readable object identity. Introduce pg_identify_object(oid,oid,int4), which is similar in spirit to pg_describe_object but instead produces a row of machine-readable information to uniquely identify the given object, without resorting to OIDs or other internal representation. This is intended to be used in the event trigger implementation, to report objects being operated on; but it has usefulness of its own. Catalog version bumped because of the new function.	2013-03-20 18:19:19 -03:00
Tom Lane	6ac7facdd3	Improve signal-handler lockout mechanism in timeout.c. Rather than doing a fairly-expensive setitimer() call to prevent interrupts from happening, let's just invent a simple boolean flag that the signal handler is required to check. This is not only faster but considerably more robust than before, since the previous code effectively assumed that only ITIMER_REAL events would ever fire the SIGALRM handler, which is obviously something that can be broken easily by third-party code. Zoltán Böszörményi and Tom Lane	2013-03-17 22:42:19 -04:00
Tom Lane	da5aeccf64	Move pqsignal() to libpgport. We had two copies of this function in the backend and libpq, which was already pretty bogus, but it turns out that we need it in some other programs that don't use libpq (such as pg_test_fsync). So put it where it probably should have been all along. The signal-mask-initialization support in src/backend/libpq/pqsignal.c stays where it is, though, since we only need that in the backend.	2013-03-17 12:06:42 -04:00
Tom Lane	d43837d030	Add lock_timeout configuration parameter. This GUC allows limiting the time spent waiting to acquire any one heavyweight lock. In support of this, improve the recently-added timeout infrastructure to permit efficiently enabling or disabling multiple timeouts at once. That reduces the performance hit from turning on lock_timeout, though it's still not zero. Zoltán Böszörményi, reviewed by Tom Lane, Stephen Frost, and Hari Babu	2013-03-16 23:22:57 -04:00
Tom Lane	73e7025bd8	Extend format() to handle field width and left/right alignment. This change adds some more standard sprintf() functionality to format(). Pavel Stehule, reviewed by Dean Rasheed and Kyotaro Horiguchi	2013-03-14 22:56:56 -04:00
Heikki Linnakangas	59d0bf9dca	Add cost estimation of range @> and <@ operators. The estimates are based on the existing lower bound histogram, and a new histogram of range lengths. Bump catversion, because the range length histogram now needs to be present in statistic slot kind 6, or you get an error on @> and <@ queries. (A re-ANALYZE would be enough to fix that, though) Alexander Korotkov, with some refactoring by me.	2013-03-14 15:36:56 +02:00
Andrew Dunstan	38fb4d978c	JSON generation improvements. This adds the following: json_agg(anyrecord) -> json; to_json(any) -> json; hstore_to_json(hstore) -> json (also used as a cast); hstore_to_json_loose(hstore) -> json. The last provides heuristic treatment of numbers and booleans. Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function. Andrew Dunstan, reviewed by Steve Singer. Catalog version bumped.	2013-03-10 17:35:36 -04:00
Heikki Linnakangas	23f10b6473	SP-GiST support of the range adjacent operator -|-. Alexander Korotkov, reviewed by Jeff Davis.	2013-03-08 15:03:19 +02:00
Tom Lane	1908abc4a3	Arrange to cache FdwRoutine structs in foreign tables' relcache entries. This saves several catalog lookups per reference. It's not all that exciting right now, because we'd managed to minimize the number of places that need to fetch the data; but the upcoming writable-foreign-tables patch needs this info in a lot more places.	2013-03-06 23:48:09 -05:00
Robert Haas	f90cc26982	Code beautification for object-access hook machinery. KaiGai Kohei	2013-03-06 20:53:25 -05:00
Tom Lane	80b011ef0a	Fix to_char() to use ASCII-only case-folding rules where appropriate. formatting.c used locale-dependent case folding rules in some code paths where the result isn't supposed to be locale-dependent, for example to_char(timestamp, 'DAY'). Since the source data is always just ASCII in these cases, that usually didn't matter ... but it does matter in Turkish locales, which have unusual treatment of "i" and "I". To confuse matters even more, the misbehavior was only visible in UTF8 encoding, because in single-byte encodings we used pg_toupper/pg_tolower which don't have locale-specific behavior for ASCII characters. Fix by providing intentionally ASCII-only case-folding functions and using these where appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active branches, since it's been like this for a long time.	2013-03-05 13:02:30 -05:00
Tom Lane	542eeba269	Fix overflow check in tm2timestamp (this time for sure). I fixed this code back in commit `841b4a2d5`, but didn't think carefully enough about the behavior near zero, which meant it improperly rejected 1999-12-31 24:00:00. Per report from Magnus Hagander.	2013-03-04 15:13:31 -05:00
Tom Lane	bc61878682	Fix map_sql_value_to_xml_value() to treat domains like their base types. This was already the case for domains over arrays, but not for domains over certain built-in types such as boolean. The special formatting rules for those types should apply to domains over them as well. Per discussion. While this is a bug fix, it's also a behavioral change that seems likely to trip up some applications. So no back-patch. Pavel Stehule	2013-03-03 19:32:22 -05:00
Kevin Grittner	3bf3ab8c56	Add materialized view relations. A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table; references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized view in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja. Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.	2013-03-03 18:23:31 -06:00
Alvaro Herrera	a730183926	Move relpath() to libpgcommon. This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.	2013-02-21 22:46:17 -03:00
Alvaro Herrera	187492b6c2	Split pgstat file in smaller pieces. We now write one file per database and one global file, instead of having the whole thing in a single huge file. This reduces the I/O that must be done when partial data is required -- which is all the time, because each process only needs information on its own database anyway. Also, the autovacuum launcher does not need data about tables and functions in each database; having the global stats for all DBs is enough. Catalog version bumped because we have a new subdir under PGDATA. Author: Tomas Vondra. Some rework by Álvaro. Testing by Jeff Janes. Other discussion by Heikki Linnakangas, Tom Lane.	2013-02-18 18:12:52 -03:00
Peter Eisentraut	9475db3a4e	Add ALTER ROLE ALL SET command. This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE ... SET functionality to allow creating settings that apply to all users in all databases. Reviewed by Pavel Stehule	2013-02-17 23:45:36 -05:00
Tom Lane	71627f3d19	Fix CVE-2013-0255 properly. Revert commit `ab0f7b6089` (in HEAD only) in favor of the proper solution, which is to declare enum_recv() correctly in the system catalogs. It should be declared to take type "internal" not "cstring". Also improve the type_sanity regression test, which should have caught this typo, so that it actually would. Most of the relevant checks on the signature of type I/O functions should not have been restricted to basetypes/pseudotypes, as they should apply to any type's I/O functions.	2013-02-13 16:20:01 -05:00
Alvaro Herrera	8396447cdb	Create libpgcommon, and move pg_malloc et al to it. libpgcommon is a new static library to allow sharing code among the various frontend programs and backend; this lets us eliminate duplicate implementations of common routines. We avoid libpgport, because that's intended as a place for porting issues; per discussion, it seems better to keep them separate. The first use case, and the only implemented by this patch, is pg_malloc and friends, which many frontend programs were already using. At the same time, we can use this to provide palloc emulation functions for the frontend; this way, some palloc-using files in the backend can also be used by the frontend cleanly. To do this, we change palloc() in the backend to be a function instead of a macro on top of MemoryContextAlloc(). This was previously believed to cause loss of performance, but this implementation has been tweaked by Tom and Andres so that on modern compilers it provides a slight improvement over the previous one. This lets us clean up some places that were already with localized hacks. Most of the pg_malloc/palloc changes in this patch were authored by Andres Freund. Zoltán Böszörményi also independently provided a form of that. libpgcommon infrastructure was authored by Álvaro.	2013-02-12 11:21:05 -03:00
Tom Lane	f806c191a3	Simplify box_overlap computations. Given the assumption that a box's high coordinates are not less than its low coordinates, the tests in box_ov() are overly complicated and can be reduced to about half as much work. Since many other functions in geo_ops.c rely on that assumption, there doesn't seem to be a good reason not to use it here. Per discussion of Alexander Korotkov's GiST fix, which was already using the simplified logic (in a non-fuzzy form, but the equivalence holds just as well for fuzzy).	2013-02-08 18:26:08 -05:00
Andrew Dunstan	e1c1e21732	Enable building with Microsoft Visual Studio 2012. Backpatch to release 9.2 Brar Piening and Noah Misch, reviewed by Craig Ringer.	2013-02-06 14:52:29 -05:00
Tom Lane	ab0f7b6089	Prevent execution of enum_recv() from SQL. This function was misdeclared to take cstring when it should take internal. This at least allows crashing the server, and in principle an attacker might be able to use the function to examine the contents of server memory. The correct fix is to adjust the system catalog contents (and fix the regression tests that should have caught this but failed to). However, asking users to correct the catalog contents in existing installations is a pain, so as a band-aid fix for the back branches, install a check in enum_recv() to make it throw error if called with a cstring argument. We will later revert this in HEAD in favor of correcting the catalogs. Our thanks to Sumit Soni (via Secunia SVCRP) for reporting this issue. Security: CVE-2013-0255	2013-02-04 16:25:01 -05:00
Simon Riggs	f480e29449	Reset vacuum_defer_cleanup_age to PGC_SIGHUP. Revert commit `84725aa5ef`	2013-02-04 16:39:55 +00:00
Tom Lane	62e666400d	Perform line wrapping and indenting by default in ruleutils.c. This patch changes pg_get_viewdef() and allied functions so that PRETTY_INDENT processing is always enabled. Per discussion, only the PRETTY_PAREN processing (that is, stripping of "unnecessary" parentheses) poses any real forward-compatibility risk, so we may as well make dump output look as nice as we safely can. Also, set the default wrap length to zero (i.e., wrap after each SELECT or FROM list item), since there's no very principled argument for the former default of 80-column wrapping, and most people seem to agree this way looks better. Marko Tiikkaja, reviewed by Jeevan Chalke, further hacking by Tom Lane	2013-02-03 15:56:45 -05:00
Simon Riggs	84725aa5ef	Mark vacuum_defer_cleanup_age as PGC_POSTMASTER. Following bug analysis of #7819 by Tom Lane	2013-02-02 18:49:54 +00:00
Tom Lane	9afc58396a	Reject nonzero day fields in AT TIME ZONE INTERVAL functions. It's not sensible for an interval that's used as a time zone value to be larger than a day. When we changed the interval type to contain a separate day field, check_timezone() was adjusted to reject nonzero day values, but timetz_izone(), timestamp_izone(), and timestamptz_izone() evidently were overlooked. While at it, make the error messages for these three cases consistent.	2013-01-31 12:12:23 -05:00
Tom Lane	991f3e5ab3	Provide database object names as separate fields in error messages. This patch addresses the problem that applications currently have to extract object names from possibly-localized textual error messages, if they want to know for example which index caused a UNIQUE_VIOLATION failure. It adds new error message fields to the wire protocol, which can carry the name of a table, table column, data type, or constraint associated with the error. (Since the protocol spec has always instructed clients to ignore unrecognized field types, this should not create any compatibility problem.) Support for providing these new fields has been added to just a limited set of error reports (mainly, those in the "integrity constraint violation" SQLSTATE class), but we will doubtless add them to more calls in future. Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with additional hacking by Tom Lane.	2013-01-29 17:08:26 -05:00
Bruce Momjian	7e2322dff3	Allow CREATE TABLE IF EXIST to succeed if the schema is nonexistent. Previously, CREATE TABLE IF EXIST threw an error if the schema was nonexistent. This was done by passing 'missing_ok' to the function that looks up the schema oid.	2013-01-26 13:24:50 -05:00
Tom Lane	0d5fbdc157	Change plan caching to honor, not resist, changes in search_path. In the initial implementation of plan caching, we saved the active search_path when a plan was first cached, then reinstalled that path anytime we needed to reparse or replan. The idea of that was to try to reselect the same referenced objects, in somewhat the same way that views continue to refer to the same objects in the face of schema or name changes. Of course, that analogy doesn't bear close inspection, since holding the search_path fixed doesn't cope with object drops or renames. Moreover sticking with the old path seems to create more surprises than it avoids. So instead of doing that, consider that the cached plan depends on search_path, and force reparse/replan if the active search_path is different than it was when we last saved the plan. This gets us fairly close to having "transparency" of plan caching, in the sense that the cached statement acts the same as if you'd just resubmitted the original query text for another execution. There are still some corner cases where this fails though: a new object added in the search path schema(s) might capture a reference in the query text, but we'd not realize that and force a reparse. We might try to fix that in the future, but for the moment it looks too expensive and complicated.	2013-01-25 14:14:41 -05:00
Tom Lane	760f3c043a	Fix concat() and format() to handle VARIADIC-labeled arguments correctly. Previously, the VARIADIC labeling was effectively ignored, but now these functions act as though the array elements had all been given as separate arguments. Pavel Stehule	2013-01-25 00:19:56 -05:00
Alvaro Herrera	0ac5ad5134	Improve concurrency of foreign key locking. This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov	2013-01-23 12:04:59 -03:00
Tom Lane	75b39e7909	Add infrastructure for storing a VARIADIC ANY function's VARIADIC flag. Originally we didn't bother to mark FuncExprs with any indication whether VARIADIC had been given in the source text, because there didn't seem to be any need for it at runtime. However, because we cannot fold a VARIADIC ANY function's arguments into an array (since they're not necessarily all the same type), we do actually need that information at runtime if VARIADIC ANY functions are to respond unsurprisingly to use of the VARIADIC keyword. Add the missing field, and also fix ruleutils.c so that VARIADIC ANY function calls are dumped properly. Extracted from a larger patch that also fixes concat() and format() (the only two extant VARIADIC ANY functions) to behave properly when VARIADIC is specified. This portion seems appropriate to review and commit separately. Pavel Stehule	2013-01-21 20:26:15 -05:00
Robert Haas	841a5150c5	Add ddl_command_end support for event triggers. Dimitri Fontaine, with slight changes by me	2013-01-21 18:00:24 -05:00
Tom Lane	535e69a43f	Fix error-checking typo in check_TSCurrentConfig(). The code failed to detect an out-of-memory failure. Xi Wang	2013-01-20 23:09:35 -05:00
Tom Lane	d5b31cc32b	Fix an O(N^2) performance issue for sessions modifying many relations. AtEOXact_RelationCache() scanned the entire relation cache at the end of any transaction that created a new relation or assigned a new relfilenode. Thus, clients such as pg_restore had an O(N^2) performance problem that would start to be noticeable after creating 10000 or so tables. Since typically only a small number of relcache entries need any cleanup, we can fix this by keeping a small list of their OIDs and doing hash_searches for them. We fall back to the full-table scan if the list overflows. Ideally, the maximum list length would be set at the point where N hash_searches would cost just less than the full-table scan. Some quick experimentation says that point might be around 50-100; I (tgl) conservatively set MAX_EOXACT_LIST = 32. For the case that we're worried about here, which is short single-statement transactions, it's unlikely there would ever be more than about a dozen list entries anyway; so it's probably not worth being too tense about the value. We could avoid the hash_searches by instead keeping the target relcache entries linked into a list, but that would be noticeably more complicated and bug-prone because of the need to maintain such a list in the face of relcache entry drops. Since a relcache entry can only need such cleanup after a somewhat-heavyweight filesystem operation, trying to save a hash_search per cleanup doesn't seem very useful anyway --- it's the scan over all the not-needing-cleanup entries that we wish to avoid here. Jeff Janes, reviewed and tweaked a bit by Tom Lane	2013-01-20 13:45:10 -05:00
Tom Lane	8ae35e9180	Improve memory space management in tuplesort and tuplestore. The code originally just doubled the size of the tuple-pointer array so long as that would fit in allowedMem. This could result in failing to use as much as half of allowedMem, if (as is typical) the last doubling attempt didn't quite fit. Worse, we might double the array size but be unable to use most of the added slots, because there was no room left within the allowedMem limit for tuples the slots should point to. To fix, double only so long as we've used less than half of allowedMem in total. Then do one more array enlargement, but scale it based on total memory consumption so far. This will work nicely as long as the average tuple size is reasonably stable, and in any case should be better than the old method. This change will result in large sort operations consuming a larger fraction of work_mem than they typically did in the past. The release notes should mention that users may want to revisit their work_mem settings, if they'd tuned those settings based on the old behavior of sorting. Jeff Janes, reviewed by Peter Geoghegan and Robert Haas	2013-01-17 13:12:56 -05:00
Magnus Hagander	bba486f372	Base the default SSL ciphers on DEFAULT instead of ALL. It's better to start from what the OpenSSL people consider a good default and then remove insecure things (low encryption, exportable encryption and md5 at this point) from that, instead of starting from everything that exists and remove from that. We trust the OpenSSL people to make good choices about what the default is.	2013-01-17 15:04:44 +01:00
Tom Lane	1b794d3f32	Fix hash_update_hash_key() to handle same-bucket case correctly. Original coding would corrupt the hashtable if the item being updated was at the end of its bucket chain and the new hash key hashed to that same bucket. Diagnosis and fix by Heikki Linnakangas.	2013-01-14 21:57:15 -05:00
Tom Lane	5c4eb9166e	Reject out-of-range dates in to_date(). Dates outside the supported range could be entered, but would not print reasonably, and operations such as conversion to timestamp wouldn't behave sanely either. Since this has the potential to result in undumpable table data, it seems worth back-patching. Hitoshi Harada	2013-01-14 15:19:48 -05:00
Tom Lane	2065dd2834	Prevent very-low-probability PANIC during PREPARE TRANSACTION. The code in PostPrepare_Locks supposed that it could reassign locks to the prepared transaction's dummy PGPROC by deleting the PROCLOCK table entries and immediately creating new ones. This was safe when that code was written, but since we invented partitioning of the shared lock table, it's not safe --- another process could steal away the PROCLOCK entry in the short interval when it's on the freelist. Then, if we were otherwise out of shared memory, PostPrepare_Locks would have to PANIC, since it's too late to back out of the PREPARE at that point. Fix by inventing a dynahash.c function to atomically update a hashtable entry's key. (This might possibly have other uses in future.) This is an ancient bug that in principle we ought to back-patch, but the odds of someone hitting it in the field seem really tiny, because (a) the risk window is small, and (b) nobody runs servers with maxed-out lock tables for long, because they'll be getting non-PANIC out-of-memory errors anyway. So fixing it in HEAD seems sufficient, at least until the new code has gotten some testing.	2013-01-13 22:20:22 -05:00
Peter Eisentraut	9d2cd99a60	Make spelling more uniform	2013-01-13 21:42:03 -05:00
Tom Lane	24dd0502a1	Update comments for elog_start(). Forgot I was going to do this as part of the previous patch ...	2013-01-13 18:50:48 -05:00
Tom Lane	31f38f28b0	Redesign the planner's handling of index-descent cost estimation. Historically we've used a couple of very ad-hoc fudge factors to try to get the right results when indexes of different sizes would satisfy a query with the same number of index leaf tuples being visited. In commit `21a39de580` I tweaked one of these fudge factors, with results that proved disastrous for larger indexes. Commit `bf01e34b55` fudged it some more, but still with not a lot of principle behind it. What seems like a better way to address these issues is to explicitly model index-descent costs, since that's what's really at stake when considering different indexes with similar leaf-page-level costs. We tried that once long ago, and found that charging random_page_cost per page descended through was way too much, because upper btree levels tend to stay in cache in real-world workloads. However, there's still CPU costs to think about, and the previous fudge factors can be seen as a crude attempt to account for those costs. So this patch replaces those fudge factors with explicit charges for the number of tuple comparisons needed to descend the index tree, plus a small charge per page touched in the descent. The cost multipliers are chosen so that the resulting charges are in the vicinity of the historical (pre-9.2) fudge factors for indexes of up to about a million tuples, while not ballooning unreasonably beyond that, as the old fudge factor did (even more so in 9.2). To make this work accurately for btree indexes, add some code that allows extraction of the known root-page height from a btree. There's no equivalent number readily available for other index types, but we can use the log of the number of index pages as an approximate substitute. This seems like too much of a behavioral change to risk back-patching, but it should improve matters going forward. In 9.2 I'll just revert the fudge-factor change.	2013-01-11 12:56:58 -05:00

5243 Commits