postgresql

Commit Graph

Author	SHA1	Message	Date
Andrew Dunstan	cec8394b5c	Enable building with Visual Studion 2013. Backpatch to 9.3. Brar Piening.	2014-01-26 09:49:10 -05:00
Bruce Momjian	89774b58b0	Adjust C comment in slot_attisnull() regarding nulls.	2014-01-25 16:43:36 -05:00
Heikki Linnakangas	71c6a8e375	Add recovery_target='immediate' option. This allows ending recovery as a consistent state has been reached. Without this, there was no easy way to e.g restore an online backup, without replaying any extra WAL after the backup ended. MauMau and me.	2014-01-25 17:34:04 +02:00
Heikki Linnakangas	d150ff5781	Reset unused fields in GIN data leaf page footer. The maxoff field is not used in the new, compressed page format. Let's reset it when converting an old-format page to the new format. The code won't care either way, but this makes it possible to use the field for something else in the future.	2014-01-24 19:10:10 +02:00
Heikki Linnakangas	a8f374849f	Fix off-by-one in newly-introdcued GIN assertion. Spotted by Alexander Korotkov	2014-01-24 11:10:09 +02:00
Heikki Linnakangas	398cf255ad	In GIN recompression code, use mmemove rather than memcpy, for vacuum. When vacuuming a data leaf page, any compressed posting lists that are not modified, are copied back to the buffer from a later location in the same buffer rather than from a palloc'd copy. IOW, they are just moved downwards in the same buffer. Because the source and destination addresses can overlap, we must use memmove rather than memcpy. Report and fix by Alexander Korotkov.	2014-01-24 10:48:45 +02:00
Stephen Frost	fbe19ee3b8	ALTER TABLESPACE ... MOVE ... OWNED BY Add the ability to specify the objects to move by who those objects are owned by (as relowner) and change ALL to mean ALL objects. This makes the command always operate against a well-defined set of objects and not have the objects-to-be-moved based on the role of the user running the command. Per discussion with Simon and Tom.	2014-01-23 23:52:40 -05:00
Tom Lane	ac4ef637ad	Allow use of "z" flag in our printf calls, and use it where appropriate. Since C99, it's been standard for printf and friends to accept a "z" size modifier, meaning "whatever size size_t has". Up to now we've generally dealt with printing size_t values by explicitly casting them to unsigned long and using the "l" modifier; but this is really the wrong thing on platforms where pointers are wider than longs (such as Win64). So let's start using "z" instead. To ensure we can do that on all platforms, teach src/port/snprintf.c to understand "z", and add a configure test to force use of that implementation when the platform's version doesn't handle "z". Having done that, modify a bunch of places that were using the unsigned-long hack to use "z" instead. This patch doesn't pretend to have gotten everyplace that could benefit, but it catches many of them. I made an effort in particular to ensure that all uses of the same error message text were updated together, so as not to increase the number of translatable strings. It's possible that this change will result in format-string warnings from pre-C99 compilers. We might have to reconsider if there are any popular compilers that will warn about this; but let's start by seeing what the buildfarm thinks. Andres Freund, with a little additional work by me	2014-01-23 17:18:33 -05:00
Heikki Linnakangas	ec8f692c3c	Fix alignment of GIN in-line posting lists stored in entry tuples. The Sparc machines in the buildfarm are crashing because of misaligned access to posting lists stored in entry tuples. I accidentally removed a critical SHORTALIGN() from ginFormTuple, as part of the packed posting lists patch. Perhaps I thought it was unnecessary, because the index_form_tuple() call above the SHORTALIGN already aligned the size, missing the fact that the null-category byte makes it misaligned again (I think the SHORTALIGN is indeed unnecessary if there's no null- category byte, but let's just play it safe...)	2014-01-23 22:58:12 +02:00
Heikki Linnakangas	0fdb2f7d7c	Silence compiler warning. Not all compilers understand that elog(ERROR, ...) never returns.	2014-01-23 22:15:31 +02:00
Alvaro Herrera	b152c6cd0d	Make DROP IF EXISTS more consistently not fail Some cases were still reporting errors and aborting, instead of a NOTICE that the object was being skipped. This makes it more difficult to cleanly handle pg_dump --clean, so change that to instead skip missing objects properly. Per bug #7873 reported by Dave Rolsky; apparently this affects a large number of users. Authors: Pavel Stehule and Dean Rasheed. Some tweaks by Álvaro Herrera	2014-01-23 14:40:29 -03:00
Heikki Linnakangas	6668ad1d70	Fix declaration of GinVacuumState. gcc 4.8 was happy with having a duplicate typedef, but most compilers seem not to be, per buildfarm.	2014-01-22 19:55:36 +02:00
Heikki Linnakangas	36a35c550a	Compress GIN posting lists, for smaller index size. GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might be come handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.	2014-01-22 19:20:58 +02:00
Andrew Dunstan	243ee26633	Reindent json.c and jsonfuncs.c. This will help in preparation of clean patches for upcoming json work.	2014-01-22 08:46:51 -05:00
Stephen Frost	6c36f383df	Allow type_func_name_keywords in even more places A while back, `2c92edad48` allowed type_func_name_keywords to be used in more places, including role identifiers. Unfortunately, that commit missed out on cases where name_list was used for lists-of-roles, eg: for DROP ROLE. This resulted in the unfortunate situation that you could CREATE a role with a type_func_name_keywords-allowed identifier, but not DROP it (directly- ALTER could be used to rename it to something which could be DROP'd). This extends allowing type_func_name_keywords to places where role lists can be used. Back-patch to 9.0, as `2c92edad48` was.	2014-01-21 22:49:22 -05:00
Tom Lane	69c7a9838c	Tweak parse location assignment for CURRENT_DATE and related constructs. All these constructs generate parse trees consisting of a Const and a run-time type coercion (perhaps a FuncExpr or a CoerceViaIO). Modify the raw parse output so that we end up with the original token's location attached to the type coercion node while the Const has location -1; before, it was the other way around. This makes no difference in terms of what exprLocation() will say about the parse tree as a whole, so it should not have any user-visible impact. The point of changing it is that we do not want contrib/pg_stat_statements to treat these constructs as replaceable constants. It will do the right thing if the Const has location -1 rather than a valid location. This is a pretty ugly hack, but then this code is ugly already; we should someday replace this translation with special-purpose parse node(s) that would allow ruleutils.c to reconstruct the original query text. (See also commit `5d3fcc4c2e`, which also hacked location assignment rules for the benefit of pg_stat_statements.) Back-patch to 9.2 where pg_stat_statements grew the ability to recognize replaceable constants. Kyotaro Horiguchi	2014-01-21 16:34:28 -05:00
Robert Haas	01f7808b3e	Add a cardinality function for arrays. Unlike our other array functions, this considers the total number of elements across all dimensions, and returns 0 rather than NULL when the array has no elements. But it seems that both of those behaviors are almost universally disliked, so hopefully that's OK. Marko Tiikkaja, reviewed by Dean Rasheed and Pavel Stehule	2014-01-21 12:38:53 -05:00
Robert Haas	033b2343fa	Fix inadvertent semantics change in last patch to plug memory leaks. Commit `a5bca4ef03` accidentally changed the semantics when the "skipping missing configuration file" is emitted, because it forced OK to true instead of leaving the value untouched. Spotted by Tom Lane.	2014-01-21 11:42:37 -05:00
Robert Haas	5709b8acc6	Avoid a possible relcache leak in get_object_address_attribute. There's no apparent way to trigger this, so I'm not going to worry about back-patching it for now. But it's still wrong. Marti Raudsepp	2014-01-21 10:02:37 -05:00
Robert Haas	a5bca4ef03	Plug more memory leaks when reloading config file. Commit `138184adc5` plugged some but not all of the leaks from commit `2a0c81a12c`. This tightens things up some more. Amit Kapila, per an observation by Tom Lane	2014-01-21 09:41:40 -05:00
Alvaro Herrera	d2458e3b20	Expose a routine to print triggers during EXPLAIN ANALYZE This is so that auto_explain can use it. Kyotaro HORIGUCHI	2014-01-20 17:13:47 -03:00
Tom Lane	9a8f5729b4	Fix to_timestamp/to_date's handling of consecutive spaces in format string. When there are consecutive spaces (or other non-format-code characters) in the format, we should advance over exactly that many characters of input. The previous coding mistakenly did a "skip whitespace" action between such characters, possibly allowing more input to be skipped than the user intended. We only need to skip whitespace just before an actual field. This is really a bug fix, but given the minimal number of field complaints and the risk of breaking applications coded to expect the old behavior, let's not back-patch it. Jeevan Chalke	2014-01-20 13:45:51 -05:00
Fujii Masao	5363c7f2bc	Fix typo in comment. Sawada Masahiko	2014-01-21 02:24:17 +09:00
Simon Riggs	4d1e2aeb1a	Speed up COPY into tables with DEFAULT nextval() Previously the presence of a nextval() prevented the use of batch-mode COPY. This patch introduces a special case just for nextval() functions. In future we will introduce a general case solution for labelling volatile functions as safe for use.	2014-01-20 17:22:38 +00:00
Magnus Hagander	98de86e422	Remove support for native krb5 authentication krb5 has been deprecated since 8.3, and the recommended way to do Kerberos authentication is using the GSSAPI authentication method (which is still fully supported). libpq retains the ability to identify krb5 authentication, but only gives an error message about it being unsupported. Since all authentication is initiated from the backend, there is no need to keep it at all in the backend.	2014-01-19 17:05:01 +01:00
Magnus Hagander	4b8f2859cc	Adjust the SSL connection notification message Suggested by Tom	2014-01-19 13:27:22 +01:00
Stephen Frost	5254958e92	Add CREATE TABLESPACE ... WITH ... Options Tablespaces have a few options which can be set on them to give PG hints as to how the tablespace behaves (perhaps it's faster for sequential scans, or better able to handle random access, etc). These options were only available through the ALTER TABLESPACE command. This adds the ability to set these options at CREATE TABLESPACE time, removing the need to do both a CREATE TABLESPACE and ALTER TABLESPACE to get the correct options set on the tablespace. Vik Fearing, reviewed by Michael Paquier.	2014-01-18 20:59:31 -05:00
Tom Lane	115f414124	Fix VACUUM's reporting of dead-tuple counts to the stats collector. Historically, VACUUM has just reported its new_rel_tuples estimate (the same thing it puts into pg_class.reltuples) to the stats collector. That number counts both live and dead-but-not-yet-reclaimable tuples. This behavior may once have been right, but modern versions of the pgstats code track live and dead tuple counts separately, so putting the total into n_live_tuples and zero into n_dead_tuples is surely pretty bogus. Fix it to report live and dead tuple counts separately. This doesn't really do much for situations where updating transactions commit concurrently with a VACUUM scan (possibly causing double-counting or omission of the tuples they add or delete); but it's clearly an improvement over what we were doing before. Hari Babu, reviewed by Amit Kapila	2014-01-18 19:24:33 -05:00
Stephen Frost	76e91b38ba	Add ALTER TABLESPACE ... MOVE command This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of objects from one tablespace to another. This can be extremely handy and avoids a lot of error-prone scripting. ALTER TABLESPACE ... MOVE will only move objects the user owns, will notify the user if no objects were found, and can be used to move ALL objects or specific types of objects (TABLES, INDEXES, or MATERIALIZED VIEWS).	2014-01-18 18:56:40 -05:00
Stephen Frost	6f25c62d78	Allow SET TABLESPACE to database default We've always allowed CREATE TABLE to create tables in the database's default tablespace without checking for CREATE permissions on that tablespace. Unfortunately, the original implementation of ALTER TABLE ... SET TABLESPACE didn't pick up on that exception. This changes ALTER TABLE ... SET TABLESPACE to allow the database's default tablespace without checking for CREATE rights on that tablespace, just as CREATE TABLE works today. Users could always do this through a series of commands (CREATE TABLE ... AS SELECT * FROM ...; DROP TABLE ...; etc), so let's fix the oversight in SET TABLESPACE's original implementation.	2014-01-18 18:41:52 -05:00
Tom Lane	0d79c0a8cc	Make various variables const (read-only). These changes should generally improve correctness/maintainability. A nice side benefit is that several kilobytes move from initialized data to text segment, allowing them to be shared across processes and probably reducing copy-on-write overhead while forking a new backend. Unfortunately this doesn't seem to help libpq in the same way (at least not when it's compiled with -fpic on x86_64), but we can hope the linker at least collects all nominally-const data together even if it's not actually part of the text segment. Also, make pg_encname_tbl[] static in encnames.c, since there seems no very good reason for any other code to use it; per a suggestion from Wim Lewis, who independently submitted a patch that was mostly a subset of this one. Oskari Saarenmaa, with some editorialization by me	2014-01-18 16:04:32 -05:00
Magnus Hagander	4cba1f6bbf	Show SSL encryption information when logging connections Expand the messages when log_connections is enabled to include the fact that SSL is used and the SSL cipher information. Dr. Andreas Kunert, review by Marko Kreen	2014-01-17 13:32:31 +01:00
Heikki Linnakangas	a472ae1e4e	Fix Hot Standby feedback sending when streaming busily. Commit `6f60fdd701` accidentally removed a call to XLogWalRcvSendHSFeedback() after flushing received WAL to disk. The consequence is that when walsender is busy streaming WAL, it doesn't send HS feedback messages. One is sent if nothing is received from the master for 100ms, but if there's a steady stream of WAL, it never happens. Backpatch to 9.3. Andres Freund and Amit Kapila	2014-01-16 23:15:41 +02:00
Heikki Linnakangas	8ba288da5d	Suppress Coverity complaints in readfuncs.c. Coverity is complaining that the value returned by pg_strtok in READ_LOCATION_FIELD and READ_BITMAPSET_FIELD macros is not used. In commit `39bfc94c86`, we did this to the other macros to placate compilers that complained when the variable was completely unused, this extends that to the last remaining macros.	2014-01-16 12:00:19 +02:00
Robert Haas	ed46758381	Logging running transactions every 15 seconds. Previously, we did this just once per checkpoint, but that could make Hot Standby take a long time to initialize. To avoid busying an otherwise-idle system, we don't do this if no WAL has been written since we did it last. Andres Freund	2014-01-15 12:41:20 -05:00
Robert Haas	d02c0ddb15	Fix missing parentheses resulting in wrong order of dereference. This could result in referencing uninitialized memory. Michael Paquier, in response to a complaint from Andres Freund	2014-01-15 11:00:50 -05:00
Tom Lane	061b079f89	Fix multiple bugs in index page locking during hot-standby WAL replay. In ordinary operation, VACUUM must be careful to take a cleanup lock on each leaf page of a btree index; this ensures that no indexscans could still be "in flight" to heap tuples due to be deleted. (Because of possible index-tuple motion due to concurrent page splits, it's not enough to lock only the pages we're deleting index tuples from.) In Hot Standby, the WAL replay process must likewise lock every leaf page. There were several bugs in the code for that: * The replay scan might come across unused, all-zero pages in the index. While btree_xlog_vacuum itself did the right thing (ie, nothing) with such pages, xlogutils.c supposed that such pages must be corrupt and would throw an error. This accounts for various reports of replication failures with "PANIC: WAL contains references to invalid pages". To fix, add a ReadBufferMode value that instructs XLogReadBufferExtended not to complain when we're doing this. * btree_xlog_vacuum performed the extra locking if standbyState == STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up for hot standby queries until the database has reached consistency, and we don't want to do the extra locking till then either, for fear of reading corrupted pages (which bufmgr.c would complain about). Fix by exporting a new function from xlog.c that will report whether we're actually in hot standby replay mode. * To ensure full coverage of the index in the replay scan, btvacuumscan would emit a dummy WAL record for the last page of the index, if no vacuuming work had been done on that page. However, if the last page of the index is all-zero, that would result in corruption of said page, since the functions called on it weren't prepared to handle that case. There's no need to lock any such pages, so change the logic to target the last normal leaf page instead. The first two of these bugs were diagnosed by Andres Freund, the other one by me. Fixes based on ideas from Heikki Linnakangas and myself. This has been wrong since Hot Standby was introduced, so back-patch to 9.0.	2014-01-14 17:35:21 -05:00
Robert Haas	ec9037df26	Single-reader, single-writer, lightweight shared message queue. This code provides infrastructure for user backends to communicate relatively easily with background workers. The message queue is structured as a ring buffer and allows messages of arbitary length to be sent and received. Patch by me. Review by KaiGai Kohei and Andres Freund.	2014-01-14 12:23:22 -05:00
Robert Haas	6ddd5137b2	Simple table of contents for a shared memory segment. This interface is intended to make it simple to divide a dynamic shared memory segment into different regions with distinct purposes. It therefore serves much the same purpose that ShmemIndex accomplishes for the main shared memory segment, but it is intended to be more lightweight. Patch by me. Review by Andres Freund.	2014-01-14 12:18:58 -05:00
Robert Haas	05ff5062da	Code improvements for ALTER SYSTEM .. SET. Move FreeConfigVariables() later to make sure ErrorConfFile is valid when we use it, and get rid of an unnecessary string copy operation. Amit Kapila, kibitzed by me.	2014-01-13 14:54:00 -05:00
Robert Haas	2bb1f14b89	Make bitmap heap scans show exact/lossy block info in EXPLAIN ANALYZE. Etsuro Fujita	2014-01-13 14:42:16 -05:00
Tom Lane	158b7fa6a3	Disallow LATERAL references to the target table of an UPDATE/DELETE. On second thought, commit `0c051c9008` was over-hasty: rather than allowing this case, we ought to reject it for now. That leaves the field clear for a future feature that allows the target table to be re-specified in the FROM (or USING) clause, which will enable left-joining the target table to something else. We can then also allow LATERAL references to such an explicitly re-specified target table. But allowing them right now will create ambiguities or worse for such a feature, and it isn't something we documented 9.3 as supporting. While at it, add a convenience subroutine to avoid having several copies of the ereport for disalllowed-LATERAL-reference cases.	2014-01-11 19:03:12 -05:00
Tom Lane	910bac5953	Fix possible crashes due to using elog/ereport too early in startup. Per reports from Andres Freund and Luke Campbell, a server failure during set_pglocale_pgservice results in a segfault rather than a useful error message, because the infrastructure needed to use ereport hasn't been initialized; specifically, MemoryContextInit hasn't been called. One known cause of this is starting the server in a directory it doesn't have permission to read. We could try to prevent set_pglocale_pgservice from using anything that depends on palloc or elog, but that would be messy, and the odds of future breakage seem high. Moreover there are other things being called in main.c that look likely to use palloc or elog too --- perhaps those things shouldn't be there, but they are there today. The best solution seems to be to move the call of MemoryContextInit to very early in the backend's real main() function. I've verified that an elog or ereport occurring immediately after that is now capable of sending something useful to stderr. I also added code to elog.c to print something intelligible rather than just crashing if MemoryContextInit hasn't created the ErrorContext. This could happen if MemoryContextInit itself fails (due to malloc failure), and provides some future-proofing against someone trying to sneak in new code even earlier in server startup. Back-patch to all supported branches. Since we've only heard reports of this type of failure recently, it may be that some recent change has made it more likely to see a crash of this kind; but it sure looks like it's broken all the way back.	2014-01-11 16:36:07 -05:00
Tom Lane	6286526207	Fix compute_scalar_stats() for case that all values exceed WIDTH_THRESHOLD. The standard typanalyze functions skip over values whose detoasted size exceeds WIDTH_THRESHOLD (1024 bytes), so as to limit memory bloat during ANALYZE. However, we (I think I, actually :-() failed to consider the possibility that every non-null value in a column is too wide. While compute_minimal_stats() seems to behave reasonably anyway in such a case, compute_scalar_stats() just fell through and generated no pg_statistic entry at all. That's unnecessarily pessimistic: we can still produce valid stanullfrac and stawidth values in such cases, since we do include too-wide values in the average-width calculation. Furthermore, since the general assumption in this code is that too-wide values are probably all distinct from each other, it seems reasonable to set stadistinct to -1 ("all distinct"). Per complaint from Kadri Raudsepp. This has been like this since roughly neolithic times, so back-patch to all supported branches.	2014-01-11 13:42:42 -05:00
Bruce Momjian	111022eac6	Move username lookup functions from /port to /common Per suggestion from Peter E and Alvaro	2014-01-10 18:03:28 -05:00
Alvaro Herrera	423e1211a8	Accept pg_upgraded tuples during multixact freezing The new MultiXact freezing routines introduced by commit `8e9a16ab8f` neglected to consider tuples that came from a pg_upgrade'd database; a vacuum run that tried to freeze such tuples would die with an error such as ERROR: MultiXactId 11415437 does no longer exist -- apparent wraparound To fix, ensure that GetMultiXactIdMembers is allowed to return empty multis when the infomask bits are right, as is done in other callsites. Per trouble report from F-Secure. In passing, fix a copy&paste bug reported by Andrey Karpov from VIVA64 from their PVS-Studio static checked, that instead of setting relminmxid to Invalid, we were setting relfrozenxid twice. Not an important mistake because that code branch is about relations for which we don't use the frozenxid/minmxid values at all in the first place, but seems to warrants a fix nonetheless.	2014-01-10 18:03:18 -03:00
Tom Lane	faab7a957d	Remove unnecessary local variables to work around an icc optimization bug. Buildfarm member dunlin has been crashing since commit `8b49a60`, but other machines seem fine with that code. It turns out that removing the local variables in ordered_set_startup() that are copies of fields in "qstate" dodges the problem. This might cost a few cycles on register-rich machines, but it's probably a wash on others, and in any case this code isn't performance-critical. Thanks to Jeremy Drake for off-list investigation.	2014-01-09 12:59:55 -05:00
Heikki Linnakangas	c945af80cf	Refactor checking whether we've reached the recovery target. Makes the replay loop slightly more readable, by separating the concerns of whether to stop and whether to delay, and how to extract the timestamp from a record. This has the user-visible change that the timestamp of the last applied record is now updated after actually applying it. Before, it was updated just before applying it. That meant that pg_last_xact_replay_timestamp() could return the timestamp of a commit record that is in process of being replayed, but not yet applied. Normally the difference is small, but if min_recovery_apply_delay is set, there could be a significant delay between reading a record and applying it. Another behavioral change is that if you recover to a restore point, we stop after the restore point record, not before it. It makes no difference as far as running queries on the server is concerned, as applying a restore point record changes nothing, but if examine the timeline history you will see that the new timeline branched off just after the restore point record, not before it. One practical consequence is that if you do PITR to the new timeline, and set recovery target to the same named restore point again, it will find and stop recovery at the same restore point. Conceptually, I think it makes more sense to consider the restore point as part of the new timeline's history than not. In principle, setting the last-replayed timestamp before actually applying the record was a bug all along, but it doesn't seem worth the risk to backpatch, since min_recovery_apply_delay was only added in 9.4.	2014-01-09 14:00:39 +02:00
Tom Lane	220b34331f	We don't need to include pg_sema.h in s_lock.h anymore. Minor improvement to commit daa7527afc2274432094ebe7ceb03aa41f916607: s_lock.h no longer has any need to mention PGSemaphoreData, so we can rip out the #include that supplies that. In a non-HAVE_SPINLOCKS build, this doesn't really buy much since we still need the #include in spin.h --- but everywhere else, this reduces #include footprint by some trifle, and helps keep the different locking facilities separate.	2014-01-08 20:58:22 -05:00
Tom Lane	080b7db72e	Fix "cannot accept a set" error when only some arms of a CASE return a set. In commit `c1352052ef`, I implemented an optimization that assumed that a function's argument expressions would either always return a set (ie multiple rows), or always not. This is wrong however: we allow CASE expressions in which some arms return a set of some type and others just return a scalar of that type. There may be other examples as well. To fix, replace the run-time test of whether an argument returned a set with a static precheck (expression_returns_set). This adds a little bit of query startup overhead, but it seems barely measurable. Per bug #8228 from David Johnston. This has been broken since 8.0, so patch all supported branches.	2014-01-08 20:18:58 -05:00
Robert Haas	daa7527afc	Reduce the number of semaphores used under --disable-spinlocks. Instead of allocating a semaphore from the operating system for every spinlock, allocate a fixed number of semaphores (by default, 1024) from the operating system and multiplex all the spinlocks that get created onto them. This could self-deadlock if a process attempted to acquire more than one spinlock at a time, but since processes aren't supposed to execute anything other than short stretches of straight-line code while holding a spinlock, that shouldn't happen. One motivation for this change is that, with the introduction of dynamic shared memory, it may be desirable to create spinlocks that last for less than the lifetime of the server. Without this change, attempting to use such facilities under --disable-spinlocks would quickly exhaust any supply of available semaphores. Quite apart from that, it's desirable to contain the quantity of semaphores needed to run the server simply on convenience grounds, since using too many may make it harder to get PostgreSQL running on a new platform, which is mostly the point of --disable-spinlocks in the first place. Patch by me; review by Tom Lane.	2014-01-08 18:58:00 -05:00
Heikki Linnakangas	3739e5ab93	Fix pause_at_recovery_target + recovery_target_inclusive combination. If pause_at_recovery_target is set, recovery pauses before applying the target record, even if recovery_target_inclusive is set. If you then continue with pg_xlog_replay_resume(), it will apply the target record before ending recovery. In other words, if you log in while it's paused and verify that the database looks OK, ending recovery changes its state again, possibly destroying data that you were tring to salvage with PITR. Backpatch to 9.1, this has been broken since pause_at_recovery_target was added.	2014-01-08 23:28:52 +02:00
Heikki Linnakangas	815d71deed	If multiple recovery_targets are specified, use the latest one. The docs say that only one of recovery_target_xid, recovery_target_time, or recovery_target_name can be specified. But the code actually did something different, so that a name overrode time, and xid overrode both time and name. Now the target specified last takes effect, whether it's an xid, time or name. With this patch, we still accept multiple recovery_target settings, even though docs say that only one can be specified. It's a general property of the recovery.conf file parser that you if you specify the same option twice, the last one takes effect, like with postgresql.conf.	2014-01-08 22:26:39 +02:00
Tom Lane	847e46abc9	Avoid extra AggCheckCallContext() checks in ordered-set aggregates. In the transition functions, we don't really need to recheck this after the first call. I had been feeling paranoid about possibly getting a non-null argument value in some other context; but it's probably game over anyway if we have a non-null "internal" value that's not what we are expecting. In the final functions, the general convention in pre-existing final functions seems to be that an Assert() is good enough, so do it like that here too. This seems to save a few tenths of a percent of overall query runtime, which isn't much, but still it's just overhead if there's not a plausible case where the checks would fire.	2014-01-08 14:33:52 -05:00
Tom Lane	e6336b8b57	Save a few cycles in advance_transition_function(). Keep a pre-initialized FunctionCallInfoData in AggStatePerAggData, and re-use that at each row instead of doing InitFunctionCallInfoData each time. This saves only half a dozen assignments and maybe some stack manipulation, and yet that seems to be good for a percent or two of the overall query run time for simple aggregates such as count(*). The cost is that the FunctionCallInfoData (which is about a kilobyte, on 64-bit machines) stays allocated for the duration of the query instead of being short-lived stack data. But we're already paying an equivalent space cost for each regular FuncExpr or OpExpr node, so I don't feel bad about paying it for aggregate functions. The code seems a little cleaner this way too, since the number of things passed to advance_transition_function decreases.	2014-01-08 13:58:37 -05:00
Heikki Linnakangas	d59ff6c110	Fix bug in determining when recovery has reached consistency. When starting WAL replay from an online checkpoint, the last replayed WAL record variable was initialized using the checkpoint record's location, even though the records between the REDO location and the checkpoint record had not been replayed yet. That was noted as "slightly confusing" but harmless in the comment, but in some cases, it fooled CheckRecoveryConsistency to incorrectly conclude that we had already reached a consistent state immediately at the beginning of WAL replay. That caused the system to accept read-only connections in hot standby mode too early, and also PANICs with message "WAL contains references to invalid pages". Fix by initializing the variables to the REDO location instead. In 9.2 and above, change CheckRecoveryConsistency() to use lastReplayedEndRecPtr variable when checking if backup end location has been reached. It was inconsistently using EndRecPtr for that check, but lastReplayedEndRecPtr when checking min recovery point. It made no difference before this patch, because in all the places where CheckRecoveryConsistency was called the two variables were the same, but it was always an accident waiting to happen, and would have been wrong after this patch anyway. Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0, where hot standby was introduced.	2014-01-08 15:03:09 +02:00
Bruce Momjian	7e04792a1c	Update copyright for 2014 Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Tom Lane	0c051c9008	Fix LATERAL references to target table of UPDATE/DELETE. I failed to think much about UPDATE/DELETE when implementing LATERAL :-(. The implemented behavior ended up being that subqueries in the FROM or USING clause (respectively) could access the update/delete target table as though it were a lateral reference; which seems fine if they said LATERAL, but certainly ought to draw an error if they didn't. Fix it so you get a suitable error when you omit LATERAL. Per report from Emre Hasegeli.	2014-01-07 15:25:27 -05:00
Heikki Linnakangas	f68220df92	Silence compiler warning on MSVC. MSVC doesn't know that elog(ERROR) doesn't return, and gives a warning about missing return. Silence that. Amit Kapila	2014-01-07 21:49:15 +02:00
Magnus Hagander	9544cc0d65	Move permissions check from do_pg_start_backup to pg_start_backup And the same for do_pg_stop_backup. The code in do_pg_* is not allowed to access the catalogs. For manual base backups, the permissions check can be handled in the calling function, and for streaming base backups only users with the required permissions can get past the authentication step in the first place. Reported by Antonin Houska, diagnosed by Andres Freund	2014-01-07 17:50:56 +01:00
Magnus Hagander	b168c5ef27	Avoid including tablespaces inside PGDATA twice in base backups If a tablespace was crated inside PGDATA it was backed up both as part of the PGDATA backup and as the backup of the tablespace. Avoid this by skipping any directory inside PGDATA that contains one of the active tablespaces. Dimitri Fontaine and Magnus Hagander	2014-01-07 17:11:32 +01:00
Peter Eisentraut	edc43458d7	Add more use of psprintf()	2014-01-06 21:30:26 -05:00
Tom Lane	8b49a6044d	Cache catalog lookup data across groups in ordered-set aggregates. The initial commit of ordered-set aggregates just did all the setup work afresh each time the aggregate function is started up. But in a GROUP BY query, the catalog lookups need not be repeated for each group, since the column datatypes and sort information won't change. When there are many small groups, this makes for a useful, though not huge, performance improvement. Per suggestion from Andrew Gierth. Profiling of these cases suggests that it might be profitable to avoid duplicate lookups within tuplesort startup as well; but changing the tuplesort APIs would have much broader impact, so I left that for another day.	2014-01-05 12:28:39 -05:00
Tom Lane	5858cf8ab2	Fix header comment for bitncmp(). The result is an int less than, equal to, or greater than zero, in the style of memcmp (and, in fact, exactly the output of memcmp in some cases). This comment previously said -1, 1, or 0, which was an overspecification, as noted by Emre Hasegeli. All of the existing callers appear to be fine with the actual behavior, so just fix the comment. In passing, improve infelicitous formatting of some call sites.	2014-01-04 14:01:51 -05:00
Alvaro Herrera	1a3e82a7f9	Restore some comments lost during `15732b34e8` Michael Paquier	2014-01-03 13:22:03 -03:00
Tom Lane	a3b4aeecfe	Ooops, should use double not single quotes in StaticAssertStmt(). That's what I get for testing this on an older compiler.	2014-01-02 21:54:20 -05:00
Tom Lane	a7ef273e1c	Fix calculation of maximum statistics-message size. The PGSTAT_NUM_TABENTRIES macro should have been updated when new fields were added to struct PgStat_MsgTabstat in commit `644828908`, but it wasn't. Fix that. Also, add a static assertion that we didn't overrun the intended size limit on stats messages. This will not necessarily catch every mistake in computing the maximum array size for stats messages, but it will catch ones that have practical consequences. (The assertion in fact doesn't complain about the aforementioned error in PGSTAT_NUM_TABENTRIES, because that was not big enough to cause the array length to increase.) No back-patch, as there's no actual bug in existing releases; this is just in the nature of future-proofing. Mark Dilger and Tom Lane	2014-01-02 21:45:51 -05:00
Alvaro Herrera	638cf09e76	Handle 5-char filenames in SlruScanDirectory Original users of slru.c were all producing 4-digit filenames, so that was all that that code was prepared to handle. Changes to multixact.c in the course of commit `0ac5ad5134` made pg_multixact/members create 5-digit filenames once a certain threshold was reached, which SlruScanDirectory wasn't prepared to deal with; in particular, 5-digit-name files were not removed during truncation. Change that routine to make it aware of those files, and have it process them just like any others. Right now, some pg_multixact/members directories will contain a mixture of 4-char and 5-char filenames. A future commit is expected fix things so that each slru.c user declares the correct maximum width for the files it produces, to avoid such unsightly mixtures. Noticed while investigating bug #8673 reported by Serge Negodyuck.	2014-01-02 18:17:29 -03:00
Alvaro Herrera	a50d976254	Wrap multixact/members correctly during extension In the 9.2 code for extending multixact/members, the logic was very simple because the number of entries in a members page was a proper divisor of 2^32, and thus at 2^32 wraparound the logic for page switch was identical than at any other page boundary. In commit `0ac5ad5134` I failed to realize this and introduced code that was not able to go over the 2^32 boundary. Fix that by ensuring that when we reach the last page of the last segment we correctly zero the initial page of the initial segment, using correct uint32-wraparound-safe arithmetic. Noticed while investigating bug #8673 reported by Serge Negodyuck, as diagnosed by Andres Freund.	2014-01-02 18:17:07 -03:00
Alvaro Herrera	722acf51a0	Handle wraparound during truncation in multixact/members In pg_multixact/members, relying on modulo-2^32 arithmetic for wraparound handling doesn't work all that well. Because we don't explicitely track wraparound of the allocation counter for members, it is possible that the "live" area exceeds 2^31 entries; trying to remove SLRU segments that are "old" according to the original logic might lead to removal of segments still in use. To fix, have the truncation routine use a tailored SlruScanDirectory callback that keeps track of the live area in actual use; that way, when the live range exceeds 2^31 entries, the oldest segments still live will not get removed untimely. This new SlruScanDir callback needs to take care not to remove segments that are "in the future": if new SLRU segments appear while the truncation is ongoing, make sure we don't remove them. This requires examination of shared memory state to recheck for false positives, but testing suggests that this doesn't cause a problem. The original coding didn't suffer from this pitfall because segments created when truncation is running are never considered to be removable. Per Andres Freund's investigation of bug #8673 reported by Serge Negodyuck.	2014-01-02 18:16:54 -03:00
Robert Haas	3cff1879f8	Aggressively freeze tables when CLUSTER or VACUUM FULL rewrites them. We haven't wanted to do this in the past on the grounds that in rare cases the original xmin value will be needed for forensic purposes, but commit `37484ad2aa` removes that objection, so now we can. Per extensive discussion, among many people, on pgsql-hackers.	2014-01-02 15:15:51 -05:00
Robert Haas	4b351841fa	Rename walLogHints to wal_log_hints for easier grepping. Michael Paquier	2014-01-01 20:17:00 -05:00
Tom Lane	c01bc51f8d	Fix broken support for event triggers as extension members. CREATE EVENT TRIGGER forgot to mark the event trigger as a member of its extension, and pg_dump didn't pay any attention anyway when deciding whether to dump the event trigger. Per report from Moshe Jacobson. Given the obvious lack of testing here, it's rather astonishing that ALTER EXTENSION ADD/DROP EVENT TRIGGER work, but they seem to.	2013-12-30 14:00:02 -05:00
Tom Lane	f7fbf4b0be	Remove dead code now that orindxpath.c is history. We don't need make_restrictinfo_from_bitmapqual() anymore at all. generate_bitmap_or_paths() doesn't need to be exported, and we can drop its rather klugy restriction_only flag.	2013-12-30 12:50:31 -05:00
Tom Lane	f343a880d5	Extract restriction OR clauses whether or not they are indexable. It's possible to extract a restriction OR clause from a join clause that has the form of an OR-of-ANDs, if each sub-AND includes a clause that mentions only one specific relation. While PG has been aware of that idea for many years, the code previously only did it if it could extract an indexable OR clause. On reflection, though, that seems a silly limitation: adding a restriction clause can be a win by reducing the number of rows that have to be filtered at the join step, even if we have to test the clause as a plain filter clause during the scan. This should be especially useful for foreign tables, where the change can cut the number of rows that have to be retrieved from the foreign server; but testing shows it can win even on local tables. Per a suggestion from Robert Haas. As a heuristic, I made the code accept an extracted restriction clause if its estimated selectivity is less than 0.9, which will probably result in accepting extracted clauses just about always. We might need to tweak that later based on experience. Since the code no longer has even a weak connection to Path creation, remove orindxpath.c and create a new file optimizer/util/orclauses.c. There's some additional janitorial cleanup of now-dead code that needs to happen, but it seems like that's a fit subject for a separate commit.	2013-12-30 12:24:37 -05:00
Peter Eisentraut	71812a98cb	Update grammar From: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>	2013-12-28 20:54:23 -05:00
Andrew Dunstan	29dcf7ded5	Properly detect invalid JSON numbers when generating JSON. Instead of looking for characters that aren't valid in JSON numbers, we simply pass the output string through the JSON number parser, and if it fails the string is quoted. This means among other things that money and domains over money will be quoted correctly and generate valid JSON. Fixes bug #8676 reported by Anderson Cristian da Silva. Backpatched to 9.2 where JSON generation was introduced.	2013-12-27 17:04:00 -05:00
Kevin Grittner	a133bf7031	Fix misplaced right paren bugs in pgstatfuncs.c. The bug would only show up if the C sockaddr structure contained zero in the first byte for a valid address; otherwise it would fail to fail, which is probably why it went unnoticed for so long. Patch submitted by Joel Jacobson after seeing an article by Andrey Karpov in which he reports finding this through static code analysis using PVS-Studio. While I was at it I moved a definition of a local variable referenced in the buggy code to a more local context. Backpatch to all supported branches.	2013-12-27 15:26:24 -06:00
Tom Lane	1def747db6	Fix inadequately-tested code path in tuplesort_skiptuples(). Per report from Jeff Davis.	2013-12-24 17:13:02 -05:00
Tom Lane	4eeda92d86	Fix ANALYZE failure on a column that's a domain over a range. Most other range operations seem to work all right on domains, but this one not so much, at least not since commit `918eee0c`. Per bug #8684 from Brett Neumeier.	2013-12-23 22:18:48 -05:00
Robert Haas	d43760b624	Revise documentation for new freezing method. Commit `37484ad2aa` invalidated a good chunk of documentation, so patch it up to reflect the new state of play. Along the way, patch remaining documentation references to FrozenXID to say instead FrozenTransactionId, so that they match the way we actually spell it in the code.	2013-12-23 20:36:31 -05:00
Tom Lane	cf63c641ca	Fix portability issue in ordered-set patch. Overly compact coding in makeOrderedSetArgs() led to a platform dependency: if the compiler chose to execute the subexpressions in the wrong order, list_length() might get applied to an already-modified List, giving a value we didn't want. Per buildfarm.	2013-12-23 20:24:07 -05:00
Tom Lane	8d65da1f01	Support ordered-set (WITHIN GROUP) aggregates. This patch introduces generic support for ordered-set and hypothetical-set aggregate functions, as well as implementations of the instances defined in SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(), percent_rank(), cume_dist()). We also added mode() though it is not in the spec, as well as versions of percentile_cont() and percentile_disc() that can compute multiple percentile values in one pass over the data. Unlike the original submission, this patch puts full control of the sorting process in the hands of the aggregate's support functions. To allow the support functions to find out how they're supposed to sort, a new API function AggGetAggref() is added to nodeAgg.c. This allows retrieval of the aggregate call's Aggref node, which may have other uses beyond the immediate need. There is also support for ordered-set aggregates to install cleanup callback functions, so that they can be sure that infrastructure such as tuplesort objects gets cleaned up. In passing, make some fixes in the recently-added support for variadic aggregates, and make some editorial adjustments in the recent FILTER additions for aggregates. Also, simplify use of IsBinaryCoercible() by allowing it to succeed whenever the target type is ANY or ANYELEMENT. It was inconsistent that it dealt with other polymorphic target types but not these. Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing, and rather heavily editorialized upon by Tom Lane	2013-12-23 16:11:35 -05:00
Robert Haas	37484ad2aa	Change the way we mark tuples as frozen. Instead of changing the tuple xmin to FrozenTransactionId, the combination of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous proposals to freeze tuples opportunistically before vacuum_freeze_min_age is reached have foundered on the objection that replacing xmin by FrozenTransactionId might hinder debugging efforts when things in this area go awry; this patch is intended to solve that problem by keeping the XID around (but largely ignoring the value to which it is set). Third-party code that checks for HEAP_XMIN_INVALID on tuples where HEAP_XMIN_COMMITTED might be set will be broken by this change. To fix, use the new accessor macros in htup_details.h rather than consulting the bits directly. HeapTupleHeaderGetXmin has been modified to return FrozenTransactionId when the infomask bits indicate that the tuple is frozen; use HeapTupleHeaderGetRawXmin when you already know that the tuple isn't marked commited or frozen, or want the raw value anyway. We currently do this in routines that display the xmin for user consumption, in tqual.c where it's known to be safe and important for the avoidance of extra cycles, and in the function-caching code for various procedural languages, which shouldn't invalidate the cache just because the tuple gets frozen. Robert Haas and Andres Freund	2013-12-22 15:49:09 -05:00
Fujii Masao	961bf59fb7	Rename wal_log_hintbits to wal_log_hints, per discussion on pgsql-hackers. Sawada Masahiko	2013-12-21 03:33:16 +09:00
Alvaro Herrera	6130208e75	Avoid useless palloc during transaction commit We can allocate the initial relations-to-drop array when first needed, instead of at function entry; this avoids allocating it when the function is not going to do anything, which is most of the time. Backpatch to 9.3, where this behavior was introduced by commit `279628a0a7`. There's more that could be done here, such as possible reworking of the code to avoid having to palloc anything, but that doesn't sound as backpatchable as this relatively minor change. Per complaint from Noah Misch in 20131031145234.GA621493@tornado.leadboat.com	2013-12-20 12:37:30 -03:00
Bruce Momjian	527fdd9df1	Move pg_upgrade_support global variables to their own include file Previously their declarations were spread around to avoid accidental access.	2013-12-19 16:10:07 -05:00
Alvaro Herrera	13aa624431	Optimize updating a row that's locked by same xid Updating or locking a row that was already locked by the same transaction under the same Xid caused a MultiXact to be created; but this is unnecessary, because there's no usefulness in being able to differentiate two locks by the same transaction. In particular, if a transaction executed SELECT FOR UPDATE followed by an UPDATE that didn't modify columns of the key, we would dutifully represent the resulting combination as a multixact -- even though a single key-update is sufficient. Optimize the case so that only the strongest of both locks/updates is represented in Xmax. This can save some Xmax's from becoming MultiXacts, which can be a significant optimization. This missed optimization opportunity was spotted by Andres Freund while investigating a bug reported by Oliver Seemann in message CANCipfpfzoYnOz5jj=UZ70_R=CwDHv36dqWSpwsi27vpm1z5sA@mail.gmail.com and also directly as a performance regression reported by Dong Ye in message d54b8387.000012d8.00000010@YED-DEVD1.vmware.com Reportedly, this patch fixes the performance regression. Since the missing optimization was reported as a significant performance regression from 9.2, backpatch to 9.3. Andres Freund, tweaked by Álvaro Herrera	2013-12-19 16:53:49 -03:00
Robert Haas	001a573a20	Allow on-detach callbacks for dynamic shared memory segments. Just as backends must clean up their shared memory state (releasing lwlocks, buffer pins, etc.) before exiting, they must also perform any similar cleanups related to dynamic shared memory segments they have mapped before unmapping those segments. So add a mechanism to ensure that. Existing on_shmem_exit hooks include both "user level" cleanup such as transaction abort and removal of leftover temporary relations and also "low level" cleanup that forcibly released leftover shared memory resources. On-detach callbacks should run after the first group but before the second group, so create a new before_shmem_exit function for registering the early callbacks and keep on_shmem_exit for the regular callbacks. (An earlier draft of this patch added an additional argument to on_shmem_exit, but that had a much larger footprint and probably a substantially higher risk of breaking third party code for no real gain.) Patch by me, reviewed by KaiGai Kohei and Andres Freund.	2013-12-18 13:09:09 -05:00
Bruce Momjian	613c6d26bd	Fix incorrect error message reported for non-existent users Previously, lookups of non-existent user names could return "Success"; it will now return "User does not exist" by resetting errno. This also centralizes the user name lookup code in libpgport. Report and analysis by Nicolas Marchildon; patch by me	2013-12-18 12:16:21 -05:00
Alvaro Herrera	11ac4c73cb	Don't ignore tuple locks propagated by our updates If a tuple was locked by transaction A, and transaction B updated it, the new version of the tuple created by B would be locked by A, yet visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the lock held by A wouldn't get checked if transaction B later deleted (or key-updated) the new version of the tuple. This might cause referential integrity checks to give false positives (that is, allow deletes that should have been rejected). This is an easy oversight to have made, because prior to improved tuple locks in commit `0ac5ad5134` it wasn't possible to have tuples created by our own transaction that were also locked by remote transactions, and so locks weren't even considered in that code path. It is recommended that foreign keys be rechecked manually in bulk after installing this update, in case some referenced rows are missing with some referencing row remaining. Per bug reported by Daniel Wood in CAPweHKe5QQ1747X2c0tA=5zf4YnS2xcvGf13Opd-1Mq24rF1cQ@mail.gmail.com	2013-12-18 13:45:51 -03:00
Tatsuo Ishii	65d6e4cb5c	Add ALTER SYSTEM command to edit the server configuration file. Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii, Boszormenyi Zoltan, Andres Freund, Greg Smith and others.	2013-12-18 23:42:44 +09:00
Bruce Momjian	dba5a9dda9	Comment: COPY comment improvement Etsuro Fujita	2013-12-17 12:51:16 -05:00
Alvaro Herrera	3b97e6823b	Rework tuple freezing protocol Tuple freezing was broken in connection to MultiXactIds; commit `8e53ae025d` tried to fix it, but didn't go far enough. As noted by Noah Misch, freezing a tuple whose Xmax is a multi containing an aborted update might cause locks in the multi to go ignored by later transactions. This is because the code depended on a multixact above their cutoff point not having any lock-only member older than the cutoff point for Xids, which is easily defeated in READ COMMITTED transactions. The fix for this involves creating a new MultiXactId when necessary. But this cannot be done during WAL replay, and moreover multixact examination requires using CLOG access routines which are not supposed to be used during WAL replay either; so tuple freezing cannot be done with the old freeze WAL record. Therefore, separate the freezing computation from its execution, and change the WAL record to carry all necessary information. At WAL replay time, it's easy to re-execute freezing because we don't need to re-compute the new infomask/Xmax values but just take them from the WAL record. While at it, restructure the coding to ensure all page changes occur in a single critical section without much room for failures. The previous coding wasn't using a critical section, without any explanation as to why this was acceptable. In replication scenarios using the 9.3 branch, standby servers must be upgraded before their master, so that they are prepared to deal with the new WAL record once the master is upgraded; failure to do so will cause WAL replay to die with a PANIC message. Later upgrade of the standby will allow the process to continue where it left off, so there's no disruption of the data in the standby in any case. Standbys know how to deal with the old WAL record, so it's okay to keep the master running the old code for a while. In master, the old freeze WAL record is gone, for cleanliness' sake; there's no compatibility concern there. Backpatch to 9.3, where the original bug was introduced and where the previous fix was backpatched. Álvaro Herrera and Andres Freund	2013-12-16 11:29:50 -03:00
Heikki Linnakangas	30b96549ab	Mark variables 'static' where possible. Move GinFuzzySearchLimit to ginget.c Per "clang -Wmissing-variable-declarations" output, posted by Andres Freund. I didn't silence all those warnings, though, only the most obvious cases.	2013-12-16 11:41:17 +02:00
Tom Lane	1b4f7f93b4	Allow empty target list in SELECT. This fixes a problem noted as a followup to bug #8648: if a query has a semantically-empty target list, e.g. SELECT * FROM zero_column_table, ruleutils.c will dump it as a syntactically-empty target list, which was not allowed. There doesn't seem to be any reliable way to fix this by hacking ruleutils (note in particular that the originally zero-column table might since have had columns added to it); and even if we had such a fix, it would do nothing for existing dump files that might contain bad syntax. The best bet seems to be to relax the syntactic restriction. Also, add parse-analysis errors for SELECT DISTINCT with no columns (after *-expansion) and RETURNING with no columns. These cases previously produced unexpected behavior because the parsed Query looked like it had no DISTINCT or RETURNING clause, respectively. If anyone ever offers a plausible use-case for this, we could work a bit harder on making the situation distinguishable. Arguably this is a bug fix that should be back-patched, but I'm worried that there may be client apps or PLs that expect "SELECT ;" to throw a syntax error. The issue doesn't seem important enough to risk changing behavior in minor releases.	2013-12-14 20:23:26 -05:00
Tom Lane	c03ad5602f	Fix inherited UPDATE/DELETE with UNION ALL subqueries. Fix an oversight in commit b3aaf9081a1a95c245fd605dcf02c91b3a5c3a29: we do indeed need to process the planner's append_rel_list when copying RTE subqueries, because if any of them were flattenable UNION ALL subqueries, the append_rel_list shows which subquery RTEs were pulled up out of which other ones. Without this, UNION ALL subqueries aren't correctly inserted into the update plans for inheritance child tables after the first one, typically resulting in no update happening for those child table(s). Per report from Victor Yegorov. Experimentation with this case also exposed a fault in commit a7b965382cf0cb30aeacb112572718045e6d4be7: if an inherited UPDATE/DELETE was proven totally dummy by constraint exclusion, we might arrive at add_rtes_to_flat_rtable with root->simple_rel_array being NULL. This should be interpreted as not having any RelOptInfos. I chose to code the guard as a check against simple_rel_array_size, so as to also provide some protection against indexing off the end of the array. Back-patch to 9.2 where the faulty code was added.	2013-12-14 17:33:53 -05:00
Alvaro Herrera	60eea3780c	Fix typo	2013-12-13 17:27:16 -03:00
Alvaro Herrera	d881dd6233	Rework MultiXactId cache code The original performs too poorly; in some scenarios it shows way too high while profiling. Try to make it a bit smarter to avoid excessive cosst. In particular, make it have a maximum size, and have entries be sorted in LRU order; once the max size is reached, evict the oldest entry to avoid it from growing too large. Per complaint from Andres Freund in connection with new tuple freezing code.	2013-12-13 17:16:25 -03:00
Tom Lane	2efc6dc256	Add HOLD/RESUME_INTERRUPTS in HandleCatchupInterrupt/HandleNotifyInterrupt. This prevents a possible longjmp out of the signal handler if a timeout or SIGINT occurs while something within the handler has transiently set ImmediateInterruptOK. For safety we must hold off the timeout or cancel error until we're back in mainline, or at least till we reach the end of the signal handler when ImmediateInterruptOK was true at entry. This syncs these functions with the logic now present in handle_sig_alarm. AFAICT there is no live bug here in 9.0 and up, because I don't think we currently can wait for any heavyweight lock inside these functions, and there is no other code (except read-from-client) that will turn on ImmediateInterruptOK. However, that was not true pre-9.0: in older branches ProcessIncomingNotify might block trying to lock pg_listener, and then a SIGINT could lead to undesirable control flow. It might be all right anyway given the relatively narrow code ranges in which NOTIFY interrupts are enabled, but for safety's sake I'm back-patching this.	2013-12-13 14:05:51 -05:00
Heikki Linnakangas	dde6282500	Fix more instances of "the the" in comments. Plus one instance of "to to" in the docs.	2013-12-13 20:02:01 +02:00
Tom Lane	e8312b4f03	Don't let timeout interrupts happen unless ImmediateInterruptOK is set. Serious oversight in commit 16e1b7a1b7f7ffd8a18713e83c8cd72c9ce48e07: we should not allow an interrupt to take control away from mainline code except when ImmediateInterruptOK is set. Just to be safe, let's adopt the same save-clear-restore dance that's been used for many years in HandleCatchupInterrupt and HandleNotifyInterrupt, so that nothing bad happens if a timeout handler invokes code that tests or even manipulates ImmediateInterruptOK. Per report of "stuck spinlock" failures from Christophe Pettus, though many other symptoms are possible. Diagnosis by Andres Freund.	2013-12-13 11:50:15 -05:00
Heikki Linnakangas	50e547096c	Add GUC to enable WAL-logging of hint bits, even with checksums disabled. WAL records of hint bit updates is useful to tools that want to examine which pages have been modified. In particular, this is required to make the pg_rewind tool safe (without checksums). This can also be used to test how much extra WAL-logging would occur if you enabled checksums, without actually enabling them (which you can't currently do without re-initdb'ing). Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with further changes by me.	2013-12-13 16:26:14 +02:00
Heikki Linnakangas	a49633d8dc	Fix WAL-logging of setting the visibility map bit. The operation that removes the remaining dead tuples from the page must be WAL-logged before the setting of the VM bit. Otherwise, if you replay the WAL to between those two records, you end up with the VM bit set, but the dead tuples are still there. Backpatch to 9.3, where this bug was introduced.	2013-12-13 14:15:04 +02:00
Tom Lane	ccca6f56f5	Fix ancient docs/comments thinko: XID comparison is mod 2^32, not 2^31. Pointed out by Gianni Ciolli.	2013-12-12 12:39:48 -05:00
Tom Lane	f26099057a	Improve EXPLAIN to print the grouping columns in Agg and Group nodes. Per request from Kevin Grittner.	2013-12-12 11:24:38 -05:00
Simon Riggs	8693559cac	New autovacuum_work_mem parameter If autovacuum_work_mem is set, autovacuum workers now use this parameter in preference to maintenance_work_mem. Peter Geoghegan	2013-12-12 11:42:39 +00:00
Simon Riggs	36da3cfb45	Allow time delayed standbys and recovery Set min_recovery_apply_delay to force a delay in recovery apply for commit and restore point WAL records. Other records are replayed immediately. Delay is measured between WAL record time and local standby time. Robert Haas, Fabrízio de Royes Mello and Simon Riggs Detailed review by Mitsumasa Kondo	2013-12-12 10:53:20 +00:00
Tom Lane	22310b808d	Remove bogus executable permissions on xlog.c. Apparently fat-fingered in `1a3d104475`. Noted by Peter Geoghegan.	2013-12-11 22:12:25 -05:00
Robert Haas	60dd40bbda	Under wal_level=logical, when saving old tuples, always save OID. There's no real point in not doing this. It doesn't cost anything in performance or space. So let's go wild. Andres Freund, with substantial editing as to style by me.	2013-12-11 13:19:31 -05:00
Robert Haas	66abc2608c	Add a new reloption, user_catalog_table. When this reloption is set and wal_level=logical is configured, we'll record the CIDs stamped by inserts, updates, and deletes to the table just as we would for an actual catalog table. This will allow logical decoding to use historical MVCC snapshots to access such tables just as they access ordinary catalog tables. Replication solutions built around the logical decoding machinery will likely need to set this operation for their configuration tables; it might also be needed by extensions which perform table access in their output functions. Andres Freund, reviewed by myself and others.	2013-12-10 19:17:34 -05:00
Robert Haas	e55704d8b2	Add new wal_level, logical, sufficient for logical decoding. When wal_level=logical, we'll log columns from the old tuple as configured by the REPLICA IDENTITY facility added in commit `07cacba983`. This makes it possible a properly-configured logical replication solution to correctly follow table updates even if they change the chosen key columns, or, with REPLICA IDENTITY FULL, even if the table has no key at all. Note that updates which do not modify the replica identity column won't log anything extra, making the choice of a good key (i.e. one that will rarely be changed) important to performance when wal_level=logical is configured. Each insert, update, or delete to a catalog table will also log the CMIN and/or CMAX values of stamped by the current transaction. This is necessary because logical decoding will require access to historical snapshots of the catalog in order to decode some data types, and the CMIN/CMAX values that we may need in order to judge row visibility may have been overwritten by the time we need them. Andres Freund, reviewed in various versions by myself, Heikki Linnakangas, KONDO Mitsumasa, and many others.	2013-12-10 19:01:40 -05:00
Tom Lane	9ec6199d18	Fix possible crash with nested SubLinks. An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...) could produce an invalid plan that results in a crash at execution time, if the planner attempts to flatten the outer IN into a semi-join. This happens because convert_testexpr() was not expecting any nested SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging to the inner SubLink. (I think the comment denying that this case could happen was wrong when written; it's certainly been wrong for quite a long time, since very early versions of the semijoin flattening logic.) Per report from Teodor Sigaev. Back-patch to all supported branches.	2013-12-10 16:10:17 -05:00
Noah Misch	53685d7981	Rename TABLE() to ROWS FROM(). SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and other collection types. This feature, however, deals with set-returning functions. Use a different syntax for this feature to keep open the possibility of implementing the standard TABLE().	2013-12-10 09:34:37 -05:00
Robert Haas	d9250da032	Fixups for dsm.c's file descriptor handling. Per complaint from Tom Lane.	2013-12-09 11:15:19 -05:00
Peter Eisentraut	3164721462	SSL: Support ECDH key exchange This sets up ECDH key exchange, when compiling against OpenSSL that supports EC. Then the ECDHE-RSA and ECDHE-ECDSA cipher suites can be used for SSL connections. The latter one means that EC keys are now usable. The reason for EC key exchange is that it's faster than DHE and it allows to go to higher security levels where RSA will be horribly slow. There is also new GUC option ssl_ecdh_curve that specifies the curve name used for ECDH. It defaults to "prime256v1", which is the most common curve in use in HTTPS. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>	2013-12-07 15:11:44 -05:00
Peter Eisentraut	ef3267523d	SSL: Add configuration option to prefer server cipher order By default, OpenSSL (and SSL/TLS in general) lets the client cipher order take priority. This is OK for browsers where the ciphers were tuned, but few PostgreSQL client libraries make the cipher order configurable. So it makes sense to have the cipher order in postgresql.conf take priority over client defaults. This patch adds the setting "ssl_prefer_server_ciphers" that can be turned on so that server cipher order is preferred. Per discussion, this now defaults to on. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>	2013-12-07 08:13:50 -05:00
Alvaro Herrera	312bde3d40	Fix improper abort during update chain locking In `247c76a989`, I added some code to do fine-grained checking of MultiXact status of locking/updating transactions when traversing an update chain. There was a thinko in that patch which would have the traversing abort, that is return HeapTupleUpdated, when the other transaction is a committed lock-only. In this case we should ignore it and return success instead. Of course, in the case where there is a committed update, HeapTupleUpdated is the correct return value. A user-visible symptom of this bug is that in REPEATABLE READ and SERIALIZABLE transaction isolation modes spurious serializability errors can occur: ERROR: could not serialize access due to concurrent update In order for this to happen, there needs to be a tuple that's key-share- locked and also updated, and the update must abort; a subsequent transaction trying to acquire a new lock on that tuple would abort with the above error. The reason is that the initial FOR KEY SHARE is seen as committed by the new locking transaction, which triggers this bug. (If the UPDATE commits, then the serialization error is correctly reported.) When running a query in READ COMMITTED mode, what happens is that the locking is aborted by the HeapTupleUpdated return value, then EvalPlanQual fetches the newest version of the tuple, which is then the only version that gets locked. (The second time the tuple is checked there is no misbehavior on the committed lock-only, because it's not checked by the code that traverses update chains; so no bug.) Only the newest version of the tuple is locked, not older ones, but this is harmless. The isolation test added by this commit illustrates the desired behavior, including the proper serialization errors that get thrown. Backpatch to 9.3.	2013-12-05 17:47:51 -03:00
Tom Lane	74242c23c1	Clear retry flags properly in replacement OpenSSL sock_write function. Current OpenSSL code includes a BIO_clear_retry_flags() step in the sock_write() function. Either we failed to copy the code correctly, or they added this since we copied it. In any case, lack of the clear step appears to be the cause of the server lockup after connection loss reported in bug #8647 from Valentine Gogichashvili. Assume that this is correct coding for all OpenSSL versions, and hence back-patch to all supported branches. Diagnosis and patch by Alexander Kukushkin.	2013-12-05 12:48:28 -05:00
Alvaro Herrera	07aeb1fec5	Avoid resetting Xmax when it's a multi with an aborted update HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while checking the contents of a multixact and finding it contains an aborted update, by setting the HEAP_XMAX_INVALID bit. This would lead to concurrent transactions not noticing any previous locks held by transactions that might still be running, and thus being able to acquire subsequent locks they wouldn't be normally able to acquire. This bug was introduced in commit 1ce150b7bb; backpatch this fix to 9.3, like that commit. This change reverts the change to the delete-abort-savept isolation test in `1ce150b7bb`, because that behavior change was caused by this bug. Noticed by Andres Freund while investigating a different issue reported by Noah Misch.	2013-12-05 12:21:55 -03:00
Heikki Linnakangas	9e857436ef	Don't include unused space in LOG_NEWPAGE records. This is the same trick we use when taking a full page image of a buffer passed to XLogInsert.	2013-12-04 00:10:47 +02:00
Heikki Linnakangas	22122c83f1	Fix full-page writes of internal GIN pages. Insertion to a non-leaf GIN page didn't make a full-page image of the page, which is wrong. The code used to do it correctly, but was changed (commit `853d1c3103`) because the redo-routine didn't track incomplete splits correctly when the page was restored from a full page image. Of course, that was not right way to fix it, the redo routine should've been fixed instead. The redo-routine was surreptitiously fixed in 2010 (commit `4016bdef8a`), so all we need to do now is revert the code that creates the record to its original form. This doesn't change the format of the WAL record. Backpatch to all supported versions.	2013-12-03 23:16:01 +02:00
Peter Eisentraut	fef88b3fda	Report exit code from external recovery commands properly When an external recovery command such as restore_command or archive_cleanup_command fails, report the exit code properly, distinguishing signals and normal exists, using the existing wait_result_to_str() facility, instead of just reporting the return value from system(). Reviewed-by: Peter Geoghegan <pg@heroku.com>	2013-12-02 22:31:05 -05:00
Tom Lane	7ab321404c	Fix crash in assign_collations_walker for EXISTS with empty SELECT list. We (I think I, actually) forgot about this corner case while coding collation resolution. Per bug #8648 from Arjen Nienhuis.	2013-12-02 20:28:45 -05:00
Robert Haas	c6d4b1dd3e	Flag mmap implemenation of dynamic shared memory as resize-capable. Error noted by Heikki Linnakangas	2013-12-02 11:18:54 -05:00
Robert Haas	a8656a3ab0	Make NUM_TOCHAR_prepare and NUM_TOCHAR_finish macros declare "len". Remove the variable from the enclosing scopes so that nothing can be relying on it. The net result of this refactoring is that we get rid of a few unnecessary strlen() calls. Original patch from Greg Jaskiewicz, substantially expanded by me.	2013-12-02 10:51:06 -05:00
Robert Haas	9d140f7be2	Avoid out-of-bounds read in errfinish if error_stack_depth < 0. If errordata_stack_depth < 0, we won't find that out and correct the problem until CHECK_STACK_DEPTH() is invoked. In the meantime, elevel will be set based on an invalid read. This is probably harmless in practice, but it seems cleaner this way. Xi Wang	2013-12-02 10:42:01 -05:00
Peter Eisentraut	3e3520cf7a	Translation updates	2013-12-02 00:17:07 -05:00
Alvaro Herrera	2393c7d102	Fix a couple of bugs in MultiXactId freezing Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look into a multixact to check the members against cutoff_xid. This means that a very old Xid could survive hidden within a multi, possibly outliving its CLOG storage. In the distant future, this would cause clog lookup failures: ERROR: could not access status of transaction 3883960912 DETAIL: Could not open file "pg_clog/0E78": No such file or directory. This mostly was problematic when the updating transaction aborted, since in that case the row wouldn't get pruned away earlier in vacuum and the multixact could possibly survive for a long time. In many cases, data that is inaccessible for this reason way can be brought back heuristically. As a second bug, heap_freeze_tuple() didn't properly handle multixacts that need to be frozen according to cutoff_multi, but whose updater xid is still alive. Instead of preserving the update Xid, it just set Xmax invalid, which leads to both old and new tuple versions becoming visible. This is pretty rare in practice, but a real threat nonetheless. Existing corrupted rows, unfortunately, cannot be repaired in an automated fashion. Existing physical replicas might have already incorrectly frozen tuples because of different behavior than in master, which might only become apparent in the future once pg_multixact/ is truncated; it is recommended that all clones be rebuilt after upgrading. Following code analysis caused by bug report by J Smith in message CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com and privately by F-Secure. Backpatch to 9.3, where freezing of MultiXactIds was introduced. Analysis and patch by Andres Freund, with some tweaks by Álvaro.	2013-11-29 21:47:25 -03:00
Alvaro Herrera	1ce150b7bb	Don't TransactionIdDidAbort in HeapTupleGetUpdateXid It is dangerous to do so, because some code expects to be able to see what's the true Xmax even if it is aborted (particularly while traversing HOT chains). So don't do it, and instead rely on the callers to verify for abortedness, if necessary. Several race conditions and bugs fixed in the process. One isolation test changes the expected output due to these. This also reverts commit `c235a6a589`, which is no longer necessary. Backpatch to 9.3, where this function was introduced. Andres Freund	2013-11-29 21:47:21 -03:00
Alvaro Herrera	1df0122daa	Truncate pg_multixact/'s contents during crash recovery Commit `9dc842f08` of 8.2 era prevented MultiXact truncation during crash recovery, because there was no guarantee that enough state had been setup, and because it wasn't deemed to be a good idea to remove data during crash recovery anyway. Since then, due to Hot-Standby, streaming replication and PITR, the amount of time a cluster can spend doing crash recovery has increased significantly, to the point that a cluster may even never come out of it. This has made not truncating the content of pg_multixact/ not defensible anymore. To fix, take care to setup enough state for multixact truncation before crash recovery starts (easy since checkpoints contain the required information), and move the current end-of-recovery actions to a new TrimMultiXact() function, analogous to TrimCLOG(). At some later point, this should probably done similarly to the way clog.c is doing it, which is to just WAL log truncations, but we can't do that for the back branches. Back-patch to 9.0. 8.4 also has the problem, but since there's no hot standby there, it's much less pressing. In 9.2 and earlier, this patch is simpler than in newer branches, because multixact access during recovery isn't required. Add appropriate checks to make sure that's not happening. Andres Freund	2013-11-29 21:47:15 -03:00
Alvaro Herrera	f54106f77e	Fix full-table-vacuum request mechanism for MultiXactIds While autovacuum dutifully launched anti-multixact-wraparound vacuums when the multixact "age" was reached, the vacuum code was not aware that it needed to make them be full table vacuums. As the resulting partial-table vacuums aren't capable of actually increasing relminmxid, autovacuum continued to launch anti-wraparound vacuums that didn't have the intended effect, until age of relfrozenxid caused the vacuum to finally be a full table one via vacuum_freeze_table_age. To fix, introduce logic for multixacts similar to that for plain TransactionIds, using the same GUCs. Backpatch to 9.3, where permanent MultiXactIds were introduced. Andres Freund, some cleanup by Álvaro	2013-11-29 21:47:13 -03:00
Alvaro Herrera	76a31c689c	Replace hardcoded 200000000 with autovacuum_freeze_max_age Parts of the code used autovacuum_freeze_max_age to determine whether anti-multixact-wraparound vacuums are necessary, while others used a hardcoded 200000000 value. This leads to problems when autovacuum_freeze_max_age is set to a non-default value. Use the latter everywhere. Backpatch to 9.3, where vacuuming of multixacts was introduced. Andres Freund	2013-11-29 21:47:09 -03:00
Tom Lane	8b151558c8	Be sure to release proc->backendLock after SetupLockInTable() failure. The various places that transferred fast-path locks to the main lock table neglected to release the PGPROC's backendLock if SetupLockInTable failed due to being out of shared memory. In most cases this is no big deal since ensuing error cleanup would release all held LWLocks anyway. But there are some hot-standby functions that don't consider failure of FastPathTransferRelationLocks to be a hard error, and in those cases this oversight could lead to system lockup. For consistency, make all of these places look the same as FastPathTransferRelationLocks. Noted while looking for the cause of Dan Wood's bugs --- this wasn't it, but it's a bug anyway.	2013-11-29 17:35:09 -05:00
Tom Lane	16e1b7a1b7	Fix assorted race conditions in the new timeout infrastructure. Prevent handle_sig_alarm from losing control partway through due to a query cancel (either an asynchronous SIGINT, or a cancel triggered by one of the timeout handler functions). That would at least result in failure to schedule any required future interrupt, and might result in actual corruption of timeout.c's data structures, if the interrupt happened while we were updating those. We could still lose control if an asynchronous SIGINT arrives just as the function is entered. This wouldn't break any data structures, but it would have the same effect as if the SIGALRM interrupt had been silently lost: we'd not fire any currently-due handlers, nor schedule any new interrupt. To forestall that scenario, forcibly reschedule any pending timer interrupt during AbortTransaction and AbortSubTransaction. We can avoid any extra kernel call in most cases by not doing that until we've allowed LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events. Another hazard is that some platforms (at least Linux and *BSD) block a signal before calling its handler and then unblock it on return. When we longjmp out of the handler, the unblock doesn't happen, and the signal is left blocked indefinitely. Again, we can fix that by forcibly unblocking signals during AbortTransaction and AbortSubTransaction. These latter two problems do not manifest when the longjmp reaches postgres.c, because the error recovery code there kills all pending timeout events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal mask is restored. So errors thrown outside any transaction should be OK already, and cleaning up in AbortTransaction and AbortSubTransaction should be enough to fix these issues. (We're assuming that any code that catches a query cancel error and doesn't re-throw it will do at least a subtransaction abort to clean up; but that was pretty much required already by other subsystems.) Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when disabling that event: if a lock timeout interrupt happened after the lock was granted, the ensuing query cancel is still going to happen at the next CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user cancel. Per reports from Dan Wood. Back-patch to 9.3 where the new timeout handling infrastructure was introduced. We may at some point decide to back-patch the signal unblocking changes further, but I'll desist from that until we hear actual field complaints about it.	2013-11-29 16:41:00 -05:00
Robert Haas	8e18d04d4d	Refine our definition of what constitutes a system relation. Although user-defined relations can't be directly created in pg_catalog, it's possible for them to end up there, because you can create them in some other schema and then use ALTER TABLE .. SET SCHEMA to move them there. Previously, such relations couldn't afterwards be manipulated, because IsSystemRelation()/IsSystemClass() rejected all attempts to modify objects in the pg_catalog schema, regardless of their origin. With this patch, they now reject only those objects in pg_catalog which were created at initdb-time, allowing most operations on user-created tables in pg_catalog to proceed normally. This patch also adds new functions IsCatalogRelation() and IsCatalogClass(), which is similar to IsSystemRelation() and IsSystemClass() but with a slightly narrower definition: only TOAST tables of system catalogs are included, rather than all TOAST tables. This is currently used only for making decisions about when invalidation messages need to be sent, but upcoming logical decoding patches will find other uses for this information. Andres Freund, with some modifications by me.	2013-11-28 20:57:20 -05:00
Heikki Linnakangas	2fe69cacff	Another gin_desc fix. The number of items inserted was incorrectly printed as if it was a boolean.	2013-11-28 23:35:50 +02:00
Heikki Linnakangas	97c19e6c38	Fix gin_desc routine to match the WAL format. In the GIN incomplete-splits patch, I used BlockIdDatas to store the block number of left and right children, when inserting a downlink after a split to an internal page posting list page. But gin_desc thought they were stored as BlockNumbers.	2013-11-28 21:57:42 +02:00
Tom Lane	da8a716089	Fix latent(?) race condition in LockReleaseAll. We have for a long time checked the head pointer of each of the backend's proclock lists and skipped acquiring the corresponding locktable partition lock if the head pointer was NULL. This was safe enough in the days when proclock lists were changed only by the owning backend, but it is pretty questionable now that the fast-path patch added cases where backends add entries to other backends' proclock lists. However, we don't really wish to revert to locking each partition lock every time, because in simple transactions that would add a lot of useless lock/unlock cycles on already-heavily-contended LWLocks. Fortunately, the only way that another backend could be modifying our proclock list at this point would be if it was promoting a formerly fast-path lock of ours; and any such lock must be one that we'd decided not to delete in the previous loop over the locallock table. So it's okay if we miss seeing it in this loop; we'd just decide not to delete it again. However, once we've detected a non-empty list, we'd better re-fetch the list head pointer after acquiring the partition lock. This guards against possibly fetching a corrupt-but-non-null pointer if pointer fetch/store isn't atomic. It's not clear if any practical architectures are like that, but we've never assumed that before and don't wish to start here. In any case, the situation certainly deserves a code comment. While at it, refactor the partition traversal loop to use a for() construct instead of a while() loop with goto's. Back-patch, just in case the risk is real and not hypothetical.	2013-11-28 12:17:46 -05:00
Alvaro Herrera	d51a8c52ba	Unbreak buildfarm I removed an intermediate commit before pushing and forgot to test the resulting tree :-(	2013-11-28 12:59:45 -03:00
Alvaro Herrera	247c76a989	Use a more granular approach to follow update chains Instead of simply checking the KEYS_UPDATED bit, we need to check whether each lock held on the future version of the tuple conflicts with the lock we're trying to acquire. Per bug report #8434 by Tomonari Katsumata	2013-11-28 12:00:12 -03:00
Alvaro Herrera	e4828e9ccb	Compare Xmin to previous Xmax when locking an update chain Not doing so causes us to traverse an update chain that has been broken by concurrent page pruning. All other code that traverses update chains uses this check as one of the cases in which to stop iterating, so replicate it here too. Failure to do so leads to erroneous CLOG, subtrans or multixact lookups. Per discussion following the bug report by J Smith in CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com as diagnosed by Andres Freund.	2013-11-28 12:00:12 -03:00
Alvaro Herrera	c235a6a589	Don't try to set InvalidXid as page pruning hint If a transaction updates/deletes a tuple just before aborting, and a concurrent transaction tries to prune the page concurrently, the pruner may see HeapTupleSatisfiesVacuum return HEAPTUPLE_DELETE_IN_PROGRESS, but a later call to HeapTupleGetUpdateXid() return InvalidXid. This would cause an assertion failure in development builds, but would be otherwise Mostly Harmless. Fix by checking whether the updater Xid is valid before trying to apply it as page prune point. Reported by Andres in 20131124000203.GA4403@alap2.anarazel.de	2013-11-28 12:00:12 -03:00
Alvaro Herrera	e518fa7adf	Cope with heap_fetch failure while locking an update chain The reason for the fetch failure is that the tuple was removed because it was dead; so the failure is innocuous and can be ignored. Moreover, there's no need for further work and we can return success to the caller immediately. EvalPlanQualFetch is doing something very similar to this already. Report and test case from Andres Freund in 20131124000203.GA4403@alap2.anarazel.de	2013-11-28 12:00:12 -03:00
Tom Lane	7db285afc9	Fix stale-pointer problem in fast-path locking logic. When acquiring a lock in fast-path mode, we must reset the locallock object's lock and proclock fields to NULL. They are not necessarily that way to start with, because the locallock could be left over from a failed lock acquisition attempt earlier in the transaction. Failure to do this led to all sorts of interesting misbehaviors when LockRelease tried to clean up no-longer-related lock and proclock objects in shared memory. Per report from Dan Wood. In passing, modify LockRelease to elog not just Assert if it doesn't find lock and proclock objects for a formerly fast-path lock, matching the code in FastPathGetRelationLockEntry and LockRefindAndRelease. This isn't a bug but it will help in diagnosing any future bugs in this area. Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry to break out of their loops over the fastpath array once they've found the sole matching entry. This was inconsistently done in some search loops and not others. Improve assorted related comments, too. Back-patch to 9.2 where the fast-path mechanism was introduced.	2013-11-27 18:10:00 -05:00
Tom Lane	8c84803e14	Minor corrections in lmgr/README. Correct an obsolete statement that no backend touches another backend's PROCLOCK lists. This was probably wrong even when written (the deadlock checker looks at everybody's lists), and it's certainly quite wrong now that fast-path locking can require creation of lock and proclock objects on behalf of another backend. Also improve some statements in the hot standby explanation, and do one or two other trivial bits of wordsmithing/ reformatting.	2013-11-27 15:07:13 -05:00
Heikki Linnakangas	631118fe1e	Get rid of the post-recovery cleanup step of GIN page splits. Replace it with an approach similar to what GiST uses: when a page is split, the left sibling is marked with a flag indicating that the parent hasn't been updated yet. When the parent is updated, the flag is cleared. If an insertion steps on a page with the flag set, it will finish split before proceeding with the insertion. The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. This also divides the responsibility of WAL-logging more clearly between the generic ginbtree.c code, and the parts specific to entry and posting trees. There is now a common WAL record format for insertions and deletions, which is written by ginbtree.c, followed by tree-specific payload, which is returned by the placetopage- and split- callbacks.	2013-11-27 19:21:23 +02:00
Heikki Linnakangas	ce5326eed3	More GIN refactoring. Separate the insertion payload from the more static portions of GinBtree. GinBtree now only contains information related to searching the tree, and the information of what to insert is passed separately. Add root block number to GinBtree, instead of passing it around all the functions as argument. Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is responsible for finding the parent and inserting the downlink to it.	2013-11-27 15:43:05 +02:00
Heikki Linnakangas	82b43f7df2	Don't update relfrozenxid if any pages were skipped. Vacuum recognizes that it can update relfrozenxid by checking whether it has processed all pages of a relation. Unfortunately it performed that check after truncating the dead pages at the end of the relation, and used the new number of pages to decide whether all pages have been scanned. If the new number of pages happened to be smaller or equal to the number of pages scanned, it incorrectly decided that all pages were scanned. This can lead to relfrozenxid being updated, even though some pages were skipped that still contain old XIDs. That can lead to data loss due to xid wraparounds with some rows suddenly missing. This likely has escaped notice so far because it takes a large number (~2^31) of xids being used to see the effect, while a full-table vacuum before that would fix the issue. The incorrect logic was introduced by commit `b4b6923e03`. Backpatch this fix down to 8.4, like that commit. Andres Freund, with some modifications by me.	2013-11-27 13:43:27 +02:00
Peter Eisentraut	85ed91ee7d	Implement information_schema.parameters.parameter_default column Reviewed-by: Ali Dar <ali.munir.dar@gmail.com> Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com> Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>	2013-11-26 23:21:35 -05:00
Jeff Davis	7cc0ba9f17	Add missing entry for session_preload_libraries in sample config. The omission was apparently an oversight in the original patch.	2013-11-25 21:03:07 -08:00
Bruce Momjian	a6542a4b68	Change SET LOCAL/CONSTRAINTS/TRANSACTION and ABORT behavior Change SET LOCAL/CONSTRAINTS/TRANSACTION behavior outside of a transaction block from error (post-9.3) to warning. (Was nothing in <= 9.3.) Also change ABORT outside of a transaction block from notice to warning.	2013-11-25 19:19:40 -05:00
Jeff Davis	559d535819	Lessen library-loading log level. Previously, messages were emitted at the LOG level every time a backend preloaded a library. That was acceptable (though unnecessary) for shared_preload_libraries; but it was excessive for local_preload_libraries and session_preload_libraries. Reduce to DEBUG1. Also, there was logic in the EXEC_BACKEND case to avoid repeated messages for shared_preload_libraries by demoting them to DEBUG2. DEBUG1 seems more appropriate there, as well, so eliminate that special case. Peter Geoghegan.	2013-11-24 10:50:54 -08:00
Tom Lane	36a3be6540	Fix new and latent bugs with errno handling in secure_read/secure_write. These functions must be careful that they return the intended value of errno to their callers. There were several scenarios where this might not happen: 1. The recent SSL renegotiation patch added a hunk of code that would execute after setting errno. In the first place, it's doubtful that we should consider renegotiation to be successfully completed after a failure, and in the second, there's no real guarantee that the called OpenSSL routines wouldn't clobber errno. Fix by not executing that hunk except during success exit. 2. errno was left in an unknown state in case of an unrecognized return code from SSL_get_error(). While this is a "can't happen" case, it seems like a good idea to be sure we know what would happen, so reset errno to ECONNRESET in such cases. (The corresponding code in libpq's fe-secure.c already did this.) 3. There was an (undocumented) assumption that client_read_ended() wouldn't change errno. While true in the current state of the code, this seems less than future-proof. Add explicit saving/restoring of errno to make sure that changes in the called functions won't break things. I see no need to back-patch, since #1 is new code and the other two issues are mostly hypothetical. Per discussion with Amit Kapila.	2013-11-24 13:09:38 -05:00
Tom Lane	45e02e3232	Fix array slicing of int2vector and oidvector values. The previous coding labeled expressions such as pg_index.indkey[1:3] as being of int2vector type; which is not right because the subscript bounds of such a result don't, in general, satisfy the restrictions of int2vector. To fix, implicitly promote the result of slicing int2vector to int2[], or oidvector to oid[]. This is similar to what we've done with domains over arrays, which is a good analogy because these types are very much like restricted domains of the corresponding regular-array types. A side-effect is that we now also forbid array-element updates on such columns, eg while "update pg_index set indkey[4] = 42" would have worked before if you were superuser (and corrupted your catalogs irretrievably, no doubt) it's now disallowed. This seems like a good thing since, again, some choices of subscripting would've led to results not satisfying the restrictions of int2vector. The case of an array-slice update was rejected before, though with a different error message than you get now. We could make these cases work in future if we added a cast from int2[] to int2vector (with a cast function checking the subscript restrictions) but it seems unlikely that there's any value in that. Per report from Ronan Dunklau. Back-patch to all supported branches because of the crash risks involved.	2013-11-23 20:03:56 -05:00
Peter Eisentraut	b7212c9726	Fix thinko in SPI_execute_plan() calls Two call sites were apparently thinking that the last argument of SPI_execute_plan() is the number of query parameters, but it is actually the row limit. Change the calls to 0, since we don't care about the limit there. The previous code didn't break anything, but it was still wrong.	2013-11-23 09:34:57 -05:00
Peter Eisentraut	4053189d59	Avoid potential buffer overflow crash A pointer to a C string was treated as a pointer to a "name" datum and passed to SPI_execute_plan(). This pointer would then end up being passed through datumCopy(), which would try to copy the entire 64 bytes of name data, thus running past the end of the C string. Fix by converting the string to a proper name structure. Found by LLVM AddressSanitizer.	2013-11-23 07:25:37 -05:00
Tom Lane	f19e92ed04	Flatten join alias Vars before pulling up targetlist items from a subquery. pullup_replace_vars()'s decisions about whether a pulled-up replacement expression needs to be wrapped in a PlaceHolderVar depend on the assumption that what looks like a Var behaves like a Var. However, if the Var is a join alias reference, later flattening of join aliases might replace the Var with something that's not a Var at all, and should have been wrapped. To fix, do a forcible pass of flatten_join_alias_vars() on the subquery targetlist before we start to copy items out of it. We'll re-run that processing on the pulled-up expressions later, but that's harmless. Per report from Ken Tanzer; the added regression test case is based on his example. This bug has been there since the PlaceHolderVar mechanism was invented, but has escaped detection because the circumstances that trigger it are fairly narrow. You need a flattenable query underneath an outer join, which contains another flattenable query inside a join of its own, with a dangerous expression (a constant or something else non-strict) in that one's targetlist. Having seen this, I'm wondering if it wouldn't be prudent to do all alias-variable flattening earlier, perhaps even in the rewriter. But that would probably not be a back-patchable change.	2013-11-22 14:37:21 -05:00
Heikki Linnakangas	98f58a30c1	Fix Hot-Standby initialization of clog and subtrans. These bugs can cause data loss on standbys started with hot_standby=on at the moment they start to accept read only queries, by marking committed transactions as uncommited. The likelihood of such corruptions is small unless the primary has a high transaction rate. `5a031a5556` fixed bugs in HS's startup logic by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state was reached, missing the fact that both clog and subtrans are written to before that. This only failed to fail in common cases because the usage of ExtendCLOG in procarray.c was superflous since clog extensions are actually WAL logged. f44eedc3f0f347a856eea8590730769125964597/I then tried to fix the missing extensions of pg_subtrans due to the former commit's changes - which are not WAL logged - by performing the extensions when switching to a state > STANDBY_INITIALIZED and not performing xid assignments before that - again missing the fact that ExtendCLOG is unneccessary - but screwed up twice: Once because latestObservedXid wasn't updated anymore in that state due to the earlier commit and once by having an off-by-one error in the loop performing extensions. This means that whenever a CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed between the start of the checkpoint recovery started from and the first xl_running_xact record old transactions commit bits in pg_clog could be overwritten if they started and committed in that window. Fix this mess by not performing ExtendCLOG() in HS at all anymore since it's unneeded and evidently dangerous and by performing subtrans extensions even before reaching STANDBY_SNAPSHOT_PENDING. Analysis and patch by Andres Freund. Reported by Christophe Pettus. Backpatch down to 9.0, like the previous commit that caused this.	2013-11-22 14:45:41 +02:00
Heikki Linnakangas	1a3d104475	Avoid acquiring spinlock when checking if recovery has finished, for speed. RecoveryIsInProgress() can be called very frequently. During normal operation, it just checks a backend-local variable and returns quickly, but during hot standby, it checks a spinlock-protected shared variable. Those spinlock acquisitions can become a point of contention on a busy hot standby system. Replace the spinlock acquisition with a memory barrier. Per discussion with Andres Freund, Ants Aasma and Merlin Moncure.	2013-11-22 13:07:23 +02:00
Tom Lane	784e762e88	Support multi-argument UNNEST(), and TABLE() syntax for multiple functions. This patch adds the ability to write TABLE( function1(), function2(), ...) as a single FROM-clause entry. The result is the concatenation of the first row from each function, followed by the second row from each function, etc; with NULLs inserted if any function produces fewer rows than others. This is believed to be a much more useful behavior than what Postgres currently does with multiple SRFs in a SELECT list. This syntax also provides a reasonable way to combine use of column definition lists with WITH ORDINALITY: put the column definition list inside TABLE(), where it's clear that it doesn't control the ordinality column as well. Also implement SQL-compliant multiple-argument UNNEST(), by turning UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)). The SQL standard specifies TABLE() with only a single function, not multiple functions, and it seems to require an implicit UNNEST() which is not what this patch does. There may be something wrong with that reading of the spec, though, because if it's right then the spec's TABLE() is just a pointless alternative spelling of UNNEST(). After further review of that, we might choose to adopt a different syntax for what this patch does, but in any case this functionality seems clearly worthwhile. Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and significantly revised by me	2013-11-21 19:37:20 -05:00
Heikki Linnakangas	04eee1fa9e	More GIN refactoring. Split off the portion of ginInsertValue that inserts the tuple to current level into a separate function, ginPlaceToPage. ginInsertValue's charter is now to recurse up the tree to insert the downlink, when a page split is required. This is in preparation for a patch to change the way incomplete splits are handled, which will need to do these operations separately. And IMHO makes the code more readable anyway.	2013-11-20 17:01:33 +02:00
Heikki Linnakangas	501012631e	Refactor the internal GIN B-tree interface for forming a downlink. This creates a new gin-btree callback function for creating a downlink for a page. Previously, ginxlog.c duplicated the logic used during normal operation.	2013-11-20 16:57:41 +02:00
Heikki Linnakangas	04965ad40e	Further GIN refactoring. Merge some functions that were always called together. Makes the code little bit more readable.	2013-11-20 16:09:14 +02:00
Robert Haas	f1df4731ee	Use cstring_to_text_with_len when length is known. This avoids a potentially-expensive extra call to strlen(). David Rowley	2013-11-18 10:19:00 -05:00
Heikki Linnakangas	4c697d8f48	Count locked pages that don't need vacuuming as scanned. Previously, if VACUUM skipped vacuuming a page because it's pinned, it didn't count that page as scanned. However, that meant that relfrozenxid was not bumped up either, which prevented anti-wraparound vacuum from doing its job. Report by Миша Тюрин, analysis and patch by Sergey Burladyn and Jeff Janes. Backpatch to 9.2, where the skip-locked-pages behavior was introduced.	2013-11-18 09:51:09 +02:00
Tom Lane	f901bb50e3	Add make_date() and make_time() functions. Pavel Stehule, reviewed by Jeevan Chalke and Atri Sharma	2013-11-17 15:06:50 -05:00
Tom Lane	69c8fbac20	Improve performance of numeric sum(), avg(), stddev(), variance(), etc. This patch improves performance of most built-in aggregates that formerly used a NUMERIC or NUMERIC array as their transition type; this includes not only aggregates on numeric inputs, but some aggregates on integer inputs where overflow of an int8 value is a possibility. The code now uses a special-purpose data structure to avoid array construction and deconstruction overhead, as well as packing and unpacking overhead for numeric values. These aggregates' transition type is now declared as INTERNAL, since it doesn't correspond to any SQL data type. To keep the planner from thinking that that means a lot of storage will be used, we make use of the just-added pg_aggregate.aggtransspace feature. The space estimate is set to 128 bytes, which is at least in the right ballpark. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra	2013-11-16 18:46:34 -05:00
Tom Lane	6cb86143e8	Allow aggregates to provide estimates of their transition state data size. Formerly the planner had a hard-wired rule of thumb for guessing the amount of space consumed by an aggregate function's transition state data. This estimate is critical to deciding whether it's OK to use hash aggregation, and in many situations the built-in estimate isn't very good. This patch adds a column to pg_aggregate wherein a per-aggregate estimate can be provided, overriding the planner's default, and infrastructure for setting the column via CREATE AGGREGATE. It may be that additional smarts will be required in future, perhaps even a per-aggregate estimation function. But this is already a step forward. This is extracted from a larger patch to improve the performance of numeric and int8 aggregates. I (tgl) thought it was worth reviewing and committing this infrastructure separately. In this commit, all built-in aggregates are given aggtransspace = 0, so no behavior should change. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra	2013-11-16 16:03:40 -05:00
Tom Lane	f1f21b2d6f	Fix incorrect loop counts in tidbitmap.c. A couple of places that should have been iterating over WORDS_PER_CHUNK words were iterating over WORDS_PER_PAGE words instead. This thinko accidentally failed to fail, because (at least on common architectures with default BLCKSZ) WORDS_PER_CHUNK is a bit less than WORDS_PER_PAGE, and the extra words being looked at were always zero so nothing happened. Still, it's a bug waiting to happen if anybody ever fools with the parameters affecting TIDBitmap sizes, and it's a small waste of cycles too. So back-patch to all active branches. Etsuro Fujita	2013-11-15 18:34:14 -05:00
Tom Lane	f3b3b8d5be	Compute correct em_nullable_relids in get_eclass_for_sort_expr(). Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr must be able to compute valid em_nullable_relids for any new equivalence class members it creates. I'd worried about this in the commit message for `db9f0e1d9a`, but claimed that it wasn't a problem because multi-member ECs should already exist when it runs. That is transparently wrong, though, because this function is also called by initialize_mergeclause_eclasses, which runs during deconstruct_jointree. The example given in the bug report (which the new regression test item is based upon) fails because the COALESCE() expression is first seen by initialize_mergeclause_eclasses rather than process_equivalence. Fixing this requires passing the appropriate nullable_relids set to get_eclass_for_sort_expr, and it requires new code to compute that set for top-level expressions such as ORDER BY, GROUP BY, etc. We store the top-level nullable_relids in a new field in PlannerInfo to avoid computing it many times. In the back branches, I've added the new field at the end of the struct to minimize ABI breakage for planner plugins. There doesn't seem to be a good alternative to changing get_eclass_for_sort_expr's API signature, though. There probably aren't any third-party extensions calling that function directly; moreover, if there are, they probably need to think about what to pass for nullable_relids anyway. Back-patch to 9.2, like the previous patch in this area.	2013-11-15 16:46:18 -05:00
Tom Lane	80e3a470ba	Minor comment corrections for sequence hashtable patch. There were enough typos in the comments to annoy me ...	2013-11-15 12:17:12 -05:00
Heikki Linnakangas	5cb719beee	Fix bogus hash table creation. Andres Freund	2013-11-15 14:23:40 +02:00
Heikki Linnakangas	21025d4a53	Use a hash table to store current sequence values. This speeds up nextval() and currval(), when you touch a lot of different sequences in the same backend. David Rowley	2013-11-15 12:29:38 +02:00
Robert Haas	c46c803f8a	Fix relfilenodemap.c's handling of cache invalidations. The old code entered a new hash table entry first, then scanned pg_class to determine what value to fill in, and then populated the entry. This fails to work properly if a cache invalidation happens as a result of opening pg_class. Repair. Along the way, get rid of the idea of blowing away the entire hash table as a method of processing invalidations. Instead, just delete all the entries one by one. This is probably not quite as cheap but it's simpler, and shouldn't happen often. Andres Freund	2013-11-13 10:52:59 -05:00
Heikki Linnakangas	07fca603b5	Fix bug in GIN posting tree root creation. The root page is filled with as many items as fit, and the rest are inserted using normal insertions. However, I fumbled the variable names, and the code actually memcpy'd all the items on the page, overflowing the buffer. While at it, rename the variable to make the distinction more clear. Reported by Teodor Sigaev. This bug was introduced by my recent refactorings, so no backpatching required.	2013-11-13 13:47:59 +02:00
Peter Eisentraut	aa04b323c3	Move variable closer to where it is used This avoids an unused variable warning on Windows when building without asserts From: David Rowley <dgrowleyml@gmail.com>	2013-11-13 06:26:27 -05:00
Tom Lane	ebefbb5fde	Fix failure with whole-row reference to a subquery. Simple oversight in commit `1cb108efb0` --- recursively examining a subquery output column is only sane if the original Var refers to a single output column. Found by Kevin Grittner.	2013-11-11 16:36:27 -05:00
Tom Lane	0b7e660d6c	Fix ruleutils pretty-printing to not generate trailing whitespace. The pretty-printing logic in ruleutils.c operates by inserting a newline and some indentation whitespace into strings that are already valid SQL. This naturally results in leaving some trailing whitespace before the newline in many cases; which can be annoying when processing the output with other tools, as complained of by Joe Abbate. We can fix that in a pretty localized fashion by deleting any trailing whitespace before we append a pretty-printing newline. In addition, we have to modify the code inserted by commit `2f582f76b1` so that we also delete trailing whitespace when transposing items from temporary buffers into the main result string, when a temporary item starts with a newline. This results in rather voluminous changes to the regression test results, but it's easily verified that they are only removal of trailing whitespace. Back-patch to 9.3, because the aforementioned commit resulted in many more cases of trailing whitespace than had occurred in earlier branches.	2013-11-11 13:36:38 -05:00
Tom Lane	648bd05b13	Re-allow duplicate aliases within aliased JOINs. Although the SQL spec forbids duplicate table aliases, historically we've allowed queries like SELECT ... FROM tab1 x CROSS JOIN (tab2 x CROSS JOIN tab3 y) z on the grounds that the aliased join (z) hides the aliases within it, therefore there is no conflict between the two RTEs named "x". The LATERAL patch broke this, on the misguided basis that "x" could be ambiguous if tab3 were a LATERAL subquery. To avoid breaking existing queries, it's better to allow this situation and complain only if tab3 actually does contain an ambiguous reference. We need only remove the check that was throwing an error, because the column lookup code is already prepared to handle ambiguous references. Per bug #8444.	2013-11-11 10:42:57 -05:00
Peter Eisentraut	001e114b8d	Fix whitespace issues found by git diff --check, add gitattributes Set per file type attributes in .gitattributes to fine-tune whitespace checks. With the associated cleanups, the tree is now clean for git	2013-11-10 14:48:29 -05:00
Heikki Linnakangas	ac4ab97ec0	Fix race condition in GIN posting tree page deletion. If a page is deleted, and reused for something else, just as a search is following a rightlink to it from its left sibling, the search would continue scanning whatever the new contents of the page are. That could lead to incorrect query results, or even something more curious if the page is reused for a different kind of a page. To fix, modify the search algorithm to lock the next page before releasing the previous one, and refrain from deleting pages from the leftmost branch of the tree. Add a new Concurrency section to the README, explaining why this works. There is a lot more one could say about concurrency in GIN, but that's for another patch. Backpatch to all supported versions.	2013-11-08 22:21:42 +02:00
Robert Haas	07cacba983	Add the notion of REPLICA IDENTITY for a table. Pending patches for logical replication will use this to determine which columns of a tuple ought to be considered as its candidate key. Andres Freund, with minor, mostly cosmetic adjustments by me	2013-11-08 12:30:43 -05:00
Tom Lane	b97ee66cc1	Make contain_volatile_functions/contain_mutable_functions look into SubLinks. This change prevents us from doing inappropriate subquery flattening in cases such as dangerous functions hidden inside a sub-SELECT in the targetlist of another sub-SELECT. That could result in unexpected behavior due to multiple evaluations of a volatile function, as in a recent complaint from Etienne Dube. It's been questionable from the very beginning whether these functions should look into subqueries (as noted in their comments), and this case seems to provide proof that they should. Because the new code only descends into SubLinks, not SubPlans or InitPlans, the change only affects the planner's behavior during prepjointree processing and not later on --- for example, you can still get it to use a volatile function in an indexqual if you wrap the function in (SELECT ...). That's a historical behavior, for sure, but it's reasonable given that the executor's evaluation rules for subplans don't depend on whether there are volatile functions inside them. In any case, we need to constrain the behavioral change as narrowly as we can to make this reasonable to back-patch.	2013-11-08 11:36:57 -05:00
Tom Lane	060b22a99a	Fix subtly-wrong volatility checking in BeginCopyFrom(). contain_volatile_functions() is best applied to the output of expression_planner(), not its input, so that insertion of function default arguments and constant-folding have been done. (See comments at CheckMutability, for instance.) It's perhaps unlikely that anyone will notice a difference in practice, but still we should do it properly. In passing, change variable type from Node* to Expr* to reduce the net number of casts needed. Noted while perusing uses of contain_volatile_functions().	2013-11-08 08:59:39 -05:00
Tom Lane	20803d7881	Make LOCK_PRINT & PROCLOCK_PRINT expand to ((void) 0) when not in use. This avoids warnings from more-anal-than-average compilers, and might prevent hidden syntax problems in the future. Andres Freund	2013-11-07 19:07:48 -05:00
Kevin Grittner	b64b5ccb6a	Silence benign warnings from clang version 3.0-6ubuntu3.	2013-11-07 16:35:43 -06:00
Tom Lane	c28b289bf3	Prevent display of dropped columns in row constraint violation messages. ExecBuildSlotValueDescription() printed "null" for each dropped column in a row being complained of by ExecConstraints(). This has some sanity in terms of the underlying implementation, but is of course pretty surprising to users. To fix, we must pass the target relation's descriptor to ExecBuildSlotValueDescription(), because the slot descriptor it had been using doesn't get labeled with attisdropped markers. Per bug #8408 from Maxim Boguk. Back-patch to 9.2 where the feature of printing row values in NOT NULL and CHECK constraint violation messages was introduced. Michael Paquier and Tom Lane	2013-11-07 14:41:36 -05:00
Tom Lane	5e900bc00f	Fix generation of MergeAppend plans for optimized min/max on expressions. Before jamming a desired targetlist into a plan node, one really ought to make sure the plan node can handle projections, and insert a buffering Result plan node if not. planagg.c forgot to do this, which is a hangover from the days when it only dealt with IndexScan plan types. MergeAppend doesn't project though, not to mention that it gets unhappy if you remove its possibly-resjunk sort columns. The code accidentally failed to fail for cases in which the min/max argument was a simple Var, because the new targetlist would be equivalent to the original "flat" tlist anyway. For any more complex case, it's been broken since 9.1 where we introduced the ability to optimize min/max using MergeAppend, as reported by Raphael Bauduin. Fix by duplicating the logic from grouping_planner that decides whether we need a Result node. In 9.2 and 9.1, this requires back-porting the tlist_same_exprs() function introduced in commit `4387cf956b`, else we'd uselessly add a Result node in cases that worked before. It's rather tempting to back-patch that whole commit so that we can avoid extra Result nodes in mainline cases too; but I'll refrain, since that code hasn't really seen all that much field testing yet.	2013-11-07 13:14:14 -05:00
Heikki Linnakangas	fde7172d93	Fix setting of right bound at GIN page split. Broken by my refactoring.	2013-11-07 19:45:07 +02:00
Tom Lane	8dace66e07	Add #ifdef guards for some POSIX error symbols that Windows doesn't like. Per buildfarm results. It looks like the older the Windows version, the more errno codes it hasn't got ...	2013-11-06 20:22:42 -05:00
Tom Lane	8e68816cc2	Be more robust when strerror() doesn't give a useful result. glibc, at least, is capable of returning "???" instead of anything useful if it doesn't like the setting of LC_CTYPE. If this happens, or in the previously-known case of strerror() returning an empty string, try to print the C macro name for the error code ("EACCES" etc). Only if we don't have the error code in our compiled-in list of popular error codes (which covers most though not quite all of what's called out in the POSIX spec) will we fall back to printing a numeric error code. This should simplify debugging. Note that this functionality is currently only provided for %m in backend ereport/elog messages. That may be sufficient, since we don't fool with the locale environment in frontend clients, but it's foreseeable that we might want similar code in libpq for instance. There was some talk of back-patching this, but let's see how the buildfarm likes it first. It seems likely that at least some of the POSIX-defined error code symbols don't exist on all platforms. I don't want to clutter the entire list with #ifdefs, but we may need more than are here now. MauMau, edited by me	2013-11-06 15:50:17 -05:00
Tom Lane	bb45c64041	Support default arguments and named-argument notation for window functions. These things didn't work because the planner omitted to do the necessary preprocessing of a WindowFunc's argument list. Add the few dozen lines of code needed to handle that. Although this sounds like a feature addition, it's really a bug fix because the default-argument case was likely to crash previously, due to lack of checking of the number of supplied arguments in the built-in window functions. It's not a security issue because there's no way for a non-superuser to create a window function definition with defaults that refers to a built-in C function, but nonetheless people might be annoyed that it crashes rather than producing a useful error message. So back-patch as far as the patch applies easily, which turns out to be 9.2. I'll put a band-aid in earlier versions as a separate patch. (Note that these features still don't work for aggregates, and fixing that case will be harder since we represent aggregate arg lists as target lists not bare expression lists. There's no crash risk though because CREATE AGGREGATE doesn't accept defaults, and we reject named-argument notation when parsing an aggregate call.)	2013-11-06 13:33:09 -05:00
Kevin Grittner	5829082a57	Keep heap open until new heap generated in RMV. Early close became apparent when invalidation messages were processed in a new location under CLOBBER_CACHE_ALWAYS builds, due to additional locking. Back-patch to 9.3	2013-11-06 12:27:52 -06:00
Heikki Linnakangas	0ea53256a8	Fix missing argument and function prototypes. Not sure how I missed these in previous commit.	2013-11-06 11:22:58 +02:00
Heikki Linnakangas	ecaa4708e5	Misc GIN refactoring. Merge the isEnoughSpace and placeToPage functions in the b-tree interface into one function that tries to put a tuple on page, and returns false if it doesn't fit. Move createPostingTree function to gindatapage.c, and change its contract so that it can be passed more items than fit on the root page. It's in a better position than the callers to know how many items fit. Move ginMergeItemPointers out of gindatapage.c, into a separate file. These changes make no difference now, but reduce the footprint of Alexander Korotkov's upcoming patch to pack item pointers more tightly.	2013-11-06 10:32:09 +02:00
Tom Lane	920c8261d5	Improve the error message given for modifying a window with frame clause. For rather inscrutable reasons, SQL:2008 disallows copying-and-modifying a window definition that has any explicit framing clause. The error message we gave for this only made sense if the referencing window definition itself contains an explicit framing clause, which it might well not. Moreover, in the context of an OVER clause it's not exactly obvious that "OVER (windowname)" implies copy-and-modify while "OVER windowname" does not. This has led to multiple complaints, eg bug #5199 from Iliya Krapchatov. Change to a hopefully more intelligible error message, and in the case where we have just "OVER (windowname)", add a HINT suggesting that omitting the parentheses will fix it. Also improve the related documentation. Back-patch to all supported branches.	2013-11-05 21:58:08 -05:00
Kevin Grittner	2636ecf78b	Lock relation used to generate fresh data for RMV. The relation should not be accessible to any other process, but it should be locked for consistency. Since this is not known to cause any bug, it will not be back-patch, at least for now. Per report from Andres Freund	2013-11-05 15:36:33 -06:00
Tom Lane	6331de1d44	Fix some obsolete information in src/backend/optimizer/README. Constant quals aren't handled the same way they used to be. Also, add mention of a couple more major steps in grouping_planner. Per complaint a couple months back from Etsuro Fujita.	2013-11-05 11:31:35 -05:00
Kevin Grittner	732758db4c	Fix breakage of MV column name list usage. Per bug report from Tomonari Katsumata. Back-patch to 9.3.	2013-11-04 14:31:07 -06:00
Robert Haas	dddc34408a	Fix format code used to print dsm request sizes. Per report from Peter Eisentraut.	2013-11-04 11:22:03 -05:00
Tom Lane	e36ce0c7f7	Get rid of more cases of the "must detoast before output function" meme. I missed that json.c was doing this too, because for some bizarre reason it wasn't doing it adjacent to the output function call.	2013-11-03 11:55:37 -05:00
Tom Lane	b006f4ddb9	Prevent memory leaks from accumulating across printtup() calls. Historically, printtup() has assumed that it could prevent memory leakage by pfree'ing the string result of each output function and manually managing detoasting of toasted values. This amounts to assuming that datatype output functions never leak any memory internally; an assumption we've already decided to be bogus elsewhere, for example in COPY OUT. range_out in particular is known to leak multiple kilobytes per call, as noted in bug #8573 from Godfried Vanluffelen. While we could go in and fix that leak, it wouldn't be very notationally convenient, and in any case there have been and undoubtedly will again be other leaks in other output functions. So what seems like the best solution is to run the output functions in a temporary memory context that can be reset after each row, as we're doing in COPY OUT. Some quick experimentation suggests this is actually a tad faster than the retail pfree's anyway. This patch fixes all the variants of printtup, except for debugtup() which is used in standalone mode. It doesn't seem worth worrying about query-lifespan leaks in standalone mode, and fixing that case would be a bit tedious since debugtup() doesn't currently have any startup or shutdown functions. While at it, remove manual detoast management from several other output-function call sites that had copied it from printtup(). This doesn't make a lot of difference right now, but in view of recent discussions about supporting "non-flattened" Datums, we're going to want that code gone eventually anyway. Back-patch to 9.2 where range_out was introduced. We might eventually decide to back-patch this further, but in the absence of known major leaks in older output functions, I'll refrain for now.	2013-11-03 11:33:05 -05:00
Kevin Grittner	2a781d57dc	Acquire appropriate locks when rewriting during RMV. Since the query has not been freshly parsed when executing REFRESH MATERIALIZED VIEW, locks must be explicitly taken before rewrite. Backpatch to 9.3. Andres Freund	2013-11-02 19:18:08 -05:00
Kevin Grittner	be420fa02e	Fix subquery reference to non-populated MV in CMV. A subquery reference to a matview should be allowed by CREATE MATERIALIZED VIEW WITH NO DATA, just like a direct reference is. Per bug report from Laurent Sartran. Backpatch to 9.3.	2013-11-02 18:38:17 -05:00
Tom Lane	24ace4053d	Retry after buffer locking failure during SPGiST index creation. The original coding thought this case was impossible, but it can happen if the bgwriter or checkpointer processes decide to write out an index page while creation is still proceeding, leading to a bogus "unexpected spgdoinsert() failure" error. Problem reported by Jonathan S. Katz. Teodor Sigaev	2013-11-02 16:45:42 -04:00
Tom Lane	bffd1ce92c	Ensure all files created for a single BufFile have the same resource owner. Callers expect that they only have to set the right resource owner when creating a BufFile, not during subsequent operations on it. While we could insist this be fixed at the caller level, it seems more sensible for the BufFile to take care of it. Without this, some temp files belonging to a BufFile can go away too soon, eg at the end of a subtransaction, leading to errors or crashes. Reported and fixed by Andres Freund. Back-patch to all active branches.	2013-11-01 16:09:48 -04:00
Tom Lane	45f64f1bbf	Remove CTimeZone/HasCTZSet, root and branch. These variables no longer have any useful purpose, since there's no reason to special-case brute force timezones now that we have a valid session_timezone setting for them. Remove the variables, and remove the SET/SHOW TIME ZONE code that deals with them. The user-visible impact of this is that SHOW TIME ZONE will now show a POSIX-style zone specification, in the form "<+-offset>-+offset", rather than an interval value when a brute-force zone has been set. While perhaps less intuitive, this is a better definition than before because it's actually possible to give that string back to SET TIME ZONE and get the same behavior, unlike what used to happen. We did not previously mention the angle-bracket syntax when describing POSIX timezone specifications; add some documentation so that people can figure out what these strings do. (There's still quite a lot of undocumented functionality there, but anybody who really cares can go read the POSIX spec to find out about it. In practice most people seem to prefer Olsen-style city names anyway.)	2013-11-01 13:57:31 -04:00
Tom Lane	1c8a7f617f	Remove internal uses of CTimeZone/HasCTZSet. The only remaining places where we actually look at CTimeZone/HasCTZSet are abstime2tm() and timestamp2tm(). Now that session_timezone is always valid, we can remove these special cases. The caller-visible impact of this is that these functions now always return a valid zone abbreviation if requested, whereas before they'd return a NULL pointer if a brute-force timezone was in use. In the existing code, the only place I can find that changes behavior is to_char(), whose TZ format code will now print something useful rather than nothing for such zones. (In the places where the returned zone abbreviation is passed to EncodeDateTime, the lack of visible change is because we've chosen the abbreviation used for these zones to match what EncodeTimezone would have printed.) It's likely that there is now a fair amount of removable dead code around the call sites, namely anything that's meant to cope with getting a NULL timezone abbreviation, but I've not made an effort to root that out. This could be back-patched if we decide we'd like to fix to_char()'s behavior in the back branches, but there doesn't seem to be much enthusiasm for that at present.	2013-11-01 12:51:27 -04:00
Tom Lane	631dc390f4	Fix some odd behaviors when using a SQL-style simple GMT offset timezone. Formerly, when using a SQL-spec timezone setting with a fixed GMT offset (called a "brute force" timezone in the code), the session_timezone variable was not updated to match the nominal timezone; rather, all code was expected to ignore session_timezone if HasCTZSet was true. This is of course obviously fragile, though a search of the code finds only timeofday() failing to honor the rule. A bigger problem was that DetermineTimeZoneOffset() supposed that if its pg_tz parameter was pointer-equal to session_timezone, then HasCTZSet should override the parameter. This would cause datetime input containing an explicit zone name to be treated as referencing the brute-force zone instead, if the zone name happened to match the session timezone that had prevailed before installing the brute-force zone setting (as reported in bug #8572). The same malady could affect AT TIME ZONE operators. To fix, set up session_timezone so that it matches the brute-force zone specification, which we can do using the POSIX timezone definition syntax "<abbrev>offset", and get rid of the bogus lookaside check in DetermineTimeZoneOffset(). Aside from fixing the erroneous behavior in datetime parsing and AT TIME ZONE, this will cause the timeofday() function to print its result in the user-requested time zone rather than some previously-set zone. It might also affect results in third-party extensions, if there are any that make use of session_timezone without considering HasCTZSet, but in all cases the new behavior should be saner than before. Back-patch to all supported branches.	2013-11-01 12:13:18 -04:00
Robert Haas	cacbdd7810	Use appendStringInfoString instead of appendStringInfo where possible. This shaves a few cycles, and generally seems like good programming practice. David Rowley	2013-10-31 10:55:59 -04:00
Robert Haas	343bb134ea	Avoid too-large shift on 32-bit Windows. Apparently, shifts greater than or equal to the width of the type are undefined, and can surprisingly produce a non-zero value. Amit Kapila, with a comment by me.	2013-10-30 09:14:56 -04:00
Tom Lane	9a9473f3cc	Prevent using strncpy with src == dest in TupleDescInitEntry. The C and POSIX standards state that strncpy's behavior is undefined when source and destination areas overlap. While it remains dubious whether any implementations really misbehave when the pointers are exactly equal, some platforms are now starting to force the issue by complaining when an undefined call occurs. (In particular OS X 10.9 has been seen to dump core here, though the exact set of circumstances needed to trigger that remain elusive. Similar behavior can be expected to be optional on Linux and other platforms in the near future.) So tweak the code to explicitly do nothing when nothing need be done. Back-patch to all active branches. In HEAD, this also lets us get rid of an exception in valgrind.supp. Per discussion of a report from Matthias Schmitt.	2013-10-28 20:49:24 -04:00
Robert Haas	d2aecaea15	Modify dynamic shared memory code to use Size rather than uint64. This is more consistent with what we do elsewhere.	2013-10-28 12:12:06 -04:00
Tom Lane	c2b51cf190	Improve documentation about usage of FDW validator functions. SGML documentation, as well as code comments, failed to note that an FDW's validator will be applied to foreign-table options for foreign tables using the FDW. Etsuro Fujita	2013-10-28 10:28:35 -04:00
Noah Misch	c50b7c09d8	Add large object functions catering to SQL callers. With these, one need no longer manipulate large object descriptors and extract numeric constants from header files in order to read and write large object contents from SQL. Pavel Stehule, reviewed by Rushabh Lathia.	2013-10-27 22:56:54 -04:00
Tom Lane	43fe90f66a	Suppress -0 in the C field of lines computed by line_construct_pts(). It's not entirely clear why some PPC machines are generating -0 here, since the underlying computation should be exactly 0 - 0. Perhaps there's some wider-than-nominal-precision calculations happening? Anyway, the best way to avoid platform-dependent results seems to be to explicitly reset -0 to regular zero.	2013-10-25 15:55:15 -04:00
Tom Lane	3147acd63e	Use improved vsnprintf calling logic in more places. When we are using a C99-compliant vsnprintf implementation (which should be most places, these days) it is worth the trouble to make use of its report of how large the buffer needs to be to succeed. This patch adjusts stringinfo.c and some miscellaneous usages in pg_dump to do that, relying on the logic recently added in libpgcommon's psprintf.c. Since these places want to know the number of bytes written once we succeed, modify the API of pvsnprintf() to report that. There remains near-duplicate logic in pqexpbuffer.c, but since that code is in libpq, psprintf.c's approach of exit()-on-error isn't appropriate for use there. Also note that I didn't bother touching the multitude of places that call (v)snprintf without any attempt to provide a resizable buffer. Release-note-worthy incompatibility: the API of appendStringInfoVA() changed. If there's any third-party code that's calling that directly, it will need tweaking along the same lines as in this patch. David Rowley and Tom Lane	2013-10-24 21:43:57 -04:00
Heikki Linnakangas	98c50656ca	Increase the number of different values used when seeding random(). When a backend process is forked, we initialize the system's random number generator with srandom(). The seed used is derived from the backend's pid and the timestamp. However, we only used the microseconds part of the timestamp, and it was XORed with the pid, so the total range of different seed values chosen was 0-999999. That's quite limited. Change the code to also use the seconds part of the timestamp in the seed, and shift the microseconds so that all 32 bits of the seed are used. Honza Horak	2013-10-24 17:00:18 +03:00
Heikki Linnakangas	138184adc5	Plug memory leak when reloading config file. The absolute path to config file was not pfreed. There are probably more small leaks here and there in the config file reload code and assign hooks, and in practice no-one reloads the config files frequently enough for it to be a problem, but this one is trivial enough that might as well fix it. Backpatch to 9.3 where the leak was introduced.	2013-10-24 15:27:40 +03:00
Heikki Linnakangas	bb598456dc	Fix memory leak when an empty ident file is reloaded. Hari Babu	2013-10-24 14:03:26 +03:00
Heikki Linnakangas	4d6d425ab8	Fix typos in comments.	2013-10-24 11:50:02 +03:00
Heikki Linnakangas	83eb54001c	Fix two bugs in setting the vm bit of empty pages. Use a critical section when setting the all-visible flag on an empty page, and WAL-logging it. log_newpage_buffer() contains an assertion that it must be called inside a critical section, and it's the right thing to do when modifying a buffer anyway. Also, the page should be marked dirty before calling log_newpage_buffer(), per the comment in log_newpage_buffer() and src/backend/access/transam/README. Patch by Andres Freund, in response to my report. Backpatch to 9.2, like the patch that introduced these bugs (`a6370fd9`).	2013-10-23 14:24:37 +03:00
Tom Lane	5f1ab46101	Suppress a couple of compiler warnings seen with older gcc versions. To wit, bgworker.c: In function `RegisterDynamicBackgroundWorker': bgworker.c:761: warning: `generation' might be used uninitialized in this function dsm_impl.c: In function `dsm_impl_op': dsm_impl.c:197: warning: control reaches end of non-void function Neither of these represent actual bugs, but we may as well tweak the code so that more compilers can tell that. This won't change the generated code on compilers that do recognize that the cases are unreachable.	2013-10-22 21:31:57 -04:00
Tom Lane	2c66f9924c	Replace pg_asprintf() with psprintf(). This eliminates an awkward coding pattern that's also unnecessarily inconsistent with backend coding. psprintf() is now the thing to use everywhere.	2013-10-22 19:40:26 -04:00
Tom Lane	09a89cb5fc	Get rid of use of asprintf() in favor of a more portable implementation. asprintf(), aside from not being particularly portable, has a fundamentally badly-designed API; the psprintf() function that was added in passing in the previous patch has a much better API choice. Moreover, the NetBSD implementation that was borrowed for the previous patch doesn't work with non-C99-compliant vsnprintf, which is something we still have to cope with on some platforms; and it depends on va_copy which isn't all that portable either. Get rid of that code in favor of an implementation similar to what we've used for many years in stringinfo.c. Also, move it into libpgcommon since it's not really libpgport material. I think this patch will be enough to turn the buildfarm green again, but there's still cosmetic work left to do, namely get rid of pg_asprintf() in favor of using psprintf(). That will come in a followon patch.	2013-10-22 18:42:13 -04:00
Peter Eisentraut	586a8fc75b	Make use of psprintf() in recent changes	2013-10-22 07:04:41 -04:00
Tom Lane	2885881147	Fix blatantly broken record_image_cmp() logic for pass-by-value fields. Doesn't anybody here pay attention to compiler warnings?	2013-10-22 00:38:53 -04:00
Noah Misch	709170b790	Consistently use unsigned arithmetic for alignment calculations. This avoids an assumption about the signed number representation. It is anticipated to have no functional changes on supported configurations; many two's complement assumptions remain elsewhere. Per a suggestion from Andres Freund.	2013-10-20 21:04:52 -04:00
Peter Eisentraut	713a9f210d	Add libpgcommon to backend gettext source files This ought to have been done when libpgcommon was split off from libpgport.	2013-10-19 13:49:05 -04:00
Robert Haas	cab5dc5daf	Allow only some columns of a view to be auto-updateable. Previously, unless all columns were auto-updateable, we wouldn't inserts, updates, or deletes, or at least not without a rule or trigger; now, we'll allow inserts and updates that target only the auto-updateable columns, and deletes even if there are no auto-updateable columns at all provided the view definition is otherwise suitable. Dean Rasheed, reviewed by Marko Tiikkaja	2013-10-18 10:35:36 -04:00
Robert Haas	523beaa11b	Provide a reliable mechanism for terminating a background worker. Although previously-introduced APIs allow the process that registers a background worker to obtain the worker's PID, there's no way to prevent a worker that is not currently running from being restarted. This patch introduces a new API TerminateBackgroundWorker() that prevents the background worker from being restarted, terminates it if it is currently running, and causes it to be unregistered if or when it is not running. Patch by me. Review by Michael Paquier and KaiGai Kohei.	2013-10-18 10:23:11 -04:00
Robert Haas	ea91a6be89	Remove IRIX port. Development of IRIX has been discontinued, and support is scheduled to end in December of 2013. Therefore, there will be no supported versions of this operating system by the time PostgreSQL 9.4 is released. Furthermore, we have no maintainer for this platform.	2013-10-18 08:14:21 -04:00
Robert Haas	81051a86bc	Remove spinlock support for SINIX, Sun3, and NS32K. All of these platforms are very much obsolete. As far as I can determine, the last version of SINIX, later renamed Reliant, occurred some time between 2002 and 2005. The last release of SunOS that would run on a sun3 was released in November of 1991; the last release of OpenBSD which supported that platform was in 2001. The highest clock speed of any processor in the family was 25MHz. The NS32K (national semiconductor 320xx) architecture was retired in 1990. Support can be re-added if a maintainer emerges for any of these platforms, but it seems unlikely. Reviewed by Andres Freund.	2013-10-17 12:02:05 -04:00
Alvaro Herrera	86029b31e5	Silence compiler warning when SSL not in use Per Jaime Casanova and Vik Fearing	2013-10-17 11:28:50 -03:00
Bruce Momjian	7778ddc7a2	Allow 5+ digit years for non-ISO timestamp/date strings, where appropriate Report from Haribabu Kommi	2013-10-16 13:22:55 -04:00
Robert Haas	e515861367	In dsm_impl_windows, don't error out when the segment already exists. This is the behavior of the other implementations, and the behavior expected by the callers of this function. Amit Kapila	2013-10-14 11:48:49 -04:00
Robert Haas	05a0283e7a	Fix details missed by dynamic shared memory patch. Additional documentation update, and a comment fix. Both issues reported by Amit Kapila.	2013-10-14 08:00:26 -04:00
Peter Eisentraut	5b6d08cd29	Add use of asprintf() Add asprintf(), pg_asprintf(), and psprintf() to simplify string allocation and composition. Replacement implementations taken from NetBSD. Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Asif Naeem <anaeem.it@gmail.com>	2013-10-13 00:09:18 -04:00
Kevin Grittner	4cbb646334	Fix several possibly non-portable gaffs in record_image_ops. Sparc machines in the buildfarm were made happy by the previous fix, but PowerPC machines still are still failing. Hopefully this will cure that.	2013-10-11 13:02:52 -05:00
Alvaro Herrera	ada01014d4	Use $(PERL) to invoke duplicate_oids Per buildfarm failure reported by smilodon	2013-10-10 23:45:38 -03:00
Alvaro Herrera	31cf1a1a43	Rework SSL renegotiation code The existing renegotiation code was home for several bugs: it might erroneously report that renegotiation had failed; it might try to execute another renegotiation while the previous one was pending; it failed to terminate the connection if the renegotiation never actually took place; if a renegotiation was started, the byte count was reset, even if the renegotiation wasn't completed (this isn't good from a security perspective because it means continuing to use a session that should be considered compromised due to volume of data transferred.) The new code is structured to avoid these pitfalls: renegotiation is started a little earlier than the limit has expired; the handshake sequence is retried until it has actually returned successfully, and no more than that, but if it fails too many times, the connection is closed. The byte count is reset only when the renegotiation has succeeded, and if the renegotiation byte count limit expires, the connection is terminated. This commit only touches the master branch, because some of the changes are controversial. If everything goes well, a back-patch might be considered. Per discussion started by message 20130710212017.GB4941@eldon.alvh.no-ip.org	2013-10-10 23:45:20 -03:00
Peter Eisentraut	5dd41f3574	Remove maintainer-check target, fold into normal build make maintainer-check was obscure and rarely called in practice, and many breakages were missed. Fold everything that make maintainer-check used to do into the normal build. Specifically: - Call duplicate_oids when genbki.pl is called. - Check for tabs in SGML files when the documentation is built. - Run msgfmt with the -c option during the regular build. Add an additional configure check to see whether we are using the GNU version. (make maintainer-check probably used to fail with non-GNU msgfmt.) Keep maintainer-check as around as phony target for the time being in case anyone is calling it. But it won't do anything anymore.	2013-10-10 20:11:56 -04:00
Kevin Grittner	15e46fd1dd	Fix bug in record_image_ops on big endian machines. The buildfarm pointed out the problem. Fix based on suggestion by Robert Haas.	2013-10-10 11:25:30 -05:00
Andrew Dunstan	4d212bac17	json_typeof function. Andrew Tipton.	2013-10-10 12:21:59 -04:00
Robert Haas	4b7b9a7904	Fix incorrect use of shm_unlink where unlink should be used. Per buildfarm.	2013-10-10 10:57:10 -04:00
Peter Eisentraut	261c7d4b65	Revive line type Change the input/output format to {A,B,C}, to match the internal representation. Complete the implementations of line_in, line_out, line_recv, line_send. Remove comments and error messages about the line type not being implemented. Add regression tests for existing line operators and functions. Reviewed-by: rui hua <365507506hua@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>	2013-10-09 22:34:38 -04:00
Robert Haas	0ac5e5a7e1	Allow dynamic allocation of shared memory segments. Patch by myself and Amit Kapila. Design help from Noah Misch. Review by Andres Freund.	2013-10-09 21:05:02 -04:00
Kevin Grittner	f566515192	Add record_image_ops opclass for matview concurrent refresh. REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview containing a column of a type without a default btree operator class. It also did not produce results consistent with a non- concurrent REFRESH or a normal view if any column was of a type which allowed user-visible differences between values which compared as equal according to the type's default btree opclass. Concurrent matview refresh was modified to use the new operators to solve these problems. Documentation was added for record comparison, both for the default btree operator class for record, and the newly added operators. Regression tests now check for proper behavior both for a matview with a box column and a matview containing a citext column. Reviewed by Steve Singer, who suggested some of the doc language.	2013-10-09 14:26:09 -05:00
Bruce Momjian	0c6b675076	Centralize effective_cache_size default setting	2013-10-09 08:33:12 -04:00
Bruce Momjian	96dfa6ec0d	Adjust the effective_cache_size default for standalone backends	2013-10-08 23:53:39 -04:00
Bruce Momjian	6b82f78ff9	Again move function where we set effective_cache_size's default	2013-10-08 23:12:45 -04:00
Bruce Momjian	cbafd6618a	Move new effective_cache_size function Previously set_default_effective_cache_size() could not handle fork, non-fork, and bootstrap cases.	2013-10-08 22:41:23 -04:00
Bruce Momjian	bf46524b31	Fix C comment in check_effective_cache_size()	2013-10-08 19:25:26 -04:00
Bruce Momjian	6648775028	Update postgres.conf.sample for effective_cache_size's new default	2013-10-08 12:50:05 -04:00
Bruce Momjian	ee1e5662d8	Auto-tune effective_cache size to be 4x shared buffers	2013-10-08 12:12:24 -04:00
Heikki Linnakangas	5962519b36	TYPEALIGN doesn't work on int64 on 32-bit platforms. The TYPEALIGN macro, and the related ones like MAXALIGN, don't work with values larger than intptr_t, because TYPEALIGN casts the argument to intptr_t to do the arithmetic. That's not a problem when dealing with pointers or lengths or offsets related to pointers, but the XLogInsert scaling patch added a call to MAXALIGN with an XLogRecPtr argument. To fix, add wider variants of the macros, called TYPEALIGN64 and MAXALIGN64, which are just like the existing variants but work with uint64 instead of intptr_t. Report and patch by David Rowley, analysis by Andres Freund.	2013-10-08 01:59:57 +03:00
Heikki Linnakangas	81fbbfe335	Fix bugs in SSI tuple locking. 1. In heap_hot_search_buffer(), the PredicateLockTuple() call is passed wrong offset number. heapTuple->t_self is set to the tid of the first tuple in the chain that's visited, not the one actually being read. 2. CheckForSerializableConflictIn() uses the tuple's t_ctid field instead of t_self to check for exiting predicate locks on the tuple. If the tuple was updated, but the updater rolled back, t_ctid points to the aborted dead tuple. Reported by Hannu Krosing. Backpatch to 9.1.	2013-10-08 00:18:43 +03:00
Peter Eisentraut	0b109c822b	Translation updates	2013-10-07 16:51:52 -04:00
Robert Haas	16a906f535	Make DISCARD SEQUENCES also discard the last used sequence. Otherwise, we access already-freed memory. Oops. Report by Michael Paquier. Fix by me.	2013-10-07 15:55:56 -04:00
Kevin Grittner	c01262a824	Eliminate xmin from hash tag for predicate locks on heap tuples. If a tuple was frozen while its predicate locks mattered, read-write dependencies could be missed, resulting in failure to detect conflicts which could lead to anomalies in committed serializable transactions. This field was added to the tag when we still thought that it was necessary to carry locks forward to a new version of an updated row. That was later proven to be unnecessary, which allowed simplification of the code, but elimination of xmin from the tag was missed at the time. Per report and analysis by Heikki Linnakangas. Backpatch to 9.1.	2013-10-07 14:16:54 -05:00
Alvaro Herrera	bf2617981c	Fix various bugs in postmaster SIGKILL processing Clamp the minimum sleep time during immediate shutdown or crash to a minimum of zero, not a maximum of one second. The previous code could result in a negative sleep time, leading to failure in select() calls. Also, on crash recovery, reset AbortStartTime as soon as SIGKILL is sent or abort processing has commenced instead of waiting until the startup process completes. Reset AbortStartTime as soon as SIGKILL is sent, too, to avoid doing that repeatedly. Per trouble report from Jeff Janes on CAMkU=1xd3=wFqZwwuXPWe4BQs3h1seYo8LV9JtSjW5RodoPxMg@mail.gmail.com Author: MauMau	2013-10-05 23:52:04 -03:00
Bruce Momjian	a54141aebc	Issue error on SET outside transaction block in some cases Issue error for SET LOCAL/CONSTRAINTS/TRANSACTION outside a transaction block, as they have no effect. Per suggestion from Morten Hustveit	2013-10-04 13:50:28 -04:00
Robert Haas	0f1ef79095	Fix silly thinko in ResetSequenceCaches. Report from Kevin Hale Boyes.	2013-10-03 20:17:51 -04:00
Robert Haas	d90ced8bb2	Add DISCARD SEQUENCES command. DISCARD ALL will now discard cached sequence information, as well. Fabrízio de Royes Mello, reviewed by Zoltán Böszörményi, with some further tweaks by me.	2013-10-03 16:23:31 -04:00
Heikki Linnakangas	c2b175b948	Minor GIN code refactoring. It makes for cleaner code to have separate Get/Add functions for PostingItems and ItemPointers. A few callsites that have to deal with both types need to be duplicated because of this, but all the callers have to know which one they're dealing with anyway. Overall, this reduces the amount of casting required. Extracted from Alexander Korotkov's larger patch to change the data page format.	2013-10-03 11:51:31 +03:00
Bruce Momjian	d50f281210	Adjust C comments that would be wrap-able.	2013-10-01 19:45:01 -04:00
Alvaro Herrera	15732b34e8	Add WaitForLockers in lmgr, refactoring index.c code This is in support of a future REINDEX CONCURRENTLY feature. Michael Paquier	2013-10-01 17:57:01 -03:00
Heikki Linnakangas	ee01d848f3	In bms_add_member(), use repalloc() if the bms needs to be enlarged. Previously bms_add_member() would palloc a whole-new copy of the existing set, copy the words, and pfree the old one. repalloc() is potentially much faster, and more importantly, this is less surprising if CurrentMemoryContext is not the same as the context the old set is in. bms_add_member() still allocates a new bitmapset in CurrentMemoryContext if NULL is passed as argument, but that is a lot less likely to induce bugs. Nicholas White.	2013-09-30 16:54:03 +03:00
Heikki Linnakangas	357f752138	Fix snapshot leak if lo_open called on non-existent object. lo_open registers the currently active snapshot, and checks if the large object exists after that. Normally, snapshots registered by lo_open are unregistered at end of transaction when the lo descriptor is closed, but if we error out before the lo descriptor is added to the list of open descriptors, it is leaked. Fix by moving the snapshot registration to after checking if the large object exists. Reported by Pavel Stehule. Backpatch to 8.4. The snapshot registration system was introduced in 8.4, so prior versions are not affected (and not supported, anyway).	2013-09-30 12:53:14 +03:00
Robert Haas	4334639f4b	Allow printf-style padding specifications in log_line_prefix. David Rowley, after a suggestion from Heikki Linnakangas. Reviewed by Albe Laurenz, and further edited by me.	2013-09-26 17:56:31 -04:00
Heikki Linnakangas	adaba2751f	Fix spurious warning after vacuuming a page on a table with no indexes. There is a rare race condition, when a transaction that inserted a tuple aborts while vacuum is processing the page containing the inserted tuple. Vacuum prunes the page first, which normally removes any dead tuples, but if the inserting transaction aborts right after that, the loop after pruning will see a dead tuple and remove it instead. That's OK, but if the page is on a table with no indexes, and the page becomes completely empty after removing the dead tuple (or tuples) on it, it will be immediately marked as all-visible. That's OK, but the sanity check in vacuum would throw a warning because it thinks that the page contains dead tuples and was nevertheless marked as all-visible, even though it just vacuumed away the dead tuples and so it doesn't actually contain any. Spotted this while reading the code. It's difficult to hit the race condition otherwise, but can be done by putting a breakpoint after the heap_page_prune() call. Backpatch all the way to 8.4, where this code first appeared.	2013-09-26 11:31:53 +03:00
Heikki Linnakangas	77ae7f7c35	Plug memory leak in range_cmp function. B-tree operators are not allowed to leak memory into the current memory context. Range_cmp leaked detoasted copies of the arguments. That caused a quick out-of-memory error when creating an index on a range column. Reported by Marian Krucina, bug #8468.	2013-09-25 16:02:00 +03:00
Alvaro Herrera	b2fc4d6142	Fix pgindent comment breakage	2013-09-24 18:20:37 -03:00
Robert Haas	ba3d39c969	Don't allow system columns in CHECK constraints, except tableoid. Previously, arbitray system columns could be mentioned in table constraints, but they were not correctly checked at runtime, because the values weren't actually set correctly in the tuple. Since it seems easy enough to initialize the table OID properly, do that, and continue allowing that column, but disallow the rest unless and until someone figures out a way to make them work properly. No back-patch, because this doesn't seem important enough to take the risk of destabilizing the back branches. In fact, this will pose a dump-and-reload hazard for those upgrading from previous versions: constraints that were accepted before but were not correctly enforced will now either be enforced correctly or not accepted at all. Either could result in restore failures, but in practice I think very few users will notice the difference, since the use case is pretty marginal anyway and few users will be relying on features that have not historically worked. Amit Kapila, reviewed by Rushabh Lathia, with doc changes by me.	2013-09-23 13:31:22 -04:00
Robert Haas	496439d943	Fix compiler warning in WaitForBackgroundWorkerStartup(). Per complaint from Andrew Gierth.	2013-09-19 13:00:17 -04:00
Robert Haas	86a174bff0	Typo fix. Etsuro Fujita	2013-09-18 08:57:44 -04:00
Alvaro Herrera	1247ea28cb	Remove `proc` argument from LockCheckConflicts This has been unused since commit `8563ccae2c`. Noted by Antonin Houska	2013-09-16 22:14:14 -03:00
Alvaro Herrera	dd778e9d88	Rename various "freeze multixact" variables It seems to make more sense to use "cutoff multixact" terminology throughout the backend code; "freeze" is associated with replacing of an Xid with FrozenTransactionId, which is not what we do for MultiXactIds. Andres Freund Some adjustments by Álvaro Herrera	2013-09-16 15:47:31 -03:00
Heikki Linnakangas	0892ecbc01	Add a GUC to report whether data page checksums are enabled. Bernd Helmle	2013-09-16 14:36:01 +03:00
Noah Misch	d41cb869aa	Ignore interrupts during quickdie(). Once the administrator has called for an immediate shutdown or a backend crash has triggered a reinitialization, no mere SIGINT or SIGTERM should change that course. Such derailment remains possible when the signal arrives before quickdie() blocks signals. That being a narrow race affecting most PostgreSQL signal handlers in some way, leave it for another patch. Back-patch this to all supported versions.	2013-09-11 20:10:15 -04:00
Peter Eisentraut	b34f8f409b	Show schemas in information_schema.schemata that the current has access to Before, it would only show schemas that the current user owns. Per discussion, the new behavior is more useful and consistent for PostgreSQL.	2013-09-09 22:25:37 -04:00
Robert Haas	71901ab6da	Introduce InvalidCommandId. This allows a 32-bit field to represent an optional command ID without a separate flag bit. Andres Freund	2013-09-09 16:25:29 -04:00
Noah Misch	b8104730c8	Don't VALGRIND_PRINTF() each query string. Doing so was helpful for some Valgrind usage and distracting for other usage. One can achieve the same effect by changing log_statement and pointing both PostgreSQL and Valgrind logging to stderr. Per gripe from Andres Freund.	2013-09-06 19:42:00 -04:00
Kevin Grittner	277607d600	Eliminate pg_rewrite.ev_attr column and related dead code. Commit `95ef6a3448` removed the ability to create rules on an individual column as of 7.3, but left some residual code which has since been useless. This cleans up that dead code without any change in behavior other than dropping the useless column from the catalog.	2013-09-05 14:03:43 -05:00
Heikki Linnakangas	20cb18db46	Make catalog cache hash tables resizeable. If the hash table backing a catalog cache becomes too full (fillfactor > 2), enlarge it. A new buckets array, double the size of the old, is allocated, and all entries in the old hash are moved to the right bucket in the new hash. This has two benefits. First, cache lookups don't get so expensive when there are lots of entries in a cache, like if you access hundreds of thousands of tables. Second, we can make the (initial) sizes of the caches much smaller, which saves memory. This patch dials down the initial sizes of the catcaches. The new sizes are chosen so that a backend that only runs a few basic queries still won't need to enlarge any of them.	2013-09-05 20:20:03 +03:00
Jeff Davis	b1892aaeaa	Revert WAL posix_fallocate() patches. This reverts commit `269e780822` and commit `5b571bb8c8`. Unfortunately, the initial patch had insufficient performance testing, and resulted in a regression. Per report by Thom Brown.	2013-09-04 23:43:41 -07:00
Bruce Momjian	f5c2f5a8f6	Add GUC descriptions for compile-time postgresql.conf settings Previous text was "No description available". Tianyin Xu	2013-09-04 17:44:04 -04:00
Heikki Linnakangas	375d8526f2	Keep heavily-contended fields in XLogCtlInsert on different cache lines. Performance testing shows that if the insertpos_lck spinlock and the fields that it protects are on the same cache line with other variables that are frequently accessed, the false sharing can hurt performance a lot. Keep them apart by adding some padding.	2013-09-04 23:14:33 +03:00
Robert Haas	cc52d5b33f	Expose fsync_fname as a public API. Andres Freund	2013-09-04 11:15:00 -04:00
Tom Lane	0c66a22377	Update comments concerning PGC_S_TEST. This GUC context value was once only used by ALTER DATABASE SET and ALTER USER SET. That's not true anymore, though, so rewrite the comments to be a bit more general. Patch in HEAD only, since this is just an internal documentation issue.	2013-09-03 18:56:22 -04:00
Tom Lane	546f7c2e38	Don't fail for bad GUCs in CREATE FUNCTION with check_function_bodies off. The previous coding attempted to activate all the GUC settings specified in SET clauses, so that the function validator could operate in the GUC environment expected by the function body. However, this is problematic when restoring a dump, since the SET clauses might refer to database objects that don't exist yet. We already have the parameter check_function_bodies that's meant to prevent forward references in function definitions from breaking dumps, so let's change CREATE FUNCTION to not install the SET values if check_function_bodies is off. Authors of function validators were already advised not to make any "context sensitive" checks when check_function_bodies is off, if indeed they're checking anything at all in that mode. But extend the documentation to point out the GUC issue in particular. (Note that we still check the SET clauses to some extent; the behavior with !check_function_bodies is now approximately equivalent to what ALTER DATABASE/ROLE have been doing for awhile with context-dependent GUCs.) This problem can be demonstrated in all active branches, so back-patch all the way.	2013-09-03 18:32:20 -04:00
Tom Lane	0d3f4406df	Allow aggregate functions to be VARIADIC. There's no inherent reason why an aggregate function can't be variadic (even VARIADIC ANY) if its transition function can handle the case. Indeed, this patch to add the feature touches none of the planner or executor, and little of the parser; the main missing stuff was DDL and pg_dump support. It is true that variadic aggregates can create the same sort of ambiguity about parameters versus ORDER BY keys that was complained of when we (briefly) had both one- and two-argument forms of string_agg(). However, the policy formed in response to that discussion only said that we'd not create any built-in aggregates with varying numbers of arguments, not that we shouldn't allow users to do it. So the logical extension of that is we can allow users to make variadic aggregates as long as we're wary about shipping any such in core. In passing, this patch allows aggregate function arguments to be named, to the extent of remembering the names in pg_proc and dumping them in pg_dump. You can't yet call an aggregate using named-parameter notation. That seems like a likely future extension, but it'll take some work, and it's not what this patch is really about. Likewise, there's still some work needed to make window functions handle VARIADIC fully, but I left that for another day. initdb forced because of new aggvariadic field in Aggref parse nodes.	2013-09-03 17:08:46 -04:00
Heikki Linnakangas	a93bdfc711	Fix typo in comment. Also line-wrap an over-wide line in a comment that's ignored by pgindent.	2013-09-03 13:17:09 +03:00
Peter Eisentraut	6a007fa1eb	Translation updates	2013-09-02 02:43:18 -04:00
Tom Lane	8e2b71d2d0	Reset the binary heap in MergeAppend rescans. Failing to do so can cause queries to return wrong data, error out or crash. This requires adding a new binaryheap_reset() method to binaryheap.c, but that probably should have been there anyway. Per bug #8410 from Terje Elde. Diagnosis and patch by Andres Freund.	2013-08-30 19:15:21 -04:00
Alvaro Herrera	9381cb5229	Make error wording more consistent	2013-08-29 12:42:28 -04:00
Robert Haas	090d0f2050	Allow discovery of whether a dynamic background worker is running. Using the infrastructure provided by this patch, it's possible either to wait for the startup of a dynamically-registered background worker, or to poll the status of such a worker without waiting. In either case, the current PID of the worker process can also be obtained. As usual, worker_spi is updated to demonstrate the new functionality. Patch by me. Review by Andres Freund.	2013-08-28 14:08:13 -04:00
Robert Haas	c9e2e2db5c	Partially restore comments discussing enum renumbering hazards. As noted by Tom Lane, commit `813fb03155` was overly optimistic about how safe it is to concurrently change enumsortorder values under MVCC catalog scan semantics. Restore some of the previous text, with hopefully-correct adjustments for the new state of play.	2013-08-28 13:21:08 -04:00
Alvaro Herrera	e246cfc95f	Initialize cached OID to Invalid in new hash entries Andres Freund; bug detected by valgrind	2013-08-27 14:53:17 -04:00
Tom Lane	2aac3399ae	Account better for planning cost when choosing whether to use custom plans. The previous coding in plancache.c essentially used 10% of the estimated runtime as its cost estimate for planning. This can be pretty bogus, especially when the estimated runtime is very small, such as in a simple expression plan created by plpgsql, or a simple INSERT ... VALUES. While we don't have a really good handle on how planning time compares to runtime, it seems reasonable to use an estimate based on the number of relations referenced in the query, with a rather large multiplier. This patch uses 1000 * cpu_operator_cost * (nrelations + 1), so that even a trivial query will be charged 1000 * cpu_operator_cost for planning. This should address the problem reported by Marc Cousin and others that 9.2 and up prefer custom plans in cases where the planning time greatly exceeds what can be saved.	2013-08-24 15:14:17 -04:00
Magnus Hagander	db4ef73760	Don't crash when pg_xlog is empty and pg_basebackup -x is used The backup will not work (without a logarchive, and that's the whole point of -x) in this case, this patch just changes it to throw an error instead of crashing when this happens. Noticed and diagnosed by TAKATSUKA Haruka	2013-08-24 17:13:49 +02:00
Tom Lane	fcf9ecad57	In locate_grouping_columns(), don't expect an exact match of Var typmods. It's possible that inlining of SQL functions (or perhaps other changes?) has exposed typmod information not known at parse time. In such cases, Vars generated by query_planner might have valid typmod values while the original grouping columns only have typmod -1. This isn't a semantic problem since the behavior of grouping only depends on type not typmod, but it breaks locate_grouping_columns' use of tlist_member to locate the matching entry in query_planner's result tlist. We can fix this without an excessive amount of new code or complexity by relying on the fact that locate_grouping_columns only gets called when make_subplanTargetList has set need_tlist_eval == false, and that can only happen if all the grouping columns are simple Vars. Therefore we only need to search the sub_tlist for a matching Var, and we can reasonably define a "match" as being a match of the Var identity fields varno/varattno/varlevelsup. The code still Asserts that vartype matches, but ignores vartypmod. Per bug #8393 from Evan Martin. The added regression test case is basically the same as his example. This has been broken for a very long time, so back-patch to all supported branches.	2013-08-23 17:30:53 -04:00
Tom Lane	3454876314	Fix hash table size estimation error in choose_hashed_distinct(). We should account for the per-group hashtable entry overhead when considering whether to use a hash aggregate to implement DISTINCT. The comparable logic in choose_hashed_grouping() gets this right, but I think I omitted it here in the mistaken belief that there would be no overhead if there were no aggregate functions to be evaluated. This can result in more than 2X underestimate of the hash table size, if the tuples being aggregated aren't very wide. Per report from Tomas Vondra. This bug is of long standing, but per discussion we'll only back-patch into 9.3. Changing the estimation behavior in stable branches seems to carry too much risk of destabilizing plan choices for already-tuned applications.	2013-08-21 13:38:34 -04:00
Tom Lane	20fe870753	Be more wary of unwanted whitespace in pgstat_reset_remove_files(). sscanf isn't the easiest thing to use for exact pattern checks ... also, don't use strncmp where strcmp will do.	2013-08-19 19:36:04 -04:00
Alvaro Herrera	f9b50b7c18	Fix removal of files in pgstats directories Instead of deleting all files in stats_temp_directory and the permanent directory on a crash, only remove those files that match the pattern of files we actually write in them, to avoid possibly clobbering existing unrelated contents of the temporary directory. Per complaint from Jeff Janes, and subsequent discussion, starting at message CAMkU=1z9+7RsDODnT4=cDFBRBp8wYQbd_qsLcMtKEf-oFwuOdQ@mail.gmail.com Also, fix a bug in the same routine to avoid removing files from the permanent directory twice (instead of once from that directory and then from the temporary directory), also per report from Jeff Janes, in message CAMkU=1wbk947=-pAosDMX5VC+sQw9W4ttq6RM9rXu=MjNeEQKA@mail.gmail.com	2013-08-19 17:48:17 -04:00
Heikki Linnakangas	3619a20d33	Rename the "fast_promote" file to just "promote". This keeps the usual trigger file name unchanged from 9.2, avoiding nasty issues if you use a pre-9.3 pg_ctl binary with a 9.3 server or vice versa. The fallback behavior of creating a full checkpoint before starting up is now triggered by a file called "fallback_promote". That can be useful for debugging purposes, but we don't expect any users to have to resort to that and we might want to remove that in the future, which is why the fallback mechanism is undocumented.	2013-08-19 20:59:51 +03:00
Tom Lane	c64de21e96	Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries. In an example such as SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true; it is safe to pull up the LATERAL subquery into its parent, but we must then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN. The previous coding in deconstruct_recurse mistakenly labeled the clause as "is_pushed_down", resulting in wrong semantics if the clause were applied at the join node, as per an example submitted awhile ago by Jeremy Evans. To fix, postpone processing of such clauses until we return back up to the appropriate recursion depth in deconstruct_recurse. In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery; we previously missed the possibility that the LATERAL subquery might itself contain an outer join that makes lateral references in lower quals unsafe. A regression test case equivalent to Jeremy's example was already in my commit of yesterday, but was giving the wrong results because of this bug. This patch fixes the expected output for that, and also adds a test case for the second problem.	2013-08-19 13:19:41 -04:00
Alvaro Herrera	78e1220104	Fix pg_upgrade failure from servers older than 9.3 When upgrading from servers of versions 9.2 and older, and MultiXactIds have been used in the old server beyond the first page (that is, 2048 multis or more in the default 8kB-page build), pg_upgrade would set the next multixact offset to use beyond what has been allocated in the new cluster. This would cause a failure the first time the new cluster needs to use this value, because the pg_multixact/offsets/ file wouldn't exist or wouldn't be large enough. To fix, ensure that the transient server instances launched by pg_upgrade extend the file as necessary. Per report from Jesse Denardo in CANiVXAj4c88YqipsyFQPboqMudnjcNTdB3pqe8ReXqAFQ=HXyA@mail.gmail.com	2013-08-19 12:56:18 -04:00
Peter Eisentraut	a2f2e902b8	Translation updates	2013-08-18 23:41:03 -04:00
Kevin Grittner	28154bb23b	Remove relcache entry invalidation in REFRESH MATERIALIZED VIEW. This was added as part of the attempt to support unlogged matviews along with a populated status. It got missed when unlogged support was removed pre-commit. Noticed by Noah Misch. Back-patched to 9.3 branch.	2013-08-18 16:19:22 -05:00
Tom Lane	f1d5fce7cf	Fix thinko in comment.	2013-08-17 20:36:29 -04:00
Tom Lane	9e7e29c75a	Fix planner problems with LATERAL references in PlaceHolderVars. The planner largely failed to consider the possibility that a PlaceHolderVar's expression might contain a lateral reference to a Var coming from somewhere outside the PHV's syntactic scope. We had a previous report of a problem in this area, which I tried to fix in a quick-hack way in commit `4da6439bd8`, but Antonin Houska pointed out that there were still some problems, and investigation turned up other issues. This patch largely reverts that commit in favor of a more thoroughly thought-through solution. The new theory is that a PHV's ph_eval_at level cannot be higher than its original syntactic level. If it contains lateral references, those don't change the ph_eval_at level, but rather they create a lateral-reference requirement for the ph_eval_at join relation. The code in joinpath.c needs to handle that. Another issue is that createplan.c wasn't handling nested PlaceHolderVars properly. In passing, push knowledge of lateral-reference checks for join clauses into join_clause_is_movable_to. This is mainly so that FDWs don't need to deal with it. This patch doesn't fix the original join-qual-placement problem reported by Jeremy Evans (and indeed, one of the new regression test cases shows the wrong answer because of that). But the PlaceHolderVar problems need to be fixed before that issue can be addressed, so committing this separately seems reasonable.	2013-08-17 20:22:37 -04:00
Robert Haas	2dee7998f9	Move more bgworker code to bgworker.c; also, some renaming. Per discussion on pgsql-hackers. Michael Paquier, slightly modified by me. Original suggestion from Amit Kapila.	2013-08-16 15:31:28 -04:00
Heikki Linnakangas	05cbce6f30	Fix typo in comment.	2013-08-16 16:26:22 +03:00
Kevin Grittner	3f78b1715c	Don't allow ALTER MATERIALIZED VIEW ADD UNIQUE. Was accidentally allowed, but not documented and lacked support for rename or drop once created. Per report from Noah Misch.	2013-08-15 13:14:48 -05:00
Peter Eisentraut	229fb58d4f	Treat timeline IDs as unsigned in replication parser Timeline IDs are unsigned ints everywhere, except the replication parser treated them as signed ints.	2013-08-14 23:18:49 -04:00
Peter Eisentraut	32f7c0ae17	Improve error message when view is not updatable Avoid using the term "updatable" in confusing ways. Suggest a trigger first, before a rule.	2013-08-14 23:02:59 -04:00
Tom Lane	1b1d3d92c3	Remove ph_may_need from PlaceHolderInfo, with attendant simplifications. The planner logic that attempted to make a preliminary estimate of the ph_needed levels for PlaceHolderVars seems to be completely broken by lateral references. Fortunately, the potential join order optimization that this code supported seems to be of relatively little value in practice; so let's just get rid of it rather than trying to fix it. Getting rid of this allows fairly substantial simplifications in placeholder.c, too, so planning in such cases should be a bit faster. Issue noted while pursuing bugs reported by Jeremy Evans and Antonin Houska, though this doesn't in itself fix either of their reported cases. What this does do is prevent an Assert crash in the kind of query illustrated by the added regression test. (I'm not sure that the plan for that query is stable enough across platforms to be usable as a regression test output ... but we'll soon find out from the buildfarm.) Back-patch to 9.3. The problem case can't arise without LATERAL, so no need to touch older branches.	2013-08-14 18:38:47 -04:00
Kevin Grittner	e2cd368678	Remove Assert that matview is not in system schema from REFRESH. We don't want to prevent an extension which creates a matview from being installed in pg_catalog. Issue was raised by Hitoshi Harada. Backpatched to 9.3.	2013-08-14 12:36:55 -05:00
Tom Lane	3d5282c6f0	Emit a log message if output is about to be redirected away from stderr. We've seen multiple cases of people looking at the postmaster's original stderr output to try to diagnose problems, not realizing/remembering that their logging configuration is set up to send log messages somewhere else. This seems particularly likely to happen in prepackaged distributions, since many packagers patch the code to change the factory-standard logging configuration to something more in line with their platform conventions. In hopes of reducing confusion, emit a LOG message about this at the point in startup where we are about to switch log output away from the original stderr, providing a pointer to where to look instead. This message will appear as the last thing in the original stderr output. (We might later also try to emit such link messages when logging parameters are changed on-the-fly; but that case seems to be both noticeably harder to do nicely, and much less frequently a problem in practice.) Per discussion, back-patch to 9.3 but not further.	2013-08-13 15:24:52 -04:00
Peter Eisentraut	072457b360	Message punctuation and pluralization fixes	2013-08-09 08:02:44 -04:00
Peter Eisentraut	9d775d8894	Message style improvements	2013-08-07 22:48:40 -04:00
Fujii Masao	91c3613d37	Fix assertion failure by an immediate shutdown. In PM_WAIT_DEAD_END state, checkpointer process must be dead already. But an immediate shutdown could make postmaster's state machine transition to PM_WAIT_DEAD_END state even if checkpointer process is still running, and which caused assertion failure. This bug was introduced in commit `457d6cf049`. This patch ensures that postmaster's state machine doesn't transition to PM_WAIT_DEAD_END state in an immediate shutdown while checkpointer process is running.	2013-08-08 02:48:53 +09:00
Tom Lane	3ced8837db	Simplify query_planner's API by having it return the top-level RelOptInfo. Formerly, query_planner returned one or possibly two Paths for the topmost join relation, so that grouping_planner didn't see the join RelOptInfo (at least not directly; it didn't have any hesitation about examining cheapest_path->parent, though). However, correct selection of the Paths involved a significant amount of coupling between query_planner and grouping_planner, a problem which has gotten worse over time. It seems best to give up on this API choice and instead return the topmost RelOptInfo explicitly. Then grouping_planner can pull out the Paths it wants from the rel's path list. In this way we can remove all knowledge of grouping behaviors from query_planner. The only real benefit of the old way is that in the case of an empty FROM clause, we never made any RelOptInfos at all, just a Path. Now we have to gin up a dummy RelOptInfo to represent the empty FROM clause. That's not a very big deal though. While at it, simplify query_planner's API a bit more by having the caller set up root->tuple_fraction and root->limit_tuples, rather than passing those values as separate parameters. Since query_planner no longer does anything with either value, requiring it to fill the PlannerInfo fields seemed pretty arbitrary. This patch just rearranges code; it doesn't (intentionally) change any behaviors. Followup patches will do more interesting things.	2013-08-05 15:01:09 -04:00
Kevin Grittner	841c29c8b3	Various cleanups for REFRESH MATERIALIZED VIEW CONCURRENTLY. Open and lock each index before checking definition in RMVC. The ExclusiveLock on the related table is not viewed as sufficient to ensure that no changes are made to the index definition, and invalidation messages from other backends might have been missed. Additionally, use RelationGetIndexExpressions() and check for NIL rather than doing our own loop. Protect against redefinition of tid and rowvar operators in RMVC. While working on this, noticed that the fixes for bugs found during the CF made the UPDATE statement useless, since no rows could qualify for that treatment any more. Ripping out code to support the UPDATE statement simplified the operator cleanups. Change slightly confusing local field name. Use meaningful alias names on queries in refresh_by_match_merge(). Per concerns of raised by Andres Freund and comments and suggestions from Noah Misch. Some additional issues remain, which will be addressed separately.	2013-08-05 09:57:56 -05:00
Tom Lane	221e92f64c	Make sure float4in/float8in accept all standard spellings of "infinity". The C99 and POSIX standards require strtod() to accept all these spellings (case-insensitively): "inf", "+inf", "-inf", "infinity", "+infinity", "-infinity". However, pre-C99 systems might accept only some or none of these, and apparently Windows still doesn't accept "inf". To avoid surprising cross-platform behavioral differences, manually check for each of these spellings if strtod() fails. We were previously handling just "infinity" and "-infinity" that way, but since C99 is most of the world now, it seems likely that applications are expecting all these spellings to work. Per bug #8355 from Basil Peace. It turns out this fix won't actually resolve his problem, because Python isn't being this careful; but that doesn't mean we shouldn't be.	2013-08-03 12:40:27 -04:00
Alvaro Herrera	706f9dd914	Fix old visibility bug in HeapTupleSatisfiesDirty If a tuple is locked but not updated by a concurrent transaction, HeapTupleSatisfiesDirty would return that transaction's Xid in xmax, causing callers to wait on it, when it is not necessary (in fact, if the other transaction had used a multixact instead of a plain Xid to mark the tuple, HeapTupleSatisfiesDirty would have behave differently and not returned the Xmax). This bug was introduced in commit `3f7fbf85dc`, dated December 1998, so it's almost 15 years old now. However, it's hard to see this misbehave, because before we had NOWAIT the only consequence of this is that transactions would wait for slightly more time than necessary; so it's not surprising that this hasn't been reported yet. Craig Ringer and Andres Freund	2013-08-02 17:02:36 -04:00
Alvaro Herrera	88c556680c	Fix crash in error report of invalid tuple lock My tweak of these error messages in commit `c359a1b082` contained the thinko that a query would always have rowMarks set for a query containing a locking clause. Not so: when declaring a cursor, for instance, rowMarks isn't set at the point we're checking, so we'd be dereferencing a NULL pointer. The fix is to pass the lock strength to the function raising the error, instead of trying to reverse-engineer it. The result not only is more robust, but it also seems cleaner overall. Per report from Robert Haas.	2013-08-02 13:18:37 -04:00
Robert Haas	05ee328d66	Fix typo in comment. Etsuro Fujita	2013-08-02 09:15:42 -04:00
Kevin Grittner	f31c149f13	Improve comments for IncrementalMaintenance DML enabling functions. Move the static functions after the comment and expand the comment. Per complaint from Andres Freund, although using different comment text.	2013-08-01 14:31:09 -05:00
Robert Haas	149e38e5ee	Assorted bgworker-related comment fixes. Per gripes by Amit Kapila.	2013-08-01 12:20:31 -04:00
Robert Haas	813fb03155	Remove SnapshotNow and HeapTupleSatisfiesNow. We now use MVCC catalog scans, and, per discussion, have eliminated all other remaining uses of SnapshotNow, so that we can now get rid of it. This will break third-party code which is still using it, which is intentional, as we want such code to be updated to do things the new way.	2013-08-01 10:46:19 -04:00
Stephen Frost	ddef1a39c6	Allow a context to be passed in for error handling As pointed out by Tom Lane, we can allow other users of the error handler callbacks to provide their own memory context by adding the context to use to ErrorData and using that instead of explicitly using ErrorContext. This then allows GetErrorContextStack() to be called from inside exception handlers, so modify plpgsql to take advantage of that and add an associated regression test for it.	2013-08-01 01:07:20 -04:00
Alvaro Herrera	a59516b631	Fix mis-indented lines Per Coverity	2013-07-31 17:57:15 -04:00
Tom Lane	d074b4e50d	Fix regexp_matches() handling of zero-length matches. We'd find the same match twice if it was of zero length and not immediately adjacent to the previous match. replace_text_regexp() got similar cases right, so adjust this search logic to match that. Note that even though the regexp_split_to_xxx() functions share this code, they did not display equivalent misbehavior, because the second match would be considered degenerate and ignored. Jeevan Chalke, with some cosmetic changes by me.	2013-07-31 11:31:22 -04:00
Fujii Masao	c876fb4241	Fix typo in comment. Hitoshi Harada	2013-07-31 22:53:20 +09:00
Noah Misch	16f38f72ab	Restore REINDEX constraint validation. Refactoring as part of commit `8ceb245680` had the unintended effect of making REINDEX TABLE and REINDEX DATABASE no longer validate constraints enforced by the indexes in question; REINDEX INDEX still did so. Indexes marked invalid remained so, and constraint violations arising from data corruption went undetected. Back-patch to 9.0, like the causative commit.	2013-07-30 18:36:52 -04:00
Greg Stark	c62736cc37	Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF) Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost	2013-07-29 16:38:01 +01:00
Peter Eisentraut	626092a2e1	Message style improvements	2013-07-28 07:01:13 -04:00
Tom Lane	3d13623d75	Prevent leakage of SPI tuple tables during subtransaction abort. plpgsql often just remembers SPI-result tuple tables in local variables, and has no mechanism for freeing them if an ereport(ERROR) causes an escape out of the execution function whose local variable it is. In the original coding, that wasn't a problem because the tuple table would be cleaned up when the function's SPI context went away during transaction abort. However, once plpgsql grew the ability to trap exceptions, repeated trapping of errors within a function could result in significant intra-function-call memory leakage, as illustrated in bug #8279 from Chad Wagner. We could fix this locally in plpgsql with a bunch of PG_TRY/PG_CATCH coding, but that would be tedious, probably slow, and prone to bugs of omission; moreover it would do nothing for similar risks elsewhere. What seems like a better plan is to make SPI itself responsible for freeing tuple tables at subtransaction abort. This patch attacks the problem that way, keeping a list of live tuple tables within each SPI function context. Currently, such freeing is automatic for tuple tables made within the failed subtransaction. We might later add a SPI call to mark a tuple table as not to be freed this way, allowing callers to opt out; but until someone exhibits a clear use-case for such behavior, it doesn't seem worth bothering. A very useful side-effect of this change is that SPI_freetuptable() can now defend itself against bad calls, such as duplicate free requests; this should make things more robust in many places. (In particular, this reduces the risks involved if a third-party extension contains now-redundant SPI_freetuptable() calls in error cleanup code.) Even though the leakage problem is of long standing, it seems imprudent to back-patch this into stable branches, since it does represent an API semantics change for SPI users. We'll patch this in 9.3, but live with the leakage in older branches.	2013-07-25 16:46:14 -04:00
Robert Haas	ed93feb808	Change currtid functions to use an MVCC snapshot, not SnapshotNow. This has a slight performance cost, but the only known consumers of these functions, known at the SQL level as currtid and currtid2, is pgsql-odbc; whose usage, we hope, is not sufficiently intensive to make this a problem. Per discussion.	2013-07-25 16:32:02 -04:00
Robert Haas	3483f4332d	Don't use SnapshotNow in get_actual_variable_range. Instead, use the active snapshot. Per Tom Lane, this function is most interested in knowing the range of tuples our scan will actually see. This is another step towards full removal of SnapshotNow.	2013-07-25 14:30:00 -04:00
Stephen Frost	9bd0feeba8	Improvements to GetErrorContextStack() As GetErrorContextStack() borrowed setup and tear-down code from other places, it was less than clear that it must only be called as a top-level entry point into the error system and can't be called by an exception handler (unlike the rest of the error system, which is set up to be reentrant-safe). Being called from an exception handler is outside the charter of GetErrorContextStack(), so add a bit more protection against it, improve the comments addressing why we have to set up an errordata stack for this function at all, and add a few more regression tests. Lack of clarity pointed out by Tom Lane; all bugs are mine.	2013-07-25 09:41:55 -04:00
Stephen Frost	8312832567	Add GET DIAGNOSTICS ... PG_CONTEXT in PL/PgSQL This adds the ability to get the call stack as a string from within a PL/PgSQL function, which can be handy for logging to a table, or to include in a useful message to an end-user. Pavel Stehule, reviewed by Rushabh Lathia and rather heavily whacked around by Stephen Frost.	2013-07-24 18:53:27 -04:00
Tom Lane	fa2fad3c06	Improve ilist.h's support for deletion of slist elements during iteration. Previously one had to use slist_delete(), implying an additional scan of the list, making this infrastructure considerably less efficient than traditional Lists when deletion of element(s) in a long list is needed. Modify the slist_foreach_modify() macro to support deleting the current element in O(1) time, by keeping a "prev" pointer in addition to "cur" and "next". Although this makes iteration with this macro a bit slower, no real harm is done, since in any scenario where you're not going to delete the current list element you might as well just use slist_foreach instead. Improve the comments about when to use each macro. Back-patch to 9.3 so that we'll have consistent semantics in all branches that provide ilist.h. Note this is an ABI break for callers of slist_foreach_modify(). Andres Freund and Tom Lane	2013-07-24 17:42:34 -04:00
Tom Lane	b32a25c3d5	Fix booltestsel() for case where we have NULL stats but not MCV stats. In a boolean column that contains mostly nulls, ANALYZE might not find enough non-null values to populate the most-common-values stats, but it would still create a pg_statistic entry with stanullfrac set. The logic in booltestsel() for this situation did the wrong thing for "col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null values would satisfy these tests (so that the true selectivity would be close to one, not close to zero). Per bug #8274. Fix by Andrew Gierth, some comment-smithing by me.	2013-07-24 00:44:09 -04:00
Tom Lane	10a509d829	Move strip_implicit_coercions() from optimizer to nodeFuncs.c. Use of this function has spread into the parser and rewriter, so it seems like time to pull it out of the optimizer and put it into the more central nodeFuncs module. This eliminates the need to #include optimizer/clauses.h in most of the calling files, demonstrating that this function was indeed a bit outside the normal code reference patterns.	2013-07-23 18:21:19 -04:00
Tom Lane	ef655663c5	Further hacking on ruleutils' new column-alias-assignment code. After further thought about implicit coercions appearing in a joinaliasvars list, I realized that they represent an additional reason why we might need to reference the join output column directly instead of referencing an underlying column. Consider SELECT x FROM t1 LEFT JOIN t2 USING (x) where t1.x is of type date while t2.x is of type timestamptz. The merged output variable is of type timestamptz, but it won't go to null when t2 does, therefore neither t1.x nor t2.x is a valid substitute reference. The code in get_variable() actually gets this case right, since it knows it shouldn't look through a coercion, but we failed to ensure that the unqualified output column name would be globally unique. To fix, modify the code that trawls for a dangerous situation so that it actually scans through an unnamed join's joinaliasvars list to see if there are any non-simple-Var entries.	2013-07-23 17:55:04 -04:00
Tom Lane	a7cd853b75	Change post-rewriter representation of dropped columns in joinaliasvars. It's possible to drop a column from an input table of a JOIN clause in a view, if that column is nowhere actually referenced in the view. But it will still be there in the JOIN clause's joinaliasvars list. We used to replace such entries with NULL Const nodes, which is handy for generation of RowExpr expansion of a whole-row reference to the view. The trouble with that is that it can't be distinguished from the situation after subquery pull-up of a constant subquery output expression below the JOIN. Instead, replace such joinaliasvars with null pointers (empty expression trees), which can't be confused with pulled-up expressions. expandRTE() still emits the old convention, though, for convenience of RowExpr generation and to reduce the risk of breaking extension code. In HEAD and 9.3, this patch also fixes a problem with some new code in ruleutils.c that was failing to cope with implicitly-casted joinaliasvars entries, as per recent report from Feike Steenbergen. That oversight was because of an inadequate description of the data structure in parsenodes.h, which I've now corrected. There were some pre-existing oversights of the same ilk elsewhere, which I believe are now all fixed.	2013-07-23 16:23:45 -04:00

... 5 6 7 8 9 ...

14103 Commits